Memory status of C4510 Catalyst switch

Posted: Wed Dec 31, 2014 1:14 am
by phyo
I have a problem monitoring the memory status of a C4510 Catalyst switch.
I have attached the plugin, the command, the service and the output (OK, and return code of 255).
The plugin
https://db.tt/zHqDHMY1
The command
Image
The service
Image
Service in OK state after clicking "Schedule an immediate check".
Image
Service fails (Return code of 255 is out of bounds) after a few minutes.
Image
I don't know what is wrong with this service.
Please let me know if you find the error.

Thanks.

Re: Memory status of C4510 Catalyst switch

Posted: Fri Jan 02, 2015 2:11 am
by Box293
When you execute this check at the CLI of your Nagios XI host, what are you getting? If it works the first time, what happens when you try it multiple times again?

From your screenshots you would be executing:

Code: Select all

su nagios
/usr/local/nagios/libexec/check_c4510_mem.pl -H C4510-VSS -C nagios -2 -T mem -w 70% -c 80% -f
Replace C4510-VSS with the IP Address of that device.
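
To see whether the check fails intermittently, it can help to run it several times in a row and count the failures. The following is only a minimal sketch: `run_repeated` is a hypothetical helper name, not part of Nagios or the plugin, and it simply runs whatever command you pass to it.

```shell
#!/bin/sh
# run_repeated N CMD [ARGS...]
# Hypothetical helper: runs CMD N times, prints the exit code and output
# of each run, then prints how many runs exited with a non-zero code.
run_repeated() {
    count=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Capture stdout and stderr together, and keep the exit code.
        out=$("$@" 2>&1) && rc=0 || rc=$?
        echo "run $i: exit code $rc: $out"
        if [ "$rc" -ne 0 ]; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "failures: $failures/$count"
}
```

For example, as the nagios user: run_repeated 20 /usr/local/nagios/libexec/check_c4510_mem.pl -H <IP address> -C nagios -2 -T mem -w 70% -c 80% -f. Note that WARNING and CRITICAL results also exit non-zero, so look at the exit codes themselves and not only at the failure count.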

Re: Memory status of C4510 Catalyst switch

Posted: Sun Jan 04, 2015 8:21 am
by phyo
Box293 wrote: When you execute this check at the CLI of your Nagios XI host, what are you getting? If it works the first time, what happens when you try it multiple times again?
From your screenshots you would be executing:

Code: Select all

su nagios
/usr/local/nagios/libexec/check_c4510_mem.pl -H C4510-VSS -C nagios -2 -T mem -w 70% -c 80% -f
Replace C4510-VSS with the IP Address of that device.

Here is the output of this check from the CLI:
[root@nagios01 ~]$ /usr/local/nagios/libexec/check_c4510_mem.pl -H C4510-VSS -C nagios -2 -T mem -w 70% -c 80% -f
Memory : used = 890 MB, free = 1059 MB, kernel reserve = 98 MB, utilization = 45 % : OK | utilization=45%;70;80
[root@nagios01 ~]$
It works when I run it from the CLI. I then created the service for memory usage, and after that I saw the same error as in the screenshot from my first post (return code of 255 is out of bounds).

Re: Memory status of C4510 Catalyst switch

Posted: Sun Jan 04, 2015 6:16 pm
by Box293
Does it always show "return code of 255 is out of bounds" in the GUI? Does it ever work apart from the first time?

I did notice the script has a hard-coded timeout:
my $TIMEOUT = 15;
Perhaps increase that to 30 and see if that helps. You can do that by adding:

Code: Select all

-t 30
to $ARG1$ in the service definition.


Apart from that, you may have to turn on debugging.

Set the debug level and then restart Nagios:

Code: Select all

sed -i 's/.*debug_level=.*/debug_level=16/g' /usr/local/nagios/etc/nagios.cfg
service nagios restart
Then tail the debug file:
tail -f /usr/local/nagios/var/nagios.debug

Schedule an immediate check of the service and see what appears in the debug log.

Note: to turn off debugging:

Code: Select all

sed -i 's/.*debug_level=.*/debug_level=0/g' /usr/local/nagios/etc/nagios.cfg
service nagios restart

Re: Memory status of C4510 Catalyst switch

Posted: Sun Jan 04, 2015 9:19 pm
by phyo
After turning debugging on and running an immediate check, I got this information from the debug file.

Code: Select all

[1420423977.729609] [016.0] [pid=42206] ** Handling check result for service 'Memory Usage' on host 'C4510-VSS' from 'Core Worker 42210'...
[1420423977.729648] [016.1] [pid=42206] HOST: C4510-VSS, SERVICE: Memory Usage, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 255, OUTPUT: (No output on stdout) stderr: Argument "noSuchInstance" isn't numeric in division (/) at /usr/local/nagios/libexec/check_c4510_mem.pl line 307.
Argument "noSuchInstance" isn't numeric in division (/) at /usr/local/nagios/libexec/check_c4510_mem.pl line 308.
Argument "noSuchInstance" isn't numeric in division (/) at /usr/local/nagios/libexec/check_c4510_mem.pl line 309.
Illegal division by zero at /usr/local/nagios/libexec/check_c4510_mem.pl line 310.
Is this information useful?

Re: Memory status of C4510 Catalyst switch

Posted: Sun Jan 04, 2015 9:30 pm
by Box293
That's exactly what we are after.

Does the debug log also show the command that was executed? I want to look at that as well.

Basically what is happening is that the plugin does not get a valid response from these OIDs (the device answers with noSuchInstance):
1.3.6.1.4.1.9.9.109.1.1.1.1.12.5000
1.3.6.1.4.1.9.9.109.1.1.1.1.13.5000
1.3.6.1.4.1.9.9.109.1.1.1.1.14.5000

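The plugin was not coded to detect this, so it ends up dividing by zero at line 310 and Perl aborts the script. That is why Nagios reports a return code of 255, which is outside the 0 to 3 range it expects from a plugin.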
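
As a stopgap until the plugin itself is fixed, the command could be wrapped so that any exit code outside 0 to 3 is reported as UNKNOWN (3). This is only a sketch: `normalize_rc` is a hypothetical wrapper name, and it hides the symptom without fixing the missing SNMP data.

```shell
#!/bin/sh
# normalize_rc CMD [ARGS...]
# Hypothetical wrapper: runs CMD and passes through exit codes 0-3
# (OK, WARNING, CRITICAL, UNKNOWN). Any other exit code, such as the
# 255 seen when the script aborts, is turned into UNKNOWN (3).
normalize_rc() {
    "$@" && rc=0 || rc=$?
    case "$rc" in
        0|1|2|3)
            return "$rc"
            ;;
        *)
            echo "UNKNOWN: plugin exited with unexpected code $rc"
            return 3
            ;;
    esac
}
```

The wrapped command keeps its normal output on success, so the performance data is unaffected when the check works.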

I suspect that if the plugin always works from the command line but not from the GUI, there is something different between the two (such as a character not being escaped properly, or an incorrect service or command definition). However, that does not explain why it works the first time from the GUI and then fails from that point on.

Does it always show "return code of 255 is out of bounds" in the GUI? Does it ever work apart from the first time?

Re: Memory status of C4510 Catalyst switch

Posted: Mon Jan 05, 2015 1:20 am
by phyo
Box293 wrote: Does it always have "return code of 255 is out of bounds" in the GUI. Does it ever work apart from the first time?
"Return code of 255 is out of bounds" is not always shown in the GUI, but it appears frequently.

Re: Memory status of C4510 Catalyst switch

Posted: Mon Jan 05, 2015 1:50 am
by Box293
Then it sounds like the problem lies with the remote device not sending back the requested information from the OIDs.

I would start looking into the logs on the remote device to find out why.

Also, I would refer to some Cisco documentation on this device, SNMP and the OIDs I've previously mentioned. It may be normal for it not to return output.

Finally, you should correct the Perl script to check that each object exists and is valid before it is used, specifically on lines 307, 308, 309 and 310.
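
The kind of check meant here can be sketched as follows. It is written in shell rather than Perl purely as an illustration, so the same logic would need to be ported into the plugin. The function names and the utilization formula, used / (used + free), are assumptions; the plugin's real calculation on lines 307 to 310 may differ.

```shell
#!/bin/sh
# is_uint VALUE: succeeds only if VALUE is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# mem_utilization USED FREE
# Hypothetical stand-in for the plugin's calculation. Validates both SNMP
# values before dividing, and exits UNKNOWN (3) instead of crashing when
# a value is not numeric (e.g. "noSuchInstance") or the total is zero.
mem_utilization() {
    used=$1
    free=$2
    if ! is_uint "$used" || ! is_uint "$free"; then
        echo "UNKNOWN: non-numeric SNMP value (used='$used', free='$free')"
        return 3
    fi
    total=$((used + free))
    if [ "$total" -eq 0 ]; then
        echo "UNKNOWN: total memory reported as zero"
        return 3
    fi
    # Integer percentage, assumed formula: used / (used + free).
    echo "utilization=$((used * 100 / total))%"
}
```

With a guard like this the service would show UNKNOWN with a readable message whenever the switch returns noSuchInstance, instead of "return code of 255 is out of bounds".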