Incorrect Status Reporting

Posted: Mon May 21, 2012 10:33 am
by buee
I have a strange problem this morning. I have 3 TrippLite battery backups out in the field that I monitor from Nagios via SNMP for their electrical status. This morning, all 3 of them entered critical status for no reason:

***** Nagios *****

Notification Type: PROBLEM

Service: Battery Status
Host: germanvalley_bbu
Address: 10.0.4.2
State: There has been a change in power status

Date/Time: Mon May 21 09:34:34 CDT 2012

Additional Info:

SNMP CRITICAL - *3*

The SNMP check is supposed to return "3".

Here is the command definition:

# Custom SNMP - BBU Run Status
define command{
        command_name    check_bbu
        command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3 -w $ARG1$ -c $ARG2$
        }
And here is the service definition:

define service{
        use                     generic-service
        host_name               German Valley BBU
        service_description     Battery Status
        check_command           check_bbu!5!2
        contact_groups          palspower
        }
When I run `/usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3` (replacing $HOSTADDRESS$ with the correct IP, of course), it returns OK. Can anyone help me out with this?
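
One thing worth checking is whether the manual test matches what Nagios actually executes. The sketch below is an illustration only: `expand_command` is a hypothetical helper, not Nagios code. It substitutes `$HOSTADDRESS$` and the `$ARGn$` values from `check_bbu!5!2` into the command_line posted above, which suggests the scheduled check also carries `-w 5 -c 2`, flags that the manual run leaves out.

```python
# Hypothetical helper: mimics how Nagios fills $ARGn$ macros from a
# check_command of the form name!arg1!arg2. This is not Nagios code.
COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ "
    "-o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3 -w $ARG1$ -c $ARG2$"
)


def expand_command(command_line, check_command, host_address):
    """Return command_line with $HOSTADDRESS$ and $ARGn$ substituted."""
    # Everything after the first "!" is an argument: $ARG1$, $ARG2$, ...
    _name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", value)
    return expanded


if __name__ == "__main__":
    print(expand_command(COMMAND_LINE, "check_bbu!5!2", "10.0.4.2"))
```

Comparing the printed line with the command typed by hand is a quick way to spot options that only the scheduled check receives.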

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 11:08 am
by nscott
Well, it's certainly strange that it changed status. However, are you sure that it is returning EXACTLY 3? Can you run the plugin with those arguments and make sure that it's not returning extra data beyond the expected 3?

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 11:23 am
by buee
nscott wrote: Well, it's certainly strange that it changed status. However, are you sure that it is returning EXACTLY 3? Can you run the plugin with those arguments and make sure that it's not returning extra data beyond the expected 3?
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s =3
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s 3
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s *3*
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s "3"
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3

So perhaps adding quotes around the 3 would help? It's weird that this just randomly started happening today though.

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 11:34 am
by buee
FYI adding the quotes did not help.

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 11:40 am
by nscott
Yeah, I don't know why the device would be returning *3* at this point, but you'll definitely have to add something that accounts for it. It looks like it's alternating between 3 and *3*?

The reason this happened:

root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s *3*
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
is because those unquoted * characters got expanded by the shell. Can you try this:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -r "\*?3\*?"

But perhaps, depending on the datatype (if this is actually an integer datatype), you could use

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3

This last one would be ideal if the OID is of Integer type.
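
To make the `-c 3:3` suggestion concrete, here is a rough sketch of the threshold range convention from the Nagios plugin guidelines, assuming this check_snmp build follows it: a bare `N` alerts when the value falls outside 0..N, and `A:B` alerts when it falls outside A..B. The names `parse_range` and `is_alert` are illustrative, not plugin code, and the `~` and `@` forms are deliberately left out.

```python
import math


def parse_range(spec):
    """Parse 'N', 'A:' or 'A:B' into (low, high). '~' and '@' are not handled."""
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
        low = float(low_text) if low_text else 0.0
        high = float(high_text) if high_text else math.inf
    else:
        # A bare number is the upper bound; the lower bound defaults to 0.
        low, high = 0.0, float(spec)
    return low, high


def is_alert(value, spec):
    """True when value lies outside the inclusive range described by spec."""
    low, high = parse_range(spec)
    return value < low or value > high


if __name__ == "__main__":
    for spec in ("3:3", "2", "5"):
        print(spec, is_alert(3, spec))
```

Under this convention a reading of 3 is fine for `3:3` and for a warning threshold of `5`, but it falls outside the range of a bare `2`, so it may be worth checking which thresholds the scheduled check actually passes.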

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 12:06 pm
by buee
nscott wrote: Yeah, I don't know why the device would be returning *3* at this point, but you'll definitely have to add something that accounts for it. It looks like it's alternating between 3 and *3*?

The reason this happened:

root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s *3*
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
is because those unquoted * characters got expanded by the shell. Can you try this:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -r "\*?3\*?"

But perhaps, depending on the datatype (if this is actually an integer datatype), you could use

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3

This last one would be ideal if the OID is of Integer type.
These:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -r "\*?3\*?"
/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3
worked from the command line, but did not bring the service into an OK state.

I found this log entry if it helps:

[1337615276] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;2;SNMP CRITICAL - *3*
[1337615336] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;3;SNMP CRITICAL - *3*
[1337615396] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;HARD;4;SNMP CRITICAL - *3*
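
For anyone reading these entries, the following is a small sketch that splits a SERVICE ALERT line into its fields and turns the epoch timestamp into a readable time. `parse_service_alert` is a made-up helper, and the field order is assumed from the three lines quoted above rather than taken from Nagios documentation.

```python
from datetime import datetime, timezone

LOG_LINES = [
    "[1337615276] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;2;SNMP CRITICAL - *3*",
    "[1337615336] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;3;SNMP CRITICAL - *3*",
    "[1337615396] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;HARD;4;SNMP CRITICAL - *3*",
]


def parse_service_alert(line):
    """Split one SERVICE ALERT log line into a dict of named fields."""
    stamp, rest = line.split("] SERVICE ALERT: ", 1)
    # Assumed order: host;service;state;state type;attempt;plugin output
    host, service, state, state_type, attempt, output = rest.split(";", 5)
    return {
        "time": datetime.fromtimestamp(int(stamp.lstrip("[")), tz=timezone.utc),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


if __name__ == "__main__":
    for entry in map(parse_service_alert, LOG_LINES):
        print(entry["time"].isoformat(), entry["state_type"], entry["attempt"])
```

Read this way, the entries show the check being retried at 60-second intervals as SOFT failures until attempt 4 is recorded as HARD.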

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 12:13 pm
by nscott
What happened when you ran this one:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 12:35 pm
by buee
nscott wrote:What happened when you ran this one:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3
It returns OK from the command line, but it doesn't change the status in Nagios.

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 12:47 pm
by nscott
How are you changing it in Nagios? Keep in mind the actual flag is being changed from -s to -c.

Re: Incorrect Status Reporting

Posted: Mon May 21, 2012 12:56 pm
by buee
nscott wrote:How are you changing it in Nagios? Keep in mind the actual flag is being changed from -s to -c.
I changed it in the commands.cfg file:

# Custom SNMP - BBU Run Status
define command{
        command_name    check_bbu
        command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3 -w $ARG1$ -c $ARG2$
        }
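
The command_line as posted passes `-c` twice: once as `-c 3:3` and once as `-c $ARG2$`. The sketch below is a simple lint with a hypothetical `repeated_flags` helper that tokenizes a command line and reports any short option given more than once. It does not know how check_snmp resolves a repeated option. Many getopt-style parsers keep the last value, but treat that as an assumption to verify.

```python
import shlex
from collections import Counter

COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ "
    "-o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3 -w $ARG1$ -c $ARG2$"
)


def repeated_flags(command_line):
    """Return the sorted short options (like -c) that appear more than once."""
    tokens = shlex.split(command_line)
    # Only single-letter options are counted; matching is case-sensitive,
    # so -C (community) and -c (critical) are treated as different flags.
    counts = Counter(
        token for token in tokens
        if len(token) == 2 and token[0] == "-" and token[1].isalpha()
    )
    return sorted(flag for flag, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    print(repeated_flags(COMMAND_LINE))
```

If the second `-c` does take precedence, the service definition's `check_bbu!5!2` would still supply the critical threshold, so the `3:3` range might never be applied.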