Incorrect Status Reporting

buee
Posts: 26
Joined: Mon May 21, 2012 10:24 am

Incorrect Status Reporting

Post by buee »

I have a strange problem this morning. I have 3 TrippLite battery backups out in the field that I monitor from Nagios via SNMP for their electrical status. This morning, all 3 of them entered critical status for no reason:

Code:

***** Nagios *****

Notification Type: PROBLEM

Service: Battery Status
Host: germanvalley_bbu
Address: 10.0.4.2
State: There has been a change in power status

Date/Time: Mon May 21 09:34:34 CDT 2012

Additional Info:

SNMP CRITICAL - *3*

The SNMP query is supposed to return "3".

Here is the command definition:

Code:

# Custom SNMP - BBU Run Status
define command{
        command_name    check_bbu
        command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3 -w $ARG1$ -c $ARG2$
        }

And here is the service definition:

Code:

define service{
        use                     generic-service
        host_name               German Valley BBU
        service_description     Battery Status
        check_command           check_bbu!5!2
        contact_groups          palspower
        }

When I run `/usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3` (replacing $HOSTADDRESS$ with the correct IP, of course), it returns OK. Can anyone help me out with this?
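One difference between the manual test and the scheduled check is worth noting: the manual command stops at `-s 3`, while the command definition also appends `-w $ARG1$ -c $ARG2$`. The sketch below expands the macros by hand to show the effective command line for `check_bbu!5!2`. The `expand_command` helper is hypothetical and only imitates substitution for these three macros; it is not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: substitute three Nagios macros in a command_line template.
# $1 = template, $2 = host address, $3 = ARG1, $4 = ARG2
expand_command() {
    printf '%s\n' "$1" | sed \
        -e 's|\$HOSTADDRESS\$|'"$2"'|g' \
        -e 's|\$ARG1\$|'"$3"'|g' \
        -e 's|\$ARG2\$|'"$4"'|g'
}

# command_line copied from the check_bbu definition above (single quotes keep the $ literal)
template='/usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3 -w $ARG1$ -c $ARG2$'

# check_command is check_bbu!5!2, so ARG1=5 and ARG2=2
expand_command "$template" 10.0.4.2 5 2
```

Running the printed command by hand, thresholds included, should be a closer reproduction of what Nagios executes than the shortened version.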
nscott
Posts: 1040
Joined: Wed May 11, 2011 8:54 am

Re: Incorrect Status Reporting

Post by nscott »

Well, it's certainly strange that it changed status. However, are you sure that it is returning EXACTLY 3? Can you run the plugin with those arguments and make sure that it's not returning extra data beyond the expected 3?
Nicholas Scott
Former Nagios employee
buee

Re: Incorrect Status Reporting

Post by buee »

nscott wrote: Well, it's certainly strange that it changed status. However, are you sure that it is returning EXACTLY 3? Can you run the plugin with those arguments and make sure that it's not returning extra data beyond the expected 3?

root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s =3
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s 3
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s *3*
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s "3"
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3

So perhaps adding quotes around the 3 would help? It's weird that this just randomly started happening today though.
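Judging only from the outputs above, the asterisks seem to be added by the plugin when a check fails rather than sent by the device: the performance data after the `|` reads `=3` in every run, including the CRITICAL ones. The sketch below imitates an exact string comparison, which is what `-s` appears to do here. `string_check` is a hypothetical stand-in, not the plugin's code.

```shell
#!/bin/sh
# Hypothetical stand-in for an exact string match like the one -s appears to perform.
# $1 = value returned by the device, $2 = string passed to -s
string_check() {
    if [ "$1" = "$2" ]; then
        echo "SNMP OK - $1"
    else
        # Mimic the observed output: the failing value is wrapped in asterisks
        echo "SNMP CRITICAL - *$1*"
    fi
}

string_check 3 3      # SNMP OK - 3
string_check 3 '=3'   # SNMP CRITICAL - *3*
string_check 3 '*3*'  # SNMP CRITICAL - *3*
```

Under this reading, quoting the 3 changes nothing because the shell strips the quotes before the plugin sees the argument.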
buee

Re: Incorrect Status Reporting

Post by buee »

FYI, adding the quotes did not help.
nscott

Re: Incorrect Status Reporting

Post by nscott »

Yeah, I don't know why the device would be returning *3* at this point, but you'll definitely have to add something that will account for it doing that. It looks like it's alternating between 3 and *3*?

The reason this happened:

Code:

root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s *3*
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3

is because those * characters got expanded by the shell. Can you try this:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -r "\*?3\*?"

But perhaps, depending on the datatype (if this is actually an integer datatype), you could use:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3

This last one would be ideal if the OID is of Integer type.
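The regex suggestion can be tried offline against sample values. This sketch uses `grep -E` as a stand-in for `-r`, assuming the plugin performs an unanchored extended-regex search. Under that assumption `\*?3\*?` also matches values such as 13, so an anchored variant is shown next to it. The `matches` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if value $1 matches extended regex $2 (unanchored search).
matches() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Compare the suggested pattern with an anchored variant on a few sample values
for value in 3 '*3*' 13 2; do
    for pattern in '\*?3\*?' '^\*?3\*?$'; do
        if matches "$value" "$pattern"; then
            result='match'
        else
            result='no match'
        fi
        printf '%-4s %-12s %s\n' "$value" "$pattern" "$result"
    done
done
```

The anchored form accepts only 3 and *3*, which is closer to the exact-value intent of the original `-s 3`.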
buee

Re: Incorrect Status Reporting

Post by buee »

nscott wrote: Yeah, I don't know why the device would be returning *3* at this point, but you'll definitely have to add something that will account for it doing that. It looks like it's alternating between 3 and *3*?

The reason this happened:

Code:

root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s *3*
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3

is because those * characters got expanded by the shell. Can you try this:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -r "\*?3\*?"

But perhaps, depending on the datatype (if this is actually an integer datatype), you could use:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3

This last one would be ideal if the OID is of Integer type.

These:

Code:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -r "\*?3\*?"
/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3

Both worked from the command line, but neither brought the service into an OK state.

I found this log entry if it helps:

Code:

[1337615276] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;2;SNMP CRITICAL - *3*
[1337615336] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;3;SNMP CRITICAL - *3*
[1337615396] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;HARD;4;SNMP CRITICAL - *3*
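The log lines show a soft-to-hard progression: attempts 2 and 3 are SOFT, and attempt 4 is HARD. A small awk sketch, assuming the semicolon-separated layout seen above, pulls out the attempt number, the state, the state type, and the seconds elapsed since the previous entry. `summarize_alerts` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: summarize SERVICE ALERT lines read from standard input.
summarize_alerts() {
    awk -F';' '
    {
        # Field 1 looks like "[1337615276] SERVICE ALERT: German Valley BBU";
        # take the epoch timestamp between the square brackets.
        ts = substr($1, 2, index($1, "]") - 2)
        delta = (NR == 1) ? 0 : ts - prev
        # $3 = state, $4 = state type (SOFT/HARD), $5 = attempt number
        printf "attempt %s %s %s +%ds\n", $5, $3, $4, delta
        prev = ts
    }'
}

summarize_alerts <<'EOF'
[1337615276] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;2;SNMP CRITICAL - *3*
[1337615336] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;3;SNMP CRITICAL - *3*
[1337615396] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;HARD;4;SNMP CRITICAL - *3*
EOF
```

The entries are 60 seconds apart, so the check kept returning CRITICAL on each retry rather than flapping between states.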
nscott

Re: Incorrect Status Reporting

Post by nscott »

What happened when you ran this one:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3
buee

Re: Incorrect Status Reporting

Post by buee »

nscott wrote: What happened when you ran this one:

/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3

It returns OK from the command line but doesn't change the status in Nagios.
nscott

Re: Incorrect Status Reporting

Post by nscott »

How are you changing it in Nagios? Keep in mind the actual flag is being changed from -s to -c.
buee

Re: Incorrect Status Reporting

Post by buee »

nscott wrote: How are you changing it in Nagios? Keep in mind the actual flag is being changed from -s to -c.

I changed it in the commands.cfg file:

Code:

# Custom SNMP - BBU Run Status
define command{
        command_name    check_bbu
        command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3 -w $ARG1$ -c $ARG2$
        }
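
The edited definition now passes `-c` twice (`-c 3:3` and `-c $ARG2$`) and still passes `-w $ARG1$`. With `check_bbu!5!2` those macros expand to `-w 5 -c 2`, as they did in the original definition. Under the standard Nagios plugin range convention, a bare number N alerts when the value is below 0 or above N, and A:B alerts when the value falls outside A through B. If the plugin ends up honoring `-c 2` (which of the two `-c` values wins is an assumption, not confirmed in this thread), a reading of 3 would be CRITICAL, which would match the alerts above. The sketch below handles only those two simple range forms; `in_alert` and `evaluate` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: succeed if integer $1 lies outside range $2.
# Supports only the simple forms "N" (meaning 0..N) and "A:B".
in_alert() {
    case $2 in
        *:*) low=${2%%:*}; high=${2#*:} ;;
        *)   low=0;        high=$2 ;;
    esac
    [ "$1" -lt "$low" ] || [ "$1" -gt "$high" ]
}

# $1 = value, $2 = warning range, $3 = critical range
evaluate() {
    if in_alert "$1" "$3"; then
        echo CRITICAL
    elif in_alert "$1" "$2"; then
        echo WARNING
    else
        echo OK
    fi
}

evaluate 3 5 2     # thresholds from check_bbu!5!2: 3 is above 2, so CRITICAL
evaluate 3 5 3:3   # if -c 3:3 were the effective critical range: OK
```

If that is the cause, one thing to test would be dropping `-w $ARG1$ -c $ARG2$` from the command line, or passing ranges that include 3 from the service definition.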