Incorrect Status Reporting
Posted: Mon May 21, 2012 10:33 am
by buee
I have a strange problem this morning. I have 3 TrippLite battery backups out in the field that I monitor from Nagios via SNMP for their electrical status. This morning, all 3 of them entered critical status for no reason:
Code: Select all
***** Nagios *****
Notification Type: PROBLEM
Service: Battery Status
Host: germanvalley_bbu
Address: 10.0.4.2
State: There has been a change in power status
Date/Time: Mon May 21 09:34:34 CDT 2012
Additional Info:
SNMP CRITICAL - *3*
The SNMP query is supposed to return "3".
Here is the command definition:
Code: Select all
# Custom SNMP - BBU Run Status
define command{
    command_name    check_bbu
    command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3 -w $ARG1$ -c $ARG2$
}
And here is the service definition:
Code: Select all
define service{
    use                    generic-service
    host_name              German Valley BBU
    service_description    Battery Status
    check_command          check_bbu!5!2
    contact_groups         palspower
}
When I run `/usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3` (replacing $HOSTADDRESS$ with the correct IP, of course), it returns OK. Can anyone help me out with this?
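For reference, here is a rough sketch of how `check_bbu!5!2` would be expanded into the final command line. It is a simplified Python illustration, not Nagios's actual macro processor; the `expand_command` helper is hypothetical and only handles `$HOSTADDRESS$` and `$ARGn$`.

```python
import re

COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ "
    "-o .1.3.6.1.2.1.33.1.4.1.0 -C public -P 2c -s 3 -w $ARG1$ -c $ARG2$"
)


def expand_command(command_line, check_command, host_address):
    """Substitute $HOSTADDRESS$ and $ARGn$ (hypothetical, simplified helper)."""
    # "check_bbu!5!2" -> command name "check_bbu", arguments ["5", "2"]
    _name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)

    def substitute(match):
        index = int(match.group(1)) - 1
        # Assumption: an argument that was not supplied expands to nothing
        return args[index] if index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, expanded)


effective = expand_command(COMMAND_LINE, "check_bbu!5!2", "10.0.4.2")
print(effective)
```

The expanded line ends in `-s 3 -w 5 -c 2`, whereas the manual test above stops at `-s 3`, so the two runs are not equivalent.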
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 11:08 am
by nscott
Well, it's certainly strange that it changed status. However, are you sure that it is returning EXACTLY 3? Can you run the plugin with those arguments and make sure that it's not returning extra data beyond the expected 3?
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 11:23 am
by buee
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s =3
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s 3
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s *3*
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s "3"
SNMP OK - 3 | iso.3.6.1.2.1.33.1.4.1.0=3
So perhaps adding quotes around the 3 would help? It's weird that this just randomly started happening today though.
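To illustrate why `-s 3` and `-s "3"` behave identically while `-s =3` fails, here is a small Python sketch. It uses `shlex.split` to approximate shell word splitting and quote removal (it does not perform filename expansion, so `*3*` stays literal here) and models `-s` as an exact string comparison, which is how the plugin's help text describes that option. The `string_check` helper is hypothetical.

```python
import shlex


def string_check(option_string, returned_value):
    """Model `-s EXPECTED`: OK only if the returned value equals EXPECTED exactly."""
    argv = shlex.split(option_string)       # quote removal, roughly as a POSIX shell does
    expected = argv[argv.index("-s") + 1]   # the word that follows -s
    return "OK" if returned_value == expected else "CRITICAL"


options = ['-s 3', '-s "3"', '-s =3', '-s *3*']
results = {opt: string_check(opt, "3") for opt in options}
for opt, state in results.items():
    print(f"{opt:8} -> {state}")
```

The quotes never reach the plugin, so quoting the 3 in the command definition should not change the comparison.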
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 11:34 am
by buee
FYI adding the quotes did not help.
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 11:40 am
by nscott
Yeah, I don't know why the device would be returning *3* at this point, but you'll definitely have to add something that accounts for it. It looks like it's alternating between 3 and *3*?
The reason this happened:
Code: Select all
root@monitor:~# /usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -s *3*
SNMP CRITICAL - *3* | iso.3.6.1.2.1.33.1.4.1.0=3
is because those * characters got expanded. Can you try this:
/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -r "\*?3\*?"
But perhaps, depending on the datatype (if this is actually an integer datatype), you could use:
/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3
This last one would be ideal if the OID is of Integer type.
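A quick way to see what the suggested pattern accepts, sketched with Python's `re` module rather than the POSIX regex engine the plugin uses (for a pattern this simple the two should agree, but treat that as an assumption). The pattern is unanchored, so it also matches any value that merely contains a 3; the anchored variant is a hypothetical stricter alternative.

```python
import re

PATTERN = re.compile(r"\*?3\*?")       # pattern from the post above
ANCHORED = re.compile(r"^\*?3\*?$")    # hypothetical stricter variant

samples = ["3", "*3*", "13", "2"]
loose = [s for s in samples if PATTERN.search(s)]
strict = [s for s in samples if ANCHORED.search(s)]

print("unanchored matches:", loose)
print("anchored matches:  ", strict)
```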
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 12:06 pm
by buee
These:
Code: Select all
/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -r "\*?3\*?"
/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3
While these work from the command line, they did not bring the service into an OK state.
I found these log entries, if it helps:
Code: Select all
[1337615276] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;2;SNMP CRITICAL - *3*
[1337615336] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;3;SNMP CRITICAL - *3*
[1337615396] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;HARD;4;SNMP CRITICAL - *3*
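For anyone reading these entries, here is a small Python sketch that splits them into fields and converts the epoch timestamps to UTC. The dictionary keys are my own labels for the semicolon-separated values, and `parse_alert` is a hypothetical helper.

```python
from datetime import datetime, timezone

LOG_LINES = [
    "[1337615276] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;2;SNMP CRITICAL - *3*",
    "[1337615336] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;SOFT;3;SNMP CRITICAL - *3*",
    "[1337615396] SERVICE ALERT: German Valley BBU;Battery Status;CRITICAL;HARD;4;SNMP CRITICAL - *3*",
]


def parse_alert(line):
    """Split one SERVICE ALERT line into labelled fields."""
    stamp, rest = line.split("] ", 1)
    _kind, payload = rest.split(": ", 1)
    host, service, state, state_type, attempt, output = payload.split(";", 5)
    return {
        "time": datetime.fromtimestamp(int(stamp.lstrip("[")), tz=timezone.utc),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


alerts = [parse_alert(line) for line in LOG_LINES]
# Seconds between consecutive entries
gaps = [(b["time"] - a["time"]).total_seconds() for a, b in zip(alerts, alerts[1:])]

for alert in alerts:
    print(alert["time"].isoformat(), alert["state_type"], alert["attempt"], alert["output"])
print("gaps:", gaps)
```

The entries are 60 seconds apart, and the state turns HARD on attempt 4.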
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 12:13 pm
by nscott
What happened when you ran this one:
/usr/lib/nagios/plugins/check_snmp -H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3
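As a sketch of what `-c 3:3` is meant to express, assuming the plugin follows the standard Nagios plugin range format: a bare `N` means 0 to N, `A:B` means A to B, and a value outside the range raises the alert. Whether the installed `check_snmp` version interprets ranges this way is an assumption, and the `outside_range` helper is hypothetical and skips the `~` and `@` forms.

```python
def outside_range(value, spec):
    """Return True if value falls outside the range spec, i.e. would alert."""
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
        low = float(low_text) if low_text else 0.0
        high = float(high_text) if high_text else float("inf")
    else:
        # A bare number N is shorthand for the range 0..N
        low, high = 0.0, float(spec)
    return value < low or value > high


checks = {
    ("3:3", 3): outside_range(3, "3:3"),
    ("3:3", 2): outside_range(2, "3:3"),
    ("3:3", 4): outside_range(4, "3:3"),
    ("2", 3): outside_range(3, "2"),
    ("5", 3): outside_range(3, "5"),
}
for (spec, value), alert in checks.items():
    print(f"value {value} against {spec!r}: {'ALERT' if alert else 'ok'}")
```

Under that reading, a value of 3 is fine for `3:3` but would alert against a bare `2`.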
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 12:35 pm
by buee
It returns OK from the command line but doesn't change the status in Nagios.
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 12:47 pm
by nscott
How are you changing it in Nagios? Keep in mind the actual flag is being changed from -s to -c.
Re: Incorrect Status Reporting
Posted: Mon May 21, 2012 12:56 pm
by buee
Changed it in the commands.cfg file:
Code: Select all
# Custom SNMP - BBU Run Status
define command{
    command_name    check_bbu
    command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3 -w $ARG1$ -c $ARG2$
}
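One thing worth checking in this definition is that `-c` now appears twice (`-c 3:3` and `-c $ARG2$`). How `check_snmp` resolves that is not shown in this thread; many getopt-style parsers simply keep the last occurrence. The Python sketch below uses `argparse`, which behaves that way for plain options, purely to illustrate that assumption. The argument values 5 and 2 come from `check_bbu!5!2` in the service definition.

```python
import argparse
import shlex

# Stand-in parser with the same short options; NOT check_snmp's real parser
parser = argparse.ArgumentParser(add_help=False)
for flag in ("-H", "-o", "-C", "-P", "-w", "-c"):
    parser.add_argument(flag)

# Options from the command definition after $HOSTADDRESS$, $ARG1$, $ARG2$ are filled in
options = "-H 10.0.4.2 -o .1.3.6.1.2.1.33.1.4.1.0 -C 1RC0MM -P 2c -c 3:3 -w 5 -c 2"
args = parser.parse_args(shlex.split(options))

print("warning threshold: ", args.w)
print("critical threshold:", args.c)
```

If the plugin behaves the same way, the effective critical threshold is still `2` from the service definition rather than `3:3`.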