Re: Timeout issue

Posted: Fri Sep 12, 2014 8:20 pm
by cg28oh
Any -t value greater than 4 produces the plugin timeout. I even catch glimpses of the timeout with -t 4.

Code:

./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 2 -t 30
CRITICAL - Plugin timed out while executing system call

Re: Timeout issue

Posted: Mon Sep 15, 2014 3:45 pm
by abrist
I just retested the timeout_state branch. It works fine for me, though an improperly specified community string will cause the error, as will specifying an invalid SNMP protocol version:

Code:

[root@localhost nagios-plugins]# ./plugins/check_snmp -H <ip> -C <wrong community> -o ifInUnknownProtos.1 -e 2 -t 30
CRITICAL - Plugin timed out while executing system call
[root@localhost nagios-plugins]# ./plugins/check_snmp -H <ip> -C <proper community> -o ifInUnknownProtos.1 -e 2 -t 30  -P2c
CRITICAL - Plugin timed out while executing system call
[root@localhost nagios-plugins]# ./plugins/check_snmp -H <ip> -C <proper community> -o ifInUnknownProtos.1 -e 2 -t 30 -P1
SNMP OK - 0 | IF-MIB::ifInUnknownProtos.1=0c
Could you run your check again with the verbose flag (-vvv) and post the output?

Code:

./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 2 -t 30 -vvv

Re: Timeout issue

Posted: Mon Sep 15, 2014 10:53 pm
by cg28oh
I've verified that the community and protocol version are correct. The same command against faster-responding sites shows no error. Here are the commands, one with a 3-second timeout and one with a 5-second timeout. The plugin timeout message only appears with -t 5.

Code:

./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 3 -t 3 -vvv
/usr/bin/snmpget -Le -t 3 -r 3 -m ALL -v 1 [authpriv] 10.0.0.1:161 sysUpTime.0
External command error: Timeout: No Response from 10.0.0.1:161.

Code:

 ./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 3 -t 5 -vvv
/usr/bin/snmpget -Le -t 5 -r 3 -m ALL -v 1 [authpriv] 10.0.0.1:161 sysUpTime.0
CRITICAL - Plugin timed out while executing system call
If you run the command against an IP that isn't alive with -t 3, -t 10, or -t 30, maybe it will produce the same result I see? What I'm trying to achieve is the same "No Response" message with -t 10 as with -t 3.

Command with a responding host:

Code:

./check_snmp -H 10.0.0.2 -C XX -o sysUpTime.0 -e 3 -t 10 -P1 -vvv
/usr/bin/snmpget -Le -t 10 -r 3 -m ALL -v 1 [authpriv] 10.0.0.2:161 sysUpTime.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (9949853) 1 day, 3:38:18.53
Processing oid 1 (line 1)
  oidname: DISMAN-EVENT-MIB::sysUpTimeInstance
  response: Timeticks: (9949853) 1 day, 3:38:18.53
SNMP OK - Timeticks: (9949853) 1 day, 3:38:18.53 |

Re: Timeout issue

Posted: Tue Sep 16, 2014 5:01 pm
by abrist
You are seeing two different timeout errors. One is the generic plugin timeout, and the other is the runcmd timeout. If your retries and timeout are really close, you may see this behavior. I will look into creating a bit more room for the external command to complete.
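
The race between the two timers can be sketched with a little arithmetic. This is only a rough sketch, not the plugin's code. It assumes the documented Net-SNMP behavior that `-t` is the per-attempt timeout and `-r` is the number of retries, so snmpget gives up after roughly t * (r + 1) seconds. The plugin's alarm deadline is passed in as a plain number because the alarm formula differs between plugin versions. The function names are hypothetical.

```python
def snmpget_worst_case(timeout_s: int, retries: int) -> int:
    """Seconds until snmpget reports 'No Response':
    one initial attempt plus `retries` retries, each waiting `timeout_s`."""
    return timeout_s * (retries + 1)


def expected_message(timeout_s: int, retries: int, alarm_s: int) -> str:
    """Predict which timeout message wins for a host that never answers.

    alarm_s is the plugin's own watchdog deadline (an input here, since
    this sketch does not assume any particular alarm formula).
    """
    external = snmpget_worst_case(timeout_s, retries)
    if external < alarm_s:
        # snmpget gives up first, so its error text is what gets reported.
        return "External command error: Timeout: No Response"
    if external > alarm_s:
        # The plugin's alarm fires while snmpget is still retrying.
        return "CRITICAL - Plugin timed out while executing system call"
    return "race: either message may appear"
```

For example, `snmpget_worst_case(5, 3)` is 20 seconds, so any plugin alarm shorter than that would fire before snmpget reports "No Response".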

Re: Timeout issue

Posted: Thu Nov 06, 2014 12:26 pm
by cg28oh
Okay, I'm still troubleshooting this (when time permits). I went back to plugins version 1.4.16 and Nagios v3.5.1. This plugin version does *NOT* produce the system call timeout message with high timeout values. Plugins version 1.5 does. So it looks like something broke between 1.4.16 and 1.5.

Code:

[root@nagi01 plugins]# pwd
/root/nagios-plugins-1.4.16/plugins
[root@nagi01 plugins]# ./check_snmp -e 2 -t 10 xx.xx.xx.xx -C X----X -o sysUpTime.0
External command error: Timeout: No Response from xx.xx.xx.xx:161.

Code:

[root@nagi01 plugins]# pwd
/root/nagios-plugins-1.5/plugins
[root@nagi01 plugins]# ./check_snmp -e 2 -t 10 xx.xx.xx.xx -C X----X -o sysUpTime.0
CRITICAL - Plugin timed out while executing system call

Re: Timeout issue

Posted: Thu Nov 06, 2014 3:12 pm
by abrist
Can you try adding an additional second to the alarm() in the plugin from the timeout_state branch?
Edit:

Code:

plugins/check_snmp.c
Change line #344 from:

Code:

alarm(timeout_interval + 1);
To:

Code:

alarm(timeout_interval + 2);
And then recompile and test.
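
To see why a little extra room on the alarm changes which message appears, here is a minimal sketch of the same watchdog pattern in Python. It is not how check_snmp is implemented. The child process is a hypothetical stand-in for snmpget that sleeps and then prints its own timeout error, and `run_with_watchdog` is an illustrative name. Unix only, and it must run in the main thread because it uses SIGALRM.

```python
import signal
import subprocess
import sys


class PluginTimeout(Exception):
    """Raised by the SIGALRM handler when the watchdog fires first."""


def _on_alarm(signum, frame):
    raise PluginTimeout()


def run_with_watchdog(child_seconds: float, alarm_seconds: float) -> str:
    """Run a child that takes child_seconds under a watchdog of alarm_seconds."""
    # Hypothetical stand-in for snmpget: sleep, then report its own timeout.
    cmd = [
        sys.executable, "-c",
        f"import time; time.sleep({child_seconds}); "
        "print('External command error: Timeout')",
    ]
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    # Like alarm(), but accepts fractional seconds.
    signal.setitimer(signal.ITIMER_REAL, alarm_seconds)
    try:
        # If the alarm fires while waiting, subprocess.run kills the child
        # and the PluginTimeout raised by the handler propagates to us.
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.stdout.strip()
    except PluginTimeout:
        return "CRITICAL - Plugin timed out while executing system call"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel, like alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```

When the watchdog is comfortably longer than the child's run time, the child's own message is returned. When it is shorter, the watchdog message wins. When the two are nearly equal, the result depends on scheduling, which is the situation the extra second is meant to avoid.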

Re: Timeout issue

Posted: Thu Nov 06, 2014 5:41 pm
by cg28oh
In which version? 2.0.3?

Line #344 in 2.0.3 is:

Code:

alarm(0);
I can't seem to locate

Code:

alarm(timeout_interval + 1);
in the file.

EDIT: Okay I looked back on the message board and figured it out.

Re: Timeout issue

Posted: Thu Nov 06, 2014 5:53 pm
by cmerchant
Thanks for the update. We'll leave this thread open for now.

Re: Timeout issue

Posted: Fri Dec 12, 2014 9:10 pm
by phobbs
Were you able to find a solution to this yet?
I ran across this problem today when a network interruption caused ~250 hosts to become unavailable and around 1300 SNMP checks to go critical at the same time. Alert messages spammed the mail server, and the MySQL database filled up the partition and crashed Nagios. Basically, it was a huge mess that took me all day to clean up. I'd like to make sure this kind of thing won't become a common occurrence.

Re: Timeout issue

Posted: Mon Dec 15, 2014 12:35 pm
by abrist
phobbs wrote: I ran across this problem today when a network interruption caused ~250 hosts to become unavailable and around 1300 SNMP checks to go critical at the same time.
Could you let us know how this relates to a difference in status output text? It sounds like you just had a nasty network outage. The issues here with check_snmp relate to the text output when a plugin times out, but the state should stay the same...