Timeout issue

cg28oh
Posts: 31
Joined: Mon Aug 18, 2014 9:38 am

Re: Timeout issue

Post by cg28oh »

Any -t value above 4 produces the plugin timeout. I even see the timeout intermittently with -t 4.

Code:

./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 2 -t 30
CRITICAL - Plugin timed out while executing system call
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Timeout issue

Post by abrist »

I just retested the timeout_state branch. It works fine for me, though an improperly specified community string will cause the error, as will an invalid SNMP protocol version:

Code:

[root@localhost nagios-plugins]# ./plugins/check_snmp -H <ip> -C <wrong community> -o ifInUnknownProtos.1 -e 2 -t 30
CRITICAL - Plugin timed out while executing system call
[root@localhost nagios-plugins]# ./plugins/check_snmp -H <ip> -C <proper community> -o ifInUnknownProtos.1 -e 2 -t 30  -P2c
CRITICAL - Plugin timed out while executing system call
[root@localhost nagios-plugins]# ./plugins/check_snmp -H <ip> -C <proper community> -o ifInUnknownProtos.1 -e 2 -t 30 -P1
SNMP OK - 0 | IF-MIB::ifInUnknownProtos.1=0c
Could you run your check again with the verbose flag (-vvv) and post the output?

Code:

./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 2 -t 30 -vvv
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
cg28oh
Posts: 31
Joined: Mon Aug 18, 2014 9:38 am

Re: Timeout issue

Post by cg28oh »

I've verified that the community string and protocol version are correct. The same command against faster-responding sites shows no error. Here are two runs of the command, one with a 3-second timeout and one with a 5-second timeout. The plugin timeout message only appears with -t 5.

Code:

./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 3 -t 3 -vvv
/usr/bin/snmpget -Le -t 3 -r 3 -m ALL -v 1 [authpriv] 10.0.0.1:161 sysUpTime.0
External command error: Timeout: No Response from 10.0.0.1:161.

Code:

./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 3 -t 5 -vvv
/usr/bin/snmpget -Le -t 5 -r 3 -m ALL -v 1 [authpriv] 10.0.0.1:161 sysUpTime.0
CRITICAL - Plugin timed out while executing system call
If you run the command against an IP that isn't alive with -t 3, -t 10 or -t 30, you may see the same result I do. What I'm trying to achieve is the same "No Response" message with -t 10 as with -t 3.

Command with a responding host

Code:

./check_snmp -H 10.0.0.2 -C XX -o sysUpTime.0 -e 3 -t 10 -P1 -vvv
/usr/bin/snmpget -Le -t 10 -r 3 -m ALL -v 1 [authpriv] 10.0.0.2:161 sysUpTime.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (9949853) 1 day, 3:38:18.53
Processing oid 1 (line 1)
  oidname: DISMAN-EVENT-MIB::sysUpTimeInstance
  response: Timeticks: (9949853) 1 day, 3:38:18.53
SNMP OK - Timeticks: (9949853) 1 day, 3:38:18.53 |
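
The snmpget lines in the verbose output for the unreachable host show that the child process carries its own -t and -r values. As a rough sketch, assuming Net-SNMP sends one initial request plus -r retries and waits -t seconds for each, the time snmpget needs before it can print "No Response" can be estimated as below. The function names are made up for illustration, and the outer deadline is left as an input because the thread does not say what the plugin's alarm is set to in each version.

```shell
# Worst-case seconds before snmpget gives up on a silent host.
# Assumption: one initial request plus `retries` retries, each waiting
# `timeout` seconds (Net-SNMP -t and -r semantics).
snmpget_worst_case() {
    # $1 = snmpget -t value, $2 = snmpget -r value
    echo $(( $1 * ($2 + 1) ))
}

# Say which side would speak first for a given outer deadline.
# $3 is a hypothetical plugin-side deadline in seconds.
expected_message() {
    worst=$(snmpget_worst_case "$1" "$2")
    if [ "$worst" -lt "$3" ]; then
        echo "snmpget reports first (${worst}s < ${3}s)"
    else
        echo "outer deadline fires first (${worst}s >= ${3}s)"
    fi
}

snmpget_worst_case 3 3    # -t 3 -r 3 from the verbose output, prints 12
snmpget_worst_case 5 3    # -t 5 -r 3 from the verbose output, prints 20
```

Under that assumption, -t 5 -r 3 needs about 20 seconds, so any plugin-side deadline shorter than that would cut snmpget off before it can report.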
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Timeout issue

Post by abrist »

You are seeing two different timeout errors. One is the generic plugin timeout, and the other is the runcmd timeout. If your retries and timeout are really close, you may see this behavior. I will look into creating a bit more room for the external command to complete.
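
To see how the same unreachable host can yield two different messages, here is a small stand-in experiment. It assumes GNU coreutils `timeout` is available; `sleep` stands in for snmpget and `timeout` stands in for the plugin's alarm(), so this only mimics the race and is not how check_snmp is implemented. The function name is hypothetical.

```shell
# `timeout` exits with status 124 when it had to kill the command.
which_fires_first() {
    deadline_secs=$1   # outer deadline, playing the role of the plugin alarm
    child_secs=$2      # how long the child needs before it gives up by itself
    timeout "$deadline_secs" sleep "$child_secs"
    case $? in
        0)   echo "child finished and could report its own error" ;;
        124) echo "outer deadline killed the child first" ;;
        *)   echo "unexpected exit status" ;;
    esac
}

which_fires_first 2 1   # deadline leaves room for the child
which_fires_first 1 2   # child needs longer than the deadline allows
```

When the two durations are close, small scheduling delays decide which message appears, which would fit the intermittent behavior reported at the boundary.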
cg28oh
Posts: 31
Joined: Mon Aug 18, 2014 9:38 am

Re: Timeout issue

Post by cg28oh »

Okay, I've still been troubleshooting this (when time permits). I went back to plugins version 1.4.16 and Nagios v3.5.1. This plugin version does *NOT* produce the system call timeout message with high timeout values. Plugins version 1.5 does. So it looks like something broke between 1.4.16 and 1.5.

Code:

[root@nagi01 plugins]# pwd
/root/nagios-plugins-1.4.16/plugins
[root@nagi01 plugins]# ./check_snmp -e 2 -t 10 xx.xx.xx.xx -C X----X -o sysUpTime.0
External command error: Timeout: No Response from xx.xx.xx.xx:161.

Code:

[root@nagi01 plugins]# pwd
/root/nagios-plugins-1.5/plugins
[root@nagi01 plugins]# ./check_snmp -e 2 -t 10 xx.xx.xx.xx -C X----X -o sysUpTime.0
CRITICAL - Plugin timed out while executing system call
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Timeout issue

Post by abrist »

Can you try adding an additional second to the alarm() in the plugin from the timeout_state branch?
Edit:

Code:

plugins/check_snmp.c
Change line #344 from:

Code:

alarm(timeout_interval + 1);
To:

Code:

alarm(timeout_interval + 2);
And then recompile and test.
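
Line numbers shift between releases, so it may be easier to locate the call by its text than by line number. A hedged sketch, assuming the source contains the exact string quoted above and that sed accepts -i with a backup suffix; the helper names are hypothetical, and the demo runs on a scratch file rather than a real check_snmp.c.

```shell
# List every alarm() call with its line number.
find_alarm_calls() {
    grep -n 'alarm *(' "$1"
}

# Change the headroom from +1 to +2 in place, keeping a .bak copy.
add_alarm_headroom() {
    sed -i.bak 's/alarm(timeout_interval + 1);/alarm(timeout_interval + 2);/' "$1"
}

# Demo on a scratch file.
demo=$(mktemp)
printf '%s\n' 'alarm(0);' 'alarm(timeout_interval + 1);' > "$demo"
find_alarm_calls "$demo"      # shows both calls before the edit
add_alarm_headroom "$demo"
find_alarm_calls "$demo"      # second call now reads "+ 2"
rm -f "$demo" "$demo.bak"
```

On a real tree the argument would be plugins/check_snmp.c from the branch being tested, followed by the recompile step.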
cg28oh
Posts: 31
Joined: Mon Aug 18, 2014 9:38 am

Re: Timeout issue

Post by cg28oh »

In which version? 2.0.3?

Line #344 in 2.0.3 =

Code:

alarm(0);
I can't seem to locate

Code:

alarm(timeout_interval + 1);
in the file.

EDIT: Okay I looked back on the message board and figured it out.
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Timeout issue

Post by cmerchant »

Thanks for the update. We'll leave this thread open for now.
phobbs
Posts: 7
Joined: Tue Oct 29, 2013 12:19 pm

Re: Timeout issue

Post by phobbs »

Were you able to find a solution to this yet?
I ran across this problem today when a network interruption caused ~250 hosts to become unavailable and around 1300 SNMP checks to go critical at the same time. Alert messages spammed the mail server, and the MySQL database filled up the partition and crashed Nagios. It was a huge mess that took me all day to clean up. I'd like to make sure this kind of thing won't become a common occurrence.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Timeout issue

Post by abrist »

phobbs wrote: I ran across this problem today when a network interruption caused ~250 hosts to become unavailable and around 1300 SNMP checks to go critical at the same time.
Could you let us know how this relates to a difference in status output text? It sounds like you just had a nasty network outage. The issues here with check_snmp relate to the text output when a plugin times out, but the state should stay the same.