Timeout issue

Posted: Mon Aug 18, 2014 3:55 pm
by cg28oh
I have a Nagios 4.0.8 server running on CentOS 6.5. I'm using it to monitor some VSAT devices that have a latency of 600 ms to 3000 ms. I see all kinds of SNMP timeout errors that clear on the next poll. According to tcpdump, Nagios is resending the snmpget very quickly instead of waiting for the plugin's default timeout of 10 seconds. I'm having no luck finding the configuration setting that makes it wait for the timeout to expire before a retry is sent. It's also generating extra traffic, because the VSAT device replies to every one of the snmpget requests.

16:48:04.343800 IP xx.xx.xx.49.32814 > xx.xx.xx.1.snmp: C=NSSNET GetRequest(36) .x.x.x.x.x.0
16:48:05.345177 IP xx.xx.xx.49.32814 > xx.xx.xx.1.snmp: C=NSSNET GetRequest(36) .x.x.x.x.x.0
16:48:05.919054 IP xx.xx.xx.1.snmp > xx.xx.xx.49.32814: C=NSSNET GetResponse(37) .x.x.x.x.x.0=96
16:48:06.191178 IP xx.xx.xx.1.snmp > xx.xx.xx.49.32814: C=NSSNET GetResponse(37) .x.x.x.x.x.0=96
16:48:06.191192 IP xx.xx.xx.49 > xx.xx.xx.1: ICMP 192.168.32.49 udp port 32814 unreachable, length 88
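
The capture above can be read as a timing check. As a rough sketch (the helper name `gap` is hypothetical, and it assumes both timestamps fall on the same day), the interval between two tcpdump timestamps can be computed with awk:

```shell
# gap START END: seconds between two HH:MM:SS.ffffff timestamps.
# Hypothetical helper; assumes both timestamps are from the same day.
gap() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, ":"); split(b, y, ":")
        printf "%.6f\n", (y[1]*3600 + y[2]*60 + y[3]) - (x[1]*3600 + x[2]*60 + x[3])
    }'
}

gap 16:48:04.343800 16:48:05.345177   # first GetRequest to the resend
gap 16:48:04.343800 16:48:05.919054   # first GetRequest to first GetResponse
```

On these numbers the resend goes out about 1.0 s after the first request, while the first reply only arrives after about 1.58 s, which is inside the 600 ms to 3000 ms latency range described in the post. That would be consistent with a per-try wait of roughly one second rather than ten.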

Re: Timeout issue

Posted: Tue Aug 19, 2014 11:38 am
by cg28oh
Some additional details.

When I change the check_snmp definition in commands.cfg to $USER1$/check_snmp -t 10 -H $HOSTADDRESS$ $ARG1$, it corrects the issue but causes a new one when hosts are down. Now I get the error "CRITICAL - Plugin timed out while executing system call" whenever I set -t higher than 4. If it is 4 or less, the error is "External command error: Timeout: No Response from xx.xx.xx.1". I've set service_check_timeout=60 in nagios.cfg and it has no effect.

Re: Timeout issue

Posted: Tue Aug 19, 2014 11:05 pm
by Box293
Change -t 10 to -t 60

Code:

$USER1$/check_snmp -t 60 -H $HOSTADDRESS$ $ARG1$
Does this help?

Re: Timeout issue

Posted: Wed Aug 20, 2014 8:58 am
by cg28oh
Now it says Plugin timeout 60.01. I reviewed the source code for the plugins, and there seems to be an issue in runcmd.c at runcmd_timeout_alarm_handler. It appears that this timeout is shorter than the timeout specified on the check_command.

runcmd.c system call timeout

Posted: Wed Aug 20, 2014 9:07 am
by cg28oh
I've run into an issue when setting the check_snmp timeout to 10 seconds: the system call times out before the check_command completes. I'm guessing that the system call timeout is around 4 seconds. I can set the check_snmp timeout to 4 and not see the error "CRITICAL - Plugin timed out while executing system call". Anything beyond that and it errors. I can't tell (I don't have much experience with code) whether there's a way to increase that timeout to, say, 60 seconds to give the plugin time to finish. I think the issue is around runcmd_timeout_alarm_handler.
Mod Note: Merged two topics into one

Re: Timeout issue

Posted: Wed Aug 20, 2014 8:38 pm
by Box293
Just to go back to basics, without involving Nagios and command definitions: what results do you get when testing at the command line?

Show the full command and its output in a code block please.

Re: Timeout issue

Posted: Thu Aug 21, 2014 3:14 pm
by cg28oh

Code:

./check_snmp -t 10 -H xx.xx.xx.1 -C XXX -o sysUpTime.0
CRITICAL - Plugin timed out while executing system call

./check_snmp -t 4 -H xx.xx.xx.1 -C XXX -o sysUpTime.0
External command error: Timeout: No Response from xx.xx.xx.1:161.

Re: Timeout issue

Posted: Thu Aug 21, 2014 8:30 pm
by Box293
What output do you get when running in verbose mode?

Code:

./check_snmp -t 10 -H xx.xx.xx.1 -C XXX -o sysUpTime.0 -vvv

./check_snmp -t 4 -H xx.xx.xx.1 -C XXX -o sysUpTime.0 -vvv

./check_snmp -t 60 -H xx.xx.xx.1 -C XXX -o sysUpTime.0 -vvv
For example, I get:

Code:

./check_snmp -t 60 -H 10.25.4.1 -C XXX -o sysUpTime.0 -vvv

/usr/bin/snmpget -Le -t 60 -r 5 -m ALL -v 1 [authpriv] 10.25.4.1:161 sysUpTime.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (483588800) 55 days, 23:18:08.00
Processing oid 1 (line 1)
  oidname: DISMAN-EVENT-MIB::sysUpTimeInstance
  response: Timeticks: (483588800) 55 days, 23:18:08.00
SNMP OK - Timeticks: (483588800) 55 days, 23:18:08.00 | 
Also, what happens when you do a port scan of the destination device?

For example, I get:

Code:

nmap -sU -p 161 10.25.4.1

Starting Nmap 5.51 ( http://nmap.org ) at 2014-08-22 11:28 EST
Nmap scan report for 10.25.4.1
Host is up (0.0035s latency).
PORT    STATE SERVICE
161/udp open  snmp
MAC Address: E4:F4:C6:D3:2A:1D (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds

Re: Timeout issue

Posted: Fri Aug 22, 2014 9:29 am
by cg28oh
To clarify my issue: when -t is set to 10, all my devices that are down show "CRITICAL - Plugin timed out while executing system call" on their check_snmp services. If I shorten it to -t 4, the status is "UNKNOWN External command error: Timeout: No Response from xx.xx.xx.1". With -t 4, if a host is down, the plugin takes a total of 24 seconds to execute and produces the UNKNOWN result. With -t 10 and -e 1, if a host is down, the plugin would take a total of 20 seconds to execute (shorter than with -t 4), yet it produces "CRITICAL - Plugin timed out while executing system call".
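
The 24 and 20 second totals in this post match what you get if snmpget waits the -t value once for the initial try and once more for each retry. A minimal sketch of that arithmetic (the function name `snmpget_total_wait` is hypothetical, and it assumes a host that never answers):

```shell
# Worst-case wait against a silent host:
# one initial try plus RETRIES retries, each waiting TIMEOUT seconds.
# Hypothetical helper for illustration only.
snmpget_total_wait() {
    timeout=$1
    retries=$2
    echo $(( timeout * (retries + 1) ))
}

snmpget_total_wait 4 5     # -t 4 with -r 5   -> 24
snmpget_total_wait 10 1    # -t 10 with -r 1  -> 20
snmpget_total_wait 10 5    # -t 10 with -r 5  -> 60
```

Comparing that total against whatever deadline the plugin itself enforces would show which of the two timeouts fires first. That deadline is not stated in this thread.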

Code:

./check_snmp -t 10 -H xx.xx.xx.1 -C XXX -o sysUpTime -vvv
/usr/bin/snmpget -Le -t 10 -r 5 -m ALL -v 1 [authpriv] xx.xx.xx.1:161 sysUpTime
CRITICAL - Plugin timed out while executing system call

./check_snmp -t 4 -H xx.xx.xx.1 -C XXX -o sysUpTime -vvv
/usr/bin/snmpget -Le -t 4 -r 5 -m ALL -v 1 [authpriv] xx.xx.xx.1:161 sysUpTime
External command error: Timeout: No Response from xx.xx.xx.1:161.

./check_snmp -t 60 -H xx.xx.xx.1 -C XXX -o sysUpTime -vvv
/usr/bin/snmpget -Le -t 60 -r 5 -m ALL -v 1 [authpriv] xx.xx.xx.1:161 sysUpTime
CRITICAL - Plugin timed out while executing system call

./check_snmp -t 10 -e 1 -H xx.xx.xx.1 -C XXX -o sysUpTime -vvv
/usr/bin/snmpget -Le -t 10 -r 1 -m ALL -v 1 [authpriv] xx.xx.xx.1:161 sysUpTime
CRITICAL - Plugin timed out while executing system call
When you look at the Problem summary screen, you see a bunch of Critical alarms. I would prefer to have the UNKNOWN status.

-t 4
Image

-t 10 -e 1
Image
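
The CRITICAL versus UNKNOWN difference shown in these screens comes from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). As a rough sketch for comparing runs at the command line, the wrapper below times any command and prints the state its exit code maps to. The names `nagios_state` and `timed_check` are hypothetical helpers, not part of Nagios or the plugins.

```shell
# Map a plugin exit code to the corresponding Nagios state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo OUT_OF_RANGE ;;
    esac
}

# Run a command, let its output through, then report the exit code,
# the matching state name, and the elapsed wall-clock seconds.
timed_check() {
    start=$(date +%s)
    rc=0
    "$@" || rc=$?
    end=$(date +%s)
    echo "state=$(nagios_state "$rc") rc=$rc elapsed=$(( end - start ))s"
}

# Stand-in for a plugin that takes a second and exits with code 3.
# A real run would look like:
#   timed_check ./check_snmp -t 10 -e 1 -H xx.xx.xx.1 -C XXX -o sysUpTime
timed_check sh -c 'sleep 1; exit 3'
```

Running each -t and -e combination through the same wrapper puts the elapsed time and the resulting state side by side, which makes it easier to see which settings end in CRITICAL and which in UNKNOWN.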

Re: Timeout issue

Posted: Fri Aug 22, 2014 3:11 pm
by sreinhardt
Quote:
To clarify my issue: when -t is set to 10, all my devices that are down show "CRITICAL - Plugin timed out while executing system call" on their check_snmp services.

This is because the call via snmpget can be sent, but your system doesn't appear to be responding within the 10 second timeout.

Quote:
If I shorten it to -t 4, the status is "UNKNOWN External command error: Timeout: No Response from xx.xx.xx.1".

This is because snmpget did not have enough time to return to the check command. It is similar to the message above, but it means that the time is too short internally, between the plugin and snmpget.

Quote:
With -t 4, if a host is down, the plugin takes a total of 24 seconds to execute and produces the UNKNOWN result. With -t 10 and -e 1, if a host is down, the plugin would take a total of 20 seconds to execute (shorter than with -t 4), yet it produces "CRITICAL - Plugin timed out while executing system call".

I'll have to put this on our list of things to look into, if Andy has not already resolved it in the timeout state change branch of nagios-plugins. We realize that some of the more complex plugins do have issues; these are either resolved in that branch or will be before the next release. While we are at it, did you install from source or from a package, and what version are you on at present?