Timeout issue

cg28oh
Posts: 31
Joined: Mon Aug 18, 2014 9:38 am

Post by cg28oh »

I have a Nagios 4.0.8 server running on CentOS 6.5. I'm using it to monitor some VSAT devices that have a latency of 600ms to 3000ms. I see all kinds of SNMP timeout errors that clear on the next poll. tcpdump shows Nagios resending the snmpget very quickly, without waiting for the plugin's default timeout of 10 seconds. I'm having no luck finding the configuration to make it wait for the timeout to expire before a retry is done. It's also generating some extra traffic, because the VSAT device sends replies to all of the snmpget requests.

16:48:04.343800 IP xx.xx.xx.49.32814 > xx.xx.xx.1.snmp: C=NSSNET GetRequest(36) .x.x.x.x.x.0
16:48:05.345177 IP xx.xx.xx.49.32814 > xx.xx.xx.1.snmp: C=NSSNET GetRequest(36) .x.x.x.x.x.0
16:48:05.919054 IP xx.xx.xx.1.snmp > xx.xx.xx.49.32814: C=NSSNET GetResponse(37) .x.x.x.x.x.0=96
16:48:06.191178 IP xx.xx.xx.1.snmp > xx.xx.xx.49.32814: C=NSSNET GetResponse(37) .x.x.x.x.x.0=96
16:48:06.191192 IP xx.xx.xx.49 > xx.xx.xx.1: ICMP 192.168.32.49 udp port 32814 unreachable, length 88
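
The timestamps in the capture can be read as follows: the second GetRequest leaves about one second after the first, and the first GetResponse arrives roughly 1.6 seconds after the first request, which is inside the 600ms to 3000ms latency range mentioned above. A minimal shell sketch of that arithmetic, assuming the first response answers the first request (the capture does not show request IDs); the variable and function names are illustrative only:

```shell
#!/bin/sh
# Timestamps copied from the tcpdump lines above (seconds past 16:48:00).
first_request=4.343800
retry_request=5.345177
first_response=5.919054

# Plain sh arithmetic is integer-only, so awk does the subtraction.
elapsed() {
    awk -v t0="$1" -v t1="$2" 'BEGIN { printf "%.3f", t1 - t0 }'
}

retry_gap=$(elapsed "$first_request" "$retry_request")
round_trip=$(elapsed "$first_request" "$first_response")

echo "retry sent after ${retry_gap}s"
echo "first reply after ${round_trip}s"
```

If that reading is right, the retry went out before the first reply had time to come back, and the second reply probably reached a port that had already been closed, which would account for the ICMP port unreachable line.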

Post by cg28oh »

Some additional details.

When I change the check_snmp command in commands.cfg to $USER1$/check_snmp -t 10 -H $HOSTADDRESS$ $ARG1$, it corrects the issue but causes a new one when hosts are down. Now I get the error "CRITICAL - Plugin timed out while executing system call" when I set -t greater than 4. If it is 4 or less, the error is "External command error: Timeout: No Response from xx.xx.xx.1". I've set service_check_timeout=60 in nagios.cfg and it has no effect.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Post by Box293 »

Change -t 10 to -t 60:

Code:

$USER1$/check_snmp -t 60 -H $HOSTADDRESS$ $ARG1$

Does this help?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Post by cg28oh »

Now it says Plugin timeout 60.01. I reviewed the source code for the plugins and there seems to be an issue in runcmd.c at runcmd_timeout_alarm_handler. It appears that that timeout is shorter than the timeout specified on the check_command.

runcmd.c system call timeout

Post by cg28oh »

I've run into an issue when setting the check_snmp timeout to 10 seconds: the system call times out before the check_command has completed. I'm guessing that the system call timeout is around 4 seconds. I can set the check_snmp timeout to 4 and not see the error "CRITICAL - Plugin timed out while executing system call". Anything beyond that and it will error. I'm not able to see (not much experience with code) whether there's a way to increase that timeout to, say, 60 seconds to give the plugin time to finish. I think the issue is around the runcmd_timeout_alarm_handler.
Mod Note: Merged two topics into one
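
The behaviour described here, an outer timer firing before the inner command has finished waiting, can be reproduced in isolation. The sketch below is only an analogy using the GNU coreutils timeout command, not the plugin's runcmd.c code: sleep stands in for an snmpget that is still waiting, and the 1-second limit stands in for whatever timer the plugin applies (its real value is not established in this thread).

```shell
#!/bin/sh
# Analogy only: an outer 1-second timer wraps a command that needs 3 seconds.
# Assumes GNU coreutils `timeout` is installed.
status=0
timeout 1 sleep 3 || status=$?

# coreutils timeout exits with 124 when it had to stop the command.
if [ "$status" -eq 124 ]; then
    echo "outer timer fired before the inner command finished"
else
    echo "inner command finished first (exit status $status)"
fi
```

Whichever timer is shorter decides which error message is reported, which would fit the symptom that the message changes once -t goes above a certain value.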

Post by Box293 »

Just to go back to basics, without involving Nagios and command definitions ... what results do you get from testing at the command line?

Show the full command and its output in a code block please.

Post by cg28oh »

Code:

./check_snmp -t 10 -H xx.xx.xx.1 -C XXX -o sysUpTime.0
CRITICAL - Plugin timed out while executing system call

./check_snmp -t 4 -H xx.xx.xx.1 -C XXX -o sysUpTime.0
External command error: Timeout: No Response from xx.xx.xx.1:161.

Post by Box293 »

What output do you get when running in verbose mode?

Code:

./check_snmp -t 10 -H xx.xx.xx.1 -C XXX -o sysUpTime.0 -vvv

./check_snmp -t 4 -H xx.xx.xx.1 -C XXX -o sysUpTime.0 -vvv

./check_snmp -t 60 -H xx.xx.xx.1 -C XXX -o sysUpTime.0 -vvv

For example, I get:

Code:

./check_snmp -t 60 -H 10.25.4.1 -C XXX -o sysUpTime.0 -vvv

/usr/bin/snmpget -Le -t 60 -r 5 -m ALL -v 1 [authpriv] 10.25.4.1:161 sysUpTime.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (483588800) 55 days, 23:18:08.00
Processing oid 1 (line 1)
  oidname: DISMAN-EVENT-MIB::sysUpTimeInstance
  response: Timeticks: (483588800) 55 days, 23:18:08.00
SNMP OK - Timeticks: (483588800) 55 days, 23:18:08.00 |

Also, what happens when you do a port scan of the destination device?

For example, I get:

Code:

nmap -s U -p 161 10.25.4.1

Starting Nmap 5.51 ( http://nmap.org ) at 2014-08-22 11:28 EST
Nmap scan report for 10.25.4.1
Host is up (0.0035s latency).
PORT    STATE SERVICE
161/udp open  snmp
MAC Address: E4:F4:C6:D3:2A:1D (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds

Post by cg28oh »

To clarify my issue: when -t is set to 10, all my devices that are down show "CRITICAL - Plugin timed out while executing system call" on the check_snmp services. If I shorten it to -t 4, the status is "UNKNOWN External command error: Timeout: No Response from xx.xx.xx.1". With -t 4, if a host is down, the plugin would take a total of 24 seconds to execute and produces the UNKNOWN External command error. With -t 10 and -e 1, if a host is down, the plugin would take a total of 20 seconds to execute (shorter than with -t 4), but it produces "CRITICAL - Plugin timed out while executing system call".

Code:

./check_snmp -t 10 -H xx.xx.xx.1 -C XXX -o sysUpTime -vvv
/usr/bin/snmpget -Le -t 10 -r 5 -m ALL -v 1 [authpriv] xx.xx.xx.1:161 sysUpTime
CRITICAL - Plugin timed out while executing system call

./check_snmp -t 4 -H xx.xx.xx.1 -C XXX -o sysUpTime -vvv
/usr/bin/snmpget -Le -t 4 -r 5 -m ALL -v 1 [authpriv] xx.xx.xx.1:161 sysUpTime
External command error: Timeout: No Response from xx.xx.xx.1:161.

./check_snmp -t 60 -H xx.xx.xx.1 -C XXX -o sysUpTime -vvv
/usr/bin/snmpget -Le -t 60 -r 5 -m ALL -v 1 [authpriv] xx.xx.xx.1:161 sysUpTime
CRITICAL - Plugin timed out while executing system call

./check_snmp -t 10 -e 1 -H xx.xx.xx.1 -C XXX -o sysUpTime -vvv
/usr/bin/snmpget -Le -t 10 -r 1 -m ALL -v 1 [authpriv] xx.xx.xx.1:161 sysUpTime
CRITICAL - Plugin timed out while executing system call
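
The 24-second and 20-second totals quoted above are consistent with snmpget making one initial attempt plus -r retries and waiting -t seconds for each. A small sketch of that arithmetic follows; the function name worst_case_seconds is made up for illustration, and the formula is an assumption that matches the numbers in this post rather than something taken from the plugin source:

```shell
#!/bin/sh
# Worst-case wait for an unreachable host, assuming snmpget sends
# one initial request plus <retries> retries and waits <timeout>
# seconds for each of them.
worst_case_seconds() {
    timeout_s=$1
    retries=$2
    echo $(( timeout_s * (retries + 1) ))
}

default_retries=5   # the -r value shown in the verbose output above

t4=$(worst_case_seconds 4 "$default_retries")      # -t 4
t10_e1=$(worst_case_seconds 10 1)                  # -t 10 -e 1
t10=$(worst_case_seconds 10 "$default_retries")    # -t 10

echo "-t 4        : ${t4}s"
echo "-t 10 -e 1  : ${t10_e1}s"
echo "-t 10       : ${t10}s"
```

By the same arithmetic, -t 10 with five retries could need up to 60 seconds, which would leave no headroom under service_check_timeout=60. It does not explain why -t 10 -e 1, at 20 seconds, still ends in the plugin timeout.
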
When you look at the Problem summary screen, you see a bunch of CRITICAL alarms. I would prefer to have the UNKNOWN status.

[Image: -t 4]

[Image: -t 10 -e 1]
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Post by sreinhardt »

Quote:
To clarify what my issue is, when the -t is set to 10, then all my devices that are down show the CRITICAL - Plugin timed out while executing system call on the check_snmp services.

This is because the call via snmpget can be sent, but your system doesn't appear to be responding within the 10-second timeout.

Quote:
If I shorten it -t 4, then the status is UNKNOWN External command error: Timeout: No Response from xx.xx.xx.1.

This is because snmpget did not have enough time to return to the check command. It is similar to the message above, but it means that the time is too short internally, between the plugin and snmpget.

Quote:
Now if I set the -t 4, if a host is down, the plugin would take a total of 24 seconds to execute and produce the UNKNOWN External command error: Timeout: No Response from xx.xx.xx.1. When the -t is set to 10 and -e is set to 1, if a host is down, the plugin would take a total of 20 seconds (shorter than -t 4) to execute, but produces the CRITICAL - Plugin timed out while executing system call.

I'll have to put this on our list of things to look into, if Andy has not already resolved it in the timeout state change branch of nagios-plugins. We realize that some of the more complex plugins do have issues; these are either resolved in that branch or will be before the next release. While we are at it, did you install from source or from a package, and which version are you on at present?
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.