Re: Timeout issue

Posted: Sun Aug 24, 2014 11:35 pm
by cg28oh
Both were installed from source: Nagios Core 4.0.8 and Nagios Plugins 2.0.3.

Re: Timeout issue

Posted: Mon Aug 25, 2014 10:01 am
by sreinhardt
If I gave you build instructions, would you be willing to pull down, build, and test the timeout branch, and see if that resolves the strange times in your testing? I can certainly set up internal test systems instead, but it seems like you already happen to have a pretty good setup for testing this.

Re: Timeout issue

Posted: Tue Aug 26, 2014 1:38 pm
by cg28oh
Sure thing!

Re: Timeout issue

Posted: Tue Aug 26, 2014 5:20 pm
by abrist
The old math from check_snmp timeout looks like:

Code:

alarm(timeout_interval * retries + 5);
(with a default of 5 retries)
As you can see, the "actual" timeout value gets very large, very quickly, and ends up far exceeding the timeout you actually set.
The new code looks like:

Code:

alarm(timeout_interval + 1);
(with the retries computed as a fraction of the total timeout)
To build the branch, make sure you have the dependencies needed to build Nagios Plugins, then run the following:

Code:

cd /tmp
wget https://github.com/nagios-plugins/nagios-plugins/archive/timeout_state.zip
unzip timeout_state
cd nagios-plugins-timeout_state/
./tools/setup
./configure
make
The new plugin binary should be located at:

Code:

/tmp/nagios-plugins-timeout_state/plugins/check_snmp
If you wish to install all the plugins from the branch, run:

Code:

cd /tmp/nagios-plugins-timeout_state
make install

Re: Timeout issue

Posted: Wed Sep 03, 2014 9:27 am
by cg28oh
Now the state is "CRITICAL - Plugin timed out while executing system call" with the default settings.

Re: Timeout issue

Posted: Fri Sep 05, 2014 10:23 am
by sreinhardt
What arguments are you passing to the newly built binaries?

Re: Timeout issue

Posted: Tue Sep 09, 2014 6:41 am
by cg28oh
This is what I had set:

Code:

define command{
        command_name    check_snmp
        command_line    $USER1$/check_snmp -e 1 -t 10 -H $HOSTADDRESS$ $ARG1$
        }
Once that produced the "System call timeout" message, I tried the default settings:

Code:

define command{
        command_name    check_snmp
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
        }
which produced the same message.

Re: Timeout issue

Posted: Tue Sep 09, 2014 5:21 pm
by abrist
Does the remote device support SNMP, is the firewall open, and is it listening for requests? Let's do a walk to find out:

Code:

snmpwalk -c <community> -v1 <remote device ip address>
Or:

Code:

snmpwalk -c <community> -v2c <remote device ip address>

Re: Timeout issue

Posted: Thu Sep 11, 2014 4:17 pm
by cg28oh
Yes, they do; however, they are on satellite connections. Depending on the number of sites online, the response time can range from 700 ms to 8-10 seconds. Only SNMP v1 is supported.

Code:

snmpget -v 1 -c X 10.0.0.1 sysUpTime.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (113517114) 13 days, 3:19:31.14

Re: Timeout issue

Posted: Fri Sep 12, 2014 2:49 pm
by abrist
Because the timeout value is now divided up among the default retries, setting -t 10 (3 seconds or so per attempt) may not be enough. Try setting the timeout to a higher number, such as 30 seconds.