check_nrpe works from CLI, fails from server with timeout

Posted: Tue Jun 18, 2013 11:06 am
by ChristopherSchultz
Today I saw that one of my services was giving an error with the message: "CHECK_NRPE: Socket timeout after 10 seconds."

I figured the service was down so I started checking. The service was up, so Nagios was making a mistake. So I went to the command line on the Nagios server (the one making the check_nrpe call, not the server being probed) and did this:

$ time /usr/lib/nagios/plugins/check_nrpe -H my.hostname -c check_my_nrpe_service

PING OK - Packet loss = 0%, RTA = 88.35 ms|rta=88.345001ms;100.000000;1000.000000;0.000000 pl=0%;10;10;0

real 0m4.129s
user 0m0.008s
sys 0m0.000s

(The "service" to be checked is basically running check_ping).

So, the probed server responds within 5 seconds, but check_nrpe complains about a 10-second timeout.
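A single 4-second run does not show whether the check always finishes that quickly. The sketch below repeats a command and counts how many runs reach a given limit. The function name `sample_runtime` is hypothetical, and it assumes a `date` that supports `+%s`, so it only has whole-second resolution.

```shell
#!/bin/sh
# Hypothetical helper: run a command several times and count the slow runs.
# Usage: sample_runtime RUNS LIMIT_SECONDS COMMAND [ARGS...]
sample_runtime() {
    runs=$1
    limit=$2
    shift 2
    slow=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        # Discard plugin output; only the elapsed time matters here.
        "$@" >/dev/null 2>&1
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -ge "$limit" ]; then
            slow=$((slow + 1))
        fi
        i=$((i + 1))
    done
    echo "$slow of $runs runs took ${limit}s or longer"
}

# Example, using the command from the post above:
# sample_runtime 20 10 /usr/lib/nagios/plugins/check_nrpe -H my.hostname -c check_my_nrpe_service
```

Running it as the account Nagios uses, rather than as an interactive user, would bring the test closer to what the scheduler actually executes.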

I have other services on this same server being checked via NRPE (e.g. system load, user load, disk space, etc.) and they all seem to work without a problem.

I searched around and the only promising lead was a badly-cached IP address lookup (which *has* happened to me when configuring iptables and a host's IP address changes), but I double-checked the hostname in the monitor's config file (it's correct), DNS resolves correctly, and I have restarted Nagios entirely just in case there was an incorrect cached DNS lookup. No change in behavior.

Any suggestions?

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Tue Jun 18, 2013 11:46 am
by abrist
Any reason why you are using check_nrpe to do a ping check? (Are you checking a separate network node?)

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Tue Jun 18, 2013 11:53 am
by ChristopherSchultz
Yes, I'm using check_ping from the remote host because I have to check to see whether a VPN tunnel is available from that host. I can't check it from anywhere else.

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Tue Jun 18, 2013 1:51 pm
by abrist
Try to increase the timeout, just in case the scheduler is a bit behind, or the server is under load.
Was this check working at one point? Or has it been failing since deployment?

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Thu Jun 20, 2013 10:58 am
by ChristopherSchultz
It /was/ working for a while. I have tried increasing the timeout, but I may be doing it incorrectly:

define command {
    command_name    check_nrpe_with_timeout
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$
}

define service {
    use                    local-service
    host_name              hostname
    service_description    VPN:[Client Name]
    check_command          check_nrpe_with_timeout!check_VPN_[client_name]!30
}

I still get this error:
CHECK_NRPE: Socket timeout after 10 seconds.

I would have expected "Socket timeout after 30 seconds" when specifying the timeout. I definitely restarted Nagios after making those changes, and I have only one Nagios server running (no intermediaries).
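To check that the definitions above should yield `-t 30`, the macro substitution can be reproduced by hand. This is a rough sketch of how the `!`-separated fields of `check_command` fill `$ARG1$`, `$ARG2$` and so on. It is not Nagios's own code, the function names are hypothetical, and it assumes `$USER1$` points at /usr/lib/nagios/plugins as in the first post.

```shell
#!/bin/sh
# Replace every literal occurrence of $2 in $1 with $3 (no regex, no globbing).
replace_literal() {
    text=$1
    out=
    while :; do
        case $text in
            *"$2"*)
                out=$out${text%%"$2"*}$3   # keep text before the match, add replacement
                text=${text#*"$2"}         # continue after the match
                ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$out$text"
}

# Usage: expand_check_command COMMAND_LINE CHECK_COMMAND USER1_DIR HOSTADDRESS
expand_check_command() {
    line=$1
    check_command=$2
    line=$(replace_literal "$line" '$USER1$' "$3")
    line=$(replace_literal "$line" '$HOSTADDRESS$' "$4")
    # Split on "!" with globbing off, so "[client_name]" is not treated as a pattern.
    set -f
    old_ifs=$IFS
    IFS='!'
    set -- $check_command
    IFS=$old_ifs
    set +f
    shift   # drop the command name; the remaining fields are ARG1, ARG2, ...
    n=1
    for arg in "$@"; do
        line=$(replace_literal "$line" "\$ARG$n\$" "$arg")
        n=$((n + 1))
    done
    printf '%s\n' "$line"
}

expand_check_command \
    '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$' \
    'check_nrpe_with_timeout!check_VPN_[client_name]!30' \
    /usr/lib/nagios/plugins my.hostname
# prints: /usr/lib/nagios/plugins/check_nrpe -H my.hostname -c check_VPN_[client_name] -t 30
```

If the expanded line carries `-t 30` but the error still reports 10 seconds, one possibility is that the failing check is not going through this command definition at all.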

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Thu Jun 20, 2013 1:03 pm
by abrist
You may have the directive:

command_timeout=10

OR

connection_timeout=10

declared in the remote host's nrpe.cfg.
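It may help to separate the two sides of the connection by the wording of the error. As far as I know, the check_nrpe client prints "CHECK_NRPE: Socket timeout after N seconds." when its own `-t` limit (10 seconds by default) expires, while an expired `command_timeout` in the daemon is usually reported as "NRPE: Command timed out after N seconds". The sketch below assumes those two message formats, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: guess which side produced an NRPE timeout message.
# The message prefixes are assumptions about check_nrpe / nrpe output formats.
classify_nrpe_timeout() {
    case $1 in
        "CHECK_NRPE: Socket timeout"*)
            echo "client-side: check_nrpe -t limit on the Nagios server" ;;
        "NRPE: Command timed out"*)
            echo "daemon-side: command_timeout in nrpe.cfg on the monitored host" ;;
        *)
            echo "not a recognized NRPE timeout message" ;;
    esac
}

classify_nrpe_timeout "CHECK_NRPE: Socket timeout after 10 seconds."
# prints: client-side: check_nrpe -t limit on the Nagios server
```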

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Fri Jun 21, 2013 11:06 am
by ChristopherSchultz
This is all I have:

$ grep _timeout `find . -type f`
./conf.d/my_vpn_host.cfg: command_name check_nrpe_with_timeout
./conf.d/my_vpn_host.cfg: check_command check_nrpe_with_timeout!check_VPN_client_name!30
./nagios.cfg:service_check_timeout=60
./nagios.cfg:host_check_timeout=30
./nagios.cfg:event_handler_timeout=30
./nagios.cfg:notification_timeout=30
./nagios.cfg:ocsp_timeout=5
./nagios.cfg:perfdata_timeout=5

Any other suggestions?

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Fri Jun 21, 2013 11:35 am
by ChristopherSchultz
Whoops, I just realized that you might have meant the server being monitored -- seeing as how you suggested looking at nrpe.cfg.

I only have /etc/nagios/nrpe.cfg -- no other configuration files on the server.

$ grep _timeout nrpe.cfg
command_timeout=60
connection_timeout=300

So the 10-second timeout is still a mystery to me.
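A plain `grep _timeout` also matches commented-out lines and does not show whether nrpe.cfg pulls in other files through `include=` or `include_dir=`. A slightly stricter search, as a sketch (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: list only active (uncommented) timeout and include
# directives in an NRPE configuration file.
show_nrpe_timeouts() {
    grep -E '^[[:space:]]*(command_timeout|connection_timeout|include|include_dir)[[:space:]]*=' "$1"
}

# Example, using the path from the post above:
# show_nrpe_timeouts /etc/nagios/nrpe.cfg
```

If this shows only the 60 and 300 second values and no include directives, the daemon's configuration is an unlikely source of a 10-second limit.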

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Fri Jun 21, 2013 12:36 pm
by abrist
Also check your nagios.cfg on the core server for:

service_check_timeout=10

Re: check_nrpe works from CLI, fails from server with timeou

Posted: Mon Jun 24, 2013 11:06 am
by ChristopherSchultz
You can see above that it is already set to 60:

> ./nagios.cfg:service_check_timeout=60