(Service Check Timed Out)

Posted: Wed Aug 06, 2014 12:35 pm
by nyoung
I have an on-prem instance of Nagios XI in Northern California, and I am monitoring a Debian instance in AWS East. I am getting intermittent errors on service checks. Initially they were "CHECK_NRPE: Socket timeout after 30 seconds" errors, which occurred every few hours. When I changed the command from "-t 30" to "-t 60", the error message changed to (Service Check Timed Out). I think this is a latency issue. What can I do to prevent false positives?

Re: (Service Check Timed Out)

Posted: Wed Aug 06, 2014 3:25 pm
by lmiltchev
If NRPE is running as a standalone daemon on the client machine, make sure the Nagios XI server's IP address is listed under allowed_hosts in nrpe.cfg:

Code: Select all

allowed_hosts=127.0.0.1,<nagios server ip>
If NRPE is running under xinetd, make sure you add the Nagios XI server's IP to the "/etc/xinetd.d/nrpe":

Code: Select all

only_from = 127.0.0.1 <Nagios server ip>
Restart the daemon/xinetd so that changes can take effect.

What is the output of the following command when run on the Nagios XI server?

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H <client ip>

Re: (Service Check Timed Out)

Posted: Wed Aug 06, 2014 3:41 pm
by nyoung
allowed_hosts is fine, as the check works 90% of the time, so I don't need to adjust that. My question is more about increasing the timeout to account for potential network latency.

Re: (Service Check Timed Out)

Posted: Wed Aug 06, 2014 3:49 pm
by lmiltchev
Changing the timeout value on the server side alone is not enough; you also need to change it on the client side. Read section IV in this document:

http://assets.nagios.com/downloads/nagi ... utions.pdf

Re: (Service Check Timed Out)

Posted: Wed Aug 06, 2014 3:53 pm
by nyoung
On the server side, is it sufficient to change the check_nrpe -t value in the command, or is there another setting that needs to be updated as well?

I will take a look at that attachment. Thanks.

Re: (Service Check Timed Out)

Posted: Wed Aug 06, 2014 4:21 pm
by abrist
The remote host's nrpe.cfg has maximum command and connection timeouts that may also need to be altered:

Code: Select all

command_timeout=<timeout>
connection_timeout=<timeout>
Connection timeout should be a little larger than the command timeout. Additionally, do not forget to restart the daemon after making these changes:

Code: Select all

service xinetd restart
Or:

Code: Select all

service nrpe restart
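
For illustration, here is one way the timeouts discussed above might line up. This is only a sketch: every value is an assumption chosen for the example, not a recommendation from this thread, and the file locations may differ on your install. The service_check_timeout directive is Nagios Core's own limit on how long a service check may run (set in nagios.cfg on the Nagios server, often 60 by default). When a check hits that limit, Nagios reports (Service Check Timed Out), so it likely needs to stay above the check_nrpe -t value.

```ini
# --- Remote host: nrpe.cfg (illustrative values) ---
# Longest time NRPE lets a plugin run before giving up on it.
command_timeout=50
# Kept a little larger than command_timeout, as suggested above.
connection_timeout=60

# --- Nagios XI server: nagios.cfg (illustrative value) ---
# Kept above the check_nrpe "-t 60" used in the command definition,
# so check_nrpe can report its own timeout before Nagios kills the check.
service_check_timeout=90
```

With these example numbers the ordering is command_timeout (50) < check_nrpe -t (60) < service_check_timeout (90). Restart NRPE (or xinetd) on the remote host and Nagios on the server after changing the respective files.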