NRPE check not working properly

pratik.patel
Posts: 77
Joined: Wed Apr 19, 2017 10:51 am

NRPE check not working properly

Post by pratik.patel »

The NRPE check is flapping: it shows the expected output for a while and then fails with one of the following:
server1) CHECK_NRPE: Socket timeout after 10 seconds.
server2) NRPE check failed. Exit code: 2
This generates repeated up/down alerts.


Availability report for each service:
1)
2017-09-04 02:21:39 2017-09-04 03:03:18 0d 0h 41m 39s SERVICE CRITICAL (HARD) CHECK_NRPE: Socket timeout after 10 seconds.
2017-09-04 03:03:18 2017-09-04 03:35:03 0d 0h 31m 45s SERVICE OK (HARD) All disks in Write Back mode
2017-09-04 03:35:03 2017-09-04 04:49:38 0d 1h 14m 35s SERVICE CRITICAL (HARD) CHECK_NRPE: Socket timeout after 10 seconds.
2017-09-04 04:49:38 2017-09-04 05:07:13 0d 0h 17m 35s SERVICE OK (HARD) All disks in Write Back mode
2017-09-04 05:07:13 2017-09-04 05:49:34 0d 0h 42m 21s SERVICE CRITICAL (HARD) CHECK_NRPE: Socket timeout after 10 seconds.
2017-09-04 05:49:34 2017-09-04 06:13:54 0d 0h 24m 20s SERVICE OK (HARD) All disks in Write Back mode
2017-09-04 06:13:54 2017-09-04 07:50:23 0d 1h 36m 29s SERVICE CRITICAL (HARD) CHECK_NRPE: Socket timeout after 10 seconds.
2017-09-04 07:50:23 2017-09-04 08:14:32 0d 0h 24m 9s SERVICE OK (HARD) All disks in Write Back mode
2017-09-04 08:14:32 2017-09-04 10:09:33 0d 1h 55m 1s SERVICE CRITICAL (HARD) CHECK_NRPE: Socket timeout after 10 seconds.
2017-09-04 10:44:19 2017-09-04 11:06:51 0d 0h 22m 32s SERVICE OK (HARD) All disks in Write Back mode
2017-09-04 11:20:49 2017-09-04 11:28:53 0d 0h 8m 4s+ SERVICE CRITICAL (HARD) CHECK_NRPE: Socket timeout after 10 seconds.

2)
2017-09-04 00:00:00 2017-09-04 00:20:27 0d 0h 20m 27s SERVICE CRITICAL (HARD) CRITICAL: NRPE check failed. Exit code: 2
2017-09-04 00:20:27 2017-09-04 00:42:07 0d 0h 21m 40s SERVICE OK (HARD) OK Sites Live = 5, Threads = 1
2017-09-04 00:42:07 2017-09-04 03:45:23 0d 3h 3m 16s SERVICE CRITICAL (HARD) CRITICAL: NRPE check failed. Exit code: 2
2017-09-04 03:45:23 2017-09-04 05:24:03 0d 1h 38m 40s SERVICE OK (HARD) OK Sites Live = 5, Threads = 1
2017-09-04 05:24:03 2017-09-04 09:10:25 0d 3h 46m 22s SERVICE CRITICAL (HARD) CRITICAL: NRPE check failed. Exit code: 2
2017-09-04 09:10:25 2017-09-04 09:33:02 0d 0h 22m 37s SERVICE OK (HARD) OK Sites Live = 5, Threads = 1
2017-09-04 09:33:02 2017-09-04 10:09:33 0d 0h 36m 31s SERVICE CRITICAL (HARD) CRITICAL: NRPE check failed. Exit code: 2
2017-09-04 10:16:56 2017-09-04 10:40:59 0d 0h 24m 3s SERVICE OK (HARD) OK Sites Live = 5, Threads = 1
2017-09-04 11:18:49 2017-09-04 11:28:39 0d 0h 9m 50s+ SERVICE CRITICAL (HARD) CRITICAL: NRPE check failed. Exit code: 2
dwasswa

Re: NRPE check not working properly

Post by dwasswa »

“CRITICAL - Socket timeout after 10 seconds” is often a false alarm: it means that Nagios did not get a reply from the monitored host within the timeout period (10 seconds by default).

You can fix this by raising the socket timeout from the default of 10 seconds.
To do that, change the -t value on the check command in Nagios, and also review the command_timeout and connection_timeout settings in the nrpe.cfg file on the remote host:

/usr/local/nagios/etc/nrpe.cfg
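
For the remote-host side, here is a rough sketch of inspecting and raising those two directives. It works on a scratch copy so it is safe to run anywhere. The sample values (60 and 300 are what a stock nrpe.cfg typically ships with, 120 is just an arbitrary higher number) and the helper name set_directive are my own assumptions for illustration, not something taken from this thread.

```shell
#!/bin/sh
# Sketch only: operate on a scratch copy, not the live nrpe.cfg.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
command_timeout=60
connection_timeout=300
EOF

# set_directive FILE NAME VALUE (hypothetical helper):
# rewrite the line "NAME=..." so that it reads "NAME=VALUE".
set_directive() {
    sed "s/^$2=.*/$2=$3/" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Raise the time NRPE allows a plugin to run (120 is an assumed value).
set_directive "$cfg" command_timeout 120

# Show the resulting timeout directives.
grep -E '^(command_timeout|connection_timeout)=' "$cfg"
```

After editing the live file, NRPE has to be restarted (or xinetd reloaded, if NRPE runs under xinetd) before the new values take effect.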
Command line:
In the check's command line, change the -t value to something higher. The command looks like this:

$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$
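
To see whether a longer timeout helps before touching the service definition, the same check can be run by hand from the Nagios server. A minimal sketch follows. The plugin path is the usual default location for $USER1$, and the host address, NRPE command name and 30-second value are placeholders I made up, so substitute your own.

```shell
#!/bin/sh
# Hypothetical values; replace with your own host and NRPE command name.
CHECK_NRPE=/usr/local/nagios/libexec/check_nrpe   # assumed default plugin path
HOST=192.0.2.10                                    # placeholder address
NRPE_COMMAND=check_example                         # placeholder command name
TIMEOUT=30                                         # assumed higher timeout, in seconds

cmd="$CHECK_NRPE -H $HOST -t $TIMEOUT -c $NRPE_COMMAND"
echo "Would run: $cmd"

# Only execute when the plugin is actually installed on this machine.
if [ -x "$CHECK_NRPE" ]; then
    $cmd
    echo "exit code: $?"
fi
```

If the manual run returns the expected output with the higher -t value but times out with 10, the remote command is simply taking longer than the default allows.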
You may also find that certain plugins have their own timeout argument. If one exists, your NRPE command definition needs to take it into account as well.

If this doesn't resolve the issue, the results will still help us troubleshoot.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: NRPE check not working properly

Post by dwhitfield »

Thanks @Derick Wasswa!

Since this is two separate servers, this should probably be two separate tickets. I am curious though, do you mean this is two separate Core servers or two separate remote hosts?

Could you please give us the full command line for this check or checks?