check_nrpe plugin gives socket timed out/service timed out

Post by **Anuraag** » Wed Jun 17, 2015 5:37 am

Hello,

I have a strange problem with my check_nrpe plugin on only a couple of servers.
One host has 142 services and the other has around 80. Almost every service on these servers runs every 5 minutes, and a few run every 10-15 minutes.
Most of the time a group of services goes to an unknown/critical state with one of the following:

1. CHECK_NRPE: Socket timeout after 60 seconds.
2. (Service Check Timed Out)
3. CHECK_NRPE: Error receiving data from daemon.

At the very next check interval a few become OK, a few remain in that state, and a few more may join the unknown/critical group. This is specific to these two servers.
When I execute the scripts for the failed services manually at that point, they all work fine on the remote server. I do not understand what exactly the issue could be.
Could someone help or suggest something? As I mentioned, this issue occurs only with these 2 specific hosts. Because of it, we are missing the times when we really do have a problem.

Below are the stats for my setup:

Nagios 2.10
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 10-21-2007
License: GPL

Projected scheduling information for host and service
checks is listed below. This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---------------------------
Total hosts: 127
Total scheduled hosts: 1
Host inter-check delay method: SMART
Average host check interval: 300.00 sec
Host inter-check delay: 300.00 sec
Max host check spread: 30 min
First scheduled check: Wed Jun 17 11:24:20 2015
Last scheduled check: Wed Jun 17 11:24:20 2015

SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 2942
Total scheduled services: 2942
Service inter-check delay method: SMART
Average service check interval: 825.21 sec
Inter-check delay: 0.28 sec
Interleave factor method: SMART
Average services per host: 23.17
Service interleave factor: 24
Max service check spread: 30 min
First scheduled check: Wed Jun 17 11:24:54 2015
Last scheduled check: Thu Jun 18 14:00:00 2015

CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval: 10 sec
Max concurrent service checks: Unlimited

PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.

Post by **abrist** » Wed Jun 17, 2015 10:03 am

Have you taken a look at the following doc? Even though it says it is for XI, it pretty much applies to pure core installs as well:
https://assets.nagios.com/downloads/nag ... utions.pdf

Post by **Anuraag** » Fri Jun 19, 2015 1:41 am

Yes, I have checked the document, but my problem seems to be intermittent, not continuous.
During the early hours of the day all the checks execute perfectly without any issues, but as the day starts, the alerts start misbehaving.
In the early hours there is not much activity on the server or in the logs, but as the day proceeds the logs start picking up data, and my scripts read through that data every 5 minutes to alert me. There are almost 100 checks that do the same thing, i.e. read through the logs. My scripts also take a while to execute. I guess this might be one of the reasons, but I'm not exactly sure.

Post by **tgriep** » Fri Jun 19, 2015 11:32 am

One possibility is that your script takes longer than 60 seconds to run when the system is busy, which could cause the issue.
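
One rough way to test this is to time a check on the remote host during the busy part of the day and compare the result with the 60-second limit from the error message. The sketch below is only an illustration: `time_check` is a hypothetical helper, and the command in the example is a stand-in for your real plugin path and arguments.

```python
import subprocess
import sys
import time


def time_check(cmd, limit=60.0):
    """Run cmd and return (elapsed_seconds, timed_out, return_code).

    limit is an assumed ceiling in seconds, matching the 60-second
    socket timeout reported in the error message.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=limit,
        )
        return time.monotonic() - start, False, proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return time.monotonic() - start, True, None


if __name__ == "__main__":
    # Stand-in command; replace with the real plugin and its arguments.
    elapsed, timed_out, rc = time_check([sys.executable, "-c", "print('OK')"])
    print(f"elapsed={elapsed:.2f}s timed_out={timed_out} rc={rc}")
```

Running this in a loop while the failures are occurring would show whether the run time creeps toward the limit as the logs grow, which a single manual run at a quiet moment may not reveal.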

Could you post your script so we can see what it does?
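
If it turns out that each script re-reads the whole log on every run (an assumption, since the scripts have not been posted), the run time would grow with the log during the day. A common alternative is to remember the position reached last time and read only what was appended since. A minimal sketch, where `read_new_lines` and the state file are hypothetical names:

```python
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call.

    The byte offset reached so far is kept in state_path (hypothetical
    state file, one per check).
    """
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0

    # If the log is now shorter than the saved offset, assume it was
    # rotated or truncated and start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Consume only up to the last newline so a half-written line is
    # left for the next run.
    end = data.rfind(b"\n") + 1
    data = data[:end]

    with open(state_path, "w") as f:
        f.write(str(offset + end))

    return data.decode("utf-8", errors="replace").splitlines()
```

With roughly 100 checks reading logs every 5 minutes, keeping each run proportional to the new data rather than the whole file may be enough to stay under the timeout, but that depends on what the scripts actually do.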
