check_nrpe plugin gives socket timed out/service timed out

Anuraag
Posts: 4
Joined: Fri Feb 21, 2014 11:46 am

check_nrpe plugin gives socket timed out/service timed out

Post by Anuraag »

Hello,

I have a strange problem with my check_nrpe plugin on only a couple of servers.
One host has 142 services and the other has around 80. Almost every service on these servers runs every 5 minutes, and a few run every 10-15 minutes.
Most of the time, a group of services goes to an unknown/critical state with one of the following messages:

1. CHECK_NRPE: Socket timeout after 60 seconds.
2. (Service Check Timed Out)
3. CHECK_NRPE: Error receiving data from daemon.

At the very next check interval a few become OK, a few remain in that state, and a few more may join the unknown/critical state. This happens only on these two servers.
When I then run the scripts for the failed services manually on the remote server, they all work fine, so I do not understand what the issue could be.
Could someone help or suggest something? As I mentioned, the issue affects only these 2 specific hosts. Because of it, we miss the times when we really do have a problem.


Below are the stats of my setup:

Nagios 2.10
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 10-21-2007
License: GPL

Projected scheduling information for host and service
checks is listed below. This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---------------------------
Total hosts: 127
Total scheduled hosts: 1
Host inter-check delay method: SMART
Average host check interval: 300.00 sec
Host inter-check delay: 300.00 sec
Max host check spread: 30 min
First scheduled check: Wed Jun 17 11:24:20 2015
Last scheduled check: Wed Jun 17 11:24:20 2015


SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 2942
Total scheduled services: 2942
Service inter-check delay method: SMART
Average service check interval: 825.21 sec
Inter-check delay: 0.28 sec
Interleave factor method: SMART
Average services per host: 23.17
Service interleave factor: 24
Max service check spread: 30 min
First scheduled check: Wed Jun 17 11:24:54 2015
Last scheduled check: Thu Jun 18 14:00:00 2015


CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval: 10 sec
Max concurrent service checks: Unlimited


PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.
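
A rough way to see why the busier host could run into trouble is to estimate how many checks are running on it at the same moment. The sketch below uses Little's law (average concurrency = start rate x run time). The 142 services and the 5-minute interval come from the post above; the run times are hypothetical values chosen only for illustration, since the real script durations are not given.

```python
# Rough concurrency estimate via Little's law: L = arrival_rate * duration.
# This is an average; real Nagios scheduling is not perfectly even.

def avg_concurrent_checks(num_services: int, interval_s: float, runtime_s: float) -> float:
    """Average number of checks running at once on one host."""
    arrival_rate = num_services / interval_s  # checks started per second
    return arrival_rate * runtime_s


if __name__ == "__main__":
    # 142 services and a 300 s interval are taken from the post;
    # the run times below are assumptions, not measured values.
    for runtime in (2, 15, 45):
        n = avg_concurrent_checks(142, 300, runtime)
        print(f"runtime {runtime:>2}s -> ~{n:.1f} checks running concurrently")
```

If the scripts slow down as the logs grow, the average number of overlapping checks grows in proportion, which would fit a problem that appears only on the two hosts with the most services.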
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: check_nrpe plugin gives socket timed out/service timed o

Post by abrist »

Have you taken a look at the following doc? Even though it says it is for XI, it pretty much applies to pure core installs as well:
https://assets.nagios.com/downloads/nag ... utions.pdf
Anuraag
Posts: 4
Joined: Fri Feb 21, 2014 11:46 am

Re: check_nrpe plugin gives socket timed out/service timed o

Post by Anuraag »

Yes, I have checked the document, but my problem seems to be intermittent, not continuous.
During the early hours of the day all the checks execute without any issues, but as the day starts, the alerts start misbehaving.
In the early hours there is not much activity on the server or in the logs, but as the day proceeds the logs start filling up, and my scripts read through that data every 5 minutes to alert me. Almost 100 checks do the same thing, i.e., read through the logs. My scripts also take a while to execute. I guess this might be one of the reasons, but I'm not exactly sure.
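
If the checks re-read the whole log on every run, their run time grows with the log during the day. One common way around that is to remember how far the previous run got and read only what was appended since. The following is a minimal sketch of that idea, not the poster's actual scripts (which are not shown in the thread); the function name and the state-file approach are assumptions made for illustration.

```python
import os
from typing import List


def read_new_lines(log_path: str, state_path: str) -> List[str]:
    """Return complete lines appended to log_path since the previous call.

    The byte offset reached so far is stored in state_path (hypothetical
    helper file, one per check).
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)

    # If the log was rotated or truncated, start again from the top.
    if offset > os.path.getsize(log_path):
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Consume only complete lines; a half-written last line waits for the next run.
    end = data.rfind(b"\n") + 1
    data = data[:end]

    with open(state_path, "w") as f:
        f.write(str(offset + end))

    return data.decode("utf-8", errors="replace").splitlines()
```

With something like this, each 5-minute run only scans the last few minutes of log output, so the run time should stay roughly constant instead of climbing as the day goes on.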
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: check_nrpe plugin gives socket timed out/service timed o

Post by tgriep »

One possibility is that your script takes longer than 60 seconds to run when the system is busy; that could cause the issue.

Could you post your script so we can see what it does?
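
To test this, the script can be timed on the remote server during the busy part of the day, ideally as the same user the NRPE daemon runs as. Below is a minimal sketch of such a timing wrapper. The 60-second limit mirrors the "Socket timeout after 60 seconds" message from the first post; the wrapper itself and its name are assumptions for illustration, not part of NRPE or the Nagios plugins.

```python
import subprocess
import sys
import time


def time_command(cmd, limit_s=60.0):
    """Run cmd and return (elapsed_seconds, timed_out)."""
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_s,  # the child is killed once the limit is reached
        )
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return time.monotonic() - start, timed_out


if __name__ == "__main__":
    # Hypothetical usage: python time_check.py /path/to/your_check_script arg1 arg2
    if len(sys.argv) < 2:
        print("usage: time_check.py COMMAND [ARGS...]")
        sys.exit(3)
    elapsed, timed_out = time_command(sys.argv[1:])
    status = "exceeded the limit" if timed_out else "finished"
    print(f"{status} after {elapsed:.1f}s")
```

Running it once in the quiet early hours and again during the day would show whether the run time approaches the 60-second limit only when the logs are busy.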