check_nrpe plugin gives socket timed out/service timed out
Posted: Wed Jun 17, 2015 5:37 am
Hello,
I have a strange problem with my check_nrpe plugin, but only on a couple of servers.
One host has 142 services and the other has around 80. Almost every service on these servers runs every 5 minutes, and a few run every 10-15 minutes.
Most of the time a group of services goes into an unknown/critical state with one of the following:
1. CHECK_NRPE: Socket timeout after 60 seconds.
2. (Service Check Timed Out)
3. CHECK_NRPE: Error receiving data from daemon.
At the very next check interval a few return to OK, a few remain in that state, and a few more may join the unknown/critical group. This happens only on these two servers.
When this happens, I run the scripts for the failed services manually on the remote server and they all work fine. I do not understand what the issue could be.
Could someone help or suggest something? As I mentioned, the issue affects only these two hosts. Because of it, we are missing the times when we really do have a problem.
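One caveat about the manual test: the scripts were run one at a time on the remote server, which is not how Nagios fires them. Below is a rough Python sketch for running several check_nrpe calls at once from the Nagios server and timing each one. The plugin path, host name, and command names are placeholders and not my real config; only the -H, -c, and -t options of check_nrpe are assumed.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_check(argv, timeout=60):
    """Run one command; return (argv, exit_code, seconds, first_output_line)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        code, out = proc.returncode, (proc.stdout or proc.stderr).strip()
    except subprocess.TimeoutExpired:
        # The local timeout fired before the command finished.
        code, out = None, "timed out locally"
    except OSError as exc:
        # The binary is missing or not executable.
        code, out = None, f"could not start: {exc}"
    elapsed = time.monotonic() - start
    first_line = out.splitlines()[0] if out else ""
    return argv, code, elapsed, first_line


def run_concurrently(commands, timeout=60):
    """Start every command at roughly the same time, one thread per command."""
    if not commands:
        return []
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(lambda argv: run_check(argv, timeout), commands))


if __name__ == "__main__":
    # Placeholder values: adjust to the real plugin path, host, and commands.
    plugin = "/usr/local/nagios/libexec/check_nrpe"
    host = "remote-host.example"
    checks = ["check_load", "check_disk", "check_users"]
    commands = [[plugin, "-H", host, "-c", name, "-t", "60"] for name in checks]
    for argv, code, elapsed, line in run_concurrently(commands):
        print(f"{argv[4]}: exit={code} time={elapsed:.1f}s {line}")
```

If the checks pass when run singly but start timing out as the list of commands grows, that would at least point toward concurrency on the remote side rather than the scripts themselves.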
Below are the stats of my setup:
Nagios 2.10
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 10-21-2007
License: GPL
Projected scheduling information for host and service
checks is listed below. This information assumes that
you are going to start running Nagios with your current
config files.
HOST SCHEDULING INFORMATION
---------------------------
Total hosts: 127
Total scheduled hosts: 1
Host inter-check delay method: SMART
Average host check interval: 300.00 sec
Host inter-check delay: 300.00 sec
Max host check spread: 30 min
First scheduled check: Wed Jun 17 11:24:20 2015
Last scheduled check: Wed Jun 17 11:24:20 2015
SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 2942
Total scheduled services: 2942
Service inter-check delay method: SMART
Average service check interval: 825.21 sec
Inter-check delay: 0.28 sec
Interleave factor method: SMART
Average services per host: 23.17
Service interleave factor: 24
Max service check spread: 30 min
First scheduled check: Wed Jun 17 11:24:54 2015
Last scheduled check: Thu Jun 18 14:00:00 2015
CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval: 10 sec
Max concurrent service checks: Unlimited
PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.