The production server is running the following:
Nagios XI Version : 2012R1.6
2.6.32-220.el6.x86_64 x86_64
CentOS release 6.2 (Final)
Gnome is not installed
PHP Version: 5.3.3
There have been no hardware or software updates. The Nagios plugins do vary slightly between the two servers; one is a test instance and the other is production. We are waiting for the arrival of our new servers, then we will upgrade to your latest and greatest version of Nagios XI! The network team says there were no networking issues yesterday.
The problem also occurred last weekend (10/12) on my test server, which is a copy of the production server (different HW, but same OS, Nagios XI version, and PHP). On the test server I noticed the host/service checks had stopped running at 12:30 or so on Saturday, but Nagios wasn’t showing that the monitoring engine had stopped or that anything else was amiss. When I checked the event log, I found a bunch of those fork error messages occurring at ~13:00, ½ hr after the host/service checks stopped. Running a configuration update fixed the problem on this server. Ack, but it was temporary. I notice now that it’s back: the host/service checks stopped running yesterday, 10/17/2013, at 12:39:59. This time there were no fork errors, and /tmp was filled with those check file droppings. I've left it as is for now, in case you want a system snapshot or anything.
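In case it helps to put numbers on those check file droppings, here is a minimal Python sketch that counts the leftover files and reports how old the oldest one is. The `check*` file name pattern, the `/tmp` default, and the function name `summarize_check_files` are my assumptions for illustration, not something taken from the Nagios XI documentation; adjust them to whatever the files on your system are actually called.

```python
import glob
import os
import stat
import time


def summarize_check_files(directory="/tmp", pattern="check*", now=None):
    """Return (count, oldest_age_seconds) for regular files matching pattern.

    The directory and pattern defaults are assumptions; pass your own values.
    """
    now = time.time() if now is None else now
    ages = []
    for path in glob.glob(os.path.join(directory, pattern)):
        try:
            info = os.stat(path)
        except OSError:
            # The file was removed between listing and stat; skip it.
            continue
        if stat.S_ISREG(info.st_mode):
            ages.append(max(0.0, now - info.st_mtime))
    if not ages:
        return 0, 0.0
    return len(ages), max(ages)
```

Calling `summarize_check_files()` every few minutes and logging the result would show whether the pile keeps growing after the checks stop, or whether it was left behind all at once.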
My colleague has created a spreadsheet of the Nagios errors he noticed on our production server over the last few days, which I can forward to you. It’s an Excel spreadsheet, so I’ll have to email it to you separately.
OK, I’ve saved the weirdest part for last. We are still getting the error ‘socket timeout after 30 seconds’, but there were no fork errors now or last night. There are no false host down alerts, and the load averages have never been so low. The zombie count is very low, the Total Processes service check (RSZDT error) is down (it too had been up), and the performance graphing has caught up, but there are gaps.
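For tracking the zombie and total process counts independently of the service check, a rough Python sketch like the following reads the process states straight from `/proc`. It assumes a Linux `/proc` layout where the state letter follows the parenthesised command name in each `stat` file; the function names are hypothetical.

```python
import os


def parse_state(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')'.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Return a dict mapping state letter (R, S, Z, D, T, ...) to a count."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process IDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                state = parse_state(handle.read())
        except (OSError, IndexError):
            # The process exited while we were reading, or the line was odd.
            continue
        counts[state] = counts.get(state, 0) + 1
    return counts
```

`count_states().get("Z", 0)` gives the zombie count and `sum(count_states().values())` the total, which could be logged next to the load average to see whether the numbers shift before the checks stop.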
Here is current screenshot of the Monitoring Engine Status page. It didn’t look anything like this yesterday.
I would still like to pursue this issue with you; after all, we don’t want it to come back!
Penny Karr | IT Infrastructure Monitoring
Harvard Vanguard Medical Associates, an Affiliate of Atrius Health
254 Second Avenue | Needham, MA 02494
P (781) 292-1853 | F (781) 292-1980
http://www.harvardvanguard.org
Email:
[email protected]