Same CPU load and notification from multiple hosts
Posted: Tue Dec 04, 2012 11:57 am
This has happened three times: once a few months ago, once on Friday, November 30, and again yesterday.
I get notifications from all of the Mac servers I am monitoring, and the event log looks like this:
2012-12-03 08:42:25 SERVICE ALERT: host1;CPU Load;WARNING;HARD;3;WARNING - load average: 3.45, 4.18, 2.43
2012-12-03 08:42:24 SERVICE ALERT: host2;CPU Load;WARNING;SOFT;1;WARNING - load average: 3.45, 4.18, 2.43
2012-12-03 08:42:24 SERVICE ALERT: host3;CPU Load;WARNING;SOFT;1;WARNING - load average: 3.45, 4.18, 2.43
2012-12-03 08:42:24 SERVICE ALERT: host4;CPU Load;WARNING;SOFT;1;WARNING - load average: 3.45, 4.18, 2.43
2012-12-03 08:42:23 SERVICE ALERT: host5;CPU Load;WARNING;SOFT;1;WARNING - load average: 3.45, 4.18, 2.43
2012-12-03 08:42:23 SERVICE ALERT: host6;CPU Load;WARNING;SOFT;1;WARNING - load average: 3.45, 4.18, 2.43
2012-12-03 08:42:23 SERVICE ALERT: host7;CPU Load;WARNING;SOFT;1;WARNING - load average: 3.45, 4.18, 2.43
With corresponding notifications on all the hosts.
When I check the hosts themselves, their CPU load is actually fine, even though the logs and notifications show all of them in a WARNING state with exactly the same load averages.
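To confirm that the plugin output really is identical across hosts rather than just similar, a rough Python sketch like the one below could be run against the log. The function name `hosts_sharing_output` and the regular expression are my own illustration, not part of Nagios, and the field layout is assumed from the SERVICE ALERT entries quoted above.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the entries quoted in this post:
# SERVICE ALERT: host;service;state;state type;attempt;plugin output
ALERT_RE = re.compile(
    r"SERVICE ALERT: (?P<host>[^;]+);(?P<service>[^;]+);"
    r"(?P<state>[^;]+);(?P<kind>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def hosts_sharing_output(lines):
    """Map (service, plugin output) to the hosts that reported exactly that output."""
    groups = defaultdict(list)
    for line in lines:
        match = ALERT_RE.search(line.strip())
        if match:
            key = (match.group("service"), match.group("output"))
            groups[key].append(match.group("host"))
    # Keep only outputs that more than one host reported.
    return {key: hosts for key, hosts in groups.items() if len(hosts) > 1}


sample = [
    "2012-12-03 08:42:25 SERVICE ALERT: host1;CPU Load;WARNING;HARD;3;WARNING - load average: 3.45, 4.18, 2.43",
    "2012-12-03 08:42:24 SERVICE ALERT: host2;CPU Load;WARNING;SOFT;1;WARNING - load average: 3.45, 4.18, 2.43",
]

for (service, output), hosts in hosts_sharing_output(sample).items():
    print(f"{service}: {len(hosts)} hosts reported '{output}': {', '.join(hosts)}")
```

Several hosts reporting character-for-character identical load averages would suggest that one result is being reused for all of them, rather than the hosts all being loaded at once.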
The only other thing in the log that may be related is this pair of entries:
2012-12-03 08:43:10 ndomod: Successfully reconnected to data sink! 0 items lost, 848 queued items to flush.
2012-12-03 08:43:10 ndomod: Successfully flushed 848 queued items to data sink.
On Friday I restarted the Nagios service, and the service checks came back fine. Yesterday I decided to reboot the Nagios server instead.
Any thoughts or suggestions on how to keep this from happening?