Mysterious load average alerts on Nagios XI 2011R3.2

Posted: Tue Jul 03, 2012 7:27 pm
by nagiosadmin42
We're running Nagios XI 2011R3.2 and are receiving intermittent load average alerts like the following:

***** Nagios *****

Notification Type: PROBLEM

Service: Current Load
Host: localhost
Address: 127.0.0.1
State: CRITICAL

Date/Time: Tue Jul 3 17:04:18 PDT 2012

Additional Info:

CRITICAL - load average: 25.89, 25.82, 26.40
The mystery is what causes the alert, because when I receive the notification I immediately check the System Status page and it shows things are just fine:

Load
1-min	0.16	
5-min	0.13	
15-min	0.09	


The service definition "Current Load" uses the command:

$USER1$/check_load -w $ARG1$ -c $ARG2$ 
with
$ARG1$ = 5.0,4.0,3.0
$ARG2$ = 10.0,6.0,4.0
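
For anyone reading along, here is a rough sketch of how those thresholds would be applied to the 1-, 5-, and 15-minute load averages. It is an illustration in Python, not the plugin's actual source: the names `parse_thresholds` and `evaluate_load` are made up for this example, and it assumes each average is compared against its own threshold with a strict greater-than test.

```python
# Nagios plugin return codes for the states used here.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_thresholds(arg):
    """Turn a string like '5.0,4.0,3.0' into a tuple of three floats."""
    values = tuple(float(part) for part in arg.split(","))
    if len(values) != 3:
        raise ValueError("expected three comma-separated values")
    return values


def evaluate_load(loads, warn, crit):
    """Compare (1-min, 5-min, 15-min) loads against their thresholds.

    Assumption: any single average above its critical threshold makes
    the whole check CRITICAL; otherwise any average above its warning
    threshold makes it WARNING.
    """
    state = OK
    for load, w, c in zip(loads, warn, crit):
        if load > c:
            return CRITICAL
        if load > w:
            state = WARNING
    return state


warn = parse_thresholds("5.0,4.0,3.0")    # $ARG1$
crit = parse_thresholds("10.0,6.0,4.0")   # $ARG2$

alert_state = evaluate_load((25.89, 25.82, 26.40), warn, crit)
status_page_state = evaluate_load((0.16, 0.13, 0.09), warn, crit)
```

Under these assumptions the values from the alert evaluate to CRITICAL, while the values shown on the System Status page evaluate to OK.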
I've scanned /usr/local/nagios/var/nagios.log, and don't find any entries with high load average values.

Any ideas on where I should look for the cause of these alerts?

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Posted: Thu Jul 05, 2012 10:52 am
by scottwilkerson
By any chance, do you have more than one Nagios server floating around (a backup or development instance)? Does the alert indicate the name of the Nagios server that sent it?

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Posted: Thu Jul 05, 2012 11:47 am
by nagiosadmin42
Unfortunately, as shown in my original post, the alert says "localhost" with IP "127.0.0.1" so it's very hard to know exactly which system is sending it.

That was a good idea about checking for multiple Nagios servers; however, we have only one production Nagios XI server. We were initially using the virtual machine image to try out Nagios XI, and that instance is shut down.

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Posted: Thu Jul 05, 2012 11:59 am
by nagiosadmin42
Ok, this is embarrassing... good catch on that multiple Nagios servers question. There WAS another Nagios Core dev system installed long ago, and we forgot all about it when we began investigating Nagios XI. We just logged on to that server and it is experiencing high load due to some Hadoop testing going on there. THANK YOU!

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Posted: Thu Jul 05, 2012 12:09 pm
by nagiosadmin42
And, the "***** Nagios *****" header should have been the clue to all this... our production server alerts say "***** Nagios XI Alert *****".

d'oh!
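
A small sketch of how that header check could be automated, for example in a mail filter. Treat it as an illustration in Python only: `notification_banner` and `identify_sender` are hypothetical helpers, and the two banner strings are simply the ones quoted in this thread, so other installs may use different notification text.

```python
# Banner text -> which install sent it, as reported in this thread.
KNOWN_SENDERS = {
    "Nagios": "Nagios Core install",
    "Nagios XI Alert": "Nagios XI production server",
}


def notification_banner(body):
    """Return the text between the asterisks of the first banner line.

    A banner line is assumed to start and end with five asterisks,
    e.g. '***** Nagios *****'. Returns None if no such line is found.
    """
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith("*****") and stripped.endswith("*****"):
            banner = stripped.strip("*").strip()
            if banner:
                return banner
    return None


def identify_sender(body):
    """Map a notification body to a known sender, or 'unknown sender'."""
    return KNOWN_SENDERS.get(notification_banner(body), "unknown sender")


example = "***** Nagios *****\n\nNotification Type: PROBLEM\n"
sender = identify_sender(example)
```

Run against the alert from the first post, this reports the Nagios Core install rather than the XI server.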

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Posted: Thu Jul 05, 2012 12:30 pm
by scottwilkerson
It's bound to happen sooner or later if you have several Nagios installs...