High Load Average

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
drakeu
Posts: 41
Joined: Thu Mar 04, 2010 5:02 pm

High Load Average

Post by drakeu »

Hello,

Last week we upgraded NagiosXI from 2012 to 2014R2.3 on a Red Hat 5 server. Things ran fine until last night, when we started getting slow responses from some clients. Then this morning the load average shot way up (over 80). I restarted the server and the problem remained. Stopping the nagios service brings the load down to between 2 and 5, but as soon as I restart the service the load jumps back up. I'm kind of at a loss as to what to check. Any ideas? Thanks!
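Since the load drops when the nagios service is stopped, a useful first check is which processes are driving it. On Linux the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D, usually waiting on disk I/O), so the split between the two hints whether a spike is CPU-bound or I/O-bound. A minimal sketch, assuming bash and the procps `ps` shipped with RHEL; `count_load_states` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Count processes in the two states that feed the Linux load average:
# R (running/runnable) and D (uninterruptible sleep, usually disk I/O).
# Reads `ps -eo stat=` output on stdin.
count_load_states() {
  awk '
    { s = substr($1, 1, 1) }          # first letter of STAT is the process state
    s == "R" { r++ }
    s == "D" { d++ }
    END { printf "running=%d uninterruptible=%d\n", r + 0, d + 0 }'
}

# Usage on the live system:
#   ps -eo stat= | count_load_states
#   ps -eo pid,stat,pcpu,comm --sort=-pcpu | head -n 15   # busiest processes
```

Running it a few times with nagios started and stopped gives a rough picture of what changes when the service is up.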
Last edited by drakeu on Thu Jan 08, 2015 11:43 am, edited 1 time in total.

Re: High Load Average

Post by drakeu »

I noticed some interesting things in our httpd log when these problems started. We had multiple messages like these:

[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: min_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 108, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: max_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 109, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: avg_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 110, referer: http://netmon.drake.edu/nagiosxi/admin/
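When the error_log fills with repeated notices like these, collapsing them into counts makes the noisiest messages stand out. A minimal sketch, assuming the default RHEL path /var/log/httpd/error_log and the line format shown above; `summarize_php_messages` is a hypothetical name. It drops the timestamp/client prefix and the ", referer: ..." suffix before counting, so the same message from different requests is grouped together.

```shell
#!/usr/bin/env bash
# Count identical PHP notices/warnings in an Apache error_log.
# Assumption: default log path on RHEL is /var/log/httpd/error_log.
summarize_php_messages() {
  local log="${1:-/var/log/httpd/error_log}"
  awk '
    {
      start = index($0, "PHP ")          # position of the "PHP <Level>:" marker
      if (start == 0) next               # not a PHP message; skip the line
      msg = substr($0, start)
      sub(/, referer: .*$/, "", msg)     # drop the per-request referer suffix
      print msg
    }' "$log" | sort | uniq -c | sort -rn
}

# Usage:
#   summarize_php_messages | head -n 20
```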

Re: High Load Average

Post by drakeu »

Sorry, that was in the error_log in /var/log/httpd.

Re: High Load Average

Post by drakeu »

One more update: I just tried enabling the fix suggested for the known issue "Core 4 Load Spikes on 1.75 and 7 Hour Intervals".

I then rebooted, but the problem remained. Looking again at the httpd error_log, I'm seeing a couple more errors, including a MySQL connection issue:

[Thu Jan 08 11:11:39 2015] [error] [client 127.0.0.1] PHP Warning: mysql_pconnect() [<a href='function.mysql-pconnect'>function.mysql-pconnect</a>]: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (11) in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-mysql.inc.php on line 383
[Thu Jan 08 11:11:39 2015] [error] [client 127.0.0.1] PHP Warning: mysql_pconnect() [<a href='function.mysql-pconnect'>function.mysql-pconnect</a>]: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (11) in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-mysql.inc.php on line 383
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Entity: line 1: parser error : Start tag expected, '<' not found in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Message: A database connection error has been detected, we are attempting to rep in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: ^ in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
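The MySQL warnings show PHP failing to reach the server through /var/lib/mysql/mysql.sock. The trailing (11) is the operating-system error number, which on Linux is EAGAIN ("Resource temporarily unavailable"). A quick first check is whether the socket file exists and whether the server answers at all. A minimal sketch, assuming the default socket path from these log lines; `check_mysql_socket` is a hypothetical helper, and `mysqladmin` may need `-u`/`-p` credentials on your system.

```shell
#!/usr/bin/env bash
# Check that the MySQL socket PHP is trying to use exists and is a socket.
# Assumption: default path taken from the log lines above.
check_mysql_socket() {
  local sock="${1:-/var/lib/mysql/mysql.sock}"
  if [ -S "$sock" ]; then
    echo "socket present: $sock"
  else
    echo "socket missing or not a socket: $sock" >&2
    return 1
  fi
}

# Usage:
#   check_mysql_socket && mysqladmin status   # uptime, threads, queries
#   mysqladmin processlist                    # what the server is doing now
```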

Re: High Load Average

Post by drakeu »

Just checking on the status, or whether you need anything from me. Unfortunately, due to the current issue our monitoring is completely down. Thanks!

Re: High Load Average

Post by drakeu »

I tried the following steps:

1) Ran the Nagios XI repairmysql.sh script against the nagios and nagiosql databases
2) The errors in error_log disappeared, but the high load average remained (I discovered that the high load average started when logging into the web interface)
3) Ran a VACUUM against Postgres as follows:

service postgresql stop
su postgres
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start

After finishing step 3, the load average went back to normal and so far there have been no problems. Will continue to monitor.
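The three per-database runs in step 3 differ only in the database name, so they can be written as a loop that pipes the statement in directly instead of rewriting /tmp/fix.sql each time. A minimal sketch, assuming the same data directory and the same standalone invocation used above (`postgres -D <datadir> <dbname>`, as with the PostgreSQL 8.1 packages on RHEL 5; from 8.2 onward, single-user mode needs `--single` as the first argument). `vacuum_full_all` and `PGDATA_DIR` are hypothetical names. The service must be stopped and the loop run as the postgres user.

```shell
#!/usr/bin/env bash
# Run "VACUUM FULL;" in a standalone (single-user) backend for each database.
# Assumptions: PostgreSQL is stopped, this runs as the postgres user, and the
# data directory is /var/lib/pgsql/data as in the steps above.
PGDATA_DIR="${PGDATA_DIR:-/var/lib/pgsql/data}"

vacuum_full_all() {
  local db
  for db in "$@"; do
    echo "Vacuuming $db" >&2
    # Feed the statement on stdin; stop at the first database that fails.
    echo "VACUUM FULL;" | postgres -D "$PGDATA_DIR" "$db" || return 1
  done
}

# Usage (as postgres, with the service stopped):
#   vacuum_full_all nagiosxi postgres template1
```

If stopping the server is not an option, `vacuumdb --all --full` runs VACUUM FULL over a normal connection instead, though it still takes exclusive locks on each table while it works.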
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: High Load Average

Post by cmerchant »

Keep us posted. Thanks.

Re: High Load Average

Post by drakeu »

Unfortunately, the problem seems to be recurring. It seems to occur after logins to the web administration interface, but I can't be sure.
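To test the suspicion that spikes follow web-interface logins, one option is to log timestamped load readings and line them up against login times in the httpd access_log. A minimal sketch, assuming Linux /proc/loadavg (whose first field is the 1-minute load average); the function names, the default threshold of 10, and the 30-second interval are arbitrary placeholders.

```shell
#!/usr/bin/env bash
# Log the 1-minute load average with a timestamp whenever it exceeds a
# threshold, so spikes can be compared against login times in the httpd logs.

load1() {
  # First field of /proc/loadavg is the 1-minute load average.
  awk '{ print $1 }' "${1:-/proc/loadavg}"
}

above() {
  # Exit status 0 if $1 > $2 (floating-point comparison done by awk).
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a > b) }'
}

watch_load() {
  local threshold="${1:-10}" interval="${2:-30}" l
  while :; do
    l=$(load1)
    if above "$l" "$threshold"; then
      echo "$(date '+%Y-%m-%d %H:%M:%S') load1=$l"
    fi
    sleep "$interval"
  done
}

# Usage:
#   watch_load 10 30 >> /tmp/load-spikes.log &
```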

Re: High Load Average

Post by cmerchant »

Could you go ahead and open a ticket so we can look into your server further? Thanks.