High Load Average

Posted: Thu Jan 08, 2015 11:16 am
by drakeu
Hello,

Last week we upgraded NagiosXI from 2012 to 2014R2.3 on a Red Hat 5 server. Things ran fine until last night, when we started getting slow responses from some clients. Then this morning the load average shot way up (over 80). I restarted the server and the problem remained. Stopping the nagios service brings the load down to between 2 and 5. As soon as the service is restarted, the load jumps back up. I'm at a loss as to what to check. Any ideas? Thanks!

Re: High Load Average

Posted: Thu Jan 08, 2015 11:31 am
by drakeu
I noticed some interesting things in our httpd log from when these problems started. There were multiple copies of these messages:

[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: min_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 108, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: max_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 109, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: avg_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 110, referer: http://netmon.drake.edu/nagiosxi/admin/

Re: High Load Average

Posted: Thu Jan 08, 2015 11:32 am
by drakeu
Sorry, that was in the error_log under /var/log/httpd.

Re: High Load Average

Posted: Thu Jan 08, 2015 12:16 pm
by drakeu
One more update. I just tried enabling the fix suggested for the known issue: Core 4 Load Spikes on 1.75 and 7 Hour Intervals

I then rebooted, but the problem remained. Looking again at the httpd error_log, I'm seeing a couple more errors, including a MySQL connection issue:

[Thu Jan 08 11:11:39 2015] [error] [client 127.0.0.1] PHP Warning: mysql_pconnect() [<a href='function.mysql-pconnect'>function.mysql-pconnect</a>]: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (11) in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-mysql.inc.php on line 383
[Thu Jan 08 11:11:39 2015] [error] [client 127.0.0.1] PHP Warning: mysql_pconnect() [<a href='function.mysql-pconnect'>function.mysql-pconnect</a>]: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (11) in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-mysql.inc.php on line 383
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Entity: line 1: parser error : Start tag expected, '<' not found in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Message: A database connection error has been detected, we are attempting to rep in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: ^ in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
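
To see at a glance how many PHP notices and warnings are piling up in a log like the one above, they can be tallied by severity. This is only a rough sketch: the function name tally_php_messages is made up for illustration, and the default path is an assumption based on the /var/log/httpd/error_log location mentioned earlier in the thread.

```shell
# Tally PHP messages in an Apache error log by severity (Notice, Warning, ...).
# tally_php_messages is a hypothetical helper name; the default log path is an
# assumption taken from the location mentioned in this thread.
tally_php_messages() {
    log_file="${1:-/var/log/httpd/error_log}"
    # -o prints only the matched text, e.g. "PHP Warning" or "PHP Notice";
    # uniq -c counts each distinct match, and sort -rn lists the most frequent first.
    grep -o 'PHP [A-Z][a-z]*' "$log_file" | sort | uniq -c | sort -rn
}
```

Calling tally_php_messages with no argument reads the default path; pass another file name to inspect a rotated log instead.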

Re: High Load Average

Posted: Thu Jan 08, 2015 1:33 pm
by drakeu
Just checking on the status, or whether you need anything from me. Unfortunately, due to the current issue our monitoring is completely down. Thanks!

Re: High Load Average

Posted: Thu Jan 08, 2015 2:59 pm
by drakeu
I tried the following steps:

1) Ran the Nagios repairmysql.sh against nagios and nagiosql
2) Errors in error_log disappeared but high load average remained (discovered that high load average started when logging into the web interface)
3) Ran a vacuum against Postgres as follows:

service postgresql stop
su postgres
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start

After finishing step 3, the load average went back to normal, and so far there have been no problems. I will continue to monitor.
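
For anyone repeating step 3, the three VACUUM passes can be condensed into one loop. This is a minimal sketch, not an official Nagios procedure: the function name vacuum_all and the DRY_RUN and PGDATA_DIR variables are made up for illustration, and the data directory and database names are simply the ones used in the commands above. It assumes the same conditions as the original steps (postgresql service stopped, running as the postgres user). Newer PostgreSQL releases expect `postgres --single` for the standalone backend, so the command may need adjusting on other systems.

```shell
# Run VACUUM FULL against each database through the standalone backend,
# mirroring the manual commands in step 3.
# vacuum_all, DRY_RUN and PGDATA_DIR are hypothetical names for this sketch.
# DRY_RUN defaults to 1, so the commands are only printed unless DRY_RUN=0 is set.
vacuum_all() {
    data_dir="${PGDATA_DIR:-/var/lib/pgsql/data}"
    for db in nagiosxi postgres template1; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Show what would be run without touching the database.
            echo "echo \"VACUUM FULL;\" | postgres -D $data_dir $db"
        else
            # Requires the postgresql service to be stopped and the postgres user.
            echo "VACUUM FULL;" | postgres -D "$data_dir" "$db"
        fi
    done
}
```

Running vacuum_all as-is prints the three commands for review; setting DRY_RUN=0 before calling it executes them.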

Re: High Load Average

Posted: Thu Jan 08, 2015 3:07 pm
by cmerchant
Keep us posted. Thanks.

Re: High Load Average

Posted: Thu Jan 08, 2015 3:13 pm
by drakeu
Unfortunately, the problem seems to be recurring. It seems to happen after logins to the web administration interface, but I can't be sure.

Re: High Load Average

Posted: Thu Jan 08, 2015 3:19 pm
by cmerchant
Could you go ahead and open a ticket so we can look further into your server? Thanks.