Hello,
Last week we upgraded NagiosXI from 2012 to 2014R2.3 on a Red Hat 5 server. Things ran fine until last night, when we started getting slow responses from some clients. Then this morning the load average shot way up (over 80). I restarted the server and the problem remained. Stopping the nagios service brings the load down to between 2 and 5. As soon as the service is restarted, the load jumps back up. I'm kind of at a loss as to what to check. Any ideas? Thanks!
High Load Average
Last edited by drakeu on Thu Jan 08, 2015 11:43 am, edited 1 time in total.
Re: High Load Average
I noticed some interesting things in our httpd log from when these problems started. These messages appeared many times:
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: min_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 108, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: max_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 109, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: avg_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 110, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: min_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 108, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: max_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 109, referer: http://netmon.drake.edu/nagiosxi/admin/
[Thu Jan 08 09:28:32 2015] [error] [client 10.11.1.24] PHP Notice: Undefined index: avg_latency in /usr/local/nagiosxi/html/includes/utils-xmlsysstat.inc.php on line 110, referer: http://netmon.drake.edu/nagiosxi/admin/
Re: High Load Average
Sorry, that was in the error_log under /var/log/httpd.
Re: High Load Average
One more update. I just tried enabling the fix suggested for the known issue: Core 4 Load Spikes on 1.75 and 7 Hour Intervals
I then rebooted but the problem remained. Looking again at the error_log of httpd I'm seeing a couple more errors including a mysql connection issue:
[Thu Jan 08 11:11:39 2015] [error] [client 127.0.0.1] PHP Warning: mysql_pconnect() [<a href='function.mysql-pconnect'>function.mysql-pconnect</a>]: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (11) in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-mysql.inc.php on line 383
[Thu Jan 08 11:11:39 2015] [error] [client 127.0.0.1] PHP Warning: mysql_pconnect() [<a href='function.mysql-pconnect'>function.mysql-pconnect</a>]: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (11) in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-mysql.inc.php on line 383
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Entity: line 1: parser error : Start tag expected, '<' not found in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Message: A database connection error has been detected, we are attempting to rep in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
[Thu Jan 08 11:11:39 2015] [error] [client 10.11.1.24] PHP Warning: simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: ^ in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 27, referer: http://netmon.drake.edu/nagiosxi/admin/ ... ringengine
Re: High Load Average
Just checking on the status, or whether you need anything from me. Unfortunately, due to the current issue our monitoring is completely down. Thanks!
Re: High Load Average
I tried the following steps:
1) Ran the Nagios repairmysql.sh against nagios and nagiosql
2) Errors in error_log disappeared but high load average remained (discovered that high load average started when logging into the web interface)
3) Ran a vacuum against Postgres as follows:
service postgresql stop
su postgres
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start
After finishing step 3 load average went back to normal and so far no problems. Will continue to monitor.
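For anyone repeating step 3, the three per-database commands can be wrapped in a loop. This is only a minimal sketch, assuming the same data directory and database names as above. The DRY_RUN switch and the function names are made up for illustration and are not part of Nagios XI. By default it only prints the commands. A real run would still need postgresql stopped and the script run as the postgres user, as in the steps above.

```shell
#!/bin/sh
# Sketch: VACUUM FULL each database in single-user mode, as in step 3.
# Prints the commands by default; set DRY_RUN=0 to actually run them.
PGDATA_DIR="/var/lib/pgsql/data"          # data directory from the steps above
DATABASES="nagiosxi postgres template1"   # databases vacuumed in the steps above
SQL_FILE="/tmp/fix.sql"
DRY_RUN="${DRY_RUN:-1}"                   # hypothetical switch, illustration only

vacuum_db() {
    # $1 is the database name
    if [ "$DRY_RUN" = "1" ]; then
        echo "postgres -D $PGDATA_DIR $1 < $SQL_FILE"
    else
        postgres -D "$PGDATA_DIR" "$1" < "$SQL_FILE"
    fi
}

vacuum_all() {
    for db in $DATABASES; do
        vacuum_db "$db" || return 1
    done
}

# The SQL file only needs to be written once, not before every database.
if [ "$DRY_RUN" != "1" ]; then
    echo "VACUUM FULL;" > "$SQL_FILE"
fi

vacuum_all
```

Run it once as-is to check the printed commands, then again with DRY_RUN=0 between the "service postgresql stop" and "service postgresql start" steps.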
Re: High Load Average
Keep us posted. Thanks.
Re: High Load Average
Unfortunately, the problem seems to be recurring. It appears to happen after logins to the web administration interface, but I can't be sure.
Re: High Load Average
Could you go ahead and open a ticket so we can look further into your server? Thanks.