
High Cpu load

Posted: Tue Jul 18, 2017 5:27 am
by op-team
Hi Guys,

Once again we have experienced a CPU overload on our Nagios server (4 CPUs and 32 GB of RAM). The server hung and monitoring stopped working as expected:
cpuload.png
totalProcesses.png
The main resource consumer was the nagios daemon.
atop.PNG
We first tried to restart the services as follows:
service nagios stop
service ndo2db restart
service nagios start

In the end we rebooted the server to restore normal operation.

Could you help us find the root cause?

Please find attached my profile file.
CPU: 4
RAM: 32 GB

Re: High Cpu load

Posted: Tue Jul 18, 2017 9:49 am
by dwhitfield
Profile did not attach. Can you try again?

Also, have you taken a look at https://assets.nagios.com/downloads/nag ... ios-XI.pdf ?

Re: High Cpu load

Posted: Wed Jul 19, 2017 2:30 am
by op-team
My profile file:
profile.zip

Re: High Cpu load

Posted: Wed Jul 19, 2017 3:11 am
by op-team
Hi,

Most of the time, the system load remains under control:
server_stats.png
engine_status.png
We weren't having a major unexpected event (issue causing a lot of check retries) when the problem happened.

Re: High Cpu load

Posted: Wed Jul 19, 2017 12:56 pm
by tgriep
The System Profile only captures a small window of data from the log files, so it contained no errors that would help debug the issue.
Please check the log files in the following folder for errors around the time of the incident and, if you need help, post them here.

/var/log/
Also, the Nagios archived log files may have some errors and those can be found here.

/usr/local/nagios/var/archives/
You can post those as well.

Re: High Cpu load

Posted: Thu Jul 20, 2017 7:59 am
by op-team
Hi,

thanks for your replies.

No relevant errors in the Nagios log, but I found the following in the messages log:

Jul 9 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 9 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 10 05:30:03 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 10 05:30:03 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 11 05:30:03 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 11 05:30:03 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 12 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 12 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 12 13:59:40 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 12 13:59:40 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 13 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 13 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 13 13:44:53 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 13 13:44:53 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 14 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 14 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 18 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 18 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 19 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 19 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 20 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 20 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!

The maximum number of MySQL connections allowed:
db_max_connections.PNG
The number of used connections never exceeds the maximum allowed:
max_connections.PNG

Re: High Cpu load

Posted: Thu Jul 20, 2017 10:04 am
by tgriep
In the messages log, the error almost always occurs at about 5:30:04 in the morning.
If the MySQL database is not running, that could cause that error. Is the MySQL database being restarted every morning, or is the repair script being run at that time?

Without any other errors, there aren't many clues as to why the issue happened.
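
To confirm how tightly the ndo2db disconnects cluster around one time of day, the messages log can be tallied by hour and minute. The following is only a rough sketch: the function name `count_disconnects_by_minute` and the `SAMPLE_LINES` list are made up for illustration, and the pattern assumes the syslog line layout shown in the excerpt posted above.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt posted in this thread:
#   "Jul 9 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!"
LOST_RE = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}\s+\S+\s+"
    r"ndo2db: Error: Connection to MySQL database has been lost!"
)


def count_disconnects_by_minute(lines):
    """Map 'HH:MM' to the number of lost-connection errors logged in that minute."""
    counts = Counter()
    for line in lines:
        match = LOST_RE.match(line)
        if match:
            counts[f"{match['hour']}:{match['minute']}"] += 1
    return counts


# A few lines copied from the posted excerpt (hypothetical variable name).
SAMPLE_LINES = [
    "Jul 9 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!",
    "Jul 10 05:30:03 nagios ndo2db: Error: Connection to MySQL database has been lost!",
    "Jul 12 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!",
    "Jul 12 13:59:40 nagios ndo2db: Error: Connection to MySQL database has been lost!",
]

for minute, hits in count_disconnects_by_minute(SAMPLE_LINES).most_common():
    print(minute, hits)
```

To run it against the real log, pass an open file handle instead of `SAMPLE_LINES`. A single dominant minute would point at something scheduled, such as a cron job, rather than at random connection drops.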

Re: High Cpu load

Posted: Mon Aug 07, 2017 8:59 am
by op-team
Hi guys,

The problem occurred once more yesterday morning, starting at 2:34 AM.

I noticed in the messages log that the issue began with the following error: "2017-08-06 02:34:02 Warning: A system time change of 9243 seconds (0d 2h 34m 3s forwards in time) has been detected. Compensating... "
messages.PNG
Please have a look at my profile file:
profile.zip

Thanks

Re: High Cpu load

Posted: Mon Aug 07, 2017 12:16 pm
by tgriep
That could be the reason for the load. If the system time changes, the checks would be rescheduled, which causes more load.
Take a look at the settings for the ntp daemon and see if you can find out why the time changes.
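
While looking into the ntp side, it may help to pull every time-change warning out of the logs and see how large and how frequent the jumps are. This is a minimal sketch that assumes the warning wording quoted earlier in the thread. The helper names `parse_time_change` and `format_duration` are hypothetical, and the "backwards" alternative in the pattern is an assumption, since only a forward jump was posted.

```python
import re

# Pattern based on the warning quoted in this thread; "backwards" is assumed.
TIME_CHANGE_RE = re.compile(
    r"A system time change of (?P<seconds>\d+) seconds "
    r".*?(?P<direction>forwards|backwards) in time"
)


def parse_time_change(line):
    """Return (seconds, direction) for a time-change warning, or None otherwise."""
    match = TIME_CHANGE_RE.search(line)
    if match is None:
        return None
    return int(match["seconds"]), match["direction"]


def format_duration(total_seconds):
    """Render a number of seconds in the same 'Xd Xh Xm Xs' form the warning uses."""
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {seconds}s"


# The warning exactly as posted above.
WARNING = (
    "2017-08-06 02:34:02 Warning: A system time change of 9243 seconds "
    "(0d 2h 34m 3s forwards in time) has been detected. Compensating... "
)

jump = parse_time_change(WARNING)
if jump is not None:
    seconds, direction = jump
    print(f"clock moved {direction} by {format_duration(seconds)}")
```

Comparing the timestamps of such warnings with the ntp daemon's own log entries may show whether the jumps line up with a resync.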

I also see a lot of scripts, run from cron, that make FTP connections to various servers.
The time change would reset those as well, causing them to rerun and increasing the load further.