Hi Guys,
Once again we have experienced a CPU overload on our Nagios server (4 CPUs and 32 GB RAM). The server was hanging and monitoring was not working as expected.
The main resource consumer was the nagios daemon.
We first tried to restart the services as follows:
service nagios stop
service ndo2db restart
service nagios start
In the end we rebooted the server to get back to normal operation.
Could you help us find the root cause?
Please find attached my profile file.
CPU: 4
RAM: 32 GB
High Cpu load
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: High Cpu load
Profile did not attach. Can you try again?
Also, have you taken a look at https://assets.nagios.com/downloads/nag ... ios-XI.pdf ?
Re: High Cpu load
My profile file:
Re: High Cpu load
Hi,
Most of the time, the system load remains under control. We weren't having a major unexpected event (an issue causing a lot of check retries) when the problem happened.
Re: High Cpu load
The System Profile only captures a small window of data from the log files, so there weren't any errors in it to help debug the issue.
Please check the log files in the following folder for errors around that time, and post them here if you need help:
/var/log/
Also, the Nagios archived log files may contain some errors; those can be found here:
/usr/local/nagios/var/archives/
You can post those as well.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: High Cpu load
Hi,
thanks for your replies.
There were no relevant errors in the Nagios log, but I found the following in the messages log:
Jul 9 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 9 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 10 05:30:03 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 10 05:30:03 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 11 05:30:03 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 11 05:30:03 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 12 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 12 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 12 13:59:40 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 12 13:59:40 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 13 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 13 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 13 13:44:53 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 13 13:44:53 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 14 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 14 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 18 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 18 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 19 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 19 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
Jul 20 05:30:04 nagios ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 20 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!
MySQL max connections allowed: the used connections never exceed the maximum connections allowed.
Re: High Cpu load
In the messages log, the error happens most of the time at about 05:30:04 in the morning.
If the MySQL database is not running, that could cause the error. Is the MySQL database being restarted every morning, or is the repair script being run at that time?
Without any other errors, there aren't many clues as to why the issue happened.
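To make the time-of-day pattern easier to see, here is a minimal Python sketch that counts the ndo2db "Connection to MySQL database has been lost!" lines per hour and minute. The function name `count_lost_by_minute` and the regular expression are illustrative assumptions based only on the line format posted above; they are not part of Nagios or ndo2db, and the sample lines are copied from this thread.

```python
import re
from collections import Counter

# Matches syslog lines shaped like the ones posted in this thread, e.g.
#   Jul 12 13:59:40 nagios ndo2db: Error: Connection to MySQL database has been lost!
# The pattern is an assumption derived from those lines only.
LOST_RE = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}\s+\S+\s+ndo2db: "
    r"Error: Connection to MySQL database has been lost!"
)


def count_lost_by_minute(lines):
    """Return a Counter mapping 'HH:MM' to the number of lost-connection lines."""
    counts = Counter()
    for line in lines:
        match = LOST_RE.match(line)
        if match:
            counts[f"{match['hour']}:{match['minute']}"] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the excerpt above; in practice the lines would
    # come from the messages log file itself.
    sample = [
        "Jul 9 05:30:04 nagios ndo2db: Error: Connection to MySQL database has been lost!",
        "Jul 10 05:30:03 nagios ndo2db: Error: Connection to MySQL database has been lost!",
        "Jul 12 13:59:40 nagios ndo2db: Error: Connection to MySQL database has been lost!",
    ]
    for hhmm, hits in count_lost_by_minute(sample).most_common():
        print(hhmm, hits)
```

A count concentrated in a single minute would suggest a scheduled job rather than random failures, which is why the question about a daily restart or repair script matters.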
Re: High Cpu load
Hi guys,
The problem occurred once more yesterday morning, starting at 2:34 AM.
I noticed in the messages log that the issue began with the following warning: "2017-08-06 02:34:02 Warning: A system time change of 9243 seconds (0d 2h 34m 3s forwards in time) has been detected. Compensating... " Please have a look at my profile file.
Thanks
Re: High Cpu load
That could be the reason for the load. If the system time changes, Nagios would reschedule the checks, and that would cause more load.
Take a look at the settings for the ntp daemon and see if you can find out why the time changes.
I also see a lot of scripts that run FTP transfers to various servers out of cron.
With the time change, those would be reset as well, causing them to rerun, which would increase the load further.
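As a rough aid for lining the warning up with the ntp and cron logs, the Python sketch below redoes the arithmetic from the warning quoted earlier in the thread: it splits the reported 9243 seconds into days, hours, minutes and seconds, and works out what the clock would have read just before the jump. The helper names `split_seconds` and `clock_before_jump` are made up for this illustration, and the only inputs are the two values taken from that warning.

```python
from datetime import datetime, timedelta


def split_seconds(total):
    """Split a number of seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


def clock_before_jump(logged_at, jump_seconds):
    """Return the clock value implied just before a forward jump of jump_seconds."""
    return logged_at - timedelta(seconds=jump_seconds)


if __name__ == "__main__":
    # Values taken from the warning posted in this thread.
    logged_at = datetime(2017, 8, 6, 2, 34, 2)
    jump = 9243

    print(split_seconds(jump))                 # (0, 2, 34, 3)
    print(clock_before_jump(logged_at, jump))  # 2017-08-05 23:59:59
```

The implied pre-jump time lands one second before midnight, so it may be worth checking what runs around midnight on that server. That is only a lead to compare against the ntp and cron logs, not a diagnosis.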