Nagios XI 5.7 - High CPU load caused by MySQL
Posted: Mon Jul 20, 2020 9:07 am
by alex9000
Hello,
since the update to Nagios XI 5.7 (first 5.7.1, now 5.7.2), we have been seeing high CPU load from the MySQL server. As soon as we change something in Nagios (e.g. change hosts and apply configuration, schedule a downtime, change services), the MySQL server uses 100% CPU on one core and Nagios stops displaying status updates. After a few hours the CPU load returns to normal.
It looks as if an SQL query or update is taking a very long time.
At the moment we cannot rely on the Nagios instance and are considering downgrading, since 5.7.2 did not solve the issue either.
Nagios XI OS: CentOS 7
CPU: 4 Cores
RAM: 16G
What can we do to solve the high CPU load with Nagios XI 5.7.2?
Thank you.
BR,
Alex
Re: Nagios XI 5.7 - High CPU load caused by MySQL
Posted: Tue Jul 21, 2020 10:04 am
by benjaminsmith
Hi Alex,
There could be some underlying database issues here, so I'd like to check a few queries and get a system profile. Please run the following queries and post their output to the thread:
Code: Select all
mysql -uroot -pnagiosxi -e "show variables like 'max_connections';"
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
Thanks, Benjamin
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.
Re: Nagios XI 5.7 - High CPU load caused by MySQL
Posted: Wed Jul 22, 2020 3:50 am
by alex9000
Hi,
here is the output of the queries:
Code: Select all
-bash-4.2$ mysql -uroot -pxxx -e "show variables like 'max_connections';"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 151   |
+-----------------+-------+
-bash-4.2$ mysql -uroot -pxxx -e "show global status like 'Max_used_connections';"
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Max_used_connections | 68    |
+----------------------+-------+
-bash-4.2$ echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pxxx --table
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.02 |
| nagios_commands | 0.02 |
| nagios_commenthistory | 421.45 |
| nagios_comments | 0.34 |
| nagios_configfiles | 0.01 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.39 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.01 |
| nagios_contactgroup_members | 0.01 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 37.87 |
| nagios_contactnotifications | 40.07 |
| nagios_contacts | 0.01 |
| nagios_contactstatus | 0.00 |
| nagios_customvariables | 0.16 |
| nagios_customvariablestatus | 0.17 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 54.26 |
| nagios_eventhandlers | 0.01 |
| nagios_externalcommands | 0.28 |
| nagios_flappinghistory | 2.67 |
| nagios_host_contactgroups | 0.01 |
| nagios_host_contacts | 0.01 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 0.13 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.02 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.06 |
| nagios_hoststatus | 0.13 |
| nagios_instances | 0.00 |
| nagios_logentries | 54.15 |
| nagios_notifications | 11.12 |
| nagios_objects | 0.79 |
| nagios_processevents | 0.27 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.16 |
| nagios_service_contactgroups | 0.07 |
| nagios_service_contacts | 0.04 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 0.84 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.00 |
| nagios_servicegroups | 0.00 |
| nagios_services | 0.39 |
| nagios_servicestatus | 0.97 |
| nagios_statehistory | 486.02 |
| nagios_systemcommands | 0.04 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.02 |
| nagios_timeperiods | 0.00 |
| tbl_command | 0.04 |
| tbl_contact | 0.01 |
| tbl_contactgroup | 0.01 |
| tbl_contacttemplate | 0.01 |
| tbl_domain | 0.01 |
| tbl_host | 0.06 |
| tbl_hostdependency | 0.00 |
| tbl_hostescalation | 0.00 |
| tbl_hostextinfo | 0.00 |
| tbl_hostgroup | 0.01 |
| tbl_hosttemplate | 0.01 |
| tbl_info | 0.13 |
| tbl_lnkContactToCommandHost | 0.00 |
| tbl_lnkContactToCommandService | 0.00 |
| tbl_lnkContactToContactgroup | 0.00 |
| tbl_lnkContactToContacttemplate | 0.00 |
| tbl_lnkContactToVariabledefinition | 0.00 |
| tbl_lnkContactgroupToContact | 0.00 |
| tbl_lnkContactgroupToContactgroup | 0.00 |
| tbl_lnkContacttemplateToCommandHost | 0.00 |
| tbl_lnkContacttemplateToCommandService | 0.00 |
| tbl_lnkContacttemplateToContactgroup | 0.00 |
| tbl_lnkContacttemplateToContacttemplate | 0.00 |
| tbl_lnkContacttemplateToVariabledefinition | 0.00 |
| tbl_lnkHostToContact | 0.00 |
| tbl_lnkHostToContactgroup | 0.01 |
| tbl_lnkHostToHost | 0.00 |
| tbl_lnkHostToHostgroup | 0.00 |
| tbl_lnkHostToHosttemplate | 0.01 |
| tbl_lnkHostToVariabledefinition | 0.01 |
| tbl_lnkHostdependencyToHost_DH | 0.00 |
| tbl_lnkHostdependencyToHost_H | 0.00 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.00 |
| tbl_lnkHostdependencyToHostgroup_H | 0.00 |
| tbl_lnkHostescalationToContact | 0.00 |
| tbl_lnkHostescalationToContactgroup | 0.00 |
| tbl_lnkHostescalationToHost | 0.00 |
| tbl_lnkHostescalationToHostgroup | 0.00 |
| tbl_lnkHostgroupToHost | 0.01 |
| tbl_lnkHostgroupToHostgroup | 0.00 |
| tbl_lnkHosttemplateToContact | 0.00 |
| tbl_lnkHosttemplateToContactgroup | 0.00 |
| tbl_lnkHosttemplateToHost | 0.00 |
| tbl_lnkHosttemplateToHostgroup | 0.00 |
| tbl_lnkHosttemplateToHosttemplate | 0.00 |
| tbl_lnkHosttemplateToVariabledefinition | 0.00 |
| tbl_lnkServiceToContact | 0.02 |
| tbl_lnkServiceToContactgroup | 0.05 |
| tbl_lnkServiceToHost | 0.08 |
| tbl_lnkServiceToHostgroup | 0.00 |
| tbl_lnkServiceToServicegroup | 0.00 |
| tbl_lnkServiceToServicetemplate | 0.08 |
| tbl_lnkServiceToVariabledefinition | 0.06 |
| tbl_lnkServicedependencyToHost_DH | 0.00 |
| tbl_lnkServicedependencyToHost_H | 0.00 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.00 |
| tbl_lnkServicedependencyToHostgroup_H | 0.00 |
| tbl_lnkServicedependencyToService_DS | 0.00 |
| tbl_lnkServicedependencyToService_S | 0.00 |
| tbl_lnkServicedependencyToServicegroup_DS | 0.02 |
| tbl_lnkServicedependencyToServicegroup_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.00 |
| tbl_lnkServiceescalationToContactgroup | 0.00 |
| tbl_lnkServiceescalationToHost | 0.00 |
| tbl_lnkServiceescalationToHostgroup | 0.00 |
| tbl_lnkServiceescalationToService | 0.00 |
| tbl_lnkServiceescalationToServicegroup | 0.02 |
| tbl_lnkServicegroupToService | 0.00 |
| tbl_lnkServicegroupToServicegroup | 0.00 |
| tbl_lnkServicetemplateToContact | 0.00 |
| tbl_lnkServicetemplateToContactgroup | 0.00 |
| tbl_lnkServicetemplateToHost | 0.00 |
| tbl_lnkServicetemplateToHostgroup | 0.00 |
| tbl_lnkServicetemplateToServicegroup | 0.00 |
| tbl_lnkServicetemplateToServicetemplate | 0.01 |
| tbl_lnkServicetemplateToVariabledefinition | 0.00 |
| tbl_lnkTimeperiodToTimeperiod | 0.00 |
| tbl_logbook | 0.00 |
| tbl_mainmenu | 0.00 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 0.57 |
| tbl_servicedependency | 0.00 |
| tbl_serviceescalation | 0.00 |
| tbl_serviceextinfo | 0.00 |
| tbl_servicegroup | 0.00 |
| tbl_servicetemplate | 0.02 |
| tbl_session | 0.00 |
| tbl_session_locks | 0.00 |
| tbl_settings | 0.00 |
| tbl_submenu | 0.00 |
| tbl_timedefinition | 0.01 |
| tbl_timeperiod | 0.01 |
| tbl_user | 0.01 |
| tbl_variabledefinition | 0.14 |
| xi_auditlog | 0.08 |
| xi_auth_tokens | 0.03 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.02 |
| xi_eventqueue | 0.03 |
| xi_events | 0.05 |
| xi_meta | 0.23 |
| xi_mibs | 0.05 |
| xi_options | 0.03 |
| xi_sessions | 0.03 |
| xi_sysstat | 0.03 |
| xi_usermeta | 0.05 |
| xi_users | 0.03 |
+--------------------------------------------+------------+
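Most of the tables in the listing above are tiny, so the few large ones are easy to miss. As a minimal sketch, the `--table` output can be piped through `awk` to keep only rows above a size threshold, largest first. The function name `big_tables` and the 10 MB default are assumptions chosen for illustration and are not part of Nagios XI.

```shell
# Hypothetical helper: keep only rows of the mysql --table output whose
# "Size in MB" column exceeds a threshold (default 10), largest first.
big_tables() {
    threshold="${1:-10}"
    awk -F'|' -v min="$threshold" '
        # Data rows look like "| name | size |"; border lines contain no
        # pipe and the header row has a non-numeric size, so both drop out.
        NF >= 4 && $3 + 0 > min {
            gsub(/ /, "", $2)
            printf "%s %.2f\n", $2, $3
        }
    ' | sort -k2,2nr
}
```

Usage: append `| big_tables` to the size query, or `| big_tables 100` to list only tables larger than 100 MB.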
Regarding the system profile: it contains a lot of confidential and personal data, right?
Br,
Alex
Re: Nagios XI 5.7 - High CPU load caused by MySQL
Posted: Wed Jul 22, 2020 11:30 am
by benjaminsmith
Hi Alex,
We recommend sending the profile in a private message or opening a support ticket (those are only viewable by your team and Nagios support).
Benjamin
Re: Nagios XI 5.7 - High CPU load caused by MySQL
Posted: Tue Jul 28, 2020 8:58 am
by alex9000
Hi,
we are still facing the MySQL issue. I queried the process list, and the load seems to come from this UPDATE query, which shows up again and again:
Code: Select all
select * from INFORMATION_SCHEMA.PROCESSLIST where db = 'nagios';
| 12 | ndoutils | localhost | nagios | Execute | 3 | Updating | UPDATE nagios_commenthistory SET deletion_time = FROM_UNIXTIME(?), deletion_time_usec = ? WHERE comment_time = FROM_UNIXTIME(?) AND internal_comment_id = ? | 3892.520 | 0 | 0 | 0.000 |
Does this help you?
Br,
Alex
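The `UPDATE` shown above filters on `comment_time` and `internal_comment_id`. One possible explanation for the repeated multi-second updates, and only an assumption at this point, is that no index covers those two columns, so each statement has to scan the roughly 421 MB `nagios_commenthistory` table. A minimal sketch for checking this follows; the function name `commenthistory_index_check_sql` is hypothetical, and the literal values in the `WHERE` clause are placeholders.

```shell
# Hypothetical helper: print SQL that lists the indexes on
# nagios_commenthistory and shows the query plan for a lookup on the same
# two columns the slow UPDATE filters on. The 0 values are placeholders.
commenthistory_index_check_sql() {
    cat <<'SQL'
SHOW INDEX FROM nagios_commenthistory;
EXPLAIN SELECT * FROM nagios_commenthistory
 WHERE comment_time = FROM_UNIXTIME(0)
   AND internal_comment_id = 0;
SQL
}
```

Usage: `commenthistory_index_check_sql | mysql -uroot -p nagios --table`. If the `key` column of the `EXPLAIN` output is `NULL`, the lookup is not using an index.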
Re: Nagios XI 5.7 - High CPU load caused by MySQL
Posted: Tue Jul 28, 2020 9:03 am
by alex9000
I also found some locks related to this:
Code: Select all
| 10 | ndoutils | localhost | nagios | Execute | 1 | Waiting for table level lock | INSERT INTO nagios_commenthistory (instance_id, comment_type, entry_type, object_id, comment_time, internal_comment_id, author_name, comment_data, is_persistent, comment_source, expires, expiration_time, entry_time, entry_time_usec) VALUES (1,?,?,?,FROM_UNIXTIME(?),?,?,?,?,?,?,FROM_UNIXTIME(?),FROM_UNIXTIME(?),?) ON DUPLICATE KEY UPDATE instance_id = VALUES(instance_id), comment_type = VALUES(comment_type), entry_type = VALUES(entry_type), object_id = VALUES(object_id), comment_time = VALUES(comment_time), internal_comment_id = VALUES(internal_comment_id), author_name = VALUES(author_name), comment_data = VALUES(comment_data), is_persistent = VALUES(is_persistent), comment_source = VALUES(comment_source), expires = VALUES(expires), expiration_time = VALUES(expiration_time), entry_time = VALUES(entry_time), entry_time_usec = VALUES(entry_time_usec) | 1079.105 | 0 | 0 | 0.000 |
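The state `Waiting for table level lock` is typical of storage engines that lock whole tables, such as MyISAM, where one long-running write holds up every other write to the same table. As a minimal sketch for spotting such rows in a longer process list, the helper below reads `--table` output and prints the ID, time, and state of each waiting statement. The name `lock_waiters` is hypothetical, and the column order (ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO) is assumed from the rows shown in this thread.

```shell
# Hypothetical helper: read PROCESSLIST rows in mysql --table format and
# print "id time state" for statements that are waiting on a lock.
# Assumed column order: ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO, ...
lock_waiters() {
    awk -F' *[|] *' '
        # Field 1 is empty (text before the first pipe), so ID is field 2,
        # TIME is field 7, and STATE is field 8.
        $8 ~ /^Waiting for .*lock/ { printf "%s %s %s\n", $2, $7, $8 }
    '
}
```

Usage: `mysql -uroot -p --table -e "select * from INFORMATION_SCHEMA.PROCESSLIST where db = 'nagios';" | lock_waiters`.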
Re: Nagios XI 5.7 - High CPU load caused by MySQL
Posted: Tue Jul 28, 2020 4:38 pm
by benjaminsmith
Hi Alex,
Since the issue still persists and sharing data over the forum is an issue here, let's get a support ticket opened for you at:
https://support.nagios.com/tickets/
Please attach a fresh system profile to the ticket, so we can check the status of the current logs.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.
In the meantime, log in as root, run the following repair script, and then restart the services:
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
Restart the services:
Code: Select all
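# Stop cron first so no scheduled jobs start while the services are down
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
# Kill any processes still running as the nagios user
pkill -9 -u nagios
# Remove leftover message queues that belong to nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
# Remove stale lock files
rm -f /usr/local/nagiosxi/var/dbmaint.lock
rm -f /usr/local/nagiosxi/var/event_handler.lock
rm -f /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
# Restart the database, then bring the services back up in reverse order
systemctl restart mariadb
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond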
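After the restart sequence above, it can be worth confirming that every service actually came back. A minimal sketch, assuming the same five unit names used in the commands above; the function name `check_xi_services` is hypothetical and not part of Nagios XI.

```shell
# Hypothetical helper: report the state of each service restarted above.
# Returns non-zero if any of them is not active.
check_xi_services() {
    rc=0
    for svc in mariadb ndo2db nagios npcd crond; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: run `check_xi_services` as root after the last `systemctl start`; any line ending in `NOT active` points at the unit to inspect with `systemctl status`.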