Nagios XI 5.8.5 high CPU usage
Posted: Wed Aug 11, 2021 3:42 am
by ignacio.sanchez
Hello.
We have four Nagios XI instances, all already updated to 5.8.5, but only one of them shows high CPU usage, and I can't find the exact reason.
The server has 4 CPUs and 16GB RAM.
Already tried:
Code: Select all
check_result_reaper_frequency=3
max_check_result_reaper_time=10
(if I set it to the default "0", the load average reaches 311.95, 232.62, 145.09)
CPU usage is currently as shown below (but the web interface is very slow):
Code: Select all
top - 08:28:32 up 1 day, 34 min, 1 user, load average: 29.86, 25.89, 25.12
Tasks: 212 total, 28 running, 184 sleeping, 0 stopped, 0 zombie
%Cpu(s): 77.8 us, 19.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
KiB Mem : 16431432 total, 8797700 free, 2255444 used, 5378288 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 13016932 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21233 mysql 20 0 2630900 707264 9848 S 20.8 4.3 227:45.00 mysqld
401 nagios 20 0 251172 61528 2264 R 18.5 0.4 0:03.33 curl
952 nagios 20 0 243004 52804 2264 R 15.3 0.3 0:02.59 curl
1953 nagios 20 0 243004 38900 2264 R 15.3 0.2 0:01.35 curl
5145 apache 20 0 643764 33060 6108 R 15.3 0.2 0:04.50 httpd
2770 apache 20 0 119916 8480 2352 R 14.9 0.1 0:00.79 status.cgi
30747 apache 20 0 643892 32512 5656 R 14.6 0.2 0:08.59 httpd
31795 nagios 20 0 241748 95892 1352 R 14.3 0.6 0:04.14 send_nrdp.sh
3242 nagios 20 0 153116 11312 2236 R 12.7 0.1 0:00.39 check_ifopersta
31200 nagios 20 0 178320 55424 4392 R 11.7 0.3 0:14.84 mrtg
17779 apache 20 0 643776 33440 6204 S 11.0 0.2 0:07.58 httpd
3241 nagios 20 0 148164 10444 2196 R 10.4 0.1 0:00.32 check_ifopersta
3307 nagios 20 0 148032 10180 2192 R 10.1 0.1 0:00.31 check_ifopersta
7696 apache 20 0 643828 33368 6112 R 10.1 0.2 0:12.37 httpd
24838 apache 20 0 638756 28468 6244 S 10.1 0.2 0:28.56 httpd
3208 nagios 20 0 147636 9912 2192 R 8.4 0.1 0:00.26 check_ifopersta
3214 nagios 20 0 146448 8864 2188 R 8.1 0.1 0:00.25 check_ifopersta
3239 nagios 20 0 269904 6668 4824 S 8.1 0.0 0:00.25 curl
3277 nagios 20 0 205792 11660 4104 S 8.1 0.1 0:00.25 python
3339 nagios 20 0 146448 8868 2188 R 8.1 0.1 0:00.25 check_ifopersta
534 root 20 0 64000 27652 27276 S 7.5 0.2 38:35.42 systemd-journal
21348 apache 20 0 642904 32364 7200 R 7.5 0.2 0:24.72 httpd
3333 nagios 20 0 141372 7652 2140 R 6.5 0.0 0:00.20 check_ifopersta
3374 nagios 20 0 182792 8748 3488 R 4.9 0.1 0:00.15 python
2257 nagios 20 0 113700 2092 1388 S 4.2 0.0 0:00.99 send_nrdp.sh
15687 nagios 20 0 1047404 33956 3608 R 4.2 0.2 63:49.93 nagios
3406 nagios 20 0 138204 4500 2112 R 2.9 0.0 0:00.09 check_ifopersta
3413 nagios 20 0 138600 5024 2128 R 2.9 0.0 0:00.09 check_ifopersta
1085 root 20 0 592540 21180 18980 S 2.3 0.1 10:28.06 rsyslogd
9 root 20 0 0 0 0 R 1.3 0.0 31:53.37 rcu_sched
3015 nagios 20 0 115536 1700 1368 S 1.3 0.0 0:00.04 check_rrdtraf
3435 nagios 20 0 135216 3576 2072 R 1.3 0.0 0:00.04 check_ifopersta
31147 nagios 20 0 445680 26204 10544 S 1.3 0.2 0:00.59 php
3019 nagios 20 0 115536 1708 1368 S 1.0 0.0 0:00.03 check_rrdtraf
3445 nagios 20 0 18868 3172 1584 R 1.0 0.0 0:00.03 python
1753 root 20 0 162236 2436 1592 R 0.6 0.0 0:00.11 top
1 root 20 0 191308 4240 2620 S 0.3 0.0 3:29.58 systemd
6 root 20 0 0 0 0 S 0.3 0.0 1:34.27 ksoftirqd/0
47 root 39 19 0 0 0 R 0.3 0.0 0:37.35 khugepaged
712 dbus 20 0 58392 2668 1828 S 0.3 0.0 3:41.77 dbus-daemon
717 root 20 0 26492 1848 1456 S 0.3 0.0 1:33.76 systemd-logind
3243 nagios 20 0 115536 1700 1368 R 0.3 0.0 0:00.01 check_rrdtraf
3460 nagios 20 0 113284 1332 1148 R 0.3 0.0 0:00.01 sh
6524 root 20 0 159592 6184 4796 S 0.3 0.0 0:00.45 sshd
15691 nagios 20 0 10844 1116 820 S 0.3 0.0 2:16.41 nagios
15693 nagios 20 0 10844 1116 820 S 0.3 0.0 2:17.48 nagios
31186 nagios 20 0 445680 26072 10444 S 0.3 0.2 0:00.50 php
2 root 20 0 0 0 0 S 0.0 0.0 0:00.37 kthreadd
4 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root rt 0 0 0 0 S 0.0 0.0 0:23.52 migration/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
10 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 lru-add-drain
11 root rt 0 0 0 0 S 0.0 0.0 0:16.20 watchdog/0
12 root rt 0 0 0 0 S 0.0 0.0 0:16.16 watchdog/1
13 root rt 0 0 0 0 S 0.0 0.0 0:24.32 migration/1
14 root 20 0 0 0 0 S 0.0 0.0 1:06.53 ksoftirqd/1
16 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
17 root rt 0 0 0 0 S 0.0 0.0 0:16.63 watchdog/2
18 root rt 0 0 0 0 S 0.0 0.0 0:25.49 migration/2
19 root 20 0 0 0 0 S 0.0 0.0 1:04.31 ksoftirqd/2
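Editor's note: with a listing this long, it can help to tally the running processes by command name to see which checks dominate. The following is only a minimal sketch, assuming a procps-style ps that accepts -eo comm=; the helper name tally_commands is hypothetical and not part of Nagios XI.

```shell
# tally_commands (hypothetical helper): read one command name per line
# on stdin and print "count name" pairs, most frequent first.
tally_commands() {
    sort | uniq -c | sort -k1,1nr -k2,2 | awk '{print $1, $2}'
}

# Usage on a live system (assumes procps ps):
#   ps -eo comm= | tally_commands | head
```

Run against a busy poller, the first few lines should show whether plugins such as check_ifoperstatus or helpers such as curl account for most of the runnable processes.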
The instance monitors 214 hosts and 3450 services (and this will increase soon).
I also changed the following options in the php.ini file:
Code: Select all
max_execution_time = 120
max_input_vars = 50000
memory_limit = 1024M
Obviously, you'll need more information to troubleshoot the issue, so don't hesitate to ask.
Thanks in advance!
Re: Nagios XI 5.8.5 high CPU usage
Posted: Wed Aug 11, 2021 9:45 am
by dchurch
1. Please try running the database repair script, and let me know if that is successful. Run the following as root from the terminal.
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
See Repairing The Nagios XI Database for complete instructions.
2. Sometimes decreasing the amount of time the audit logs stay in the database can help.
The Max Audit Log Age setting is controlled through the admin screens. You can get to it through Admin => System Config => Performance Settings, then click on the Database tab. The default in Nagios XI 5.7.3 and later is 180 days.
I usually recommend the following settings for better performance on larger (1000+ hosts/services) Nagios XI installs:
- Max Log Entries Age: change to 10
- Max Audit Log Age: change to 10
- Max State History Age: change to 30
See this document:
Nagios XI Database Optimization
3. If that still doesn't work, try truncating some of the poorly-optimized "paper-trail" tables:
If MySQL:
Code: Select all
mysql -uroot -pnagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
If PostgreSQL:
Code: Select all
psql -U nagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
4. If you're still having issues with mysqld taking up 100% of the CPU, what is the output from the following command?
Code: Select all
mysql -uroot -pnagiosxi --table <<< "select * from (select table_name, round(((data_length + index_length) / 1024 / 1024), 2) as sz from information_schema.tables where table_schema like 'nagios%') as x order by x.sz;"
Re: Nagios XI 5.8.5 high CPU usage
Posted: Thu Aug 12, 2021 4:28 am
by ignacio.sanchez
Hello dchurch.
dchurch wrote:1. Please try running the database repair script, and let me know if that is successful. Run the following as root from the terminal.
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
See Repairing The Nagios XI Database for complete instructions.
Done, successfully.
dchurch wrote:2. Sometimes decreasing the amount of time the audit logs stay in the database can help.
The Max Audit Log Age setting is controlled through the admin screens. You can get to it through Admin => System Config => Performance Settings, then click on the Database tab. The default in Nagios XI 5.7.3 and later is 180 days.
I usually recommend the following settings for better performance on larger (1000+ hosts/services) Nagios XI installs:
- Max Log Entries Age: change to 10
- Max Audit Log Age: change to 10
- Max State History Age: change to 30
See this document:
Nagios XI Database Optimization
Changed.
dchurch wrote:3. If that still doesn't work, try truncating some of the poorly-optimized "paper-trail" tables:
If MySQL:
Code: Select all
mysql -uroot -pnagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
If PostgreSQL:
Code: Select all
psql -U nagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
Tables truncated.
dchurch wrote:4. If you're still having issues with mysqld taking up 100% of the CPU, what is the output from the following command?
Code: Select all
mysql -uroot -pnagiosxi --table <<< "select * from (select table_name, round(((data_length + index_length) / 1024 / 1024), 2) as sz from information_schema.tables where table_schema like 'nagios%') as x order by x.sz;"
Code: Select all
+--------------------------------------------+---------+
| table_name | sz |
+--------------------------------------------+---------+
| nagios_hostgroups | 0.00 |
| nagios_host_parenthosts | 0.00 |
| nagios_servicegroups | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_servicegroup_members | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_contactstatus | 0.00 |
| nagios_externalcommands | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_timeperiods | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_downtimehistory | 0.00 |
| nagios_servicedependencies | 0.00 |
| nagios_contactgroups | 0.00 |
| nagios_programstatus | 0.00 |
| nagios_dbversion | 0.00 |
| nagios_contactgroup_members | 0.00 |
| nagios_service_parentservices | 0.00 |
| nagios_contact_notificationcommands | 0.00 |
| nagios_contact_addresses | 0.00 |
| nagios_configfilevariables | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_timedeventqueue | 0.00 |
| nagios_configfiles | 0.00 |
| nagios_instances | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_comments | 0.00 |
| nagios_hostdependencies | 0.00 |
| nagios_hostgroup_members | 0.01 |
| nagios_acknowledgements | 0.01 |
| nagios_host_contacts | 0.01 |
| nagios_host_contactgroups | 0.01 |
| nagios_contacts | 0.01 |
| nagios_timeperiod_timeranges | 0.01 |
| nagios_commands | 0.02 |
| tbl_lnkServiceToHostgroup | 0.02 |
| tbl_lnkHostToContactgroup | 0.02 |
| tbl_lnkServicetemplateToServicetemplate | 0.02 |
| tbl_lnkContactToContacttemplate | 0.02 |
| xi_cmp_ccm_backups | 0.02 |
| tbl_lnkServiceescalationToServicegroup | 0.02 |
| tbl_lnkServicetemplateToServicegroup | 0.02 |
| tbl_lnkContactToContactgroup | 0.02 |
| tbl_lnkServiceescalationToService | 0.02 |
| tbl_lnkServiceToContactgroup | 0.02 |
| tbl_lnkServicetemplateToHostgroup | 0.02 |
| tbl_lnkContactToCommandService | 0.02 |
| tbl_lnkServiceescalationToHostgroup | 0.02 |
| tbl_lnkServiceToContact | 0.02 |
| tbl_lnkServicetemplateToHost | 0.02 |
| tbl_lnkContactToCommandHost | 0.02 |
| tbl_lnkHostgroupToHost | 0.02 |
| tbl_lnkServiceescalationToHost | 0.02 |
| tbl_lnkHostescalationToHostgroup | 0.02 |
| tbl_lnkServiceescalationToContactgroup | 0.02 |
| tbl_lnkHosttemplateToVariabledefinition | 0.02 |
| tbl_lnkServicetemplateToContactgroup | 0.02 |
| tbl_lnkHostescalationToHost | 0.02 |
| tbl_lnkServiceescalationToContact | 0.02 |
| xi_meta | 0.02 |
| tbl_lnkHosttemplateToHosttemplate | 0.02 |
| tbl_lnkServicetemplateToContact | 0.02 |
| tbl_timedefinition | 0.02 |
| tbl_lnkServicedependencyToServicegroup_S | 0.02 |
| nagios_eventhandlers | 0.02 |
| tbl_lnkHosttemplateToHostgroup | 0.02 |
| tbl_lnkServicegroupToServicegroup | 0.02 |
| tbl_submenu | 0.02 |
| tbl_lnkHostescalationToContactgroup | 0.02 |
| tbl_lnkHosttemplateToHost | 0.02 |
| tbl_lnkServicegroupToService | 0.02 |
| tbl_lnkHostescalationToContact | 0.02 |
| tbl_lnkServicedependencyToServicegroup_DS | 0.02 |
| tbl_lnkHostToContact | 0.02 |
| tbl_lnkHosttemplateToContactgroup | 0.02 |
| tbl_session_locks | 0.02 |
| tbl_lnkHostdependencyToHostgroup_H | 0.02 |
| tbl_lnkServicedependencyToService_S | 0.02 |
| tbl_lnkContacttemplateToVariabledefinition | 0.02 |
| xi_deploy_jobs | 0.02 |
| tbl_lnkHosttemplateToContact | 0.02 |
| tbl_session | 0.02 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.02 |
| tbl_lnkServicedependencyToService_DS | 0.02 |
| tbl_lnkContacttemplateToContacttemplate | 0.02 |
| xi_deploy_agents | 0.02 |
| tbl_lnkHostdependencyToHost_H | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_lnkServicedependencyToHostgroup_H | 0.02 |
| tbl_lnkContacttemplateToContactgroup | 0.02 |
| xi_commands | 0.02 |
| tbl_lnkHostgroupToHostgroup | 0.02 |
| tbl_lnkHostdependencyToHost_DH | 0.02 |
| tbl_permission | 0.02 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.02 |
| tbl_lnkContacttemplateToCommandService | 0.02 |
| tbl_lnkHostToVariabledefinition | 0.02 |
| tbl_mainmenu | 0.02 |
| tbl_lnkServicedependencyToHost_H | 0.02 |
| tbl_lnkContacttemplateToCommandHost | 0.02 |
| tbl_lnkServicedependencyToHost_DH | 0.02 |
| tbl_lnkContactgroupToContactgroup | 0.02 |
| xi_cmp_scheduledreports_log | 0.02 |
| tbl_lnkHostToHosttemplate | 0.02 |
| tbl_logbook | 0.02 |
| tbl_lnkHostToHostgroup | 0.02 |
| tbl_lnkTimeperiodToTimeperiod | 0.02 |
| tbl_lnkContactgroupToContact | 0.02 |
| tbl_lnkServiceToServicegroup | 0.02 |
| tbl_lnkHostToHost | 0.02 |
| tbl_lnkServicetemplateToVariabledefinition | 0.02 |
| tbl_lnkContactToVariabledefinition | 0.02 |
| tbl_domain | 0.03 |
| tbl_contacttemplate | 0.03 |
| tbl_contactgroup | 0.03 |
| tbl_user | 0.03 |
| tbl_contact | 0.03 |
| tbl_timeperiod | 0.03 |
| nagios_service_contactgroups | 0.03 |
| xi_eventqueue | 0.03 |
| tbl_settings | 0.03 |
| xi_users | 0.03 |
| xi_sysstat | 0.03 |
| tbl_hosttemplate | 0.03 |
| xi_sessions | 0.03 |
| tbl_servicegroup | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| tbl_hostgroup | 0.03 |
| xi_cmp_trapdata | 0.03 |
| tbl_hostextinfo | 0.03 |
| tbl_serviceextinfo | 0.03 |
| tbl_hostescalation | 0.03 |
| tbl_serviceescalation | 0.03 |
| nagios_systemcommands | 0.03 |
| tbl_hostdependency | 0.03 |
| tbl_servicedependency | 0.03 |
| xi_cmp_favorites | 0.03 |
| xi_events | 0.05 |
| xi_mibs | 0.05 |
| tbl_command | 0.06 |
| nagios_customvariablestatus | 0.06 |
| tbl_servicetemplate | 0.06 |
| nagios_customvariables | 0.06 |
| nagios_service_contacts | 0.06 |
| xi_options | 0.06 |
| nagios_hosts | 0.06 |
| tbl_lnkServiceToVariabledefinition | 0.09 |
| tbl_host | 0.09 |
| tbl_lnkServiceToHost | 0.11 |
| tbl_lnkServiceToServicetemplate | 0.11 |
| nagios_hostchecks | 0.12 |
| tbl_variabledefinition | 0.14 |
| nagios_flappinghistory | 0.14 |
| nagios_hoststatus | 0.14 |
| tbl_info | 0.17 |
| xi_usermeta | 0.25 |
| nagios_processevents | 0.33 |
| nagios_commenthistory | 0.36 |
| xi_cmp_nagiosbpi_backups | 0.48 |
| tbl_service | 0.50 |
| nagios_services | 0.75 |
| nagios_servicechecks | 1.05 |
| nagios_objects | 1.43 |
| nagios_servicestatus | 1.87 |
| xi_auth_tokens | 2.53 |
| xi_auditlog | 6.17 |
| nagios_contactnotificationmethods | 17.86 |
| nagios_contactnotifications | 18.89 |
| nagios_notifications | 32.03 |
| nagios_statehistory | 37.43 |
| nagios_logentries | 2513.74 |
+--------------------------------------------+---------+
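Editor's note: in a listing this long the outliers are easy to miss. The following is a minimal sketch that filters "mysql --table" output down to the tables above a size threshold; the helper name big_tables and the 100 MB threshold are assumptions for illustration only.

```shell
# big_tables MIN_MB (hypothetical helper): read "mysql --table" output
# on stdin and print "name size" for rows whose size column (in MB)
# is at least MIN_MB. Border and header rows are skipped.
big_tables() {
    awk -F'|' -v min="$1" '
        NF >= 3 {
            name = $2; size = $3
            gsub(/ /, "", name); gsub(/ /, "", size)
            if (size ~ /^[0-9.]+$/ && size + 0 >= min + 0) print name, size
        }'
}

# Usage: pipe the output of the size query above into the filter, e.g.
#   mysql ... --table <<< "select ..." | big_tables 100
```

Applied to the output above with a threshold of 100, only nagios_logentries (2513.74 MB) would be printed.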
However, I reverted max_concurrent_checks to 0 (it worked that way before), and here is a listing of all processes:
Code: Select all
top - 09:25:46 up 2 min, 2 users, load average: 76.43, 25.34, 9.01
Tasks: 257 total, 42 running, 215 sleeping, 0 stopped, 0 zombie
%Cpu(s): 64.8 us, 30.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 4.9 si, 0.0 st
KiB Mem : 16431432 total, 14773732 free, 1346700 used, 311000 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 14801788 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1427 mysql 20 0 2616024 135472 8896 S 9.3 0.8 0:31.48 mysqld
1502 apache 20 0 638624 27660 5656 R 5.7 0.2 0:10.35 httpd
1503 apache 20 0 638368 27408 5660 R 5.4 0.2 0:08.39 httpd
23355 nagios 20 0 203416 11224 4032 R 5.1 0.1 0:00.21 python
23376 nagios 20 0 202756 10700 3996 R 5.1 0.1 0:00.20 python
23377 nagios 20 0 205788 11652 4104 R 5.1 0.1 0:00.21 python
1607 apache 20 0 639000 28096 5808 R 4.8 0.2 0:08.61 httpd
1609 apache 20 0 643748 32528 5656 R 4.8 0.2 0:07.88 httpd
23358 nagios 20 0 203416 11220 4032 R 4.8 0.1 0:00.20 python
1505 apache 20 0 644904 33568 6716 R 4.5 0.2 0:08.82 httpd
15742 nagios 20 0 47496 17732 2344 R 4.5 0.1 0:02.66 mrtg
22948 nagios 20 0 142032 8444 2148 R 4.5 0.1 0:00.21 check_ifopersta
23346 nagios 20 0 140976 7400 2136 R 4.5 0.0 0:00.19 check_ifopersta
23349 nagios 20 0 187332 9220 3604 R 4.5 0.1 0:00.20 python
23350 nagios 20 0 194096 9844 3660 R 4.5 0.1 0:00.18 python
23352 nagios 20 0 147108 9388 2188 R 4.5 0.1 0:00.20 check_ifopersta
23353 nagios 20 0 200436 10352 3832 R 4.5 0.1 0:00.19 python
23356 nagios 20 0 140712 7132 2136 R 4.5 0.0 0:00.19 check_ifopersta
1598 apache 20 0 643772 32512 5652 R 4.2 0.2 0:04.07 httpd
17096 nagios 20 0 183492 62908 1352 R 4.2 0.4 0:02.20 send_nrdp.sh
20586 nagios 20 0 283992 49520 2264 R 4.2 0.3 0:00.95 curl
23348 nagios 20 0 184872 8932 3532 R 4.2 0.1 0:00.17 python
23360 nagios 20 0 200436 10344 3832 R 4.2 0.1 0:00.19 python
23361 nagios 20 0 140712 7128 2136 R 4.2 0.0 0:00.17 check_ifopersta
23373 nagios 20 0 200436 10348 3832 R 4.2 0.1 0:00.17 python
23378 nagios 20 0 187332 9224 3604 R 4.2 0.1 0:00.18 python
21523 nagios 20 0 157712 12132 2536 S 4.0 0.1 0:00.38 check_ifopersta
21760 nagios 20 0 155636 11992 2408 R 4.0 0.1 0:00.36 check_ifopersta
23357 nagios 20 0 189412 9600 3616 R 4.0 0.1 0:00.16 python
23363 nagios 20 0 141240 7656 2140 R 4.0 0.0 0:00.17 check_ifopersta
23364 nagios 20 0 189404 9604 3616 R 4.0 0.1 0:00.16 python
23367 nagios 20 0 194092 10072 3704 R 4.0 0.1 0:00.16 python
23371 nagios 20 0 203020 10692 3996 R 4.0 0.1 0:00.18 python
23375 nagios 20 0 194088 9840 3660 R 4.0 0.1 0:00.17 python
21538 nagios 20 0 155636 11988 2408 S 3.7 0.1 0:00.37 check_ifopersta
21638 nagios 20 0 155636 11996 2408 R 3.7 0.1 0:00.36 check_ifopersta
23359 nagios 20 0 184872 9012 3556 R 3.7 0.1 0:00.15 python
23366 nagios 20 0 184872 9012 3556 R 3.7 0.1 0:00.16 python
23369 nagios 20 0 140580 6864 2136 R 3.7 0.0 0:00.16 check_ifopersta
23354 nagios 20 0 184872 8972 3548 R 3.4 0.1 0:00.15 python
23368 nagios 20 0 182788 8852 3520 R 3.4 0.1 0:00.15 python
15440 nagios 20 0 453360 34728 11016 S 3.1 0.2 0:01.73 php
9 root 20 0 0 0 0 R 2.3 0.0 0:02.03 rcu_sched
21497 nagios 20 0 155636 11992 2408 S 1.7 0.1 0:00.33 check_ifopersta
21531 nagios 20 0 115540 1700 1368 S 1.4 0.0 0:00.05 check_rrdtraf
21532 nagios 20 0 115540 1700 1368 S 1.4 0.0 0:00.06 check_rrdtraf
21561 nagios 20 0 115540 1704 1368 R 1.4 0.0 0:00.09 check_rrdtraf
21507 nagios 20 0 115540 1704 1368 S 1.1 0.0 0:00.05 check_rrdtraf
21510 nagios 20 0 115540 1704 1368 S 1.1 0.0 0:00.08 check_rrdtraf
21536 nagios 20 0 115540 1700 1368 S 0.8 0.0 0:00.03 check_rrdtraf
21546 nagios 20 0 115540 1700 1368 S 0.8 0.0 0:00.05 check_rrdtraf
21555 nagios 20 0 115540 1700 1368 S 0.8 0.0 0:00.04 check_rrdtraf
21565 nagios 20 0 115540 1704 1368 S 0.8 0.0 0:00.07 check_rrdtraf
21689 nagios 20 0 115536 1704 1368 S 0.8 0.0 0:00.05 check_rrdtraf
21739 nagios 20 0 115540 1704 1368 S 0.8 0.0 0:00.03 check_rrdtraf
19 root 20 0 0 0 0 S 0.6 0.0 0:00.12 ksoftirqd/2
1613 root 20 0 162372 2396 1608 R 0.6 0.0 0:00.92 top
21489 nagios 20 0 115540 1700 1368 S 0.6 0.0 0:00.04 check_rrdtraf
21490 nagios 20 0 115540 1704 1368 S 0.6 0.0 0:00.04 check_rrdtraf
21491 nagios 20 0 115540 1704 1368 S 0.6 0.0 0:00.02 check_rrdtraf
And, obviously, the Nagios web interface becomes unusable.
Re: Nagios XI 5.8.5 high CPU usage
Posted: Thu Aug 12, 2021 11:23 am
by dchurch
Please run the following commands and post the output:
Code: Select all
rm -f /usr/local/nagiosxi/var/dbmaint.lock
time php /usr/local/nagiosxi/cron/dbmaint.php
Re: Nagios XI 5.8.5 high CPU usage
Posted: Thu Aug 12, 2021 11:49 am
by ignacio.sanchez
Code: Select all
CREATING: /usr/local/nagiosxi/var/dbmaint.lock
CLEANING ndoutils TABLE 'commenthistory'...
SQL: DELETE FROM nagios_commenthistory WHERE entry_time < FROM_UNIXTIME(1565713788)
CLEANING ndoutils TABLE 'processevents'...
SQL: DELETE FROM nagios_processevents WHERE event_time < FROM_UNIXTIME(1597249788)
CLEANING ndoutils TABLE 'externalcommands'...
SQL: DELETE FROM nagios_externalcommands WHERE entry_time < FROM_UNIXTIME(1628180988)
CLEANING ndoutils TABLE 'logentries'...
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1627921788)
CLEANING ndoutils TABLE 'notifications'...
SQL: DELETE FROM nagios_notifications WHERE start_time < FROM_UNIXTIME(1621009788)
CLEANING ndoutils TABLE 'contactnotifications'...
SQL: DELETE FROM nagios_contactnotifications WHERE start_time < FROM_UNIXTIME(1621009788)
CLEANING ndoutils TABLE 'contactnotificationmethods'...
SQL: DELETE FROM nagios_contactnotificationmethods WHERE start_time < FROM_UNIXTIME(1621009788)
CLEANING ndoutils TABLE 'statehistory'...
SQL: DELETE FROM nagios_statehistory WHERE state_time < FROM_UNIXTIME(1626193788)
CLEANING ndoutils TABLE 'timedevents'...
SQL: DELETE FROM nagios_timedevents WHERE event_time < FROM_UNIXTIME(1628785488)
CLEANING ndoutils TABLE 'systemcommands'...
SQL: DELETE FROM nagios_systemcommands WHERE start_time < FROM_UNIXTIME(1628785488)
CLEANING ndoutils TABLE 'servicechecks'...
SQL: DELETE FROM nagios_servicechecks WHERE start_time < FROM_UNIXTIME(1628785488)
CLEANING ndoutils TABLE 'hostchecks'...
SQL: DELETE FROM nagios_hostchecks WHERE start_time < FROM_UNIXTIME(1628785488)
CLEANING ndoutils TABLE 'eventhandlers'...
SQL: DELETE FROM nagios_eventhandlers WHERE start_time < FROM_UNIXTIME(1628785488)
LASTOPT: 1628784905
INTERVAL: 60
NOW: 1628785788
OPTTIME: 1628788505
CLEANING nagiosxi TABLE 'commands'...
SQL: DELETE FROM xi_commands WHERE processing_time < FROM_UNIXTIME(1628756988) AND status_code = 2
CLEANING nagiosxi TABLE 'events'...
SQL: DELETE FROM xi_events WHERE processing_time < FROM_UNIXTIME(1628756988) AND status_code = 2
CLEANING nagiosxi TABLE 'auth_tokens'...
SQL: DELETE FROM xi_auth_tokens WHERE auth_valid_until < FROM_UNIXTIME(1628699388)
CLEANING nagiosxi TABLE 'cmp_trapdata_log'...
SQL: DELETE FROM xi_cmp_trapdata_log WHERE trapdata_log_datetime < FROM_UNIXTIME(1621009788)
CLEANING nagiosxi TABLE 'cmp_scheduledreports_log'...
SQL: DELETE FROM xi_cmp_scheduledreports_log WHERE report_run < FROM_UNIXTIME(1597249788)
SQL1: SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL
SQL2: Deleted 0 (DELETE FROM xi_meta WHERE meta_id IN (SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL))
CLEANING nagiosxi TABLE 'auditlog'...
SQL: DELETE FROM xi_auditlog WHERE log_time < FROM_UNIXTIME(1627921788)
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1628756988)
Repair Complete: Removing Lock File
real 0m4.361s
user 0m0.436s
sys 0m0.151s
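Editor's note: the FROM_UNIXTIME cutoffs in this output can be checked against the retention settings changed earlier by subtracting each cutoff from the NOW value. A minimal sketch using plain shell arithmetic; the helper name retention_days is hypothetical.

```shell
# retention_days NOW CUTOFF (hypothetical helper): whole days between
# two Unix timestamps.
retention_days() {
    echo $(( ($1 - $2) / 86400 ))
}

now=1628785788                      # "NOW:" value from the output above
retention_days "$now" 1627921788    # nagios_logentries cutoff
retention_days "$now" 1626193788    # nagios_statehistory cutoff
retention_days "$now" 1627921788    # xi_auditlog cutoff
```

This prints 10, 30 and 10, which matches the Max Log Entries Age, Max State History Age and Max Audit Log Age values recommended earlier in the thread, so the new settings do appear to have been picked up by dbmaint.php.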
But performance is still slow.

max_concurrent_checks=0 (web interface almost unusable):
Code: Select all
top - 16:42:36 up 7:19, 2 users, load average: 93.72, 38.78, 24.57
Tasks: 400 total, 128 running, 272 sleeping, 0 stopped, 0 zombie
%Cpu(s): 68.3 us, 26.4 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 5.3 si, 0.0 st
KiB Mem : 16431432 total, 9984132 free, 2181016 used, 4266284 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 13232672 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16662 mysql 20 0 2628720 563436 9656 S 2.8 3.4 49:20.32 mysqld
20471 nagios 20 0 153512 11952 2380 R 2.6 0.1 0:00.32 check_ifopersta
22223 nagios 20 0 134772 5488 2152 R 2.6 0.0 0:00.11 python
20456 nagios 20 0 147636 9920 2192 R 2.3 0.1 0:00.31 check_ifopersta
20528 nagios 20 0 147372 9660 2188 R 2.3 0.1 0:00.31 check_ifopersta
22164 nagios 20 0 154900 6016 2292 D 2.3 0.0 0:00.11 python
22165 nagios 20 0 139392 5808 2132 R 2.3 0.0 0:00.10 check_ifopersta
22170 nagios 20 0 136856 5524 2152 R 2.3 0.0 0:00.10 python
22175 nagios 20 0 136852 5520 2152 R 2.3 0.0 0:00.10 python
22187 nagios 20 0 138732 5020 2128 R 2.3 0.0 0:00.11 check_ifopersta
22188 nagios 20 0 138600 5024 2128 R 2.3 0.0 0:00.11 check_ifopersta
22198 nagios 20 0 138864 5288 2132 R 2.3 0.0 0:00.12 check_ifopersta
22201 nagios 20 0 138468 4760 2128 R 2.3 0.0 0:00.11 check_ifopersta
22206 nagios 20 0 138468 4764 2128 R 2.3 0.0 0:00.10 check_ifopersta
22214 nagios 20 0 177660 7776 3452 R 2.3 0.0 0:00.10 python
22377 nagios 20 0 45444 4196 2436 R 2.3 0.0 0:00.10 snmpget
5184 apache 20 0 643728 32440 5576 R 2.1 0.2 0:02.75 httpd
11810 nagios 20 0 1228088 41900 3524 R 2.1 0.3 0:03.18 nagios
16251 apache 20 0 536244 26676 4216 R 2.1 0.2 0:00.66 httpd
16381 apache 20 0 534888 25108 4172 R 2.1 0.2 0:00.63 httpd
17132 nagios 20 0 311868 24096 7188 R 2.1 0.1 0:01.02 php
17840 apache 20 0 641944 30972 6068 R 2.1 0.2 0:08.50 httpd
20035 nagios 20 0 153248 11568 2264 R 2.1 0.1 0:00.37 check_ifopersta
20067 nagios 20 0 153380 11576 2264 R 2.1 0.1 0:00.36 check_ifopersta
20191 nagios 20 0 153116 11316 2236 R 2.1 0.1 0:00.34 check_ifopersta
20486 nagios 20 0 147768 10180 2192 R 2.1 0.1 0:00.33 check_ifopersta
20554 nagios 20 0 147768 9912 2192 R 2.1 0.1 0:00.30 check_ifopersta
22167 nagios 20 0 138068 4496 2112 R 2.1 0.0 0:00.10 check_ifopersta
22172 nagios 20 0 177524 7264 3168 R 2.1 0.0 0:00.09 python
22173 nagios 20 0 177520 7260 3168 R 2.1 0.0 0:00.10 python
22178 nagios 20 0 138204 4496 2112 R 2.1 0.0 0:00.10 check_ifopersta
22203 nagios 20 0 139128 5548 2132 R 2.1 0.0 0:00.10 check_ifopersta
22207 nagios 20 0 138068 4500 2112 R 2.1 0.0 0:00.10 check_ifopersta
22218 nagios 20 0 128080 5060 2100 R 2.1 0.0 0:00.10 python
22219 nagios 20 0 154900 6008 2292 D 2.1 0.0 0:00.09 python
22222 nagios 20 0 139128 5552 2132 R 2.1 0.0 0:00.10 check_ifopersta
22256 nagios 20 0 136852 5524 2152 R 2.1 0.0 0:00.10 python
22261 nagios 20 0 138996 5284 2132 R 2.1 0.0 0:00.10 check_ifopersta
22406 nagios 20 0 138932 5712 2168 R 2.1 0.0 0:00.09 python
22407 nagios 20 0 175304 6408 2544 R 2.1 0.0 0:00.09 python
22410 nagios 20 0 134768 5296 2136 R 2.1 0.0 0:00.09 python
22413 nagios 20 0 136856 5668 2168 R 2.1 0.0 0:00.09 python
22424 nagios 20 0 145548 5924 2224 R 2.1 0.0 0:00.09 python
22517 nagios 20 0 154900 6012 2292 D 2.1 0.0 0:00.09 python
22572 nagios 20 0 138204 4500 2112 R 2.1 0.0 0:00.09 check_ifopersta
22581 nagios 20 0 137936 4240 2088 R 2.1 0.0 0:00.09 check_ifopersta
22627 nagios 20 0 138204 4496 2112 R 2.1 0.0 0:00.09 check_ifopersta
23166 apache 20 0 653528 42260 5804 R 2.1 0.3 0:07.66 httpd
32271 apache 20 0 641168 30448 6052 R 2.1 0.2 0:10.67 httpd
368 nagios 20 0 172516 49476 2088 R 1.9 0.3 0:03.57 mrtg
6532 apache 20 0 643132 32480 6260 R 1.9 0.2 0:28.21 httpd
8798 apache 20 0 534964 25632 4772 R 1.9 0.2 0:00.65 httpd
16383 apache 20 0 536940 27548 4172 R 1.9 0.2 0:00.65 httpd
18622 nagios 20 0 131536 19652 1352 R 1.9 0.1 0:00.58 send_nrdp.sh
19966 nagios 20 0 153380 11572 2264 R 1.9 0.1 0:00.34 check_ifopersta
19972 nagios 20 0 153380 11568 2264 R 1.9 0.1 0:00.35 check_ifopersta
19973 nagios 20 0 153380 11568 2264 R 1.9 0.1 0:00.34 check_ifopersta
19997 nagios 20 0 153380 11572 2264 R 1.9 0.1 0:00.35 check_ifopersta
20074 nagios 20 0 153248 11572 2264 R 1.9 0.1 0:00.36 check_ifopersta
20087 nagios 20 0 153116 11316 2236 R 1.9 0.1 0:00.36 check_ifopersta
max_concurrent_checks=20 (web interface slow, not as responsive as before the CPU usage issue, but at least working):
Code: Select all
top - 16:47:41 up 7:24, 2 users, load average: 59.40, 121.26, 72.98
Tasks: 225 total, 27 running, 196 sleeping, 0 stopped, 2 zombie
%Cpu(s): 76.0 us, 18.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 5.1 si, 0.0 st
KiB Mem : 16431432 total, 10465372 free, 1687416 used, 4278644 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 13717956 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16662 mysql 20 0 2638192 582736 9656 S 22.7 3.5 50:02.58 mysqld
23166 apache 20 0 639180 28440 5816 R 16.6 0.2 0:34.14 httpd
2592 apache 20 0 644072 33128 6264 R 16.2 0.2 0:09.17 httpd
31314 nagios 20 0 114224 2448 1384 R 16.2 0.0 0:16.64 send_nrdp.sh
2601 apache 20 0 643724 32460 5616 R 15.9 0.2 0:07.64 httpd
6532 apache 20 0 640952 30312 6260 R 15.9 0.2 0:55.31 httpd
32271 apache 20 0 639648 28604 6052 R 15.6 0.2 0:37.83 httpd
24303 apache 20 0 644072 33376 6068 S 14.9 0.2 0:36.67 httpd
29116 apache 20 0 641872 31140 6052 R 14.3 0.2 0:32.67 httpd
5318 apache 20 0 639120 28272 5712 S 13.6 0.2 0:31.15 httpd
13625 nagios 20 0 150760 11104 2236 R 11.7 0.1 0:00.36 check_ifopersta
13637 nagios 20 0 148428 10708 2220 R 11.4 0.1 0:00.35 check_ifopersta
13606 nagios 20 0 113560 1952 1388 S 9.1 0.0 0:00.28 send_nrdp.sh
13769 nagios 20 0 146184 8612 2168 R 6.8 0.1 0:00.21 check_ifopersta
13773 nagios 20 0 141900 8180 2140 R 6.8 0.0 0:00.21 check_ifopersta
537 root 20 0 88484 40536 40156 S 5.5 0.2 10:01.19 systemd-journal
13817 nagios 20 0 187340 9224 3604 R 5.2 0.1 0:00.16 python
13832 nagios 20 0 187332 9600 3616 R 5.2 0.1 0:00.16 python
13816 nagios 20 0 184872 9016 3556 R 4.9 0.1 0:00.15 python
27771 nagios 20 0 1229272 41608 3552 S 4.9 0.3 0:08.30 nagios
13824 nagios 20 0 139524 5816 2132 R 4.5 0.0 0:00.14 check_ifopersta
13839 nagios 20 0 140316 6600 2132 R 4.2 0.0 0:00.13 check_ifopersta
9 root 20 0 0 0 0 S 2.9 0.0 8:30.08 rcu_sched
30865 nagios 20 0 172644 49476 2088 S 2.9 0.3 0:03.34 mrtg
13628 nagios 20 0 115536 1700 1368 S 1.6 0.0 0:00.05 check_rrdtraf
30862 nagios 20 0 174596 49652 2268 S 1.6 0.3 0:03.16 mrtg
30863 nagios 20 0 172512 49448 2080 S 1.6 0.3 0:02.56 mrtg
13527 nagios 20 0 115540 1700 1368 S 1.3 0.0 0:00.04 check_rrdtraf
13881 nagios 20 0 264836 4016 2960 R 1.3 0.0 0:00.04 curl
30866 nagios 20 0 172512 49432 2080 S 1.3 0.3 0:02.50 mrtg
5296 root 20 0 163008 3056 1608 R 1.0 0.0 0:07.04 top
27777 nagios 20 0 10844 1036 756 S 1.0 0.0 0:00.12 nagios
13551 nagios 20 0 115540 1704 1368 R 0.6 0.0 0:00.02 check_rrdtraf
12 root rt 0 0 0 0 S 0.3 0.0 0:04.34 watchdog/1
19 root 20 0 0 0 0 S 0.3 0.0 0:17.64 ksoftirqd/2
1071 root 20 0 557480 30276 28440 S 0.3 0.2 2:53.14 rsyslogd
9165 root 20 0 161728 6216 4832 S 0.3 0.0 0:00.92 sshd
9185 nagios 20 0 452616 33436 10524 S 0.3 0.2 0:00.98 php
13815 nagios 20 0 115536 1664 1356 S 0.3 0.0 0:00.01 check_rrdtraf
13835 nagios 20 0 115408 1648 1352 S 0.3 0.0 0:00.01 check_rrdtraf
13854 nagios 20 0 115408 1652 1352 S 0.3 0.0 0:00.01 check_rrdtraf
Re: Nagios XI 5.8.5 high CPU usage
Posted: Thu Aug 12, 2021 2:37 pm
by dchurch
If you PM me a system profile I can diagnose further. Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.
If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:
Code: Select all
rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.
Re: Nagios XI 5.8.5 high CPU usage
Posted: Thu Aug 12, 2021 2:59 pm
by ignacio.sanchez
PM sent, but it is stuck in my Outbox.
Re: Nagios XI 5.8.5 high CPU usage
Posted: Fri Aug 13, 2021 12:16 pm
by pbroste
Hello @ignacio.sanchez
Thanks for sending over the System Profile
After review we see the following error:
[ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
The repair is failing on nagios_logentries. We will address that with the following:
Code: Select all
systemctl stop mariadb.service
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_logentries
systemctl start mariadb.service
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
Then:
Code: Select all
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_notifications'
Important: Running these commands will clear all entries from the affected tables. After you truncate tables, you should repeat the repair process outlined above.
Please restart nagios.service and verify.
Thanks,
Perry
Re: Nagios XI 5.8.5 high CPU usage
Posted: Mon Aug 16, 2021 6:13 am
by ignacio.sanchez
Hello Perry.
Seems the problem is now solved, thanks a lot!
For next time, where can I see the "repair failed" error? (I'm asking because after I ran the repair process everything seemed fine and no error message was shown.)
Re: Nagios XI 5.8.5 high CPU usage
Posted: Mon Aug 16, 2021 1:23 pm
by pbroste
Hello @ignacio.sanchez
Most excellent, I am glad that we were able to help resolve the issue. The referenced log entry, "[ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed", is found in /var/log/mysqld.log or /var/log/mariadb.log.
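Editor's note: to check for this condition without generating a full system profile, the database log can be searched for the "marked as crashed" message. A minimal sketch, assuming the two log paths named above (the actual location may differ per install); the helper name find_crashed_tables is hypothetical.

```shell
# find_crashed_tables FILE... (hypothetical helper): print the distinct
# table paths named in "is marked as crashed" log lines.
# Files that do not exist or are not readable are skipped.
find_crashed_tables() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        grep -F "is marked as crashed" "$f"
    done | sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" | sort -u
}

# Log paths as given in this thread; adjust to your system.
find_crashed_tables /var/log/mysqld.log /var/log/mariadb.log
```

An empty result only means no such message was found in those files; it does not prove the tables are healthy.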
We will go ahead and lock.
Thanks,
Perry