5.5.1 Httpd high load
Since moving to 5.5.0, httpd load has seen a sharp uptick. The issue got better after upgrading to 5.5.1. Prior to the 5.5.0 update we saw load averages around 1.24; now we're hitting around 3-6 consistently during 8-5 working hours.
We updated on the 9th.
Detailed stats for today; notice the drop-off as soon as people leave for the day.
Load averages for today; notice the drop-off as soon as people leave for the day.
Load averages for the past 30 days. Things broke badly on the 12th, which is why there's some missing graph data.
I like graphs...
Re: 5.5.1 Httpd high load
There was an issue in 5.5.0 with Nagios Core that could cause the check interval to be stuck at the retry_interval - which could cause your system to do up to 5x more checks (or more) in certain circumstances. This issue was fixed in 5.5.1 which may explain why it went back down afterwards. Is it still lower now?
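To put a rough number on that, here is a minimal sketch of the arithmetic. The service count and the interval values below are hypothetical and are not taken from this system's configuration, and checks_per_hour is an illustrative helper, not part of Nagios.

```shell
#!/bin/sh
# checks_per_hour SERVICES INTERVAL_MINUTES
# Prints how many checks run per hour if every service is checked
# once per INTERVAL_MINUTES.
checks_per_hour() {
    echo $(( $1 * 60 / $2 ))
}

# Hypothetical values for illustration only.
services=1000
check_interval=5   # minutes between checks when everything is healthy
retry_interval=1   # minutes between checks while a problem is being retried

normal=$(checks_per_hour "$services" "$check_interval")
stuck=$(checks_per_hour "$services" "$retry_interval")

echo "normal: $normal checks/hour"
echo "stuck:  $stuck checks/hour"
echo "ratio:  $(( stuck / normal ))x"
```

With these assumed values, being stuck at the retry interval means five times as many checks per hour; a larger gap between the two intervals would give a larger multiplier.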
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: 5.5.1 Httpd high load
jomann wrote: There was an issue in 5.5.0 with Nagios Core that could cause the check interval to be stuck at the retry_interval - which could cause your system to do up to 5x more checks (or more) in certain circumstances. This issue was fixed in 5.5.1 which may explain why it went back down afterwards. Is it still lower now?
We upgraded to 5.5.0 on the 9th and then to 5.5.1 on the 12th, as soon as it was available. Load is still much higher than prior to 5.5.0/5.5.1. Looking at top, MySQL is the highest generator of load, with multiple httpd processes next in line. We are looking at offloading the DB, but it still seems that load has jumped since the update.
This is what top looks like currently. We would usually sit at around 1.2 load prior to the updates.
Code:
[root@srq-nagios-xi1 ~]# top
top - 12:50:48 up 6 days, 21:04, 1 user, load average: 4.33, 4.84, 5.23
Tasks: 269 total, 11 running, 258 sleeping, 0 stopped, 0 zombie
Cpu(s): 69.5%us, 8.3%sy, 0.0%ni, 21.3%id, 0.4%wa, 0.1%hi, 0.5%si, 0.0%st
Mem: 8045604k total, 6850220k used, 1195384k free, 73908k buffers
Swap: 2064376k total, 209528k used, 1854848k free, 3564420k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
61792 mysql 20 0 2280m 114m 4372 S 107.7 1.5 3303:12 mysqld
98120 apache 20 0 461m 44m 4908 R 44.6 0.6 1:22.45 httpd
91418 apache 20 0 540m 124m 4952 S 39.9 1.6 4:00.90 httpd
17399 apache 20 0 458m 41m 4732 S 15.0 0.5 0:35.26 httpd
18440 apache 20 0 517m 101m 5120 S 10.3 1.3 3:41.42 httpd
86350 apache 20 0 457m 40m 4764 R 6.3 0.5 2:10.23 httpd
61779 apache 20 0 457m 40m 4920 R 6.0 0.5 2:04.97 httpd
17398 apache 20 0 466m 45m 7844 R 5.3 0.6 0:53.94 httpd
27588 apache 20 0 453m 37m 4868 R 5.3 0.5 0:11.64 httpd
11186 apache 20 0 513m 97m 4800 R 5.0 1.2 0:35.11 httpd
114619 apache 20 0 523m 107m 4732 S 4.7 1.4 1:25.65 httpd
75596 nagios 20 0 56660 7664 992 S 4.3 0.1 2:16.00 ndo2db
80295 apache 20 0 522m 106m 4848 S 4.3 1.4 1:54.02 httpd
5435 apache 20 0 513m 97m 4916 R 4.0 1.2 0:35.58 httpd
72418 apache 20 0 514m 98m 4976 S 4.0 1.3 1:41.23 httpd
86351 apache 20 0 530m 114m 4848 S 3.7 1.5 1:46.97 httpd
18483 postgres 20 0 213m 7192 5696 S 2.3 0.1 0:04.66 postmaster
1520 root 20 0 0 0 0 S 1.0 0.0 19:32.93 flush-253:0
49486 nagios 20 0 212m 12m 7492 S 1.0 0.2 0:00.03 check_nagioslog
49487 nagios 20 0 212m 12m 7492 S 1.0 0.2 0:00.03 check_nagioslog
98331 postgres 20 0 212m 6568 4872 S 1.0 0.1 0:01.54 postmaster
75585 nagios 20 0 10124 1008 672 S 0.7 0.0 0:01.91 nagios
86481 postgres 20 0 212m 6660 4968 S 0.7 0.1 0:01.96 postmaster
91875 root 20 0 2194m 124m 1200 S 0.7 1.6 79:55.19 curlftpfs
11260 postgres 20 0 212m 6488 4800 S 0.3 0.1 0:00.54 postmaster
17444 postgres 20 0 212m 6552 4860 S 0.3 0.1 0:00.54 postmaster
17473 postgres 20 0 213m 6772 5040 S 0.3 0.1 0:00.69 postmaster
27998 postgres 20 0 212m 6084 4404 S 0.3 0.1 0:00.30 postmaster
46348 nagios 20 0 327m 36m 8888 S 0.3 0.5 0:00.62 php
46449 postgres 20 0 213m 9900 8136 S 0.3 0.1 0:00.58 postmaster
48568 root 20 0 15160 1464 1000 R 0.3 0.0 0:00.02 top
72692 postgres 20 0 212m 6664 4972 S 0.3 0.1 0:01.73 postmaster
75582 nagios 20 0 48536 29m 1376 S 0.3 0.4 0:32.95 nagios
75586 nagios 20 0 10124 1012 672 S 0.3 0.0 0:01.91 nagios
86484 postgres 20 0 212m 6656 4964 S 0.3 0.1 0:01.54 postmaster
91486 postgres 20 0 213m 6640 5132 S 0.3 0.1 0:03.71 postmaster
114823 postgres 20 0 212m 6560 4864 S 0.3 0.1 0:01.17 postmaster
1 root 20 0 19356 1036 868 S 0.0 0.0 0:23.66 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:42.58 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:27.78 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.58 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 1:10.87 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 0:25.64 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:00.27 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 0:44.61 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
13 root 20 0 0 0 0 S 0.0 0.0 0:30.08 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 0:00.43 watchdog/2
15 root RT 0 0 0 0 S 0.0 0.0 0:59.53 migration/3
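Because the httpd load is spread across many worker processes, it can be easier to total %CPU per command than to compare individual rows. The following is a minimal sketch, assuming the column layout shown in the output above (%CPU in column 9, COMMAND in column 12); cpu_by_command is a hypothetical helper name.

```shell
#!/bin/sh
# cpu_by_command: read one batch-mode top snapshot on stdin and print
# the summed %CPU per command, highest first.
# Assumes the 12-column layout shown above: %CPU is field 9, COMMAND is field 12.
cpu_by_command() {
    LC_ALL=C awk '
        # Only count process rows that appear after the column header.
        seen_header && NF >= 12 { cpu[$12] += $9 }
        $1 == "PID"             { seen_header = 1 }
        END { for (c in cpu) printf "%s %.1f\n", c, cpu[c] }
    ' | LC_ALL=C sort -k2,2nr
}

# Example usage (batch mode, a single iteration):
#   top -b -n 1 | cpu_by_command
```

Note that a single snapshot is only a point-in-time reading; repeating it a few times during working hours gives a fairer picture.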
Re: 5.5.1 Httpd high load
Are you seeing any MySQL errors in /var/log/mysqld.log? It does seem like MySQL is taking up the most resources here. I would also run the command below and see what is running on the system. It could be a backlog of events, or possibly the change to use InnoDB databases for most of the Nagios XI data (although I don't think that is the problem).
Code:
ps -ef | grep nagios
Re: 5.5.1 Httpd high load
jomann wrote: Are you seeing any MySQL errors in /var/log/mysqld.log ? It does seem like MySQL is taking up the most resources here. I would also do a ps -ef | grep nagios and see what is running on the system. It could be a backlog of events or possibly the change to use innodb databases for most of the Nagios XI data (although I don't think that is the problem)
I PM'd the output of ps -ef | grep nagios.
No errors in the MySQL logs for a few days now. If there's something else I can look at on that side of the fence, let me know.
Re: 5.5.1 Httpd high load
Well, it seems like in your ps -ef output there are multiple cron PHP files running, and I think they are overlapping one another, at least the event_handler.php and eventman.php cron files. Those interact heavily with the database. Can you do a count on the xi_eventqueue table and see how many events are in the queue?
Code:
mysql -pnagiosxi nagiosxi -e "SELECT COUNT(*) FROM xi_eventqueue"
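One way to spot overlapping cron jobs in a long process listing is to count how many copies of each PHP script are running at once. This is a rough sketch rather than an official Nagios XI tool; count_php_scripts is a hypothetical helper, and it simply assumes the script path appears as a whitespace-separated argument ending in .php.

```shell
#!/bin/sh
# count_php_scripts: read `ps -ef` style output on stdin and print every
# PHP script that appears on more than one line, with its count.
count_php_scripts() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /\.php$/) {
                    n = split($i, parts, "/")   # keep only the file name
                    count[parts[n]]++
                }
            }
        }
        END {
            for (s in count)
                if (count[s] > 1) print count[s], s
        }
    ' | sort -nr
}

# Example usage:
#   ps -ef | grep nagios | count_php_scripts
```

A script that shows up with a count above 1 on several consecutive runs is a likely candidate for one instance starting before the previous one has finished.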
Re: 5.5.1 Httpd high load
jomann wrote: Well it seems like in your ps -ef there are multiple cron php files running, and I think they are overlapping one another, at least the event_handler.php and eventman.php cron files. Those are heavily interacting with the database. Can you do a count on the xi_eventqueue database and see how many are in queue?
Code:
mysql -pnagiosxi nagiosxi -e "SELECT COUNT(*) FROM xi_eventqueue"
That command didn't work for us; there is no "nagiosxi" database. After playing around with it, here is some data for you.
Code:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| nagios |
| nagiosql |
| test |
+--------------------+
5 rows in set (0.05 sec)
mysql> use nagios
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+----------------------------------------+
| Tables_in_nagios |
+----------------------------------------+
| nagios_acknowledgements |
| nagios_commands |
| nagios_commenthistory |
| nagios_comments |
| nagios_configfiles |
| nagios_configfilevariables |
| nagios_conninfo |
| nagios_contact_addresses |
| nagios_contact_notificationcommands |
| nagios_contactgroup_members |
| nagios_contactgroups |
| nagios_contactnotificationmethods |
| nagios_contactnotifications |
| nagios_contacts |
| nagios_contactstatus |
| nagios_customvariables |
| nagios_customvariablestatus |
| nagios_dbversion |
| nagios_downtimehistory |
| nagios_eventhandlers |
| nagios_externalcommands |
| nagios_flappinghistory |
| nagios_host_contactgroups |
| nagios_host_contacts |
| nagios_host_parenthosts |
| nagios_hostchecks |
| nagios_hostdependencies |
| nagios_hostescalation_contactgroups |
| nagios_hostescalation_contacts |
| nagios_hostescalations |
| nagios_hostgroup_members |
| nagios_hostgroups |
| nagios_hosts |
| nagios_hoststatus |
| nagios_instances |
| nagios_logentries |
| nagios_notifications |
| nagios_objects |
| nagios_processevents |
| nagios_programstatus |
| nagios_runtimevariables |
| nagios_scheduleddowntime |
| nagios_service_contactgroups |
| nagios_service_contacts |
| nagios_service_parentservices |
| nagios_servicechecks |
| nagios_servicedependencies |
| nagios_serviceescalation_contactgroups |
| nagios_serviceescalation_contacts |
| nagios_serviceescalations |
| nagios_servicegroup_members |
| nagios_servicegroups |
| nagios_services |
| nagios_servicestatus |
| nagios_statehistory |
| nagios_systemcommands |
| nagios_timedeventqueue |
| nagios_timedevents |
| nagios_timeperiod_timeranges |
| nagios_timeperiods |
+----------------------------------------+
60 rows in set (0.00 sec)
mysql> close database;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'close database' at line 1
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| nagios |
| nagiosql |
| test |
+--------------------+
5 rows in set (0.00 sec)
mysql> use nagiosql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+--------------------------------------------+
| Tables_in_nagiosql |
+--------------------------------------------+
| tbl_command |
| tbl_contact |
| tbl_contactgroup |
| tbl_contacttemplate |
| tbl_domain |
| tbl_host |
| tbl_hostdependency |
| tbl_hostescalation |
| tbl_hostextinfo |
| tbl_hostgroup |
| tbl_hosttemplate |
| tbl_info |
| tbl_lnkContactToCommandHost |
| tbl_lnkContactToCommandService |
| tbl_lnkContactToContactgroup |
| tbl_lnkContactToContacttemplate |
| tbl_lnkContactToVariabledefinition |
| tbl_lnkContactgroupToContact |
| tbl_lnkContactgroupToContactgroup |
| tbl_lnkContacttemplateToCommandHost |
| tbl_lnkContacttemplateToCommandService |
| tbl_lnkContacttemplateToContactgroup |
| tbl_lnkContacttemplateToContacttemplate |
| tbl_lnkContacttemplateToVariabledefinition |
| tbl_lnkHostToContact |
| tbl_lnkHostToContactgroup |
| tbl_lnkHostToHost |
| tbl_lnkHostToHostgroup |
| tbl_lnkHostToHosttemplate |
| tbl_lnkHostToVariabledefinition |
| tbl_lnkHostdependencyToHost_DH |
| tbl_lnkHostdependencyToHost_H |
| tbl_lnkHostdependencyToHostgroup_DH |
| tbl_lnkHostdependencyToHostgroup_H |
| tbl_lnkHostescalationToContact |
| tbl_lnkHostescalationToContactgroup |
| tbl_lnkHostescalationToHost |
| tbl_lnkHostescalationToHostgroup |
| tbl_lnkHostgroupToHost |
| tbl_lnkHostgroupToHostgroup |
| tbl_lnkHosttemplateToContact |
| tbl_lnkHosttemplateToContactgroup |
| tbl_lnkHosttemplateToHost |
| tbl_lnkHosttemplateToHostgroup |
| tbl_lnkHosttemplateToHosttemplate |
| tbl_lnkHosttemplateToVariabledefinition |
| tbl_lnkServiceToContact |
| tbl_lnkServiceToContactgroup |
| tbl_lnkServiceToHost |
| tbl_lnkServiceToHostgroup |
| tbl_lnkServiceToServicegroup |
| tbl_lnkServiceToServicetemplate |
| tbl_lnkServiceToVariabledefinition |
| tbl_lnkServicedependencyToHost_DH |
| tbl_lnkServicedependencyToHost_H |
| tbl_lnkServicedependencyToHostgroup_DH |
| tbl_lnkServicedependencyToHostgroup_H |
| tbl_lnkServicedependencyToService_DS |
| tbl_lnkServicedependencyToService_S |
| tbl_lnkServiceescalationToContact |
| tbl_lnkServiceescalationToContactgroup |
| tbl_lnkServiceescalationToHost |
| tbl_lnkServiceescalationToHostgroup |
| tbl_lnkServiceescalationToService |
| tbl_lnkServicegroupToService |
| tbl_lnkServicegroupToServicegroup |
| tbl_lnkServicetemplateToContact |
| tbl_lnkServicetemplateToContactgroup |
| tbl_lnkServicetemplateToHost |
| tbl_lnkServicetemplateToHostgroup |
| tbl_lnkServicetemplateToServicegroup |
| tbl_lnkServicetemplateToServicetemplate |
| tbl_lnkServicetemplateToVariabledefinition |
| tbl_lnkTimeperiodToTimeperiod |
| tbl_logbook |
| tbl_mainmenu |
| tbl_permission |
| tbl_permission_inactive |
| tbl_service |
| tbl_servicedependency |
| tbl_serviceescalation |
| tbl_serviceextinfo |
| tbl_servicegroup |
| tbl_servicetemplate |
| tbl_session |
| tbl_session_locks |
| tbl_settings |
| tbl_submenu |
| tbl_timedefinition |
| tbl_timeperiod |
| tbl_user |
| tbl_variabledefinition |
+--------------------------------------------+
92 rows in set (0.00 sec)
scottwilkerson (DevOps Engineer, Nagios Enterprises)
Re: 5.5.1 Httpd high load
You must have a Postgres-based system.
Let's run:
Code:
echo "SELECT COUNT(*) FROM xi_eventqueue;"|psql nagiosxi nagiosxi
Re: 5.5.1 Httpd high load
scottwilkerson wrote: you must have a postgres based system
Lets run
Code:
echo "SELECT COUNT(*) FROM xi_eventqueue;"|psql nagiosxi nagiosxi
Appears so.
Code:
[root@srq-nagios-xi1 ~]# echo "SELECT COUNT(*) FROM xi_eventqueue;"|psql nagiosxi nagiosxi
count
-------
1
(1 row)
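A single count of 1 suggests the queue was not backed up at that moment, but one reading can miss a backlog that builds and drains. The following is a minimal sketch for sampling the count a few times, assuming the same database name and user as the command above. The -A and -t options ask psql for unaligned, tuples-only output so that only the number is printed; sample_queue and queue_depth are hypothetical helper names.

```shell
#!/bin/sh
# queue_depth: print the current xi_eventqueue row count as a bare integer.
# Assumes the "nagiosxi" database and "nagiosxi" user used earlier in this thread.
queue_depth() {
    psql -At -c "SELECT COUNT(*) FROM xi_eventqueue;" nagiosxi nagiosxi
}

# sample_queue SAMPLES INTERVAL_SECONDS
# Prints one "HH:MM:SS count" line per sample.
sample_queue() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$(queue_depth)"
        i=$((i + 1))
        if [ "$i" -lt "$1" ]; then
            sleep "$2"
        fi
    done
}

# Example usage: ten samples, 30 seconds apart
#   sample_queue 10 30
```

If the count stays near zero across the samples, the event queue is probably not the bottleneck and attention can shift elsewhere.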
npolovenko (Support Tech)
Re: 5.5.1 Httpd high load
@Ehamby, please send in your Nagios XI System Profile.
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and upload it to a cloud storage service of your choice. You can share a link with me in a personal message.
After you upload the profile, please post something in this thread to bring it up in the support queue.
*Profile was received and shared with the support team