5.5.1 Httpd high load

Posted: Tue Jul 17, 2018 9:19 pm
by Envera IT
Since moving to 5.5.0, httpd load has seen a sharp uptick. The issue got better after upgrading to 5.5.1, but load is still elevated: prior to the 5.5.0 update we saw load averages around 1.24, and now we're consistently hitting around 3-6 during 8-5 working hours.

We updated on the 9th.

Detailed stats for today; notice the drop-off as soon as people leave for the day.
Capture.PNG
Load averages for today; notice the drop-off as soon as people leave for the day.
Capture4.PNG
Load averages for the past 30 days. Things broke on the 12th...badly...which is why there's some missing graph data.
Capture3.PNG

Re: 5.5.1 Httpd high load

Posted: Wed Jul 18, 2018 10:00 am
by jomann
There was an issue in 5.5.0 with Nagios Core that could cause the check interval to be stuck at the retry_interval - which could cause your system to do up to 5x more checks (or more) in certain circumstances. This issue was fixed in 5.5.1 which may explain why it went back down afterwards. Is it still lower now?
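
To put the "up to 5x" figure in context, here is a rough worked example. The two interval values are assumptions chosen for illustration (a 5-minute check_interval and a 1-minute retry_interval), not values read from this system's configuration.

```shell
#!/bin/sh
# Rough estimate of how many checks per hour a single service generates.
# Both interval values below are assumptions for illustration only.
check_interval=5   # minutes between checks in normal operation (assumed)
retry_interval=1   # minutes between checks while retrying (assumed)

# checks_per_hour MINUTES -> number of checks that fit in 60 minutes
checks_per_hour() {
    echo $((60 / $1))
}

normal=$(checks_per_hour "$check_interval")
stuck=$(checks_per_hour "$retry_interval")

echo "normal: $normal checks/hour"
echo "stuck at retry_interval: $stuck checks/hour"
echo "multiplier: $((stuck / normal))x"
```

With other interval ratios the multiplier changes accordingly, which is presumably why the figure is given as "up to 5x (or more)".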

Re: 5.5.1 Httpd high load

Posted: Wed Jul 18, 2018 11:52 am
by Envera IT
jomann wrote:There was an issue in 5.5.0 with Nagios Core that could cause the check interval to be stuck at the retry_interval - which could cause your system to do up to 5x more checks (or more) in certain circumstances. This issue was fixed in 5.5.1 which may explain why it went back down afterwards. Is it still lower now?
We upgraded to 5.5.0 on the 9th and then to 5.5.1 on the 12th, as soon as it was available. Load is still much higher than it was prior to 5.5.0/5.5.1. Looking at top, MySQL (mysqld) is the highest generator of load, with multiple httpd processes next in line. We are looking at offloading the DB, but it still seems that load has jumped since the update.

This is what top looks like currently. We would usually sit at around 1.2 load prior to the updates.

Code:

[root@srq-nagios-xi1 ~]# top
top - 12:50:48 up 6 days, 21:04,  1 user,  load average: 4.33, 4.84, 5.23
Tasks: 269 total,  11 running, 258 sleeping,   0 stopped,   0 zombie
Cpu(s): 69.5%us,  8.3%sy,  0.0%ni, 21.3%id,  0.4%wa,  0.1%hi,  0.5%si,  0.0%st
Mem:   8045604k total,  6850220k used,  1195384k free,    73908k buffers
Swap:  2064376k total,   209528k used,  1854848k free,  3564420k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 61792 mysql     20   0 2280m 114m 4372 S 107.7  1.5   3303:12 mysqld
 98120 apache    20   0  461m  44m 4908 R 44.6  0.6   1:22.45 httpd
 91418 apache    20   0  540m 124m 4952 S 39.9  1.6   4:00.90 httpd
 17399 apache    20   0  458m  41m 4732 S 15.0  0.5   0:35.26 httpd
 18440 apache    20   0  517m 101m 5120 S 10.3  1.3   3:41.42 httpd
 86350 apache    20   0  457m  40m 4764 R  6.3  0.5   2:10.23 httpd
 61779 apache    20   0  457m  40m 4920 R  6.0  0.5   2:04.97 httpd
 17398 apache    20   0  466m  45m 7844 R  5.3  0.6   0:53.94 httpd
 27588 apache    20   0  453m  37m 4868 R  5.3  0.5   0:11.64 httpd
 11186 apache    20   0  513m  97m 4800 R  5.0  1.2   0:35.11 httpd
114619 apache    20   0  523m 107m 4732 S  4.7  1.4   1:25.65 httpd
 75596 nagios    20   0 56660 7664  992 S  4.3  0.1   2:16.00 ndo2db
 80295 apache    20   0  522m 106m 4848 S  4.3  1.4   1:54.02 httpd
  5435 apache    20   0  513m  97m 4916 R  4.0  1.2   0:35.58 httpd
 72418 apache    20   0  514m  98m 4976 S  4.0  1.3   1:41.23 httpd
 86351 apache    20   0  530m 114m 4848 S  3.7  1.5   1:46.97 httpd
 18483 postgres  20   0  213m 7192 5696 S  2.3  0.1   0:04.66 postmaster
  1520 root      20   0     0    0    0 S  1.0  0.0  19:32.93 flush-253:0
 49486 nagios    20   0  212m  12m 7492 S  1.0  0.2   0:00.03 check_nagioslog
 49487 nagios    20   0  212m  12m 7492 S  1.0  0.2   0:00.03 check_nagioslog
 98331 postgres  20   0  212m 6568 4872 S  1.0  0.1   0:01.54 postmaster
 75585 nagios    20   0 10124 1008  672 S  0.7  0.0   0:01.91 nagios
 86481 postgres  20   0  212m 6660 4968 S  0.7  0.1   0:01.96 postmaster
 91875 root      20   0 2194m 124m 1200 S  0.7  1.6  79:55.19 curlftpfs
 11260 postgres  20   0  212m 6488 4800 S  0.3  0.1   0:00.54 postmaster
 17444 postgres  20   0  212m 6552 4860 S  0.3  0.1   0:00.54 postmaster
 17473 postgres  20   0  213m 6772 5040 S  0.3  0.1   0:00.69 postmaster
 27998 postgres  20   0  212m 6084 4404 S  0.3  0.1   0:00.30 postmaster
 46348 nagios    20   0  327m  36m 8888 S  0.3  0.5   0:00.62 php
 46449 postgres  20   0  213m 9900 8136 S  0.3  0.1   0:00.58 postmaster
 48568 root      20   0 15160 1464 1000 R  0.3  0.0   0:00.02 top
 72692 postgres  20   0  212m 6664 4972 S  0.3  0.1   0:01.73 postmaster
 75582 nagios    20   0 48536  29m 1376 S  0.3  0.4   0:32.95 nagios
 75586 nagios    20   0 10124 1012  672 S  0.3  0.0   0:01.91 nagios
 86484 postgres  20   0  212m 6656 4964 S  0.3  0.1   0:01.54 postmaster
 91486 postgres  20   0  213m 6640 5132 S  0.3  0.1   0:03.71 postmaster
114823 postgres  20   0  212m 6560 4864 S  0.3  0.1   0:01.17 postmaster
     1 root      20   0 19356 1036  868 S  0.0  0.0   0:23.66 init
     2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
     3 root      RT   0     0    0    0 S  0.0  0.0   0:42.58 migration/0
     4 root      20   0     0    0    0 S  0.0  0.0   0:27.78 ksoftirqd/0
     5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
     6 root      RT   0     0    0    0 S  0.0  0.0   0:00.58 watchdog/0
     7 root      RT   0     0    0    0 S  0.0  0.0   1:10.87 migration/1
     8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
     9 root      20   0     0    0    0 S  0.0  0.0   0:25.64 ksoftirqd/1
    10 root      RT   0     0    0    0 S  0.0  0.0   0:00.27 watchdog/1
    11 root      RT   0     0    0    0 S  0.0  0.0   0:44.61 migration/2
    12 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/2
    13 root      20   0     0    0    0 S  0.0  0.0   0:30.08 ksoftirqd/2
    14 root      RT   0     0    0    0 S  0.0  0.0   0:00.43 watchdog/2
    15 root      RT   0     0    0    0 S  0.0  0.0   0:59.53 migration/3
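
Since httpd is spread across many processes, it can help to total %CPU per command name rather than read it row by row. The following is a minimal sketch; the function name cpu_by_command is made up for this example, and the demo rows are a few lines copied from the top output above.

```shell
#!/bin/sh
# Sum %CPU per command name from "%CPU COMMAND" lines on stdin,
# printing the heaviest command first.
cpu_by_command() {
    LC_ALL=C awk '{ cpu[$2] += $1 }
        END { for (c in cpu) printf "%s %.1f\n", c, cpu[c] }' |
        LC_ALL=C sort -k2,2nr
}

# Demo using a few rows copied from the top output above.
cpu_by_command <<'EOF'
107.7 mysqld
44.6 httpd
39.9 httpd
15.0 httpd
4.3 ndo2db
EOF

# On a live system the same function could be fed from ps.
# Note that ps reports CPU use averaged over each process's lifetime,
# so the numbers will not match top's instantaneous figures:
#   ps -e -o pcpu= -o comm= | cpu_by_command
```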

Re: 5.5.1 Httpd high load

Posted: Thu Jul 19, 2018 8:59 am
by jomann
Are you seeing any MySQL errors in /var/log/mysqld.log ? It does seem like MySQL is taking up the most resources here. I would also do a

Code:

ps -ef | grep nagios
and see what is running on the system. It could be a backlog of events or possibly the change to use InnoDB tables for most of the Nagios XI data (although I don't think that is the problem).
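
For the log check, a small sketch like the one below could be used. The helper names are invented for this example, and matching on the word "error" is only an assumption about how problems show up in mysqld.log.

```shell
#!/bin/sh
# Count lines mentioning "error" (case-insensitive) in a log file.
# Prints 0 when nothing matches.
count_log_errors() {
    grep -ci 'error' "$1"
}

# Show the most recent matching lines (default: last 20).
recent_log_errors() {
    grep -i 'error' "$1" | tail -n "${2:-20}"
}

log=/var/log/mysqld.log   # path as given in the post above
if [ -r "$log" ]; then
    echo "error lines in $log: $(count_log_errors "$log")"
    recent_log_errors "$log" 5
else
    echo "skipping: $log is not readable"
fi
```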

Re: 5.5.1 Httpd high load

Posted: Thu Jul 19, 2018 9:29 am
by Envera IT
jomann wrote:Are you seeing any MySQL errors in /var/log/mysqld.log ? It does seem like MySQL is taking up the most resources here. I would also do a

Code:

ps -ef | grep nagios
and see what is running on the system. It could be a backlog of events or possibly the change to use InnoDB tables for most of the Nagios XI data (although I don't think that is the problem).
I PM'd the output of ps -ef | grep nagios

No errors in the MySQL logs for a few days now. If there's something else I can look at on that side of the fence, let me know.

Re: 5.5.1 Httpd high load

Posted: Thu Jul 19, 2018 11:12 am
by jomann
Well, it seems from your ps -ef output that multiple cron PHP scripts are running, and I think they are overlapping one another, at least event_handler.php and eventman.php. Those interact heavily with the database. Can you run a count on the xi_eventqueue table and see how many events are in the queue?

Code:

mysql -pnagiosxi nagiosxi -e "SELECT COUNT(*) FROM xi_eventqueue"
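
To check for overlapping cron scripts more directly, one option is to count how many copies of each PHP script appear in the process list. This is only a sketch: the function name is made up, and the demo command lines use a hypothetical path because the real ps output was sent by PM.

```shell
#!/bin/sh
# Count how many copies of each PHP script appear in a process listing.
# Reads command lines (one per line) on stdin; prints "COUNT SCRIPT".
count_cron_scripts() {
    grep -oE '[A-Za-z0-9_]+\.php' | sort | uniq -c | awk '{ print $1, $2 }'
}

# Demo with made-up command lines; the paths are hypothetical.
count_cron_scripts <<'EOF'
/usr/bin/php -q /path/to/cron/eventman.php
/usr/bin/php -q /path/to/cron/eventman.php
/usr/bin/php -q /path/to/cron/event_handler.php
EOF

# On a live system:
#   ps -e -o args= | count_cron_scripts
```

A count above 1 for the same script would suggest that one run had not finished before the next one started.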

Re: 5.5.1 Httpd high load

Posted: Fri Jul 20, 2018 11:29 am
by Envera IT
jomann wrote:Well, it seems from your ps -ef output that multiple cron PHP scripts are running, and I think they are overlapping one another, at least event_handler.php and eventman.php. Those interact heavily with the database. Can you run a count on the xi_eventqueue table and see how many events are in the queue?

Code:

mysql -pnagiosxi nagiosxi -e "SELECT COUNT(*) FROM xi_eventqueue"
That command didn't work for us; there is no "nagiosxi" database in MySQL. After playing around with it, here is some data for you.

Code:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| nagios             |
| nagiosql           |
| test               |
+--------------------+
5 rows in set (0.05 sec)

mysql> use nagios
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+----------------------------------------+
| Tables_in_nagios                       |
+----------------------------------------+
| nagios_acknowledgements                |
| nagios_commands                        |
| nagios_commenthistory                  |
| nagios_comments                        |
| nagios_configfiles                     |
| nagios_configfilevariables             |
| nagios_conninfo                        |
| nagios_contact_addresses               |
| nagios_contact_notificationcommands    |
| nagios_contactgroup_members            |
| nagios_contactgroups                   |
| nagios_contactnotificationmethods      |
| nagios_contactnotifications            |
| nagios_contacts                        |
| nagios_contactstatus                   |
| nagios_customvariables                 |
| nagios_customvariablestatus            |
| nagios_dbversion                       |
| nagios_downtimehistory                 |
| nagios_eventhandlers                   |
| nagios_externalcommands                |
| nagios_flappinghistory                 |
| nagios_host_contactgroups              |
| nagios_host_contacts                   |
| nagios_host_parenthosts                |
| nagios_hostchecks                      |
| nagios_hostdependencies                |
| nagios_hostescalation_contactgroups    |
| nagios_hostescalation_contacts         |
| nagios_hostescalations                 |
| nagios_hostgroup_members               |
| nagios_hostgroups                      |
| nagios_hosts                           |
| nagios_hoststatus                      |
| nagios_instances                       |
| nagios_logentries                      |
| nagios_notifications                   |
| nagios_objects                         |
| nagios_processevents                   |
| nagios_programstatus                   |
| nagios_runtimevariables                |
| nagios_scheduleddowntime               |
| nagios_service_contactgroups           |
| nagios_service_contacts                |
| nagios_service_parentservices          |
| nagios_servicechecks                   |
| nagios_servicedependencies             |
| nagios_serviceescalation_contactgroups |
| nagios_serviceescalation_contacts      |
| nagios_serviceescalations              |
| nagios_servicegroup_members            |
| nagios_servicegroups                   |
| nagios_services                        |
| nagios_servicestatus                   |
| nagios_statehistory                    |
| nagios_systemcommands                  |
| nagios_timedeventqueue                 |
| nagios_timedevents                     |
| nagios_timeperiod_timeranges           |
| nagios_timeperiods                     |
+----------------------------------------+
60 rows in set (0.00 sec)

mysql> close database;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'close database' at line 1
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| nagios             |
| nagiosql           |
| test               |
+--------------------+
5 rows in set (0.00 sec)

mysql> use nagiosql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+--------------------------------------------+
| Tables_in_nagiosql                         |
+--------------------------------------------+
| tbl_command                                |
| tbl_contact                                |
| tbl_contactgroup                           |
| tbl_contacttemplate                        |
| tbl_domain                                 |
| tbl_host                                   |
| tbl_hostdependency                         |
| tbl_hostescalation                         |
| tbl_hostextinfo                            |
| tbl_hostgroup                              |
| tbl_hosttemplate                           |
| tbl_info                                   |
| tbl_lnkContactToCommandHost                |
| tbl_lnkContactToCommandService             |
| tbl_lnkContactToContactgroup               |
| tbl_lnkContactToContacttemplate            |
| tbl_lnkContactToVariabledefinition         |
| tbl_lnkContactgroupToContact               |
| tbl_lnkContactgroupToContactgroup          |
| tbl_lnkContacttemplateToCommandHost        |
| tbl_lnkContacttemplateToCommandService     |
| tbl_lnkContacttemplateToContactgroup       |
| tbl_lnkContacttemplateToContacttemplate    |
| tbl_lnkContacttemplateToVariabledefinition |
| tbl_lnkHostToContact                       |
| tbl_lnkHostToContactgroup                  |
| tbl_lnkHostToHost                          |
| tbl_lnkHostToHostgroup                     |
| tbl_lnkHostToHosttemplate                  |
| tbl_lnkHostToVariabledefinition            |
| tbl_lnkHostdependencyToHost_DH             |
| tbl_lnkHostdependencyToHost_H              |
| tbl_lnkHostdependencyToHostgroup_DH        |
| tbl_lnkHostdependencyToHostgroup_H         |
| tbl_lnkHostescalationToContact             |
| tbl_lnkHostescalationToContactgroup        |
| tbl_lnkHostescalationToHost                |
| tbl_lnkHostescalationToHostgroup           |
| tbl_lnkHostgroupToHost                     |
| tbl_lnkHostgroupToHostgroup                |
| tbl_lnkHosttemplateToContact               |
| tbl_lnkHosttemplateToContactgroup          |
| tbl_lnkHosttemplateToHost                  |
| tbl_lnkHosttemplateToHostgroup             |
| tbl_lnkHosttemplateToHosttemplate          |
| tbl_lnkHosttemplateToVariabledefinition    |
| tbl_lnkServiceToContact                    |
| tbl_lnkServiceToContactgroup               |
| tbl_lnkServiceToHost                       |
| tbl_lnkServiceToHostgroup                  |
| tbl_lnkServiceToServicegroup               |
| tbl_lnkServiceToServicetemplate            |
| tbl_lnkServiceToVariabledefinition         |
| tbl_lnkServicedependencyToHost_DH          |
| tbl_lnkServicedependencyToHost_H           |
| tbl_lnkServicedependencyToHostgroup_DH     |
| tbl_lnkServicedependencyToHostgroup_H      |
| tbl_lnkServicedependencyToService_DS       |
| tbl_lnkServicedependencyToService_S        |
| tbl_lnkServiceescalationToContact          |
| tbl_lnkServiceescalationToContactgroup     |
| tbl_lnkServiceescalationToHost             |
| tbl_lnkServiceescalationToHostgroup        |
| tbl_lnkServiceescalationToService          |
| tbl_lnkServicegroupToService               |
| tbl_lnkServicegroupToServicegroup          |
| tbl_lnkServicetemplateToContact            |
| tbl_lnkServicetemplateToContactgroup       |
| tbl_lnkServicetemplateToHost               |
| tbl_lnkServicetemplateToHostgroup          |
| tbl_lnkServicetemplateToServicegroup       |
| tbl_lnkServicetemplateToServicetemplate    |
| tbl_lnkServicetemplateToVariabledefinition |
| tbl_lnkTimeperiodToTimeperiod              |
| tbl_logbook                                |
| tbl_mainmenu                               |
| tbl_permission                             |
| tbl_permission_inactive                    |
| tbl_service                                |
| tbl_servicedependency                      |
| tbl_serviceescalation                      |
| tbl_serviceextinfo                         |
| tbl_servicegroup                           |
| tbl_servicetemplate                        |
| tbl_session                                |
| tbl_session_locks                          |
| tbl_settings                               |
| tbl_submenu                                |
| tbl_timedefinition                         |
| tbl_timeperiod                             |
| tbl_user                                   |
| tbl_variabledefinition                     |
+--------------------------------------------+
92 rows in set (0.00 sec)

Re: 5.5.1 Httpd high load

Posted: Fri Jul 20, 2018 1:50 pm
by scottwilkerson
You must have a Postgres-based system.

Let's run

Code:

echo "SELECT COUNT(*) FROM xi_eventqueue;"|psql nagiosxi nagiosxi

Re: 5.5.1 Httpd high load

Posted: Fri Jul 20, 2018 2:50 pm
by Envera IT
scottwilkerson wrote:You must have a Postgres-based system.

Let's run

Code:

echo "SELECT COUNT(*) FROM xi_eventqueue;"|psql nagiosxi nagiosxi
Appears so.

Code:


[root@srq-nagios-xi1 ~]# echo "SELECT COUNT(*) FROM xi_eventqueue;"|psql nagiosxi nagiosxi

count
-------
     1
(1 row)
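
A count of 1 does not look like an event backlog. For repeated checks, the result could be classified with a small helper like the sketch below. The function name and the default threshold of 100 are arbitrary assumptions for illustration, not Nagios XI values.

```shell
#!/bin/sh
# Classify a queue depth against a threshold.
# The default threshold of 100 is an arbitrary assumption.
queue_status() {
    count=$1
    threshold=${2:-100}
    if [ "$count" -gt "$threshold" ]; then
        echo "backlog ($count queued)"
    else
        echo "ok ($count queued)"
    fi
}

# The count returned above was 1:
queue_status 1

# On a live system, psql's -A (unaligned) and -t (tuples only) flags
# give a bare number that can be passed straight in:
#   queue_status "$(psql -At -U nagiosxi -d nagiosxi -c 'SELECT COUNT(*) FROM xi_eventqueue;')"
```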


Re: 5.5.1 Httpd high load

Posted: Fri Jul 20, 2018 3:29 pm
by npolovenko
@Ehamby, please send in your Nagios XI System Profile.
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and upload it to a cloud storage service of your choice. You can share a link with me in a personal message.
After you upload the profile, please post something in this thread to bring it back up in the support queue.


*Profile was received and shared with the support team