High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Thu Sep 27, 2018 1:20 am
by mslominski
Last week, we upgraded from 5.4.13 to 5.5.2-1 using the official Nagios offline RPM packages. Because we use mod_gearman, we downgraded the nagiosxi-nagioscore package to version 5.4.13 (Core 4.2.4), according to the KB.

The setup works, but not efficiently. MySQL CPU usage increased from 20% to 200% (2 vCPU), the IPC message queues are full all the time and do not return to their normal level (max 1000), and the checks are no longer executed as regularly as before.

We have run out of ideas for finding the source of this problem, so we are posting it on the forum.

Technical information:
- RHEL 6.6 64bit - Kernel version 2.6.32-754.el6.x86_64
- VMware server: 10vCPU, 16 GB of RAM
- SSL configured and listen on 443, PHP Version: 5.3.3
- Gnome is not installed
- Total Services: 23823
- Total Hosts: 5244
- MySQL keeps only 7 days of Nagios data (Nagios DBs size is 1 GB)
- MySQL statistics: 47M q [717.191 qps], 170K conn, TX: 756G, RX: 137G, Reads / Writes: 5% / 95%
- All the checks are executed by the mod-gearman servers

Kernel configuration:
- kernel.msgmnb = 262144000
- kernel.msgmax = 262144000
- kernel.shmmax = 4294967295
- kernel.shmall = 268435456
- kernel.msgmni = 512000
- vm.swappiness = 10

MySQL configuration:
[mysqld]
#innodb_file_per_table=1
datadir=/var/lib/mysql
max_allowed_packet=512M
socket=/var/lib/mysql/mysql.sock
user=mysql
join_buffer_size=524288
key_buffer_size=211M
log-queries-not-using-indexes=1
long_query_time=1
max_connections=500
max_heap_table_size=64M
read_buffer_size=524288
read_rnd_buffer_size=524288
query_cache_type=1
query_cache_min_res_unit=2k
query_cache_limit=4M
query_cache_size=80M
skip-name-resolve
slow-query-log=0
slow-query-log-file=/var/log/mysqld-slow.log
symbolic-links=0
table_open_cache=1024
thread_cache_size=10
tmp_table_size=64M
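
One detail in the configuration above is that `long_query_time` and a slow log file are set while `slow-query-log=0`. The sketch below, which is only an illustration and not part of the original setup, reads a `[mysqld]` section and reports whether the slow query log is switched on. The helper names `load_mysqld_options` and `slow_log_enabled` are hypothetical, and `SAMPLE` is an excerpt of the posted settings.

```python
import configparser


def load_mysqld_options(text):
    """Return the [mysqld] options as a dict.

    MySQL accepts '-' and '_' interchangeably in option names,
    so keys are normalised to use '_'.
    """
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    return {key.replace("-", "_"): value for key, value in parser["mysqld"].items()}


def slow_log_enabled(options):
    """True when slow_query_log is present without a value or set to 1/ON."""
    if "slow_query_log" not in options:
        return False
    value = options["slow_query_log"]
    if value is None:  # bare option with no "=value"
        return True
    return value.strip().lower() in ("1", "on", "true")


# Excerpt of the configuration posted above.
SAMPLE = """\
[mysqld]
long_query_time=1
log-queries-not-using-indexes=1
skip-name-resolve
slow-query-log=0
slow-query-log-file=/var/log/mysqld-slow.log
"""

options = load_mysqld_options(SAMPLE)
print("slow query log enabled:", slow_log_enabled(options))
print("long_query_time:", options["long_query_time"])
```

With the log enabled, statements slower than `long_query_time` would be written to the configured file, which may help narrow down which queries drive the CPU usage.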

Installed RPM:
- mysql-5.1.73-8.el6_8.x86_64
- nagiosxi-5.5.2-1.el6.x86_64
- nagiosxi-ajaxterm-5-4.13.el6.x86_64
- nagiosxi-mrtg-5.5.2-1.el6.x86_64
- nagiosxi-nagioscore-5-4.13.el6.x86_64
- nagiosxi-nagiosmobile-5.5.2-1.el6.x86_64
- nagiosxi-nagiosplugins-5.5.2-1.el6.x86_64
- nagiosxi-nagiosql-5-4.13.el6.x86_64
- nagiosxi-nagvis-5.5.2-1.el6.x86_64
- nagiosxi-ndoutils-5.5.2-1.el6.x86_64
- nagiosxi-nrds-5.5.2-1.el6.x86_64
- nagiosxi-nrpe-5.5.2-1.el6.x86_64
- nagiosxi-nsca-5.5.2-1.el6.x86_64
- nagiosxi-nxti-5.5.2-1.el6.x86_64
- nagiosxi-pnp-5.5.2-1.el6.x86_64
- nagiosxi-shellinabox-5.5.2-1.el6.x86_64
- nagiosxi-wkhtmltox-5.5.2-1.el6.x86_64
- nagiosxi-wmic-5.5.2-1.el6.x86_64
- gearmand-0.33-2.x86_64
- gearmand-devel-0.33-2.x86_64
- gearmand-server-0.33-2.x86_64
- mod_gearman2-2.1.1-1.el6.x86_64

Thank you in advance for your help.

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Thu Sep 27, 2018 1:44 pm
by cdienger
I would take a look at https://assets.nagios.com/downloads/nag ... ios-XI.pdf, specifically the parts about modifying the reaper settings, enabling the unified page settings, increasing the dashlet refresh multiplier, disabling auto-running, and enabling backend cache. If these don't seem to help, please PM me a profile (Admin > System Config > System Profile > Download Profile).

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Mon Oct 01, 2018 4:04 am
by mslominski
Hello Scott,

Thank you for the first feedback.

I had already applied most of your recommendations, especially those concerning the dashlet refresh multiplier and disabling auto-running (both are off). The Dashlet Refresh Multiplier is set to 120000. We cannot enable the Backend Cache because the results must be visible to end users in real time and we add and remove hosts and services frequently.

We noticed many HTTP GET requests for /nagiosxi/ajaxhelper.php that take a long time (2-5 seconds).

The biggest issue is that the IPC message queue fills up (256000 messages) and the whole system stops: no MG requests are sent to the workers, and the system freezes.

We need to resolve this issue this week; otherwise we will roll back to 5.4.13.
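
To watch the queue depth described above, the output of `ipcs -q` can be parsed and compared against `kernel.msgmnb`. This is a minimal sketch, assuming the usual Linux column layout (key, msqid, owner, perms, used-bytes, messages). The function names are hypothetical, and the row in `SAMPLE` is invented for illustration: its key, owner and byte count are assumptions, not values taken from this system.

```python
def parse_ipcs_queues(output):
    """Parse `ipcs -q` text into a list of dicts, one per message queue."""
    queues = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns and start with a hex key such as 0x4d000001.
        if len(fields) == 6 and fields[0].startswith("0x"):
            queues.append({
                "key": fields[0],
                "msqid": int(fields[1]),
                "owner": fields[2],
                "used_bytes": int(fields[4]),
                "messages": int(fields[5]),
            })
    return queues


def fill_ratio(queue, msgmnb):
    """Fraction of the per-queue byte limit (kernel.msgmnb) currently in use."""
    return queue["used_bytes"] / msgmnb


# Hypothetical sample row; only the message count (256000) comes from the post.
SAMPLE = """\
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x4d000001 0          nagios     600        262144000    256000
"""
MSGMNB = 262144000  # kernel.msgmnb as listed in the first post

for q in parse_ipcs_queues(SAMPLE):
    avg = q["used_bytes"] // q["messages"] if q["messages"] else 0
    print(f"queue {q['msqid']}: {q['messages']} messages, "
          f"{fill_ratio(q, MSGMNB):.0%} of msgmnb, about {avg} bytes per message")
```

Note that 262144000 / 256000 = 1024, so a queue that stops at 256000 messages would be consistent with messages of roughly 1024 bytes each hitting the `msgmnb` byte limit. That is an inference from the numbers in the posts, not something confirmed in the thread.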

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Mon Oct 01, 2018 4:13 am
by mslominski

Code:

Nagios XI - System Info
System
Nagios XI version: 5.5.2
XI installed from: rpm
XI UUID: 4b5c132e-52e5-4f68-b221-c93e20538021
Release info: mpllnx0203 2.6.32-754.el6.x86_64 x86_64
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Server Name: nagios-eu.xxxxxxx.com
Server Address: 10.216.66.234
Server Port: 443
Date/Time
PHP Timezone: Europe/Paris
PHP Time: Mon, 01 Oct 2018 11:02:37 +0200
System Time: Mon, 01 Oct 2018 11:02:37 +0200
Nagios XI Data
License ends in: 
UUID: 
Install Type: manual/unknown

nagios (pid 12202) is running...
NPCD running (pid 12092).
ndo2db (pid 12085) is running...
CPU Load 15: 7.65
Total Hosts: 5315
Total Services: 23995

Function get_base_uri() returns: https://nagios-eu.xxxxxxx.com/nagiosxi/
Function get_base_url() returns: https://nagios-eu.xxxxxxx.com/nagiosxi/
Function get_backend_url(internal_call=false) returns: https://nagios-eu.xxxxxxx.com/nagiosxi/includes/components/profile/profile.php
Function get_backend_url(internal_call=true) returns: https://localhost/nagiosxi/backend/

Ping Test localhost
Running:

/bin/ping -c 3 localhost 2>&1 

PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.044 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.028/0.036/0.044/0.008 ms
Test wget To localhost
WGET From URL: https://localhost/nagiosxi/includes/components/ccm/
Running:

/usr/bin/wget https://localhost/nagiosxi/includes/components/ccm/ 

--2018-10-01 11:02:39-- https://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1, ::1
Connecting to localhost|127.0.0.1|:443... connected.
ERROR: cannot verify localhost's certificate, issued by "/C=FR/O=XXXX/CN=XXXX_PKI_SubCA6":
Unable to locally verify the issuer's authority.
ERROR: no certificate subject alternative name matches
requested host name "localhost".
To connect to localhost insecurely, use '--no-check-certificate'.
Network Settings

1: lo:  mtu 65536 qdisc noqueue state UNKNOWN 

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

2: eth0:  mtu 1500 qdisc mq state UP qlen 1000

    link/ether 00:50:56:bd:50:fe brd ff:ff:ff:ff:ff:ff

    inet 10.216.66.234/23 brd 10.216.67.255 scope global eth0


10.216.66.0/23 dev eth0  proto kernel  scope link  src 10.216.66.234 

default via 10.216.66.6 dev eth0 


Nagios XI Components
actions	2.0.1
active_directory	0.3
alertcloud	1.2.1
alertstream	2.1.0
autodiscovery	2.2.5
backendapiurl	1.0.3
bandwidthreport	1.8.0
bbmap	1.2.0
birdseye	3.2.2
bulkmodifications	2.2.0
capacityplanning	2.3.0
ccm	2.7.0
custom-includes	1.0.4
customlogin	1.0.0
customlogo	1.2.0
deploydashboard	1.3.0
deploynotification	1.3.3
duo	1.0.0
escalationwizard	1.5.0
freevariabletab	1.0.1
globaleventhandler	1.2.2
graphexplorer	2.2.0
helpsystem	2.0.0
highcharts	4.0.1
homepagemod	1.1.7
hypermap	1.1.6
hypermap_replay	1.2.0
isms	1.2.3
latestalerts	1.2.6
ldap_ad_integration	1.1.0
ldapauth	0.3
massacknowledge	2.1.14
metrics	1.2.10
minemap	1.2.4
nagiosbpi	2.7.1
nagioscore	
nagioscorecfg	
nagiosim	2.2.6
nagiosna	1.4.0
nagiosql	
nagiosxicustomendpoints	1.0
nagiosxicustomendpointsapm	1.0
nagiosxicustomendpointsapmnew	1.0
nagvis	2.0.0
nocscreen	1.1.2
nrdsconfigmanager	1.6.4
nxti	1.0.0
opscreen	1.8.0
perfdata	
performancedatatool	2016-05-05
pingaction	1.1.1
pnp	
profile	1.4.0
proxy	1.1.4
rdp	1.0.3
rename	1.6.0
rssnotifications	1.2
scheduledbackups	1.2.0
scheduledreporting	
similetimeline	1.5.0
snmptrapsender	1.5.5
statusmap	1.0.2
tracerouteaction	1.1.1
usermacros	1.1.0
xicore	
Nagios XI Config Wizards
ec2	1.0.0
s3	1.0.0
autodiscovery	1.4.1
bpiwizard	1.1.4
bulkhostimport	2.0.4
digitalocean	1.0.0
google-cloud	1.0.0
linode	1.0.0
microsoft-azure	1.0.0
rackspace	1.0.0
dhcp	1.1.4
dnsquery	1.1.3
docker	1.0.0
domain_expiration	1.1.4
email-delivery	2.0.4
esensors_websensor	1.1.4
exchange	1.3.2
folder_watch	1.0.5
ftpserver	1.5.5
genericnetdevice	1.0.3
ldapserver	1.3.3
linux-server	1.5.5
linux_snmp	1.5.4
macosx	1.3.0
mailserver	1.2.4
mongodb_database	1.1.2
mongodbserver	1.1.2
mountpoint	1.0.2
mssql_database	1.6.2
mssql_query	1.6.4
mssql_server	1.9.1
mysqlquery	1.2.3
mysqlserver	1.3.3
nagioslogserver	1.0.5
nagiostats	1.2.3
nagiosxiserver	1.3.0
ncpa	2.0.0
nna	1.0.4
nrpe	1.5.2
oraclequery	1.3.3
oracleserverspace	1.5.3
oracletablespace	1.5.4
passivecheck	1.2.4
passiveobject	1.1.3
postgresdb	1.5.3
postgresquery	1.2.3
postgresserver	1.3.4
printer	1.1.3
radiusserver	2.0.1
sla	1.3.2
snmp	1.5.8
snmp_trap	1.5.3
snmpwalk	1.3.6
solaris	1.2.5
sshproxy	1.5.7
switch	2.4.0
tcpudpport	1.3.3
tftp	1.0.2
vmware	1.7.1
watchguard	1.4.5
website	1.3.0
website_defacement	1.1.5
websiteurl	1.3.7
webtransaction	1.2.5
windowseventlog	1.3.3
windowsserver	1.6.1
windowsdesktop	1.6.1
windowssnmp	1.5.1
windowswmi	2.1.0
Nagios XI Dashlets
alertcloud	
bbmap	
capacityplanning	
graphexplorer	
hypermap	
latestalerts	
metrics	
metricsguage	
minemap	
xicore_xi_news_feed	
xicore_getting_started	
xicore_admin_tasks	
xicore_eventqueue_chart	
xicore_component_status	
xicore_server_stats	
xicore_monitoring_stats	
xicore_monitoring_perf	
xicore_monitoring_process	
xicore_perfdata_chart	
xicore_host_status_summary	
xicore_service_status_summary	
xicore_comments	
xicore_hostgroup_status_overview	
xicore_hostgroup_status_grid	
xicore_servicegroup_status_overview	
xicore_servicegroup_status_grid	
xicore_hostgroup_status_summary	
xicore_servicegroup_status_summary	
xicore_available_updates	
xicore_network_outages	
xicore_network_outages_summary	
xicore_network_health	
xicore_host_status_tac_summary	
xicore_service_status_tac_summary	
xicore_feature_status_tac_summary	
availability	
custom_dashlet	1.0.5
gauges	1.2.2
googlemapdashlet	1.1.0
internethealthreport	
internettrafficreport	
rss_dashlet	1.1.0
sansrisingports	2.0
sla	
worldtimeserver	2.0.0
How can I share the system profile so that it is visible only to Nagios support and not to others?

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Mon Oct 01, 2018 7:30 am
by mslominski
One important piece of information: when we see that the IPC message queue is full, we stop the HTTP service for 5 minutes. The number of messages in the queue drops back to 100 000. Then we start the HTTP service again.

The CPU usage of the mysql service is very high:

# ps aux | grep mysql
root 24204 0.0 0.0 106116 1384 pts/2 S 08:28 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql 24364 196 4.4 5791660 732052 pts/2 Sl 08:28 708:34 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
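
For tracking this figure over time, a `ps aux` line can be split into its columns. The sketch below assumes the standard column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) and that no column before COMMAND contains spaces. `parse_ps_aux_line` is a hypothetical helper, and `LINE` is the mysqld line from the output above with its argument list shortened.

```python
def parse_ps_aux_line(line):
    """Split one `ps aux` row into a dict of the fields used here."""
    # Split on whitespace at most 10 times so the command keeps its arguments.
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("not a complete ps aux row: %r" % line)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "cpu_percent": float(fields[2]),
        "mem_percent": float(fields[3]),
        "rss_kib": int(fields[5]),
        "command": fields[10],
    }


# The mysqld row from the output above (arguments shortened).
LINE = ("mysql 24364 196 4.4 5791660 732052 pts/2 Sl 08:28 708:34 "
        "/usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql")

proc = parse_ps_aux_line(LINE)
print("pid:", proc["pid"])
print("cpu percent:", proc["cpu_percent"])
print("cores busy (approx.):", proc["cpu_percent"] / 100)
```

Keep in mind that the %CPU column of `ps` is CPU time divided by the time the process has been running, so it is an average over the process lifetime rather than an instantaneous reading.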

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Mon Oct 01, 2018 1:30 pm
by cdienger
You can click my username to bring up a link to send me a private message with the profile. Please also send it to @Nagios Support and update this thread once it has been sent.

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Mon Oct 01, 2018 3:16 pm
by mslominski
Hello

The requested profile has been uploaded.

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Tue Oct 02, 2018 12:20 pm
by cdienger
I've received the profile and am going over it now. I noticed that the reaper settings are still the same. Please make the suggested change, monitor the system, and let me know if that helps with the queue.

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Fri Oct 05, 2018 9:24 am
by cdienger
We've identified a problem that can cause slowness and a fix will be included in 5.5.5. I don't have an official release date but patch versions are put out fairly frequently and I would recommend testing with that version once it is available. Change logs and roadmaps can be found at:

https://assets.nagios.com/downloads/nag ... NGES-5.TXT
https://www.nagios.com/roadmaps/

Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1

Posted: Sun Oct 21, 2018 3:53 pm
by Interrex
Hi, we had the same problem running 5.5.2-1 and tried all the tuning and fixes we could find without any luck.
After we upgraded to version 5.5.5, the system has been very stable: CPU load for MySQL is less than 20% and the IPC MQ is empty.
Before the upgrade, on v5.5.2-1, MySQL CPU usage averaged 350% and the IPC MQ started to fill up after 1 minute. After 20-30 minutes the system stopped updating (Nagios Core was working normally the whole time ;) ).