High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
-
mslominski
- Posts: 7
- Joined: Sun Feb 01, 2015 9:11 am
High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
Last week we upgraded from 5.4.13 to 5.5.2-1 using the official Nagios offline RPM packages. Because we use mod_gearman, we downgraded the nagiosxi-nagioscore package to version 5.4.13 (Core 4.2.4), as described in the KB.
The setup works, but not efficiently. MySQL CPU usage increased from 20% to 200% (2 vCPUs), and the IPC message queues are full all the time and do not return to their normal level (max 1000). Checks are no longer executed as regularly as before.
We have run out of ideas for finding the source of this problem, so we are posting it on the forum.
Technical information:
- RHEL 6.6 64bit - Kernel version 2.6.32-754.el6.x86_64
- VMware server: 10vCPU, 16 GB of RAM
- SSL configured and listening on 443, PHP Version: 5.3.3
- Gnome is not installed
- Total Services: 23823
- Total Hosts: 5244
- MySQL keeps only 7 days of Nagios data (Nagios DBs size is 1 GB)
- MySQL statistics: 47M q [717.191 qps], 170K conn, TX: 756G, RX: 137G, Reads / Writes: 5% / 95%
- All the checks are executed by the mod-gearman servers
Kernel configuration:
- kernel.msgmnb = 262144000
- kernel.msgmax = 262144000
- kernel.shmmax = 4294967295
- kernel.shmall = 268435456
- kernel.msgmni = 512000
- vm.swappiness = 10
MySQL configuration:
[mysqld]
#innodb_file_per_table=1
datadir=/var/lib/mysql
max_allowed_packet=512M
socket=/var/lib/mysql/mysql.sock
user=mysql
join_buffer_size=524288
key_buffer_size=211M
log-queries-not-using-indexes=1
long_query_time=1
max_connections=500
max_heap_table_size=64M
read_buffer_size=524288
read_rnd_buffer_size=524288
query_cache_type=1
query_cache_min_res_unit=2k
query_cache_limit=4M
query_cache_size=80M
skip-name-resolve
slow-query-log=0
slow-query-log-file=/var/log/mysqld-slow.log
symbolic-links=0
table_open_cache=1024
thread_cache_size=10
tmp_table_size=64M
Installed RPM:
- mysql-5.1.73-8.el6_8.x86_64
- nagiosxi-5.5.2-1.el6.x86_64
- nagiosxi-ajaxterm-5-4.13.el6.x86_64
- nagiosxi-mrtg-5.5.2-1.el6.x86_64
- nagiosxi-nagioscore-5-4.13.el6.x86_64
- nagiosxi-nagiosmobile-5.5.2-1.el6.x86_64
- nagiosxi-nagiosplugins-5.5.2-1.el6.x86_64
- nagiosxi-nagiosql-5-4.13.el6.x86_64
- nagiosxi-nagvis-5.5.2-1.el6.x86_64
- nagiosxi-ndoutils-5.5.2-1.el6.x86_64
- nagiosxi-nrds-5.5.2-1.el6.x86_64
- nagiosxi-nrpe-5.5.2-1.el6.x86_64
- nagiosxi-nsca-5.5.2-1.el6.x86_64
- nagiosxi-nxti-5.5.2-1.el6.x86_64
- nagiosxi-pnp-5.5.2-1.el6.x86_64
- nagiosxi-shellinabox-5.5.2-1.el6.x86_64
- nagiosxi-wkhtmltox-5.5.2-1.el6.x86_64
- nagiosxi-wmic-5.5.2-1.el6.x86_64
- gearmand-0.33-2.x86_64
- gearmand-devel-0.33-2.x86_64
- gearmand-server-0.33-2.x86_64
- mod_gearman2-2.1.1-1.el6.x86_64
Thank you in advance for your help.
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
I would take a look at https://assets.nagios.com/downloads/nag ... ios-XI.pdf, specifically the parts about modifying the reaper settings as described, enabling the unified page settings, increasing the dashlet refresh multiplier, disabling auto-running, and enabling backend cache. If these don't seem to help, please PM me a profile (Admin > System Config > System Profile > Download Profile).
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
mslominski
- Posts: 7
- Joined: Sun Feb 01, 2015 9:11 am
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
Hello Scott,
Thank you for the first feedback.
I had already applied most of your recommendations, especially those concerning the dashlet refresh multiplier and disabling auto-running (both of them are off). The Dashlet Refresh Multiplier is set to 120000. We cannot enable the Backend Cache because results must be visible to end users in real time and we add and remove hosts and services frequently.
We also see many HTTP GET requests to /nagiosxi/ajaxhelper.php that take a long time (2-5 seconds).
The biggest issue is that the IPC message queue fills up (256000 messages) and the whole system stops: no mod_gearman (MG) requests go to the workers, and the system freezes.
We need to resolve this issue this week; otherwise we will roll back to 5.4.13.
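For anyone tracking the same symptom, here is a minimal sketch for measuring how full the message queue is. It assumes Python 3 and the usual Linux `ipcs -q` column layout (key, msqid, owner, perms, used-bytes, messages). The function names are illustrative and not part of Nagios XI.

```python
import subprocess


def parse_ipcs_queues(text):
    """Parse `ipcs -q` output into a list of dicts, one per message queue."""
    queues = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hex key (e.g. 0x4d010001); skip headers and blanks.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        queues.append({
            "msqid": int(fields[1]),
            "owner": fields[2],
            "used_bytes": int(fields[4]),
            "messages": int(fields[5]),
        })
    return queues


def fill_ratio(queue, msgmnb):
    """Fraction of the per-queue byte limit (kernel.msgmnb) currently in use."""
    return queue["used_bytes"] / msgmnb


def read_live_queues():
    """Run `ipcs -q` on the local host and parse the result."""
    out = subprocess.check_output(["ipcs", "-q"], universal_newlines=True)
    return parse_ipcs_queues(out)
```

Calling `read_live_queues()` once a minute and logging `fill_ratio(q, 262144000)` would show whether the queue drains or only grows. With the values posted in this thread, kernel.msgmnb = 262144000 and a full queue at 256000 messages works out to 1024 bytes per message, which would be consistent with the queue hitting its byte limit. That reading is an inference from the numbers, not something confirmed here.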
-
mslominski
- Posts: 7
- Joined: Sun Feb 01, 2015 9:11 am
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
Nagios XI - System Info
System
Nagios XI version: 5.5.2
XI installed from: rpm
XI UUID: 4b5c132e-52e5-4f68-b221-c93e20538021
Release info: mpllnx0203 2.6.32-754.el6.x86_64 x86_64
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Server Name: nagios-eu.xxxxxxx.com
Server Address: 10.216.66.234
Server Port: 443
Date/Time
PHP Timezone: Europe/Paris
PHP Time: Mon, 01 Oct 2018 11:02:37 +0200
System Time: Mon, 01 Oct 2018 11:02:37 +0200
Nagios XI Data
License ends in:
UUID:
Install Type: manual/unknown
nagios (pid 12202) is running...
NPCD running (pid 12092).
ndo2db (pid 12085) is running...
CPU Load 15: 7.65
Total Hosts: 5315
Total Services: 23995
Function get_base_uri() returns: https://nagios-eu.xxxxxxx.com/nagiosxi/
Function get_base_url() returns: https://nagios-eu.xxxxxxx.com/nagiosxi/
Function get_backend_url(internal_call=false) returns: https://nagios-eu.xxxxxxx.com/nagiosxi/includes/components/profile/profile.php
Function get_backend_url(internal_call=true) returns: https://localhost/nagiosxi/backend/
Ping Test localhost
Running:
/bin/ping -c 3 localhost 2>&1
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.044 ms
--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.028/0.036/0.044/0.008 ms
Test wget To localhost
WGET From URL: https://localhost/nagiosxi/includes/components/ccm/
Running:
/usr/bin/wget https://localhost/nagiosxi/includes/components/ccm/
--2018-10-01 11:02:39-- https://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1, ::1
Connecting to localhost|127.0.0.1|:443... connected.
ERROR: cannot verify localhost's certificate, issued by "/C=FR/O=XXXX/CN=XXXX_PKI_SubCA6":
Unable to locally verify the issuer's authority.
ERROR: no certificate subject alternative name matches
requested host name "localhost".
To connect to localhost insecurely, use '--no-check-certificate'.
Network Settings
1: lo: mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
2: eth0: mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:50:56:bd:50:fe brd ff:ff:ff:ff:ff:ff
inet 10.216.66.234/23 brd 10.216.67.255 scope global eth0
10.216.66.0/23 dev eth0 proto kernel scope link src 10.216.66.234
default via 10.216.66.6 dev eth0
Nagios XI Components
actions 2.0.1
active_directory 0.3
alertcloud 1.2.1
alertstream 2.1.0
autodiscovery 2.2.5
backendapiurl 1.0.3
bandwidthreport 1.8.0
bbmap 1.2.0
birdseye 3.2.2
bulkmodifications 2.2.0
capacityplanning 2.3.0
ccm 2.7.0
custom-includes 1.0.4
customlogin 1.0.0
customlogo 1.2.0
deploydashboard 1.3.0
deploynotification 1.3.3
duo 1.0.0
escalationwizard 1.5.0
freevariabletab 1.0.1
globaleventhandler 1.2.2
graphexplorer 2.2.0
helpsystem 2.0.0
highcharts 4.0.1
homepagemod 1.1.7
hypermap 1.1.6
hypermap_replay 1.2.0
isms 1.2.3
latestalerts 1.2.6
ldap_ad_integration 1.1.0
ldapauth 0.3
massacknowledge 2.1.14
metrics 1.2.10
minemap 1.2.4
nagiosbpi 2.7.1
nagioscore
nagioscorecfg
nagiosim 2.2.6
nagiosna 1.4.0
nagiosql
nagiosxicustomendpoints 1.0
nagiosxicustomendpointsapm 1.0
nagiosxicustomendpointsapmnew 1.0
nagvis 2.0.0
nocscreen 1.1.2
nrdsconfigmanager 1.6.4
nxti 1.0.0
opscreen 1.8.0
perfdata
performancedatatool 2016-05-05
pingaction 1.1.1
pnp
profile 1.4.0
proxy 1.1.4
rdp 1.0.3
rename 1.6.0
rssnotifications 1.2
scheduledbackups 1.2.0
scheduledreporting
similetimeline 1.5.0
snmptrapsender 1.5.5
statusmap 1.0.2
tracerouteaction 1.1.1
usermacros 1.1.0
xicore
Nagios XI Config Wizards
ec2 1.0.0
s3 1.0.0
autodiscovery 1.4.1
bpiwizard 1.1.4
bulkhostimport 2.0.4
digitalocean 1.0.0
google-cloud 1.0.0
linode 1.0.0
microsoft-azure 1.0.0
rackspace 1.0.0
dhcp 1.1.4
dnsquery 1.1.3
docker 1.0.0
domain_expiration 1.1.4
email-delivery 2.0.4
esensors_websensor 1.1.4
exchange 1.3.2
folder_watch 1.0.5
ftpserver 1.5.5
genericnetdevice 1.0.3
ldapserver 1.3.3
linux-server 1.5.5
linux_snmp 1.5.4
macosx 1.3.0
mailserver 1.2.4
mongodb_database 1.1.2
mongodbserver 1.1.2
mountpoint 1.0.2
mssql_database 1.6.2
mssql_query 1.6.4
mssql_server 1.9.1
mysqlquery 1.2.3
mysqlserver 1.3.3
nagioslogserver 1.0.5
nagiostats 1.2.3
nagiosxiserver 1.3.0
ncpa 2.0.0
nna 1.0.4
nrpe 1.5.2
oraclequery 1.3.3
oracleserverspace 1.5.3
oracletablespace 1.5.4
passivecheck 1.2.4
passiveobject 1.1.3
postgresdb 1.5.3
postgresquery 1.2.3
postgresserver 1.3.4
printer 1.1.3
radiusserver 2.0.1
sla 1.3.2
snmp 1.5.8
snmp_trap 1.5.3
snmpwalk 1.3.6
solaris 1.2.5
sshproxy 1.5.7
switch 2.4.0
tcpudpport 1.3.3
tftp 1.0.2
vmware 1.7.1
watchguard 1.4.5
website 1.3.0
website_defacement 1.1.5
websiteurl 1.3.7
webtransaction 1.2.5
windowseventlog 1.3.3
windowsserver 1.6.1
windowsdesktop 1.6.1
windowssnmp 1.5.1
windowswmi 2.1.0
Nagios XI Dashlets
alertcloud
bbmap
capacityplanning
graphexplorer
hypermap
latestalerts
metrics
metricsguage
minemap
xicore_xi_news_feed
xicore_getting_started
xicore_admin_tasks
xicore_eventqueue_chart
xicore_component_status
xicore_server_stats
xicore_monitoring_stats
xicore_monitoring_perf
xicore_monitoring_process
xicore_perfdata_chart
xicore_host_status_summary
xicore_service_status_summary
xicore_comments
xicore_hostgroup_status_overview
xicore_hostgroup_status_grid
xicore_servicegroup_status_overview
xicore_servicegroup_status_grid
xicore_hostgroup_status_summary
xicore_servicegroup_status_summary
xicore_available_updates
xicore_network_outages
xicore_network_outages_summary
xicore_network_health
xicore_host_status_tac_summary
xicore_service_status_tac_summary
xicore_feature_status_tac_summary
availability
custom_dashlet 1.0.5
gauges 1.2.2
googlemapdashlet 1.1.0
internethealthreport
internettrafficreport
rss_dashlet 1.1.0
sansrisingports 2.0
sla
worldtimeserver 2.0.0
-
mslominski
- Posts: 7
- Joined: Sun Feb 01, 2015 9:11 am
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
One important piece of information: when we see that the IPC message queue is full, we stop the HTTP service for 5 minutes. The number of messages in the queue drops back to 100 000. Then we start the HTTP service again.
The CPU usage of the mysql service is very high:
# ps aux | grep mysql
root 24204 0.0 0.0 106116 1384 pts/2 S 08:28 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql 24364 196 4.4 5791660 732052 pts/2 Sl 08:28 708:34 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
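The `ps` output shows mysqld using roughly two cores but not what it is busy with. As a rough sketch, assuming Python 3 and the tab-separated output of `mysql -B -e "SHOW FULL PROCESSLIST"`, the helper below counts connections per (Command, State) pair. The function name is hypothetical and not part of any Nagios tooling.

```python
from collections import Counter


def summarize_processlist(tsv_text):
    """Count rows of a tab-separated SHOW FULL PROCESSLIST dump by (Command, State)."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    if not lines:
        return Counter()
    header = lines[0].split("\t")
    cmd_idx = header.index("Command")
    state_idx = header.index("State")
    counts = Counter()
    for line in lines[1:]:
        cols = line.split("\t")
        # Skip rows that are too short to contain both columns.
        if len(cols) <= max(cmd_idx, state_idx):
            continue
        counts[(cols[cmd_idx], cols[state_idx])] += 1
    return counts
```

Feeding it a few snapshots taken while the CPU is high, then printing `summarize_processlist(text).most_common(5)`, should indicate whether the load comes mostly from writes or from web UI queries.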
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
You can click my username to bring up a link to send me a private message with the profile. Please also send it to @Nagios Support and update this thread once it has been sent.
-
mslominski
- Posts: 7
- Joined: Sun Feb 01, 2015 9:11 am
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
Hello
The requested profile has been uploaded.
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
I've received the profile and am going over it now. I noticed that the reaper settings are still the same. Please make the suggested change, monitor the system, and let me know if that helps with the queue.
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
We've identified a problem that can cause slowness and a fix will be included in 5.5.5. I don't have an official release date but patch versions are put out fairly frequently and I would recommend testing with that version once it is available. Change logs and roadmaps can be found at:
https://assets.nagios.com/downloads/nag ... NGES-5.TXT
https://www.nagios.com/roadmaps/
Re: High MySQL CPU & IPC MQ full after upgrade to 5.5.2-1
Hi, we had the same problem running 5.5.2-1 and tried all the tuning and fixes we could find, without any luck.
Since we upgraded to version 5.5.5 the system has been very stable: MySQL CPU load on 5.5.5 is less than 20% and the IPC MQ is empty.
Before the upgrade, on v5.5.2-1, MySQL CPU averaged 350% and the IPC MQ started to fill up after 1 minute. After 20-30 minutes the system stopped updating (Nagios Core was working normally the whole time).