High Load CPU

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
asardouk
Posts: 31
Joined: Thu Feb 25, 2016 4:49 am

High Load CPU

Post by asardouk »

Hello,

I have an issue with high CPU load on Nagios XI. Below is my system configuration:

Code: Select all

Nagios XI 5.2.5
CentOS 7
429 Hosts
3163 Services
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Model name:            Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
RAM: 8 GB
Swap: 4 GB
The system has already reached a load average of 60; at that point data collection stopped and everything was very slow. I tried the steps in this article: https://support.nagios.com/kb/article.p ... ategory=44
That didn't help, so I just stopped the service checks and then re-enabled them, and the load went down to between 2 and 4. For almost two months since then, the load has been going up again, so I decided to analyse this issue in depth and resolve it once and for all.

[Image]

Almost all of the 429 hosts have 8 services each:
check_xi_service_mrtgtraf
check_xi_service_ping
3 shell scripts with snmpwalk and snmpget
3 shell scripts with snmpget

Check frequency:
Host checks:
10 hosts: every 3 minutes
the rest: every 5 minutes

Service checks (per host):
every 5 min: 2 service checks
every 7 min: 3 service checks
every 10 min: 3 service checks
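A rough estimate of the scheduler's workload can be derived from the intervals above. The sketch below is hypothetical (the helper name checks_per_minute is made up for illustration) and assumes every one of the 429 hosts carries all 8 services, so it slightly overestimates: the thread reports 3163 services rather than 429 × 8 = 3432.

```shell
#!/bin/sh
# Hypothetical helper: estimate checks per minute for a group of hosts.
# Usage: checks_per_minute HOSTS COUNT:INTERVAL_MIN [COUNT:INTERVAL_MIN ...]
# where each pair means "COUNT checks per host, each run every INTERVAL_MIN minutes".
checks_per_minute() {
    hosts=$1
    shift
    # POSIX shell arithmetic is integer-only, so awk does the division.
    printf '%s\n' "$@" | awk -F: -v hosts="$hosts" '
        { per_host += $1 / $2 }
        END { printf "%.1f\n", hosts * per_host }'
}

# Services (assumes all 429 hosts carry the same 8 services):
checks_per_minute 429 2:5 3:7 3:10   # prints 484.2
# Hosts: 10 checked every 3 minutes, the remaining 419 every 5 minutes.
checks_per_minute 10 1:3             # prints 3.3
checks_per_minute 419 1:5            # prints 83.8
```

Added together, that is roughly 571 checks per minute, or about 9.5 per second. Each active check runs as its own plugin process, and six of the eight services per host are shell scripts that in turn spawn snmpwalk or snmpget.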
I ran some tests on script execution time:

[Image: script execution times]

And I think the service check latency and execution time are fine:

[Image: service check latency and execution time]

I cleaned up about 335 of the 379 MRTG .cfg files.
I ran the script to repair the Nagios XI database.

Code: Select all

10:38:47     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:38:47     all   52.02    0.00   18.23    0.22    0.00    0.28    0.00    0.00    0.00   29.25
10:38:47       0   52.13    0.00   18.22    0.22    0.00    0.27    0.00    0.00    0.00   29.16
10:38:47       1   52.10    0.00   18.19    0.23    0.00    0.28    0.00    0.00    0.00   29.19
10:38:47       2   51.77    0.00   18.18    0.25    0.00    0.28    0.00    0.00    0.00   29.52
10:38:47       3   52.07    0.00   18.32    0.20    0.00    0.29    0.00    0.00    0.00   29.11

Code: Select all

10:43:27        CPU     %user     %nice   %system   %iowait    %steal     %idle
10:43:32        all     28.99      0.00     10.10      0.10      0.00     60.80
10:43:37        all     50.15      0.00     19.13      0.05      0.00     30.67
10:43:42        all     55.91      0.00     20.61      0.61      0.00     22.88
10:43:47        all     53.36      0.00     19.53      0.10      0.00     27.01
10:43:52        all     54.47      0.00     20.90      0.05      0.00     24.58
10:43:57        all     47.29      0.00     17.67      0.50      0.00     34.54
10:44:02        all     50.43      0.00     16.56      0.10      0.00     32.91
10:44:07        all     64.17      0.00     24.21      0.05      0.00     11.58
10:44:12        all     50.45      0.00     20.42      0.10      0.00     29.02
10:44:17        all     48.74      0.00     17.29      0.05      0.00     33.92
10:44:22        all     52.52      0.00     21.02      0.05      0.00     26.41
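To condense a sample like the one above into one figure, the %idle column can be averaged. A minimal awk sketch, assuming the column layout shown above (24-hour timestamps, CPU name in the second column, %idle last); the helper name avg_idle is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: average the %idle column of sar -u style output read
# from stdin, and print the implied busy percentage (100 - %idle).
# Assumes %idle is the last column. Only per-interval "all" rows are counted:
# the header row (second field "CPU") and sar's trailing "Average:" row are skipped.
avg_idle() {
    awk '$2 == "all" && $1 != "Average:" { idle += $NF; n++ }
         END {
             if (n == 0) exit 1
             avg = idle / n
             printf "samples=%d avg_idle=%.2f avg_busy=%.2f\n", n, avg, 100 - avg
         }'
}

# Example: sar -u 5 12 | avg_idle
```

Fed the eleven "all" rows above, it should report an average %idle of about 30.4% (roughly 70% busy), with the busiest 5-second interval at 11.58% idle.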

Code: Select all

[root@supervision XXXXXXX]# ps -eo pcpu,pid,user,args | sort -k 1 -r | head -10
15.2  1761 mysql    /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/var/lib/mysql/mysql.sock
%CPU   PID USER     COMMAND
 3.7  4221 apache   /usr/sbin/httpd -DFOREGROUND
 3.4 23504 apache   /usr/sbin/httpd -DFOREGROUND
 2.7 25893 apache   /usr/sbin/httpd -DFOREGROUND
 2.6 25487 apache   /usr/sbin/httpd -DFOREGROUND
 2.3 25892 apache   /usr/sbin/httpd -DFOREGROUND
 2.3 23505 apache   /usr/sbin/httpd -DFOREGROUND
 2.2  8897 apache   /usr/sbin/httpd -DFOREGROUND
 2.1 23507 apache   /usr/sbin/httpd -DFOREGROUND
When I stop the httpd service, the load goes down by 1 at most, so I don't think httpd is the culprit :D
I don't know what to do to reduce this load. The VM is using almost 90% of its CPU all the time.
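Because every Apache worker is a separate process, the per-process figures above understate httpd's combined share. A sketch that sums %CPU per command name, assuming procps ps as shipped with CentOS 7; the helper name cpu_by_command is hypothetical. Keep in mind that ps reports %CPU as CPU time divided by the process's elapsed running time, not an instantaneous reading, so short spikes may not show up.

```shell
#!/bin/sh
# Hypothetical helper: total the %CPU column per command name, so that many
# small worker processes (e.g. httpd) show up as one combined figure.
# Input on stdin: "pcpu comm" lines, e.g. from procps:  ps -eo pcpu=,comm=
# (the "=" after each field name suppresses the header line).
# Output columns: summed %CPU, number of processes, command name; highest first.
# Note: a command name containing spaces is keyed on its first word only.
cpu_by_command() {
    awk '{ total[$2] += $1; count[$2]++ }
         END { for (c in total) printf "%6.1f %4d %s\n", total[c], count[c], c }' |
        sort -rn
}

# Example on the XI server:  ps -eo pcpu=,comm= | cpu_by_command | head
```

In the listing above, the eight httpd workers that survived head -10 already add up to 21.3%, against 15.2% for mysqld.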
Nagios XI 5.2.5 - CentOS Linux release 7.2.1511
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: High Load CPU

Post by rkennedy »

Two things to look at:
1. Can you set up an event handler to run during the spikes? Something as simple as ps -eo pcpu,args --sort=-%cpu will show us a top-down list of what's eating up the CPU. If it's MySQL, that could be slowing everything down.
2. Can you PM us a system profile to review? (Admin -> System Profile -> Download Profile) This will help us see if anything odd is going on in your environment.
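One way to act on the first suggestion is a small script that appends a timestamped process snapshot to a log whenever the load is high. This is a sketch, not an XI-provided tool: the log path, the default threshold of 2.0 runnable tasks per CPU, and the function names are assumptions, and it relies on Linux /proc/loadavg and procps ps. It could be called from an event handler or simply run from cron.

```shell
#!/bin/sh
# Hypothetical snapshot script: when the 1-minute load average per CPU exceeds
# THRESHOLD, append a timestamped, CPU-sorted process list to LOG_FILE.
# Assumptions: Linux (/proc/loadavg), procps ps (supports --sort), and the
# example defaults below, which are not Nagios XI settings.
LOG_FILE=${LOG_FILE:-/tmp/cpu_snapshots.log}
THRESHOLD=${THRESHOLD:-2.0}   # runnable tasks per CPU; tune for your system

# Succeeds (exit status 0) when LOAD / NCPU > THRESHOLD.
load_exceeds() {
    awk -v l="$1" -v n="$2" -v t="$3" 'BEGIN { exit !(l / n > t) }'
}

snapshot() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') loadavg: $(cat /proc/loadavg)"
        ps -eo pcpu,pid,args --sort=-%cpu | head -n 25
    } >> "$LOG_FILE"
}

main() {
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r load1 _rest < /proc/loadavg
    ncpu=$(getconf _NPROCESSORS_ONLN)
    if load_exceeds "$load1" "$ncpu" "$THRESHOLD"; then
        snapshot
    fi
}

main "$@"
```

For example, a crontab entry such as * * * * * /path/to/cpu_snapshot.sh (hypothetical path) would check once a minute and write a snapshot only while the load is high. The load of 60 reported at the start of the thread works out to 15 per CPU on this 4-CPU VM, well above the default threshold.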
Former Nagios Employee
asardouk

Re: High Load CPU

Post by asardouk »

This is the load over the past 24 hours:

[Image: CPU load over the past 24 hours]

I set up an event handler with a script that writes the output of ps -eo pcpu,args --sort=-%cpu to a file:

Code: Select all

[root@supervision libexec]# ps -eo pcpu,args --sort=-%cpu
%CPU COMMAND
14.7 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file
 3.6 /usr/sbin/httpd -DFOREGROUND
 3.2 /usr/sbin/httpd -DFOREGROUND
 2.8 /usr/sbin/httpd -DFOREGROUND
 2.8 /usr/sbin/httpd -DFOREGROUND
 2.6 /usr/sbin/httpd -DFOREGROUND
 2.6 /usr/sbin/httpd -DFOREGROUND
 2.5 /usr/sbin/httpd -DFOREGROUND
 2.4 /usr/sbin/httpd -DFOREGROUND
 2.4 /usr/sbin/httpd -DFOREGROUND
 2.4 /usr/sbin/httpd -DFOREGROUND
 2.3 /usr/sbin/httpd -DFOREGROUND
 2.1 /usr/sbin/httpd -DFOREGROUND
 2.1 /usr/sbin/httpd -DFOREGROUND
 2.1 /usr/sbin/httpd -DFOREGROUND
 1.6 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
 1.0 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
 0.7 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
 0.6 [rcu_sched]
 0.4 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
 0.3 /bin/bash /usr/local/nagios/libexec/check_interface_inf.sh -H xxx.xxx.xxx.xxx -C XXXXXXXX -T ALL -I LAN
 0.3 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
 0.3 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
 0.2 [rcuos/0]
 0.2 [rcuos/1]
 0.2 [rcuos/2]
 0.2 [rcuos/3]
 0.2 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
 0.1 [kworker/u8:1]
rkennedy

Re: High Load CPU

Post by rkennedy »

By any chance, do you have a lot of reports that run daily? That could explain why httpd is being called so much. Also, what kind of disks is the XI machine running on?

Can you PM us a copy of your entire error_log / access_log, and also a few recent nagios.log files from /usr/local/nagios/var/archives/?

I'd like to see how things are lining up.

Apart from the spikes, things in your profile look clean. The bottleneck may be on the SQL side, which would explain why httpd is hanging.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: High Load CPU

Post by tmcdonald »

Is anything else installed and running on this machine? Are you doing any manual querying of the database? Are a lot of people logged in at once running reports, or are there many scheduled reports? This definitely seems like a high load for a normal XI MySQL server.
Former Nagios employee
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: High Load CPU

Post by avandemore »

During a high load period, please post the output from this:

Code: Select all

top -bcn1
Have you looked through this document?
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
Previous Nagios employee
asardouk

Re: High Load CPU

Post by asardouk »

rkennedy wrote: By any chance, do you have a lot of reports that run daily? That could explain why httpd is being called so much. Also, what kind of disks is the XI machine running on?

Can you PM us a copy of your entire error_log / access_log, and also a few recent nagios.log files from /usr/local/nagios/var/archives/?

I'd like to see how things are lining up.

Apart from the spikes, things in your profile look clean. The bottleneck may be on the SQL side, which would explain why httpd is hanging.
I only have one daily BW report.
The disks are SCSI, not SSD.
The error and access log files are large (about 400 MB); how can I send them to you?

I checked the data from the event handlers I set up on CPU load.
mysqld is always using about 15% of the CPU and httpd between 20 and 35%; when the load goes up, it's an MRTG process that uses between 25 and 70% of the CPU.
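Snapshots in ps -eo pcpu,args format can also be searched to put numbers on the MRTG correlation. A sketch that counts the matching lines and reports the peak %CPU for a fixed-string pattern; the helper name peak_cpu_for and the log path in the example are hypothetical. Header and separator lines are skipped because their first field is not a number.

```shell
#!/bin/sh
# Hypothetical helper: read ps -eo pcpu,args style lines on stdin and print
# how many lines contain a fixed-string pattern and the highest %CPU seen.
# Lines whose first field is not a number (headers, separators) are skipped.
peak_cpu_for() {
    awk -v pat="$1" '
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ && index($0, pat) > 0 {
            n++
            if ($1 + 0 > max) max = $1 + 0
        }
        END { printf "matches=%d peak=%.1f\n", n, max }'
}

# Example: peak_cpu_for mrtg < /path/to/snapshot.log
```

Running it once per candidate (mrtg, httpd, mysqld) over the same log gives a quick side-by-side of which process dominates during the spikes.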
asardouk

Re: High Load CPU

Post by asardouk »

tmcdonald wrote:Is anything else installed and running on this machine? Are you doing any manual querying of the database? Are a lot of people logged in at once running reports, or are there many scheduled reports? This definitely seems like a high load for a normal XI MySQL server.
I have NagVis, plus 2 scripts that run every 5 minutes:
The first one just reads users' tokens and writes them to a remote DB.
The second one gets the contacts and host list as JSON and writes them to files.
asardouk

Re: High Load CPU

Post by asardouk »

[Image]
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: High Load CPU

Post by ssax »

Please run these commands and send me the entire output:

Code: Select all

echo "select count(*) from xi_meta;" | mysql -uroot -pnagiosxi nagiosxi
echo "SELECT table_schema as 'DB', table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES ORDER BY (data_length + index_length) DESC \t;" | mysql -uroot -pnagiosxi

If the xi_meta table is very large (i.e. has a very large number of entries) and things still aren't working properly, you may be hitting a bug, and we may need to truncate the xi_events, xi_meta, and xi_eventqueue tables.
- NOTE: Running this command will wipe out all data in the Home > Event Log report (Reports > Event Log is the same report); that is the only information affected. Make sure you have good XI backups first.

https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Code: Select all

echo "truncate table xi_events; truncate table xi_eventqueue; truncate table xi_meta;" | mysql -uroot -pnagiosxi nagiosxi

Thank you