High Load CPU

asardouk · Post by **asardouk** » Wed Jan 25, 2017 1:56 pm

Hello,

I have an issue with CPU load on Nagios XI. My system configuration is below:

Nagios XI 5.2.5
Centos 7
429 Hosts
3163 Services
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Model name:            Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
Ram : 8G
Swap : 4G

The system has already reached a load of 60, at which point data collection stopped and everything became very slow. I tried this link: https://support.nagios.com/kb/article.p ... ategory=44
It did not help, so I stopped service checks and then re-enabled them, and the load went down to between 2 and 4. For almost two months now the load has been climbing again, so I decided to analyse the issue in depth and resolve it for good.

Almost all of the 429 hosts have 8 services each:

check_xi_service_mrtgtraf
check_xi_service_ping
3 shell scripts with snmpwalk and snmpget
3 shell scripts with snmpget

Check frequency:
Host checks:
10 hosts: every 3 minutes
the rest: every 5 minutes

Service checks:
2 service checks every 5 min
3 service checks every 7 min
3 service checks every 10 min
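
As a rough sanity check on the schedule above, the average check rate can be estimated by dividing each group of checks by its interval. The sketch below assumes that every one of the 429 hosts carries all 8 services (the post says "almost all") and that checks are spread evenly over their interval. `checks_per_second` is a hypothetical helper written for this illustration, not a Nagios tool.

```shell
#!/bin/sh
# checks_per_second COUNT INTERVAL_MINUTES [COUNT INTERVAL_MINUTES ...]
# Prints the average number of checks started per second, assuming the
# checks in each group are spread evenly across their interval.
checks_per_second() {
    printf '%s\n' "$@" | LC_ALL=C awk '
        NR % 2 == 1 { count = $1; next }       # odd lines: number of checks
        { rate += count / ($1 * 60) }          # even lines: interval in minutes
        END { printf "%.2f\n", rate }
    '
}

# Service checks: 429 hosts x 2 at 5 min, x 3 at 7 min, x 3 at 10 min (assumed).
#   checks_per_second 858 5 1287 7 1287 10
# Host checks: 10 hosts at 3 min, the remaining 419 at 5 min.
#   checks_per_second 10 3 419 5
```

Under these assumptions the first call gives about 8.07 service checks per second and the second about 1.45 host checks per second, so the scheduler would be starting roughly ten checks every second on average.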

I ran some tests on script execution time.

I think the service check latency and execution time are good.

I cleaned up about 335 of the 379 MRTG cfg files.
I ran the script to repair the Nagios XI database.

Code: Select all

10:38:47     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:38:47     all   52.02    0.00   18.23    0.22    0.00    0.28    0.00    0.00    0.00   29.25
10:38:47       0   52.13    0.00   18.22    0.22    0.00    0.27    0.00    0.00    0.00   29.16
10:38:47       1   52.10    0.00   18.19    0.23    0.00    0.28    0.00    0.00    0.00   29.19
10:38:47       2   51.77    0.00   18.18    0.25    0.00    0.28    0.00    0.00    0.00   29.52
10:38:47       3   52.07    0.00   18.32    0.20    0.00    0.29    0.00    0.00    0.00   29.11

Code: Select all

10:43:27        CPU     %user     %nice   %system   %iowait    %steal     %idle
10:43:32        all     28.99      0.00     10.10      0.10      0.00     60.80
10:43:37        all     50.15      0.00     19.13      0.05      0.00     30.67
10:43:42        all     55.91      0.00     20.61      0.61      0.00     22.88
10:43:47        all     53.36      0.00     19.53      0.10      0.00     27.01
10:43:52        all     54.47      0.00     20.90      0.05      0.00     24.58
10:43:57        all     47.29      0.00     17.67      0.50      0.00     34.54
10:44:02        all     50.43      0.00     16.56      0.10      0.00     32.91
10:44:07        all     64.17      0.00     24.21      0.05      0.00     11.58
10:44:12        all     50.45      0.00     20.42      0.10      0.00     29.02
10:44:17        all     48.74      0.00     17.29      0.05      0.00     33.92
10:44:22        all     52.52      0.00     21.02      0.05      0.00     26.41
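
To put a single number on samples like the ones above, the %idle column can be averaged and subtracted from 100. This is a minimal sketch, assuming 24-hour timestamps so that the CPU label ("all") is the second field and %idle is the last field; `avg_busy_percent` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads sar-style CPU lines on stdin and prints the average busy percentage
# (100 - mean %idle) over the "all" rows. The header row and any
# "Average:" summary row are skipped.
avg_busy_percent() {
    LC_ALL=C awk '
        $1 != "Average:" && $2 == "all" && $NF ~ /^[0-9.]+$/ {
            idle += $NF
            n++
        }
        END { if (n) printf "%.1f\n", 100 - idle / n }
    '
}

# Example (the interval and count are illustrative):
#   sar -u 5 12 | avg_busy_percent
```

Fed the eleven samples shown above, this works out to roughly 70% busy on average.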

Code: Select all

[root@supervision XXXXXXX]# ps -eo pcpu,pid,user,args | sort -k 1 -r | head -10
15.2  1761 mysql    /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/var/lib/mysql/mysql.sock
%CPU   PID USER     COMMAND
 3.7  4221 apache   /usr/sbin/httpd -DFOREGROUND
 3.4 23504 apache   /usr/sbin/httpd -DFOREGROUND
 2.7 25893 apache   /usr/sbin/httpd -DFOREGROUND
 2.6 25487 apache   /usr/sbin/httpd -DFOREGROUND
 2.3 25892 apache   /usr/sbin/httpd -DFOREGROUND
 2.3 23505 apache   /usr/sbin/httpd -DFOREGROUND
 2.2  8897 apache   /usr/sbin/httpd -DFOREGROUND
 2.1 23507 apache   /usr/sbin/httpd -DFOREGROUND

When I stop the httpd service, the load drops by 1 at most, so I don't think httpd is the culprit.

I don't know what to do to reduce this load. The VM is using almost 90% of the CPU all the time.
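
Because Apache runs many worker processes, a per-process listing can understate its combined share. The sketch below sums the %CPU column per command name, assuming input in the two-column form produced by `ps -eo pcpu,comm`; `cpu_by_command` is a hypothetical helper. Note that the %CPU value from ps is CPU time divided by the time the process has been running, so it reflects a lifetime average rather than the current moment.

```shell
#!/bin/sh
# Reads "%CPU COMMAND" lines on stdin (as printed by: ps -eo pcpu,comm)
# and prints the summed %CPU per command name, highest first.
# Only the first word of the command name is used as the grouping key.
cpu_by_command() {
    LC_ALL=C awk '
        NR == 1 && $1 == "%CPU" { next }       # skip the ps header line
        NF >= 2 { total[$2] += $1 }
        END { for (c in total) printf "%.1f %s\n", total[c], c }
    ' | LC_ALL=C sort -rn -k1,1
}

# Example:
#   ps -eo pcpu,comm | cpu_by_command | head
```

Grouping this way makes it easier to compare httpd as a whole against mysqld than reading the individual worker lines.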

rkennedy · Post by **rkennedy** » Wed Jan 25, 2017 4:51 pm

Two things to look at -
1. Can you set up an event handler to run during the spikes? Something as simple as ps -eo pcpu,args --sort=-%cpu will give us a top-down list of what's eating up the CPU. If it's MySQL, that could be slowing everything down.
2. Can you PM over a profile for us to review? (Admin -> System Profile -> Download Profile) - this will help to see if there is anything odd going on in your environment.

asardouk · Post by **asardouk** » Thu Jan 26, 2017 12:55 pm

This is the load over the past 24 hours:

I set up an event handler with a script that writes the output of ps -eo pcpu,args --sort=-%cpu to a file:

Code: Select all

[root@supervision libexec]# ps -eo pcpu,args --sort=-%cpu
%CPU COMMAND
14.7 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file
 3.6 /usr/sbin/httpd -DFOREGROUND
 3.2 /usr/sbin/httpd -DFOREGROUND
 2.8 /usr/sbin/httpd -DFOREGROUND
 2.8 /usr/sbin/httpd -DFOREGROUND
 2.6 /usr/sbin/httpd -DFOREGROUND
 2.6 /usr/sbin/httpd -DFOREGROUND
 2.5 /usr/sbin/httpd -DFOREGROUND
 2.4 /usr/sbin/httpd -DFOREGROUND
 2.4 /usr/sbin/httpd -DFOREGROUND
 2.4 /usr/sbin/httpd -DFOREGROUND
 2.3 /usr/sbin/httpd -DFOREGROUND
 2.1 /usr/sbin/httpd -DFOREGROUND
 2.1 /usr/sbin/httpd -DFOREGROUND
 2.1 /usr/sbin/httpd -DFOREGROUND
 1.6 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
 1.0 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
 0.7 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
 0.6 [rcu_sched]
 0.4 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
 0.3 /bin/bash /usr/local/nagios/libexec/check_interface_inf.sh -H xxx.xxx.xxx.xxx -C XXXXXXXX -T ALL -I LAN
 0.3 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
 0.3 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
 0.2 [rcuos/0]
 0.2 [rcuos/1]
 0.2 [rcuos/2]
 0.2 [rcuos/3]
 0.2 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
 0.1 [kworker/u8:1]
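
The poster's handler script is not shown in the thread. A minimal sketch of what such an event handler could look like follows, assuming the Nagios command definition passes the $SERVICESTATE$ macro as the first argument. The function names and the log path are hypothetical.

```shell
#!/bin/sh
# Hypothetical event handler: append a list of the top CPU consumers to a
# log file whenever the monitored CPU-load service is WARNING or CRITICAL.

snapshot() {
    # Same listing as suggested in the thread, limited to the top entries.
    ps -eo pcpu,args --sort=-%cpu | head -n 30
}

log_cpu_snapshot() {
    state=$1      # value of $SERVICESTATE$ (assumed to be passed by Nagios)
    logfile=$2    # where to append the snapshot
    case "$state" in
        WARNING|CRITICAL)
            {
                printf '=== %s state=%s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$state"
                snapshot
            } >> "$logfile"
            ;;
    esac
}

# In the real handler script the last line would be something like
# (the path is an assumption):
#   log_cpu_snapshot "$1" /tmp/cpu_snapshots.log
```

Writing a timestamped header before each snapshot makes it possible to line the entries up with the load graph afterwards.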

rkennedy · Post by **rkennedy** » Thu Jan 26, 2017 5:59 pm

Do you by any chance have a lot of reports that run daily? That could explain why httpd is so active. Also, what sort of disks is the XI machine running on?

Can you PM over a copy of your entire error_log / access_log, and also a few copies of the nagios.log for recent dates from here? /usr/local/nagios/var/archives/

I'd like to see how things are lining up.

Things in your profile look clean other than the spikes. The holdup may lie with SQL, which would explain why httpd is hanging.

tmcdonald · Post by **tmcdonald** » Thu Jan 26, 2017 6:00 pm

Is anything else installed and running on this machine? Are you doing any manual querying of the database? Are a lot of people logged in at once running reports, or are there many scheduled reports? This definitely seems like a high load for a normal XI MySQL server.

avandemore · Post by **avandemore** » Thu Jan 26, 2017 6:01 pm

During a high load period, please post the output from this:

Code: Select all

top -bcn1

Have you looked through this document?
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

asardouk · Post by **asardouk** » Fri Jan 27, 2017 10:09 am

rkennedy wrote: Do you by any chance have a lot of reports that run daily? That could explain why httpd is so active. Also, what sort of disks is the XI machine running on?

Can you PM over a copy of your entire error_log / access_log, and also a few copies of the nagios.log for recent dates from here? /usr/local/nagios/var/archives/

I'd like to see how things are lining up.

Things in your profile look clean other than the spikes. The holdup may lie with SQL, which would explain why httpd is hanging.

I have only one daily BW report.
The disks are SCSI, not SSD.
The error and access log files are large (about 400 MB). How can I send them to you?

I checked the data from the event handlers I placed on the CPU load.
mysql always uses about 15% of the CPU and httpd between 20 and 35%; when the load goes up, it is an mrtg process that uses between 25 and 70% of the CPU.

asardouk · Post by **asardouk** » Fri Jan 27, 2017 11:11 am

tmcdonald wrote: Is anything else installed and running on this machine? Are you doing any manual querying of the database? Are a lot of people logged in at once running reports, or are there many scheduled reports? This definitely seems like a high load for a normal XI MySQL server.

I have Nagvis and 2 scripts that run every 5 minutes.
The first one just reads user tokens and writes them to a remote DB.
The second one fetches the contacts and host list as JSON and writes them to files.

asardouk · Post by **asardouk** » Fri Jan 27, 2017 11:51 am

ssax · Post by **ssax** » Fri Jan 27, 2017 1:50 pm

Please run these commands and send me the entire output:

Code: Select all

echo "select count(*) from xi_meta;" | mysql -uroot -pnagiosxi nagiosxi
echo "SELECT table_schema as 'DB', table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES ORDER BY (data_length + index_length) DESC \t;" | mysql -uroot -pnagiosxi

If the xi_meta table is very large (or has a very large number of entries) and it's still not working properly, we may need to truncate the xi_events, xi_meta, and xi_eventqueue tables; you may be hitting a bug.
- NOTE: Running the truncate command below will wipe out all data from the Home > Event Log OR Reports > Event Log report (this is the only information affected; those 2 locations are the same report). Make sure you have good XI backups.

https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Code: Select all

echo "truncate table xi_events; truncate table xi_eventqueue; truncate table xi_meta;" | mysql -uroot -pnagiosxi nagiosxi
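
Before running the truncate above, it may be worth keeping a table-level copy of the three tables in addition to the regular XI backup. The sketch below is one way to do that with mysqldump, reusing the credentials and database name from the commands in this post. The function name and output path are hypothetical, and this is not a substitute for a full XI backup.

```shell
#!/bin/sh
# Dump only the three event tables from the nagiosxi database to a file,
# so the Event Log data can be restored if it turns out to be needed.
backup_xi_event_tables() {
    outfile=$1
    mysqldump -uroot -pnagiosxi nagiosxi xi_events xi_eventqueue xi_meta > "$outfile"
}

# Example (the path is an assumption):
#   backup_xi_event_tables /root/xi_event_tables.sql
```

The resulting file could later be loaded back with the mysql client against the same database if the Event Log history is wanted again.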

Thank you

Nagios Support Forum
