I have an issue with CPU load on Nagios XI. My system configuration is below:
Nagios XI 5.2.5
Centos 7
429 Hosts
3163 Services
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Model name: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
Ram : 8G
Swap : 4G
No result, so I just stopped the service checks and then reactivated them, and the load went down to a value between 2 and 4. For almost two months now, the load has been going up again. So I decided to do an in-depth analysis of this issue and resolve it once and for all.
Almost all the 429 hosts have 8 services each.
I ran some tests on script execution time:
check_xi_service_mrtgtraf
check_xi_service_ping
3 shell scripts with snmpwalk and snmpget
3 shell scripts with snmpget
check frequency :
Hosts Check :
10 hosts - check every 3 minutes
rest 5 minutes
Services check :
5 min 2 service checks
7 min 3 service checks
10 min 3 service checks
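As a rough sanity check, the schedule above can be turned into a check rate. This is only a sketch: it assumes every one of the 429 hosts carries the full 8 services (the post says "almost all", and the real total is 3163 services, so the result slightly overestimates), and `rate` is a hypothetical helper name, not a Nagios tool.

```shell
#!/bin/sh
# rate COUNT INTERVAL_MIN [COUNT INTERVAL_MIN ...]
# Sums COUNT / INTERVAL_MIN over all pairs and prints checks per minute.
rate() {
    awk 'BEGIN {
        total = 0
        for (i = 1; i < ARGC; i += 2)
            total += ARGV[i] / ARGV[i + 1]
        printf "%.1f\n", total
    }' "$@"
}

hosts=429   # assumption: all hosts have the full set of 8 services

# Host checks: 10 hosts every 3 minutes, the remaining 419 every 5 minutes.
echo "host checks/min:    $(rate 10 3 $((hosts - 10)) 5)"

# Service checks per host: 2 every 5 min, 3 every 7 min, 3 every 10 min.
echo "service checks/min: $(rate $((hosts * 2)) 5 $((hosts * 3)) 7 $((hosts * 3)) 10)"
```

Under these assumptions this comes out at roughly 87 host checks and 484 service checks per minute, so around 8 service checks per second, each one forking a plugin or shell script.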
And I think that the service check latency and execution time are good:
I cleaned up about 335 MRTG cfg files out of a total of 379.
I ran the script to repair the Nagios XI DB.
10:38:47 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
10:38:47 all 52.02 0.00 18.23 0.22 0.00 0.28 0.00 0.00 0.00 29.25
10:38:47 0 52.13 0.00 18.22 0.22 0.00 0.27 0.00 0.00 0.00 29.16
10:38:47 1 52.10 0.00 18.19 0.23 0.00 0.28 0.00 0.00 0.00 29.19
10:38:47 2 51.77 0.00 18.18 0.25 0.00 0.28 0.00 0.00 0.00 29.52
10:38:47 3 52.07 0.00 18.32 0.20 0.00 0.29 0.00 0.00 0.00 29.11
10:43:27 CPU %user %nice %system %iowait %steal %idle
10:43:32 all 28.99 0.00 10.10 0.10 0.00 60.80
10:43:37 all 50.15 0.00 19.13 0.05 0.00 30.67
10:43:42 all 55.91 0.00 20.61 0.61 0.00 22.88
10:43:47 all 53.36 0.00 19.53 0.10 0.00 27.01
10:43:52 all 54.47 0.00 20.90 0.05 0.00 24.58
10:43:57 all 47.29 0.00 17.67 0.50 0.00 34.54
10:44:02 all 50.43 0.00 16.56 0.10 0.00 32.91
10:44:07 all 64.17 0.00 24.21 0.05 0.00 11.58
10:44:12 all 50.45 0.00 20.42 0.10 0.00 29.02
10:44:17 all 48.74 0.00 17.29 0.05 0.00 33.92
10:44:22 all 52.52 0.00 21.02 0.05 0.00 26.41
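To compare the samples above against a single figure, the %idle column can be averaged. A minimal sketch, assuming 24-hour timestamps as in the output above (with AM/PM timestamps the columns shift by one); `mean_idle` is a hypothetical helper name.

```shell
#!/bin/sh
# mean_idle: reads "sar -u" style output on stdin and prints the mean of
# the last column (%idle) over the per-interval "all" rows.
# The header row and the trailing "Average:" row are skipped because
# they do not match the HH:MM:SS + "all" pattern.
mean_idle() {
    awk '$1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 == "all" {
             sum += $NF
             n++
         }
         END {
             if (n > 0) printf "%.1f\n", sum / n
             else print "no samples"
         }'
}

# Usage (needs the sysstat package): sar -u 5 11 | mean_idle
```

Fed with the eleven samples listed above, this gives a mean %idle of about 30.4, so about 70% busy as seen from inside the guest over that minute.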
[root@supervision XXXXXXX]# ps -eo pcpu,pid,user,args | sort -k 1 -r | head -10
15.2 1761 mysql /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/var/lib/mysql/mysql.sock
%CPU PID USER COMMAND
3.7 4221 apache /usr/sbin/httpd -DFOREGROUND
3.4 23504 apache /usr/sbin/httpd -DFOREGROUND
2.7 25893 apache /usr/sbin/httpd -DFOREGROUND
2.6 25487 apache /usr/sbin/httpd -DFOREGROUND
2.3 25892 apache /usr/sbin/httpd -DFOREGROUND
2.3 23505 apache /usr/sbin/httpd -DFOREGROUND
2.2 8897 apache /usr/sbin/httpd -DFOREGROUND
2.1 23507 apache /usr/sbin/httpd -DFOREGROUND
I don't know what to do to reduce this load. The VM is using almost 90% of the CPU all the time.
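Because the listing above is spread across many httpd workers, it may be easier to read when summed per user. A minimal sketch, assuming input in the same `pcpu,pid,user,args` column order as the command above; `cpu_by_user` is a hypothetical helper name. Note that the `pcpu` value reported by ps is CPU time divided by the process lifetime, not an instantaneous reading, so short-lived check plugins barely show up in it.

```shell
#!/bin/sh
# cpu_by_user: reads "ps -eo pcpu,pid,user,args" style lines on stdin,
# sums the first column (%CPU) per user (third column) and prints the
# totals, highest first. Lines whose first field is not numeric, such
# as the "%CPU PID USER COMMAND" header, are ignored.
cpu_by_user() {
    awk '$1 ~ /^[0-9.]+$/ { cpu[$3] += $1 }
         END { for (u in cpu) printf "%s %.1f\n", u, cpu[u] }' |
        sort -k2,2nr
}

# Usage: ps -eo pcpu,pid,user,args | cpu_by_user
```

For the ten lines shown above this yields 21.3 for apache and 15.2 for mysql. The header ends up on the second row of the original listing because `sort -k 1 -r` compares text rather than numbers; with procps, `ps -eo pcpu,pid,user,args --sort=-pcpu | head -10` should give a numerically sorted list with the header on top.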