Nagios XI CPU Near 100%
Posted: Wed Aug 05, 2015 5:58 pm
by abdelhafeth.mzahem
Dear all,
We have a problem with our Nagios XI system: CPU usage is above 95% all the time.
This causes delays, faults, and sometimes crashes.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11926 nagios 20 0 161m 24m 2272 S 7.6 0.3 0:00.23 check_nwc_healt
12411 nagios 20 0 161m 24m 2232 S 7.6 0.3 0:00.23 check_nwc_healt
33977 snmptt 20 0 190m 40m 1720 S 9.0 0.5 2:54.97 snmptt
32313 mysql 20 0 2185m 46m 6240 S 11.3 0.6 3:55.15 mysqld
10067 apache 20 0 443m 29m 4536 S 4.6 0.4 0:11.23 httpd
11880 nagios 20 0 161m 24m 2232 S 4.0 0.3 0:00.23 check_nwc_healt
12464 nagios 20 0 142m 13m 1988 R 3.6 0.2 0:00.11 check_nwc_healt
6169 nagios 20 0 217m 21m 7840 S 3.3 0.3 0:00.47 php
11970 nagios 20 0 137m 10m 2156 S 3.0 0.1 0:00.09 check_snmp_proc
12394 nagios 20 0 137m 10m 2152 S 3.0 0.1 0:00.09 check_snmp_stor
12380 nagios 20 0 137m 10m 2152 S 2.7 0.1 0:00.08 check_snmp_stor
12467 nagios 20 0 139m 10m 1988 R 2.7 0.1 0:00.08 check_nwc_healt
12471 root 20 0 135m 6732 2052 R 1.7 0.1 0:00.05 snmptthandler
12473 nagios 20 0 128m 7008 1956 R 1.3 0.1 0:00.04 check_snmp_stor
33279 nagios 20 0 32444 11m 1508 S 1.3 0.1 0:40.92 nagios
33294 nagios 20 0 51324 2796 1004 S 1.3 0.0 0:44.00 ndo2db
1623 root 20 0 194m 5340 1152 S 0.7 0.1 0:47.76 snmptrapd
2157 nagios 20 0 782m 5084 768 S 0.7 0.1 0:17.72 rrdcached
6168 nagios 20 0 217m 22m 8272 S 0.7 0.3 0:00.19 php
34 root 20 0 0 0 0 S 0.3 0.0 0:02.48 kblockd/0
405 root 20 0 15160 1488 1000 R 0.3 0.0 0:00.27 top
6199 postgres 20 0 210m 5248 3752 S 0.3 0.1 0:00.01 postmaster
6212 postgres 20 0 210m 8316 6608 S 0.3 0.1 0:00.19 postmaster
33293 nagios 20 0 50204 1596 888 S 0.3 0.0 0:05.65 ndo2db
Please help; we need to solve this problem as soon as possible.
Re: Nagios XI CPU Near 100%
Posted: Wed Aug 05, 2015 6:11 pm
by Box293
Can you run:
Post the results here.
How many CPU cores does your XI server have?
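The core count matters because a load average only means something relative to the number of cores. A minimal sketch of that arithmetic follows; the helper name `load_per_core` is hypothetical, and the live-host example assumes a Linux system with `nproc` and `/proc/loadavg`.

```shell
#!/bin/sh
# Divide a load average by the number of CPU cores.
# Usage: load_per_core LOAD CORES
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live Linux host (assumes nproc and /proc/loadavg are available):
#   load_per_core "$(cut -d " " -f1 /proc/loadavg)" "$(nproc)"
```

As a rough rule of thumb, a sustained value near or above 1.00 per core suggests the CPUs are saturated.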
Re: Nagios XI CPU Near 100%
Posted: Wed Aug 05, 2015 7:31 pm
by abdelhafeth.mzahem
Dear Sir,
We increased the number of cores from 4 to 8, and CPU usage decreased from 90% to 55%.
[root@nms ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_nms2-lv_root
60G 27G 31G 46% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 485M 70M 390M 16% /boot
[root@nms ~]#
When we run the command you sent, the results differ between two runs:
[root@nms ~]# top -n 1
top - 03:28:24 up 3:43, 1 user, load average: 2.96, 3.08, 4.98
Tasks: 363 total, 6 running, 357 sleeping, 0 stopped, 0 zombie
Cpu(s): 54.2%us, 17.9%sy, 0.0%ni, 21.5%id, 6.2%wa, 0.1%hi, 0.2%si, 0.0%st
Mem: 8053804k total, 6422448k used, 1631356k free, 179180k buffers
Swap: 4063224k total, 36884k used, 4026340k free, 5032676k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32788 nagios 20 0 20408 7304 1504 R 40.9 0.1 0:00.21 snmpget
32797 nagios 20 0 137m 10m 2156 S 17.5 0.1 0:00.09 check_snmp_proc
32800 nagios 20 0 133m 8908 2016 R 13.6 0.1 0:00.07 check_snmp_stor
35871 snmptt 20 0 188m 37m 1720 S 11.7 0.5 5:04.43 snmptt
32804 root 20 0 135m 6308 2052 R 7.8 0.1 0:00.04 snmptthandler
28592 mysql 20 0 4233m 57m 6316 S 5.8 0.7 5:48.55 mysqld
1623 root 20 0 194m 5868 1152 S 3.9 0.1 1:52.63 snmptrapd
1347 root 20 0 174m 2056 1584 S 1.9 0.0 0:11.88 vmtoolsd
1441 root 20 0 243m 4420 772 S 1.9 0.1 0:24.89 rsyslogd
23055 root RT 0 0 0 0 S 1.9 0.0 0:02.24 migration/7
29648 postgres 20 0 211m 8520 6756 S 1.9 0.1 0:00.16 postmaster
32760 root 20 0 15156 1412 892 R 1.9 0.0 0:00.01 top
32806 nagios 20 0 127m 3820 1932 R 1.9 0.0 0:00.01 check_nwc_healt
35610 nagios 20 0 51324 2800 1008 S 1.9 0.0 1:19.11 ndo2db
35890 nagios 20 0 37984 6312 248 S 1.9 0.1 0:10.37 nagios
1 root 20 0 19232 800 628 S 0.0 0.0 0:03.55 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.08 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:04.98 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.55 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 0:03.27 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 0:00.67 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 0:04.39 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
13 root 20 0 0 0 0 S 0.0 0.0 0:00.61 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/2
15 root RT 0 0 0 0 S 0.0 0.0 0:04.86 migration/3
16 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/3
17 root 20 0 0 0 0 S 0.0 0.0 0:00.72 ksoftirqd/3
18 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/3
19 root 20 0 0 0 0 S 0.0 0.0 0:01.25 events/0
20 root 20 0 0 0 0 S 0.0 0.0 0:01.14 events/1
21 root 20 0 0 0 0 S 0.0 0.0 0:01.37 events/2
22 root 20 0 0 0 0 S 0.0 0.0 0:01.78 events/3
[root@nms ~]# top -n 1
top - 03:28:40 up 3:43, 1 user, load average: 2.90, 3.06, 4.93
Tasks: 352 total, 4 running, 348 sleeping, 0 stopped, 0 zombie
Cpu(s): 54.2%us, 17.9%sy, 0.0%ni, 21.5%id, 6.2%wa, 0.1%hi, 0.2%si, 0.0%st
Mem: 8053804k total, 6481264k used, 1572540k free, 179180k buffers
Swap: 4063224k total, 36884k used, 4026340k free, 5034464k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
34868 nagios 20 0 148m 19m 2016 R 43.0 0.2 0:00.22 check_nwc_healt
34873 nagios 20 0 144m 15m 1988 R 35.2 0.2 0:00.18 check_nwc_healt
34881 root 20 0 136m 7168 2080 R 17.6 0.1 0:00.09 snmptthandler
35871 snmptt 20 0 188m 37m 1720 S 11.7 0.5 5:06.45 snmptt
28592 mysql 20 0 4233m 57m 6316 S 5.9 0.7 5:49.80 mysqld
15 root RT 0 0 0 0 S 2.0 0.0 0:04.88 migration/3
23040 root 20 0 0 0 0 S 2.0 0.0 0:00.76 kblockd/4
29613 postgres 20 0 210m 5620 4040 S 2.0 0.1 0:00.05 postmaster
34840 root 20 0 15156 1408 892 R 2.0 0.0 0:00.01 top
35594 nagios 20 0 10016 964 676 S 2.0 0.0 0:03.74 nagios
35595 nagios 20 0 10016 964 676 S 2.0 0.0 0:03.69 nagios
35602 nagios 20 0 10016 976 676 S 2.0 0.0 0:03.36 nagios
35610 nagios 20 0 51324 2800 1008 S 2.0 0.0 1:19.61 ndo2db
1 root 20 0 19232 800 628 S 0.0 0.0 0:03.55 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.08 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:04.99 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.55 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 0:03.28 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 0:00.67 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 0:04.40 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
13 root 20 0 0 0 0 S 0.0 0.0 0:00.61 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/2
16 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/3
17 root 20 0 0 0 0 S 0.0 0.0 0:00.72 ksoftirqd/3
18 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/3
19 root 20 0 0 0 0 S 0.0 0.0 0:01.25 events/0
20 root 20 0 0 0 0 S 0.0 0.0 0:01.14 events/1
21 root 20 0 0 0 0 S 0.0 0.0 0:01.37 events/2
22 root 20 0 0 0 0 S 0.0 0.0 0:01.78 events/3
23 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cgroup
24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 netns
[root@nms ~]#
Re: Nagios XI CPU Near 100%
Posted: Wed Aug 05, 2015 8:20 pm
by Box293
I would also suggest you add more RAM to your server, another 4 GB to 8 GB.
Re: Nagios XI CPU Near 100%
Posted: Thu Aug 06, 2015 1:55 am
by abdelhafeth.mzahem
OK, we will add another 4 GB of RAM.
Can you help us understand why mysqld, snmptt, httpd, and the check_nwc_health plugin consume such a large amount of CPU?
This causes high load on the server, which then hangs. How can we solve it?
Re: Nagios XI CPU Near 100%
Posted: Thu Aug 06, 2015 8:24 am
by lmiltchev
It is hard to know what is causing the high CPU load without knowing the environment. Quite often, MySQL uses a lot of CPU because of errors (crashed tables). Check mysqld.log to make sure you don't have any crashed tables:
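One way to do this, as a sketch rather than an official procedure, is to count the "marked as crashed" errors per table. The function name `crashed_tables` is hypothetical, and the log path `/var/log/mysqld.log` is an assumption that may differ on your system.

```shell
#!/bin/sh
# Count "marked as crashed" errors per table in a MySQL error log.
# Usage: crashed_tables /path/to/mysqld.log
crashed_tables() {
    grep 'is marked as crashed' "$1" \
        | sed "s/.*Table '\([^']*\)'.*/\1/" \
        | sort | uniq -c | sort -rn
}

# Example (log path assumed; adjust for your system):
#   crashed_tables /var/log/mysqld.log
```

No output means no crashed-table errors were found in that log file.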
Some plugins are very CPU intensive, e.g. SNMP, WMI, and VMware checks. In general, compiled plugins are much more efficient than interpreted plugins (Perl, Python, and PHP), and monitoring switch and router bandwidth also tends to use more CPU than most other checks.
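To see which plugins dominate on a particular server, the per-process figures from top can be summed by command name. This is a minimal sketch: the function name `cpu_by_command` is hypothetical, and it assumes the default 12-column top layout (PID through COMMAND) seen earlier in this thread.

```shell
#!/bin/sh
# Sum the %CPU column per command name from top's process table.
# Reads top output on stdin; assumes the 12-column layout
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
cpu_by_command() {
    awk '$1 == "PID" { in_table = 1; next }
         in_table && NF >= 12 { cpu[$12] += $9 }
         END { for (c in cpu) printf "%s %.1f\n", c, cpu[c] }' \
        | sort -k2,2nr
}

# Example on a live host (batch mode, single iteration):
#   top -b -n 1 | cpu_by_command | head
```

A single snapshot is noisy because checks are short-lived, so it is worth repeating the command a few times before drawing conclusions.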
Are you using mainly active checks, passive checks or a combination of both?
How many hosts and services are you monitoring in total? Make sure you meet or exceed our general guidelines on the hardware requirements for running XI:
https://assets.nagios.com/downloads/nag ... ements.pdf
You may need to fine-tune your Nagios XI instance to maximize performance. Here are a couple of links that will give you an idea of how to boost the performance of your Nagios XI server:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
https://assets.nagios.com/downloads/nag ... p#boosting
Hope this helps.
Re: Nagios XI CPU Near 100%
Posted: Sun Aug 09, 2015 12:36 am
by abdelhafeth.mzahem
Dear Sir,
We are monitoring about 500 hosts and 3500 services.
What do you mean by active and passive checks?
[root@nms ~]# tail -20 /var/log/mysqld.log
150809 7:48:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:48:07 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:48:07 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:48:25 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:50:59 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:52:06 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:52:15 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:52:23 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:52:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:52:45 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:53:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:53:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:53:12 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:53:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:54:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:55:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:55:05 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:55:05 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
150809 7:55:12 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
You have new mail in /var/spool/mail/root
[root@nms ~]#
Re: Nagios XI CPU Near 100%
Posted: Sun Aug 09, 2015 6:46 pm
by Box293
Yes, it looks like you have crashed tables.
Can you please run these commands on your Nagios XI server in an SSH session:
cd /usr/local/nagiosxi/scripts/
./repair_databases.sh
This may take a while to complete. Once it has, please scroll back up through the output to make sure there were no database errors (you may see an ndo2db error when stopping the service, but this is OK).
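As an optional follow-up check, here is a sketch that is not part of the repair script itself. The tables can be checked with `mysqlcheck` and the output filtered down to anything not reported as OK. The function name `not_ok_lines` is hypothetical, and the credentials and the database name `nagios` in the example are assumptions.

```shell
#!/bin/sh
# Print every line of mysqlcheck-style output that does not end in "OK".
# Healthy tables are reported as "<db>.<table>   OK", so anything left
# over (table names followed by warning/error lines) needs a closer look.
not_ok_lines() {
    awk 'NF > 0 && $NF != "OK"'
}

# Example (credentials and database name are assumptions):
#   mysqlcheck --check -u root -p nagios | not_ok_lines
```

Empty output means every table was reported as OK.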
Re: Nagios XI CPU Near 100%
Posted: Tue Aug 11, 2015 10:51 am
by abdelhafeth.mzahem
Dear Sir,
We upgraded the CPU from 4 cores to 8 cores, the memory from 8 GB to 12 GB, and the storage from 70 GB to 200 GB, and we ran the database repair. Everything was OK afterwards, but today the following errors appeared in the database log. Data collection stopped many times and did not resume until we killed all the services and started them again.
[root@nms ~]# tail -20 /var/log/mysqld.log
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
150811 18:44:52 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
[root@nms ~]#
We also set up the RAM disk following the procedure in the Nagios documentation, but it did not solve the problem and does not seem to work properly.
Our Nagios has been unstable for the past 3 weeks, since we added an additional 250 devices.
Please help; monitoring has stopped many times during the past 3 days.
Re: Nagios XI CPU Near 100%
Posted: Tue Aug 11, 2015 1:44 pm
by tgriep
Is the drive running out of space?
Please run the following and post the output.
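A generic way to check free space, offered as a sketch and not necessarily the command the poster had in mind, is to filter `df -P` output for filesystems above a usage threshold. The function name `full_filesystems` and the 90% threshold are hypothetical choices.

```shell
#!/bin/sh
# List filesystems at or above a usage threshold (percent).
# Reads `df -P` output on stdin; -P keeps each filesystem on one line.
# Usage: df -P | full_filesystems 90
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5; sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

A full filesystem holding the MySQL data directory is one common reason for table repairs to fail, so it is worth ruling out before repairing again.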