high load issue

Posted: Fri Mar 15, 2013 10:17 am
by cvanleke
Hello,

Back again (http://support.nagios.com/forum/viewtop ... f=6&t=9711) with another issue, this time regarding high load (100+) on my Nagios server. Everything is running at a crawl; even commands over SSH have a very noticeable delay.

Here's the nagiostats output not long after I restarted the monitoring engine.

Code:

Nagios Stats 3.4.1
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 05-11-2012
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /usr/local/nagios/var/status.dat
Status File Age:                        0d 0h 0m 30s
Status File Version:                    3.4.1

Program Running Time:                   0d 0h 9m 59s
Nagios PID:                             13548
Used/High/Total Command Buffers:        0 / 0 / 4096

Total Services:                         2626
Services Checked:                       2626
Services Scheduled:                     2619
Services Actively Checked:              2619
Services Passively Checked:             7
Total Service State Change:             0.000 / 22.630 / 0.777 %
Active Service Latency:                 0.006 / 426.583 / 220.596 sec
Active Service Execution Time:          0.031 / 77.066 / 16.936 sec
Active Service State Change:            0.000 / 22.630 / 0.779 %
Active Services Last 1/5/15/60 min:     109 / 697 / 1635 / 2619
Passive Service Latency:                0.917 / 0.917 / 0.917 sec
Passive Service State Change:           0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:              2351 / 1 / 1 / 273
Services Flapping:                      4
Services In Downtime:                   0

Total Hosts:                            40
Hosts Checked:                          40
Hosts Scheduled:                        40
Hosts Actively Checked:                 40
Host Passively Checked:                 0
Total Host State Change:                0.000 / 0.000 / 0.000 %
Active Host Latency:                    0.000 / 428.729 / 206.815 sec
Active Host Execution Time:             0.033 / 12.009 / 1.188 sec
Active Host State Change:               0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:        3 / 18 / 25 / 40
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  36 / 4 / 0
Hosts Flapping:                         0
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     17 / 63 / 98
   Scheduled:                           4 / 10 / 22
   On-demand:                           13 / 53 / 76
   Parallel:                            5 / 23 / 41
   Serial:                              0 / 0 / 0
   Cached:                              12 / 40 / 57
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  290 / 795 / 1670
   Scheduled:                           290 / 795 / 1670
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min:      0 / 0 / 0
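
To track the latency figure across restarts, the average can be pulled out of this output with a short filter. This is a minimal sketch that assumes the text layout shown above (min / max / avg, in seconds); `avg_service_latency` is a hypothetical helper name, not part of Nagios, and the nagiostats path in the usage comment is an assumption based on the install prefix seen in the status file path.

```shell
# Hypothetical helper: reads nagiostats text output on stdin and prints the
# average (third) value of the "Active Service Latency" line, in seconds.
avg_service_latency() {
  awk -F': *' '/^Active Service Latency:/ {
    split($2, v, " / ")      # v[1]=min, v[2]=max, v[3]="avg sec"
    sub(/ sec$/, "", v[3])   # drop the trailing unit
    print v[3]
  }'
}

# Usage (binary path is an assumption):
#   /usr/local/nagios/bin/nagiostats | avg_service_latency
```

Fed the output above, this prints 220.596, i.e. on average a check starts more than three and a half minutes after it was scheduled.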

I am also running NagVis, but I don't think that'll be the issue.

Let me know if you need more info.

Thanks,
Christophe

Re: high load issue

Posted: Fri Mar 15, 2013 10:41 am
by abrist
You have 40 hosts and 2600+ services.
1. Are these switches that you are monitoring?
2. What type of hardware is in this server?
3. What type of checks are you running?
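
For a rough sense of scale, here is a back-of-the-envelope sketch using the figures from the nagiostats output in the first post (2619 active services, 16.936 s average execution time). The 300-second check interval is an assumption, since the thread does not state the configured interval, and `estimate_concurrency` is a hypothetical helper name.

```shell
# Rough estimate of how many checks must run at the same time to keep up:
#   concurrency ~ active services * average execution time / check interval
# Hypothetical helper; arguments:
#   $1 = number of active services
#   $2 = average execution time in seconds
#   $3 = check interval in seconds (300 below is an ASSUMPTION)
estimate_concurrency() {
  awk -v n="$1" -v t="$2" -v i="$3" 'BEGIN { printf "%.1f\n", n * t / i }'
}

estimate_concurrency 2619 16.936 300
```

Under that assumed interval the estimate comes out at roughly 148 checks in flight at any moment, which is the same order of magnitude as the reported load.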

Re: high load issue

Posted: Fri Mar 15, 2013 11:12 am
by cvanleke
Yes, almost all the hosts are switches with ping and port status + bandwidth checks.
Server is running in a CentOS 6.3 VM.

Re: high load issue

Posted: Fri Mar 15, 2013 1:44 pm
by mguthrie
"this time regarding high loads (100+) on my Nagios Server"

What seems to be taking up so much CPU when you run the following?

Code:

top

Re: high load issue

Posted: Mon Mar 18, 2013 3:34 am
by cvanleke
The hardware running Nagios is an Intel Xeon X5650 (1 core) with 741MB of RAM.

top doesn't show much: mostly a lot of check_ifopersta processes at around 1% CPU each, plus Apache httpd processes.

Re: high load issue

Posted: Mon Mar 18, 2013 7:47 am
by scottwilkerson
If you go to Admin -> System Status, what do you have for I/O Wait?

Re: high load issue

Posted: Mon Mar 18, 2013 8:10 am
by cvanleke
Very low wait, usually between 0 and 5%.

Re: high load issue

Posted: Mon Mar 18, 2013 9:56 am
by scottwilkerson
Can you run the following and report the results?

Code:

ps -eo pid,comm,%cpu,pcpu,user,nice,cpu,pid,args | sort -rk 3|head -n 30

Re: high load issue

Posted: Mon Mar 18, 2013 10:35 am
by cvanleke
The output varies a lot.

Code:

  PID COMMAND         %CPU %CPU USER      NI CPU   PID COMMAND
14782 httpd            3.9  3.9 apache     0   - 14782 /usr/sbin/httpd
14966 php              3.5  3.5 nagios     0   - 14966 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
14894 httpd            3.5  3.5 apache     0   - 14894 /usr/sbin/httpd
14893 httpd            3.5  3.5 apache     0   - 14893 /usr/sbin/httpd
14841 httpd            3.1  3.1 apache     0   - 14841 /usr/sbin/httpd
14891 httpd            2.7  2.7 apache     0   - 14891 /usr/sbin/httpd
14959 php              2.5  2.5 nagios     0   - 14959 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
14892 httpd            2.5  2.5 apache     0   - 14892 /usr/sbin/httpd
14967 php              2.3  2.3 nagios     0   - 14967 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
14962 php              2.3  2.3 nagios     0   - 14962 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
14654 mysqld           2.2  2.2 mysql      0   - 14654 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
14961 php              2.1  2.1 nagios     0   - 14961 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
 1737 nagios           1.2  1.2 nagios     0   -  1737 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
14615 httpd            1.0  1.0 apache     0   - 14615 /usr/sbin/httpd
14613 httpd            1.0  1.0 apache     0   - 14613 /usr/sbin/httpd
14609 httpd            1.0  1.0 apache     0   - 14609 /usr/sbin/httpd
14608 httpd            1.0  1.0 apache     0   - 14608 /usr/sbin/httpd
14604 httpd            1.0  1.0 apache     0   - 14604 /usr/sbin/httpd
14614 httpd            0.9  0.9 apache     0   - 14614 /usr/sbin/httpd
14611 httpd            0.8  0.8 apache     0   - 14611 /usr/sbin/httpd

Code:

13912 nagio <defunct>  0.0  0.0 nagios     0   - 13912 [nagios] <defunct>
13750 nagio <defunct>  0.0  0.0 nagios     0   - 13750 [nagios] <defunct>
13681 nagio <defunct>  0.0  0.0 nagios     0   - 13681 [nagios] <defunct>
13614 nagio <defunct>  0.0  0.0 nagios     0   - 13614 [nagios] <defunct>
13411 nagio <defunct>  0.0  0.0 nagios     0   - 13411 [nagios] <defunct>
13410 nagio <defunct>  0.0  0.0 nagios     0   - 13410 [nagios] <defunct>
12802 nagio <defunct>  0.0  0.0 nagios     0   - 12802 [nagios] <defunct>
12801 nagio <defunct>  0.0  0.0 nagios     0   - 12801 [nagios] <defunct>
  PID COMMAND         %CPU %CPU USER      NI CPU   PID COMMAND
14654 mysqld           2.0  2.0 mysql      0   - 14654 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 1737 nagios           1.3  1.3 nagios     0   -  1737 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
14681 local            1.0  1.0 postfix    0   - 14681 local -t unix
14138 php              0.5  0.5 nagios     0   - 14138 /usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php
14124 php              0.5  0.5 nagios     0   - 14124 /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php
14120 php              0.5  0.5 nagios     0   - 14120 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
14111 php              0.5  0.5 nagios     0   - 14111 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
14105 php              0.5  0.5 nagios     0   - 14105 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
13543 check_ifopersta  0.5  0.5 nagios     0   - 13543 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 86 -v 2
13518 check_ifopersta  0.5  0.5 nagios     0   - 13518 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 52 -v 2
13222 check_ifopersta  0.5  0.5 nagios     0   - 13222 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 13 -v 2
14608 httpd            0.5  0.5 apache     0   - 14608 /usr/sbin/httpd
14604 httpd            0.5  0.5 apache     0   - 14604 /usr/sbin/httpd
 3161 ndo2db           0.4  0.4 nagios     0   -  3161 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
14199 php              0.4  0.4 nagios     0   - 14199 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
14177 php              0.4  0.4 nagios     0   - 14177 /usr/bin/php -q /usr/local/nagiosxi/cron/nom.php
14106 php              0.4  0.4 nagios     0   - 14106 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
14613 httpd            0.4  0.4 apache     0   - 14613 /usr/sbin/httpd
14609 httpd            0.4  0.4 apache     0   - 14609 /usr/sbin/httpd
   28 kswapd0          0.3  0.3 root       0   -    28 [kswapd0]
    1 init             0.3  0.3 root       0   -     1 /sbin/init

If I leave the Nagios service running for a few minutes, the VM becomes completely unresponsive.

Christophe
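
The second listing contains several `[nagios] <defunct>` entries. A small sketch for counting such zombie processes per command name follows; it assumes a procps-style `ps` that supports the `stat` and `comm` output fields, and `count_zombies` is a hypothetical helper name.

```shell
# Hypothetical helper: reads "ps -eo stat,comm" style lines on stdin and
# prints "<count> <command>" for every command that has zombie processes
# (process state starting with Z).
count_zombies() {
  awk '$1 ~ /^Z/ { n[$2]++ } END { for (c in n) print n[c], c }'
}

# Usage:
#   ps -eo stat,comm | count_zombies
```

A count that keeps growing between runs would suggest the parent process is not reaping its finished children.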

Re: high load issue

Posted: Mon Mar 18, 2013 11:38 am
by scottwilkerson
One more:

Code:

ps -ef|grep bin/nag

And can you attach your nagios.cfg?

Thanks