high load issue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
cvanleke
Posts: 8
Joined: Thu Mar 14, 2013 6:14 am


Post by cvanleke »

Hello,

Back again (http://support.nagios.com/forum/viewtop ... f=6&t=9711) with another issue, this time a high load average (100+) on my Nagios server. Everything is running at a crawl; even commands over SSH have a very noticeable delay.

Here's the nagiostats output from shortly after I restarted the monitoring engine.

Code:

Nagios Stats 3.4.1
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 05-11-2012
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /usr/local/nagios/var/status.dat
Status File Age:                        0d 0h 0m 30s
Status File Version:                    3.4.1

Program Running Time:                   0d 0h 9m 59s
Nagios PID:                             13548
Used/High/Total Command Buffers:        0 / 0 / 4096

Total Services:                         2626
Services Checked:                       2626
Services Scheduled:                     2619
Services Actively Checked:              2619
Services Passively Checked:             7
Total Service State Change:             0.000 / 22.630 / 0.777 %
Active Service Latency:                 0.006 / 426.583 / 220.596 sec
Active Service Execution Time:          0.031 / 77.066 / 16.936 sec
Active Service State Change:            0.000 / 22.630 / 0.779 %
Active Services Last 1/5/15/60 min:     109 / 697 / 1635 / 2619
Passive Service Latency:                0.917 / 0.917 / 0.917 sec
Passive Service State Change:           0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:              2351 / 1 / 1 / 273
Services Flapping:                      4
Services In Downtime:                   0

Total Hosts:                            40
Hosts Checked:                          40
Hosts Scheduled:                        40
Hosts Actively Checked:                 40
Host Passively Checked:                 0
Total Host State Change:                0.000 / 0.000 / 0.000 %
Active Host Latency:                    0.000 / 428.729 / 206.815 sec
Active Host Execution Time:             0.033 / 12.009 / 1.188 sec
Active Host State Change:               0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:        3 / 18 / 25 / 40
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  36 / 4 / 0
Hosts Flapping:                         0
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     17 / 63 / 98
   Scheduled:                           4 / 10 / 22
   On-demand:                           13 / 53 / 76
   Parallel:                            5 / 23 / 41
   Serial:                              0 / 0 / 0
   Cached:                              12 / 40 / 57
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  290 / 795 / 1670
   Scheduled:                           290 / 795 / 1670
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min:      0 / 0 / 0

I'm also running NagVis, but I don't think that's the cause.

Let me know if you need more info.

Thanks,
Christophe
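nagiostats reports several of its figures as min / max / average triples. In the output above, average active service latency is about 220 seconds, meaning checks are starting well over three minutes after their scheduled time. A minimal Python sketch for pulling those triples out of saved nagiostats output and flagging high latency might look like this; the function names and the 30-second threshold are hypothetical choices for illustration, not Nagios defaults.

```python
import re

# nagiostats prints several metrics as "Label:  min / max / avg unit".
# Only lines ending in "sec" or "%" are treated as min/max/avg triples here.
TRIPLE = re.compile(
    r"^(?P<label>[A-Za-z /]+?):\s+"
    r"(?P<min>[\d.]+) / (?P<max>[\d.]+) / (?P<avg>[\d.]+)\s*(?P<unit>sec|%)$"
)


def parse_triples(text):
    """Return {label: (min, max, avg)} for every matching nagiostats line."""
    stats = {}
    for line in text.splitlines():
        m = TRIPLE.match(line.strip())
        if m:
            stats[m["label"]] = (float(m["min"]), float(m["max"]), float(m["avg"]))
    return stats


def flag_latency(stats, threshold_sec=30.0):
    """Labels of latency metrics whose average exceeds threshold_sec (hypothetical cutoff)."""
    return [label for label, (_lo, _hi, avg) in stats.items()
            if "Latency" in label and avg > threshold_sec]


if __name__ == "__main__":
    sample = """\
Active Service Latency:                 0.006 / 426.583 / 220.596 sec
Passive Service Latency:                0.917 / 0.917 / 0.917 sec
Active Host Latency:                    0.000 / 428.729 / 206.815 sec"""
    print(flag_latency(parse_triples(sample)))
    # prints ['Active Service Latency', 'Active Host Latency']
```

Lines that are not min / max / avg triples (for example "Hosts Up/Down/Unreach" or the "Last 1/5/15/60 min" counters) are skipped because they lack a trailing "sec" or "%".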
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: high load issue

Post by abrist »

You have 40 hosts and 2600+ services.
1. Are these switches that you are monitoring?
2. What type of hardware is in this server?
3. What type of checks are you running?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
cvanleke
Posts: 8
Joined: Thu Mar 14, 2013 6:14 am

Re: high load issue

Post by cvanleke »

Yes, almost all the hosts are switches, with ping, port status, and bandwidth checks.
The server is a CentOS 6.3 VM.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: high load issue

Post by mguthrie »

Quote: "this time regarding high loads (100+) on my Nagios Server"
What does the following show as taking up so much CPU?

Code:

top
cvanleke
Posts: 8
Joined: Thu Mar 14, 2013 6:14 am

Re: high load issue

Post by cvanleke »

The VM running Nagios has 1 core of an Intel Xeon X5650 and 741 MB of RAM.

top doesn't show much: it's mostly a lot of check_ifopersta processes (check_ifoperstatus, with the name truncated by top) at around 1% CPU each, plus Apache httpd processes.
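The figures posted so far allow a rough estimate of how many checks are running at once, using Little's law (average number in flight = start rate × average duration). The check interval is not stated in the thread, so the sketch below assumes every service is checked once every 5 minutes (300 s), a hypothetical value. The 2619 actively checked services and the 16.936 s average execution time come from the nagiostats output.

```python
def concurrent_checks(num_checks, avg_exec_sec, interval_sec):
    """Little's law, L = lambda * W.

    lambda: checks started per second, assuming each check runs exactly once
    per interval_sec; W: average time a check stays running (avg_exec_sec).
    Returns the average number of checks in flight at any moment.
    """
    start_rate = num_checks / interval_sec
    return start_rate * avg_exec_sec


if __name__ == "__main__":
    # 2619 services and 16.936 s average execution time come from the
    # nagiostats output; the 300 s interval is a hypothetical assumption.
    in_flight = concurrent_checks(2619, 16.936, 300)
    print(f"about {in_flight:.0f} checks running at once")
    # prints "about 148 checks running at once"
```

Under that assumption, roughly 148 checks would be in flight on average on a single core with 741 MB of RAM, which works out to about 5 MB of RAM per in-flight check before Apache, MySQL and the XI cron jobs take their share. A longer real interval would shrink the estimate proportionally.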
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: high load issue

Post by scottwilkerson »

If you go to Admin -> System Status, what does it show for I/O Wait?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
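On Linux, I/O wait is accounted in the kernel's CPU time counters, exposed on the aggregate `cpu` line of /proc/stat (field order documented in proc(5)). The sketch below samples that line twice and turns the deltas into percentages. It also reports steal time, which on a VM is time the hypervisor spent running other guests. Whether Nagios XI's System Status page computes its I/O Wait figure exactly this way is an assumption; the function names are hypothetical.

```python
import os
import time

# Column order of the aggregate "cpu" line in /proc/stat, per proc(5).
# guest/guest_nice (if present) are already included in user/nice, so they are ignored.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def read_cpu_times(path="/proc/stat"):
    """Return the jiffy counters from the aggregate 'cpu ' line."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("aggregate cpu line not found in " + path)


def cpu_percentages(before, after):
    """Percent of elapsed CPU time spent in each state between two samples."""
    deltas = [a - b for a, b in zip(after, before)][:len(FIELDS)]
    total = sum(deltas)
    if total == 0:
        return {}
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_times()
    time.sleep(1)
    pct = cpu_percentages(first, read_cpu_times())
    print("iowait %.1f%%  steal %.1f%%" % (pct.get("iowait", 0.0), pct.get("steal", 0.0)))
```

Older kernels that expose fewer columns simply yield fewer keys, which is why the usage line reads them with `.get()`.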
cvanleke
Posts: 8
Joined: Thu Mar 14, 2013 6:14 am

Re: high load issue

Post by cvanleke »

Very low wait, usually between 0 and 5%.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: high load issue

Post by scottwilkerson »

Can you run the following and post the results?

Code:

ps -eo pid,comm,%cpu,pcpu,user,nice,cpu,pid,args | sort -nrk 3 | head -n 30
cvanleke
Posts: 8
Joined: Thu Mar 14, 2013 6:14 am

Re: high load issue

Post by cvanleke »

The output varies a lot between runs. Here are two samples:

Code:

PID COMMAND         %CPU %CPU USER      NI CPU   PID COMMAND
14782 httpd            3.9  3.9 apache     0   - 14782 /usr/sbin/httpd
14966 php              3.5  3.5 nagios     0   - 14966 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
14894 httpd            3.5  3.5 apache     0   - 14894 /usr/sbin/httpd
14893 httpd            3.5  3.5 apache     0   - 14893 /usr/sbin/httpd
14841 httpd            3.1  3.1 apache     0   - 14841 /usr/sbin/httpd
14891 httpd            2.7  2.7 apache     0   - 14891 /usr/sbin/httpd
14959 php              2.5  2.5 nagios     0   - 14959 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
14892 httpd            2.5  2.5 apache     0   - 14892 /usr/sbin/httpd
14967 php              2.3  2.3 nagios     0   - 14967 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
14962 php              2.3  2.3 nagios     0   - 14962 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
14654 mysqld           2.2  2.2 mysql      0   - 14654 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
14961 php              2.1  2.1 nagios     0   - 14961 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
 1737 nagios           1.2  1.2 nagios     0   -  1737 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
14615 httpd            1.0  1.0 apache     0   - 14615 /usr/sbin/httpd
14613 httpd            1.0  1.0 apache     0   - 14613 /usr/sbin/httpd
14609 httpd            1.0  1.0 apache     0   - 14609 /usr/sbin/httpd
14608 httpd            1.0  1.0 apache     0   - 14608 /usr/sbin/httpd
14604 httpd            1.0  1.0 apache     0   - 14604 /usr/sbin/httpd
14614 httpd            0.9  0.9 apache     0   - 14614 /usr/sbin/httpd
14611 httpd            0.8  0.8 apache     0   - 14611 /usr/sbin/httpd

Code:

13912 nagio <defunct>  0.0  0.0 nagios     0   - 13912 [nagios] <defunct>
13750 nagio <defunct>  0.0  0.0 nagios     0   - 13750 [nagios] <defunct>
13681 nagio <defunct>  0.0  0.0 nagios     0   - 13681 [nagios] <defunct>
13614 nagio <defunct>  0.0  0.0 nagios     0   - 13614 [nagios] <defunct>
13411 nagio <defunct>  0.0  0.0 nagios     0   - 13411 [nagios] <defunct>
13410 nagio <defunct>  0.0  0.0 nagios     0   - 13410 [nagios] <defunct>
12802 nagio <defunct>  0.0  0.0 nagios     0   - 12802 [nagios] <defunct>
12801 nagio <defunct>  0.0  0.0 nagios     0   - 12801 [nagios] <defunct>
  PID COMMAND         %CPU %CPU USER      NI CPU   PID COMMAND
14654 mysqld           2.0  2.0 mysql      0   - 14654 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 1737 nagios           1.3  1.3 nagios     0   -  1737 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
14681 local            1.0  1.0 postfix    0   - 14681 local -t unix
14138 php              0.5  0.5 nagios     0   - 14138 /usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php
14124 php              0.5  0.5 nagios     0   - 14124 /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php
14120 php              0.5  0.5 nagios     0   - 14120 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
14111 php              0.5  0.5 nagios     0   - 14111 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
14105 php              0.5  0.5 nagios     0   - 14105 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
13543 check_ifopersta  0.5  0.5 nagios     0   - 13543 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 86 -v 2
13518 check_ifopersta  0.5  0.5 nagios     0   - 13518 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 52 -v 2
13222 check_ifopersta  0.5  0.5 nagios     0   - 13222 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 13 -v 2
14608 httpd            0.5  0.5 apache     0   - 14608 /usr/sbin/httpd
14604 httpd            0.5  0.5 apache     0   - 14604 /usr/sbin/httpd
 3161 ndo2db           0.4  0.4 nagios     0   -  3161 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
14199 php              0.4  0.4 nagios     0   - 14199 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
14177 php              0.4  0.4 nagios     0   - 14177 /usr/bin/php -q /usr/local/nagiosxi/cron/nom.php
14106 php              0.4  0.4 nagios     0   - 14106 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
14613 httpd            0.4  0.4 apache     0   - 14613 /usr/sbin/httpd
14609 httpd            0.4  0.4 apache     0   - 14609 /usr/sbin/httpd
   28 kswapd0          0.3  0.3 root       0   -    28 [kswapd0]
    1 init             0.3  0.3 root       0   -     1 /sbin/init

If I leave the Nagios service running for a few minutes, the VM becomes completely unresponsive.

Christophe
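Since the per-process numbers vary a lot between runs, it can help to aggregate them by command name instead of reading individual rows. The sketch below assumes its input comes from `ps -eo pcpu=,comm=`, a different invocation from the one above: %CPU first, command name last and no header, so names containing spaces (such as the `<defunct>` entries) stay in one piece. The function name is hypothetical, and the sample is loosely based on the first listing above.

```python
from collections import defaultdict


def cpu_by_command(ps_output):
    """Aggregate %CPU and process count per command name.

    Expects `ps -eo pcpu=,comm=` style lines: a %CPU value, whitespace, then
    the command name, which may itself contain spaces (e.g. "<defunct>").
    Returns (command, total_cpu, process_count) tuples, highest CPU first.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            cpu = float(parts[0])
        except ValueError:
            continue  # header or malformed line
        entry = totals[parts[1].strip()]
        entry[0] += cpu
        entry[1] += 1
    rows = [(cmd, round(cpu, 1), count) for cmd, (cpu, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    sample = """\
 3.9 httpd
 3.5 httpd
 3.5 php
 2.2 mysqld
 1.2 nagios
 0.0 nagios <defunct>
 0.0 nagios <defunct>"""
    for cmd, cpu, count in cpu_by_command(sample):
        print(f"{cmd:20} {cpu:5.1f}% over {count} process(es)")
```

On a live system the input could come from `subprocess.run(["ps", "-eo", "pcpu=,comm="], capture_output=True, text=True).stdout`. Keep in mind that procps `ps` computes %CPU as CPU time divided by the process's elapsed lifetime, not as an instantaneous reading, so short-lived check processes and long-running daemons are not directly comparable.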
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: high load issue

Post by scottwilkerson »

One more:

Code:

ps -ef | grep bin/nag

Also, can you attach your nagios.cfg?

Thanks
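nagios.cfg is a plain `name=value` file with `#` comments, and some directives (such as `cfg_file`) may appear more than once. A sketch for pulling out a few settings related to check concurrency follows; the choice of directives is an illustrative assumption, not a complete tuning list, and the function names are hypothetical. For reference, in Nagios Core 3 `max_concurrent_checks=0` means the number of concurrent checks is not limited.

```python
import os
from collections import defaultdict


def parse_nagios_cfg(text):
    """Parse 'name=value' lines into {name: [value, ...]}.

    Blank lines and lines starting with '#' are skipped. Values are kept as a
    list because some directives (such as cfg_file) may appear many times.
    """
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()].append(value.strip())
    return dict(settings)


# Illustrative selection of concurrency-related directives, not a complete list.
WATCH = ("max_concurrent_checks", "use_large_installation_tweaks",
         "check_result_reaper_frequency")


def concurrency_settings(text):
    """Report each watched directive, using the last occurrence if it repeats."""
    cfg = parse_nagios_cfg(text)
    return {name: cfg[name][-1] if name in cfg else "(not set)" for name in WATCH}


if __name__ == "__main__":
    path = "/usr/local/nagios/etc/nagios.cfg"  # path as shown in the ps output above
    if os.path.exists(path):
        with open(path) as f:
            print(concurrency_settings(f.read()))
```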