This support forum board is for support questions relating to
Nagios XI, our flagship commercial network monitoring solution.
by cvanleke » Fri Mar 15, 2013 10:17 am
Hello,
Back again ( http://support.nagios.com/forum/viewtop ... f=6&t=9711 ) with another issue, this time regarding high loads (100+) on my Nagios Server. Everything is running at a crawl; even commands over SSH have a very noticeable delay.
Here's the nagiostats output from shortly after I restarted the monitoring engine.
Code:
Nagios Stats 3.4.1
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 05-11-2012
License: GPL
CURRENT STATUS DATA
------------------------------------------------------
Status File: /usr/local/nagios/var/status.dat
Status File Age: 0d 0h 0m 30s
Status File Version: 3.4.1
Program Running Time: 0d 0h 9m 59s
Nagios PID: 13548
Used/High/Total Command Buffers: 0 / 0 / 4096
Total Services: 2626
Services Checked: 2626
Services Scheduled: 2619
Services Actively Checked: 2619
Services Passively Checked: 7
Total Service State Change: 0.000 / 22.630 / 0.777 %
Active Service Latency: 0.006 / 426.583 / 220.596 sec
Active Service Execution Time: 0.031 / 77.066 / 16.936 sec
Active Service State Change: 0.000 / 22.630 / 0.779 %
Active Services Last 1/5/15/60 min: 109 / 697 / 1635 / 2619
Passive Service Latency: 0.917 / 0.917 / 0.917 sec
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 2351 / 1 / 1 / 273
Services Flapping: 4
Services In Downtime: 0
Total Hosts: 40
Hosts Checked: 40
Hosts Scheduled: 40
Hosts Actively Checked: 40
Host Passively Checked: 0
Total Host State Change: 0.000 / 0.000 / 0.000 %
Active Host Latency: 0.000 / 428.729 / 206.815 sec
Active Host Execution Time: 0.033 / 12.009 / 1.188 sec
Active Host State Change: 0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min: 3 / 18 / 25 / 40
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 36 / 4 / 0
Hosts Flapping: 0
Hosts In Downtime: 0
Active Host Checks Last 1/5/15 min: 17 / 63 / 98
Scheduled: 4 / 10 / 22
On-demand: 13 / 53 / 76
Parallel: 5 / 23 / 41
Serial: 0 / 0 / 0
Cached: 12 / 40 / 57
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 290 / 795 / 1670
Scheduled: 290 / 795 / 1670
On-demand: 0 / 0 / 0
Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0
External Commands Last 1/5/15 min: 0 / 0 / 0
I am also running NagVis, but I don't think that'll be the issue.
Let me know if you need more info.
Thanks,
Christophe
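For reference, the "min / max / avg" triplets in the nagiostats output above can be pulled out with a small filter. This is only a sketch and not part of Nagios: the function name latency_summary and the 60-second threshold are illustrative assumptions.

```shell
#!/bin/sh
# Read nagiostats output on stdin and print min/max/avg for every
# latency and execution-time line, flagging averages above a threshold.
# latency_summary and the 60-second default are hypothetical choices.
latency_summary() {
    threshold="${1:-60}"
    awk -F': *' -v limit="$threshold" '
        /Latency:|Execution Time:/ {
            # nagiostats prints these values as "min / max / avg sec"
            split($2, v, " / ")
            avg = v[3] + 0              # "+ 0" drops the trailing " sec"
            flag = (avg > limit) ? "HIGH" : "ok"
            printf "%s: min=%s max=%s avg=%s %s\n", $1, v[1], v[2], avg, flag
        }'
}

# Example (not run here; assumes nagiostats is on the PATH):
#   nagiostats | latency_summary 60
```

With the figures posted above, the average active service latency of 220.596 sec would be flagged HIGH, while the average active host execution time of 1.188 sec would not.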
by abrist » Fri Mar 15, 2013 10:41 am
You have 40 hosts and 2600+ services.
1. Are these switches that you are monitoring?
2. What type of hardware is in this server?
3. What type of checks are you running?
by cvanleke » Fri Mar 15, 2013 11:12 am
Yes, almost all the hosts are switches, with ping, port status, and bandwidth checks.
The server is running in a CentOS 6.3 VM.
by mguthrie » Fri Mar 15, 2013 1:44 pm
"this time regarding high loads (100+) on my Nagios Server"
What seems to be taking up so much CPU when you run
by cvanleke » Mon Mar 18, 2013 3:34 am
The hardware running Nagios is an Intel Xeon X5650 (1 core) with 741MB of RAM.
top doesn't show much: mostly a lot of check_ifopersta processes at around 1% CPU each, plus Apache httpd processes.
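To put the load figure in context: on Linux the load average counts tasks that are runnable or in uninterruptible sleep, so it is usually read relative to the number of cores. A minimal sketch of that division follows; the function name load_per_core is hypothetical.

```shell
#!/bin/sh
# Divide a load average by the number of CPU cores.
# load_per_core is an illustrative helper, not an existing tool.
load_per_core() {
    load="$1"
    cores="$2"
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.1f\n", l / c }'
}

# On a live Linux system (not run here) the inputs could come from:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

For the single-core VM described here, a load of 100 works out to roughly 100 tasks queued per core, which fits the sluggish SSH sessions even though no single process shows high CPU usage.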
by scottwilkerson » Mon Mar 18, 2013 7:47 am
If you go to Admin -> System Status, what do you have for I/O Wait?
by cvanleke » Mon Mar 18, 2013 8:10 am
Very low wait, usually between 0 and 5%.
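Outside the XI interface, a comparable I/O wait figure can be estimated from two samples of the aggregate "cpu" line in /proc/stat. This is a rough sketch under that assumption and is not necessarily how the System Status page computes its value; the function name iowait_pct is hypothetical.

```shell
#!/bin/sh
# Estimate I/O wait as a percentage of CPU time between two samples
# of the aggregate "cpu ..." line from /proc/stat.
# iowait_pct is an illustrative helper name.
iowait_pct() {
    # $1 = earlier sample, $2 = later sample
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # fields 2..9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6                  # field 6 is iowait
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# On a live Linux system (not run here):
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```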
by scottwilkerson » Mon Mar 18, 2013 9:56 am
Can you run the following and report the results?
Code:
ps -eo pid,comm,%cpu,pcpu,user,nice,cpu,pid,args | sort -rk 3 | head -n 30
by cvanleke » Mon Mar 18, 2013 10:35 am
The output varies a lot.
Code:
PID COMMAND %CPU %CPU USER NI CPU PID COMMAND
14782 httpd 3.9 3.9 apache 0 - 14782 /usr/sbin/httpd
14966 php 3.5 3.5 nagios 0 - 14966 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
14894 httpd 3.5 3.5 apache 0 - 14894 /usr/sbin/httpd
14893 httpd 3.5 3.5 apache 0 - 14893 /usr/sbin/httpd
14841 httpd 3.1 3.1 apache 0 - 14841 /usr/sbin/httpd
14891 httpd 2.7 2.7 apache 0 - 14891 /usr/sbin/httpd
14959 php 2.5 2.5 nagios 0 - 14959 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
14892 httpd 2.5 2.5 apache 0 - 14892 /usr/sbin/httpd
14967 php 2.3 2.3 nagios 0 - 14967 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
14962 php 2.3 2.3 nagios 0 - 14962 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
14654 mysqld 2.2 2.2 mysql 0 - 14654 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
14961 php 2.1 2.1 nagios 0 - 14961 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
1737 nagios 1.2 1.2 nagios 0 - 1737 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
14615 httpd 1.0 1.0 apache 0 - 14615 /usr/sbin/httpd
14613 httpd 1.0 1.0 apache 0 - 14613 /usr/sbin/httpd
14609 httpd 1.0 1.0 apache 0 - 14609 /usr/sbin/httpd
14608 httpd 1.0 1.0 apache 0 - 14608 /usr/sbin/httpd
14604 httpd 1.0 1.0 apache 0 - 14604 /usr/sbin/httpd
14614 httpd 0.9 0.9 apache 0 - 14614 /usr/sbin/httpd
14611 httpd 0.8 0.8 apache 0 - 14611 /usr/sbin/httpd
Code:
13912 nagio <defunct> 0.0 0.0 nagios 0 - 13912 [nagios] <defunct>
13750 nagio <defunct> 0.0 0.0 nagios 0 - 13750 [nagios] <defunct>
13681 nagio <defunct> 0.0 0.0 nagios 0 - 13681 [nagios] <defunct>
13614 nagio <defunct> 0.0 0.0 nagios 0 - 13614 [nagios] <defunct>
13411 nagio <defunct> 0.0 0.0 nagios 0 - 13411 [nagios] <defunct>
13410 nagio <defunct> 0.0 0.0 nagios 0 - 13410 [nagios] <defunct>
12802 nagio <defunct> 0.0 0.0 nagios 0 - 12802 [nagios] <defunct>
12801 nagio <defunct> 0.0 0.0 nagios 0 - 12801 [nagios] <defunct>
PID COMMAND %CPU %CPU USER NI CPU PID COMMAND
14654 mysqld 2.0 2.0 mysql 0 - 14654 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
1737 nagios 1.3 1.3 nagios 0 - 1737 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
14681 local 1.0 1.0 postfix 0 - 14681 local -t unix
14138 php 0.5 0.5 nagios 0 - 14138 /usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php
14124 php 0.5 0.5 nagios 0 - 14124 /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php
14120 php 0.5 0.5 nagios 0 - 14120 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
14111 php 0.5 0.5 nagios 0 - 14111 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
14105 php 0.5 0.5 nagios 0 - 14105 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
13543 check_ifopersta 0.5 0.5 nagios 0 - 13543 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 86 -v 2
13518 check_ifopersta 0.5 0.5 nagios 0 - 13518 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 52 -v 2
13222 check_ifopersta 0.5 0.5 nagios 0 - 13222 /usr/bin/perl -w /usr/local/nagios/libexec/check_ifoperstatus -H 10.10.54.3 -C csc-snmp -k 13 -v 2
14608 httpd 0.5 0.5 apache 0 - 14608 /usr/sbin/httpd
14604 httpd 0.5 0.5 apache 0 - 14604 /usr/sbin/httpd
3161 ndo2db 0.4 0.4 nagios 0 - 3161 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
14199 php 0.4 0.4 nagios 0 - 14199 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
14177 php 0.4 0.4 nagios 0 - 14177 /usr/bin/php -q /usr/local/nagiosxi/cron/nom.php
14106 php 0.4 0.4 nagios 0 - 14106 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
14613 httpd 0.4 0.4 apache 0 - 14613 /usr/sbin/httpd
14609 httpd 0.4 0.4 apache 0 - 14609 /usr/sbin/httpd
28 kswapd0 0.3 0.3 root 0 - 28 [kswapd0]
1 init 0.3 0.3 root 0 - 1 /sbin/init
If I leave the Nagios service running for a few minutes, the VM becomes completely unresponsive.
Christophe
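Because the listing varies so much between runs, it can help to tally it rather than read it line by line. The sketch below counts processes per command name and counts defunct (zombie) entries from ps output like the above; the function names count_by_command and count_defunct are hypothetical.

```shell
#!/bin/sh
# Tally ps output read on stdin. Both helper names are illustrative.

# Count processes per command name (second column), most frequent first.
count_by_command() {
    awk '$1 != "PID" { count[$2]++ }
         END { for (name in count) printf "%d %s\n", count[name], name }' |
        sort -k1,1nr -k2,2
}

# Count defunct (zombie) entries.
count_defunct() {
    awk '/<defunct>/ { n++ } END { print n + 0 }'
}

# Example (not run here), reusing the column layout requested earlier:
#   ps -eo pid,comm,%cpu,pcpu,user,nice,cpu,pid,args | count_by_command
#   ps -eo pid,comm,%cpu,pcpu,user,nice,cpu,pid,args | count_defunct
```

Note that ps truncates long command names in the second column, so check_ifoperstatus shows up as check_ifopersta and defunct nagios children as nagio.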
by scottwilkerson » Mon Mar 18, 2013 11:38 am
one more
And can you attach your nagios.cfg?
Thanks