Nagios 2014R2.3 on VM HIGH LOAD SPIKES

carlos.atos
Posts: 29
Joined: Mon Nov 10, 2014 1:08 pm

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by carlos.atos »

Hello scottwilkerson,

Sorry, I did that yesterday as well.
Here is the output captured during yesterday's spikes on both VMs.

4 vCPU VM:

Code: Select all

[root@localhost ~]# ps -ef|grep php
nagios   24771 24769  0 15:23 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios   24774 24768  0 15:23 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios   24775 24771  5 15:23 ?        00:00:02 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios   24777 24766  0 15:23 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios   24779 24777  9 15:23 ?        00:00:03 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios   24781 24774  7 15:23 ?        00:00:03 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios   24784 24770  0 15:23 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios   24786 24784 13 15:23 ?        00:00:05 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
root     25011 15036  0 15:23 pts/0    00:00:00 grep php
[root@localhost ~]# 
2 vCPU VM:

Code: Select all

[root@localhost ~]# ps -ef|grep php
nagios    4937  4934  0 14:48 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios    4939  4936  0 14:48 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios    4940  4932  0 14:48 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios    4941  4937  3 14:48 ?        00:00:01 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios    4943  4939  8 14:48 ?        00:00:03 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios    4944  4940  3 14:48 ?        00:00:01 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios    4949  4935  0 14:48 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios    4952  4949  3 14:48 ?        00:00:01 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
root      5171  4797  0 14:48 pts/0    00:00:00 grep php
What do you see?
I will wait for the next spike and capture the processes again with ps -ef|grep php|grep -v /bin/sh.

cheers
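
To see at a glance whether the cron-driven PHP jobs pile up during a spike, the matching processes can be grouped by script name. This is a minimal sketch, assuming the `ps -ef` column layout shown above (command starting in column 8); the helper name `count_php_jobs` is made up for illustration. The `/bin/sh -c` wrapper lines are skipped, much like `grep -v /bin/sh` does.

```shell
#!/bin/sh
# count_php_jobs (hypothetical helper): reads `ps -ef` output on stdin and
# prints "<count> <script>" for every script run directly by /usr/bin/php.
count_php_jobs() {
    awk '$8 == "/usr/bin/php" { n[$NF]++ }
         END { for (s in n) print n[s], s }' | sort -rn
}

# Live usage: a count above 1 for the same script suggests overlapping runs.
ps -ef | count_php_jobs
```

If one script shows a steadily growing count while the load climbs, that job is probably not finishing before the next cron run starts.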
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by scottwilkerson »

I was wondering whether there would be a bunch of these processes, but it doesn't look like it.

If you see a spike again, a better command would be just:

Code: Select all

ps aux
One thing to note, though: in a VM environment, if another VM starts to eat all of the disk I/O, you would see a spike in load even if your CPUs had free cycles, and Nagios itself has a large appetite for disk I/O.
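
Since the spikes tend to happen when nobody is logged in, the `ps aux` capture could be automated. The following is only a sketch: the script layout, the `THRESHOLD` and `OUTDIR` variables, and the function names are all assumptions, not part of Nagios XI. It also saves tasks in state `D` (uninterruptible sleep), which count toward the Linux load average while they wait on I/O.

```shell
#!/bin/sh
# Hypothetical watcher: names, paths and the threshold are illustrative.
THRESHOLD="${THRESHOLD:-10}"
OUTDIR="${OUTDIR:-/tmp/load-snapshots}"

# Succeeds when load value $1 is at or above threshold $2 (float compare).
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 >= t + 0) }'
}

# Saves the full process list and the tasks currently in state D.
snapshot_if_high() {
    load=$(cut -d' ' -f1 /proc/loadavg)
    if load_exceeds "$load" "$THRESHOLD"; then
        mkdir -p "$OUTDIR"
        stamp=$(date +%Y%m%d-%H%M%S)
        ps aux > "$OUTDIR/ps-aux-$stamp.txt"
        ps -eo state,pid,args | awk '$1 ~ /^D/' > "$OUTDIR/d-state-$stamp.txt"
    fi
}

# Only act when the script is invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
    snapshot_if_high
fi
```

Saved under a name of your choice and started from cron once a minute with the `run` argument, this would leave a timestamped snapshot for every minute the 1-minute load is at or above the threshold.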
Former Nagios employee
carlos.atos
Posts: 29
Joined: Mon Nov 10, 2014 1:08 pm

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by carlos.atos »

Hi scottwilkerson

I had a big load spike yesterday, but I wasn't in the office to notice it and catch the processes. Both Nagios VMs on the host went up to a load of 300.
4vCPUs-FEB-09-current_load.png
I connected to the console on the 4 vCPU VM and there was a notice saying that I had mail in /var/spool/mail/root.

This is what I found there:

Code: Select all

###### WARNING ######
Errors reported during AutoMySQLBackup execution.. Backup failed
Error log below..
-- Warning: Skipping the data of table mysql.event. Specify the --events option explicitly.

From [email protected]  Tue Dec 23 08:00:09 2014
Return-Path: <[email protected]>
X-Original-To: root@localhost
Delivered-To: [email protected]
Received: by localhost.localdomain (Postfix, from userid 0)
        id 8A50565E; Tue, 23 Dec 2014 08:00:09 +0000 (GMT)
Date: Tue, 23 Dec 2014 08:00:09 +0000
To: [email protected]
Subject: PostgreSQL Backup Log for localhost - 2014-12-23
User-Agent: Heirloom mailx 12.4 7/29/08
MIME-Version: 1.0

2015-01-26 01:50:40: ERROR: Target[193.138.100.2_49][_OUT_] ' $target->[50]{$mode} ' did not eval into defined data
2015-01-26 01:50:40: ERROR: Target[193.138.100.2_50][_IN_] ' $target->[51]{$mode} ' did not eval into defined data
2015-01-26 01:50:40: ERROR: Target[193.138.100.2_50][_OUT_] ' $target->[51]{$mode} ' did not eval into defined data
2015-01-26 01:50:40: ERROR: Target[193.138.100.2_51][_IN_] ' $target->[52]{$mode} ' did not eval into defined data
2015-01-26 01:50:40: ERROR: Target[193.138.100.2_51][_OUT_] ' $target->[52]{$mode} ' did not eval into defined data
2015-01-26 01:50:40: ERROR: Target[193.138.100.2_66][_IN_] ' $target->[53]{$mode} ' did not eval into defined data
2015-01-26 01:50:40: ERROR: Target[193.138.100.2_66][_OUT_] ' $target->[53]{$mode} ' did not eval into defined data

From [email protected]  Sun Feb  8 00:32:02 2015
Return-Path: <[email protected]>
X-Original-To: root
Delivered-To: [email protected]
Received: by localhost.localdomain (Postfix, from userid 0)
        id 4C96E70B; Sun,  8 Feb 2015 00:21:54 +0000 (GMT)
From: [email protected] (Cron Daemon)
To: [email protected]
Subject: Cron <root@localhost> LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/root>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=root>
X-Cron-Env: <USER=root>
Message-Id: <[email protected]>
Date: Sun,  8 Feb 2015 00:21:29 +0000 (GMT)

2015-02-08 00:16:54: ERROR: I guess another mrtg is running. A lockfile (/var/lock/mrtg/mrtg_l) aged
223 seconds is hanging around. If you are sure that no other mrtg
is running you can remove the lockfile

From [email protected]  Sun Feb  8 12:09:30 2015
Return-Path: <[email protected]>
X-Original-To: root
Delivered-To: [email protected]
Received: by localhost.localdomain (Postfix, from userid 0)
        id 6FED9940; Sun,  8 Feb 2015 08:25:24 +0000 (GMT)
From: [email protected] (Cron Daemon)
To: [email protected]
Subject: Cron <root@localhost>   /root/scripts/autopostgresqlbackup
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/root>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=root>
X-Cron-Env: <USER=root>
Message-Id: <[email protected]>
Date: Sun,  8 Feb 2015 08:15:48 +0000 (GMT)

psql: FATAL:  sorry, too many clients already
I ran ps aux (though not at the moment of the spike) and this is what I got:

Code: Select all

[root@localhost ~]# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  19360  1020 ?        Ss   Feb03   1:42 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Feb03   0:49 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Feb03   0:06 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    Feb03   0:00 [stopper/0]
root         6  0.0  0.0      0     0 ?        S    Feb03   0:06 [watchdog/0]
root         7  0.0  0.0      0     0 ?        S    Feb03   0:21 [migration/1]
root         8  0.0  0.0      0     0 ?        S    Feb03   0:00 [stopper/1]
root         9  0.0  0.0      0     0 ?        S    Feb03   0:13 [ksoftirqd/1]
root        10  0.0  0.0      0     0 ?        S    Feb03   0:06 [watchdog/1]
root        11  0.0  0.0      0     0 ?        S    Feb03   0:56 [migration/2]
root        12  0.0  0.0      0     0 ?        S    Feb03   0:00 [stopper/2]
root        13  0.0  0.0      0     0 ?        S    Feb03   0:09 [ksoftirqd/2]
root        14  0.0  0.0      0     0 ?        S    Feb03   0:05 [watchdog/2]
root        15  0.0  0.0      0     0 ?        S    Feb03   0:24 [migration/3]
root        16  0.0  0.0      0     0 ?        S    Feb03   0:00 [stopper/3]
root        17  0.0  0.0      0     0 ?        S    Feb03   0:11 [ksoftirqd/3]
root        18  0.0  0.0      0     0 ?        S    Feb03   0:06 [watchdog/3]
root        19  0.0  0.0      0     0 ?        S    Feb03   2:13 [events/0]
root        20  0.0  0.0      0     0 ?        S    Feb03   3:55 [events/1]
root        21  0.0  0.0      0     0 ?        S    Feb03   2:20 [events/2]
root        22  0.0  0.0      0     0 ?        S    Feb03   2:58 [events/3]
root        23  0.0  0.0      0     0 ?        S    Feb03   0:00 [cgroup]
root        24  0.0  0.0      0     0 ?        S    Feb03   0:00 [khelper]
root        25  0.0  0.0      0     0 ?        S    Feb03   0:00 [netns]
root        26  0.0  0.0      0     0 ?        S    Feb03   0:00 [async/mgr]
root        27  0.0  0.0      0     0 ?        S    Feb03   0:00 [pm]
root        28  0.0  0.0      0     0 ?        S    Feb03   0:11 [sync_supers]
root        29  0.0  0.0      0     0 ?        S    Feb03   0:09 [bdi-default]
root        30  0.0  0.0      0     0 ?        S    Feb03   0:00 [kintegrityd/0]
root        31  0.0  0.0      0     0 ?        S    Feb03   0:00 [kintegrityd/1]
root        32  0.0  0.0      0     0 ?        S    Feb03   0:00 [kintegrityd/2]
root        33  0.0  0.0      0     0 ?        S    Feb03   0:00 [kintegrityd/3]
root        34  0.0  0.0      0     0 ?        S    Feb03   1:34 [kblockd/0]
root        35  0.0  0.0      0     0 ?        S    Feb03   0:53 [kblockd/1]
root        36  0.0  0.0      0     0 ?        S    Feb03   1:23 [kblockd/2]
root        37  0.0  0.0      0     0 ?        S    Feb03   0:50 [kblockd/3]
root        38  0.0  0.0      0     0 ?        S    Feb03   0:00 [kacpid]
root        39  0.0  0.0      0     0 ?        S    Feb03   0:00 [kacpi_notify]
root        40  0.0  0.0      0     0 ?        S    Feb03   0:00 [kacpi_hotplug]
root        41  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_aux]
root        42  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_sff/0]
root        43  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_sff/1]
root        44  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_sff/2]
root        45  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_sff/3]
root        46  0.0  0.0      0     0 ?        S    Feb03   0:00 [ksuspend_usbd]
root        47  0.0  0.0      0     0 ?        S    Feb03   0:00 [khubd]
root        48  0.0  0.0      0     0 ?        S    Feb03   0:00 [kseriod]
root        49  0.0  0.0      0     0 ?        S    Feb03   0:00 [md/0]
root        50  0.0  0.0      0     0 ?        S    Feb03   0:00 [md/1]
root        51  0.0  0.0      0     0 ?        S    Feb03   0:00 [md/2]
root        52  0.0  0.0      0     0 ?        S    Feb03   0:00 [md/3]
root        53  0.0  0.0      0     0 ?        S    Feb03   0:00 [md_misc/0]
root        54  0.0  0.0      0     0 ?        S    Feb03   0:00 [md_misc/1]
root        55  0.0  0.0      0     0 ?        S    Feb03   0:00 [md_misc/2]
root        56  0.0  0.0      0     0 ?        S    Feb03   0:00 [md_misc/3]
root        57  0.0  0.0      0     0 ?        S    Feb03   0:00 [linkwatch]
root        58  0.0  0.0      0     0 ?        S    Feb03   0:08 [khungtaskd]
root        59  0.1  0.0      0     0 ?        S    Feb03  10:30 [kswapd0]
root        60  0.0  0.0      0     0 ?        SN   Feb03   0:00 [ksmd]
root        61  0.0  0.0      0     0 ?        SN   Feb03   2:01 [khugepaged]
root        62  0.0  0.0      0     0 ?        S    Feb03   0:00 [aio/0]
root        63  0.0  0.0      0     0 ?        S    Feb03   0:00 [aio/1]
root        64  0.0  0.0      0     0 ?        S    Feb03   0:00 [aio/2]
root        65  0.0  0.0      0     0 ?        S    Feb03   0:00 [aio/3]
root        66  0.0  0.0      0     0 ?        S    Feb03   0:00 [crypto/0]
root        67  0.0  0.0      0     0 ?        S    Feb03   0:00 [crypto/1]
root        68  0.0  0.0      0     0 ?        S    Feb03   0:00 [crypto/2]
root        69  0.0  0.0      0     0 ?        S    Feb03   0:00 [crypto/3]
root        77  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthrotld/0]
root        78  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthrotld/1]
root        79  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthrotld/2]
root        80  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthrotld/3]
root        81  0.0  0.0      0     0 ?        S    Feb03   0:00 [pciehpd]
root        83  0.0  0.0      0     0 ?        S    Feb03   0:00 [kpsmoused]
root        84  0.0  0.0      0     0 ?        S    Feb03   0:00 [usbhid_resumer]
root        85  0.0  0.0      0     0 ?        S    Feb03   0:00 [deferwq]
root       116  0.0  0.0      0     0 ?        S    Feb03   0:00 [kdmremove]
root       117  0.0  0.0      0     0 ?        S    Feb03   0:00 [kstriped]
root       283  0.0  0.0      0     0 ?        S    Feb03   0:46 [mpt_poll_0]
root       284  0.0  0.0      0     0 ?        S    Feb03   0:00 [mpt/0]
root       309  0.0  0.0      0     0 ?        S    Feb03   0:00 [scsi_eh_0]
root       318  0.0  0.0      0     0 ?        S    Feb03   0:00 [scsi_eh_1]
root       319  0.0  0.0      0     0 ?        S    Feb03   0:00 [scsi_eh_2]
root       422  0.0  0.0      0     0 ?        S    Feb03   0:00 [kdmflush]
root       424  0.0  0.0      0     0 ?        S    Feb03   0:00 [kdmflush]
root       442  0.0  0.0      0     0 ?        S    Feb03   3:16 [jbd2/dm-0-8]
root       443  0.0  0.0      0     0 ?        S    Feb03   0:00 [ext4-dio-unwrit]
root       516  0.0  0.0  11028   260 ?        S<s  Feb03   0:00 /sbin/udevd -d
root       712  0.0  0.0      0     0 ?        S    Feb03   0:35 [vmmemctl]
root       720  0.0  0.0      0     0 ?        S    Feb03   1:45 [flush-253:0]
root       858  0.0  0.0  11028   244 ?        S<   Feb03   0:00 /sbin/udevd -d
root       865  0.0  0.0  10636   248 ?        S<   Feb03   0:00 /sbin/udevd -d
root       889  0.0  0.0      0     0 ?        S    Feb03   0:00 [jbd2/sda1-8]
root       890  0.0  0.0      0     0 ?        S    Feb03   0:00 [ext4-dio-unwrit]
root       928  0.0  0.0      0     0 ?        S    Feb03   2:10 [kauditd]
root      1175  0.1  0.0  93144   728 ?        S<sl Feb03  11:05 auditd
root      1199  0.1  0.1 249476  6860 ?        Sl   Feb03  13:09 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
dbus      1212  0.0  0.0  21532   448 ?        Ss   Feb03   0:00 dbus-daemon --system
root      1401  0.0  0.0  66688   592 ?        Ss   Feb03   0:45 /usr/sbin/sshd
root      1410  0.0  0.0  22188   620 ?        Ss   Feb03   0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
root      1420  0.3  0.0 189580  2296 ?        Sl   Feb03  26:12 /usr/sbin/vmtoolsd
ntp       1425  0.0  0.0  30732  1464 ?        Ss   Feb03   1:14 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root      1475  0.0  0.0 108168  1244 ?        S    Feb03   0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/my
mysql     1592  0.6  1.0 2256124 40968 ?       Sl   Feb03  57:01 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql -
postgres  1631  0.1  0.1 216356  4160 ?        S    Feb03  16:49 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres  1702  0.0  0.0 179368   684 ?        Ss   Feb03   2:02 postgres: logger process                          
postgres  1713  0.0  0.0 216472  2952 ?        Ss   Feb03   7:09 postgres: writer process                          
postgres  1714  0.0  0.0 216356   968 ?        Ss   Feb03   5:18 postgres: wal writer process                      
postgres  1715  0.0  0.0 216644  1180 ?        Ss   Feb03   4:35 postgres: autovacuum launcher process             
postgres  1716  0.0  0.0 179636   868 ?        Ss   Feb03   8:29 postgres: stats collector process                 
root      1717  0.0  0.0  81328  2820 ?        Ss   Feb03   2:41 /usr/libexec/postfix/master
postfix   1727  0.0  0.0  83144  3016 ?        S    Feb03   1:54 qmgr -l -t fifo -u
root      1752  0.0  0.3 336788 15316 ?        Ss   Feb03   2:31 /usr/sbin/httpd
root      1762  0.1  0.0 117336   740 ?        Ss   Feb03   9:08 crond
nagios    1772  0.1  0.0 368888   936 ?        S    Feb03  15:30 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
ajaxterm  1780  0.1  0.0 170340  1616 ?        Sl   Feb03  10:25 python /usr/share/ajaxterm/ajaxterm.py --daemon --port=8022 --uid=ajaxterm
nagios    1852  0.0  0.0  50296   236 ?        Ss   Feb03   0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root      1861  0.0  0.0  67552  1360 ?        Ss   Feb03   0:00 login -- root     
root      1863  0.0  0.0   4064   488 tty2     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty2
root      1865  0.0  0.0   4064   488 tty3     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty3
root      1867  0.0  0.0   4064   488 tty4     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty4
root      1869  0.0  0.0   4064   488 tty5     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty5
root      1871  0.0  0.0   4064   488 tty6     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty6
root      2068  0.0  0.0 2085024 1212 ?        Sl   Feb03   0:00 /usr/sbin/console-kit-daemon --no-daemon
root      2138  0.0  0.0 108304  1340 tty1     Ss+  Feb03   0:00 -bash
apache    5872  0.8  0.9 464048 37492 ?        S    Feb08  12:12 /usr/sbin/httpd
apache    5875  0.8  0.9 464304 37544 ?        S    Feb08  11:54 /usr/sbin/httpd
apache    5877  0.8  0.9 464820 38324 ?        S    Feb08  12:34 /usr/sbin/httpd
apache    5881  0.8  0.9 464048 37488 ?        S    Feb08  12:22 /usr/sbin/httpd
apache    5883  0.8  0.9 464816 38264 ?        S    Feb08  11:53 /usr/sbin/httpd
apache    5886  0.8  0.9 464304 37528 ?        S    Feb08  12:08 /usr/sbin/httpd
apache    5888  0.9  0.9 465356 38772 ?        S    Feb08  13:44 /usr/sbin/httpd
apache    5890  0.9  0.9 465248 38720 ?        S    Feb08  13:44 /usr/sbin/httpd
apache    5905  0.9  0.9 465500 38828 ?        S    Feb08  13:57 /usr/sbin/httpd
apache    5921  0.9  0.9 465500 38960 ?        S    Feb08  13:52 /usr/sbin/httpd
postgres  5923  0.0  0.1 217688  5356 ?        Ss   Feb08   0:57 postgres: nagiosxi nagiosxi [local] idle          
postgres  5924  0.0  0.1 217688  5356 ?        Ss   Feb08   0:59 postgres: nagiosxi nagiosxi [local] idle          
apache    5928  0.9  0.9 465184 38624 ?        S    Feb08  13:33 /usr/sbin/httpd
postgres  5929  0.0  0.1 217688  5340 ?        Ss   Feb08   1:02 postgres: nagiosxi nagiosxi [local] idle          
postgres  5934  0.0  0.1 217688  5372 ?        Ss   Feb08   0:59 postgres: nagiosxi nagiosxi [local] idle          
postgres  5940  0.0  0.1 217688  5348 ?        Ss   Feb08   1:00 postgres: nagiosxi nagiosxi [local] idle          
postgres  5946  0.0  0.1 217688  5360 ?        Ss   Feb08   0:56 postgres: nagiosxi nagiosxi [local] idle          
apache    5953  0.8  0.9 464560 37780 ?        S    Feb08  12:42 /usr/sbin/httpd
apache    5958  0.8  0.9 464724 38176 ?        S    Feb08  12:16 /usr/sbin/httpd
postgres  5961  0.0  0.1 217688  5348 ?        Ss   Feb08   1:01 postgres: nagiosxi nagiosxi [local] idle          
apache    5962  0.9  0.9 464948 38372 ?        S    Feb08  13:19 /usr/sbin/httpd
apache    5966  0.8  0.9 464560 38000 ?        S    Feb08  12:11 /usr/sbin/httpd
apache    6041  0.8  0.9 464048 37484 ?        S    Feb08  12:38 /usr/sbin/httpd
postgres  6061  0.0  0.1 217760  6020 ?        Ss   Feb08   1:05 postgres: nagiosxi nagiosxi [local] idle          
postgres  6070  0.0  0.1 217688  5356 ?        Ss   Feb08   1:00 postgres: nagiosxi nagiosxi [local] idle          
postgres  6071  0.0  0.1 217688  5348 ?        Ss   Feb08   0:58 postgres: nagiosxi nagiosxi [local] idle          
postgres  6079  0.0  0.1 217688  5356 ?        Ss   Feb08   0:55 postgres: nagiosxi nagiosxi [local] idle          
postgres  6087  0.0  0.1 217688  5356 ?        Ss   Feb08   1:03 postgres: nagiosxi nagiosxi [local] idle          
apache    6100  0.9  0.9 465224 38676 ?        S    Feb08  13:17 /usr/sbin/httpd
postgres  6112  0.0  0.1 217688  5344 ?        Ss   Feb08   0:51 postgres: nagiosxi nagiosxi [local] idle          
postgres  6122  0.0  0.1 217688  5360 ?        Ss   Feb08   1:07 postgres: nagiosxi nagiosxi [local] idle          
postgres  6123  0.0  0.1 217688  5312 ?        Ss   Feb08   0:57 postgres: nagiosxi nagiosxi [local] idle          
postgres  6124  0.0  0.1 217688  5352 ?        Ss   Feb08   0:59 postgres: nagiosxi nagiosxi [local] idle          
postgres  6141  0.0  0.1 217688  5352 ?        Ss   Feb08   1:04 postgres: nagiosxi nagiosxi [local] idle          
postfix  20498  0.0  0.1  81612  4040 ?        S    12:49   0:00 smtp -t unix -u
postfix  20499  0.0  0.1  81612  4040 ?        S    12:49   0:00 smtp -t unix -u
postfix  20500  0.0  0.1  81612  4040 ?        S    12:49   0:00 smtp -t unix -u
postfix  20501  0.1  0.1  81612  4036 ?        S    12:49   0:00 smtp -t unix -u
postfix  20502  0.0  0.1  81612  4044 ?        S    12:49   0:00 smtp -t unix -u
root     20605  0.4  0.1 100448  4400 ?        Ss   12:50   0:00 sshd: root@pts/0 
root     20614  0.0  0.0 140224  1336 ?        S    12:50   0:00 CROND
root     20616  0.0  0.0 140224  1336 ?        S    12:50   0:00 CROND
root     20617  0.0  0.0 140224  1332 ?        S    12:50   0:00 CROND
root     20618  0.0  0.0 140224  1336 ?        S    12:50   0:00 CROND
nagios   20624  0.0  0.0 106060  1264 ?        Ss   12:50   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/lo
nagios   20625  0.0  0.0 106060  1268 ?        Ss   12:50   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/loc
nagios   20626  0.0  0.0 106060  1264 ?        Ss   12:50   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/l
nagios   20628  0.0  0.0 106060  1264 ?        Ss   12:50   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /us
nagios   20635  2.7  0.6 329156 25556 ?        S    12:50   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios   20636  1.7  0.5 319644 22704 ?        S    12:50   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios   20639  1.8  0.5 319936 23028 ?        S    12:50   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios   20640  1.7  0.7 327476 30744 ?        S    12:50   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
postgres 20647  0.4  0.1 217688  4992 ?        Ss   12:50   0:00 postgres: nagiosxi nagiosxi [local] idle          
postgres 20648  0.1  0.1 217744  5472 ?        Ss   12:50   0:00 postgres: nagiosxi nagiosxi [local] idle          
postgres 20651  0.0  0.1 217688  4980 ?        Ss   12:50   0:00 postgres: nagiosxi nagiosxi [local] idle          
postgres 20660  0.5  0.1 217788  5312 ?        Ss   12:50   0:00 postgres: nagiosxi nagiosxi [local] idle          
root     20735  0.1  0.0 108300  1844 pts/0    Ss   12:50   0:00 -bash
root     20804  0.0  0.0 110228  1152 pts/0    R+   12:50   0:00 ps aux
nagios   24965  0.2  0.0  27168  1796 ?        Ss   Feb03  21:58 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   24967  0.0  0.0  10016   780 ?        S    Feb03   3:59 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   24968  0.0  0.0  10016   780 ?        S    Feb03   4:20 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   24969  0.0  0.0  10016   780 ?        S    Feb03   5:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   24970  0.0  0.0  10016   780 ?        S    Feb03   5:44 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   24971  0.0  0.0  10016   784 ?        S    Feb03   5:21 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   24972  0.0  0.0  10016   776 ?        S    Feb03   4:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   24975  0.0  0.0  50296   844 ?        S    Feb03   3:42 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   24976  0.0  0.0  50432  1048 ?        S    Feb03   7:39 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   24986  0.0  0.0  22336   284 ?        S    Feb03   2:28 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
postfix  28212  0.0  0.0  81408  3804 ?        S    11:28   0:00 pickup -l -t fifo -u
The other VM (2 vCPUs) was completely frozen, so I shut it down and moved it to another datastore (in case I have disk I/O issues). With the VMs on different datastores, I think we could rule out an I/O issue, right?

Let's see what you can suggest from this.
Cheers,
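
Whether disk I/O is involved can also be checked from inside the guest instead of being inferred from datastore placement alone. A rough sketch, assuming a Linux `/proc/stat` where the sixth field of the aggregate `cpu` line is the iowait counter; the helper name `iowait_pct` is hypothetical.

```shell
#!/bin/sh
# iowait_pct LINE1 LINE2 (hypothetical helper): LINE1 and LINE2 are two
# samples of the aggregate "cpu" line from /proc/stat. Prints the share of
# CPU time spent in iowait between the samples as a whole percentage.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          # fields 2..9: user nice system idle iowait irq softirq steal
          for (i = 2; i <= 9 && i <= NF; i++) total += $i
          t[NR] = total; w[NR] = $6 }
        END { d = t[2] - t[1]
              if (d > 0) printf "%d\n", (w[2] - w[1]) * 100 / d
              else print 0 }'
}

# Only sample the live system when invoked with the argument "run".
if [ "${1:-}" = "run" ] && [ -r /proc/stat ]; then
    a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
    echo "iowait over 5s: $(iowait_pct "$a" "$b")%"
fi
```

A high iowait percentage during a spike, with otherwise idle CPUs, would point toward storage rather than the Nagios processes themselves.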
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by lmiltchev »

Can you run the following command and report any errors?

Code: Select all

LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
Also, I noticed the following message:

Code: Select all

psql: FATAL:  sorry, too many clients already
You can try to increase the max_connections number in the "/var/lib/pgsql/data/postgresql.conf" file. Are you using the "default" setting of 100? What is the output of the following command?

Code: Select all

echo 'show max_connections;' | psql nagiosxi nagiosxi
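
Before raising the limit, it may help to see how close the server actually runs to it. This is a rough sketch, assuming the same `psql nagiosxi nagiosxi` login used above works without a password prompt. `pg_stat_activity` is PostgreSQL's standard view of current sessions; the helper names `conn_usage` and `live_conn_usage` are made up for illustration.

```shell
#!/bin/sh
# conn_usage CURRENT MAX (hypothetical helper): prints "<cur>/<max> (<pct>%)".
conn_usage() {
    awk -v c="$1" -v m="$2" \
        'BEGIN { printf "%d/%d (%d%%)\n", c, m, c * 100 / m }'
}

# Queries the live server; -A (unaligned) and -t (tuples only) make each
# query print a bare number, -w prevents a password prompt.
live_conn_usage() {
    current=$(psql -w -At -c 'SELECT count(*) FROM pg_stat_activity;' nagiosxi nagiosxi) || return 1
    max=$(psql -w -At -c 'SHOW max_connections;' nagiosxi nagiosxi) || return 1
    conn_usage "$current" "$max"
}

# Only contact the database when invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
    live_conn_usage
fi
```

Run during a spike, a value near 100% would support the "too many clients" theory; a low value would suggest the connection limit was only hit briefly.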
carlos.atos
Posts: 29
Joined: Mon Nov 10, 2014 1:08 pm

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by carlos.atos »

Hello lmiltchev

The server went crazy again outside office hours, so I couldn't capture top at that moment. The two VMs are on different datastores, but both of them hit a high spike. I will shut down the 2 vCPU VM to rule out any interference between them, although I don't think that is the cause. Could it be affecting the other VM?
Load 4vCPUs FEB-10.PNG
There are no new entries in /var/spool/mail/root.
About the commands that you asked me to run:

Code: Select all

[root@localhost ~]# LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
[root@localhost ~]# echo 'show max_connections;' | psql nagiosxi nagiosxi
 max_connections 
-----------------
 100
(1 row)

[root@localhost ~]# 
I'm using the default values in postgresql.conf. What value do you want me to increase max_connections to?

Cheers
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by abrist »

carlos.atos wrote:I'm using the defaults values on the postgresql.conf , do you want me to increase the max_connections up to which value?
Double it for good measure. Remember to restart PostgreSQL afterwards, since max_connections only takes effect after a server restart.
I think Ludmil was curious about issues with mrtg, so let's time the script to see if it is taking longer than expected:

Code: Select all

LANG=C LC_ALL=C time /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
As of late, how consistent is the spike? Is it on a predictable interval or at a specific time?
Former Nagios employee
carlos.atos
Posts: 29
Joined: Mon Nov 10, 2014 1:08 pm

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by carlos.atos »

Hello Abrist

max_connections has been set to 200 in /var/lib/pgsql/data/postgresql.conf.
The mysqld service was restarted after this.
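
Note that `max_connections` is a PostgreSQL parameter, so it is PostgreSQL, not MySQL, that has to be restarted before `show max_connections;` reports the new value. A cautious sketch follows: the function name `set_max_connections` is hypothetical, the service name `postgresql` is an assumption for this system, and the restart is left as a comment so that nothing is restarted by accident.

```shell
#!/bin/sh
# set_max_connections FILE VALUE (hypothetical helper): rewrites any active
# max_connections line in FILE and keeps the previous version as FILE.bak.
set_max_connections() {
    conf=$1
    value=$2
    cp "$conf" "$conf.bak" || return 1
    sed "s/^[[:space:]]*max_connections[[:space:]]*=[[:space:]]*[0-9]\{1,\}/max_connections = $value/" \
        "$conf.bak" > "$conf"
}

# On the server, as root (service name assumed):
#   set_max_connections /var/lib/pgsql/data/postgresql.conf 200
#   service postgresql restart
#   echo 'show max_connections;' | psql nagiosxi nagiosxi
```

If the last command still prints 100 after the restart, the edited file is probably not the one the running server reads.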

About the other script, this is what I got:

Code: Select all

[root@localhost ~]# LANG=C LC_ALL=C time /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
-bash: time: command not found
[root@localhost ~]# 
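
The `time: command not found` error most likely comes from word order: bash only treats `time` as its built-in keyword at the start of a pipeline, so after the `LANG=C LC_ALL=C` assignments it is looked up as an external program, and `/usr/bin/time` appears not to be installed here. As a shell-agnostic fallback, a small wrapper can do the timing; the name `timed` is made up for this sketch.

```shell
#!/bin/sh
# timed CMD [ARGS...] (hypothetical helper): runs the command, reports the
# elapsed wall-clock seconds on stderr, and passes the exit status through.
timed() {
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s (exit status $status)" >&2
    return "$status"
}

# Demonstration with a harmless command; on the Nagios XI server the
# arguments after `env` would be the mrtg command line from the post above.
timed env LANG=C LC_ALL=C sleep 1
```

In bash itself, putting the keyword first should also work: `time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok`.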
abrist wrote:As of late, how consistent is the spike? Is it on a predictable interval or at a specific time?
The load went up to 300 and then I lost the graphs for some hours (I think because of the high load). It happened this Saturday the 8th from 11 pm to 2 am, and again on Monday at similar hours. I would guess it could occur again tonight, so I'll try to monitor it and catch the spike.

This is the graph for this week:
localhost-current_load week5-11 FEB.jpg
Cheers,
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by scottwilkerson »

Are these VMs running on a server that has other VMs that could have scheduled jobs or something else monopolizing the resources on the machine?
carlos.atos
Posts: 29
Joined: Mon Nov 10, 2014 1:08 pm

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by carlos.atos »

Hi Guys,

No good news today: there was a peak of nearly 100 in load from 4 am to 8 am. Unfortunately, I wasn't at the office to connect and check top or ps aux. When I checked, the load was at 2.46, so I captured this to see what you can analyze from it:

Code: Select all

top - 09:20:42 up 8 days, 21:56,  2 users,  load average: 2.46, 1.64, 1.77
Tasks: 197 total,   2 running, 195 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.4%us,  0.7%sy,  0.0%ni, 90.8%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3924212k total,  1624644k used,  2299568k free,    43228k buffers
Swap:  2064380k total,    43532k used,  2020848k free,   136916k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                       
11357 apache    20   0  454m  34m 8308 S  8.3  0.9   9:28.06 httpd                                                                         
11620 postgres  20   0  212m 5252 3720 S  8.0  0.1   1:30.63 postmaster                                                                    
24327 postgres  20   0  212m 5240 3708 R  7.0  0.1   0:57.22 postmaster                                                                    
24282 apache    20   0  454m  34m 8280 S  5.3  0.9   4:36.85 httpd                                                                         
 5074 apache    20   0  454m  34m 8316 S  4.0  0.9   7:12.62 httpd                                                                         
14247 apache    20   0  454m  34m 8284 S  3.6  0.9   7:40.70 httpd                                                                         
   22 root      20   0     0    0    0 S  0.3  0.0   5:10.66 events/3                                                                      
 9365 mysql     20   0 2203m  41m 4948 S  0.3  1.1  13:52.40 mysqld                                                                        
13928 nagios    20   0  312m  22m 8064 S  0.3  0.6   0:01.77 php                                                                           
13988 postgres  20   0  212m 5324 3752 S  0.3  0.1   0:00.37 postmaster                                                                    
13992 postgres  20   0  212m 5520 3944 S  0.3  0.1   0:00.82 postmaster                                                                    
14152 root      20   0 15128 1360  964 R  0.3  0.0   0:00.20 top                                                                           
27755 apache    20   0  454m  34m 8252 S  0.3  0.9   5:12.75 httpd                                                                         
    1 root      20   0 19360 1020  856 S  0.0  0.0   3:19.37 init                                                                          
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd                                                                      
    3 root      RT   0     0    0    0 S  0.0  0.0   1:26.19 migration/0                                                                   
    4 root      20   0     0    0    0 S  0.0  0.0   0:13.11 ksoftirqd/0                                                                   
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/0                                                                     
    6 root      RT   0     0    0    0 S  0.0  0.0   0:08.68 watchdog/0                                                                    
    7 root      RT   0     0    0    0 S  0.0  0.0   0:31.46 migration/1                                                                   
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/1                                                                     
    9 root      20   0     0    0    0 S  0.0  0.0   0:20.16 ksoftirqd/1                                                                   
   10 root      RT   0     0    0    0 S  0.0  0.0   0:07.96 watchdog/1                                                                    
   11 root      RT   0     0    0    0 S  0.0  0.0   1:32.61 migration/2                                                                   
   12 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/2                                                                     
   13 root      20   0     0    0    0 S  0.0  0.0   0:14.34 ksoftirqd/2                                                                   
   14 root      RT   0     0    0    0 S  0.0  0.0   0:07.34 watchdog/2                                                                    
   15 root      RT   0     0    0    0 S  0.0  0.0   0:34.85 migration/3                                                                   
   16 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/3                                                                     
   17 root      20   0     0    0    0 S  0.0  0.0   0:17.85 ksoftirqd/3                                                                   
   18 root      RT   0     0    0    0 S  0.0  0.0   0:08.24 watchdog/3                                                                    
   19 root      20   0     0    0    0 S  0.0  0.0   3:39.24 events/0                                                                      
   20 root      20   0     0    0    0 S  0.0  0.0   6:24.68 events/1                                                                      
   21 root      20   0     0    0    0 S  0.0  0.0   4:16.54 events/2                                                                      
   23 root      20   0     0    0    0 S  0.0  0.0   0:00.00 cgroup                                                                        
   24 root      20   0     0    0    0 S  0.0  0.0   0:00.17 khelper                                                                       
   25 root      20   0     0    0    0 S  0.0  0.0   0:00.00 netns                                                                         
   26 root      20   0     0    0    0 S  0.0  0.0   0:00.00 async/mgr                                                                     
   27 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pm                                                                            
   28 root      20   0     0    0    0 S  0.0  0.0   0:15.63 sync_supers                                                                   
   29 root      20   0     0    0    0 S  0.0  0.0   0:13.72 bdi-default                                                                   
   30 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/0                                                                 
   31 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/1                                                                 
   32 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/2                                                                 
   33 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/3                                                                 
   34 root      20   0     0    0    0 S  0.0  0.0   2:11.62 kblockd/0           
ps aux

Code: Select all

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  19360  1020 ?        Ss   Feb03   3:19 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Feb03   1:26 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Feb03   0:13 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    Feb03   0:00 [stopper/0]
root         6  0.0  0.0      0     0 ?        S    Feb03   0:08 [watchdog/0]
root         7  0.0  0.0      0     0 ?        S    Feb03   0:31 [migration/1]
root         8  0.0  0.0      0     0 ?        S    Feb03   0:00 [stopper/1]
root         9  0.0  0.0      0     0 ?        S    Feb03   0:20 [ksoftirqd/1]
root        10  0.0  0.0      0     0 ?        S    Feb03   0:07 [watchdog/1]
root        11  0.0  0.0      0     0 ?        S    Feb03   1:32 [migration/2]
root        12  0.0  0.0      0     0 ?        S    Feb03   0:00 [stopper/2]
root        13  0.0  0.0      0     0 ?        S    Feb03   0:14 [ksoftirqd/2]
root        14  0.0  0.0      0     0 ?        S    Feb03   0:07 [watchdog/2]
root        15  0.0  0.0      0     0 ?        S    Feb03   0:34 [migration/3]
root        16  0.0  0.0      0     0 ?        S    Feb03   0:00 [stopper/3]
root        17  0.0  0.0      0     0 ?        S    Feb03   0:17 [ksoftirqd/3]
root        18  0.0  0.0      0     0 ?        S    Feb03   0:08 [watchdog/3]
root        19  0.0  0.0      0     0 ?        S    Feb03   3:39 [events/0]
root        20  0.0  0.0      0     0 ?        S    Feb03   6:24 [events/1]
root        21  0.0  0.0      0     0 ?        S    Feb03   4:16 [events/2]
root        22  0.0  0.0      0     0 ?        S    Feb03   5:10 [events/3]
root        23  0.0  0.0      0     0 ?        S    Feb03   0:00 [cgroup]
root        24  0.0  0.0      0     0 ?        S    Feb03   0:00 [khelper]
root        25  0.0  0.0      0     0 ?        S    Feb03   0:00 [netns]
root        26  0.0  0.0      0     0 ?        S    Feb03   0:00 [async/mgr]
root        27  0.0  0.0      0     0 ?        S    Feb03   0:00 [pm]
root        28  0.0  0.0      0     0 ?        S    Feb03   0:15 [sync_supers]
root        29  0.0  0.0      0     0 ?        S    Feb03   0:13 [bdi-default]
root        30  0.0  0.0      0     0 ?        S    Feb03   0:00 [kintegrityd/0]
root        31  0.0  0.0      0     0 ?        S    Feb03   0:00 [kintegrityd/1]
root        32  0.0  0.0      0     0 ?        S    Feb03   0:00 [kintegrityd/2]
root        33  0.0  0.0      0     0 ?        S    Feb03   0:00 [kintegrityd/3]
root        34  0.0  0.0      0     0 ?        S    Feb03   2:11 [kblockd/0]
root        35  0.0  0.0      0     0 ?        S    Feb03   1:20 [kblockd/1]
root        36  0.0  0.0      0     0 ?        S    Feb03   2:03 [kblockd/2]
root        37  0.0  0.0      0     0 ?        S    Feb03   1:17 [kblockd/3]
root        38  0.0  0.0      0     0 ?        S    Feb03   0:00 [kacpid]
root        39  0.0  0.0      0     0 ?        S    Feb03   0:00 [kacpi_notify]
root        40  0.0  0.0      0     0 ?        S    Feb03   0:00 [kacpi_hotplug]
root        41  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_aux]
root        42  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_sff/0]
root        43  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_sff/1]
root        44  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_sff/2]
root        45  0.0  0.0      0     0 ?        S    Feb03   0:00 [ata_sff/3]
root        46  0.0  0.0      0     0 ?        S    Feb03   0:00 [ksuspend_usbd]
root        47  0.0  0.0      0     0 ?        S    Feb03   0:00 [khubd]
root        48  0.0  0.0      0     0 ?        S    Feb03   0:00 [kseriod]
root        49  0.0  0.0      0     0 ?        S    Feb03   0:00 [md/0]
root        50  0.0  0.0      0     0 ?        S    Feb03   0:00 [md/1]
root        51  0.0  0.0      0     0 ?        S    Feb03   0:00 [md/2]
root        52  0.0  0.0      0     0 ?        S    Feb03   0:00 [md/3]
root        53  0.0  0.0      0     0 ?        S    Feb03   0:00 [md_misc/0]
root        54  0.0  0.0      0     0 ?        S    Feb03   0:00 [md_misc/1]
root        55  0.0  0.0      0     0 ?        S    Feb03   0:00 [md_misc/2]
root        56  0.0  0.0      0     0 ?        S    Feb03   0:00 [md_misc/3]
root        57  0.0  0.0      0     0 ?        S    Feb03   0:00 [linkwatch]
root        58  0.0  0.0      0     0 ?        S    Feb03   0:24 [khungtaskd]
root        59  0.1  0.0      0     0 ?        S    Feb03  16:37 [kswapd0]
root        60  0.0  0.0      0     0 ?        SN   Feb03   0:00 [ksmd]
root        61  0.0  0.0      0     0 ?        SN   Feb03   2:58 [khugepaged]
root        62  0.0  0.0      0     0 ?        S    Feb03   0:00 [aio/0]
root        63  0.0  0.0      0     0 ?        S    Feb03   0:00 [aio/1]
root        64  0.0  0.0      0     0 ?        S    Feb03   0:00 [aio/2]
root        65  0.0  0.0      0     0 ?        S    Feb03   0:00 [aio/3]
root        66  0.0  0.0      0     0 ?        S    Feb03   0:00 [crypto/0]
root        67  0.0  0.0      0     0 ?        S    Feb03   0:00 [crypto/1]
root        68  0.0  0.0      0     0 ?        S    Feb03   0:00 [crypto/2]
root        69  0.0  0.0      0     0 ?        S    Feb03   0:00 [crypto/3]
root        77  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthrotld/0]
root        78  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthrotld/1]
root        79  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthrotld/2]
root        80  0.0  0.0      0     0 ?        S    Feb03   0:00 [kthrotld/3]
root        81  0.0  0.0      0     0 ?        S    Feb03   0:00 [pciehpd]
root        83  0.0  0.0      0     0 ?        S    Feb03   0:00 [kpsmoused]
root        84  0.0  0.0      0     0 ?        S    Feb03   0:00 [usbhid_resumer]
root        85  0.0  0.0      0     0 ?        S    Feb03   0:00 [deferwq]
root       116  0.0  0.0      0     0 ?        S    Feb03   0:00 [kdmremove]
root       117  0.0  0.0      0     0 ?        S    Feb03   0:00 [kstriped]
root       283  0.0  0.0      0     0 ?        S    Feb03   1:02 [mpt_poll_0]
root       284  0.0  0.0      0     0 ?        S    Feb03   0:00 [mpt/0]
root       309  0.0  0.0      0     0 ?        S    Feb03   0:00 [scsi_eh_0]
root       318  0.0  0.0      0     0 ?        S    Feb03   0:00 [scsi_eh_1]
root       319  0.0  0.0      0     0 ?        S    Feb03   0:00 [scsi_eh_2]
root       422  0.0  0.0      0     0 ?        S    Feb03   0:00 [kdmflush]
root       424  0.0  0.0      0     0 ?        S    Feb03   0:00 [kdmflush]
root       442  0.0  0.0      0     0 ?        R    Feb03   5:21 [jbd2/dm-0-8]
root       443  0.0  0.0      0     0 ?        S    Feb03   0:00 [ext4-dio-unwrit]
root       516  0.0  0.0  11028   260 ?        S<s  Feb03   0:00 /sbin/udevd -d
root       712  0.0  0.0      0     0 ?        S    Feb03   0:48 [vmmemctl]
root       720  0.0  0.0      0     0 ?        S    Feb03   2:53 [flush-253:0]
root       858  0.0  0.0  11028   244 ?        S<   Feb03   0:00 /sbin/udevd -d
root       865  0.0  0.0  10636   248 ?        S<   Feb03   0:00 /sbin/udevd -d
root       889  0.0  0.0      0     0 ?        S    Feb03   0:00 [jbd2/sda1-8]
root       890  0.0  0.0      0     0 ?        S    Feb03   0:00 [ext4-dio-unwrit]
root       928  0.0  0.0      0     0 ?        S    Feb03   2:58 [kauditd]
root      1175  0.1  0.0  93144   724 ?        S<sl Feb03  17:39 auditd
root      1199  0.1  0.1 249476  7016 ?        Sl   Feb03  21:51 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
dbus      1212  0.0  0.0  21532   448 ?        Ss   Feb03   0:00 dbus-daemon --system
root      1401  0.0  0.0  66688   560 ?        Ss   Feb03   1:14 /usr/sbin/sshd
root      1410  0.0  0.0  22188   620 ?        Ss   Feb03   0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
root      1420  0.3  0.0 189580  2224 ?        Sl   Feb03  39:08 /usr/sbin/vmtoolsd
ntp       1425  0.0  0.0  30732  1448 ?        Ss   Feb03   2:01 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
postgres  1631  0.1  0.1 216356  4156 ?        S    Feb03  25:37 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres  1702  0.0  0.0 179368   668 ?        Ss   Feb03   2:59 postgres: logger process                          
postgres  1713  0.0  0.0 216472  2676 ?        Ss   Feb03  10:24 postgres: writer process                          
postgres  1714  0.0  0.0 216356   996 ?        Ss   Feb03   7:32 postgres: wal writer process                      
postgres  1715  0.0  0.0 216644  1164 ?        Ss   Feb03   7:53 postgres: autovacuum launcher process             
postgres  1716  0.1  0.0 179636   868 ?        Ss   Feb03  13:29 postgres: stats collector process                 
root      1717  0.0  0.0  81328  2784 ?        Ss   Feb03   5:06 /usr/libexec/postfix/master
postfix   1727  0.0  0.0  83144  3052 ?        S    Feb03   3:40 qmgr -l -t fifo -u
root      1762  0.1  0.0 117336   740 ?        Ss   Feb03  14:24 crond
nagios    1772  0.1  0.0 368888  1020 ?        S    Feb03  24:24 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
ajaxterm  1780  0.1  0.0 170340  1612 ?        Sl   Feb03  16:08 python /usr/share/ajaxterm/ajaxterm.py --daemon --port=8022 --uid=ajaxterm
nagios    1852  0.0  0.0  50296   268 ?        Ss   Feb03   0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root      1861  0.0  0.0  67552  1360 ?        Ss   Feb03   0:00 login -- root     
root      1863  0.0  0.0   4064   488 tty2     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty2
root      1865  0.0  0.0   4064   488 tty3     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty3
root      1867  0.0  0.0   4064   488 tty4     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty4
root      1869  0.0  0.0   4064   488 tty5     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty5
root      1871  0.0  0.0   4064   488 tty6     Ss+  Feb03   0:00 /sbin/mingetty /dev/tty6
root      2068  0.0  0.0 2085024 1212 ?        Sl   Feb03   0:00 /usr/sbin/console-kit-daemon --no-daemon
root      2138  0.0  0.0 108304  1340 tty1     Ss+  Feb03   0:00 -bash
root      9257  0.0  0.0 108168  1296 ?        S    Feb11   0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/my
mysql     9365  1.0  1.0 2256396 42284 ?       Sl   Feb11  13:54 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql -
root     13940  0.4  0.1 100448  4412 ?        Ss   09:20   0:01 sshd: root@pts/0 
root     14042  0.2  0.0 108300  1876 pts/0    Ss   09:20   0:00 -bash
postfix  14711  0.2  0.1  81612  4076 ?        S    09:22   0:00 smtp -t unix -u
postfix  14716  0.2  0.1  81612  4080 ?        S    09:22   0:00 smtp -t unix -u
postfix  14717  0.1  0.1  81612  4076 ?        S    09:22   0:00 smtp -t unix -u
postfix  14718  0.1  0.1  81612  4080 ?        S    09:22   0:00 smtp -t unix -u
nagios   15186  0.7  0.0  22852  1820 ?        Ss   09:23   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15188  0.1  0.0  10016   920 ?        S    09:23   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   15189  0.1  0.0  10016   920 ?        S    09:23   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   15190  0.1  0.0  10016   920 ?        S    09:23   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   15191  0.1  0.0  10016   912 ?        S    09:23   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   15192  0.2  0.0  10016   924 ?        S    09:23   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   15193  0.0  0.0  10016   912 ?        S    09:23   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   15194  0.5  0.0  50296  1208 ?        S    09:23   0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   15195  0.8  0.0  50432  1384 ?        S    09:23   0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   15199  0.0  0.0  22336   828 ?        S    09:23   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     15282  0.1  0.0 140224  1336 ?        S    09:24   0:00 CROND
root     15284  0.2  0.0 140224  1336 ?        S    09:24   0:00 CROND
root     15285  0.1  0.0 140224  1332 ?        S    09:24   0:00 CROND
root     15286  0.1  0.0 140224  1336 ?        S    09:24   0:00 CROND
nagios   15289  0.2  0.0 106060  1264 ?        Ss   09:24   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /us
nagios   15290  0.0  0.0 106060  1264 ?        Ss   09:24   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/loc
nagios   15292  0.0  0.0 106060  1268 ?        Ss   09:24   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/l
nagios   15293  5.2  0.5 319672 23120 ?        S    09:24   0:01 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios   15294  1.8  0.5 319644 22820 ?        S    09:24   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios   15297  0.1  0.0 106060  1268 ?        Ss   09:24   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/lo
nagios   15298  4.5  0.5 319936 23024 ?        S    09:24   0:01 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios   15301  2.1  0.7 327476 30736 ?        S    09:24   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
postgres 15308  1.7  0.1 217724  4992 ?        Ss   09:24   0:00 postgres: nagiosxi nagiosxi [local] idle          
postgres 15309  1.0  0.1 217788  5316 ?        Ss   09:24   0:00 postgres: nagiosxi nagiosxi [local] idle          
postgres 15318  0.9  0.1 217740  5492 ?        Ss   09:24   0:00 postgres: nagiosxi nagiosxi [local] idle          
postgres 15336  1.2  0.1 217724  4984 ?        Ss   09:24   0:00 postgres: nagiosxi nagiosxi [local] idle          
root     15524  0.0  0.0 110232  1156 pts/0    R+   09:24   0:00 ps aux
postfix  21829  0.0  0.0  81408  3840 ?        S    08:00   0:00 pickup -l -t fifo -u
There were no new messages in /var/spool/mail/root other than the earlier ones.

About this question:
scottwilkerson wrote:Are these VM's running on server that has other VM's that could have scheduled jobs or something monopolizing the resources on the machine?
There was another Nagios VM (a copy of this one, using 2 vCPUs and another HDD), but I shut it down days ago. Both showed the same load behaviour, which is why I shut it down. I thought the two VMs might be interfering with each other.

What else do you think it could be?

Cheers,
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Nagios 2014R2.3 on VM HIGH LOAD SPIKES

Post by WillemDH »

Maybe, as a test, you could disable the services and hosts making use of mrtg and temporarily stop mrtg. If you still get load spikes, at least you know it's not caused by mrtg. Maybe wait for Nagios support to react to see if they think it's a good idea.
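
To compare the two cases (mrtg running and mrtg stopped), it may help to log the load average at a fixed interval and line the samples up with the spikes. Below is a minimal Python sketch, assuming Python is available on the XI server. The function names, the log path in the usage note, and the "load above the number of vCPUs" spike rule are illustrative assumptions, not part of Nagios XI.

```python
import os
import time


def is_spike(load1, cpu_count, per_cpu_threshold=1.0):
    """Return True when the 1-minute load exceeds cpu_count * threshold.

    The threshold is a rule of thumb chosen for this sketch, not a Nagios setting.
    """
    return load1 > cpu_count * per_cpu_threshold


def format_sample(timestamp, loads, cpu_count):
    """Build one log line: local time, the three load averages, and a flag."""
    flag = "SPIKE" if is_spike(loads[0], cpu_count) else "ok"
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return "%s %.2f %.2f %.2f %s" % (stamp, loads[0], loads[1], loads[2], flag)


def log_load(path, interval=10, samples=6, cpu_count=4):
    """Append `samples` load readings to `path`, one every `interval` seconds."""
    with open(path, "a") as fh:
        for _ in range(samples):
            # os.getloadavg() returns the 1, 5 and 15 minute load averages.
            fh.write(format_sample(time.time(), os.getloadavg(), cpu_count) + "\n")
            fh.flush()
            time.sleep(interval)
```

For example, calling log_load("/tmp/loadavg.log", interval=10, samples=360) would record about an hour of samples. Run it once with mrtg active and once with it stopped, then compare how many lines are flagged SPIKE.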
Nagios XI 5.8.1
https://outsideit.net