Nagios high cpu load with wmic

HessianKnight
Posts: 8
Joined: Thu Jul 28, 2011 9:15 am

Nagios high cpu load with wmic

Post by HessianKnight »

Hi,

Setup:
--------
We run Nagios Core 3 with the check_wmi_plus plugin.

The plugin uses Zenoss' wmic, which I've compiled from source.
Nagios runs on Debian Squeeze in a VM on a VMware ESXi hypervisor. The Linux VM has plenty of RAM and 4 cores at 2.3 GHz each.
The Nagios installation checks about 300 hosts with approx. 3000 checks in total.

Question:
------------
Is it possible to reduce the load or use the CPUs more efficiently?
top constantly shows a load average of around 5.97, 5.83, 5.62.
What's interesting is that each CPU runs at under 30% user, with idle at about 70%.
I suspect that it's not using the whole CPU power...
The Linux VM also slows down the whole VMware hypervisor, even though it is not running at 100% (see below).

Code:

top - 14:17:33 up 47 days,  4:11,  1 user,  load average: 7.41, 6.16, 5.73
Tasks: 113 total,   2 running, 110 sleeping,   0 stopped,   1 zombie
Cpu0  : 13.8%us, 10.8%sy,  0.0%ni, 74.4%id,  0.7%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  : 16.3%us, 12.1%sy,  0.0%ni, 71.0%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 13.6%us, 10.4%sy,  0.0%ni, 76.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 12.1%us,  9.2%sy,  0.0%ni, 78.0%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   3895264k total,  3459336k used,   435928k free,   299660k buffers
Swap:   477176k total,       52k used,   477124k free,  2730444k cached
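
A note on reading these numbers: on Linux the load average counts tasks that are runnable plus tasks in uninterruptible sleep, so it can sit above the core count while every CPU is mostly idle. The following is a minimal Python sketch that compares the posted 1-minute load with the 4 cores mentioned above. The helper name `load_per_core` is made up for illustration and is not part of Nagios or check_wmi_plus.

```python
import os

def load_per_core(load_avg, cores):
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores

# Figures taken from the posted top output: 1-minute load 7.41 on 4 cores.
posted = load_per_core(7.41, 4)
print(f"posted system: {posted:.2f} per core")

# On a live Unix machine the same ratio can be read directly.
if hasattr(os, "getloadavg"):
    one_min, five_min, fifteen_min = os.getloadavg()
    live = load_per_core(one_min, os.cpu_count() or 1)
    print(f"this machine: {live:.2f} per core")
```

A ratio well above 1.0 per core alongside roughly 70% idle CPUs is at least consistent with the suspicion that raw CPU capacity is not the limiting factor. It does not by itself say what the tasks are waiting on.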
Nagios Statistics:

Code:

Service Check Execution Time:	0.03 / 31.30 / 2.650 sec
Service Check Latency:	1.74 / 199.01 / 169.317 sec
Host Check Execution Time:	0.02 / 15.26 / 3.921 sec
Host Check Latency:	30.98 / 186.18 / 167.100 sec
regards
HK
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios high cpu load with wmic

Post by mguthrie »

I think you're hitting a wall on disk I/O since it's a VM. Take a look at this labs posting; it might help in your scenario. Your latencies are probably high because Nagios has to keep waiting to write to the disk. I recommend making use of RAM disks.
http://labs.nagios.com/2012/01/30/nagio ... g-disk-io/
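
Once files have been moved, it can be worth confirming that they really live on a RAM-backed filesystem. The sketch below is one possible way to do that in Python by matching a path against a `/proc/mounts`-style table. The sample mount table and the `/var/nagios/ramdisk` path are assumptions for illustration only; the linked post may use different locations.

```python
import os

def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts

def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for point, fs_type in mounts:
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win.
            if len(point) >= len(best_point):
                best_point, best_type = point, fs_type
    return best_type

# Hypothetical mount table; the ramdisk path is an assumption.
SAMPLE = """\
/dev/sda1 / ext3 rw,relatime 0 0
tmpfs /var/nagios/ramdisk tmpfs rw,size=256m 0 0
"""

mounts = parse_mounts(SAMPLE)
print(fs_type_for("/var/nagios/ramdisk/status.dat", mounts))
print(fs_type_for("/var/log/nagios/nagios.log", mounts))
```

On a real Linux system the table would come from `open("/proc/mounts").read()`, and the paths to test would be whatever `status_file`, `check_result_path` and similar directives in nagios.cfg point to.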
HessianKnight
Posts: 8
Joined: Thu Jul 28, 2011 9:15 am

Re: Nagios high cpu load with wmic

Post by HessianKnight »

Hi!
thanks for your reply!

I have finished the first suggestions (object file). It rocks! It reduced Service Check Latency to 1/3 of the previous value.

Btw, I couldn't find any explanation of the latencies table (Tactical Overview, upper right corner)...
I guess the smaller the values - the better!
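
For reference, each row of the Monitoring Performance table lists three values in seconds: minimum / maximum / average. Latency is the delay between the time a check was scheduled to run and the time it actually ran, so smaller values are indeed better. A minimal Python sketch for pulling the three numbers out of one such row follows; the function name `parse_perf_line` is made up for illustration, and the sample row is the Service Check Latency line from the first post.

```python
def parse_perf_line(line):
    """Split 'Label:<tab>min / max / avg sec' into a label and three floats."""
    label, _, values = line.partition(":")
    numbers = values.replace("sec", "").split("/")
    low, high, avg = (float(n) for n in numbers)
    return label.strip(), {"min": low, "max": high, "avg": avg}

# Row copied from the statistics in the first post of this thread.
line = "Service Check Latency:\t1.74 / 199.01 / 169.317 sec"
label, stats = parse_perf_line(line)
print(label, stats)
```

The average is usually the figure to watch: a high average latency means checks are routinely running later than scheduled.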

I was thinking about an SSD, but a SAS SSD starts at 1.5k Euro, so a RAM disk might be the more viable option.

Here it is:

Code:

Monitoring Performance
Service Check Execution Time:	0.00 / 35.39 / 4.431 sec
Service Check Latency:	0.00 / 207.95 / 29.993 sec
Host Check Execution Time:	0.04 / 15.17 / 4.028 sec
Host Check Latency:	0.00 / 131.16 / 22.987 sec
# Active Host / Service Checks:	601 / 3040
# Passive Host / Service Checks:	0 / 15
I also gave it 3x the max concurrent checks; the load skyrocketed to 15 ;) but at least there are no nudges in the logs any more.

regards,
HK
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios high cpu load with wmic

Post by mguthrie »

RAM disks are free, and I've run 40 checks per second on a dual-core desktop machine by making full use of one. With your system being on a VM, I would experiment with moving as much as you can to the RAM disk.
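
To put the 40 checks per second figure in context, here is a rough back-of-the-envelope calculation in Python for the setup in this thread (601 active host checks and 3040 active service checks). The 5-minute check interval is an assumption, since the thread never states the configured intervals.

```python
def checks_per_second(total_checks, interval_seconds):
    """Average check rate if every check runs once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return total_checks / interval_seconds

ACTIVE_HOST_CHECKS = 601      # from the Monitoring Performance table
ACTIVE_SERVICE_CHECKS = 3040  # from the Monitoring Performance table
ASSUMED_INTERVAL = 5 * 60     # assumption: every check runs every 5 minutes

rate = checks_per_second(ACTIVE_HOST_CHECKS + ACTIVE_SERVICE_CHECKS,
                         ASSUMED_INTERVAL)
print(f"{rate:.1f} checks per second")
```

Under that assumption the required rate is well below 40 checks per second; shorter intervals would raise it proportionally.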
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios high cpu load with wmic

Post by jsmurphy »

I would also recommend reducing the number of vCPUs, particularly if you are on ESXi 4.x or lower. Adding more vCPUs can actually DECREASE performance, not just of the VM but of the whole ESXi server, if the VM is on an ESXi server with appreciable load (say 50% or more resource utilization). The reason is that the hypervisor needs to find 4 physical cores free simultaneously to execute commands for that VM, which sometimes increases the processor wait time to such an extent that it outweighs the benefit of even having 4 procs.

In my environment we try to avoid going above dual proc for this reason. ESXi 5 made some performance improvements to the CPU scheduling algorithm that apparently helped with this issue, but I have no idea to what extent it was mitigated.
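
The intuition can be illustrated with a toy model in Python. It assumes each physical core is busy independently for a fixed fraction of the time and asks how often a given number of cores are all free at the same instant. This is a deliberate simplification for illustration, not a description of how the ESXi scheduler actually works.

```python
def chance_all_free(busy_fraction, vcpus):
    """Toy model: probability that `vcpus` independent cores are all idle."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return (1.0 - busy_fraction) ** vcpus

# Assumption: a host at 50% utilization, as in the example above.
for vcpus in (1, 2, 4):
    print(vcpus, "vCPU:", chance_all_free(0.5, vcpus))
```

In this model, going from 2 to 4 vCPUs on a half-loaded host cuts the chance of finding enough free cores at once from 25% to about 6%.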
HessianKnight
Posts: 8
Joined: Thu Jul 28, 2011 9:15 am

Re: Nagios high cpu load with wmic

Post by HessianKnight »

Jsmurphy, thanks for the hints, but in my case it changed nothing.
Using ESXi 3, I get the best performance with aggressive settings, i.e. CPU reservations + priorities.

Lowering the number of cores has no positive effect.

I'll try to move as much as possible to RAM disks.