Nagios server has high CPU
Posted: Thu May 24, 2012 12:14 am
Hi,
Our Nagios XI server (VM appliance) has been showing elevated CPU levels since I installed and configured the VMware SDK so that we could monitor our ESXi hosts.
We have 18 ESXi hosts. If I run top from the console, I often see the ESXi Perl script at the top in multiple instances, each taking 10% CPU. All the ESXi servers are generating correct stats.
The VM is getting all the CPU it is requesting from the host; wait time is 0.
The VM is currently configured with 1 vCPU.
On the server statistics dashlet, User time is often red at 95%, with load stats showing 9.40 6.17 5.49.
I have allocated the server 2 GB of RAM and it is using 1547 MB with 483 MB free; swap is not being used at all.
Regarding checks...
Active Host checks 16 86 163
Active Service checks 140 966 1781
Host Check Execution time avg = 0.05s (max=0.15s)
Service Check execution time avg = 1.56s (max=47.46s)
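As a rough sanity check on those figures, here is a small sketch that estimates how many checks are running at once on average. It assumes the three columns above are 1, 5 and 15 minute counts (my reading of the dashlet, not confirmed), and it uses wall-clock execution time, which includes time spent waiting on the network, so it is not a direct measure of CPU use. The function name is just illustrative.

```python
def average_concurrency(checks_in_window, avg_exec_seconds, window_seconds):
    """Average number of checks in flight at any moment.

    Total execution time accumulated in the window divided by the
    length of the window (wall-clock seconds, not CPU seconds).
    """
    return checks_in_window * avg_exec_seconds / window_seconds


# 5 minute column (assumed), using the averages reported above.
service_concurrency = average_concurrency(966, 1.56, 300)
host_concurrency = average_concurrency(86, 0.05, 300)

print(f"service checks in flight: {service_concurrency:.2f}")
print(f"host checks in flight:    {host_concurrency:.2f}")
```

If that reading is right, roughly five service checks are in flight at any time, which is in the same range as the load averages on a single vCPU.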
More than once I have seen the server show all hosts/services as flapping and suspend notifications. I wonder if this is a result of the high CPU, meaning it cannot get round all its checks in time. We need to avoid this situation as it will compromise our ability to monitor our environment.
Questions:
Can you advise if this is typical behaviour given the number of hosts/services we are monitoring?
Would the server benefit from another vCPU being added? (wait time is still expected to be zero if we do this)
Is there something we can check to see that the VMware SDK and ESXi plugins are working correctly/optimally?
Is it possible to quickly determine which checks are taking the longest, e.g. the 47s one, as this will be blowing out the average, I am sure?
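For the last question, this is the kind of thing I have in mind: a minimal sketch that reads the Nagios status file and lists the services with the highest check_execution_time. The path /usr/local/nagios/var/status.dat is an assumption about the appliance layout, and the function names are my own, so treat it as a starting point rather than a supported tool.

```python
import os
import sys

# Assumed default location of the status file; adjust for your install.
STATUS_FILE = "/usr/local/nagios/var/status.dat"


def service_blocks(lines):
    """Yield one dict of key=value pairs per 'servicestatus {' block."""
    block = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            block = {}
        elif line == "}":
            if block is not None:
                yield block
            block = None
        elif block is not None and "=" in line:
            key, _, value = line.partition("=")
            block[key] = value


def slowest_checks(lines, top=10):
    """Return (seconds, host, service) tuples, slowest first."""
    rows = []
    for block in service_blocks(lines):
        try:
            seconds = float(block.get("check_execution_time", ""))
        except ValueError:
            continue  # skip blocks without a usable execution time
        rows.append((seconds,
                     block.get("host_name", "?"),
                     block.get("service_description", "?")))
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else STATUS_FILE
    if os.path.isfile(path):
        with open(path, errors="replace") as handle:
            for seconds, host, service in slowest_checks(handle):
                print(f"{seconds:8.2f}s  {host}  {service}")
    else:
        print(f"status file not found: {path}")
```

Running it with no arguments uses the assumed path; passing a different path as the first argument overrides it. It only reports the most recent execution time per service, so an occasional slow check may not show up on every run.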
Cheers,
KB.