Hi,
Our Nagios XI server (VM appliance) has been showing elevated CPU usage since I installed and configured the VMware SDK so that we could monitor our ESXi hosts.
We have 18 ESXi hosts. If I run top from the console, I often see multiple instances of the ESXi Perl script at the top, each taking 10% CPU. All the ESXi servers are generating correct stats.
The vm is getting all the cpu it is requesting from the host, wait time is 0
The vm is currently configured with 1 vCPU.
On the server statistics dashlet, User time is often red at 95%, with load stats showing 9.40 6.17 5.49
I have allocated the server 2 GB of RAM; it is using 1547 MB with 483 MB free, and swap is not being used at all.
Regarding checks...
Active Host checks 16 86 163
Active Service checks 140 966 1781
Host Check Execution time avg = 0.05s (max=0.15s)
Service Check execution time avg = 1.56s (max=47.46s)
More than once I have seen the server show all hosts/services as flapping and suspend notifications. I wonder if this is a result of the high CPU, meaning it cannot get round all its checks in time. We need to avoid this situation as it will compromise our capability to monitor our environment.
Questions:
Can you advise if this is typical behaviour given the number of hosts/services we are monitoring?
Would the server benefit from another vCPU being added? (wait time is still expected to be zero if we do this)
Is there something we can check to see that the vmware sdk and ESXi plugins are working correctly/optimally?
Is it possible to quickly determine which checks are taking the longest, e.g. the 47s one, as this will be blowing out the average, I am sure.
Cheers,
KB.
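On the last question above, one possible way to find the slowest service checks is to read the `check_execution_time` field from the `servicestatus` blocks in Nagios' status file. The following is only a sketch: the function names are made up for illustration, and the status file location (commonly `/usr/local/nagios/var/status.dat` on a source-style install) is an assumption that should be verified on the appliance.

```python
from typing import Dict, Iterator, List, Tuple


def iter_blocks(text: str, block_type: str = "servicestatus") -> Iterator[Dict[str, str]]:
    """Yield one dict of key=value pairs for each `block_type { ... }` block."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            # Only start collecting when the wanted block type opens.
            if line == f"{block_type} {{":
                current = {}
        elif line == "}":
            yield current
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key] = value


def slowest_checks(text: str, top: int = 10) -> List[Tuple[float, str, str]]:
    """Return (seconds, host, service) tuples, slowest first."""
    rows = []
    for block in iter_blocks(text):
        try:
            seconds = float(block.get("check_execution_time", ""))
        except ValueError:
            continue  # skip blocks without a usable execution time
        rows.append((
            seconds,
            block.get("host_name", "?"),
            block.get("service_description", "?"),
        ))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]
```

To use it, read the status file into a string and pass it in, for example `slowest_checks(open(path).read(), top=5)`, where `path` is wherever your install keeps its status file.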
nagios server has high cpu
Re: nagios server has high cpu
I would start by adding the second CPU to the XI VM. ESX (SDK) checks are brutal for CPU usage compared to most.
As far as the flapping state goes, I don't think that has to do with execution time or latency, since that won't affect host states. I think your Nagios server may have actually lost connectivity for brief moments.
How does your check latency look?
Re: nagios server has high cpu
Hi,
The host check latency is : 0.01 0.20 0.05
The service check latency is : 0.00 0.46 0.12
The flapping is interesting. We have had other VMs occasionally drop out from network comms even though nothing has changed. We have our switches in Nagios now too, and I understand we have already seen some odd behaviour from one 10GigE port. We have our network team following up on this now.
I will halt the server, take a snapshot, add a second vCPU and reboot. I will report back with findings.
Cheers,
KB.
Re: nagios server has high cpu
Hi,
The server appears to be behaving much better with the addition of the extra vCPU.
The server has been running for a while now with 2 vCPUs and reports the following:
Host check latency = 0.05 0.37 0.21 (min, max, avg)
Host check execution time = 0.01, 0.05, 0.02
Service Check latency = 0.00, 0.32, 0.13
Service Check Execution time = 0.00 17.55 0.64
Server load (1min, 5min, 15min)
1.34, 1.15, 1.03 (during peak checking activity)
0.74, 0.93, 0.97 (during low point of peak checking cycle)
CPU Stats
user = ~15%
system = ~3.5%
idle = 71.24%
Memory
used = 592 mb
free = 1425 mb
swap not being used at all
I think that has resolved the issue for us.
Cheers,
KB.
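As a rough way to compare the load figures before and after the change, load average can be divided by the number of vCPUs; a sustained value well above 1.0 per CPU suggests work is queuing. This is a sketch only, and the helper names are hypothetical:

```python
import os


def load_per_cpu(load_avg: float, cpus: int) -> float:
    """Normalise a load average by the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load_avg / cpus


def current_load_per_cpu() -> float:
    """1-minute load per CPU on the machine running this (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)


# Figures quoted in this thread (1-minute load averages).
before = load_per_cpu(9.40, 1)  # 1 vCPU
after = load_per_cpu(1.34, 2)   # 2 vCPUs, peak checking activity
print(f"before: {before:.2f} per vCPU, after: {after:.2f} per vCPU")
```

With the numbers from this thread, the single-vCPU setup was carrying more than nine runnable tasks per CPU, while the two-vCPU setup sits below one per CPU even at peak.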
scottwilkerson
Re: nagios server has high cpu
Glad to hear it's running better.