nagios server has high cpu

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
KiwiBloke
Posts: 81
Joined: Fri Apr 27, 2012 7:23 pm


Post by KiwiBloke »

Hi,

Our Nagios XI server (VM appliance) has shown elevated CPU levels since I installed and configured the VMware SDK so that we could monitor our ESXi hosts.

We have 18 ESXi hosts. If I run top from the console, I often see the ESXi Perl script at the top in multiple instances, each taking 10% CPU. All the ESXi hosts are reporting correct stats.

The VM is getting all the CPU it is requesting from the host; wait time is 0.
The VM is currently configured with 1 vCPU.

On the Server Statistics dashlet, user time is often red at 95%, with load averages (1, 5, 15 min) of 9.40, 6.17, 5.49.

I have allocated the server 2 GB of RAM and it is using 1547 MB, with 483 MB free; swap is not being used at all.

Regarding checks...
Active Host checks (last 1, 5, 15 min): 16, 86, 163
Active Service checks (last 1, 5, 15 min): 140, 966, 1781

Host Check Execution time avg = 0.05s (max=0.15s)
Service Check execution time avg = 1.56s (max=47.46s)
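One rough way to relate these figures to load is Little's law: the average number of checks running at once equals the rate at which checks start multiplied by their average execution time. The sketch below assumes the three numbers on each "Active ... checks" line are counts over the last 1, 5 and 15 minutes, and that the averages quoted above also apply to that 15-minute window. Both are assumptions, not something the dashlet states.

```python
def avg_concurrent_checks(checks_in_window: int, window_seconds: float,
                          avg_exec_seconds: float) -> float:
    """Little's law: mean checks in flight = start rate * mean execution time."""
    start_rate = checks_in_window / window_seconds  # checks started per second
    return start_rate * avg_exec_seconds


if __name__ == "__main__":
    window = 15 * 60  # assumed 15-minute window, in seconds
    # Service checks: 1781 started, 1.56 s average execution time
    print(f"service checks in flight: {avg_concurrent_checks(1781, window, 1.56):.2f}")  # 3.09
    # Host checks: 163 started, 0.05 s average execution time
    print(f"host checks in flight: {avg_concurrent_checks(163, window, 0.05):.3f}")  # 0.009
```

Under those assumptions, about three service checks are running at any moment on average, while host checks barely register. Not all of that in-flight time is CPU time, since SDK checks also spend time waiting on the network.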

I have more than once seen the server show all hosts/services as flapping and suspend notifications. I wonder if this is a result of the high CPU, meaning it cannot get round all its checks in time. We need to avoid this situation, as it compromises our ability to monitor our environment.

Questions:
Can you advise whether this is typical behaviour given the number of hosts/services we are monitoring?
Would the server benefit from adding another vCPU? (Wait time is still expected to be zero if we do this.)
Is there something we can check to confirm that the VMware SDK and ESXi plugins are working correctly/optimally?
Is it possible to quickly determine which checks are taking the longest, e.g. the 47s one? I am sure that is blowing out the average.
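For the last question, one option is to read the Nagios status file directly. Each `hoststatus` and `servicestatus` block in `status.dat` records a `check_execution_time` field for that object's most recent check. The sketch below assumes the usual Nagios XI path `/usr/local/nagios/var/status.dat` (pass another path as the first argument if yours differs). It prints the slowest checks first.

```python
import sys
from typing import Iterable, Iterator

# Assumed default location on a Nagios XI appliance; adjust if needed.
STATUS_DAT = "/usr/local/nagios/var/status.dat"


def parse_blocks(lines: Iterable[str]) -> Iterator[tuple[str, dict[str, str]]]:
    """Yield (block_type, fields) for each 'name {' ... '}' block."""
    block_type = None
    fields: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if block_type is None and line.endswith("{"):
            block_type, fields = line[:-1].strip(), {}
        elif line == "}":
            if block_type is not None:
                yield block_type, fields
            block_type = None
        elif block_type is not None and "=" in line:
            key, _, value = line.partition("=")  # split on the first '=' only
            fields[key] = value


def slowest_checks(lines: Iterable[str], top_n: int = 10) -> list[tuple[float, str]]:
    """Return (execution_time, name) pairs, slowest first."""
    results = []
    for block_type, fields in parse_blocks(lines):
        if block_type not in ("hoststatus", "servicestatus"):
            continue
        try:
            exec_time = float(fields.get("check_execution_time", ""))
        except ValueError:
            continue  # field missing or not a number
        name = fields.get("host_name", "?")
        if block_type == "servicestatus":
            name += " / " + fields.get("service_description", "?")
        results.append((exec_time, name))
    results.sort(reverse=True)
    return results[:top_n]


def main() -> None:
    path = sys.argv[1] if len(sys.argv) > 1 else STATUS_DAT
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for exec_time, name in slowest_checks(fh):
                print(f"{exec_time:8.2f}s  {name}")
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)


if __name__ == "__main__":
    main()
```

Because `status.dat` only holds the latest result per object, it is worth running this a few times. That separates a consistently slow check from a one-off spike like the 47 s maximum.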

Cheers,

KB.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: nagios server has high cpu

Post by mguthrie »

I would start by adding the second CPU to the XI VM. ESX (SDK) checks are brutal for CPU usage compared to most.

As far as the flapping goes, I don't think that is related to execution time or latency, since those won't affect host states. I think your Nagios server may actually have lost connectivity for brief moments.

How does your check latency look?
KiwiBloke

Re: nagios server has high cpu

Post by KiwiBloke »

Hi,

The host check latency is: 0.01, 0.20, 0.05 (min, max, avg)
The service check latency is: 0.00, 0.46, 0.12 (min, max, avg)

The flapping is interesting. We have had other VMs occasionally drop off the network even though nothing has changed. We have our switches in Nagios now too, and I understand we have already seen some odd behaviour from one 10GigE port. Our network team is following up on this now.

I will halt the server, take a snapshot, add a second vCPU and reboot. I will report back with findings.

Cheers,

KB.
KiwiBloke

Re: nagios server has high cpu

Post by KiwiBloke »

Hi,

The server appears to be behaving much better with the addition of the extra vCPU.

The server has been running for a while now with 2 vCPUs and reports the following:

Host check latency = 0.05, 0.37, 0.21 (min, max, avg)
Host check execution time = 0.01, 0.05, 0.02 (min, max, avg)
Service check latency = 0.00, 0.32, 0.13 (min, max, avg)
Service check execution time = 0.00, 17.55, 0.64 (min, max, avg)

Server load (1min, 5min, 15min)
1.34, 1.15, 1.03 (during peak checking activity)
0.74, 0.93, 0.97 (during low point of peak checking cycle)
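Load averages are easier to compare across the change once divided by the number of CPUs. The sketch below does that for the 1-minute figures in this thread (9.40 on 1 vCPU before, 1.34 on 2 vCPUs at peak after). On a Unix host it also prints live values from `os.getloadavg()`. As a rule of thumb, a sustained per-CPU value well above 1.0 means work is queuing. On Linux the load average also counts tasks in uninterruptible I/O wait, so treat it as a rough gauge rather than pure CPU demand.

```python
import os


def load_per_cpu(load_avg: float, cpus: int) -> float:
    """Normalise a load average by CPU count so differently sized hosts compare."""
    return load_avg / cpus


if __name__ == "__main__":
    # 1-minute load averages reported in this thread
    print(f"before (1 vCPU):  {load_per_cpu(9.40, 1):.2f}")  # 9.40
    print(f"after  (2 vCPUs): {load_per_cpu(1.34, 2):.2f}")  # 0.67

    # Live values for the machine running this script (Unix only).
    # os.cpu_count() reports logical CPUs and may return None.
    if hasattr(os, "getloadavg"):
        cpus = os.cpu_count() or 1
        for label, value in zip(("1 min", "5 min", "15 min"), os.getloadavg()):
            print(f"this host, {label}: {load_per_cpu(value, cpus):.2f} per CPU")
```

By this measure the server went from roughly nine runnable tasks per CPU to well under one, which matches the drop in user time reported below.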

CPU Stats
user = ~15%
system = ~3.5%
idle = 71.24%

Memory
used = 592 MB
free = 1425 MB

swap not being used at all

I think that has resolved the issue for us.

Cheers,

KB.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: nagios server has high cpu

Post by scottwilkerson »

Glad to hear it's running better.
Former Nagios employee