CPU Load Spike daily

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
belvdr
Posts: 81
Joined: Tue Oct 08, 2013 9:17 pm

Re: CPU Load Spike daily

Post by belvdr »

BanditBBS wrote:Update: as of 9am CST I have been at double digit load for 25 minutes and counting. I'm going to kill the nagios process and let it sit for a few minutes just so I can use the server.
Just curious and on a whim, what are your server details? I'm curious if it's a VM hardware thing.
emislivec
Posts: 52
Joined: Tue Feb 25, 2014 10:06 am

Re: CPU Load Spike daily

Post by emislivec »

abrist wrote:
BanditBBS wrote: So in short, the new scheduling is not good at all (except for the load easing).
I have a suspicion that commit will get reverted . . . .
...yeah, I have a suspicion you're right...
That commit applies a random adjustment to certain checks. It obviously needs to be more constrained, and probably not random...
abrist wrote:More on this soon as Eric[1]'s work progresses. . . .
I'm looking into the static scheduling algorithm Core runs on startup when not using the retained schedule. We probably need to run a full, proper reschedule more often, which is simpler than it sounds. Checks with different/longer intervals, retries, dependent checks, host/service additions/removals, time changes, etc., make things 'interesting'. Efficiently generating a smooth schedule that doesn't bunch checks at some point is pretty much impossible without violating the check_interval somewhat. However, the steady state should be much smoother for simpler schedules.
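
To illustrate the point about bunching, here is a minimal sketch in Python. It is not Nagios Core's scheduling code; the function names, the 10-second bucket size, and the 600-check/300-second figures are all hypothetical and chosen only for illustration. It compares evenly spaced first-run offsets with purely random ones and reports the busiest time bucket in each case.

```python
import random
from collections import Counter

def spread_evenly(num_checks, interval_s):
    """Give each check a first-run offset spaced evenly across one interval."""
    step = interval_s / num_checks
    return [i * step for i in range(num_checks)]

def spread_randomly(num_checks, interval_s, seed=0):
    """Give each check a uniformly random first-run offset (illustration only)."""
    rng = random.Random(seed)
    return [rng.uniform(0, interval_s) for _ in range(num_checks)]

def peak_per_bucket(offsets, bucket_s=10):
    """Return the largest number of checks that land in any one time bucket."""
    buckets = Counter(int(offset // bucket_s) for offset in offsets)
    return max(buckets.values())

# Hypothetical workload: 600 checks that all share a 300-second interval.
even = spread_evenly(num_checks=600, interval_s=300)
rand = spread_randomly(num_checks=600, interval_s=300)

print("even peak per 10 s:  ", peak_per_bucket(even))
print("random peak per 10 s:", peak_per_bucket(rand))
```

With a single shared interval, the even spread puts exactly 20 checks in every 10-second bucket, and a random spread can never have a lower peak than that. Once intervals, retries, and dependencies differ between checks, no simple spacing like this stays smooth, which is the 'interesting' part described above.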

So yeah, back to the code.
Smark
Posts: 32
Joined: Tue Jan 08, 2013 6:12 pm

Re: CPU Load Spike daily

Post by Smark »

Hello everyone,

I've read through this post but I want to be sure what I'm seeing on my system is the same issue we're discussing in this thread... Please see the attachment which shows the load average of the Nagios server over about 24hrs.

I did nothing differently but did notice the change in load pattern shortly after an update. While there were upticks in usage, the spikes were smaller and closer together. For the time being I've just been throwing CPUs at it. Currently I'm at 16 vCPUs and 6 GB RAM.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: CPU Load Spike daily

Post by tmcdonald »

Your symptoms do seem to line up well with what we are seeing elsewhere.
Former Nagios employee
Smark
Posts: 32
Joined: Tue Jan 08, 2013 6:12 pm

Re: CPU Load Spike daily

Post by Smark »

tmcdonald wrote:Your symptoms do seem to line up well with what we are seeing elsewhere.
Thanks. Please let me know how I can help by submitting logs or configuration information. While troubleshooting another issue (it turned out to be related to colons in service names) I de-provisioned my RAM disk and noticed the load spikes went from about 12-15 to around 20.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: CPU Load Spike daily

Post by BanditBBS »

belvdr wrote:
BanditBBS wrote:Update: as of 9am CST I have been at double digit load for 25 minutes and counting. I'm going to kill the nagios process and let it sit for a few minutes just so I can use the server.
Just curious and on a whim, what are your server details? I'm curious if it's a VM hardware thing.
I am going to test whether it is a VM host issue, hopefully later today. Getting another host configured now! I saw this at my house testing 2014, and I moved the server to a new VM host and the issue never appeared again. The weird thing is, both hosts were identical HP DL360 G5s with 64 GB RAM and two quad-core Xeons.

The server at work we are running on is:
Dell M605, ESXi 5.x, 64 GB, two 6-core AMD 2.40 GHz
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
belvdr
Posts: 81
Joined: Tue Oct 08, 2013 9:17 pm

Re: CPU Load Spike daily

Post by belvdr »

BanditBBS wrote:
belvdr wrote:
BanditBBS wrote:Update: as of 9am CST I have been at double digit load for 25 minutes and counting. I'm going to kill the nagios process and let it sit for a few minutes just so I can use the server.
Just curious and on a whim, what are your server details? I'm curious if it's a VM hardware thing.
I am going to test whether it is a VM host issue, hopefully later today. Getting another host configured now! I saw this at my house testing 2014, and I moved the server to a new VM host and the issue never appeared again. The weird thing is, both hosts were identical HP DL360 G5s with 64 GB RAM and two quad-core Xeons.

The server at work we are running on is:
Dell M605, ESXi 5.x, 64 GB, two 6-core AMD 2.40 GHz
Are your virtual hardware and tools up to date?

We're on Hyper-V 2012. My VM specs are:

4 vCPU
12 GB RAM
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: CPU Load Spike daily

Post by BanditBBS »

belvdr wrote:Are your virtual hardware and tools up to date?

We're on Hyper-V 2012. My VM specs are:

4 vCPU
12 GB RAM
Working on getting myself access to vCenter so I can look at this myself.
belvdr
Posts: 81
Joined: Tue Oct 08, 2013 9:17 pm

Re: CPU Load Spike daily

Post by belvdr »

BanditBBS wrote:Working on getting myself access to vCenter so I can look at this myself.
Great! If I can be of any help, let me know. Here's the VMware KB on virtual hardware versions for various products:

http://kb.vmware.com/selfservice/micros ... Id=1003746

I haven't experienced any issues from upgrading virtual hardware and tools before, so it's a fairly safe option. Be sure to update tools first, then hardware.
belvdr
Posts: 81
Joined: Tue Oct 08, 2013 9:17 pm

Re: CPU Load Spike daily

Post by belvdr »

Were you able to find anything interesting in the VM hardware?