CPU Load Spike daily

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: CPU Load Spike daily

Post by abrist »

Fair enough. Can we move you to a remote session? I want to see this behavior firsthand . . . .
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: CPU Load Spike daily

Post by BanditBBS »

abrist wrote: Fair enough. Can we move you to a remote session? I want to see this behavior firsthand . . . .
Umm, no (that felt good!). I don't have it installed anymore. With all the testing I was doing in my environment, I had reverted to a clean install and honestly don't have the time to test it right now.

All I did was install it and create a couple of services: one with a 5-minute check interval and one with a 60-minute interval. Then (after deleting the retention data or waiting long enough for the checks to be in the past), start Nagios and let it schedule and run the checks. Then just look at the last check time and next check time; they will be nowhere near 5 or 60 minutes apart. Rescheduling the next check of the service results in the same behavior: the next check runs immediately, and then the next scheduled check is 7-10 minutes out for the 5-minute check and was 30-40 minutes out for the 1-hour check.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github

Re: CPU Load Spike daily

Post by abrist »

Thanks for the details, I will pass them off to the Erics.
tylerhoadley
Posts: 43
Joined: Tue Jul 02, 2013 1:41 pm

Re: CPU Load Spike daily

Post by tylerhoadley »

Hey,

So I looked at the changes that were made in the commit.

I have a dev XI box where I was running r1.1 with core 4.0.6. Before upgrading, I set up 1000 service checks (HTTP) and forced a check on all services on this host.

The monitoring engine queue showed a huge spike for checks...

I then downloaded the latest XI tarball, extracted nagios-core (nagios-4.0.7.tar.gz), and patched the two *.c files with the committed code. I tar'd it back up and then updated XI to r1.3 (via ./upgrade).

I stopped core, deleted the retention.dat file, and waited 5 mins for these 1000 1-min checks to lapse (I saw all checks in one column, ready to go). I then started it back up, and here is my queue.

[Image: monitoring engine check queue after the restart]

I'm assuming this is what the "commit" is supposed to do... the queued checks look fairly evenly spread.

Please comment.

Re: CPU Load Spike daily

Post by BanditBBS »

Tyler, yes, it is what it should do; however, in my test I saw the 5-minute check interval being ignored and checks being scheduled anywhere from 7-10 minutes out.

Can you verify, after some of the checks run, that the next scheduled check time is actually what you had set?

Re: CPU Load Spike daily

Post by tylerhoadley »

First 5 mins....

Service State: Ok
Duration: 30m 55s
Service Stability: Unchanging (stable)
Last Check: 2014-07-21 15:20:00
Next Check: 2014-07-21 15:25:00

second 5 mins...

Service State: Ok
Duration: 35m 8s
Service Stability: Unchanging (stable)
Last Check: 2014-07-21 15:25:00
Next Check: 2014-07-21 15:30:00

and third 5 min wait....

Service State: Ok
Duration: 40m 5s
Service Stability: Unchanging (stable)
Last Check: 2014-07-21 15:30:00
Next Check: 2014-07-21 15:35:00

Fourth 5 min....

Service State: Ok
Duration: 50m 15s
Service Stability: Unchanging (stable)
Last Check: 2014-07-21 15:40:00
Next Check: 2014-07-21 15:45:00
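
For anyone repeating this test, the gap between "Last Check" and "Next Check" can be computed rather than eyeballed. This is only a minimal sketch: it is not part of Nagios, the helper name `check_gaps` is made up for illustration, and it assumes the status text is pasted in exactly the format shown above.

```python
from datetime import datetime, timedelta

# Status lines copied from the first and fourth samples above.
SAMPLES = """\
Last Check: 2014-07-21 15:20:00
Next Check: 2014-07-21 15:25:00
Last Check: 2014-07-21 15:40:00
Next Check: 2014-07-21 15:45:00
"""

FMT = "%Y-%m-%d %H:%M:%S"

def check_gaps(text):
    """Return (next check - last check) for each Last/Next pair in text."""
    gaps = []
    last = None
    for line in text.splitlines():
        label, _, value = line.partition(": ")
        if label == "Last Check":
            last = datetime.strptime(value.strip(), FMT)
        elif label == "Next Check" and last is not None:
            gaps.append(datetime.strptime(value.strip(), FMT) - last)
            last = None
    return gaps

# The interval configured for this test service (5 minutes).
expected = timedelta(minutes=5)
for gap in check_gaps(SAMPLES):
    print(gap, "OK" if gap == expected else "DRIFT")
```

A gap larger than the configured interval (for example 7-10 minutes on a 5-minute check) would show up as DRIFT.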
Last edited by tylerhoadley on Mon Jul 21, 2014 2:40 pm, edited 2 times in total.

Re: CPU Load Spike daily

Post by tylerhoadley »

I will follow a few other service checks as well, just to be safe.


All seem to be reporting the same "correct" check times.
Last edited by tylerhoadley on Mon Jul 21, 2014 2:50 pm, edited 1 time in total.

Re: CPU Load Spike daily

Post by BanditBBS »

Very interesting. I can't imagine what I had done wrong, but if I am proven wrong, more power to ya :) If it continues to act right for you, then it had to be something I did.
estanley
Posts: 19
Joined: Thu Apr 08, 2010 9:28 am

Re: CPU Load Spike daily

Post by estanley »

I believe I found and fixed this issue in commit 4e6eb7c. The problem was that the next check time for services was always getting an additional random delay of between 0 and the check interval. This should only have been true when the next check time is further in the future because of a check timeperiod constraint. The logic for host checks was already correct. Please test this commit and let us know whether it resolves the issue.
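
To illustrate the description above, here is a rough model of the intended scheduling rule next to the buggy one. It is a sketch only, not the actual Nagios Core C code from the commit: the function names, the `earliest_allowed` parameter, and the exact arithmetic are assumptions made for illustration.

```python
import random

def next_service_check(last_check, interval, earliest_allowed, rng=random):
    """Illustrative model of the corrected behaviour (times in seconds).

    earliest_allowed is a hypothetical input: the first time at or after
    the preferred time that falls inside the check timeperiod. It equals
    the preferred time when no timeperiod constraint applies.
    """
    preferred = last_check + interval
    if earliest_allowed > preferred:
        # Pushed further into the future by the timeperiod: add a random
        # delay of between 0 and the check interval.
        return earliest_allowed + rng.uniform(0, interval)
    # No timeperiod constraint: keep the configured interval exactly.
    return preferred

def buggy_next_service_check(last_check, interval, earliest_allowed, rng=random):
    """Model of the reported bug: the random delay was always added."""
    base = max(last_check + interval, earliest_allowed)
    return base + rng.uniform(0, interval)

# A 5-minute (300 s) check with no timeperiod constraint.
print(next_service_check(0, 300, 300))
print(buggy_next_service_check(0, 300, 300))
```

Under this model, the buggy version schedules an unconstrained 5-minute check somewhere between 5 and 10 minutes out, while the corrected version keeps it at exactly 5 minutes.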

Re: CPU Load Spike daily

Post by BanditBBS »

estanley wrote:I believe I found and fixed this issue in commit 4e6eb7c. The problem was that the next check time for services was always getting an additional random delay of between 0 and the check interval. This should only have been true when the next check time is further in the future because of a check timeperiod constraint. The logic for host checks was already correct. Please test this commit and let us know whether it resolves the issue.
Eric,

Tyler seems to have tested it well and I am comfortable with his results. Unless you really want me to test the commit, I'd say it's golden :)