Re: CPU Load Spike daily

Posted: Fri Jul 18, 2014 2:28 pm
by abrist
Fair enough. Can we move you to a remote, I want to see this behavior first hand . . . .

Re: CPU Load Spike daily

Posted: Fri Jul 18, 2014 2:34 pm
by BanditBBS
abrist wrote: Fair enough. Can we move you to a remote, I want to see this behavior first hand . . . .
Umm, no (that felt good!). I don't have it installed anymore. With all the testing I was doing in my environment, I had reverted to a clean install, and honestly I don't have the time to test it right now.

All I did was install it and create a couple of services: one with a 5-minute check interval and one with a 60-minute interval. After stopping Nagios (and deleting retention, or waiting long enough for the checks to be in the past), start Nagios and let it schedule and run the checks. Then just look at the last check time and next check time; they will be nowhere near 5 or 60 minutes apart. Rescheduling the next check of the service results in the same behavior: the next check runs immediately, and then the next scheduled check was 7-10 minutes out for the 5-minute check and 30-40 minutes out for the 1-hour check.

Re: CPU Load Spike daily

Posted: Mon Jul 21, 2014 12:35 pm
by abrist
Thanks for the details, I will pass them off to the Erics.

Re: CPU Load Spike daily

Posted: Mon Jul 21, 2014 2:15 pm
by tylerhoadley
Hey,

So I looked at the changes in the commit.

I have a dev XI box where I was running r1.1 with Core 4.0.6. Before upgrading, I multiplied 1000 service checks (HTTP) and forced a check on all services on this host.

The monitoring engine queue showed a huge spike for checks...

I then downloaded the latest XI tarball, extracted the nagios-core tarball (nagios-4.0.7.tar.gz), and patched the two *.c files with the committed code. I tarred it back up and then updated XI to r1.3 (via ./upgrade).

I stopped Core, deleted the retention.dat file, and waited 5 minutes for these 1000 1-minute checks to lapse (I saw all the checks in one column, ready to go). I then started it back up, and here is my queue.

[Image: monitoring engine check queue after the restart]

I'm assuming this is what the "commit" is supposed to do... the queued checks look fairly evenly spread.

Please comment.

Re: CPU Load Spike daily

Posted: Mon Jul 21, 2014 2:19 pm
by BanditBBS
Tyler, yes, that is what it should do; however, in my test I saw the 5-minute check interval being ignored and checks being scheduled anywhere from 7-10 minutes out.

Can you verify, after some of the checks run, that the next scheduled check time is actually what you had set?

Re: CPU Load Spike daily

Posted: Mon Jul 21, 2014 2:25 pm
by tylerhoadley
First 5 mins....

Service State: Ok
Duration: 30m 55s
Service Stability: Unchanging (stable)
Last Check: 2014-07-21 15:20:00
Next Check: 2014-07-21 15:25:00

second 5 mins...

Service State: Ok
Duration: 35m 8s
Service Stability: Unchanging (stable)
Last Check: 2014-07-21 15:25:00
Next Check: 2014-07-21 15:30:00

and third 5 min wait....

Service State: Ok
Duration: 40m 5s
Service Stability: Unchanging (stable)
Last Check: 2014-07-21 15:30:00
Next Check: 2014-07-21 15:35:00

Fourth 5 min....

Service State: Ok
Duration: 50m 15s
Service Stability: Unchanging (stable)
Last Check: 2014-07-21 15:40:00
Next Check: 2014-07-21 15:45:00
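
For anyone who wants to repeat this comparison without eyeballing it, here is a minimal sketch that takes the Last Check / Next Check pairs posted above and confirms each gap matches the configured 5-minute interval. The function names and the `tolerance_seconds` parameter are hypothetical helpers made up for this illustration; nothing here is part of Nagios.

```python
from datetime import datetime, timedelta

# Readings copied from the status output above: (last check, next check).
READINGS = [
    ("2014-07-21 15:20:00", "2014-07-21 15:25:00"),
    ("2014-07-21 15:25:00", "2014-07-21 15:30:00"),
    ("2014-07-21 15:30:00", "2014-07-21 15:35:00"),
    ("2014-07-21 15:40:00", "2014-07-21 15:45:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def scheduled_gap(last_check: str, next_check: str) -> timedelta:
    """Return the time between a check and the next one scheduled after it."""
    return datetime.strptime(next_check, FMT) - datetime.strptime(last_check, FMT)


def off_schedule(readings, interval_minutes, tolerance_seconds=0):
    """Return the readings whose gap differs from the configured interval.

    tolerance_seconds is a hypothetical allowance for scheduling jitter.
    """
    expected = timedelta(minutes=interval_minutes)
    tolerance = timedelta(seconds=tolerance_seconds)
    return [
        (last, nxt)
        for last, nxt in readings
        if abs(scheduled_gap(last, nxt) - expected) > tolerance
    ]


if __name__ == "__main__":
    for last, nxt in READINGS:
        print(last, "->", nxt, "gap:", scheduled_gap(last, nxt))
    print("off schedule:", off_schedule(READINGS, interval_minutes=5))
```

A reading like the 7-10 minute gaps reported earlier in the thread would show up in the `off_schedule` list, while all four pairs above come back clean.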

Re: CPU Load Spike daily

Posted: Mon Jul 21, 2014 2:27 pm
by tylerhoadley
I will follow a few other service checks as well, just to be safe.


All seem to be reporting the same "correct" check times.

Re: CPU Load Spike daily

Posted: Mon Jul 21, 2014 2:29 pm
by BanditBBS
Very interesting. I can't imagine what I did wrong, but if I am proven wrong, more power to ya :) If it continues to act right for you, then it had to be something I did.

Re: CPU Load Spike daily

Posted: Tue Jul 22, 2014 8:14 am
by estanley
I believe I found and fixed this issue in commit 4e6eb7c. The problem was that the next check time for services was always getting an additional random delay of between 0 and the check interval. This should only have been true when the next check time is further in the future because of a check timeperiod constraint. The logic for host checks was already correct. Please test this commit and let us know whether it resolves the issue.
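
To make the description above concrete, here is a simplified Python model of the two behaviors. This is only a sketch of the logic as described in this post, not the actual Nagios Core C code from the commit: the function names, the `timeperiod_start` parameter (standing in for the earliest time the check timeperiod allows), and the use of seconds for all values are assumptions made for illustration.

```python
import random


def next_check_buggy(now, interval, rng):
    """Model of the reported bug: a random delay of 0..interval is always added."""
    return now + interval + rng.uniform(0, interval)


def next_check_fixed(now, interval, rng, timeperiod_start=None):
    """Model of the described fix.

    The random delay is added only when the check timeperiod pushes the
    check later than the normally scheduled time (now + interval).
    """
    preferred = now + interval
    if timeperiod_start is not None and timeperiod_start > preferred:
        return timeperiod_start + rng.uniform(0, interval)
    return preferred


if __name__ == "__main__":
    rng = random.Random()
    now = 0.0
    interval = 300.0  # a 5-minute check interval, in seconds

    for _ in range(5):
        buggy = next_check_buggy(now, interval, rng) - now
        fixed = next_check_fixed(now, interval, rng) - now
        print(f"buggy gap: {buggy / 60:.1f} min, fixed gap: {fixed / 60:.1f} min")
```

Under this model a 5-minute service lands anywhere from 5 to 10 minutes out with the buggy logic, which is in the same range as the 7-10 minutes reported earlier in the thread, while the fixed logic returns exactly the configured interval unless a timeperiod constraint applies.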

Re: CPU Load Spike daily

Posted: Tue Jul 22, 2014 8:27 am
by BanditBBS
estanley wrote: I believe I found and fixed this issue in commit 4e6eb7c. The problem was that the next check time for services was always getting an additional random delay of between 0 and the check interval. This should only have been true when the next check time is further in the future because of a check timeperiod constraint. The logic for host checks was already correct. Please test this commit and let us know whether it resolves the issue.
Eric,

Tyler seems to have tested it well and I am comfortable with his results. Unless you really want me to test the commit, I'd say it's golden :)