CPU Load spike every 7 hours

Posted: Sun Jun 15, 2014 4:58 am
by WillemDH
Hello,

I noticed the CPU load is spiking every 7 hours on my Nagios XI production server. Check out the screenshot. The spikes happen almost exactly every 7 hours and started after the update. I do not know of any checks that run every 7 hours, and the backup runs only once a day, so I have no immediate idea where this is coming from.

Could this be related to the Nagios 2014 R1.1 update? How can I best troubleshoot this? By running top at the expected spike time?

Spike times so far:
Friday 23:04
Saturday 06:04
Saturday 13:04
Saturday 19:59
Sunday 03:09
Sunday 09:54
Any other ideas? This is not an urgent issue, as we have 6 vCPUs, but I would like to find out what is causing it.
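
To check how regular the pattern is, the gaps between the listed spike times can be computed directly. This is a minimal sketch: the calendar dates are an assumption derived from the post date (Sunday, Jun 15, 2014), and the function name is made up for illustration.

```python
from datetime import datetime

# Spike times from the post. The dates are assumed from the post date
# (Sun Jun 15, 2014), so Friday = Jun 13 and Saturday = Jun 14.
SPIKES = [
    "2014-06-13 23:04",
    "2014-06-14 06:04",
    "2014-06-14 13:04",
    "2014-06-14 19:59",
    "2014-06-15 03:09",
    "2014-06-15 09:54",
]


def intervals_minutes(stamps):
    """Return the gap in whole minutes between consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in stamps]
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


gaps = intervals_minutes(SPIKES)
print(gaps)                   # [420, 420, 415, 430, 405]
print(sum(gaps) / len(gaps))  # 418.0
```

The gaps stay within about 15 minutes of 7 hours (420 minutes), which points to something on a roughly fixed cycle rather than a daily cron job.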

Grtz

Willem

Re: CPU Load spike every 7 hours

Posted: Sun Jun 15, 2014 9:49 am
by BanditBBS
I'm having a very similar problem on my prod servers and now at home. http://support.nagios.com/forum/viewtop ... 16&t=27703 is my thread. The server I am trying to diagnose is the OVF downloaded from Nagios, with no changes. Hopefully one of ours gets figured out and it helps the other person!

Re: CPU Load spike every 7 hours

Posted: Sun Jun 15, 2014 3:19 pm
by Box293
Around the time of the spike, what is being recorded in the event log?

Home > Monitoring Process > Event Log

Re: CPU Load spike every 7 hours

Posted: Tue Jun 17, 2014 6:39 am
by WillemDH
Very strange, but it seems the load spikes 'normalized'. I did nothing configuration-wise which could explain this change in behaviour. See screenshot. I did not have the opportunity to check the logs when the load spiked.

I'll check again in a week to see if there is any recurring CPU load pattern.

Willem

Re: CPU Load spike every 7 hours

Posted: Tue Jun 17, 2014 8:07 am
by BanditBBS
Willem,

Have you been graphing/monitoring the I/O Wait on the nagios servers since your upgrade? (Using the monitoring wizard for nagios server) If so, has that been going high as well?

EDIT: IGNORE THIS - We have determined the I/O wait issue is real for us and a disk issue. Unrelated to the high CPU load spike.

Re: CPU Load spike every 7 hours

Posted: Tue Jun 17, 2014 9:08 am
by tmcdonald
Well then...
[Attachment: ohno.png]
Fresh install, literally only logged in and ran top. Came back the next day to this. Needless to say, this is something we'll be looking into, now that we know we can replicate it in-house.

Re: CPU Load spike every 7 hours

Posted: Tue Jun 17, 2014 9:13 am
by BanditBBS
Yeah, it's not a big issue, but there is definitely a pattern you can see, and if it is only monitoring localhost, what the heck is on a 7 hour loop?

thanks!

Re: CPU Load spike every 7 hours

Posted: Tue Jun 17, 2014 10:42 am
by WillemDH
Good to know you guys can reproduce this. Good luck on finding the root cause!

Grtz

Willem

Re: CPU Load spike every 7 hours

Posted: Tue Jun 17, 2014 4:53 pm
by emislivec
This may be an issue with the way checks are (re)scheduled in Core.

top is good for seeing the load and CPU usage; in addition,

ps -ef f
with the extra 'f' lists processes as a tree, which makes it easy to see what Nagios is running, as well as which processes are contributing to the load.
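
Since the spikes come at odd hours, one option is to let a small script grab the process tree whenever the load is high. The following is only a sketch under assumptions: the threshold, sampling interval, sample count, output file name and function names are all hypothetical, and it relies on the Linux procps `ps` accepting `-ef f` as shown above.

```python
import os
import subprocess
import time
from datetime import datetime


def snapshot_if_loaded(threshold, load_reader=os.getloadavg, runner=subprocess.run):
    """Return `ps -ef f` output if the 1-minute load exceeds threshold, else None."""
    load1 = load_reader()[0]
    if load1 <= threshold:
        return None
    # Same command as above: full listing, shown as a process tree.
    result = runner(["ps", "-ef", "f"], capture_output=True, text=True, check=False)
    return f"{datetime.now().isoformat()} load1={load1:.2f}\n{result.stdout}"


def watch(threshold=4.0, interval_s=30, samples=10, out_path="load_spike_ps.log"):
    """Sample the load a fixed number of times and append any snapshots to a file."""
    for _ in range(samples):
        snap = snapshot_if_loaded(threshold)
        if snap is not None:
            with open(out_path, "a") as fh:
                fh.write(snap + "\n")
        time.sleep(interval_s)
```

Calling `watch()` with a larger `samples` value, started shortly before the next expected spike, would leave the snapshots in the log file for later review.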

Also, nagios.log files from when the load spikes could be helpful. To get more information out of Core, setting debug_level=28 will write debug info on the process, scheduled events and checks to /usr/local/nagios/var/nagios.debug. The extra writes from debug logging hit performance a bit, so it's not for general production use.
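
For reference, debug_level is a sum of bit flags, and 28 is the combination of the three categories named above. A small sketch, assuming the flag values documented in the Nagios Core sample nagios.cfg (4 = process, 8 = scheduled events, 16 = host/service checks); the dictionary and function names are just for illustration:

```python
# Flag values assumed from the comments in the Nagios Core sample nagios.cfg.
DEBUG_FLAGS = {
    "process": 4,
    "scheduled_events": 8,
    "checks": 16,
}


def debug_level(*names):
    """Combine the named debug categories into one debug_level value."""
    level = 0
    for name in names:
        level |= DEBUG_FLAGS[name]
    return level


print(debug_level("process", "scheduled_events", "checks"))  # 28
```

Dropping a category lowers the value accordingly, for example scheduled events plus checks alone gives 24, which would reduce the amount written to nagios.debug.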

Are you seeing problems in monitoring: high check latencies, timeouts or retries? Anything else indicating a problem other than the high load?

Re: CPU Load spike every 7 hours

Posted: Wed Jun 18, 2014 2:15 am
by WillemDH
As I said:
"This is not an urgent issue, as we have 6 vCPU's, but I would like to find out what is causing it."
So no, I'm not having any issues with Nagios atm. But I'm guessing that if we had only 2 vCPUs, it could have been an issue. As I said in a later post, it seems the situation has stabilized, so it's kind of hard to give you any more information.