Re: CPU Load Spike daily

Posted: Fri Jul 11, 2014 2:44 pm
by BanditBBS
belvdr wrote:Were you able to find anything interesting in the VM hardware?
Nope! It didn't have tools installed, so I corrected that issue, but I am still having hideous load spikes. Stopping either Nagios or mysqld (which one does it is random) brings the load back down, and then it operates fine again for a while. I'm still really leaning towards the virtual environment being my issue. I may try to get a physical box. I have to get this working and released to the customer!

Re: CPU Load Spike daily

Posted: Mon Jul 14, 2014 11:09 am
by Smark
BanditBBS wrote:
belvdr wrote:Were you able to find anything interesting in the VM hardware?
Nope! It didn't have tools installed, so I corrected that issue, but I am still having hideous load spikes. Stopping either Nagios or mysqld (which one does it is random) brings the load back down, and then it operates fine again for a while. I'm still really leaning towards the virtual environment being my issue. I may try to get a physical box. I have to get this working and released to the customer!
If others are getting the same symptoms then it's not just YOUR environment. If it's everyone's virtual environment then whatever recently changed in Nagios caused it and needs to be rectified.

We have room to spare, but this is my current dashboard with the RAMDisk re-enabled. Without the RAMDisk my load spikes to around 25; with it, they reach around 15.
2014-07-14_09-08-31.png
Edit: You can see the scheduler starting to get spiky too. After a restart or a config change it is pretty much flat from 0 to 3 minutes. Our check frequency is set to 3 minutes, so we wouldn't ever expect anything past that.
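
For anyone wondering what a "flat" scheduler graph over a 3-minute interval means, here is a rough Python sketch. It is a simplified model that just spaces checks evenly across one interval, not Nagios's actual scheduling code, and the check count of 360 is made up for illustration.

```python
from collections import Counter


def spread_checks(num_checks, interval_s):
    """Return evenly spaced start offsets (seconds) within one check interval."""
    if num_checks <= 0:
        return []
    delay = interval_s / num_checks  # gap between consecutive checks
    return [i * delay for i in range(num_checks)]


def bucket_counts(offsets, bucket_s):
    """Count how many checks start inside each bucket_s-wide time window."""
    counts = Counter(int(offset // bucket_s) for offset in offsets)
    if not counts:
        return []
    return [counts[b] for b in range(max(counts) + 1)]


# Hypothetical numbers: 360 checks on a 180-second (3-minute) interval,
# viewed in 30-second windows.
offsets = spread_checks(num_checks=360, interval_s=180)
print(bucket_counts(offsets, bucket_s=30))  # [60, 60, 60, 60, 60, 60]
```

An even spread gives the same count in every window and nothing past the interval. Uneven counts, or checks landing beyond 180 seconds, would correspond to the spiky graph described above.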

Re: CPU Load Spike daily

Posted: Mon Jul 14, 2014 12:03 pm
by belvdr
Smark wrote:If others are getting the same symptoms then it's not just YOUR environment. If it's everyone's virtual environment then whatever recently changed in Nagios caused it and needs to be rectified.
We were simply trying to find a common denominator for these issues. What does your environment look like? I am not seeing this anywhere on Hyper-V.

Re: CPU Load Spike daily

Posted: Mon Jul 14, 2014 12:34 pm
by BanditBBS
Smark, for most others who have this issue, it doesn't bring their machine to a crawl. When your server spikes, is it unusable like mine?

Re: CPU Load Spike daily

Posted: Tue Jul 15, 2014 3:04 pm
by BanditBBS
Well, still having the issue on a completely new system. Built a new VM running RHEL 5 64-bit and the issue instantly happened. If I stop the nagios service it definitely makes a difference, but it still goes high.

I just deactivated all services except host pings and localhost load. The load is still going up to 2 just doing those few checks. I would think it shouldn't be higher than 0.1 doing just some pings.

Re: CPU Load Spike daily

Posted: Tue Jul 15, 2014 6:46 pm
by BanditBBS
Doing a bunch of testing on this last night/today/tonight (I thought being a manager would be fun!)!

I have an 8-core, 64 GB VM. It was hitting loads as high as 180 with 2014r1.2 on it, with 900 total checks (hosts and services combined). I just reverted to a snapshot from when the VM was handed over to me last night. For 90 minutes now I have been watching the load and doing nothing else: 0.00/0.00/0.00. In addition to this, I had a 2012r2.5 server (a test box) that had been idling forever and never spiked over 0.5. Well, I upgraded that to 2014r1.2 and the load has gone up to 0.7/0.68/0.7. I realize that is still low, but it only has localhost in it. The load is actually falling very, very slowly over time as well.
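
To put those numbers in perspective, load average is easier to compare across machines when divided by the core count. Below is a minimal Python sketch; the helper name `load_per_core` is made up for illustration, and the live reading at the end only works on Unix-like systems.

```python
import os


def load_per_core(load, cores):
    """Normalize a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


# The peak mentioned above: a load of 180 on an 8-core VM.
print(load_per_core(180, 8))  # 22.5

# Read the current 1/5/15-minute load averages on this machine, if supported.
try:
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() can return None
    print(f"{one:.2f}/{five:.2f}/{fifteen:.2f} on {cores} cores "
          f"({load_per_core(one, cores):.2f} per core, 1-min)")
except (AttributeError, OSError):
    print("load average not available on this platform")
```

A per-core value well above 1 means more runnable (or I/O-blocked) processes than the cores can service, which fits a box that becomes unusable during a spike.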

I have another machine being built for me right now that will be a fresh, clean install with nothing whatsoever added, just an IP and RHN. I'll let you all know what I find out.

Re: CPU Load Spike daily

Posted: Thu Jul 17, 2014 11:49 am
by tylerhoadley
Are you using Perl scripts intensively? SNMP, webinject...

I too have had this problem, and I believe it's due to embedded Perl not being a part of Nagios Core 4.

To rectify this, I implemented mod_gearman (nagios4 RPMs) with embedded Perl again and tweaked its default config to queue these Perl scripts.

I also posted here http://support.nagios.com/forum/viewtop ... =6&t=28153 prior to getting my access to the customer forums.

Cheers

Re: CPU Load Spike daily

Posted: Thu Jul 17, 2014 11:50 am
by BanditBBS
O.M.G.!!!!!

First off, my high load issue is not limited to 2014. I had a second server built and installed 2012r2.9 on it. The load issue was present on both servers, and at the same time! I firmly believe it is an issue in my VM environment. I'm not sure what, though; however, an IP conflict was discovered on the VM host today and was resolved. The server has been fine since that time. If that ends up being the issue for me, I'm going to be upset that someone's lack of attention to detail screwed me for this long! I'm waiting a total of 24 hours before I call my issue resolved.

Re: CPU Load Spike daily

Posted: Thu Jul 17, 2014 12:01 pm
by slansing
I wanted to stop in and let you know the high load issue is still on our radar. As a temporary bandaid (until we push a resolution out) we are recommending that you follow this new FAQ entry:

http://support.nagios.com/wiki/index.ph ... _Intervals

Awesome news on the VMware front though, please let us know if it stays healthy in the next 24 hours!

Re: CPU Load Spike daily

Posted: Thu Jul 17, 2014 12:09 pm
by tylerhoadley
Well, an IP conflict could do that. I've seen DNS controller reboots cause huge spikes too, because they clog up queue execution. I hope that this resolved your issue.

However, I posted my findings under your thread so they are open for general reading in the customer forum and can potentially help you and others. If you look at my other thread, you will see the high load peaks. Perl is my culprit; I know this now. We use it intensively on our network gear and for webinject checks, which caused bottlenecks in the native Nagios queue. The load picture is evidence enough, as is the visual queue output of the gearman_top command. Just prior to the embedded Perl mark, the load was still a bit high, until I flipped the setting in /etc/mod_gearman/mod_gearman_worker.conf to use embedded Perl again. That load picture covers 5 weeks.
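
As a rough way to check which way that setting is flipped, here is a small Python sketch that reads key=value lines in the style of mod_gearman_worker.conf. The option name `enable_embedded_perl` is my assumption from memory of the mod_gearman worker options, so verify it against the sample config shipped with your package. The sample text below is hypothetical, not copied from a real system.

```python
def parse_worker_conf(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments.

    Repeated keys keep only the last value, which is enough for on/off switches.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def embedded_perl_enabled(settings, key="enable_embedded_perl"):
    """Return True if the (assumed) embedded Perl switch is turned on.

    A missing key is treated as off here; that default is an assumption too.
    """
    return settings.get(key, "off").lower() in ("on", "yes", "1")


# Hypothetical excerpt for illustration only.
SAMPLE = """\
# worker settings
enable_embedded_perl=on
use_perl_cache=on
"""

print(embedded_perl_enabled(parse_worker_conf(SAMPLE)))  # True
```

To run it against a real system, read the file contents (for example from /etc/mod_gearman/mod_gearman_worker.conf) and pass them to `parse_worker_conf` instead of `SAMPLE`.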

Hope this helps anyone out there with this type of setup who is using embedded Perl in Nagios Core 3 or XI 2012R*.*, because Nagios 4 doesn't have this function anymore, and that WILL cause you grief and troubleshooting time before you find this info out. 5 weeks later, my system is stable and running better than before.

Cheers,