Nagios 4 Load issues

Wilb
Posts: 9
Joined: Wed Oct 19, 2016 3:19 am

Re: Nagios 4 Load issues

Post by Wilb »

OK, I spoke a little too soon on this. Now that I'm back in front of my machine and have looked further, I realise the default settings for auto_reschedule_checks just result in loads of checks being perpetually rescheduled, so I had checks from before the weekend which had still not executed. I've now lowered auto_rescheduling_window to 45, which has solved this issue and increased the load a little (but this is to be expected).
Screenshot_2016-10-25_13-42-45.png (10.84 KiB) Viewed 1876 times
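For anyone wanting to confirm which auto-reschedule values a config actually carries, here is a minimal sketch that pulls the three related directives out of nagios.cfg-style text. The directive names are the real Nagios Core ones and the window of 45 matches the change described above; the excerpt itself, including the interval of 30, is a hypothetical example rather than the poster's actual config, and read_directives is just an illustrative helper name.

```python
# Sketch: read the auto-reschedule directives out of nagios.cfg-style text.
# SAMPLE_CFG is a hypothetical excerpt, not the poster's real configuration.

DIRECTIVES = (
    "auto_reschedule_checks",
    "auto_rescheduling_interval",
    "auto_rescheduling_window",
)


def read_directives(cfg_text, names=DIRECTIVES):
    """Return {name: value} for the requested directives.

    Comment lines and blank lines are skipped; if a directive appears
    more than once, this helper keeps the last value it sees.
    """
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in names:
            found[key] = value.strip()
    return found


SAMPLE_CFG = """\
# hypothetical excerpt
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
"""

settings = read_directives(SAMPLE_CFG)
for name in DIRECTIVES:
    print(name, "=", settings.get(name, "(not set)"))
```

To run it against a real install, pass in the contents of the main config file instead of SAMPLE_CFG; the file's location varies by installation.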
Also attached are the obfuscated outputs of top and ps that I captured the other day. To be honest, though, I'm not sure there is anything significant in there at all.
Attachments
top_output.txt
(30.78 KiB) Downloaded 141 times
ps_output.txt
(10.49 KiB) Downloaded 133 times
Wilb
Posts: 9
Joined: Wed Oct 19, 2016 3:19 am

Re: Nagios 4 Load issues

Post by Wilb »

As a final additional point for now, looking at the load profile on my 4.2.1 host it does look like those peaks are still there, albeit on a significantly lower scale.
still_peaky.png (10.82 KiB) Viewed 1876 times
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios 4 Load issues

Post by dwhitfield »

I don't know how much time you spent on the old thread, but did you go through our document on tuning? https://assets.nagios.com/downloads/nag ... uning.html
Wilb
Posts: 9
Joined: Wed Oct 19, 2016 3:19 am

Re: Nagios 4 Load issues

Post by Wilb »

I've not done any in-depth tuning yet - but I'm a long-time Nagios user through various versions, and it's the first time I've encountered any hosts with this sort of peaky load pattern that settles back down to nothing. My thinking is that if I were seeing genuine performance issues, I would expect the box to have perpetually large 5 and 15 minute load averages. Having said that, at first glance I believe most of the recommendations on the tuning page are already in place.

The good news is that after a day of running my 4.0.8 servers with auto_reschedule_checks enabled and auto_rescheduling_window set to 45, the load profile of the servers looks much better. I am still seeing slight peaks on a ~7 hour schedule though, which still makes me think there is something happening behind the scenes that may cause problems down the line.
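One way to put a number on that "~7 hour schedule" is to pick the peaks out of logged load samples and look at the gaps between them. The following is only a sketch: the samples are synthetic (a flat baseline with a spike every 7 hours), the threshold of 2.0 is an arbitrary assumption, and peak_times and gaps are illustrative helper names.

```python
# Sketch: find local peaks in (hour, load) samples and the gaps between them.
# The data and the threshold are synthetic, chosen only for illustration.

def peak_times(samples, threshold):
    """Return timestamps where load exceeds threshold and is a local maximum."""
    peaks = []
    for i in range(1, len(samples) - 1):
        t, load = samples[i]
        prev_load = samples[i - 1][1]
        next_load = samples[i + 1][1]
        if load > threshold and load >= prev_load and load > next_load:
            peaks.append(t)
    return peaks


def gaps(times):
    """Differences between consecutive peak timestamps."""
    return [b - a for a, b in zip(times, times[1:])]


# Synthetic hourly samples: baseline 0.5 with a spike every 7 hours.
samples = [(h, 4.0 if h % 7 == 3 else 0.5) for h in range(24)]

found = peak_times(samples, threshold=2.0)
print("peaks at hours:", found)
print("gaps between peaks:", gaps(found))
```

Evenly spaced gaps in real data would point towards something scheduled rather than general overload, which fits the reasoning above about the 5 and 15 minute averages staying low.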
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios 4 Load issues

Post by dwhitfield »

Any chance you could give the ramdisk a shot, even on a test box? A lot of people see significant improvement with that. Before putting the ramdisk in production, please make sure you've tested thoroughly. Setting up the ramdisk is relatively straightforward, but uninstalling it can definitely be a challenge.
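When testing a ramdisk, it helps to verify that a given file really does live on it. On Linux a ramdisk is commonly a tmpfs mount, so one hedged way to check is to match the path against the mount table. The sketch below assumes /proc/mounts-style text; the mount point /var/nagiosramdisk and both file paths are hypothetical, and fs_type_for is an illustrative helper, not part of any Nagios tooling.

```python
# Sketch: report which filesystem type backs a path, given /proc/mounts-style
# text. SAMPLE_MOUNTS and the paths below are hypothetical examples.
import posixpath


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # "/" becomes the prefix "/", which matches every absolute path.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw 0 0
tmpfs /var/nagiosramdisk tmpfs rw,size=256m 0 0
"""

print(fs_type_for("/var/nagiosramdisk/status.dat", SAMPLE_MOUNTS))
print(fs_type_for("/usr/local/nagios/var/status.dat", SAMPLE_MOUNTS))
```

On a real box, the mount table text could come from reading /proc/mounts; which Nagios files are worth relocating is a separate question that this sketch does not answer.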
Wilb
Posts: 9
Joined: Wed Oct 19, 2016 3:19 am

Re: Nagios 4 Load issues

Post by Wilb »

I'll try to make some time to do that. Our infra guys confirmed that the SAN that backs our production instances shows little latency around these spike times, but it's worth a go as we're not on SSDs or anything.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios 4 Load issues

Post by dwhitfield »

I did a little research on our ramdisk scripts, which are unofficial. They are unofficial because they were written by support rather than the dev team, which means they don't get regular testing and updating. So far, I knew all of this.

What I found out is that they currently only work for XI. If you think a dev-supported version of our ramdisk scripts would be a great addition to Core, I encourage you to file a feature request at https://github.com/NagiosEnterprises/na ... issues/new.

All of that aside, let us know how it goes!
Last edited by dwhitfield on Wed Oct 26, 2016 12:17 pm, edited 1 time in total.
Reason: supported vs. supporter