Nagios 4.1.1 and 4.2.3 cause very high loads on server

dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios 4.1.1 and 4.2.3 cause very high loads on server

Post by dwhitfield »

termcap wrote:Considering auto_scheduler is an experimental feature, is it ok if I keep it on for a production box ?
Ultimately, you are going to have to make a judgment call with your setup. We test, but we can't possibly test with every setup out there. The only metric I can give you is that there are not yet any bugs reported against that feature: https://github.com/NagiosEnterprises/na ... pen%20auto

As far as using an offloader, if you are having performance issues, that's probably the next place to go. With ours coming out though, maybe you should wait? Again, that's really a judgment call on your part. I wish I had a date to tell people on when that will be out.

If you want my opinion, with the caveat that I don't know how much pain 4.1.1 is causing you, I'd run 4.2.4 in a test box for a week with auto_scheduler turned on and see if I run into issues. If I had reason to be more paranoid, I'd give it two weeks. Of course, you might not have that much time.

We can always leave this open and see if anyone in the community is using auto_scheduler.

Also, you could always start a thread that specifically asks about opinions on auto_scheduler. I don't know how many people are going to come to a thread on high load to give an opinion on auto_scheduler, but we certainly have community members that contribute on the forums.
termcap
Posts: 27
Joined: Sun Nov 27, 2016 3:09 pm

Re: Nagios 4.1.1 and 4.2.3 cause very high loads on server

Post by termcap »

Thank you for the feedback, dwhitfield. I will certainly wait for the offloader; I understand its release can take some time. In the meantime I will keep auto_reschedule_checks turned on, as that solves my current concern. Do you have any insights on the batching behavior that I described in this post? -> https://support.nagios.com/forum/viewto ... 36#p205298

PS: I am running 4.2.3, not 4.1.1 (I upgraded from 4.1.1 a long time ago).
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios 4.1.1 and 4.2.3 cause very high loads on server

Post by tmcdonald »

termcap wrote: On the other hand, when I set auto_reschedule_checks = 1, the check scheduling behavior changes: rather than seeing bunches of checks like before, I now see a constant stream of checks running with almost negligible quiet times.
Unless I am misreading your post, that seems to be the expected behavior when switching from 0 to 1 with auto_reschedule_checks:

https://assets.nagios.com/downloads/nag ... ule_checks
Former Nagios employee
termcap
Posts: 27
Joined: Sun Nov 27, 2016 3:09 pm

Re: Nagios 4.1.1 and 4.2.3 cause very high loads on server

Post by termcap »

tmcdonald wrote: Unless I am misreading your post, that seems to be the expected behavior when switching from 0 to 1 with auto_reschedule_checks:

https://assets.nagios.com/downloads/nag ... ule_checks
Yes. What I want to call attention to is that when auto_reschedule_checks is set to 0, Nagios executes checks in unequal bunches. I see around 4 bunches with long quiet times between them:

1st bunch ~ 10% checks
2nd bunch ~ 10-12% checks
3rd bunch ~ 10-20% checks
4th bunch ~ 50% - 60% checks


It's the fourth bunch that causes my load to spike.

Furthermore, another casual observation of mine is that checks with the same names appear to be bunched around the same time.
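
One way to put numbers on the bunching described above is to count how many checks are scheduled in each time bucket. The sketch below is only an illustration: it assumes you feed it the text of Nagios's status file (commonly /usr/local/nagios/var/status.dat on source installs) and that scheduled check times appear there as next_check=<epoch> lines. The function names and the 60-second bucket size are hypothetical choices, not anything from this thread.

```python
from collections import Counter


def bucket_next_checks(status_text, bucket_seconds=60):
    """Count scheduled checks per time bucket from status.dat-style text.

    Assumption: each scheduled check shows up as a 'next_check=<epoch>' line.
    """
    counts = Counter()
    for line in status_text.splitlines():
        line = line.strip()
        if not line.startswith("next_check="):
            continue
        value = line.split("=", 1)[1]
        if not value.isdigit() or int(value) == 0:
            continue  # skip malformed or unscheduled (0) entries
        ts = int(value)
        # Round the timestamp down to the start of its bucket.
        counts[ts - ts % bucket_seconds] += 1
    return dict(sorted(counts.items()))


def share_per_bucket(counts):
    """Convert per-bucket counts into fractions of all scheduled checks."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {start: n / total for start, n in counts.items()}
```

If one bucket holds half or more of all checks, that would match the "fourth bunch" pattern; comparing the output with auto_reschedule_checks set to 0 and to 1 should show whether the distribution actually flattens.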
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios 4.1.1 and 4.2.3 cause very high loads on server

Post by dwhitfield »

termcap wrote: Interestingly, if I use the experimental auto_scheduler option then my setup seems to run smoothly, with only a couple of spikes in a 24-hour cycle.
While this was considered experimental in version 3, it is no longer considered so in version 4.

https://assets.nagios.com/downloads/nag ... gmain.html
vs.
https://assets.nagios.com/downloads/nag ... gmain.html

Have you changed any of your settings in auto_rescheduling_interval or auto_rescheduling_window?

Ultimately, if you are not happy with anything you can get from interval/window, we'll need to start looking at logs. My gut says that if interval/window don't help, then with 7000+ checks you'll need to start looking at mod_gearman or wait for our offloader, but there might be a bottleneck somewhere else.
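
To answer the question about the current interval/window settings quickly, a small script can pull the three rescheduling options out of the main config file. This is a minimal sketch, assuming the usual key=value layout of nagios.cfg with # comments; the function name is hypothetical, and the path in the usage note is the common source-install location rather than something confirmed in this thread.

```python
RESCHEDULING_KEYS = (
    "auto_reschedule_checks",
    "auto_rescheduling_interval",
    "auto_rescheduling_window",
)


def read_rescheduling_settings(cfg_text):
    """Return the auto-rescheduling options found in nagios.cfg-style text."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Ignore blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in RESCHEDULING_KEYS:
            found[key] = value  # a later duplicate overrides an earlier one
    return found
```

For example, read_rescheduling_settings(open("/usr/local/nagios/etc/nagios.cfg").read()) would return a dict of whichever of the three options are set explicitly; any key missing from the result is not set in that file.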
termcap
Posts: 27
Joined: Sun Nov 27, 2016 3:09 pm

Re: Nagios 4.1.1 and 4.2.3 cause very high loads on server

Post by termcap »

dwhitfield wrote: ....but there might be a bottleneck somewhere else.
Yes, you may be correct. I do think there is more to it than just Nagios. I will keep the thread updated.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios 4.1.1 and 4.2.3 cause very high loads on server

Post by dwhitfield »

If the archive created by tar -zcvf /tmp/supporttar.tar.gz /usr/local/nagios/var is too large, we can just start with nagios.log in that directory, but we should probably get a look at what's actually going on on the system.