Re: [Nagios-devel] [PATCH] Re: alternative scheduler


On Wed, 2010-12-01 at 15:40 +0100, Fredrik Thulin wrote:
> On Wed, 2010-12-01 at 15:14 +0100, Andreas Ericsson wrote:
> ...
> > > Host checks were still being scheduled, and every time a host check was
> > > found at the front of event_list_low, Nagios would log "We're not
> > > executing host checks right now, so we'll skip this event." and then
> > > sleep for sleep_time seconds (0.25 was my setting, based on (Ubuntu)
> > > defaults) (!!!).
> >
> >
> > This should only happen if you've set a check_interval for hosts but
> > have disabled them globally, either via nagios.cfg or via an external
> > command. It seems weird that we run usleep() instead of just issuing
> > a sched_yield() or something though, which would be a virtual noop
> > unless other processes are waiting to run.
>
> Guilty of setting a check_interval for hosts, even on slave servers,
> yes.

Mea culpa. This sounded so plausible that I confessed right away, but
upon actually looking at my host template (all hosts use this), I don't
see what makes Nagios schedule host checks. I've since tried to speed up
the reaping pass by disabling flap detection, perf_data, event_handler
and notifications on the check slave servers, without any dramatic
improvement. This is what I was running at the time:

define host {
        name                            SU-generic-host
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1

        max_check_attempts              10
        notification_interval           1
        notification_period             24x7
        notification_options            d,u,r

        register                        0
}

> > > I made the attached minimalistic patch to not sleep if the next event in
> > > the event list is already due.
> > >
> >
> > Seems sensible, but I think it can be improved, such as issuing either
> > a sched_yield() or, if sched_yield() is not available, running usleep(10)
> > every 100 skipped items or so. That would avoid pinning the cpu but would
> > still be a lot faster than what we have today.
>
> What is sched_yield? I can't find that function anywhere in the source
> code. Feel free to improve the patch - as I've previously said C isn't
> my game.

Since you haven't responded or elaborated on your enhancement
suggestion, how about applying the patch I sent until someone works up
the incentive to improve it further?
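
To make the suggestion quoted above concrete, here is a minimal sketch in C of how the two ideas might combine: don't sleep while the event at the front of the list is already due, and give up the CPU only once per 100 skipped events. This is not the patch I sent and it is not Nagios code. struct sketch_event, drain_due_events(), relax_cpu() and YIELD_EVERY are hypothetical names made up for illustration, and the array stands in for event_list_low. sched_yield() and nanosleep() are the standard POSIX calls; nanosleep() takes the place of the usleep(10) fallback mentioned above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <sched.h>   /* sched_yield() */
#include <stddef.h>
#include <time.h>    /* time_t, nanosleep() */
#include <unistd.h>  /* _POSIX_PRIORITY_SCHEDULING */

/* Hypothetical stand-in for a timed event; NOT the real Nagios struct. */
struct sketch_event {
    time_t run_time;  /* when the event is due */
    int    skip;      /* nonzero: event would be skipped, e.g. because
                         host checks are disabled globally */
};

/* "every 100 skipped items or so", as suggested in the thread */
#define YIELD_EVERY 100

/* Give other processes a chance to run without a long sleep. */
static void relax_cpu(void)
{
#if defined(_POSIX_PRIORITY_SCHEDULING) && _POSIX_PRIORITY_SCHEDULING > 0
    sched_yield();
#else
    /* Fallback: sleep 10 microseconds, mirroring the usleep(10) idea. */
    struct timespec ts = { 0, 10 * 1000 };
    nanosleep(&ts, NULL);
#endif
}

/*
 * Consume every event at the front of a time-sorted queue that is due
 * at 'now'. No sleep happens between due events; the CPU is yielded
 * once per YIELD_EVERY skipped events. Returns the number of events
 * consumed and stores the number of yields in *yields (if non-NULL).
 */
size_t drain_due_events(const struct sketch_event *queue, size_t len,
                        time_t now, size_t *yields)
{
    size_t i = 0;
    size_t skipped = 0;
    size_t yielded = 0;

    assert(queue != NULL || len == 0);

    while (i < len && queue[i].run_time <= now) {
        if (queue[i].skip) {
            skipped++;
            if (skipped % YIELD_EVERY == 0) {
                relax_cpu();
                yielded++;
            }
        }
        /* A real scheduler would run or reschedule the event here. */
        i++;
    }

    if (yields != NULL)
        *yields = yielded;
    return i;
}
```

With 250 skipped events that are all due, this yields twice instead of sleeping for sleep_time 250 times. Only when the loop returns, because the next event lies in the future, would the caller fall back to its normal sleep.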

> I'll try changing reaping interval to every 2 seconds as per your
> advice, but I guess it will still take 30-40% of the total time.

Tried this. When reaping every 2 seconds, each pass takes ~0.7 seconds
(roughly 35% of the total time, in line with my guess above), and no real
improvement in check latency can be observed.

/Fredrik

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]