Re: [Nagios-devel] [PATCH] Re: alternative scheduler

Post by **Guest** » Fri Dec 03, 2010 9:06 am

On Wed, 2010-12-01 at 15:40 +0100, Fredrik Thulin wrote:
> On Wed, 2010-12-01 at 15:14 +0100, Andreas Ericsson wrote:
> ...
> > > Host checks were still being scheduled, and every time a host check was
> > > found at the front of event_list_low, Nagios would log "We're not
> > > executing host checks right now, so we'll skip this event." and then
> > > sleep for sleep_time seconds (0.25 was my setting, based on (Ubuntu)
> > > defaults) (!!!).
> >
> >
> > This should only happen if you've set a check_interval for hosts but
> > have disabled them globally, either via nagios.cfg or via an external
> > command. It seems weird that we run usleep() instead of just issuing
> > a sched_yield() or something though, which would be a virtual noop
> > unless other processes are waiting to run.
>
> Guilty of setting a check_interval for hosts, even on slave servers,
> yes.

Mea culpa. This sounded so plausible that I confessed right away, but
upon actually looking at my host template (all hosts use this), I don't
see what makes Nagios schedule host checks. I have since tried to tune
the reaping pass by disabling flap detection, perf_data, event_handler
and notifications on the check slave servers, without any dramatic
improvement, but this is what I was running at the time:

define host {
    name                            SU-generic-host
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1

    max_check_attempts              10
    notification_interval           1
    notification_period             24x7
    notification_options            d,u,r

    register                        0
}

> > > I made the attached minimalistic patch to not sleep if the next event in
> > > the event list is already due.
> > >
> >
> > Seems sensible, but I think it can be improved, such as issuing either
> > a sched_yield() or, if sched_yield() is not available, running usleep(10)
> > every 100 skipped items or so. That would avoid pinning the cpu but would
> > still be a lot faster than what we have today.
>
> What is sched_yield? I can't find that function anywhere in the source
> code. Feel free to improve the patch - as I've previously said C isn't
> my game.

Since you haven't responded or elaborated on your enhancement
suggestion, how about applying the patch I sent until someone works up
the incentive to improve it further?
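
For illustration only, the two ideas discussed above (do not sleep when the
next event is already due, and yield the CPU every 100 skipped items or so)
could be sketched roughly like this. This is not the patch that was sent and
not Nagios code: struct sketch_event, sketch_should_sleep() and
sketch_note_skip() are hypothetical names made up for the example, and the
event structure is heavily simplified. Only sched_yield() is a real POSIX
call.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield() (POSIX) */
#include <stddef.h>  /* NULL */
#include <time.h>    /* time_t */

/* Hypothetical, simplified stand-in for an entry in the event list. */
struct sketch_event {
    time_t run_time;  /* when the event is due */
};

/* "every 100 skipped items or so", as suggested in the thread */
#define SKETCH_YIELD_EVERY 100

/*
 * Called after an event has been skipped.
 * Returns 0 if the next event is already due, so the loop should carry on
 * without sleeping; returns 1 if the usual sleep_time sleep is appropriate.
 */
int sketch_should_sleep(const struct sketch_event *next, time_t now)
{
    if (next != NULL && next->run_time <= now)
        return 0;
    return 1;
}

/*
 * Counts a skipped event and yields the CPU once per SKETCH_YIELD_EVERY
 * skips, so a long run of skipped events does not pin the CPU.
 * Returns 1 if it yielded, 0 otherwise. Where sched_yield() is not
 * available, a very short usleep() was the fallback suggested in the thread.
 */
int sketch_note_skip(unsigned int *skipped)
{
    assert(skipped != NULL);
    if (++*skipped % SKETCH_YIELD_EVERY != 0)
        return 0;
    sched_yield();
    return 1;
}
```

In an event loop, sketch_note_skip() would be called each time an event is
skipped, and the sleep would only happen when sketch_should_sleep() returns 1.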

> I'll try changing reaping interval to every 2 seconds as per your
> advice, but I guess it will still take 30-40% of the total time.

Tried this. When reaping every 2 seconds, each pass takes ~0.7 seconds
and no real improvement in check latency can be observed.

/Fredrik

This post was automatically imported from historical nagios-devel mailing list archives