
Re: [Nagios-devel] [PATCH] Re: alternative scheduler

Posted: Fri Dec 03, 2010 11:20 am
by Guest
On Fri, 2010-12-03 at 11:40 +0100, Andreas Ericsson wrote:
> Sorry for the long delay. It seems I was half asleep when I scrolled by
> this mail earlier.

No problem.

> ...
> > What is sched_yield? I can't find that function anywhere in the source
> > code. Feel free to improve the patch - as I've previously said C isn't
> > my game.
> >
>
> sched_yield() causes the kernel to check through its scheduling queue and
> see if there are other processes waiting to run. If there are, those other
> processes will run. If not, the current process will continue running.

As I see it, the Nagios scheduler can't afford to miss the opportunity
to start another check, but I'm not going to protest if you prefer

if (...) {
        sched_yield();
        continue;
}
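
As an illustration of the sched_yield() behaviour described above, here is a minimal C sketch. It is not taken from the patch or from the Nagios source. The names yield_until_ready, ready_fn and countdown_ready are hypothetical and exist only for this example; the only real call is POSIX sched_yield() from <sched.h>.

```c
#include <sched.h>

/* Hypothetical callback type: returns nonzero once there is work to do. */
typedef int (*ready_fn)(void *ctx);

/* Give up the CPU until ready() reports work or max_yields is reached.
 * Returns the number of sched_yield() calls that were made. */
static int yield_until_ready(ready_fn ready, void *ctx, int max_yields)
{
    int yields = 0;

    while (!ready(ctx) && yields < max_yields) {
        sched_yield();  /* let other runnable processes go first, if any */
        yields++;
    }
    return yields;
}

/* Example predicate for illustration only: reports "ready" once the
 * counter pointed to by ctx has been counted down to zero. */
static int countdown_ready(void *ctx)
{
    int *remaining = ctx;

    if (*remaining <= 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

With a counter starting at 3, yield_until_ready(countdown_ready, &n, 100) yields three times before the predicate reports ready. If nothing else is runnable, each sched_yield() returns immediately, so a loop like this still spins rather than sleeps.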

...
> > and with the tiniest C program that appends results to a file as
> > ocsp_command.
> >
>
> Use Nagios' own native perfdata writing instead and use a same-partition
> "mv" command to move the perfdata file to the reaper spool directory.

Thanks for the tip, I'll have a look at that.
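
For reference, the reason a same-partition "mv" is suggested is that it comes down to a single rename() call, which replaces the directory entry atomically, so a reader of the spool directory sees either no file or the complete file. A minimal C sketch of that step follows. The function name move_to_spool is hypothetical and the paths are left to the caller; this is not how Nagios itself does it.

```c
#include <stdio.h>

/* Move src to dst with rename(). On the same filesystem this is atomic;
 * across filesystems rename() fails (EXDEV on POSIX) and "mv" would fall
 * back to copy + delete, which is not atomic.
 * Returns 0 on success, -1 on failure. */
static int move_to_spool(const char *src, const char *dst)
{
    if (rename(src, dst) != 0) {
        perror("rename");
        return -1;
    }
    return 0;
}
```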

> > We should have a beer and talk about scheduling sometime, since we're
> > both in Stockholm (?).
> >
>
> I'm in Gothenburg. We frequently do developer beer things at our office
> here though, so if you happen to come by, we'll crack open a few :)

Thanks for the invite =).

> > My first scheduler ticked once per second and *BAM* started 30+ checks.
> >
> > A lot of the times, a significant number of these checks were exactly
> > the same check (but different target hosts), so my theory is they all
> > requested the very same resources around the same millisecond. When I
> > changed the scheduler to start one check every 50 ms instead, I saw that
> > I could start around 25% more checks every second. Other theories are
> > welcome, but that was my observation.
> >
>
> The problem is the tick-time. I'm guessing you fired the checks and then
> did sleep(1) (or whatever the erlang equivalent is), but that means you
> lose a couple of milliseconds each second (the time it takes to fire up
> the checks), which will inevitably cause you to drift in the scheduler.
> All such sleep()-alike calls are implemented in the kernel with a TICK
> precision that varies from system to system. Most systems have a 10 msec
> tick-rate, so if you start sleeping at 1.94 seconds and sleep for one
> second you'll end up at 2.94 instead of, as a scheduler would wish, at
> 2.0 when checks are actually scheduled.

No, actually not. Erlang is a soft real time system. My approach was to
ask the Erlang VM to send me a tick every N ms (N = 300s * 1000 / number
of checks). So if N is 50, the VM will signal me once every 50 ms, very
precisely and without any drift.
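
The arithmetic above can be sketched in C. This is a simplified model, not the Erlang code and not part of the patch. All function names are hypothetical, and the 3 ms per-tick cost used in the note below is a made-up figure for illustration. For example, 6000 checks over 300 s gives N = 300 * 1000 / 6000 = 50 ms.

```c
#include <stdint.h>

/* Interval between check starts in milliseconds:
 * N = period_s * 1000 / num_checks. Returns 0 if there are no checks. */
static int64_t tick_interval_ms(int64_t period_s, int64_t num_checks)
{
    if (num_checks <= 0)
        return 0;
    return period_s * 1000 / num_checks;
}

/* Absolute time (ms) of tick k, computed from the start time rather than
 * from the previous wake-up, so per-tick delays do not accumulate. */
static int64_t tick_deadline_ms(int64_t start_ms, int64_t interval_ms,
                                int64_t k)
{
    return start_ms + k * interval_ms;
}

/* For contrast: a loop that does work_ms of work and then sleeps for
 * interval_ms fires tick k this late, i.e. it drifts by k * work_ms. */
static int64_t relative_sleep_tick_ms(int64_t start_ms, int64_t interval_ms,
                                      int64_t work_ms, int64_t k)
{
    return start_ms + k * (interval_ms + work_ms);
}
```

With N = 50 ms, tick 20 is due at exactly 1000 ms when deadlines are absolute. A work-then-sleep loop with an assumed 3 ms of work per tick reaches tick 20 at 1060 ms instead. In C, an absolute deadline like this could be handed to clock_nanosleep() with TIMER_ABSTIME.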

I then just had to finish starting another check command in =

> I'll see about adding something similar to your patch to the scheduler.
> It's a good one in spirit, but the implementation left a little to be
> desired.

Thanks! It would really make my life easier if the patch was in the next
Nagios release Ubuntu ships =).

/Fredrik







This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]