Re: [Nagios-devel] [PATCH] Re: alternative scheduler

On Fri, 2010-12-03 at 11:40 +0100, Andreas Ericsson wrote:
> Sorry for the long delay. It seems I was half asleep when I scrolled by
> this mail earlier.

No problem.

> ...
> > What is sched_yield? I can't find that function anywhere in the source
> > code. Feel free to improve the patch - as I've previously said C isn't
> > my game.
> >
>
> sched_yield() causes the kernel to check through its scheduling queue and
> see if there are other processes waiting to run. If there are, those other
> processes will run. If not, the current process will continue running.

As I see it, the Nagios scheduler can't afford to miss the opportunity
to start another check, but I'm not going to protest if you prefer

    if (...) {
        sched_yield();
        continue;
    }
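
For reference, the yield-and-continue pattern above could look roughly like the sketch below. This is not Nagios code: `run_scheduler_loop()`, the `due_fn` callback type and `due_every_other()` are hypothetical names made up for illustration. Only `sched_yield()` is the real POSIX call.

```c
#include <assert.h>
#include <sched.h>    /* sched_yield() */
#include <stdbool.h>

/* Hypothetical callback type: "is a check ready to be started?" */
typedef bool (*due_fn)(void *ctx);

/*
 * Poll max_spins times. When nothing is due, yield the CPU so other
 * runnable processes get a turn, then try again.
 * Returns the number of iterations in which a check was due.
 */
int run_scheduler_loop(due_fn check_is_due, void *ctx, int max_spins)
{
    int started = 0;

    assert(check_is_due != NULL && max_spins >= 0);
    for (int i = 0; i < max_spins; i++) {
        if (!check_is_due(ctx)) {
            sched_yield();
            continue;
        }
        started++;  /* a real scheduler would launch the check here */
    }
    return started;
}

/* Example callback: reports a check as due on every second call. */
bool due_every_other(void *ctx)
{
    int *calls = ctx;
    return (++*calls % 2) == 0;
}
```

If no other process is waiting, `sched_yield()` returns immediately, so a loop like this only gives up the CPU when something else actually wants it.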

...
> > and with the tiniest C program that appends results to a file as
> > ocsp_command.
> >
>
> Use Nagios' own native perfdata writing instead and use a same-partition
> "mv" command to move the perfdata file to the reaper spool directory.

Thanks for the tip, I'll have a look at that.
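
As far as I understand it, the reason for keeping the move on the same partition is that it then boils down to a single rename(2), which is atomic, so the reader never sees a half-written file. A minimal sketch in C, where `move_to_spool()` is a made-up helper name and the paths are whatever the caller passes in:

```c
#include <errno.h>
#include <stdio.h>   /* rename() */

/*
 * Move src to dst with a single rename() call.
 * Returns 0 on success, or the errno value on failure.
 * EXDEV means src and dst are on different filesystems, in which case
 * an atomic move is not possible and the caller has to copy instead.
 */
int move_to_spool(const char *src, const char *dst)
{
    if (rename(src, dst) != 0)
        return errno;
    return 0;
}
```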

> > We should have a beer and talk about scheduling sometime, since we're
> > both in Stockholm (?).
> >
>
> I'm in Gothenburg. We frequently do developer beer things at our office
> here though, so if you happen to come by, we'll crack open a few :)

Thanks for the invite =).

> > My first scheduler ticked once per second and *BAM* started 30+ checks.
> >
> > A lot of the time, a significant number of these checks were exactly
> > the same check (but different target hosts), so my theory is they all
> > requested the very same resources around the same millisecond. When I
> > changed the scheduler to start one check every 50 ms instead, I saw that
> > I could start around 25% more checks every second. Other theories are
> > welcome, but that was my observation.
> >
>
> The problem is the tick-time. I'm guessing you fired the checks and then
> did sleep(1) (or whatever the erlang equivalent is), but that means you
> lose a couple of milliseconds each second (the time it takes to fire up
> the checks), which will inevitably cause you to drift in the scheduler.
> All such sleep()-alike calls are implemented in the kernel with a TICK
> precision that varies from system to system. Most systems have a 10 usec
> tick-rate, so if you start sleeping at 1.94 seconds and sleep for one
> second you'll end up at 2.94 instead of, as a scheduler would wish, at
> 2.0 when checks are actually scheduled.

No, actually not. Erlang is a soft real time system. My approach was to
ask the Erlang VM to send me a tick every N ms (N = 300s * 1000 / number
of checks). So if N is 50, the VM will signal me once every 50 ms, very
precisely and without any drift.
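
For comparison, a similar fixed-interval tick can be sketched in C by sleeping until absolute deadlines instead of sleeping for a relative time. This is not what the Erlang VM does internally and not what the patch does. It is only an illustration, assuming a Linux/POSIX system that provides `clock_nanosleep()`. The helper names (`tick_interval_ms`, `timespec_add_ms`, `run_ticks`, `count_tick`) are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Interval between check starts in ms: period * 1000 / number of checks. */
long tick_interval_ms(long period_s, long num_checks)
{
    assert(num_checks > 0);
    return period_s * 1000L / num_checks;
}

/* Advance an absolute timestamp by ms (>= 0), keeping tv_nsec below 1e9. */
void timespec_add_ms(struct timespec *t, long ms)
{
    t->tv_sec += ms / 1000;
    t->tv_nsec += (ms % 1000) * 1000000L;
    if (t->tv_nsec >= 1000000000L) {
        t->tv_sec += 1;
        t->tv_nsec -= 1000000000L;
    }
}

/*
 * Call on_tick(ctx) count times, once per interval_ms.
 * Each deadline is derived from the previous deadline, not from "now",
 * so the time spent inside on_tick() does not add up as drift.
 * Returns 0 on success or an errno value from the clock calls.
 */
int run_ticks(long interval_ms, int count, void (*on_tick)(void *), void *ctx)
{
    struct timespec next;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return errno;

    for (int i = 0; i < count; i++) {
        int rc;

        timespec_add_ms(&next, interval_ms);
        do {
            /* Sleep until the absolute deadline; retry if interrupted. */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return rc;
        on_tick(ctx);
    }
    return 0;
}

/* Example callback: counts how many ticks were delivered. */
void count_tick(void *ctx)
{
    ++*(int *)ctx;
}
```

With a 300 s period, `tick_interval_ms(300, 6000)` gives the 50 ms interval from the example above (6000 checks is simply the count that makes N come out as 50).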

I then just had to finish starting another check command in [...]

> I'll see about adding something similar to your patch to the scheduler.
> It's a good one in spirit, but the implementation left a little to be
> desired.

Thanks! It would really make my life easier if the patch was in the next
Nagios release Ubuntu ships =).

/Fredrik

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]