Re: [Nagios-devel] [PATCH] Re: alternative scheduler


On Fri, 2010-12-03 at 12:55 +0100, Andreas Ericsson wrote:
...
> > No, actually not. Erlang is a soft real time system. My approach was to
> > ask the Erlang VM to send me a tick every N ms (N = 300s * 1000 / number
> > of checks). So if N is 50, the VM will signal me once every 50 ms, very
> > precisely and without any drift.
> >
>
> If N is constant, it can't be the lvalue of the above expression.

I meant to say that N is calculated when the list of checks is
(re)loaded. As I don't even try to have retry_intervals and such, a
steady tick interval works great as long as I can finish initiating
another service check in between ticks.
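
As a rough illustration of that calculation, here is a minimal Python sketch (not the Erlang code from npers_spawner.erl, which is not shown here). The function names, the 300 s default period and the figure of 6000 checks are assumptions chosen to reproduce the N = 50 example; the second helper shows one common way to keep a periodic tick from drifting, by deriving each deadline from the start time.

```python
def tick_interval_ms(num_checks, period_s=300):
    """Tick interval N in ms so that num_checks starts are spread over period_s.

    Assumption: every check runs once per period (no retry_interval handling).
    """
    if num_checks <= 0:
        raise ValueError("need at least one check")
    return period_s * 1000 / num_checks


def tick_deadlines(start_ms, interval_ms, count):
    """Absolute tick times derived from the start time.

    Each deadline is start + k * interval, rather than "time the previous
    tick was handled + interval", so handling delays do not add up as drift.
    """
    return [start_ms + k * interval_ms for k in range(1, count + 1)]


n = tick_interval_ms(6000)        # 300 s * 1000 / 6000 checks
print(n)                          # 50.0
print(tick_deadlines(0, n, 3))    # [50.0, 100.0, 150.0]
```

N only has to be recomputed when the list of checks changes; between reloads it stays constant.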

Note that I say initiate, not complete - I have more cores that can
finish the job of starting the check.
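
To make the initiate/complete distinction concrete, a small Python sketch under stated assumptions: `run_check` is a hypothetical stand-in for executing a check, and a thread pool plays the role of the other cores. The tick handler only hands the next check to the pool and returns at once; results arrive separately through a callback.

```python
from concurrent.futures import ThreadPoolExecutor


def run_check(name):
    # Hypothetical stand-in for actually running a check command.
    return (name, "OK")


def on_tick(pool, pending, results):
    """Initiate the next pending check and return without waiting for it."""
    if not pending:
        return None
    name = pending.pop(0)
    future = pool.submit(run_check, name)
    # Result handling happens whenever the check finishes, independently
    # of the ticks that start further checks.
    future.add_done_callback(lambda f: results.append(f.result()))
    return future


pending = ["check_ping", "check_disk", "check_load"]
results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    while pending:
        on_tick(pool, pending, results)
# Leaving the with-block waits for the workers to finish.
print(sorted(results))
```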

Applying back pressure to the spawner when there in fact *aren't* enough
system resources to start checks is an interesting topic, and I don't
have any clear ideas about how to do it. My naive implementation was to
never queue tick signals, but rather to skip them if I couldn't finish
processing a tick before additional ticks arrived.
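
The skip-instead-of-queue behaviour can be sketched as a small deterministic simulation in Python. This is an illustration only, not the Erlang implementation; the function name and the handling times are made up.

```python
def ticks_handled(interval_ms, handling_ms, total_ticks):
    """Simulate a spawner that never queues ticks.

    handling_ms[i] is the (hypothetical) time spent on the i-th handled tick.
    A tick that arrives while an earlier one is still being processed is
    dropped. Returns (handled tick numbers, skipped tick numbers).
    """
    handled, skipped = [], []
    busy_until = 0.0
    durations = iter(handling_ms)
    for k in range(1, total_ticks + 1):
        now = k * interval_ms
        if now < busy_until:
            skipped.append(k)      # still busy: drop the tick, do not queue it
            continue
        handled.append(k)
        busy_until = now + next(durations, 0.0)
    return handled, skipped


# 50 ms ticks; the second handled tick takes 120 ms, so ticks 3 and 4 are lost.
print(ticks_handled(50, [10, 120, 10], 6))   # ([1, 2, 5, 6], [3, 4])
```

Skipping keeps a backlog of ticks from building up, but the checks that would have been started on the skipped ticks start later than planned, so it is not real back pressure.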

> > I then just had to finish starting another check command in [...]
> > and go back to sleep. All handling of check results is done completely
> > asynchronous to this starting of new checks.
> >
> > This is all in src/npers_spawner.erl if anyone is interested in the
> > details.
> >
>
> That's still "doing more than you did before", on a system level, so the
> previous implementation must have been buggy somehow. Perhaps erlang
> blocked a few signals when the signal handler was already running, or
> perhaps you didn't start enough checks per tick?

I agree it is more work for the scheduler, but that is better than
having under-utilized additional CPUs/cores, right?

> If the above expression was correct (N is not constant), this algorithm
> makes the cost for running a single check exponential with the number of
> checks to run. Ie, the more checks you have, the more expensive each check
> will become. The curve will converge on (infinity - 1) faster with a larger
> exponent. In this case, the exponent is ticks/sec, so reducing the ticktime
> means you're effectively reducing performance unless there are other
> factors involved that shave enough cycles to make this change disappear
> in the noise.

Sorry, you lost me here. Perhaps I just failed to explain what N was?

/Fredrik







This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]