Re: [Nagios-devel] RFC/RFP Nagios command workers

Posted: Wed Jun 29, 2011 7:51 am
by Guest
On 06/28/2011 05:13 PM, Matthieu Kermagoret wrote:
> Hi list,
>
> First of all, sorry for the delayed response, last month was pretty
> crazy at work :-p
>
> On Mon, May 23, 2011 at 12:38 PM, Andreas Ericsson wrote:
>> On 05/23/2011 11:37 AM, Matthieu Kermagoret wrote:
>> Because shipping an official module that does it would mean not only
>> supporting the old complexity, but also the new one. Having a single
>> default system for running checks would definitely be preferable to
>> supporting multiple ones.
>>
>
> I agree with you when you say that a single system is better than two.
> However, I fear that the worker system would need much more code than a
> simpler system (and less code usually means fewer bugs) and that the
> worker system would destabilize Nagios.

Quite the opposite, really. The amount of backflips we're doing right
now to make sure the core is thread-safe is huge, so it's likely this
patch will even reduce the line count in Nagios.

> For years it's been Nagios'
> development team's policy not to include features that could be
> written as modules. I liked it that way.
>

Everything can be written as modules. The worker process change will have
the nice side effect that modules can register sockets that core Nagios
will listen on for events, with a special callback invoked when there's
data available on the socket. This reduces the complexity of a lot of
modules by a fair bit. With worker processes instead of multiple threads,
it's also trivial to get thread safety right when writing modules, and
potential leaks in worker modules (such as embedded Perl) can be ignored, since
we can just kill the worker process and spawn a new one once it's done
some arbitrary number of checks. This is how Apache handles leaky
modules and we could do far worse than using the world's most popular
webserver as an example.
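
To make the socket registration idea concrete, here is a minimal sketch in Python (Nagios itself is C; Python is used only for brevity). The names `EventLoop`, `register_socket` and `poll_once` are hypothetical and are not part of any Nagios API. The sketch only shows the shape of "register a socket, get a callback when data is available":

```python
import selectors
import socket


class EventLoop:
    """Hypothetical stand-in for the core's socket registry (not a Nagios API)."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def register_socket(self, sock, callback):
        # The callback is stored in the selector's opaque "data" slot.
        self._selector.register(sock, selectors.EVENT_READ, callback)

    def unregister_socket(self, sock):
        self._selector.unregister(sock)

    def poll_once(self, timeout=1.0):
        # Invoke the registered callback for every socket with data waiting.
        handled = 0
        for key, _events in self._selector.select(timeout):
            key.data(key.fileobj)
            handled += 1
        return handled


def demo():
    loop = EventLoop()
    core_end, module_end = socket.socketpair()
    received = []

    def on_data(sock):
        # A module's callback: read whatever arrived on its socket.
        received.append(sock.recv(4096))

    loop.register_socket(core_end, on_data)
    module_end.sendall(b"event from module")
    loop.poll_once()

    loop.unregister_socket(core_end)
    core_end.close()
    module_end.close()
    return received
```

The module never needs its own thread or polling loop in this arrangement; it hands over a socket and a function, and the single loop in the core does the waiting.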

There's also another thing. Mozilla Firefox has been accused of feature
stagnation in the core since they let addon writers handle adding new
features, and far from everybody uses modules. Google Chrome has taken
a fair share of users from Firefox lately, partly because it implements
some of the more popular modules directly in-core. Nagios has also been
accused of feature stagnation, even though broker module development
has flourished in recent years (Nagios with modules is nothing like the
old Nagios without them), so it makes sense to add certain selected
module capabilities to the core.

>>> 1) Remove the multiple fork system to execute a command. The Nagios
>>> Core process directly forks the process that will exec the command
>>> (more or less sh's parsing of command line, don't really know if this
>>> could/should be integrated in the Core).
>>>
>>
>> This really can't be done without using multiple threads since the
>> core can't wait() for input and children while at the same time
>> issuing select() calls to multiplex the new output of currently
>> running checks.
>>
>
> What about a signal handler on SIGCHLD that would wait() for terminated
> processes and a select() on pipe FDs connected to child processes, with
> a timeout to kill non-responding checks?
>

Highly impractical for short-lived children and with so many pipes to
listen to. It would mean we'd be iterating over the entire child stack
several hundred times per second just to read new output. We're forced
to do that, since pipes can't contain an infinite amount of data. The
child's write() call will block when the pipe is full and the children
won't exit while waiting to write. Doing so many select() calls means
the scheduler will suffer greatly, along with modules that wish to run
code in the main thread every now and then.
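
The pipe capacity limit is easy to observe. The following Python sketch (the function name `fill_pipe` is made up for illustration) writes to a pipe that nobody reads until the kernel refuses more data. The write end is set non-blocking so the demonstration returns instead of hanging; with an ordinary blocking descriptor, which is what a check plugin would have, the writer would stall inside write() at the same point until the reader drained the pipe:

```python
import os


def fill_pipe(chunk_size=4096):
    """Write to an unread pipe until it is full; return the bytes accepted."""
    read_fd, write_fd = os.pipe()
    # Non-blocking only so this demo can return; a blocking fd would hang here.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk_size)
    except BlockingIOError:
        # The pipe buffer is full: no more data fits until someone reads.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


if __name__ == "__main__":
    print("pipe accepted", fill_pipe(), "bytes before filling up")
```

The exact capacity depends on the operating system and its configuration (64 KiB is a common default on current Linux), which is why the parent has to keep draining every pipe for as long as its child is running.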

With sockets, we can let each worker handle a smaller number of checks
at a time, and since they have no scheduling responsibilities the
master process is free to just await new input.
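
A rough sketch of that arrangement, again in Python rather than C, might look like the code below. Everything in it is an assumption made for illustration and not the protocol of the actual patch: the one-command-per-line request format, the JSON reply, the `MAX_CHECKS` threshold, and the function names. The worker runs checks sent to it over a socket pair and exits after a fixed number of them, so anything it leaked disappears with the process:

```python
import json
import os
import socket
import subprocess

MAX_CHECKS = 3  # hypothetical recycle threshold, not a Nagios setting


def worker_loop(sock, max_checks):
    """Run one shell command per request line; reply with one JSON line."""
    stream = sock.makefile("rw")
    for _ in range(max_checks):
        line = stream.readline()
        if not line:
            break  # master closed its end of the socket
        result = subprocess.run(line.strip(), shell=True,
                                capture_output=True, text=True)
        reply = {"rc": result.returncode, "out": result.stdout}
        stream.write(json.dumps(reply) + "\n")
        stream.flush()


def spawn_worker(max_checks=MAX_CHECKS):
    """Fork a worker connected to the caller through a socket pair."""
    master_end, worker_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: serve requests, then exit so leaked memory is reclaimed.
        master_end.close()
        try:
            worker_loop(worker_end, max_checks)
        finally:
            os._exit(0)
    worker_end.close()
    return pid, master_end


def demo():
    pid, sock = spawn_worker(max_checks=2)
    stream = sock.makefile("rw")
    results = []
    for command in ("echo OK", "exit 2"):
        stream.write(command + "\n")
        stream.flush()
        results.append(json.loads(stream.readline()))
    # The worker has reached its limit and exits; a real master would
    # spawn a replacement at this point.
    _, status = os.waitpid(pid, 0)
    stream.close()
    sock.close()
    return results, os.WEXITSTATUS(status)
```

The master here only ever talks to one socket per worker, however many checks that worker runs, which is the point of the design: the number of descriptors the master multiplexes scales with the number of workers, not with the number of running checks.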

>>> 2) The root process and the subprocess are connected with a pipe() so
>>> that the command output can be fetched by reading the pipe. Nagios
>>> will maintain a list of currently running commands.
>>>
>>
>> Pipes are limited in th

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]