On 08/17/2011 01:26 AM, Adam Augustine wrote:
> On Mon, Aug 15, 2011 at 7:25 AM, Andreas Ericsson wrote:
>
>> On 08/09/2011 09:13 PM, Adam Augustine wrote:
>>>
>>> But in spite of that, it seems that moving the reaper code into a thread
>>> would be generically useful for Nagios. I know it has been discussed on
>>> this list in the past.
>>>
>>
>> It would also cause a bunch of problems. What we're working on instead is
>> implementing worker processes which communicate with a master process via
>> a unix socket. One such process could act as a (mostly dormant) reaper for
>> the checkresult files in the spool directory.
>>
>>
> Ah, it seems the scope of the worker process socket effort is much larger
> than I had expected.
Indeed it is.
> Does this mean that modules that were initially NEBs
> can instead be implemented as wholly independent processes, communicating
> back over that socket (presumably more than just a unix domain socket, but
> also a network socket as well)?
>
No. If that were the case, Nagios core would be dependent on the latency
of a different process to allow eventbroker modules to be able to block
checks, notifications and whatnot.
>
>>> If the Merlin reaper thread is wholly contained within the Merlin NEB
>>> (as it appears to be) and is not in any way patching the Nagios core code,
>>> then my question is, how is that working without conflicting with the main
>>> event loop reaper code?
>>
>> Mainly by making all the Nagios APIs the broker module uses threadsafe.
>> That's why Merlin needs Nagios 3.3.1 or one of the post-3.2.3 versions made
>> available through git.op5.org
>>
>>
> Ah, so there are modifications necessary to pre-3.3.1 versions of Nagios to
> override the reaping process.
You're not listening. There are several *different* reapers running at
the same time. The changes for Merlin's reaping have absolutely nothing
at all to do with Nagios' reaping of checkresults and external commands.
The changes in Nagios core for the benefit of Merlin were simply to make
macros obtainable from several threads at the same time, and then to use
those new threadsafe macro fetching APIs from other APIs, making those
other APIs safe to use from multiple threads as well.
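
To make the idea of threadsafe macro fetching concrete, here is a minimal
sketch of a reentrant expansion function. It is not the Nagios code: the
names `macro_ctx` and `expand_macros_r` are hypothetical, and it only
handles a single `$HOSTNAME$` token. The point it illustrates is that all
state is passed in by the caller instead of living in process-wide
globals, so several threads can expand macros at the same time as long as
each one brings its own context and output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-caller macro context; not the real Nagios struct. */
struct macro_ctx {
    const char *host_name;
};

/*
 * Reentrant expansion: reads only ctx and in, writes only out.
 * No global or static mutable state is touched, so concurrent calls
 * with distinct out buffers cannot interfere with each other.
 * Returns 0 on success, -1 if out is too small.
 */
int expand_macros_r(const struct macro_ctx *ctx, const char *in,
                    char *out, size_t outlen)
{
    static const char token[] = "$HOSTNAME$"; /* read-only, safe to share */
    const size_t toklen = sizeof(token) - 1;
    size_t used = 0;

    assert(ctx != NULL && in != NULL && out != NULL);

    while (*in) {
        const char *piece = in; /* default: copy one literal character */
        size_t len = 1;

        if (strncmp(in, token, toklen) == 0) {
            piece = ctx->host_name ? ctx->host_name : "";
            len = strlen(piece);
            in += toklen;
        } else {
            in++;
        }
        if (used + len >= outlen) /* keep room for the terminating NUL */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    if (outlen == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

A version that wrote its result into a shared global buffer would have the
same logic but could not be called safely from two threads at once, which
is the class of problem described above.
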
> Nagios 3.3.1 now has real (and threadsafe)
> APIs for manipulating internal data structures, where before there weren't
> any. This makes perfect sense to me. The Merlin reaper thread uses the same
> API to update the in-memory data structures that the main event loop reaper
> code would, so no conflicts.
>
Not really. Merlin uses manual but atomic updates of the internal objects.
Atomic operations are inherently threadsafe. The changes were solely so
that external Merlin events triggering status updates, log messages and
other things that require macro resolution could happen in tandem with
Nagios' normal operations without either of them causing a NULL-read
segmentation violation.
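
As an illustration of why an atomic update needs no extra locking, here is
a small sketch using C11 atomics. It is not Merlin's code (how Merlin
performs its updates is not shown in this thread), and `host_status`,
`set_state` and `get_state` are hypothetical names. It only shows the
general idea: a single-field update done as one atomic operation can never
be observed half-written by another thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status object with one atomically updated field. */
struct host_status {
    atomic_int current_state;
};

/*
 * Store the new state in a single atomic step and report whether it
 * differs from the previous value. Another thread calling get_state()
 * sees either the old or the new value, never a mix of the two.
 */
bool set_state(struct host_status *h, int new_state)
{
    assert(h != NULL);
    int old = atomic_exchange(&h->current_state, new_state);
    return old != new_state;
}

/* Atomically read the current state. */
int get_state(struct host_status *h)
{
    assert(h != NULL);
    return atomic_load(&h->current_state);
}
```

Note that this guarantee covers one field at a time. Updating several
fields so that readers see them change together would need something more
than independent atomic stores.
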
>
>>> My quick glance at the NEB callbacks for
>>> EVENT_CHECK_REAPER seems to indicate that there isn't any
>>> NEBERROR_CALLBACKOVERRIDE associated with it. So I am very curious how it
>>> is being handled.
>>>
>>
>> You're talking about two different reapers. They don't interfere with
>> each other at all.
>>
>
> I think I understand now, presuming that the Merlin reaper and the main
> Nagios event loop reaper are both using the new thread safe APIs.
>
They are, more or less.
> But I am still a little confused. You mention above that implementing the
> reaper code as a Nagios thread would cause a lot of problems, but isn't that
> what the Merlin NEB module does?
Yes, it does, but Merlin doesn't load a bunch of modules that it has to
make sure are only called in a threadsafe manner.
> Are you encountering a lot of problems with
> that approach?
Not really. Merlin is running stable in production on well over 500 systems
and has been doing that for the past year or so.
> Or was it specifically the /moving/ the reaper into a thread
> that you thought was a bad idea?
>
For Nagios, yes. It's a terribly bad idea to f
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]