
Re: [Nagios-devel] RFC/PATCH: Handle external service check results

Posted: Fri Apr 13, 2007 3:00 am
by Guest
Stefan Rompf wrote:
> Hi,
>
> like other people on this list, we've been bitten by the problem that nagios
> fork()s subprocesses when service check results arrive via the external
> command pipe. When nagios lags for example due to hostchecks, in most cases
> enough forked processes pile up to bring nagios over its resource limits.
> Even if this doesn't happen, results will be fed in the wrong order.
>
> I've developed the following solution that is quite different to the spool
> directory approach:
>
> -passive service check results are added to passive_check_result_list as
> before. However, for our use case it does not make sense to keep multiple
> results for one service as soon as nagios starts lagging. So we have a
> duplicate detection that keeps only the newest check result per service.
> -Instead of forking subprocesses, a permanently running thread feeds the
> results on passive_check_result_list back via write_svc_message(). So two
> threads of the process talk to each other via a pipe, but I didn't want to
> make my changes too invasive ;-)
> -Instead of polling the command pipe every 0.5 seconds, select() on the file
> descriptor is used now if there are enough external_command_buffer_slots.
> Problem here was that with no writer on the pipe, select() endlessly signaled
> an EOF. Fixed by opening the command pipe R/W.
>
> The patch has been developed on nagios 2.6 and linux, afterwards forward
> ported to current CVS. It seems to work, but needs further testing. Even
> compilation tests on different architectures would be interesting, I'm not
> sure how widespread the tsearch()-API is.
>
> Thoughts?
>
> Stefan
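
For readers following along, the select() point in the quoted message can be sketched roughly as below. This is not the code from Stefan's patch: open_command_fifo() and wait_for_command() are hypothetical names, and the sketch assumes Linux, where opening a FIFO with O_RDWR is permitted (POSIX leaves that case undefined). Because the process holds a write end itself, select() no longer reports EOF when no external writer is connected.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: create the FIFO if needed and open it read/write.
 * Holding our own write end keeps the writer count above zero, so the
 * read side never sees EOF when external writers close the pipe.
 * Assumption: Linux semantics; O_RDWR on a FIFO is undefined in POSIX. */
int open_command_fifo(const char *path)
{
    assert(path != NULL);
    if (mkfifo(path, 0600) != 0 && errno != EEXIST)
        return -1;
    return open(path, O_RDWR | O_NONBLOCK);
}

/* Hypothetical helper: block for at most timeout_ms milliseconds.
 * Returns 1 if data is readable, 0 on timeout, -1 on error. */
int wait_for_command(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* Only one descriptor is watched, so a positive result is 1. */
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}
```

With the FIFO opened this way, an idle pipe simply times out instead of waking the reader in a tight loop.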

Sounds interesting. I'm still leaning towards the spool directory idea,
as it provides some resistance to problems when Nagios isn't running
and/or the external command file pipe fills up.
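
The spool directory design itself is not spelled out in this thread, so the following is only a sketch of the general pattern, under my own assumptions: a submitter writes each result to a temporary file and then rename()s it into place, so a reader scanning the directory never sees a half-written file, and results simply wait on disk while the daemon is down. The function name spool_result(), the leading-dot temp name and the one-line payload are all hypothetical, not an actual Nagios format.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: store one result line as spool_dir/name.
 * The data is first written to a dot-prefixed temp file, then moved
 * into place with rename(), which is atomic within one filesystem.
 * Returns 0 on success, -1 on any error. */
int spool_result(const char *spool_dir, const char *name, const char *line)
{
    char tmp_path[512], final_path[512];
    size_t len;
    int n, fd;

    assert(spool_dir != NULL && name != NULL && line != NULL);

    n = snprintf(tmp_path, sizeof(tmp_path), "%s/.%s.tmp", spool_dir, name);
    if (n < 0 || n >= (int)sizeof(tmp_path))
        return -1;
    n = snprintf(final_path, sizeof(final_path), "%s/%s", spool_dir, name);
    if (n < 0 || n >= (int)sizeof(final_path))
        return -1;

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    len = strlen(line);
    if (write(fd, line, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp_path, final_path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

Unlike a pipe, nothing here blocks or fills up when the consumer is slow; the cost is disk I/O per result.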

One thing to watch out for is the idea of discarding old/duplicate check
results. This isn't always a good thing. Consider security alerts that
come in as passive checks. If you discard all but the newest alert you
could potentially miss some critical information...
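
To make the concern concrete, here is a rough sketch of keep-newest-only duplicate detection built on tsearch(), the API mentioned in the proposal. It is not the code from the patch: struct passive_result, add_result_keep_newest() and find_result() are hypothetical names, and the "host;service" key is an assumption for illustration.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical record; not the structure Nagios actually uses. */
struct passive_result {
    char key[128];      /* assumed key format: "host;service" */
    int return_code;    /* 0 = OK, 1 = WARNING, 2 = CRITICAL */
    time_t check_time;
};

static int compare_results(const void *a, const void *b)
{
    const struct passive_result *ra = a;
    const struct passive_result *rb = b;
    return strcmp(ra->key, rb->key);
}

/* Insert a result, keeping only the newest one per key.
 * Returns 0 if the key was new, 1 if a result for the same key was
 * already pending (so one of the two is dropped), -1 on failure. */
int add_result_keep_newest(void **root, const char *key,
                           int return_code, time_t check_time)
{
    struct passive_result *fresh, **slot;

    assert(root != NULL && key != NULL);
    fresh = calloc(1, sizeof(*fresh));
    if (fresh == NULL)
        return -1;
    strncpy(fresh->key, key, sizeof(fresh->key) - 1);
    fresh->return_code = return_code;
    fresh->check_time = check_time;

    /* tsearch() returns a pointer to the tree's stored item pointer. */
    slot = tsearch(fresh, root, compare_results);
    if (slot == NULL) {
        free(fresh);
        return -1;
    }
    if (*slot == fresh)
        return 0;               /* first result for this key */

    if (check_time >= (*slot)->check_time) {
        (*slot)->return_code = return_code;   /* older result is lost */
        (*slot)->check_time = check_time;
    }
    free(fresh);
    return 1;
}

/* Look up the pending result for a key, or NULL if there is none. */
const struct passive_result *find_result(void **root, const char *key)
{
    struct passive_result probe, **slot;

    memset(&probe, 0, sizeof(probe));
    strncpy(probe.key, key, sizeof(probe.key) - 1);
    slot = tfind(&probe, root, compare_results);
    return slot ? *slot : NULL;
}
```

If a CRITICAL alert for "fw1;IDS alert" arrives and an OK for the same service follows before the first one is processed, only the OK survives, and the alert never reaches the event log or notifications.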



Ethan Galstad,
Nagios Developer
---
Email: [email protected]
Website: http://www.nagios.org





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]