Re: [Nagios-devel] RFC/PATCH: Handle external service check results

Stefan Rompf wrote:
> Hi,
>
> like other people on this list, we've been bitten by the problem that nagios
> fork()s subprocesses when service check results arrive via the external
> command pipe. When nagios lags for example due to hostchecks, in most cases
> enough forked processes pile up to bring nagios over its resource limits.
> Even if this doesn't happen, results will be fed in the wrong order.
>
> I've developed the following solution that is quite different to the spool
> directory approach:
>
> -passive service check results are added to passive_check_result_list as
> before. However, for our use case it does not make sense to keep multiple
> results for one service as soon as nagios starts lagging. So we have a
> duplicate detection that keeps only the newest check result per service.
> -Instead of forking subprocesses, a permanently running thread feeds the
> results on passive_check_result_list back via write_svc_message(). So two
> threads of the process talk to each other via a pipe, but I didn't want to
> make my changes too invasive ;-)
> -Instead of polling the command pipe every 0.5 seconds, select() on the file
> descriptor is used now if there are enough external_command_buffer_slots.
> Problem here was that with no writer on the pipe, select() endlessly signaled
> an EOF. Fixed by opening the command pipe R/W.
>
> The patch has been developed on nagios 2.6 and linux, afterwards forward
> ported to current CVS. It seems to work, but needs further testing. Even
> compilation tests on different architectures would be interesting, I'm not
> sure how widespread the tsearch()-API is.
>
> Thoughts?
>
> Stefan
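
The select() change described in the quoted message can be illustrated with a minimal sketch. This is not code from the patch: the function names open_command_pipe() and wait_for_command() are hypothetical, and it assumes Linux, where opening a FIFO with O_RDWR is supported (POSIX leaves that case undefined). Because the process then holds a write end itself, select() does not keep reporting EOF when no external writer is connected.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: open the command FIFO read/write so that this
 * process also counts as a writer. With a plain O_RDONLY descriptor,
 * select() reports the FIFO as readable (EOF) whenever no writer is
 * connected. Assumes Linux semantics for O_RDWR on a FIFO. */
int open_command_pipe(const char *path)
{
    return open(path, O_RDWR | O_NONBLOCK);
}

/* Hypothetical helper: wait up to timeout_ms for data on fd.
 * Returns 1 if data is readable, 0 on timeout, -1 on error. */
int wait_for_command(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

With this arrangement the caller blocks in select() until a command actually arrives or the timeout expires, instead of waking on a fixed polling interval.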

Sounds interesting. I'm still leaning towards the spool directory idea,
as it provides some resistance to problems when Nagios isn't running
and/or the external command file pipe fills up.

One thing to watch out for is the idea of discarding old/duplicate check
results. This isn't always a good thing. Consider security alerts that
come in as passive checks. If you discard all but the newest alert you
could potentially miss some critical information...
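
To make the trade-off concrete, here is a rough sketch of a keep-only-the-newest policy built on the tsearch() API mentioned in the quoted message. It is not taken from the patch: struct passive_result, store_newest(), find_result() and discarded_count() are hypothetical names, and the fixed-size host and service fields are a simplification.

```c
#include <search.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a passive check result. */
struct passive_result {
    char host[64];
    char service[64];
    int return_code;
    time_t check_time;
};

static void *result_tree = NULL;        /* root used by tsearch()/tfind() */
static unsigned long discarded = 0;     /* results dropped as duplicates */

/* Order results by host name, then by service description. */
static int compare_results(const void *a, const void *b)
{
    const struct passive_result *ra = a, *rb = b;
    int c = strcmp(ra->host, rb->host);
    return c != 0 ? c : strcmp(ra->service, rb->service);
}

/* Store a result, keeping only the newest one per host/service pair.
 * Returns 1 if this is the first result for the service, 0 if it
 * collided with an existing entry, -1 on allocation failure. */
int store_newest(const char *host, const char *service, int rc, time_t when)
{
    struct passive_result *r = calloc(1, sizeof *r);
    struct passive_result *stored;
    void *node;

    if (r == NULL)
        return -1;
    strncpy(r->host, host, sizeof r->host - 1);
    strncpy(r->service, service, sizeof r->service - 1);
    r->return_code = rc;
    r->check_time = when;

    node = tsearch(r, &result_tree, compare_results);
    if (node == NULL) {
        free(r);
        return -1;
    }
    stored = *(struct passive_result **)node;
    if (stored == r)
        return 1;                       /* inserted as a new entry */

    /* Duplicate key: overwrite the stored entry if ours is not older. */
    if (when >= stored->check_time) {
        stored->return_code = rc;
        stored->check_time = when;
    }
    discarded++;                        /* one of the two results is lost */
    free(r);
    return 0;
}

/* Look up the currently stored result, or NULL if there is none. */
const struct passive_result *find_result(const char *host, const char *service)
{
    struct passive_result key = {0};
    void *node;

    strncpy(key.host, host, sizeof key.host - 1);
    strncpy(key.service, service, sizeof key.service - 1);
    node = tfind(&key, &result_tree, compare_results);
    return node ? *(struct passive_result **)node : NULL;
}

unsigned long discarded_count(void)
{
    return discarded;
}
```

If a CRITICAL result for a service is stored at time 100 and an OK result for the same service arrives at time 200 before the first one is processed, only the OK result survives. That is exactly the case where a security alert would be lost.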



Ethan Galstad,
Nagios Developer
---
Email: [email protected]
Website: http://www.nagios.org





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]