> Hello list,
>
> I apologize in advance should this topic have already been raised in the
> past.
>
> We make fairly intensive use of Nagios at our company (around 1700
> machines, for 26000 services), using a cluster of OpenBSD machines.
>
> We do distribution using NSCA (a re-made Ruby implementation of the
> server), and external handler programs to offload sending the packets
> (which leaves to Nagios the sole task of writing results to a named pipe).
>
http://www.op5.org/community/plugin-inv ... cts/merlin
http://git.op5.org/git/?p=nagios/merlin ... ;hb=master
http://git.op5.org/git/?p=nagios/merlin ... ME;hb=HEAD
Make especially sure you read the first paragraph of the README.
> While tuning my configuration and creating several service groups
> (simply for display purposes), I stumbled upon several problems:
>
> 1) An actual bug: Beyond a certain number of members, Nagios simply
> fails to handle service checks for the affected services within its
> child processes, and then reports the failure with a very misleading
> error message: "Warning : Return code 127 was out of bounds. Make sure
> the plugin you're trying to run actually exists" (when the EXACT same
> configuration, minus service groups, works perfectly fine).
>
> I haven't pinpointed the root cause of this one, and I think I have
> simply found a triggering case, but this seems to hint at a deeper
> problem in the check handling. (Additionally, the message associated
> with code 127 should be made more accurate, as I spent several days
> figuring out whether some combination of odd PATH environment
> variables and the like could prevent the execution of my scripts.)
>
> As a temporary fix for my setup, I removed the related servicegroups
> entries, and I am running fine for now, but I am hoping this will be
> fixed in a future version, as this is really more than just a small
> annoyance.
>
Disable environment macros instead. If you're not using that macro on
the command line, your checks will continue to work. It's not a bug in
Nagios as such; environment variables and the command line share the
same memory space, and that space is limited. With your 300k+ list of
servicegroup members, you exhaust that space very quickly, and check
execution fails.
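
To make the size limit concrete, here is a rough sketch of how one might
estimate whether a command line plus environment fits in the space the
kernel allows for exec. The function names are made up for illustration
and are not part of Nagios; the limit would normally come from
sysconf(_SC_ARG_MAX), and the exact accounting differs between kernels,
so treat the result as an approximation only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Approximate bytes that one NULL-terminated string vector (argv or envp)
 * takes in the exec argument area: every string plus its NUL, one pointer
 * per entry, and one pointer for the terminating NULL. */
static size_t vector_bytes(char *const vec[])
{
    size_t total = sizeof(char *);          /* terminating NULL pointer */

    assert(vec != NULL);
    for (size_t i = 0; vec[i] != NULL; i++)
        total += strlen(vec[i]) + 1 + sizeof(char *);
    return total;
}

/* Returns 1 if argv and envp together fit within `limit` bytes, else 0.
 * `limit` is supplied by the caller, e.g. from sysconf(_SC_ARG_MAX). */
static int exec_args_fit(char *const argv[], char *const envp[], size_t limit)
{
    return vector_bytes(argv) + vector_bytes(envp) <= limit;
}
```

With a single environment string of 300,000 bytes and an illustrative
limit of 256 KiB, exec_args_fit() returns 0, which is the situation
described above: the exec cannot succeed no matter what the plugin does.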
>
> 2) A performance problem: The MACRO_SERVICEGROUPMEMBERS code is
> painfully slow and extremely costly in CPU time. The attached
> patch file is my attempt at fixing the most obvious issues:
> - Repetitive malloc/realloc (I initially caught on to this by ktrace-ing
> the processes and realizing Nagios was mapping/unmapping a lot of memory).
> - Repetitive string duplications and length calculations
>
> The above code has been tested for a few hours on a busy Nagios setup
> and performs much faster, as expected. (It reduces several thousand
> malloc/realloc calls to one by initially calculating the memory size to
> be allocated, thus avoiding unneeded system calls and duplication of
> memory areas.)
>
Nice patch. I'll apply it tomorrow when it's my Nagios day. Any chance
you could whip up something similar for HOSTGROUPMEMBERS until then?
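
For readers without the attachment, the measure-first idea described in
the patch can be sketched roughly as below. This is not the patch itself:
join_members() is a hypothetical helper, and the flat array of names is
an assumption standing in for whatever list Nagios actually walks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join `count` strings with commas using exactly one allocation.
 * Returns a malloc()ed string the caller must free(), or NULL on failure. */
static char *join_members(const char *const names[], size_t count)
{
    assert(names != NULL || count == 0);

    /* Pass 1: measure. Room for every name, the separators, and the NUL. */
    size_t total = 1;
    for (size_t i = 0; i < count; i++)
        total += strlen(names[i]) + (i > 0 ? 1 : 0);

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy, advancing a write cursor so the output built so far
     * is never re-scanned (as repeated strcat() calls would do). */
    char *cursor = out;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        if (i > 0)
            *cursor++ = ',';
        memcpy(cursor, names[i], len);
        cursor += len;
    }
    *cursor = '\0';
    return out;
}
```

Called with the names "web01", "HTTP", "web02", "SSH", this yields
"web01,HTTP,web02,SSH" after a single malloc(), where growing the buffer
per member would have needed one realloc() for each entry.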
>
> 3) Which brings me to a feature request: Nagios does not cache the
> generated output of standard macros such as service group members
> (derived from the configuration, and therefore static within any given
> Nagios process), and has to regenerate the list every single time a
> child process is executed and environment macros are set. This is
> extremely time-consuming, and further performance improvements could
> be achieved by caching it.
>
Such a performance increase would come at a fairly costly price though,
since Nagios fork()s each time it runs a check and the memory would
be duplicated to each child. Most of it should be shared on Linux, but
for Solaris, BSD and others it might prevent Nagios from running
altogether, and it would be a complete and utter waste to stash them
if environment variables are turned off and they're neve
...[email truncated]...
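
One possible shape for the caching idea in point 3 is to build the list
lazily, on first request, so nothing is stored for groups whose macro is
never asked for. The sketch below only illustrates that pattern: struct
group and group_member_list() are hypothetical and do not correspond to
Nagios structures, and it does nothing about the memory that is
duplicated across fork().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a service group; not a Nagios structure. */
struct group {
    const char *const *members; /* member names, fixed after config load */
    size_t count;
    char *cached_list;          /* NULL until the list is first requested */
    unsigned builds;            /* number of times the list was generated */
};

/* Return the comma-separated member list, generating it at most once.
 * Returns NULL if allocation fails; the group owns the returned string. */
static const char *group_member_list(struct group *g)
{
    assert(g != NULL);
    if (g->cached_list != NULL)
        return g->cached_list;      /* cache hit: no work, no allocation */

    size_t total = 1;               /* terminating NUL */
    for (size_t i = 0; i < g->count; i++)
        total += strlen(g->members[i]) + (i > 0 ? 1 : 0);

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    char *cursor = out;
    for (size_t i = 0; i < g->count; i++) {
        size_t len = strlen(g->members[i]);
        if (i > 0)
            *cursor++ = ',';
        memcpy(cursor, g->members[i], len);
        cursor += len;
    }
    *cursor = '\0';

    g->cached_list = out;           /* remember the result for later calls */
    g->builds++;
    return out;
}
```

Two consecutive calls on the same group return the same pointer and leave
builds at 1, so the per-check cost after the first request is a single
pointer comparison.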
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]