Re: [Nagios-devel] Problems with many hanging Nagios processes

Hi Ton!

> Here is what we did to resolve.
>
> 1. Edit the include/nagios.h.in
> change
> #define COMMAND_BUFFER_SLOTS 1024
> to
> #define COMMAND_BUFFER_SLOTS 60000
>
> And change
> #define SERVICE_BUFFER_SLOTS 1024
> to
> #define SERVICE_BUFFER_SLOTS 60000
>
>
>
> I was intrigued by this as we have a performance issue, but not with the
> same symptoms. Our problem is that NSCA processes increase when the nagios
> server is under load. They appear to be blocking on writing to the command
> pipe. Switching NSCA to single daemon mitigates the problem (slaves will
> timeout their passive results), but we wanted to know where any slow downs
> could be.

We had the NSCA related performance issues too.
We started writing the results that need to be forwarded to the master into a
file on the slaves.
Then, once every 10 or 15 seconds, we send that file over to the master.
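
A rough sketch of that spooling idea in C, in case it helps anyone. This is not our production code: the function names (spool_append, spool_take_batch) and the file names are made up for illustration. The only part taken from NSCA itself is the line format, which is the tab-separated host, service, return code, plugin output that send_nsca reads on stdin.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: append one passive service result to a local spool
 * file, one result per line, in the tab-separated layout send_nsca expects:
 * host<TAB>service<TAB>return_code<TAB>plugin_output<NEWLINE> */
int spool_append(const char *spool_path, const char *host,
                 const char *service, int return_code, const char *output)
{
    assert(spool_path && host && service && output);

    FILE *fp = fopen(spool_path, "a");
    if (fp == NULL)
        return -1;

    int written = fprintf(fp, "%s\t%s\t%d\t%s\n",
                          host, service, return_code, output);
    if (fclose(fp) != 0 || written < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: move the spool aside so that new results keep
 * accumulating in a fresh file while the batch is being sent.
 * Returns the number of results in the batch, 0 if there is no spool
 * file yet, and -1 on any other error. */
long spool_take_batch(const char *spool_path, const char *batch_path)
{
    assert(spool_path && batch_path);

    if (rename(spool_path, batch_path) != 0)
        return (errno == ENOENT) ? 0 : -1;

    FILE *fp = fopen(batch_path, "r");
    if (fp == NULL)
        return -1;

    long results = 0;
    int c;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            results++;   /* one newline-terminated line per result */
    }
    fclose(fp);
    return results;
}
```

A sender loop would call spool_take_batch every 10 or 15 seconds and, when it returns a positive count, feed the batch file to send_nsca on stdin. The rename keeps the check plugins from ever waiting on the network: they only append to a local file.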



On 12/21/06, Ton Voon wrote:
> Hi Mahesh,
>
>
> On 19 Dec 2006, at 00:42, Mahesh Kunjal wrote:
>
> Here is what we did to resolve.
>
> 1. Edit the include/nagios.h.in
> change
> #define COMMAND_BUFFER_SLOTS 1024
> to
> #define COMMAND_BUFFER_SLOTS 60000
>
> And change
> #define SERVICE_BUFFER_SLOTS 1024
> to
> #define SERVICE_BUFFER_SLOTS 60000
>
>
>
> I was intrigued by this as we have a performance issue, but not with the
> same symptoms. Our problem is that NSCA processes increase when the nagios
> server is under load. They appear to be blocking on writing to the command
> pipe. Switching NSCA to single daemon mitigates the problem (slaves will
> timeout their passive results), but we wanted to know where any slow downs
> could be.
>
> From your findings, we've created a performance statistics patch, attached. This
> collects the maximum and current values for the command and service buffer
> slots, which are then written to status.dat (by default every 10 seconds). What
> I found with a fake slave sending 128 results every 5 seconds was that the
> maximum values were fairly low (under 100), but when I put the server under
> load, the maximum_command_buffer_items shot up to 1969 and the
> maximum_service_buffer_items shot up to 2156 (I had changed the slot counts
> from the defaults to your 60000).
>
> This could show if the buffer is filled at various points or if there is not
> enough data ready for Nagios to process further down the chain.
>
> I'd be interested in figures from other systems.
>
> Warning: the patch is not thread safe, so there are no guarantees that the
> statistic data will not be corrupted (but it should not affect usual Nagios
> operation). Applies onto Nagios 2.5. Tested on Debian with a 2.6 kernel.
>
> Ton
>
> http://www.altinity.com
> T: +44 (0)870 787 9243
> F: +44 (0)845 280 1725
> Skype: tonvoon
>
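
For readers following the quoted discussion, the idea of tracking current and maximum buffer usage can be sketched as a fixed-slot ring buffer with a high-water mark. This is only an illustration under stated assumptions, not the attached patch and not Nagios source: the type and function names (ring_buffer, rb_push, rb_pop) are hypothetical, the slot count is deliberately tiny, and there is no locking, so it is single-threaded only.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative slot count only; the thread discusses 1024 versus 60000. */
#define BUFFER_SLOTS 8

typedef struct {
    void  *slots[BUFFER_SLOTS];
    size_t head;       /* index of the next slot to write */
    size_t tail;       /* index of the next slot to read */
    size_t items;      /* number of items currently queued */
    size_t max_items;  /* highest value items has reached so far */
} ring_buffer;

void rb_init(ring_buffer *rb)
{
    assert(rb != NULL);
    rb->head = rb->tail = rb->items = rb->max_items = 0;
}

/* Queue one item. Returns 0 on success, -1 if every slot is in use,
 * which is the point where a real writer would have to wait or drop. */
int rb_push(ring_buffer *rb, void *item)
{
    assert(rb != NULL);
    if (rb->items == BUFFER_SLOTS)
        return -1;

    rb->slots[rb->head] = item;
    rb->head = (rb->head + 1) % BUFFER_SLOTS;
    rb->items++;

    /* Record the high-water mark so it can be reported later. */
    if (rb->items > rb->max_items)
        rb->max_items = rb->items;
    return 0;
}

/* Dequeue the oldest item, or return NULL if the buffer is empty. */
void *rb_pop(ring_buffer *rb)
{
    assert(rb != NULL);
    if (rb->items == 0)
        return NULL;

    void *item = rb->slots[rb->tail];
    rb->tail = (rb->tail + 1) % BUFFER_SLOTS;
    rb->items--;
    return item;
}
```

If max_items stays far below the slot count, the buffer size is not the bottleneck; if it reaches the slot count, producers are outrunning the consumer at least some of the time.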
This post was automatically imported from historical nagios-devel mailing list archives