Re: [Nagios-devel] Problems with many hanging Nagios processes

Post by **Guest** » Thu Dec 21, 2006 8:47 am

Hi Ton!

>> Here is what we did to resolve it.
>>
>> 1. Edit include/nagios.h.in:
>> change
>> #define COMMAND_BUFFER_SLOTS 1024
>> to
>> #define COMMAND_BUFFER_SLOTS 60000
>>
>> And change
>> #define SERVICE_BUFFER_SLOTS 1024
>> to
>> #define SERVICE_BUFFER_SLOTS 60000
>
> I was intrigued by this as we have a performance issue, but not with the
> same symptoms. Our problem is that the number of NSCA processes increases
> when the Nagios server is under load. They appear to be blocking on writes
> to the command pipe. Switching NSCA to single-daemon mode mitigates the
> problem (slaves will time out their passive results), but we wanted to know
> where any slowdowns could be.
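
The blocking described above is consistent with a fixed-size buffer sitting between the command pipe reader and the rest of Nagios: once every slot is occupied, the reader has nowhere to put the next line and stops reading, so writers such as NSCA eventually block on the pipe. Below is a minimal single-threaded sketch of such a ring buffer. It is an illustration under that assumption, not Nagios's actual implementation; `DEMO_BUFFER_SLOTS`, `ring_buffer`, `ring_push` and `ring_pop` are hypothetical names, with `DEMO_BUFFER_SLOTS` standing in for `COMMAND_BUFFER_SLOTS`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot count standing in for COMMAND_BUFFER_SLOTS. */
#define DEMO_BUFFER_SLOTS 4

/* Simplified, single-threaded ring buffer of command lines. */
typedef struct {
    char *slots[DEMO_BUFFER_SLOTS];
    int head;  /* index of the next slot to fill */
    int tail;  /* index of the next slot to drain */
    int items; /* number of occupied slots */
} ring_buffer;

/* Copy a NUL-terminated string onto the heap (portable strdup). */
static char *copy_line(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Store one line. Returns 0 on success, or -1 if every slot is occupied
 * (or allocation fails); a pipe reader would then have to wait and retry. */
static int ring_push(ring_buffer *rb, const char *line) {
    if (rb->items == DEMO_BUFFER_SLOTS)
        return -1;
    char *copy = copy_line(line);
    if (copy == NULL)
        return -1;
    rb->slots[rb->head] = copy;
    rb->head = (rb->head + 1) % DEMO_BUFFER_SLOTS;
    rb->items++;
    return 0;
}

/* Remove the oldest line. Returns a heap string the caller must free,
 * or NULL if the buffer is empty. */
static char *ring_pop(ring_buffer *rb) {
    if (rb->items == 0)
        return NULL;
    char *line = rb->slots[rb->tail];
    rb->slots[rb->tail] = NULL;
    rb->tail = (rb->tail + 1) % DEMO_BUFFER_SLOTS;
    rb->items--;
    return line;
}
```

With four slots, the fifth `ring_push` fails until `ring_pop` frees a slot. Raising the slot count from 1024 to 60000 does not make draining any faster; it only lets the buffer absorb a much longer burst before the reader stalls.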

We had the NSCA-related performance issues too.
On the slaves, we started writing the results that need to be forwarded to
the master into a file.
Then, once every 10 or 15 seconds, we send that file over to the master.
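
A minimal sketch of the slave-side half of that approach, assuming each result is appended to a local spool file as a standard `PROCESS_SERVICE_CHECK_RESULT` external command line, and that the spool is renamed to a batch file once the flush interval has passed. The function names and file paths are hypothetical, and how the batch file is shipped to the master and fed into its command pipe is not specified above, so it is left out here.

```c
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Append one passive service result to the spool file, formatted as a
 * Nagios PROCESS_SERVICE_CHECK_RESULT external command line.
 * Returns 0 on success, -1 on error. */
static int spool_service_result(const char *spool_path, time_t now,
                                const char *host, const char *service,
                                int return_code, const char *output) {
    FILE *fp = fopen(spool_path, "a");
    if (fp == NULL)
        return -1;
    int rc = fprintf(fp, "[%ld] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
                     (long)now, host, service, return_code, output);
    if (fclose(fp) != 0 || rc < 0)
        return -1;
    return 0;
}

/* If at least interval_secs have passed since the last flush, rename the
 * spool to batch_path so new results start a fresh spool; the batch is then
 * ready to be sent to the master (transport not shown).
 * Returns 1 if rotated, 0 if not due or nothing spooled, -1 on error. */
static int rotate_spool_if_due(const char *spool_path, const char *batch_path,
                               time_t now, time_t *last_flush,
                               int interval_secs) {
    if (difftime(now, *last_flush) < interval_secs)
        return 0;
    *last_flush = now;
    if (rename(spool_path, batch_path) != 0)
        return (errno == ENOENT) ? 0 : -1; /* no results since last flush */
    return 1;
}
```

Because `rename` replaces an existing `batch_path` on POSIX systems, this sketch assumes the previous batch has already been sent before the next rotation; a timestamped batch name would avoid that assumption.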

On 12/21/06, Ton Voon wrote:
> Hi Mahesh,
>
>
> On 19 Dec 2006, at 00:42, Mahesh Kunjal wrote:
>
>> Here is what we did to resolve it.
>>
>> 1. Edit include/nagios.h.in:
>> change
>> #define COMMAND_BUFFER_SLOTS 1024
>> to
>> #define COMMAND_BUFFER_SLOTS 60000
>>
>> And change
>> #define SERVICE_BUFFER_SLOTS 1024
>> to
>> #define SERVICE_BUFFER_SLOTS 60000
>
> I was intrigued by this as we have a performance issue, but not with the
> same symptoms. Our problem is that the number of NSCA processes increases
> when the Nagios server is under load. They appear to be blocking on writes
> to the command pipe. Switching NSCA to single-daemon mode mitigates the
> problem (slaves will time out their passive results), but we wanted to know
> where any slowdowns could be.
>
> From your findings, we've created a performance statistics patch (attached).
> It collects the maximum and current values for the command and service
> buffer slots and writes them to status.dat (by default every 10 seconds).
> With a fake slave sending 128 results every 5 seconds, I found that the
> maximum values were fairly low (under 100), but when I put the server under
> load, maximum_command_buffer_items shot up to 1969 and
> maximum_service_buffer_items shot up to 2156 (I had changed the defaults to
> your 60000).
>
> This could show if the buffer is filled at various points or if there is not
> enough data ready for Nagios to process further down the chain.
>
> I'd be interested in figures from other systems.
>
> Warning: the patch is not thread safe, so there is no guarantee that the
> statistics will not be corrupted (but this should not affect normal Nagios
> operation). Applies to Nagios 2.5. Tested on Debian with a 2.6 kernel.
>
> Ton
>
> http://www.altinity.com
> T: +44 (0)870 787 9243
> F: +44 (0)845 280 1725
> Skype: tonvoon
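
The statistics Ton describes amount to a current count plus a high-water mark per buffer. A minimal sketch of that bookkeeping follows; `buffer_stats`, `stats_on_insert`, `stats_on_remove`, `write_buffer_stats` and the `current_*_buffer_items` key are hypothetical, and only `maximum_command_buffer_items` / `maximum_service_buffer_items` come from the message. The output is key=value lines in the style of status.dat entries, not necessarily the patch's exact format.

```c
#include <stdio.h>

/* Hypothetical per-buffer counters: current fill level and its peak. */
typedef struct {
    int current_items;
    int maximum_items;
} buffer_stats;

/* Call after an item is added to the buffer; updates the high-water mark. */
static void stats_on_insert(buffer_stats *s) {
    s->current_items++;
    if (s->current_items > s->maximum_items)
        s->maximum_items = s->current_items;
}

/* Call after an item is removed from the buffer. */
static void stats_on_remove(buffer_stats *s) {
    if (s->current_items > 0)
        s->current_items--;
}

/* Write the counters as tab-indented key=value lines, e.g. for
 * name "command": current_command_buffer_items / maximum_command_buffer_items.
 * Returns the fprintf result (negative on error). */
static int write_buffer_stats(FILE *out, const char *name,
                              const buffer_stats *s) {
    return fprintf(out,
                   "\tcurrent_%s_buffer_items=%d\n"
                   "\tmaximum_%s_buffer_items=%d\n",
                   name, s->current_items, name, s->maximum_items);
}
```

As the warning above notes, unsynchronized counters can be corrupted if one thread fills the buffer while another drains it; updating them while holding whatever lock already protects the buffer would avoid that, at no extra locking cost.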

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]