Re: [Nagios-devel] Problems with many hanging Nagios processes (Nagios
Posted: Mon Nov 28, 2005 8:10 am
[email protected] wrote:
> Hi everybody,
>
> unfortunately nobody answered to Alex from viveconsulting.co.nz who had a
> problem with "Nagios spawning rogue ..." and mailed to nagios mailing list
> some months ago.
A link to the mail archives would be helpful.
> Right now, we have the same problemn very likely he
> described in a very detailed way. I tried also a lot of different things
> (from configuration changes to tuning issues) to find out the real problem
> and I guess the real bottleneck is the pipe used for communication between
> Nagios processes.
Most likely. It's the only real bottleneck in nagios today, so...
> But I found not many reports e.g. emails about this
> problem in the web and mail archives.
>
> So why am I writing to list? Maybe someone can give me a hint, how to solve
> or workaround that problem? We have 677 services configured and use 350
> RRDs. Our Nagios CMS is a PIII 866 MHz with SCSI RAID 5. The system load is
> a little bit more than 1.00. As long as we stay below 1.00 no problem, but
> otherwise ... (Detailed problem description in Alexs' mail)
>
CMS? Content Management System?
Anyways, 677 services shouldn't be a problem.
> This is just our start with Nagios. We want to configure thousands of
> services and more than 100 hundred hosts. We would also invest in faster
> hardware, dual CPU, 2GB memory and faster SCSI HDDs but is faster hardware
> an option?
It helps, but not very much I'm afraid. The bottleneck requires a kernel
recompile to be solved on most systems, and that's a very bad thing to
do just to fix this particular problem.
> Looking at this issue with the focus on implementation: If the
> pipe is the bottleneck it will stay a bottle neck on faster hardware too.
> But maybe faster hardware will allow us to configure 3000 services, what
> would be enough for the Nagios instance. And then, we deploy another Nagios
> instance ...
>
This is definitely a solution. Otherwise you could keep your eyes open
in the somewhat near future for a mail with
[PATCH] checks: Multiplex running checks.
in the topic. I'm working on it right now, but perhaps Ethan won't let
it in for the 2.x branch since it's a fairly massive change.
--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]
> Hi everybody,
>
> unfortunately nobody answered to Alex from viveconsulting.co.nz who had a
> problem with "Nagios spawning rogue ..." and mailed to nagios mailing list
> some months ago.
A link to the mail archives would be helpful.
> Right now, we have the same problemn very likely he
> described in a very detailed way. I tried also a lot of different things
> (from configuration changes to tuning issues) to find out the real problem
> and I guess the real bottleneck is the pipe used for communication between
> Nagios processes.
Most likely. It's the only real bottleneck in nagios today, so...
> But I found not many reports e.g. emails about this
> problem in the web and mail archives.
>
> So why am I writing to list? Maybe someone can give me a hint, how to solve
> or workaround that problem? We have 677 services configured and use 350
> RRDs. Our Nagios CMS is a PIII 866 MHz with SCSI RAID 5. The system load is
> a little bit more than 1.00. As long as we stay below 1.00 no problem, but
> otherwise ... (Detailed problem description in Alexs' mail)
>
CMS? Content Management System?
Anyways, 677 services shouldn't be a problem.
> This is just our start with Nagios. We want to configure thousands of
> services and more than 100 hundred hosts. We would also invest in faster
> hardware, dual CPU, 2GB memory and faster SCSI HDDs but is faster hardware
> an option?
It helps, but not very much I'm afraid. The bottleneck requires a kernel
recompile to be solved on most systems, and that's a very bad thing to
do just to fix this particular problem.
> Looking at this issue with the focus on implementation: If the
> pipe is the bottleneck it will stay a bottle neck on faster hardware too.
> But maybe faster hardware will allow us to configure 3000 services, what
> would be enough for the Nagios instance. And then, we deploy another Nagios
> instance ...
>
This is definitely a solution. Otherwise you could keep your eyes open
in the somewhat near future for a mail with
[PATCH] checks: Multiplex running checks.
in the topic. I'm working on it right now, but perhaps Ethan won't let
it in for the 2.x branch since it's a fairly massive change.
--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]