[Nagios-devel] Nagios scalability issues

Guest · Post by **Guest** » Mon Jul 19, 2004 11:31 am

Hi Guys,

I've been using Nagios for about two years, and now it seems that it's
kinda approaching its limits.

We use our Nagios server to remotely monitor 400+ hosts with a total of
~2500 services. We have some plugins running on the main monitoring
machine, but most are run on the remote machines by using the NRPE plugin.
Things have been getting slower and slower, and although we upgraded both
the hardware and the software (we're using Nagios 1.2), it's still too
slow.

I'd like to get some suggestions on improving Nagios performance. In my
opinion, the biggest problem with Nagios is the fact that all checks are
scheduled on a single machine - and this makes it not scale well.

After a bit of thought, I decided to start implementing an alternative
monitoring engine - it's a lightweight client-server system that moves the
scheduler part from the main server to the individual machines. This way
each machine schedules its ~10-20 service checks, and reports back to the
server the changes in the service status. And it fixes all of Nagios's
problems - at least all that matter to me: the main server becomes less
loaded, and the network load is also much reduced. The server part of my
application just collects the results from client machines and writes them
to a database, so that adds practically no load to the server machine. I'm
planning on writing another component that would take these results and
send them to Nagios - but I have a few questions. What's the best way to
send check results to Nagios? Will the external command interface work?
I'm also interested in the scalability of this feature.
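
To make the question concrete, here is a minimal sketch of what such a relay
component might look like, assuming results are submitted as passive service
checks through the external command file using the
PROCESS_SERVICE_CHECK_RESULT command. The command file path, the host and
service names, and the helper functions are all hypothetical, and it assumes
external commands and passive service checks are enabled in the Nagios
configuration. It is not a statement about how Nagios behaves at scale.

```python
import time

# Hypothetical path; the real location is whatever command_file is set to
# in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    # Each command occupies exactly one line, so flatten the plugin output.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def changed_states(previous, current):
    """Return only the (host, service) entries whose return code changed.

    Both arguments map (host, service) -> return code. This mirrors the idea
    of reporting status changes rather than every check result.
    """
    return {key: code for key, code in current.items()
            if previous.get(key) != code}


def submit(lines, command_file=COMMAND_FILE):
    """Write the prepared command lines to the external command file."""
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)
```

A relay would call changed_states() on each batch read from the database,
turn every changed entry into a line with format_passive_result(), and pass
the lines to submit() in one go.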

I've also noticed that we are getting pretty close to Nagios 2.0 (maybe
we'll have it in another year or two). So I'd like to ask the
developers if they are planning to implement something similar in Nagios
or in the NRPE or NSCA plugins by the time 2.0 is launched - so that I
know my work is not in vain.

Thanks,
Cristian Streng.


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]