Re: [Nagios-devel] A different way?

Posted: Wed Oct 14, 2009 8:04 am
by Guest
On 10/12/2009 09:28 PM, Gaspar, Carson wrote:
> Apologies for replying to this thread rather late, but I figured I
> should speak up, as someone who has implemented a distributed design.
> More apologies for hellish Outlook quoting, which I have attempted to
> make legible :-(
>

I just rewrapped everything now. The lines were somewhat in excess of
400 characters. Sorry about that, but I couldn't read the mail while
editing the reply otherwise.

> -----Original Message----- From: Andreas Ericsson [mailto:[email protected]]
>
>> On 09/25/2009 01:05 AM, Steven D. Morrey wrote:
>>> The checks are already executing on the local machine, so how
>>> about a daemon on each machine, the daemon would keep the
>>> schedule and execute service checks locally, processing the
>>> result and returning the results and the required actions (based
>>> on a local policy) to nagios which would then do the actual work
>>> of handling notifications etc and so forth. This way nagios could
>>> be an auditor, if it doesn't receive a result on time as
>>> expected, then it could query the daemon to see what's gone wrong,
>>> if that fails then it could initiate a host check, etc.
>
> I see 2 or 3 major differences between your proposal and the current
> passive schemes:
> - Nagios can more easily poke "lost" systems (you can do this now
>   with UNKNOWN and some clever notification & escalation configs, or
>   possibly with obsessing, but it's far more obscure and convoluted)
> - If I understand you, you're also proposing pushing the flap
>   detection logic (and possibly more, but determining what else has
>   no off-host dependencies is difficult - dependency checks would
>   need to be central, for example)
> - It would be possible for Nagios to act as a configuration
>   management system for the monitoring config of the remote nodes,
>   instead of requiring some outboard system
>

I don't know which one of these attributes you see in which solution,
so I really can't comment on them.
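
For readers following the thread, the "auditor" idea in Steven's quoted
proposal (wait for a result, query the daemon if it is late, fall back
to a host check if the daemon does not answer) could be sketched
roughly as below. This is only an illustration, not code from Nagios or
Merlin: the names ExpectedResult, audit, query_daemon and the grace
value are all hypothetical.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    ON_TIME = "on_time"                  # result arrived when expected
    DAEMON_REPORTED = "daemon_reported"  # late, but the agent answered a query
    HOST_CHECK_NEEDED = "host_check"     # late and the agent is silent


@dataclass
class ExpectedResult:
    host: str
    service: str
    interval: float      # seconds between results the agent should send
    last_seen: float     # timestamp of the last result received
    grace: float = 30.0  # hypothetical slack before a result counts as late


def audit(entry: ExpectedResult,
          query_daemon: Callable[[str], bool],
          now: Optional[float] = None) -> Action:
    """Decide what the central server should do about one expected result.

    query_daemon is a caller-supplied (hypothetical) function that returns
    True if the agent on the given host responded.
    """
    now = time.time() if now is None else now
    if now - entry.last_seen <= entry.interval + entry.grace:
        return Action.ON_TIME
    if query_daemon(entry.host):
        return Action.DAEMON_REPORTED
    return Action.HOST_CHECK_NEEDED
```

The central server would run something like audit() over every expected
result on a timer; only the HOST_CHECK_NEEDED case requires it to
execute a check itself.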

>> Nagios still needs to retain the ability to execute checks on its
>> own, or it won't be able to monitor things like routers and
>> switches.
>
> No, it doesn't. You can monitor those things via plug-ins that run on
> worker nodes. This is _especially_ important for things like latency
> monitoring, where you may want your probe point to be a different
> place on the network than your Nagios server.
>

You quoted this out of context. The paragraph just above it is really
the important one. Since each agent-daemon would act as a very small
Nagios daemon, "Nagios" in this sense can be any of the multitudes of
Nagios daemons. The missing paragraph was this, btw:

"I'm all for it, provided network checks can still be done from afar
and I don't have to fiddle with a lot of configuration to figure out
which ones are which. That's where this all breaks down though."

>> The two important savings can be had anyway by simply adding more
>> systems, and that doesn't involve modifying the monitored systems
>> at all (unless one wants to install a local agent to get more
>> detailed monitoring data, of course). Networks that are large enough
>> to require multiple Nagios servers are almost invariably owned by
>> large corporations which have no qualms at all about paying an
>> additional $5,000 for a new server, but often have policies and
>> laws regulating what kind of software they're allowed to run on
>> their systems.
>
>> I think we'll gain very, very little by moving down this road.
>> Should we decide, at some point in the future, that it's a good
>> thing to do, I'm sure the Merlin protocol can be (ab)used to make
>> such a daemon workable though.
>
> Speaking as someone that actually works at one of those "large
> corporations" (and has worked at several others), you're smoking
> crack. We care deeply about bad scaling, and are not willing to buy
> 100 servers (not an exaggeration for 2.x, probably more like 20-40
> servers for 3.x) to fix bad code design. If I hadn't written a
> passive check framework, we would never have been able to deploy
> Nagios.
>

Let's say 30 servers f

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]