Re: [Nagios-devel] A different way?

On 10/12/2009 09:28 PM, Gaspar, Carson wrote:
> Apologies for replying to this thread rather late, but I figured I
> should speak up, as someone who has implemented a distributed design.
> More apologies for hellish Outlook quoting, which I have attempted to
> make legible :-(
>

I just rewrapped everything now. The lines were somewhat in excess of
400 characters. Sorry about that, but I couldn't read the mail while
editing the reply otherwise.

> -----Original Message----- From: Andreas Ericsson [mailto:[email protected]]
>
>> On 09/25/2009 01:05 AM, Steven D. Morrey wrote:
>>> The checks are already executing on the local machine, so how
>>> about a daemon on each machine, the daemon would keep the
>>> schedule and execute service checks locally, processing the
>>> result and returning the results and the required actions (based
>>> on a local policy) to nagios which would then do the actual work
>>> of handling notifications etc and so forth. This way nagios could
>>> be an auditor, if it doesn't receive a result on time as
>>> expected, then it could query the daemon to see what's gone wrong,
>>> if that fails then it could initiate a host check, etc.
>
> I see 2 or 3 major differences between your proposal and the current
> passive schemes:
> - Nagios can more easily poke "lost" systems (you can do this now
>   with UNKNOWN and some clever notification & escalation configs, or
>   possibly with obsessing, but it's far more obscure and convoluted)
> - If I understand you, you're also proposing pushing the flap
>   detection logic (and possibly more, but determining what else has
>   no off-host dependencies is difficult - dependency checks would
>   need to be central, for example)
> - It would be possible for Nagios to act as a configuration
>   management system for the monitoring config of the remote nodes,
>   instead of requiring some outboard system
>

I don't know which one of these attributes you see in which solution,
so I really can't comment on them.
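
For readers following the thread, the "auditor" role from Steven's
proposal quoted above (a central Nagios that expects results on a
schedule, queries the agent when one is late, and falls back to a host
check) could look roughly like the sketch below. It is only an
illustration under stated assumptions: `ExpectedCheck`, `audit`, the
`grace` margin and the `query_agent` callback are hypothetical names,
not part of Nagios or Merlin. Nagios itself already offers a related
mechanism for passive checks through the `check_freshness` and
`freshness_threshold` directives.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExpectedCheck:
    """A result the central server expects from a remote agent (hypothetical type)."""
    host: str
    service: str
    interval: float       # seconds between expected results
    last_result: float    # timestamp of the last result received
    grace: float = 30.0   # assumed slack before a result counts as late


def is_stale(check: ExpectedCheck, now: float) -> bool:
    """True when no result has arrived within interval + grace."""
    return now - check.last_result > check.interval + check.grace


def audit(check: ExpectedCheck, now: float,
          query_agent: Callable[[str], bool]) -> str:
    """Decide what the central server should do for one expected result.

    query_agent is a placeholder for whatever transport reaches the
    remote daemon; it returns True if the daemon answered.
    """
    if not is_stale(check, now):
        return "ok"
    if query_agent(check.host):
        # The daemon is reachable, so only the result is late.
        return "agent_alive_result_late"
    # The daemon did not answer: escalate to a host check.
    return "run_host_check"
```

The split between `is_stale` and `audit` keeps the timing rule separate
from the escalation order, so the escalation steps can change without
touching how lateness is measured.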

>> Nagios still needs to retain the ability to execute checks on its
>> own, or it won't be able to monitor things like routers and
>> switches.
>
> No, it doesn't. You can monitor those things via plug-ins that run on
> worker nodes. This is _especially_ important for things like latency
> monitoring, where you may want your probe point to be a different
> place on the network than your Nagios server.
>

You quoted this out of context. The paragraph just above it is really
the important one. Since each agent-daemon would act as a very small
Nagios daemon, "Nagios" in this sense can be any of the multitudes of
Nagios daemons. The missing paragraph was this, btw:

"I'm all for it, provided network checks can still be done from afar
and I don't have to fiddle with a lot of configuration to figure out
which ones are which. That's where this all breaks down though."

>> The two important savings can be had anyway by simply adding more
>> systems, and that doesn't involve modifying the monitored systems
>> at all (unless one wants to install a local agent to get more
>> detailed monitoring data, of course). Networks that are large enough
>> to require multiple Nagios servers are almost invariably owned by
>> large corporations which have no qualms at all about paying an
>> additional $5.000 for a new server, but often have policies and
>> laws regulating what kind of software they're allowed to run on
>> their systems.
>
>> I think we'll gain very, very little by moving down this road.
>> Should we decide, at some point in the future, that it's a good
>> thing to do, I'm sure the Merlin protocol can be (ab)used to make
>> such a daemon workable though.
>
> Speaking as someone that actually works at one of those "large
> corporations" (and has worked at several others), you're smoking
> crack. We care deeply about bad scaling, and are not willing to buy
> 100 servers (not an exaggeration for 2.x, probably more like 20-40
> servers for 3.x) to fix bad code design. If I hadn't written a
> passive check framework, we would never have been able to deploy
> Nagios.
>

Let's say 30 servers f

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]