Re: [Nagios-devel] Patches for improved NEB control


Wow. We got a lot of responses.

We are going to take a few days to write up some documentation and we will
post that to the list.

One thing we noticed in the e-mails is that there is some confusion
between this and NDO. DNX does not replace NDO. NDO is still required for
multiple data centers.

Bob


> This is a very nice patch indeed. It doesn't break anything that's
> working now, but lets module-authors get more power over how nagios
> executes checks. It's also relatively small and non-intrusive and, as a
> side-effect, it makes it possible to write plugins as modules. Overall,
> I like it.
>
> Some questions though, inlined below. Oh, and I would very much like to
> see the module. :)
>
>
> [email protected] wrote:
>> Attached is a patch-set I would like some feedback on.
>>
>> The purpose of this patch is to allow Nagios the ability to delegate the
>> execution of service checks to a NEB module.
>>
>> Why would we want to do this? I'm glad you asked...
>>
>> The point is to allow Nagios to scale efficiently in large-scale
>> environments by delegating service checks to multi-node "check"
>> clusters. That is, it facilitates the creation of a Nagios Service
>> Check Cluster (or multiple independent clusters) that can be deployed
>> in either one location or multiple locations.
>>
>> The benefits are:
>>
>> 1. It de-couples Service Check execution from Scheduling on the same
>> box. Sure, you can do this by setting up multiple Nagios instances
>> that report their results passively back up to the "master" Nagios
>> box, but that requires manually splitting up your configuration among
>> multiple Nagios instances, setting up all of the passive result
>> reporting, etc.
>>
>> In this scenario, you can keep your centrally-located master
>> configuration file and have the service checks distributed to
>> light-weight, geographically-dispersed service check clusters.
>>
>
> How does the module determine which node checks what?
> How is configuration distributed?
>
>> 2. Scalability. You can support more simultaneous service checks by
>> adding more light-weight service check nodes incrementally.
>>
>
> Do you have to restart the "master" nagios in order for this to work, or
> will they be picked up as one goes along?
> If "picked up as one goes along", how does handshake and authentication
> work?
>
>> You can start with zero external nodes (i.e., all checks still executed
>> by Nagios internally). Then add one node as your service check count
>> increases. Then gradually (or quickly) increase the node count, locally
>> or remotely, as your service check count grows, and the system will
>> scale appropriately.
>>
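To make the "start with zero nodes, then grow" idea concrete, here is a toy Python model of the dispatch decision. It is only a sketch: the thread does not say how the module chooses a node, so round-robin is assumed purely for illustration, and `CheckDispatcher`, `add_node` and `pick` are hypothetical names, not part of the patch or of Nagios.

```python
from itertools import count


class CheckDispatcher:
    """Toy model: decide where each service check should run."""

    def __init__(self, nodes=None):
        self.nodes = list(nodes or [])   # external worker nodes, may be empty
        self._counter = count()          # advances once per delegated check

    def add_node(self, node):
        """Register another worker node while the dispatcher is running."""
        self.nodes.append(node)

    def pick(self, check_name):
        """Return the node for this check, or "local" if there are no nodes."""
        if not self.nodes:
            # Zero external nodes: the check stays with Nagios itself.
            return "local"
        # Assumed policy: plain round-robin over the current node list.
        return self.nodes[next(self._counter) % len(self.nodes)]


dispatcher = CheckDispatcher()
print(dispatcher.pick("check_http"))   # no nodes yet, so it runs locally
dispatcher.add_node("worker-1")
print(dispatcher.pick("check_ping"))   # now delegated to the only node
```

A real module would also have to deal with nodes that disappear or stop answering, which this model ignores.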
>> Anyway, it's not the ultimate, end-all, be-all, but we have found it
>> helps us scale and manage Nagios efficiently in our large-scale,
>> multi-datacenter environment. The hope is that this will be considered
>> as a potential part of the new Nagios architecture some day.
>>
>> For those who want to know how Nagios actually delegates service check
>> execution to an external cluster via a NEB module, here are the
>> high-level details:
>>
>> We have written a multi-threaded NEB module that registers a
>> NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
>> NEBTYPE_SERVICECHECK_INITIATE event.
>>
>> It then takes each service check and distributes it across the network
>> to multiple "worker" nodes in a cluster (via XML-RPC). It also takes
>> care of processing the check results, posting them to the internal
>> Nagios result queue, handling plugin timeout conditions, etc.
>>
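The flow described above (filter on the initiate event, hand the check to a worker, post the result to a queue, cope with timeouts) can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the module itself: a real NEB module is written in C against the Nagios headers, the event labels below are stand-in strings rather than the real constants, `remote_check` is a hypothetical placeholder for the XML-RPC call, and treating a timeout as state 2 (CRITICAL) is an assumption.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Stand-in labels; the real NEBTYPE_* values are C constants in Nagios.
NEBTYPE_SERVICECHECK_INITIATE = "initiate"
NEBTYPE_SERVICECHECK_PROCESSED = "processed"

result_queue = queue.Queue()               # models the internal result queue
pool = ThreadPoolExecutor(max_workers=4)   # models the module's threads


def remote_check(command, timeout):
    """Hypothetical stand-in for the XML-RPC call to a worker node."""
    return 0, f"OK - ran {command}"


def _run_and_post(check, run_remote, timeout):
    """Run one check on a worker and post its result to the queue."""
    try:
        code, output = run_remote(check["command"], timeout)
    except TimeoutError:
        # Assumed mapping: a timed-out check is reported as CRITICAL (2).
        code, output = 2, "(check timed out)"
    result_queue.put((check["service"], code, output))


def on_service_check_event(event_type, check,
                           run_remote=remote_check, timeout=5.0):
    """Model of the callback: take over only checks being initiated."""
    if event_type != NEBTYPE_SERVICECHECK_INITIATE:
        return False   # not our event, leave it alone
    # Hand off to a worker thread and return at once, so the caller
    # (Nagios, in the real module) is never blocked by a slow check.
    pool.submit(_run_and_post, check, run_remote, timeout)
    return True        # the module has taken responsibility for the check
```

How the real patch tells Nagios that the module has taken over a check is not spelled out in this thread; the boolean return value here only stands in for that signal.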
>
> Does this go through the FIFO pipe? If so, I'm afraid it doesn't solve
> the biggest issue in scaling Nagios to large networks.
>
> --
> Andreas Ericsson [email protected]
> OP5 AB www.op5.se
> Tel: +46 8-230225 Fax: +46 8-230231
>

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]