Re: [Nagios-devel] Patches for improved NEB control

Post by **Guest** » Thu Oct 26, 2006 7:43 am

Hi Bob,

this sounds really good for "advanced distributed monitoring".

Perhaps you could write a little more about what to do, if possible, if
you want to include this in failover monitoring for redundancy purposes?

Hendrik

[email protected] wrote:
> Attached is a patch-set I would like some feedback on.
>
> The purpose of this patch is to allow Nagios the ability to delegate the
> execution of service checks to a NEB module.
>
> Why would we want to do this? I'm glad you asked...
>
> The point is to allow Nagios to scale efficiently in large-scale
> environments by delegating service checks to multi-node "check" clusters.
> That is, it facilitates the creation of a Nagios Service Check Cluster (or
> multiple independent clusters) that can be deployed in either one
> location or multiple locations.
>
> The benefits are:
>
> 1. It decouples service check execution from scheduling, so the two no
> longer have to run on the same box. Sure, you can do this by setting up
> multiple Nagios instances that report their results passively back to the
> "master" Nagios box, but that requires manually splitting your
> configuration among multiple Nagios instances, setting up all of the
> passive result reporting, etc.
>
> In this scenario, you can keep your centrally located master configuration
> file and have service checks distributed to light-weight,
> geographically dispersed service check clusters.
>
> 2. Scalability. You can support more simultaneous service checks by
> adding more light-weight service check nodes incrementally.
>
> You can start with zero external nodes (i.e., all checks are still executed
> by Nagios internally), then add one node as your service check count
> increases, and then gradually (or quickly) increase the node count, locally
> or remotely, as your service check count grows; the system will scale
> appropriately.
>
> Anyway, it's not the ultimate, end-all, be-all, but we have found it helps
> us scale and manage Nagios efficiently in our large-scale,
> multi-datacenter environment. The hope is that this will be considered as
> a potential part of the new Nagios architecture some day.
>
> For those who want to know how Nagios actually delegates service check
> execution to an external cluster via a NEB module, here are the high-level
> details:
>
> We have written a multi-threaded NEB module that registers a
> NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
> NEBTYPE_SERVICECHECK_INITIATE event.
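A minimal, self-contained sketch of the callback filtering described above. The constant values, the `mock_service_check_data` struct and the one-slot `register_callback()` registry are hypothetical stand-ins, not the real definitions from the Nagios NEB headers; in an actual module the registration would go through `neb_register_callback()` from the module's `nebmodule_init()`. What it illustrates is that one callback type carries several event subtypes, and the module acts only on the initiate event.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real NEB constants; values are arbitrary. */
#define MOCK_CALLBACK_SERVICE_CHECK_DATA 1
#define MOCK_TYPE_SERVICECHECK_INITIATE  100
#define MOCK_TYPE_SERVICECHECK_PROCESSED 101
#define MOCK_NEB_OK                      0

/* Simplified, hypothetical stand-in for the broker's service check data. */
typedef struct {
    int type;                        /* event subtype: initiate, processed, ... */
    const char *host_name;
    const char *service_description;
    const char *command_line;
} mock_service_check_data;

typedef int (*neb_callback)(int callback_type, void *data);

/* One-slot registry standing in for neb_register_callback(). */
static neb_callback registered = NULL;
static int registered_type = -1;

void register_callback(int callback_type, neb_callback fn) {
    assert(fn != NULL);
    registered_type = callback_type;
    registered = fn;
}

/* Number of checks the module has been asked to handle. */
static int initiated_checks = 0;

int service_check_cb(int callback_type, void *data) {
    mock_service_check_data *d = data;
    if (callback_type != MOCK_CALLBACK_SERVICE_CHECK_DATA || d == NULL)
        return MOCK_NEB_OK;
    /* React only to the event fired before the check would be executed. */
    if (d->type != MOCK_TYPE_SERVICECHECK_INITIATE)
        return MOCK_NEB_OK;
    initiated_checks++;
    printf("would dispatch %s/%s: %s\n", d->host_name,
           d->service_description, d->command_line);
    return MOCK_NEB_OK;
}
```

Registering for the whole service-check callback type and filtering on `type` inside the callback keeps the module from acting on events that fire after a check has already run.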
>
> It then takes each service check and distributes it across the network to
> multiple "worker" nodes in a cluster (via XML-RPC). It also takes care of
> processing the check results, posting them to the internal Nagios result
> queue, handling plugin timeout conditions, etc.
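The worker-side bookkeeping can be sketched roughly as follows, assuming a simple round-robin policy; the post does not say how the module picks a node, and `worker_node`, `worker_pool`, `pick_worker()` and `apply_timeout()` are hypothetical names. The XML-RPC transport and the posting of results to the Nagios result queue are left out; only node selection and a timeout rule are shown, using the standard plugin return codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).

```c
#include <assert.h>
#include <stddef.h>

/* Standard Nagios plugin return codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Hypothetical worker descriptor; a real module would keep an XML-RPC
   endpoint and connection state here. */
typedef struct {
    const char *endpoint;
} worker_node;

typedef struct {
    worker_node *nodes;
    size_t count;
    size_t next;                /* index of the node to use next */
} worker_pool;

/* Round-robin choice of the node that runs the next check. */
worker_node *pick_worker(worker_pool *pool) {
    assert(pool->count > 0);
    worker_node *w = &pool->nodes[pool->next];
    pool->next = (pool->next + 1) % pool->count;
    return w;
}

/* Result returned by a worker (or synthesized by the module). */
typedef struct {
    int return_code;
    double elapsed;             /* seconds the plugin ran */
    int timed_out;
} check_result;

/* Assumed rule: a check that exceeded the timeout is reported as CRITICAL,
   roughly as Nagios reports a timed-out service check; others pass through. */
check_result apply_timeout(check_result r, double timeout_seconds) {
    if (r.elapsed > timeout_seconds) {
        r.timed_out = 1;
        r.return_code = STATE_CRITICAL;
    }
    return r;
}
```

Round-robin needs no feedback from the nodes, which suits adding nodes incrementally; a production cluster might instead weight nodes by load or latency.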
>
> The way this works is that Nagios now checks the return code from NEB
> modules that are registered for the NEBCALLBACK_SERVICE_CHECK_DATA event.
>
> If the NEB module returns the "new" NEBERROR_CALLBACKOVERRIDE result code,
> Nagios "delegates" execution of the service check to the NEB module.
> Otherwise, Nagios continues to execute the service check itself, as it
> normally does.
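The override handshake can be sketched as a small simulation. `MOCK_CALLBACK_OVERRIDE` stands in for the new NEBERROR_CALLBACKOVERRIDE code (its value here is arbitrary), and `broker_service_check_initiate()`, `run_service_check()` and `cluster_module_cb()` are hypothetical names, not functions from the patch, whose diff is truncated below. The sketch assumes every registered callback runs and an override from any one of them wins. The example module only claims a check when it has workers, which matches starting out with zero external nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in values, not the real NEB header definitions. */
#define MOCK_NEB_OK              0
#define MOCK_CALLBACK_OVERRIDE  99

typedef int (*check_callback)(void *data);

#define MAX_CALLBACKS 8
static check_callback callbacks[MAX_CALLBACKS];
static int ncallbacks = 0;

void add_callback(check_callback fn) {
    assert(ncallbacks < MAX_CALLBACKS);
    callbacks[ncallbacks++] = fn;
}

/* Broker stand-in: run every callback; report an override if any asked. */
int broker_service_check_initiate(void *data) {
    int result = MOCK_NEB_OK;
    for (int i = 0; i < ncallbacks; i++) {
        if (callbacks[i](data) == MOCK_CALLBACK_OVERRIDE)
            result = MOCK_CALLBACK_OVERRIDE;
    }
    return result;
}

static int local_runs = 0;
static int delegated = 0;

/* Stand-in for Nagios forking the plugin itself. */
static void run_check_locally(void *data) { (void)data; local_runs++; }

/* Scheduler side: consult the broker first, run locally only if unclaimed. */
void run_service_check(void *data) {
    if (broker_service_check_initiate(data) == MOCK_CALLBACK_OVERRIDE) {
        delegated++;   /* module now owns execution and result posting */
        return;
    }
    run_check_locally(data);
}

/* Example module: override only while it has worker nodes available. */
static int available_workers = 0;
int cluster_module_cb(void *data) {
    (void)data;
    return available_workers > 0 ? MOCK_CALLBACK_OVERRIDE : MOCK_NEB_OK;
}
```

With no module loaded, or a module that returns the ordinary OK code, `run_service_check()` behaves exactly as before, which is the "no change in the absence of the broker" property the patch aims for.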
>
> So, the attached patch files enable this functionality.
>
> Note that this patch set does not include our multi-threaded NEB module
> (if you're interested in that, just e-mail me; it's meant to be open
> source). It just includes the patches that allow a NEB module to override
> service check execution.
>
> This should be a pretty straightforward patch, and doesn't modify any
> functionality in the absence of the broker. We just need it to expand the
> flexibility of what a NEB module can do.
>
> Thanks,
> Bob
>
> ------------------------------------------------------------------------
>
> --- /home/icsrwi/proj/nagios-2.4-ORIG/base/broker.c 2005-12-23 12:31:35.000000000 -0700
> +++ broker.c 2006-08-16 11:25:51.597024488 -0600
> @@ -293,17 +293,18 @@
>
>
> /* send service check data to broker */
> -void broker_service_check(int type, int flags, int attr, service *svc, int check_type, struct timeval start_time, struct timeval end_time, char *command, double laten

...[email truncated]...

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]