Re: [Nagios-devel] Patches for improved NEB control



Hi Bob,

This sounds really good for "advanced distributed monitoring".

Could you write a little more about what to do, if it is possible, to
include this in failover monitoring for redundancy purposes?

Hendrik

[email protected] wrote:
> Attached is a patch-set I would like some feedback on.
>
> The purpose of this patch is to give Nagios the ability to delegate the
> execution of service checks to a NEB module.
>
> Why would we want to do this? I'm glad you asked...
>
> The point is to allow Nagios to scale efficiently in large-scale
> environments by delegating service checks to multi-node "check" clusters.
> That is, it facilitates the creation of a Nagios Service Check Cluster (or
> multiple independent clusters) that can be deployed in either one
> location or multiple locations.
>
> The benefits are:
>
> 1. It de-couples Service Check execution from Scheduling on the same box.
> Sure, you can do this by setting up multiple Nagios instances that report
> their results passively back up to the "master" Nagios box, but that
> requires manually splitting up your configuration among multiple Nagios
> instances, setting up all of the passive result reporting, etc.
>
> In this scenario, you can keep your centrally-located master configuration
> file and have the service checks distributed to light-weight,
> geographically-dispersed service check clusters.
>
> 2. Scalability. You can support more simultaneous service checks by
> adding more light-weight service check nodes incrementally.
>
> You can start with zero external nodes (i.e., all checks still executed by
> Nagios internally). Then add one node as your service check count
> increases. Then gradually (or quickly) increase the node count, locally
> or remotely, as your service check count grows, and the system will scale
> appropriately.
>
> Anyway, it's not the ultimate, end-all, be-all, but we have found it helps
> us scale and manage Nagios efficiently in our large-scale,
> multi-datacenter environment. The hope is that this will be considered as
> a potential part of the new Nagios architecture some day.
>
> For those who want to know how Nagios actually delegates service check
> execution to an external cluster via a NEB module, here are the high-level
> details:
>
> We have written a multi-threaded NEB module that registers a
> NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
> NEBTYPE_SERVICECHECK_INITIATE event.
>
> It then takes each service check and distributes it across the network to
> multiple "worker" nodes in a cluster (via XML-RPC). It also takes care of
> processing the check results, posting them to the internal Nagios result
> queue, plugin timeout conditions, etc.
>
> The way this works is that Nagios now checks the return code from NEB
> modules that are registered for the NEBCALLBACK_SERVICE_CHECK_DATA event.
>
> If the NEB module returns the "new" NEBERROR_CALLBACKOVERRIDE result code,
> Nagios "delegates" execution of the service check to the NEB module.
> Otherwise, Nagios continues to execute the service check itself, as it
> normally does.
>
> So, the attached patch files enable this functionality.
>
> Note that this patch set does not include our multi-threaded NEB module
> (if you're interested in that, just e-mail me; it's meant to be open
> source). It just includes the patches to allow a NEB module to override
> service check execution.
>
> This should be a pretty straightforward patch, and doesn't modify any
> functionality in the absence of the broker. We just need it to expand the
> flexibility of what a NEB module can do.
>
> Thanks,
> Bob
>
> ------------------------------------------------------------------------
>
> --- /home/icsrwi/proj/nagios-2.4-ORIG/base/broker.c 2005-12-23 12:31:35.000000000 -0700
> +++ broker.c 2006-08-16 11:25:51.597024488 -0600
> @@ -293,17 +293,18 @@
>
>
> /* send service check data to broker */
> -void broker_service_check(int type, int flags, int attr, service *svc, int check_type, struct timeval start_time, struct timeval end_time, char *command, double laten

...[email truncated]...
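
For readers following the thread, the override mechanism described in the quoted message can be sketched roughly as below. This is a standalone simulation, not the attached patch and not Bob's module: the numeric values of the constants are placeholders, and check_event, worker_count, delegated_checks, module_service_check_callback() and core_run_service_check() are hypothetical names invented for illustration. A real module would include the Nagios NEB headers, register its callback with the broker, and ship the check to a worker node instead of bumping a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only; the real definitions come
 * from the Nagios headers and the patch discussed in this thread. */
#define NEBTYPE_SERVICECHECK_INITIATE 1
#define NEB_OK                        0
#define NEBERROR_CALLBACKOVERRIDE     1

/* Hypothetical, simplified stand-in for the data Nagios hands to a
 * service check callback. */
typedef struct {
    int type;                         /* event type, e.g. ..._INITIATE */
    const char *host_name;
    const char *service_description;
    const char *command_line;
} check_event;

/* Hypothetical outcome codes for the simulated core. */
enum { CHECK_RAN_LOCALLY = 0, CHECK_DELEGATED = 1 };

static int worker_count = 0;          /* 0 means Nagios runs every check */
static int delegated_checks = 0;      /* checks the module took over */

/* Module side: decide whether to take over this service check. */
static int module_service_check_callback(int callback_type, void *data)
{
    check_event *ev = (check_event *)data;
    (void)callback_type;              /* unused in this sketch */

    if (ev == NULL || ev->type != NEBTYPE_SERVICECHECK_INITIATE)
        return NEB_OK;                /* not an event we care about */

    if (worker_count == 0)
        return NEB_OK;                /* no workers: let Nagios run it */

    /* A real module would send the check to a worker here (the post
     * mentions XML-RPC) and later post the result back to Nagios. */
    delegated_checks++;
    return NEBERROR_CALLBACKOVERRIDE; /* tell Nagios not to run it */
}

/* Core side: ask the module first, run the check only if not overridden. */
static int core_run_service_check(check_event *ev)
{
    assert(ev != NULL);

    if (module_service_check_callback(0, ev) == NEBERROR_CALLBACKOVERRIDE)
        return CHECK_DELEGATED;       /* module owns execution and result */

    /* Normal path: Nagios would execute the plugin itself here. */
    return CHECK_RAN_LOCALLY;
}
```

With worker_count left at zero the simulated core runs every check itself, which mirrors the "start with zero external nodes" case in the quoted message; once the count is raised, the same call returns CHECK_DELEGATED instead.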


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]