[Nagios-devel] Core 4 Remote Workers

All,

I've been giving some thought to remote workers for Core 4 and wanted to
run those thoughts by this list. I see remote workers as a very useful
extension to the worker concept in Core 4.

To implement remote workers, I think about four basic things would need
to be done:
1. Implement the ability to listen on multiple query handler interfaces
(a precursor to #2).
2. Implement the ability to create and listen on TCP socket query
handler interfaces.
3. Add a host key to the worker registration so that each worker can
specify the host(s) for which it will handle checks.
4. Write a stand-alone remote worker that can connect to the core
instance via TCP.
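To make step 3 concrete, here is a minimal sketch of a registration string that carries a host key. It assumes a semicolon-separated key=value registration format; the `host` key, its comma-separated value, and the helper names `build_registration` and `parse_registration` are all hypothetical and are not the actual Core 4 protocol.

```python
from __future__ import annotations


def build_registration(name: str, pid: int, hosts: list[str]) -> str:
    """Build a hypothetical worker registration string with a host key.

    The 'host' key is an assumption for illustration; its value is a
    comma-separated list of host specs (addresses, subnets, patterns).
    """
    fields = {"name": name, "pid": str(pid), "host": ",".join(hosts)}
    return ";".join(f"{key}={value}" for key, value in fields.items())


def parse_registration(msg: str) -> dict[str, object]:
    """Parse the string back into a dict; the host value becomes a list."""
    result: dict[str, object] = {}
    for pair in msg.split(";"):
        key, _, value = pair.partition("=")
        result[key] = value.split(",") if key == "host" else value
    return result
```

For example, `build_registration("remote-dc2", 4242, ["10.1.0.0/16", "*.example.com"])` produces `name=remote-dc2;pid=4242;host=10.1.0.0/16,*.example.com`. A real implementation would need an escaping rule if a value could ever contain `;` or `,`.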

I kept steps 1 and 2 separate instead of combining them for two reasons.
First, a generalized solution is more extensible. Second, multiple TCP
listeners are a reasonable use case on a multi-homed system where you
may not want to listen on all interfaces.
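As an illustration of the multi-homed case, the sketch below binds one TCP listener per chosen interface address and services them all from a single selector loop, rather than binding once to 0.0.0.0. It is written in Python for brevity and is not how Core itself is structured; the function names are hypothetical and the sketch is IPv4 only.

```python
import selectors
import socket


def open_listeners(addresses):
    """Bind one non-blocking TCP listener per (address, port) pair.

    Every listener is registered with a single selector so one loop can
    service all of them. IPv4 only, to keep the sketch short.
    """
    sel = selectors.DefaultSelector()
    for host, port in addresses:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))  # bind to one interface, not 0.0.0.0
        sock.listen()
        sock.setblocking(False)
        # Remember the bound address (a requested port 0 becomes the real port).
        sel.register(sock, selectors.EVENT_READ, data=sock.getsockname())
    return sel


def accept_ready(sel, timeout=1.0):
    """Accept every pending connection; return (listener_addr, conn, peer) tuples."""
    accepted = []
    for key, _events in sel.select(timeout):
        conn, peer = key.fileobj.accept()
        accepted.append((key.data, conn, peer))
    return accepted
```

On a host with addresses 192.0.2.10 and 198.51.100.10, a call such as `open_listeners([("192.0.2.10", 5668), ("198.51.100.10", 5668)])` would listen on exactly those two interfaces; the port number here is arbitrary.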

The host key should be able to specify one or more IP addresses, IP
subnets, contiguous IP address ranges, host names and host name
patterns/wildcards (e.g. *.example.com). If multiple workers register
for the same host, some sort of distribution mechanism should be used to
balance the load across those workers.
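A minimal sketch of how one host-key entry might be matched against a host, using Python's standard `ipaddress` and `fnmatch` modules. The spec syntax is an assumption: a bare IP or CIDR subnet, a `start-end` address range, or a host name that may contain shell-style wildcards. Host names are compared case-insensitively.

```python
import fnmatch
import ipaddress


def host_matches(spec, host_name, host_addr):
    """Return True if a host matches one (hypothetical) host-key spec."""
    try:
        addr = ipaddress.ip_address(host_addr)
    except ValueError:
        addr = None  # no usable address; fall through to name matching
    if addr is not None:
        if "-" in spec:
            start, _, end = spec.partition("-")
            try:
                lo = ipaddress.ip_address(start)
                hi = ipaddress.ip_address(end)
                if lo.version == hi.version == addr.version and lo <= addr <= hi:
                    return True
            except ValueError:
                pass  # not an address range; may be a host name containing '-'
        else:
            try:
                # A single IP parses as a /32 (or /128) network.
                if addr in ipaddress.ip_network(spec, strict=False):
                    return True
            except ValueError:
                pass  # not an address or subnet
    # Exact host names and wildcard patterns such as *.example.com.
    return fnmatch.fnmatchcase(host_name.lower(), spec.lower())


def host_key_matches(specs, host_name, host_addr):
    """A host key is a list of specs; any single match is enough."""
    return any(host_matches(s, host_name, host_addr) for s in specs)
```

Trying the address forms first and falling back to name matching means a spec like `db-01.example.com` is not mistaken for an address range.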

Using the host as a second criterion for choosing which worker gets a
check (in addition to the plugin) raises the question of which criterion
takes precedence. Initially, I think the host should take precedence over
the plugin, but I can see implementing an order-of-precedence option in
the core configuration file. This would become more important if
additional worker selection criteria were added.
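The precedence question could look something like the sketch below: the configured order of criteria is tried in turn, the first criterion that yields any matching workers wins, and checks are spread round-robin within that group. Round-robin is one assumed choice of distribution mechanism; the class names are hypothetical, and host matching is simplified to wildcard patterns on the host name.

```python
import fnmatch
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    hosts: list = field(default_factory=list)    # host-name patterns (simplified)
    plugins: list = field(default_factory=list)  # plugin names this worker handles


class WorkerSelector:
    """Pick a worker for a check using a configurable criterion order."""

    def __init__(self, workers, precedence=("host", "plugin")):
        self.workers = workers
        self.precedence = precedence
        self._next = {}  # candidate group -> next round-robin position

    def _matches(self, worker, criterion, host_name, plugin):
        if criterion == "host":
            return any(fnmatch.fnmatchcase(host_name, p) for p in worker.hosts)
        if criterion == "plugin":
            return plugin in worker.plugins
        raise ValueError(f"unknown criterion: {criterion}")

    def select(self, host_name, plugin):
        for criterion in self.precedence:
            group = [w for w in self.workers
                     if self._matches(w, criterion, host_name, plugin)]
            if group:
                break  # highest-precedence criterion with any match wins
        else:
            # No specialized match: fall back to general-purpose workers.
            group = [w for w in self.workers if not w.hosts and not w.plugins]
        if not group:
            raise LookupError(f"no worker for {host_name}/{plugin}")
        key = tuple(w.name for w in group)
        pos = self._next.get(key, 0)
        self._next[key] = pos + 1
        return group[pos % len(group)]
```

With the default order, a host-specific worker beats a plugin-specific one; passing `precedence=("plugin", "host")` reverses that, which is the kind of knob a configuration option could expose.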

It should be possible to protect the communication between the remote
worker and the core process with SSL. The remote worker will also need a
mechanism to re-establish the connection if the network drops it.
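A minimal sketch of the worker side, assuming a TLS connection that verifies core's certificate and capped exponential backoff between reconnect attempts. It uses Python's standard `ssl` and `socket` modules; the host, port, CA file, and backoff parameters are hypothetical, and after a successful reconnect the worker would presumably need to register again.

```python
import socket
import ssl
import time


def backoff_delays(base=1.0, cap=60.0, factor=2.0):
    """Yield exponentially growing retry delays, capped at `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)


def connect_tls(host, port, cafile=None, timeout=10.0):
    """Open a TLS-wrapped TCP connection to core, verifying its certificate."""
    ctx = ssl.create_default_context(cafile=cafile)
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # don't leak the TCP socket if the handshake fails
        raise


def connect_with_retry(connect, max_attempts=None, sleep=time.sleep, delays=None):
    """Call `connect()` until it succeeds, sleeping between failed attempts.

    ssl.SSLError is a subclass of OSError, so TLS failures are retried too.
    """
    delays = delays if delays is not None else backoff_delays()
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(next(delays))
```

A worker might call `connect_with_retry(lambda: connect_tls("core.example.com", 5668, cafile="/path/to/ca.pem"))` at startup and again whenever a read or write on the connection fails. Injecting `sleep` and `delays` keeps the retry logic testable without a network.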

I realize this is a sizable change and we may not want it to happen
before the release of 4.0. Thoughts on this are welcome.

Further down the road, I can see developing a remote worker proxy, whose
sole job is to broker the communication between core and even more
remote workers. This would enable a tree-shaped worker hierarchy for
monitoring environments that are both large and dispersed geographically
and/or topologically. This would require a re-registration process so
the proxy workers could keep core updated with their abilities as
leaf-node workers connected and disconnected.
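One way the re-registration could work is sketched below: the proxy advertises the union of its leaf workers' host keys and re-registers with core only when that union actually changes. The `ProxyWorker` class and the `send_registration` callback are hypothetical; routing each check to the right leaf is left out.

```python
class ProxyWorker:
    """Track leaf workers behind a proxy and re-register with core on change."""

    def __init__(self, send_registration):
        self._leaves = {}               # leaf name -> list of host-key specs
        self._send = send_registration  # hypothetical callback to core
        self._advertised = None         # host key last sent to core

    def _reregister_if_changed(self):
        specs = sorted({s for leaf_specs in self._leaves.values() for s in leaf_specs})
        if specs != self._advertised:
            self._advertised = specs
            self._send(specs)

    def leaf_connected(self, name, host_specs):
        self._leaves[name] = list(host_specs)
        self._reregister_if_changed()

    def leaf_disconnected(self, name):
        self._leaves.pop(name, None)
        self._reregister_if_changed()
```

Comparing against the last advertised key avoids needless re-registrations: if two leaves cover the same subnet, losing one of them does not change what core needs to know.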

Thoughts?

--
Eric Stanley
___
Developer
Nagios Enterprises, LLC
Email: estanley@nagios.com
Web: www.nagios.com

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: estanley@nagios.com