Re: [Nagios-devel] Notification configuration (Was RFC/RFP: Service

Post by **Guest** » Wed May 18, 2011 1:31 pm

On 05/18/2011 02:12 PM, Max Schubert wrote:
> Andreas,
>
> On Tue, May 17, 2011 at 7:57 AM, Andreas Ericsson wrote:
>>> Any plans to detach notification attributes from service / host
>>> definitions in 4.x and make them their own top-level configuration
>>> class like escalations to make it easier to scale notification
>>> definitions for large projects?
>>>
>>
>> Not really. What would such an object look like? How would it add
>> additional benefit compared to using templates for hosts and services?
>> I think if I could just see some sort of example definition of it I'd
>> get an inkling of why some seem to think it's such a great idea. Right
>> now, I see no additional benefit to it.
>
> It would look just like an escalation.

So why not just use normal escalations then? The normal case is not
the global superlarge company, but a team of admins all sharing
responsibility for a limited number of hosts.

> What doesn't work well for
> large configurations with notification policies being stuck into host
> and service objects is this scenario (which is the one we are in at
> work by design):
> * Multiple configuration editors who own various parts of the Nagios
> configuration tree - in our case this used to be one big tree, now we
> have set up separate trees for separate projects - we have about 20-30
> people who can edit their project-specific configurations.

Again, not the normal case.

> * A set of services that are global in nature - service -> hostgroup
> -> host - baseline monitoring required by all projects using
> standards established by multiple organizations in our company - for
> our example, base host monitoring with an SNMP agent (6 services
> across every host) - we have other global services as well and a core
> team who develop, maintain and augment both our distributed Nagios
> software and these global services and configurations
> * A set of services that are specific to each project using our
> distributed variant of Nagios - managed by subject matter experts on
> each team.
>

> With this scenario, how do we let each group that is responsible for
> hosts that have these global services on them create individually
> tailored notification policies since there is one notification policy
> per service?
> * We configure our base service and host to 'notify' on every state
> change using the command name do_nothing
> * We created a custom patch so that when the string 'do_nothing' is
> seen in the command name this state change only increments the
> notification count - it does not trigger any external command to run

Good example of "making the unusual possible". Would it suffice to
add an internal command in Nagios so that some magic marker, such as
':' (without the quotes) causes no command to be run? The nifty part
of using the common colon as a magic thing for this is that it's sort
of backwards compatible, as it's been a builtin version of "/bin/true"
in shells since forever.
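
To make the idea concrete, here is a minimal sketch in Python (not
Nagios' own C, and not existing Nagios code). The names NOOP_COMMAND and
send_notification and the dict-based state are hypothetical; the only
point is that the notification is still counted while no external
process is started when the command is ':'.

```python
import subprocess

# Hypothetical magic marker, mirroring the shell's ':' builtin.
NOOP_COMMAND = ":"


def send_notification(state, command_line, run=subprocess.run):
    """Count the notification; spawn a process only for real commands.

    `state` is a plain dict standing in for a host/service object.
    Returns True if an external command was run, False otherwise.
    """
    state["notification_count"] = state.get("notification_count", 0) + 1
    if command_line.strip() == NOOP_COMMAND:
        return False  # magic marker: nothing is executed
    run(command_line, shell=True, check=False)
    return True
```

The `run` parameter is only there so the function can be exercised
without actually forking anything.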

> * We created a patch (partial - no serialization to disk) for
> escalation logic that tracks in memory when a fault escalation was
> sent so that OK escalations are only sent in response to something
> that was in a fault state. We are working on completing this patch so
> that across restarts the state is saved.

Nice!

I'd implement this as an external list of contacts that have been
notified of the problem state and therefore should be notified of
the recovery. Make the list accessible through a hash table with
the object name as the key and just walk the (sorted) list of
contacts to be notified when the problem goes away and you'll have
the complete list of contacts to notify. Unfortunately, adding
additional pointers in the object is a no-go due to ABI compatibility.
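
Roughly like the following sketch, again in Python for brevity rather
than C. The class name NotifiedContacts, its method names and the
"host;service" key format are all made up for illustration, and it
keeps everything in memory only, so it does not address saving state
across restarts.

```python
from bisect import insort
from collections import defaultdict


class NotifiedContacts:
    """Side table kept outside the host/service objects themselves."""

    def __init__(self):
        # object name -> sorted list of contact names already notified
        self._by_object = defaultdict(list)

    def record_problem(self, object_name, contact):
        """Remember that `contact` was told about a problem on the object."""
        contacts = self._by_object[object_name]
        if contact not in contacts:
            insort(contacts, contact)  # keep the list sorted

    def pop_recovery_recipients(self, object_name):
        """Return who should get the recovery, then forget the entry."""
        return self._by_object.pop(object_name, [])
```

On recovery, the caller would walk the returned list and notify each
contact; an object with no recorded problem notifications yields an
empty list, so no OK notification goes out for it.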

I'd happily accept such a patch in a heartbeat, as it'd remove a
bit of complexity in the current code without altering or removing
any APIs that broker modules might use.

> * We have all groups use escalations to define their notification
> policies - the service and host notification commands then trigger our
> distribute

...[email truncated]...

This post was automatically imported from historical nagios-devel mailing list archives