Re: [Nagios-devel] Notification configuration (Was RFC/RFP: Service

Post by **Guest** » Wed May 18, 2011 11:12 am

Andreas,

On Tue, May 17, 2011 at 7:57 AM, Andreas Ericsson wrote:
>> Any plans to detach notification attributes from service / host
>> definitions in 4.x and make them their own top-level configuration
>> class like escalations to make it easier to scale notification
>> definitions for large projects?
>>
>
> Not really. What would such an object look like? How would it add
> additional benefit compared to using templates for hosts and services?
> I think if I could just see some sort of example definition of it I'd
> get an inkling of why some seem to think it's such a great idea. Right
> now, I see no additional benefit to it.

It would look just like an escalation. What doesn't work well for
large configurations with notification policies being stuck into host
and service objects is this scenario (which is the one we are in at
work by design):
* Multiple configuration editors who own various parts of the Nagios
configuration tree - in our case this used to be one big tree, now we
have set up separate trees for separate projects - we have about 20-30
people who can edit their project-specific configurations.
* A set of services that are global in nature - service -> hostgroup
-> host - baseline monitoring required by all projects using
standards established by multiple organizations in our company - for
our example, base host monitoring with an SNMP agent (6 services
across every host) - we have other global services as well and a core
team who develop, maintain and augment both our distributed Nagios
software and these global services and configurations.
* A set of services that are specific to each project using our
distributed variant of Nagios - managed by subject matter experts on
each team.

With this scenario, how do we let each group that is responsible for
hosts that have these global services on them create individually
tailored notification policies since there is one notification policy
per service? Here is our workaround:
* We configure our base service and host to 'notify' on every state
change using the command name do_nothing
* We created a custom patch so that when the string 'do_nothing' is
seen in the command name this state change only increments the
notification count - it does not trigger any external command to run
* We created a patch (partial - no serialization to disk) for
escalation logic that tracks in memory when a fault escalation was
sent so that OK escalations are only sent in response to something
that was in a fault state. We are working on completing this patch so
that across restarts the state is saved.
* We have all groups use escalations to define their notification
policies - the service and host notification commands then trigger our
distributed pollers to send escalation requests to a network-based
notification service we have that then lets the notification requests
trigger email, SMS, SNMP traps, etc without having to re-configure
Nagios for every notification transport / method change.
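
To make the two patches above a bit more concrete, here is a rough Python
sketch of the logic as I described it. It is not our actual patch (that is
C inside Nagios), and every name in it (ServiceState, Notifier, notify,
escalate) is made up for illustration. It also simplifies real escalation
handling, which depends on notification number ranges: here every fault
state escalates, and an OK escalates only if a fault escalation went out
first.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceState:
    """Hypothetical per-service bookkeeping; not a real Nagios structure."""
    notification_count: int = 0
    fault_escalated: bool = False  # held in memory only, so lost on restart


@dataclass
class Notifier:
    # Stands in for the external, network-based notification service.
    sent: list = field(default_factory=list)

    def notify(self, svc: ServiceState, command_name: str, state: str) -> bool:
        """Count every state change; skip the command for do_nothing."""
        svc.notification_count += 1
        if "do_nothing" in command_name:
            return False  # counter advanced, no external command run
        self.sent.append((command_name, state))
        return True

    def escalate(self, svc: ServiceState, state: str) -> bool:
        """Send OK escalations only in response to an escalated fault."""
        if state != "OK":
            svc.fault_escalated = True
            self.sent.append(("escalation", state))
            return True
        if svc.fault_escalated:
            svc.fault_escalated = False
            self.sent.append(("escalation", "OK"))
            return True
        return False  # OK with no prior fault escalation: suppress
```

The missing piece in this sketch is the same one missing in our patch:
fault_escalated would have to be written to disk to survive a restart.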

Yeah, it is very ugly, and why? Because there is one notification
policy per service, and that doesn't scale well when taking advantage
of service -> hostgroup -> host mappings, which is a critical pattern
to use when scaling a configuration.

We have over 9000 hosts being monitored by our distributed framework
(and growing) with around 30 configuration editors and 120+ users.
Our distributed framework used to be a centralized, "one project for
all" set up but now is a cluster of distributed set ups, one distributed set up
per project, which is scaling nicely. Our largest distributed
installations have 3900 and 5100 hosts in them respectively - we have
4 other distributed instances that are just getting ramped up and only
have a few dozen hosts apiece at this point.

So while this is ugly, it works! All editors can define escalation
objects that take into account both their individual needs for global
service notifications as well as any project-specific notifications -
and by putting project-specific hosts in project-specific host groups,
for most groups, two escalation policy definitions are all that are
needed per

...[email truncated]...

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]