Recovery notifications after escalation

Posted: Mon Apr 30, 2012 10:14 pm
by mrichards
Our org wants to alert a different contact per escalation level (for a large group of non-critical servers), e.g.

First alert to contact1
Second alert to contact2 (15 mins later)
Third alert to contact3 (60 mins later)
Fourth and subsequent alerts to contact1,contact2,contact3 (every 24 hours)

This notification scheme has been set up and alert notifications are being delivered.

The issue is that we want to ensure that all contacts notified of an alert also receive notification of a recovery, but recovery notifications are only being delivered to contacts for the current escalation level. The behavior we've observed is in line with the Notification Escalations doco which mentions recovery notifications:

> If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery.

It seems that our policy of alerting a different contact(group) per escalation level is at the heart of the issue here, and may be in opposition to how the Nagios developers intended for escalations to work. The Notification Escalations doco also says:

> When defining notification escalations, it is important to keep in mind that any contact groups that were members of "lower" escalations (i.e. those with lower notification number ranges) should also be included in "higher" escalation definitions. This should be done to ensure that anyone who gets notified of a problem continues to get notified as the problem is escalated.
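To make the gap described in the two quoted passages concrete, here is a small Python model of it. This is only a sketch of the behavior as the doco describes it, not Nagios code: the `(first, last, contacts)` tuples are a hypothetical stand-in for escalation definitions, `last == 0` is taken to mean "no upper bound", and the model ignores the contacts defined on the service itself.

```python
def contacts_for(levels, n):
    """Contacts whose escalation range covers notification number n."""
    out = set()
    for first, last, contacts in levels:
        # last == 0 is treated as "this range never ends"
        if n >= first and (last == 0 or n <= last):
            out |= set(contacts)
    return out


def missed_on_recovery(levels, problems_sent):
    """Contacts who got a problem alert but would not get the recovery."""
    alerted = set()
    for n in range(1, problems_sent + 1):
        alerted |= contacts_for(levels, n)
    # Assumption taken from the quoted doco: the recovery goes only to
    # those notified on the most recent problem notification.
    recovery = contacts_for(levels, problems_sent)
    return alerted - recovery


# One contact per level, as in the scheme described above.
disjoint = [
    (1, 1, ["contact1"]),
    (2, 2, ["contact2"]),
    (3, 3, ["contact3"]),
    (4, 0, ["contact1", "contact2", "contact3"]),
]

# Lower-level contacts carried into higher levels, as the doco recommends.
cumulative = [
    (1, 1, ["contact1"]),
    (2, 2, ["contact1", "contact2"]),
    (3, 0, ["contact1", "contact2", "contact3"]),
]

print(sorted(missed_on_recovery(disjoint, 3)))    # ['contact1', 'contact2']
print(sorted(missed_on_recovery(cumulative, 3)))  # []
```

With disjoint levels, a recovery after the third alert reaches only contact3. With cumulative levels, nobody who was alerted is left out, at the cost of contact1 and contact2 receiving the repeat alerts.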

The motivation behind alerting a different contact(group) per escalation level is to reduce the number of alert emails received by individual contacts. Once a contact has been notified about an issue we want to be able to escalate the issue to the next contact without generating an additional alert email to the first contact for an issue they've already been alerted to.

Does anyone have any suggestions about how to ensure that all alerted contacts also receive recovery notifications, or suggestions of alternative ways to prevent contacts from being notified too frequently about an issue? (I know that last bit might sound crazy to some people :) )

Thanks.

Re: Recovery notifications after escalation

Posted: Tue May 01, 2012 7:20 pm
by jsmurphy
To be honest I'm struggling to think of an apt workaround for this problem... you might be able to create some kind of notification handler that tracks who has been emailed so far and then notifies everyone once an OK state is sent for that particular alert chain, but it seems like a lot of work for such a small gain.
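A rough Python sketch of what such a tracking handler could look like follows. Everything in it is hypothetical: the `handle_notification` function, the JSON state file, and the idea of calling it from a custom notification command that passes in values such as `$NOTIFICATIONTYPE$`, `$HOSTNAME$`, `$SERVICEDESC$` and `$CONTACTEMAIL$`. It only works out who should be mailed; actually sending the mail, file locking, and recovering from a lost state file are left out.

```python
import json


def handle_notification(state_path, ntype, host, service, contact):
    """Return the addresses that should receive this notification."""
    # Load the record of who has been mailed for each open problem.
    try:
        with open(state_path) as fh:
            state = json.load(fh)
    except FileNotFoundError:
        state = {}

    key = f"{host}/{service}"
    if ntype == "PROBLEM":
        # Remember this contact so the recovery can reach them later.
        notified = set(state.get(key, []))
        notified.add(contact)
        state[key] = sorted(notified)
        recipients = [contact]
    elif ntype == "RECOVERY":
        if key in state:
            # First recovery call for this problem: mail everyone recorded
            # and forget the alert chain.
            recipients = sorted(set(state.pop(key)) | {contact})
        else:
            # Assumption: an earlier call for this same recovery already
            # mailed everyone, so send nothing to avoid duplicates.
            recipients = []
    else:
        # Other notification types pass through unchanged.
        recipients = [contact]

    with open(state_path, "w") as fh:
        json.dump(state, fh)
    return recipients
```

Because the notification command runs once per contact, the empty list in the second `RECOVERY` branch is what stops the level-four contacts from each receiving the recovery several times. The trade-off is that a lost state file means nobody is mailed, which is part of why this is a lot of work for the gain.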

I can't think of another way that allows you to keep your current alerting methodology and gives you that functionality, you would pretty much have to do as it suggests in the doco.

Re: Recovery notifications after escalation

Posted: Wed May 02, 2012 9:44 am
by agriffin
Yup, as jsmurphy said, I don't think you'll be able to get this behavior from built-in settings. You'll have to do some kind of scripting or programming, either in the notification handler, Nagios's source, or your email gateway.

Re: Recovery notifications after escalation

Posted: Wed May 02, 2012 7:22 pm
by mrichards
Thanks for your responses.

It's important to us that everyone involved with an issue receives notifications of all types. As we would prefer to stick with Nagios's native features, I am going to use the more common configuration of including contacts from lower escalation levels in higher levels. To address the "issue" of too many emails, I'll enable acknowledgment via email and include links to our Nagios FAQ in the alert emails, to try to promote use of the scheduled downtime and alert acknowledgment features.