Recovery notifications after escalation
Posted: Mon Apr 30, 2012 10:14 pm
Our org wants to alert a different contact at each escalation level (for a large group of non-critical servers), e.g.:
First alert to contact1
Second alert to contact2 (15 mins later)
Third alert to contact3 (60 mins later)
Fourth and subsequent alerts to contact1,contact2,contact3 (every 24 hours)
This notification scheme has been set up and alert notifications are being delivered.
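For illustration, a scheme like this could be expressed with service escalation definitions roughly as follows. This is only a sketch, not the actual configuration: the host group name, service description, and contact names are placeholders, the intervals assume the default interval_length of 60 seconds (so values are in minutes), and "60 mins later" is read as 60 minutes after the second alert.

```
# Assumed: the service itself lists contact1 as its contact and has
# notification_interval 15, so the first alert goes to contact1 and
# the second follows 15 minutes later.

# Second alert: contact2 only; wait 60 minutes before the third.
define serviceescalation {
    hostgroup_name          noncritical-servers
    service_description     PING
    first_notification      2
    last_notification       2
    notification_interval   60
    contacts                contact2
}

# Third alert: contact3 only; wait 24 hours before the fourth.
define serviceescalation {
    hostgroup_name          noncritical-servers
    service_description     PING
    first_notification      3
    last_notification       3
    notification_interval   1440
    contacts                contact3
}

# Fourth and subsequent alerts: everyone, every 24 hours.
# last_notification 0 means the escalation never expires.
define serviceescalation {
    hostgroup_name          noncritical-servers
    service_description     PING
    first_notification      4
    last_notification       0
    notification_interval   1440
    contacts                contact1,contact2,contact3
}
```

With definitions like these, a recovery that arrives after the second alert would go to contact2 only, which matches the behavior described next.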
The issue is that we want to ensure that all contacts notified of an alert also receive notification of a recovery, but recovery notifications are only being delivered to contacts for the current escalation level. The behavior we've observed is in line with the Notification Escalations doco which mentions recovery notifications:
> If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery.
It seems that our policy of alerting a different contact(group) per escalation level is at the heart of the issue here, and may be in opposition to how the Nagios developers intended for escalations to work. The Notification Escalations doco also says:
> When defining notification escalations, it is important to keep in mind that any contact groups that were members of "lower" escalations (i.e. those with lower notification number ranges) should also be included in "higher" escalation definitions. This should be done to ensure that anyone who gets notified of a problem continues to get notified as the problem is escalated.
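As a point of comparison, the cumulative layout that the documentation recommends might look roughly like the sketch below. The host group, service description, and contact names are placeholders, and the intervals are assumptions chosen to mirror the timings listed earlier.

```
# Assumed: the service itself lists contact1 as its contact, so the
# first alert goes to contact1.

# Second alert: contact1 is kept and contact2 is added.
define serviceescalation {
    hostgroup_name          noncritical-servers
    service_description     PING
    first_notification      2
    last_notification       2
    notification_interval   60
    contacts                contact1,contact2
}

# Third and subsequent alerts: all three contacts.
# last_notification 0 means the escalation never expires.
define serviceescalation {
    hostgroup_name          noncritical-servers
    service_description     PING
    first_notification      3
    last_notification       0
    notification_interval   1440
    contacts                contact1,contact2,contact3
}
```

In this layout everyone alerted so far is also in the current escalation level, so they would all receive the recovery, at the cost of repeat alert emails to the earlier contacts.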
The motivation behind alerting a different contact(group) per escalation level is to reduce the number of alert emails received by individual contacts. Once a contact has been notified about an issue, we want to be able to escalate the issue to the next contact without generating an additional alert email to the first contact for an issue they've already been alerted to.
Does anyone have any suggestions about how to resolve our problem with ensuring all alerted contacts also receive recovery notifications, or suggestions of alternative ways to prevent contacts from being notified too frequently about an issue (I know that last bit might sound crazy to some people)?
Thanks.