Issue with service escalations (treat ack different)

rniesten
Posts: 4
Joined: Mon Sep 03, 2018 10:16 am
Location: Maastricht

Post by rniesten »

What I'm trying to achieve is the following:
Send a message to Slack, around the clock, whenever a service changes to warning or critical. Repeat this notification once per working day until the problem is acknowledged or the service returns to OK.

After playing around with escalations I was finally able to manage this using the following strategy:
- create a copy of the Slack contact with notification period "workhours"
- create an escalation with first_notification 2 that repeats every 8 hours (a working day has 8 working hours, so this effectively repeats once per working day)

define contact{
    contact_name                    slack-workhours
    use                             generic-contact
    alias                           Slack Channel for repeated notifications (during working hours)
    service_notification_commands   notify-service-by-slack
    host_notification_commands      notify-host-by-slack
    service_notification_period     workhours
}


define serviceescalation{
    host_name               *
    service_description     *
#   hostgroup_name          !prod-env       ; escalation is not valid for production environment
    first_notification      2               ; after the first notification this escalation should kick in ..
    last_notification       0               ; .. and repeat forever
    contacts                slack-workhours ; send notification to slack-workhours (which accepts notifications only during workhours)
    notification_interval   480             ; problem notification should be repeated every day (8 hours working day)
    escalation_period       24x7            ; this escalation is valid 24x7 ..
    escalation_options      w,c,u           ; .. for the states warning, critical and unknown
}


This works perfectly for warning, critical, unknown and OK. Alerts outside business hours are sent to Slack (but repeated notifications are only sent during business hours), and recovery notifications are also sent during the night. This way, looking in Slack gives sufficient information outside business hours without the need to check Nagios :-).

Now the "BUT"... Acknowledgements are suppressed as well :-(

If an engineer acknowledges a service alert, he is busy with this issue. I want the other engineers to be informed of that, so they don't need to check Nagios only to find out that someone else is already looking into it...
How can I achieve the use case described above while still having acknowledgements sent out outside business hours?

Thanks in advance.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Issue with service escalations (treat ack different)

Post by ssax »

Because you are not notifying outside of business hours, the acknowledgements are going to be suppressed during that time. I don't think you will be able to do what you're trying to do without opening the notification hours outside of business hours; that's the only way the acknowledgements will be sent.
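
For illustration, opening the notification hours would amount to something like the change below on the escalation contact. This is only a sketch: it assumes the 24x7 timeperiod already used as the escalation_period above is the one you want here, and everything else is copied from the contact posted earlier.

```
define contact{
    contact_name                    slack-workhours
    use                             generic-contact
    service_notification_commands   notify-service-by-slack
    host_notification_commands      notify-host-by-slack
    service_notification_period     24x7    ; was workhours; acknowledgements can now go out at any hour
}
```

The trade-off is that the repeated problem notifications from the escalation would then also be delivered outside working hours, since the contact's notification period was the only thing holding them back.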
rniesten
Posts: 4
Joined: Mon Sep 03, 2018 10:16 am
Location: Maastricht

Re: Issue with service escalations (treat ack different)

Post by rniesten »

Thanks for your answer.

I've worked around it the following way:
- I changed the escalation to start with the 3rd notification instead of the second.
- I changed the notification interval of the services to 8 hours.

This way the first notification is sent out via the service. The engineers are supposed to acknowledge the service on short notice, so in a "normal" situation the engineer will acknowledge the alert (which causes a notification) and all further notifications are suppressed. If the engineer doesn't acknowledge the alert, a second problem notification will be sent out after 8 hours (and then the escalation kicks in). Acknowledgements after this moment do NOT notify outside business hours.
It's not the ideal situation, but the team agreed this was workable.
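
Roughly, the two changes look like this. Treat it as a sketch rather than the exact production config: the service fragment shows only the changed directive, and the template name generic-service-slack is a placeholder for wherever the services get their notification settings.

```
define service{
    name                    generic-service-slack   ; placeholder template name
    notification_interval   480                     ; 2nd problem notification goes out after 8 hours
    register                0                       ; template only; other directives omitted
}

define serviceescalation{
    host_name               *
    service_description     *
    first_notification      3               ; was 2; notifications 1 and 2 now go to the service's own contacts
    last_notification       0               ; repeat forever
    contacts                slack-workhours
    notification_interval   480
    escalation_period       24x7
    escalation_options      w,c,u
}
```

With first_notification set to 3, the first two problem notifications (and any acknowledgement sent in between) go to the contacts defined on the service, so they are not filtered by the workhours period of slack-workhours.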
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Issue with service escalations (treat ack different)

Post by ssax »

I'm glad you were able to find an acceptable solution. Am I okay to mark this as resolved and lock the topic?