Host/Service escalation trick...

gsloop

Post by gsloop »

I don't know if anyone has done this - it's a hard topic to search on, and I wasn't able to find anyone who addressed this issue - so I thought I'd post it here.


Essentially, the problem I had was:

Normally I'd generate three alerts, say one every 5 minutes, and then no more alerts from this escalation. [This is defined as service or host escalation #1.]
Then I wanted one alert, say every hour, forever. [This is escalation #2.]
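
A rough sketch of how these two escalations could be written as object definitions. The post does not include the actual config, so the host name, service description, contact group and the exact first_notification/last_notification numbers here are assumptions, chosen only to illustrate "three alerts 5 minutes apart, then hourly forever". notification_interval is given in time units, which are minutes with the default interval_length of 60.

```
# Escalation #1: notifications 1 through 3, five minutes apart.
define serviceescalation {
    host_name               example-host      ; hypothetical host
    service_description     Example Service   ; hypothetical service
    first_notification      1
    last_notification       3
    notification_interval   5
    contact_groups          admins            ; hypothetical contact group
}

# Escalation #2: from notification 4 onward, one per hour.
# last_notification 0 means this escalation never expires.
define serviceescalation {
    host_name               example-host
    service_description     Example Service
    first_notification      4
    last_notification       0
    notification_interval   60
    contact_groups          admins
}
```

The same directives apply to a hostescalation, minus service_description.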

But I also didn't want alerts during some hours, say from midnight to 6a.

BUT...

When the new "day" started at 6a, I wanted alerts immediately for any services that were down. With the setup above, I'd get an alert within one hour of 6a, but not right at the start of the day.

Example: Say a service went down at 2:59a. Normally I'd get three initial alerts (from escalation #1), but I'm filtering with a time period, so I won't get those initial three alerts. I also won't get the hourly alerts (from escalation #2) at 3:59, 4:59 and 5:59. The first alert I will get is the one at 6:59, the next hourly slot after 6a. But I want to know the service was down at 6a, not nearly 7a!

So, here's my solution.

Define another time period. In my case it covers 6:00-6:15a on each day you want to get alerts.
[I actually have several, depending on when I want the period to start, say at 4, 6a or 8a]
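
As an illustration, a time period like that could be defined as below. The timeperiod_name and alias are made-up names, and listing all seven weekdays is an assumption; drop the days on which no alerts are wanted.

```
# First 15 minutes of the alerting day (hypothetical name).
define timeperiod {
    timeperiod_name   daystart-6am
    alias             First 15 minutes after 6am
    sunday            06:00-06:15
    monday            06:00-06:15
    tuesday           06:00-06:15
    wednesday         06:00-06:15
    thursday          06:00-06:15
    friday            06:00-06:15
    saturday          06:00-06:15
}
```

For a day that starts at 4a or 8a, a second definition with a different timeperiod_name and 04:00-04:15 or 08:00-08:15 ranges would follow the same pattern.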

Then define a third host/service escalation.
This one has "last_notification 0", so it never expires.
I have the notification_interval set to 5 minutes.
So, on its own, this escalation would generate alerts every 5 minutes, forever.
I set the escalation_period to that 6:00-6:15a time period.

So, the escalation_period will only let this escalation's notifications go out for 15 minutes each day, and I've set that time period to be the first 15 minutes of the larger period.
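
A minimal sketch of what this third escalation might look like for a service. Only last_notification 0, the 5-minute interval and the escalation_period come from the description above; first_notification 1, the host name, service description, contact group and the time period name daystart-6am (standing in for the 6:00-6:15a time period) are assumptions.

```
# Escalation #3: every 5 minutes, but only valid during the
# 6:00-6:15a time period.
define serviceescalation {
    host_name               example-host      ; hypothetical host
    service_description     Example Service   ; hypothetical service
    first_notification      1                 ; assumed, not stated in the post
    last_notification       0                 ; never expires
    notification_interval   5
    escalation_period       daystart-6am      ; hypothetical timeperiod name
    contact_groups          admins            ; hypothetical contact group
}
```

A hostescalation would use the same directives without service_description.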

The result is: I get three alerts from any down service right at the beginning of any larger period, and then they "stop" [or, more accurately, are squelched] until tomorrow - if they're still down.

Hope that helps someone else - and perhaps this could go in a FAQ somewhere. [Provided this is novel - but I certainly wasn't able to find it with my Google-fu, or any other searching I did.]

-Greg

Keywords: notifications, service escalation, host escalation, initial notifications at the beginning of a period, escalate notification
scottwilkerson

Re: Host/Service escalation trick...

Post by scottwilkerson »

Greg,

Thanks for sharing!