Alert/notification delay issues

c.slagel · Post by **c.slagel** » Mon Jan 07, 2013 5:30 pm

I'm having trouble figuring out why my notification delay isn't working the way I expected.

Currently I have http content monitors as services under a host.

Under alert settings, the hosts have "notifications enabled" set to "on" and the appropriate contact groups assigned.
Also, first notification delay is set to 13 minutes.

Each service has "notifications enabled" set to "skip".

The content monitors will re-check every 3 minutes on fail for 3 attempts, and upon entering a hard state fire off a service restart event handler.

However, I'm still getting the alert as soon as the service enters a hard "critical" state; it's not waiting out the 13 minutes to give the event handler time to run.

Should I have the alert settings set up differently to avoid receiving these premature alerts?

Thanks!

scottwilkerson · Post by **scottwilkerson** » Tue Jan 08, 2013 11:49 am

The "first notification delay" is timed from the last known OK state, not from the first failure.

So if you normally check on 5 minute intervals the service would reach a HARD state at

5 + 3 + 3 + 3 = 14

Since you have your "first notification delay" set to 13, the notification would be sent immediately.

If you want the notification to go out 13 minutes after it goes into a hard state, set it to 27.
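The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the timing model described in this reply (the delay counts from the last OK check); the function and parameter names are made up for the example and are not Nagios settings.

```python
def minutes_to_hard_state(check_interval, retry_interval, retries):
    """Minutes from the last OK check until the HARD state, per the sum above.

    Assumes one normal check interval passes before the first failure is
    seen, followed by `retries` rechecks spaced `retry_interval` apart.
    """
    return check_interval + retries * retry_interval


def delay_for_grace_period(check_interval, retry_interval, retries, grace_minutes):
    """First notification delay needed to wait `grace_minutes` after the HARD state."""
    return minutes_to_hard_state(check_interval, retry_interval, retries) + grace_minutes


# Values from this thread: 5 minute checks, 3 minute retries, 3 retries.
print(minutes_to_hard_state(5, 3, 3))        # 14
print(delay_for_grace_period(5, 3, 3, 13))   # 27
```

With a delay of 13, the 14 minutes needed to reach the HARD state have already used up the whole delay, which matches the immediate notification described above.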

c.slagel · Post by **c.slagel** » Fri Feb 01, 2013 12:40 pm

Revisiting this, having issues again.

Currently set up like this:

Service Check Settings:

Check every 5 min
Retry interval 3 min
Retries 3 times

Screen Shot 2013-02-01 at 9.31.23 AM.png

So that's 14 min

Upon entering a hard state it kicks off an event handler to restart the service with a 2 min delay between service stop and start.

So a total of 16 minutes should pass between last known good state and when the service comes back up.

As for the ALERT settings, I have these set on each HOST:

First Notification delay set to 20 minutes just to be sure.

Screen Shot 2013-02-01 at 9.28.09 AM.png

And the SERVICE alert settings are just set to "skip" so that they just follow the settings of the host they reside on:

Screen Shot 2013-02-01 at 9.35.57 AM.png

Now, I'm pretty confident that this WAS working correctly for a while. However, I've started to get alerts that there is a problem with services, and then a few minutes later that they've recovered. So it's obvious that the event handler IS working, but the alerts are not abiding by my first notification delay setting.

Any input on why this may be?

Thanks!

scottwilkerson · Post by **scottwilkerson** » Fri Feb 01, 2013 3:09 pm

c.slagel wrote:
> Check every 5 min
> Retry interval 3 min
> Retries 3 times
>
> So that's 14 min
> Upon entering a hard state it kicks off an event handler to restart the service with a 2 min delay between service stop and start.
>
> So a total of 16 minutes should pass between last known good state and when the service comes back up.

Actually, it would be 21 minutes, because the check is performed every 5 minutes, so the last known UP time is 5 minutes BEFORE the first failure.

Basically, you need to add 5 minutes to your calculation...
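As a rough check, the figures in this reply can be compared against the configured delay. The sketch below takes the numbers quoted in the thread at face value; the names are hypothetical, and it does not model how long Nagios takes to detect the recovery on its next check.

```python
# Figures quoted in this thread, all in minutes.
CHECK_INTERVAL = 5       # normal check interval
TIME_TO_HARD_STATE = 14  # the "14 min" figure quoted above
RESTART_DELAY = 2        # pause between service stop and start in the event handler
CONFIGURED_DELAY = 20    # first notification delay set on the host


def recovery_window(time_to_hard, restart_delay, check_interval):
    """Minutes from the last known OK state until the service is back up,
    using the extra check interval this reply says to add."""
    return time_to_hard + restart_delay + check_interval


def delay_margin(configured_delay, window):
    """Positive: the delay outlasts the window. Zero or negative: an alert can go out first."""
    return configured_delay - window


window = recovery_window(TIME_TO_HARD_STATE, RESTART_DELAY, CHECK_INTERVAL)
print(window)                                  # 21
print(delay_margin(CONFIGURED_DELAY, window))  # -1
```

On these numbers the 20 minute delay falls one minute short of the 21 minute window, which would be consistent with a problem alert followed shortly by a recovery.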

Nagios Support Forum
