Email notification - changing operation time

Post by **d3ag0s** » Sun Dec 23, 2012 2:24 pm

I have the following service definition:

define service{
        use                     custom
        name                    custom
        is_volatile             0
        check_period            24x7
        max_check_attempts      10
        normal_check_interval   10
        retry_check_interval    1
        contact_groups          MyNotifications
        notification_interval   10
        notification_period     24x7
        notification_options    c,r,u,w
        register                0
        }

It is associated with the following host definition:

Code:

define host{
        use                     generic-host  ; Name of host template to use
        host_name               name
        alias                   name
        address                 8.4.4.4
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   60
        notification_period     24x7
        notification_options    d,u,r
        }

The generic-host is the default template:

Code:

define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1       	; Host notifications are enabled
        event_handler_enabled           1       	; Host event handler is enabled
        flap_detection_enabled          1       	; Flap detection is enabled
        failure_prediction_enabled      1       	; Failure prediction is enabled
        process_perf_data               1       	; Process performance data
        retain_status_information       1       	; Retain status information across program restarts
        retain_nonstatus_information    1       	; Retain non-status information across program restarts
        notification_period             24x7            ; Send host notifications at any time
        register                        0       	; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }

I have configured my system to send notifications, and right now (if I understand this correctly) it sends a notification once the service has been alerting for 10 minutes and keeps sending them until the alert is gone. I would like to tweak the notification timing so that the first notification is sent after 3 minutes and the next one 10 minutes later (and so on), until the alert is gone. Can someone advise how I can accomplish this?

Post by **jsmurphy** » Wed Dec 26, 2012 5:01 pm

Those notification intervals sound waaaaay too verbose, and you are going to annoy people pretty darn quick with those metrics. Here is how you can accomplish it, even though I would advise otherwise (the only change is max_check_attempts, lowered from 10 to 3):

define service{
        use                     custom
        name                    custom
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   10
        retry_check_interval    1
        contact_groups          MyNotifications
        notification_interval   10
        notification_period     24x7
        notification_options    c,r,u,w
        register                0
        }
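
To see why lowering max_check_attempts changes the timing, here is a rough sketch of the arithmetic in Python. It assumes the default interval_length of 60 seconds (so intervals are in minutes) and that the first notification goes out once the service has failed max_check_attempts checks in a row, with rechecks spaced retry_check_interval apart. The function name `notification_times` is made up for illustration, and the follow-up times are approximate because Nagios only evaluates renotifications when check results arrive.

```python
def notification_times(max_check_attempts, retry_check_interval,
                       notification_interval, count=3):
    """Approximate minutes after the first failed check at which notifications go out.

    Assumes intervals are in minutes (interval_length = 60) and that the
    problem persists for the whole period.
    """
    # The first failed check is attempt 1; the remaining attempts are retries.
    first = (max_check_attempts - 1) * retry_check_interval
    return [first + i * notification_interval for i in range(count)]


# Original settings: max_check_attempts 10, retry 1, notification_interval 10
original = notification_times(10, 1, 10)

# Suggested settings: max_check_attempts 3, retry 1, notification_interval 10
suggested = notification_times(3, 1, 10)

print("original: ", original)
print("suggested:", suggested)
```

Under these assumptions the original definition notifies roughly 9 minutes after the first failed check, while the modified one notifies after about 2 minutes and then again every 10 minutes.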

Post by **d3ag0s** » Wed Dec 26, 2012 5:30 pm

What would be your advice?

Post by **slansing** » Thu Dec 27, 2012 10:15 am

I believe he was suggesting that you increase the amount of time between notifications; 60 seconds is quite fast, unless of course you have a team that works VERY fast.

Post by **jsmurphy** » Thu Dec 27, 2012 6:30 pm

Think about it like this: you get your first notification, and you now have 10 minutes to fix the problem before another notification. Are you likely to forget about the problem in the space of 10 minutes? Are you always going to be able to respond to a problem within 10 minutes, let alone solve it? What about complex outages where you might have 10+ services in a critical state that aren't part of a dependency structure, all alerting every 10 minutes?

You may also wish to consider whether 2 minutes is long enough to filter out false positives. This stuff will likely be dictated by company policy, but the noisier your monitoring is, the more likely people are to start ignoring it (like the story of the boy who cried wolf).

We use the following metrics:
Regular check interval: 10 minutes
Retry check interval: 2 minutes
Total failed attempts before notification: 3
Notification interval: once per hour
Notify only on critical

This may or may not be suitable for you, but we know about a problem within 4 to 14 minutes of it occurring, and depending on the urgency of the problem we may or may not get to it within an hour. Warnings are dealt with ad hoc from a NOC screen.
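
The 4 to 14 minute window follows from the settings listed above, and can be checked with a small Python sketch. The helper name `detection_window` is hypothetical, and it assumes intervals are in minutes and that the failure can start at any point between two regular checks.

```python
def detection_window(normal_check_interval, retry_check_interval,
                     max_check_attempts):
    """Return (best, worst) minutes from the start of a problem to the first notification."""
    # Time spent confirming the failure: retries after the first failed check.
    confirm = (max_check_attempts - 1) * retry_check_interval
    # Best case: the problem starts just before a regular check runs.
    best = confirm
    # Worst case: the problem starts just after a regular check has passed.
    worst = normal_check_interval + confirm
    return best, worst


# Regular check every 10 minutes, retry every 2 minutes, 3 failed attempts
best, worst = detection_window(10, 2, 3)
print(f"notified between {best} and {worst} minutes after the problem starts")
```

The two retries at 2 minutes each account for the 4 minutes, and the extra 10 minutes in the worst case is the wait for the next regular check.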

Nagios Support Forum
