Informations about rescheduled notifications

Posted: Fri Jul 17, 2015 5:43 am
by xefil
Hello to all,
I'm using nagios 3.2.3 on my setup.
Looking at the documentation about notifications and time periods, I know they are applied at different levels, and I've found this:
Hosts-Services Level
The fourth host or service filter that must be passed is the time period test. Each host and service definition has a <notification_period> option that specifies which time period contains valid notification times for the host or service. If the time that the notification is being made does not fall within a valid time range in the specified time period, no one gets contacted. If it falls within a valid time range, the notification gets passed to the next filter... Note: If the time period filter is not passed, Nagios will reschedule the next notification for the host or service (if its in a non-OK state) for the next valid time present in the time period. This helps ensure that contacts are notified of problems as soon as possible when the next valid time in time period arrives.
Contact Level
The last filter that must be passed for each contact is the time period test. Each contact definition has a <notification_period> option that specifies which time period contains valid notification times for the contact. If the time that the notification is being made does not fall within a valid time range in the specified time period, the contact will not be notified. If it falls within a valid time range, the contact gets notified!
Does this mean that if a notification has successfully passed the service and host filters (in particular, it's within a valid timeperiod) but at the contact level the contact has a different timeperiod that doesn't accept notifications, this notification is not rescheduled and will never be sent out?

This is what I would like to obtain:
- host+service timeperiod 0-24
- contact timeperiod 7-18
- event notification time: 23
The alarm is triggered at 23. The host+service timeperiod is valid. The contact timeperiod is not valid. The notification will be sent out at 7AM, when it becomes valid.

BTW, reading the doc, it seems that the rescheduling of the notification happens only for host/service timeperiods.

Thanks for the support!

Simon

Re: Informations about rescheduled notifications

Posted: Fri Jul 17, 2015 1:40 pm
by jdalrymple
** EDIT ** Correction
xefil wrote:Does this mean that if a notification has successfully passed the service and host filters (in particular, it's within a valid timeperiod) but at the contact level the contact has a different timeperiod that doesn't accept notifications
The contact timeperiod is last in line and trumps everything else, so the notification won't go out...
xefil wrote:this notification is not rescheduled and will never be sent out?
This is not the case.
xefil wrote:BTW, reading the doc, it seems that the rescheduling of the notification happens only for host/service timeperiods.
I think maybe a better way to think of it is that the notification is tied to the host/service, not to the contact. I can understand the confusion, but I also have no trouble making the distinction of a contact timeperiod being a filter and not in any way related to host/service notifications.

Re: Informations about rescheduled notifications

Posted: Fri Jul 17, 2015 1:43 pm
by tmcdonald
xefil wrote:Does this mean that if a notification has successfully passed the service and host filters (in particular, it's within a valid timeperiod) but at the contact level the contact has a different timeperiod that doesn't accept notifications, this notification is not rescheduled and will never be sent out?
This is not correct. See my explanation below:
xefil wrote: - host+service timeperiod 0-24
- contact timeperiod 7-18
- event notification time: 23
The alarm is triggered at 23. The host+service timeperiod is valid. The contact timeperiod is not valid. The notification will be sent out at 7AM, when it becomes valid.

BTW, reading the doc, it seems that the rescheduling of the notification happens only for host/service timeperiods.
This should be the correct behavior.

If you have a host/service go down at 2300 and you have a 24-hour host/service notification period, it will always pass the host/service test. Then it looks to see if it passes the contact test, which it does not. The next notification attempt is rescheduled X minutes later where X is the notification_interval (let's assume 60 minutes). So it tries again at midnight and fails, then again at 0100, 0200, etc. failing to send until 0700 when it passes the contact test. At this point it should send.

Now remember, if at any point it recovers you will not get an email at 0700 saying it failed, followed by another saying it recovered. They do not queue up like that.
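
The timeline described in this post can be sketched as a small simulation. This is only a toy model of the behavior as explained here, not Nagios's actual scheduler. The names `in_period` and `simulate_notifications` and all parameters are made up for illustration, hours are treated as whole numbers, and the host/service period is assumed to be 24x7.

```python
def in_period(hour, start, end):
    """True if the hour of day falls in [start, end)."""
    return start <= hour % 24 < end


def simulate_notifications(down_at, interval_hours, contact_period,
                           recovers_at=None, horizon_hours=24):
    """Return the hours (counted from midnight of day 0) at which the
    contact is actually notified.

    Assumption: an attempt is made every interval_hours while the
    problem persists, and only the contact period filters attempts.
    """
    sent = []
    t = down_at
    while t < down_at + horizon_hours:
        if recovers_at is not None and t >= recovers_at:
            break  # problem is gone; nothing was queued up to send later
        if in_period(t, *contact_period):
            sent.append(t)
        t += interval_hours
    return sent


# Down at 23:00, hourly attempts, contact reachable 07:00-18:00.
attempts = simulate_notifications(down_at=23, interval_hours=1,
                                  contact_period=(7, 18), horizon_hours=10)
print([h % 24 for h in attempts])  # hours of day at which a mail goes out
```

In this model the attempts at 23:00 through 06:00 are all filtered out and the first mail goes out at 07:00. Passing `recovers_at=29` (05:00 the next day) yields an empty list, matching the point that nothing queues up for later delivery.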

Re: Informations about rescheduled notifications

Posted: Mon Jul 20, 2015 2:39 am
by xefil
Thanks jdalrymple for your answer, all as expected :-(
tmcdonald wrote:
If you have a host/service go down at 2300 and you have a 24-hour host/service notification period, it will always pass the host/service test. Then it looks to see if it passes the contact test, which it does not. The next notification attempt is rescheduled X minutes later where X is the notification_interval (let's assume 60 minutes). So it tries again at midnight and fails, then again at 0100, 0200, etc. failing to send until 0700 when it passes the contact test. At this point it should send.

Now remember, if at any point it recovers you will not get an email at 0700 saying it failed, followed by another saying it recovered. They do not queue up like that.
Hello tmcdonald,

I understand how notification_interval works, and it's OK to be re-notified in the morning if the host/service is still not in an OK state. What happens next is that the issue gets re-notified 'x' times, once every notification_interval. I mean: at 7AM the customer gets notified, and that's fine. I would like to prevent them from being re-notified again and again until the problem is put in downtime, acknowledged, or becomes OK again.
I need to notify customers with these requirements:
1- the contact should always get notified in the morning based on his time_period if the host/service is still in a non-OK state
2- the contact should be notified only one time

Ideas how to handle this?

Thanks!

Simon

Re: Informations about rescheduled notifications

Posted: Mon Jul 20, 2015 11:44 am
by tgriep
If you set the notification interval to 0 for the service/host check, that should make it so they will get notified once.
Is that what you are looking for?
notification_interval

This directive is used to define the number of "time units" to wait before re-notifying a contact that this service is still in a non-OK state. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this service - only one problem notification will be sent out.
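
As a rough illustration of the arithmetic in the quoted description, the re-notification delay is the interval multiplied by interval_length, with 0 meaning "do not re-notify". The helper name `renotification_delay_seconds` is hypothetical and is not part of Nagios.

```python
def renotification_delay_seconds(notification_interval, interval_length=60):
    """Seconds to wait before re-notifying, or None if re-notification is off.

    notification_interval is in "time units"; interval_length is the number
    of seconds per time unit (60 by default, so units are minutes).
    """
    if notification_interval < 0:
        raise ValueError("notification_interval must be >= 0")
    if notification_interval == 0:
        return None  # only one problem notification is sent
    return notification_interval * interval_length


print(renotification_delay_seconds(60))  # 3600, i.e. one hour
print(renotification_delay_seconds(0))   # None
```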

Re: Informations about rescheduled notifications

Posted: Tue Jul 21, 2015 3:32 am
by xefil
tgriep wrote:If you set the notification interval to 0 for the service/host check, that should make it so they will get notified once.
Is that what you are looking for?
Nope, because if the contact has a notification period outside the time_period set on the host/service, and if I've correctly understood the notification flow, then as soon as the host goes down (or the service goes critical) and the notification is sent out, the contact never gets this notification. If I set notification_interval to 0, the contact is never notified. If I set it to, e.g., 60 minutes, the contact is notified every 60 minutes starting from the first moment in the contact time period.
Starting from the example above, let me explain:

CASE1 - not what I need
  • host+service timeperiod 0-24
  • notification_interval 0
  • contact timeperiod 7-18
  • event notification time: 23
The alarm is triggered at 23. The host+service timeperiod is valid. The contact timeperiod is not valid.
Result: Contact is never notified.

CASE2 - not what I need
  • host+service timeperiod 0-24
  • notification_interval 60 minutes
  • contact timeperiod 7-18
  • event notification time: 23
The alarm is triggered at 23. The host+service timeperiod is valid. The contact timeperiod is not valid, but notification_interval retries the notification every 60 minutes.
At 7AM the contact will be notified, as well as at 8AM, 9AM, 10AM, ... until the issue recovers, is acknowledged, or is put in downtime.
Result: Contact is notified too often. I only need a first notification at 7AM.

What I need is for the contact to be notified only once, at 7AM, with the option to specify a renotification if needed.

Thanks,

Simon

Re: Informations about rescheduled notifications

Posted: Tue Jul 21, 2015 4:42 pm
by tmcdonald
I would go with Case 2 above, then just have the contact acknowledge the alert to silence it. That's the closest I think you are going to get with our built-in logic.
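
To see what this suggestion amounts to, here is a toy sketch of Case 2 with and without an acknowledgement. It is not Nagios code. The function `notified_hours` and its parameters are invented for illustration, and it assumes an acknowledgement suppresses all further problem notifications for that problem.

```python
def notified_hours(down_at, interval_hours, contact_period,
                   ack_at=None, horizon_hours=24):
    """Hours of day at which the contact is notified in this toy model.

    down_at and ack_at are hours counted from midnight of day 0, so
    07:30 on the following day is 31.5.
    """
    start, end = contact_period
    sent = []
    for t in range(down_at, down_at + horizon_hours, interval_hours):
        if ack_at is not None and t >= ack_at:
            break  # acknowledged: no further problem notifications
        if start <= t % 24 < end:
            sent.append(t % 24)
    return sent


# Case 2 as described: down at 23:00, hourly retries, contact period 7-18.
print(notified_hours(23, 1, (7, 18)))               # one mail per hour, 7 through 17
print(notified_hours(23, 1, (7, 18), ack_at=31.5))  # acknowledged at 07:30 the next day
```

Without an acknowledgement the model sends a mail every hour from 07:00 to 17:00. With an acknowledgement at 07:30, only the 07:00 mail goes out, which is the "notify once in the morning" behavior asked for, at the cost of a manual step.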