Escalations not notifying of recovery
Posted: Wed Jun 15, 2016 2:47 pm
by SavaSC
I am setting up notification escalations. The problem I was having is that there was no notification when the Host came back up. I finally figured it out. While the escalation had "Escalation Options -> r" checked, the test Host record did not. Once I put a check in the box for notification on Recovery on the Host record, I started getting the recovery alerts.
From my reading of the documentation, I was under the impression that once the escalation took over, Nagios would notify according to the escalation rules. That did not seem to happen for me.
Here is the code I had:
Code:
define host {
host_name Curts Tester
use HOU Hosts
address dt-00112233
max_check_attempts 1
check_interval 1
retry_interval 1
active_checks_enabled 1
check_period 24x7
notification_interval 1
notification_period 24x7
notification_options d,u,
notifications_enabled 1
register 1
}
define hostescalation {
# config_name Text notification - SS team
host_name Curts Tester
hostgroup_name Critical Systems
contacts jcalford1,srvsupport
first_notification 3
last_notification 0
notification_interval 1
escalation_period 24x7
escalation_options u,r,d,
}
I changed the Host config to be the following and it started working correctly.
Code:
define host {
host_name Curts Tester
use HOU Hosts
address dt-00112233
max_check_attempts 1
check_interval 1
retry_interval 1
active_checks_enabled 1
check_period 24x7
notification_interval 1
notification_period 24x7
notification_options d,u,r,
notifications_enabled 1
register 1
}
Just trying to get my head around expected results with escalations.
Thanks.
Re: Escalations not notifying of recovery
Posted: Wed Jun 15, 2016 4:20 pm
by lmiltchev
I believe this is by design. If an option is not set on the host/service, it does not matter whether it is set on the escalation. You will need to have "recovery" set on the host too if you want recoveries to be escalated.
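One way to picture the behaviour described in this reply is as two filters applied in sequence: the host's own notification_options has to allow the state before escalation_options is even consulted. The sketch below is only an illustrative model of that idea, not Nagios source code; the function names and the parsing of the option strings are assumptions made for the example.

```python
def parse_options(raw):
    """Split an option list such as 'd,u,r,' into a set like {'d', 'u', 'r'}."""
    return {opt.strip() for opt in raw.split(",") if opt.strip()}


def escalation_notifies(host_options, escalation_options, state):
    """Return True only if both the host and the escalation allow `state`.

    `state` is 'd' (down), 'u' (unreachable) or 'r' (recovery).
    Hypothetical helper: it models the "both must be set" rule from this thread.
    """
    host_allows = state in parse_options(host_options)
    escalation_allows = state in parse_options(escalation_options)
    return host_allows and escalation_allows


# Original host definition: no 'r' on the host, so the recovery is dropped.
print(escalation_notifies("d,u,", "u,r,d,", "r"))    # False
# Changed host definition: 'r' on both sides, so the recovery goes out.
print(escalation_notifies("d,u,r,", "u,r,d,", "r"))  # True
```

Under this model, the first call corresponds to the original "Curts Tester" host and the second to the host after notification_options was changed to d,u,r.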
Re: Escalations not notifying of recovery
Posted: Thu Jun 16, 2016 7:27 am
by SavaSC
Here is the bit from the documents that is confusing me:
https://assets.nagios.com/downloads/nag ... tions.html
If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery.
What is actually happening is that unless both the normal notifications and the escalation are set to notify on Recovery then none notify. If they are both set, then both notify. It's an all or nothing situation. The documentation doesn't cover what happens if you have different settings between the normal notifications and the escalated ones.
I just wanted to make sure that this is a "feature" and not a "bug". Also, to document it in case someone else might have the same issue.
If this is normal activity, you may close this thread.
Thanks!
Re: Escalations not notifying of recovery
Posted: Thu Jun 16, 2016 9:26 am
by lmiltchev
SavaSC wrote: If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery.
This was a very specific example about overlapping escalation ranges (see the paragraph above it), which I believe has nothing to do with your particular issue.
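For readers unfamiliar with that documentation example, here is a rough sketch of the overlapping-range case it describes. The two escalation ranges and the on-call-support group are assumptions chosen for illustration; only the nt-admins and managers groups come from the quoted passage, and the code is a simplified model rather than the actual escalation code.

```python
# Hypothetical overlapping escalations: (first, last, contact groups).
ESCALATIONS = [
    (3, 5, {"nt-admins", "managers"}),
    (4, 0, {"on-call-support"}),  # last == 0 is treated as "no upper bound"
]


def groups_for(notification_number):
    """Collect the contact groups of every escalation covering this number."""
    groups = set()
    for first, last, members in ESCALATIONS:
        if notification_number >= first and (last == 0 or notification_number <= last):
            groups |= members
    return groups


def recovery_recipients(problem_notifications_sent):
    """Model a recovery as going to whoever got the last problem notification."""
    return groups_for(problem_notifications_sent)


# After three problem notifications, the recovery follows notification 3 ...
print(sorted(recovery_recipients(3)))  # ['managers', 'nt-admins']
# ... rather than being treated as notification 4, which would add a group.
print(sorted(groups_for(4)))           # ['managers', 'nt-admins', 'on-call-support']
```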
SavaSC wrote: What is actually happening is that unless both the normal notifications and the escalation are set to notify on Recovery then none notify. If they are both set, then both notify. It's an all or nothing situation.
Do your "regular" contacts get notified? Only the escalated ones should receive email notifications with this setup. You have:
define hostescalation {
# config_name Text notification - SS team
host_name Curts Tester
hostgroup_name Critical Systems
contacts jcalford1,srvsupport
first_notification 3
last_notification 0
notification_interval 1
escalation_period 24x7
escalation_options u,r,d,
}
Setting the last notification value to zero should keep escalating notifications "forever" (not switching back to "regular" notifications). Here's a quote from our documentation:
Hostescalation - last notification
This directive is a number that identifies the last notification for which this escalation is effective. For instance, if you set this value to 5, this escalation will not be used if more than five notifications are sent out for the host. Setting this value to 0 means to keep using this escalation entry forever (no matter how many notifications go out).
Parameter name: last_notification
Required: yes
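As a small illustration of the quoted rule, the range check can be sketched as below. This is a simplified model with a hypothetical function name, not the Nagios implementation.

```python
def escalation_in_range(first_notification, last_notification, notification_number):
    """Return True if the escalation covers the given notification number.

    A last_notification of 0 is treated as "no upper bound", as in the
    documentation quote above.
    """
    if notification_number < first_notification:
        return False
    if last_notification == 0:
        return True
    return notification_number <= last_notification


# With first_notification 3 and last_notification 0, notifications 1 and 2
# fall outside the escalation and everything from 3 onwards falls inside it.
for n in (1, 2, 3, 10, 500):
    print(n, escalation_in_range(3, 0, n))
```

With last_notification set to 5 instead, the same function returns False from notification 6 onwards, which matches the "more than five notifications" wording in the quote.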
Re: Escalations not notifying of recovery
Posted: Thu Jun 16, 2016 9:40 am
by SavaSC
lmiltchev wrote: Do your "regular" contacts get notified? Only the escalated ones should receive email notifications with this setup.
Normal notification set to not notify on Recovery and escalations set to notify on Recovery = no notification for anyone on recovery.
Both set to notify on Recovery = Both normal and escalation contacts get notified on Recovery.
lmiltchev wrote: Setting the last notification value to zero should keep escalating notifications "forever" (not switching back to "regular" notifications).
This is what I thought as well. It doesn't seem to be working that way. I don't have a problem with both getting notified, it is just different from what the documentation said so I thought I'd say something.
Re: Escalations not notifying of recovery
Posted: Thu Jun 16, 2016 9:51 am
by lmiltchev
SavaSC wrote: Both set to notify on Recovery = Both normal and escalation contacts get notified on Recovery.
I asked a co-worker to test this as I was not able to recreate the issue. Meanwhile, we will need a little bit more information. Who is the "regular" contact that received notifications (but was not supposed to)? I don't see any contacts defined on the "Curts Tester" host. Are they defined in the "HOU Hosts" template? Can you post the relevant configs?
Re: Escalations not notifying of recovery
Posted: Thu Jun 16, 2016 10:19 am
by SavaSC
lmiltchev wrote: I asked a co-worker to test this as I was not able to recreate the issue. Meanwhile, we will need a little bit more information. Who is the "regular" contact that received notifications (but was not supposed to)? I don't see any contacts defined on the "Curts Tester" host. Are they defined in the "HOU Hosts" template? Can you post the relevant configs?
Yes, the "regular" contact is set up in the HOU Hosts template. Here is that template.
Code:
define host {
name HOU Hosts
alias Default setting for servers in HOU
parents +HOU-RTR-INT
hostgroups +ALL-HOUSTON
check_command check-host-alive!!!!!!!!
max_check_attempts 10
check_interval 5
retry_interval 1
active_checks_enabled 1
passive_checks_enabled 0
check_period 24x7
obsess_over_host 0
check_freshness 0
event_handler_enabled null
flap_detection_enabled 0
process_perf_data 0
retain_status_information 1
retain_nonstatus_information 1
contact_groups +Oncall
notification_interval 60
notification_period 24x7
first_notification_delay 0
notification_options d,u,r,
notifications_enabled 1
register 0
}
The Oncall group sends an email to our On Call mailbox.
Here is the escalation config.
Code:
define hostescalation {
# config_name Text notification - SS team
host_name Curts Tester
hostgroup_name Critical Systems
contacts jcalford1
first_notification 3
last_notification 0
notification_interval 1
escalation_period 24x7
escalation_options u,r,d,
}
The contact jcalford1 sends a text via Twilio to my cell phone.
Do you need any other configs?
Re: Escalations not notifying of recovery
Posted: Thu Jun 16, 2016 3:37 pm
by lmiltchev
I am waiting for Matt to finish testing. Meanwhile, I gave host escalations a second try.
Here are my configs.
host
Code:
define host {
host_name CentOS6-NRPE
use xiwizard_linuxserver_host_copy_1
address x.x.x.x
parents xxxxxxx
max_check_attempts 1
check_interval 1
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 1
notification_period xi_timeperiod_24x7
first_notification_delay 0
notification_options d,u,r,
notifications_enabled 1
icon_image centos.png
statusmap_image centos.png
_xiwizard linux-server
register 1
}
host template
Code:
define host {
name xiwizard_linuxserver_host_copy_1
check_command check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
use my_custom_template
contact_groups admins
register 0
}
contact group
Code:
define contactgroup {
contactgroup_name admins
alias Nagios Administrators
members ludmil
}
host escalation
Code:
define hostescalation {
# config_name test
host_name CentOS6-NRPE
hostgroup_name linux-servers
contacts nagiosadmin
first_notification 3
last_notification 0
notification_interval 1
escalation_period 24x7
escalation_options u,r,d,
}
So, my "regular" contact is "ludmil", and my "escalated" contact is "nagiosadmin". On the "CentOS6-NRPE" I ran from the command line:
and waited...
Two "regular" notifications were sent to ludmil. The third one got escalated to "nagiosadmin". I verified that all of the following notifications are indeed escalated and sent to "nagiosadmin", then I brought the interface up on the host:
When the host recovered, the host recovery notification was only sent to "nagiosadmin" (escalated notification). The "ludmil" contact didn't receive an email.
(Attached screenshot: example01.PNG)
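The sequence observed in this test can be replayed with a short sketch. It is a simplified model, assuming first_notification 3, last_notification 0, and a recovery that follows at least one problem notification; the replay function is hypothetical and only mirrors the behaviour reported here.

```python
def replay(states, regular, escalated, first_notification=3):
    """Return (state, recipient) pairs for a sequence of host states.

    Simplified model: problem notifications are numbered 1, 2, 3, ... and a
    recovery ("UP") reuses the number of the last problem notification, so it
    goes to whoever received that one. last_notification is assumed to be 0.
    """
    log = []
    problems_sent = 0
    for state in states:
        if state != "UP":
            problems_sent += 1
        who = escalated if problems_sent >= first_notification else regular
        log.append((state, who))
    return log


# Four DOWN notifications followed by a recovery, as in the test above.
for state, who in replay(["DOWN"] * 4 + ["UP"], "ludmil", "nagiosadmin"):
    print(state, "->", who)
```

In this model the first two DOWN notifications go to ludmil, the rest go to nagiosadmin, and the recovery goes to nagiosadmin only.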
To me, this is working as expected. If you want to continue troubleshooting the issue, most probably you will need to open a ticket in our email ticketing system. It is going to be easier/faster to fix the issue via a remote session.
Re: Escalations not notifying of recovery
Posted: Mon Jun 20, 2016 7:17 am
by SavaSC
Thank you for looking into this. Since you cannot reproduce the error, there must be something wonky with our setup. As long as I get the recovery notification, even if it goes to everyone, I'm OK with it.
Again, I appreciate you taking so much time on this. You may mark this completed.