Escalations not notifying of recovery

Posted: Wed Jun 15, 2016 2:47 pm
by SavaSC
I am setting up notification escalations. The problem I was having is that there was no notification when the Host came back up. I finally figured it out. While the escalation had "Escalation Options -> r" checked, the test Host record did not. Once I put a check in the box for notification on Recovery on the Host record, I started getting the recovery alerts.

From my reading of the documentation, I was under the impression that once the escalation took over, Nagios would notify according to the escalation rules. This did not seem to happen for me.

Here is the code I had:

Code: Select all


define host {
	host_name			Curts Tester
	use				HOU Hosts
	address				dt-00112233
	max_check_attempts		1
	check_interval			1
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		1
	notification_period		24x7
	notification_options		d,u,
	notifications_enabled		1
	register			1
	}	

define hostescalation {
	#	config_name	Text notification - SS team
		host_name                     		Curts Tester
		hostgroup_name                		Critical Systems
		contacts                      		jcalford1,srvsupport
		first_notification            		3
		last_notification             		0
		notification_interval         		1
		escalation_period             		24x7
		escalation_options            		u,r,d,
	}	

I changed the Host config to be the following and it started working correctly.

Code: Select all

define host {
	host_name			Curts Tester
	use				HOU Hosts
	address				dt-00112233
	max_check_attempts		1
	check_interval			1
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		1
	notification_period		24x7
	notification_options		d,u,r,
	notifications_enabled		1
	register			1
	}	

Just trying to get my head around expected results with escalations.

Thanks.

Re: Escalations not notifying of recovery

Posted: Wed Jun 15, 2016 4:20 pm
by lmiltchev
I believe this is by design. If something is not set on the host/service, then it won't matter if it is set on an escalation. You will need to have "recovery" set on the host too (if you want to escalate on recovery).
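
As a rough sketch of what I mean (the host name, address, and contact below are placeholders, and "generic-host" is assumed to be a template that already exists in your config), the "r" option has to appear in both places:

Code: Select all

# Hypothetical example: recovery ("r") enabled on the host AND on the escalation
define host {
	use				generic-host
	host_name			example-host
	address				192.0.2.10
	notification_options		d,u,r
	}

define hostescalation {
	host_name			example-host
	contacts			example-contact
	first_notification		3
	last_notification		0
	notification_interval		1
	escalation_options		d,u,r
	}

If "r" is missing from notification_options on the host, I would expect the recovery notification to be dropped before the escalation is even considered.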

Re: Escalations not notifying of recovery

Posted: Thu Jun 16, 2016 7:27 am
by SavaSC
Here is the bit from the documents that is confusing me:
https://assets.nagios.com/downloads/nag ... tions.html
If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery.
What is actually happening is that unless both the normal notifications and the escalation are set to notify on Recovery then none notify. If they are both set, then both notify. It's an all or nothing situation. The documentation doesn't cover what happens if you have different settings between the normal notifications and the escalated ones.

I just wanted to make sure that this is a "feature" and not a "bug". Also, to document it in case someone else might have the same issue.

If this is normal activity, you may close this thread.

Thanks!

Re: Escalations not notifying of recovery

Posted: Thu Jun 16, 2016 9:26 am
by lmiltchev
If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery.
This was a very specific example about overlapping escalation ranges (see the paragraph above it), which I believe has nothing to do with your particular issue.
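For context, the kind of definition that paragraph refers to looks roughly like this. It is only a sketch and not copied from the documentation: the host name, service description, and interval are placeholders, and only the contact group names come from the quoted text.

Code: Select all

# Hypothetical service escalation covering notifications 3 through 5
define serviceescalation {
	host_name			example-host
	service_description		HTTP
	first_notification		3
	last_notification		5
	notification_interval		90
	contact_groups			nt-admins,managers
	}

With a range like this, a recovery that follows the third problem notification goes to nt-admins and managers, because they were the ones notified about the problem.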
SavaSC wrote:What is actually happening is that unless both the normal notifications and the escalation are set to notify on Recovery then none notify. If they are both set, then both notify. It's an all or nothing situation.
Do your "regular" contacts get notified? Only the escalated ones should receive email notifications with this setup. You have:
define hostescalation {
	#	config_name	Text notification - SS team
		host_name                     		Curts Tester
		hostgroup_name                		Critical Systems
		contacts                      		jcalford1,srvsupport
		first_notification            		3
		last_notification             		0
		notification_interval         		1
		escalation_period             		24x7
		escalation_options            		u,r,d,
	}
Setting the last notification value to zero should keep escalating notifications "forever" (not switching back to "regular" notifications). Here's a quote from our documentation:
Hostescalation - last notification

This directive is a number that identifies the last notification for which this escalation is effective. For instance, if you set this value to 5, this escalation will not be used if more than five notifications are sent out for the host. Setting this value to 0 means to keep using this escalation entry forever (no matter how many notifications go out).

Parameter name: last_notification
Required: yes
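
To make the difference concrete, here is a minimal sketch with two escalations (the host and contact names are placeholders, not taken from your config):

Code: Select all

# Hypothetical: used only for notifications 3 through 5
define hostescalation {
	host_name			example-host
	contacts			example-contact-a
	first_notification		3
	last_notification		5
	notification_interval		1
	}

# Hypothetical: used from notification 3 onward, with no upper limit
define hostescalation {
	host_name			example-host
	contacts			example-contact-b
	first_notification		3
	last_notification		0
	notification_interval		1
	}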

Re: Escalations not notifying of recovery

Posted: Thu Jun 16, 2016 9:40 am
by SavaSC
lmiltchev wrote:Do your "regular" contacts get notified? Only the escalated ones should receive email notifications with this setup.
Normal notification set to not notify on Recovery and escalations set to notify on Recovery = no notification for anyone on recovery.
Both set to notify on Recovery = Both normal and escalation contacts get notified on Recovery.
lmiltchev wrote:Setting the last notification value to zero should keep escalating notifications "forever" (not switching back to "regular" notifications).
This is what I thought as well. It doesn't seem to be working that way. I don't have a problem with both getting notified, it is just different from what the documentation said so I thought I'd say something.

Re: Escalations not notifying of recovery

Posted: Thu Jun 16, 2016 9:51 am
by lmiltchev
SavaSC wrote:Both set to notify on Recovery = Both normal and escalation contacts get notified on Recovery.
I asked a co-worker to test this as I was not able to recreate the issue. Meanwhile, we will need a little bit more information. Who is the "regular" contact that received notifications (but was not supposed to)? I don't see any contacts defined on the "Curts Tester" host? Are they defined in the "HOU Hosts" template? Can you post the relevant configs?

Re: Escalations not notifying of recovery

Posted: Thu Jun 16, 2016 10:19 am
by SavaSC
lmiltchev wrote:I asked a co-worker to test this as I was not able to recreate the issue. Meanwhile, we will need a little bit more information. Who is the "regular" contact that received notifications (but was not supposed to)? I don't see any contacts defined on the "Curts Tester" host? Are they defined in the "HOU Hosts" template? Can you post the relevant configs?
Yes, the "regular" contact is set up in the HOU Hosts template. Here is that template.

Code: Select all

define host {
       name                          		HOU Hosts
       alias                         		Default setting for servers in HOU
       parents                       		+HOU-RTR-INT
       hostgroups                    		+ALL-HOUSTON
       check_command                 		check-host-alive!!!!!!!!
       max_check_attempts            		10
       check_interval                		5
       retry_interval                		1
       active_checks_enabled         		1
       passive_checks_enabled        		0
       check_period                  		24x7
       obsess_over_host              		0
       check_freshness               		0
       event_handler_enabled         		null
       flap_detection_enabled        		0
       process_perf_data             		0
       retain_status_information     		1
       retain_nonstatus_information  		1
       contact_groups                		+Oncall
       notification_interval         		60
       notification_period           		24x7
       first_notification_delay      		0
       notification_options          		d,u,r,
       notifications_enabled         		1
       register                    		0

}

The Oncall group sends an email to our On Call mailbox.

Here is the escalation config.

Code: Select all

define hostescalation {
	#	config_name	Text notification - SS team
		host_name                     		Curts Tester
		hostgroup_name                		Critical Systems
		contacts                      		jcalford1
		first_notification            		3
		last_notification             		0
		notification_interval         		1
		escalation_period             		24x7
		escalation_options            		u,r,d,
	}

The contact jcalford1 sends a text via Twilio to my cell phone.

Do you need any other configs?

Re: Escalations not notifying of recovery

Posted: Thu Jun 16, 2016 3:37 pm
by lmiltchev
I am waiting for Matt to finish testing. Meanwhile, I gave host escalations a second try. :)

Here are my configs.

host

Code: Select all

define host {
	host_name			CentOS6-NRPE
	use				xiwizard_linuxserver_host_copy_1
	address				x.x.x.x
	parents				xxxxxxx
	max_check_attempts		1
	check_interval			1
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		1
	notification_period		xi_timeperiod_24x7
	first_notification_delay	0
	notification_options		d,u,r,
	notifications_enabled		1
	icon_image			centos.png
	statusmap_image			centos.png
	_xiwizard			linux-server
	register			1
	}

host template

Code: Select all

define host {
       name                          		xiwizard_linuxserver_host_copy_1
       check_command                 		check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
       use                           		my_custom_template
       contact_groups                		admins
       register                    		0

}

contact group

Code: Select all

define contactgroup {
	contactgroup_name             		admins
	alias                         		Nagios Administrators
	members                       		ludmil
	}

host escalation

Code: Select all

define hostescalation {
	#	config_name	test
		host_name                     		CentOS6-NRPE
		hostgroup_name                		linux-servers
		contacts                      		nagiosadmin
		first_notification            		3
		last_notification             		0
		notification_interval         		1
		escalation_period             		24x7
		escalation_options            		u,r,d,
	}

So, my "regular" contact is "ludmil", and my "escalated" contact is "nagiosadmin". On the "CentOS6-NRPE" host, I ran from the command line:

Code: Select all

ifdown eth0

and waited...

Two "regular" notifications were sent to ludmil. The third one got escalated to "nagiosadmin". I verified that all of the following notifications are indeed escalated and sent to "nagiosadmin", then I brought the interface up on the host:

Code: Select all

ifup eth0

When the host recovered, the host recovery notification was only sent to "nagiosadmin" (escalated notification). The "ludmil" contact didn't receive an email.
example01.PNG
To me, this is working as expected. If you want to continue troubleshooting the issue, most probably you will need to open a ticket in our email ticketing system. It is going to be easier/faster to fix the issue via a remote session.

Re: Escalations not notifying of recovery

Posted: Mon Jun 20, 2016 7:17 am
by SavaSC
Thank you for looking into this. Since you cannot reproduce the error, there must be something wonky with our setup. As long as I can get the recovery to report, even if it's to everyone, I'm OK with it.

Again, I appreciate you taking so much time on this. You may mark this completed.