serviceescalation not working
Posted: Thu Oct 25, 2018 9:35 am
by kwhogster
Hi All,
I have configured Nagios to notify an email-only group for warnings after hours, and to send both email and SMS for criticals during the day and after hours. What I have noticed is that I am getting both email and SMS notifications for warnings after hours.
Nagios 4.3.4
Nsclient 5.1.44
Windows 2012/2012R2/2016
# e-drive.proto
define service{
    use                     generic-service
    host_name               hostname
    service_description     E:\ Drive Space
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    check_interval          5
    retry_interval          1
    contact_groups          NT-admins
    notification_interval   120
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_win_disk!E
}
define serviceescalation{
    host_name               hostname
    service_description     E:\ Drive Space
    notification_interval   0
    contact_groups          NT-admins-email
    escalation_period       my-nonworkhours
    escalation_options      w,r
}
define serviceescalation{
    host_name               hostname
    service_description     E:\ Drive Space
    first_notification      0
    last_notification       0
    notification_interval   120
    contact_groups          NT-admins
    escalation_period       my-nonworkhours
    escalation_options      u,c,r
}
define servicedependency {
    host_name                       hostname
    service_description             NRPE Status
    dependent_service_description   E:\ Drive Space
    execution_failure_criteria      u,c,p,w
    notification_failure_criteria   u,c,p,w
    dependency_period               24x7
}
Any thoughts?
Thank you
Tom
Re: serviceescalation not working
Posted: Thu Oct 25, 2018 4:30 pm
by scottwilkerson
kwhogster wrote: What I have noticed is that I am getting both email and SMS warning notifications after hours.
But are they escalated? Your service config contains these items:
Code: Select all
contact_groups NT-admins
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
which would trigger warning notifications to the members of the NT-admins contact group 24x7.
Re: serviceescalation not working
Posted: Fri Oct 26, 2018 12:29 pm
by kwhogster
Hello Scott,
According to my understanding of the docs, the service escalation should override the settings of the service during the specified times. I understand your question of whether or not the serviceescalation executed; I am just not sure how to test that. Based on the info below, I believe I set up the serviceescalation correctly.
https://assets.nagios.com/downloads/nag ... tions.html
Once a notification is escalated, the contact/groups and notification options for the object will be overridden by the escalation's settings.
contact_groups: This directive is used to identify the short name of the contact group that should be notified when the service notification is escalated. Multiple contact groups should be separated by commas. You must specify at least one contact or contact group in each service escalation definition.
notification_interval: This directive is used to determine the interval at which notifications should be made while this escalation is valid. If you specify a value of 0 for the interval, Nagios will send the first notification when this escalation definition is valid, but will then prevent any more problem notifications from being sent out for the host. Notifications are only sent out when the host recovers. This is useful if you want to stop having notifications sent out after a certain amount of time. Note: If multiple escalation entries for a host overlap for one or more notification ranges, the smallest notification interval from all escalation entries is used.
escalation_period: This directive is used to specify the short name of the time period during which this escalation is valid. If this directive is not specified, the escalation is considered to be valid during all times.
escalation_options: This directive is used to define the criteria that determine when this service escalation is used. The escalation is used only if the service is in one of the states specified in this directive. If this directive is not specified in a service escalation, the escalation is considered to be valid during all service states. Valid options are a combination of one or more of the following: r = escalate on an OK (recovery) state, w = escalate on a WARNING state, u = escalate on an UNKNOWN state, and c = escalate on a CRITICAL state. Example: If you specify w in this field, the escalation will only be used if the service is in a WARNING state.
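For reference, the escalation_period directive quoted above points at a timeperiod object by its short name. The thread does not show how my-nonworkhours is defined, so the following is only a sketch of what such a timeperiod might look like. The alias and every time range are assumed values (weekday business hours of 08:00-17:00), not the actual settings in use here.

```
# Hypothetical timeperiod; all hours below are assumptions for illustration.
define timeperiod{
    timeperiod_name     my-nonworkhours
    alias               Outside business hours
    sunday              00:00-24:00
    monday              00:00-08:00,17:00-24:00
    tuesday             00:00-08:00,17:00-24:00
    wednesday           00:00-08:00,17:00-24:00
    thursday            00:00-08:00,17:00-24:00
    friday              00:00-08:00,17:00-24:00
    saturday            00:00-24:00
}
```

An escalation that names this timeperiod in escalation_period is only considered valid while the current time falls inside one of the listed ranges.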
Re: serviceescalation not working
Posted: Fri Oct 26, 2018 1:09 pm
by scottwilkerson
From the document you linked:
When Are Notifications Escalated?
Notifications are escalated if and only if one or more escalation definitions matches the current notification that is being sent out. If a host or service notification does not have any valid escalation definitions that applies to it, the contact group(s) specified in either the host group or service definition will be used for the notification.
Your notification does not have a valid escalation definition that applies to it if the notification is a WARNING and the only escalation whose escalation_options contain the w flag is valid only for a specific timeperiod.
Re: serviceescalation not working
Posted: Fri Oct 26, 2018 2:58 pm
by kwhogster
Hi Scott,
By design, during business hours we notify by both email and SMS for c,w,r,u. After hours we want a single email for warning conditions (w,r to email only), while critical and unknown states (c,u,r) still go to both email and SMS. Can you elaborate on my not having a valid escalation definition?
HTH
Tom
Re: serviceescalation not working
Posted: Fri Oct 26, 2018 3:11 pm
by scottwilkerson
As I understand your setup, if a notification comes in during my-nonworkhours, the following will not match, as it doesn't have a first_notification number set:
Code: Select all
define serviceescalation{
    host_name               hostname
    service_description     E:\ Drive Space
    notification_interval   0
    contact_groups          NT-admins-email
    escalation_period       my-nonworkhours
    escalation_options      w,r
}
This will cause the regular notification to trigger, which is set in the service:
Code: Select all
# e-drive.proto
define service{
    use                     generic-service
    host_name               hostname
    service_description     E:\ Drive Space
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    check_interval          5
    retry_interval          1
    contact_groups          NT-admins
    notification_interval   120
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_win_disk!E
}
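One way to apply this, sketched under assumptions, is to give the warning/recovery escalation an explicit notification range so that it can match. The two added values are not from the thread: first_notification 1 is assumed so that the escalation is effective from the first notification onward, and last_notification 0 follows the documented meaning of "use this escalation indefinitely". Whether this change alone produces the intended after-hours routing was not confirmed in the thread.

```
# Sketch only: the first_notification / last_notification values are assumed.
define serviceescalation{
    host_name               hostname
    service_description     E:\ Drive Space
    first_notification      1       ; assumed: effective from the first notification
    last_notification       0       ; 0 = escalation is used indefinitely
    notification_interval   0
    contact_groups          NT-admins-email
    escalation_period       my-nonworkhours
    escalation_options      w,r
}
```

After editing, the configuration can be checked with the usual pre-flight verification (nagios -v followed by the path to the main config file) before restarting Nagios.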
Re: serviceescalation not working
Posted: Fri Oct 26, 2018 3:28 pm
by kwhogster
Hello Scott,
Thanks for your help.
We will make the change and post our results.
Tom
Re: serviceescalation not working
Posted: Fri Oct 26, 2018 3:31 pm
by scottwilkerson
Sounds good