Host DOWN doesn't send alert after escalation

emi65
Posts: 119
Joined: Fri Aug 17, 2012 3:41 am


Post by emi65 »

Hi experts,

I'm using Nagios Core 4.4.5 on Red Hat 7.

I configured a host check to test whether the host is alive every 2 minutes:
Check Interval: 2 min
Retry Check: 1 min
Max Check Attempts: 3

So, in case of a problem, Nagios sends an email alert after 5 minutes.

I also set the Notification Interval to 20 minutes.
The expected behavior is to send an email every 20 minutes as long as the host status doesn't change.

I set up a host escalation with these parameters:
First Notification: 2
Last Notification: 0
Notification Interval: 0

This way, an SMS alert is sent starting from the 2nd notification.

This works correctly.
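The escalation range described above can be sketched in Python. This is a simplified model, not Nagios source code, and the function name `escalation_applies` is hypothetical. It encodes the documented rule that a last_notification of 0 leaves the escalation active with no upper bound, so with first_notification 2 the escalation covers notification #2 and every notification after it.

```python
# Simplified model (not Nagios source code) of how a hostescalation's
# first_notification/last_notification range is matched against a
# notification number. last_notification = 0 means "no upper bound".

def escalation_applies(notification_number: int, first_notification: int,
                       last_notification: int) -> bool:
    """True if an escalation with this range covers the given notification."""
    if notification_number < first_notification:
        return False
    # last_notification = 0 keeps the escalation active indefinitely.
    return last_notification == 0 or notification_number <= last_notification


if __name__ == "__main__":
    # Escalation from the post: first_notification 2, last_notification 0.
    for n in range(1, 6):
        if escalation_applies(n, 2, 0):
            target = "escalation contacts (SMS in this setup)"
        else:
            target = "host contacts (email)"
        print(f"notification #{n}: {target}")
```

Because the escalation never stops applying, its own notification_interval is the one that matters for every notification from #2 onward.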

The email notification and the SMS notification (escalation) are sent, but
after these 2 notifications nothing more is sent.
I expected an email every 20 minutes, since the Notification Interval is set to 20 minutes.

I've attached the host .cfg file for this host.

It seems that the host notification_interval doesn't work.

Could someone help me?

Thanks
Emilio
Attachments
SISVRASTSX404W8.cfg (2.04 KiB)
emi65
Posts: 119
Joined: Fri Aug 17, 2012 3:41 am

Re: Host DOWN doesn't send alert after escalation

Post by emi65 »

Does anyone have a suggestion for this situation?

After the host goes down, I receive one email and no others until the host comes back up.

Why don't I receive any email after the time set in notification_interval?

Thanks
Emilio
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Host DOWN doesn't send alert after escalation

Post by mcapra »

emi65 wrote: I configured an host alert to check host alive each 2 minutes
Check Interval 2 min
Retry Check 1 min
Max Check Attempts 3
I think this would have Nagios dispatch an alert no later than 4 minutes after the problem starts, in the worst case where the problem begins immediately after the last ~2 min check execution. Based on my interpretation of the documentation, it sounds like max_check_attempts includes the first problematic check (the one before the retry_interval checks start):
https://assets.nagios.com/downloads/nag ... tions.html
This directive is used to define the number of times that Nagios will retry the service check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the service check again.
There was a point in my life when I knew definitively whether or not retry_interval was inclusive of that initial problematic check, but I can't remember :P
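A rough way to check this arithmetic is a small Python helper (hypothetical, not part of Nagios). It assumes max_check_attempts counts the first failing check, that retries run every retry_interval minutes, and that checks run exactly on schedule with no latency or execution time.

```python
# Simplified timing model (an assumption, not Nagios source code): minutes from
# the moment a host goes down until the problem reaches a HARD state, which is
# when the first notification can be sent.

def minutes_to_hard_state(check_interval: float, retry_interval: float,
                          max_check_attempts: int) -> tuple[float, float]:
    """Return (best_case, worst_case) minutes from problem onset to HARD state."""
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    # Assumption: max_check_attempts includes the first failing check,
    # so only max_check_attempts - 1 retries follow it.
    retries = max_check_attempts - 1
    retry_time = retries * retry_interval
    # Best case: the problem starts just before a scheduled check runs.
    best = 0.0 + retry_time
    # Worst case: the problem starts just after a check ran, so it waits a
    # full check_interval before the first failing check.
    worst = check_interval + retry_time
    return best, worst


if __name__ == "__main__":
    best, worst = minutes_to_hard_state(check_interval=2, retry_interval=1,
                                        max_check_attempts=3)
    print(f"HARD state between {best:g} and {worst:g} minutes after onset")
```

With check_interval 2, retry_interval 1 and max_check_attempts 3 this model gives 2 to 4 minutes from onset to the HARD state. The 5 minutes from the original post would correspond to counting three retries after the first failing check instead of two.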
emi65 wrote:Seems that the host notification_interval doesn't work
I believe your understanding of how the first_notification and last_notification directives work is correct.

However, when notification_interval is set to 0, per the docs:
If you specify a value of 0 for the interval, Nagios will send the first notification when this escalation definition is valid, but will then prevent any more problem notifications from being sent out for the host. Notifications are only sent out when the host recovers.
It sounds like you actually want your notification_interval for the hostescalation definition to be 20, if you want that escalation to repeat every 20 minutes forever until the problem is solved. The notification_interval for the hostescalation is probably superseding the notification_interval for the host in this case.
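To illustrate why an escalation notification_interval of 0 stops the repeats, here is a simplified Python model. It is an assumption about the scheduling logic, not Nagios source code, and `Escalation` and `notification_times` are hypothetical names. After each notification it looks at the escalations covering that notification number, uses the smallest of their notification_interval values (as the escalation docs describe for overlapping entries), and falls back to the host's notification_interval when none apply. An interval of 0 ends further problem notifications.

```python
# Simplified model (assumption, not Nagios source code) of repeat problem
# notifications for a host that stays DOWN.
from dataclasses import dataclass


@dataclass
class Escalation:
    first_notification: int
    last_notification: int        # 0 = no upper bound
    notification_interval: float  # minutes; 0 = no more problem notifications

    def covers(self, n: int) -> bool:
        return n >= self.first_notification and (
            self.last_notification == 0 or n <= self.last_notification)


def notification_times(host_interval: float, escalations: list[Escalation],
                       first_at: float, down_for: float) -> list[float]:
    """Minutes (from onset) at which problem notifications go out."""
    times: list[float] = []
    t, n = first_at, 1
    while t <= down_for:
        times.append(t)
        # The interval to the next notification depends on the escalations
        # that cover the notification just sent.
        active = [e for e in escalations if e.covers(n)]
        interval = (min(e.notification_interval for e in active)
                    if active else host_interval)
        if interval == 0:
            break  # nothing more until the host recovers
        t += interval
        n += 1
    return times


if __name__ == "__main__":
    # Host notification_interval 20, first notification ~4 minutes after
    # onset, host stays down for two hours.
    original = [Escalation(2, 0, 0)]    # escalation interval 0, as posted
    suggested = [Escalation(2, 0, 20)]  # escalation interval 20
    print(notification_times(20, original, first_at=4, down_for=120))
    print(notification_times(20, suggested, first_at=4, down_for=120))
```

Under these assumptions the first call returns [4, 24]: one notification paced by the host interval, then escalated notification #2, then nothing until recovery, which matches the behavior described in the thread. The second call returns [4, 24, 44, 64, 84, 104].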

As an aside, I'm not 100% sure why that directive is required. I'd think for the particular hostescalation/servicescalation, if the notification_interval directive is not defined, you could just inherit whatever the associated host/service triggering the escalation has defined.
Former Nagios employee
https://www.mcapra.com/