Hi there,
I did some poking around the forums and I think I understand how recurring downtime works, but I'd like to both verify that understanding and identify how to handle my situation.
In short, if I add a host to recurring downtime, the downtime only "activates" if the host goes down during the period that it is configured for.
In my situation, the downtime is intended to cover an automated reboot of a server (not something we want to do, but something we have no control over at the moment). However, we keep getting alerts about services on that server being down. What I believe is happening is that the reboot occurs on schedule, during the recurring downtime period, but it is quick enough that Nagios "misses" the host being down. As a result, the host never truly enters downtime, and that downtime isn't inherited by its services. The services take a bit longer to come back up, so Nagios notices they are down and alerts.
How do we fix this? The easiest way I could think of was to put a * in the service textbox, but Nagios won't accept that. So will I need to add recurring downtime for every service on this box?
Thanks,
Jason
Recurring downtime for hosts (inherit services?)
scottwilkerson
- DevOps Engineer
- Location: Nagios Enterprises
Re: Recurring downtime for hosts (inherit services?)
The easiest way to do this is to make sure the config for the services has Retry interval and Max check attempts set so that
Retry interval * Max check attempts > the Check interval for the host.
A service notification won't go out until Retry interval * Max check attempts has elapsed. If the host check runs before then, the host will be marked down and the service notifications will be suppressed.
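As a rough illustration of that rule, here is a minimal sketch in Nagios Core object definition syntax. The host name, service description, and all of the numbers are made up for the example, and only the timing directives are shown (a real definition needs the other required directives or a template). It also assumes the default interval_length of 60, so the values are in minutes. If you manage the config through the web interface, the same values should correspond to the Check interval, Retry interval, and Max check attempts fields.

```
# Hypothetical host: checked every 5 minutes.
define host {
    host_name            rebooting-server   ; example name only
    check_interval       5
}

# Hypothetical service on that host.
# Retry interval * Max check attempts = 2 * 4 = 8 minutes,
# which is greater than the host's 5-minute check interval.
define service {
    host_name            rebooting-server
    service_description  Example Service    ; example name only
    retry_interval       2
    max_check_attempts   4
}
```

With values like these, the service should still be in its retry window when the next scheduled host check runs, which gives the host check a chance to see the host as down first.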