We have a VCenter VMWare Server that controls access to a number of child ESX servers. Each of these child servers has a host dependency set so that it is not checked and does not alert if the master VCenter Server goes down. (Execution failure criteria=d,u; Notification failure criteria=d,u)
We also have a service dependency set so that if the VMWare Runtime service on the VCenter Server goes crit or unreachable, checks and alerts are suppressed on the child ESX hosts' services. (Execution failure criteria=u,c; Notification failure criteria=u,c)
We set a 30-minute flexible host downtime at 08:07 for the VCenter Server, and specified a non-triggered downtime for all child hosts. At 08:11 we received a notification that the VCenter Server was down and that downtime had started. We then immediately received alerts on all VCenter Server services and all child ESX servers' services. We did not receive any host alerts for any of the child ESX servers.
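To make the failure criteria above easier to reason about, here is a minimal Python sketch of how the option letters can be read. It is a simplified model, not Nagios's own code; the function names are hypothetical. One detail worth checking against the object definition documentation: for service dependencies, `u` is documented as UNKNOWN, whereas for host dependencies `u` is UNREACHABLE.

```python
# Simplified model of Nagios dependency failure criteria.
# Illustrative only: the helper names are hypothetical, not a Nagios API.

# Option letters as documented for hostdependency definitions.
HOST_CRITERIA = {"o": "UP", "d": "DOWN", "u": "UNREACHABLE", "p": "PENDING"}

# Option letters as documented for servicedependency definitions.
# Note that "u" means UNKNOWN here, not UNREACHABLE.
SERVICE_CRITERIA = {
    "o": "OK", "w": "WARNING", "u": "UNKNOWN", "c": "CRITICAL", "p": "PENDING",
}


def failing_states(criteria, kind):
    """Return the master states for which the dependency is considered failed."""
    table = HOST_CRITERIA if kind == "host" else SERVICE_CRITERIA
    letters = [c.strip() for c in criteria.split(",") if c.strip()]
    if "n" in letters:  # "n" means the dependency never fails
        return set()
    return {table[letter] for letter in letters}


def dependency_fails(criteria, kind, master_state):
    """True if checks/notifications for the dependent object would be held back."""
    return master_state in failing_states(criteria, kind)


print(failing_states("d,u", "host"))
print(failing_states("u,c", "service"))
```

Read this way, the service dependency described above would hold back the child services while the VMWare Runtime service is CRITICAL or UNKNOWN, and it says nothing about the state of the VCenter host itself.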
My understanding is that the services are dependent on the host they are run against, so the host downtime should have covered the services also. If this was not the case, I would have thought the service alerts from the child ESX servers would have been suppressed by the service dependencies that were applied against the VCenter's service, which was down. I can supply any applicable configs requested, but does anyone have an idea of what I may have done wrong here?
We are using NagiosXI 2011R2.3.
Unexpected Downtime Behavior
--
Griffin Wakem
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Unexpected Downtime Behavior
Besides the dependencies, do you have parent/child relationships set up for these hosts?
The reason I ask is that the "non-triggered downtime for all child hosts" option you specified when scheduling downtime is based on the parent/child relationship, NOT on dependency relationships.
Re: Unexpected Downtime Behavior
I should have specified that; thanks for reminding me. Yes, all of the children have the VCenter Server in question specified as their only parent.
--
Griffin Wakem
scottwilkerson
Re: Unexpected Downtime Behavior
OK, the only other thing I can think of is that you will need to choose "Schedule triggered downtime for all hosts".
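For anyone scripting this rather than using the web form, Nagios Core has external commands for propagating host downtime to child hosts, in triggered and non-triggered variants. The sketch below only builds the command line; whether the XI form option maps to exactly this command is an assumption, and the host name, author, and timestamps are placeholders.

```python
import time


def propagated_downtime_command(host, start, end, duration_s, author, comment,
                                triggered=True, fixed=False, now=None):
    """Build an external command line that schedules downtime for a host
    and propagates it to that host's children (parent/child relationship)."""
    name = ("SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME" if triggered
            else "SCHEDULE_AND_PROPAGATE_HOST_DOWNTIME")
    stamp = int(time.time()) if now is None else int(now)
    fields = [
        name,
        host,
        str(int(start)),          # start of the downtime window (epoch seconds)
        str(int(end)),            # end of the downtime window (epoch seconds)
        "1" if fixed else "0",    # 0 = flexible downtime
        "0",                      # trigger_id: 0 = not triggered by another downtime
        str(int(duration_s)),     # duration in seconds, used for flexible downtime
        author,
        comment,
    ]
    return "[{}] {}".format(stamp, ";".join(fields))


# Hypothetical example: 30-minute flexible downtime inside a one-hour window.
line = propagated_downtime_command(
    "vcenter01", 1000, 4600, 1800, "nagiosadmin", "vCenter maintenance", now=1000)
print(line)
```

The resulting line would be written to the Nagios external command file, whose path depends on the installation. These commands schedule host downtime; downtime for the services on those hosts is scheduled separately (for example with `SCHEDULE_HOST_SVC_DOWNTIME`).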
Re: Unexpected Downtime Behavior
That was the only thing I could think of also. Next time we do this, I will use triggered downtime instead of non-triggered downtime.
I'm still not clear why the services would alert when the host was down, regardless of downtime scheduled. Shouldn't the services under a host still be dependent on the host being up?
--
Griffin Wakem
scottwilkerson
Re: Unexpected Downtime Behavior
This might have something to do with the service dependencies you have set up, but I'm not totally sure.