Unexpected Downtime Behavior
Posted: Mon Jun 11, 2012 11:35 am
We have a VCenter VMWare Server that controls access to a number of child ESX servers. These child servers all have a host dependency set so that checks and alerts are suppressed if the master VCenter Server goes down. (Execution failure criteria=d,u; Notification failure criteria=d,u)
We also have a service dependency set so that if the VMWare Runtime service on the VCenter Server goes critical or unreachable, checks and alerts will be suppressed on the child ESX hosts' services. (Execution failure criteria=u,c; Notification failure criteria=u,c)
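For reference, the two dependencies described above correspond roughly to the following Nagios object definitions. This is only a sketch, not our exported config: the host names and the dependent service description are placeholders, and only one child host and one dependent service are shown.

```
# Host dependency: suppress checks and notifications for the child ESX host
# while the master VCenter Server is DOWN (d) or UNREACHABLE (u).
define hostdependency {
    host_name                       vcenter-server      ; placeholder name for the master host
    dependent_host_name             esx-child-01        ; placeholder name for one child ESX host
    execution_failure_criteria      d,u
    notification_failure_criteria   d,u
}

# Service dependency: suppress checks and notifications for a child ESX service
# while the VMWare Runtime service on the VCenter Server is UNKNOWN (u) or CRITICAL (c).
define servicedependency {
    host_name                       vcenter-server      ; placeholder
    service_description             VMWare Runtime
    dependent_host_name             esx-child-01        ; placeholder
    dependent_service_description   ESX Example Service ; placeholder service description
    execution_failure_criteria      u,c
    notification_failure_criteria   u,c
}
```

Note that in the service dependency criteria, u stands for UNKNOWN rather than unreachable, since services have no unreachable state.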
We set a 30-minute flexible host downtime at 08:07 for the VCenter Server, and specified a non-triggered downtime for all child hosts. At 08:11 we received a notification that the VCenter Server was down and that downtime had started. We then immediately received alerts on all VCenter Server services and on the services of all child ESX servers. We did not receive any host alerts for any of the child ESX servers.
My understanding is that the services are dependent on the host they are run against, so the host downtime should have covered the services also. If this was not the case, I would have thought the service alerts from the child ESX servers would have been suppressed by the service dependencies that were applied against the VCenter's service, which was down. I can supply any applicable configs requested, but does anyone have an idea of what I may have done wrong here?
We are using NagiosXI 2011R2.3.