Delay check of unreachable hosts
Posted: Wed Dec 05, 2012 12:15 am
Hi,
I work in a company with approximately 70 remote sites. Each site has a monitoring setup similar to the following:
    Router - Physical host
           - ESX Host - File server
                      - Print server
                      - SCCM server
Appropriate parent/child relationships have been defined so that in the event of a power or network outage, the physical hosts, ESX host and VMs all go to unreachable, with a ticket raised to our service desk. For network issues, this is fine.
For power outages that go on long enough, the UPS sends shutdown commands to all the relevant hosts and everything shuts down gracefully. When power comes back, everything automatically restarts. My issue lies in the fact that the router has a power-on time of about a minute. So, the router comes back first, then Nagios checks the physical hosts and ESX host. Finding them down, it then raises a ticket for each of the downed servers. Once the ESX host finishes booting, Nagios then checks the guests. However, the guests need a bit of time to boot, so we'll normally receive a ticket for each guest.
Is it possible to delay the initial check when a server is unreachable? For example, if it can be delayed by 5 minutes, all the hosts will have booted and 95% of the time everything has started correctly, in which case no alerts are required. Obviously, for the 5% of the time when something doesn't go as planned, we need to be notified.
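As a rough sketch of the kind of setting being asked about: the Nagios 3 object documentation describes a host directive, first_notification_delay, which holds back the first problem notification for a number of time units after a host leaves the UP state. The fragment below assumes tickets are raised through ordinary host notifications. The host name, alias, address and the "generic-host" template are hypothetical placeholders, and whether this gives the wanted behaviour for hosts that pass through UNREACHABLE during a site power-up is untested here.

```cfg
# Hypothetical host; "generic-host" is assumed to be a template that
# supplies the required check and notification settings.
# first_notification_delay is counted in time units, which are minutes
# when interval_length is left at its default of 60.
define host {
    use                       generic-host
    host_name                 site01-file
    alias                     Site 01 file server
    address                   192.0.2.21
    parents                   site01-esx
    first_notification_delay  5
}
```

With a value of 5, a host that is back UP within five minutes of first going down should not produce a problem notification, while one that stays down past that point still would.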
I'm running Nagios 3.4.1 on Ubuntu 12.04.
Thanks,
Leigh.
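For reference, a minimal sketch of how the parent/child relationships described above might be declared for one site. All host names, aliases and addresses are made up for illustration, and "generic-host" is assumed to be a template providing the required check and notification settings; the real definitions may differ.

```cfg
# Hypothetical definitions for a single site. The "parents" directive
# tells Nagios which host sits between the monitoring server and this one.
define host {
    use        generic-host
    host_name  site01-router
    alias      Site 01 router
    address    192.0.2.1
}

define host {
    use        generic-host
    host_name  site01-physical
    alias      Site 01 physical host
    address    192.0.2.10
    parents    site01-router
}

define host {
    use        generic-host
    host_name  site01-esx
    alias      Site 01 ESX host
    address    192.0.2.20
    parents    site01-router
}

# The guests hang off the ESX host; the print and SCCM servers would be
# defined the same way as the file server.
define host {
    use        generic-host
    host_name  site01-file
    alias      Site 01 file server
    address    192.0.2.21
    parents    site01-esx
}
```

With this layout, a router that stops responding leaves everything behind it UNREACHABLE rather than DOWN, which matches the behaviour described for network outages.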