I work in a company with approximately 70 remote sites. Each site has a monitoring setup similar to the following:
Router - Physical host
       - ESX Host - File server
                  - Print server
                  - SCCM server
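In Nagios terms, a layout like this is usually expressed with the `parents` directive. The sketch below is illustrative only: the host names, the addresses (taken from the 192.0.2.0/24 documentation range) and the `generic-host` template are assumptions, not the real configuration.

```cfg
# Hypothetical names and addresses; assumes a "generic-host" template exists.
define host {
    use        generic-host
    host_name  site01-router
    address    192.0.2.1
}

define host {
    use        generic-host
    host_name  site01-esx
    address    192.0.2.10
    parents    site01-router    ; ESX host sits behind the router
}

define host {
    use        generic-host
    host_name  site01-fileserver
    address    192.0.2.11
    parents    site01-esx       ; guest depends on the ESX host
}
```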
For power outages that last long enough, the UPS sends shutdown commands to all the relevant hosts and everything shuts down gracefully. When power comes back, everything restarts automatically. My issue is that the router has a power-on time of about a minute. So the router comes back first, and Nagios then checks the physical hosts and the ESX host. Finding them down, it raises a ticket for each of the downed servers. Once the ESX host finishes booting, Nagios checks the guests. However, the guests need a bit of time to boot, so we normally receive a ticket for each guest as well.
Is it possible to delay the initial check when a server is unreachable? For example, if it can be delayed by 5 minutes, all the hosts will have booted and 95% of the time everything has started correctly, in which case no alerts are required. Obviously, for the 5% of the time when something doesn't go as planned, we need to be notified.
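To make the request concrete, the behaviour I'm after would look something like the sketch below. It assumes the `first_notification_delay` host directive from the Nagios 3 object definitions is the relevant knob, and that `interval_length` is left at its default of 60 seconds so the value is in minutes. The host name and the `generic-host` template are placeholders. I haven't confirmed this is the right mechanism, which is really the question.

```cfg
# Hypothetical host; assumes a "generic-host" template exists.
define host {
    use                       generic-host
    host_name                 site01-esx
    address                   192.0.2.10
    parents                   site01-router
    first_notification_delay  5    ; wait 5 time units before the first problem notification
}
```

The idea is that a host which recovers within those 5 minutes never generates a ticket, while one that stays down still does. The same directive also exists for service definitions.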
I'm running Nagios 3.4.1 on Ubuntu 12.04.
Thanks,
Leigh.