Delay check of unreachable hosts

leighr
Posts: 2
Joined: Tue Dec 04, 2012 11:23 pm

Delay check of unreachable hosts

Post by leighr »

Hi,

I work in a company with approximately 70 remote sites. Each site has a monitoring setup similar to the following:


Router - Physical host
       - ESX Host - File server
                  - Print server
                  - SCCM server

Appropriate parent/child relationships have been defined so that in the event of a power or network outage, the physical hosts, ESX host and VMs all go to unreachable, with a ticket raised to our service desk. For network issues, this is fine.
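
For readers following along, a hierarchy like the one above is normally expressed with the `parents` directive in the host definitions. The fragment below is only a sketch: the host names, addresses and the `generic-host` template are hypothetical placeholders, not the actual configuration used at these sites.

```
# Hypothetical site "site01"; names and addresses are placeholders.
define host {
    use        generic-host        ; assumed template name
    host_name  site01-router
    address    192.0.2.1
}

define host {
    use        generic-host
    host_name  site01-esx
    address    192.0.2.10
    parents    site01-router       ; becomes UNREACHABLE if the router is DOWN
}

define host {
    use        generic-host
    host_name  site01-fileserver
    address    192.0.2.21
    parents    site01-esx          ; guest VM sits behind the ESX host
}
```

The print and SCCM servers would follow the same pattern as the file server, each with the ESX host as its parent.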

For power outages that go on long enough, the UPS sends shutdown commands to all the relevant hosts and everything shuts down gracefully. When power comes back, everything restarts automatically. My issue is that the router powers on in about a minute, so it comes back first. Nagios then checks the physical hosts and the ESX host, finds them down, and raises a ticket for each downed server. Once the ESX host finishes booting, Nagios checks the guests. The guests need a bit of time to boot, though, so we normally receive a ticket for each guest as well.

Is it possible to delay the initial check when a server is unreachable? For example, if it can be delayed by 5 minutes, all the hosts will have booted and 95% of the time everything has started correctly, in which case no alerts are required. Obviously, for the 5% of the time when something doesn't go as planned, we need to be notified.

I'm running Nagios 3.4.1 on Ubuntu 12.04.

Thanks,

Leigh.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Delay check of unreachable hosts

Post by jsmurphy »

You can't do what you've suggested, but there is a first_notification_delay option. I've never used it, so I'm not sure whether it cancels the notification if the host comes back up within the delay interval... I can't imagine it being a particularly useful option if it didn't. Might be worth trying it out to see whether it solves your problem?
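
As a rough illustration of how that option might be set, here is a minimal sketch of a host definition. The host name, address, parent and `generic-host` template are hypothetical, and the value of 5 assumes the default `interval_length` of 60 seconds, so that one time unit equals one minute. Whether a recovery inside the delay window suppresses the ticket is worth confirming on a test host before rolling it out.

```
# Sketch only: names, address and template are placeholders.
define host {
    use                       generic-host       ; assumed template name
    host_name                 site01-fileserver  ; hypothetical host
    address                   192.0.2.21
    parents                   site01-esx
    first_notification_delay  5                  ; time units; 5 minutes if interval_length=60
}
```

The same directive exists for service definitions, so services on the slow-booting guests could be given a matching delay.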

There are possibly some other ways to work around this, but they would require some fairly complicated changes to your configuration and monitoring.
leighr
Posts: 2
Joined: Tue Dec 04, 2012 11:23 pm

Re: Delay check of unreachable hosts

Post by leighr »

Not quite what I was hoping for, but thanks for the reply.

I'll have a play with the first_notification_delay option.

Thanks,

Leigh.