Forcing a check on a host when another goes down.

MichelCote
Posts: 1
Joined: Wed Apr 13, 2011 6:34 am

Post by MichelCote

Hi everyone,

I've been using Nagios Core for a few years now to monitor my company's network.

Basically I have about 650 stores each with an internet connection on our VPN.

Each router is set up as two separate hosts: one checks the xxx.xxx.xxx.1 address (the main gateway, which I call "ro") and the other checks a loopback IP, xxx.xxx.xxx.254 (which I call "lo"). All our routers have a DNS name consisting of a regional letter (Q for Québec, A for Atlantic; we're located in Québec, Canada), then the store number, then either "ro" or "lo" to reach the proper IP.

As one of the front ends for our support techs, I have a webpage that polls a MySQL DB, which is updated by an event handler whenever a host goes down or comes back up. The webpage combines the results for each "ro" and "lo" pair to show the router as either up (not displayed), down (neither IP responding, shown in red), on loopback ("ro" not responding but "lo" is, shown in purple), or on cell backup ("ro" responding but not "lo", shown in blue). As you can guess, we have a cell backup for when the main DSL or cable line in a store fails. Since the cell backup costs a lot more to use, we want to know when this happens.

Anyway... Now for the actual question.

Since the checks for "ro" and "lo" are not always synchronized, a store can briefly show as on cell backup or on loopback when it is actually completely down or up, while Nagios catches up on both checks.

I would like to know if there is a way to force a check on another host when one goes down or comes back up, so that the time between the two checks is as short as possible.

So for example, on a "ro" host I would have OnDown -> check "lo" and OnRecover -> check "lo", and the same in reverse on the "lo" hosts.

Hope I've been clear enough and thanks for any reply.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Forcing a check on a host when another goes down.

Post by abrist

As you are already familiar with event handlers, you should be able to use one to do what you are looking for. When host 'a' checks as 'down', are you always looking to check host 'b' next? Or is the second host to be checked dynamic? If it is always static (the same host every time host 'a' goes down), you could write an event handler that writes to the Nagios command pipe, forcing a check on host 'b'.

Code:

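#!/bin/sh

# Current time as a Unix timestamp (command substitution, not a quoted string)
NOW=$(date +%s)

# Ask Nagios to schedule a check of host b right now
echo "[$NOW] SCHEDULE_HOST_CHECK;<host b>;$NOW" \
>/usr/local/nagios/var/rw/nagios.cmd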
You will probably want some additional logic to check whether host 'a' is in a down state, so you will have to pass the event handler a few macros pertaining to host 'a' (most likely the state and the hostname).
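For illustration, here is a rough sketch of what that extra logic might look like. It has not been tested against a live Nagios install and rests on several assumptions: the script is called with $HOSTNAME$ and $HOSTSTATE$ as its first two arguments, the Nagios host names follow the DNS naming from the first post (ending in "ro" or "lo"), and the command file lives at the path shown above. The function names are made up for this example. It uses SCHEDULE_FORCED_HOST_CHECK, the forced variant of the external command, since the goal is an immediate check.

```sh
#!/bin/sh
# Hypothetical event handler, assumed usage: <script> <hostname> <hoststate>

# Command file path taken from the post above; adjust for your install.
CMDFILE="${CMDFILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Print the partner host name by swapping the "ro"/"lo" suffix.
# Returns non-zero if the name has neither suffix.
partner_of() {
    case "$1" in
        *ro) printf '%slo\n' "${1%ro}" ;;
        *lo) printf '%sro\n' "${1%lo}" ;;
        *)   return 1 ;;
    esac
}

# Print one external command line: $1 = host to check, $2 = epoch time.
force_check_cmd() {
    printf '[%s] SCHEDULE_FORCED_HOST_CHECK;%s;%s\n' "$2" "$1" "$2"
}

# React only to DOWN and UP (recovery); ignore anything else.
handle_event() {
    host=$1
    state=$2
    case "$state" in
        UP|DOWN) ;;
        *) return 0 ;;
    esac
    partner=$(partner_of "$host") || return 0
    now=$(date +%s)
    force_check_cmd "$partner" "$now" >> "$CMDFILE"
}

# Run only when called with arguments, so the file can also be sourced.
if [ "$#" -ge 2 ]; then
    handle_event "$@"
fi
```

To hook something like this up, a command definition would pass $HOSTNAME$ and $HOSTSTATE$ on its command_line, and each "ro" and "lo" host would reference that command in its event_handler directive. As far as I know a host takes only one event_handler, so the existing MySQL update would probably have to be called from the same script.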
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.