parent/child/dependencies and best practices
Posted: Thu Nov 01, 2018 7:15 am
Greetings.
I have a nagios server named Alpha. It can get to hosts Delta, Echo, and Foxtrot. Delta is a "bastion" host. It can get to hosts Tango, Uniform, and Victor; but Alpha cannot get to those hosts directly.
Since Alpha cannot get to Tango/Uniform/Victor, I have checks set up like this:
Code:
define host{
    use             linux-server    ; Inherit default values from a template
    host_name       tango           ; The name we're giving to this host
    check_command   check_nrpe_remote!delta!check_by_nrpe_ping_$HOSTNAME$ -t 20
    address         192.168.10.166  ; IP address of the host
    parents         delta
}
And Delta has nrpe configured like this:
Code:
command[check_by_nrpe_ping_tango]=/usr/lib64/nagios/plugins/check_ping -H tango -w 100,20% -c 500,60%
command[check_by_nrpe_ping_uniform]=/usr/lib64/nagios/plugins/check_ping -H uniform -w 100,20% -c 500,60%
command[check_by_nrpe_ping_victor]=/usr/lib64/nagios/plugins/check_ping -H victor -w 100,20% -c 500,60%
And this works, for the most part.
The problem is, we had a situation where Delta was UP but Delta's nrpe wasn't running, so nagios could not check Tango or any of Tango's services. This generated a bunch of incorrect alerts that Tango was DOWN, when it was in fact UNREACHABLE.
As I understand it, the parent/child relationship typically expects the "parent" to be a switch that allows/prevents access, and if the switch is down, we assume the hosts behind it are unreachable. That's not the case here.
I could use dependencies. However, as I understand it, a host (Tango) cannot be dependent on a service (Delta's nrpe port answering queries). A host can be dependent on a host, or a service can be dependent on a service.
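For reference, a host-to-host dependency would look roughly like the sketch below. This is only an illustration of the object type, not something I have in place, and it would not cover the case above, because Delta itself stays UP when only its nrpe daemon is down.
Code:
define hostdependency{
    host_name                       delta   ; the host being depended on
    dependent_host_name             tango   ; the host that depends on it
    notification_failure_criteria   d,u     ; suppress Tango notifications while Delta is DOWN or UNREACHABLE
}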
So it sounds like my course of action is to stop doing "host checks" for Tango, only do service checks, and make every Tango service dependent on Delta's nrpe service. But that seems inefficient and unwieldy.
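To show what I mean, here is a minimal sketch of one such service dependency. The service descriptions "NRPE" (on Delta) and "Disk Usage" (on Tango) are made-up names for illustration; I would need one definition like this for every service on Tango, Uniform, and Victor.
Code:
define servicedependency{
    host_name                       delta        ; master host
    service_description             NRPE         ; hypothetical name of Delta's nrpe service check
    dependent_host_name             tango        ; host behind the bastion
    dependent_service_description   Disk Usage   ; hypothetical Tango service
    execution_failure_criteria      w,u,c        ; skip the dependent check while the nrpe service is not OK
    notification_failure_criteria   w,u,c        ; and suppress its notifications
}
That covers the services, but it still leaves the Tango host check itself with nothing to depend on, which is the part I am stuck on.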
Is there a better option for my configuration? I'm not opposed to redesigning the nagios structure if I need to (though I'd rather not). I'd like to find a way to make the host Tango dependent on the service Delta-nrpe.
Thanks.