Parent Relationship Not Suppressing Downstream Notifications
Posted: Fri Jan 24, 2014 10:27 am
by Smark
Hello,
We have our configuration structured like this:
[Host: Nagios Server]
--> XXX.XXX.217.1 [Host: Nagios' Subnet Uplink]
----> XXX.XXX.218.1 [Host: Remote Host Subnet Uplink]
------> XXX.XXX.218.20 [Host: Remote Host]
--------> [Service: Remote Host Service]
As I understand it, if 218.1 is unreachable, I should not get notifications for 218.20 or the services running on it. If 218.20 appears to be down, Nagios will immediately check its parent (218.1) to see whether it is responding. If it is, Nagios sends an alert for 218.20; if it is not, Nagios checks 217.1, and so on.
Last night we had an outage somewhere above the 218.20 level (either 218.1 or 217.1) and we still got notifications for all of our hosts on the 218 subnet. Is there something I'm missing? At the moment we have 218.20's parent set to the 218.1 host, and 218.1's parent set to the 217.1 host. Is this correct? Where am I going wrong?
Thanks!
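The rule described in this post can be sketched in a few lines of Python. This is only a simplified model, not Nagios' actual code: the `classify` function and the `responding` map are hypothetical names, and real Nagios also factors in soft/hard states and check scheduling. It assumes a host that fails its check is DOWN when at least one parent is UP (or when it has no parents), and UNREACHABLE otherwise.

```python
from typing import Dict, List


def classify(host: str, parents: Dict[str, List[str]], responding: Dict[str, bool]) -> str:
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a host (simplified model)."""
    if responding[host]:
        return "UP"
    host_parents = parents.get(host, [])
    # A failing host with no parents sits next to the monitoring server: DOWN.
    if not host_parents:
        return "DOWN"
    # If any parent is UP, the path is fine and the host itself is the problem.
    if any(classify(p, parents, responding) == "UP" for p in host_parents):
        return "DOWN"
    # Every parent is DOWN or UNREACHABLE, so the host is merely cut off.
    return "UNREACHABLE"


# Simplified chain from the post: 217.1 -> 218.1 -> 218.20
parents = {"217.1": [], "218.1": ["217.1"], "218.20": ["218.1"]}
# Hypothetical outage: the 218.1 uplink stops answering, so 218.20 cannot answer either.
responding = {"217.1": True, "218.1": False, "218.20": False}

for host in parents:
    print(host, classify(host, parents, responding))
```

In this model 218.1 comes out as DOWN and 218.20 as UNREACHABLE, which is the outcome the parent relationship is meant to produce.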
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Fri Jan 24, 2014 11:48 am
by tmcdonald
Can you post the configs for those relationships? If you need to censor anything out please do.
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Fri Jan 24, 2014 12:20 pm
by Smark
In reality the path from Nagios (217.140) to XXXDC01 (218.49) goes like this:
217.150 (Nagios - OfficeThree)
217.1 (Router - OfficeThree)
179.10 (ISP Uplink - OfficeThree)
179.9 (ISP Uplink - OfficeTwo)
179.17 / 179.18 (Redundant ISP Uplinks - OfficeOne)
218.1 (Router - OfficeOne)
218.49 (XXXDC01 - OfficeOne)
The path physically goes from Nagios (OfficeThree) to OfficeTwo to OfficeOne to the local router in OfficeOne to the end host XXXDC01.
Sorry if this appears confusing, but these were all created by NagiosQL:
Code:
define host {
    host_name XXXDC01
    use Domain Controller
    alias XXXDC01
    display_name XXXDC01
    address XXX.XXX.218.49
    parents XXX.XXX.218.1 Switch OfficeOne
    hostgroups Domain Controllers Prod
    check_command check-host-alive!!!!!!!!
    max_check_attempts 5
    register 1
}

define host {
    host_name XXX.XXX.218.1 Switch OfficeOne
    use XXX Switch
    alias XXX.XXX.218.1 Switch OfficeOne
    display_name XXX.XXX.218.1 Switch OfficeOne
    address XXX.XXX.218.1
    parents XXX.XXX.179.17 GlobalNet OfficeOne ISPName Backup,XXX.XXX.179.18 GlobalNet OfficeOne ISPName Primary
    hostgroups XXX Switches
    check_command check-host-alive!!!!!!!!
    max_check_attempts 5
    register 1
}

define host {
    host_name XXX.XXX.179.17 GlobalNet OfficeOne ISPName Backup
    use XXX Switch
    alias XXX.XXX.179.17 GlobalNet OfficeOne ISPName Backup
    display_name XXX.XXX.179.17 GlobalNet OfficeOne ISPName Backup
    address XXX.XXX.179.17
    parents XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
    hostgroups XXX Switches
    check_command check-host-alive!!!!!!!!
    max_check_attempts 5
    register 1
}

define host {
    host_name XXX.XXX.179.18 GlobalNet OfficeOne ISPName Primary
    use XXX Switch
    alias XXX.XXX.179.18 GlobalNet OfficeOne ISPName Primary
    display_name XXX.XXX.179.18 GlobalNet OfficeOne ISPName Primary
    address XXX.XXX.179.18
    parents XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
    hostgroups XXX Switches
    check_command check-host-alive!!!!!!!!
    max_check_attempts 5
    register 1
}

define host {
    host_name XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
    use YYY Switch
    alias XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
    display_name XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
    address XXX.XXX.179.9
    parents XXX.XXX.179.10 GlobalNet OfficeTwo Company
    hostgroups YYY Switches
    check_command check-host-alive!!!!!!!!
    register 1
}

define host {
    host_name XXX.XXX.179.10 GlobalNet OfficeTwo Company
    use YYY Switch
    alias XXX.XXX.179.10 GlobalNet OfficeTwo Company
    display_name XXX.XXX.179.10 GlobalNet OfficeTwo Company
    address XXX.XXX.179.10
    parents XXX.XXX.217 OfficeThree Core
    hostgroups YYY Switches
    check_command check-host-alive!!!!!!!!
    register 1
}

define host {
    host_name XXX.XXX.217 OfficeThree Core
    use ZZZ Switch
    alias XXX.XXX.217 OfficeThree Core
    display_name XXX.XXX.217 OfficeThree Core
    address XXX.XXX.217.1
    hostgroups ZZZ Switches
    check_command check-host-alive!!!!!!!!
    register 1
}
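With this many definitions, a small script can help confirm that every `parents` entry names an existing `host_name` and show which definitions set `max_check_attempts` themselves. The following Python is a rough sketch, not a Nagios tool: `parse_hosts`, `audit`, and the sample hosts are all made up for illustration, and it only looks at each definition in isolation, so a value inherited from a template through `use` will be reported as not set locally.

```python
import re
from typing import Dict, List


def parse_hosts(text: str) -> Dict[str, Dict[str, str]]:
    """Collect the directives of every `define host { ... }` block, keyed by host_name."""
    hosts: Dict[str, Dict[str, str]] = {}
    for body in re.findall(r"define\s+host\s*\{(.*?)\}", text, flags=re.DOTALL):
        directives: Dict[str, str] = {}
        for raw in body.splitlines():
            line = raw.strip()
            if not line or line[0] in "#;":  # skip blanks and comment lines
                continue
            parts = line.split(None, 1)  # directive name, then the rest of the line
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        if "host_name" in directives:
            hosts[directives["host_name"]] = directives
    return hosts


def audit(hosts: Dict[str, Dict[str, str]]) -> List[str]:
    """Report dangling parents and definitions without a local max_check_attempts."""
    findings: List[str] = []
    for name, d in hosts.items():
        for parent in (p.strip() for p in d.get("parents", "").split(",")):
            if parent and parent not in hosts:
                findings.append(f"{name}: parent {parent!r} is not a defined host_name")
        if "max_check_attempts" not in d:
            findings.append(f"{name}: max_check_attempts not set in this definition")
    return findings


# Hypothetical sample; host names may contain spaces, as in the config above.
SAMPLE = """
define host {
    host_name core-router
    address 192.0.2.1
}
define host {
    host_name edge switch
    address 192.0.2.2
    parents core-router
    max_check_attempts 5
}
define host {
    host_name server01
    address 192.0.2.20
    parents edge switch,missing-uplink
    max_check_attempts 5
}
"""

for finding in audit(parse_hosts(SAMPLE)):
    print(finding)
```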
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Fri Jan 24, 2014 1:18 pm
by slansing
I just want to clear up the first post. When you assign a parent host to another host, if that parent host goes down the host you assigned it to will appear unknown, as well as its services. As I am sure you are aware, this is done to simulate real network reachability, since the child host cannot get through to Nagios if its parent is offline.
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Fri Jan 24, 2014 3:17 pm
by Smark
slansing wrote: I just want to clear up the first post. When you assign a parent host to another host, if that parent host goes down the host you assigned it to will appear unknown, as well as its services. As I am sure you are aware, this is done to simulate real network reachability, since the child host cannot get through to Nagios if its parent is offline.
Hi,
Yes. My initial post may have been worded oddly. We use it purely for network-reachability resolutions. The only hosts that are parents are switching equipment. For hosts, their parent is their local switch. For switches, it's their local router.
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Mon Jan 27, 2014 11:25 am
by scottwilkerson
Smark wrote:
Last night we had an outage somewhere above the 218.20 level (either 218.1 or 217.1) and we still got notifications for all of our hosts on the 218 subnet. Is there something I'm missing? At the moment we have 218.20's parent set to the 218.1 host, and 218.1's parent set to the 217.1 host. Is this correct? Where am I going wrong?
Thanks!
Did the notification say the hosts were DOWN or UNREACHABLE?
Many people disable UNREACHABLE notifications to eliminate this situation.
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Mon Jan 27, 2014 12:26 pm
by Smark
scottwilkerson wrote:Smark wrote:
Last night we had an outage somewhere above the 218.20 level (either 218.1 or 217.1) and we still got notifications for all of our hosts on the 218 subnet. Is there something I'm missing? At the moment we have 218.20's parent set to the 218.1 host, and 218.1's parent set to the 217.1 host. Is this correct? Where am I going wrong?
Thanks!
Did the notification say the hosts were DOWN or UNREACHABLE?
Many people disable UNREACHABLE notifications to eliminate this situation.
The notification said "DOWN". Like you said, we also have UNREACHABLE notifications disabled.
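The exchange above turns on the fact that the host `notification_options` directive filters by state: with `u` left out, UNREACHABLE hosts stay quiet, but any host that Nagios classifies as DOWN still notifies. A minimal Python sketch of that filter follows. The function name `should_notify` is hypothetical, and the mapping covers only the `d`, `u`, `r` and `n` flags; flapping and downtime flags are ignored.

```python
# Host state -> notification_options flag (r is the recovery notification on UP).
STATE_FLAGS = {"DOWN": "d", "UNREACHABLE": "u", "UP": "r"}


def should_notify(state: str, notification_options: str) -> bool:
    """Return True if a host in `state` passes the notification_options filter."""
    flags = {flag.strip() for flag in notification_options.split(",")}
    if "n" in flags:  # n means no notifications at all
        return False
    return STATE_FLAGS[state] in flags


# With UNREACHABLE disabled, only the state classification decides what gets sent.
for state in ("DOWN", "UNREACHABLE"):
    print(state, should_notify(state, "d,r"))
```

So receiving "DOWN" notifications for the whole subnet suggests the hosts were classified as DOWN rather than UNREACHABLE, which points back at how and when the parents' states were determined.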
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Mon Jan 27, 2014 2:09 pm
by scottwilkerson
I did notice that some of these have max_check_attempts 5 specified, and others do not.
It is possible that the hosts without it would reach a HARD DOWN state before the parent had used up its max_check_attempts and been marked DOWN.
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Mon Jan 27, 2014 3:27 pm
by Smark
scottwilkerson wrote: I did notice that some of these have max_check_attempts 5 specified, and others do not.
It is possible that the hosts without it would reach a HARD DOWN state before the parent had used up its max_check_attempts and been marked DOWN.
That part of our network has an inherently unreliable connection, so its check attempts are set to 5; all other objects in Nagios use 2. Everything in that portion of the network (meaning the switches and the hosts) has max attempts set to 5.
If I'm understanding correctly, when an end host gets a hard down, Nagios goes to do a full check of its parent. At what point does it say "this parent is up/down"? Is it after max_check_attempts, or some other setting?
Re: Parent Relationship Not Suppressing Downstream Notificat
Posted: Mon Jan 27, 2014 3:55 pm
by scottwilkerson
DOWN after it goes HARD down (down for max_check_attempts * retry_interval)
UP after a check attempt comes back UP
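To put rough numbers on that rule of thumb, here is a small Python sketch. It assumes a `retry_interval` of 1 and the default `interval_length` of 60 seconds, so one interval is one minute; neither value is stated in this thread. The function name is hypothetical, and actual timing also depends on scheduling and on-demand checks.

```python
def minutes_to_hard_down(max_check_attempts: int, retry_interval: float) -> float:
    """Rule of thumb from the reply above: max_check_attempts * retry_interval."""
    return max_check_attempts * retry_interval


RETRY_INTERVAL = 1  # assumed value, in minutes (interval_length = 60 seconds)

# 5 attempts on the unreliable segment, 2 everywhere else, as described earlier.
for label, attempts in [("unreliable segment", 5), ("everything else", 2)]:
    print(label, minutes_to_hard_down(attempts, RETRY_INTERVAL))
```

Under these assumptions, a host with 2 attempts could go HARD DOWN about three minutes before a parent that needs 5, which is the kind of gap in which a child might be reported DOWN before its parent is.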