Parent Relationship Not Suppressing Downstream Notifications
Hello,
We have our configuration structured like this:
[Host: Nagios Server]
--> XXX.XXX.217.1 [Host: Nagios' Subnet Uplink]
----> XXX.XXX.218.1 [Host: Remote Host Subnet Uplink]
------> XXX.XXX.218.20 [Host: Remote Host]
--------> [Service: Remote Host Service]
As I understand it, if 218.1 is unreachable I should not get notifications for 218.20 or the services running on it. If 218.20 appears to be down, Nagios will immediately check its parent (218.1) to see if it is responding. If the parent responds, Nagios sends an alert for 218.20; if it does not, Nagios checks 217.1, and so on.
Last night we had an outage somewhere above the 218.20 level (either 218.1 or 217.1) and we still got notifications for all of our hosts on the 218 subnet. Is there something I'm missing? At the moment we have 218.20's parent set to the 218.1 host, and 218.1's parent set to the 217.1 host. Is this correct? Where am I going wrong?
Thanks!
Re: Parent Relationship Not Suppressing Downstream Notificat
Can you post the configs for those relationships? If you need to censor anything out please do.
Former Nagios employee
Re: Parent Relationship Not Suppressing Downstream Notificat
In reality the path from Nagios (217.140) to XXXDC01 (218.49) goes like this:
217.150 (Nagios - OfficeThree)
217.1 (Router - OfficeThree)
179.10 (ISP Uplink - OfficeThree)
179.9 (ISP Uplink - OfficeTwo)
179.17 / 179.18 (Redundant ISP Uplinks - OfficeOne)
218.1 (Router - OfficeOne)
218.49 (XXXDC01 - OfficeOne)
The path physically goes from Nagios (OfficeThree) to OfficeTwo to OfficeOne to the local router in OfficeOne to the end host XXXDC01.
Sorry if this appears confusing, but these were all created by NagiosQL:
Code: Select all
define host {
host_name XXXDC01
use Domain Controller
alias XXXDC01
display_name XXXDC01
address XXX.XXX.218.49
parents XXX.XXX.218.1 Switch OfficeOne
hostgroups Domain Controllers Prod
check_command check-host-alive!!!!!!!!
max_check_attempts 5
register 1
}
define host {
host_name XXX.XXX.218.1 Switch OfficeOne
use XXX Switch
alias XXX.XXX.218.1 Switch OfficeOne
display_name XXX.XXX.218.1 Switch OfficeOne
address XXX.XXX.218.1
parents XXX.XXX.179.17 GlobalNet OfficeOne ISPName Backup,XXX.XXX.179.18 GlobalNet OfficeOne ISPName Primary
hostgroups XXX Switches
check_command check-host-alive!!!!!!!!
max_check_attempts 5
register 1
}
define host {
host_name XXX.XXX.179.17 GlobalNet OfficeOne ISPName Backup
use XXX Switch
alias XXX.XXX.179.17 GlobalNet OfficeOne ISPName Backup
display_name XXX.XXX.179.17 GlobalNet OfficeOne ISPName Backup
address XXX.XXX.179.17
parents XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
hostgroups XXX Switches
check_command check-host-alive!!!!!!!!
max_check_attempts 5
register 1
}
define host {
host_name XXX.XXX.179.18 GlobalNet OfficeOne ISPName Primary
use XXX Switch
alias XXX.XXX.179.18 GlobalNet OfficeOne ISPName Primary
display_name XXX.XXX.179.18 GlobalNet OfficeOne ISPName Primary
address XXX.XXX.179.18
parents XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
hostgroups XXX Switches
check_command check-host-alive!!!!!!!!
max_check_attempts 5
register 1
}
define host {
host_name XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
use YYY Switch
alias XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
display_name XXX.XXX.179.9 GlobalNet OfficeTwo ISPName
address XXX.XXX.179.9
parents XXX.XXX.179.10 GlobalNet OfficeTwo Company
hostgroups YYY Switches
check_command check-host-alive!!!!!!!!
register 1
}
define host {
host_name XXX.XXX.179.10 GlobalNet OfficeTwo Company
use YYY Switch
alias XXX.XXX.179.10 GlobalNet OfficeTwo Company
display_name XXX.XXX.179.10 GlobalNet OfficeTwo Company
address XXX.XXX.179.10
parents XXX.XXX.217 OfficeThree Core
hostgroups YYY Switches
check_command check-host-alive!!!!!!!!
register 1
}
define host {
host_name XXX.XXX.217 OfficeThree Core
use ZZZ Switch
alias XXX.XXX.217 OfficeThree Core
display_name XXX.XXX.217 OfficeThree Core
address XXX.XXX.217.1
hostgroups ZZZ Switches
check_command check-host-alive!!!!!!!!
register 1
}
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Parent Relationship Not Suppressing Downstream Notificat
I just want to clear up the first post. When you assign a parent host to another host and that parent goes down, the host you assigned it to will appear UNREACHABLE, and its services are affected as well. As I am sure you are aware, this is done to simulate real network reachability, since the child host cannot get through to Nagios if its parent is offline.
Re: Parent Relationship Not Suppressing Downstream Notificat
slansing wrote: I just want to clear up the first post. When you assign a parent host to another host, if that parent host goes down the host you assigned it to will appear unknown, as well as its services. As I am sure you are aware this is done to simulate real network reachability, since the child host cannot get through to Nagios if its parent is offline.
Hi,
Yes. My initial post may have been worded oddly. We use it purely for network-reachability resolution. The only hosts that are parents are switching equipment. For hosts, their parent is their local switch. For switches, it's their local router.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Parent Relationship Not Suppressing Downstream Notificat
Smark wrote: Last night we had an outage somewhere above the 218.20 level (either 218.1 or 217.1) and we still got notifications for all of our hosts on the 218 subnet. Is there something I'm missing? At the moment we have 218.20's parent set to the 218.1 host, and 218.1's parent set to the 217.1 host. Is this correct? Where am I going wrong?
Did the notification say the hosts were DOWN or UNREACHABLE?
Many people disable UNREACHABLE notifications so that an upstream outage does not produce an alert for every child host.
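For reference, UNREACHABLE host notifications are controlled by the host's notification_options directive. The fragment below is a minimal sketch, not taken from the poster's setup: the template name no-unreachable-host is hypothetical, and it assumes an existing template such as the sample generic-host supplies the remaining host settings.

```
# Hypothetical template: notify on DOWN (d) and RECOVERY (r) only.
# Leaving out "u" suppresses UNREACHABLE notifications.
define host {
    name                    no-unreachable-host   ; hypothetical template name
    use                     generic-host          ; assumed existing template
    notification_options    d,r
    register                0                     ; template only, not a real host
}
```

With a setting like this, a child host only generates a notification when Nagios decides it is DOWN, which is why it matters whether the alerts said DOWN or UNREACHABLE.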
Re: Parent Relationship Not Suppressing Downstream Notificat
scottwilkerson wrote: Did the notification say the hosts were DOWN or UNREACHABLE? Many people disable UNREACHABLE notifications to eliminate this situation.
The notification said "DOWN". Like you said, we also have UNREACHABLE notifications disabled.
-
scottwilkerson
Re: Parent Relationship Not Suppressing Downstream Notificat
I did notice some of these have the following specified, and others do not:
Code: Select all
max_check_attempts 5
It is possible that the hosts without it would reach a HARD DOWN state before the parent had used up its max_check_attempts and been marked down.
Re: Parent Relationship Not Suppressing Downstream Notificat
scottwilkerson wrote: I did notice some of these have max_check_attempts 5 specified, and others do not. It is possible that the items that do not would report HARD DOWN before max_check_attempts made the parent down.
That part of our network has an inherently unreliable connection, so its max_check_attempts is 5; all other objects in Nagios are set to 2. Everything in that portion of the network (meaning the switches and the hosts) has the max attempts set to 5.
If I'm understanding correctly, when an end host gets a hard down, Nagios goes to do a full check of its parent. At what point does it say "this parent is up/down"? Is it after max_check_attempts, or some other setting?
-
scottwilkerson
Re: Parent Relationship Not Suppressing Downstream Notificat
DOWN once it reaches a HARD down state, i.e. after max_check_attempts consecutive failed checks (roughly max_check_attempts * retry_interval of downtime)
UP after a check attempt comes back UP
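To make the timing concrete, here is a sketch of a parent definition with the check settings written out explicitly. The host name, the template name, and the interval values are assumptions for illustration, not values from the configs posted above. Intervals are in minutes, assuming the default interval_length of 60.

```
define host {
    host_name             example-parent-switch   ; hypothetical name
    use                   generic-switch          ; assumed existing template
    address               XXX.XXX.218.1
    check_command         check-host-alive
    check_interval        5     ; minutes between checks while the state is stable
    retry_interval        1     ; minutes between re-checks while the state is SOFT
    max_check_attempts    5     ; failed attempts before the state becomes HARD
}
```

By the rule of thumb above, this parent would go HARD down after roughly 5 * 1 = 5 minutes of failed checks, and would count as UP again as soon as a single check succeeds.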