Parent/Child Blocking issues
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
Parent/Child Blocking issues
Have come across a problem with alerts being generated for Child hosts while an upstream Parent is down.
My understanding is that if a Host goes down, then all subsequent Child hosts will not be monitored (and no alerts generated) while that parent is down.
I have set up the following scenario. The REMOTE-SITE host uses check_multiaddr with 2 IPs. This works fine - I've tested by failing 1 IP at a time, and it only goes critical when both IPs are not pingable. So far so good.
I tested this by changing the IPs of REMOTE-SITE to nonexistent IPs. Nagios picked up that REMOTE-SITE was down and generated the correct notifications.
I then looked at Network Outages. It showed that REMOTE-SITE was down; however, it showed that 7 hosts were affected. Only 4 devices are configured, so this is wrong. Nagios doesn't display what those 7 hosts are, so I'm not sure how it got that number.
It also appears that the devices behind REMOTE-SITE continued to be monitored by Nagios?
The next test I ran was to change the IP of one of the devices behind the firewall. Because the REMOTE-SITE host was down, I expected that monitoring would have been suspended.
However, that device showed down and created a notification.
Is there something wrong with my logic here? I expect that all downstream monitoring should stop when a parent goes down. If the parent is down, why would you bother trying to monitor any child devices?
I'm running NagiosXI 5.4.2
regards... Fred
Re: Parent/Child Blocking issues
"My understanding is that if a Host goes down, then all subsequent Child hosts will not be monitored (and no alerts generated) while that parent is down."
This is not entirely correct. If the parent is down, the child's status would be "UNREACHABLE" instead of "DOWN"; however, the child will still be monitored. You could set up your notifications so that you don't receive "UNREACHABLE" notifications, only the "DOWN" ones (in order to "reduce the noise"). Please check the "Determining Status and Reachability of Network Hosts" article in our official Nagios Core documentation here:
https://assets.nagios.com/downloads/nag ... ility.html
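As a rough illustration of the notification approach described above, a child host definition might look something like the sketch below. The host name REMOTE-DEVICE-1, the address, and the generic-host template are assumptions made up for this example and do not come from this thread. The relevant line is notification_options, which lists d (down) and r (recovery) but leaves out u (unreachable). In Nagios XI these settings are normally made through the Core Config Manager rather than by editing files by hand.

```cfg
# Sketch only: host name, address, and template are hypothetical.
define host {
    use                     generic-host      ; assumed template supplying the required defaults
    host_name               REMOTE-DEVICE-1   ; hypothetical child device
    address                 192.0.2.10        ; documentation-range address, not a real one
    parents                 REMOTE-SITE       ; the parent host from this thread
    notification_options    d,r               ; notify on DOWN and recovery, not on UNREACHABLE (u)
}
```

With this in place the child is still checked while REMOTE-SITE is down, but a child that ends up UNREACHABLE should not generate a notification.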
If you don't want to monitor a host (i.e. child), when another host (parent) is down, you could set up host dependencies with the execution failure criteria set up. Read more on host and service dependencies here:
https://assets.nagios.com/downloads/nag ... ncies.html
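For reference, a host dependency along the lines described above could be sketched as follows. REMOTE-DEVICE-1 is a hypothetical child name used only for illustration; REMOTE-SITE is the parent from this thread. This is a minimal sketch under those assumptions, not a tested configuration.

```cfg
# Sketch only: REMOTE-DEVICE-1 is a hypothetical dependent host.
define hostdependency {
    host_name                       REMOTE-SITE       ; the host being depended upon
    dependent_host_name             REMOTE-DEVICE-1   ; the host whose checks should be held back
    execution_failure_criteria      d,u               ; skip active checks while the master is DOWN or UNREACHABLE
    notification_failure_criteria   d,u               ; suppress notifications in the same states
}
```

One dependency is needed for each child (dependent_host_name also accepts a comma-separated list), which is part of why this is more work to maintain than a plain parents setting.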
If you don't want to monitor services when a host is down, then you can set the following directive in the nagios.cfg:
Code: Select all
host_down_disable_service_checks=1
https://support.nagios.com/kb/article.php?id=505
Let us know if this helped.
-
Fred Kroeger
Re: Parent/Child Blocking issues
Thanks for the detailed response. I think I jumped a step in my explanation. The issue I had was that I got an alert that the Child was "DOWN" while the upstream parent "REMOTE-SITE" was down.
As you stated, I would have expected an "Unreachable" state - which by the way I don't receive notifications for.
It's interesting that you say Nagios still monitors the child while the parent is down. What value does this provide?
I didn't want to go down the path of host dependencies - doesn't this just complicate what should already happen with a Parent/Child relationship?
And yes, I also implemented the "host_down_disable_service_checks=1" directive as soon as it was released. Again, there really is no point in trying to monitor the services if the host is down.
I'm still interested in knowing why the Network Outages shows the wrong number of hosts affected. There are only 4 Hosts behind the "REMOTE-SITE" host.
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: Parent/Child Blocking issues
If you provide a profile and the logs, we can work out what your Nagios install saw and acted upon.
XI > Admin > System Profile > Download Profile
/usr/local/nagios/var/nagios.log
/usr/local/nagios/var/archives/*.log
Please include the zip file in your response. You can PM me or other support personnel if you have privacy concerns.
Previous Nagios employee
-
avandemore
Re: Parent/Child Blocking issues
Just as an update, I received your logs. I will get back to you.
-
avandemore
Re: Parent/Child Blocking issues
I'm having a hard time correlating the logs against defined objects, can you send the profile as well?
-
Fred Kroeger
Re: Parent/Child Blocking issues
The profile was included in the zip file I sent. I'll send it separately.
Just tried to PM it to you and get the following message:
Code: Select all
You cannot make another post so soon after your last.
Obviously 12 hours between messages is not long enough?
-
avandemore
Re: Parent/Child Blocking issues
Oh, you are correct, I have your profile. Sorry about that. If I don't reply soon, I have you in my calendar for tomorrow morning.
I still don't see anything for IPs like 10.1.1.130/131. What hosts do those correspond to?
I have no idea about the forum thing. Did you hit the back button, or did anything else happen out of order?
-
Fred Kroeger
Re: Parent/Child Blocking issues
I will PM you the real IP addresses - I had to change them when I posted the diagram above. You will see the hostnames anyway in the Notification snapshot I sent you.
The issue with the PM this morning came from nothing more than selecting the PM icon by your name. It popped up the message screen as normal - I attached the file and pressed submit.
-
avandemore
Re: Parent/Child Blocking issues
I'm going to try to answer your outstanding questions all in this one post. If I miss one, let me know.
"It's interesting that you say that Nagios still monitors the child though while the parent is down? What value does this provide ?"
Not everyone uses Nagios the same way. For example, a child host may have different notification parameters, in which case this behavior is useful.
"I didn't want to go down the path of Host dependencies - this really complicates what should already happen with a Parent/Child relationship?"
I'm unable to understand your exact question here, but host dependencies have different functionality than parent/child. You should use the one that provides the functionality you need.
"I'm still interested in knowing why the Network Outages shows the wrong number of hosts affected. There are only 4 Hosts behind the "REMOTE-SITE" host."
Can you provide a screenshot of your Network Status Map? The legacy version of it may be helpful as well.