Parent/Child Blocking issues

Posted: Tue Feb 28, 2017 10:22 pm
by Fred Kroeger
Have come across a problem with alerts being generated for Child hosts while an upstream Parent is down.
My understanding is that if a Host goes down, then all subsequent Child hosts will not be monitored (and no alerts generated) while that parent is down.
I have set up the following scenario:
[Attachment: Parent child.PNG]
The REMOTE-SITE host uses check_multiaddr with 2 IPs. This works fine - I've tested it by failing 1 IP at a time, and it only goes critical when both IPs are not pingable. So far so good.

I tested this by changing the IPs of REMOTE-SITE to nonexistent IPs. Nagios picked up that REMOTE-SITE was down and generated the correct notifications.
I then looked at Network Outages. It showed that REMOTE-SITE was down; however, it showed that 7 hosts were affected. Only 4 devices are configured, so this is wrong. Nagios doesn't display what those 7 hosts are, so I'm not sure how it got that number.

It also appears that the devices behind REMOTE-SITE continued to be monitored by Nagios?

The next test I ran was to change the IP of one of the devices behind the firewall. Because the REMOTE-SITE host was down, I expected that monitoring would have been suspended.
However, that device showed down and created a notification.

Is there something wrong with my logic here? I expect that all downstream monitoring should stop when a parent goes down. If the parent is down, why would you bother trying to monitor any child devices?

I'm running Nagios XI 5.4.2

regards... Fred

Re: Parent/Child Blocking issues

Posted: Wed Mar 01, 2017 4:04 pm
by lmiltchev
Quote:
My understanding is that if a Host goes down, then all subsequent Child hosts will not be monitored (and no alerts generated) while that parent is down.

This is not entirely correct. If the parent is down, the child's status would be "UNREACHABLE" instead of "DOWN"; however, the child will still be monitored. You could set up your notifications so that you receive only the "DOWN" notifications, not the "UNREACHABLE" ones (in order to "reduce the noise"). Please check the "Determining Status and Reachability of Network Hosts" article in our official Nagios Core documentation here:
https://assets.nagios.com/downloads/nag ... ility.html
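
As a rough illustration of the notification setup described above, a child host definition might look something like the following. This is only a sketch under assumptions: the host name REMOTE-SWITCH-1, the address, and the generic-host template are placeholders that do not come from this thread; only REMOTE-SITE is taken from the original post.

```cfg
define host {
    use                     generic-host      ; assumed template supplying the remaining required directives
    host_name               REMOTE-SWITCH-1   ; hypothetical child host
    address                 192.0.2.10        ; placeholder address (documentation range)
    parents                 REMOTE-SITE       ; with this single parent down, a failed check of the child should resolve to UNREACHABLE
    notification_options    d,r               ; notify on DOWN and recovery only; leaving out "u" suppresses UNREACHABLE notifications
}
```

In Nagios XI, settings like these are normally made through the Core Config Manager rather than by editing the object files by hand.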

If you don't want to monitor a host (i.e. child), when another host (parent) is down, you could set up host dependencies with the execution failure criteria set up. Read more on host and service dependencies here:
https://assets.nagios.com/downloads/nag ... ncies.html
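
For reference, a host dependency with execution failure criteria could be sketched roughly as below. The dependent host name REMOTE-SWITCH-1 is hypothetical and stands in for one of the devices behind REMOTE-SITE; the choice of d,u for both criteria is an assumption about what is wanted here, not a recommendation from the documentation.

```cfg
define hostdependency {
    host_name                       REMOTE-SITE       ; the host being depended upon (the parent in this thread)
    dependent_host_name             REMOTE-SWITCH-1   ; hypothetical child host
    execution_failure_criteria      d,u               ; do not actively check the dependent host while REMOTE-SITE is DOWN or UNREACHABLE
    notification_failure_criteria   d,u               ; also hold back its notifications in those states
}
```

One such definition would be needed per dependent host, or dependent_hostgroup_name could be used to cover a whole group.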

If you don't want to monitor services when a host is down, then you can set the following directive in the nagios.cfg:

Code:
host_down_disable_service_checks=1
This is a global option though, and it will affect ALL of the services. Read more on the topic here:
https://support.nagios.com/kb/article.php?id=505

Let us know if this helped.

Re: Parent/Child Blocking issues

Posted: Thu Mar 02, 2017 7:02 pm
by Fred Kroeger
Thanks for the detailed response. I think I jumped a step in my explanation. The issue I had was that I got an alert that the Child was "DOWN" while the upstream parent "REMOTE-SITE" was down.
As you stated, I would have expected an "Unreachable" state - which by the way I don't receive notifications for.

It's interesting that you say that Nagios still monitors the child while the parent is down. What value does this provide?

I didn't want to go down the path of Host dependencies - this really complicates what should already happen with a Parent/Child relationship?

And yes, I also implemented the "host_down_disable_service_checks=1" directive as soon as it was released. Again, there really is no point in trying to monitor the services if the host is down.
I'm still interested in knowing why the Network Outages shows the wrong number of hosts affected. There are only 4 Hosts behind the "REMOTE-SITE" host.

Re: Parent/Child Blocking issues

Posted: Fri Mar 03, 2017 3:19 pm
by avandemore
If you provide a profile and the logs, we can work out what your Nagios install saw and acted upon.

XI > Admin > System Profile > Download Profile
/usr/local/nagios/var/nagios.log
/usr/local/nagios/var/archives/*.log

Please include the zip file in your response. You can PM myself or other support personnel if you have privacy concerns.

Re: Parent/Child Blocking issues

Posted: Tue Mar 07, 2017 3:27 pm
by avandemore
Just as an update, I received your logs. I will get back to you.

Re: Parent/Child Blocking issues

Posted: Tue Mar 07, 2017 4:38 pm
by avandemore
I'm having a hard time correlating the logs against the defined objects. Can you send the profile as well?

Re: Parent/Child Blocking issues

Posted: Tue Mar 07, 2017 5:32 pm
by Fred Kroeger
The profile was included in the zip file I sent. I'll send it separately.

Just tried to PM it to you and got the following message:

Code:
You cannot make another post so soon after your last.

Obviously 12 hours between messages is not long enough?

Re: Parent/Child Blocking issues

Posted: Tue Mar 07, 2017 5:55 pm
by avandemore
Oh, you are correct, I have your profile. Sorry about that. If I don't reply soon, I have you on my calendar for tomorrow morning.

I still don't see anything for IPs like 10.1.1.130/131. What hosts do those correspond to?

I have no idea on the forum thing. Did you hit the back button, or did anything else out of the ordinary happen?

Re: Parent/Child Blocking issues

Posted: Wed Mar 08, 2017 1:07 am
by Fred Kroeger
I will PM you the real IP addresses - I had to change them when I posted the diagram above. You will see the hostnames anyway in the Notification snapshot I sent you.
As for the PM issue this morning, all I did was select the PM icon by your name. It popped up the message screen as normal; I attached the file and pressed submit.

Re: Parent/Child Blocking issues

Posted: Wed Mar 08, 2017 2:58 pm
by avandemore
I'm going to try to answer your outstanding questions all in this one post. If I miss one, let me know.

Quote:
It's interesting that you say that Nagios still monitors the child though while the parent is down? What value does this provide?

Not everyone uses Nagios the same way. For example, a child host may have different notification parameters, in which case this behavior is useful.

Quote:
I didn't want to go down the path of Host dependencies - this really complicates what should already happen with a Parent/Child relationship?

I'm unable to understand your exact question here, but host dependencies have different functionality than parent/child relationships. You should use the one that provides the functionality you need.

Quote:
I'm still interested in knowing why the Network Outages shows the wrong number of hosts affected. There are only 4 Hosts behind the "REMOTE-SITE" host.

Can you provide a screenshot of your Network Status Map? The legacy version of it may be helpful as well.