Parent/Child Blocking issues

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia
Contact:

Parent/Child Blocking issues

Post by Fred Kroeger »

Have come across a problem with alerts being generated for Child hosts while an upstream Parent is down.
My understanding is that if a Host goes down, then all subsequent Child hosts will not be monitored (and no alerts generated) while that parent is down.
I have set up the following scenario:
Parent child.PNG
The REMOTE-SITE host uses check_multiaddr with 2 IPs. This works fine: I've checked by failing 1 IP at a time, and it only goes critical when both IPs are not pingable. So far so good.

I tested this by changing the IPs of REMOTE-SITE to nonexistent IPs. Nagios picked up that REMOTE-SITE was down and generated the correct notifications.
I then looked at Network Outages. It showed that REMOTE-SITE was down; however, it showed that 7 hosts were affected. Only 4 devices are configured, so this is wrong. Nagios doesn't display what those 7 hosts are, so I'm not sure how it got that number.

It also appears that the devices behind REMOTE-SITE continued to be monitored by Nagios?

The next test I ran was to change the IP of one of the devices behind the firewall. Because the REMOTE-SITE host was down, I expected that monitoring would have been suspended.
However, that device showed down and created a notification.

Is there something wrong with my logic here? I expect that all downstream monitoring should stop when a parent goes down. If the parent is down, why would you bother trying to monitor any child devices?

I'm running NagiosXI 5.4.2

regards... Fred
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Parent/Child Blocking issues

Post by lmiltchev »

Quote: My understanding is that if a Host goes down, then all subsequent Child hosts will not be monitored (and no alerts generated) while that parent is down.

This is not entirely correct. If the parent is down, the child's status will be "UNREACHABLE" instead of "DOWN"; however, the child will still be monitored. You could set up your notifications so that you receive only the "DOWN" notifications and not the "UNREACHABLE" ones (in order to "reduce the noise"). Please check the "Determining Status and Reachability of Network Hosts" article in our official Nagios Core documentation here:
https://assets.nagios.com/downloads/nag ... ility.html
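
As a rough illustration of the notification setup described above, a child host definition might look something like the sketch below. It is not taken from the poster's configuration: the host name `child-device-1`, the address, and the `generic-host` template are placeholders or assumptions. Only `REMOTE-SITE` comes from this thread.

```
# Hypothetical child host behind REMOTE-SITE (names and address are placeholders)
define host {
    use                     generic-host    ; assumed template name
    host_name               child-device-1  ; placeholder
    address                 192.0.2.10      ; placeholder address
    parents                 REMOTE-SITE     ; upstream parent used for reachability logic
    notification_options    d,r             ; notify on DOWN and RECOVERY; omitting "u" suppresses UNREACHABLE
}
```

With a definition along these lines the child is still actively checked while REMOTE-SITE is down; only the UNREACHABLE notifications are suppressed.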

If you don't want to monitor a host (i.e. a child) when another host (the parent) is down, you could set up host dependencies with the execution failure criteria set. Read more on host and service dependencies here:
https://assets.nagios.com/downloads/nag ... ncies.html
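
A minimal sketch of such a host dependency is shown below, assuming a dependent host named `child-device-1` (a placeholder, not a name from this thread). The choice of `d,u` for the criteria is an assumption about what would suit this scenario; one definition like this would be needed per child host, or a host group could be used instead.

```
# Hypothetical dependency: child-device-1 depends on REMOTE-SITE
define hostdependency {
    host_name                       REMOTE-SITE     ; the host being depended upon
    dependent_host_name             child-device-1  ; placeholder child host
    execution_failure_criteria      d,u             ; do not actively check the child while REMOTE-SITE is DOWN or UNREACHABLE
    notification_failure_criteria   d,u             ; optional: also suppress the child's notifications in those states
}
```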

If you don't want to monitor services when a host is down, then you can set the following directive in the nagios.cfg:

Code:
host_down_disable_service_checks=1
This is a global option though, and it will affect ALL of the services. Read more on the topic here:
https://support.nagios.com/kb/article.php?id=505

Let us know if this helped.
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia
Contact:

Re: Parent/Child Blocking issues

Post by Fred Kroeger »

Thanks for the detailed response. I think I jumped a step in my explanation. The issue I had was that I got an alert that the Child was "DOWN" while the upstream parent "REMOTE-SITE" was down.
As you stated, I would have expected an "Unreachable" state - which by the way I don't receive notifications for.

It's interesting that you say that Nagios still monitors the child though while the parent is down? What value does this provide?

I didn't want to go down the path of Host dependencies - this really complicates what should already happen with a Parent/Child relationship?

And yes, I also implemented the "host_down_disable_service_checks=1" directive as soon as it was released. Again, there really is no point in trying to monitor the services if the host is down.
I'm still interested in knowing why the Network Outages shows the wrong number of hosts affected. There are only 4 Hosts behind the "REMOTE-SITE" host.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Parent/Child Blocking issues

Post by avandemore »

If you provide a profile and the logs, we can work out what your Nagios install saw and acted upon.

XI > Admin > System Profile > Download Profile
/usr/local/nagios/var/nagios.log
/usr/local/nagios/var/archives/*.log

Please include the zip file in your response. You can PM myself or other support personnel if you have privacy concerns.
Previous Nagios employee
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Parent/Child Blocking issues

Post by avandemore »

Just as an update, I received your logs. I will get back to you.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Parent/Child Blocking issues

Post by avandemore »

I'm having a hard time correlating the logs against defined objects, can you send the profile as well?
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia
Contact:

Re: Parent/Child Blocking issues

Post by Fred Kroeger »

The profile was included in the zip file I sent. I'll send it separately.

Just tried to PM it to you and get the following message

Code:
You cannot make another post so soon after your last.
Obviously 12 hours between messages is not long enough?
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Parent/Child Blocking issues

Post by avandemore »

Oh, you are correct I have your profile. Sry about that. If I don't reply back soon, I have you in my calendar for tmw morning.

I still don't see anything for IPs like 10.1.1.130/131. What hosts do those correspond to?

I have no idea on the forum thing. Did you hit the back button, or did anything else out of the ordinary happen?
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia
Contact:

Re: Parent/Child Blocking issues

Post by Fred Kroeger »

I will PM you the real IP addresses - I had to change them when I posted the diagram above. You will see the hostnames anyway in the Notification snapshot I sent you.
As for the issue with the PM this morning, all I did was select the PM icon by your name. It popped up the message screen as per normal; I attached the file and pressed submit.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Parent/Child Blocking issues

Post by avandemore »

I'm going to try to answer your outstanding questions all in this one post. If I miss one, let me know.

Quote: It's interesting that you say that Nagios still monitors the child though while the parent is down? What value does this provide?

Not everyone uses Nagios the same way. For example, a child host may have different notification parameters, in which case this behavior is useful.

Quote: I didn't want to go down the path of Host dependencies - this really complicates what should already happen with a Parent/Child relationship?

I'm unable to understand your exact question here, but host dependencies provide different functionality than parent/child relationships. You should use the one that provides the functionality you need.

Quote: I'm still interested in knowing why the Network Outages shows the wrong number of hosts affected. There are only 4 Hosts behind the "REMOTE-SITE" host.

Can you provide a screenshot of your Network Status Map? The legacy version of it may be helpful as well.