Parent/Child Blocking issues

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Re: Parent/Child Blocking issues

Post by Fred Kroeger »

Thanks all - let me know if there is any more info I can feed you. We got a couple of hundred notifications yesterday for the Child hosts after the Parent went down, so in my case the notifications aren't getting blocked, nor are the Child hosts seen as UNREACHABLE. The email notifications showed them as DOWN. This is similar to the Notifications screenshot I sent previously, where there was one child behind the parent.
Fred
bheden
Product Development Manager
Posts: 179
Joined: Thu Feb 13, 2014 9:50 am
Location: Nagios Enterprises

Post by bheden »

I guess I do have a few questions:

Has this ever happened before?

Have you ever had an outage happen previously where you noticed that the states of child hosts were actually being set to UNREACHABLE?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Nagios Enterprises
Senior Developer

Post by bheden »

Also, looking through the source in Core: if you were to enable debugging and set the verbosity to 2, we'd probably get some useful debugging output, provided you're able to simulate another outage. Is this a possibility?

Post by Fred Kroeger »

I'm pretty sure that I tested this in an older version of Nagios some time ago, which is why I went down this path for this particular installation. Since all the checks are run by a Mod Gearman worker at the client's site, it made sense to make all the hosts children of the Worker, so that if we lost connection to the worker we wouldn't get hit with an alert for every host and service.

I could schedule this test after Wednesday next week. Let me know what needs to be set and what files you want me to send.

Fred

Post by bheden »

Fred Kroeger wrote: "Since all the checks are run by a Mod Gearman worker at the client's site, it made sense to make all the hosts a child of the Worker, so if we lost connection to the worker then we wouldn't get hit with an alert for every host and service."
This still makes sense.

Regarding the diagram in your initial post here:

Code: Select all


                 /---- FIREWALL-1A ------\
                /                         \
REMOTE-SITE ---<                           >--- 2x Devices at Remote Site
                \                         /
                 \---- FIREWALL-1B ------/

This doesn't look like a ModGearman parent relationship to me. Perhaps I'm mistaken? You mention this, and then the ModGearman worker as a parent also. Did ALL of the parent/child relationships fail in such a way that ALL children of ALL parents were DOWN instead of UNREACHABLE? Or maybe some of them worked and some of them didn't? If some did work, which ones? By "which ones" - I literally mean the host names so I can match up the relationships based on the profile you've submitted.
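
For reference, the DOWN versus UNREACHABLE distinction being asked about can be modelled roughly as follows. This is a simplified Python sketch of the reachability rule described in the Nagios Core documentation, not the Core source itself. The names `State` and `classify` and the firewall states used in the example are hypothetical, chosen only for illustration, and the sketch assumes the parent states passed in are simply whatever was last recorded for those parents.

```python
from enum import Enum
from typing import Sequence


class State(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNREACHABLE = "UNREACHABLE"


def classify(check_ok: bool, parent_states: Sequence[State]) -> State:
    """Classify a host from its own check result and its parents' states.

    Hypothetical helper: a simplified model, not Nagios Core code.
    """
    if check_ok:
        return State.UP
    # A host with no parents has nothing upstream to blame, so it is DOWN.
    if not parent_states:
        return State.DOWN
    # If at least one parent is UP, a working path exists, so the host is DOWN.
    if any(state is State.UP for state in parent_states):
        return State.DOWN
    # Every parent is DOWN or UNREACHABLE, so the host is UNREACHABLE.
    return State.UNREACHABLE


# A device behind both firewalls from the diagram (states are made-up examples).
both_down = classify(False, [State.DOWN, State.DOWN])
one_up = classify(False, [State.UP, State.DOWN])
print(both_down.value, one_up.value)  # prints "UNREACHABLE DOWN"
```

Under this model, a device with two parents is only UNREACHABLE when neither parent is UP, which is why the exact host names and their recorded parent states matter for matching up the relationships.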

Can you point out the host names of some of the ModGearman worker parent/child relationships? I only see one obvious one with it set as the parent for only 2 hosts.

Please provide this information so that I can review your object definitions; I can then give you a detailed list of instructions. Which of the parent/child relationships are you going to simulate a failure for?

Also, if you're not comfortable listing those host names publicly I can accept them in a PM.

Thanks.

Post by Fred Kroeger »

Yes - the original diagram was basically everything after the Worker.
I have PM'd you the full topology together with the hostnames/IPs so that you can follow the paths.

Post by bheden »

Just to get this off of the support team's dashboard, I'm replying. Fred, I'll respond here or reply to your PM directly when I have some meaningful information.
Locked