
Seeking suggestions for Parent / Child monitoring

Posted: Thu Apr 27, 2017 1:13 pm
by MrWoodward
I've been experimenting recently with the parent/child relationship for host monitoring in Nagios XI.

For example, I have a hypervisor host set as the parent of a bunch of VMs running under it, which are the children. Nagios XI is set to check the hypervisor (parent) and all the VMs (children) at the same interval, say every 1 minute.

As an experiment, I unplugged the network cable for the Hypervisor. The children all reported as down as did the parent.

My understanding was that the parent/child relationship was supposed to prevent this. Or rather, I thought that only the parent would alert and the children would silently go into a down state, so as not to create too much "noise" in the reporting.

We considered monitoring the hypervisor every minute and the children every 2 minutes, but weren't sure if this was the best way to minimize the problem (it might still be possible for the check on the children to observe the network error before the check on the parent does).

What's the recommended way to configure this?

Thanks!

Re: Seeking suggestions for Parent / Child monitoring

Posted: Fri Apr 28, 2017 9:47 am
by tmcdonald
The way you described it, the children should have been listed as UNREACHABLE instead of DOWN. In order to diagnose this, I would need to see a copy of your configs. In the XI web interface, go to Admin -> System Profile and click the blue "Download Profile" button. Then PM that profile.zip to me. After you have done that, please reply back in this thread with the name of the parent and at least one of the children so we can verify.
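
For anyone following along, here is a rough sketch of the DOWN vs. UNREACHABLE distinction described above. It is a simplified model, not Nagios code: the `classify` function, the `HostState` enum, and the host names are all hypothetical, and it assumes a failed host counts as UNREACHABLE only when none of its parents is UP.

```python
from enum import Enum


class HostState(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNREACHABLE = "UNREACHABLE"


def classify(host, parents, check_ok):
    """Classify a host from raw check results and the parent map.

    parents:  dict mapping host name -> list of parent host names
    check_ok: dict mapping host name -> True if its own check succeeded
    (Hypothetical helper; a simplified model of the reachability idea.)
    """
    if check_ok[host]:
        return HostState.UP
    host_parents = parents.get(host, [])
    # A failed host with no parents has nothing upstream to blame: DOWN.
    if not host_parents:
        return HostState.DOWN
    # If at least one parent is UP, the path works and the host itself is DOWN.
    if any(classify(p, parents, check_ok) is HostState.UP for p in host_parents):
        return HostState.DOWN
    # Every parent is DOWN or UNREACHABLE, so this host cannot be reached.
    return HostState.UNREACHABLE


# Illustrative data only: one hypervisor with two VMs behind it.
parents = {"vm1": ["hypervisor"], "vm2": ["hypervisor"]}
cable_pulled = {"hypervisor": False, "vm1": False, "vm2": False}

for name in ("hypervisor", "vm1", "vm2"):
    print(name, classify(name, parents, cable_pulled).value)
```

With the cable pulled, the model marks the hypervisor DOWN and both VMs UNREACHABLE, which is the behavior expected from a working parent/child setup.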

Re: Seeking suggestions for Parent / Child monitoring

Posted: Fri Apr 28, 2017 2:44 pm
by MrWoodward
Ok, I'll work on getting this to you. Thanks.

Re: Seeking suggestions for Parent / Child monitoring

Posted: Fri Apr 28, 2017 3:13 pm
by MrWoodward
Ok, so we tested it again and the VMs were UNREACHABLE. (Not DOWN, my mistake.)

What we were concerned about was that since the Hypervisor and the VMs all were in a degraded state simultaneously, we only wanted to get an alert about the Hypervisor being DOWN and not the VMs being UNREACHABLE.

Instead we got a whole bunch of alerts about the VMs being UNREACHABLE and then a few seconds later we got an alert about the Hypervisor being DOWN.

How do we manage this?

Thanks

Re: Seeking suggestions for Parent / Child monitoring

Posted: Fri Apr 28, 2017 3:38 pm
by avandemore
Does this document answer your question?

https://assets.nagios.com/downloads/nag ... ility.html
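
As a rough illustration of one way to quiet the UNREACHABLE alerts: a Nagios Core host definition has a `notification_options` directive listing the states that may notify (`d` = down, `u` = unreachable, `r` = recovery, `n` = none). The sketch below only models that filter. The `should_notify` helper is hypothetical, and it ignores flapping, downtime, time periods, and contact-level options.

```python
# Letters used by the host-level notification_options directive.
# "r" (recovery) is what applies when a host returns to UP.
STATE_TO_OPTION = {"DOWN": "d", "UNREACHABLE": "u", "UP": "r"}


def should_notify(state, notification_options):
    """Return True if a host in `state` may notify under the given options.

    notification_options is a comma-separated string such as "d,u,r".
    (Hypothetical helper; a simplified model of the host-level filter only.)
    """
    options = {opt.strip() for opt in notification_options.split(",")}
    # "n" means no host notifications at all.
    if "n" in options:
        return False
    return STATE_TO_OPTION[state] in options


# Assumed setup: the VMs use "d,r" so UNREACHABLE no longer notifies.
vm_options = "d,r"
for state in ("DOWN", "UNREACHABLE", "UP"):
    print(state, should_notify(state, vm_options))
```

In this model, leaving `u` out of the VMs' options means only the hypervisor's DOWN alert goes out when the parent fails, while a VM that is genuinely DOWN still notifies.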

Re: Seeking suggestions for Parent / Child monitoring

Posted: Mon May 01, 2017 3:25 pm
by MrWoodward
Yes, I think that helps greatly.

Re: Seeking suggestions for Parent / Child monitoring

Posted: Mon May 01, 2017 4:16 pm
by avandemore
Was there anything else you needed help with or are we ok to close this ticket?