Parent/Child anomaly

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Parent/Child anomaly

Post by Fred Kroeger »

I have set up the following Parent/Child relationship:

Code:

              |----- Router A -----|
Nagios--------|                    |---------Switch X ------- Servers [1 - X]
              |----- Router B -----|
Basically, Switch X has 2 Parents (Router A & B), and the Servers (1 to X) have Switch X as their Parent.

My understanding is that if either Router A OR Router B is Down, then monitoring continues as per normal.
If Router A AND Router B are both Down, then we have a blocking outage and monitoring of the Servers is suspended.
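
The rule as described above can be sketched in Python. This is an illustration only: the function name and the state strings are hypothetical, and it models the stated understanding (a failed host is only treated as cut off when every parent is also down), not Nagios's actual internals.

```python
def host_state(check_ok, parent_states):
    """Classify a host after its own check has run.

    check_ok      -- True if the host answered its check
    parent_states -- states of its parents, e.g. ["UP", "DOWN"]
    """
    if check_ok:
        return "UP"
    # The host failed its check. It is only treated as cut off
    # (UNREACHABLE) when it has parents and none of them is UP.
    if parent_states and all(state != "UP" for state in parent_states):
        return "UNREACHABLE"
    # At least one path is still open, so the host itself is the problem.
    return "DOWN"
```

With Router A down and Router B up, a failing Switch X would be classified as DOWN rather than UNREACHABLE, and a responding Switch X stays UP, which matches the expectation that monitoring continues as normal.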

However, this does not appear to be the case.
Currently I have Router A down - the Network Outages link shows that 53 Hosts & 451 Services are affected.
I would only expect to see that if both Routers were down.
The downstream Switch & Servers are all still being actively monitored, so it would appear that they haven't been affected.

Why am I seeing a Network Outage then? Also, it is showing a Severity of 165 - what is this?

Network Outages
Severity  Host      State  Duration        Hosts Affected  Services Affected
165       Router-A  Down   62d 7h 26m 10s  53              451
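
On the Severity figure: one formula that reproduces the 165 in this table is the number of hosts affected plus a quarter of the services affected, rounded down. This is believed to be how the Nagios Core outages CGI weights the value, but treat that as an assumption and check it against the source for your version. The function name below is hypothetical.

```python
def outage_severity(hosts_affected, services_affected):
    """Assumed weighting: each host counts as 1, each service as 1/4 (floored)."""
    return hosts_affected + services_affected // 4


# Figures from the Network Outages row above: 53 hosts, 451 services.
print(outage_severity(53, 451))
```

If that weighting is right, the number is just a relative ranking of outages by how much sits behind the down host, not a percentage or a threshold.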


I'm running NagiosXI 2012R2.8c

Fred
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Parent/Child anomaly

Post by tmcdonald »

Can you post either screenshots or text configurations of the two routers and the switch? Preferably also at least one of the servers behind the switch.
Former Nagios employee
Fred Kroeger

Re: Parent/Child anomaly

Post by Fred Kroeger »

I have attached the Overview screen which shows the Network outage.
I have attached what is displayed when I select the Blocking Outage - this shows that 53 hosts are affected (and their services)
However, displaying any of the hosts that are a child of Switch X shows that they are still being actively monitored.


Switch Config

Code:

define host {
        host_name                       SWITCH X
        use                             xiwizard_genericnetdevice_host
        address                         x.x.x.x
        parents                         ROUTER-A,ROUTER-B
        hostgroups                      C-IAS
        check_command                   check-host-alive!!!!!!!!
        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
        check_period                    24x7
        notification_interval           0
        notification_period             24x7
        icon_image                      cisco.png
        statusmap_image                 cisco.png
        _xiwizard                       snmpwalk
        register                        1
        }
Router-A config (note: ROUTER-B is identical)

Code:

define host {
        host_name                       ROUTER-A
        use                             xiwizard_genericnetdevice_host
        address                         x.x.x.x
        hostgroups                      C-ASG,Network - ASA Devices
        check_command                   check-host-alive!!!!!!!!
        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
        check_period                    24x7
        notification_interval           0
        notification_period             24x7
        notes                           -31.99628, 115.8883650000000
        icon_image                      cisco.png
        statusmap_image                 cisco.png
        _xiwizard                       snmpwalk
        register                        1
        }
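
The two definitions above carry the topology in their host_name and parents directives. As an illustration, a small Python sketch that pulls that mapping out of object-config text might look like the following. The function name is hypothetical, and the parser only handles the simple one-directive-per-line layout shown in this thread; it is not how Nagios itself reads its configuration.

```python
import re


def parse_host_parents(config_text):
    """Return {host_name: [parent, ...]} from 'define host { ... }' blocks."""
    parents_by_host = {}
    blocks = re.findall(r"define\s+host\s*\{(.*?)\}", config_text, flags=re.DOTALL)
    for block in blocks:
        directives = {}
        for line in block.splitlines():
            line = line.split(";", 1)[0].strip()  # drop ';' inline comments
            if not line:
                continue
            parts = line.split(None, 1)  # directive name, then the rest as value
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        name = directives.get("host_name")
        if name:
            raw_parents = directives.get("parents", "")
            parents_by_host[name] = [p.strip() for p in raw_parents.split(",") if p.strip()]
    return parents_by_host
```

Fed the switch and router definitions from this post, it would map SWITCH X to ROUTER-A and ROUTER-B, and ROUTER-A to an empty parent list.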
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Parent/Child anomaly

Post by BanditBBS »

Fred,

I don't mean to hijack your thread here, but I have quite a similar dual-parent setup here with just about everything in my organization. I haven't seen this yet, so I am very interested.

What it sounds like to me is that the servers are all still being monitored, and it is just alerting you to a network outage affecting the downstream hosts. While it isn't a true outage since there is redundancy, it is still an outage that potentially affects a number of downstream hosts because they no longer have that redundancy.

Devil's advocate here: as long as the servers are still being monitored and alerts are still being sent for them (can you test that?), wouldn't you want to be alerted to a potential outage like that?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Parent/Child anomaly

Post by lmiltchev »

@Fred Kroeger

I suspect this could be a bug but cannot tell for sure. I am curious to find out if you can see any network outages when you log in to Nagios Core directly (http://<ip>/nagios). Do you get the same results?
Fred Kroeger

Re: Parent/Child anomaly

Post by Fred Kroeger »

Checked Nagios Core and it displays the Blocking Outage the same as NagiosXI.
I checked Network Status Map and it shows the 2 paths to the Switch and one path is Red. However, Hypermap only displays a single path to the switch via the router that is showing down.

Happy for you to add comments, Bandit - yes, it is useful to know that there is a potential problem for downstream hosts, but perhaps we shouldn't call it a "Blocking Outage"?
I am still actively monitoring all the downstream hosts, so it isn't really blocking any hosts/services.

I guess something else that would be useful on the Blocking Outage screen is a way of displaying which hosts are affected - the number of hosts is displayed, but it would help to be able to click on the hosts field and have it display a table of hostnames.
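
Until something like that exists, the list can be approximated outside Nagios from the parents directives. The Python sketch below is an illustration under assumptions: the function names and the SERVER-1/SERVER-2 host names are hypothetical, and the idea that the outage screen counts every descendant of a down host, regardless of alternative parents, is a guess that would be consistent with 53 hosts being reported while they are all still monitored. It has not been confirmed in this thread.

```python
def downstream_hosts(root, parents_by_host):
    """Every host that has `root` somewhere among its ancestors."""
    children = {}
    for host, parents in parents_by_host.items():
        for parent in parents:
            children.setdefault(parent, []).append(host)
    seen, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return sorted(seen)


def cut_off_hosts(down, parents_by_host):
    """Hosts that are not down themselves but have no open path upstream."""
    down = set(down)

    def reachable(host, visiting=frozenset()):
        if host in down or host in visiting:  # down, or a loop in the config
            return False
        parents = parents_by_host.get(host, [])
        # Parentless hosts are assumed to sit next to the monitoring server.
        return not parents or any(reachable(p, visiting | {host}) for p in parents)

    return sorted(h for h in parents_by_host if h not in down and not reachable(h))


# Hypothetical miniature of the topology in this thread.
topology = {
    "ROUTER-A": [],
    "ROUTER-B": [],
    "SWITCH X": ["ROUTER-A", "ROUTER-B"],
    "SERVER-1": ["SWITCH X"],
    "SERVER-2": ["SWITCH X"],
}
print(downstream_hosts("ROUTER-A", topology))          # potentially affected
print(cut_off_hosts({"ROUTER-A"}, topology))           # actually blocked
print(cut_off_hosts({"ROUTER-A", "ROUTER-B"}, topology))
```

The first function gives the "potentially affected" list, while the second only reports hosts once every path to them is gone, which is closer to what "blocking" suggests.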

Fred
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Parent/Child anomaly

Post by slansing »

I suppose it may have been called that in the sense that a switch going down blocks you from contacting its children on the other end.