Parent/Child anomaly

Posted: Thu Mar 06, 2014 1:42 am
by Fred Kroeger
I have set up the following Parent/Child relationship

Code:

              |----- Router A -----|
Nagios--------|                    |---------Switch X ------- Servers [1 - X]
              |----- Router B -----|
Basically, Switch X has two parents (Router A and Router B), and the servers (1 to X) have Switch X as their parent.

My understanding is that if either Router A or Router B is down, monitoring continues as normal.
If Router A and Router B are both down, we have a blocking outage and monitoring of the servers is suspended.

However, this does not appear to be the case.
Currently I have Router A down, and the Network Outages link shows that 53 hosts and 451 services are affected.
I would only expect to see that if both routers were down.
The downstream switch and servers are all still being actively monitored, so it would appear that they haven't been affected.

Why am I seeing a Network Outage then? Also, it is showing a Severity of 165. What is this?

Network Outages
Severity  Host      State  Duration        Hosts Affected  Services Affected
165       Router-A  Down   62d 7h 26m 10s  53              451


I'm running NagiosXI 2012R2.8c

Fred
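
On the severity figure: the value in the table above is consistent with a simple weighting of hosts affected plus one quarter of services affected, using integer division. The short Python sketch below only reproduces that arithmetic. The formula is an assumption based on how the Network Outages page appears to compute severity, not a documented guarantee, and the function name `outage_severity` is made up for illustration.

```python
def outage_severity(hosts_affected: int, services_affected: int) -> int:
    """Assumed weighting: each affected host counts as 1,
    each affected service counts as 1/4 (integer division)."""
    return hosts_affected + services_affected // 4


# Figures taken from the Network Outages table in this thread.
print(outage_severity(53, 451))  # 53 + 112 = 165
```

Under that assumption, severity is just a ranking aid for sorting outages by impact, not a state or an error code.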

Re: Parent/Child anomaly

Posted: Thu Mar 06, 2014 10:45 am
by tmcdonald
Can you post either screenshots or text configurations of the two routers and the switch? Preferably also at least one of the servers behind the switch.

Re: Parent/Child anomaly

Posted: Thu Mar 06, 2014 7:54 pm
by Fred Kroeger
I have attached the Overview screen, which shows the network outage.
I have also attached what is displayed when I select the Blocking Outage; it shows that 53 hosts (and their services) are affected.
However, displaying any of the hosts that are children of Switch X shows that they are still being actively monitored.


Switch Config

Code:

define host {
        host_name                       SWITCH X
        use                             xiwizard_genericnetdevice_host
        address                         x.x.x.x
        parents                         ROUTER-A,ROUTER-B
        hostgroups                      C-IAS
        check_command                   check-host-alive!!!!!!!!
        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
        check_period                    24x7
        notification_interval           0
        notification_period             24x7
        icon_image                      cisco.png
        statusmap_image                 cisco.png
        _xiwizard                       snmpwalk
        register                        1
        }
Router-A config - note: ROUTER-B is identical

Code:

define host {
        host_name                       ROUTER-A
        use                             xiwizard_genericnetdevice_host
        address                         x.x.x.x
        hostgroups                      C-ASG,Network - ASA Devices
        check_command                   check-host-alive!!!!!!!!
        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
        check_period                    24x7
        notification_interval           0
        notification_period             24x7
        notes                           -31.99628, 115.8883650000000
        icon_image                      cisco.png
        statusmap_image                 cisco.png
        _xiwizard                       snmpwalk
        register                        1
        }

Re: Parent/Child anomaly

Posted: Fri Mar 07, 2014 9:40 am
by BanditBBS
Fred,

I don't mean to hijack your thread here, but I have a quite similar dual-parent setup here with just about everything in my organization. I haven't seen this yet, so I am very interested.

What it sounds like to me is that the servers are all still being monitored, and it is just alerting you to a network outage affecting the downstream hosts. While it isn't a true outage since there is redundancy, it is still an outage that potentially affects a number of downstream hosts because they no longer have that redundancy.

Devil's advocate here: as long as the servers are still being monitored and alerts are still being sent for them (can you test that?), wouldn't you want to be alerted to a potential outage like that?

Re: Parent/Child anomaly

Posted: Fri Mar 07, 2014 9:49 am
by lmiltchev
@Fred Kroeger

I suspect this could be a bug but cannot tell for sure. I am curious to find out whether you see any network outages when you log in to Nagios Core directly (http://<ip>/nagios). Do you get the same results?

Re: Parent/Child anomaly

Posted: Mon Mar 10, 2014 8:41 pm
by Fred Kroeger
Checked Nagios Core and it displays the Blocking Outage the same as NagiosXI.
I checked Network Status Map and it shows the 2 paths to the Switch and one path is Red. However, Hypermap only displays a single path to the switch via the router that is showing down.

Happy for you to add comments, Bandit. Yes, it is useful to know that there is a potential problem for downstream hosts, but perhaps we shouldn't call it a "Blocking Outage"?
I am still actively monitoring all the downstream hosts, so it isn't really blocking any hosts/services.

I guess something else that would be useful on the Blocking Outage screen is a way of displaying which hosts are affected. The number of hosts is displayed, but it would help to be able to click on the hosts field and have it display a table of hostnames.

Fred

Re: Parent/Child anomaly

Posted: Tue Mar 11, 2014 9:17 am
by slansing
I suppose it may have been called that in the sense that a switch going down blocks you from contacting its children on the other end.
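
The two behaviours discussed in this thread can be told apart with a small model. The Python sketch below is illustrative only: the host names mirror the diagram in the first post, and the functions `is_blocked` and `descendants` are hypothetical helpers, not Nagios code. It assumes that a host is treated as cut off only when every one of its parents is down or itself cut off, while the outage listing counts every host below the down host, whether or not another parent still provides a path.

```python
# Hypothetical model of the topology in this thread (names are illustrative).
PARENTS = {
    "ROUTER-A": [],
    "ROUTER-B": [],
    "SWITCH-X": ["ROUTER-A", "ROUTER-B"],
    "SERVER-1": ["SWITCH-X"],
    "SERVER-2": ["SWITCH-X"],
}


def is_blocked(host, down):
    """True only if the host has parents and every parent is down or blocked."""
    parents = PARENTS[host]
    return bool(parents) and all(
        p in down or is_blocked(p, down) for p in parents
    )


def descendants(host):
    """Every host below `host`, counted even if another live parent exists."""
    found = set()
    stack = [host]
    while stack:
        current = stack.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                stack.append(child)
    return found


down = {"ROUTER-A"}
print(is_blocked("SWITCH-X", down))      # False: ROUTER-B still provides a path
print(sorted(descendants("ROUTER-A")))   # the switch and servers are still counted
```

With only ROUTER-A down, nothing is actually blocked, yet the descendant count is non-zero, which matches what Fred is seeing: the hosts stay monitored while the outage page still reports them as affected.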