Parent/Child anomaly
Posted: Thu Mar 06, 2014 1:42 am
by Fred Kroeger
I have set up the following Parent/Child relationship
              |----- Router A -----|
Nagios--------|                    |---------Switch X ------- Servers [1 - X]
              |----- Router B -----|
Basically, Switch X has 2 Parents (Router A & B) and the Servers (1 to X) have Switch X as their Parent.
My understanding is that if either Router A OR Router B is Down, then monitoring continues as per normal.
If Router A AND Router B are Down, then we have a blocking outage and monitoring of the Servers is suspended.
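The rule described above can be sketched roughly as follows. This is only a model of the expectation stated in this post, not Nagios code; the `parents` map, the `down` set and the `route_blocked` helper are all hypothetical names made up for illustration.

```python
# Hypothetical model of the expected rule; not taken from Nagios itself.
parents = {
    "Switch X": ["Router A", "Router B"],
    "Server 1": ["Switch X"],
}

def route_blocked(host, down, parents):
    """Return True only if every path from the monitoring server to host is cut."""
    host_parents = parents.get(host, [])
    if not host_parents:
        # No parents defined: treat the host as directly attached to Nagios.
        return False
    # Blocked only when every parent is down or is itself cut off.
    return all(p in down or route_blocked(p, down, parents) for p in host_parents)

print(route_blocked("Server 1", {"Router A"}, parents))
print(route_blocked("Server 1", {"Router A", "Router B"}, parents))
```

With only Router A down the servers still have a path via Router B, so the first call reports the route as not blocked; with both routers down the second call reports it as blocked.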
However this does not appear to be the case?
Currently I have Router A down - the Network Outages link shows that 53 Hosts & 451 Services are affected.
I would only expect to see that if both Routers were down?
The downstream Switch & Servers are all still being actively monitored, so it would appear that they haven't been affected?
Why am I seeing a Network Outage then? Also, it is showing a Severity of 165 - what is this?
Network Outages
Severity  Host      State  Duration        Hosts Affected  Services Affected
165       Router-A  Down   62d 7h 26m 10s  53              451
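For what it's worth, the numbers in that row are consistent with the severity being the affected host count plus a quarter of the affected service count (integer division). That formula is an assumption based on how the Nagios Core outages CGI is understood to weight things; it is not confirmed anywhere in this thread, so check it against the source for your version. The function name below is made up.

```python
def outage_severity(hosts_affected, services_affected):
    # Assumed weighting: each affected host counts 1, every 4 affected services count 1.
    return hosts_affected + services_affected // 4

# Figures from the Network Outages row above.
print(outage_severity(53, 451))  # 53 + 112 = 165
```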
I'm running NagiosXI 2012R2.8c
Fred
Re: Parent/Child anomaly
Posted: Thu Mar 06, 2014 10:45 am
by tmcdonald
Can you post either screenshots or text configurations of the two routers and the switch? Preferably also at least one of the servers behind the switch.
Re: Parent/Child anomaly
Posted: Thu Mar 06, 2014 7:54 pm
by Fred Kroeger
I have attached the Overview screen which shows the Network outage.
I have attached what is displayed when I select the Blocking Outage - this shows that 53 hosts are affected (and their services)
However, displaying any of the hosts that are a child of Switch X shows they are still being actively monitored.
Switch Config
define host {
    host_name               SWITCH X
    use                     xiwizard_genericnetdevice_host
    address                 x.x.x.x
    parents                 ROUTER-A,ROUTER-B
    hostgroups              C-IAS
    check_command           check-host-alive!!!!!!!!
    max_check_attempts      5
    check_interval          5
    retry_interval          1
    check_period            24x7
    notification_interval   0
    notification_period     24x7
    icon_image              cisco.png
    statusmap_image         cisco.png
    _xiwizard               snmpwalk
    register                1
}
Router-A config - note: ROUTER-B is identical
define host {
    host_name               ROUTER-A
    use                     xiwizard_genericnetdevice_host
    address                 x.x.x.x
    hostgroups              C-ASG,Network - ASA Devices
    check_command           check-host-alive!!!!!!!!
    max_check_attempts      5
    check_interval          5
    retry_interval          1
    check_period            24x7
    notification_interval   0
    notification_period     24x7
    notes                   -31.99628, 115.8883650000000
    icon_image              cisco.png
    statusmap_image         cisco.png
    _xiwizard               snmpwalk
    register                1
}
Re: Parent/Child anomaly
Posted: Fri Mar 07, 2014 9:40 am
by BanditBBS
Fred,
I don't mean to hijack your thread here, but I have a quite similar dual-parent setup here with just about everything in my organization. I haven't seen this yet, so I am very interested.
What it sounds like to me is the servers are all still being monitored, and it is just alerting you to a network outage affecting the downstream hosts. While it isn't a true outage since there is redundancy, it is still an outage that potentially affects a number of downstream hosts because they no longer have that redundancy.
Devil's advocate here: as long as the servers are still being monitored and alerts are still being sent for them (can you test that?), wouldn't you want to be alerted to a potential outage like that?
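To illustrate that reading: if the outages page simply walks the parent/child tree below the down host and counts everything it finds, without checking whether a child still has another parent up, you would get exactly this kind of number. That behaviour is an assumption here, not something verified in this thread, and the topology, service counts and helper names below are made up.

```python
# Hypothetical topology and service counts, for illustration only.
parents = {
    "Switch X": ["Router A", "Router B"],
    "Server 1": ["Switch X"],
    "Server 2": ["Switch X"],
}
services = {"Router A": 1, "Router B": 1, "Switch X": 2, "Server 1": 3, "Server 2": 3}

def children_of(host):
    """Hosts that list `host` as one of their parents."""
    return [h for h, ps in parents.items() if host in ps]

def outage_effect(host):
    """Count host plus all descendants, ignoring whether another parent is still up."""
    hosts, svcs = 1, services.get(host, 0)
    for child in children_of(host):
        child_hosts, child_svcs = outage_effect(child)
        hosts += child_hosts
        svcs += child_svcs
    return hosts, svcs

print(outage_effect("Router A"))  # (4, 9): Router A, Switch X and both servers
```

Note that Router B being up never enters the calculation, which would explain why a single router down still reports every host behind the switch as affected.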
Re: Parent/Child anomaly
Posted: Fri Mar 07, 2014 9:49 am
by lmiltchev
@Fred Kroeger
I suspect this could be a bug but cannot tell for sure. I am curious to find out if you can see any network outages when you log in to Nagios Core directly (http://<ip>/nagios). Do you get the same results?
Re: Parent/Child anomaly
Posted: Mon Mar 10, 2014 8:41 pm
by Fred Kroeger
Checked Nagios Core and it displays the Blocking Outage the same as NagiosXI.
I checked Network Status Map and it shows the 2 paths to the Switch and one path is Red. However, Hypermap only displays a single path to the switch via the router that is showing down.
Happy for you to add comments Bandit - yes it is useful to know that there is a potential problem to downstream hosts but perhaps we shouldn't call it a "Blocking Outage"?
I am still actively monitoring all the downstream hosts, so it isn't really blocking any hosts/services.
I guess something else that would be useful on the Blocking Outage screen is a way of displaying which hosts are affected. The number of hosts is displayed, but it would help to be able to click on the hosts field and have it display a table of hostnames.
Fred
Re: Parent/Child anomaly
Posted: Tue Mar 11, 2014 9:17 am
by slansing
I suppose it may have been called that in the sense that a switch going down blocks you from contacting its children on the other end.