Re: No Blocking Outage Alerts for Parents

Posted: Wed Mar 13, 2013 2:41 pm
by kurt2439
OK, I just re-ran this to make sure I am not missing something. I am checking host UP using a simple ping.

When apple is down (I change its address to an unresolvable FQDN), I receive alerts correctly: just an alert for apple.

When just shiva2 and soekris1 are down (I change their addresses to an IP that will not respond to ping), I receive alerts correctly: just for shiva2 and soekris1.

Now, to test an everything-down condition, I change the host check_command to a check_nrpe check. If NRPE does not respond, Nagios will call the host DOWN. This is applied through the "generic_host" template, so it applies to Apple (though I have overridden the host check_command there so it is always a ping test), shiva2, soekris1, and all the hosts behind shiva2 and soekris1. I then block all NRPE traffic to shiva2/soekris1 and all the hosts behind them with a firewall rule on the shiva2/soekris1 firewalls. As an additional note, I have taken one host, 'apache1', and made its parent 'apple', even though its parent is really shiva2/soekris1 in the real world.
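
To make the layout easier to follow, here is a rough sketch of the object definitions this test implies. The host names, the "generic_host" template, the parent relationships, and the 1/3 attempt count come from this thread; the addresses, the NRPE argument (check_load), the ping command name (check-host-alive), and the example child host (web1) are assumptions for illustration only.

```
# Sketch only: addresses, check_load, check-host-alive and web1 are assumptions.
define host {
    name                generic_host            ; template named in this thread
    check_command       check_nrpe!check_load   ; NRPE-based host check (argument is hypothetical)
    max_check_attempts  3                       ; matches the "1/3" attempt counter
    register            0                       ; template only, not a real host
}

define host {
    use                 generic_host
    host_name           apple
    address             192.0.2.1               ; placeholder address
    check_command       check-host-alive        ; overridden to a ping test (command name assumed)
}

define host {
    use                 generic_host
    host_name           shiva2
    address             192.0.2.2               ; placeholder address
    parents             apple
}

define host {
    use                 generic_host
    host_name           soekris1
    address             192.0.2.3               ; placeholder address
    parents             apple
}

define host {
    use                 generic_host
    host_name           apache1
    address             192.0.2.10              ; placeholder address
    parents             apple                   ; test-only parent; really behind shiva2/soekris1
}

define host {
    use                 generic_host
    host_name           web1                    ; hypothetical host behind the firewalls
    address             192.0.2.20              ; placeholder address
    parents             shiva2,soekris1
}
```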

apache1 sends an alert as expected. shiva2 and soekris1 sit at "Current Attempt 1/3 (SOFT state)" but are shown as blocking outages. When the "next scheduled check" time comes, it passes without any change in the current attempt count, and the check is rescheduled for a minute or so later. apache1, shiva2, and soekris1 are all shown under Hosts Down. All the other servers that are dependent on shiva2/soekris1 are shown as unreachable. All other servers are correctly in the UP state. Alerts for soekris1/shiva2 never come, at least not in the time I let it run. In my experience, had I done a manual host alive check via the web interface, the current attempt does increment and I will get an alert if I manually check it enough to get it to a HARD state (I did not test this in my most recent run so my memory could be incorrect here).

Since apache1 has no dependents and I get its alerts correctly, is there something about the children of shiva2/soekris1 that is preventing alerts for these parents? I hope my explanation is clear.

Re: No Blocking Outage Alerts for Parents

Posted: Thu Mar 14, 2013 10:13 am
by scottwilkerson
While this is a bit confusing, you never mentioned the state of sp_int_sw1 or soekris1-sham,soekris2-shamm

Are they Up or Down during this test?

Are you making apache1 the ONLY parent of apple?

Re: No Blocking Outage Alerts for Parents

Posted: Thu Mar 14, 2013 11:47 am
by kurt2439
sp_int_sw1 and soekris1-sham, soekris2-sham are UP in all of these tests.

No, Apple is the parent of apache1, and apache1 is not the only host that Apple is the direct parent of. Apple is the direct parent of apache1, soekris1, and shiva2.

Re: No Blocking Outage Alerts for Parents

Posted: Thu Mar 14, 2013 3:58 pm
by scottwilkerson
kurt2439 wrote: In my experience, had I done a manual host alive check via the web interface, the current attempt does increment and I will get an alert if I manually check it enough to get it to a HARD state (I did not test this in my most recent run so my memory could be incorrect here).
As far as I am aware, if a host's parent is in a DOWN state, the children will not update their current attempt and will NEVER send notifications until the parent goes back to an UP state.

Re: No Blocking Outage Alerts for Parents

Posted: Thu Mar 14, 2013 4:33 pm
by kurt2439
The hosts I am concerned about, "soekris1" and "shiva2", are children of "Apple", which is UP.

The logic you describe makes sense and is desirable, but that is not the situation I am describing here, though the behavior of Nagios seems to imply something of this nature is true.

Re: No Blocking Outage Alerts for Parents

Posted: Thu Mar 14, 2013 4:50 pm
by scottwilkerson
Interesting...

Other possibly related nagios.cfg parameters:

soft_state_dependencies
enable_predictive_host_dependency_checks

Additionally, are there any hostdependencies defined in the configuration?
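
For reference, this is roughly how the two nagios.cfg parameters above appear in the main config file. The values shown are the documented defaults as far as I know, not necessarily what is in your file, so treat this as a sketch to compare against.

```
# nagios.cfg (sketch; values are the stock defaults as I understand them)

# 0 = dependency logic looks at the last HARD state of the master host/service
# 1 = dependency logic looks at the most recent state, SOFT or HARD
soft_state_dependencies=0

# 1 = run predictive checks of hosts that a checked host depends on
#     (parents/children and hostdependency masters)
enable_predictive_host_dependency_checks=1
```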

Re: No Blocking Outage Alerts for Parents

Posted: Thu Mar 14, 2013 5:16 pm
by kurt2439
Host dependencies! Ugh. I did not think there were any because of the parent logic, and didn't even think to check, but of course this was the problem. Waste of time...
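
For anyone landing here later, this is the general shape of a hostdependency definition that can hold back scheduled checks and notifications for a host. It is a hypothetical example only, not the actual definition from this setup; the master host name is made up, and the failure criteria are just one plausible combination.

```
# Hypothetical example only; not the definition from this thread.
define hostdependency {
    host_name                       some_master_host    ; hypothetical master host
    dependent_host_name             shiva2              ; host that depends on the master
    execution_failure_criteria      d,u                 ; no scheduled active checks while master is DOWN/UNREACHABLE
    notification_failure_criteria   d,u                 ; no notifications while master is DOWN/UNREACHABLE
}
```

Searching the object config files for "define hostdependency" is a quick way to see whether anything like this exists.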

Thanks for helping me with this issue!