No Blocking Outage Alerts for Parents

kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

Re: No Blocking Outage Alerts for Parents

Post by kurt2439 »

OK, I just re-ran this to make sure I am not missing something. I am checking whether hosts are UP using a simple ping.

When apple is down (address changed to an unresolvable FQDN), I receive alerts correctly: just an alert for apple.

When only shiva2 and soekris1 are down (address changed to an IP that will not respond to ping), I receive alerts correctly: just for shiva2 and soekris1.

Now, to test an everything-down condition, I change the host check_command to a check_nrpe check: if NRPE does not respond, Nagios will mark the host DOWN. This is applied through the "generic_host" template, so it applies to Apple (although I have overridden the host check_command to always be a ping test for this host), shiva2, soekris1, and all the hosts behind shiva2 and soekris1. I then block all NRPE traffic to shiva2/soekris1 and all the hosts behind them with a firewall rule on the shiva2/soekris1 firewalls. As an additional note, I have taken one host, 'apache1', and made its parent 'apple', even though in the real world its parent is shiva2/soekris1.
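A hypothetical sketch of the setup described above, in Nagios object-configuration syntax, may make the parent relationships easier to follow. The IP addresses (from the 192.0.2.0/24 documentation range) and the check_nrpe_alive command name are placeholders, check-host-alive is assumed to be the ping command from the Nagios sample commands.cfg, and directives a working template also needs (contacts, check_period, notification settings, etc.) are omitted.

```
# Hypothetical sketch only: addresses and check_nrpe_alive are placeholders.
define command {
    command_name    check_nrpe_alive
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$   ; no -c: just asks the NRPE daemon for its version
}

define host {
    name                 generic_host
    check_command        check_nrpe_alive   ; host goes DOWN if NRPE does not answer
    max_check_attempts   3                  ; gives the "1/3 (SOFT state)" counter
    register             0                  ; template only, not a real host
}

define host {
    use             generic_host
    host_name       apple
    address         192.0.2.1
    check_command   check-host-alive        ; override: plain ping for apple
}

define host {
    use             generic_host
    host_name       shiva2
    address         192.0.2.2
    parents         apple
}

define host {
    use             generic_host
    host_name       soekris1
    address         192.0.2.3
    parents         apple
}

define host {
    use             generic_host
    host_name       apache1
    address         192.0.2.10
    parents         apple                   ; test change: real parent is shiva2/soekris1
}
```

The hosts behind shiva2 and soekris1 would list those two in their own parents directive, which is why they show as UNREACHABLE rather than DOWN when shiva2/soekris1 are down.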

apache1 sends an alert as expected. shiva2 and soekris1 sit at "Current Attempt 1/3 (SOFT state)" but are shown as blocking outages. When the "next scheduled check" time comes, it passes without any change in the current attempt count, and the check is re-scheduled for a minute or so later. apache1, shiva2, and soekris1 are all shown under Hosts Down. All the other servers that depend on shiva2/soekris1 are shown as UNREACHABLE, and all remaining servers are correctly in the UP state. Alerts for shiva2/soekris1 never come, at least not in the time I let it run.

In my experience, if I do a manual host alive check via the web interface, the current attempt does increment, and I will get an alert if I check it manually enough times to get it to a HARD state (I did not test this in my most recent run, so my memory could be incorrect here).

Since apache1 has no dependents and I get alerts for it correctly, is there something about the children of shiva2/soekris1 that is preventing alerts for these parents? I hope my explanation is clear.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: No Blocking Outage Alerts for Parents

Post by scottwilkerson »

While this is a bit confusing, you never mentioned the state of sp_int_sw1 or soekris1-sham/soekris2-sham.

Are they Up or Down during this test?

Are you making apache1 the ONLY parent of apple?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

Re: No Blocking Outage Alerts for Parents

Post by kurt2439 »

sp_int_sw1, soekris1-sham, and soekris2-sham are UP in all of these tests.

No. Apple is the parent of apache1, but apache1 is not the only host that Apple is the direct parent of: Apple is the direct parent of apache1, soekris1, and shiva2.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: No Blocking Outage Alerts for Parents

Post by scottwilkerson »

kurt2439 wrote: In my experience, had I done a manual host alive check via the web interface, the current attempt does increment and I will get an alert if I manually check it enough to get it to a HARD state (I did not test this in my most recent run so my memory could be incorrect here).
As far as I am aware, if a host's parent is in a DOWN state, the children will not update their current attempt and will NEVER send a notification until the parent goes back to an UP state.
kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

Re: No Blocking Outage Alerts for Parents

Post by kurt2439 »

The hosts I am concerned about, "soekris1" and "shiva2", are children of "Apple", which is UP.

The logic you describe makes sense and is desirable, but that is not the situation I am describing here, though Nagios's behavior seems to imply that something like it is happening.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: No Blocking Outage Alerts for Parents

Post by scottwilkerson »

Interesting...

Other possibly related nagios.cfg parameters:

soft_state_dependencies
enable_predictive_host_dependency_checks
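For reference, both are boolean options in nagios.cfg. The sketch below shows what I understand to be their default values (check the documentation for your Nagios version). With soft_state_dependencies=0, dependency logic only looks at the master host's latest HARD state; with enable_predictive_host_dependency_checks=1, Nagios runs on-demand checks of the hosts that a host depends on (via host dependencies) when that host changes state.

```
# nagios.cfg excerpt: values shown are the defaults as I understand them
soft_state_dependencies=0
enable_predictive_host_dependency_checks=1
```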
Additionally, are there any hostdependencies defined in the configuration?
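As a purely hypothetical illustration (the thread does not show the actual definition, and the master host name web3 is made up), a hostdependency like the one below could produce symptoms similar to those described. shiva2 is made dependent on a host that sits behind it, so once that host is DOWN or UNREACHABLE, as I understand it Nagios would skip shiva2's scheduled checks (leaving the attempt counter at 1/3) and suppress its notifications.

```
# Hypothetical example only; web3 is a made-up host whose parent is shiva2.
define hostdependency {
    host_name                      web3    ; master host
    dependent_host_name            shiva2  ; host whose checks/notifications are affected
    execution_failure_criteria     d,u     ; skip shiva2's checks while web3 is DOWN/UNREACHABLE
    notification_failure_criteria  d,u     ; suppress shiva2's notifications in the same case
}
```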
kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

Re: No Blocking Outage Alerts for Parents

Post by kurt2439 »

Host dependencies! Ugh. I did not think there were any because of the parent logic, and didn't even think to check, but of course this was the problem. What a waste of time...

Thanks for helping me with this issue!