Re: No Blocking Outage Alerts for Parents
Posted: Wed Mar 13, 2013 2:41 pm
OK, I just re-ran this to make sure I am not missing something. I am checking whether a host is UP using a simple ping.
When apple is down (I change its address to an unresolvable FQDN), I receive alerts correctly: just one alert, for apple.
When just shiva2 and soekris1 are down (I change their addresses to an IP that will not respond to ping), I receive alerts correctly: just for shiva2 and soekris1.
Now, to test an everything-down condition, I change the host check_command to a check_nrpe check. If NRPE does not respond, Nagios will call the host DOWN. This is applied through the "generic_host" template, so it applies to apple (although I have overridden the host check_command to always be a ping test for this host), to shiva2 and soekris1, and to all the hosts behind shiva2 and soekris1. I then block all NRPE traffic to shiva2/soekris1 and all the hosts behind them with a firewall rule on the shiva2/soekris1 firewalls. As an additional note, I have taken one host, 'apache1', and made its parent 'apple', even though its parent is really shiva2/soekris1 in the real world.
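For reference, the test setup described above might look roughly like the fragment below. This is only a sketch under assumptions, not the actual configuration: the NRPE argument (check_load), the ping command name (check-host-alive, as in the stock sample config), and the child host 'web1' are hypothetical, and required directives such as address and contacts are left out. The parents of shiva2 and soekris1 are not shown because they are not stated here.

```
# Sketch only: not a complete or verified configuration.
define host {
    name                generic_host
    check_command       check_nrpe!check_load   ; hypothetical NRPE argument
    max_check_attempts  3                       ; matches "Current Attempt 1/3"
    register            0                       ; template, not a real host
}

define host {
    use                 generic_host
    host_name           apple
    check_command       check-host-alive        ; override: always a ping test (assumed command name)
}

define host {
    use                 generic_host
    host_name           shiva2
}

define host {
    use                 generic_host
    host_name           soekris1
}

define host {
    use                 generic_host
    host_name           apache1
    parents             apple                   ; test only; really behind shiva2/soekris1
}

define host {
    use                 generic_host
    host_name           web1                    ; hypothetical host behind the firewalls
    parents             shiva2,soekris1
}
```

With NRPE blocked at the firewalls, every host except apple fails its host check in this layout, which is the everything-down condition being tested.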
apache1 sends an alert as expected. shiva2 and soekris1 sit at "Current Attempt 1/3 (SOFT state)" but are shown as blocking outages. When the "next scheduled check" time comes, it passes without any change in the current attempt count, and the check is rescheduled for a minute or so later. apache1, shiva2, and soekris1 are all shown under Hosts Down. All the other servers that depend on shiva2/soekris1 are shown as unreachable. All other servers are correctly in the UP state. Alerts for soekris1/shiva2 never come, at least not in the time I let it run. In my experience, if I run a manual host alive check via the web interface, the current attempt does increment, and I will get an alert if I manually check enough times to reach a HARD state (I did not test this in my most recent run, so my memory could be incorrect here).
Since apache1 has no dependents and I get alerts for it correctly, is there something about the children of shiva2/soekris1 that is preventing alerts for these parents? I hope my explanation is clear.