Nagios does not suppress all dependents on failure
Posted: Mon Mar 14, 2011 7:56 am
When I have a number of hosts that depend on another host (typically a router), a failure of that master host (the router) only suppresses failure reports for those dependent hosts that Nagios happens to check after it has already detected the router failure.
For instance, if the checking sequence is Router, Host 1, Host 2, Host 3, Router, Host 1, Host 2, Host 3, ... and the router fails just after Host 1 was checked, then Nagios will send out alarm mails for Host 2, Host 3 and Router, and only then suppress the alarm mail for Host 1. This creates a lot of noise mail that easily drowns out the real issues.
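To make the race concrete, here is a toy model in Python of the sequence above. It is a simplified sketch, not Nagios's actual scheduler, and it assumes that: hosts are checked strictly round-robin, every host reads DOWN once the router has failed, each host alarms at most once, and a dependent's alarm is suppressed only if the router's *last checked* state is already DOWN. The function name `simulate` is hypothetical.

```python
def simulate(order, fail_after):
    """Toy model (assumption): return (alarmed, suppressed) host lists.

    order: one round of the check sequence, router first.
    fail_after: the router fails right after the check at this index
    of the first round.
    """
    router = order[0]
    last_known = {host: "UP" for host in order}  # state as of last check
    alarmed, suppressed = [], []
    # Two rounds are enough for every host to be checked after the failure.
    for i, host in enumerate(order * 2):
        if i <= fail_after:
            continue  # checks before the failure all return UP
        if last_known[host] == "DOWN":
            continue  # this host has already been reported
        last_known[host] = "DOWN"
        # Dependents are suppressed only if the router is already known DOWN.
        if host != router and last_known[router] == "DOWN":
            suppressed.append(host)
        else:
            alarmed.append(host)
    return alarmed, suppressed


alarmed, suppressed = simulate(["Router", "Host 1", "Host 2", "Host 3"], 1)
print(alarmed)     # ['Host 2', 'Host 3', 'Router']
print(suppressed)  # ['Host 1']
```

Under these assumptions only the dependent checked after the router's own failed check is suppressed, which matches the behaviour described above.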
Searching the forum and the web shows that others have reported the same issue, but without any meaningful replies. So what can be done about this, and is anyone working on a fix?
Info: Nagios core 3.2.1 (Debian package). NRPE not involved.
Example dependency rule (slightly changed):
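define hostdependency {
    host_name                       router2-ipv4        ; master host the others depend on
    dependent_hostgroup_name        externalhosts-ipv4  ; hosts reached through the router
    inherits_parent                 1                   ; also inherit the master's own dependencies
    execution_failure_criteria      d,u                 ; skip active checks of dependents while master is DOWN/UNREACHABLE
    notification_failure_criteria   d,u                 ; suppress dependents' notifications while master is DOWN/UNREACHABLE
}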