There are a couple of things to talk about here. Some of this goes back to basics, but the scenarios are easier to discuss with an example.
1) How long it takes for the host to go into a hard state and become "down" compared to how long it takes your services to go into a hard state and become "down".
Example:
Host
check_interval = 5
max_check_attempts = 3
retry_interval = 2
Service(s)
check_interval = 2
max_check_attempts = 3
retry_interval = 1
1.10pm - Host is checked and detected as UP, next check is 1.15pm
1.11pm - Host goes down, Nagios does not know about it yet
1.12pm - Service check fails, retry interval is 1 so next attempt is 1.13pm (soft state)
1.13pm - Service check retry fails, retry interval is 1 so next attempt is 1.14pm (soft state)
1.14pm - Service check fails, max_check_attempts reached so alert is sent (hard state)
1.15pm - Host check fails, retry interval is 2 so next attempt is 1.17pm (soft state)
more service checks happening / retrying / alerting
1.17pm - Host check fails, retry interval is 2 so next attempt is 1.19pm (soft state)
more service checks happening / retrying / alerting
1.19pm - Host check fails, max_check_attempts reached so alert is sent (hard state)
No more service alerts will be sent until the host recovers
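Basically, the point I am making here is that the time your services take to reach a hard state (driven by their check_interval, retry_interval and max_check_attempts) needs to exceed the time the host takes to reach a hard down state. In the example above the services went hard at 1.14pm, three minutes after the outage, while the host did not go hard until 1.19pm, eight minutes after the outage, so the service alerts went out first. Once the host is in a hard down state, service checks continue to run but no service notifications are sent.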
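As a rough sketch of settings that follow this rule, the definitions below let the host reach a hard down state before any service can. The host name web01, the address, the HTTP service, the generic-host / generic-service templates and the check-host-alive / check_http commands (named as in the Nagios sample configuration) are assumptions for illustration, not taken from your setup. The intervals assume the default interval_length of 60 seconds, so they are in minutes.

```
# Hypothetical host: hard DOWN at most about 3 minutes after an outage
# (first failed check within 1 minute, then 2 retries 1 minute apart).
define host {
    use                     generic-host        ; assumed template
    host_name               web01               ; hypothetical name
    address                 192.0.2.10          ; documentation address
    check_command           check-host-alive    ; assumed command
    check_interval          1
    retry_interval          1
    max_check_attempts      3
}

# Hypothetical service: hard state no sooner than 6 minutes after its
# first failed check (3 retries 2 minutes apart).
define service {
    use                     generic-service     ; assumed template
    host_name               web01
    service_description     HTTP                ; hypothetical service
    check_command           check_http          ; assumed command
    check_interval          5
    retry_interval          2
    max_check_attempts      4
}
```

These numbers ignore check latency and scheduling jitter, so leave some margin rather than making the two timings nearly equal.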
2) Host and Service Dependencies
Dependencies are a great way to stop checks from being scheduled / executed when something goes down. However, if I remember correctly, you can't make services depend on a host object. To get around this, you create a service that the others depend on.
You can create a ping service for that host and then create a service dependency that makes all other services on that host depend on that ping service. In the dependency you define which states of the ping service (the service being depended on) stop the dependent services from being executed on their next schedule.
Once that ping service goes into a hard critical state, all other service checks that depend on it will not be executed, and hence their state will remain as it was the last time the check ran. Once that ping service goes back into a hard OK state, all other service checks will be allowed to execute again.
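A minimal sketch of the ping-service approach might look like the following. The host name web01, the dependent services HTTP and SSH, the generic-service template and the check_ping thresholds (copied in style from the Nagios sample configuration) are all assumptions for illustration.

```
# Ping service that the other services on this host will depend on.
define service {
    use                     generic-service                 ; assumed template
    host_name               web01                           ; hypothetical host
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%  ; assumed thresholds
}

# While PING on web01 is CRITICAL, do not execute HTTP and SSH checks
# on web01 and do not send notifications for them.
define servicedependency {
    host_name                       web01
    service_description             PING
    dependent_host_name             web01
    dependent_service_description   HTTP,SSH    ; hypothetical services
    execution_failure_criteria      c           ; c = critical
    notification_failure_criteria   c
}
```

The failure criteria also accept w (warning), u (unknown), o (OK), p (pending) and n (none), so you can widen or narrow the states that block the dependent checks.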
lmiltchev wrote: Imagine a scenario, where ping/icmp checks are disabled on a host by the firewall. The host would be up and you would still want to be checking services, regardless of the fact that nagios is showing the host as "down"...
I completely get what you're saying. From a different perspective, if it was only the ping/icmp packets being denied by the firewall and other service checks were still executing OK, it would help in your troubleshooting. It's really an open-ended debate; it all depends on your environment.