Service Alerts being sent out when host is down

Post by **rajasegar** » Tue Jul 22, 2014 12:45 am

Nagios XI 2014R1.2

This morning we had network issues, and service alerts were being sent out even though the host was timing out.
Please advise where to check on this.

Thanks.

Post by **Box293** » Tue Jul 22, 2014 1:11 am

For the host that was timing out, what are the values for:

Check Interval
Retry Interval
Max Check Attempts

Post by **rajasegar** » Tue Jul 22, 2014 1:18 am

Box293 wrote:For the host that was timing out, what are the values for:
Check Interval
Retry Interval
Max Check Attempts

Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 2

Post by **Box293** » Tue Jul 22, 2014 2:36 am

Was the host in a hard state when the service alerts were being sent out?

With your current settings, the host has to return a warning or critical state twice before it goes into a hard state and suppresses service alerts. To explain with dummy times:

1.00pm = Host is checked and is OK, next check is 1.05pm
1.01pm = Host goes down, Nagios does not yet know about it
1.05pm = Host is checked and is critical, 1st attempt so goes into a soft state, no host alerts sent yet, next check is 1.10pm
1.10pm = Host is checked and is critical, 2nd attempt so goes into a hard state, host alerts sent and service alerts are suppressed

As you can see from this example, service alerts will continue to be sent from 1.01pm through to 1.10pm, when the host goes into a hard state.
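The timeline above can be sketched as a small simulation. This is a simplified model, not Nagios itself: the function name `host_check_timeline` and its parameters are made up for illustration, times are minutes after the last OK check, the failing state is labelled DOWN, and on-demand host checks (which Nagios may run when a service changes state) are not modelled.

```python
def host_check_timeline(check_interval, retry_interval, max_check_attempts, down_at):
    """Return (minute, state, state_type) tuples for each scheduled host check.

    Hypothetical helper: minute 0 is the last check that found the host OK,
    and down_at is the minute at which the host actually fails.
    """
    events = []
    minute = 0
    attempt = 0
    while True:
        if minute < down_at:
            # Host is still up, so the next check uses the normal interval.
            events.append((minute, "OK", "HARD"))
            minute += check_interval
            continue
        # Host is down: count this as one failed attempt.
        attempt += 1
        state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        events.append((minute, "DOWN", state_type))
        if state_type == "HARD":
            return events
        # While in a soft state, rechecks use the retry interval.
        minute += retry_interval


# Settings from this thread: 5 minute check, 5 minute retry, 2 attempts,
# with the host failing 1 minute after the last OK check.
events = host_check_timeline(5, 5, 2, down_at=1)
for minute, state, state_type in events:
    print(f"+{minute:>2} min: {state} ({state_type})")

# Minutes during which service alerts are not yet suppressed.
unsuppressed_window = events[-1][0] - 1
print("unsuppressed window:", unsuppressed_window, "minutes")
```

Under these assumptions the host reaches a hard state at the 10 minute mark, which leaves a 9 minute window (1.01pm to 1.10pm in the dummy times) where service alerts can still go out.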

Does this match up with the Host Alert History?

Host Status Detail
Advanced Tab
See this host in Nagios Core
View Alert History For This Host

Post by **rajasegar** » Tue Jul 22, 2014 2:51 am

Box293 wrote:Was the host in a hard state when the service alerts were being sent out?

With your current settings, the host has to return a warning or critical state twice before it goes into a hard state and suppresses service alerts. To explain with dummy times:

1.00pm = Host is checked and is OK, next check is 1.05pm
1.01pm = Host goes down, Nagios does not yet know about it
1.05pm = Host is checked and is critical, 1st attempt so goes into a soft state, no host alerts sent yet, next check is 1.10pm
1.10pm = Host is checked and is critical, 2nd attempt so goes into a hard state, host alerts sent and service alerts are suppressed

As you can see from this example, service alerts will continue to be sent from 1.01pm through to 1.10pm, when the host goes into a hard state.

Does this match up with the Host Alert History?
Host Status Detail
Advanced Tab
See this host in Nagios Core
View Alert History For This Host

Yes, what you are saying makes sense.
How do I stop sending out alerts even with Host in Soft State?

Post by **abrist** » Tue Jul 22, 2014 9:36 am

rajasegar wrote:How do I stop sending out alerts even with Host in Soft State?

Decrease the host retries to 0 (that is, set Max Check Attempts to 1). This way, when the host fails, it will immediately go into a HARD problem state.

Post by **rajasegar** » Tue Jul 22, 2014 7:39 pm

abrist wrote:
rajasegar wrote:How do I stop sending out alerts even with Host in Soft State?
Decrease the host retries to 0. This way, when the host fails, it will immediately go into a HARD problem state.

This will cause too many false alerts.
I know that theoretically it should not, but it does, especially with network flapping issues. Sometimes we see about 50% more false alerts.

Post by **Box293** » Tue Jul 22, 2014 10:15 pm

Your other option is to configure your service checks with more retries / intervals so they will not go into a hard state until after the host does. For example:

Host
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 2

Service
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

For the services, with 3 max check attempts, it would take 15 minutes for a service to go into a hard state and by that time the host would have gone into a hard state.
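To check whether a given pair of settings keeps the host ahead of its services, the arithmetic can be sketched as follows. This is a rough model under stated assumptions, not how Nagios schedules checks internally: `CheckSettings` and `host_hard_before_service` are hypothetical names, intervals are in minutes, the worst case assumes the failure happens just after a check and the best case just before one, and on-demand host checks are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckSettings:
    """Hypothetical container for the three values discussed in this thread."""
    check_interval: int
    retry_interval: int
    max_check_attempts: int

    def best_case_minutes_to_hard(self) -> int:
        # Failure happens just before a scheduled check, so only retries add delay.
        return (self.max_check_attempts - 1) * self.retry_interval

    def worst_case_minutes_to_hard(self) -> int:
        # Failure happens just after a check, so a full check interval passes first.
        return self.check_interval + self.best_case_minutes_to_hard()


def host_hard_before_service(host: CheckSettings, service: CheckSettings) -> bool:
    """True if the slowest host path is no later than the fastest service path."""
    return host.worst_case_minutes_to_hard() <= service.best_case_minutes_to_hard()


host = CheckSettings(check_interval=5, retry_interval=5, max_check_attempts=2)
service = CheckSettings(check_interval=5, retry_interval=5, max_check_attempts=3)
single_poll_service = CheckSettings(check_interval=5, retry_interval=5, max_check_attempts=1)

print(host.worst_case_minutes_to_hard())                    # host: up to 10 minutes
print(service.worst_case_minutes_to_hard())                 # service: up to 15 minutes
print(host_hard_before_service(host, service))              # settings from the example
print(host_hard_before_service(host, single_poll_service))  # single-poll service
```

With the example settings the host needs at most 10 minutes and the service at most 15, matching the figures above. In the tightest case both reach a hard state at the 10 minute mark, so the outcome there depends on which check runs first. A service that alerts on a single poll can never wait for the host in this model.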

Post by **rajasegar** » Wed Jul 23, 2014 1:34 am

Box293 wrote:Your other option is to configure your service checks with more retries / intervals so they will not go into a hard state until after the host does. For example:

Host
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

Service
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

For the services, with 3 max check attempts, it would take 15 minutes for a service to go into a hard state and by that time the host would have gone into a hard state.

I will explore this, but most of them want a notification after a single poll, especially those checking log files.

Post by **Box293** » Wed Jul 23, 2014 2:04 am

I meant to put Max Check Attempts - 2 for the host; I edited it so it makes sense.

Nagios Support Forum
