Service Alerts being sent out when host is down

Posted: Tue Jul 22, 2014 12:45 am
by rajasegar
Nagios XI 2014R1.2


This morning we had network issues, and service alerts were being sent out even though the host was timing out.
Please advise where to check on this.

Thanks.

Re: Service Alerts being sent out when host is down

Posted: Tue Jul 22, 2014 1:11 am
by Box293
For the host that was timing out, what are the values for:
  • Check Interval
  • Retry Interval
  • Max Check Attempts

Re: Service Alerts being sent out when host is down

Posted: Tue Jul 22, 2014 1:18 am
by rajasegar
Box293 wrote:For the host that was timing out, what are the values for:
  • Check Interval
  • Retry Interval
  • Max Check Attempts
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 2

Re: Service Alerts being sent out when host is down

Posted: Tue Jul 22, 2014 2:36 am
by Box293
Was the host in a hard state when the service alerts were being sent out?

With your current settings, the host check has to fail twice before the host goes into a hard state and service alerts are suppressed. To explain with dummy times:

1.00pm = Host is checked and is OK, next check is 1.05pm
1.01pm = Host goes down, Nagios does not yet know about it
1.05pm = Host is checked and is down, 1st attempt so goes into a soft state, no host alerts sent yet, next check is 1.10pm
1.10pm = Host is checked and is down, 2nd attempt so goes into a hard state, host alerts sent and service alerts are suppressed

As this example shows, service alerts can continue to be sent from 1.01pm through to 1.10pm, when the host goes into a hard state.
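To make the timeline easier to replay with other settings, here is a minimal Python sketch of it. This is only a simplified model, not how Nagios is implemented: the names `HostSettings` and `host_timeline` are made up for illustration, times are minutes past 1.00pm, and it assumes every check after the outage starts fails.

```python
from dataclasses import dataclass


@dataclass
class HostSettings:
    check_interval: int      # minutes between checks while the host is OK
    retry_interval: int      # minutes between rechecks while in a soft state
    max_check_attempts: int  # failed attempts needed to reach a hard state


def host_timeline(settings, outage_start, first_check=0, horizon=30):
    """Return (minute, attempt, state) for each check until the host goes hard.

    Simplified model: every check at or after outage_start fails.
    """
    events = []
    attempt = 0
    t = first_check
    while t <= horizon:
        if t < outage_start:
            # Host still answers, so schedule the next regular check
            events.append((t, 0, "OK"))
            t += settings.check_interval
            continue
        attempt += 1
        if attempt >= settings.max_check_attempts:
            # Last allowed attempt failed: hard state, host alert goes out
            events.append((t, attempt, "HARD"))
            break
        # Not enough failed attempts yet: soft state, recheck later
        events.append((t, attempt, "SOFT"))
        t += settings.retry_interval
    return events


settings = HostSettings(check_interval=5, retry_interval=5, max_check_attempts=2)
timeline = host_timeline(settings, outage_start=1)
for minute, attempt, state in timeline:
    print(f"1.{minute:02d}pm attempt={attempt} state={state}")
```

With the settings from this thread it prints an OK check at 1.00pm, a SOFT state at 1.05pm and a HARD state at 1.10pm, which is the window in which service alerts can still get through.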

Does this match up with the Host Alert History?
  • Host Status Detail
  • Advanced Tab
  • See this host in Nagios Core
  • View Alert History For This Host

Re: Service Alerts being sent out when host is down

Posted: Tue Jul 22, 2014 2:51 am
by rajasegar
Box293 wrote:Was the host in a hard state when the service alerts were being sent out?

With your current settings, the host check has to fail twice before the host goes into a hard state and service alerts are suppressed. To explain with dummy times:

1.00pm = Host is checked and is OK, next check is 1.05pm
1.01pm = Host goes down, Nagios does not yet know about it
1.05pm = Host is checked and is down, 1st attempt so goes into a soft state, no host alerts sent yet, next check is 1.10pm
1.10pm = Host is checked and is down, 2nd attempt so goes into a hard state, host alerts sent and service alerts are suppressed

As this example shows, service alerts can continue to be sent from 1.01pm through to 1.10pm, when the host goes into a hard state.

Does this match up with the Host Alert History?
  • Host Status Detail
  • Advanced Tab
  • See this host in Nagios Core
  • View Alert History For This Host
Yes, what you are saying makes sense.
How do I stop service alerts from being sent while the host is still in a soft state?

Re: Service Alerts being sent out when host is down

Posted: Tue Jul 22, 2014 9:36 am
by abrist
rajasegar wrote:How do I stop service alerts from being sent while the host is still in a soft state?
Decrease the host retries to 0 (that is, set Max Check Attempts to 1). This way, when the host check fails, the host immediately goes into a HARD problem state.

Re: Service Alerts being sent out when host is down

Posted: Tue Jul 22, 2014 7:39 pm
by rajasegar
abrist wrote:
rajasegar wrote:How do I stop service alerts from being sent while the host is still in a soft state?
Decrease the host retries to 0 (that is, set Max Check Attempts to 1). This way, when the host check fails, the host immediately goes into a HARD problem state.
This will cause too many false alerts.
I know that in theory it should not, but it does, especially with network flapping issues. Sometimes we see about 50% more false alerts.
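The trade-off here can be shown with a small Python sketch that counts how often a run of failed host checks reaches a hard problem state. The function name `count_hard_problems` and the check history are hypothetical, chosen only for illustration, and the model ignores check timing altogether.

```python
def count_hard_problems(results, max_check_attempts):
    """Count how many runs of failed checks reach a hard problem state.

    results: sequence of booleans, True = check passed, False = check failed.
    """
    hard_problems = 0
    consecutive_failures = 0
    for ok in results:
        if ok:
            # A passing check resets the attempt counter
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        # Count only the moment the run first reaches the threshold
        if consecutive_failures == max_check_attempts:
            hard_problems += 1
    return hard_problems


# Hypothetical history on a flapping link: three single blips and one real outage
history = [True, False, True, True, False, True,
           False, False, False, True, False, True]

print(count_hard_problems(history, max_check_attempts=1))
print(count_hard_problems(history, max_check_attempts=2))
```

On this made-up history, a single attempt turns every blip into a hard problem (4 in total), while two attempts only catch the sustained outage (1). The real ratio depends entirely on how the network actually flaps.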

Re: Service Alerts being sent out when host is down

Posted: Tue Jul 22, 2014 10:15 pm
by Box293
Your other option is to configure your service checks with more retries / intervals so they will not go into a hard state until after the host does. For example:

Host
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 2

Service
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

For the services, with 3 max check attempts, it would take 15 minutes for a service to go into a hard state and by that time the host would have gone into a hard state.
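The arithmetic behind this can be sketched in Python. This is a rough model under stated assumptions: `hard_state_window` is a made-up helper, the outage can start anywhere between two regular checks, and check latency and timeouts are ignored. Read this way, the 15 minutes is the worst case for the service, measured from the start of the outage.

```python
def hard_state_window(check_interval, retry_interval, max_check_attempts):
    """Return (earliest, latest) minutes from outage start to a hard state.

    Earliest: the first failed check happens right as the outage starts.
    Latest: the outage starts just after a successful check, so a full
    check interval passes before the first failed check.
    """
    retries = (max_check_attempts - 1) * retry_interval
    return retries, check_interval + retries


# Settings from the example above
host_earliest, host_latest = hard_state_window(5, 5, 2)
service_earliest, service_latest = hard_state_window(5, 5, 3)

print(f"host hard state after {host_earliest} to {host_latest} minutes")
print(f"service hard state after {service_earliest} to {service_latest} minutes")
```

Under these assumptions the host reaches a hard state within 5 to 10 minutes and the service within 10 to 15 minutes, so the host should already be in a hard state by the time the service gets there.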

Re: Service Alerts being sent out when host is down

Posted: Wed Jul 23, 2014 1:34 am
by rajasegar
Box293 wrote:Your other option is to configure your service checks with more retries / intervals so they will not go into a hard state until after the host does. For example:

Host
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

Service
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

For the services, with 3 max check attempts, it would take 15 minutes for a service to go into a hard state and by that time the host would have gone into a hard state.
I will explore this, but most of these services need to notify on a single poll, especially those checking log files.

Re: Service Alerts being sent out when host is down

Posted: Wed Jul 23, 2014 2:04 am
by Box293
I meant to put Max Check Attempts - 2 for the host. I have edited my post so it makes sense.