Service Alerts being sent out when host is down

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Post by rajasegar »

Nagios XI 2014R1.2


This morning we had network issues, and service alerts were being sent out even though the host was timing out.
Please advise where to check on this.

Thanks.
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Service Alerts being sent out when host is down

Post by Box293 »

For the host that was timing out, what are the values for:
  • Check Interval
  • Retry Interval
  • Max Check Attempts
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Service Alerts being sent out when host is down

Post by rajasegar »

Box293 wrote:For the host that was timing out, what are the values for:
  • Check Interval
  • Retry Interval
  • Max Check Attempts
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 2
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Service Alerts being sent out when host is down

Post by Box293 »

Was the host in a hard state when the service alerts were being sent out?

With your current settings, the host has to return a warning or critical state twice before it goes into a hard state and suppresses service alerts. To explain with dummy times:

1.00pm = Host is checked and is OK, next check is 1.05pm
1.01pm = Host goes down, Nagios does not yet know about it
1.05pm = Host is checked and is critical, 1st attempt so goes into a soft state, no host alerts sent yet, next check is 1.10pm
1.10pm = Host is checked and is critical, 2nd attempt so goes into a hard state, host alerts sent and service alerts are suppressed

As you can see from this example, service alerts will continue to be sent from 1.01pm through to 1.10pm, when the host goes into a hard state.
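
The dummy timeline above can be reproduced with a small Python sketch. This is only a simplified model, not how Nagios itself schedules checks: it assumes the host fails just after its last OK check, stays down, and is re-checked exactly on the retry interval. The names `CheckSettings` and `failed_check_timeline` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckSettings:
    check_interval: int      # minutes between checks while the object is OK
    retry_interval: int      # minutes between re-checks while in a soft state
    max_check_attempts: int  # consecutive failed checks needed for a hard state


def failed_check_timeline(last_ok_check, settings):
    """List (minute, attempt, state_type) for each failed check up to the hard state.

    Assumes the object fails right after last_ok_check and stays down.
    """
    timeline = []
    minute = last_ok_check + settings.check_interval  # first check that sees the failure
    for attempt in range(1, settings.max_check_attempts + 1):
        state_type = "HARD" if attempt == settings.max_check_attempts else "SOFT"
        timeline.append((minute, attempt, state_type))
        minute += settings.retry_interval
    return timeline


# Host settings from this thread; minutes are counted from 1.00pm.
host = CheckSettings(check_interval=5, retry_interval=5, max_check_attempts=2)
for minute, attempt, state_type in failed_check_timeline(0, host):
    print(f"1.{minute:02d}pm: attempt {attempt}/{host.max_check_attempts}, {state_type} state")
```

With the settings from this thread it prints a soft state at 1.05pm and a hard state at 1.10pm, which is the window during which service alerts can still go out.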

Does this match up with the Host Alert History?
  • Host Status Detail
  • Advanced Tab
  • See this host in Nagios Core
  • View Alert History For This Host
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Service Alerts being sent out when host is down

Post by rajasegar »

Box293 wrote:Was the host in a hard state when the service alerts were being sent out?

With your current settings, the host has to return a warning or critical state twice before it goes into a hard state and suppresses service alerts. To explain with dummy times:

1.00pm = Host is checked and is OK, next check is 1.05pm
1.01pm = Host goes down, Nagios does not yet know about it
1.05pm = Host is checked and is critical, 1st attempt so goes into a soft state, no host alerts sent yet, next check is 1.10pm
1.10pm = Host is checked and is critical, 2nd attempt so goes into a hard state, host alerts sent and service alerts are suppressed

As you can see from this example, service alerts will continue to be sent from 1.01pm through to 1.10pm, when the host goes into a hard state.

Does this match up with the Host Alert History?
  • Host Status Detail
  • Advanced Tab
  • See this host in Nagios Core
  • View Alert History For This Host
Yes, what you are saying makes sense.
How do I stop sending out alerts even with Host in Soft State?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Service Alerts being sent out when host is down

Post by abrist »

rajasegar wrote:How do I stop sending out alerts even with Host in Soft State?
Decrease the host retries to 0. That way, when the host fails, it will immediately be in a HARD problem state.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Service Alerts being sent out when host is down

Post by rajasegar »

abrist wrote:
rajasegar wrote:How do I stop sending out alerts even with Host in Soft State?
Decrease the host retries to 0. That way, when the host fails, it will immediately be in a HARD problem state.
This will cause too many false alerts.
I know that in theory it should not, but it does, especially with network flapping issues. Sometimes we see about 50% more false alerts.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Service Alerts being sent out when host is down

Post by Box293 »

Your other option is to configure your service checks with more retries / intervals so they will not go into a hard state until after the host does. For example:

Host
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 2

Service
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

For the services, with 3 max check attempts, it would take 15 minutes for a service to go into a hard state and by that time the host would have gone into a hard state.
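
As a rough check of the arithmetic above, the following Python sketch computes the earliest and latest time after a failure at which a check can reach a hard state. It is a simplified model under stated assumptions: the first failed check lands somewhere within one check interval of the failure, and every retry runs exactly on the retry interval. The function name `hard_state_window` is hypothetical.

```python
def hard_state_window(check_interval, retry_interval, max_check_attempts):
    """Return (earliest, latest) minutes after a failure at which a hard state is reached.

    Earliest: the first failed check runs immediately after the failure.
    Latest: the first failed check runs a full check interval after the failure.
    """
    retry_time = retry_interval * (max_check_attempts - 1)
    return retry_time, check_interval + retry_time


host_window = hard_state_window(check_interval=5, retry_interval=5, max_check_attempts=2)
service_window = hard_state_window(check_interval=5, retry_interval=5, max_check_attempts=3)

print("host hard state after (min, max) minutes:", host_window)
print("service hard state after (min, max) minutes:", service_window)
# In this model the service cannot reach a hard state before the host does.
print("service never hard before host:", service_window[0] >= host_window[1])
```

Under these assumptions the host reaches a hard state 5 to 10 minutes after the failure and the service 10 to 15 minutes after it, so the 15-minute figure is the worst case for the service.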
Last edited by Box293 on Wed Jul 23, 2014 2:03 am, edited 1 time in total.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Service Alerts being sent out when host is down

Post by rajasegar »

Box293 wrote:Your other option is to configure your service checks with more retries / intervals so they will not go into a hard state until after the host does. For example:

Host
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

Service
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3

For the services, with 3 max check attempts, it would take 15 minutes for a service to go into a hard state and by that time the host would have gone into a hard state.
Will explore this, but most of them want a notification after a single poll, especially those checking log files.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Service Alerts being sent out when host is down

Post by Box293 »

I meant to put Max Check Attempts - 2 for the host; I have edited the post so it makes sense.