Nagios XI 2014R1.2
This morning we had network issues, and service alerts were being sent out even though the host was timing out.
Please advise where to check on this.
Thanks.
Service Alerts being sent out when host is down
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: Service Alerts being sent out when host is down
For the host that was timing out, what are the values for:
- Check Interval
- Retry Interval
- Max Check Attempts
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Service Alerts being sent out when host is down
Box293 wrote:For the host that was timing out, what are the values for:
- Check Interval
- Retry Interval
- Max Check Attempts
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 2
- Box293
Re: Service Alerts being sent out when host is down
Was the host in a hard state when the service alerts were being sent out?
With your current settings, the host has to return a warning or critical state twice before it goes into a hard state and suppresses service alerts. To explain with dummy times:
1.00pm = Host is checked and is OK, next check is 1.05pm
1.01pm = Host goes down, Nagios does not yet know about it
1.05pm = Host is checked and is critical, 1st attempt so goes into a soft state, no host alerts sent yet, next check is 1.10pm
1.10pm = Host is checked and is critical, 2nd attempt so goes into a hard state, host alerts sent and service alerts are suppressed
As this example shows, service alerts will continue to be sent from 1.01pm through to 1.10pm, when the host goes into a hard state.
Does this match up with the Host Alert History?
- Host Status Detail
- Advanced Tab
- See this host in Nagios Core
- View Alert History For This Host
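The dummy timeline above can be sketched in a few lines of Python. This is only a rough model, not how Nagios is implemented: the function name `host_check_timeline` and the `CheckResult` class are made up for illustration, and it assumes every check after the first failure also fails.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    minute: int      # minutes past 1.00pm
    attempt: int     # which check attempt this is
    state_type: str  # "SOFT" or "HARD"


def host_check_timeline(first_failed_check, retry_interval, max_check_attempts):
    """List the failed host checks up to and including the one that goes HARD.

    Hypothetical helper: assumes the host stays down for every retry.
    """
    results = []
    minute = first_failed_check
    for attempt in range(1, max_check_attempts + 1):
        # Only the final allowed attempt turns the problem into a HARD state.
        state_type = "HARD" if attempt == max_check_attempts else "SOFT"
        results.append(CheckResult(minute, attempt, state_type))
        minute += retry_interval
    return results


# Settings from this thread: retry interval 5 minutes, max check attempts 2.
# The host goes down at 1.01pm and the first failing check runs at 1.05pm.
timeline = host_check_timeline(first_failed_check=5, retry_interval=5,
                               max_check_attempts=2)
for r in timeline:
    print(f"1.{r.minute:02d}pm attempt {r.attempt}: {r.state_type}")

host_down_at = 1
hard_at = timeline[-1].minute
print(f"Service alerts not suppressed for {hard_at - host_down_at} minutes")
```

With these numbers the host only reaches a hard state at 1.10pm, so there is a 9 minute window in which service alerts can still go out.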
Re: Service Alerts being sent out when host is down
Box293 wrote:Was the host in a hard state when the service alerts were being sent out?
Yes, what you are saying makes sense.
How do I stop sending out alerts even with Host in Soft State?
Re: Service Alerts being sent out when host is down
rajasegar wrote:How do I stop sending out alerts even with Host in Soft State?
Decrease the host retries to 0. This way, when the host fails, it will immediately be a HARD problem state.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Service Alerts being sent out when host is down
abrist wrote:Decrease the host retries to 0. This way, when the host fails, it will immediately be a HARD problem state.
This will cause too many false alerts.
I know theoretically it should not, but it does, especially with network flapping issues. Sometimes about 50% more false alerts.
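The trade-off being discussed can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: "0 retries" is taken to mean a Max Check Attempts of 1, the function name `count_hard_down_events` is made up, and the check results are an invented flapping pattern rather than data from this installation.

```python
def count_hard_down_events(check_results, max_check_attempts):
    """Count how many times a host enters a HARD down state.

    check_results is a list of booleans, True meaning the check succeeded.
    Simplified model: a HARD state is reached after max_check_attempts
    consecutive failures, and any success resets the counter.
    """
    hard_events = 0
    consecutive_failures = 0
    in_hard_down = False
    for ok in check_results:
        if ok:
            consecutive_failures = 0
            in_hard_down = False
        else:
            consecutive_failures += 1
            if consecutive_failures >= max_check_attempts and not in_hard_down:
                in_hard_down = True
                hard_events += 1
    return hard_events


# Invented flapping link: two isolated blips followed by one real outage.
results = [True, False, True, True, False, True, False, False, False, True]

print(count_hard_down_events(results, max_check_attempts=1))  # 3
print(count_hard_down_events(results, max_check_attempts=2))  # 1
```

In this made-up pattern, going hard on the first failure turns each short blip into its own host alert, while allowing one retry only alerts on the sustained outage. The actual ratio on a real network will depend on how it flaps.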
- Box293
Re: Service Alerts being sent out when host is down
Your other option is to configure your service checks with more retries / intervals so they will not go into a hard state until after the host does. For example:
Host
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 2
Service
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3
For the services, with 3 max check attempts, it would take 15 minutes for a service to go into a hard state and by that time the host would have gone into a hard state.
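The arithmetic behind those numbers can be written out as a short Python sketch. This is a back-of-the-envelope calculation, not Nagios code: the helper name `worst_case_minutes_to_hard` is made up, and it assumes the failure happens just after a successful check and that every later check fails.

```python
def worst_case_minutes_to_hard(check_interval, retry_interval, max_check_attempts):
    """Worst-case minutes from the actual failure to a HARD state.

    Up to one full check interval can pass before the first failed check,
    then each further attempt waits one retry interval.
    """
    return check_interval + retry_interval * (max_check_attempts - 1)


# Values from the example above (all intervals in minutes).
host_minutes = worst_case_minutes_to_hard(5, 5, 2)
service_minutes = worst_case_minutes_to_hard(5, 5, 3)

print(f"Host hard state after at most {host_minutes} minutes")
print(f"Service hard state after at most {service_minutes} minutes")
```

With these settings the host reaches a hard state after at most 10 minutes and the service after at most 15, which is the gap the example relies on.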
Last edited by Box293 on Wed Jul 23, 2014 2:03 am, edited 1 time in total.
Re: Service Alerts being sent out when host is down
Box293 wrote:Your other option is to configure your service checks with more retries / intervals so they will not go into a hard state until after the host does. For example:
Host
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3
Service
Check Interval - 5 minutes
Retry Interval - 5 minutes
Max Check Attempts - 3
For the services, with 3 max check attempts, it would take 15 minutes for a service to go into a hard state and by that time the host would have gone into a hard state.
Will explore this, but most of them want a notification on a single poll, especially those checking log files.
- Box293
Re: Service Alerts being sent out when host is down
I meant to put Max Check Attempts - 2 for the host. I have edited the post so it makes sense.