I'm noticing an issue on our instance where hosts and services show as down almost instantly (which is expected), but once the problem is resolved, it sometimes takes hours for them to show as recovered.
Take a simple ping check on a camera, for instance.
These were set up without any check intervals on the camera itself, but rather through a template.
However, that template doesn't have check intervals specifically set, only a check period. That template has another template applied to it, and it's this second template that has the check intervals assigned.
Is this a correct way of using templates, or should I look at revamping this?
Recovery Latency
Re: Recovery Latency
Re: templates. You'll want to set the values in the template that you'd like to be inherited by the services attached to the template. I try to set them as completely as possible so I get fewer unexpected results.
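To make the inheritance behaviour concrete, here is a minimal Python sketch of how an effective value can be looked up through a chain of templates. It assumes the lookup order described in the Nagios object-inheritance documentation as we understand it: a value set on the object itself wins, otherwise the templates named in `use` are searched in order, each depth-first. The template names and values are hypothetical, and the model ignores Nagios extras such as additive `+` strings and `null` values.

```python
from typing import Any, Optional

# Hypothetical object definitions mirroring a two-level template chain.
# Each template is a dict of directives; "use" lists parent templates in order.
TEMPLATES: dict[str, dict[str, Any]] = {
    "example_base_host": {
        "check_interval": 5,
        "retry_interval": 1,
        "max_check_attempts": 5,
    },
    "example_netdevice_host": {
        "use": ["example_base_host"],
        "check_period": "24x7",
    },
}


def resolve(obj: dict[str, Any], directive: str,
            templates: dict[str, dict[str, Any]] = TEMPLATES,
            _seen: Optional[set[str]] = None) -> Optional[Any]:
    """Return the effective value of `directive` for `obj`, or None if unset.

    Simplified model: a value set on the object itself wins; otherwise each
    template named in "use" is searched in order, depth-first (the template,
    then its own parents) before moving on to the next template.
    """
    if directive in obj:
        return obj[directive]
    seen = set() if _seen is None else _seen
    for name in obj.get("use", []):
        if name in seen:  # guard against a circular "use" chain
            continue
        seen.add(name)
        value = resolve(templates[name], directive, templates, seen)
        if value is not None:
            return value
    return None


camera = {"host_name": "camera-01", "use": ["example_netdevice_host"]}
print(resolve(camera, "retry_interval"))    # 1 (from the grandparent template)
print(resolve(camera, "check_period"))      # 24x7
override = {"use": ["example_netdevice_host"], "retry_interval": 2}
print(resolve(override, "retry_interval"))  # 2 (local value wins)
```

Under this model, leaving the intervals on the second-level template is valid; the values still reach the host. Setting them explicitly on the template you attach simply makes the effective settings easier to see at a glance.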
Re: recovery times. I would think this would be based on your "retry interval", since Nagios uses this interval (instead of the check interval) after a state change is detected. What is your retry interval?
The other issue I've seen with long recoveries is the host or service is flapping and you don't get the recovery until it settles down (which could be hours).
Re: Recovery Latency
Here's how the person before me set it up:
The actual host in Nagios doesn't have any check or alert settings. The host has a template assigned to it: xiwizard_genericnetdevice_host
When I go to that template, I see the following:
Additional Templates - xiwizard_generic_host
Under check settings:
- Check period - 24x7
- Freshness threshold - 1800
- Event Handler - host-notify-by-email
When I go to xiwizard_generic_host, I see the following:
Additional Templates - none
Under check settings:
- Max. Check attempts - 5
- Retry interval - 1
- Check interval - 5
- Event handler - host-notify-by-email
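Plugging these values into the scheduling rules gives a rough sense of the expected timing. The Python sketch below assumes Nagios's documented host scheduling (retries at retry_interval while the state is SOFT, back to check_interval once the retries are exhausted and the state is HARD), the default interval_length of 60 seconds, and a first successful check recording the recovery. It ignores check latency and scheduler load, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckSettings:
    # Intervals in minutes, assuming the default interval_length of 60 seconds.
    check_interval: float
    retry_interval: float
    max_check_attempts: int


def worst_case_minutes_to_hard_down(s: CheckSettings) -> float:
    """Worst case from the outage starting to a HARD DOWN state.

    The outage begins just after a successful check, so the first failing
    check is one check_interval away; the remaining attempts are SOFT
    retries spaced retry_interval apart.
    """
    return s.check_interval + (s.max_check_attempts - 1) * s.retry_interval


def worst_case_minutes_to_recovery(s: CheckSettings) -> float:
    """Worst case from the host coming back to a HARD DOWN host showing UP.

    After the retries are exhausted the host is checked at check_interval
    again, and the first successful check records the recovery.
    """
    return s.check_interval


# Values from the xiwizard_generic_host template listed above.
camera = CheckSettings(check_interval=5, retry_interval=1, max_check_attempts=5)
print(worst_case_minutes_to_hard_down(camera))  # 9
print(worst_case_minutes_to_recovery(camera))   # 5
```

Under these assumptions a recovery should appear within roughly five minutes of the camera answering pings again, so delays of hours suggest the checks are not running or updating as scheduled, rather than a problem with these interval values.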
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Recovery Latency
A host has to meet certain conditions before it goes into a flapping state. Also, do you have flap detection enabled?
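Roughly, Nagios keeps the last 21 check results, computes a weighted percent state change in which newer changes count more, and uses separate high and low thresholds to start and stop flapping. The Python sketch below models that idea. The weights (0.75 for the oldest change up to 1.25 for the newest), the 5/20 default thresholds and the exact boundary handling are assumptions to verify against the Nagios flap detection documentation, and the function names are hypothetical.

```python
MAX_STATE_HISTORY = 21                # last 21 check results -> up to 20 state changes
LOW_WEIGHT, HIGH_WEIGHT = 0.75, 1.25  # assumed weights for oldest vs newest change


def percent_state_change(history: list[int]) -> float:
    """Weighted percent state change over a full 21-entry history, oldest first.

    Each change between consecutive results adds a weight that grows
    linearly from LOW_WEIGHT (oldest) to HIGH_WEIGHT (newest).
    """
    if len(history) != MAX_STATE_HISTORY:
        raise ValueError(f"expected {MAX_STATE_HISTORY} states, got {len(history)}")
    weighted = 0.0
    for i in range(1, MAX_STATE_HISTORY):
        if history[i] != history[i - 1]:
            weighted += LOW_WEIGHT + (i - 1) * (HIGH_WEIGHT - LOW_WEIGHT) / (MAX_STATE_HISTORY - 2)
    return weighted * 100.0 / (MAX_STATE_HISTORY - 1)


def next_flapping_state(is_flapping: bool, pct: float,
                        low: float = 5.0, high: float = 20.0) -> bool:
    """Hysteresis: start at or above `high`, stop at or below `low`, else keep."""
    if pct >= high:
        return True
    if pct <= low:
        return False
    return is_flapping


UP, DOWN = 0, 1
steady_down = [UP] + [DOWN] * 20   # went down once and stayed down
bouncing = [UP, DOWN] * 10 + [UP]  # UP DOWN UP DOWN ...
print(percent_state_change(steady_down))                       # 3.75
print(round(percent_state_change(bouncing), 2))                # 100.0
print(next_flapping_state(False, percent_state_change(steady_down)))  # False
print(next_flapping_state(False, percent_state_change(bouncing)))     # True
```

In this model a host that goes down once and stays down scores only a few percent, so it would not be flagged as flapping; only repeated UP/DOWN transitions push the score over the high threshold.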
Re: Recovery Latency
Yes, flapping detection is enabled on the xiwizard_generic_host template, but not on the xiwizard_genericnetdevice_host template:
Check Settings:
- Flap detection enabled - On
- Retain status information - On
- Retain non-status information - On
- Process perf data - On
scottwilkerson
Re: Recovery Latency
Are you sure the host was in a flapping state (UP DOWN UP DOWN UP DOWN etc), or is it just down?
Re: Recovery Latency
If you go to the Service Detail for one of the services in question and "Schedule an Immediate Check", does the status remain bad or refresh as OK?
Re: Recovery Latency
scottwilkerson wrote: Are you sure the host was in a flapping state (UP DOWN UP DOWN UP DOWN etc), or is it just down?
It was just down. It was not showing as flapping in Nagios. For what it's worth, this is happening on more than one host, and it's been going on for a while now.
455157 wrote: If you go to the Service Detail for one of the services in question and "Schedule an Immediate Check", does the status remain bad or refresh as OK?
In the past, when I've done this, it hasn't changed status, even though I can ping it from Nagios as well as from my desktop.
scottwilkerson
Re: Recovery Latency
What version of Nagios XI are you using?
Re: Recovery Latency
Unfortunately, I'm still on Nagios XI 2011R2.3.
Not being able to get out past our proxy has made updating quite difficult.
Is this something that has been improved upon?