Recovery Latency

Post by **jbennett** » Wed Sep 26, 2012 9:32 am

On our instance, hosts and services show as down almost instantly, which is expected. The problem is that once the outage is resolved, it sometimes takes hours for them to show as recovered.

Take a simple ping check on a camera, for instance.

These were set up with no check intervals on the camera itself; the settings come from a template.

However, that template doesn't have check intervals specifically set, only a check period. That template has another template applied to it. It's this second template that has the check intervals assigned.

Is this a correct way of using templates? Or should I look at revamping this?

Post by **CGraham** » Wed Sep 26, 2012 10:01 am

Re: templates. You'll want to set the values in the template that you'd like to be inherited by the services attached to the template. I try to set them as completely as possible so I get fewer unexpected results.

Re: recovery times. I would think this would be based on your "retry interval", since Nagios uses this interval (instead of the check interval) after a state change is detected. What is your retry interval?
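As a rough check on what those intervals imply, here is a minimal sketch of the arithmetic. It assumes the default `interval_length` of 60 seconds (so intervals are in minutes), that retries run at the retry interval until max check attempts is reached, and that scheduled checks then continue at the normal check interval. The function names are illustrative, and the numbers in the usage lines are example values only.

```python
def minutes_to_hard_state(max_check_attempts, retry_interval):
    """Time from the first failed check to a hard problem state."""
    # The first failure counts as attempt 1; the remaining attempts are retries.
    return (max_check_attempts - 1) * retry_interval


def worst_case_recovery_minutes(check_interval, retry_interval, in_hard_state=True):
    """Longest wait before the next scheduled check can see the recovery."""
    # Assumption: hard problem states are rechecked at the normal check interval,
    # soft problem states at the retry interval.
    return check_interval if in_hard_state else retry_interval


# Example values: max attempts 5, retry interval 1, check interval 5.
print(minutes_to_hard_state(5, 1))        # 4
print(worst_case_recovery_minutes(5, 1))  # 5
```

With values in this range, a recovery should be picked up within minutes, ignoring check latency. A delay of hours would suggest something other than the intervals, such as checks not running or results not being processed.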

The other issue I've seen with long recoveries is when the host or service is flapping and you don't get the recovery until it settles down (which could be hours).

Post by **jbennett** » Wed Sep 26, 2012 11:22 am

Here's how the person before me set it up:

The actual host in Nagios doesn't have any check or alert settings. The host has a template assigned to it: xiwizard_genericnetdevice_host

When I go to that template, I see the following:

Additional Templates - xiwizard_generic_host
Under check settings:

Check period - 24x7
Freshness threshold - 1800
Event Handler - host-notify-by-email

When I go to the xiwizard_generic_host template, I see the following:
Additional Templates - none
Under check settings:

Max. Check attempts - 5
Retry interval - 1
Check interval - 5
Event handler - host-notify-by-email
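For reference, the chain described above can be modelled in a few lines. This is a minimal sketch, assuming standard Nagios object inheritance, where a value set on the object itself wins and unset values are looked up through the templates it uses, in order. The dictionary layout and the `resolve` helper are illustrative only, and the host definition is a hypothetical stand-in for one of the cameras.

```python
# Settings as listed in the post; directive names follow Nagios object config.
TEMPLATES = {
    "xiwizard_generic_host": {
        "use": [],
        "max_check_attempts": 5,
        "retry_interval": 1,
        "check_interval": 5,
        "event_handler": "host-notify-by-email",
    },
    "xiwizard_genericnetdevice_host": {
        "use": ["xiwizard_generic_host"],
        "check_period": "24x7",
        "freshness_threshold": 1800,
        "event_handler": "host-notify-by-email",
    },
}

# Hypothetical host: it sets nothing itself and only names a template.
HOST = {"use": ["xiwizard_genericnetdevice_host"]}


def resolve(obj, directive, templates=TEMPLATES):
    """Return the first value found, searching the object, then its templates depth-first."""
    if directive in obj:
        return obj[directive]
    for name in obj.get("use", []):
        value = resolve(templates[name], directive, templates)
        if value is not None:
            return value
    return None


effective = {
    d: resolve(HOST, d)
    for d in ("check_period", "check_interval", "retry_interval", "max_check_attempts")
}
print(effective)
```

Under that assumption, the two-level chain resolves to the same intervals as setting them directly on the host, so the layering by itself should not delay recoveries.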

When these items are still showing down, they aren't showing as flapping. They also don't alert via email as flapping. They just still show down.

Post by **scottwilkerson** » Wed Sep 26, 2012 5:11 pm

Certain conditions have to be met before a host or service goes into a flapping state. Additionally, do you have flap detection enabled?
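To make those conditions concrete: Nagios decides on flapping from the share of state changes in the recent check history. The sketch below is a simplified version that assumes a 21-result history (20 possible transitions) and leaves out the extra weight Nagios gives to more recent changes. The 20% threshold is only an example value, not one read from this system.

```python
def percent_state_change(history):
    """Unweighted percentage of transitions between consecutive check results."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


HIGH_THRESHOLD = 20.0  # example value (assumption), not taken from this install

steady_down = ["DOWN"] * 21
alternating = ["UP" if i % 2 == 0 else "DOWN" for i in range(21)]

for name, history in (("steady_down", steady_down), ("alternating", alternating)):
    pct = percent_state_change(history)
    print(name, pct, pct >= HIGH_THRESHOLD)
```

A host that simply stays DOWN produces no state changes, so it would not be treated as flapping under this model.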

Post by **jbennett** » Thu Sep 27, 2012 9:05 am

Yes, flapping detection is enabled on the xiwizard_generic_host template, but not on the xiwizard_genericnetdevice_host template:

Check Settings:

Flap detection enabled - On
Retain status information - On
Retain non-status information - On
Process perf data - On

Post by **scottwilkerson** » Thu Sep 27, 2012 5:12 pm

Are you sure the host was in a flapping state (UP DOWN UP DOWN UP DOWN etc), or is it just down?

Post by **455157** » Fri Sep 28, 2012 5:24 pm

If you go to the Service Detail for one of the services in question and "Schedule an Immediate Check", does the status remain bad or refresh as OK?

Post by **jbennett** » Mon Oct 01, 2012 8:57 am

scottwilkerson wrote: Are you sure the host was in a flapping state (UP DOWN UP DOWN UP DOWN etc), or is it just down?

It was just down. It was not showing as flapping in Nagios. This is on more than one host, for what it's worth, and it's been happening for a while now.

455157 wrote: If you go to the Service Detail for one of the services in question and "Schedule an Immediate Check", does the status remain bad or refresh as OK?

In the past, when I've done this, it hasn't changed status, even though I can ping it from Nagios as well as from my desktop.
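One way to separate a network problem from a scheduling problem is to run the check plugin by hand, as the same user Nagios runs as, rather than a plain `ping`. The sketch below builds such a command line and offers an optional runner. The plugin path, address, and thresholds are assumptions to adjust for the local install; `-H`, `-w`, `-c`, and `-p` are the standard `check_ping` options.

```python
import subprocess

PLUGIN = "/usr/local/nagios/libexec/check_ping"  # assumed install path


def build_ping_check(address, warn="3000.0,80%", crit="5000.0,100%", packets=5):
    """Build the argument list for a manual check_ping run."""
    return [PLUGIN, "-H", address, "-w", warn, "-c", crit, "-p", str(packets)]


def run_check(cmd):
    """Run the plugin and return (exit code, first line of output)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    return result.returncode, (lines[0] if lines else "")


cmd = build_ping_check("192.0.2.10")  # placeholder address, not a real camera
print(" ".join(cmd))
# On the Nagios server: print(run_check(cmd))
```

By plugin convention, exit code 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. If the plugin returns OK by hand while the interface still shows the host down, the check result is probably not reaching Nagios.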

Post by **scottwilkerson** » Mon Oct 01, 2012 9:16 am

What version of Nagios XI are you using?

Post by **jbennett** » Mon Oct 01, 2012 9:53 am

Unfortunately, I'm still on Nagios XI 2011R2.3.

Not being able to get out past our proxy has made updating quite difficult.

Is this something that has been improved in later versions?

Nagios Support Forum
