retry_interval changing after first notification

stucky · Post by **stucky** » Mon May 04, 2015 2:53 pm

Hi

I noticed what I'd consider strange behaviour regarding the "retry_interval" for a host. I have it set to 1m in a template, and I have a host that inherits it.
The normal check interval is 5m, and that works as expected. When I take the host down, the interval does indeed change to 1m. However, after the 4 SOFT downs the HARD down kicks in and a notification is sent. This seems to automatically change the interval back to 5m even though the host state is not changing. When I then bring the host back up, it can still take up to 5 minutes for Nagios to register that.
I don't see how the two are connected. Why would a notification set the retry_interval back to the check_interval? Is this by design?

abrist · Post by **abrist** » Mon May 04, 2015 3:00 pm

This is actually working as expected. Retry intervals are only used while an object is in a SOFT problem state. Once a check fails, the object goes into a problem state and stays SOFT until the maximum number of check attempts (max_check_attempts) has been reached. After that, it changes to a HARD problem state and checks resume at the normal check_interval.
Does that make sense?
EDIT: (more info)
From: http://nagios.sourceforge.net/docs/3_0/ ... tions.html

retry_interval: This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
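As a rough illustration of the rule quoted above, here is a minimal Python sketch. It is not Nagios code: the function names, the attempt counter, and the defaults (check_interval=5, retry_interval=1, max_check_attempts=5) are assumptions chosen to match the numbers in this thread.

```python
def next_check_delay(is_up, attempt, max_check_attempts,
                     check_interval=5, retry_interval=1):
    """Minutes until the next check under the simplified rule:
    retry_interval applies only while the host is in a SOFT problem state."""
    soft_problem = (not is_up) and attempt < max_check_attempts
    return retry_interval if soft_problem else check_interval


def simulate_down_host(num_checks, max_check_attempts=5,
                       check_interval=5, retry_interval=1):
    """Return (attempt, state_type, delay) for each consecutive failed check
    of a host that stays down for num_checks checks."""
    timeline = []
    attempt = 0
    for _ in range(num_checks):
        # The attempt counter stops growing once the HARD state is reached.
        attempt = min(attempt + 1, max_check_attempts)
        state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        delay = next_check_delay(False, attempt, max_check_attempts,
                                 check_interval, retry_interval)
        timeline.append((attempt, state_type, delay))
    return timeline


if __name__ == "__main__":
    for attempt, state_type, delay in simulate_down_host(6):
        print(f"attempt {attempt}: {state_type}, next check in {delay} min")
```

With these assumed values, the first four failed checks are SOFT and are followed by a 1-minute wait. From the fifth failed check onward the state is HARD and the wait goes back to 5 minutes.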

jdalrymple · Post by **jdalrymple** » Mon May 04, 2015 3:00 pm

By design:

Object Definitions wrote: retry_interval: This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.

A better way to think of it is that retry_interval is only in effect during the SOFT state.

stucky · Post by **stucky** » Mon May 04, 2015 3:09 pm

Wow, this forum is fast!
Thank you. I last used Nagios around version 2.9 and I don't remember this ever being the case. I seem to distinctly remember that recoveries were reported very quickly, and I always attributed that to the more aggressive retry_check value. Has this changed?
I honestly don't follow the logic here. Why are we changing the retry interval back to the regular one even though the state is still down? Now Nagios is less accurate in reporting when the host came back. What are we gaining here? Can this be tweaked?

abrist · Post by **abrist** » Mon May 04, 2015 3:16 pm

stucky wrote: Has this changed?

I do not believe so - I think this behavior is older than dirt in Nagios.

stucky wrote: Now Nagios is less accurate in reporting when the host came back. What are we gaining here? Can this be tweaked?

My suggestion would be to run the standard checks at a 1-minute interval if you are very concerned. You are already dealing with a potential maximum of 10 minutes before a notification is sent if the host is down (5 min interval with 5x1 min retries).
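The arithmetic in the parentheses can be sketched in Python as below. This is only a back-of-the-envelope model: it assumes the host fails right after a passing check, and it ignores check execution time and scheduling latency. The function names and the `retries` parameter are hypothetical, not Nagios directives.

```python
def worst_case_minutes_to_hard_down(check_interval, retry_interval, retries):
    """Worst-case minutes from the host going down to the HARD state.

    Assumes the host fails just after a passing check, so a full
    check_interval passes before the first failed check, followed by
    `retries` re-checks spaced retry_interval apart.
    """
    return check_interval + retries * retry_interval


def worst_case_minutes_to_recovery(check_interval):
    """Worst-case minutes to notice a recovery while in a HARD problem state,
    where checks run at the normal check_interval."""
    return check_interval


if __name__ == "__main__":
    # Numbers from this thread: 5 min interval with 5 x 1 min retries.
    print(worst_case_minutes_to_hard_down(5, 1, 5))
    # Suggested change: run the standard checks every minute.
    print(worst_case_minutes_to_hard_down(1, 1, 5))
    print(worst_case_minutes_to_recovery(5), worst_case_minutes_to_recovery(1))
```

Under these assumptions, dropping check_interval from 5 to 1 cuts the worst-case delay for noticing a recovery from 5 minutes to 1. It also shortens the worst-case time to the first notification.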

stucky · Post by **stucky** » Mon May 04, 2015 4:03 pm

Interesting...I had never noticed that before. Thx

abrist · Post by **abrist** » Mon May 04, 2015 4:05 pm

No problem. Have a good one!

Nagios Support Forum
