retry_interval changing after first notification

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
stucky
Posts: 31
Joined: Mon Apr 20, 2015 11:30 am

retry_interval changing after first notification

Post by stucky »

Hi

I noticed what I'd consider strange behaviour regarding the "retry_interval" for a host. I have it set to 1m in a template, and I have a host that inherits it.
The normal check interval is 5m, and that works as expected. When I take the host down, the interval does indeed change to 1m. However, after the 4 SOFT DOWN states, the HARD DOWN state kicks in and a notification is sent. This seems to automatically switch the host back to the 5m interval even though the host state is not changing. When I then bring the host back up, it can still take up to 5 minutes for Nagios to register that.
I don't see how the two are connected. Why would a notification set the retry_interval back to the check_interval? Is this by design?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: retry_interval changing after first notification

Post by abrist »

This is actually working as expected. Retry intervals are only used when an object is in a SOFT problem state. Once a check goes into a problem state, it stays SOFT until max_check_attempts has been reached. After that, it changes to a HARD problem state, and checks resume at the normal check_interval.
Does that make sense?
EDIT: (more info)
From: http://nagios.sourceforge.net/docs/3_0/ ... tions.html
retry_interval: This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
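To make the rule concrete, here is a minimal Python sketch of the scheduling behaviour described above. It is a simplified model, not Nagios's actual scheduler: it ignores check-time jitter, on-demand checks and check latency, and the names HostCheckConfig, next_check_delay and simulate are hypothetical. It assumes max_check_attempts = 5 (consistent with "4 SOFT downs" then HARD in the first post), check_interval = 5 and retry_interval = 1, plus an invented outage from minute 3 to minute 20.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class HostCheckConfig:
    check_interval: int      # minutes between checks while UP or in a HARD problem state
    retry_interval: int      # minutes between re-checks while in a SOFT problem state
    max_check_attempts: int  # consecutive non-UP results needed to reach a HARD state


def next_check_delay(cfg: HostCheckConfig, is_up: bool, attempt: int) -> int:
    """Delay before the next check in this simplified (hypothetical) model."""
    if not is_up and attempt < cfg.max_check_attempts:
        return cfg.retry_interval  # SOFT problem state: re-check quickly
    return cfg.check_interval      # UP or HARD problem state: normal rate


def simulate(cfg: HostCheckConfig, host_is_up: Callable[[int], bool],
             duration: int) -> List[Tuple[int, str, int]]:
    """Return (minute, state, attempt) for every check run up to `duration`."""
    t, attempt = 0, 0
    log: List[Tuple[int, str, int]] = []
    while t <= duration:
        is_up = host_is_up(t)
        # A non-UP result increments the attempt counter (capped at the
        # maximum); an UP result resets it.
        attempt = 0 if is_up else min(attempt + 1, cfg.max_check_attempts)
        if is_up:
            state = "UP"
        elif attempt < cfg.max_check_attempts:
            state = "SOFT DOWN"
        else:
            state = "HARD DOWN"
        log.append((t, state, attempt))
        t += next_check_delay(cfg, is_up, attempt)
    return log


def example_outage(t: int) -> bool:
    """Hypothetical outage: the host is down from minute 3 until minute 20."""
    return not (3 <= t < 20)


if __name__ == "__main__":
    cfg = HostCheckConfig(check_interval=5, retry_interval=1, max_check_attempts=5)
    for minute, state, attempt in simulate(cfg, example_outage, duration=30):
        print(f"t={minute:2d} min  {state:<9}  attempt {attempt}")
```

In this model the checks run at minutes 0, 5, 6, 7, 8 and 9. The HARD DOWN at minute 9 puts the host back on the 5-minute interval (checks at 14 and 19), so the recovery at minute 20 is not seen until minute 24, which matches the delay described in the first post.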
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: retry_interval changing after first notification

Post by jdalrymple »

By design:
Object Definitions wrote: retry_interval: This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
A better way to think of it is that retry_interval is only in effect during the SOFT state.

Re: retry_interval changing after first notification

Post by stucky »

Wow, this forum is fast!
Thank you. I remember last using Nagios around version 2.9, and I don't remember this ever being the case. I distinctly remember recoveries being reported very quickly, and I always attributed that to the more aggressive retry interval. Has this changed?
I honestly don't follow the logic here. Why does the check go back to the regular interval even though the state is still DOWN? Now Nagios is less accurate in reporting when the host came back. What are we gaining here? Can this be tweaked?

Re: retry_interval changing after first notification

Post by abrist »

stucky wrote: Has this changed?
I do not believe so; I think this behavior is older than dirt in Nagios.
stucky wrote: Now Nagios is less accurate in reporting when the host came back. What are we gaining here? Can this be tweaked?
My suggestion would be to run the standard checks at 1 minute (that is, set check_interval to 1) if you are very concerned. You are already dealing with a potential maximum of 10 minutes before a notification is sent if the host is down (a 5-minute interval plus 5 x 1-minute retries).
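The arithmetic behind these figures can be sketched as rough upper bounds under a simplified model: the host fails just after a normal check, so almost a full check_interval passes before the first failing check, then max_check_attempts - 1 retries run at retry_interval before the HARD state. It also assumes the notification goes out as soon as the HARD state is reached. The helper names are hypothetical, and since the thread never states max_check_attempts explicitly, both 5 and 6 are shown.

```python
def worst_case_hard_delay(check_interval: int, retry_interval: int,
                          max_check_attempts: int) -> int:
    """Upper bound (minutes) from a host failure to the HARD DOWN state.

    Assumes the host fails right after a normal check, so up to one
    check_interval passes before the first failing check (attempt 1), and
    the remaining max_check_attempts - 1 attempts run at retry_interval.
    """
    return check_interval + (max_check_attempts - 1) * retry_interval


def worst_case_recovery_delay(check_interval: int) -> int:
    """Upper bound (minutes) to notice a recovery from a HARD DOWN state.

    HARD problem states are re-checked at check_interval, not retry_interval.
    """
    return check_interval


if __name__ == "__main__":
    for attempts in (5, 6):
        print(f"max_check_attempts={attempts}: "
              f"HARD within {worst_case_hard_delay(5, 1, attempts)} min, "
              f"recovery noticed within {worst_case_recovery_delay(5)} min")
    # The suggested change: check_interval lowered to 1 minute.
    print(f"check_interval=1: HARD within {worst_case_hard_delay(1, 1, 5)} min, "
          f"recovery noticed within {worst_case_recovery_delay(1)} min")
```

Under this model, lowering check_interval to 1 shortens both bounds (to 5 minutes and 1 minute with max_check_attempts = 5), at the cost of running the host check five times as often while the host is UP.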

Re: retry_interval changing after first notification

Post by stucky »

Interesting... I had never noticed that before. Thx

Re: retry_interval changing after first notification

Post by abrist »

No problem. Have a good one!