Hi
I noticed what I'd consider strange behaviour regarding the "retry_interval" for a host. I have it set to 1m in a template, and I have a host that inherits it.
The normal check interval is 5m, and that works as expected. When I take the host down, the interval does indeed change to 1m. However, after the 4 SOFT downs the HARD down kicks in and a notification is sent. This seems to automatically change the interval back to 5m even though the host state is not changing. When I then bring the host back up, it can still take up to 5 minutes for Nagios to register that.
I don't see how the two are connected. Why would a notification set the retry_interval back to the check_interval? Is this by design?
retry_interval changing after first notification
Re: retry_interval changing after first notification
This is actually working as expected. Retry intervals are only used while an object is in a SOFT problem state. Once a check returns a problem state, the object stays SOFT until the maximum number of check attempts has been reached. After that, it changes to a HARD problem state and checks resume at the original check interval.
Does that make sense?
EDIT: (more info)
From: http://nagios.sourceforge.net/docs/3_0/ ... tions.html
retry_interval: This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
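To make the quoted rule concrete, here is a minimal Python sketch of the schedule it describes. It is a simplified model, not Nagios code: the function name and parameters are made up for illustration, time is counted in whole minutes, and max_check_attempts=5 is an assumption chosen to match the "4 SOFT downs, then HARD" described in the first post.

```python
def simulate_host_checks(down_from, down_until, horizon,
                         check_interval=5, retry_interval=1,
                         max_check_attempts=5):
    """Return (minute, state, state_type) for each simulated check.

    Simplified model (an assumption, not the real scheduler): the host is
    DOWN for down_from <= minute < down_until and UP otherwise.
    """
    events = []
    minute = 0
    attempt = 0  # consecutive non-UP results seen so far
    while minute <= horizon:
        if down_from <= minute < down_until:
            attempt = min(attempt + 1, max_check_attempts)
            hard = attempt >= max_check_attempts
            events.append((minute, "DOWN", "HARD" if hard else "SOFT"))
            # retry_interval applies only while the state is still SOFT
            delay = check_interval if hard else retry_interval
        else:
            attempt = 0
            events.append((minute, "UP", "HARD"))
            delay = check_interval
        minute += delay
    return events


# Host goes down at minute 3 and is back at minute 10.
for minute, state, state_type in simulate_host_checks(3, 10, 20):
    print(f"{minute:>2} min  {state:<4} {state_type}")
```

In this run the checks land at minutes 0, 5, 6, 7, 8, 9, 14 and 19. The state goes HARD at minute 9, the next check is a full check_interval later, and so the recovery at minute 10 is only seen at minute 14.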
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: retry_interval changing after first notification
By design:
Object Definitions wrote: retry_interval: This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
A better way to think of it is that retry_interval is only in effect during the SOFT state.
Re: retry_interval changing after first notification
Wow, this forum is fast!
Thank you. I last used Nagios around version 2.9 and I don't remember this ever being the case. I distinctly remember that recoveries were reported very quickly, and I always attributed that to the more aggressive retry_check value. Has this changed?
I honestly don't follow the logic here. Why are we changing the retry back to regular even though the state is still down? Now Nagios is less accurate in reporting when the host came back. What are we gaining here? Can this be tweaked?
Re: retry_interval changing after first notification
stucky wrote: Has this changed ?
I do not believe so - I think this behavior is older than dirt in Nagios.
stucky wrote: Now nagios is less accurate in reporting when the host came back. What are we gaining here ? Can this be tweaked ?
My suggestion would be to run the standard checks at 1 minute if you are very concerned. You are already dealing with a max potential 10 minutes before a notification is sent if the host is down (5 min interval with 5x1 min retries).
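The arithmetic behind that 10 minute figure, and what a 1 minute check interval would change, can be sketched in a few lines of Python. This is only a back-of-the-envelope helper with made-up names. It takes the "5x1 min retries" wording above at face value; how many retries actually happen depends on max_check_attempts.

```python
def worst_case_minutes_to_notification(check_interval, retry_interval, retries):
    """Host fails right after a regular check, then every retry must also fail."""
    return check_interval + retries * retry_interval


def worst_case_minutes_to_recovery(check_interval):
    """In a HARD state checks run at check_interval, so recovery can lag by that much."""
    return check_interval


# Current setup from this thread: 5 min checks, 1 min retries
print(worst_case_minutes_to_notification(5, 1, 5))  # 10
print(worst_case_minutes_to_recovery(5))            # 5

# Suggested setup: regular checks every minute as well
print(worst_case_minutes_to_notification(1, 1, 5))  # 6
print(worst_case_minutes_to_recovery(1))            # 1
```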
Former Nagios employee
Re: retry_interval changing after first notification
Interesting...I had never noticed that before. Thx
Re: retry_interval changing after first notification
No problem. Have a good one!
Former Nagios employee