Nagios Support Forum

Posted: **Sun Jun 30, 2013 7:16 pm**

I've created a Service dependency that works as expected if all the dependent services have a normal status.
For example, if Service A (check_nrpe-version) goes critical because the agent has stopped, then no alerts are generated for Services B to Z if they were previously normal.
However, if Service B already has a hard Warning status when the agent is stopped, it immediately goes to a hard Critical (connection refused).
I would have expected it to go through the normal retries before becoming hard. Instead, an alert is generated before Service A has gone critical and had a chance to suppress the other services.
How can I get a service to go through the normal retries on a transition from Warning to Critical?

regards... Fred

Posted: **Mon Jul 01, 2013 2:18 pm**

Just to clarify:
A service in a HARD WARNING state will move to a HARD CRITICAL state without retries when the remote agent is stopped, is that correct?

Posted: **Wed Jul 03, 2013 2:55 am**

That is correct.
When the agent is stopped, Nagios displays a connection refused message with a Critical state.
If the service previously had a hard Warning, it moves to a hard Critical immediately.

regards... Fred

Posted: **Wed Jul 03, 2013 1:22 pm**

The way this logic works is: if you have a service in a soft warning state at, let's say, 3/5 retries, and it then goes into a critical state, it will continue on the same path of retries, use up the 2 that are left, and then switch to a hard critical state. The only way this number resets is if the service returns to an OK state.
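
To make that concrete, here is a minimal Python sketch of the counter as described in this thread. It is a toy model, not Nagios Core code: the `ServiceModel` class, its `process` method, and the cap of 5 attempts are assumptions made for illustration.

```python
from dataclasses import dataclass

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


@dataclass
class ServiceModel:
    """Toy model of the retry counter as described in this thread."""

    max_check_attempts: int = 5  # assumed value, matching the "3/5" example
    attempt: int = 0             # consecutive non-OK results counted so far

    def process(self, result):
        """Feed one check result; return (state_type, state, attempt)."""
        if result == OK:
            # Only a return to OK resets the counter.
            self.attempt = 0
        else:
            # WARNING and CRITICAL share one counter, capped at the maximum.
            self.attempt = min(self.attempt + 1, self.max_check_attempts)
        hard = result == OK or self.attempt >= self.max_check_attempts
        return ("HARD" if hard else "SOFT", result, self.attempt)


if __name__ == "__main__":
    svc = ServiceModel()
    # Three soft warnings, then the check starts returning critical.
    for result in [WARNING, WARNING, WARNING, CRITICAL, CRITICAL]:
        print(svc.process(result))
```

In this model the fifth result is the first hard one, as in the 3/5 example. Feeding five WARNING results and then a single CRITICAL to the same model returns a hard CRITICAL straight away, because the counter is already at its cap, which matches what Fred reported.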

Posted: **Tue Jul 09, 2013 2:38 am**

If you're saying that the retry counter only resets when the status returns to normal, then that means that a change to Critical will send an immediate notification and then a change back to Warning will send another immediate notification.

Surely you would want the retry counter to reset on every status change after it has gone hard, not just on a return to normal? Otherwise we're generating a lot of notifications (email/sms, etc).

Fred

Posted: **Tue Jul 09, 2013 9:37 am**

While I can see your point about potential false positives and, in a few cases, some unexpected behavior, the alternative is that notifications for a potential issue could be extremely delayed, and that would be far worse. Imagine a case where memory on a machine is fluctuating between warning and critical states. If the counter were reset on each state change, you could either never receive an alert, because the counter keeps getting reset, or receive a very delayed notification once the service finally stayed in one state for long enough to alert.
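
The fluctuating-memory case can be sketched in a few lines of Python. This is only an illustration of the argument above, not Nagios code: the function `checks_until_hard`, its `reset_on_change` switch, and the limit of 5 attempts are all hypothetical.

```python
def checks_until_hard(results, max_attempts=5, reset_on_change=False):
    """Return the 1-based check number at which a hard state is first
    reached, or None if it never is. Purely illustrative."""
    attempt = 0
    previous = "OK"
    for check_number, result in enumerate(results, start=1):
        if result == "OK":
            attempt = 0                      # recovery always clears the counter
        elif reset_on_change and result != previous:
            attempt = 1                      # hypothetical policy: restart on any change
        else:
            attempt += 1                     # shared counter across non-OK states
        previous = result
        if attempt >= max_attempts:
            return check_number
    return None


# Memory flapping between WARNING and CRITICAL on every check.
flapping = ["WARNING", "CRITICAL"] * 10

print(checks_until_hard(flapping))                        # shared counter
print(checks_until_hard(flapping, reset_on_change=True))  # reset on each change
```

With the shared counter the hard state is reached on the fifth check. With a reset on every change, this particular sequence never reaches a hard state at all, so no alert would ever go out.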

Posted: **Wed Jul 17, 2013 9:02 pm**

Sorry - only just got back to following this one up.
That's not quite what I was thinking. I would expect that any state change will trigger the retry counter and continue until it goes hard.
Any other status change would then restart the counter again.
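
As a rough sketch of that idea, under one possible reading of it: the counter keeps running through any mix of non-OK results until a hard state is reached, and only a change away from that hard state starts a fresh count. The function name `hard_transitions`, the limit of 3 attempts, and the choice to treat a return to OK as immediately hard are assumptions for illustration. Going by the replies in this thread, this is not how Nagios behaves today.

```python
def hard_transitions(results, max_attempts=3):
    """List (check_number, state) pairs at which a new hard state is reached
    under the proposed policy. Hypothetical model, not Nagios Core logic."""
    attempt = 0
    hard_state = "OK"          # the last state that went hard
    transitions = []
    for check_number, result in enumerate(results, start=1):
        if result == hard_state:
            attempt = 0        # back in the known hard state: nothing to retry
            continue
        if result == "OK":
            # Assumption: recovery is treated as hard immediately.
            hard_state, attempt = "OK", 0
            transitions.append((check_number, "OK"))
            continue
        attempt += 1           # keeps counting even if WARNING/CRITICAL alternate
        if attempt >= max_attempts:
            hard_state, attempt = result, 0
            transitions.append((check_number, result))
    return transitions


# A hard WARNING is reached first, then the agent stops and checks go CRITICAL.
print(hard_transitions(["WARNING"] * 3 + ["CRITICAL"] * 3))
```

Here the move from hard WARNING to CRITICAL goes through three more checks before it becomes hard, instead of alerting on the first CRITICAL result.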

Posted: **Thu Jul 18, 2013 10:36 am**

You could submit this as a core feature request, specifically a toggle that resets the counter on any state change. However, I think most people would agree that if counters can keep being reset, it could lead to a large lag in notifications.

Service Dependency behaviour