Service Dependency behaviour

Fred Kroeger · Post by **Fred Kroeger** » Sun Jun 30, 2013 7:16 pm

I've created a service dependency that works as expected if all the dependent services have a normal status.
E.g. if Service A (check_nrpe-version) goes critical because the agent has stopped, no alerts are generated for Services B to Z if they were previously normal.
However, if Service B is already in a hard Warning state when the agent is stopped, it goes immediately to a hard Critical (connection refused).
I would have expected it to go through the normal retries before becoming hard. As it is, an alert for Service B is generated before Service A goes critical and suppresses the other services.
How can I get a service to go through the normal retries on a transition from Warning to Critical?

regards... Fred

abrist · Post by **abrist** » Mon Jul 01, 2013 2:18 pm

Just to clarify:
A service in a HARD WARNING state will move to a HARD CRITICAL state without retries when the remote agent is stopped, is that correct?

Fred Kroeger · Post by **Fred Kroeger** » Wed Jul 03, 2013 2:55 am

That is correct
When the agent is stopped, Nagios displays a connection refused message with a Critical state.
If the service was previously in a hard Warning state, it moves to a hard Critical immediately.

regards... Fred

slansing · Post by **slansing** » Wed Jul 03, 2013 1:22 pm

The way this logic works is: if you have a service in a soft warning state at, let's say, 3/5 retries, and it then goes into a critical state, it will continue on the path of retries, with 2 left, and then switch to a hard critical state. The only way this number resets is if the service returns to an OK state.
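As a rough illustration of the counter behaviour described above, here is a simplified Python model. It is only a sketch, not the actual Nagios Core code: the `ServiceState` class, its field names, and the immediate hard recovery on OK are assumptions made for the example.

```python
from dataclasses import dataclass

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


@dataclass
class ServiceState:
    """Hypothetical, simplified model of a service's retry counter."""
    max_attempts: int = 5   # stands in for max_check_attempts
    state: str = OK
    attempt: int = 0        # consecutive non-OK check results so far
    hard: bool = True

    def process(self, result: str) -> None:
        """Apply one check result to the model."""
        if result == OK:
            # Only a return to OK resets the counter.
            # (Assumption: recovery is treated as hard straight away.)
            self.attempt = 0
            self.hard = True
        else:
            # The counter carries on across WARNING <-> CRITICAL changes
            # and is capped at max_attempts.
            self.attempt = min(self.attempt + 1, self.max_attempts)
            self.hard = self.attempt >= self.max_attempts
        self.state = result


# Soft warning at 3/5, then critical: two more checks until hard.
soft = ServiceState()
for result in [WARNING, WARNING, WARNING, CRITICAL, CRITICAL]:
    soft.process(result)
    print(result, f"{soft.attempt}/{soft.max_attempts}", "HARD" if soft.hard else "SOFT")

# Already hard warning at 5/5, then critical: hard on the first check.
already_hard = ServiceState()
for result in [WARNING] * 5 + [CRITICAL]:
    already_hard.process(result)
print(already_hard.state, "HARD" if already_hard.hard else "SOFT")
```

In this model, a service that is already at 5/5 in a hard Warning state has no retries left, so the first Critical result is hard immediately. That matches what Fred is seeing.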

Fred Kroeger · Post by **Fred Kroeger** » Tue Jul 09, 2013 2:38 am

If you're saying that the retry counter only resets when the status returns to normal, then that means that a change to Critical will send an immediate notification and then a change back to Warning will send another immediate notification.

Surely you would want the retry counter to reset on every status change after it has gone hard, not just on a return to normal? Otherwise we're generating a lot of notifications (email/sms, etc).

Fred

sreinhardt · Post by **sreinhardt** » Tue Jul 09, 2013 9:37 am

While I can see your point about potential false positives and, in a few cases, some unexpected behavior, the alternative is that notifications for a potential issue could be extremely delayed, and that would be far worse. Imagine a case where memory on a machine is fluctuating between warning and critical states. If the counter were reset on each state change, you could either never receive an alert, because the counter keeps getting reset, or get a very delayed notification once it finally stayed in one state long enough to alert.
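To make the fluctuating-memory case concrete, here is a small Python sketch comparing the two counter policies. It is a toy model under stated assumptions, not Nagios code: the function `checks_until_hard`, its `reset_on_change` flag, and the 5-attempt limit are all hypothetical.

```python
from itertools import cycle, islice


def checks_until_hard(results, max_attempts=5, reset_on_change=False):
    """Return the 1-based check number at which the problem goes hard,
    or None if it never does within the given results."""
    attempt = 0
    previous = "OK"
    for number, result in enumerate(results, start=1):
        if result == "OK":
            attempt = 0                      # recovery always resets
        elif reset_on_change and result != previous:
            attempt = 1                      # hypothetical policy: restart on any change
        else:
            attempt += 1                     # keep counting across non-OK states
        previous = result
        if attempt >= max_attempts:
            return number
    return None


# Memory bouncing between warning and critical every two checks, 40 checks total.
flapping = list(islice(cycle(["WARNING", "WARNING", "CRITICAL", "CRITICAL"]), 40))

print(checks_until_hard(flapping))                        # counter carries over
print(checks_until_hard(flapping, reset_on_change=True))  # counter restarts on change
```

With the counter carried over, the problem goes hard on the 5th check. With a reset on every state change, the counter never gets past 2 in this pattern, so the function returns `None` and no alert would ever be sent.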

Fred Kroeger · Post by **Fred Kroeger** » Wed Jul 17, 2013 9:02 pm

Sorry - only just got back to following this one up.
That's not quite what I was thinking. I would expect that any state change will trigger the retry counter and continue until it goes hard.
Any other status change would then restart the counter again.

sreinhardt · Post by **sreinhardt** » Thu Jul 18, 2013 10:36 am

You could submit this as a core feature request, specifically an option to reset the counter on any state change. However, I think most people would agree that if counters could be reset like that, it could lead to a large lag in notifications.

Nagios Support Forum
