Service Dependency behaviour

Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Service Dependency behaviour

Post by Fred Kroeger »

I've created a service dependency that works as expected when all the dependent services are in a normal state.
E.g. if Service A (check_nrpe-version) goes critical because the agent has stopped, no alerts are generated for Services B to Z if they were previously normal.
However, if Service B is already in a hard Warning state when the agent is stopped, it immediately goes to a hard Critical (connection refused).
I would have expected it to go through the normal retries before becoming hard. As it is, an alert is generated for Service B before Service A goes critical and suppresses alerts for the other services.
How can I get a service to go through the normal retries on a transition from Warning to Critical?

regards... Fred
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Service Dependency behaviour

Post by abrist »

Just to clarify:
A service in a HARD WARNING state will move to a HARD CRITICAL state without retries when the remote agent is stopped, is that correct?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Fred Kroeger

Re: Service Dependency behaviour

Post by Fred Kroeger »

That is correct.
When the agent is stopped, Nagios displays a connection refused message with a Critical state.
If the service was previously in a hard Warning state, it moves to a hard Critical immediately.

regards... Fred
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Service Dependency behaviour

Post by slansing »

The way this logic works is: if you have a service in a soft warning state at, let's say, 3/5 retries, and it then goes into a critical state, it will continue on the path of retries, with 2 left, and then switch to a hard critical state. The only way this number resets is if the service returns to an OK state.
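
To make the counter behaviour described above concrete, here is a minimal Python sketch. It is not Nagios code: the function name `simulate` and the tuple layout are made up for illustration, and it assumes the simplified rule stated in this post (the attempt counter keeps counting across non-OK states and only resets on OK). Soft and hard recoveries are not distinguished.

```python
def simulate(results, max_check_attempts=5):
    """Replay check results and return (state, state_type, attempt) per check.

    Hypothetical model of the rule described in this thread, not Nagios itself:
    the attempt counter counts consecutive non-OK results of any kind and is
    only reset when the service returns to OK.
    """
    attempt = 0
    timeline = []
    for state in results:
        if state == "OK":
            # Returning to OK is the only thing that resets the counter.
            attempt = 0
            timeline.append((state, "RECOVERY", attempt))
            continue
        # WARNING and CRITICAL share the same counter; cap it at the maximum.
        attempt = min(attempt + 1, max_check_attempts)
        state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        timeline.append((state, state_type, attempt))
    return timeline


if __name__ == "__main__":
    # Soft warning at 3/5, then critical: two retries remain before HARD.
    for row in simulate(["WARNING"] * 3 + ["CRITICAL"] * 2):
        print(row)
    # Already hard warning, then critical: the counter is exhausted, so HARD at once.
    print(simulate(["WARNING"] * 5 + ["CRITICAL"])[-1])
```

Under this model, the second case is the one from the original question: once the service is already in a hard Warning state there are no retries left, so the change to Critical is hard on the first check.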
Fred Kroeger

Re: Service Dependency behaviour

Post by Fred Kroeger »

If you're saying that the retry counter only resets when the status returns to normal, then a change to Critical will send an immediate notification, and a change back to Warning will send another immediate notification.

Surely you would want the retry counter to reset on every status change after it has gone hard, not just on a return to normal? Otherwise we're generating a lot of notifications (email/SMS, etc.).

Fred
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Service Dependency behaviour

Post by sreinhardt »

While I can see your point about potential false positives and, in a few cases, some unexpected behavior, the alternative is that notifications for a potential issue could be extremely delayed, and that would be far worse. Imagine a case where memory on a machine is fluctuating between warning and critical states. If the counter were reset on each state change, you could either never receive an alert, because the counter keeps getting reset, or receive a very delayed notification once the service finally stayed in one state long enough to alert.
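
As a rough illustration of the trade-off described above, the sketch below compares the two policies on a service that flaps between warning and critical. It is a hypothetical model, not Nagios code: `first_hard_check` and the `reset_on_change` flag are invented names, and no such option is claimed to exist in Nagios.

```python
def first_hard_check(results, max_check_attempts=5, reset_on_change=False):
    """Return the 1-based check number at which the state goes HARD, or None.

    Hypothetical model: with reset_on_change=False the counter only resets on
    OK; with reset_on_change=True any change of state restarts it at 1.
    """
    attempt = 0
    previous = "OK"
    for index, state in enumerate(results, start=1):
        if state == "OK":
            attempt = 0
        elif reset_on_change and state != previous:
            # Proposed policy: a new non-OK state starts its own retry cycle.
            attempt = 1
        else:
            attempt += 1
        previous = state
        if attempt >= max_check_attempts:
            return index
    return None


if __name__ == "__main__":
    flapping = ["WARNING", "CRITICAL"] * 10
    print(first_hard_check(flapping))                        # counter resets on OK only
    print(first_hard_check(flapping, reset_on_change=True))  # counter resets on any change
```

With the flapping input, the reset-on-OK policy reaches a hard state on the fifth check, while the reset-on-any-change policy never accumulates enough attempts and so never goes hard in this model.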
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Fred Kroeger

Re: Service Dependency behaviour

Post by Fred Kroeger »

Sorry - only just got back to following this one up.
That's not quite what I was thinking. I would expect any state change to start the retry counter, which would then run until the state goes hard.
Any further status change would then restart the counter.
sreinhardt

Re: Service Dependency behaviour

Post by sreinhardt »

You could submit this as a core feature request: specifically, a toggle to reset the counter on any state change. However, I think most people would agree that allowing counters to be reset at all could lead to a large lag in notifications.