I've created a Service dependency that works as expected if all the dependent services have a normal status.
e.g. If Service A (check_nrpe-version) goes critical because the agent has stopped, then no alerts are generated for Services B to Z, provided they were previously normal.
However, if Service B is already in a hard Warning state when the agent is stopped, it immediately goes to a hard Critical (connection refused).
I would have expected it to go through the normal retries before becoming hard. Instead, an alert is generated for Service B before Service A goes critical and suppresses the other services.
How can I get a service to go through the normal retries on a transition from Warning to Critical?
regards... Fred
Service Dependency behaviour
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
- Contact:
Re: Service Dependency behaviour
Just to clarify:
A service in a HARD WARNING state will move to a HARD CRITICAL state without retries when the remote agent is stopped, is that correct?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
- Contact:
Re: Service Dependency behaviour
That is correct
When the agent is stopped, Nagios displays a connection refused message with a Critical state.
If the service was previously in a hard Warning state, then it moves to a hard Critical immediately.
regards... Fred
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Service Dependency behaviour
The way this logic works is: if you have a service in a soft warning state at, let's say, 3/5 retries, and it then goes into a critical state, it will continue on the path of retries, with 2 left, and then switch to a hard critical state. The only way this number resets is if the service returns to an OK state.
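To make the counter behaviour above concrete, here is a rough Python sketch. It is a simplified model of what is described in this thread, not Nagios Core's actual code: the `Service` class, its fields and the `process` method are made-up names for illustration, and details such as soft recoveries are ignored.

```python
from dataclasses import dataclass

OK = "OK"


@dataclass
class Service:
    """Toy model of the retry counter (hypothetical, not Nagios code)."""
    max_check_attempts: int = 5
    attempt: int = 0
    state: str = OK
    hard: bool = True

    def process(self, result):
        """Apply one check result and return (state, state_type, attempt)."""
        if result == OK:
            # Assumption from the thread: only an OK result resets the counter.
            self.attempt = 0
            self.hard = True
        else:
            # Any non-OK result keeps counting, even if it changes from
            # WARNING to CRITICAL; the counter is capped at the maximum.
            self.attempt = min(self.attempt + 1, self.max_check_attempts)
            self.hard = self.attempt >= self.max_check_attempts
        self.state = result
        return (self.state, "HARD" if self.hard else "SOFT", self.attempt)


svc = Service(max_check_attempts=5)
# Three soft warnings (3/5), then the service goes critical.
history = [svc.process(r) for r in ["WARNING"] * 3 + ["CRITICAL"] * 2]
for entry in history:
    print(entry)

# Once the service is hard, another non-OK state is hard straight away,
# which matches the Warning -> Critical case reported at the top of the thread.
after_hard = svc.process("WARNING")
print(after_hard)
```

In this model the two critical results use up the remaining two attempts, and any later change between non-OK states is hard on the first check because the counter is already at its maximum.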
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
- Contact:
Re: Service Dependency behaviour
If you're saying that the retry counter only resets when the status returns to normal, then that means that a change to Critical will send an immediate notification and then a change back to Warning will send another immediate notification.
Surely you would want the retry counter to reset on every status change after it has gone hard - not just normal? Otherwise we're generating a lot of notifications (email/sms, etc).
Fred
-
sreinhardt
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Service Dependency behaviour
While I can see your point about potential false positives and, in a few cases, some unexpected behavior, the alternative is that notifications for a potential issue could be extremely delayed, and that would be far worse. Imagine a case where memory on a machine is fluctuating between warning and critical states. If the counter were reset on each state change, you could potentially either never receive an alert, as it keeps getting reset, or receive very delayed notifications once it finally stays in one state for long enough to alert.
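A small Python sketch of that scenario may help. It compares the two policies on a service that alternates between warning and critical. The function name `first_hard_check` and its parameters are invented for this illustration, and the model is deliberately simplified, so treat it as a thought experiment rather than a description of Nagios internals.

```python
def first_hard_check(results, max_attempts, reset_on_any_change):
    """Return the 1-based check number at which the state goes hard, or None.

    Hypothetical model: reset_on_any_change=True restarts the retry counter
    on every state change; False resets it only on an OK result.
    """
    attempt = 0
    previous = "OK"
    for index, result in enumerate(results, start=1):
        if result == "OK":
            attempt = 0
        elif reset_on_any_change and result != previous:
            attempt = 1  # the state changed, so counting starts over
        else:
            attempt += 1
        previous = result
        if attempt >= max_attempts:
            return index
    return None


# Memory usage bouncing between warning and critical for 20 checks.
flapping = ["WARNING", "CRITICAL"] * 10

reset_only_on_ok = first_hard_check(flapping, 3, reset_on_any_change=False)
reset_on_change = first_hard_check(flapping, 3, reset_on_any_change=True)
print(reset_only_on_ok, reset_on_change)
```

With three attempts allowed, the reset-only-on-OK policy goes hard on the third check, while the reset-on-every-change policy never gets past attempt 1 for this input and so never goes hard.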
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
- Contact:
Re: Service Dependency behaviour
Sorry - only just got back to following this one up.
That's not quite what I was thinking. I would expect that any state change will trigger the retry counter and continue until it goes hard.
Any other status change would then restart the counter again.
-
sreinhardt
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Service Dependency behaviour
You could submit this as a core feature request, specifically a toggle that resets the counter on any state change. However, I think most people would agree that if counters could be reset on every state change, it could lead to a large lag in notifications.