Service recovery logged as soft instead of hard

Posted: Sun Nov 17, 2019 8:45 pm
by crystal.then
Hello,

We've got an issue where sometimes a service recovery is soft when it should be hard. As a result we're not getting notifications for those recoveries.

It's definitely this bug:
https://github.com/NagiosEnterprises/na ... issues/651

However, we're running Nagios XI 5.6.7, which I believe should include the fix for that bug. Any idea how to resolve the issue?

Re: Service recovery logged as soft instead of hard

Posted: Mon Nov 18, 2019 7:39 am
by scottwilkerson
Have you noticed whether it is ever the same services, or are they always different?

I ask because once the fix is applied, there is still the possibility that a service will do that until it has cycled through a non-OK state and back.

Re: Service recovery logged as soft instead of hard

Posted: Mon Nov 18, 2019 10:46 pm
by crystal.then
It's difficult to tell, as most of our servers don't go down very often (thankfully), and the sequence of events that needs to occur is pretty rare. But I have been able to find a few instances of it happening more than once for a single service.

Example 1:
State History: [attachment: ex1-states.png]
Notifications: [attachment: ex1-notifications.png]

Example 2:
State History: [attachment: ex2-states.png]
This one didn't send out notifications as the timing wasn't right, but am I right in saying the recoveries should be hard in these cases?


We updated to 5.6.7 on October 30th. The first example is from a service that was added after this date.

Re: Service recovery logged as soft instead of hard

Posted: Tue Nov 19, 2019 7:33 am
by scottwilkerson
By chance, is the host going into a down state? I ask because you are also seeing a CRITICAL 1/5, which makes me believe the host may be down as well, and that would cause the recovery to be logged as soft.

Re: Service recovery logged as soft instead of hard

Posted: Tue Nov 19, 2019 7:01 pm
by crystal.then
Yes, the host is also going down in these cases.

The problem for us is that after the host comes back, a critical notification gets sent for the service. Then when the service recovers, no recovery notification gets sent because the recovery is soft.

Shouldn't the critical notification for the service get suppressed? This doesn't happen for all services on the host that goes down.

Re: Service recovery logged as soft instead of hard

Posted: Wed Nov 20, 2019 7:44 am
by scottwilkerson
crystal.then wrote: Shouldn't the critical notification for the service get suppressed? This doesn't happen for all services on the host that goes down.
This is normally the case. However, if the host comes back up but the service isn't yet able to respond, a notification can be sent.

This may be remedied by increasing max_check_attempts for the affected services.
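
To illustrate why a higher max_check_attempts could help, here is a minimal Python sketch of the basic soft/hard rules only. It is a simplified model, not Nagios Core's actual implementation: the `Service` class and `process` method are hypothetical names, and the model deliberately ignores host state, flap detection, and everything else that influences the real logic. The assumption it encodes is that a problem stays soft until it has been rechecked max_check_attempts times, and that a recovery takes the state type of the problem it recovers from.

```python
from dataclasses import dataclass

OK = "OK"


@dataclass
class Service:
    """Hypothetical, simplified model of a service's state type."""
    max_check_attempts: int
    state: str = OK
    state_type: str = "HARD"
    attempt: int = 1

    def process(self, result: str) -> tuple:
        """Apply one check result and return (state, state_type)."""
        if result == OK:
            if self.state == OK:
                # Steady OK state is always hard.
                self.state_type = "HARD"
            # Otherwise this is a recovery: it keeps the state type of the
            # problem it recovers from (soft problem gives a soft recovery).
            self.attempt = 1
        else:
            if self.state == OK:
                # First non-OK result after OK starts a new attempt count.
                self.attempt = 1
            elif self.state_type == "SOFT":
                # Still soft: this result counts as another recheck.
                self.attempt += 1
            # The problem turns hard once the attempt limit is reached.
            if self.attempt >= self.max_check_attempts:
                self.state_type = "HARD"
            else:
                self.state_type = "SOFT"
        self.state = result
        return (self.state, self.state_type)


if __name__ == "__main__":
    results = ["CRITICAL", "CRITICAL", "OK"]
    for limit in (2, 5):
        svc = Service(max_check_attempts=limit)
        print(limit, [svc.process(r) for r in results])
```

In this model, with a limit of 2 the second CRITICAL result turns the problem hard, so a problem notification would go out. With a limit of 5, the same two failures stay soft and the service recovers quietly, which is the effect the suggestion above is aiming for when a service needs a little time after its host returns.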

Re: Service recovery logged as soft instead of hard

Posted: Wed Nov 20, 2019 6:49 pm
by crystal.then
Okay I understand that. But if the service goes into a hard critical state after the host comes back up, then shouldn't the recovery also be hard and send a notification?

Re: Service recovery logged as soft instead of hard

Posted: Thu Nov 21, 2019 8:10 am
by scottwilkerson
crystal.then wrote: Okay I understand that. But if the service goes into a hard critical state after the host comes back up, then shouldn't the recovery also be hard and send a notification?
It should, provided the host is also up the whole time.

Re: Service recovery logged as soft instead of hard

Posted: Thu Nov 21, 2019 6:03 pm
by crystal.then
Here's the state history of the host from example 1:
ex1-hoststates.png

Re: Service recovery logged as soft instead of hard

Posted: Fri Nov 22, 2019 7:31 am
by scottwilkerson
As I suspected, the service went directly to hard critical when the host was soft down, then the service recovered.

Here's a deeper explanation of the logic:
https://assets.nagios.com/downloads/nag ... types.html