Service recovery logged as soft instead of hard

crystal.then · Post by **crystal.then** » Sun Nov 17, 2019 8:45 pm

Hello,

We've got an issue where sometimes a service recovery is soft when it should be hard. As a result we're not getting notifications for those recoveries.

It's definitely this bug:
https://github.com/NagiosEnterprises/na ... issues/651

However we're running Nagios XI 5.6.7, which I believe should include the bug fix. Any idea on how to fix the issue?

scottwilkerson · Post by **scottwilkerson** » Mon Nov 18, 2019 7:39 am

Have you noticed whether it is ever the same services, or are they always different?

I ask because once the fix is applied, there is still the possibility that a service will do that until it has cycled through a non-OK state and back.
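One way to check whether the same services keep showing this pattern is to scan the Nagios log for soft recoveries that directly follow a hard problem state. The following is only a rough sketch: it assumes the standard Nagios Core `SERVICE ALERT` log line layout (`host;service;state;state type;attempt;output`), and the host names, service names, and timestamps in `sample` are made up for illustration.

```python
import re
from collections import Counter

# Assumed Nagios Core log layout for service state changes:
# [epoch] SERVICE ALERT: host;service;STATE;STATE_TYPE;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);(\w+);(SOFT|HARD);(\d+);(.*)$"
)


def find_soft_recoveries_after_hard(lines):
    """Return (host, service, timestamp) for each SOFT OK alert
    whose previous alert for the same service was a HARD problem."""
    last = {}  # (host, service) -> (state, state_type) of the previous alert
    hits = []
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a service alert
        ts, host, svc, state, state_type, _attempt, _output = m.groups()
        key = (host, svc)
        prev = last.get(key)
        if (
            state == "OK"
            and state_type == "SOFT"
            and prev is not None
            and prev[0] != "OK"
            and prev[1] == "HARD"
        ):
            hits.append((host, svc, int(ts)))
        last[key] = (state, state_type)
    return hits


# Hypothetical log excerpt, for illustration only.
sample = [
    "[1574000000] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;1;Connection refused",
    "[1574000300] SERVICE ALERT: web01;HTTP;OK;SOFT;2;HTTP OK",
    "[1574000400] SERVICE ALERT: db01;MySQL;CRITICAL;SOFT;1;timeout",
    "[1574000460] SERVICE ALERT: db01;MySQL;OK;SOFT;2;MySQL OK",
]

hits = find_soft_recoveries_after_hard(sample)
# Count occurrences per (host, service) to see whether any service repeats.
counts = Counter((host, svc) for host, svc, _ in hits)
```

In practice the lines would come from the Nagios log file (and its archives) rather than an inline list; a service that appears in `counts` more than once after the upgrade date would be worth a closer look.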

crystal.then · Post by **crystal.then** » Mon Nov 18, 2019 10:46 pm

It's difficult to tell, as most of our servers aren't going down too often (thankfully), and the sequence of events that needs to occur is pretty rare. But I have been able to find a few instances of it happening more than once for a single service.

e.g.1:
State History:

ex1-states.png

Notifications:

ex1-notifications.png

e.g.2:
State History:

ex2-states.png

This one didn't send out notifications as the timing wasn't right, but am I right in saying the recoveries should be hard in these cases?

We updated to 5.6.7 on October 30th. The first example is from a service that was added after this date.

scottwilkerson · Post by **scottwilkerson** » Tue Nov 19, 2019 7:33 am

By chance, is the host going into a DOWN state? I ask because you are also seeing a CRITICAL 1/5, which makes me believe that the host may also be down, and that would log the recovery as soft.

crystal.then · Post by **crystal.then** » Tue Nov 19, 2019 7:01 pm

Yes, the host is also going down in these cases.

The problem for us is that after the host comes back, a critical notification gets sent for the service. Then when the service recovers, no recovery notification gets sent because the recovery is soft.

Shouldn't the critical notification for the service get suppressed? This doesn't happen for all services on the host that goes down.

scottwilkerson · Post by **scottwilkerson** » Wed Nov 20, 2019 7:44 am

crystal.then wrote: Shouldn't the critical notification for the service get suppressed? This doesn't happen for all services on the host that goes down.

This is normally the case. However, if the host comes back up but the service isn't yet able to respond, a notification can be sent.

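It may be possible to remedy this by increasing max_check_attempts for the affected services, which gives a service more retries to respond after the host returns before its problem state becomes hard.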
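As a rough illustration of that suggestion, a Nagios Core service definition with a raised max_check_attempts might look like the fragment below. The host name, service description, check command, template name, and the specific numbers are placeholders rather than values from this thread, and in Nagios XI the setting is normally changed through the Core Config Manager instead of by editing files by hand.

```
define service {
    use                     generic-service   ; assumed template supplying the other required directives
    host_name               example-host      ; placeholder
    service_description     Example Service   ; placeholder
    check_command           check_example     ; placeholder
    max_check_attempts      10                ; raised from 5 (the 1/5 seen above); pick a value that fits
    check_interval          5                 ; minutes between normal checks
    retry_interval          1                 ; minutes between rechecks while in a soft state
}
```

With these example numbers, a service that starts failing while the host is up would be rechecked for roughly nine more minutes before its state turns hard and a notification goes out.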

crystal.then · Post by **crystal.then** » Wed Nov 20, 2019 6:49 pm

Okay I understand that. But if the service goes into a hard critical state after the host comes back up, then shouldn't the recovery also be hard and send a notification?

scottwilkerson · Post by **scottwilkerson** » Thu Nov 21, 2019 8:10 am

crystal.then wrote: Okay I understand that. But if the service goes into a hard critical state after the host comes back up, then shouldn't the recovery also be hard and send a notification?

It should, if the host is also up this whole time.

crystal.then · Post by **crystal.then** » Thu Nov 21, 2019 6:03 pm

Here's the state history of the host from example 1:

ex1-hoststates.png

scottwilkerson · Post by **scottwilkerson** » Fri Nov 22, 2019 7:31 am

As I suspected, the service went directly to hard critical when the host was soft down, then the service recovered.

Here's a deeper explanation of the soft/hard state logic:
https://assets.nagios.com/downloads/nag ... types.html

Nagios Support Forum
