Service recovery logged as soft instead of hard

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
crystal.then
Posts: 57
Joined: Mon Oct 27, 2014 12:05 am

Service recovery logged as soft instead of hard

Post by crystal.then »

Hello,

We've got an issue where sometimes a service recovery is soft when it should be hard. As a result we're not getting notifications for those recoveries.

It's definitely this bug:
https://github.com/NagiosEnterprises/na ... issues/651

However we're running Nagios XI 5.6.7, which I believe should include the bug fix. Any idea on how to fix the issue?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Service recovery logged as soft instead of hard

Post by scottwilkerson »

Have you noticed whether it is ever the same services, or are they always different?

I ask because once the fix is applied, there is still the possibility that a service will do that until it has cycled through a non-OK state and back.
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Re: Service recovery logged as soft instead of hard

Post by crystal.then »

It's difficult to tell, as most of our servers aren't going down too often (thankfully), and the sequence of events that needs to occur is pretty rare. But I have been able to find a few instances of it happening more than once for a single service.

Example 1:
State History: ex1-states.png
Notifications: ex1-notifications.png

Example 2:
State History: ex2-states.png
This one didn't send out notifications as the timing wasn't right, but am I right in saying the recoveries should be hard in these cases?


We updated to 5.6.7 on October 30th. The first example is from a service that was added after this date.

Re: Service recovery logged as soft instead of hard

Post by scottwilkerson »

Is the host by any chance going into a down state? I ask because you are also seeing a CRITICAL 1/5, which makes me think the host may be down as well, and that would cause the recovery to be logged as soft.

Re: Service recovery logged as soft instead of hard

Post by crystal.then »

Yes, the host is also going down in these cases.

The problem for us is that after the host comes back, a critical notification gets sent for the service. Then when the service recovers, no recovery notification gets sent because the recovery is soft.

Shouldn't the critical notification for the service get suppressed? This doesn't happen for all services on the host that goes down.

Re: Service recovery logged as soft instead of hard

Post by scottwilkerson »

crystal.then wrote: Shouldn't the critical notification for the service get suppressed? This doesn't happen for all services on the host that goes down.
This is normally the case. However, if the host comes back up but the service isn't able to respond, a notification can be sent.

It is possible it may be remedied by increasing the max_check_attempts for the services affected.
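
To illustrate why a larger max_check_attempts might help here, the following is a minimal sketch of the retry logic, not Nagios Core's actual code. It assumes the host stays UP for the whole sequence and that a problem notification is only sent once the service reaches a hard state. The function name and the sample check results are hypothetical.

```python
def reaches_hard_critical(results, max_check_attempts):
    """Return True if consecutive non-OK results reach max_check_attempts.

    Simplified model: assumes the host is UP for every check in `results`.
    """
    failures = 0
    for result in results:
        if result == "OK":
            failures = 0  # any OK result resets the retry counter
        else:
            failures += 1
            if failures >= max_check_attempts:
                return True  # hard problem state, so a notification can go out
    return False


# Hypothetical case: after the host comes back, the service fails three
# checks in a row before it starts responding again.
slow_start = ["CRITICAL", "CRITICAL", "CRITICAL", "OK"]

print(reaches_hard_critical(slow_start, max_check_attempts=3))  # True
print(reaches_hard_critical(slow_start, max_check_attempts=5))  # False
```

With the higher value, the service in this made-up sequence recovers while it is still in a soft state, so no critical notification would be expected in the first place.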

Re: Service recovery logged as soft instead of hard

Post by crystal.then »

Okay, I understand that. But if the service goes into a hard critical state after the host comes back up, then shouldn't the recovery also be hard and send a notification?

Re: Service recovery logged as soft instead of hard

Post by scottwilkerson »

crystal.then wrote: Okay I understand that. But if the service goes into a hard critical state after the host comes back up, then shouldn't the recovery also be hard and send a notification?
It should, if the host is also up this whole time.

Re: Service recovery logged as soft instead of hard

Post by crystal.then »

Here's the state history of the host from example 1:
ex1-hoststates.png

Re: Service recovery logged as soft instead of hard

Post by scottwilkerson »

As I suspected, the service went directly to hard critical when the host was soft down, then the service recovered.

Here's a deeper explanation of the logic:
https://assets.nagios.com/downloads/nag ... types.html
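
The "directly to hard critical" step described above can be sketched roughly as follows. This is a simplified model based on the documented state type rules, not Nagios Core's actual implementation. It covers only how a non-OK service result is classified and deliberately leaves recovery handling out. The function name is hypothetical, and it assumes that any host status other than UP (soft or hard) triggers the shortcut, as the description above implies.

```python
def classify_problem(attempt, max_check_attempts, host_status):
    """Classify a non-OK service check result as 'SOFT' or 'HARD'.

    Simplified model: recoveries and hard-to-hard transitions are not covered.
    """
    # Assumption: a host that is not UP makes the service problem hard
    # immediately, without waiting for the remaining retries.
    if host_status != "UP":
        return "HARD"
    # Otherwise the problem stays soft until the retries are used up.
    return "HARD" if attempt >= max_check_attempts else "SOFT"


print(classify_problem(1, 5, "DOWN"))  # HARD, e.g. a CRITICAL logged at 1/5
print(classify_problem(1, 5, "UP"))    # SOFT
print(classify_problem(5, 5, "UP"))    # HARD
```

In this model, a hard CRITICAL at attempt 1 of 5 only occurs when the host is not UP, which is why a "CRITICAL 1/5" entry hints at a host problem.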