Service recovery logged as soft instead of hard
-
crystal.then
- Posts: 57
- Joined: Mon Oct 27, 2014 12:05 am
Service recovery logged as soft instead of hard
Hello,
We've got an issue where a service recovery is sometimes logged as soft when it should be hard. As a result, we're not getting notifications for those recoveries.
It's definitely this bug:
https://github.com/NagiosEnterprises/na ... issues/651
However we're running Nagios XI 5.6.7, which I believe should include the bug fix. Any idea on how to fix the issue?
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Service recovery logged as soft instead of hard
Have you noticed whether it is ever the same services, or are they always different?
I ask because once the fix is applied, there is still the possibility that a service will do that until it has cycled through a non-OK state and back.
-
crystal.then
- Posts: 57
- Joined: Mon Oct 27, 2014 12:05 am
Re: Service recovery logged as soft instead of hard
It's difficult to tell, as most of our servers don't go down very often (thankfully), and the sequence of events that needs to occur is pretty rare. But I have been able to find a few instances of it happening more than once for a single service.
e.g. 1:
[Attached screenshots: State History, Notifications]
e.g. 2:
[Attached screenshot: State History]
This one didn't send out notifications because the timing wasn't right, but am I right in saying the recoveries should be hard in these cases?
We updated to 5.6.7 on October 30th. The first example is from a service that was added after this date.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Service recovery logged as soft instead of hard
By chance, is the host going into a down state? I ask because you are also seeing a CRITICAL 1/5, which makes me believe that the host may also be down, and that would log the recovery as soft.
-
crystal.then
- Posts: 57
- Joined: Mon Oct 27, 2014 12:05 am
Re: Service recovery logged as soft instead of hard
Yes, the host is also going down in these cases.
The problem for us is that after the host comes back, a critical notification gets sent for the service. Then when the service recovers, no recovery notification gets sent because the recovery is soft.
Shouldn't the critical notification for the service get suppressed? This doesn't happen for all services on the host that goes down.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Service recovery logged as soft instead of hard
crystal.then wrote: Shouldn't the critical notification for the service get suppressed? This doesn't happen for all services on the host that goes down.
This is normally the case; however, if the host comes back up but the service isn't able to respond, a notification can be sent.
This may be remedied by increasing max_check_attempts for the affected services.
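To illustrate why a larger max_check_attempts can help, here is a minimal Python sketch. It assumes the host stays UP the whole time, that a non-OK service stays SOFT until it has failed max_check_attempts consecutive checks, and it ignores check/retry intervals. The function name first_hard_check is hypothetical and not part of Nagios.

```python
from typing import List, Optional


def first_hard_check(results: List[str], max_check_attempts: int) -> Optional[int]:
    """Return the index of the check that turns a non-OK state HARD, or None.

    Simplified model (not Nagios source): the host is assumed UP throughout,
    each consecutive non-OK result counts as one more attempt, and an OK
    result before the limit is a soft recovery that resets the counter.
    """
    attempt = 0
    for index, state in enumerate(results):
        if state == "OK":
            attempt = 0  # soft recovery: no hard state was reached
            continue
        attempt += 1
        if attempt >= max_check_attempts:
            return index  # hard non-OK state; a notification could go out here
    return None


# Hypothetical sequence: the service fails three checks right after the
# host comes back up, then recovers.
checks = ["CRITICAL", "CRITICAL", "CRITICAL", "OK"]
print(first_hard_check(checks, max_check_attempts=2))  # 1: goes HARD, would notify
print(first_hard_check(checks, max_check_attempts=5))  # None: stays SOFT
```

With the higher limit the service gets enough retries to come back before reaching a hard state, so no CRITICAL notification (and therefore no orphaned recovery) would be expected under these assumptions.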
-
crystal.then
- Posts: 57
- Joined: Mon Oct 27, 2014 12:05 am
Re: Service recovery logged as soft instead of hard
Okay, I understand that. But if the service goes into a hard critical state after the host comes back up, then shouldn't the recovery also be hard and send a notification?
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Service recovery logged as soft instead of hard
crystal.then wrote: Okay, I understand that. But if the service goes into a hard critical state after the host comes back up, then shouldn't the recovery also be hard and send a notification?
It should, if the host is also up the whole time.
-
crystal.then
- Posts: 57
- Joined: Mon Oct 27, 2014 12:05 am
Re: Service recovery logged as soft instead of hard
Here's the state history of the host from example 1:
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Service recovery logged as soft instead of hard
As I suspected, the service went directly to hard critical when the host was soft down, then the service recovered.
Here's a deeper explanation of the logic:
https://assets.nagios.com/downloads/nag ... types.html
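As a rough aid for reading state history, here is a minimal Python sketch of the hard/soft rules as the Nagios state types documentation describes them for services. It is a simplified model, not Nagios Core source: the names StateType and classify are hypothetical, passive checks are not modeled, and it does not model version-specific behavior such as the issue linked at the start of the thread.

```python
from enum import Enum


class StateType(Enum):
    SOFT = "SOFT"
    HARD = "HARD"


def classify(prev_state: str, prev_type: StateType, new_state: str,
             attempt: int, max_check_attempts: int, host_up: bool) -> StateType:
    """Classify a new service check result as SOFT or HARD.

    Simplified reading of the documented rules, not actual Nagios code.
    `attempt` is the attempt number of this result (e.g. 1 in "CRITICAL 1/5").
    """
    if new_state == "OK":
        if prev_state == "OK":
            return StateType.HARD  # OK following OK is treated as HARD here
        # Recovery: hard if recovering from a hard error state, else soft.
        return StateType.HARD if prev_type is StateType.HARD else StateType.SOFT
    if not host_up:
        return StateType.HARD  # non-OK service on a DOWN/UNREACHABLE host
    if prev_state != "OK" and prev_type is StateType.HARD:
        return StateType.HARD  # already in a hard error state: stays HARD
    # Host is up: the service stays SOFT until max_check_attempts is reached.
    return StateType.HARD if attempt >= max_check_attempts else StateType.SOFT


# A first CRITICAL (attempt 1 of 5) is SOFT while the host is up,
# but HARD if the host is down at the time.
print(classify("OK", StateType.HARD, "CRITICAL", 1, 5, host_up=True))   # StateType.SOFT
print(classify("OK", StateType.HARD, "CRITICAL", 1, 5, host_up=False))  # StateType.HARD
```

This matches the reasoning earlier in the thread: a CRITICAL that is already hard at attempt 1/5 points to the host having been down when the service check ran.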