Services stuck in soft state - no notifications

GldRush98
Posts: 259
Joined: Wed May 25, 2011 10:51 am
Location: Springfield, IL

Post by GldRush98 »

OK, I have upgraded a couple of XI machines to 5.5, and while one is working fine, the other is not, it seems.

Services that go down are stuck in a soft state. Even well after the maximum number of check attempts has passed and the service should be hard down, it does not change to a hard state. I can observe this by looking at the service in Nagios Core:
Current Attempt: 5/5 (SOFT state)
That is not right. It has been down for 18 minutes now, and it should go hard after 5 minutes of being down.

I have also observed that Reports -> Notifications is completely blank, which leads me to conclude that nothing is going into a hard down state. A service stays in a soft state, so no notification is sent; then it recovers, and since it was only ever soft down, no notification is ever generated.

I have not run into this problem before, and prior to the 5.5 update this machine was working completely fine.
Where should I start looking to troubleshoot this?
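
One possible starting point, sketched here as a small shell helper: Nagios Core writes the current state of every service to its status file, so that file can be scanned for services that are in a problem state, have used up their check attempts, and are still flagged as soft. The function name `find_stuck_soft` is made up for illustration, and the default path `/usr/local/nagios/var/status.dat` is an assumption; check `status_file` in nagios.cfg for the real location.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): list services that are in a
# non-OK state, have reached max_attempts, and are still marked SOFT.
# Assumption: the status file lives at the default source-install path.
find_stuck_soft() {
    awk '
        # Start of a service block: reset the fields we track.
        /^servicestatus[ \t]*[{]/ {
            in_block = 1; host = ""; svc = ""
            state = 0; stype = 1; cur = 0; max = 0
            next
        }
        # End of a block: report it if it looks stuck.
        in_block && /^[ \t]*[}]/ {
            if (state > 0 && stype == 0 && cur >= max)
                print host ";" svc " attempt " cur "/" max
            in_block = 0
            next
        }
        # Inside a block: split "key=value" on the first "=".
        in_block {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") host = val
            else if (key == "service_description") svc = val
            else if (key == "current_state") state = val + 0
            else if (key == "state_type") stype = val + 0   # 0 = SOFT, 1 = HARD
            else if (key == "current_attempt") cur = val + 0
            else if (key == "max_attempts") max = val + 0
        }
    ' "${1:-/usr/local/nagios/var/status.dat}"
}
```

Calling `find_stuck_soft` with no argument reads the assumed default path; pass a different file as the first argument if needed. Each output line has the form `host;service attempt N/M`.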
Prod XI: Debian 12 - Nagios XI 2026R1.2
Dev XI: Debian 12 - Nagios XI 2026R1.2

Post by GldRush98 »

I also wanted to add that I have verified the configs; no errors or warnings were generated.

Code:

Nagios Core 4.4.1 
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors 
Copyright (c) 1999-2009 Ethan Galstad 
Last Modified: 2018-06-25 
License: GPL 

Website: https://www.nagios.org 
Reading configuration data... 
Read main config file okay... 
Read object config files okay... 

Running pre-flight check on configuration data... 

Checking objects... 
Checked 63 services. 
Checked 7 hosts. 
Checked 1 host groups. 
Checked 0 service groups. 
Checked 2 contacts. 
Checked 2 contact groups. 
Checked 137 commands. 
Checked 8 time periods. 
Checked 0 host escalations. 
Checked 0 service escalations. 
Checking for circular paths... 
Checked 7 hosts 
Checked 0 service dependencies 
Checked 0 host dependencies 
Checked 8 timeperiods 
Checking global event handlers... 
Checking obsessive compulsive processor commands... 
Checking misc settings... 

Total Warnings: 0 
Total Errors: 0 

Things look okay - No serious problems were detected during the pre-flight check 

Post by GldRush98 »

[Attachment: stuck_soft.png]

Post by GldRush98 »

Update: I believe I have found the fix for this from another thread 3 years back. Yay for searching.
https://support.nagios.com/forum/viewto ... =7&t=33952

Basically I did:

Code:

service nagios stop
rm /usr/local/nagios/var/retention.dat
service nagios start
I just stopped a service and it went hard after the 5th check and I immediately got a notification for it.
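
A slightly more cautious variant of the stop / remove / start steps above is to move the retention file aside rather than delete it, so it can still be inspected or restored later. This is only a sketch: the helper name `backup_retention` and the timestamped `.bak` suffix are made up for illustration, and the default path is the one used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: rename the retention file with a timestamp suffix
# instead of removing it. Prints the backup path on success.
backup_retention() {
    file="${1:-/usr/local/nagios/var/retention.dat}"
    if [ ! -f "$file" ]; then
        echo "no retention file at $file" >&2
        return 1
    fi
    backup="$file.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$file" "$backup" && echo "$backup"
}

# Intended use (as root), mirroring the steps above:
#   service nagios stop
#   backup_retention
#   service nagios start
```

As with deleting the file, Nagios starts without its retained state afterwards, so things like acknowledgements and scheduled downtime stored there would be lost unless the backup is restored.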

I have noticed one other peculiarity, though. The first failed check doesn't seem to immediately reschedule the next check at the retry interval.

This service is configured like so:
[Attachment: f2.PNG]
After failure 1, the next check should be one minute later, but it is not. It is scheduled at the regular check interval of 5 minutes instead. So this service would effectively be down for 10 minutes instead of 5 minutes before we receive the first notification. I'm pretty sure this is not how it has worked in the past.
[Attachment: f1.PNG]
On failure 2, the next check then switches to the 1-minute retry interval like it should.
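
The timing difference can be worked through with a small shell sketch. It assumes the values described in this post (5 max check attempts, a 5-minute check interval, a 1-minute retry interval) and a simplified model that ignores check latency and scheduler jitter. The function name `minutes_to_hard` and its "buggy" switch are hypothetical, used only to compare the two schedules.

```shell
#!/bin/sh
# Hypothetical model: minutes from the FIRST failed check until the check
# that reaches max attempts (and so should turn the state HARD).
# Usage: minutes_to_hard MAX_ATTEMPTS CHECK_INTERVAL RETRY_INTERVAL BUGGY
#   BUGGY=1 models the behaviour described above, where the check after
#   failure 1 is scheduled at the normal check interval.
minutes_to_hard() {
    max=$1; check=$2; retry=$3; buggy=$4
    elapsed=0
    attempt=1
    while [ "$attempt" -lt "$max" ]; do
        if [ "$buggy" -eq 1 ] && [ "$attempt" -eq 1 ]; then
            elapsed=$((elapsed + check))   # wrong interval after failure 1
        else
            elapsed=$((elapsed + retry))   # normal retry interval
        fi
        attempt=$((attempt + 1))
    done
    echo "$elapsed"
}

minutes_to_hard 5 5 1 0   # expected schedule: prints 4
minutes_to_hard 5 5 1 1   # observed schedule: prints 8
```

Under these assumptions the fifth check lands 4 minutes after the first failure with the expected schedule and 8 minutes after it with the observed one. The time the service was already down before the first failed check (up to one 5-minute check interval) comes on top of that, which is in line with the rough 5-versus-10-minute estimate above.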
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

GldRush98 wrote:
After failure 1, the next check should be the next minute, but it is not. The next check is scheduled for the regular check interval of 5 minutes...
On failure 2, the next check after that then kicks to the 1 minute interval like it should.

This is a Nagios Core issue. We are aware of it, and our developers are working on a solution. Thank you for reporting this issue!
Be sure to check out our Knowledgebase for helpful articles and solutions!

Post by GldRush98 »

lmiltchev wrote:
This is a Nagios Core issue. We are aware of it, and our developers are working on a solution. Thank you for reporting this issue!

Ah, ok, thank you sir!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

A 5.5.1 version with this fix will be released within a week.
Former Nagios employee

Post by GldRush98 »

Well, after applying the fix above (removing the retention.dat file), this has occurred again today. I've got services sticking in a "soft" state and failing to send alerts to us!

Post by lmiltchev »

As scottwilkerson said, the fix is in 5.5.1, which was released today. You can upgrade your Nagios XI instance and check to see if the issue is resolved.

Post by GldRush98 »

Oh, I thought he was referring to the retry interval bug. I didn't realize this was fixed too. I have just updated to 5.5.1, so fingers crossed ;)