Services stuck in soft state - no notifications
Posted: Mon Jul 09, 2018 12:22 pm
by GldRush98
Ok, I have upgraded a couple of XI machines to 5.5, and while one is working fine, the other is not, it seems.
Services that go down are stuck in a soft state. Even well after the configured number of checks has passed and the service should be hard down, it's not changing to a hard state. I can observe this by looking at the service in Nagios Core.
Current Attempt: 5/5 (SOFT state)
Which is not right. It has been down for 18 minutes now and it should go hard after 5 minutes of being down.
I have also observed that Reports -> Notifications is completely blank, leading me to conclude nothing is going into a hard down state. The service stays in a soft state, so no notification is sent; then it recovers, and since it was only ever soft down, no notification is ever generated.
I have not run into this problem before, and prior to the 5.5 update this machine was working completely fine.
Where should I start looking to troubleshoot this?
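One possible starting point is to ask the status file directly which services have used up their attempts but are still soft. The following is only a sketch: it assumes the status file lives at /usr/local/nagios/var/status.dat (the same directory as the retention file mentioned later in this thread) and that it uses "servicestatus { ... }" blocks with key=value lines. The function name find_stuck_soft is made up for illustration.

```shell
#!/bin/sh
# List services that are non-OK, still SOFT (state_type=0), and whose
# current_attempt has already reached max_attempts.
# Assumption: status.dat consists of "servicestatus { ... }" blocks
# containing key=value lines.
find_stuck_soft() {
    status_file="${1:-/usr/local/nagios/var/status.dat}"
    awk '
        /^servicestatus[ \t]*[{]/ { in_block = 1; split("", f); next }
        in_block && /^[ \t]*[}]/ {
            if (f["current_state"] + 0 != 0 && f["state_type"] + 0 == 0 &&
                f["current_attempt"] + 0 >= f["max_attempts"] + 0)
                printf "%s;%s attempt %s/%s SOFT\n",
                       f["host_name"], f["service_description"],
                       f["current_attempt"], f["max_attempts"]
            in_block = 0
            next
        }
        in_block {
            line = $0
            sub(/^[ \t]+/, "", line)          # strip leading indentation
            eq = index(line, "=")             # split on the first "=" only
            if (eq > 0) f[substr(line, 1, eq - 1)] = substr(line, eq + 1)
        }
    ' "$status_file"
}
```

Calling find_stuck_soft with no argument reads the assumed default path; any line it prints is a service in the "5/5 (SOFT state)" situation described above.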
Re: Services stuck in soft state - no notifications
Posted: Mon Jul 09, 2018 12:26 pm
by GldRush98
Also wanted to add: I have verified the configs, and no errors or warnings were generated.
Code:
Nagios Core 4.4.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2018-06-25
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 63 services.
Checked 7 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 2 contacts.
Checked 2 contact groups.
Checked 137 commands.
Checked 8 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 7 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 8 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
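If the pre-flight check needs to be repeated from a script, its summary lines can be tested instead of read by eye. This is a rough sketch only: the helper name preflight_clean is hypothetical, and the binary and config paths in the usage comment assume the default /usr/local/nagios install prefix.

```shell
#!/bin/sh
# Read pre-flight check output on stdin; succeed only if both the
# "Total Warnings" and "Total Errors" lines are present and equal to 0.
preflight_clean() {
    awk -F': *' '
        /^Total Warnings:/ { w = $2 + 0; seen_w = 1 }
        /^Total Errors:/   { e = $2 + 0; seen_e = 1 }
        END { exit !(seen_w && seen_e && w == 0 && e == 0) }
    '
}

# Intended use (paths are assumptions):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_clean \
#       && echo "config is clean"
```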
Re: Services stuck in soft state - no notifications
Posted: Mon Jul 09, 2018 12:42 pm
by GldRush98
Re: Services stuck in soft state - no notifications
Posted: Mon Jul 09, 2018 3:11 pm
by GldRush98
Update: I believe I have found the fix for this from another thread 3 years back. Yay for searching.
https://support.nagios.com/forum/viewto ... =7&t=33952
Basically I did:
Code:
service nagios stop
rm /usr/local/nagios/var/retention.dat
service nagios start
I just stopped a service and it went hard after the 5th check and I immediately got a notification for it.
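A slightly more cautious variant of the same workaround is to move the retention file aside rather than delete it, since that file also carries retained state such as comments and scheduled downtime. This is a sketch, not a tested procedure for XI; the function name backup_retention and the timestamped ".bak" suffix are my own choices.

```shell
#!/bin/sh
# Move the retention file aside with a timestamp suffix instead of
# deleting it. Prints the backup path on success, fails if the file
# does not exist.
backup_retention() {
    retention_file="${1:-/usr/local/nagios/var/retention.dat}"
    if [ ! -f "$retention_file" ]; then
        echo "no such file: $retention_file" >&2
        return 1
    fi
    backup="${retention_file}.$(date +%Y%m%d-%H%M%S).bak"
    mv -- "$retention_file" "$backup" && printf '%s\n' "$backup"
}

# Intended use, mirroring the commands above (run as root):
#   service nagios stop
#   backup_retention
#   service nagios start
```

Keeping the old file around makes it possible to compare it with the freshly generated one if the problem comes back.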
I have noticed one other peculiarity though. The first failed check doesn't seem to immediately reschedule the next check at the retry interval...
This service is configured like so:
[Attached image: f2.PNG]
After failure 1, the next check should be the next minute, but it is not. The next check is scheduled for the regular check interval of 5 minutes. So, this service would effectively be down for 10 minutes instead of 5 minutes before receiving the first notification. I'm pretty sure this is not how it has worked in the past.
[Attached image: f1.PNG]
On failure 2, the next check after that then kicks to the 1 minute interval like it should.
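To put numbers on the delay, here is a small arithmetic sketch using the values from this thread (5 minute check interval, 1 minute retry interval, 5 attempts). The function name minutes_to_hard and its fourth argument are made up for illustration; it simply models "first recheck uses the retry interval" versus "first recheck uses the full check interval".

```shell
#!/bin/sh
# Minutes from the first failed check until the final (HARD) attempt.
#   $1 check interval, $2 retry interval, $3 max check attempts,
#   $4 "retry" (expected scheduling) or "check" (first recheck waits a
#      full check interval, as observed in this thread)
minutes_to_hard() {
    check=$1 retry=$2 attempts=$3 first=$4
    if [ "$attempts" -le 1 ]; then
        echo 0
        return
    fi
    if [ "$first" = "check" ]; then
        echo $(( check + (attempts - 2) * retry ))
    else
        echo $(( (attempts - 1) * retry ))
    fi
}

minutes_to_hard 5 1 5 retry   # prints 4
minutes_to_hard 5 1 5 check   # prints 8
```

This counts only from the first failed check, so both results are a little lower than the 5 and 10 minutes quoted above, but the gap between them is the same extra wait.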
Re: Services stuck in soft state - no notifications
Posted: Mon Jul 09, 2018 4:32 pm
by lmiltchev
GldRush98 wrote: After failure 1, the next check should be the next minute, but it is not. The next check is scheduled for the regular check interval of 5 minutes...
On failure 2, the next check after that then kicks to the 1 minute interval like it should.
This is a Nagios Core issue. We are aware of it, and our developers are working on a solution. Thank you for reporting this issue!
Re: Services stuck in soft state - no notifications
Posted: Tue Jul 10, 2018 8:16 am
by GldRush98
lmiltchev wrote: After failure 1, the next check should be the next minute, but it is not. The next check is scheduled for the regular check interval of 5 minutes...
On failure 2, the next check after that then kicks to the 1 minute interval like it should.
This is a Nagios Core issue. We are aware of it, and our developers are working on a solution. Thank you for reporting this issue!
Ah, ok, thank you sir!
Re: Services stuck in soft state - no notifications
Posted: Tue Jul 10, 2018 9:37 am
by scottwilkerson
A 5.5.1 version with this fix will be released within a week.
Re: Services stuck in soft state - no notifications
Posted: Thu Jul 12, 2018 10:32 am
by GldRush98
Well, after applying the fix above (removing the retention.dat file), this has occurred again today. I've got services sticking in a "soft" state and failing to send alerts to us!
Re: Services stuck in soft state - no notifications
Posted: Thu Jul 12, 2018 1:51 pm
by lmiltchev
As scottwilkerson said, the fix is in 5.5.1, which was released today. You can upgrade your Nagios XI instance and check to see if the issue is resolved.
Re: Services stuck in soft state - no notifications
Posted: Fri Jul 13, 2018 9:14 am
by GldRush98
Oh, I thought he was referring to the retry interval bug. I didn't realize this was fixed too. I have just updated to 5.5.1, so fingers crossed.
