Nagios Support Forum

Posted: **Wed Aug 01, 2018 10:54 pm**

Hi Guys,

Most of my services are set up to run every 10 minutes, with a retry interval of 1 minute.
But after upgrading from 4.3.4 to 4.4.1, when a service detects the first soft state change, the next check is scheduled 10 minutes later, not 1 minute later.
I didn't make any changes to my nagios.cfg file during the upgrade, so maybe it needs tweaking, but I didn't see anything in the release notes that suggested that a change was necessary.
I'm just wondering if anyone else has noticed this behaviour? Is there a bug in 4.4.1, or in my cfg file(s)?

Cheers from Down Under,
Tony
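One way to confirm the scheduling symptom described above is to compare `last_check` and `next_check` for services that are currently in a SOFT, non-OK state. The sketch below is a minimal illustration, not an official Nagios tool. It assumes the status file lives at /usr/local/nagios/var/status.dat (the usual location for a source install, but check `status_file` in nagios.cfg) and that each `servicestatus` block carries the `state_type`, `current_state`, `current_attempt`, `last_check` and `next_check` fields. The function name `soft_state_gaps` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list services in a SOFT, non-OK state together with the
# number of seconds between their last check and their next scheduled check.
# Assumption: status.dat path and field names as found in a typical Nagios Core 4 install.
soft_state_gaps() {
    status_file="${1:-/usr/local/nagios/var/status.dat}"
    awk -F= '
        # Start of a service block: reset the field table.
        /^servicestatus [{]/ { in_svc = 1; split("", f); next }

        # End of a block: report it if the service is SOFT (state_type 0) and not OK.
        in_svc && /^[ \t]*[}]/ {
            if (f["state_type"] == "0" && f["current_state"] != "0")
                printf "%s;%s;attempt=%s;gap=%d\n",
                    f["host_name"], f["service_description"],
                    f["current_attempt"], f["next_check"] - f["last_check"]
            in_svc = 0
            next
        }

        # Inside a block: store key=value, keeping any "=" that appears in the value.
        in_svc {
            key = $1
            sub(/^[ \t]+/, "", key)
            f[key] = substr($0, index($0, "=") + 1)
        }
    ' "$status_file"
}
```

With a 10-minute check interval and a 1-minute retry interval, a service on a soft attempt would be expected to show a gap of roughly 60 seconds. A gap near 600 seconds would match the behaviour reported here.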

Posted: **Thu Aug 02, 2018 7:24 am**

This is a known bug and is fixed in the maint branch.

I believe I found the cause in Core, and the fix is in the maint branch on GitHub:
https://github.com/NagiosEnterprises/na ... ee/maint

Posted: **Thu Aug 02, 2018 7:38 pm**

Thanks Scott, for your quick response and resolution. I have installed the "maint" release from GitHub and the check intervals are fine now.
But I've spotted another issue: it seems that most of the "SOFT;2" records are missing from the nagios.log file. The only time I see a SOFT;2 record is when the state of the service changes, e.g. if the SOFT;1 record is a WARNING and the SOFT;2 record is a CRITICAL. It's not a big issue, so no hurry. Is that a known issue too?
Thanks again.
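A quick way to see whether SOFT;2 records are being written is to tally the SOFT service alerts in the log by attempt number. This is only a sketch: it assumes the log is at /usr/local/nagios/var/nagios.log and that alert lines follow the usual `SERVICE ALERT: host;service;STATE;SOFT;N;output` layout. The function name `soft_attempt_counts` is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count SOFT service alerts per attempt number.
# Assumption: semicolon-separated SERVICE ALERT lines, with the state type in
# field 4 and the attempt number in field 5.
soft_attempt_counts() {
    log_file="${1:-/usr/local/nagios/var/nagios.log}"
    grep 'SERVICE ALERT:' "$log_file" |
        awk -F';' '
            $4 == "SOFT" { count[$5]++ }
            END { for (n in count) print "SOFT;" n ": " count[n] }
        ' |
        sort
}
```

For services with `max_check_attempts` set to 3, a healthy log would normally show SOFT;1 and SOFT;2 counts of a similar size. A SOFT;2 count far below SOFT;1 would be consistent with the missing records described above.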

Posted: **Thu Aug 02, 2018 10:31 pm**

There was one more commit to the maint branch tonight that I believe fixes this as well.

Posted: **Sun Aug 05, 2018 7:10 pm**

Hi Scott, thanks again for the quick reply. I have installed the latest "maint" code but it doesn't seem to fix the logging issue. And I've also noticed that when the checks recover I am not getting a Recovery Notification.

Posted: **Mon Aug 06, 2018 7:15 am**

I should have noted: the services that were stuck in the soft state will need to go into an OK state before they will act normally. This can either happen naturally, or by sending an OK passive check, or, to do them all in one go, by removing retention.dat with the following:

Code:

service nagios stop
rm -f /usr/local/nagios/var/retention.dat
service nagios start

The above will make all the checks go into a pending state until they receive their first check result.
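For the passive-check route mentioned above, a result can be submitted through the external command file with the `PROCESS_SERVICE_CHECK_RESULT` command. The sketch below assumes the command file is at /usr/local/nagios/var/rw/nagios.cmd (check `command_file` in nagios.cfg), that external commands are enabled, and that the service accepts passive checks. The function name `submit_passive_ok` and the output text "OK - manual reset" are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: submit an OK (return code 0) passive result for one service.
# Arguments: host name, service description, optional command file path.
# Assumption: the default command file path below matches your install.
submit_passive_ok() {
    host="$1"
    service="$2"
    cmd_file="${3:-/usr/local/nagios/var/rw/nagios.cmd}"
    # External command format: [timestamp] COMMAND;host;service;return_code;plugin_output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;0;%s\n' \
        "$(date +%s)" "$host" "$service" "OK - manual reset" > "$cmd_file"
}
```

It would be called as, for example, `submit_passive_ok web01 HTTP`, where `web01` and `HTTP` stand in for a real host name and service description. The user running it needs write access to the command file.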

Posted: **Wed Aug 08, 2018 11:40 pm**

Sorry Scott, but this still isn't working. Even after removing the retention.dat file and starting afresh, it's the same behaviour. Missing SOFT;2 records from nagios.log, and no recovery notifications.
I then rolled back to v4.3.4, but using the same files from etc and var, and it works just fine.
I have set up symlinks for the etc and var directories that point to a shared directory to make it easy to flip between versions, so I hope that's not mucking things up, i.e.:

Code:

[root@nagios local]# ll -d /usr/local/nagios*
lrwxrwxrwx  1 root root         9 Aug  8 14:05 /usr/local/nagios -> nagios441B
drwxr-xr-x  7 root root      4096 Aug  2 11:54 /usr/local/nagios434
drwxr-xr-x 10 root root      4096 Sep  9  2014 /usr/local/nagios407
drwxr-xr-x  7 root root      4096 Aug  2 11:55 /usr/local/nagios441
drwxr-xr-x  7 root root      4096 Aug  3 09:15 /usr/local/nagios441B        <--- the "maint" release
drwxr-xr-x  4 root root      4096 Aug  2 11:46 /usr/local/nagioscommon

[root@nagios local]# ll  /usr/local/nagios441B
total 20
drwxrwxr-x  2 nagios nagios 4096 Aug  6 09:40 bin
lrwxrwxrwx  1 root   root     27 Aug  3 09:15 etc -> /usr/local/nagioscommon/etc
drwxr-xr-x  2 root   root   4096 Aug  3 09:14 include
drwxrwxr-x  2 nagios nagios 4096 Aug  3 09:14 libexec
drwxrwxr-x  2 nagios nagios 4096 Aug  6 09:40 sbin
drwxrwxr-x 15 nagios nagios 4096 Aug  6 09:40 share
lrwxrwxrwx  1 root   root     27 Aug  3 09:15 var -> /usr/local/nagioscommon/var

Posted: **Thu Aug 09, 2018 8:10 am**

I hadn't caught this in the first change. One more commit was made to the maint branch this morning, and in my testing it fixes the logging of SOFT states > 1.

Posted: **Thu Aug 09, 2018 8:58 pm**

Woohoo, it works!!!
Thanks Scott, for putting up with my nagging

Posted: **Fri Aug 10, 2018 8:42 am**

tonymcg27 wrote:Woohoo, it works!!!
Thanks Scott, for putting up with my nagging

No problem, thanks for assisting in finding the bug!

Retry Interval behaviour after upgrade to 4.4.1
