Retry Interval behaviour after upgrade to 4.4.1

tonymcg27
Posts: 7
Joined: Thu Jul 26, 2018 6:52 pm

Post by tonymcg27 »

Hi Guys,

Most of my services are set up to run every 10 minutes, with a retry interval of 1 minute.
But after upgrading from 4.3.4 to 4.4.1, when a service detects the first soft state change, the next check is scheduled 10 minutes later, not 1 minute later.
I didn't make any changes to my nagios.cfg file during the upgrade, so maybe it needs tweaking, but I didn't see anything in the release notes that suggested that a change was necessary.
I'm just wondering if anyone else has noticed this behaviour? Is there a bug in 4.4.1, or in my cfg file(s)?

Cheers from Down Under,
Tony
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

This is a known bug. I believe I found the cause in Core, and it is fixed in the maint branch on GitHub:
https://github.com/NagiosEnterprises/na ... ee/maint
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com

Post by tonymcg27 »

Thanks Scott, for your quick response and resolution. I have installed the "maint" release from GitHub and the check intervals are fine now.
But I've spotted another issue: most of the "SOFT;2" records seem to be missing from the nagios.log file. The only time I see a SOFT;2 record is when the state of the service changes between attempts, e.g. if the SOFT;1 record is a WARNING and the SOFT;2 record is a CRITICAL. It's not a big issue, so no hurry. Is that a known issue too?
Thanks again.
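
For anyone wanting to check their own log for the same symptom, here is a rough sketch that counts SERVICE ALERT entries per soft attempt number. It assumes the usual alert layout of host;service;state;state_type;attempt;output and that service descriptions contain no semicolons. The function name count_soft_attempts is made up for illustration.

```shell
#!/bin/sh
# Count SERVICE ALERT entries per SOFT attempt number in a Nagios log.
# Hypothetical helper; pass the path to nagios.log as the first argument.
count_soft_attempts() {
    # Splitting on ';' gives: $3 = state, $4 = state type, $5 = attempt number.
    grep 'SERVICE ALERT:' "$1" |
        awk -F';' '$4 == "SOFT" { count[$5]++ }
                   END { for (a in count) print "SOFT;" a, count[a] }' |
        sort
}

# Example call (the path is the default for a source install, adjust as needed):
# count_soft_attempts /usr/local/nagios/var/nagios.log
```

If the SOFT;2 count is far lower than the SOFT;1 count on services with three or more check attempts, that would be consistent with the missing records described above.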

Post by scottwilkerson »

There was one more commit to the maint branch tonight that I believe fixes this as well.

Post by tonymcg27 »

Hi Scott, thanks again for the quick reply. I have installed the latest "maint" code but it doesn't seem to fix the logging issue. And I've also noticed that when the checks recover I am not getting a Recovery Notification.

Post by scottwilkerson »

I should have noted that the services that were stuck in the soft state will need to go into an OK state before they will act normally. This can happen naturally, or by sending an OK passive check, or, to do them all in one go, by removing retention.dat with the following:

service nagios stop
rm -f /usr/local/nagios/var/retention.dat
service nagios start
The above will make all the checks go into a pending state until they receive their first check result.
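
For the passive check option, a minimal sketch of building the external command line might look like this. It assumes external commands and passive service checks are enabled, and that the command file lives at the default source-install path. The helper name passive_ok_line and the host and service names are placeholders.

```shell
#!/bin/sh
# Build an external-command line that submits a passive OK result.
# Arguments: host name, service description, plugin output text.
passive_ok_line() {
    # The line starts with an [epoch] timestamp; return code 0 means OK.
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;0;%s\n' \
        "$(date +%s)" "$1" "$2" "$3"
}

# Hypothetical usage (host, service and command-file path are assumptions):
# passive_ok_line web01 HTTP "OK - manual reset" > /usr/local/nagios/var/rw/nagios.cmd
```

This only resets one service at a time, so for a large number of stuck services removing retention.dat as shown above is likely the quicker route.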

Post by tonymcg27 »

Sorry Scott, but this still isn't working. Even after removing the retention.dat file and starting afresh, it's the same behaviour. Missing SOFT;2 records from nagios.log, and no recovery notifications.
I then rolled back to v4.3.4, but using the same files from etc and var, and it works just fine.
I have set up symlinks for the etc and var directories that point to a shared directory, to make it easy to flip between versions, so I hope that's not mucking things up, i.e.

[root@nagios local]# ll -d /usr/local/nagios*
lrwxrwxrwx  1 root root         9 Aug  8 14:05 /usr/local/nagios -> nagios441B
drwxr-xr-x  7 root root      4096 Aug  2 11:54 /usr/local/nagios434
drwxr-xr-x 10 root root      4096 Sep  9  2014 /usr/local/nagios407
drwxr-xr-x  7 root root      4096 Aug  2 11:55 /usr/local/nagios441
drwxr-xr-x  7 root root      4096 Aug  3 09:15 /usr/local/nagios441B        <--- the "maint" release
drwxr-xr-x  4 root root      4096 Aug  2 11:46 /usr/local/nagioscommon

[root@nagios local]# ll  /usr/local/nagios441B
total 20
drwxrwxr-x  2 nagios nagios 4096 Aug  6 09:40 bin
lrwxrwxrwx  1 root   root     27 Aug  3 09:15 etc -> /usr/local/nagioscommon/etc
drwxr-xr-x  2 root   root   4096 Aug  3 09:14 include
drwxrwxr-x  2 nagios nagios 4096 Aug  3 09:14 libexec
drwxrwxr-x  2 nagios nagios 4096 Aug  6 09:40 sbin
drwxrwxr-x 15 nagios nagios 4096 Aug  6 09:40 share
lrwxrwxrwx  1 root   root     27 Aug  3 09:15 var -> /usr/local/nagioscommon/var

Post by scottwilkerson »

I hadn't caught this in the first change. One more commit was made to the maint branch this morning, and I have tested that it fixes the logging of SOFT states > 1.

Post by tonymcg27 »

Woohoo, it works!!!
Thanks Scott, for putting up with my nagging :)

Post by scottwilkerson »

tonymcg27 wrote: Woohoo, it works!!!
Thanks Scott, for putting up with my nagging :)
No problem, thanks for assisting in finding the bug!