Retry Interval behaviour after upgrade to 4.4.1

Post by tonymcg27 » Wed Aug 01, 2018 10:54 pm

Hi Guys,

Most of my services are set up to run every 10 minutes, with a retry interval of 1 minute.
But after upgrading from 4.3.4 to 4.4.1, when a service detects the first soft state change, the next check is scheduled 10 minutes later, not 1 minute later.
I didn't make any changes to my nagios.cfg file during the upgrade, so maybe it needs tweaking, but I didn't see anything in the release notes that suggested that a change was necessary.
I'm just wondering if anyone else has noticed this behaviour? Is there a bug in 4.4.1, or in my cfg file(s)?
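
For reference, the scheduling that the retry interval is expected to produce can be sketched in a few lines of shell. This is only an illustration of the documented rule (retry_interval applies while a service is in a SOFT non-OK state, check_interval otherwise), not Nagios Core's actual code. The helper name next_check_delay is hypothetical, and interval_length is assumed to be the default of 60 seconds.

```shell
#!/bin/sh
# Sketch of the expected scheduling rule, NOT Nagios Core's implementation.
# Assumptions: interval_length is 60 seconds (the nagios.cfg default) and the
# service uses check_interval 10 / retry_interval 1, as described in the post.
INTERVAL_LENGTH=60
CHECK_INTERVAL=10
RETRY_INTERVAL=1

# next_check_delay STATE STATE_TYPE  (hypothetical helper)
# Prints the number of seconds until the next scheduled check.
next_check_delay() {
    state=$1       # OK, WARNING, CRITICAL or UNKNOWN
    state_type=$2  # SOFT or HARD
    if [ "$state" != "OK" ] && [ "$state_type" = "SOFT" ]; then
        # Problem not yet confirmed: re-check at the retry interval.
        echo $((RETRY_INTERVAL * INTERVAL_LENGTH))
    else
        # OK, or a confirmed (HARD) problem: normal check interval.
        echo $((CHECK_INTERVAL * INTERVAL_LENGTH))
    fi
}

next_check_delay CRITICAL SOFT   # prints 60
next_check_delay OK HARD         # prints 600
```

With these settings, the behaviour reported above corresponds to the first call yielding 600 seconds instead of 60.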

Cheers from Down Under,
Tony
tonymcg27
 
Posts: 7
Joined: Thu Jul 26, 2018 6:52 pm

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by scottwilkerson » Thu Aug 02, 2018 7:24 am

This is a known bug, and it is fixed in the maint branch.

I believe I found the cause in Core, and the fix is in the maint branch on GitHub:
https://github.com/NagiosEnterprises/na ... tree/maint
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
scottwilkerson
DevOps Engineer
 
Posts: 12054
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by tonymcg27 » Thu Aug 02, 2018 7:38 pm

Thanks, Scott, for your quick response and resolution. I have installed the "maint" release from GitHub and the check intervals are fine now.
But I've spotted another issue: most of the "SOFT;2" records seem to be missing from the nagios.log file. The only time I see a SOFT;2 record is when the state of the service changes, e.g. if the SOFT;1 record is a WARNING and the SOFT;2 record is a CRITICAL. It's not a big issue, so no hurry. Is that a known issue too?
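
One way to check for the missing records is to count the SOFT alerts per attempt number. The sketch below assumes the usual SERVICE ALERT line layout in nagios.log; the helper name count_soft_alerts is hypothetical, not part of Nagios.

```shell
#!/bin/sh
# count_soft_alerts LOGFILE ATTEMPT  (hypothetical helper)
# Counts SERVICE ALERT lines logged as SOFT with the given attempt number.
# Assumes the standard log line layout:
#   [epoch] SERVICE ALERT: host;service;STATE;SOFT;attempt;plugin output
count_soft_alerts() {
    grep -c "SERVICE ALERT: [^;]*;[^;]*;[^;]*;SOFT;$2;" "$1"
}
```

For example, `count_soft_alerts /usr/local/nagios/var/nagios.log 2` would count the SOFT;2 records, assuming the log lives in the default var directory. Comparing that figure with the count for attempt 1 shows roughly how many second-attempt records went unlogged.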
Thanks again.

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by scottwilkerson » Thu Aug 02, 2018 10:31 pm

There was one more commit to the maint branch tonight that I believe fixes this as well.

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by tonymcg27 » Sun Aug 05, 2018 7:10 pm

Hi Scott, thanks again for the quick reply. I have installed the latest "maint" code but it doesn't seem to fix the logging issue. And I've also noticed that when the checks recover I am not getting a Recovery Notification.

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by scottwilkerson » Mon Aug 06, 2018 7:15 am

I should have noted that the services that were stuck in the soft state will need to go into an OK state before they will act normally. This can happen naturally, by sending an OK passive check, or, to do them all in one go, by removing retention.dat with the following:

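Code:
# Stop Nagios first: it rewrites retention.dat on shutdown
service nagios stop
# Remove the retained state, then start with a clean slate
rm -f /usr/local/nagios/var/retention.dat
service nagios start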


The above will make all the checks go into a pending state until they receive their first check result.
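
For the passive-check route mentioned above, a minimal sketch is to write a PROCESS_SERVICE_CHECK_RESULT external command with return code 0 to the Nagios command file. This assumes external commands are enabled and that the service accepts passive checks. The helper name format_passive_ok, the example host and service names, and the command file path are all assumptions, not taken from this thread.

```shell
#!/bin/sh
# format_passive_ok HOST SERVICE [OUTPUT]  (hypothetical helper)
# Prints a PROCESS_SERVICE_CHECK_RESULT external command with return code 0
# (OK), in the form Nagios reads from its external command file.
format_passive_ok() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;0;%s\n' \
        "$(date +%s)" "$1" "$2" "${3:-OK - reset manually}"
}

# Example (commented out; host, service and command file path are assumptions):
# format_passive_ok web01 HTTP >> /usr/local/nagios/var/rw/nagios.cmd
```

Unlike removing retention.dat, this resets only the named service and leaves every other check's retained state alone.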

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by tonymcg27 » Wed Aug 08, 2018 11:40 pm

Sorry Scott, but this still isn't working. Even after removing the retention.dat file and starting afresh, it's the same behaviour. Missing SOFT;2 records from nagios.log, and no recovery notifications.
I then rolled back to v4.3.4, but using the same files from etc and var, and it works just fine.
I have set up symlinks for the etc and var directories that point to a shared directory, to make it easy to flip between versions, so I hope that's not mucking things up. The layout is:
Code:
[root@nagios local]# ll -d /usr/local/nagios*
lrwxrwxrwx  1 root root         9 Aug  8 14:05 /usr/local/nagios -> nagios441B
drwxr-xr-x  7 root root      4096 Aug  2 11:54 /usr/local/nagios434
drwxr-xr-x 10 root root      4096 Sep  9  2014 /usr/local/nagios407
drwxr-xr-x  7 root root      4096 Aug  2 11:55 /usr/local/nagios441
drwxr-xr-x  7 root root      4096 Aug  3 09:15 /usr/local/nagios441B        <--- the "maint" release
drwxr-xr-x  4 root root      4096 Aug  2 11:46 /usr/local/nagioscommon

[root@nagios local]# ll  /usr/local/nagios441B
total 20
drwxrwxr-x  2 nagios nagios 4096 Aug  6 09:40 bin
lrwxrwxrwx  1 root   root     27 Aug  3 09:15 etc -> /usr/local/nagioscommon/etc
drwxr-xr-x  2 root   root   4096 Aug  3 09:14 include
drwxrwxr-x  2 nagios nagios 4096 Aug  3 09:14 libexec
drwxrwxr-x  2 nagios nagios 4096 Aug  6 09:40 sbin
drwxrwxr-x 15 nagios nagios 4096 Aug  6 09:40 share
lrwxrwxrwx  1 root   root     27 Aug  3 09:15 var -> /usr/local/nagioscommon/var
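
Flipping between versions in a layout like this could be scripted roughly as follows. This is a sketch only: the helper name switch_nagios is hypothetical, and it assumes the /usr/local/nagios link is relative, as in the listing above.

```shell
#!/bin/sh
# switch_nagios BASE_DIR VERSION_DIR  (hypothetical helper)
# Repoints BASE_DIR/nagios at BASE_DIR/VERSION_DIR, e.g. nagios434.
switch_nagios() {
    base=$1
    target=$2
    if [ ! -d "$base/$target" ]; then
        echo "no such install: $base/$target" >&2
        return 1
    fi
    # -n stops ln from following the existing symlink into the old directory;
    # without it the new link would be created inside that directory.
    ln -sfn "$target" "$base/nagios"
}
```

A rollback would then be `service nagios stop`, `switch_nagios /usr/local nagios434`, `service nagios start`, with Nagios stopped while the link changes so that the running binary and the shared var directory stay in step.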

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by scottwilkerson » Thu Aug 09, 2018 8:10 am

I hadn't caught this in the first change. One more commit was made to the maint branch this morning, and I tested that it fixes the logging of SOFT states > 1.

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by tonymcg27 » Thu Aug 09, 2018 8:58 pm

Woohoo, it works!!!
Thanks Scott, for putting up with my nagging :)

Re: Retry Interval behaviour after upgrade to 4.4.1

Post by scottwilkerson » Fri Aug 10, 2018 8:42 am

tonymcg27 wrote: Woohoo, it works!!!
Thanks Scott, for putting up with my nagging :)


No problem, thanks for assisting in finding the bug!