Hi, I'm using the check_logfiles v3.4.2 plugin to check for errors in syslog messages on Linux machines, and I cannot make the --sticky option work as I would expect.
Upon detecting an error in logs it keeps the alert status for a few minutes, but then goes back to OK even without matching an okpattern.
I tried just "--sticky", but also "--sticky=0" and "--sticky=90000". The result appears to be always the same.
How can the sticky duration be set? Am I missing anything?
Thanks.
Marco
check_logfiles and duration of --sticky option
Re: check_logfiles and duration of --sticky option
Have you tried updating to the latest version of the plugin? Looking at their page, it looks like there have been quite a few releases since the version you are using.
Former Nagios Employee.
Re: check_logfiles and duration of --sticky option
Thank you, but I've read the release notes and I see no changes to the --sticky option after v3.4.2, so I doubt I'll get a different behaviour... I can try anyway on a test machine, but I'm not sure I'll be able to deploy the new version to the production machine.
Re: check_logfiles and duration of --sticky option
I downloaded the latest version 3.7.3 and tested it on a test machine, but the result is still the same: the alarm seems to be reset after a few minutes, whatever option I use.
Any clues?
Thanks.
Marco
Re: check_logfiles and duration of --sticky option
Since this is not our plugin and we have no direct connection with the author, it is pretty difficult for us to troubleshoot. Can you try something like 4000 seconds? Maybe there is a limit I am not aware of.
Former Nagios Employee.
Re: check_logfiles and duration of --sticky option
Problem solved!
My fault... I didn't realize that another machine was launching check_logfiles via nrpe on my test machine without the --sticky option, and that check was resetting the sticky state I was setting when I ran the command locally.
I stopped the automated check and now my local tests give the expected results, also with version 3.4.2.
For the record, I was helped in debugging by the -v option (only available in the newer version) and by the debug output the plugin creates in /tmp/check_logfiles.trace if you create the file beforehand.
Now I can also see that the default behaviour with just "--sticky", no numeric values, will keep alarms sticky for 10 years (!!).
Thank you for pushing me to keep testing.
Marco
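For anyone who lands here with the same symptom, the root cause described above can be illustrated with a toy model. This is only a sketch: it assumes that both invocations share the same saved state and that a run without --sticky simply forgets earlier errors, which is what the behaviour reported in this thread suggests. It is not the plugin's actual code, and `SavedState`, `run_check` and `simulate` are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

OK, CRITICAL = 0, 2  # Nagios-style exit codes


@dataclass
class SavedState:
    """Toy stand-in for whatever a check persists between runs (hypothetical)."""
    last_status: int = OK
    last_error_time: Optional[float] = None


def run_check(state: SavedState, now: float, new_errors: bool,
              sticky: Optional[float] = None) -> int:
    """Simulate one run of a log check against shared saved state.

    sticky=None means the option was not given; a number is the
    lifetime of a remembered error in seconds.
    """
    if new_errors:
        # A fresh match always alerts and is remembered.
        state.last_status = CRITICAL
        state.last_error_time = now
        return CRITICAL
    still_sticky = (
        sticky is not None
        and state.last_status == CRITICAL
        and state.last_error_time is not None
        and now - state.last_error_time < sticky
    )
    if still_sticky:
        return CRITICAL  # keep reporting the remembered error
    # No sticky option (or lifetime expired): the remembered error is dropped.
    state.last_status = OK
    state.last_error_time = None
    return OK


def simulate(with_nrpe_check: bool) -> List[int]:
    """Return the results seen by the local --sticky=90000 runs."""
    state = SavedState()
    results = [run_check(state, 0, True, sticky=90000)]
    results.append(run_check(state, 60, False, sticky=90000))
    if with_nrpe_check:
        # The forgotten automated check: same state, no sticky option.
        run_check(state, 120, False, sticky=None)
    results.append(run_check(state, 180, False, sticky=90000))
    return results


if __name__ == "__main__":
    print("local runs only:    ", simulate(with_nrpe_check=False))
    print("with non-sticky run:", simulate(with_nrpe_check=True))
```

In this model the local runs stay CRITICAL as long as nothing else touches the state, and drop back to OK right after the non-sticky run, whatever sticky value the local command uses. That matches the symptom of the alert disappearing after a few minutes regardless of the value passed to --sticky.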
Re: check_logfiles and duration of --sticky option
Awesome! I'm glad to hear it is working. I was a little bit concerned because people seem to have very few issues with that plugin. I'll go ahead and close this thread. Please let us know if you need help with anything else.
Thanks.
Former Nagios Employee.