
check_logfiles and duration of --sticky option

Posted: Mon Oct 05, 2015 10:20 am
by starless
Hi, I'm using the check_logfiles v3.4.2 plugin to check for errors in syslog messages on Linux machines, and I cannot make the --sticky option work as I would expect to.

After detecting an error in the logs, it keeps the alert status for a few minutes but then returns to OK, even though no okpattern has matched.

I tried just "--sticky", but also "--sticky=0" and "--sticky=90000". The result always seems to be the same.

How can the sticky duration be set? Am I missing anything?
Thanks.
Marco

Re: check_logfiles and duration of --sticky option

Posted: Mon Oct 05, 2015 2:29 pm
by hsmith
Have you tried updating to the latest version of the plugin? Looking at the project page, there have been quite a few releases since the version you are using.

Re: check_logfiles and duration of --sticky option

Posted: Mon Oct 05, 2015 5:33 pm
by starless
Thank you, but I've read the release notes and I see no changes to the --sticky option after v3.4.2, so I doubt the behaviour will be any different... I can try it anyway on a test machine, but I'm not sure I'll be able to deploy the new version to the production machine.

Re: check_logfiles and duration of --sticky option

Posted: Tue Oct 06, 2015 5:19 am
by starless
I downloaded the latest version 3.7.3 and tested it on a test machine, but the result is still the same: the alarm seems to be reset after a few minutes, whatever option I use.

Any clues?
Thanks.
Marco

Re: check_logfiles and duration of --sticky option

Posted: Tue Oct 06, 2015 3:23 pm
by hsmith
Since this is not our plugin and we have no direct connection with its author, it is pretty difficult for us to troubleshoot. Can you try a value like 4000 seconds? Maybe there is a limit I am not aware of.

Re: check_logfiles and duration of --sticky option

Posted: Wed Oct 07, 2015 10:44 am
by starless
Problem solved!
My fault... I didn't realize that another machine was running check_logfiles on my test machine via NRPE without the --sticky option, and that this was resetting the sticky state set by the command I was running locally.
I stopped the automated check, and now my local tests give the expected results, with version 3.4.2 as well.

For the record, what helped me debug this was the -v option (available only in the newer version) and the debug output the plugin writes to /tmp/check_logfiles.trace, provided you create that file beforehand.
I can now also see that by default, with just "--sticky" and no numeric value, alarms stay sticky for 10 years (!!).
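To make the failure mode described above concrete, here is a minimal Python model of sticky alert state as it behaved in this thread. It is not check_logfiles' own code (the plugin is a Perl script that persists its state between runs); the `StickyState` class and `run_check` function are hypothetical, and the rules are assumptions inferred from the posts: an alert stays raised until an okpattern matches or the sticky period expires, a new error match restarts the timer, and a run without --sticky drops the remembered alert, which is what the extra NRPE-driven check was doing.

```python
from dataclasses import dataclass
from typing import Optional

# Nagios-style exit codes
OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class StickyState:
    """Hypothetical stand-in for the state the plugin keeps between runs."""
    level: int = OK
    since: Optional[float] = None  # time the sticky alert was (re)raised


def run_check(state: StickyState, now: float, matched_level: int,
              ok_matched: bool = False, sticky: Optional[float] = None) -> int:
    """Simulate one invocation and return the reported status.

    matched_level: worst level matched in new log lines since the last run.
    ok_matched:    whether an okpattern matched in the new lines.
    sticky:        None means --sticky was not given; otherwise the maximum
                   number of seconds an alert stays raised (assumed semantics).
    """
    if sticky is None:
        # Assumption: a run without --sticky reports only new matches and
        # forgets any remembered alert, so another caller sharing the same
        # state silently resets it.
        state.level, state.since = OK, None
        return matched_level
    if ok_matched:
        # An okpattern match clears the sticky alert.
        state.level, state.since = OK, None
    if state.since is not None and now - state.since > sticky:
        # The sticky period has expired.
        state.level, state.since = OK, None
    if matched_level > OK:
        # Assumption: a new error match raises the level and restarts the timer.
        state.level = max(state.level, matched_level)
        state.since = now
    return state.level


if __name__ == "__main__":
    s = StickyState()
    print(run_check(s, now=0, matched_level=CRITICAL, sticky=3600))  # alert raised
    print(run_check(s, now=300, matched_level=OK, sticky=3600))      # still sticky
    print(run_check(s, now=600, matched_level=OK))                   # run without --sticky
    print(run_check(s, now=900, matched_level=OK, sticky=3600))      # alert is gone
```

The last two calls reproduce the symptom from the original question: the sticky alert disappears within minutes, not because the sticky period ran out, but because an unrelated invocation without --sticky cleared the shared state.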

Thank you for pushing me to keep testing ;).
Marco

Re: check_logfiles and duration of --sticky option

Posted: Wed Oct 07, 2015 1:07 pm
by hsmith
Awesome! I'm glad to hear it is working. I was a little concerned, because people seem to have very few issues with that plugin. I'll go ahead and close this thread. Please let us know if you need help with anything else.

Thanks.