Check_logfile: generating OK vs CRITICAL states

Posted: Wed Dec 12, 2018 10:44 am
by jimdurr
I've got a command that's 90% of the way there. I'm trying to parse a log for 2 things:
1) stuck threads, this should be marked as critical
2) cleared stuck threads, this should be marked as OK

I can get the command to tell the two apart when pointed at a test log, but I can't figure out how to have it compare the counts it gets for ok vs crit and work out whether the number of critical events is greater than the number of ok events.

Here's the command:
./check_nrpe -H test.server.com -c check_logfile -a file="c:\stdout-stderr.log" "filter=column1 like 'reported to be stuck' OR column1 like 'has been active for'" "ok=column1 like 'reported to be stuck'" "crit=column1 like 'has been active for'" top-syntax='${status} (${crit_list}): ${ok_count}/${crit_count}/${total}' 'crit=crit_count>ok_count'

Here's the output:
CRITICAL (has been active, has been active): 3/2/8

So I can see that it recognizes 3 'OK' events and 2 'critical' events in the test log file, but it still flags the whole thing as critical and would therefore generate an alert even though there are more OKs than crits. Is there a way to tell the command to do a 'greater than' comparison, so that it only sends an alert when crit is greater than ok and sends a clear when ok is equal to crit?

Re: Check_logfile: generating OK vs CRITICAL states

Posted: Wed Dec 12, 2018 5:22 pm
by cdienger
I wasn't able to find an option for this in the NSClient++ module, but I was able to create a simple script to wrap the check in. There's definitely room for improvement, but it demonstrates the idea:

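#!/bin/bash
# Usage: check_nrpe.sh <host>

# Ask the remote check for just the two counts, e.g. "3 2"
results=$(/usr/local/nagios/libexec/check_nrpe -H "$1" -c check_logfile -a file="c:\stdout-stderr.log" "filter=column1 like 'reported to be stuck' OR column1 like 'has been active for'" "ok=column1 like 'reported to be stuck'" "crit=column1 like 'has been active for'" top-syntax='${ok_count} ${crit_count}')

ok=$(echo "$results" | awk '{ print $1 }')
critical=$(echo "$results" | awk '{ print $2 }')

# Anything other than two whole numbers (e.g. a connection error) is UNKNOWN
if ! [[ $ok =~ ^[0-9]+$ && $critical =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN: unexpected output: $results"
        exit 3
fi

# -gt compares numerically; > inside [[ ]] would compare as strings
if [[ $critical -gt $ok ]]; then
        echo "Critical!"
        exit 2
fi

# Equal counts are treated as cleared
echo "OK!"
exit 0

The command can then be run with: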

./check_nrpe.sh ip_address_of_remote_machine

Re: Check_logfile: generating OK vs CRITICAL states

Posted: Wed Dec 12, 2018 5:25 pm
by jimdurr
Awesome, I'll give that a shot and update. Thanks.

Re: Check_logfile: generating OK vs CRITICAL states

Posted: Thu Dec 13, 2018 1:13 pm
by cdienger
No problem :)

Re: Check_logfile: generating OK vs CRITICAL states

Posted: Fri Dec 14, 2018 4:55 pm
by jimdurr
Perfect. I've got it clearing critical states. I'm going to poke at it a bit to see if I can get it to return the number of critical phrases in the log as part of the notification just so I can see if things are really blowing up or just mostly blowing up.
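
A minimal sketch of one way that could look, assuming the wrapper still gets the two counts back as "ok_count crit_count" from top-syntax. The function name report_counts is made up for illustration, and the labels in the performance data after the pipe are an assumption, not something NSClient++ produces:

```shell
#!/bin/bash
# Hypothetical helper: takes the "ok_count crit_count" string returned by
# check_nrpe and prints a status line that includes both counts.
report_counts() {
        local ok critical
        read -r ok critical <<< "$1"

        # Anything that is not exactly two whole numbers is reported as UNKNOWN (3).
        if ! [[ $ok =~ ^[0-9]+$ && $critical =~ ^[0-9]+$ ]]; then
                echo "UNKNOWN: unexpected output: $1"
                return 3
        fi

        # 10# forces base 10 so a count is never read as octal.
        if (( 10#$critical > 10#$ok )); then
                echo "CRITICAL: $critical critical / $ok ok | crit=$critical ok=$ok"
                return 2
        fi

        # Equal counts are treated as cleared.
        echo "OK: $critical critical / $ok ok | crit=$critical ok=$ok"
        return 0
}
```

In the wrapper this would replace the two if blocks, called as report_counts "$results" followed by exit $?, so the notification text carries the counts along with the state.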

Thanks so much for the help!

Re: Check_logfile: generating OK vs CRITICAL states

Posted: Mon Dec 17, 2018 11:00 am
by cdienger
Glad to hear it's useful.