Check_logfile: generating OK vs CRITICAL states

Locked
jimdurr
Posts: 11
Joined: Thu Jul 25, 2013 7:11 am

Check_logfile: generating OK vs CRITICAL states

Post by jimdurr »

I've got a command that's 90% of the way there. I'm trying to parse a log for 2 things:
1) stuck threads, this should be marked as critical
2) cleared stuck threads, this should be marked as OK

I can get the command to recognize the difference between the two when pointed at a test log, but I can't seem to figure out a way to have it look at the counts it's receiving for ok vs crit and calculate if the number of critical events is greater than the number of ok events.

Here's the command:
./check_nrpe -H test.server.com -c check_logfile -a file="c:\stdout-stderr.log" "filter=column1 like 'reported to be stuck' OR column1 like 'has been active for'" "ok=column1 like 'reported to be stuck'" "crit=column1 like 'has been active for'" top-syntax='${status} (${crit_list}): ${ok_count}/${crit_count}/${total}' 'crit=crit_count>ok_count'

Here's the output:
CRITICAL (has been active, has been active): 3/2/8

So I can see that it recognizes that there are 3 'OK' events in the test log file and 2 marked as 'critical', but it still flags the whole thing as critical and would therefore generate an alert even though there are more 'ok's than 'crit's. Is there a way to tell the command to do a 'greater than' comparison, so it only sends an alert when crit is greater than ok and sends a clear when ok is equal to crit?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Check_logfile: generating OK vs CRITICAL states

Post by cdienger »

I wasn't able to find an option with the nsclient module but was able to create a simple script to wrap the check in. Definitely room for improvement but it demonstrates the idea:

Code:

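#!/bin/bash
# Usage: ./check_nrpe.sh <remote_host>

# Run the remote check; top-syntax makes it print "<ok_count> <crit_count>".
results=$(/usr/local/nagios/libexec/check_nrpe -H "$1" -c check_logfile -a file="c:\stdout-stderr.log" "filter=column1 like 'reported to be stuck' OR column1 like 'has been active for'" "ok=column1 like 'reported to be stuck'" "crit=column1 like 'has been active for'" top-syntax='${ok_count} ${crit_count}')

# Split the two counts out of the output.
ok=$(echo "$results" | awk '{ print $1 }')
critical=$(echo "$results" | awk '{ print $2 }')

# Compare numerically with -gt; > inside [[ ]] compares strings, so "10" would sort before "9".
if [[ $critical -gt $ok ]]; then
        echo "Critical!"
        exit 2
fi

# Equal counts, or more OK events than critical ones, clear the alert.
echo "OK!"
exit 0
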
The command can then be run with:

./check_nrpe.sh ip_address_of_remote_machine
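
The decision step can also be pulled into a small function that checks its input before comparing. This is only a sketch, assuming the check prints two whitespace-separated integers (ok count first, crit count second); the function name `evaluate_counts` and the UNKNOWN branch are illustrative additions, not something from NSClient++ or check_nrpe.

```bash
#!/bin/bash

# Hypothetical helper: takes the "ok_count crit_count" string, prints a
# status word, and returns the matching Nagios exit code.
evaluate_counts() {
    local ok crit
    read -r ok crit <<< "$1"

    # Anything other than two non-negative integers is reported as UNKNOWN (3),
    # for example an error message from a failed connection.
    if ! [[ $ok =~ ^[0-9]+$ && $crit =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN: unexpected check output: $1"
        return 3
    fi

    # -gt compares numerically.
    if [[ $crit -gt $ok ]]; then
        echo "CRITICAL"
        return 2
    fi

    # Equal counts, or more ok than crit, count as cleared.
    echo "OK"
    return 0
}
```

In the wrapper this would be called as `evaluate_counts "$results"; exit $?` in place of the two `if` blocks.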
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
jimdurr
Posts: 11
Joined: Thu Jul 25, 2013 7:11 am

Re: Check_logfile: generating OK vs CRITICAL states

Post by jimdurr »

Awesome, I'll give that a shot and update. Thanks.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Check_logfile: generating OK vs CRITICAL states

Post by cdienger »

No problem :)
jimdurr
Posts: 11
Joined: Thu Jul 25, 2013 7:11 am

Re: Check_logfile: generating OK vs CRITICAL states

Post by jimdurr »

Perfect. I've got it clearing critical states. I'm going to poke at it a bit to see if I can get it to return the number of critical phrases in the log as part of the notification just so I can see if things are really blowing up or just mostly blowing up.

Thanks so much for the help!
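
For reference, one possible way to get the counts into the notification text, as a rough sketch. It assumes the two counts have already been split out as in the wrapper script; the function name `format_status` and the perfdata labels `crit` and `ok` are made up for illustration. The part after the `|` follows the usual Nagios plugin convention for performance data.

```bash
#!/bin/bash

# Hypothetical formatter: prints "<STATUS>: <crit> critical, <ok> ok | perfdata"
# and returns the matching Nagios exit code (0 = OK, 2 = CRITICAL).
format_status() {
    local ok=$1 crit=$2 status=OK code=0

    # Numeric comparison: alert only when crit events outnumber ok events.
    if [[ $crit -gt $ok ]]; then
        status=CRITICAL
        code=2
    fi

    echo "$status: $crit critical, $ok ok | crit=$crit ok=$ok"
    return "$code"
}
```

Called as `format_status "$ok" "$critical"; exit $?`, the status line shows both counts, so it is easier to tell a single stuck thread from a pile of them.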
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Check_logfile: generating OK vs CRITICAL states

Post by cdienger »

Glad to hear it's useful.