Suppressing "timed out" alerts
Posted: Thu Apr 10, 2014 8:53 am
by cunningrat
We have an issue in a client environment. They get CPU spikes, which cause Nagios checks to fail with the "Plugin timed out while executing system call" error. I've increased the timeout value as much as I am comfortable with, but the timeouts still occur.
I am aware that fixing the server is the preferred solution, but as I said, that's a client environment, so fixing the server may not happen.
Is there a way to make Nagios suppress the alert if the message says "Plugin timed out"?
Re: Suppressing "timed out" alerts
Posted: Thu Apr 10, 2014 11:12 am
by abrist
How is this check performed? Can you post the full check command? Depending on the plugin, you may be able to set an option to do so, or create a wrapper script.
Re: Suppressing "timed out" alerts
Posted: Thu Apr 10, 2014 11:28 am
by cunningrat
abrist wrote:How is this check performed? Can you post the full check command? Depending on the plugin, you may be able to set an option to do so, or create a wrapper script.
All of the checks are performed via check_by_ssh: the plugins hit on the client side are mostly default Nagios ones, or home-grown perl scripts.
Here's a representative example:
$USER1$/check_by_ssh -H $HOSTADDRESS$ -t 45 -C "/home/nagios/scripts/ready/check_disk -w 10% -c 5% -W 40% -K 30% -p /exe_prd/temp"
I didn't find any appropriate flags in the check_by_ssh documentation.
Re: Suppressing "timed out" alerts
Posted: Thu Apr 10, 2014 1:03 pm
by cunningrat
I saw the -u flag in check_nrpe. Pity I'm not using check_nrpe.
I'm going to go post in the suggestions forum about adding a flag with that functionality to check_by_ssh.

Re: Suppressing "timed out" alerts
Posted: Fri Apr 11, 2014 9:54 am
by abrist
cunningrat wrote:
I'm going to go post in the suggestions forum about adding a flag with that functionality to check_by_ssh.

For now, you may just want to create a wrapper script that runs your check and saves the exit code and status/perf string. Check for "CRITICAL - Plugin timed out after" in the status string. If it matches, replace "CRITICAL" with "WARNING", "UNKNOWN" or "OK" and exit with the corresponding exit code (1, 3 or 0). Otherwise, just return the status/perf string and keep the original exit code. For example:
Command:
$USER1$/check_by_ssh_custom.sh "$HOSTADDRESS$" $ARG1$ "$ARG2$"
check_by_ssh_custom.sh:
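#!/bin/bash
# Wrapper around check_by_ssh that downgrades plugin timeouts to UNKNOWN.
HOST=$1
TIMEOUT=$2
COMMAND=$3
# Run the real check, capturing its status/perf string and exit code.
OUTPUT=$(/usr/local/nagios/libexec/check_by_ssh -H "$HOST" -t "$TIMEOUT" -C "$COMMAND")
EXIT=$?
if echo "$OUTPUT" | grep -q "CRITICAL - Plugin timed out after"; then
    # Timeout: relabel the status text and exit 3 (UNKNOWN).
    echo "$OUTPUT" | sed 's/CRITICAL/UNKNOWN/g'
    exit 3
else
    # Any other result passes through unchanged.
    echo "$OUTPUT"
    exit "$EXIT"
fi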
Note: The above is just an example; I have not tested this script.
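If you want to try the matching logic before pointing it at a real host, one option is to pull it into a small function and feed it sample strings. The sketch below is only that: remap_timeout is a hypothetical helper name, not part of Nagios or check_by_ssh, and it assumes the timeout text is exactly "CRITICAL - Plugin timed out after", so adjust the pattern to whatever your plugins actually print. The optional third argument picks the replacement state and defaults to UNKNOWN.

```shell
#!/bin/bash
# Sketch only: remap_timeout is a hypothetical helper, not part of Nagios.
# Usage: remap_timeout "<plugin output>" <plugin exit code> [OK|WARNING|UNKNOWN]
remap_timeout() {
    local output
    local status
    local new_state
    local new_code
    output=$1
    status=$2
    new_state=${3:-UNKNOWN}

    # Standard Nagios plugin exit codes for the states we allow.
    case "$new_state" in
        OK)      new_code=0 ;;
        WARNING) new_code=1 ;;
        UNKNOWN) new_code=3 ;;
        *)
            echo "remap_timeout: unsupported state '$new_state'" >&2
            return 3
            ;;
    esac

    case "$output" in
        *"CRITICAL - Plugin timed out after"*)
            # Timeout: relabel the status text and return the matching code.
            printf '%s\n' "$output" | sed "s/CRITICAL/$new_state/g"
            return "$new_code"
            ;;
    esac

    # Anything else passes through with its original exit code.
    printf '%s\n' "$output"
    return "$status"
}
```

Calling remap_timeout "CRITICAL - Plugin timed out after 45 seconds" 2 WARNING prints "WARNING - Plugin timed out after 45 seconds" and returns 1, while a genuine CRITICAL from check_disk keeps its text and its exit code of 2.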
Re: Suppressing "timed out" alerts
Posted: Mon Apr 14, 2014 3:10 pm
by cunningrat
I'll try that, abrist. Thanks!
Re: Suppressing "timed out" alerts
Posted: Mon Apr 14, 2014 4:18 pm
by abrist
Alright! Let me know if you have issues. Happy scripting!