Suppressing "timed out" alerts

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
cunningrat
Posts: 29
Joined: Mon Nov 18, 2013 5:51 pm

Suppressing "timed out" alerts

Post by cunningrat »

We have an issue in a client environment. They get CPU spikes, which cause Nagios checks to fail with the message "Plugin timed out while executing system call". I've increased the timeout value as much as I am comfortable with, but the timeouts still occur.

I am aware that fixing the server is the preferred solution, but as I said, that's a client environment, so fixing the server may not happen.

Is there a way to make Nagios suppress the alert if the message says "Plugin timed out"?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Suppressing "timed out" alerts

Post by abrist »

How is this check performed? Can you post the full check command? Depending on the plugin, you may be able to set an option to do so, or create a wrapper script.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
cunningrat
Posts: 29
Joined: Mon Nov 18, 2013 5:51 pm

Re: Suppressing "timed out" alerts

Post by cunningrat »

abrist wrote: How is this check performed? Can you post the full check command? Depending on the plugin, you may be able to set an option to do so, or create a wrapper script.
All of the checks are performed via check_by_ssh; the plugins run on the client side are mostly default Nagios ones or home-grown Perl scripts.
Here's a representative example:
$USER1$/check_by_ssh -H $HOSTADDRESS$ -t 45 -C "/home/nagios/scripts/ready/check_disk -w 10% -c 5% -W 40% -K 30% -p /exe_prd/temp"

I didn't find any appropriate flags in the check_by_ssh documentation.
cunningrat
Posts: 29
Joined: Mon Nov 18, 2013 5:51 pm

Re: Suppressing "timed out" alerts

Post by cunningrat »

I saw the -u flag in check_nrpe. Pity I'm not using check_nrpe.

I'm going to go post in the suggestions forum about adding a flag with that functionality to check_by_ssh. :)
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Suppressing "timed out" alerts

Post by abrist »

cunningrat wrote: I'm going to go post in the suggestions forum about adding a flag with that functionality to check_by_ssh. :)
For now, you may just want to create a wrapper script that runs your check and saves the exit code and the status/perf string. Check for "CRITICAL - Plugin timed out after" in the status string. If it matches, replace "CRITICAL" with "WARNING", "UNKNOWN" or "OK" and exit with the corresponding exit code. Otherwise, just return the status/perf string and keep the original exit code. For example:
Command:

$USER1$/check_by_ssh_custom.sh "$HOSTADDRESS$" $ARG1$ "$ARG2$"
check_by_ssh_custom.sh:

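#!/bin/bash
# Wrapper around check_by_ssh: report plugin timeouts as UNKNOWN instead of CRITICAL.
HOST=$1
TIMEOUT=$2
COMMAND=$3

# Run the real check, keeping both its output and its exit code.
OUTPUT=$(/usr/local/nagios/libexec/check_by_ssh -H "$HOST" -t "$TIMEOUT" -C "$COMMAND")
EXIT=$?
if echo "$OUTPUT" | grep -q "CRITICAL - Plugin timed out after"; then
    # Timeout: relabel the state and exit 3 (UNKNOWN).
    echo "$OUTPUT" | sed 's/CRITICAL/UNKNOWN/g'
    exit 3
else
    # Anything else passes through unchanged.
    echo "$OUTPUT"
    exit "$EXIT"
fi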
Note: The above is just an example; I have not tested this script.
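Since the wrapper is untested, it may help to try the rewriting step on its own, without an SSH target. The following is only a sketch: demote_timeout, NEW_OUTPUT and NEW_EXIT are hypothetical names, not part of Nagios or its plugins, and the sample status strings are made up for illustration. It matches on the shorter text "Plugin timed out", which is an assumption chosen so that both wordings quoted in this thread are caught.

```shell
#!/bin/bash
# Hypothetical helper (not part of Nagios): demote a plugin timeout to UNKNOWN.
# Usage: demote_timeout "<plugin output>" <plugin exit code>
# Result is left in NEW_OUTPUT and NEW_EXIT.
demote_timeout() {
    case $1 in
        *"Plugin timed out"*)
            # Assumption: any output containing this text is a timeout.
            # Relabel a leading CRITICAL and use exit code 3 (UNKNOWN).
            NEW_OUTPUT=$(printf '%s\n' "$1" | sed 's/^CRITICAL/UNKNOWN/')
            NEW_EXIT=3
            ;;
        *)
            # Not a timeout: keep the original output and exit code.
            NEW_OUTPUT=$1
            NEW_EXIT=$2
            ;;
    esac
}

# Made-up sample inputs, for illustration only.
demote_timeout "CRITICAL - Plugin timed out after 45 seconds" 2
echo "$NEW_OUTPUT (exit $NEW_EXIT)"   # UNKNOWN - Plugin timed out after 45 seconds (exit 3)

demote_timeout "DISK OK" 0
echo "$NEW_OUTPUT (exit $NEW_EXIT)"   # DISK OK (exit 0)
```

If this behaves as expected, the same case statement could replace the grep/sed pair in the wrapper. Exiting with 1 instead of 3 would turn timeouts into WARNING rather than UNKNOWN.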
cunningrat
Posts: 29
Joined: Mon Nov 18, 2013 5:51 pm

Re: Suppressing "timed out" alerts

Post by cunningrat »

I'll try that, abrist. Thanks!
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Suppressing "timed out" alerts

Post by abrist »

Alright! Let me know if you have issues. Happy scripting!