We have an issue in a client environment. They get CPU spikes, which cause Nagios checks to fail with the error "Plugin timed out while executing system call". I've increased the timeout value as much as I am comfortable with, but the timeouts still occur.
I am aware that fixing the server is the preferred solution, but as I said, that's a client environment, so fixing the server may not happen.
Is there a way to make Nagios suppress the alert if the message says "Plugin timed out"?
Suppressing "timed out" alerts
Re: Suppressing "timed out" alerts
How is this check performed? Can you post the full check command? Depending on the plugin, you may be able to set an option to do so, or create a wrapper script.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
cunningrat
- Posts: 29
- Joined: Mon Nov 18, 2013 5:51 pm
Re: Suppressing "timed out" alerts
abrist wrote:How is this check performed? Can you post the full check command? Depending on the plugin, you may be able to set an option to do so, or create a wrapper script.
All of the checks are performed via check_by_ssh: the plugins hit on the client side are mostly default Nagios ones, or home-grown Perl scripts.
Here's a representative example:
$USER1$/check_by_ssh -H $HOSTADDRESS$ -t 45 -C "/home/nagios/scripts/ready/check_disk -w 10% -c 5% -W 40% -K 30% -p /exe_prd/temp"
I didn't find any appropriate flags in the check_by_ssh documentation.
-
cunningrat
- Posts: 29
- Joined: Mon Nov 18, 2013 5:51 pm
Re: Suppressing "timed out" alerts
I saw the -u flag in check_nrpe. Pity I'm not using check_nrpe.
I'm going to go post in the suggestions forum about adding a flag with that functionality to check_by_ssh.
Re: Suppressing "timed out" alerts
cunningrat wrote: I'm going to go post in the suggestions forum about adding a flag with that functionality to check_by_ssh.
For now, you may just want to create a wrapper script that runs your check and saves the exit code and status/perf string. Check for "CRITICAL - Plugin timed out after" in the status string; if it matches, replace "CRITICAL" with "WARNING", "UNKNOWN" or "OK" and exit with the corresponding exit code. Otherwise, just return the status/perf string and keep the original exit code. For example:
Command:
Code: Select all
$USER1$/check_by_ssh_custom.sh "$HOSTADDRESS$" $ARG1$ "$ARG2$"
Script:
Code: Select all
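#!/bin/bash
# Wrapper around check_by_ssh: report plugin timeouts as UNKNOWN instead of CRITICAL.
HOST=$1
TIMEOUT=$2
COMMAND=$3
OUTPUT=$(/usr/local/nagios/libexec/check_by_ssh -H "$HOST" -t "$TIMEOUT" -C "$COMMAND")
EXIT=$?
if echo "$OUTPUT" | grep -q "CRITICAL - Plugin timed out after"; then
    # Timeout: rewrite the status text and exit 3 (UNKNOWN).
    echo "$OUTPUT" | sed 's/CRITICAL/UNKNOWN/g'
    exit 3
else
    # Anything else: pass the output and exit code through unchanged.
    echo "$OUTPUT"
    exit $EXIT
fi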
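If you want to try the remapping without waiting for a real timeout, a rough sketch along these lines may help. It is not part of Nagios: the function name remap_timeout and the sample status strings are made up for illustration. Unlike the wrapper above, which greps and seds the whole string, this variant only matches "CRITICAL - Plugin timed out" at the start of the output and only rewrites that leading word.

```shell
#!/bin/bash
# Hypothetical helper (illustrative name, not shipped with Nagios).
# $1 = plugin output, $2 = plugin exit code.
# Prints the (possibly rewritten) output and returns the exit code to use.
remap_timeout() {
    local output=$1
    local status=$2
    case "$output" in
        "CRITICAL - Plugin timed out"*)
            # Swap only the leading status word, then report UNKNOWN (3).
            printf '%s\n' "UNKNOWN${output#CRITICAL}"
            return 3
            ;;
        *)
            # Not a timeout: pass output and exit code through unchanged.
            printf '%s\n' "$output"
            return "$status"
            ;;
    esac
}

# Canned samples; the strings below are invented test input.
rc=0
remap_timeout "CRITICAL - Plugin timed out after 45 seconds" 2 || rc=$?
echo "exit code: $rc"

rc=0
remap_timeout "DISK OK - free space: /exe_prd/temp 1024 MB (50%)" 0 || rc=$?
echo "exit code: $rc"
```

In a real wrapper you would call it with "$OUTPUT" and "$EXIT" from the check_by_ssh run and then exit with its return code. Change the 3 to 1 or 0 if you would rather see WARNING or OK, and adjust the replacement word to match.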
-
cunningrat
- Posts: 29
- Joined: Mon Nov 18, 2013 5:51 pm
Re: Suppressing "timed out" alerts
I'll try that, abrist. Thanks!
Re: Suppressing "timed out" alerts
Alright! Let me know if you have issues. Happy scripting!