Before I start digging for a solution myself, I was hoping someone might already have one available.
My environment periodically has network issues, which result in event storms when check_by_ssh timeouts occur. Even though I have the timeout set to 30 seconds, repeated failures still produce "CRITICAL - Plugin timed out after 30 seconds" notifications. I thought about using a wrapper script to test the result and set a specific status and exit code, but thought perhaps the C source, check_by_ssh.c, could be changed to do the same thing, for example by making a timeout result in a WARNING instead.
Has anyone done any work with the C source to change the behavior on timeouts? Maybe using the extra options?
Any help would be greatly appreciated...
Danny
check_by_ssh and timeouts
Last edited by onegative on Tue Jul 24, 2018 3:35 pm, edited 1 time in total.
Re: check_by_ssh and timeouts
If you were looking for pointers on how to do this yourself, a simple set_timeout_state(STATE_UNKNOWN) in check_by_ssh and a rebuild would probably do the trick.
I think CRITICAL is a good general purpose state for that particular plugin's timeout, but your use case makes sense.
Former Nagios employee
https://www.mcapra.com/
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: check_by_ssh and timeouts
Not sure when it was added but in 2.2.1 of the plugin you can add a timeout state
Code: Select all
-t, --timeout=INTEGER:<timeout state>
Seconds before connection times out (default: 10)
Optional ":<timeout state>" can be a state integer (0,1,2,3) or a state STRING
Re: check_by_ssh and timeouts
This helped... I was able to modify utils.c as follows, hard-coding the exit status as exit (1);
The timeout message still displays CRITICAL, but the plugin exits with status 1, which triggers a WARNING response within the Nagios framework... I think this will serve what I need...
Thanks for your input...
Danny
Code: Select all
void
timeout_alarm_handler (int signo)
{
	const char msg[] = " - Plugin timed out\n";
	if (signo == SIGALRM) {
		/* printf (_("%s - Plugin timed out after %d seconds\n"),
			state_text(timeout_state), timeout_interval); */
		/* Print the label for the configured timeout state. */
		switch(timeout_state) {
			case STATE_OK:
				write(STDOUT_FILENO, "OK", 2);
				break;
			case STATE_WARNING:
				write(STDOUT_FILENO, "WARNING", 7);
				break;
			case STATE_CRITICAL:
				write(STDOUT_FILENO, "CRITICAL", 8);
				break;
			case STATE_DEPENDENT:
				write(STDOUT_FILENO, "DEPENDENT", 9);
				break;
			default:
				write(STDOUT_FILENO, "UNKNOWN", 7);
				break;
		}
		write(STDOUT_FILENO, msg, sizeof(msg) - 1);
		/* Hard-coded: always exit 1 (WARNING), whatever label was printed. */
		exit (1);
	}
}
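For anyone who would rather keep the printed label and the exit status in agreement, here is a rough, standalone sketch of the same idea. It is not the plugin's actual code: the STATE_ values are the usual Nagios exit codes, while state_label and exit_status_after_timeout are hypothetical names made up for this demo. The handler exits with whatever timeout state was configured instead of a fixed 1, and the helper forks a child that hangs until its alarm fires so the resulting exit status can be checked.

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Usual Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* State to report when the alarm fires; set before arming the alarm. */
static volatile sig_atomic_t timeout_state = STATE_CRITICAL;

/* Hypothetical helper: map a state code to its label. */
static const char *state_label(int state)
{
    switch (state) {
    case STATE_OK:       return "OK";
    case STATE_WARNING:  return "WARNING";
    case STATE_CRITICAL: return "CRITICAL";
    default:             return "UNKNOWN";
    }
}

/* Print "<LABEL> - Plugin timed out" and exit with the matching status. */
static void timeout_alarm_handler(int signo)
{
    static const char msg[] = " - Plugin timed out\n";
    if (signo != SIGALRM)
        return;
    int state = timeout_state;
    const char *label = state_label(state);
    /* write() and _exit() are used here instead of printf() and exit(). */
    ssize_t n = write(STDOUT_FILENO, label, strlen(label));
    n = write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    (void)n;
    _exit(state);
}

/* Hypothetical demo helper: fork a child that hangs until its alarm fires
   and return the child's exit status, or -1 on failure.
   seconds must be at least 1, since alarm(0) cancels the alarm. */
int exit_status_after_timeout(int state, unsigned int seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        timeout_state = state;
        signal(SIGALRM, timeout_alarm_handler);
        alarm(seconds);
        for (;;)
            pause();    /* stand-in for a hung SSH connection */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling exit_status_after_timeout(STATE_WARNING, 1) from a small main should make the child print a WARNING line after about a second and hand back its exit status, so the label and the status Nagios sees stay consistent.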
Re: check_by_ssh and timeouts
Wow I did not see the timeout state so I will give that a go...THANKS!!!!
Re: check_by_ssh and timeouts
Yeppers works like a CHARM!!!!!!!! Thanks for your help...and that makes the timeout usable for my environment.
Danny
scottwilkerson
Re: check_by_ssh and timeouts
onegative wrote:
Yeppers works like a CHARM!!!!!!!! Thanks for your help...and that makes the timeout usable for my environment.
Danny
Excellent!
Locking