check_by_ssh and timeouts

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
onegative
Posts: 175
Joined: Tue Feb 17, 2015 12:06 pm

check_by_ssh and timeouts

Post by onegative »

Before I dive deep into finding a solution, I was hoping someone might already have one available.
My environment periodically has network issues, which result in event storms when check_by_ssh timeouts occur. Even though I have the timeout set to 30 seconds, multiple failures still produce "CRITICAL - Plugin timed out after 30 seconds" notifications. I thought about using a wrapper script to test the results and set a specific status and exit code, but thought perhaps there was a way to modify the C source, check_by_ssh.c, to get the same behavior, perhaps making a timeout result in a Warning instead.

Has anyone modified the C source to change how timeouts are handled? Maybe by using the extra options?

Any help would be greatly appreciated...
Danny
Last edited by onegative on Tue Jul 24, 2018 3:35 pm, edited 1 time in total.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: check_by_ssh and timeouts

Post by mcapra »

If you were looking for pointers on how to do this yourself, a simple set_timeout_state(STATE_UNKNOWN) in check_by_ssh and a rebuild would probably do the trick.

I think CRITICAL is a good general purpose state for that particular plugin's timeout, but your use case makes sense.
Former Nagios employee
https://www.mcapra.com/
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: check_by_ssh and timeouts

Post by scottwilkerson »

I'm not sure when it was added, but in version 2.2.1 of the plugin you can append a timeout state to the -t option:

Code: Select all

 -t, --timeout=INTEGER:<timeout state>
    Seconds before connection times out (default: 10)
    Optional ":<timeout state>" can be a state integer (0,1,2,3) or a state STRING
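
To make the "INTEGER:<timeout state>" format from the help text concrete, here is a minimal sketch of how such an argument could be split into seconds and an exit state. It is not the plugin's actual parsing code: parse_state and parse_timeout_spec are hypothetical helpers, and the case-insensitive name matching is an assumption.

```c
#include <stdlib.h>
#include <strings.h>

/* Standard Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Hypothetical helper (not from the plugin source): map "0".."3" or a
 * state name to its exit code. Returns -1 if the text is not recognized.
 * Assumption: names are matched case-insensitively. */
int parse_state(const char *s)
{
    static const char *const names[] = { "OK", "WARNING", "CRITICAL", "UNKNOWN" };
    if (s[0] >= '0' && s[0] <= '3' && s[1] == '\0')
        return s[0] - '0';
    for (int i = 0; i < 4; i++)
        if (strcasecmp(s, names[i]) == 0)
            return i;
    return -1;
}

/* Hypothetical helper: split "SECONDS[:STATE]".
 * Returns 0 on success and -1 on malformed input.
 * *state is left unchanged when no ":STATE" suffix is given, so the
 * caller's default (CRITICAL for this plugin, per the thread) survives. */
int parse_timeout_spec(const char *spec, int *seconds, int *state)
{
    char *end;
    long secs = strtol(spec, &end, 10);

    if (end == spec || secs <= 0)
        return -1;                    /* no leading positive integer */
    if (*end == ':') {
        int st = parse_state(end + 1);
        if (st < 0)
            return -1;                /* unknown state after the colon */
        *state = st;
    } else if (*end != '\0') {
        return -1;                    /* trailing junk after the number */
    }
    *seconds = (int)secs;
    return 0;
}
```

With this sketch, "30:WARNING" and "30:1" both yield 30 seconds and exit state 1, while a plain "30" keeps whatever default state the caller passed in. On the command line, the option from the help text would be given as, for example, -t 30:WARNING.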
Former Nagios employee
onegative
Posts: 175
Joined: Tue Feb 17, 2015 12:06 pm

Re: check_by_ssh and timeouts

Post by onegative »

This helped. I was able to modify utils.c as follows, hard-coding the exit code with exit (1):

Code: Select all

void
timeout_alarm_handler (int signo)
{
        const char msg[] = " - Plugin timed out\n";
        if (signo == SIGALRM) {
/*              printf (_("%s - Plugin timed out after %d seconds\n"),
                                                state_text(timeout_state), timeout_interval); */
                switch(timeout_state) {
                        case STATE_OK:
                                write(STDOUT_FILENO, "OK", 2);
                                break;
                        case STATE_WARNING:
                                write(STDOUT_FILENO, "WARNING", 7);
                                break;
                        case STATE_CRITICAL:
                                write(STDOUT_FILENO, "CRITICAL", 8);
                                break;
                        case STATE_DEPENDENT:
                                write(STDOUT_FILENO, "DEPENDENT", 9);
                                break;
                        default:
                                write(STDOUT_FILENO, "UNKNOWN", 7);
                                break;
                }
                write(STDOUT_FILENO, msg, sizeof(msg) - 1);
                /* Hard-coded: always exit 1 (WARNING), whatever timeout_state is. */
                exit (1);
        }
}
With this change the timeout output still displays CRITICAL, but the plugin exits with code 1, which Nagios treats as a Warning. I think this will serve what I need...
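
Because the patch above prints the text for timeout_state but always exits 1, the displayed state and the exit code can disagree. A minimal sketch of a variant that keeps them in sync follows, assuming the handler should exit with whatever state is configured. The timeout_state global here only stands in for the plugin's own variable, and exit_code_on_timeout is a hypothetical helper that exists purely to demonstrate the behavior in a child process.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Standard Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Assumption: stand-in for the plugin's timeout_state global. */
static volatile sig_atomic_t timeout_state = STATE_CRITICAL;

/* write() wrapper that deliberately ignores the result. */
static void emit(const char *s, size_t n)
{
    ssize_t r = write(STDOUT_FILENO, s, n);
    (void)r;
}

/* Print "<STATE> - Plugin timed out" and exit with the matching code.
 * Only write() and _exit() are called, both async-signal-safe. */
static void timeout_alarm_handler(int signo)
{
    static const struct { const char *s; size_t n; } text[] = {
        { "OK", 2 }, { "WARNING", 7 }, { "CRITICAL", 8 }, { "UNKNOWN", 7 }
    };
    static const char msg[] = " - Plugin timed out\n";
    int st = timeout_state;

    if (signo != SIGALRM)
        return;
    if (st < STATE_OK || st > STATE_UNKNOWN)
        st = STATE_UNKNOWN;           /* anything unexpected becomes UNKNOWN */
    emit(text[st].s, text[st].n);
    emit(msg, sizeof(msg) - 1);
    _exit(st);                        /* exit code matches the printed state */
}

/* Hypothetical demo helper: fire the handler in a child process and
 * return the child's exit code, or -1 on error. */
int exit_code_on_timeout(int state)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        timeout_state = state;
        signal(SIGALRM, timeout_alarm_handler);
        raise(SIGALRM);               /* simulate the alarm expiring */
        _exit(127);                   /* reached only if the handler did not exit */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exit_code_on_timeout(STATE_WARNING) makes the child print "WARNING - Plugin timed out" and exit with code 1, so the text and the state Nagios sees agree.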

Thanks for your input...
Danny
onegative
Posts: 175
Joined: Tue Feb 17, 2015 12:06 pm

Re: check_by_ssh and timeouts

Post by onegative »

Wow I did not see the timeout state so I will give that a go...THANKS!!!!
onegative
Posts: 175
Joined: Tue Feb 17, 2015 12:06 pm

Re: check_by_ssh and timeouts

Post by onegative »

Yeppers works like a CHARM!!!!!!!! Thanks for your help...and that makes the timeout usable for my environment.

Danny
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: check_by_ssh and timeouts

Post by scottwilkerson »

onegative wrote:Yeppers works like a CHARM!!!!!!!! Thanks for your help...and that makes the timeout usable for my environment.

Danny
Excellent!

Locking