Nagios Support Forum

Posted: **Fri Jan 20, 2017 11:53 am**

Hello All!

I need to modify the check_ntp_time check (in plugin version 2.2.0) so that it does not alert (returns status OK) on a socket timeout.

The servers we are using this check on often time out, and we are getting a lot of alerts we don't need or want.

I am aware of the following:
* This is an odd request
* You can mute the alert emails for any check you wish in Nagios
* Maybe I should figure out why the NTP check is timing out and fix the root cause
* There is a "-q" flag you can add to the check
* This check is written in C and you have to compile it before using it

All I need is for someone with more C knowledge than I have to modify the source code and then tell or show me where the changes were made, so that I can do the same on my end.

I already tried modifying this (line 566):

```c
offset = offset_request(server_address, &offset_result);
if (offset_result == STATE_UNKNOWN) {
        /* -q (quiet) reports UNKNOWN instead of CRITICAL when no offset was obtained */
        result = (quiet == 1 ? STATE_UNKNOWN : STATE_CRITICAL);
```

To this:

```c
offset = offset_request(server_address, &offset_result);
if (offset_result == STATE_OK) {
        result = (quiet == 1 ? STATE_OK : STATE_OK);
```

But this did not work.

Posted: **Fri Jan 20, 2017 2:37 pm**

Does quiet mode not produce the desired effect?

It's hard for me to navigate the plugin without knowing specifically what outputs you need changed. Could you share your full check_command definition you're currently leveraging (sanitize as needed) as well as some of the CRITICAL check outputs you would like to instead be recognized as "OK" or maybe "UNKNOWN"?

Posted: **Fri Jan 20, 2017 2:52 pm**

What is the exact status that the service gives when you hit the timeout? Timeouts are not handled as straightforwardly as you might think, and, if my assumptions about the message you are seeing are correct, they are handled by Core. When you see "Service check timed out after X seconds" as the status, that means the plugin self-terminated after 10 seconds (by default) and Core detected this, producing that message and the CRITICAL status. The way around this would be to patch Core to return OK instead, but that would affect all timeouts, not just this one service.

Note that this is different from the service_check_timeout_state option in nagios.cfg; that option is used for cases where a plugin does not have its own internal timeout, so that checks don't run forever.

Posted: **Fri Jan 20, 2017 3:07 pm**

Or, you can specify both the timeout and the status to return when it expires, like this: -t 10:OK. The status can be given by name or by number: OK (0), WARNING (1), CRITICAL (2), or UNKNOWN (3).

Posted: **Fri Jan 20, 2017 3:10 pm**

Or just do what @jfrickson said :) I totally forgot about the status option on that flag.

Posted: **Fri Jan 20, 2017 3:49 pm**

jfrickson, this suggestion seems to be what I need; however, it does not appear to be working. Here are some sample commands and outputs. If you could let me know what I am doing wrong, it would be very helpful.

```
# ./check_ntp_time -t 10:OK -H us.pool.ntp.org -w 1 -c 3
CRITICAL - Socket timeout after 10 seconds
```

```
# ./check_ntp_time -t 10:0 -H us.pool.ntp.org -w 1 -c 3
CRITICAL - Socket timeout after 10 seconds
```

```
# ./check_ntp_time -V
check_ntp_time v2051 (nagios-plugins 1.4.13)
```

So it seems that no matter what I do with the -t option, I still get a CRITICAL state rather than the OK I want. Trying the -q option also does nothing: I get the exact same output, "CRITICAL - Socket timeout after 10 seconds", as I do without it.

Posted: **Fri Jan 20, 2017 4:31 pm**

Version 1.4.13? That's over eight years old! That format for the -t timeout parameter did not exist way back then.

Go to plugins/netutils.c and change this:

```c
/* handles socket timeouts */
void
socket_timeout_alarm_handler (int sig)
{
	if (sig == SIGALRM)
		printf (_("CRITICAL - Socket timeout after %d seconds\n"), socket_timeout);
	else
		printf (_("CRITICAL - Abnormal timeout after %d seconds\n"), socket_timeout);

	exit (STATE_CRITICAL);
}
```

to this:

```c
/* handles socket timeouts */
void
socket_timeout_alarm_handler (int sig)
{
	if (sig == SIGALRM)
		printf (_("OK - Socket timeout after %d seconds\n"), socket_timeout);
	else
		printf (_("OK - Abnormal timeout after %d seconds\n"), socket_timeout);

	exit (STATE_OK);
}
```

Posted: **Fri Jan 20, 2017 4:55 pm**

jfrickson, I am so sorry. I downloaded the latest check from Nagios but apparently did not copy the right file to the right location. My version is now "check_ntp_time v2.2.0 (nagios-plugins 2.2.0)", and when I try the -t flag I now get:

```
# ./check_ntp_time -t 15:0 -H us.pool.ntp.org -w 1 -c 3
OK - Socket timeout
```

So your suggestion for using the -t flag was spot on and VERY helpful. Thank you very much.

Posted: **Mon Jan 23, 2017 11:16 am**

It sounds like this issue has been resolved. Is it okay if we lock this thread? Thanks for choosing the Nagios forums!

check_ntp_time Modification Question