check_ntp_time Modification Question
Posted: Fri Jan 20, 2017 11:53 am
by PhoneSuite
Hello All!
I need to modify the check_ntp_time check (in plugin version 2.2.0) so that it will not alert (send status OK) upon a socket timeout.
The servers we are using this check on often time out, and we are getting a lot of alerts that we don't need or want.
I am aware of the following:
* This is an odd request
* You can mute the alert emails for any check you wish in Nagios
* Maybe I should figure out why the NTP check is timing out and fix the root cause
* There is a "-q" flag you can add to the check
* This check is written in C and you have to compile it before using it
All I need is for someone with more C knowledge than I have to see if they can modify the source code and then just tell/show me where the modifications were made so that I can do the same on my end.
I already tried modifying this (line 566):
Code: Select all
offset = offset_request(server_address, &offset_result);
if (offset_result == STATE_UNKNOWN) {
    result = (quiet == 1 ? STATE_UNKNOWN : STATE_CRITICAL);
To this:
Code: Select all
offset = offset_request(server_address, &offset_result);
if (offset_result == STATE_OK) {
    result = (quiet == 1 ? STATE_OK : STATE_OK);
But this did not work.
Re: check_ntp_time Modification Question
Posted: Fri Jan 20, 2017 2:37 pm
by mcapra
Does quiet mode not produce the desired effect?
It's hard for me to navigate the plugin without knowing specifically what outputs you need changed. Could you share your full check_command definition you're currently leveraging (sanitize as needed) as well as some of the CRITICAL check outputs you would like to instead be recognized as "OK" or maybe "UNKNOWN"?
Re: check_ntp_time Modification Question
Posted: Fri Jan 20, 2017 2:52 pm
by tmcdonald
What is the exact status that the service gives when you hit the timeout? Timeout handling is not as straightforward as you might think, and if my assumptions about the message you are seeing are correct, it is handled by Core. When you see "Service check timed out after X seconds" as the status, that means the plugin self-terminated after 10 seconds (by default) and Core detected this, giving that message and the critical status. The way around this would be to patch Core to return OK instead, but that would affect all timeouts, not just one service.
Note that this is different from the service_check_timeout_state option in nagios.cfg - that is used for cases where a plugin does not have its own internal timeout, so things don't run forever.
Re: check_ntp_time Modification Question
Posted: Fri Jan 20, 2017 3:07 pm
by jfrickson
Or, you can specify the timeout and status like this: -t 10:OK. The status after the colon can be any of OK, 0, WARNING, 1, CRITICAL, 2, UNKNOWN, or 3.
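For anyone curious how a "seconds:state" argument like this can be split apart, here is a minimal sketch in C. It is not the plugin's actual code: translate_state and parse_timeout_spec are hypothetical helper names, the one-day upper bound is an arbitrary choice, and the only things taken from this thread are the accepted state spellings and the usual Nagios exit codes 0 to 3.

```c
#include <stdlib.h>
#include <string.h>

/* Standard Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Hypothetical helper: map "OK"/"0", "WARNING"/"1", "CRITICAL"/"2",
 * "UNKNOWN"/"3" to the matching exit code. Returns -1 for anything else. */
static int translate_state(const char *s)
{
    static const char *names[] = { "OK", "WARNING", "CRITICAL", "UNKNOWN" };
    for (int i = 0; i < 4; i++) {
        if (strcmp(s, names[i]) == 0)
            return i;                       /* matched by name */
        if (s[0] == '0' + i && s[1] == '\0')
            return i;                       /* matched by single digit */
    }
    return -1;
}

/* Hypothetical helper: parse "SECONDS" or "SECONDS:STATE".
 * On success returns 0 and stores the values; *state is left untouched
 * when no ":STATE" part is given. Returns -1 on malformed input. */
static int parse_timeout_spec(const char *spec, int *seconds, int *state)
{
    char *end;
    long secs = strtol(spec, &end, 10);

    if (end == spec || secs <= 0 || secs > 86400)
        return -1;                          /* no digits, or out of range (assumed bound) */
    if (*end == '\0') {
        *seconds = (int)secs;               /* plain "-t 10" form */
        return 0;
    }
    if (*end != ':')
        return -1;                          /* trailing junk after the number */

    int st = translate_state(end + 1);
    if (st < 0)
        return -1;                          /* unknown state name */

    *seconds = (int)secs;
    *state = st;
    return 0;
}
```

With this sketch, parse_timeout_spec("10:OK", &secs, &state) would give secs = 10 and state = 0, while "10:maybe" is rejected.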
Re: check_ntp_time Modification Question
Posted: Fri Jan 20, 2017 3:10 pm
by tmcdonald
Or just do what @jfrickson said :) Totally forgot about that flag status option.
Re: check_ntp_time Modification Question
Posted: Fri Jan 20, 2017 3:49 pm
by PhoneSuite
jfrickson, this suggestion seems to be what I need; however, it does not appear to be working. Here are some sample inputs and outputs. If you could let me know what I am doing wrong, it would be very helpful.
Code: Select all
#./check_ntp_time -t 10:OK -H us.pool.ntp.org -w 1 -c 3
CRITICAL - Socket timeout after 10 seconds
Code: Select all
#./check_ntp_time -t 10:0 -H us.pool.ntp.org -w 1 -c 3
CRITICAL - Socket timeout after 10 seconds
Code: Select all
#./check_ntp_time -V
check_ntp_time v2051 (nagios-plugins 1.4.13)
So it seems that no matter what I do with the -t option, I still get a returned state of CRITICAL, not OK as I want. Trying the -q option also does nothing: I get the exact same output, "CRITICAL - Socket timeout after 10 seconds", as I get without it.
Re: check_ntp_time Modification Question
Posted: Fri Jan 20, 2017 4:31 pm
by jfrickson
Version 1.4.13? That's over eight years old! That format for the -t timeout parameter did not exist way back then.
Go to plugins/netutils.c and change this:
Code: Select all
/* handles socket timeouts */
void
socket_timeout_alarm_handler (int sig)
{
    if (sig == SIGALRM)
        printf (_("CRITICAL - Socket timeout after %d seconds\n"), socket_timeout);
    else
        printf (_("CRITICAL - Abnormal timeout after %d seconds\n"), socket_timeout);
    exit (STATE_CRITICAL);
}
to this:
Code: Select all
/* handles socket timeouts */
void
socket_timeout_alarm_handler (int sig)
{
    if (sig == SIGALRM)
        printf (_("OK - Socket timeout after %d seconds\n"), socket_timeout);
    else
        printf (_("OK - Abnormal timeout after %d seconds\n"), socket_timeout);
    exit (STATE_OK);
}
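If hardcoding OK feels too blunt, the same handler idea can be written so the exit state is a variable instead of a constant. The following is only a rough sketch of that pattern, not code from nagios-plugins: timeout_state, timeout_seconds, state_text, format_timeout_message, timeout_alarm_handler, and arm_timeout are all hypothetical names, and the message wording is just modeled on the output shown earlier in this thread. Like the original handler, it prints from inside a signal handler, which is not strictly async-signal-safe.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Standard Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Hypothetical globals: which state to report on timeout, and after how long. */
static int timeout_state = STATE_CRITICAL;
static unsigned int timeout_seconds = 10;

/* Text prefix for a state; anything out of range is shown as UNKNOWN. */
static const char *state_text(int state)
{
    switch (state) {
    case STATE_OK:       return "OK";
    case STATE_WARNING:  return "WARNING";
    case STATE_CRITICAL: return "CRITICAL";
    default:             return "UNKNOWN";
    }
}

/* Build the status line into buf; returns what snprintf returns. */
static int format_timeout_message(char *buf, size_t len, int state,
                                  unsigned int seconds)
{
    return snprintf(buf, len, "%s - Socket timeout after %u seconds",
                    state_text(state), seconds);
}

/* SIGALRM handler: print the status line and exit with the chosen state. */
static void timeout_alarm_handler(int sig)
{
    char msg[128];
    (void)sig;
    format_timeout_message(msg, sizeof msg, timeout_state, timeout_seconds);
    puts(msg);
    exit(timeout_state);
}

/* Install the handler and start the countdown. */
static void arm_timeout(int state, unsigned int seconds)
{
    timeout_state = state;
    timeout_seconds = seconds;
    signal(SIGALRM, timeout_alarm_handler);
    alarm(seconds);
}
```

A caller would run something like arm_timeout(STATE_OK, 15) before starting the network request and alarm(0) once the request completes, so the handler only fires if the request hangs.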
Re: check_ntp_time Modification Question
Posted: Fri Jan 20, 2017 4:55 pm
by PhoneSuite
jfrickson, I am so sorry. I downloaded the latest check from Nagios but apparently did not copy the correct file to the correct location. My version is now "check_ntp_time v2.2.0 (nagios-plugins 2.2.0)", and when I try the -t flag I now get:
Code: Select all
# ./check_ntp_time -t 15:0 -H us.pool.ntp.org -w 1 -c 3
OK - Socket timeout
So your suggestion for using the -t flag was spot on and VERY helpful. Thank you very much.

Re: check_ntp_time Modification Question
Posted: Mon Jan 23, 2017 11:16 am
by dwhitfield
It sounds like this issue has been resolved. Is it okay if we lock this thread? Thanks for choosing the Nagios forums!