check_ntp_time fails for first hour after restart

Posted: Mon Jun 17, 2013 4:56 am
by wzinet
Hi,

I've got a strange issue that I can't find any reference to on the web :(

I'm running Nagios 3.4.1 on a VM (VirtualBox) under Ubuntu 12.04.
It's been up and running fine for roughly the last 9 months, and so far every alert it has raised has been a real issue.

The machine recently suffered a disk issue (remounted root as r/o), but seems to be ok after a reboot & fsck. I'll replace the disk in due course, but for now here's where I am.

Whenever I restart Nagios, _all_ the check_ntp_time checks on the clients (various Ubuntu flavours, running NRPE) give me "CRITICAL: Offset Unknown" for about an hour (I've not timed it exactly) before they start to work.

If I run the check manually on a client, I get:

/usr/lib/nagios/plugins/check_ntp_time -H lemon -w 1 -c 10 -v
sending request to peer 0
response from peer 0: offset -0.5986770391
sending request to peer 0
response from peer 0: offset -0.6005151272
sending request to peer 0
response from peer 0: offset -0.6005502939
sending request to peer 0
response from peer 0: offset -0.6005481482
discarding peer 0: flags=3
overall average offset: 0
NTP CRITICAL: Offset unknown|


(lemon is the name of my nagios host).

For reference, this is the plugin version on this machine (so it's not the leap-second bug), though it should be noted I've got many versions of the scripts in use, from 1.4.10 onwards:
/usr/lib/nagios/plugins/check_ntp_time -V
check_ntp_time v1.4.14 (nagios-plugins 1.4.14)

Running the reverse on the nagios-server itself shows that I've got a bit of an offset, but within the thresholds:
/usr/local/nagios/libexec/check_ntp_time -v -H fandango
sending request to peer 0
response from peer 0: offset 0.7899760008
sending request to peer 0
response from peer 0: offset 0.7899907827
sending request to peer 0
response from peer 0: offset 0.7900112867
sending request to peer 0
response from peer 0: offset 0.7887747288
overall average offset: 0.7899760008
NTP OK: Offset 0.7899760008 secs|offset=0.789976s;60.000000;120.000000;

But there is a problem when running against localhost:
/usr/local/nagios/libexec/check_ntp_time -v -H localhost
sending request to peer 0
response from peer 0: offset -7.510185242e-06
sending request to peer 0
response from peer 0: offset -9.894371033e-06
sending request to peer 0
response from peer 0: offset -5.722045898e-06
sending request to peer 0
response from peer 0: offset -5.841255188e-06
discarding peer 0: flags=3
overall average offset: 0
NTP CRITICAL: Offset unknown|


Has anyone got any ideas on where to go from here?

Re: check_ntp_time fails for first hour after restart

Posted: Mon Jun 17, 2013 1:28 pm
by abrist
Is the server's timezone correct?
Have you considered updating to the newest nagios-plugins package (nagios-plugins 1.4.16)?
http://nagiosplugins.org/

Re: check_ntp_time fails for first hour after restart

Posted: Wed Jun 19, 2013 4:20 am
by wzinet
Some of the failing servers are using 1.4.16, so I don't think upgrading the other clients will help.

The timezone is correct (and the same on all machines), so I don't think it's that.

After a further restart of ntp, I'm now getting the following:
./check_ntp_time -v -H localhost
sending request to peer 0
response from peer 0: offset -2.145767212e-05
sending request to peer 0
response from peer 0: offset -8.225440979e-06
sending request to peer 0
response from peer 0: offset -9.536743164e-07
sending request to peer 0
response from peer 0: offset -1.907348633e-06
discarding peer 0: stratum=0
overall average offset: 0
NTP CRITICAL: Offset unknown|


I remember seeing something about stratum=0 in other threads, so I'll follow that up.

Re: check_ntp_time fails for first hour after restart

Posted: Wed Jun 19, 2013 5:20 am
by wzinet
So after a lot more digging, it seems that the NTP daemon on my host was not syncing, and so the problem is with ntp, not with the check_ntp_time script.

There appears to be a problem with the initial setting of the date (I'm running Ubuntu 12.04 on this machine), after which ntp doesn't sync for some time.
Until it does, check_ntp_time flags an error.
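
One way to confirm that the daemon itself is unsynchronized, rather than the plugin misbehaving, is to look at ntpd's own system variables. The sketch below is only an illustration: is_ntp_synced is a hypothetical helper name, and it assumes that `ntpq -c rv` reports `leap_alarm` or `stratum=16` while ntpd has no sync source, which is what ntpd 4.2.x normally prints.

```shell
#!/bin/sh
# is_ntp_synced is a hypothetical helper, not part of ntp or nagios-plugins.
# It reads the text of `ntpq -c rv` on stdin and succeeds only if the
# output does not carry the usual "unsynchronized" markers.
is_ntp_synced() {
    rv=$(cat)
    case $rv in
        # Empty output (no reply), leap_alarm, or stratum 16: not synced.
        ""|*leap_alarm*|*stratum=16*) return 1 ;;
        *) return 0 ;;
    esac
}

# Typical use on the affected host (needs ntpq and a running ntpd):
#   ntpq -c rv | is_ntp_synced && echo "synced" || echo "not synced"
```

If this reports "not synced" during the same window in which check_ntp_time returns "Offset unknown", the plugin is most likely just relaying the daemon's state.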

The solution (for me, short-term) is to stop ntp, run ntpdate to sync the time, and then restart ntp, but I'll need to look into this properly...
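
The short-term workaround above can be sketched as a small script. This is an illustration rather than a tested fix: resync_clock is a hypothetical function name, `ntp` is assumed to be the service name (as on Ubuntu 12.04), and pool.ntp.org is only a default server. ntpd has to be stopped first because ntpdate cannot use the NTP port while the daemon holds it.

```shell
#!/bin/sh
# resync_clock is a hypothetical wrapper around the three manual steps.
# Set RUN=echo to print the commands instead of executing them.
resync_clock() {
    server=${1:-pool.ntp.org}          # assumed default time source
    ${RUN:-} service ntp stop &&       # free the NTP port for ntpdate
    ${RUN:-} ntpdate "$server" &&      # step the clock once
    ${RUN:-} service ntp start         # hand control back to ntpd
}

# Dry run: only prints the three commands.
RUN=echo resync_clock pool.ntp.org
```

Run without RUN=echo (and as root) to actually perform the resync.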

Thanks nagios for catching another bug with my network :)

Re: check_ntp_time fails for first hour after restart

Posted: Wed Jun 19, 2013 1:07 pm
by abrist
You are welcome, though it seems you caught the issue with your network yourself; we just humored you along the way :)
You may want to consider creating a cron job for ntpdate if your daemon/service is not working correctly. Cheers.

Re: check_ntp_time fails for first hour after restart

Posted: Wed Jun 19, 2013 1:07 pm
by lmiltchev
"So after a lot more digging, it seems that the NTP daemon on my host was not syncing..."
This is strange...

You can stop the daemon, and set up a cron job to run:

ntpdate pool.ntp.org
at certain times (until you figure it out).
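
As a rough sketch of what such a cron job might look like, assuming a system cron that reads /etc/cron.d, ntpdate installed at /usr/sbin/ntpdate, and an hourly schedule (the file name, path, and schedule are all assumptions to adjust):

```
# /etc/cron.d/ntpdate-sync  (hypothetical file name)
# Only use this while the ntp daemon is stopped; ntpdate cannot run alongside it.
# m h dom mon dow user command
0 * * * * root /usr/sbin/ntpdate -s pool.ntp.org
```

The -s option sends ntpdate's output to syslog instead of stdout, so cron does not mail the result of every run.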

Re: check_ntp_time fails for first hour after restart

Posted: Mon Jun 24, 2013 4:14 am
by wzinet
Many thanks.

The issue seems to be that I'm running within a VirtualBox virtual machine.

Sometimes (and I don't understand the reason yet), there is massive jitter in the RTC that is provided to the virtual machine (it's easy enough to see some possible causes, but nothing concrete). There are some notes around the internet (mostly about hypervisors other than VirtualBox) on how to improve things, but for now I simply need to reboot a couple of times; sometimes it wakes up in a non-jittery state and is perfectly happy.

If I ever get the bandwidth to investigate properly I'll report the results, but I suspect for now I'll leave it...

Re: check_ntp_time fails for first hour after restart

Posted: Wed Jan 13, 2016 4:31 am
by nel_viox
Hi,

Is there a permanent fix for this?
I have the same issue, and my version is check_ntp_time v1.4.16 (nagios-plugins 1.4.16).

# /usr/nagios/libexec/check_ntp_time -vv -H 127.0.0.1
Found 1 peers to check
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 16
precision = 1.19209e-07
rtdelay = 0
rtdisp = 0.0024261474609375
refid = 54494e49
refts = 0
origts = 1452676569.893433
rxts = 1452676569.893442
txts = 1452676569.89347
offset -3.218650818e-06
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 16
precision = 1.19209e-07
rtdelay = 0
rtdisp = 0.0024261474609375
refid = 54494e49
refts = 0
origts = 1452676569.893588
rxts = 1452676569.893592
txts = 1452676569.893603
offset -2.384185791e-06
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 16
precision = 1.19209e-07
rtdelay = 0
rtdisp = 0.0024261474609375
refid = 54494e49
refts = 0
origts = 1452676569.893683
rxts = 1452676569.893685
txts = 1452676569.893693
offset -2.74181366e-06
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 16
precision = 1.19209e-07
rtdelay = 0
rtdisp = 0.0024261474609375
refid = 54494e49
refts = 0
origts = 1452676569.893772
rxts = 1452676569.893775
txts = 1452676569.893783
offset -2.145767212e-06
discarding peer 0: stratum=0
no peers meeting synchronization criteria :(
overall average offset: 0
NTP CRITICAL: Offset unknown|
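
For readers trying to interpret the "flags: 0xe4" line in the output above, the first byte of an NTP packet packs three fields: a 2-bit leap indicator, a 3-bit version number, and a 3-bit mode. The sketch below splits the byte the same way; decode_ntp_flags is a hypothetical helper, not part of nagios-plugins.

```shell
#!/bin/sh
# decode_ntp_flags is a hypothetical helper that splits the first byte of
# an NTP packet into its three bit fields.
decode_ntp_flags() {
    flags=$(( $1 ))                  # accepts decimal or 0x-prefixed hex
    li=$(( (flags >> 6) & 0x3 ))     # leap indicator: top 2 bits
    vn=$(( (flags >> 3) & 0x7 ))     # protocol version: next 3 bits
    mode=$(( flags & 0x7 ))          # association mode: low 3 bits
    echo "li=$li vn=$vn mode=$mode"
}

decode_ntp_flags 0xe4    # prints: li=3 vn=4 mode=4
```

A leap indicator of 3 is the protocol's "clock unsynchronized" alarm, and mode 4 marks a server reply. Together with stratum = 0, this suggests the local ntpd is answering but has not yet selected a time source, which matches the "discarding peer 0" line.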