check_ntp_time fails for first hour after restart

wzinet
Posts: 4
Joined: Mon Jun 17, 2013 4:26 am

Post by wzinet »

Hi,

I've a strange issue, that I can't seem to find any links to on the web :(

I'm running Nagios 3.4.1 on a VM (VirtualBox), under Ubuntu 12.04.
It's been up and running fine for the last 9 months (approx), and every alert it has raised so far has turned out to be a real issue.

The machine recently suffered a disk issue (remounted root as r/o), but seems to be ok after a reboot & fsck. I'll replace the disk in due course, but for now here's where I am.

Whenever I restart Nagios, _all_ the check_ntp_time checks on the clients (various Ubuntu flavours, running NRPE) give me "CRITICAL: Offset unknown" for about an hour (I've not timed it exactly) before they start to work.

If I run it manually on a client, I get:

/usr/lib/nagios/plugins/check_ntp_time -H lemon -w 1 -c 10 -v
sending request to peer 0
response from peer 0: offset -0.5986770391
sending request to peer 0
response from peer 0: offset -0.6005151272
sending request to peer 0
response from peer 0: offset -0.6005502939
sending request to peer 0
response from peer 0: offset -0.6005481482
discarding peer 0: flags=3
overall average offset: 0
NTP CRITICAL: Offset unknown|


(lemon is the name of my nagios host).

For reference, this is the version on this machine (so it's not the leap-second bug), though it should be noted that I've got many versions of the plugin across my machines, from 1.4.10 onwards:
/usr/lib/nagios/plugins/check_ntp_time -V
check_ntp_time v1.4.14 (nagios-plugins 1.4.14)

Running the reverse on the nagios-server itself shows that I've got a bit of an offset, but within the thresholds:
/usr/local/nagios/libexec/check_ntp_time -v -H fandango
sending request to peer 0
response from peer 0: offset 0.7899760008
sending request to peer 0
response from peer 0: offset 0.7899907827
sending request to peer 0
response from peer 0: offset 0.7900112867
sending request to peer 0
response from peer 0: offset 0.7887747288
overall average offset: 0.7899760008
NTP OK: Offset 0.7899760008 secs|offset=0.789976s;60.000000;120.000000;

But there is a problem when running against localhost:
/usr/local/nagios/libexec/check_ntp_time -v -H localhost
sending request to peer 0
response from peer 0: offset -7.510185242e-06
sending request to peer 0
response from peer 0: offset -9.894371033e-06
sending request to peer 0
response from peer 0: offset -5.722045898e-06
sending request to peer 0
response from peer 0: offset -5.841255188e-06
discarding peer 0: flags=3
overall average offset: 0
NTP CRITICAL: Offset unknown|


Has anyone got any ideas on where to go from here?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: check_ntp_time fails for first hour after restart

Post by abrist »

Is the server's timezone correct?
Have you considered updating to the newest nagios-plugins package (nagios-plugins 1.4.16)?
http://nagiosplugins.org/
wzinet
Posts: 4
Joined: Mon Jun 17, 2013 4:26 am

Re: check_ntp_time fails for first hour after restart

Post by wzinet »

Some of the failing servers are using 1.4.16, so I don't think upgrading the other clients will help.

The timezone is correct (and the same on all machines), so I don't think it's that.

After a further restart of ntp, I'm now getting the following:
./check_ntp_time -v -H localhost
sending request to peer 0
response from peer 0: offset -2.145767212e-05
sending request to peer 0
response from peer 0: offset -8.225440979e-06
sending request to peer 0
response from peer 0: offset -9.536743164e-07
sending request to peer 0
response from peer 0: offset -1.907348633e-06
discarding peer 0: stratum=0
overall average offset: 0
NTP CRITICAL: Offset unknown|


I remember seeing something about stratum=0 in other threads, so I'll follow that up.
wzinet
Posts: 4
Joined: Mon Jun 17, 2013 4:26 am

Re: check_ntp_time fails for first hour after restart

Post by wzinet »

So after a lot more digging, it seems that the NTP daemon on my host was not syncing, and so the problem is with ntp, not with the check_ntp_time script.

There appears to be a problem with the initial setting of the date (I'm running Ubuntu 12.04 on this machine), after which ntp doesn't sync for some time.
check_ntp_time then flags an error.

The solution (for me, short-term) is to stop ntp, run ntpdate to sync the time, and then restart ntp, but I'll need to look into this properly...
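
The short-term workaround described above could be scripted roughly as follows. This is only a sketch: the service name "ntp", the pool.ntp.org server, and the run / resync_clock / DRY_RUN names are assumptions for illustration, not details taken from the setup in this thread.

```shell
#!/bin/sh
# Sketch of the stop / ntpdate / restart workaround.
# Assumptions: the init service is called "ntp" and pool.ntp.org is an
# acceptable time source. Override the server with NTP_SERVER if needed.
NTP_SERVER="${NTP_SERVER:-pool.ntp.org}"

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

resync_clock() {
    # ntpdate normally refuses to run while ntpd holds the NTP port,
    # so the daemon is stopped first.
    run service ntp stop || return 1
    # Set the clock once from the chosen server.
    run ntpdate "$NTP_SERVER" || return 1
    # Hand timekeeping back to the daemon.
    run service ntp start
}
```

Running "DRY_RUN=1 resync_clock" prints the three commands without touching the system; running "resync_clock" as root executes them.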

Thanks nagios for catching another bug with my network :)
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: check_ntp_time fails for first hour after restart

Post by abrist »

You are welcome, though it seems you caught the issue with your network yourself; we just humored you along the way :)
You may want to consider creating a cron job for ntpdate if your daemon/service is not working correctly. Cheers.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: check_ntp_time fails for first hour after restart

Post by lmiltchev »

wzinet wrote: "So after a lot more digging, it seems that the NTP daemon on my host was not syncing..."
This is strange...

You can stop the daemon and set up a cron job to run:

ntpdate pool.ntp.org

at certain times (until you figure it out).
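
A cron entry for this might look like the sketch below. The hourly schedule, the /usr/sbin/ntpdate path, and the CRON_LINE / new_crontab names are assumptions for illustration; the -s option asks ntpdate to log to syslog rather than standard output, which suits cron. It also assumes the ntp daemon has been stopped, as suggested above.

```shell
#!/bin/sh
# Hypothetical hourly schedule; adjust the timing and path to your system.
CRON_LINE='0 * * * * /usr/sbin/ntpdate -s pool.ntp.org'

# Print the current user's crontab with the entry appended exactly once.
# Nothing is installed by this function; it only writes to standard output.
new_crontab() {
    crontab -l 2>/dev/null | grep -vF "$CRON_LINE"
    echo "$CRON_LINE"
}
```

Running "new_crontab | crontab -" as root would install the entry; running new_crontab on its own lets you inspect the result first.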
wzinet
Posts: 4
Joined: Mon Jun 17, 2013 4:26 am

Re: check_ntp_time fails for first hour after restart

Post by wzinet »

Many thanks.

The issue seems to be that I'm running within a VirtualBox virtual machine.

Sometimes (and I don't understand the reason yet) there is massive jitter in the RTC that is provided to the virtual machine (it's easy enough to see some possibilities, but nothing concrete). There are some notes around the internet (mostly for VMs other than VirtualBox) on how to improve things, but for now I simply reboot a couple of times until it wakes up in a non-jittery state and is perfectly happy.

If I ever get the bandwidth to investigate properly I'll report the results, but I suspect for now I'll leave it...
nel_viox
Posts: 1
Joined: Sun Jan 10, 2016 10:06 pm

Re: check_ntp_time fails for first hour after restart

Post by nel_viox »

Hi,

Is there a permanent fix for this?
I have the same issue, and my version is check_ntp_time v1.4.16 (nagios-plugins 1.4.16).

# /usr/nagios/libexec/check_ntp_time -vv -H 127.0.0.1
Found 1 peers to check
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 16
precision = 1.19209e-07
rtdelay = 0
rtdisp = 0.0024261474609375
refid = 54494e49
refts = 0
origts = 1452676569.893433
rxts = 1452676569.893442
txts = 1452676569.89347
offset -3.218650818e-06
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 16
precision = 1.19209e-07
rtdelay = 0
rtdisp = 0.0024261474609375
refid = 54494e49
refts = 0
origts = 1452676569.893588
rxts = 1452676569.893592
txts = 1452676569.893603
offset -2.384185791e-06
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 16
precision = 1.19209e-07
rtdelay = 0
rtdisp = 0.0024261474609375
refid = 54494e49
refts = 0
origts = 1452676569.893683
rxts = 1452676569.893685
txts = 1452676569.893693
offset -2.74181366e-06
sending request to peer 0
response from peer 0: packet contents:
flags: 0xe4
li=3 (0xc0)
vn=4 (0x20)
mode=4 (0x04)
stratum = 0
poll = 16
precision = 1.19209e-07
rtdelay = 0
rtdisp = 0.0024261474609375
refid = 54494e49
refts = 0
origts = 1452676569.893772
rxts = 1452676569.893775
txts = 1452676569.893783
offset -2.145767212e-06
discarding peer 0: stratum=0
no peers meeting synchronization criteria :(
overall average offset: 0
NTP CRITICAL: Offset unknown|
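
For anyone reading the -vv dump above, the small sketch below decodes two of its fields. The function names decode_flags and refid_ascii are made up for illustration. The flags byte 0xe4 splits into li=3, vn=4, mode=4, matching the dump; in the NTP specification a leap indicator of 3 means the clock is unsynchronized, which is probably what the "flags=3" message earlier in the thread refers to. The refid 54494e49 reads "TINI" as printed; reversed it spells "INIT", the NTP kiss code for a daemon that has not yet synchronized, so the plugin may simply be printing the value in host byte order (that last part is a guess).

```shell
#!/bin/sh
# Split the first byte of an NTP packet into leap indicator (2 bits),
# version (3 bits) and mode (3 bits).
decode_flags() {
    flags=$(( $1 ))
    echo "li=$(( (flags >> 6) & 3 )) vn=$(( (flags >> 3) & 7 )) mode=$(( flags & 7 ))"
}

# Turn an 8-digit hex refid into its 4 ASCII characters, in the order given.
refid_ascii() {
    hex=$1
    [ "${#hex}" -eq 8 ] || return 1
    out=""
    while [ -n "$hex" ]; do
        byte=${hex%"${hex#??}"}   # first two hex digits
        hex=${hex#??}             # remainder
        # Convert the hex byte to octal, then let printf emit that character.
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    echo "$out"
}

decode_flags 0xe4      # prints: li=3 vn=4 mode=4
refid_ascii 54494e49   # prints: TINI
```

Either way, both fields point to the local ntp daemon not being synchronized, which is consistent with the conclusion earlier in this thread that the problem is with ntp rather than with check_ntp_time.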