The Limits of NTP Accuracy on Linux
Key topics
Diving into the nuances of NTP accuracy on Linux, a lively discussion erupted around the limitations and potential workarounds for achieving precise timing. Commenters debated the merits of GPS timing modules, with some noting that only high-end models offer sawtooth correction values to mitigate errors, while others pointed out that older, cheaper receivers can still provide this capability. As the conversation unfolded, it became clear that applying sawtooth correction can reduce jitter, but its effectiveness depends on various factors, including relative phases and aliasing products. With insights ranging from the humorous "Segal's Law" to technical discussions of NTP complexities, this thread is a treasure trove for anyone grappling with the intricacies of timekeeping.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 3h after posting
- Peak period: 36 comments in 6-12h
- Avg / period: 9.2 comments
Based on 92 loaded comments
Key moments
- Story posted: Aug 25, 2025 at 9:02 PM EDT (4 months ago)
- First comment: Aug 25, 2025 at 11:48 PM EDT (3h after posting)
- Peak activity: 36 comments in the 6-12h window, the hottest stretch of the conversation
- Latest activity: Aug 30, 2025 at 1:50 AM EDT (4 months ago)
Want the full context?
Read the primary article or dive into the live Hacker News thread: https://news.ycombinator.com/item?id=44147523
Aligning the PPS pulse with an asynchronous local clock is going to require a very large number of measurements, or a high-resolution timer (e.g. a time-to-digital converter, a TDOA chip, etc.; there are a few options).
There is an exception re. sawtooth, but only a recent one: the Furuno GT-100, priced between the ZED-F9T and the Mosaic-T, has 200 ps clock resolution and doesn't even provide a quantisation error output.
https://content.u-blox.com/sites/default/files/products/docu...
There are cheap LEA-8MT on eBay too, especially if you have a method to pull the module off a scrap board fragment.
You can't, really: depending on their relative phases and the resulting aliasing products, the average of the sawtooth error can still have an arbitrary offset which lasts for an arbitrarily long time.
> that they will not have that much effect
Okay, fine: for some definition of 'not much' that's true. But failing to account for it can result in a bigger error than many people expect, and in an annoying way: when you test, it might be in a state where it averages out okay, but it can later shift into a state where it produces an offset that doesn't average out.
Assuming your receiver outputs the correction it's pretty easy to handle, so long as you know it's a thing.
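As a sketch of what "handling it" looks like, assuming a receiver that reports the sawtooth/quantisation error for the upcoming pulse (u-blox modules report a qErr value; the picosecond units and sign convention here are illustrative assumptions):

```python
def correct_pps(pps_timestamp_ns: float, qerr_ps: int) -> float:
    """Apply the receiver-reported sawtooth (quantisation) error.

    The receiver can only place the PPS edge on a tick of its internal
    clock grid, so each pulse is early or late by a known sub-tick
    amount. Receivers that report this error let us shift the captured
    timestamp back onto the 'true' second boundary.
    """
    # Hypothetical sign convention: positive qErr means the edge fired late.
    return pps_timestamp_ns - qerr_ps / 1000.0

# A pulse captured 13 ns after the second, with the receiver reporting
# the edge fired 8000 ps late, corrects to 5 ns after the second.
print(correct_pps(13.0, 8000))
```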
"A man with a watch knows what time it is. A man with two watches is never sure."
https://en.m.wikipedia.org/wiki/Segal's_law
A better model might be to measure the confidence with something like (x-1)/x: this grows towards 1, more slowly with each step, without ever quite getting there. With two watches you are at 50% of maximum confidence in your time, with three 66%, with four 75%, with five 80%, and so on.
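A toy sketch of that model, assuming nothing beyond the (x-1)/x formula from the comment:

```python
def confidence(n_watches: int) -> float:
    """Toy confidence model: (x - 1) / x.

    Grows toward 1 with each extra watch but never reaches it, matching
    the intuition that more clocks help with diminishing returns.
    """
    if n_watches < 1:
        raise ValueError("need at least one watch")
    return (n_watches - 1) / n_watches

for n in (2, 3, 4, 5):
    print(n, confidence(n))  # 0.5, 0.666..., 0.75, 0.8
```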
Some major uses of high-precision timing, albeit not with NTP, include:
* Synchronising cell phone towers: the system partly relies on precise timing to keep them from interfering with one another.
* Timestamping required by regulators, in industries like high-frequency trading.
* Certain video production systems, where a ten-parts-per-million framerate error would build up into an unacceptable 1.7 second error over 48 hours.
* Certain databases (most famously Google Spanner) that use precise timing to ensure events are ordered correctly.
* As a debugging luxury in distributed systems.
In some of these applications you could get away with a precise-but-free-floating time, as you only need things precisely synchronised relative to one another, not globally. But if you're building an entire data centre, a sub-$1000 GPS-controlled clock is barely noticeable.
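The drift figure in the video-production example above is easy to sanity-check: a clock off by r parts per million accumulates r x 10^-6 seconds of error per elapsed second. A quick sketch:

```python
def drift_seconds(error_ppm: float, elapsed_s: float) -> float:
    """Accumulated time error of a clock off by `error_ppm` parts per million."""
    return error_ppm * 1e-6 * elapsed_s

# 10 ppm over 48 hours: roughly 1.728 s, the ~1.7 s figure quoted above.
print(round(drift_seconds(10, 48 * 3600), 3))
```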
Dumb personal and useless anecdote: one of those appliances made my life more difficult for months (at a FAANG company that built its own data centers, no less) for the nearly comical reason that we needed to move it but somehow couldn't rewire the GPS antenna, and the delays kept retriggering alerts that we kept disabling until the expected "it'll be moved by then" time.
So, I guess to make the anecdote more useful: if you're gonna get one, make sure it doesn't hamstring you with wires...
They always know the paperwork and contractors needed to get a guy on a cherrypicker drilling holes and installing data cables without upsetting the building owners.
Securities regulation?: https://podcasts.apple.com/us/podcast/signals-and-threads/id...
Say you are running a few geographically separated radio receivers to triangulate signals: you want all of them synchronised as closely as possible for better accuracy.
Scientific and consistent analysis of streaming realtime sensor data.
Been there, done that, shipped the package. Took quite a bit of fun to get it working consistently, which was the main thing.
Synchronising the clocks on network connected audio devices (ADCs, DACs, DSP processors) on a LAN (https://en.wikipedia.org/wiki/Audio_Video_Bridging), or over the internet (broadcast-grade live streaming). This, and related standards, are more or less the norm in live sound and high-channel-count digital recording setups.
Well, that TSC-enabled hardware also has other peripherals (like SMBUS as mentioned in the article) that on the other hand introduce errors into the system.
I personally use a RPi4 with its external oscillators replaced with a TCXO. Some sellers on AliExpress even have kits for "audiophiles" that let you do this. It significantly improved clock stability and holdover - so much so that "chronyc tracking" doesn't show enough decimal places to display the frequency error or skew. It's unfortunate, though, that the NIC does not do timestamping. (My modifications are similar to these: https://raspberrypi.stackexchange.com/a/109074)
I'd love to find an alternative cheap (microcontroller-based) implementation that could beat it.
I personally won't care about anything below 200 microseconds, and I think it was a good article if read critically. I think it does describe why you should not do that at the moment if you have lots of nodes that need to sync consistently.
Having a shared 10 MHz reference clock is great, and that gives you a pretty good consistent beat. I never managed to sync other physical sensors to it, so the technical gotchas are too much for me.
There is madness in time.
Edit: changing some orders of magnitude - honestly, I'm happy if my systems are within 10 ms.
If you do not do this, the times will never be consistent.
The author produced a faulty benchmark.
Making the sync work across existing heterogenous hardware is the goal of the exercise. That can't be a disqualifier.
It’s easy to fire up Chrony against a local GPS-backed time source and see it claim to be within X nanoseconds of GPS, but it’s tricky to figure out if Chrony is right or not.
"system time as distorted by getting it to/from the network" is exactly what is supposed to be measured here.
Linux PTP (https://linuxptp.sourceforge.net/) and hardware timestamping in the network card will get you into the sub-100 ns range.
Chrony is also much better software than any of the PTP daemons I tested a few years ago (for an onboard autonomous vehicle system).
I'll also say PTP is superior since it syncs TAI rather than NTP's UTC. Which probably isn't going to change even with NTPv5.
The switches could also implement a proper HW-timestamping NTP server and client to provide an equivalent to a PTP boundary clock.
PTP was based on a broadcast/multicast model to reduce the message rate in order to simplify and reduce the cost of HW support. But that is no longer a concern with modern HW that can timestamp packets at very high rates, so the simpler unicast protocols like NTP and client-server PTP (CSPTP) currently developed by IEEE might be preferable to classic PTP for better security and other advantages.
For telco gear, there is PTP + SyncE.
The Linux PTP stack is great for the price, but as an open source project it's hamstrung by the fact the PTP standard (IEEE1588) is paywalled; and the fact it doesn't work on wifi or usb-ethernet converters (meaning it also doesn't work on laptop docking stations or raspberry pi 3 and earlier)
This limits people developing/using for fun. And it's the people using it for fun who actually write all the documentation, the 'serious users' at high frequency trading firms and cell phone networks aren't blogging about their exploits.
802.1AS-2020 (gPTP) includes 802.11-2016 (wifi) support.
The IEEE's gatekeeping is indeed odious.
The biggest limitation is that many ethernet MACs do not support hardware timestamping. Nor do many entry-level ethernet switches.
For what it's worth, I'm interested in TSN for fun (music, actually), and I'm prepared to buy compatible networking hardware to do it. No difference to gamers spending money on a GPU.
Uh, the Siglent SDS 1204X-E used here has a "new, innovative digital trigger with low latency" ...
But yes, as others have commented already, if only the relative jitter between the signals is of interest, the trigger jitter itself is inconsequential.
Datasheet claims trigger jitter of <100 ps.
For anything that involves a scope, I think it's good practice to specify how and what you are measuring - namely what termination, what trigger level and show what the pulse actually looks like on the scope.
Where you are definitely right is that all those devices can produce completely different-looking pulses. Only by looking at the pulse with a scope that has sufficient bandwidth can you pick a trigger level that lands at a point of the waveform with consistent characteristics: stay away from the top of the pulse, trigger somewhere mid-ramp that looks clean, and keep your paws away from those AUTO buttons!
...but this is all nitpicking as far as the post goes, this is where lots of electronics people get triggered (badum, tss!) where we have a network in the middle that is, essentially, chaos.
It appears to be set to trigger on the bottom trace (it appears still) and then retrospectively display the other two.
(offset values on the hardware timestamp on the immediately connected PTP clock also line up with this)
[Caveat: everything is in the same room with the same ambient temperature drifts…]
ETA: I can also increase the nav speed threshold to 2m/s.
It's simply that if you know your location, you can remove that as free variable from the equations and instead constrain the time further.
The author also conflates precision with accuracy and relies on self-reported figures from NTP ("chrony says xxx ns jitter"). Every media speed change introduces asymmetry, which affects accuracy (though not always precision). Your 100M->1G link, for example, will already introduce over 1 us of error (to accuracy!), but NTP will never show you this, and nothing will unless you measure both ends with 1PPS; the only way around it is a PTP BC or TC.

There is a very long list of similar clarifications that can be made. For example, nothing is mentioned about message rates/intervals, which are crucial for increasing the number of samples the filters work with - and ptp4l, and especially phc2sys, aren't great at filtering.

Finally, getting time into the OS clock relies on a PCIe transaction with unknown delays and asymmetries, unless you use PCIe PTM, which practically limits you to newer Intel CPUs and newer Intel NICs. Without PTM (excluding a few NICs), your OS clock is nearly always 500+ ns away from the PHC, and you don't know by how much and can't measure it. It's just a complex topic and requires an end-to-end, leave-no-stone-unturned, semi-scientific approach to really present things correctly.
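The mechanism behind that hidden asymmetry error can be illustrated with the classic two-way time-transfer estimator, which both NTP and PTP rely on and which is only exact when forward and reverse path delays match. The numbers below are illustrative, not measurements:

```python
def estimated_offset(t1, t2, t3, t4):
    """Classic two-way exchange: client sends at t1, server receives at t2,
    server replies at t3, client receives at t4.  The estimator implicitly
    assumes the forward and reverse delays are equal."""
    return ((t2 - t1) + (t3 - t4)) / 2

# True clock offset is zero, but the path is asymmetric: 10 us out, 8 us back.
t1 = 0.0
t2 = t1 + 10e-6   # forward path delay
t3 = t2 + 1e-6    # server turnaround
t4 = t3 + 8e-6    # reverse path delay
# The client sees a phantom offset of ~1 us: half the 2 us asymmetry,
# and nothing in the protocol can reveal it.
print(estimated_offset(t1, t2, t3, t4))
```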
The author has other posts in the series where he tried to measure the accuracy relative to the PHC (not system clock) using PPS: https://scottstuff.net/posts/2025/06/02/measuring-ntp-accura...
Steering the same PHC with phc2sys as chronyd is using for HW timestamping is not the best approach as that creates a feedback loop (instability). It would be better to leave the PHC running free and just compare the sys<->PHC with PHC<->PPS offsets.
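A sketch of that feed-forward comparison (values are illustrative): with the PHC left free-running, the system clock's error against the PPS reference is just the sum of the two independently measured offsets, and neither measurement disturbs the other.

```python
def sys_vs_pps(sys_minus_phc_ns: float, phc_minus_pps_ns: float) -> float:
    """Chain the two measured offsets around a free-running PHC:
        (sys - PPS) = (sys - PHC) + (PHC - PPS)
    No loop steers the PHC, so there is no feedback path to destabilise."""
    return sys_minus_phc_ns + phc_minus_pps_ns

# System clock 250 ns ahead of the PHC, PHC 40 ns behind the PPS:
print(sys_vs_pps(250.0, -40.0))  # 210.0 ns ahead of the PPS
```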
> So your 100m->1G link for example will already introduce over 1 us of error (to accuracy!), but NTP will never show you this
That doesn't apply to NTP to such an extent as PTP because it timestamps end of the reception (HW RX timestamps are transposed to the end of the packet), so the asymmetries in transmission lengths due to different link speeds should cancel out (unless the switches are cutting through in the faster->slower link direction, but that seems to be rare).
This is standard practice, though, for most PTP slave clocks. The feedback is just factored into the math. (Why? No idea. I just know how the code works.)
Although… it's standard practice in PTP setups that are designed for it. Not NTP… if only there were a specification… :)
I do have to wonder though. Of what use are timestamps from an unsynchronized PHC to chrony? Is it continuously taking twin sys+PHC timestamps to line up things?
That would be the logical way to do it. You want the lowest jitter timestamps you can get on the incoming ethernet frames. If conditions are stable enough you can compute a timebase translation between your MAC local clock and sys clock using the best available method, potentially using many samples over a (relatively) long time frame. And as the GP says, this gives a feed-forward structure without any need for stabilising a feedback loop.
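A minimal sketch of such a timebase translation, assuming pairs of (NIC, system) timestamps captured close together; a simple least-squares fit recovers the rate and offset without steering either clock:

```python
def fit_timebase(pairs):
    """Least-squares linear map from NIC/MAC clock to system clock.

    `pairs` is a list of (nic_ns, sys_ns) samples taken close together.
    Returns (rate, offset) such that sys ~= rate * nic + offset.
    Purely feed-forward: we only learn the translation."""
    n = len(pairs)
    sx = sum(p[0] for p in pairs)
    sy = sum(p[1] for p in pairs)
    sxx = sum(p[0] * p[0] for p in pairs)
    sxy = sum(p[0] * p[1] for p in pairs)
    rate = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    offset = (sy - rate * sx) / n
    return rate, offset

# Synthetic NIC clock running 10 ppm fast with a 5000 ns initial offset:
samples = [(t, t * (1 + 10e-6) + 5000) for t in range(0, 10_000_000, 1_000_000)]
rate, offset = fit_timebase(samples)
print(f"rate ~ {rate:.8f}, offset ~ {offset:.1f} ns")
```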
gPTP full-duplex ethernet peer synchronisation uses timestamps from free-running local clocks at each end of the link.
Do you mean free-running as opposed to VCXO? Most implementations I know indeed use free-running clocks, but there's an increment/rate register in hardware that specifies what value to add to the time counter per crystal tick, which gets updated by the PTP layer — and timestamps use that. So even though the crystal is physically free running, the feedback loop is still there, it just doesn't include the crystal itself.
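The increment-register mechanism described here can be sketched numerically. Register names and widths differ per NIC, and the numbers below are purely illustrative:

```python
BASE_INCREMENT_NS = 8.0  # nominal ns added per tick of an illustrative 125 MHz crystal

def steer_increment(freq_error_ppb: float) -> float:
    """Return the per-tick increment that compensates a measured frequency
    error (in parts per billion) of the free-running crystal.

    The crystal itself is untouched; only the value added to the time
    counter per tick changes, which is where the feedback loop closes."""
    return BASE_INCREMENT_NS * (1 - freq_error_ppb * 1e-9)

# Crystal running 100 ppb fast -> add slightly less than 8 ns per tick.
print(steer_increment(100.0))
```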
It is very clear that there is no feedback loop in 802.1AS (gPTP). I can't speak to other PTP versions. Local clocks (used, among other things for peer delay estimation) are specified to be free running with respect to the PTP time base and to connected peers, not disciplined to PTP Time, asynchronous, not synchronised, not syntonised. The peer delay mechanism equations compensate for both time offset and rate difference in peer local clocks. Furthermore, in order to average estimates over multiple measurements, time offset and rate difference are assumed to be stable (i.e. messing with the rate of a local clock violates invariant assumptions of the peer delay algorithm).
A few more qualifications: I am talking here about non-master peers. I think it might be compliant for the grand master to discipline the local clock to the time source (e.g. GPS PPS), provided it appears stable to connected peers, but it is not at all required by the protocol. Similarly, you might discipline Local Clock to some other stable clock, or to perform temperature compensation. In principle you could synchronise Local Clock to system clock (both free-running with respect to PTP time) so that your packet timestamps are automatically in sys-clock timebase. Once again, there is nothing in the PTP spec that requires this, but the potential utility is clear on a general purpose OS (not necessarily so on an embedded device).
There's only one clock in HW, not two. And you really want PTP time in HW for PPS/timestamp IO. (And gPTP uses 1588 HW, there are no special 802.1AS HW implementations that I'm aware of.)
Whether this matches the spec — no idea. My knowledge is from implementations… there could of course be ones that have two clocks. Can you link one?
In gPTP there is one hardware clock, yes. The Local Clock. PTP time is not a hardware clock, it is a virtual clock. The Local Clock is not synchronised to PTP time. Do you have a copy of 802.1AS-2020 there? Here are a few quick quotes resulting from searching for "local clock":
"3.16 local clock: A free-running clock, embedded in a respective entity (e.g., PTP Instance, CSN node), that provides a common time to that entity relative to an arbitrary epoch." (p. 21)
"Each PTP Instance measures, at each PTP Port, the ratio of the frequency of the PTP Instance at the other end of the link attached to that PTP Port to the frequency of its own clock. The cumulative ratio of the Grandmaster Clock frequency to the local clock frequency is accumulated in a standard organizational type, length, value (TLV) attached to the Follow_Up message (or the Sync message if the optional one-step processing is enabled). The frequency ratio of the Grandmaster Clock relative to the local clock is used in computing synchronized time, and the frequency ratio of the neighbor relative to the local clock is used in correcting the propagation time measurement." (p. 44)
"10.1.2.1 LocalClock entity The LocalClock entity is a free-running local clock (see 3.16) that provides a common time to the PTP Instance, relative to an arbitrary epoch. A PTP Instance contains a LocalClock entity. The requirements for the LocalClock entity are specified in B.1. All timestamps are taken relative to the LocalClock entity (see 8.4.3). The LocalClock entity also provides the value of currentTime (see 10.2.4.12), which is used in the state machines to specify the various timers.
NOTE—The epoch for the LocalClock entity can be the time that the PTP Instance is powered on. " (p.66)
Need I continue?
I may be mistaken, but I thought in this sub-thread we were talking about the PTP clock and the system clock (e.g. the x86 tsc). Only one of these is relevant to PTP but the system clock is the only one available to timestamp software events and so you may want to be able to convert between tsc time, NIC local clock time, or PTP time. If you want to schedule GPIO events on the NIC, given a PTP time you can compute the corresponding local clock time, and schedule the event on the GPIO. This does not require disciplining the NIC clock (i.e. the PTP local clock, as defined in my quotes above) to PTP time.
Re. asymmetries canceling out: OK, I oversimplified, and this is true in theory and often in practice. But having done this with nearly all generations of enterprise-type Broadcom ASICs from roughly 2008 onwards, I know there are so many variations to this behaviour that the only way to know is to precisely measure latencies in each direction for a variety of speeds, CT vs. S&F, frame sizes, and even bandwidth combinations, and see. I used to characterise switches for this, build test harnesses, measurement tools, etc., and I saw everything, ranging from: CT one way and S&F the other, but not for all speed combinations; CT behaviour regardless of enabling or disabling it; latency with quantised step characteristics in increments of X bytes because the switching fabric internally used X-byte cells; and CT only behaving like CT above certain frame sizes. There's just a lot to take into account. There are even cases where a certain level of background traffic _improves_ latency fairness and symmetry - an equivalent of keeping the caches hot.
The author's best bet at reliable numbers would be to get himself a Latte Panda Mu or another platform with TGPIO and measure against 1PPS straight from the CPU. That would be the ultimate answer. Failing that, at least a PTM NIC synced to the OS clock, but that will alter the noise profile of the OS clock.
But you and I know all this because we've been digging deep into the hardware and software guts of these things for years and have done it as a job - what's a home lab user to do? It's a never-ending learning exercise, and the key is to acknowledge the possible unknowns - by which I don't mean scientific unknowns, but that we don't know what we don't know - and bloggers sometimes don't do this.
Need the cards for that first ;). Still on cx5 here.
(Also some nVidia docs say you need cx7, but it's listed for cx6, not sure which is true…)
Meanwhile, here are some other articles:
NTP: https://austinsnerdythings.com/2025/02/14/revisiting-microse...
PTP: https://austinsnerdythings.com/2025/02/18/nanosecond-accurat...
https://www.jeffgeerling.com/blog/2025/diy-ptp-grandmaster-c...
Any asymmetry that is consistent is irrelevant.
On a more taboo note: while RasPis can make great little time servers, they have more drift and higher jitter, but that should not matter for a home setup and should not be surprising. If jitter is their concern, they should consider using mini-PCs: disable cpuspeed and all power management, confine/lock the min/max speed to half the CPU capabilities, and disable all services other than chrony. It will use more power, but it would address their concerns. They could also try different models of layer-2 switches. Consumer switches will add some artificial jitter, and that varies wildly by make, model, and even batch, but again, for a home network that should not matter. I think they are nitpicking. Perfect is the enemy of good, especially in a day and age when people prefer power saving over accuracy.
[Edit] As a side note, the aggressive min/max poll settings they are using can amplify the inefficiencies of consumer switches and NICs regardless of filter settings, and that can make the graphs more chaotic. They should consider re-testing on data-center-class servers, server NICs, and enterprise-class switches, or just reduce the polling to something reasonable for a home network: minpoll 1 / maxpoll 2 for clients, and minpoll 5 / maxpoll 7 for an edge server talking to a dozen stratum 1s with a high combinelimit. Presend should not be required even with default ARP neighbor GC times and intervals. Oh, and if you want to try something fun with the graphs, run chronyc makestep every minute from cron on every node - yeah, yeah, I know why one would not do that, and it's just cheating.
[1] - https://chrony-project.org/examples.html
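A hypothetical chrony.conf fragment matching the polling suggestion above (host names are placeholders; see the chrony examples page linked in [1] for complete configurations):

```
# Client, polling the local edge server aggressively:
server edge.lan iburst minpoll 1 maxpoll 2

# Edge server, talking to a dozen public stratum-1 servers more gently:
#   server <stratum1-host> iburst minpoll 5 maxpoll 7

# Allow combining more sources than the default:
combinelimit 10
```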
There's so much more that can be picked apart here because it's an absolute rabbit hole of a topic - for example, saturate the links a little or a little more, especially with bursty traffic in both directions (or do an 80-20 cycle), and watch those measurements go out the window and only with PTP-capable switches at every hop will you survive this. The Telecom industry has done it ad nauseam and for years with appropriate standardised measurements, test masks and requirements.
And this whole business is also not fundamentally PTP vs. NTP, because the principles are exactly the same; it's that PTP was designed with hardware timestamping in mind, and it would serve no purpose more useful than NTP had NTP gained support for one-step operation, hardware timestamping, and network assistance. But the default PTP profile uses known multicast groups, and thus known destination MACs, and that was the easiest entry into hardware packet matching - early "PTP-enabled" NICs only timestamped PTP packets (and most only multicast); only more modern ones can timestamp all packets, and that includes NTP.
And as far as the RasPi goes: for time sync, at least in terms of COTS equipment, Intel is king, but that's because they had smart people working hard for years to purposefully integrate time-aware functionality into their architectures (hey, Kevin and team!) - invariant TSC, ART, culminating in PCIe PTM. But this only matters when aiming for the tens-of-ns to single-digit-ns region.
You can easily deliver sub-10 ns sync to a NIC, but a huge source of uncertainty is time transfer from your hardware-timestamping NIC to the OS clock. PTM is the only way to do this in hardware, otherwise, with Solarflare being the only NON-PTM exception I've worked with, comparing NIC to OS time is literally reading the time register on the NIC and the kernel time in quick succession in batches (granted, with local interrupts disabled), and then picking the pair of reads that seems to have taken the least amount of time. Unknowns on top of unknowns.
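That "pick the fastest paired read" fallback can be sketched roughly as below; `read_nic_time` is a hypothetical stand-in (here just the monotonic clock) for the slow, jittery PCIe read of the NIC's time register:

```python
import time

def read_nic_time():
    # Hypothetical stand-in for reading the NIC's PHC time register;
    # on real hardware this would be a slow, variable-latency PCIe read.
    return time.monotonic_ns()

def best_offset(samples=16):
    """Bracket each NIC read with two system-clock reads and keep the
    sample whose bracket was tightest, i.e. the read least likely to
    have been delayed by an interrupt or bus stall."""
    best = None
    for _ in range(samples):
        sys_before = time.clock_gettime_ns(time.CLOCK_REALTIME)
        nic = read_nic_time()
        sys_after = time.clock_gettime_ns(time.CLOCK_REALTIME)
        window = sys_after - sys_before          # read uncertainty bound
        offset = sys_before + window // 2 - nic  # sys minus NIC at the midpoint
        if best is None or window < best[0]:
            best = (window, offset)
    return best  # (uncertainty_ns, sys_minus_nic_ns)

window_ns, offset_ns = best_offset()
print(f"uncertainty {window_ns} ns, sys-NIC offset {offset_ns} ns")
```

This is essentially what the kernel's PTP_SYS_OFFSET ioctl does in batches, minus the luxury of disabled interrupts.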
That pretty much sums it up and I agree with everything you stated. There are countless variables that one could spend a lifetime trying to understand, tune and compensate for and all of that changes with each combination of hardware and refreshing hardware is inevitable. It can be a never ending game. I just tune for good enough for my needs that being slightly better than defaults.
Graham: Synchronizing Clocks by Leveraging Local Clock Properties (2022) [pdf] (usenix.org) https://news.ycombinator.com/item?id=44860832
In particular the podcast about Jane Street's NTP setup was discussed.