macOS 26.2 Enables Fast AI Clusters with RDMA over Thunderbolt
Key topics
The revelation that macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt sparked a lively debate that quickly derailed into a discussion about HDR support on Macs. While some users complained that HDR on macOS looks "washed out" on non-Apple monitors, others countered that this is actually intended behavior, with some even suggesting that Windows HDR is not as seamless as claimed. As the discussion veered off-topic, some commenters prioritized AI advancements over HDR, while others took the opportunity to steer the conversation towards entirely unrelated social issues. Amidst the chaos, a consensus emerged that macOS HDR implementation has its quirks, particularly with non-Apple monitors.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 53m after posting
Peak period: 86 comments in 0-6h
Avg / period: 17.8
Based on 160 loaded comments
Key moments
- Story posted: Dec 12, 2025 at 3:41 PM EST (21 days ago)
- First comment: Dec 12, 2025 at 4:34 PM EST (53m after posting)
- Peak activity: 86 comments in 0-6h, the hottest window of the conversation
- Latest activity: Dec 15, 2025 at 12:43 PM EST (18 days ago)
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
https://www.youtube.com/shorts/sx9TUNv80RE
Which, personally, I find to be extremely ugly and gross and I do not understand why they thought this was a good idea.
The white and black levels of the UX are supposed to stay in SDR. That's a feature, not a bug.
If you mean the interface isn't bright enough, that's intended behavior.
If the black point is somehow raised, then that's bizarre and definitely unintended behavior. And I honestly can't even imagine what could be causing that to happen.
A couple of examples:
Kimi K2 Thinking (1 trillion parameters): https://x.com/awnihannun/status/1986601104130646266
DeepSeek R1 (671B): https://x.com/awnihannun/status/1881915166922863045
The release in 26.2 will enable us to do fast tensor parallelism, where each layer of the model is sharded across all machines. With this type of parallelism you can get close to an N-times speedup with N machines. The main challenge is latency, since you have to communicate much more frequently.
The way it typically works in an attention block: smaller portions of the Q, K and V linear layers are assigned to each node and processed independently. Attention, RoPE, norms, etc. are run on each node's output of that. Then, when the output linear layer is applied, an "all reduce" is computed which combines the outputs of all the nodes.
EDIT: just realized it wasn't clear -- this means that each node ends up holding the portion of the KV cache specific to its KV tensor shards. This can change based on the specific style of attention (e.g., in GQA, where there are fewer KV heads than ranks, you end up having to do some replication, etc.).
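For illustration, a minimal single-process sketch of that sharding pattern, using plain NumPy arrays to stand in for the nodes. The shapes are made up, RoPE and norms are omitted, and the actual MLX implementation will look different:

    import numpy as np

    # Illustrative sizes only -- not from any real model.
    N_NODES, D_MODEL, N_HEADS, SEQ = 4, 512, 8, 16
    D_HEAD = D_MODEL // N_HEADS
    HEADS_PER_NODE = N_HEADS // N_NODES

    rng = np.random.default_rng(0)
    x = rng.standard_normal((SEQ, D_MODEL))
    Wq, Wk, Wv = (rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(3))
    Wo = rng.standard_normal((D_MODEL, D_MODEL))

    def softmax(scores):
        e = np.exp(scores - scores.max(-1, keepdims=True))
        return e / e.sum(-1, keepdims=True)

    def node_forward(rank):
        """One node's share: its column slice of Q/K/V, attention over its own
        heads (its KV shard lives only here), and a partial output projection."""
        cols = slice(rank * HEADS_PER_NODE * D_HEAD, (rank + 1) * HEADS_PER_NODE * D_HEAD)
        q = (x @ Wq[:, cols]).reshape(SEQ, HEADS_PER_NODE, D_HEAD)
        k = (x @ Wk[:, cols]).reshape(SEQ, HEADS_PER_NODE, D_HEAD)
        v = (x @ Wv[:, cols]).reshape(SEQ, HEADS_PER_NODE, D_HEAD)
        probs = softmax(np.einsum("qhd,khd->hqk", q, k) / np.sqrt(D_HEAD))
        out = np.einsum("hqk,khd->qhd", probs, v).reshape(SEQ, -1)
        return out @ Wo[cols, :]  # row-sharded output projection -> partial result

    # The "all reduce": summing the partial outputs from every node
    # reproduces the unsharded layer exactly.
    y_sharded = sum(node_forward(r) for r in range(N_NODES))

    # Sanity check against a single-node forward pass.
    q = (x @ Wq).reshape(SEQ, N_HEADS, D_HEAD)
    k = (x @ Wk).reshape(SEQ, N_HEADS, D_HEAD)
    v = (x @ Wv).reshape(SEQ, N_HEADS, D_HEAD)
    attn = np.einsum("hqk,khd->qhd", softmax(np.einsum("qhd,khd->hqk", q, k) / np.sqrt(D_HEAD)), v)
    assert np.allclose(y_sharded, attn.reshape(SEQ, -1) @ Wo)

In a real cluster, the final sum is the only step that touches the interconnect, which is why the per-token latency of that all-reduce is the thing that dominates.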
What I am asking, however, is whether that will speed up decoding as close to linearly as it does prefilling.
In our benchmarks with MLX / mlx-lm it's as much as 3.5x for token generation (decoding) at batch size 1 over 4 machines. In that case you are memory bandwidth bound so sharding the model and KV cache 4-ways means each machine only needs to access 1/4th as much memory.
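A rough back-of-envelope for that memory-bandwidth argument, with deliberately round, made-up numbers (none of these are measured figures):

    # Decoding at batch size 1 roughly requires streaming every (local) weight
    # and KV-cache byte once per token. Illustrative numbers only.
    model_bytes = 400e9        # e.g. ~670B params at ~4.5 bits/weight (assumed)
    bandwidth = 800e9          # bytes/s of memory bandwidth per machine (assumed)
    comm_per_token = 0.010     # seconds of all-reduce latency per token (assumed)

    def tokens_per_second(n_machines):
        read_time = (model_bytes / n_machines) / bandwidth  # each machine streams 1/N of the weights
        return 1.0 / (read_time + (comm_per_token if n_machines > 1 else 0.0))

    for n in (1, 2, 4):
        print(f"{n} machine(s): {tokens_per_second(n):.1f} tok/s")
    # The fixed per-token communication term is why N machines come in
    # somewhat under an N-times speedup.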
Earlier this year I experimented with building a cluster to do tensor parallelism across large-cache CPUs (the AMD EPYC 7773X has 768 MB of L3). My thought was to keep an entire model in SRAM and take advantage of the crazy memory bandwidth between CPU cores and their cache, and use InfiniBand between nodes for the scatter/gather operations.
Turns out cross-core and PCIe latency absolutely dominate. The Infiniband fabric is damn fast once you get data to it, but getting it there quickly is a struggle. In theory CXL would help but I didn't have the budget for newer hardware. Perhaps modern Apple hardware is better for this than x86 stuff.
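Rough sizing of that idea, purely for illustration (the quantization level and model sizes below are arbitrary assumptions, not what the experiment actually used):

    # How many 768 MB L3 caches would it take just to hold the weights?
    l3_per_node = 768e6  # bytes of L3 per EPYC 7773X
    for params, bits in ((7e9, 4), (70e9, 4)):
        weight_bytes = params * bits / 8
        print(f"{params/1e9:.0f}B model @ {bits}-bit: {weight_bytes/1e9:.1f} GB "
              f"-> ~{weight_bytes / l3_per_node:.0f} nodes' worth of L3")

Even a small model spans several nodes' caches, which is exactly where the cross-node latency described above starts to bite.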
Exo-Labs: https://github.com/exo-explore/exo
This is nice. But if you can keep it on the GPU, do absolutely everything you can to keep it there. For example, the RTX 5090 with 32 GB of VRAM has on-card memory bandwidth equating to over 14 terabits per second. So for whatever you're doing:
First prize by far is on-GPU: 14 Tbps. Second prize is InfiniBand NDR at 400 Gbps. Third prize is multi-GPU in one box, where the PCIe bus is the limiter at 248 Gbps. (Fast RAM is 1.1 Tbps, so not the limiting factor.)
So 80 Gbps isn't great, but it's fine for tinkering.
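To put those numbers side by side, here is the idealized time to move a 1 GB activation blob over each link, using the headline figures above and ignoring protocol overhead:

    # 1 GB = 8 Gb of payload; times are idealized (no protocol overhead).
    links_gbps = {
        "on-card GDDR7 (RTX 5090)": 14_000,
        "InfiniBand NDR": 400,
        "PCIe-limited multi-GPU": 248,
        "Thunderbolt 5 (~80 Gbps)": 80,
    }
    for name, gbps in links_gbps.items():
        print(f"{name:26s} {8 / gbps * 1000:7.2f} ms per GB")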
InfiniBand NDR is 400 Gbps direct RDMA into GPU memory with 1 to 2 microseconds of latency. You begin to understand that Nvidia's real strength is being a networking company that has fast GPUs. You can move data that a CUDA kernel on one machine is working on to a kernel on a totally different machine without doing a device-to-host transfer. You're transferring from device to device at 400 Gbps with 1-2 microseconds of latency.
Yeah I know this is not the same market, but it's fun to contemplate.
Hopefully this makes it really nice for people who want to experiment with LLMs and run a local model, but doesn't mean well-funded companies will have a reason to grab them all the way they do GPUs.
That said, the need for them also faded. The new chips have performance every bit as good as the eGPU-enhanced Intel chips.
My suggestion is to accept that format and just provide a way to network them at a low level via PCIe or better.
This is an issue for some industry-standard software like CUDA, which does provide BSD drivers with ARM support that just never get adopted by Apple: https://www.nvidia.com/en-us/drivers/unix/
Having used both professionally, once you understand how to drive Apple's MDM, Mac OS is as easy to sysadmin as Linux. I'll grant you it's a steep learning curve, but so is Linux/BSD if you're coming at it fresh.
In certain ways it's easier - if you buy a device through Apple Business you can have it so that you (or someone working in a remote location) can take it out of the shrink wrap, connect it to the internet, and get a configured and managed device automatically. No PXE boot, no disk imaging, no having it shipped to you to configure and ship out again. If you've done it properly the user can't interrupt/corrupt the process.
The only thing they're really missing is an iLO. I can imagine how AWS solved that, but I'd love to know.
There's a reason why Macs are the minority in the datacenter even compared to Windows server.
What this does offer is a good alternative to GPUs for smaller scale use and research. At small scale it’s probably competitive.
Also, I’m curious and in case anyone that knows reads this comment:
Apple says they can't get the performance they want out of discrete GPUs.
Fair enough. And yet Nvidia became the most valuable company in the world selling GPUs.
So…
Now, I get that Apple's use case is essentially sealed consumer devices built with power-consumption and performance tradeoffs in mind.
But could Apple use its Apple Silicon tech to build a Mac Pro with its own expandable GPU options?
Or even other brands' GPUs, knowing they would be used for AI research, etc. If Apple ever makes friends with Nvidia again, of course :-/
What we know of Tim Cook's Apple is that it doesn't like to leave money on the table, and clearly they are leaving some right now!
Theoretically they could farm out the GPU to another company but it seems like they’re set on owning all of the hardware designs.
I think Apple is done with expansion slots, etc.
You'll likely see M5 Mac Studios fairly soon.
I think the hold up here is whether TSMC can actually deliver the M5 Pro/Ultra and whether the MLX team can give them a usable platform.
I guess there are other kinds of scientific simulation, very large dev work, and etc., but those things are quite a bit more niche.
Note fast sync workaround
Using more smaller nodes means your cross-node IO is going to explode. You might save money on your compute hardware, but I wouldn't be surprised if you'd end up with an even greater cost increase on the network hardware side.
I don't think I can recommend the Mac Studio for AI inference until the M5 comes out. And even then, it remains to be seen how fast those GPUs are or if we even get an Ultra chip at all.
The way this capability is exposed in the OS is that the computers negotiate an Ethernet bridge on top of the TB link. I suspect they're actually exposing PCIe Ethernet NICs to each other, but I'm not sure. But either way, a "Thunderbolt router" would just be a computer with a shitton of USB-C ports (in the same way that an "Ethernet router" is just a computer with a shitton of Ethernet ports). I suspect the biggest hurdle would actually just be sourcing an SoC with a lot of switching fabric but not a lot of compute. Like, you'd need Threadripper levels of connectivity but with like, one or two actual CPU cores.
[0] Like, last time I had to swap work laptops, I just plugged a TB cable between them and did an `rsync`.
https://docs.nvidia.com/cuda/gpudirect-rdma/index.html
The "R" in RDMA means there are multiple DMA controllers who can "transparently" share address spaces. You can certainly share address spaces across nodes with RoCE or Infiniband, but thats a layer on top
Home PCs are as cheap as they've ever been. Adjusted for inflation, the same can be said about "home use" Macs. The list price of an entry-level MacBook Air has been pretty much the same for more than a decade. Adjust for inflation, and you get a MacBook Air for half the real cost of the launch model that is massively better in every way.
A blip in high end RAM prices has no bearing on affordable home computing. Look at the last year or two and the proliferation of cheap, moderately to highly speced mini desktops.
I can get a Ryzen 7 system with 32 GB of DDR5 and a 1 TB drive delivered to my house before dinner tomorrow for $500 + tax.
That’s not depressing, that’s amazing!
The analogous PC for this era requires a large amount of high speed memory and specialized inference hardware.
Can they run SOTA LLMs? No. Can they run smaller, yet still capable LLMs? Yes.
However, I don’t think that the ability to run SOTA LLMs is a reasonable expectation for “a computer in every home” just a few years into that software even existing.
I feel like you're moving the goalposts to make your point that it has to be local compute to have access to AI. Why does it need to be local?
Update: I take it back. You can get access to AI for free.
The PC that I was talking about is here[https://a.co/d/6c8Udbp]. I live in Canada so translated the prices to USD. Remember that US stores are forced to hide a massive import tax in those prices. The rest of the world isn’t subject to that.
Yes, absolutely correct if you are talking about the short term; I was talking about the long term. If you are so certain, would you take this bet: any odds, any amount, that within 1 month I can buy 32 GB of new retail DDR5 in the US for at least 10% less than the $384 you cited? (Think very hard about why I might offer you infinite upside so confidently. It's not because I know where the price of RAM is going in the short term.)
> So with prime that's $439+139 for $578 which is only slightly higher than the cost without prime of $549.99.
At this point I can't tell if you are arguing in bad faith or just unfamiliar with how Prime works. Just in case: you have cited the cost of Prime for a full year. You can buy just a month of Prime for a maximum price of $14.99 (that's how I got $455) if you have already used your free trial and don't qualify for any discounts. Prime also allows cancellation within 14 days of signing up for a paid option, which is more than enough time to order a computer, have it delivered, and cancel for a full refund.
So really, if you use a trial or ask for a refund of your Prime fees, the price is $439.
I agree that we've seen similar fluctuations in the past and the price of compute trends down in the long-term. This could be a bubble, which it likely is, in which case prices should return to baseline eventually. The political climate is extremely challenging at this time though so things could take longer to stabilize. Do you think we're in this ride for months or years?
If the current RAM supply crisis continues, it is very likely that these kinds of offers will disappear and that systems like this will become more expensive as well, not to mention all the other products that rely on DRAM components.
I also don’t believe RAM prices will drop again anytime soon, especially now that manufacturers have seen how high prices can go while demand still holds. Unlike something like graphics cards, RAM is not optional, it is a fundamental requirement for building any computer (or any device that contains one). People don’t buy it because they want to, but because they have to.
In the end, I suspect that some form of market-regulating mechanism may be required, potentially through government intervention. Otherwise, it’s hard for me to see what would bring prices down again, unless Chinese manufacturers manage to produce DRAM at scale, at significantly lower cost, and effectively flood the market.
Just the 5090 GPU costs $3k+; what are you even talking about?
How much has a base model MacBook Air changed in price over the last 15 years? With inflation, it's gotten cheaper.
The current base model is $999, and literally better in every way except thickness on one edge.
If we constrain ourselves to just the last 15 years: the $999 MBA was released that year ($1,488 in real dollars). The list price has remained the same, with the exception of when they sold the discontinued 11" MBAs for $899.
It’s actually kind of wild how much better and cheaper computers have gotten.
I’m talking about the hundreds of affordable models that are perfectly suitable for everything up to and including AAA gaming.
The existence of expensive, and very much optional, high end computer parts does not mean that affordable computers are not more incredible than ever.
Just because cutting edge high end parts are out of reach to you, does not mean that perfectly usable computers are too, as I demonstrated with actual specs and prices in my post.
That’s what I’m talking about.
That's an amazing price, but I'd like to see where you're getting it. 32GB of RAM alone costs €450 here (€250 if you're willing to trust Amazon's February 2026 delivery dates).
Getting a PC isn't that expensive, but after the blockchain hype and then the AI hype, prices have yet to come down. All estimations I've seen will have RAM prices increase further until the summer of next year, and the first dents in pricing coming the year after at the very earliest.
The personal computing situation is great right now. RAM is temporarily more expensive, but it's definitely not ending any eras.
We're going back to the "consumer PCs have 8GB of RAM era" thanks to the AI bubble.
When the 2019 Mac Pro came out, it was "amazing" how many still photography YouTubers all got launch day deliveries of the same BTO Mac Pro, with exactly the same spec:
18 core CPU, 384GB memory, Vega II Duo GPU and an 8TB SSD.
Or, more likely, Apple worked with them and made sure each of them had this Mac on launch day, while they waited for the model they actually ordered. Because they sure as hell didn't need an $18,000 computer for Lightroom.
Nowadays I fire off async jobs that involve thousands of requests and billions of tokens, yet it costs basically the same as if I didn't.
Maybe it takes a different type of person than I am, but all these "pay-as-you-go"/tokens/credits platforms make me nervous to use; I end up either not using them or spending time trying to "optimize". Meanwhile, investing in hardware and infrastructure I can run at home is something my head has no problem just rolling with.
And just because you're mostly using local models doesn't mean you can't use API hosted models in specific contexts. Of course, then the same dread sets in, but if you can do 90% of the tokens with local models and 10% with pay-per-usage API hosted models, you get the best of both worlds.
1. The power button is in an awkward location, meaning rackmounting them (either 10" or 19" rack) is a bit cumbersome (at best)
2. Thunderbolt is great for peripherals, but as a semi-permanent interconnect, I have worries over the port's physical stability... wish they made a Mac with QSFP :)
3. Cabling will be important, as I've had tons of issues with TB4 and TB5 devices with anything but the most expensive Cable Matters and Apple cables I've tested (and even then...)
4. macOS remote management is not nearly as efficient as Linux, at least if you're using open source / built-in tooling
To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely, without a GUI, but it looks like you _have_ to use something like Screen Sharing or an IP KVM to log into the UI, to click the right buttons to initiate the upgrade.
Trying "sudo softwareupdate -i -a" will install minor updates, but not full OS upgrades, at least AFAICT.
https://www.owc.com/solutions/thunderbolt-dock
It's a poor imitation of old ports that had screws on the cables, but should help reduce inadvertent port stress.
The screw only works with limited devices (i.e., not the Mac Studio end of the cable), but it can also be adhesive-mounted.
https://eshop.macsales.com/item/OWC/CLINGON1PK/
130 more comments available on Hacker News