1.5 TB of VRAM on Mac Studio – RDMA over Thunderbolt 5
Key topics
The debate rages on about Apple's alleged reluctance to adopt high-end tech for their Mac lineup, with commenters dissecting the company's priorities and the implications of a recent experiment achieving 1.5 TB of VRAM on a Mac Studio via RDMA over Thunderbolt 5. Some, like behnamoh, argue that using Thunderbolt 5 instead of more robust connections like QSFP links holds Apple back from truly embracing high-performance capabilities. Others, such as donavanm and PunchyHamster, contend that Apple's focus on consumer and creative markets means enterprise-grade features are low on their priority list, with donavanm noting that "enterprise never ever mattered" in Apple's revenue numbers. Meanwhile, spacedcowboy throws a wrench in the works by pointing out Apple's growing presence in the datacenter with their own devices, potentially signaling a shift in priorities. As the discussion unfolds, it becomes clear that the real question on everyone's mind is: what's holding Apple back from unleashing its full potential in the high-end market?
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 46m after posting
Peak period: 149 comments (Day 1)
Avg / period: 32 comments
Based on 160 loaded comments
Key moments
- Story posted: Dec 18, 2025 at 5:23 PM EST (15 days ago)
- First comment: Dec 18, 2025 at 6:08 PM EST (46m after posting)
- Peak activity: 149 comments in Day 1, the hottest window of the conversation
- Latest activity: Dec 29, 2025 at 12:52 PM EST (4d ago)
- Something like a DGX QSFP link (200Gb/s, 400Gb/s) instead of TB5. Otherwise the economics of this RDMA setup, while impressive, don't make sense.
- Neural accelerators to get prompt prefill time down. I don't expect RTX 6000 Pro speeds, but something like 3090/4090 would be nice.
- 1TB of unified memory in the maxed out version of Mac Studio. I'd rather invest in more RAM than more devices (centralized will always be faster than distributed).
- The ability to overclock the system? I know it probably will never happen, but my expectation of a Mac Studio is not the same as of a laptop, and I'm TOTALLY okay with it consuming 600+ W. Currently it's capped at ~250W.
By the time I left in '10 the total revenue from Mac hardware was something like 15% of revenue. I'm honestly surprised there's anyone who cared enough to package the business services for Mac Minis.
So if everything else is printing cash for a HUGE addressable consumer market at premium price points why would they try and compete with their own ODMs on more-or-less commodity enterprise gear?
Requiring one for iOS development, they were already back in the green.
Microsoft gave Apple $250 million. The next quarter Apple turned around and spent $100 million on PowerComputing’s Mac assets.
Apple lost over a billion more before it became profitable. The $150 million net wouldn't have been make or break.
Now, Microsoft promising to keep Office on the Mac was a big deal.
At one point, Microsoft was making more money on each Mac sold than Apple. Microsoft wasn’t doing it for charity. If it were, why did it do it before the agreement and continue to support Mac today?
Apple got credit from banks before either the announcement or Steve Jobs' return.
I could count on my fingers how many Macs I saw being used between being born in the '70s and the 2000s, ten at most.
My university graduation project was porting a visualisation framework from NeXTSTEP to Windows, because even then the university could not see a future with NeXT.
The fact that people believe the cash injection, not only from Microsoft, which allowed for a survival plan, had nothing to do with Apple escaping bankruptcy is kind of interesting.
And yes Excel was initially developed for Mac.
And it's not my "believing", it's math. Apple lost far more than the net $150 million before it became profitable.
This isn't me reading the history books. My first computer was an Apple //e in 1986, and by 1993 I was following what was going on with Apple in real time via Usenet and TidBITS (around since 1990), and I lied to get a free subscription to MacWEEK.
Some of the machine-designs that consumers are able to buy seem to have a marked resemblance to the feature-set that the datacenter people were clamouring for. Just saying...
Have there been leaks or something about these internal machines? I am curious to know more.
I can see the dollar signs in their eyes right now.
Aftermarkets are a nice reflection of durable value, and there's a massive one for iPhones and a smaller one for quick flameout startup servers, but not much money in 5 - 7 year old servers.
This isn’t any different with QSFP unless you’re suggesting that one adds a 200GbE switch to the mix, which:
* Adds thousands of dollars of cost,
* Adds 150W or more of power usage and the accompanying loud fan noise that comes with that,
* And perhaps most importantly adds measurable latency to a networking stack that is already higher latency than the RDMA approach used by the TB5 setup in the OP.
https://www.bhphotovideo.com/c/product/1926851-REG/mikrotik_...
Put another way, see the graphs in the OP where he points out that the old way of clustering performs worse the more machines you add? I’d expect that to happen with 200GbE also.
e.g. QSFP28 (100GbE) splits into 4x SFP28s (25GbE each), because QSFP28 is just 4 lanes of SFP28.
Same goes for QSFP112 (400GbE). Splits into SFP112s.
It’s OSFP that can be split in half, i.e. into QSFPs.
https://www.fs.com/products/101806.html
But all of this is pretty much irrelevant to my original point.
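As a quick, hedged summary of the lane math described a few comments above (nominal per-lane rates from public form-factor specs, not figures from the thread), a small sketch:

```python
# Sketch of the QSFP breakout arithmetic discussed above.
# Nominal per-lane data rates; actual signaling rates differ slightly.
form_factors = {
    # name: (lanes, GbE per lane, breakout module)
    "QSFP28":  (4, 25,  "SFP28"),    # 4 x 25G  = 100GbE
    "QSFP56":  (4, 50,  "SFP56"),    # 4 x 50G  = 200GbE
    "QSFP112": (4, 100, "SFP112"),   # 4 x 100G = 400GbE
}
for name, (lanes, rate, child) in form_factors.items():
    print(f"{name}: {lanes} x {rate}GbE = {lanes * rate}GbE total, "
          f"breaks out into {lanes} x {child}")
```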
There's also splitting at the module level. For example, I have a PCIe card that is actually a fully self-hosted 6-port 100GbE switch with its own onboard Atom management processor. The card only has 2 MPO fiber connectors, but each has 12 fibers, each of which can carry 25Gbps. You need a special fiber breakout cable, but you can mix anywhere between 6x 100GbE ports and 24x 25GbE ports.
https://www.silicom-usa.com/pr/server-adapters/switch-on-nic...
Are the smaller 98DX7325 and 98DX7321 the same chip with fuses blown? I wouldn't be surprised.
The switch in question has eight 50Gb ports, and the switch silicon apparently supports configurations that use all of its lanes in groups of four to provide only 200Gb ports. So it might be possible, with the right (non-standard) configuration on the switch, to use a four-way breakout cable to combine four of its 50Gb ports into a single 200Gb connection to a client device.
I did some digging to find the switching chip: Marvell 98DX7335
Seems confirmed here: https://cdn.mikrotik.com/web-assets/product_files/CRS812-8DS...
And here: https://cdn.mikrotik.com/web-assets/product_files/CRS812-8DS...
From Marvell's specs: https://www.marvell.com/content/dam/marvell/en/public-collat... Again, those are some wild numbers if I have the correct model. Normally Mikrotik includes switching bandwidth in their own specs, but not in this case.
Besides stuff like this switch, they've also produced pretty cool little micro-switches you can power over PoE and run as WLAN hotspots, e.g. to distance your mobile device from some network you don't really trust, or to more or less maliciously bridge a cable network through a wall because your access to the building is limited.
Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it.
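For readers wondering what "adding support" might look like, here is a minimal, hedged sketch using coremltools to convert a toy PyTorch module to an FP16 Core ML program and ask the runtime to prefer the Neural Engine. The model, shapes, and deployment target are placeholders; whether an op actually lands on the ANE is ultimately up to the Core ML scheduler.

```python
# Hedged sketch: steer a toy model toward the Apple Neural Engine via Core ML.
# Placeholder model and shapes; actual ANE dispatch is decided by the runtime.
import numpy as np
import torch
import coremltools as ct

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.GELU()).eval()
example = torch.randn(1, 512)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=tuple(example.shape))],
    compute_precision=ct.precision.FLOAT16,   # FP16 path mentioned above
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer CPU + Neural Engine
    minimum_deployment_target=ct.target.macOS14,
)
print(mlmodel.predict({"x": example.numpy().astype(np.float32)}))
```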
> this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!!
Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA is run on top of?
Or, Apple could pay for the engineers to add it.
If you daisy-chain four nodes, then traffic between nodes #1 and #4 eats up all of nodes #2 and #3's bandwidth, and you pay a big latency penalty. So, absent a switch, the fully connected mesh is the only way to have fast access to all the memory.
If you have 3 links per box, then you can set up 8 nodes with a max distance of 2 hops and an average distance of 1.57 hops. That's not too bad. It's pretty close to having 2 links each to a big switch.
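A minimal sketch (my own construction, not from the thread) of one such 3-links-per-node topology on 8 nodes: a ring plus a chord from each node to the one opposite it. A BFS per node confirms the 2-hop maximum and the roughly 1.57-hop average quoted above.

```python
# Verify the hop-count claim for an assumed 8-node, 3-link-per-node topology.
from collections import deque

NODES = 8
# ring edges plus a chord to the opposite node: exactly 3 links per node
edges = [(i, (i + 1) % NODES) for i in range(NODES)] + [(i, i + 4) for i in range(4)]

adj = {n: set() for n in range(NODES)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def hops_from(src):
    """Hop counts from src to every other node via BFS."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return [d for n, d in dist.items() if n != src]

all_hops = [h for n in range(NODES) for h in hops_from(n)]
print("links per node:", sorted(len(adj[n]) for n in adj))    # all 3
print("max hops:", max(all_hops))                             # 2
print("avg hops:", round(sum(all_hops) / len(all_hops), 2))   # ~1.57
```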
I do wonder where this limitation comes from, since on the M3 Ultra Mac Studios the front USB-C ports are also Thunderbolt 5, for a total of six Thunderbolt ports: https://www.apple.com/mac-studio/specs/
I am always leery of recommending decisions based on something that's not already proven to work, though, so I would say don't bet on all ports being usable. They very well may be.
Wasn't it loaned, i.e. he didn't buy any at all?
Apple should have loaned enough to flex.
I don't think the Mac Studio has a thermal design capable of dissipating 650W of heat for anything other than bursty workloads. Need to look at the Mac Pro design for that.
Overclocking long ago was an amazing, saintly act, milking a lot of extra performance that was just there waiting, without major downsides. But these days, chips are usually already well tuned. You can feed double or triple the power into the chip with adequate cooling, but the gain is so unremarkable. +10%, +15%, +20% is almost never going to be a make or break difference for your work, and doing so at double or triple the power budget is an egregious waste.
So many of the chips out there are already delivered well past their optimum efficiency point, largely for bragging rights. I disregard out of hand anyone trying to push more power through their chips. The exponential decay of efficiency you keep pushing into is an anti-quest; it works against good. In almost all cases.
If your problem will not scale and dumping a ton of power into one GPU or one CPU socket is all you've got, fine, your problem is bad and you have to deal with that. But for 90% of people, begging for more power proves you don't actually know jack, and my personal recommendation is that all such points of view deserve massive downvoting by anyone with half a brain.
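To put rough numbers on that efficiency argument (illustrative figures only, not measurements of any specific chip): a +10-20% overclock that needs 1.5-3x the power cuts performance per watt dramatically.

```python
# Illustrative perf-per-watt arithmetic for the overclocking argument above.
# All multipliers are hypothetical, not measurements.
scenarios = {
    "stock":         (1.00, 1.0),   # (performance multiplier, power multiplier)
    "+10% @ 1.5x W": (1.10, 1.5),
    "+20% @ 2x W":   (1.20, 2.0),
    "+20% @ 3x W":   (1.20, 3.0),
}
for name, (perf, power) in scenarios.items():
    print(f"{name:>14}: {perf / power:.2f}x perf/W vs stock")
```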
I'd tweak individual GPUs' various clocks and volts to optimize this. I'd even go so far as to tweak fan speed ramps on the cards themselves (those fans don't power themselves! There's whole Watts to save there!).
I worked to optimize the efficiency of even the power from the wall.
But that was a system that ran, balls-out, 24/7/365.
Or at least it ran that way until it got warmer outside, and warmer inside, and I started to think about ways to scale mining eth in the basement vs. cooling the living space of the house to optimize returns. (And I never quite got that sorted before they pulled the rug on mining.)
And that story is whatever it is, but: Power efficiency isn't always the most-sensible goal. Sometimes, maximum performance is a better goal. We aren't always mining Ethereum.
Jeff's (quite lovely) video and associated article are a story about just one man using a stack of consumer-oriented-ish hardware in amusing -- to him -- ways, with local LLM bots.
That stack of gear is a personal computer. (A mighty-expensive one on any inflation-adjusted timeline, but what was constructed was definitely used as a personal computer.)
Like most of our personal computers (almost certainly including the one you're reading this on), it doesn't need to be optimized for a 24/7 100% workload. It spends a huge portion of its time waiting for the next input. And unlike mining Eth in the winter in Ohio: Its compute cycles are bursty, not constant, and are ultimately limited by the input of one human.
So sure: I, like Jeff, would also like to see how it would work with the balls[2] running further out. The whole rig is going to spend most of its time either idling or off anyway: we might as well get some work done when a human is in front of it, even if each token costs more in that configuration than it does OOTB.
It theoretically even can clock up when being actively-used (and suck all the power), and clock back down when idle (and resume being all sleepy and stuff).
That's a well-established concept that Intel has variously called SpeedStep and/or Turbo Boost -- and those things work for bursty workloads, and have worked in that way for a very long time now.
[1]: Y'all can hate me for being a small part of that problem. It's allowed.
[2]: https://en.wikipedia.org/wiki/Centrifugal_governor
My office-room was heated mostly by resistance, plus whatever gas-fired heat trickled in through the doorway.
I didn't have as much power available there as I had in the basement, but I had enough to mine a bit of crypto to supplement the resistance heater. :)
From one perspective: It was never directly profitable to do this. Other than eth, nothing has ever been profitable-enough for me to care about.
From another perspective: I was going to burn the energy anyway. The Joules cost the same and add the same amount of warmth either way, so I might as well get them with a side dish of free crypto.
Good times.
(These days, I transcode videos with Tdarr during the winter.)
Or they are simply not-rich people who cannot afford to purchase extra hardware to run in parallel. Electricity is cheap. GPUs are not. So I want to get every ounce of power out of the precious few GPUs I can afford to own.
(And don't point at clouds. Running AI on someone else's cloud is like telling a shadetree mechanic to rent a car instead of fixing his own.)
American :)
[roughly 23 us cents / kWh on my last bill]
"On-Peak/Weekdays 4 p.m. – 9 p.m. = 39.1c"
Back when you bought a 233 MHz chip with RAM at 66 MHz, ran the bus at 100 MHz, which also increased your RAM speed if it could handle it, and everything was faster.
> But these days, chips are usually already well tuned. You can feed double or triple the power into the chip with adequate cooling, but the gain is so unremarkable. +10%, +15%, +20% is almost never going to be a make or break difference for your work
20% in synthetic benchmarks maybe, or very particular loads. Because you only overclock the CPU these days, anything hitting the RAM won't even see 20%.
Too lazy to figure out which cryptic setting is exact watts.
One of these days I'll configure the video card too.
I was actually looking for benchmarks earlier this week along those lines - ideally covering the whole slate of Arrow Lake processors running at various TDPs. Not much available on the web though.
Anyone even remotely on the fence about whether or not they should bother with all this stuff: just read the OP, or just this tl;dr: the answer is no, it is not worth it.
What is a computer?
I'm not sure why anyone would buy a Mac Studio instead of a GB10 machine for this use case, though.
For an AI-only use case, the GB10s make sense, but they are only OK as desktop workstations, and I’m not sure for how long DGX OS will be updated, as dedicated AI machines have somewhat short lives. Apple computers, OTOH, have much longer lives, and desktops live the longest. I retired my Mac Mini a year after the machine was no longer getting OS updates, and it was still going strong.
/s
It's all a long game, folks. Play it long.
In what ways? The only switching I've seen is away from desktop memory.
If you meant glut of memory suitable for datacenter GPUs, I don't expect that nearly so soon. That market can absorb extra chips pretty easily unless we see a really harsh pop really soon.
We have always had a RAM shortage. We've also always been at war with Eastasia.
The 2019 i9 MacBook Pro has entered the chat.
M4 already hit the necessary speed per channel, and M5 is well above it. If they actually release an Ultra that much bandwidth is guaranteed on the full version. Even the smaller version with 25% fewer memory channels will be pretty close.
We already know Max won't get anywhere near 1TB/s since Max is half of an Ultra.
It is a little sad that they gave someone an uber machine and this was the best he could come up with.
Question answering is interesting but not the most interesting thing one can do, especially with a home rig.
The realm of the possible:
- Video generation: CogVideoX at full resolution, longer clips; Mochi or Hunyuan Video with extended duration
- Image generation at scale: FLUX batch generation, 50 images simultaneously
- Fine-tuning: actually train something; show LoRA on a 400B model, or full fine-tuning on a 70B
But I suppose "You have it for the weekend" means chatbot go brrrrr and snark.
Because web search is so broken these days, if you want a clean answer instead of wading through pages of SEO nonsense. It's really common (even) amongst non-techy friends that "I'll ask ChatGPT" has replaced "I'll Google it".
Yeah, that's what I wanted to see too.
Seems like the ecosystem is rapidly evolving
I would have expected that going from one node (which can't hold the weights in RAM) to two nodes would have increased inference speed by more than the measured 32% (21.1t/s -> 27.8t/s).
With no constraint on RAM (4 nodes) the inference speed is less than 50% faster than with only 512GB.
Am I missing something?
I don't think that's true. At least not without heavy performance loss in which case "just be memory mapped" is doing a lot of work here.
By that logic GPUs could run models much larger than their VRAM would otherwise allow, which doesn't seem to be the case unless heavy quantization is involved.
MoE is great for distributed deployments, because you can maintain a distribution of experts that matches your workload, and you can try to saturate each expert and thereby saturate each node.
With a cluster of two 512GB nodes, you have to send half the weights (350GB) over a TB5 connection. But you have to do this exactly once on startup.
With a single 512GB node, you'll be loading weights from disk each time you need a different expert, potentially for each token. Depending on how many experts you're loading, you might be loading 2GB to 20GB from disk each time.
Unless you're going to shut down your computer after generating a couple of hundred tokens, the cluster wins.
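A rough break-even sketch of that trade-off, using assumed figures for the link rate, NVMe throughput, and per-token expert load (none of these numbers come from the article):

```python
# Back-of-the-envelope: one-time weight sync over TB5 vs. per-token expert
# loads from disk. All figures below are assumptions for illustration only.
ONE_TIME_TRANSFER_GB = 350   # half the weights shipped to the second node, once
TB5_GBIT_PER_S       = 80    # nominal Thunderbolt 5 data rate
PER_TOKEN_LOAD_GB    = 2     # experts paged in from disk per token (low end above)
NVME_GB_PER_S        = 6     # fast local NVMe sequential read throughput

startup_s   = ONE_TIME_TRANSFER_GB * 8 / TB5_GBIT_PER_S   # ~35 s, paid once
per_token_s = PER_TOKEN_LOAD_GB / NVME_GB_PER_S           # ~0.33 s, every token

print(f"one-time sync: {startup_s:.0f}s, per-token disk load: {per_token_s:.2f}s")
print(f"cluster pulls ahead after ~{startup_s / per_token_s:.0f} tokens "
      "(ignoring compute time itself)")
```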
I definitely would not be buying an M3 Ultra right now on my own dime.
I have an M4 Max I can use to bridge any gap...
Which I guess is the point of this for Apple, but still.
Makes one wonder what Apple uses for their own servers. I guess maybe they have some internal M-series server product they just haven't bothered to release to the public, and features like this are downstream of that?
I guess they prefer that third parties deal with that. There are rack-mount shelves for Mac Minis and Studios.
- Why is the tooling so lame?
- What do they, themselves, use internally?
Stringing together Mac Minis (or a "Studio", whatever) with Thunderbolt cables ... Christ.
Or do they have some real server-grade product coming down the line, and are releasing this ahead of it so that 3rd party software supports it on launch day?
That they use INTERNALLY for their servers? I could certainly see this being useful for that.
Mostly I think this is just to get money from the AI boom. They already had TB5, it’s not like this was costing them additional hardware. Just some time that probably paid off on their internal model training anyway.
Given up is not a given. A lot of the exec team has been changing.
If I were in charge of a business (and I'm an Apple fan), I wouldn't touch them. I'd have no faith they're in it for the long term. I think that would be a common view.
https://cottonbureau.com/p/4RUVDA/shirt/mac-pro-believe-dark...
These machines are very much internal - you can cram a lot of M-series (to use the public nomenclature) chips onto a rack-sized PCB. I was never under the impression they were destined for anything other than Apple datacenters though...
As I mentioned above, it seems to me there are a couple of features that appeared in the customer-facing designs that were inspired by what the datacenter people wanted on their own PCBs.
https://developer.apple.com/documentation/macos-release-note...
Which I'm sure you saw in literally yesterday's thread about the exact same thing.
I wrote about this earlier this week, in particular sharding the KV cache across GPUs, and how the network is the new memory hierarchy.
https://buildai.substack.com/p/kv-cache-sharding-and-distrib...
But I mostly want to say thanks for everything you do. Your good vibes are deeply appreciated and you are an inspiration.
66 more comments available on Hacker News