macOS 26.2 Enables Fast AI Clusters with RDMA over Thunderbolt
Key topics
The revelation that macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt sparked a lively debate that quickly derailed into a discussion about HDR support on Macs. While some users complained that HDR on macOS looks "washed out" on non-Apple monitors, others countered that this is actually intended behavior, with some even suggesting that Windows HDR is not as seamless as claimed. As the discussion veered off-topic, some commenters prioritized AI advancements over HDR, while others took the opportunity to steer the conversation towards entirely unrelated social issues. Amidst the chaos, a consensus emerged that macOS HDR implementation has its quirks, particularly with non-Apple monitors.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 53m after posting
Peak period: 86 comments in 0-6h
Avg / period: 17.8
Based on 160 loaded comments
Key moments
- Story posted: Dec 12, 2025 at 3:41 PM EST (21 days ago)
- First comment: Dec 12, 2025 at 4:34 PM EST (53m after posting)
- Peak activity: 86 comments in 0-6h, the hottest window of the conversation
- Latest activity: Dec 15, 2025 at 12:43 PM EST (18 days ago)
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
https://www.youtube.com/shorts/sx9TUNv80RE
Which, personally, I find to be extremely ugly and gross and I do not understand why they thought this was a good idea.
The white and black levels of the UX are supposed to stay in SDR. That's a feature, not a bug.
If you mean the interface isn't bright enough, that's intended behavior.
If the black point is somehow raised, then that's bizarre and definitely unintended behavior. And I honestly can't even imagine what could be causing that to happen.
A couple of examples:
Kimi K2 Thinking (1 trillion parameters): https://x.com/awnihannun/status/1986601104130646266
DeepSeek R1 (671B): https://x.com/awnihannun/status/1881915166922863045
The release in 26.2 will enable us to do fast tensor parallelism, where each layer of the model is sharded across all machines. With this type of parallelism you can get close to an N-times speedup with N machines. The main challenge is latency, since you have to communicate much more frequently.
The way it typically works in an attention block: smaller portions of the Q, K and V linear layers are assigned to each node and processed independently. Attention, RoPE, norms, etc. are run on each node's output of that. Then, when the output linear layer is applied, an "all reduce" is computed which combines the outputs of all the nodes.
EDIT: just realized it wasn't clear -- this means that each node ends up holding the portion of the KV cache specific to its KV tensor shards. This can change based on the specific style of attention (e.g., in GQA, where there are fewer KV heads than ranks, you end up having to do some replication, etc.).
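For illustration, a minimal single-process sketch of that sharding pattern, using plain NumPy arrays to stand in for the nodes. The shapes are made up, RoPE and norms are omitted, and the actual MLX implementation will look different:

    import numpy as np

    # Illustrative sizes only -- not from any real model.
    N_NODES, D_MODEL, N_HEADS, SEQ = 4, 512, 8, 16
    D_HEAD = D_MODEL // N_HEADS
    HEADS_PER_NODE = N_HEADS // N_NODES

    rng = np.random.default_rng(0)
    x = rng.standard_normal((SEQ, D_MODEL))
    Wq, Wk, Wv = (rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(3))
    Wo = rng.standard_normal((D_MODEL, D_MODEL))

    def softmax(scores):
        e = np.exp(scores - scores.max(-1, keepdims=True))
        return e / e.sum(-1, keepdims=True)

    def node_forward(rank):
        """One node's share: its column slice of Q/K/V, attention over its own
        heads (its KV shard lives only here), and a partial output projection."""
        cols = slice(rank * HEADS_PER_NODE * D_HEAD, (rank + 1) * HEADS_PER_NODE * D_HEAD)
        q = (x @ Wq[:, cols]).reshape(SEQ, HEADS_PER_NODE, D_HEAD)
        k = (x @ Wk[:, cols]).reshape(SEQ, HEADS_PER_NODE, D_HEAD)
        v = (x @ Wv[:, cols]).reshape(SEQ, HEADS_PER_NODE, D_HEAD)
        probs = softmax(np.einsum("qhd,khd->hqk", q, k) / np.sqrt(D_HEAD))
        out = np.einsum("hqk,khd->qhd", probs, v).reshape(SEQ, -1)
        return out @ Wo[cols, :]  # row-sharded output projection -> partial result

    # The "all reduce": summing the partial outputs from every node
    # reproduces the unsharded layer exactly.
    y_sharded = sum(node_forward(r) for r in range(N_NODES))

    # Sanity check against a single-node forward pass.
    q = (x @ Wq).reshape(SEQ, N_HEADS, D_HEAD)
    k = (x @ Wk).reshape(SEQ, N_HEADS, D_HEAD)
    v = (x @ Wv).reshape(SEQ, N_HEADS, D_HEAD)
    attn = np.einsum("hqk,khd->qhd", softmax(np.einsum("qhd,khd->hqk", q, k) / np.sqrt(D_HEAD)), v)
    assert np.allclose(y_sharded, attn.reshape(SEQ, -1) @ Wo)

In a real cluster, the final sum is the only step that touches the interconnect, which is why the per-token latency of that all-reduce is the thing that dominates.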
What I am asking, however, is whether that will speed up decoding as close to linearly as it does prefilling.
In our benchmarks with MLX / mlx-lm it's as much as 3.5x for token generation (decoding) at batch size 1 over 4 machines. In that case you are memory bandwidth bound so sharding the model and KV cache 4-ways means each machine only needs to access 1/4th as much memory.
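A rough back-of-envelope for that memory-bandwidth argument, with deliberately round, made-up numbers (none of these are measured figures):

    # Decoding at batch size 1 roughly requires streaming every (local) weight
    # and KV-cache byte once per token. Illustrative numbers only.
    model_bytes = 400e9        # e.g. ~670B params at ~4.5 bits/weight (assumed)
    bandwidth = 800e9          # bytes/s of memory bandwidth per machine (assumed)
    comm_per_token = 0.010     # seconds of all-reduce latency per token (assumed)

    def tokens_per_second(n_machines):
        read_time = (model_bytes / n_machines) / bandwidth  # each machine streams 1/N of the weights
        return 1.0 / (read_time + (comm_per_token if n_machines > 1 else 0.0))

    for n in (1, 2, 4):
        print(f"{n} machine(s): {tokens_per_second(n):.1f} tok/s")
    # The fixed per-token communication term is why N machines come in
    # somewhat under an N-times speedup.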
Earlier this year I experimented with building a cluster to do tensor parallelism across large-cache CPUs (the AMD EPYC 7773X has 768 MB of L3). My thought was to keep an entire model in SRAM and take advantage of the crazy memory bandwidth between CPU cores and their cache, and use InfiniBand between nodes for the scatter/gather operations.
Turns out cross-core and PCIe latency absolutely dominate. The Infiniband fabric is damn fast once you get data to it, but getting it there quickly is a struggle. In theory CXL would help but I didn't have the budget for newer hardware. Perhaps modern Apple hardware is better for this than x86 stuff.
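Rough sizing of that idea, purely for illustration (the quantization level and model sizes below are arbitrary assumptions, not what the experiment actually used):

    # How many 768 MB L3 caches would it take just to hold the weights?
    l3_per_node = 768e6  # bytes of L3 per EPYC 7773X
    for params, bits in ((7e9, 4), (70e9, 4)):
        weight_bytes = params * bits / 8
        print(f"{params/1e9:.0f}B model @ {bits}-bit: {weight_bytes/1e9:.1f} GB "
              f"-> ~{weight_bytes / l3_per_node:.0f} nodes' worth of L3")

Even a small model spans several nodes' caches, which is exactly where the cross-node latency described above starts to bite.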
Exo-Labs: https://github.com/exo-explore/exo
This is nice. But if you can keep it on the GPU, do absolutely everything you can to keep it there. For example, the RTX 5090 with 32 GB of VRAM has on-card memory bandwidth equating to over 14 terabits per second. So for whatever you're doing:
First prize by far is on-GPU: 14 Tbps. Second prize is InfiniBand NDR at 400 Gbps. Third prize is multi-GPU in one box, where the PCIe bus is the limiter at 248 Gbps. (Fast RAM is 1.1 Tbps, so not the limiting factor.)
So 80 Gbps isn't great, but it's fine for tinkering.
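To put those numbers side by side, here is the idealized time to move a 1 GB activation blob over each link, using the headline figures above and ignoring protocol overhead:

    # 1 GB = 8 Gb of payload; times are idealized (no protocol overhead).
    links_gbps = {
        "on-card GDDR7 (RTX 5090)": 14_000,
        "InfiniBand NDR": 400,
        "PCIe-limited multi-GPU": 248,
        "Thunderbolt 5 (~80 Gbps)": 80,
    }
    for name, gbps in links_gbps.items():
        print(f"{name:26s} {8 / gbps * 1000:7.2f} ms per GB")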
InfiniBand NDR is 400 Gbps direct RDMA into GPU memory with 1 to 2 microseconds of latency. You begin to understand that Nvidia's real strength is being a networking company that has fast GPUs. You can move data that a CUDA kernel on one machine is working on to a kernel on a totally different machine without doing a device-to-host transfer. You're transferring from device to device at 400 Gbps with 1-2 microseconds of latency.
Yeah I know this is not the same market, but it's fun to contemplate.
Hopefully this makes it really nice for people who want to experiment with LLMs and run a local model, but doesn't mean well-funded companies will have a reason to grab them all the way they do GPUs.
That said, the need for them also faded. The new chips have performance every bit as good as the eGPU-enhanced Intel chips.
My suggestion is to accept that format and just provide a way to network them at a low level via PCIe or better.
This is an issue for some industry-standard software like CUDA, which does provide BSD drivers with ARM support that just never get adopted by Apple: https://www.nvidia.com/en-us/drivers/unix/
Having used both professionally, once you understand how to drive Apple's MDM, Mac OS is as easy to sysadmin as Linux. I'll grant you it's a steep learning curve, but so is Linux/BSD if you're coming at it fresh.
In certain ways it's easier - if you buy a device through Apple Business you can have it so that you (or someone working in a remote location) can take it out of the shrink wrap, connect it to the internet, and get a configured and managed device automatically. No PXE boot, no disk imaging, no having it shipped to you to configure and ship out again. If you've done it properly the user can't interrupt/corrupt the process.
The only thing they're really missing is an iLO. I can imagine how AWS solved that, but I'd love to know.
There's a reason why Macs are the minority in the datacenter even compared to Windows server.
What this does offer is a good alternative to GPUs for smaller scale use and research. At small scale it’s probably competitive.
Also, I’m curious and in case anyone that knows reads this comment:
Apple says they can't get the performance they want out of discrete GPUs.
Fair enough. And yet Nvidia became the most valuable company in the world selling GPUs.
So…
Now, I get that Apple's use case is essentially sealed consumer devices built with power-consumption and performance tradeoffs in mind.
But could Apple use its Apple Silicon tech to build a Mac Pro with its own expandable GPU options?
Or even other brands' GPUs, knowing they would be used for AI research, etc. If Apple ever makes friends with Nvidia again, of course :-/
What we know of Tim Cook's Apple is that it doesn't like to leave money on the table, and clearly they are leaving some right now!
Theoretically they could farm out the GPU to another company but it seems like they’re set on owning all of the hardware designs.
I think Apple is done with expansion slots, etc.
You'll likely see M5 Mac Studios fairly soon.
I think the hold up here is whether TSMC can actually deliver the M5 Pro/Ultra and whether the MLX team can give them a usable platform.
I guess there are other kinds of scientific simulation, very large dev work, and etc., but those things are quite a bit more niche.
Note fast sync workaround
Using more smaller nodes means your cross-node IO is going to explode. You might save money on your compute hardware, but I wouldn't be surprised if you'd end up with an even greater cost increase on the network hardware side.
I don't think I can recommend the Mac Studio for AI inference until the M5 comes out. And even then, it remains to be seen how fast those GPUs are or if we even get an Ultra chip at all.
The way this capability is exposed in the OS is that the computers negotiate an Ethernet bridge on top of the TB link. I suspect they're actually exposing PCIe Ethernet NICs to each other, but I'm not sure. But either way, a "Thunderbolt router" would just be a computer with a shitton of USB-C ports (in the same way that an "Ethernet router" is just a computer with a shitton of Ethernet ports). I suspect the biggest hurdle would actually just be sourcing an SoC with a lot of switching fabric but not a lot of compute. Like, you'd need Threadripper levels of connectivity but with like, one or two actual CPU cores.
[0] Like, last time I had to swap work laptops, I just plugged a TB cable between them and did an `rsync`.
https://docs.nvidia.com/cuda/gpudirect-rdma/index.html
The "R" in RDMA means there are multiple DMA controllers who can "transparently" share address spaces. You can certainly share address spaces across nodes with RoCE or Infiniband, but thats a layer on top
Home PCs are as cheap as they've ever been. Adjusted for inflation, the same can be said about "home use" Macs. The list price of an entry-level MacBook Air has been pretty much the same for more than a decade. Adjust for inflation, and you get a MacBook Air for half the real cost of the launch model that is massively better in every way.
A blip in high end RAM prices has no bearing on affordable home computing. Look at the last year or two and the proliferation of cheap, moderately to highly speced mini desktops.
I can get a Ryzen 7 system with 32 GB of DDR5 and a 1 TB drive delivered to my house before dinner tomorrow for $500 + tax.
That’s not depressing, that’s amazing!
The analogous PC for this era requires a large amount of high speed memory and specialized inference hardware.
Can they run SOTA LLMs? No. Can they run smaller, yet still capable LLMs? Yes.
However, I don’t think that the ability to run SOTA LLMs is a reasonable expectation for “a computer in every home” just a few years into that software even existing.
I feel like you're moving the goalposts to make your point that it has to be local compute to have access to AI. Why does it need to be local?
Update: I take it back. You can get access to AI for free.
The PC that I was talking about is here[https://a.co/d/6c8Udbp]. I live in Canada so translated the prices to USD. Remember that US stores are forced to hide a massive import tax in those prices. The rest of the world isn’t subject to that.
Yes, absolutely correct if you are talking about the short term; I was talking about the long term. If you are so certain, would you take this bet: any odds, any amount, that within 1 month I can buy 32 GB of new retail DDR5 in the US for at least 10% less than the $384 you cited? (Think very hard about why I might offer you infinite upside so confidently. It's not because I know where the price of RAM is going in the short term.)
> So with prime that's $439+139 for $578 which is only slightly higher than the cost without prime of $549.99.
At this point I can't tell if you are arguing in bad faith or just unfamiliar with how Prime works. Just in case: you have cited the cost of Prime for a full year. You can buy just a month of Prime for a maximum price of $14.99 (that's how I got $455) if you have already used your free trial and don't qualify for any discounts. Prime also allows cancellation within 14 days of signing up for a paid option, which is more than enough time to order a computer, have it delivered, and cancel for a full refund.
So really, if you use a trial or ask for a refund of your Prime fees, the price is $439.
I agree that we've seen similar fluctuations in the past and the price of compute trends down in the long-term. This could be a bubble, which it likely is, in which case prices should return to baseline eventually. The political climate is extremely challenging at this time though so things could take longer to stabilize. Do you think we're in this ride for months or years?
If the current RAM supply crisis continues, it is very likely that these kinds of offers will disappear and that systems like this will become more expensive as well, not to mention all the other products that rely on DRAM components.
I also don’t believe RAM prices will drop again anytime soon, especially now that manufacturers have seen how high prices can go while demand still holds. Unlike something like graphics cards, RAM is not optional, it is a fundamental requirement for building any computer (or any device that contains one). People don’t buy it because they want to, but because they have to.
In the end, I suspect that some form of market-regulating mechanism may be required, potentially through government intervention. Otherwise, it’s hard for me to see what would bring prices down again, unless Chinese manufacturers manage to produce DRAM at scale, at significantly lower cost, and effectively flood the market.
Just the 5090 GPU costs $3k+; what are you even talking about?
How much has a base model MacBook Air changed in price over the last 15 years? With inflation, it's gotten cheaper.
The current base model is $999, and literally better in every way except thickness on one edge.
If we constrain ourselves to just the last 15 years: the $999 MBA was released that year ($1,488 in real dollars). The list price has remained the same, with the exception of when they sold the discontinued 11" MBAs for $899.
It’s actually kind of wild how much better and cheaper computers have gotten.
I’m talking about the hundreds of affordable models that are perfectly suitable for everything up to and including AAA gaming.
The existence of expensive, and very much optional, high end computer parts does not mean that affordable computers are not more incredible than ever.
Just because cutting edge high end parts are out of reach to you, does not mean that perfectly usable computers are too, as I demonstrated with actual specs and prices in my post.
That’s what I’m talking about.
That's an amazing price, but I'd like to see where you're getting it. 32GB of RAM alone costs €450 here (€250 if you're willing to trust Amazon's February 2026 delivery dates).
Getting a PC isn't that expensive, but after the blockchain hype and then the AI hype, prices have yet to come down. All estimations I've seen will have RAM prices increase further until the summer of next year, and the first dents in pricing coming the year after at the very earliest.
The personal computing situation is great right now. RAM is temporarily more expensive, but it's definitely not ending any eras.
We're going back to the "consumer PCs have 8GB of RAM era" thanks to the AI bubble.
When the 2019 Mac Pro came out, it was "amazing" how many still photography YouTubers all got launch day deliveries of the same BTO Mac Pro, with exactly the same spec:
18 core CPU, 384GB memory, Vega II Duo GPU and an 8TB SSD.
Or, more likely, Apple worked with them and made sure each of them had this Mac on launch day, while they waited for the model they actually ordered. Because they sure as hell didn't need an $18,000 computer for Lightroom.
Nowadays I fire off async jobs that involve thousands of requests and billions of tokens, yet it costs basically the same as if I didn't.
Maybe it takes a different type of person than I am, but all these "pay-as-you-go"/tokens/credits platforms make me nervous to use; I end up either not using them or spending time trying to "optimize". Meanwhile, investing in hardware and infrastructure I can run at home is something my head has no problem just rolling with.
And just because you're mostly using local models doesn't mean you can't use API hosted models in specific contexts. Of course, then the same dread sets in, but if you can do 90% of the tokens with local models and 10% with pay-per-usage API hosted models, you get the best of both worlds.
1. The power button is in an awkward location, meaning rackmounting them (either 10" or 19" rack) is a bit cumbersome (at best)
2. Thunderbolt is great for peripherals, but as a semi-permanent interconnect, I have worries over the port's physical stability... wish they made a Mac with QSFP :)
3. Cabling will be important, as I've had tons of issues with TB4 and TB5 devices with anything but the most expensive Cable Matters and Apple cables I've tested (and even then...)
4. macOS remote management is not nearly as efficient as Linux, at least if you're using open source / built-in tooling
To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely, without a GUI, but it looks like you _have_ to use something like Screen Sharing or an IP KVM to log into the UI, to click the right buttons to initiate the upgrade.
Trying "sudo softwareupdate -i -a" will install minor updates, but not full OS upgrades, at least AFAICT.
https://www.owc.com/solutions/thunderbolt-dock
It's a poor imitation of old ports that had screws on the cables, but should help reduce inadvertent port stress.
The screw only works with limited devices (i.e., not the Mac Studio end of the cable), but it can also be adhesive-mounted.
https://eshop.macsales.com/item/OWC/CLINGON1PK/
130 more comments available on Hacker News