Local AI Is Driving the Biggest Change in Laptops in Decades
Key topics
The laptop landscape is on the cusp of a revolution driven by local AI, sparking debate about the future of memory and processing power. Commenters are divided on whether we'll see a surge in affordable laptops with 128+ GB of RAM, with some predicting a flood of second-hand memory and others pointing out that high-capacity RAM has been available in workstation laptops for years. Meanwhile, alternative technologies like compute-in-flash and memristors are being touted as potential game-changers, although some are skeptical about their viability. Across the thread, commenters broadly agree a significant shift is coming, even as they disagree on how soon and in what form.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 3h after posting
- Peak period: 139 comments in Day 1
- Avg / period: 32 comments
- Based on 160 loaded comments
Key moments
- 01 Story posted: Dec 22, 2025 at 7:12 PM EST (18 days ago)
- 02 First comment: Dec 22, 2025 at 10:04 PM EST (3h after posting)
- 03 Peak activity: 139 comments in Day 1, the hottest window of the conversation
- 04 Latest activity: Jan 2, 2026 at 4:25 PM EST (7 days ago)
In three years we will be swimming in more RAM than we know what to do with.
I’ll take those previously super premium high capacity ECC RAM chips, packaged as workstation DIMMs or on Apple-style SOCs, please.
The fact that nowadays there are few to no laptops with four RAM slots is entirely artificial.
Lesson learned: you should always listen to that voice inside your head that says: "but I need it…" lol
> How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly. It’s not possible to run these models on today’s consumer hardware, so real-world tests just can’t be done.
We know exactly the performance needed for a given responsiveness. TOPS is just a measurement, independent of the type of hardware it runs on.
The fewer TOPS, the slower the model runs, so the user experience suffers. Memory bandwidth and latency play a huge role too. And context: increase the context and the LLM becomes much slower.
We don't need to wait for consumer hardware to know how much is needed. We can calculate that for given situations.
It also implies that small models are not useful at all.
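To make that "we can calculate it" point concrete, here is a minimal back-of-the-envelope sketch (mine, not from the thread). It assumes a dense decoder-only model, roughly 2 FLOPs per parameter per generated token, and one full pass over the weights per token; the model size, quantization, and target speed are illustrative.

```python
def required_budget(params_billion: float, bytes_per_param: float, target_tok_s: float):
    """Rough compute and memory-bandwidth budget for a target decode speed.

    Assumes a dense model: ~2 FLOPs per parameter per token and one full
    read of the weights per generated token (ignores KV cache and overheads).
    """
    flops_per_token = 2.0 * params_billion * 1e9
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    tops_needed = flops_per_token * target_tok_s / 1e12   # tera-ops per second
    gbs_needed = bytes_per_token * target_tok_s / 1e9     # GB/s of bandwidth
    return tops_needed, gbs_needed

# Illustrative example: an 8B model quantized to ~4 bits (0.5 bytes/param)
# decoding at 20 tokens/s.
tops, gbs = required_budget(params_billion=8, bytes_per_param=0.5, target_tok_s=20)
print(f"~{tops:.2f} TOPS of compute, ~{gbs:.0f} GB/s of memory bandwidth")
```

Even with these rough numbers, memory bandwidth rather than raw TOPS is usually the tighter constraint for single-stream decoding, which matches the commenter's point about bandwidth and latency.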
I think the massive cloud investments will push things away from local AI, unfortunately. That trend makes local memory expensive, and all those cloud billions have to be made back, so all the vendors are pushing their cloud subscriptions. I'm sure some functions will be local, but the bulk of it will be cloud, sadly.
Don't worry! Sam Altman is on it. Making sure there never is healthy competition that is.
https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram...
Margins. AI usage can pay a lot more. Even if they sell less, it can still be more profitable.
In the past there wasn't a high-margin use case. Servers didn't command such a high premium.
They took the bait during COVID and failed, so there's still fear of oversupply.
"SNAPDRAGON X PLUS PROCESSOR - Achieve more everyday with responsive performance for seamless multitasking with AI tools that enhance productivity and connectivity while providing long battery life"
I don't want this garbage on my laptop, especially when it's running off its battery! Running AI on your laptop is like playing Starcraft Remastered on the Xbox or Factorio on your Steam Deck. I hear you can play DOOM on a pregnancy test too. Sure, you can, but it's just going to be a tedious, inferior experience.
Really, this is just a fine example of how overhyped AI is right now.
>I don't want this garbage on my laptop, especially when it's running off its battery!
The one bit of good news is it's not going to impact your battery life because it doesn't do any on-device processing. It's just calling an LLM in the cloud.
Linux hears your cry. You have a choice. Make it.
AAA Games with anti-cheat that don't support Linux.
Video editing (DaVinci Resolve exists but is a pain to get up and running on many distros, KDenLive/OpenShot don't really cut it for most)
Adobe Suite (Photoshop/Lightroom specifically, and Premiere for video editing) - would like to see Affinity support Linux but it hasn't happened so far. GIMP and DarkTable aren't really substitutes unless you pour a lot of time into them.
Tried moving to Linux on my laptop this past month; made it a month before a reinstall of Windows 11. Had issues with the WiFi chip (managed to fix it, but had to edit config files deep in the system, not ideal); on Fedora with LUKS encryption, after a kernel update the keyboard wouldn't work to input the encryption key; and there's no Windows Hello-like support (face ID). Had the most success with EndeavourOS, but running Arch is a chore for most.
It's getting there, best it's ever been, but there's still hurdles.
GIMP isn't a full solution, sure, but it works for what I need. Darktable does way more than I've ever wanted, so I can forgive it for the one time it crashed. Inkscape and Blender both exceed my needs as well.
And Adobe is so user hostile, that I feel I need to call you a mean name to prove how I feel.... dummy!
Yes, I already feel bad, and I'm sorry. But trolling aside, listing applications that treat users like shit, aren't reasons to stay on the platform that also treats you like shit.
I get it, sometimes being treated like shit is worth it because it's easier now that you're used to being disrespected. But an aversion to the effort it'd take for you to climb the learning curve of something different isn't a valid reason to help the disrespectful trash companies making the world worse recruit more people for them to treat like trash.
Just because you use it, doesn't make it worth recommending.
I know Adobe are... c-words, but their software is industry standard for a reason.
We definitely play very different games, I wouldn't touch it if you paid me. So I'm sure we both have a bit of sample bias in our expected rates of linux compatibility. Especially since EA is another company like Adobe.
They're industry standard because they were first. Not necessarily because they were better. They do have a feature set that's near impossible to beat, not even I can pretend like they don't. I'm just saying, respect and fairness is more important to me, than content aware fill ever will be.
Also, doesn't the Adobe suite work on Linux?
Photoshop CC 2024 apparently works somewhat, but with no GPU support, and the removal tool doesn't work.
https://appdb.winehq.org/objectManager.php?sClass=version&iI...
Basically, no.
I really don't understand people that want to play games so badly that they are willing to install a literal rootkit on their devices. I can understand if you're a pro gamer but it feels stupid to do it otherwise.
But a lot of the time it's peer-pressure for wanting to play with friends who couldn't care less.
Heck, I even refuse to use vscode because I don't like their current hegemony over dev tooling.
I've also been a Linux dev over 10 years (kernel and userspace).
Fact is, gaming is still better on Windows - for my specific usecases. I only use my gaming PC for gaming, nothing else. It's basically isolated. Sometimes it boots Linux so I can run AI workloads on its juicy graphics card.
Windows has HDR (I know Linux support is there, but it's 'experimental' and screwing with Linux is my day job), HDR10+ gaming, auto HDR for SDR content, anti-cheat, better support (it's the default, easy to find fixes online etc), better or more consistent performance, better NVidia driver support (more up-to-date, latest fixes for games, etc), GeForce Experience for auto settings, NVidia control panel to easily set global 3D settings (limited to 3FPS below refresh, etc), better VRR/G-Sync support... and in the end PC gamepass is a FANTASTIC deal; it's the sole reason I play as many indies as I do.
I really don't see this changing for me in the future, a win10 pro license that keeps getting upgraded for free is a small price for the convenience.
If you don't use it, it will have no impact on your device. And it's not sending your data to the cloud except for anything you paste into it.
Windows is going more and more into AI and embedding it into the core of the OS as much as it can. It's not "an app"; even if that were true now, it wouldn't be true for very long. The strategy is well communicated.
MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
Laptop manufacturers are making laptops that can run an LLM locally, but there's no point in that unless there's a local LLM to run (and Windows won't have that because Copilot). Are they going to be pre-installing Llama on new laptops?
Are we going to see a new power user / normal user split? Where power users buy laptops with LLMs installed, that can run them, and normal folks buy something that can call Copilot?
Any ideas?
MS doesn't care where your data is; they're happy to go digging through your C drive to collect/mine whatever they want (assuming you can avoid all the dark patterns they use to push you to save everything on OneDrive anyway), and they'll record all your interactions with any other AI using Recall, which they'll also back up to their cloud.
For example, the LG gram I recently got came with just such an app named Chat, though the "ai button" on the keyboard (really just right alt or control, I forget which) defaults to copilot.
If there's any tension at all, it's just who gets to be the default app for the "ai button" on the keyboard that I assume almost nobody actually uses.
For a slightly more charitable perspective, agentic AI means that there is still a bunch of stuff happening on the local machine, it's just not the inference itself.
"AI PC" branded devices get "Copilot+" and additional crap that comes with that due to the NPU. Despite desktops having GPUs with up to 50x more TOPs than the requirement, they don't get all that for some reason https://www.thurrott.com/mobile/copilot-pc/323616/microsoft-...
But unified memory IS truly what makes an AI-ready PC. Apple Silicon proves that. People are willing to pay the premium, and I suspect unified memory will still be around and bringing us benefits even if no one cares about LLMs in 5 years.
This is a nice companion to the article: https://www.pcworld.com/article/2965927/the-great-npu-failur...
The thing is nowhere near the performance of a MacBook, but it's silent and the battery lasts ages, which is a far cry from the same laptop with an Intel CPU, which is what many are running.
The company removes a lot of the AI bloat, though.
But yeah, fresh install of OS is a must for any new computer.
A great analogy because there is Starcraft for a console - Nintendo 64 - and it is quite awkward. Split-screen multiplayer included.
With graphics processing, you need a lot of bandwidth to get stuff in and out of the graphics card for rendering on a high-resolution screen, lots of pixels, lots of refreshes, lots of bandwidth... With LLMs, a relatively small amount of text goes in and a relatively small amount of text comes out over a reasonably long amount of time. The amount of internal processing is huge relative to the size of input and output. I think NVIDIA and a few other companies already started going down that route.
But graphics cards will probably still be useful for Stable Diffusion, and especially for AI-generated video, as the input and output bandwidth is much higher.
- The house is the disk
- You are the RAM
- The truck is the VRAM
There won't be a single time you can observe yourself carrying the weight of everything being moved out of the house because that's not what's happening. Instead you can observe yourself taking many tiny loads until everything is finally moved, at which point you yourself should not be loaded as a result of carrying things from the house anymore (but you may be loaded for whatever else you're doing).
Viewing active memory bandwidth can be more complicated than it'd seem to set up, so the easier way is to just view your VRAM usage as you load in the model freshly into the card. The "nvtop" utility can do this for most any GPU on Linux, as well as other stats you might care about as you watch LLMs run.
This is why they use high bandwidth memory for VRAM.
Yes.
It stays on the HBM, but it needs to get shuffled to the place where it can actually do the computation. It's a lot like a normal CPU: the CPU can't do anything with data in system memory; it has to be loaded into a CPU register. So for every token that is generated, it has to read every parameter in a dense model.
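A minimal sketch of that "read every parameter per token" point (my illustrative numbers, not the commenter's): if weight reads dominate, memory bandwidth sets an upper bound on decode speed.

```python
def max_decode_tok_s(model_size_gb: float, mem_bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s when the full weight set is streamed per token
    (dense model; ignores KV cache, activations, and kernel overheads)."""
    return mem_bandwidth_gb_s / model_size_gb

# Hypothetical comparison: a ~4 GB quantized model on ~100 GB/s laptop DDR5
# versus ~1000 GB/s of HBM/GDDR on a discrete accelerator.
print(max_decode_tok_s(4, 100))    # ~25 tok/s
print(max_decode_tok_s(4, 1000))   # ~250 tok/s
```

That ratio is the reason high-bandwidth memory matters here: shuffling the same weights ten times faster raises the ceiling on tokens per second by roughly the same factor.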
First, GPGPU is powerful and flexible. You can make an "AI-specific accelerator", but it wouldn't be much simpler or much more power-efficient - while being a lot less flexible. And since you need to run traditional graphics and AI workloads both in consumer hardware? It makes sense to run both on the same hardware.
And bandwidth? GPUs aren't exactly starved for I/O bandwidth. 4K@60FPS seems like a lot of data to push in or out, but it's nothing compared to how fast a modern PCIe 5.0 x16 link goes. AI accelerators are more of the same.
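As a quick sanity check on that comparison (illustrative, uncompressed-framebuffer numbers, not figures from the thread):

```python
# Uncompressed 4K RGBA frames at 60 FPS versus a PCIe 5.0 x16 link.
width, height, bytes_per_pixel, fps = 3840, 2160, 4, 60
framebuffer_gb_s = width * height * bytes_per_pixel * fps / 1e9
pcie5_x16_gb_s = 64  # roughly 64 GB/s per direction for PCIe 5.0 x16

print(f"4K@60 RGBA: ~{framebuffer_gb_s:.1f} GB/s")        # ~2.0 GB/s
print(f"PCIe 5.0 x16: ~{pcie5_x16_gb_s} GB/s per direction")
```

So even an uncompressed 4K@60 stream uses only a few percent of the host link, which supports the point that display output is not where the bandwidth pressure is.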
I feel like the reverse has been true since after the Pascal era.
> How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly.
Why not extrapolate from the open-source AIs that are available? The most powerful open-source model I know of is Kimi K2, at over 600 GB. Running it at acceptable speed requires 600+ GB of GPU/NPU memory. Even $2000-3000 AI-focused PCs like the DGX Spark or Strix Halo typically top out at 128 GB. Frontier models will only run on something that costs many times a typical consumer PC, and it's only going to get worse with RAM pricing.
In 2010 the typical consumer PC had 2-4 GB of RAM. Now the typical PC has 12-16 GB. This suggests RAM size doubling perhaps every 5 years at best. If that's the case, we're 25-30 years away from the typical PC having enough RAM to run Kimi K2.
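A quick sketch of that doubling-time extrapolation (the starting point, target, and doubling period are the commenter's rough figures, not measurements):

```python
import math

current_gb = 16        # typical consumer PC today
target_gb = 600        # roughly what a Kimi-K2-class model needs in memory
doubling_years = 5     # ~2-4 GB (2010) to ~12-16 GB (mid-2020s)

doublings = math.log2(target_gb / current_gb)   # ~5.2 doublings
years = doublings * doubling_years              # ~26 years
print(f"{doublings:.1f} doublings, roughly {years:.0f} years")
```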
But the typical user will never need that much RAM for basic web browsing, etc. The typical computer RAM size is not going to keep growing indefinitely.
What about cheaper models? It may be possible to run a "good enough" model on consumer hardware eventually. But I suspect that for at least 10-15 years, typical consumers (HN readers may not be typical!) will prefer capability, cheapness, and especially reliability (not making mistakes) over being able to run the model locally. (Yes AI datacenters are being subsidized by investors; but they will remain cheaper, even if that ends, due to economies of scale.)
The economics dictate that AI PCs are going to remain a niche product, similar to gaming PCs. Useful AI capability is just too expensive to add to every PC by default. It's like saying flying is so important, everyone should own an airplane. For at least a decade, likely two, it's just not cost-effective.
10-15 years?!!!! What is the definition of good enough? Qwen3 8B or A30B are quite capable models that run on a lot of hardware even today. SOTA is not just getting bigger; it's also getting more intelligent and running more efficiently. There have been massive gains in intelligence at the smaller model sizes. It is just highly task dependent. Arguably some of these models are "good enough" already, and the level of intelligence and instruction following is much better than even 1 year ago. Sure, not Opus 4.5 level, but much could still be done without that level of intelligence.
> it is highly task dependent... much could be done without that level of intelligence
This is an enthusiast's glass-half-full perspective, but casual end users are gonna have a glass-half-empty perspective. Qwen3-8B is impressive, but how many people use it as a daily driver? Most casual users will toss it as soon as it screws up once or twice.
The phrase you quoted in particular was imprecise (sorry) but my argument as a whole still stands. Replace "consumer hardware" with "typical PCs" - think $500 bestseller laptops from Walmart. AI PCs will remain niche luxury products, like gaming PCs. But gaming PCs benefit from being part of gaming culture and because cloud gaming adds input latency. Neither of these affects AI much.
Efficiency gains exist and likely will continue, as well as hardware generally accelerating, as software and hardware start to become co-optimized. This will take time, no doubt, but 10-15 years is hilariously long in this world. The iPhone has barely been out that long.
And to be clear I think the other arguments are valid I just think the timeline is out of whack
Maybe it wouldn't be 100% of computer users, but maybe 10-20% of power users would have one, including programmers who want to keep their personal code out of the training set, and so on.
I would not be surprised though if some consumer application made it desirable for each individual, or each family, to have local AI compute.
It's interesting to note that everyone owns their own computer, even though a personal computer sits idle half the day, and many personal computers hardly ever run at 80% of their CPU capacity. So the inefficiency of owning a personal AI server may not be as much of a barrier as it would seem.
Isn't that the Mac Studio already? Ok, it seems to max at 512 GB.
Part of the reason that RAM isn't growing faster is that there's no need for that much RAM at the moment. Technically you can put multiple TB of RAM in your machine, but no-one does that because it's a complete waste of money [0]. Unless you're working in a specialist field, 16 GB of RAM is enough, and adding more doesn't make anything noticeably faster.
But given a decent use-case, like running an LLM locally, you'd find demand for lots more RAM, and that would drive supply, and new technology developments, and in ten years it'll be normal to have 128 GB of RAM in a baseline laptop.
Of course, that does require that there is a decent use-case for running an LLM locally, and your point that that is not necessarily true is well-made. I guess we'll find out.
[0] apart from a friend of mine working on crypto who had a desktop Linux box with 4TB of RAM in it.
A basic last-generation PC with something like a 3060 Ti (12 GB) is more than enough to get started. My current rig pulls less than 500 W with two cards (3060 + 5060). And, given the current temperature outside, the rig helps heat my home. So I am not contributing to global warming, water consumption, or any other datacenter-related environmental evil.
lol
Our whole home is heated with <500W on average: at this moment the heat pump is drawing 501W (H4 boundary) at close to freezing outside, and its demand is intermittent.
- Addition of more—and faster—memory.
- Consolidation of memory.
- Combination of chips on the same silicon.
All of these are also happening for non AI reasons. The move to SoC that really started with the M1 wasn't because of AI, but unified memory being the default is something we will see in 5 years. Unlike 3D TV.
No it did not.
- People wanting more memory is not a novel feature. I am excited to find out how many people immediately want to disable the AI nonsense to free up memory for things they actually want to do.
- Same answer.
- I think the drive towards SOCs has been happening already. Apple's M-series utterly demolishes every PC chip apart from the absolute bleeding-edge available, includes dedicated memory and processors for ML tasks, and it's mature technology. Been there for years. To the extent PC makers are chasing this, I would say it's far more in response to that than anything to do with AI.
probably not after scam altman bought up half the world's supply for his shit company
They might not use Apple silicon often. Other options are encouraging.
What's he talking about? It's trivial to calculate that.
[0]: https://www.edge-ai-vision.com/2024/05/2024-edge-ai-and-visi...
> hundreds of millions of parameters
lol
lmao, even
If they made the M series fully open for Linux (I know Asahi is working away) I probably would never buy another non-M series processor again.
MicroCenter has (had? OOS near me) M4 Minis for $400!
A remarkable bargain, even more so considering the recent hardware price hikes.
32 GB of RAM will be for enthusiasts with deep pockets, and professionals. Anything over that, exclusively professionals.
The conspiracy theorist inside me is telling me that big AI companies like OpenAI would rather see people using their puny laptops as terminals / shells only, to reach sky-based models, than let them have beefy laptops and local models.
> However, for the average laptop that’s over a year old, the number of useful AI models you can run locally on your PC is close to zero.
This straight up isn’t true IMO.
Also, macOS only has around 10% desktop market share globally, and maybe double that in the US.
https://www.mactech.com/2025/03/18/the-mac-now-has-14-8-of-t...
Though maybe it depends on what you're doing? (Although if you're doing something simple like embeddings, then you don't need the Apple hardware in the first place.)
Do you work offline often?
Essential.
Maybe for creative suggestions and editing it’d be ok.
92 more comments available on Hacker News