Microsoft CTO Says He Wants to Swap Most AMD and Nvidia GPUs for Homemade Chips
Posted 3 months ago · Active 3 months ago
cnbc.com · Tech · story · High profile
calm · mixed · Debate 70/100
Key topics
AI Hardware
Custom Silicon
GPU Market
Microsoft's CTO plans to replace most AMD and Nvidia GPUs with in-house AI chips, sparking discussion on the implications for the GPU market and AI hardware development.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 8m after posting
Peak period: 56 comments in 0-2h
Avg per period: 11.7
Comment distribution: 129 data points (based on 129 loaded comments)
Key moments
- Story posted: Oct 3, 2025 at 10:48 AM EDT (3 months ago)
- First comment: Oct 3, 2025 at 10:56 AM EDT (8m after posting)
- Peak activity: 56 comments in 0-2h (hottest window of the conversation)
- Latest activity: Oct 4, 2025 at 5:15 PM EDT (3 months ago)
ID: 45463642 · Type: story · Last synced: 11/20/2025, 9:01:20 PM
Just look at the implosion of the Xbox business.
And I'm guessing that the decline is due to executive meddling.
What is it that executives do again? Beyond collecting many millions of dollars a year, that is.
I was ranting about this to my friends: Wall Street is now banking on tech firms to produce the illusion of growth and returns, rather than repackaging and selling subprime mortgages.
The tech sector seems to have a never-ending supply of things to spur investment and growth: cloud computing, SaaS, mobile, social media, IoT, crypto, the Metaverse, and now AI.
Some useful, some not so much.
The tech sector is under a lot of pressure to produce growth, is filled with very smart people, and wields influence on public policy. The flip side is that the mortgage crisis, at least before it collapsed, got more Americans into home ownership (even if they weren't ready for it). I'm not sure the tech sector's meteoric rise has been as helpful (the sentiment of locals in US tech hubs suggests an overall feeling of dissatisfaction with tech).
Oh right, for their data centers. I could see this being useful there too; it brings costs down even further.
Yes, in the sense that this is at least partially inspired by Apple's vertical integration playbook, which has now been extended to their own data centers based on custom Apple Silicon¹ and a built-for-purpose, hardened edition of Darwin².
¹ https://security.apple.com/blog/private-cloud-compute/ ² https://en.wikipedia.org/wiki/Darwin_(operating_system)
Arguably that's a GPU? Other than (currently) exotic ways to run LLMs like photonics or giant SRAM tiles, there isn't a device that's better at inference than GPUs, and they have the benefit that they can be used for training as well. You need the same amount of memory and the same ability to do math as fast as possible whether it's inference or training.
Yes, and to @quadrature's point, NVIDIA is creating GPUs explicitly focused on inference, like the Rubin CPX: https://www.tomshardware.com/pc-components/gpus/nvidias-new-...
"…the company announced its approach to solving that problem with its Rubin CPX— Content Phase aXcelerator — that will sit next to Rubin GPUs and Vera CPUs to accelerate specific workloads."
In fact - I'd say we're looking at this backwards - GPUs used to be the thing that did math fast and put the result into a buffer where something else could draw it to a screen. Now a "GPU" is still a thing that does math fast, but sometimes you don't include the hardware to put the pixels on a screen.
So maybe - CPX is "just" a GPU but with more generic naming that aligns with its use cases.
https://www.cdotrends.com/story/3823/groq-ai-chip-delivers-b...
And no, the NPU isn't a GPU.
Similarly, Tenstorrent seems to be building something that you could consider "better", at least insofar as the goal is to be open.
https://www.etched.com/announcing-etched
Long term, I wonder if we're exiting the "platform compute" era, for want of a better term. By that I mean compute which can run more or less any operating system, software, etc. If everyone is siloed into their own vertically integrated hardware+operating system stack, the results will be awful for free software.
I think they do all deep learning for Gemini on their own silicon.
But they also invented AI as we know it when they introduced the transformer architecture, and they’ve been more invested in machine learning than most companies for a very long time.
Since I prefer games to AI, this makes me rather sad.
When it comes to CPUs they bought P.A. Semi back in 2008 and got a lot of smart people with decades of relevant experience that were doing cutting-edge stuff at the time.
This was immensely important to be able to deliver current Apple CPUs.
I had previously encountered some of that team with the SiByte MIPS in an embedded context. I know they were highly skilled and had tons of pedigree, but PA Semi itself was a strange beast.
Not sure about the mobile SoCs
Maybe if you restrict it, as the DeepSeek paper does, to "Gemini uses TPUs for the final successful training run and for scaled inference," you might be correct, but there's no way GPUs aren't involved, at minimum for comparability and for more rapid iteration during the extremely buggy and error-prone stretch before the final training run. And the theoretical and algorithmic innovations that are often being done at Google and do make their way into Gemini are also sometimes developed on Nvidia GPUs.
GCP has a lot of GPUs in its fleet today, likely on the order of at least 1 million (I'm probably underestimating). Some of that is used internally and is made available to their engineering staff. What constitutes "deep learning for Gemini" is very much up to interpretation.
In my experience JAX is way more flexible than PyTorch the moment you want to do things that aren't training ML models, e.g. you want to build an optimizer that uses the derivative of your model with respect to the input.
loss.backward()? tensor.grad? optimizer.zero_grad()? with torch.no_grad()?
What is with all these objects holding pointers to stuff? An ndarray is a pointer to memory and a shape, my dudes. A gradient is the change in a scalar function w.r.t. some inputs.
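For what it's worth, here's a minimal sketch of that in JAX (the toy model, parameter names, and shapes are mine, not from the thread): the derivative with respect to the input is just another function you get from jax.grad, with no graph or optimizer state to juggle.

    # Sketch: optimize an *input* against a fixed model.
    import jax
    import jax.numpy as jnp

    def model(params, x):
        # Toy two-layer MLP standing in for whatever model you actually have.
        h = jnp.tanh(x @ params["w1"] + params["b1"])
        return h @ params["w2"] + params["b2"]

    def objective(x, params, target):
        # Scalar loss as a function of the input, not the weights.
        return jnp.sum((model(params, x) - target) ** 2)

    # Gradient w.r.t. argument 0 (the input): no .backward(), no .zero_grad(),
    # no no_grad() context manager -- just a function from arrays to arrays.
    grad_wrt_input = jax.jit(jax.grad(objective, argnums=0))

    def optimize_input(x0, params, target, lr=1e-2, steps=100):
        x = x0
        for _ in range(steps):
            x = x - lr * grad_wrt_input(x, params, target)
        return x

    # Usage with random toy parameters:
    k1, k2 = jax.random.split(jax.random.PRNGKey(0))
    params = {"w1": jax.random.normal(k1, (4, 8)), "b1": jnp.zeros(8),
              "w2": jax.random.normal(k2, (8, 2)), "b2": jnp.zeros(2)}
    x_opt = optimize_input(jnp.ones((1, 4)), params, target=jnp.zeros((1, 2)))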
Just to clarify, the TPU has been in development for a decade and is quite mature these days. Years ago internal consumers had to accept the CPU/GPU-versus-TPU duality, but I think that case is getting rarer. I'd guess this is even more true for DeepMind, since it owns its own ML infra team; they're likely able to get most issues fixed with high priority.
> The software titan is rather late to the custom silicon party. While Amazon and Google have been building custom CPUs and AI accelerators for years, Microsoft only revealed its Maia AI accelerators in late 2023.
They are too late for now. Realistically, hardware takes a couple of generations to become a serious contender, and by the time Microsoft has a chance to learn from its hardware mistakes, the “AI” bubble will have popped.
But there will probably be some little LLM tools that do end up having practical value; maybe there will be a happy line-crossing point for MS, and they'll have cheap in-house compute by the time the models actually need to turn a profit.
Not really, and for the same reason Chinese players like Biren are leapfrogging: much of the workload profile in AI/ML is "embarrassingly parallel", which reduces the need for individual ASICs to deliver bleeding-edge performance.
If you are able to negotiate competitive fabrication and energy supply deals, you can mass-produce your way into providing "good enough" performance.
Finally, the persona who cares about hardware performance in training isn't in the market for cloud-offered services.
It's largely a solved problem based on Google/Broadcom's TPU work - almost everyone is working with Broadcom to design their own custom ASIC and SoC.
There was a split at MS where the ‘Next Gen’ Bayesian work was being done in the US and the frequentist work was being shipped off to China. Chris Bishop was promoted to head of MSR Cambridge, which didn’t help.
Microsoft really is an institutionally stupid organization, so I have no idea which direction they’ll actually go. My best guess is that it’s all talk.
What I’ve found: Maia 200, the next version, is having issues due to brain drain, and Maia 300 is to be an entirely new architecture, so its status is rather uncertain.
I think a big reason MS invested so heavily in OpenAI was to have a marquee customer push cultural change through the org, which was a necessary decision. If that eventually yields a useful chip I will be impressed; I hope it does.
* A bet on storing the entire model (and code) in 900MB of SRAM. That's a hell of a lot of SRAM, but it only really works for small models, and the world wants enormous models (see the back-of-envelope sketch below).
* Blew its weirdness budget by a lot. Everything is quite different, so it's a significant effort to port software to it. Often you did get a decent speedup (like 2-10x), but I doubt many people thought that was worth the software pain.
* The price was POA so normal people couldn't buy one. (And it would have been too expensive for individuals anyway.) So there was little grass roots community support and research. Nvidia gets that because it's very easy to buy a consumer 4090 or whatever and run AI on it.
* Nvidia were killing it with Grace Hopper.
GC3 was way more ambitious and supports a mountain of DRAM with crazy memory bandwidth so if they ever finish it maybe they'll make a comeback.
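To put a rough number on that SRAM bet (my arithmetic, not the commenter's), 900MB holds on the order of a few hundred million parameters, which is tiny by current LLM standards:

    # Back-of-envelope: how many parameters fit in 900 MB of on-chip SRAM?
    # Assumption (mine): weights only -- no activations, KV cache, or code.
    SRAM_BYTES = 900 * 1024 * 1024

    for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
        print(f"{name}: ~{SRAM_BYTES / bytes_per_param / 1e6:.0f}M parameters max")

    # fp32: ~236M, fp16/bf16: ~472M, int8: ~944M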
So, unless they also solve that issue with their own hardware, it will be like the TPU, which is limited primarily to usage at Google or within very specific use cases.
There are only so many super talented software engineers to go around. If you're going to become an expert in something, you're going to pick what everyone else is using first.
I don't know. The transformer architecture uses only a limited number of primitives (see the sketch below). Once you have ported those to your new architecture, you're good to go.
Also, Google has been using TPUs for a long time now, and __they__ never hit a brick wall for a lack of CUDA.
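To illustrate how short that list of primitives is, here is a generic sketch of the standard attention math (not any particular vendor's kernel set; single head, no mask or KV cache, shapes are mine): a decoder block boils down to matmul, softmax, a normalization, and a pointwise nonlinearity.

    # One pre-norm transformer block built from a handful of primitives.
    import jax
    import jax.numpy as jnp
    from jax.nn import softmax, gelu

    def rmsnorm(x, eps=1e-6):
        return x / jnp.sqrt(jnp.mean(x * x, axis=-1, keepdims=True) + eps)

    def attention(x, wq, wk, wv, wo):
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = softmax(q @ k.T / jnp.sqrt(q.shape[-1]), axis=-1)
        return (scores @ v) @ wo

    def mlp(x, w_in, w_out):
        return gelu(x @ w_in) @ w_out

    def block(x, p):
        # p: dict of weight matrices; x: [seq_len, d_model]
        x = x + attention(rmsnorm(x), p["wq"], p["wk"], p["wv"], p["wo"])
        x = x + mlp(rmsnorm(x), p["w_in"], p["w_out"])
        return x

    # Toy usage: d_model=8, d_ff=16, seq_len=4
    keys = jax.random.split(jax.random.PRNGKey(0), 6)
    d, f = 8, 16
    p = {"wq": jax.random.normal(keys[0], (d, d)), "wk": jax.random.normal(keys[1], (d, d)),
         "wv": jax.random.normal(keys[2], (d, d)), "wo": jax.random.normal(keys[3], (d, d)),
         "w_in": jax.random.normal(keys[4], (d, f)), "w_out": jax.random.normal(keys[5], (f, d))}
    y = block(jnp.ones((4, d)), p)  # -> shape (4, 8)

Port those few ops (plus embeddings and a sampler) to new hardware and you have the bulk of what a transformer needs.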
> Also, Google has been using TPUs for a long time now, and __they__ never hit a brick wall for a lack of CUDA.
That's exactly what I'm saying. __they__ is the keyword.
If you're going to design a custom chip and deploy it in your data centers, you're also committing to hiring and training developers to build for it.
That's a kind of moat, but with private chips. While you solve one problem (getting the compute you want), you create another: supporting and maintaining that ecosystem long term.
NVIDIA was successful because they got their hardware into developers' hands, which created a feedback loop: developers asked for fixes and features, NVIDIA built them, the software stack improved, and the hardware evolved alongside it. That developer flywheel is what made CUDA dominant, and it is extremely hard to replicate because the shortage of talented developers is real.
You can compare CUDA to the first PC OS, DOS 1.0. Sure, DOS was viewed as a moat at the time, but it didn't keep others from kicking its ass.
Sorry, I don't understand this comparison at all. CUDA isn't some first version of an OS, not even close. It's been developed for almost 20 years now. Bucketloads of documentation, software, and utilities have been created around it. It won't have its ass kicked by any stretch of the imagination.
Anyway, this all distracts from the fact that you don't need an entire "OS" just to run some arithmetic primitives to get transformers running.
If you want to cherry pick anything, you can. But in my eyes, you're just solidifying my point. Software is critical. Minimizing the surface is obviously a good thing (tinygrad for example), but you're still going to need people who are willing and able to write the code.
Anthropic?
You do not need most of CUDA, or most of the GPU functionality, so dedicated chips make sense. It was great to see this theory put to the test in the original llama.cpp stack, which showed just what you needed; in the tiny llama.c, which really shows how little is actually needed; and more recently in how a small team of engineers at Apple put together MLX.
The name of the game has been custom SoCs and ASICs for a couple of years now, because inference and model training are "embarrassingly parallel" problems (see the sketch below), and models that are optimized for older hardware can provide similar gains to models that are run on unoptimized but more performant hardware.
Same reason H100s remain a mainstay in the industry today, as their performance profile is well understood now.
[0] - https://news.ycombinator.com/item?id=45275413
[1] - https://news.ycombinator.com/item?id=43383418
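As a minimal illustration of the "embarrassingly parallel" point above (toy model, shapes, and the pmap choice are mine): independent inference requests shard cleanly across however many devices you have, so aggregate throughput comes from device count at least as much as from per-device speed.

    # Shard a batch of independent requests across all available accelerators.
    import jax
    import jax.numpy as jnp

    def forward(params, x):
        # Stand-in for a real model's forward pass.
        return jnp.tanh(x @ params["w"]) @ params["v"]

    # Broadcast the weights, split the batch: one slice per device, with no
    # cross-device communication needed for pure inference.
    parallel_forward = jax.pmap(forward, in_axes=(None, 0))

    n_dev = jax.device_count()
    params = {"w": jnp.ones((16, 32)), "v": jnp.ones((32, 8))}
    batch = jnp.ones((n_dev, 64, 16))      # [devices, per-device batch, features]
    out = parallel_forward(params, batch)  # [devices, 64, 8]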
Is anyone else getting crypto flashbacks?
The teams that work on custom ASIC design at (e.g.) Broadcom for Microsoft are basically designing custom GPUs for MS, but these will only meet the requirements that Microsoft lays out, and Microsoft would have full insight and visibility into the entire architecture.
https://www.electronicdesign.com/technologies/analog/article...
https://www.analog.com/en/resources/analog-dialogue/articles...
http://madvlsi.olin.edu/bminch/talks/090402_atact.pdf
analog neural network hardware
physical neural network hardware
Put "this paper" after each one to get academic research. Try it with and without that phrase. Also, add "survey" to the next iteration.
The papers that pop up will have the internal jargon the researchers use to describe their work. You can further search with it.
The "this paper," "survey," and internal jargon in various combinations are how I find most CompSci things I share.
It’s never come even close to penciling out in practice.
For small models, there are people working on this implemented in flash memory, e.g. Mythic.
We do know: ads, spyware and rounding corners of UI elements.
If their processors work like their software, I really feel pity for the people who use them.
https://azure.microsoft.com/en-us/blog/azure-maia-for-the-er...
Is it practical for them to buy an existing chip maker? Or would they just go home-grown?
- As of today, Nvidia's market cap is a whopping 4.51 trillion USD compared to Microsoft's 3.85 trillion USD, so that might not work.
- AMD's market cap is 266.49 billion USD, which is more within reach.
Will they equip the new Microsoft Vacuum Cleaner with it ? /s
https://www.youtube.com/watch?v=So7TNRhIYJ8
Unless they start selling the hardware, but in the current AI market nobody would do that because it's their special sauce.
On the other hand, maybe it's no different than any other hardware, and other makers will catch up eventually.
Submitters: "Please submit the original source. If a post reports on something found on another site, submit the latter." - https://news.ycombinator.com/newsguidelines.html