Extropic Is Building Thermodynamic Computing Hardware
Posted 2 months ago · Active 2 months ago
extropic.ai · Tech story · High profile
Sentiment: skeptical/mixed · Debate · 80/100
Key topics
Thermodynamic Computing
AI Hardware
Probabilistic Computing
Extropic is developing thermodynamic computing hardware for AI workloads, sparking debate among HN users about its legitimacy, potential applications, and technical challenges.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 14m after posting
Peak period: 82 comments in 0-12h
Avg / period: 13.5 comments
Comment distribution: 108 data points (based on 108 loaded comments)
Key moments
1. Story posted: Oct 29, 2025 at 2:25 PM EDT (2 months ago)
2. First comment: Oct 29, 2025 at 2:38 PM EDT (14m after posting)
3. Peak activity: 82 comments in 0-12h, the hottest window of the conversation
4. Latest activity: Nov 2, 2025 at 9:01 PM EST (2 months ago)
ID: 45750995 · Type: story · Last synced: 11/20/2025, 4:29:25 PM
>no real technology underneath
They're literally shipping real hardware. They also put out a paper + posted their code too.
Flippant insults will not cut it.
I'll say it again. The hardware exists. The paper and code are there. If someone wants to insist that it's fake or whatever, they need to come up with something better than permutations of "u r stoopid" (your response to their paper: https://news.ycombinator.com/item?id=45753471). Just engage with the actual material. If there's a solid criticism, I'd like to hear it too.
It attracts the kind of people who are not aware of what "hacker" means.
Is it?
The paper is pretty dense, but Figure 1 is Fashion-MNIST, which is "28x28 grayscale images" - that does not seem very real-life to me. Can they work on bigger data? I assume not yet; otherwise they'd have put something more impressive in Figure 1.
In the same way, it is totally unclear what kind of energy they are talking about in absolute terms - if you say "we've saved 0.1J on training jobs", that is simply not impressive enough. And how much overhead is there? Amdahl's law is a thing: if you super-optimize a step that takes 1% of the time, the overall improvement will be negligible even if the savings for that step are enormous.
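To put rough numbers on the Amdahl's law point, here is a minimal sketch in Python; the 1% fraction and the 100x speedup are made-up illustrative values, not figures from the paper:

    # Amdahl's law: overall speedup when only a fraction p of the workload
    # is accelerated by a factor s.
    def amdahl_speedup(p, s):
        return 1.0 / ((1.0 - p) + p / s)

    # Hypothetical numbers: the accelerated step is 1% of total runtime and
    # gets a 100x speedup; the end-to-end gain is still only about 1.01x.
    print(amdahl_speedup(p=0.01, s=100.0))  # ~1.01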
I wrote a few CS papers myself back in the day, and the general idea was always to put the best results up front. So they are either bad communicators, or they don't highlight answers to my questions because they don't have many impressive things (yet?). Their website is nifty, so I suspect the latter.
I said you can't dismiss someone's hardware + paper + code solely based on insults. That's what I said. That was my argument. Speaking of which:
>disingenuous
>sizzle
>oversell
>dubious niche value
>window dressing
>suspicious
For the life of me I can't understand how any of this is an appropriate response when the other guy is showing you math and circuits.
If this was just the paper, I'd say 'cool area of research, dunno if it'll find application though'. I'm criticizing the business case and the messaging around it, not the implementation.
Two important questions I think illustrate my point:
1) The paper shows an FPGA implementation which has a 10x speedup compared to a CPU or GPU implementation. Extropic's first customer would have leapt up and started trying to use the FPGA version immediately. Has anyone done this?
2) The paper shows the projected real implementation being ~10x faster than the FPGA version. This is similar to the speedup going from an FPGA to an ASIC implementation of a digital circuit, which is a standard process which requires some notable up-front cost but much less than developing and debugging custom analog chips. Why not go this route, at least initially?
If Extropic (or any similar competitor) can unlock these hypothetical gains, I'd like to see it sorted out asap.
FPGAs are superior in every respect for models of up to a few megabytes in size and scale all the way down to zero. If they were going for generative AI, they wouldn't have bothered with FPGAs, because only the highest-end FPGAs with HBM are even viable, and even then, they come with dedicated AI accelerators.
The stuff they talk about in the paper is mainly about things that were in vogue when AI was still called Machine Learning, where you're essentially trying to construct and sample from very complicated distributions to represent your problem in a Bayesian way (i.e. to create a situation where you can calculate "what's the most probable answer given this problem"; in this approach it's often useful to have a relatively small 'model' but to be able to feed random numbers predicated on it back into itself, so as to sample from a distribution which would otherwise be essentially intractable). This kind of thing was very successful for some tasks, but AFAIK those tasks would generally be considered quite small today (and I don't know how many of them have now been taken over by neural nets anyway).
This is why I say it looks very niche and it feels like the website tries to just ride on the AI hypetrain by association with the term.
It's just miniaturized lava lamps.
One can create a true random number generator by plugging a moving computer mouse into its input.
It would be easy to put a dozen cages with mouse wheels in them, with real mammals inside, to generate a lot of random numbers, but everyone would understand that, so it would only be funny; they want mysterious!
This is what they are trying to create, more specifically:
https://pubs.aip.org/aip/apl/article/119/15/150503/40486/Pro...
"Just a funny story about random numbers: in the early days of computers people wanted to have random numbers for Monte Carlo simulations and stuff like that and so a great big wonderful computer was being designed at MIT’s Lincoln laboratory. It was the largest fastest computer in the world called TX2 and was to have every bell and whistle possible: a display screen that was very fancy and stuff like that. And they decided they were going to solve the random number problem, so they included a register that always yielded a random number; this was really done carefully with radioactive material and Geiger counters, and so on. And so whenever you read this register you got a truly random number, and they thought: “This is a great advance in random numbers for computers!” But the experience was contrary to their expectations! Which was that it turned into a great disaster and everyone ended up hating it: no one writing a program could debug it, because it never ran the same way twice, so ... This was a bit of an exaggeration, but as a result everybody decided that the random number generators of the traditional kind, i.e., shift register sequence generated type and so on, were much better. So that idea got abandoned, and I don’t think it has ever reappeared."
RIP Ed. https://en.wikipedia.org/wiki/Edward_Fredkin
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
I was skeptical of Extropic from the start, but what they've shown here exceeded my low expectations. They've made real hardware which is novel and potentially useful in the future after a lot more R&D. Analog computing implemented in existing CMOS processes that can run AI more efficiently by four orders of magnitude would certainly be revolutionary. That final outcome seems far enough away that this should probably still be the domain of university research labs rather than a venture-backed startup, but I still applaud the effort and wish them luck.
Could you provide some keywords to read more about this?
The Web site of Extropic claims that their hardware devices are made with standard CMOS technology, which cannot make magnetic tunnel junctions.
So it appears that there is no connection between the linked article and what Extropic does.
The idea of stochastic computation is not at all new. I read about such stochastic computers as a young child, more than half a century ago, long before personal computers. The research on them was inspired by hypotheses about how the brain might work.
Along with analog computers, stochastic computers were abandoned due to the fast progress of deterministic digital computers, implemented with logic integrated circuits.
So anything new cannot be about the structure of stochastic computers, which has been well understood for decades, but only about a novel extremely compact hardware RNG device, which could be scaled to a huge number of RNG devices per stochastic computer.
During a brief browse of the Extropic site I could not find any description of the principle behind their hardware RNG, except that it is made with standard CMOS technology. While there are plenty of devices made in standard CMOS that can be used as RNGs, they are not reliable enough for stochastic computation (unless you use complex compensation circuits), so Extropic must have found some neat trick to avoid complex circuitry, assuming that their claims are correct.
However, I am skeptical about their claims because of the amount of BS wording on their pages, which looks like it was taken from pseudo-scientific Star Trek-like mumbo-jumbo, e.g. "thermodynamic computing", "accelerated intelligence", "Extropic" derived from "entropic", and so on.
To be clear, there is no such thing as "thermodynamic computing", and inventing such meaningless word combinations is insulting to potential customers, as it demonstrates that Extropic's management believes they must be naive morons.
The traditional term for such computing is "stochastic computing". "Stochastics" is an older, and in my opinion better, alternative name for the theory of probabilities. In Ancient Greek, "stochastics" means the science of guessing. Instead of "stochastic computing" one can say "probabilistic computing", but not "thermodynamic computing", which makes no sense (unless the Extropic computers are dual use, besides computing, they also provide heating and hot water for a great number of houses!).
Like analog computers, stochastic computers are a good choice only for low-precision computations. With increased precision, the amount of required hardware increases much faster for analog computers and for stochastic computers than for deterministic digital computers.
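As a toy illustration of the precision problem (a textbook stochastic computing construction, nothing specific to Extropic): multiplying two values encoded as random bitstreams needs only an AND gate, but the error shrinks only as roughly 1/sqrt(n), so each extra bit of precision roughly quadruples the stream length.

    import random

    def encode(p, n):
        # Encode a value p in [0, 1] as a random bitstream of length n.
        return [1 if random.random() < p else 0 for _ in range(n)]

    def decode(stream):
        return sum(stream) / len(stream)

    def multiply(a, b, n):
        # Stochastic multiplication: a bitwise AND of two independent
        # streams is 1 with probability a*b.
        sa, sb = encode(a, n), encode(b, n)
        return decode([x & y for x, y in zip(sa, sb)])

    # 0.5 * 0.4 = 0.2; accuracy improves only slowly with stream length.
    for n in (100, 10_000, 1_000_000):
        print(n, multiply(0.5, 0.4, n))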
The only currently important application that is happy even with precisions under 16 bit is AI/ML, so trying to market their product for AI applications is normal for Extropic, but they should provide more meaningful information about what advantages their product might have.
I have not seen benchmarks of Extropic's new computing hardware yet, and I'd need to hear from experts in AI infrastructure at the semiconductor level whether this is legit.
I'm 75% convinced that this is real but hold 25% skepticism, and I will reserve judgement until others have tried the hardware.
So my only question for the remaining 25%:
Is this a scam?
https://extropic.ai/writing/inside-x0-and-xtr-0
I really like the design of it though.
This gives "hype" vibes.
It's the one attached to my TV that just runs movies/YT - I don't recall the last time I heard the fans.
It scored 34 for mobile and completely timed out for desktop with a time limit exceeded warning.
It’s hilariously bad.
I have a passing familiarity with the areas they talk about in the paper, and it feels... dubious. Mainly because of the dedicated accelerator problem. Even dedicated neural net accelerators are having difficulty gaining traction against general purpose compute units in a market that is ludicrously hot for neural net processing, and this is talking about accelerating Monte-Carlo processes which are pretty damn niche in application nowadays (especially in situations where you're compute-limited). So even if they succeed in speeding up that application, it's hard to see how worthwhile it would be. And it's not obvious from the publicly available information whether they're close to even beating the FPGA emulation of the concept which was used in the paper.
Two linear transformations compose into a single linear transformation. If you have y = W2(W1*x) = (W2*W1)*x = W*x where W = W2*W1, you've just done one matrix multiply instead of two. The composition of linear functions is linear.
The ReLU breaks this because it's nonlinear: ReLU(W1*x) can't be rewritten as some W*x, so W2(ReLU(W1*x)) can't collapse either.
Without nonlinearities like ReLU, many layers of a neural network could be collapsed into a single matrix multiplication. This inherently limits the function approximation that it can do, because linear functions are not very good at approximating nonlinear functions. And there are many nonlinearities involved in modeling speech, video, etc.
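A quick numerical check of the collapse argument (generic NumPy, with shapes chosen arbitrarily for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((4, 3))
    W2 = rng.standard_normal((2, 4))
    x = rng.standard_normal(3)

    # Two linear layers collapse into one matrix: W2 @ (W1 @ x) == (W2 @ W1) @ x.
    print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))      # True

    # With a ReLU in between, no single matrix reproduces the map for all x.
    relu = lambda v: np.maximum(v, 0)
    print(np.allclose(W2 @ relu(W1 @ x), (W2 @ W1) @ x))  # False (almost surely)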
https://extropic.ai/writing https://arxiv.org/abs/2510.23972
people are so scared of losing market share because of an art choice that they make all of their products smooth dark grey rectangles with no features.
ugly.
at least this one has some sense of beauty, the courage to make a decision about what looks good to them and act on it. they'll probably have to change the heptagon shape because no way that becomes a standard
it costs so little to add artistic flair to a product; it's really a shame so few companies do
This is "quantum" computing, btw.
The bad thing about Silicon Valley is that charlatans abuse this openness and friendly spirit, and swindle investors out of millions with pipe dreams and worthless technology. I think the second is inevitable as Silicon Valley becomes more famous and higher status without a strong gatekeeping mechanism, which is also anathema to its open ethos. Unfortunately this company is firmly in the second category. A performative startup, “changing the world” to satisfy the neurosis of its founders, who desperately want to be seen as someone taking risks to change the world. In reality it will change nothing and end up in the dustbin of history. I hope he enjoys his 15 minutes of fame.
1. They build a chip that does random sampling far better than any GPU (is this even proven yet?)
2. They use a model architecture that exploits this sampling advantage, which means most of the computation must be concentrated in sampling. This might be true for energy-based models or some future architecture we have no idea about; AFAIK it is not even true for diffusion.
3. This model architecture must outcompete autoregressive models in economically useful tasks, whether language modeling, robotics, etc.; right now autoregressive transformers are still king across all tasks.
And then their chip will be bought by hyperscalers and their company will become successful. There are just so many ifs outside of them building their core technology that this whole project makes no sense. You can say this is true for all startups, but I don't think that's the case; this is just ridiculous.
The core idea of THRML, as I understand it, is to present a nice programming interface to hardware where coin-flipping is vanishingly cheap. This is moderately useful to deep learning, but the artisanally hand-crafted models of the mid-2000s did essentially nothing at all except flip coins, and it would have been enormously helpful to have something like this in the wild at that time.
The core "trick" of the era was to make certain very useful but intractable distributions built on something called "infinitely exchangeable sequences" merely almost intractable. The trick, roughly, was that conditioning on some measure space makes those sequences plain-old iid, which (via a small amount of graduate-level math) implies that a collection of "outcomes" can be thought of as a random sample of the underlying distribution. And that, in turn, meant that the model training regimens of the time did a lot of sampling, or coin-flipping, as we have said here.
Peruse the THRML README[1] and you'll see the who's who of techniques and modeling procedures of the time: "Gibbs sampling", "probabilistic graphical models", "energy-based models", and so on. All of these are weaponized coin flipping.
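To make "weaponized coin flipping" concrete, here is a minimal Gibbs sampler for a toy Ising-style energy-based model (a generic textbook sketch, not THRML's actual API):

    import math, random

    # 1D chain of +/-1 spins with nearest-neighbour coupling J. A Gibbs sweep
    # resamples each spin from its conditional distribution given its
    # neighbours, i.e. a biased coin flip per spin.
    def gibbs_sweep(spins, J=1.0):
        n = len(spins)
        for i in range(n):
            field = J * (spins[(i - 1) % n] + spins[(i + 1) % n])
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(spin_i = +1 | neighbours)
            spins[i] = 1 if random.random() < p_up else -1
        return spins

    spins = [random.choice([-1, 1]) for _ in range(20)]
    for _ in range(100):
        gibbs_sweep(spins)
    print(spins)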
I imagine the terminus of this school of thought is basically a natively-probabilistic programming environment. Garden variety deterministic computing is essentially probabilistic computing where every statement returns a value with probability 1. So in that sense, probabilistic computing is a full generalization of deterministic computing, since an `if` might return a value with some probability other than 1. There was an entire genre of languages like this, e.g., Church. And now, 22 years later, we have our own hardware for it. (Incidentally this line of inquiry is also how we know that conditional joint distributions are Turing complete.)
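In Church-like languages the basic primitive is a coin flip; a loose Python paraphrase of the idea (not actual Church syntax) looks like this:

    import random

    def flip(p=0.5):
        # A probabilistic "statement": True with probability p.
        # Deterministic code is the special case p = 1.
        return random.random() < p

    def geometric(p=0.5):
        # A program like this denotes a distribution over outcomes,
        # not a single value: the number of flips until the first heads.
        n = 1
        while not flip(p):
            n += 1
        return n

    print([geometric() for _ in range(10)])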
Tragically, I think, this may have arrived too late. This is not nearly as helpful in the world of deep learning, with its large, ugly, and relatively sample-free models. Everyone hates to hear that you're cheering from the sidelines, but this time I really am. I think it's a great idea, just too late.
[1]: https://github.com/extropic-ai/thrml/blob/7f40e5cbc460a4e2e9...
Whether “that thing” is about to be sampling is not for me to say. The probability is certainly not 0 though.
I also have concerns that this is quite unlikely to be able to have sufficiently uncorrelated randomness in all channels to be useful in practical applications.
Do TSUs have the same issue?
[0]: https://www.normalcomputing.com
[1]: https://www.zach.be/p/making-unconventional-computing-practi...
To me, the biggest limitation is that you’d need an entirely new stack to support a new paradigm. It doesn’t seem compatible with using existing pretrained models. There’s plenty of ways to have much more efficient paradigms of computation, but it’ll be a long while before any are mature enough to show substantial value.
My handwavy interpretation was that they were in effect building an analog computer for AI model training, using some ideas that originated in quantum computing. Their insight is that since model training is itself probabilistic, you don't need discrete binary computation to do it; you just need something that implements the sigmoid function for training a NN.
They had some physics to show they could cause a bunch of atoms to polarize (conceptually) instantaneously using the thermodynamic properties of a material, and the result would be mostly deterministic over large samples. What comes out is what they are calling a "probabilistic bit" or pbit, which is an inferred state over a probability distribution, and where the inference is incorrect, they just "get it in post": the training data moves through a network of these pbits so much more efficiently that it's faster to augment and correct the result in the model afterwards than to use classical clock cycles to compute it directly.
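In software terms, my toy mental model of a pbit is something like the following (my own paraphrase of the concept, not Extropic's device model):

    import math, random

    def pbit(activation):
        # A "p-bit": a bit that is 1 with probability sigmoid(activation),
        # so averaging many samples approximates the sigmoid the hardware
        # would otherwise have to compute deterministically.
        p = 1.0 / (1.0 + math.exp(-activation))
        return 1 if random.random() < p else 0

    samples = [pbit(0.8) for _ in range(100_000)]
    print(sum(samples) / len(samples))  # ~ sigmoid(0.8) ~ 0.69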
Or if we call it analog, is it too obvious what the problems are going to be?