Extropic Is Building Thermodynamic Computing Hardware
Posted 2 months ago · Active 2 months ago
extropic.ai · Tech story · High profile
Sentiment: skeptical/mixed · Debate · 80/100
Key topics
Thermodynamic Computing
AI Hardware
Probabilistic Computing
Extropic is developing thermodynamic computing hardware for AI workloads, sparking debate among HN users about its legitimacy, potential applications, and technical challenges.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 14m after posting
Peak period: 82 comments in 0-12h
Avg / period: 13.5 comments
Comment distribution: 108 data points (based on 108 loaded comments)
Key moments
1. Story posted: Oct 29, 2025 at 2:25 PM EDT (2 months ago)
2. First comment: Oct 29, 2025 at 2:38 PM EDT (14m after posting)
3. Peak activity: 82 comments in 0-12h, the hottest window of the conversation
4. Latest activity: Nov 2, 2025 at 9:01 PM EST (2 months ago)
ID: 45750995 · Type: story · Last synced: 11/20/2025, 4:29:25 PM
>no real technology underneath
They're literally shipping real hardware. They also put out a paper + posted their code too.
Flippant insults will not cut it.
I'll say it again. The hardware exists. The paper and code are there. If someone wants to insist that it's fake or whatever, they need to come up with something better than permutations of "u r stoopid" (your response to their paper: https://news.ycombinator.com/item?id=45753471). Just engage with the actual material. If there's a solid criticism, I'd like to hear it too.
It attracts the kind of people who are not aware of what "hacker" means.
Is it?
The paper is pretty dense, but Figure 1 is Fashion-MNIST, which is "28x28 grayscale images" - that does not seem very real-life to me. Can they work on bigger data? I assume not yet; otherwise they'd have put something more impressive in Figure 1.
In the same way, it is totally unclear what kind of energy they are talking about in absolute terms - if you say "we've saved 0.1J on training jobs", that is simply not impressive enough. And how much overhead is there? Amdahl's law is a thing: if you super-optimize a step that takes 1% of the time, the overall improvement will be negligible even if the savings for that step are enormous.
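To put rough numbers on the Amdahl's law point, here is a minimal sketch in Python; the 1% fraction and the 100x speedup are made-up illustrative values, not figures from the paper:

    # Amdahl's law: overall speedup when only a fraction p of the workload
    # is accelerated by a factor s.
    def amdahl_speedup(p, s):
        return 1.0 / ((1.0 - p) + p / s)

    # Hypothetical numbers: the accelerated step is 1% of total runtime and
    # gets a 100x speedup; the end-to-end gain is still only about 1.01x.
    print(amdahl_speedup(p=0.01, s=100.0))  # ~1.01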
I wrote a few CS papers myself back in the day, and the general idea was always to put the best results up front. So they are either bad communicators, or they don't highlight answers to my questions because they don't have many impressive things (yet?). Their website is nifty, so I suspect the latter.
I said you can't dismiss someone's hardware + paper + code solely based on insults. That's what I said. That was my argument. Speaking of which:
>disingenuous
>sizzle
>oversell
>dubious niche value
>window dressing
>suspicious
For the life of me I can't understand how any of this is an appropriate response when the other guy is showing you math and circuits.
If this was just the paper, I'd say 'cool area of research, dunno if it'll find application though'. I'm criticizing the business case and the messaging around it, not the implementation.
Two important questions I think illustrate my point:
1) The paper shows an FPGA implementation which has a 10x speedup compared to a CPU or GPU implementation. Extropic's first customer would have leapt up and started trying to use the FPGA version immediately. Has anyone done this?
2) The paper shows the projected real implementation being ~10x faster than the FPGA version. This is similar to the speedup going from an FPGA to an ASIC implementation of a digital circuit, which is a standard process which requires some notable up-front cost but much less than developing and debugging custom analog chips. Why not go this route, at least initially?
If Extropic (or any similar competitor) can unlock these hypothetical gains, I'd like to see it sorted out asap.
FPGAs are superior in every respect for models of up to a few megabytes in size and scale all the way down to zero. If they were going for generative AI, they wouldn't have bothered with FPGAs, because only the highest-end FPGAs with HBM are even viable, and even then, they come with dedicated AI accelerators.
The stuff they talk about in the paper is mainly about things that were in vogue when AI was still called Machine Learning, where you're essentially trying to construct and sample from very complicated distributions to represent your problem in a Bayesian way (i.e. to create a situation where you can calculate "what's the most probable answer given this problem"; in this approach it's often useful to have a relatively small 'model' but to be able to feed random numbers predicated on it back into itself, so as to sample from a distribution which would otherwise be essentially intractable). This kind of thing was very successful for some tasks, but AFAIK those tasks would generally be considered quite small today (and I don't know how many of them have now been taken over by neural nets anyway).
This is why I say it looks very niche and it feels like the website tries to just ride on the AI hypetrain by association with the term.
It's just miniaturized lava lamps.
One can create a true random number generator by plugging a moving computer mouse into its input.
It would be easy to put a dozen cages with mouse wheels in them, with real mammals inside, to generate a lot of random numbers, but everyone would understand that, so it would only be funny; they want mysterious!
This is what they are trying to create, more specifically:
https://pubs.aip.org/aip/apl/article/119/15/150503/40486/Pro...
"Just a funny story about random numbers: in the early days of computers people wanted to have random numbers for Monte Carlo simulations and stuff like that and so a great big wonderful computer was being designed at MIT’s Lincoln laboratory. It was the largest fastest computer in the world called TX2 and was to have every bell and whistle possible: a display screen that was very fancy and stuff like that. And they decided they were going to solve the random number problem, so they included a register that always yielded a random number; this was really done carefully with radioactive material and Geiger counters, and so on. And so whenever you read this register you got a truly random number, and they thought: “This is a great advance in random numbers for computers!” But the experience was contrary to their expectations! Which was that it turned into a great disaster and everyone ended up hating it: no one writing a program could debug it, because it never ran the same way twice, so ... This was a bit of an exaggeration, but as a result everybody decided that the random number generators of the traditional kind, i.e., shift register sequence generated type and so on, were much better. So that idea got abandoned, and I don’t think it has ever reappeared."
RIP Ed. https://en.wikipedia.org/wiki/Edward_Fredkin
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
I was skeptical of Extropic from the start, but what they've shown here exceeded my low expectations. They've made real hardware which is novel and potentially useful in the future after a lot more R&D. Analog computing implemented in existing CMOS processes that can run AI more efficiently by four orders of magnitude would certainly be revolutionary. That final outcome seems far enough away that this should probably still be the domain of university research labs rather than a venture-backed startup, but I still applaud the effort and wish them luck.
Could you provide some keywords to read more about this?
The Web site of Extropic claims that their hardware devices are made with standard CMOS technology, which cannot make magnetic tunnel junctions.
So it appears that there is no connection between the linked article and what Extropic does.
The idea of stochastic computation is not at all new. I read about such stochastic computers as a young child, more than half a century ago, long before personal computers. The research on them was inspired by hypotheses about how the brain might work.
Along with analog computers, stochastic computers were abandoned due to the fast progress of deterministic digital computers, implemented with logic integrated circuits.
So anything new cannot be about the structure of stochastic computers, which has been well understood for decades, but only about a novel extremely compact hardware RNG device, which could be scaled to a huge number of RNG devices per stochastic computer.
During a brief browse of the Extropic site I could not find any description of the principle behind their hardware RNG, except that it is made with standard CMOS technology. While there are plenty of devices made in standard CMOS that can be used as RNGs, they are not reliable enough for stochastic computation (unless you use complex compensation circuits), so Extropic must have found some neat trick to avoid complex circuitry, assuming that their claims are correct.
However, I am skeptical about their claims because of the amount of BS wording on their pages, which looks like it was taken from pseudo-scientific Star Trek-like mumbo-jumbo, e.g. "thermodynamic computing", "accelerated intelligence", "Extropic" derived from "entropic", and so on.
To be clear, there is no such thing as "thermodynamic computing", and inventing such meaningless word combinations is insulting to potential customers, as it demonstrates that Extropic's management believes they must be naive morons.
The traditional term for such computing is "stochastic computing". "Stochastics" is an older, and in my opinion better, alternative name for the theory of probabilities. In Ancient Greek, "stochastics" means the science of guessing. Instead of "stochastic computing" one can say "probabilistic computing", but not "thermodynamic computing", which makes no sense (unless the Extropic computers are dual use, besides computing, they also provide heating and hot water for a great number of houses!).
Like analog computers, stochastic computers are a good choice only for low-precision computations. With increased precision, the amount of required hardware increases much faster for analog computers and for stochastic computers than for deterministic digital computers.
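As a toy illustration of the precision problem (a textbook stochastic computing construction, nothing specific to Extropic): multiplying two values encoded as random bitstreams needs only an AND gate, but the error shrinks only as roughly 1/sqrt(n), so each extra bit of precision roughly quadruples the stream length.

    import random

    def encode(p, n):
        # Encode a value p in [0, 1] as a random bitstream of length n.
        return [1 if random.random() < p else 0 for _ in range(n)]

    def decode(stream):
        return sum(stream) / len(stream)

    def multiply(a, b, n):
        # Stochastic multiplication: a bitwise AND of two independent
        # streams is 1 with probability a*b.
        sa, sb = encode(a, n), encode(b, n)
        return decode([x & y for x, y in zip(sa, sb)])

    # 0.5 * 0.4 = 0.2; accuracy improves only slowly with stream length.
    for n in (100, 10_000, 1_000_000):
        print(n, multiply(0.5, 0.4, n))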
The only currently important application that is happy even with precisions under 16 bit is AI/ML, so trying to market their product for AI applications is normal for Extropic, but they should provide more meaningful information about what advantages their product might have.
I have not seen benchmarks of Extropic's new computing hardware yet, and I'd need to hear from experts in AI infrastructure at the semiconductor level whether this is legit.
I'm 75% convinced that this is real but hold 25% skepticism, and I will reserve judgement until others have tried the hardware.
So my only question for the remaining 25%:
Is this a scam?
https://extropic.ai/writing/inside-x0-and-xtr-0
I really like the design of it though.
This gives "hype" vibes.
It's the one attached to my TV that just runs movies/YT - I don't recall the last time I heard the fans.
It scored 34 for mobile and completely timed out for desktop with a time limit exceeded warning.
It’s hilariously bad.
I have a passing familiarity with the areas they talk about in the paper, and it feels... dubious. Mainly because of the dedicated accelerator problem. Even dedicated neural net accelerators are having difficulty gaining traction against general purpose compute units in a market that is ludicrously hot for neural net processing, and this is talking about accelerating Monte-Carlo processes which are pretty damn niche in application nowadays (especially in situations where you're compute-limited). So even if they succeed in speeding up that application, it's hard to see how worthwhile it would be. And it's not obvious from the publicly available information whether they're close to even beating the FPGA emulation of the concept which was used in the paper.
Two linear transformations compose into a single linear transformation. If you have y = W2(W1*x) = (W2*W1)*x = W*x where W = W2*W1, you've just done one matrix multiply instead of two. The composition of linear functions is linear.
The ReLU breaks this because it's nonlinear: ReLU(W1*x) can't be rewritten as some W*x, so W2(ReLU(W1*x)) can't collapse either.
Without nonlinearities like ReLU, many layers of a neural network could be collapsed into a single matrix multiplication. This inherently limits the function approximation that it can do, because linear functions are not very good at approximating nonlinear functions. And there are many nonlinearities involved in modeling speech, video, etc.
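A quick numerical check of the collapse argument (generic NumPy, with shapes chosen arbitrarily for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((4, 3))
    W2 = rng.standard_normal((2, 4))
    x = rng.standard_normal(3)

    # Two linear layers collapse into one matrix: W2 @ (W1 @ x) == (W2 @ W1) @ x.
    print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))      # True

    # With a ReLU in between, no single matrix reproduces the map for all x.
    relu = lambda v: np.maximum(v, 0)
    print(np.allclose(W2 @ relu(W1 @ x), (W2 @ W1) @ x))  # False (almost surely)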
https://extropic.ai/writing https://arxiv.org/abs/2510.23972
people are so scared of losing market share because of an art choice that they make all of their products smooth dark grey rectangles with no features.
ugly.
at least this one has some sense of beauty, the courage to make a decision about what looks good to them and act on it. they'll probably have to change the heptagon shape because no way that becomes a standard
it costs so little to add artistic flair to a product; it's really a shame so few companies do
This is "quantum" computing, btw.
The bad thing about Silicon Valley is that charlatans abuse this openness and friendly spirit, and swindle investors out of millions with pipe dreams and worthless technology. I think the second is inevitable as Silicon Valley becomes more famous and higher status without a strong gatekeeping mechanism, which is also anathema to its open ethos. Unfortunately this company is firmly in the second category. A performative startup, “changing the world” to satisfy the neurosis of its founders, who desperately want to be seen as someone taking risks to change the world. In reality it will change nothing and end up in the dustbin of history. I hope he enjoys his 15 minutes of fame.
1. They build a chip that does random sampling far better than any GPU (is this even proven yet?)
2. They use a model architecture that exploits this sampling advantage, which means most of the computation must be concentrated in sampling. This might be true for energy-based models or some future architecture we have no idea about; AFAIK it is not even true for diffusion.
3. This model architecture must outcompete autoregressive models in economically useful tasks, whether language modeling, robotics, etc.; right now autoregressive transformers are still king across all tasks.
And then their chip will be bought by hyperscalers and their company will become successful. There are just so many ifs outside of them building their core technology that this whole project makes no sense. You can say this is true for all startups, but I don't think that's the case; this is just ridiculous.
The core idea of THRML, as I understand it, is to present a nice programming interface to hardware where coin-flipping is vanishingly cheap. This is moderately useful to deep learning, but the artisanally hand-crafted models of the mid-2000s did essentially nothing at all except flip coins, and it would have been enormously helpful to have something like this in the wild at that time.
The core "trick" of the era was to make certain very useful but intractable distributions built on something called "infinitely exchangeable sequences" merely almost intractable. The trick, roughly, was that conditioning on some measure space makes those sequences plain-old iid, which (via a small amount of graduate-level math) implies that a collection of "outcomes" can be thought of as a random sample of the underlying distribution. And that, in turn, meant that the model training regimens of the time did a lot of sampling, or coin-flipping, as we have said here.
Peruse the THRML README[1] and you'll see the who's who of techniques and modeling procedures of the time: "Gibbs sampling", "probabilistic graphical models", "energy-based models", and so on. All of these are weaponized coin flipping.
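To make "weaponized coin flipping" concrete, here is a minimal Gibbs sampler for a toy Ising-style energy-based model (a generic textbook sketch, not THRML's actual API):

    import math, random

    # 1D chain of +/-1 spins with nearest-neighbour coupling J. A Gibbs sweep
    # resamples each spin from its conditional distribution given its
    # neighbours, i.e. a biased coin flip per spin.
    def gibbs_sweep(spins, J=1.0):
        n = len(spins)
        for i in range(n):
            field = J * (spins[(i - 1) % n] + spins[(i + 1) % n])
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(spin_i = +1 | neighbours)
            spins[i] = 1 if random.random() < p_up else -1
        return spins

    spins = [random.choice([-1, 1]) for _ in range(20)]
    for _ in range(100):
        gibbs_sweep(spins)
    print(spins)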
I imagine the terminus of this school of thought is basically a natively-probabilistic programming environment. Garden variety deterministic computing is essentially probabilistic computing where every statement returns a value with probability 1. So in that sense, probabilistic computing is a full generalization of deterministic computing, since an `if` might return a value with some probability other than 1. There was an entire genre of languages like this, e.g., Church. And now, 22 years later, we have our own hardware for it. (Incidentally this line of inquiry is also how we know that conditional joint distributions are Turing complete.)
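In Church-like languages the basic primitive is a coin flip; a loose Python paraphrase of the idea (not actual Church syntax) looks like this:

    import random

    def flip(p=0.5):
        # A probabilistic "statement": True with probability p.
        # Deterministic code is the special case p = 1.
        return random.random() < p

    def geometric(p=0.5):
        # A program like this denotes a distribution over outcomes,
        # not a single value: the number of flips until the first heads.
        n = 1
        while not flip(p):
            n += 1
        return n

    print([geometric() for _ in range(10)])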
Tragically, I think, this may have arrived too late. This is not nearly as helpful in the world of deep learning, with its large, ugly, and relatively sample-free models. Everyone hates to hear that you're cheering from the sidelines, but this time I really am. I think it's a great idea, just too late.
[1]: https://github.com/extropic-ai/thrml/blob/7f40e5cbc460a4e2e9...
Whether “that thing” is about to be sampling is not for me to say. The probability is certainly not 0 though.
I also have concerns that this is quite unlikely to be able to have sufficiently uncorrelated randomness in all channels to be useful in practical applications.
Do TSUs have the same issue?
[0]: https://www.normalcomputing.com
[1]: https://www.zach.be/p/making-unconventional-computing-practi...
To me, the biggest limitation is that you’d need an entirely new stack to support a new paradigm. It doesn’t seem compatible with using existing pretrained models. There’s plenty of ways to have much more efficient paradigms of computation, but it’ll be a long while before any are mature enough to show substantial value.
My handwavy interpretation was that they were in effect building an analog computer for AI model training, using some ideas that originated in quantum computing. Their insight is that since model training is itself probabilistic, you don't need discrete binary computation to do it; you just need something that implements the sigmoid function for training a NN.
They had some physics to show they could cause a bunch of atoms to polarize (conceptually) instantaneously using the thermodynamic properties of a material, and the result would be mostly deterministic over large samples. What comes out is what they are calling a "probabilistic bit" or pbit, which is an inferred state over a probability distribution, and where the inference is incorrect, they just "get it in post": the training data moves through a network of these pbits so much more efficiently that it's faster to augment and correct the result in the model afterwards than to use classical clock cycles to compute it directly.
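In software terms, my toy mental model of a pbit is something like the following (my own paraphrase of the concept, not Extropic's device model):

    import math, random

    def pbit(activation):
        # A "p-bit": a bit that is 1 with probability sigmoid(activation),
        # so averaging many samples approximates the sigmoid the hardware
        # would otherwise have to compute deterministically.
        p = 1.0 / (1.0 + math.exp(-activation))
        return 1 if random.random() < p else 0

    samples = [pbit(0.8) for _ in range(100_000)]
    print(sum(samples) / len(samples))  # ~ sigmoid(0.8) ~ 0.69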
Or if we call it analog, is it too obvious what the problems are going to be?