GMP Damaging Zen 5 CPUs?
Key topics
The tech world is abuzz with speculation about whether GMP is damaging Zen 5 CPUs, sparking a lively debate among commenters. Some, like craftkiller, are poring over the AM5 pinout, hypothesizing that the affected pins are related to power delivery, while others, such as raverbashing, caution that the connection between silicon area and pin layout isn't always straightforward. As the discussion unfolds, opinions on the reporting style of Gamers Nexus, a prominent tech journalism outlet, are sharply divided, with some praising its investigative zeal and others criticizing its sensationalist approach. Amidst the back-and-forth, a consensus emerges that crusader-style reporting has its value in consumer advocacy, even if it can be polarizing.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 2h after posting
- Peak period: 0-6h (59 comments)
- Avg / period: 20 comments
Based on 160 loaded comments
Key moments
- Story posted: Aug 27, 2025 at 12:24 PM EDT (4 months ago)
- First comment: Aug 27, 2025 at 2:30 PM EDT (2h after posting)
- Peak activity: 59 comments in 0-6h (hottest window of the conversation)
- Latest activity: Aug 30, 2025 at 7:40 PM EDT (4 months ago)
> Modern CPUs measure their temperature and clock down if they get too hot, don't they?
Yes. It's rather complex now, and it involves the motherboard vendor's firmware. When (not if) they get that wrong, CPUs burn up. You're going to need some expertise to analyze this.
This was Gordon's style, and Steve is continuing it. He has the courage to hit Bloomberg offices with a cameraman, so I don't think his words ring hollow.
We need that kind of in-your-face, no-punches-pulled reporting, especially when compared to the "measured professionals".
That framing doesn't do him and the team justice. There is (or rather, was) a 3.5-hour-long story about NVIDIA GPUs finding their way illegally from the US to China, which got taken down by a malicious DMCA claim from Bloomberg. It is quite interesting to watch (it can be found on archive.org).
GN is one of the last pro-consumer outlets that keep digging and shaking the tree the big companies are sitting on.
Not everywhere:
https://archive.org/details/the-nvidia-ai-gpu-black-market-i...
But yes, once they re-edit and republish themselves (or manage some sort of appeal and republish as-is), then of course linking to that (and a smaller cut of the parts they've had to change because Bloomberg were litigious arseholes, if only to highlight that their copyright claim here is somewhat ridiculous) would be much better.
Personally, I found the length of the quotes from politicians kind of tedious, but I sure wouldn’t want them to capitulate to Bloomberg after this.
They made six figures from merch sales on that investigation. Not much, but more than YouTube ads.
00:00:00 - The NVIDIA AI GPU Black Market
00:06:06 - WE NEED YOUR HELP
00:07:41 - A BIG ADVENTURE
00:10:10 - Ignored by the US
00:11:46 - BACKGROUND: Why They're Banned
00:16:04 - TIMELINE
00:21:32 - H20 15 Percent Revenue Share with the US
00:26:01 - Calculating BANNED GPUs
00:29:31 - OUR INFORMANTS
00:31:47 - THE SMUGGLING PIPELINE
00:33:39 - PART 1: HONG KONG Demand Drivers
00:43:14 - PART 1: How Do Suppliers Get the GPUs?
00:48:18 - PART 1: GPU Rich and GPU Poor
00:56:19 - PART 1: DATACENTER with Banned GPUs, AMD, Intel
01:06:19 - PART 1: Chinese Military, Huawei GPUs
01:09:48 - PART 1: How China Circumvents the Ban
01:19:30 - PART 1: GPU MARKET in Hong Kong
01:32:39 - WIRING MONEY TO CHINA
01:36:29 - PART 2: CHINA Smuggling Process
01:43:26 - PART 3: SHENZHEN's GPU MIDDLEMEN
01:50:22 - PART 3: AMD and INTEL GPUs Unwanted
01:56:34 - PART 4: THE GPU FENCE
02:06:01 - PART 4: FINDING the GPUs
02:15:12 - PART 4: THE FIXER IC Supplier
02:21:12 - PART 5: GPU WAREHOUSE
02:27:17 - PART 6: CHOP SHOP and REPAIR
02:34:52 - PART 6: BUILD a Custom AI GPU
02:56:33 - PART 7: FACTORY
03:01:01 - PART 8: TAIWAN and SINGAPORE Intermediaries
03:02:06 - PART 9: SMUGGLER
03:05:11 - LEGALITY of Buying and Selling
03:08:05 - CORRUPTION: NVIDIA and Governments
03:26:51 - SIGNOFF
Ask Beyonce.
You guess the result.
> We use a Noctua cooling solution for both systems. For the 1st system, we mounted the heat sink centred. For the 2nd system, we followed Noctua's advice of mounting things offset towards what they claim to be the hotter side of the CPU. Below is a picture of the 2nd system without the heat sink which shows that offset. Note the brackets and their pins, those pins are where the heat sink's pressure gets centred. Also note how the thermal paste has been squeezed away from that part, but is quite thick towards the left.
> But note that the 1st failure happened with a more centred heat sink. We only made the off-centre mounting for the 2nd system as to minimise the risk of a repeated system failure.
I didn't write the article, I was just commenting because other users seemed to miss information that was written in it.
The picture with the thermal paste shows that paste was squeezed out from the entire perimeter of the CPU, so the cooler is making contact with the whole CPU. The paste is squeezed thinner near the lower side of the CPU because that's where the mounting pins are located, meaning that's where the mounting pressure is the strongest. The impression left by the thermal paste matches the diagram on Noctua's site ( https://noctua.at/pub/media/wysiwyg/offset/heat_cooler_base_... ).
Noctua lists the NH-U9S cooler as being compatible with the 9950X, and claims it has "medium turbo/overclocking headroom", see https://ncc.noctua.at/cpus/model/AMD-Ryzen-9-9950X-1831 . I'm not sure how they come up with their compatibility ratings, but I generally trust Noctua knows what they're doing when it comes to CPU cooling.
It's also important to note that the author only tried the offset mount after they had a CPU die when the cooler was mounted centered on the CPU.
Overall, I think it's unlikely that these failures can be blamed on poor cooling.
Probably there's less paste remaining on the south end of the CPU because that's where the mounting force is greatest.
If anything, there's too much paste remaining on the center/north end of the CPU. Paste exists simply to bridge the roughness of the two metal surfaces, too much paste is a bad sign.
My guess is that the MB was oriented vertically and that big heavy heat sink with the large lever arm pulled it away from the center and north side of the CPU.
IMO, the CPU is still responsible for managing its power usage to live a long life. The only effect of an imperfect thermal solution ought to be proportionally reduced performance.
AMD is somewhat worse than Intel as their DDR5 memory bus is very "twitchy" making it hard to get the highest DDR5 timings, especially with multiple DIMMs per channel.
I got 2x32GB sticks of RAM with the plan to throw in another two sticks later. I had no idea that was now a bad plan. I wish manufacturers would have just put 2 DIMM slots on motherboards as a “warning.”
Despite this, the overtemperature protection of the CPUs should have protected them and prevented any damage like this.
Besides the system that continuously varies the clock frequency to keep the CPU within its current and power-consumption limits, there is a second protection that temporarily stops the clock when a temperature threshold is exceeded. However, the internal temperature sensors of the CPUs are not accurate, so the over-temperature protection may begin to act only at a temperature that is already too high.
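A minimal sketch of that two-stage protection as a toy model (the names, thresholds, and control policy are illustrative assumptions, not AMD's actual firmware logic):

```python
# Toy model of the two protections described above. All names, thresholds,
# and the control policy are illustrative assumptions, not AMD firmware.
POWER_LIMIT_W = 200.0    # sustained package power limit (assumed)
THERMAL_TRIP_C = 95.0    # over-temperature stop threshold (assumed)

def next_clock_mhz(power_w, reported_temp_c, clock_mhz):
    if reported_temp_c >= THERMAL_TRIP_C:
        return 0.0                   # second protection: stop the clock
    if power_w > POWER_LIMIT_W:
        return clock_mhz * 0.95      # first protection: shave frequency
    return clock_mhz

# An inaccurate sensor defeats both: if reported_temp_c reads 10 C low,
# the trip fires only when the die is really at 105 C.
print(next_clock_mhz(210.0, 80.0, 5000.0))   # -> 4750.0 (throttling)
print(next_clock_mhz(150.0, 96.0, 5000.0))   # -> 0.0 (thermal stop)
```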
So these failures appear to have been caused by a combination of factors: not using coolers appropriate for a 200 W CPU, AMD advertising a 200 W CPU as a 170 W CPU (fooling naive customers into believing that smaller coolers are acceptable), and either some kind of malfunction of the over-temperature protection in these CPUs or a degradation problem that happens even within the nominal temperature range, at its upper end.
Noctua's CPU compatibility page lists the NH-U9s as "medium turbo/overclocking headroom" for the 9950X [0]. I don't think it's fair to suggest their cooler choice is the problem here.
[0] https://ncc.noctua.at/cpus/model/AMD-Ryzen-9-9950X-1831
On the same page you linked, Noctua explains that the green check mark means that with that cooler the CPU can run all-core intensive tasks (exactly like those used by the gmplib developers) only at the base clock, which is 4.3 GHz for the 9950X, with turbo disabled in the BIOS.
Only then might the CPU dissipate its nominal TDP of 170 W, instead of the 200 W it dissipates with turbo enabled.
With "best turbo headroom", you can be certain that the CPU can run all-core intensive tasks with turbo enabled. Even if you do no overclocking, if you run all-core intensive tasks with turbo enabled, this is the kind of cooler you need.
Noctua does not define what "medium headroom" means, but presumably it means that you can run all-core tasks of medium intensity with turbo enabled, not tasks of maximum intensity.
There is no doubt that it is a mistake to choose such a cooler when you intend to run intensive multi-threaded computations. A better cooler that is not much bigger, like the NH-U12A, has almost double the cooling capacity.
That said, there is also no doubt that AMD is guilty of at least having some bugs in their firmware, or of failing to provide adequate documentation for the motherboard manufacturers that adapt the AMD firmware for their boards.
GN is unique in paying for silicon-level analysis of failures.
der8auer also contributes a lot to these stories.
I tend to wait for all 3 of their analyses, because each adds a different "hard-won" perspective.
I feel like if this was heat related, the overall CPU temperature should still somewhat slowly creep up, thereby giving everything enough time for thermal throttling. But their discoloration sure looks like a thermal issue, so I wonder why the safety features of the CPU didn't catch this...
(And... 200 A is the average current when dissipating 200 W. So how high are the switching currents? ;)
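For scale, that figure follows from I = P / V: at an assumed core voltage of about 1 V (real Vcore varies with load), 200 W implies roughly 200 A of average current, with switching transients higher still. A back-of-the-envelope check:

```python
# Average current implied by package power at an assumed core voltage.
power_w = 200.0   # sustained package power
vcore_v = 1.0     # assumed all-core load voltage; real Vcore varies
print(f"{power_w / vcore_v:.0f} A average")   # -> 200 A
```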
It doesn't strike me as odd that running an extremely power-heavy load for months continuously on such configurations eventually failed.
My best understanding of the avx-512 'power license' debacle on Intel CPUs was that the processor was actually watching the instruction stream and computing heuristics to lower core frequency before reaching avx512 or dense-avx2 instructions. I guessed they knew or worried that even a short large-vector stint would fry stuff...
Apparently voltage and thermal sensors have vastly improved, and the crazy swings in NVIDIA GPUs' clocks seem to agree with this :-)
These big x86 CPUs in stock configuration can throttle down to speeds where they can function with entirely passive cooling, so even if the cooler was improperly mounted, they'd only throttle.
All that to say, if GMP is causing the CPU to fry itself, something went very wrong, and it is not user error or the room being too hot.
As in... what, AMD K6 / early Pentium 4 days was the last time I remember hearing about cpu cooler failing and frying a cpu?
Or maybe I'm thinking of something else entirely…
I once worked on a piece of equipment that was running awfully slow. The CPU was just not budging from its base clock of 700 MHz. As I was removing the stock Intel cooler, I noticed it wasn't seated fully. Once I removed it and looked, I saw a perfectly clean CPU with no residue. I looked at the HSF; the original thermal paste was in pristine condition.
I remounted the HSF and it worked great. It ran 100% throttled for seven years before I touched it.
Built-in thermal sensing came later.
If it can, then the hardware is to blame.
I've heard some really wild noises coming out of my zen4 machine when I've had all cores loaded up with what is best described as "choppy" workloads where we are repeatedly doing something like a parallel.foreach into a single threaded hot path of equal or less duration as fast as possible. I've never had the machine survive this kind of workload for more than 48 hours without some kind of BSOD. I've not actually killed a cpu yet though.
1. Evaluate population of candidates in parallel
2. Perform ranking, mutation, crossover, and objective selection in serial
3. Go to 1.
I can very accurately control the frequency of the audible PWM noise by adjusting the population size.
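A minimal sketch of the loop shape described above, assuming a generic evolutionary-algorithm workload (the fitness function, operators, and sizes are placeholders, not the commenter's actual code):

```python
# Sketch of the "choppy" workload described above: a parallel evaluation
# phase followed by a serial selection phase, repeated in a tight loop.
import random
from concurrent.futures import ProcessPoolExecutor

def evaluate(candidate):                 # placeholder fitness function
    return sum(x * x for x in candidate)

def step(population, pool):
    # 1. Evaluate population of candidates in parallel (all cores spike)
    scores = list(pool.map(evaluate, population))
    # 2. Rank, mutate, cross over in serial (one core busy, others idle)
    ranked = [c for _, c in sorted(zip(scores, population), key=lambda t: t[0])]
    survivors = ranked[: len(ranked) // 2]
    children = [[g + random.gauss(0, 0.1) for g in random.choice(survivors)]
                for _ in range(len(population) - len(survivors))]
    return survivors + children          # 3. Go to 1.

if __name__ == "__main__":
    pop = [[random.uniform(-1, 1) for _ in range(64)] for _ in range(512)]
    with ProcessPoolExecutor() as pool:
        for _ in range(100):
            pop = step(pop, pool)        # population size sets the duty cycle
```

The population size sets how long each parallel burst lasts relative to the serial gap, which is consistent with it controlling the audible PWM frequency.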
Then you shouldn't trust the results of your work either, as that's indicative of a CPU that's producing incorrect results. I suggest lowering the frequency or even undervolting if necessary until you get a stable system.
...and yes, wildly fluctuating power consumption is even more challenging than steady-state high power, since the VRMs have to react precisely and not overshoot or undershoot, or even worse, hit a resonance point. LINPACK, one of the most demanding stress tests and benchmarks, is known for causing crashes on unstable systems not when it starts each round, but when it stops.
Randomly flipped genome bits could even be beneficial in evolutionary algorithms, for escaping local minima, and the same goes for a broken RNG. One bad evaluation won't throw the whole thing off. It's gotta be bad constantly.
Also, take a look at a delidded 9950; the two cpu chiplets are to one side, the i/o chiplet is in the middle, and the other side is a handful of passives. Offsetting the heatsink moves the center of the heatsink 7mm towards the chiplets (the socket is 40mm x 40mm), but there's still plenty of heatsink over the top of the i/o chiplet.
This article has some decent pictures of delidded processors https://www.tomshardware.com/pc-components/overclocking/deli...
Everything is offset towards one side and the two CPU core clusters are way towards the edge, offset cooling makes sense regardless of usage.
TDP numbers are completely made up. They don’t correspond to watts of heat, or of anything at all! They’re just a marketing number. You can't use them to choose the right cooling system at all.
https://gamersnexus.net/guides/3525-amd-ryzen-tdp-explained-...
Couldn't this count as false/misleading advertising, though?
But yeah, TDP means nothing. If you stick on plenty of cooling and run the right motherboard revision, your "TDP" can be whatever you want it to be, until the thing melts.
TDP is more of a rough idea of how much power the manufacturer wanted to classify the part as. It ultimately only loosely relates to the actual heat or electrical usage in practice.
Are you just describing product segmentation? ie. how the ryzen 5700x and 5800x are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?
I don't get it, are you referring to the phenomenon that different workloads have different power consumption (eg. a bunch of AVX512 floating point operations vs a bunch of NOPs), therefore TDP is totally made up? I agree that there's a lot of factors that impact power usage, and CPUs aren't like a space heater where if you let it run at full blast it'll always consume the TDP specified, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or at the very least is vaguely correlated to some limit of the CPU (eg. PPT limit on AMD platforms).
The power ratings of power supplies, on the other hand, are perfectly valid. Try to draw more than that and they will blow a fuse. Note however that a power supply's efficiency is nonlinear. If your computer is really drawing 800W from the power supply, then the power supply is probably drawing 1000W from the wall, or maybe more. The difference is converted into heat during the conversion from 120V AC to 12V DC (and 5V DC and 3.3V DC, etc, etc). That's an efficiency of 80%. But if your PC was drawing 400W from the same power supply then maybe the efficiency would be 92% instead, and the supply would only draw 435W from the wall. The right power supply for your computer is the cheapest one that is most efficient at the level of power that your computer actually needs. The Bronze/Gold/Platinum efficiency ratings are almost BS made-up marketing things though, because all that tells you is that it hits a certain efficiency rating at _some_ power level, not that it does so at the power level you'll typically run your computer at.
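A quick worked version of that arithmetic (the efficiency figures are the ones assumed in the comment, for illustration):

```python
# Wall draw = DC load / efficiency; efficiency changes with load level.
def wall_draw_w(dc_load_w, efficiency):
    return dc_load_w / efficiency

print(f"{wall_draw_w(800, 0.80):.0f} W")  # -> 1000 W from the wall at 80%
print(f"{wall_draw_w(400, 0.92):.0f} W")  # -> 435 W from the wall at 92%
```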
There is a similar but more extreme set of nonlinearities when talking about the power drawn by a CPU (or a GPU). The CPU monitors its own temperature and then raises or lowers its own frequency multiplier in response to those temperature changes. This means that the same CPU will draw more power and run faster when you cool it better, and will run more slowly and generate less heat when the ambient temperature is too high. There are also timers involved. Because so many of the tasks we actually give to our CPUs are bursty, CPU performance is also bursty. The CPU will run at a high speed for a short period of time, then automatically scale back after a few seconds. The exact length of that timer can be adjusted by the BIOS, so laptop motherboards turn the timer down really short (because cooling in a laptop is terrible), while gamer motherboards turn them way up (because gamers buy overbuilt Noctua coolers, or water cooling, or whatever). Intel and AMD cannot even tell you a single number that encompasses all of these factors. Thus TDP became entirely meaningless and subject to the whims of marketing.
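A toy model of those boost timers, loosely shaped like Intel's PL1/PL2/tau budgeting (the moving-average form and all numbers are assumptions for illustration, not any vendor's defaults):

```python
# Boost to a short-term limit (PL2) while a moving average of package power
# stays under the sustained limit (PL1); tau sets how long bursts can last.
PL1_W, PL2_W, TAU_S, DT_S = 125.0, 200.0, 28.0, 0.5   # all assumed

avg_w = 0.0
for step in range(120):                          # one minute of full load
    draw_w = PL2_W if avg_w < PL1_W else PL1_W   # budget left -> boost
    avg_w += (DT_S / TAU_S) * (draw_w - avg_w)   # EWMA of package power
    if step % 20 == 0:
        print(f"t={step * DT_S:4.1f}s draw={draw_w:5.1f} W avg={avg_w:5.1f} W")
# A laptop BIOS would shrink TAU_S; a gamer board would stretch it.
```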
That said, it's definitely very frustrating as someone who does the occasional server build. Not only does TDP not reflect minimum or maximum power draw for a CPU package itself, but it's also completely divorced from power draw for the chipset(s), NICs, BMCs (ugh), etc, not to mention how the vendor BIOS/firmware throttles everything, and so TDP can be wildly different from power draw at the outlet. The past 5 years have kind of sucked for homelab builders. The Xeon E3 years were probably peak CPU and full-system power efficiency when accounting for long idle times. Can you get there with modern AMD and Intel chips? Maybe. Depends on who you ask and when. Even with identical CPUs, differences in motherboard vendor, BIOS settings, and even kernel can result in drastically different (as in 2-3x) reported idle power draw.
But they don’t use real temperatures from real systems. They just make up a different set of temperatures for each CPU that they sell, so that the TDP comes out to the number that they want. The formula doesn’t even mean anything, in real physical terms.
I agree that predicting power usage is far more difficult than it should be. The real power usage of the CPU is dependent on the temperature too, since the colder you can make the CPU the more power it will voluntarily use (it just raises the clock multiplier until it measures the temperature of the CPU rising without leveling off). And as you said there are a bunch of other factors as well.
From your description the formula is how you would calculate the power for which a certain heatsink at a given ambient temperature would result in the specified IHS temperature.
The °C/W number is not a conversion factor but the thermal resistance[1] of the heatsink & paste, that is a physical property.
So unless I misunderstood you it's very much something real in physical terms.
[1]: https://fscdn.rohm.com/en/products/databook/applinote/common...
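For reference, the formula from the GN explainer linked earlier is TDP = (tCase - tAmbient) / theta_ca. A worked version (the specific values are the oft-quoted ones for a 105 W AMD part; treat them as illustrative):

```python
# AMD's published TDP formula (per the GN article linked earlier):
#   TDP (W) = (tCase - tAmbient) / theta_ca
# Every input is chosen by the vendor, which is the commenter's point.
t_case_c = 61.8      # target IHS temperature picked by AMD (oft-quoted value)
t_ambient_c = 42.0   # assumed intake air temperature
theta_ca = 0.189     # assumed cooler thermal resistance, degC/W

print(f"{(t_case_c - t_ambient_c) / theta_ca:.1f} W")   # -> ~104.8 W ("105 W")
```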
But the reason I say that it’s physically meaningless is that real heat dissipation is strongly temperature dependent. The thermal conductivity of a heatsink goes up as the temperature goes up because heat is more effectively transferred into the air at higher temperatures.
For what, exactly? TDP stands for "thermal design power" - nothing in that means peak power or most power. It stopped being meaningful when CPUs learned to vary clock speeds and turbo boost - what is the thermal design target at that point, exactly? Sustained power virus load?
The chip is not designed for this rate of power dissipation; and it is not the rate of power dissipation that you can expect to get from the chip.
Says who? AMD advertises the chip as having a base clock of 4.3 GHz over all cores. The 9950X pulls somewhere around 220W at 5ghz all cores and with how power scales, 170W at the advertised 4.3 GHz seems more than plausible. Seems perfectly within reason that the advertised frequency and the advertised TDP are aligned.
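A rough sanity check of that scaling claim: dynamic power goes as f times V squared, and since voltage roughly tracks frequency near the top of the curve, power scales very crudely as f cubed (an assumption, not a measurement):

```python
# Crude cube-law check: P ~ f^3 once voltage tracks frequency.
p_at_5ghz_w = 220.0                    # all-core draw at 5.0 GHz (from the comment)
p_at_base_w = p_at_5ghz_w * (4.3 / 5.0) ** 3
print(f"{p_at_base_w:.0f} W")          # -> ~140 W, so 170 W at 4.3 GHz is plausible
```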
I wish Anandtech was still around as iirc they did have charts for all this, which nobody else seems to do :/
> and it is not the rate of power dissipation that you can expect to get from the chip.
Again, says who? Whose expectations? This is a consumer chip, and the expectation for a consumer chip is not that it spends 100% of its time running prime95 or a similar "power virus" workload. I expect that if I buy this chip, while I would have intervals of >170W, I'd also have long periods of much less than 170W. If I have a cooler designed to sustain 170W of cooling, that's going to work out on average just fine, as there's thermal mass in the system.
Says AMD and says Intel, apparently. At the link, there is an official explanation (sort of) how the TDP figure is derived.
> The thermal solution bundled with the CPUs is not designed to handle the thermal output when all the cores are utilized 100%. For that kind of load, a different thermal solution is strongly recommended (paraphrased).
I never used the stock cooler bundled with the processor, but what kind of dark joke is this?
The Conroe Intel era was amazing for the time.
Then Conroe launched and the balance shifted. Even the cheapest Core2Duo chips were competitive against the best P4s and the high-end C2Ds rivaled or beat AMD. https://web.archive.org/web/20100909205130/http://www.anandt...
AND those chips overclocked to the moon. I got my E6420 to 3.2 GHz (from 2.133 GHz) just by upping the multiplier. A quick search makes me think my chip wasn't even that great.
They vastly underestimated how much a single FPU would be a bottleneck on a multicore/SMP processor.
Then AMD took things personally and architected Zen/EPYC. The rest is history.
That had maybe happened years earlier. The thing about Conroe is, IIRC, its ancestry came from the P3 and Intel's mobile CPU designs, while the P4 was a series of steady evolutions of the NetBurst architecture. The years of improvements to Conroe were mostly just incremental changes and the porting over of features from NetBurst (such as hyperthreading). Once that all played out, Intel really didn't have anywhere else to go, or plans for how to evolve the architecture. They fell back on the same old "let's just add wider SIMD instructions" (AVX).
I also seem to recall that Intel made fab bets that ultimately didn't pay off. Again, IIRC, I believe they were trying to keep using the same light wavelength for lithography (230 nm light?) rather than moving to UV lithography. That caused them to dump a fair bit of money into fabrication that never really paid off.
You don't even need to change the actual cooler since for AMD CPUs you can pretty much customize the TDP whatever way you want, and by default they run well above their efficiency curve. For example, my 7600X has a default TDP of 105W but I run it in Eco Mode (65W) with undervolt and I barely lose any performance. Even if I did no undervolt, running the CPU in Eco Mode is generally preferable since the performance loss is still negligible (~5%).
I went the other way and overspecced the CPU cooler and added some silent but high CFM capable fans on the system. The motherboard I got was able to adjust all fans depending on the system temps, so it scaled from a very silent desktop to a low-key space heater automatically under load.
Instead of undervolting the processor, I was using a tweaked on-demand governor on the system which stuck to lower power levels more than usual, so unless I was doing software development and testing things, it stayed cool and silent.
BTW, by 100%, I'm talking about completely saturating the CPU pipeline. Not pseudo 100% where CPU reports saturation but most of the load is iowait.
https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...
Is this not analogous to storing energy in the EM fields within the CPU?
Curiously, there is a minimum cost to erase a single bit that no system can go below. It's extremely small, billions of times smaller than the amount of energy our CPUs use every time they erase a bit, but it exists. Look up Landauer's Limit. There is a similar limit on the maximum amount of information stored in a system, which is proportional to the surface area of the sphere that the information fits inside. Exceed that limit and you'll form a black hole. We're nowhere near that limit yet either.
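For scale, Landauer's limit is E = kT ln 2 per erased bit. A quick calculation at room temperature (the per-bit figure for real CPUs below is a rough assumption):

```python
# Landauer's limit: minimum energy to erase one bit, E = k * T * ln(2).
import math

K_BOLTZMANN = 1.380649e-23    # J/K
T_ROOM = 300.0                # K

e_min_j = K_BOLTZMANN * T_ROOM * math.log(2)
print(f"{e_min_j:.2e} J per bit")          # -> ~2.87e-21 J

e_cpu_j = 1e-12   # rough assumed energy per bit operation in a modern CPU
print(f"{e_cpu_j / e_min_j:.1e}x above the limit")   # -> ~3.5e8x
```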
This is incorrect in both directions.
Only transistors whose inputs are changing have to discharge their capacitance.
This means that if the inputs don't change nothing happens, but if the inputs change then the changes propagate through the circuit to the next flip flop, possibly creating a cascade of changes.
Consider this pathological scenario: The first input changes, then a delay happens, then the second input changes so that the output remains the same. This is known as a "glitch". Even though the output hasn't changed, the downstream transistors see their input switch twice. Glitches propagate through transistors and not only that, if another unfortunate timing event happens, you can end up with accumulating multiple glitches. A single transistor may switch multiple times in a clock cycle.
Switching transistors costs energy, which means you end up with "parasitic" power consumption that doesn't contribute to the calculated output.
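The standard first-order model for that parasitic cost is P_dyn = alpha * C * V^2 * f, where the activity factor alpha counts every transition, glitches included, so a glitchy node burns power without changing any computed result (values below are made up for illustration):

```python
# First-order dynamic power: P = alpha * C * V^2 * f. The activity factor
# alpha counts every output transition per cycle, so glitching raises it
# even when the final logic value is unchanged. Illustrative values only.
def dynamic_power_w(alpha, c_f, v_v, f_hz):
    return alpha * c_f * v_v ** 2 * f_hz

print(dynamic_power_w(0.2, 5e-9, 1.0, 5e9))  # clean switching   -> 5.0 W
print(dynamic_power_w(0.5, 5e-9, 1.0, 5e9))  # glitchy switching -> 12.5 W
```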
Note also that discharging the internal capacitance of a transistor, and the heat generated by current through the transistor’s internal resistance, are both costs over and above the fundamental cost of erasing a bit. Transistors can be made more efficient by reducing those additional costs, but Landauer discovered that nothing can reduce the fundamental cost of erasing a bit.
Then again, most of us do not have a particle accelerator nearby looking for the Higgs boson.
> To be a bit flippant
I see what you did there :)
You are correct that there is energy bound in the information stored in the chip. But last I checked, our most efficient chips (e.g., using reversible computing to avoid wasting that energy) are still orders of magnitude less efficient than those theoretical limits.
The Asus Prime B650M motherboards they are using aren't exactly high end.
73 more comments available on Hacker News