GMP Damaging Zen 5 CPUs?
Key topics
The tech world is abuzz with speculation about whether GMP is damaging Zen 5 CPUs, sparking a lively debate among commenters. Some, like craftkiller, are poring over the AM5 pinout, hypothesizing that the affected pins are related to power delivery, while others, such as raverbashing, caution that the connection between silicon area and pin layout isn't always straightforward. As the discussion unfolds, opinions on the reporting style of Gamers Nexus, a prominent tech journalism outlet, are sharply divided, with some praising its investigative zeal and others criticizing its sensationalist approach. Amidst the back-and-forth, a consensus emerges that crusader-style reporting has its value in consumer advocacy, even if it can be polarizing.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 2h after posting
- Peak period: 0-6h (59 comments)
- Avg / period: 20 comments
Based on 160 loaded comments
Key moments
- Story posted: Aug 27, 2025 at 12:24 PM EDT (4 months ago)
- First comment: Aug 27, 2025 at 2:30 PM EDT (2h after posting)
- Peak activity: 59 comments in 0-6h (hottest window of the conversation)
- Latest activity: Aug 30, 2025 at 7:40 PM EDT (4 months ago)
> Modern CPUs measure their temperature and clock down if they get too hot, don't they?
Yes. It's rather complex now, and it involves the motherboard vendor's firmware. When (not if) they get that wrong, CPUs burn up. You're going to need some expertise to analyze this.
This was Gordon's style, and Steve is continuing it. He has the courage to hit Bloomberg offices with a cameraman, so I don't think his words ring hollow.
We need that kind of in-your-face, no-punches-pulled reporting, especially when compared to the "measured professionals".
That framing doesn't do him and the team justice. There is (or rather, was) a 3.5-hour-long story about NVIDIA GPUs finding their way illegally from the US to China, which got taken down by a malicious DMCA claim from Bloomberg. It is quite interesting to watch (it can be found on archive.org).
GN is one of the last pro-consumer outlets that keep digging and shaking the tree the big companies are sitting on.
Not everywhere:
https://archive.org/details/the-nvidia-ai-gpu-black-market-i...
But yes, once they re-edit and republish themselves (or manage some sort of appeal and republish as-is), then of course linking to that (and a smaller cut of the parts they've had to change because Bloomberg were litigious arseholes, if only to highlight that their copyright claim here is somewhat ridiculous) would be much better.
Personally, I found the length of the quotes from politicians kind of tedious, but I sure wouldn’t want them to capitulate to Bloomberg after this.
They made six figures from merch sales on that investigation. Not much, but more than YouTube ads.
00:00:00 - The NVIDIA AI GPU Black Market
00:06:06 - WE NEED YOUR HELP
00:07:41 - A BIG ADVENTURE
00:10:10 - Ignored by the US
00:11:46 - BACKGROUND: Why They're Banned
00:16:04 - TIMELINE
00:21:32 - H20 15 Percent Revenue Share with the US
00:26:01 - Calculating BANNED GPUs
00:29:31 - OUR INFORMANTS
00:31:47 - THE SMUGGLING PIPELINE
00:33:39 - PART 1: HONG KONG Demand Drivers
00:43:14 - PART 1: How Do Suppliers Get the GPUs?
00:48:18 - PART 1: GPU Rich and GPU Poor
00:56:19 - PART 1: DATACENTER with Banned GPUs, AMD, Intel
01:06:19 - PART 1: Chinese Military, Huawei GPUs
01:09:48 - PART 1: How China Circumvents the Ban
01:19:30 - PART 1: GPU MARKET in Hong Kong
01:32:39 - WIRING MONEY TO CHINA
01:36:29 - PART 2: CHINA Smuggling Process
01:43:26 - PART 3: SHENZHEN's GPU MIDDLEMEN
01:50:22 - PART 3: AMD and INTEL GPUs Unwanted
01:56:34 - PART 4: THE GPU FENCE
02:06:01 - PART 4: FINDING the GPUs
02:15:12 - PART 4: THE FIXER IC Supplier
02:21:12 - PART 5: GPU WAREHOUSE
02:27:17 - PART 6: CHOP SHOP and REPAIR
02:34:52 - PART 6: BUILD a Custom AI GPU
02:56:33 - PART 7: FACTORY
03:01:01 - PART 8: TAIWAN and SINGAPORE Intermediaries
03:02:06 - PART 9: SMUGGLER
03:05:11 - LEGALITY of Buying and Selling
03:08:05 - CORRUPTION: NVIDIA and Governments
03:26:51 - SIGNOFF
Ask Beyonce.
You guess the result.
> We use a Noctua cooling solution for both systems. For the 1st system, we mounted the heat sink centred. For the 2nd system, we followed Noctua's advice of mounting things offset towards what they claim to be the hotter side of the CPU. Below is a picture of the 2nd system without the heat sink which shows that offset. Note the brackets and their pins, those pins are where the heat sink's pressure gets centred. Also note how the thermal paste has been squeezed away from that part, but is quite thick towards the left.
> But note that the 1st failure happened with a more centred heat sink. We only made the off-centre mounting for the 2nd system as to minimise the risk of a repeated system failure.
I didn't write the article, I was just commenting because other users seemed to miss information that was written in it.
The picture with the thermal paste shows that paste was squeezed out from the entire perimeter of the CPU, so the cooler is making contact with the whole CPU. The paste is squeezed thinner near the lower side of the CPU because that's where the mounting pins are located, meaning that's where the mounting pressure is the strongest. The impression left by the thermal paste matches the diagram on Noctua's site ( https://noctua.at/pub/media/wysiwyg/offset/heat_cooler_base_... ).
Noctua lists the NH-U9S cooler as being compatible with the 9950X, and claims it has "medium turbo/overclocking headroom", see https://ncc.noctua.at/cpus/model/AMD-Ryzen-9-9950X-1831 . I'm not sure how they come up with their compatibility ratings, but I generally trust Noctua knows what they're doing when it comes to CPU cooling.
It's also important to note that the author only tried the offset mount after they had a CPU die when the cooler was mounted centered on the CPU.
Overall, I think it's unlikely that these failures can be blamed on poor cooling.
Probably there's less paste remaining on the south end of the CPU because that's where the mounting force is greatest.
If anything, there's too much paste remaining on the center/north end of the CPU. Paste exists simply to bridge the roughness of the two metal surfaces, too much paste is a bad sign.
My guess is that the MB was oriented vertically and that big heavy heat sink with the large lever arm pulled it away from the center and north side of the CPU.
IMO, the CPU is still responsible for managing its power usage to live a long life. The only effect of an imperfect thermal solution ought to be proportionally reduced performance.
AMD is somewhat worse than Intel as their DDR5 memory bus is very "twitchy" making it hard to get the highest DDR5 timings, especially with multiple DIMMs per channel.
I got 2x32GB sticks of RAM with the plan to throw in another two sticks later. I had no idea that was now a bad plan. I wish manufacturers would have just put 2 DIMM slots on motherboards as a “warning.”
Despite this, the overtemperature protection of the CPUs should have protected them and prevented any damage like this.
Besides the system that continuously varies the clock frequency to keep the CPU within its current and power-consumption limits, there is a second protection that temporarily stops the clock when a temperature threshold is exceeded. However, the internal temperature sensors of the CPUs are not accurate, so the over-temperature protection may begin to act only at a temperature that is already too high.
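A minimal sketch of that two-stage protection as a toy model (the names, thresholds, and control policy are illustrative assumptions, not AMD's actual firmware logic):

```python
# Toy model of the two protections described above. All names, thresholds,
# and the control policy are illustrative assumptions, not AMD firmware.
POWER_LIMIT_W = 200.0    # sustained package power limit (assumed)
THERMAL_TRIP_C = 95.0    # over-temperature stop threshold (assumed)

def next_clock_mhz(power_w, reported_temp_c, clock_mhz):
    if reported_temp_c >= THERMAL_TRIP_C:
        return 0.0                   # second protection: stop the clock
    if power_w > POWER_LIMIT_W:
        return clock_mhz * 0.95      # first protection: shave frequency
    return clock_mhz

# An inaccurate sensor defeats both: if reported_temp_c reads 10 C low,
# the trip fires only when the die is really at 105 C.
print(next_clock_mhz(210.0, 80.0, 5000.0))   # -> 4750.0 (throttling)
print(next_clock_mhz(150.0, 96.0, 5000.0))   # -> 0.0 (thermal stop)
```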
So these failures appear to have been caused by a combination of factors: not using coolers appropriate for a 200 W CPU, AMD advertising a 200 W CPU as a 170 W CPU (fooling naive customers into believing that smaller coolers are acceptable), and either some kind of malfunction of the over-temperature protection in these CPUs or a degradation problem that happens even within the nominal temperature range, at its upper end.
Noctua's CPU compatibility page lists the NH-U9s as "medium turbo/overclocking headroom" for the 9950X [0]. I don't think it's fair to suggest their cooler choice is the problem here.
[0] https://ncc.noctua.at/cpus/model/AMD-Ryzen-9-9950X-1831
On the same page you linked, Noctua explains that the green check mark means that with that cooler the CPU can run all-core intensive tasks (exactly like those used by the gmplib developers) only at the base clock, which is 4.3 GHz for the 9950X, with turbo disabled in the BIOS.
Only then might the CPU dissipate its nominal TDP of 170 W, instead of the 200 W it dissipates with turbo enabled.
With "best turbo headroom", you can be certain that the CPU can run all-core intensive tasks with turbo enabled. Even if you do no overclocking, if you run all-core intensive tasks with turbo enabled, this is the kind of cooler you need.
Noctua does not define what "medium headroom" means, but presumably it means that you can run all-core tasks of medium intensity with turbo enabled, not tasks of maximum intensity.
There is no doubt that it is a mistake to choose such a cooler when you intend to run intensive multi-threaded computations. A better cooler that is not much bigger, like the NH-U12A, has almost double the cooling capacity.
That said, there is also no doubt that AMD is guilty of at least having some bugs in their firmware, or of failing to provide adequate documentation for the motherboard manufacturers that adapt the AMD firmware for their boards.
GN is unique in paying for silicon-level analysis of failures.
der8auer also contributes a lot to these stories.
I tend to wait for all 3 of their analyses, because each adds a different "hard-won" perspective.
I feel like if this was heat related, the overall CPU temperature should still somewhat slowly creep up, thereby giving everything enough time for thermal throttling. But their discoloration sure looks like a thermal issue, so I wonder why the safety features of the CPU didn't catch this...
(And... 200 A is the average current when dissipating 200 W. So how high are the switching currents? ;)
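For scale, that figure follows from I = P / V: at an assumed core voltage of about 1 V (real Vcore varies with load), 200 W implies roughly 200 A of average current, with switching transients higher still. A back-of-the-envelope check:

```python
# Average current implied by package power at an assumed core voltage.
power_w = 200.0   # sustained package power
vcore_v = 1.0     # assumed all-core load voltage; real Vcore varies
print(f"{power_w / vcore_v:.0f} A average")   # -> 200 A
```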
It doesn't strike me as odd that running an extremely power-heavy load for months continuously on such configurations eventually failed.
My best understanding of the avx-512 'power license' debacle on Intel CPUs was that the processor was actually watching the instruction stream and computing heuristics to lower core frequency before reaching avx512 or dense-avx2 instructions. I guessed they knew or worried that even a short large-vector stint would fry stuff...
Apparently voltage and thermal sensors have vastly improved, and the crazy swings in NVIDIA GPUs' clocks seem to agree with this :-)
These big x86 CPUs in stock configuration can throttle down to speeds where they can function with entirely passive cooling, so even if the cooler was improperly mounted, they'd only throttle.
All that to say, if GMP is causing the CPU to fry itself, something went very wrong, and it is not user error or the room being too hot.
As in... what, AMD K6 / early Pentium 4 days was the last time I remember hearing about cpu cooler failing and frying a cpu?
Or maybe I'm thinking of something else entirely…
I once worked on a piece of equipment that was running awfully slow. The CPU was just not budging from its base clock of 700 MHz. As I was removing the stock Intel cooler, I noticed it wasn't seated fully. Once I removed it and looked, I saw a perfectly clean CPU with no residue. I looked at the HSF; the original thermal paste was in pristine condition.
I remounted the HSF and it worked great. It ran 100% throttled for seven years before I touched it.
Built-in thermal sensing came later.
If it can, then the hardware is to blame.
I've heard some really wild noises coming out of my zen4 machine when I've had all cores loaded up with what is best described as "choppy" workloads where we are repeatedly doing something like a parallel.foreach into a single threaded hot path of equal or less duration as fast as possible. I've never had the machine survive this kind of workload for more than 48 hours without some kind of BSOD. I've not actually killed a cpu yet though.
1. Evaluate population of candidates in parallel
2. Perform ranking, mutation, crossover, and objective selection in serial
3. Go to 1.
I can very accurately control the frequency of the audible PWM noise by adjusting the population size.
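A minimal sketch of the loop shape described above, assuming a generic evolutionary-algorithm workload (the fitness function, operators, and sizes are placeholders, not the commenter's actual code):

```python
# Sketch of the "choppy" workload described above: a parallel evaluation
# phase followed by a serial selection phase, repeated in a tight loop.
import random
from concurrent.futures import ProcessPoolExecutor

def evaluate(candidate):                 # placeholder fitness function
    return sum(x * x for x in candidate)

def step(population, pool):
    # 1. Evaluate population of candidates in parallel (all cores spike)
    scores = list(pool.map(evaluate, population))
    # 2. Rank, mutate, cross over in serial (one core busy, others idle)
    ranked = [c for _, c in sorted(zip(scores, population), key=lambda t: t[0])]
    survivors = ranked[: len(ranked) // 2]
    children = [[g + random.gauss(0, 0.1) for g in random.choice(survivors)]
                for _ in range(len(population) - len(survivors))]
    return survivors + children          # 3. Go to 1.

if __name__ == "__main__":
    pop = [[random.uniform(-1, 1) for _ in range(64)] for _ in range(512)]
    with ProcessPoolExecutor() as pool:
        for _ in range(100):
            pop = step(pop, pool)        # population size sets the duty cycle
```

The population size sets how long each parallel burst lasts relative to the serial gap, which is consistent with it controlling the audible PWM frequency.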
Then you shouldn't trust the results of your work either, as that's indicative of a CPU that's producing incorrect results. I suggest lowering the frequency or even undervolting if necessary until you get a stable system.
...and yes, wildly fluctuating power consumption is even more challenging than steady-state high power, since the VRMs have to react precisely and not overshoot or undershoot, or even worse, hit a resonance point. LINPACK, one of the most demanding stress tests and benchmarks, is known for causing crashes on unstable systems not when it starts each round, but when it stops.
Randomly flipped genome bits could even be beneficial in evolutionary algorithms, for escaping local minima, and the same goes for a broken RNG. One bad evaluation won't throw the whole thing off. It's gotta be bad constantly.
Also, take a look at a delidded 9950; the two cpu chiplets are to one side, the i/o chiplet is in the middle, and the other side is a handful of passives. Offsetting the heatsink moves the center of the heatsink 7mm towards the chiplets (the socket is 40mm x 40mm), but there's still plenty of heatsink over the top of the i/o chiplet.
This article has some decent pictures of delidded processors https://www.tomshardware.com/pc-components/overclocking/deli...
Everything is offset towards one side and the two CPU core clusters are way towards the edge, offset cooling makes sense regardless of usage.
TDP numbers are completely made up. They don’t correspond to watts of heat, or of anything at all! They’re just a marketing number. You can't use them to choose the right cooling system at all.
https://gamersnexus.net/guides/3525-amd-ryzen-tdp-explained-...
Couldn't this count as false/misleading advertising, though?
But yeah, TDP means nothing. If you stick on plenty of cooling and run the right motherboard revision, your "TDP" can be whatever you want it to be, until the thing melts.
TDP is more of a rough idea of how much power the manufacturer wanted to classify the part as. It ultimately only loosely relates to the actual heat or electrical usage in practice.
Are you just describing product segmentation? ie. how the ryzen 5700x and 5800x are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?
I don't get it, are you referring to the phenomenon that different workloads have different power consumption (eg. a bunch of AVX512 floating point operations vs a bunch of NOPs), therefore TDP is totally made up? I agree that there's a lot of factors that impact power usage, and CPUs aren't like a space heater where if you let it run at full blast it'll always consume the TDP specified, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or at the very least is vaguely correlated to some limit of the CPU (eg. PPT limit on AMD platforms).
The power ratings of power supplies, on the other hand, are perfectly valid. Try to draw more than that and they will blow a fuse. Note however that a power supply's efficiency is nonlinear. If your computer is really drawing 800W from the power supply, then the power supply is probably drawing 1000W from the wall, or maybe more. The difference is converted into heat during the conversion from 120V AC to 12V DC (and 5V DC and 3.3V DC, etc, etc). That's an efficiency of 80%. But if your PC was drawing 400W from the same power supply then maybe the efficiency would be 92% instead, and the supply would only draw 435W from the wall. The right power supply for your computer is the cheapest one that is most efficient at the level of power that your computer actually needs. The Bronze/Gold/Platinum efficiency ratings are almost BS made-up marketing things though, because all that tells you is that it hits a certain efficiency rating at _some_ power level, not that it does so at the power level you'll typically run your computer at.
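A quick worked version of that arithmetic (the efficiency figures are the ones assumed in the comment, for illustration):

```python
# Wall draw = DC load / efficiency; efficiency changes with load level.
def wall_draw_w(dc_load_w, efficiency):
    return dc_load_w / efficiency

print(f"{wall_draw_w(800, 0.80):.0f} W")  # -> 1000 W from the wall at 80%
print(f"{wall_draw_w(400, 0.92):.0f} W")  # -> 435 W from the wall at 92%
```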
There is a similar but more extreme set of nonlinearities when talking about the power drawn by a CPU (or a GPU). The CPU monitors its own temperature and then raises or lowers its own frequency multiplier in response to those temperature changes. This means that the same CPU will draw more power and run faster when you cool it better, and will run more slowly and generate less heat when the ambient temperature is too high. There are also timers involved. Because so many of the tasks we actually give to our CPUs are bursty, CPU performance is also bursty. The CPU will run at a high speed for a short period of time, then automatically scale back after a few seconds. The exact length of that timer can be adjusted by the BIOS, so laptop motherboards turn the timer down really short (because cooling in a laptop is terrible), while gamer motherboards turn them way up (because gamers buy overbuilt Noctua coolers, or water cooling, or whatever). Intel and AMD cannot even tell you a single number that encompasses all of these factors. Thus TDP became entirely meaningless and subject to the whims of marketing.
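A toy model of those boost timers, loosely shaped like Intel's PL1/PL2/tau budgeting (the moving-average form and all numbers are assumptions for illustration, not any vendor's defaults):

```python
# Boost to a short-term limit (PL2) while a moving average of package power
# stays under the sustained limit (PL1); tau sets how long bursts can last.
PL1_W, PL2_W, TAU_S, DT_S = 125.0, 200.0, 28.0, 0.5   # all assumed

avg_w = 0.0
for step in range(120):                          # one minute of full load
    draw_w = PL2_W if avg_w < PL1_W else PL1_W   # budget left -> boost
    avg_w += (DT_S / TAU_S) * (draw_w - avg_w)   # EWMA of package power
    if step % 20 == 0:
        print(f"t={step * DT_S:4.1f}s draw={draw_w:5.1f} W avg={avg_w:5.1f} W")
# A laptop BIOS would shrink TAU_S; a gamer board would stretch it.
```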
That said, it's definitely very frustrating as someone who does the occasional server build. Not only does TDP not reflect minimum or maximum power draw for a CPU package itself, but it's also completely divorced from power draw for the chipset(s), NICs, BMCs (ugh), etc, not to mention how the vendor BIOS/firmware throttles everything, and so TDP can be wildly different from power draw at the outlet. The past 5 years have kind of sucked for homelab builders. The Xeon E3 years were probably peak CPU and full-system power efficiency when accounting for long idle times. Can you get there with modern AMD and Intel chips? Maybe. Depends on who you ask and when. Even with identical CPUs, differences in motherboard vendor, BIOS settings, and even kernel can result in drastically different (as in 2-3x) reported idle power draw.
But they don’t use real temperatures from real systems. They just make up a different set of temperatures for each CPU that they sell, so that the TDP comes out to the number that they want. The formula doesn’t even mean anything, in real physical terms.
I agree that predicting power usage is far more difficult than it should be. The real power usage of the CPU is dependent on the temperature too, since the colder you can make the CPU the more power it will voluntarily use (it just raises the clock multiplier until it measures the temperature of the CPU rising without leveling off). And as you said there are a bunch of other factors as well.
From your description the formula is how you would calculate the power for which a certain heatsink at a given ambient temperature would result in the specified IHS temperature.
The °C/W number is not a conversion factor but the thermal resistance[1] of the heatsink & paste, that is a physical property.
So unless I misunderstood you it's very much something real in physical terms.
[1]: https://fscdn.rohm.com/en/products/databook/applinote/common...
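For reference, the formula from the GN explainer linked earlier is TDP = (tCase - tAmbient) / theta_ca. A worked version (the specific values are the oft-quoted ones for a 105 W AMD part; treat them as illustrative):

```python
# AMD's published TDP formula (per the GN article linked earlier):
#   TDP (W) = (tCase - tAmbient) / theta_ca
# Every input is chosen by the vendor, which is the commenter's point.
t_case_c = 61.8      # target IHS temperature picked by AMD (oft-quoted value)
t_ambient_c = 42.0   # assumed intake air temperature
theta_ca = 0.189     # assumed cooler thermal resistance, degC/W

print(f"{(t_case_c - t_ambient_c) / theta_ca:.1f} W")   # -> ~104.8 W ("105 W")
```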
But the reason I say that it’s physically meaningless is that real heat dissipation is strongly temperature dependent. The thermal conductivity of a heatsink goes up as the temperature goes up because heat is more effectively transferred into the air at higher temperatures.
For what, exactly? TDP stands for "thermal design power" - nothing in that means peak power or most power. It stopped being meaningful when CPUs learned to vary clock speeds and turbo boost - what is the thermal design target at that point, exactly? Sustained power virus load?
The chip is not designed for this rate of power dissipation; and it is not the rate of power dissipation that you can expect to get from the chip.
Says who? AMD advertises the chip as having a base clock of 4.3 GHz over all cores. The 9950X pulls somewhere around 220W at 5ghz all cores and with how power scales, 170W at the advertised 4.3 GHz seems more than plausible. Seems perfectly within reason that the advertised frequency and the advertised TDP are aligned.
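A rough sanity check of that scaling claim: dynamic power goes as f times V squared, and since voltage roughly tracks frequency near the top of the curve, power scales very crudely as f cubed (an assumption, not a measurement):

```python
# Crude cube-law check: P ~ f^3 once voltage tracks frequency.
p_at_5ghz_w = 220.0                    # all-core draw at 5.0 GHz (from the comment)
p_at_base_w = p_at_5ghz_w * (4.3 / 5.0) ** 3
print(f"{p_at_base_w:.0f} W")          # -> ~140 W, so 170 W at 4.3 GHz is plausible
```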
I wish Anandtech was still around as iirc they did have charts for all this, which nobody else seems to do :/
> and it is not the rate of power dissipation that you can expect to get from the chip.
Again, says who? Whose expectations? This is a consumer chip, and the expectation for a consumer chip is not that it spends 100% of its time running prime95 or a similar "power virus" workload. I expect that if I buy this chip, while I would have intervals of >170W, I'd also have long periods of much less than 170W. If I have a cooler designed to sustain 170W of cooling, that's going to work out on average just fine, as there's thermal mass in the system.
Says AMD and says Intel, apparently. At the link, there is an official explanation (sort of) how the TDP figure is derived.
> The thermal solution bundled with the CPUs is not designed to handle the thermal output when all the cores are utilized 100%. For that kind of load, a different thermal solution is strongly recommended (paraphrased).
I never used the stock cooler bundled with the processor, but what kind of dark joke is this?
The Conroe Intel era was amazing for the time.
Then Conroe launched and the balance shifted. Even the cheapest Core2Duo chips were competitive against the best P4s and the high-end C2Ds rivaled or beat AMD. https://web.archive.org/web/20100909205130/http://www.anandt...
AND those chips overclocked to the moon. I got my E6420 to 3.2 GHz (from 2.133 GHz) just by upping the multiplier. A quick search makes me think my chip wasn't even that great.
They vastly underestimated how much a single FPU would be a bottleneck on a multicore/SMP processor.
Then AMD took things personally and architected Zen/EPYC. The rest is history.
That had maybe happened years earlier. The thing about Conroe is, IIRC, its ancestry came from the P3 and Intel's mobile CPU designs, while the P4 was a series of steady evolutions of the NetBurst architecture. The years of improvements to Conroe were mostly just incremental changes and the porting over of features from NetBurst (such as hyperthreading). Once that all played out, Intel really didn't have anywhere else to go, or plans for how to evolve the architecture. They fell back on the same old "let's just add wider SIMD instructions" (AVX).
I also seem to recall that Intel made fab bets that ultimately didn't pay off. Again, IIRC, I believe they were trying to keep using the same light wavelength for lithography (230 nm light?) rather than moving to UV lithography. That caused them to dump a fair bit of money into fabrication that never really paid off.
You don't even need to change the actual cooler since for AMD CPUs you can pretty much customize the TDP whatever way you want, and by default they run well above their efficiency curve. For example, my 7600X has a default TDP of 105W but I run it in Eco Mode (65W) with undervolt and I barely lose any performance. Even if I did no undervolt, running the CPU in Eco Mode is generally preferable since the performance loss is still negligible (~5%).
I went the other way and overspecced the CPU cooler and added some silent but high CFM capable fans on the system. The motherboard I got was able to adjust all fans depending on the system temps, so it scaled from a very silent desktop to a low-key space heater automatically under load.
Instead of undervolting the processor, I was using a tweaked on-demand governor on the system which stuck to lower power levels more than usual, so unless I was doing software development and testing things, it stayed cool and silent.
BTW, by 100%, I'm talking about completely saturating the CPU pipeline. Not pseudo 100% where CPU reports saturation but most of the load is iowait.
https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...
Is this not analogous to storing energy in the EM fields within the CPU?
Curiously, there is a minimum cost to erase a single bit that no system can go below. It's extremely small, billions of times smaller than the amount of energy our CPUs use every time they erase a bit, but it exists. Look up Landauer's Limit. There is a similar limit on the maximum amount of information stored in a system, which is proportional to the surface area of the sphere that the information fits inside. Exceed that limit and you'll form a black hole. We're nowhere near that limit yet either.
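For scale, Landauer's limit is E = kT ln 2 per erased bit. A quick calculation at room temperature (the per-bit figure for real CPUs below is a rough assumption):

```python
# Landauer's limit: minimum energy to erase one bit, E = k * T * ln(2).
import math

K_BOLTZMANN = 1.380649e-23    # J/K
T_ROOM = 300.0                # K

e_min_j = K_BOLTZMANN * T_ROOM * math.log(2)
print(f"{e_min_j:.2e} J per bit")          # -> ~2.87e-21 J

e_cpu_j = 1e-12   # rough assumed energy per bit operation in a modern CPU
print(f"{e_cpu_j / e_min_j:.1e}x above the limit")   # -> ~3.5e8x
```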
This is incorrect in both directions.
Only transistors whose inputs are changing have to discharge their capacitance.
This means that if the inputs don't change nothing happens, but if the inputs change then the changes propagate through the circuit to the next flip flop, possibly creating a cascade of changes.
Consider this pathological scenario: The first input changes, then a delay happens, then the second input changes so that the output remains the same. This is known as a "glitch". Even though the output hasn't changed, the downstream transistors see their input switch twice. Glitches propagate through transistors and not only that, if another unfortunate timing event happens, you can end up with accumulating multiple glitches. A single transistor may switch multiple times in a clock cycle.
Switching transistors costs energy, which means you end up with "parasitic" power consumption that doesn't contribute to the calculated output.
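The standard first-order model for that parasitic cost is P_dyn = alpha * C * V^2 * f, where the activity factor alpha counts every transition, glitches included, so a glitchy node burns power without changing any computed result (values below are made up for illustration):

```python
# First-order dynamic power: P = alpha * C * V^2 * f. The activity factor
# alpha counts every output transition per cycle, so glitching raises it
# even when the final logic value is unchanged. Illustrative values only.
def dynamic_power_w(alpha, c_f, v_v, f_hz):
    return alpha * c_f * v_v ** 2 * f_hz

print(dynamic_power_w(0.2, 5e-9, 1.0, 5e9))  # clean switching   -> 5.0 W
print(dynamic_power_w(0.5, 5e-9, 1.0, 5e9))  # glitchy switching -> 12.5 W
```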
Note also that discharging the internal capacitance of a transistor, and the heat generated by current through the transistor’s internal resistance, are both costs over and above the fundamental cost of erasing a bit. Transistors can be made more efficient by reducing those additional costs, but Landauer discovered that nothing can reduce the fundamental cost of erasing a bit.
Then again, most of us do not have a particle accelerator nearby looking for the Higgs boson.
> To be a bit flippant
I see what you did there :)
You are correct that there is energy bound in the information stored in the chip. But last I checked, our most efficient chips (e.g., using reversible computing to avoid wasting that energy) are still orders of magnitude less efficient than those theoretical limits.
The Asus Prime B650M motherboards they are using aren't exactly high end.
73 more comments available on Hacker News