Does MHz Still Matter?
Key topics
The article 'Does MHz Still Matter?' explores the relevance of CPU clock speed in modern computing, sparking a discussion on the trade-offs between clock speed, core count, and other factors like PCIe lanes and memory bandwidth.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 23m after posting
Peak period: 32 comments in the 0-6h window
Avg / period: 11.8 comments
Based on 47 loaded comments
Key moments
1. Story posted: Aug 22, 2025 at 10:47 AM EDT (5 months ago)
2. First comment: Aug 22, 2025 at 11:09 AM EDT (23m after posting)
3. Peak activity: 32 comments in the 0-6h window, the hottest stretch of the conversation
4. Latest activity: Aug 24, 2025 at 4:01 PM EDT (5 months ago)
That said, at such a low core count the primary EPYC advantage is PCIe lanes, no?
Also, EPYC's PCIe advantage unfortunately doesn't hold for the Hetzner-provided server setup, because the configurator allows the same number of devices to be attached to both servers.
But with fewer processes I can totally believe this works out to be the better option. Thank you for the write-up!
In our case, though, if we provide 1/48th of a 10Gbit network, it really doesn't work for our end customers. So we're trying to provide the VMs from a smaller but more up-to-date lineup.
It's always the workload type. For mixed environments (some nodes with a heavy constant load while others have only occasional spikes), the increase in RAM per node was the most important factor; it's what allowed us to actually decrease the node count. Whole racks with multiple switches were replaced by a single rack with a modest number of servers and a single stacked switch.
It is surprisingly hard to keep a modern CPU core properly saturated unless you are baking global illumination, searching for primes, or mining cryptocurrency. I/O and latency will almost always dominate at scale. Moving information is way more expensive than processing it.
And if there is a pointer-chasing task that isn't utilizing all of the core and is waiting on memory and data dependencies, that often just makes threading a bigger win: 16 threads doing pointer chases instead of one, if you aren't memory-bandwidth bound.
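A rough illustration of that scaling argument (my own sketch, not from the thread): time one pointer chase, then one independent chase per core, and compare. In CPython the loop is interpreter-bound rather than DRAM-latency-bound, so this only demonstrates the "independent chases scale with cores" claim, not the microarchitectural mechanism; the sizes and the use of multiprocessing are arbitrary choices.

```python
import os
import time
import random
from multiprocessing import Pool

N = 1_000_000        # nodes in the "linked" structure (arbitrary size)
STEPS = 2_000_000    # pointer dereferences per chase


def chase(seed: int) -> int:
    """Follow a random permutation ("next" pointers) for STEPS hops."""
    rng = random.Random(seed)
    nxt = list(range(N))
    rng.shuffle(nxt)              # poor locality: hops wander across the array
    i = 0
    for _ in range(STEPS):
        i = nxt[i]                # each hop depends on the previous one
    return i


if __name__ == "__main__":
    t0 = time.perf_counter()
    chase(0)
    serial = time.perf_counter() - t0

    workers = os.cpu_count() or 1
    t0 = time.perf_counter()
    with Pool(workers) as pool:   # separate processes, one chase each
        pool.map(chase, range(workers))
    parallel = time.perf_counter() - t0

    print(f"1 chase: {serial:.1f}s | {workers} concurrent chases: {parallel:.1f}s")
```

If the workload really were bandwidth-bound rather than latency- and dependency-bound, the concurrent run would slow down instead of staying close to the serial time.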
Transputers just came 30+ years too early.
In most of the ways that matter, Transputers weren’t too early. And if you built one again now, you’d find little to no market for it (as per efforts by several startups and various related research efforts).
Sources: many conversations with the architect of the Transputer and several of the engineers who designed the hardware and Occam, plus watching the Inmos 40th Anniversary lectures on YouTube (channel: @ednutting). Also being in the processor design startup space.
But I figure it is a broad field, so I'm curious what you're doing and whether it is the best use of time and energy.
I'm also assuming that the generative AI model wouldn't run well on your machine and would need to be elsewhere.
I think the above answer was just a hype parrot.
I would love it if generative AI could get us even further; we are severely compute limited, and testing in the lab is 20k a pop ... I am strongly incentivised for generative AI to be the answer, but as someone who works deeply in the field, the hype is real.
Modern PS5 development already shows SSD I/O getting faster than a CPU core can keep up with. It's also not true when the CPU is still the limiting factor on a web server.
The larger server grade parts start to shine when the server is doing a lot of different things. The extra memory bandwidth helps keep the CPU fed and the higher core count reduces the need for context switching because your workloads aren’t competing as much.
The best part about the AMD consumer CPUs is that you can even use ECC RAM if you get the right motherboard.
https://www.asrock.com/mb/AMD/X870%20Taichi%20Creator/index....
Asus has options as well such as https://www.asus.com/motherboards-components/motherboards/pr...
I think it was more rare when AM5 first came out; there were a bunch of ECC-supporting consumer boards for AM4 and Threadripper.
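For anyone wanting to confirm that ECC is actually active rather than merely "supported" by the board, here is a small sketch of my own (assuming Linux with the EDAC driver loaded) that reads the kernel's standard EDAC sysfs counters:

```python
# List EDAC memory controllers and their corrected/uncorrected error counts.
# If no controllers show up, the kernel isn't reporting ECC on this machine.
import glob
import os

controllers = sorted(glob.glob("/sys/devices/system/edac/mc/mc*"))
if not controllers:
    print("No EDAC memory controllers found - ECC likely not active.")

for mc in controllers:
    def read(name: str) -> str:
        with open(os.path.join(mc, name)) as f:
            return f.read().strip()

    print(f"{os.path.basename(mc)}: "
          f"corrected={read('ce_count')} uncorrected={read('ue_count')}")
```

A nonzero corrected count over time is the usual sign that ECC is doing its job rather than silently passing flipped bits through.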
(I'm currently in the midst of refanning my CSE847-JBOD with bigger, quieter fans and swapping PSUs.)
I use a 5950X for running genetic programming and neuroevolution experiments and about once every 100 hours the machine will just not like the state/load it is experiencing and will restart. My approach is to checkpoint as often as possible. I restart the program the next morning and it deserializes the last snapshot from disk. Worst case, I lose 5 minutes of work.
This also helps with Windows updates, power outages, and EM/cosmic radiation.
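A minimal sketch of that checkpoint/restore pattern (my own code, not the commenter's; the file name, interval, and state layout are made up): write each snapshot through an atomic rename so a crash mid-write can never corrupt the previous checkpoint, and resume from the newest snapshot on startup.

```python
import os
import time
import pickle

CHECKPOINT = "experiment.ckpt"   # hypothetical path
INTERVAL = 5 * 60                # seconds between snapshots


def save_checkpoint(state: dict) -> None:
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())      # make sure the bytes hit the disk
    os.replace(tmp, CHECKPOINT)   # atomic: never leaves a half-written file


def load_checkpoint() -> dict:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"generation": 0, "population": None}   # fresh start


state = load_checkpoint()
last_save = time.monotonic()
while state["generation"] < 1_000_000:
    # ... evolve one generation here ...
    state["generation"] += 1
    if time.monotonic() - last_save >= INTERVAL:
        save_checkpoint(state)
        last_save = time.monotonic()
```

Worst case after a spontaneous reboot, the loop replays only the work done since the last interval, which matches the "lose five minutes" budget described above.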
If you really care, you can buy an EPYC-branded AM4 or AM5 CPU, which has remarkably similar specifications and MSRP to the Ryzen offerings.
It's unfortunate that we can only get 16-core CPUs running at 5+ GHz. I would have loved a 32- or 64-core Ryzen 9. The software we use charges per core used, so 30% less performance means that much extra cost, which is easily an order of magnitude higher than the price of a flagship server CPU. These licenses cost millions per year for a couple of 16-core seats.
So, in the end, CPU speed determines how quickly and economically chips are developed.
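A back-of-envelope version of that cost argument, with hypothetical numbers (the thread only says a couple of 16-core seats run into millions per year; the per-core license price and CPU price below are my assumptions):

```python
# Per-core licensing makes per-core performance the dominant cost lever.
license_per_core_year = 1_500_000 / 16   # assume a 16-core seat ~ $1.5M/yr
cores = 16

fast_throughput = 1.00                   # normalized per-core throughput
slow_throughput = 0.70                   # 30% slower per core

# Licensed cores needed to match the fast machine's total throughput:
extra_cores = cores * (fast_throughput / slow_throughput - 1)
extra_license_cost = extra_cores * license_per_core_year

flagship_server_cpu = 12_000             # rough list price of a top server CPU
print(f"~{extra_cores:.1f} extra licensed cores "
      f"=> ~${extra_license_cost:,.0f}/yr in extra licenses, "
      f"vs a ~${flagship_server_cpu:,} CPU")
```

With these assumed figures the extra license spend comes out around $640k per year, roughly fifty times the CPU's price, which is consistent with the "order of magnitude higher" claim above.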
About the technical problem: it's an ODE solution over a sparse matrix with tens of millions of elements. The sparsity comes from the locality of interactions within a chip: not every transistor is connected to the others. So modern simulators exploit this sparsity to divide the circuit into independent chunks and spread them over multiple independent threads.
The scale of the compute time is hard to state objectively, but it's typically anywhere from a couple of hours to a couple of months. Most top-level chip integration verification jobs take weeks. Because of this we spend months on verification after the design is pretty much finished, before the tapeout. This applies to every single reasonably complex chip.
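A toy sketch of that partitioning idea (my own construction, not the commenter's simulator): find the independent chunks from the sparsity pattern with a connected-components pass, then integrate each block's ODE separately; a real simulator would dispatch those blocks to separate threads.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components
from scipy.integrate import solve_ivp

# Hypothetical block-diagonal coupling matrix: two independent 3-node circuits.
blocks = [np.array([[-1.0, 0.5, 0.0],
                    [0.5, -1.0, 0.2],
                    [0.0, 0.2, -1.0]]),
          np.array([[-2.0, 1.0, 0.0],
                    [1.0, -2.0, 1.0],
                    [0.0, 1.0, -2.0]])]
A = sparse.block_diag(blocks, format="csr")

# Identify independent chunks purely from the sparsity pattern.
n_parts, labels = connected_components(abs(A), directed=False)

x0 = np.ones(A.shape[0])
for p in range(n_parts):
    idx = np.where(labels == p)[0]
    A_p = A[idx, :][:, idx].toarray()          # small dense sub-block
    sol = solve_ivp(lambda t, x: A_p @ x, (0.0, 1.0), x0[idx])
    print(f"block {p}: {len(idx)} nodes, x(1) ~ {sol.y[:, -1]}")
```

Real circuit matrices are rarely perfectly block-diagonal, so production solvers use weaker forms of this decomposition (reordering, domain decomposition), but the payoff is the same: per-chunk work that a fast core can chew through independently.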
The ROI on hiring a professional overclocker to build, tune, and test a workstation is probably at least break-even. As long as the right checksums are in place, extreme OC is just a business write-off.
https://blackcoretech.com/
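A bare-bones sketch of the "right checksums" safeguard mentioned above (my own illustration; in practice the reference digest would be recorded on a machine at stock clocks, or the job run redundantly): hash a deterministic workload's output and refuse to trust the overclocked box if the digest drifts.

```python
import hashlib
import struct


def reference_workload(n: int = 200_000) -> str:
    """Deterministic FP-heavy loop; returns a digest of its exact bit pattern."""
    x = 1.0
    h = hashlib.sha256()
    for i in range(1, n):
        x = (x * 1.0000001 + i) % 1024.0
        h.update(struct.pack("<d", x))
    return h.hexdigest()


# In practice KNOWN_GOOD is recorded once at stock clocks; it is computed
# in-process here only so the sketch runs standalone.
KNOWN_GOOD = reference_workload()

digest = reference_workload()
if digest != KNOWN_GOOD:
    raise SystemExit("Checksum mismatch: silent corruption, back off the overclock.")
print("Checksum OK:", digest[:16], "...")
```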
Given the urgency and the kind of money involved, I offered to set up a gaming PC for them using phase change cooling. Sadly they just made the staff work longer hours to catch up with the paperwork.
Time is money. Or the inverse of money. Ufff, my head hurts.
On that note, I can't wait to see the 256-core Zen 6c later this year. We will soon be able to buy a server with 512 cores, 1024 vCPUs/threads, 2TB of memory, and x TB of SSD, all inside 1U.
Although said somewhat tongue in cheek, it has been a rough several years for tech hobbyist consumers. At least the end of Moore's-law scaling and the bite of Dennard scaling's end combined to nerf generational improvements enough that getting by on existing hardware wasn't nearly as bad as it would've been 20 years ago.
Now that maybe the AI bubble is just starting to burst, we've got tariffs to ensure tech consumers still won't see undistorted prices. The silver lining in all this is that it got me into retro gaming and computing which, frankly, is really great.