What If AMD FX Had "Real" Cores? [video]
Posted 4 months ago | Active 4 months ago
youtube.com | Tech | story
calm | mixed
Debate: 40/100
Key topics
CPU Architecture
AMD FX Series
Retro Computing
A YouTube video asks what would have happened if AMD FX had 'real' cores, sparking a discussion among enthusiasts about the FX series' architecture and its legacy.
Snapshot generated from the HN discussion
Discussion Activity
First comment: 4d after posting
Peak period: 14 comments in 90-96h
Avg / period: 8
Based on 24 loaded comments
Key moments
- 01 Story posted: Sep 18, 2025 at 4:37 PM EDT (4 months ago)
- 02 First comment: Sep 22, 2025 at 8:38 AM EDT (4d after posting)
- 03 Peak activity: 14 comments in 90-96h (hottest window of the conversation)
- 04 Latest activity: Sep 22, 2025 at 8:41 PM EDT (4 months ago)
ID: 45294711 | Type: story | Last synced: 11/20/2025, 3:10:53 PM
Also, the benchmark is clock-for-clock, so while the older Phenom II looks like it's ahead, Bulldozer should be able to clock higher and go faster still.
All that said, I really enjoyed this retrospective look.
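To make the clock-for-clock point concrete: throughput is roughly IPC times clock frequency, so a chip that loses at equal clocks can still win if it runs at a sufficiently higher clock. A minimal sketch follows; the IPC and clock figures are hypothetical placeholders, not measurements from the video, and the function names are made up for illustration.

```python
# Rough model: throughput ~ IPC x clock frequency.
# All numbers below are hypothetical, chosen only to illustrate the idea.

def relative_performance(ipc: float, clock_ghz: float) -> float:
    """Instructions per nanosecond for a given IPC and clock (GHz)."""
    return ipc * clock_ghz

def breakeven_clock(ipc_a: float, clock_a_ghz: float, ipc_b: float) -> float:
    """Clock (GHz) chip B needs in order to match chip A's throughput."""
    return ipc_a * clock_a_ghz / ipc_b

# Hypothetical "older" chip with higher IPC, tested at a fixed 3.0 GHz.
older = relative_performance(ipc=1.0, clock_ghz=3.0)

# Hypothetical "newer" chip with lower IPC at the same clock...
newer_same_clock = relative_performance(ipc=0.9, clock_ghz=3.0)

# ...and at a higher clock that it may reach outside the test.
newer_higher_clock = relative_performance(ipc=0.9, clock_ghz=4.0)

print(older, newer_same_clock, newer_higher_clock)
print(breakeven_clock(ipc_a=1.0, clock_a_ghz=3.0, ipc_b=0.9))
```

With these placeholder values the lower-IPC chip trails at equal clocks but leads once it runs past the break-even clock, which is the caveat a clock-for-clock benchmark leaves out.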
And perhaps most importantly: 4x decoders/4x L1 iCache. IIRC, the entire damn chip was decoder-bound.
--------
Note: AMD Zen has 4x Integer pipelines and 4x FPU pipelines __PER CORE__. Modern high-performance systems CANNOT have a single 2x-pipeline FPU shared between two cores (averaging one pipeline per core). Modern Zen is closer to 4x pipelines per core, maybe more depending on how you count load/store units.
Shrinking the decoder on Bulldozer was clearly the wrong move for the FX series / AMD. Today's chips are going with a wide decoder (e.g., Apple can do 8x decode per clock tick), a deep opcode cache (AMD Zen has a large opcode cache allowing for 6x-way lookup per clock tick), or Intel's new and interesting multiple-decoder thing.
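The per-core pipeline comparison in the note above is simple arithmetic, sketched here. The pipeline counts are taken from the comment itself (a 2x-pipeline FPU shared by two cores, versus 4x FPU pipelines per Zen core) and are not checked against vendor documentation; the helper name is hypothetical.

```python
from fractions import Fraction

def fp_pipes_per_core(fp_pipes: int, cores_sharing: int) -> Fraction:
    """Average FPU pipelines available per core when a unit is shared."""
    return Fraction(fp_pipes, cores_sharing)

# Counts as stated in the comment, not verified independently.
shared_module = fp_pipes_per_core(fp_pipes=2, cores_sharing=2)  # shared FPU
zen_core = fp_pipes_per_core(fp_pipes=4, cores_sharing=1)       # private FPU

print(shared_module, zen_core, zen_core / shared_module)
```

Under those stated counts, the shared design averages one FPU pipeline per core while the Zen-style core has four to itself.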
https://www.agner.org/optimize/microarchitecture.pdf
They want you to write code that takes advantage of their speedups. Agner Fog is a better writer (a sibling comment already linked to Agner Fog's stuff). But I also like referencing the official manuals and whitepapers as primary sources.
Hard to beat Intel's documents on Intel chips, after all.
> Leapfrogging fetch and decode clusters have been a distinguishing feature of Intel’s E-Core line ever since Tremont. Skymont doubles down by adding another decode cluster, for a total of three clusters capable of decoding a total of nine instructions per cycle.
FX cores had their issues. But one of them was that AMD bet too early, and too hard, that the future was a high number of cores.
You can easily see the multithreaded workloads there because you have the six-core 3960X as a comparison too.
It's almost 10 years old, so I can't complain. And I think I got a check for $2 or something like that from the class-action suit.