AMD's EPYC 9355P: Inside a 32 Core Zen 5 Server Chip
Posted 3 months ago · Active 3 months ago
chipsandcheese.com · Tech story · High profile
Tone: calm, positive
Debate: 40/100
Key topics
AMD EPYC
Zen 5 Architecture
Server Processors
Hardware Analysis
The article analyzes AMD's EPYC 9355P server chip, revealing its 32-core Zen 5 architecture, and sparking discussion on its features, performance, and potential applications.
Snapshot generated from the HN discussion
Discussion Activity
- Active discussion; first comment: 2h after posting
- Peak period: 12 comments in the 6-12h window
- Avg per period: 4.6
- Comment distribution: 55 data points (based on 55 loaded comments)
Key moments
1. Story posted: Oct 3, 2025 at 4:01 PM EDT (3 months ago)
2. First comment: Oct 3, 2025 at 6:00 PM EDT (2h after posting)
3. Peak activity: 12 comments in the 6-12h window (hottest window of the conversation)
4. Latest activity: Oct 7, 2025 at 10:55 AM EDT (3 months ago)
ID: 45467166 · Type: story · Last synced: 11/20/2025, 6:30:43 PM
I know it's a server, but I'd be so ready to use all of that as a RAM disk. A crazy amount at crazy high speed. Even 1% would be enough just to play around with something.
This has been the basic pattern for ages, particularly with large C++ projects. C++ builds became IO-bound workflows with the introduction of multi-CPU and multi-core systems, especially during linking.
Creating RAM disks to speed up builds is one of the most basic and lowest-effort strategies for improving build times, and I think it was the main driver for a few commercial RAM drive apps.
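On Linux this is a one-liner with tmpfs; a minimal sketch (the mountpoint path and 16 GiB size are illustrative, and mounting needs root):

```shell
# Create a 16 GiB RAM disk backed by tmpfs (pages live in RAM, spill to swap).
RAMDISK=/mnt/ramdisk
SIZE=16g

mkdir -p "$RAMDISK"
mount -t tmpfs -o size="$SIZE" tmpfs "$RAMDISK" \
    || echo "mount failed (needs root / CAP_SYS_ADMIN)"

# Point the build's object/scratch directory at it, e.g.:
#   cmake -B "$RAMDISK/build" -S .
# Contents are lost on unmount or reboot:
#   umount "$RAMDISK"
```

On Windows this requires a third-party driver (ImDisk and similar tools), which is presumably what those commercial RAM drive apps were selling.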
The RAMsan line, for example, started in 2000 with a 64GB DRAM-based SSD with up to fifteen 1Gbit FC interfaces, providing a shared SAN SSD for multiple hosts (very well utilized by some of the beefier clustered SQL databases like Oracle RAC), but the company itself had been providing high-speed specialized DRAM-based SSDs since 1978.
Last time I saw one was with a mainframe, which kind of makes sense if adding cheaper third party memory to the machine would void warranties or breach support contracts. People really depend on company support for those machines.
A fast scratch pad that can be shared between multiple machines can be ideal at times.
You are arguing hypotheticals, whereas for decades the world had to deal with practicalities. I recommend you spend a few minutes looking into how to create RAM drives on, say, Windows, and think through how to achieve that when your build workstation has 8GB of RAM and you need a scratchpad of, say, 16GB.
Recommended reading: https://en.wikipedia.org/wiki/RAM_drive
These are only for when the OS and the machine itself can't deal with the extra memory and wouldn't know what to do with it, things you buy when you run out of sensible options (such as adding more memory to your machine and/or configuring a RAM disk).
A) this technique precedes the existence of Linux.
B) Linux is far from the most popular OS in use today.
C) some software development projects are developed on and target non-Linux platforms (see Windows)
Nowadays NVMe drives might indeed be able to get close, but we'd probably still need to span multiple SSDs (reducing the cost savings), and the developers there are incredibly sensitive to build times. If a 5-minute build suddenly takes 30 seconds more, we have some unhappy developers.
Another reason is that it'd eat SSDs like candy. Current enterprise SSDs have something like a 10000 TBW rating, which we'd exceed in the first month. So we'd either get cheap consumer SSDs and replace them every few days, or enterprise SSDs and replace them every few months, or stick with the RAM setup, which over the life of the build system will be cheaper than constantly buying SSDs.
Wow. What’s your use case?
We actually did try SSDs about 15 years ago and had a lot of dead SSDs in a very short time. After that we went back to estimating data written; it's cheaper. While SSD durability has increased a lot since then, everything else got faster as well, so our SSDs would last a bit longer now (back then it was a weekly thing), but still nowhere near the point where it'd be a sensible thing to do.
They sound incredibly spoiled. Where should I send my CV?
They indeed are quite spoiled - and that's not necessarily a good thing. Part of the issue is that our CI was good and fast enough that at some point a lot of the new hires never bothered to figure out how to build the code - so for quite a few the workflow is "commit to a branch, push it, wait for CI, repeat". And as they often just work on a single problem the "wait" is time lost for them, which leads to the unhappiness if we are too slow.
Running the numbers to verify: a read/write-mixed enterprise SSD will typically have 3 DWPD (drive writes per day) across its 5-year warranty. At 2TB, that would be 10950 TBW, so that sort of checks out. If endurance were a concern, upgrading to a higher capacity would linearly increase the endurance. For example, the Kioxia CD8P-V. https://americas.kioxia.com/en-us/business/ssd/data-center-s...
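The arithmetic in that comment can be checked directly (the 3 DWPD, 2TB, and 5-year figures are from the comment above):

```shell
# TBW = drive writes per day x capacity (TB) x warranty period in days
dwpd=3
capacity_tb=2
warranty_years=5
tbw=$(( dwpd * capacity_tb * 365 * warranty_years ))
echo "${tbw} TBW"   # 10950 TBW, consistent with the ~10000 TBW figure upthread
```

Since capacity enters the product linearly, doubling the drive size doubles the TBW rating at the same DWPD, which is the "upgrade capacity for endurance" point above.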
Finding it a bit hard to imagine build machines working that hard, but I could believe it!
I don't know where you're buying your NVMe drives, but mine usually respond within a hundred microseconds.
I assume the same would be true for any project that is configure-heavy.
this kit? https://www.newegg.com/nemix-ram-1tb/p/1X5-003Z-01930
I also have an M920q with an 8500T, an HP ProDesk with a 10500T, and a Lenovo P520; these three are truly for home purposes.
If I were to do the price-tracker machine again, I'd go much smaller and get a JBOD and probably a P520.
So just those components would be just over $12k.
That's just from regular consumer shops, and includes 25% VAT. Without the VAT it's about $9800.
The problem for consumers is that just about all the shops that sell such gear, and from which you might get a deal, are geared towards companies and not interested in dealing with consumers due to consumer protection laws.
I found a used server with 768 GB DDR4 and dual Intel Gold 6248 CPUs for $4200 including 25% VAT.
That's a complete 2U server, the CPUs are a bit weak but not too bad all in all.
That's 300GB/s slower than my old Mac Studio (M1 Ultra). Memory speeds in 2025 remain thoroughly unimpressive outside of high-end GPUs and fully integrated systems.
The M1 Ultra doesn't have 800GB/s because it's "integrated", it simply has 16 channels of DDR5-6400, which it could have whether it was soldered or not. And none of the more recent Apple chips have any more than that.
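Treating the interface as 16 channels of 64 bits at 6400 MT/s (the figures in the comment above), the headline number falls out of a one-line calculation:

```shell
# Peak bandwidth = channels x bytes per transfer per channel x MT/s
channels=16
bytes_per_transfer=8   # 64-bit channel
mts=6400
bw_mb=$(( channels * bytes_per_transfer * mts ))
echo "${bw_mb} MB/s"   # 819200 MB/s, i.e. ~819 GB/s, the ~800GB/s figure cited
```

The same formula explains the server numbers: 12 channels of DDR5-6400 gives 12 x 8 x 6400 = 614400 MB/s, the roughly 600GB/s class the EPYC parts sit in.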
It's the GPUs that use integrated memory, i.e. GDDR or HBM. That actually gets you somewhere -- the RTX 5090 has 1.8TB/s with GDDR7, the MI300X has 5.3TB/s with HBM3. But that stuff is also more expensive which limits how much of it you get, e.g. the MI300X has 192GB of HBM3, whereas normal servers support 6TB per socket.
And it's the same problem with Apple even though there's no great reason for it to be. The 2019 Intel Xeon Mac Pro supported 1.5TB of RAM -- still in slots -- but the newer ones barely reach a third of that at the top end.
The M1 Ultra has LPDDR5, not DDR5. And the M1 Ultra was running its memory at 6400MT/s about two and a half years before any EPYC or Xeon parts supported that speed—due in part to the fact that the memory on a M1 Ultra is soldered down. And as far as I can tell, neither Intel nor AMD has shipped a CPU socket supporting 16 channels of DRAM; they're having enough trouble with 12 channels per socket often meaning you need the full width of a 19-inch rack for DIMM slots.
Existing servers typically have 12 channels per socket, but they also have two DIMMs per channel, so you could double the number of channels per socket without taking up any more space for slots. You could also use CAMM which takes up less space.
They don't currently use more than 12 channels per socket even though they could because that's enough to not be a constraint for most common workloads, more channels increase costs, and people with workloads that need more can get systems with more sockets. Apple only uses more because they're using the same memory for the GPU and that is often constrained by memory bandwidth.
Usually this comes at a pretty sizable hit to MHz available. For example STH notes that their Zen5 ASRock Rack EPYC4000D4U goes from DDR5-5600 down to DDR5-3600 with the second slot populated, a 35% drop in throughput. https://www.servethehome.com/amd-epyc-4005-grado-is-great-an...
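The quoted drop checks out against the two transfer rates in the STH note (integer division rounds down from 35.7%):

```shell
# Throughput penalty for populating the second DIMM slot per channel
one_dimm=5600   # DDR5-5600, one DIMM per channel
two_dimm=3600   # DDR5-3600, two DIMMs per channel
drop_pct=$(( 100 * (one_dimm - two_dimm) / one_dimm ))
echo "${drop_pct}% drop"   # 35% drop in transfer rate
```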
(It's also because of servers being ultra-cautious again. The desktops say the same thing in the manual but then don't enforce it in the BIOS and people run two sticks per channel at the full speed all over the place.)
So they have been really optimising that IO die for latency.
NUMA is already workload sensitive, you need to benchmark your exact workload to know if it’s worth enabling or not, and this change is probably going to make it even less worthwhile. Sounds like you will need a workload that really pushes total memory bandwidth to make NUMA worthwhile.
It says 16 cores per die with up to 16 Zen 5 dies per chip. For Zen 5 it's 8 cores per die, 16 dies per chip, giving a total of 128 cores.
For Zen 5c it's 16 cores per die, 12 dies per chip, giving a total of 192 cores.
Weirdly it's correct on the right side of the image.
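The corrected totals from that comment multiply out as follows:

```shell
# Cores per die x dies per chip, per the comment above
zen5=$(( 8 * 16 ))     # classic Zen 5 CCDs
zen5c=$(( 16 * 12 ))   # dense Zen 5c CCDs
echo "Zen 5: ${zen5} cores, Zen 5c: ${zen5c} cores"   # 128 and 192
```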
2 more comments available on Hacker News