I Wish SSDs Gave You CPU Performance-Style Metrics About Their Activity
Posted 3 months ago · Active 2 months ago
utcc.utoronto.ca · Tech · story
calm / mixed
Debate: 60/100
Key topics
SSDs
NVMe
Performance Metrics
Storage
The author wishes SSDs provided CPU-style performance metrics, sparking a discussion on the feasibility and potential solutions, including NVMe's existing log capabilities and standardized APIs.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 30m after posting
Peak period: 30 comments in 0-3h
Avg per period: 5.4
Comment distribution: 54 data points
Based on 54 loaded comments
Key moments
- 01 Story posted: Oct 19, 2025 at 1:13 PM EDT (3 months ago)
- 02 First comment: Oct 19, 2025 at 1:42 PM EDT (30m after posting)
- 03 Peak activity: 30 comments in 0-3h, the hottest window of the conversation
- 04 Latest activity: Oct 21, 2025 at 4:54 AM EDT (2 months ago)
ID: 45635870 · Type: story · Last synced: 11/20/2025, 2:49:46 PM
I feel like maybe some of this info is already available; we just don't commonly look at it. Knowing how deep the queue is and how many commands are outstanding at any given moment is probably a decent start. I haven't spent time digging into blk-mq to see what info is available about the hardware dispatch queues (how the kernel represents the many hardware queues the device offers). https://www.kernel.org/doc/html/v5.16/block/blk-mq.html
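For the outstanding-commands part, the kernel already exposes per-device in-flight counts via sysfs. A minimal sketch (not from the thread; the device name nvme0n1 is an assumption):

```python
# Poll /sys/block/<dev>/inflight, which holds two counters:
# commands currently in flight for reads and for writes.
import time

DEV = "nvme0n1"  # assumption: pick a device from /sys/block/

while True:
    with open(f"/sys/block/{DEV}/inflight") as f:
        reads, writes = (int(x) for x in f.read().split())
    print(f"in-flight: {reads} reads, {writes} writes")
    time.sleep(1)
```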
Every command that you issue to the SSD returns a response. It would be nice to have a bunch of performance counters that tell us where the time went for each command we give it.
GPUs have this already.
For NVMe in particular, you will have a hard time filling the queues. Your perceived performance is mostly latency, as there is hardly an application that can submit enough concurrent requests.
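The latency-bound claim is easy to check yourself. A back-of-the-envelope sketch (assuming /dev/nvme0n1, root privileges, and Python 3.7+) that measures queue-depth-1 random-read latency with O_DIRECT:

```python
# QD1 random reads: one request outstanding at a time, so the numbers
# reflect device latency rather than throughput.
import mmap, os, random, time

DEV = "/dev/nvme0n1"  # assumption: adjust to your device
BLOCK = 4096
N = 1000

fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
size = os.lseek(fd, 0, os.SEEK_END)
buf = mmap.mmap(-1, BLOCK)  # anonymous mmap is page-aligned, as O_DIRECT requires

lat = []
for _ in range(N):
    off = random.randrange(size // BLOCK) * BLOCK  # block-aligned offset
    t0 = time.perf_counter_ns()
    os.preadv(fd, [buf], off)
    lat.append(time.perf_counter_ns() - t0)
os.close(fd)

lat.sort()
print(f"p50 {lat[N // 2] / 1e3:.1f} us, p99 {lat[int(N * 0.99)] / 1e3:.1f} us")
```

If the p50 here roughly matches what your application sees per I/O, the workload is latency-bound rather than queue-bound.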
You get many of the same problems these days, but they're a bit harder to diagnose. You have to go looking at system monitors to see what's going on. Whereas, if the computer just communicated to you what it was doing, in an ambient way, this stuff would be immediately obvious.
I've heard stories like this where people worked on older computers that were loud, and then you could actually hear what it was doing. If it got stuck in an infinite loop, you'd literally hear it.
That seems like very much a feature to me.
With training runs it makes a little beat and you can tell when it checkpoints because there’s a little skip. Or a GPU drops off the bus…
I'd hope you hear their fans too...
First world problems.
When doing some AI stuff on my garage PC (4060 Ti; nothing crazy) the overhead lights in the garage slightly but noticeably dim. This doesn't occur when gaming.
It's most easily noticeable with one of Nvidia's demo apps -- "AI Paintbrush" or something like that, I forget. It's a GUI app where you can "paint" with the mouse cursor. When you depress the mouse button, the GPU engages... and the garage lights dim. Release the mouse button, and the lights return to normal.
The drives were numerous (hard, floppy, tape, optical), and the noises were too loud to avoid using diagnostically. Printers clacked and whooshed (and sometimes moved furniture). Scanners sang songs. Monitors produced clicks and pops and buzzes and sizzles, and the flyback transformer would continuously whine at different frequencies depending on mode. Modems made dialing and shrieking noises. Sound cards were anything but silent; a person could hear noises that varied based on the work the system was doing. And for a long while, CPUs and/or front side bus speeds put a lot of noise right in the middle of the FM dial.
Computing is pretty quiet these days.
At least in my world, the sound of computing had changed quite a bit over the span of decades from the 90s to the 2010s.
The only incidentally-noisy computing things I had left at the end of the teens were the hard drives of ever-increasing size that got used for storing Linux ISOs.
They are still noisy when doing real work on them, especially laptops.
I sometimes think about what a modern analogy would be for some of the operations work I do: translate a graph of status codes into a steady hum at 440 Hz for the 200s, then cacophonous jolts as the 500s start to arrive? As you mentioned, there's no perfect analogy as you get farther and farther from moving parts.
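That sonification idea is easy to prototype offline. A toy sketch (hypothetical status-code stream, stdlib only) that renders the 200s as a quiet 440 Hz hum and the 500s as louder, lower jolts in a WAV file:

```python
# Render a stream of HTTP status codes as audio: quiet 440 Hz for success,
# loud 110 Hz bursts for server errors.
import math, struct, wave

RATE = 8000  # samples per second

def tone(freq, dur, vol):
    n = int(RATE * dur)
    return b"".join(
        struct.pack("<h", int(vol * 32767 * math.sin(2 * math.pi * freq * i / RATE)))
        for i in range(n)
    )

codes = [200] * 40 + [500, 200, 500, 500] + [200] * 10  # made-up traffic

with wave.open("statuses.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(RATE)
    for c in codes:
        ok = c < 500
        w.writeframes(tone(440 if ok else 110, 0.05, 0.2 if ok else 0.9))
```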
They have extremely distinct sounds coming from the GPUs. You can hear the difference between GPT-OSS-20b and Qwen3-30b pretty easily just based on the sounds the GPU is making.
The sound is being produced by the VRMs and power supply to the GPU being switched on and off hundreds of times per second. Each token being produced consumes power, and each attention and MLP layer consumes a different amount of power. No other GPU stress test consumes power in the same way, so you rarely hear that sound otherwise.
(I've also gotten great use out of a $5 AM/FM radio.)
One could use that while half asleep in the bedroom, with a radio tuned to the right frequency and almost muted, and then know whether Portage on Gentoo, or build.sh/pkgsrc on NetBSD, was finished or had been interrupted.
Because no buzzing or humming anymore :-)
https://www.paulgraham.com/popular.html
Luckily, storage has also gotten incredibly cheap, so instead of diagnosing, it's easier to just keep a full backup of your data and swap to it in case something goes wrong.
Graphs and logs provide a proxy to that data at best, and attaching a debugger, tracer, or perf tool is not an option all the time.
Sounds and LEDs provided an overhead-free real time communication channel to the operation of the system.
Possibly. My first 386-DX40 had activity lights; I tried out a CompuServe disk, saw my HD activity going nuts, so I killed the power and trashed the CD.
There are programs that can show a virtual LED for HD and network activity, so all is not lost.
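In the same spirit, a terminal "virtual LED" takes only a few lines. A toy sketch (the device name is an assumption) that polls /proc/diskstats and blinks whenever I/Os complete:

```python
# Flash a character whenever the disk completes I/Os between polls.
import sys, time

DEV = "nvme0n1"  # assumption: pick a name from /proc/diskstats

def ios_completed(dev):
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == dev:
                # field 4: reads completed, field 8: writes completed
                return int(parts[3]) + int(parts[7])
    raise ValueError(f"{dev} not found")

prev = ios_completed(DEV)
while True:
    time.sleep(0.1)
    cur = ios_completed(DEV)
    sys.stdout.write("\r" + ("*" if cur > prev else "."))
    sys.stdout.flush()
    prev = cur
```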
No. Just removal of parts to increase profits.
But then saying "it is too much to ask" is just another way to limit what users can do with the specific resources they paid for.
There's a lot of non-trivial stuff that goes on inside of a modern SSD. And to be sure, none of it is magic; all of it could certainly be implemented in software.
But is that kind of drastic move strictly necessary in order to get meaningful statistics?
(You've heard about apple and orange comparisons, right? Right.)
I'm going to keep referring to the QuickSync video encoding block in my CPU as "hardware," though, because the tiny lump of transistors that is dedicated to performing this specialized task is something that I can kick.
Relatedly, the business of managing raw NAND storage on Apple devices and abstracting it to operating system software as NVMe: That translation happens in hardware. That hardware is also something that I can kick, so I'm going to keep calling it "hardware".
The `nvme id-ctrl -H` (human-readable) option does parse and explain some configuration settings and hardware capabilities in a more standardized, human-readable fashion, but the availability of internal activity counters and events varies greatly across vendors, products, and firmware versions (and even across your currently installed nvme and smartctl software package versions).
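The counters that are reliably present live in the SMART/Health log. A small sketch (assuming nvme-cli is installed and the device is /dev/nvme0; JSON key names can differ across nvme-cli versions, hence the .get() fallbacks):

```python
# Pull the NVMe SMART/Health log as JSON and print the closest things it
# has to activity counters.
import json, subprocess

out = subprocess.run(
    ["nvme", "smart-log", "/dev/nvme0", "--output-format=json"],
    capture_output=True, text=True, check=True,
).stdout
log = json.loads(out)

# Per the NVMe spec, data units are in 1000 x 512-byte units and
# controller busy time is in minutes.
for key in ("data_units_read", "data_units_written",
            "host_read_commands", "host_write_commands",
            "controller_busy_time"):
    print(f"{key}: {log.get(key, 'n/a')}")
```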
Regarding eBPF (for an OS-level view), the `biolatency` tool supports the -F option to additionally break down I/Os by the IORQ flags. I have added the iorq_flags to my eBPF `xcapture` tool as well, so I can break down I/Os (and latencies) by submitter PID, user, program, etc., and see IORQ flags like "WRITE|SYNC|FUA" that help to understand why some write operations are slower than others (especially on commodity SSDs without a power-loss-protected write cache).
An example output of viewing IORQ flags in general is below:
https://tanelpoder.com/posts/xcapture-xtop-beta/#disk-io-wai...
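For readers who want the shape of such a tool rather than the full thing, here is a stripped-down bcc sketch in the same spirit as `biolatency` (not xcapture; assumes the bcc Python bindings and root privileges) that histograms block I/O latency from the block tracepoints:

```python
# Match block_rq_issue to block_rq_complete by (dev, sector) and build a
# log2 histogram of request latency in microseconds.
from time import sleep
from bcc import BPF

bpf_text = r"""
struct key_t {
    u32 dev;
    u64 sector;
};
BPF_HASH(start, struct key_t, u64);
BPF_HISTOGRAM(dist);

TRACEPOINT_PROBE(block, block_rq_issue) {
    struct key_t k = {.dev = args->dev, .sector = args->sector};
    u64 ts = bpf_ktime_get_ns();
    start.update(&k, &ts);
    return 0;
}

TRACEPOINT_PROBE(block, block_rq_complete) {
    struct key_t k = {.dev = args->dev, .sector = args->sector};
    u64 *tsp = start.lookup(&k);
    if (tsp) {
        dist.increment(bpf_log2l((bpf_ktime_get_ns() - *tsp) / 1000));
        start.delete(&k);
    }
    return 0;
}
"""

b = BPF(text=bpf_text)
print("Tracing block I/O latency... hit Ctrl-C to print the histogram.")
try:
    sleep(3600)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")
```

A per-flag breakdown, as `biolatency -F` does, would additionally fold the tracepoint's rwbs field into the histogram key.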
If you want detailed Ryzen stats, you have to use ryzen_monitor. If you want detailed Seagate HDD stats, you have to use OpenSeaChest. If you want detailed NIC queue stats, there's ethq. I'm sure there are other examples as well.
Most hardware metrics are still really difficult to collect, understand and monitor.
I have had this wish since the days of spinning disks.