SSD-IQ: Uncovering the Hidden Side of SSD Performance [pdf]
Posted 4 months ago · Active 4 months ago
vldb.org · Tech · story
Key topics
SSD Performance
Storage Technology
Database Systems
The paper 'SSD-IQ: Uncovering the Hidden Side of SSD Performance' analyzes the less visible aspects of SSD performance; the discussion covers the paper's findings, their implications, and commenters' own experiences with SSDs in a variety of settings.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion · First comment: 2d after posting
Peak period: 12 comments in Day 3
Average: 6 comments per period
Comment distribution: 24 data points, based on 24 loaded comments
Key moments
- 01 Story posted: Aug 22, 2025 at 11:13 AM EDT (4 months ago)
- 02 First comment: Aug 24, 2025 at 9:59 AM EDT (2d after posting)
- 03 Peak activity: 12 comments in Day 3, the hottest window of the conversation
- 04 Latest activity: Sep 5, 2025 at 2:16 AM EDT (4 months ago)
Story ID: 44985619 · Type: story
Want the full context?
Read the primary article or dive into the live Hacker News thread when you're ready.
https://flashdba.com/category/storage-for-dbas/understanding...
For SOHO, yes, where no serious database usage is expected. But server/datacenter SSDs are categorized as read-intensive, write-intensive, and mixed-use.
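Those classes are usually distinguished by their endurance rating, quoted either as DWPD (drive writes per day) or TBW (terabytes written). A minimal Python sketch of the conversion between the two, using illustrative capacity and warranty figures rather than numbers from any specific datasheet:

    # Convert a DWPD rating into TBW over the warranty period.
    # Capacity and warranty length below are illustrative assumptions.
    def dwpd_to_tbw(dwpd, capacity_tb, warranty_years=5):
        return dwpd * capacity_tb * 365 * warranty_years

    # e.g. a 1 DWPD read-intensive 3.84 TB drive vs a 3 DWPD mixed-use one
    for dwpd in (1, 3):
        print(f"{dwpd} DWPD @ 3.84 TB, 5-year warranty -> "
              f"{dwpd_to_tbw(dwpd, 3.84):,.0f} TBW")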
https://www.youtube.com/watch?v=gl8wXT8F3W4
Not every application will read in a specific size, but 4KiB isn't uncommon.
As an example, in this Micron product brief the latency for the read-intensive and mixed-use products is the same: https://assets.micron.com/adobe/assets/urn:aaid:aem:e71d9e5e...
Of course, the footnote says that latency is a median at QD=1 random 4K I/O.
From the paper, the PM9A3 (1 DWPD) has better P99.9 write latency under load than the 7450 Pro (3 DWPD, mixed use).
If you can borrow systems, you can do it yourself, too.
Otherwise, there are too many variables to calculate it. In the past it was easier; now it's much more complicated.
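For what it's worth, the QD=1 random 4K figure from those datasheet footnotes is straightforward to reproduce yourself. A minimal Python sketch, assuming Linux, Python 3.7+, and a readable NVMe device at a placeholder path (read-only, but it typically needs root):

    # Time 4 KiB random reads at queue depth 1 and report median and P99.9 latency.
    # /dev/nvme0n1 is a placeholder device path; adjust for your system.
    import mmap, os, random, time

    DEV = "/dev/nvme0n1"
    BLOCK = 4096
    SAMPLES = 10_000

    fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)   # bypass the page cache
    size = os.lseek(fd, 0, os.SEEK_END)
    buf = mmap.mmap(-1, BLOCK)                     # page-aligned buffer for O_DIRECT

    lat = []
    for _ in range(SAMPLES):
        off = random.randrange(size // BLOCK) * BLOCK
        t0 = time.perf_counter_ns()
        os.preadv(fd, [buf], off)                  # one outstanding I/O => QD=1
        lat.append(time.perf_counter_ns() - t0)
    os.close(fd)

    lat.sort()
    print(f"p50   {lat[len(lat) // 2] / 1e3:.1f} us")
    print(f"p99.9 {lat[int(len(lat) * 0.999)] / 1e3:.1f} us")

Tools like fio do the same measurement with far more control; the point is only that a QD=1 median hides queueing effects, which is exactly where the P99.9-under-load comparison above diverges from the datasheet.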
Through metrics I noticed that some SSDs in a cluster were much slower than others despite the hardware being uniform. After a bit of investigation we found that the slow devices had been in service longer, and that we were not sending DISCARDs to the SSDs due to a default in dm-crypt: https://wiki.archlinux.org/title/Dm-crypt/Specialties#Discar...
The performance penalty for our drives (Samsung DC drives) was around 50% if TRIM was never run. We now run blkdiscard when provisioning new drives and enable discards on the crypt devices, and things seem to be much better.
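A quick way to check whether discards actually make it through a dm-crypt (or other device-mapper) layer is to read the block-layer queue attributes. A minimal Python sketch, assuming Linux and the standard /sys/block layout (read-only, safe to run):

    # List which block devices advertise discard support. A dm-crypt mapping
    # opened without allow-discards reports discard_max_bytes = 0, so TRIM
    # issued higher up the stack never reaches the SSD underneath.
    from pathlib import Path

    for dev in sorted(Path("/sys/block").iterdir()):
        queue = dev / "queue"
        try:
            max_bytes = int((queue / "discard_max_bytes").read_text())
            granularity = int((queue / "discard_granularity").read_text())
        except OSError:
            continue
        status = "discard OK" if max_bytes > 0 else "discard NOT passed through"
        print(f"{dev.name:12s} {status} (max={max_bytes}, granularity={granularity})")

If the raw NVMe device shows a non-zero value but the dm device on top of it shows zero, discards are being dropped at the crypt layer; the discard crypttab option (or cryptsetup open --allow-discards) described in the linked wiki page is what re-enables them.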
Reflecting a bit more, this makes me more bullish on system integrators like Oxide, as I have so often seen software misconfigured in a way that doesn't use the full potential of the hardware. There is a size of company between a one-person shop and somewhere like Facebook/Google where they run their own racks but don't have the in-house expertise to triage and fix these performance issues. If, for example, you are getting 50% less performance out of your DB nodes, what is the cost of that inefficiency?
Switched to some Intel 480GB DC drives and performance was in the low milliseconds, as I would have thought any drive should be.
Not sure if I was hitting the DRAM limit of the Samsungs or what; I spent a bit of time troubleshooting, but this was a home lab and used Intel DC drives were cheap on eBay. Granted, the Samsung EVOs weren't targeted at that type of work.
Consumer drives can definitely have some quirks. The 2TB 960 Pro also just had weird write latency, even under relatively moderate load: like 2-4ms instead of <1ms. It didn't really get much worse with extra load and concurrency, except that if there are writes enqueued, the reads end up waiting behind them for some reason and also see the latency penalty.
They can also be weird behind RAID controllers, though I'm not sure if JBOD counts there. For whatever reason, the 860 EVO line wouldn't pass TRIM through the SAS RAID controller, but the 860 PRO would.
Another thing you will notice is that the 850 EVO is 500GB capacity while the Intel one is 480GB. The difference in capacity is put towards overprovisioning, which reduces write amplification. The idea is that if you have sufficient free space available, whole NAND blocks will naturally get invalidated before you run out of free blocks.
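A rough sketch of the arithmetic behind that, assuming (and this is an assumption for illustration, not a datasheet figure) that both drives carry the same 512 GiB of raw NAND:

    # Overprovisioning implied by the gap between raw NAND and user-visible capacity.
    # The 512 GiB raw NAND figure is an assumption for illustration.
    RAW_NAND_GB = 512 * 2**30 / 1e9   # 512 GiB expressed in decimal GB, ~549.8

    for user_gb in (500, 480):
        spare = RAW_NAND_GB - user_gb
        print(f"{user_gb} GB drive: ~{spare:.0f} GB spare, "
              f"~{100 * spare / user_gb:.1f}% overprovisioning")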