Reduce Bandwidth Costs with Dm-Cache: Fast Local SSD Caching for Network Storage
Posted 4 months ago · Active 4 months ago
devcenter.upsun.com · Tech · story
Key topics
Storage Optimization
Cloud Computing
Caching Strategies
The article discusses using dm-cache to reduce bandwidth costs by caching network storage on local SSDs, sparking a discussion on caching strategies, data integrity, and cloud architecture.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
- First comment: 3d after posting
- Peak period: 12 comments in 84-96h
- Avg / period: 6
- Comment distribution: 24 data points (based on 24 loaded comments)
Key moments
- 01 Story posted: Sep 9, 2025 at 8:14 AM EDT (4 months ago)
- 02 First comment: Sep 12, 2025 at 7:31 PM EDT (3d after posting)
- 03 Peak activity: 12 comments in 84-96h (hottest window of the conversation)
- 04 Latest activity: Sep 14, 2025 at 9:04 AM EDT (4 months ago)
ID: 45180876 · Type: story · Last synced: 11/20/2025, 1:32:57 PM
An expense in the age of 100 Gbit networking that exists entirely because AWS can get away with charging the suckers, um, customers for it.
The internet egress price is where they're bastards.
Getting terabits and terabits of 'private' interconnect is unbelievably cheap at amazon scale. AWS even own some of their own cables and have plans to build more.
There is _so_ much capacity available on fiber links. For example, one newish cable (Anjana) between the US and Europe has 480 Tbit/sec of capacity. That's just one cable, and it could probably already be upgraded to 10-20x that with newer modulation techniques.
Another option I haven't tried is tmpfs with an overlay. Initial access is RAM, and it falls back to the underlying slower storage. Since I'm mostly doing reads it should be fine; writes can go to the slower disk mount. No block storage changes needed.
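For reference, here is a minimal sketch of that kind of overlay-on-tmpfs setup, wrapped in Python so the steps are explicit. All paths and the tmpfs size are hypothetical, and the slow network storage is assumed to already be mounted at /mnt/slow. Note that with a tmpfs upperdir, writes made through the overlay land in RAM rather than on the slow mount unless you write to the slow mount directly.

```python
import subprocess

# Hypothetical paths: /mnt/slow is the existing network-backed mount (lower
# layer), /mnt/fast is a tmpfs holding the overlay's upper/work dirs, and
# /mnt/merged is where the combined view gets exposed.
SLOW, FAST, MERGED = "/mnt/slow", "/mnt/fast", "/mnt/merged"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# RAM-backed upper layer (4G is an arbitrary example size).
run(["mkdir", "-p", FAST, MERGED])
run(["mount", "-t", "tmpfs", "-o", "size=4G", "tmpfs", FAST])
run(["mkdir", "-p", f"{FAST}/upper", f"{FAST}/work"])

# Overlay: reads of files not present in the tmpfs upper layer fall through
# to the slow lower layer; writes made through the overlay land in the upper layer.
run(["mount", "-t", "overlay", "overlay",
     "-o", f"lowerdir={SLOW},upperdir={FAST}/upper,workdir={FAST}/work",
     MERGED])
```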
It's maintained by Intel and Huawei and the devs were very responsive.
I’ve been under the impression that Intel got rid of pretty much all of their storage software employees.
My head goes to the xz attack when I hear that Intel decided to stop supporting an open-source tool and a Chinese company known to sell backdoored equipment "steps in" to continue development; it makes me suspicious and concerned.
This is to say nothing of the quality of the software they write or its functionality. They may be "good stewards" of it, but does it seem paranoid to be unsure of that arrangement?
Unless the writer is always blindly overwriting entire files at once (doesn't read-then-write), consistency requires consistent reads AND writes. Even then, potential ordering issues creep in. It would be really interesting to hear how they deal with it.
If so, safe enough, though if they're going to do that, why stop at 512MB? The big win of Flash would be that you could go much bigger.
I used writeback mode, but expected to wipe the machine if the caching layer ever collapsed. In the end, the SSDs outlived my interest in the machine, though I think I did fail over an HDD or two while the rest remained in normal operating mode.
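For anyone who wants to experiment with the writeback-vs-writethrough trade-off mentioned above, here is a hedged sketch that attaches a dm-cache layer via lvmcache (LVM's front end for dm-cache), again as a Python wrapper around the CLI. The volume group name (vg0), origin LV (data), cache sizes, and SSD path are all assumptions for illustration, not details from the article.

```python
import subprocess

# Assumptions: a volume group "vg0" containing an origin LV "data" that sits
# on the slow network block device, plus a local SSD already added to vg0.
VG, ORIGIN, SSD = "vg0", "data", "/dev/nvme0n1"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Carve a cache-data LV and a small metadata LV out of the local SSD.
run(["lvcreate", "-n", "cache0", "-L", "100G", VG, SSD])
run(["lvcreate", "-n", "cache0meta", "-L", "1G", VG, SSD])

# Combine them into a cache pool, then attach the pool to the origin LV.
run(["lvconvert", "--yes", "--type", "cache-pool",
     "--poolmetadata", f"{VG}/cache0meta", f"{VG}/cache0"])

# writethrough keeps the slow origin authoritative (safer if the SSD dies);
# switching to writeback absorbs writes on the SSD first, with the failure
# mode described in the comment above.
run(["lvconvert", "--yes", "--type", "cache", "--cachepool", f"{VG}/cache0",
     "--cachemode", "writethrough", f"{VG}/{ORIGIN}"])
```

Under the hood lvmcache builds a dm-cache target, with LVM managing the cache metadata and device-mapper tables for you.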
Does this ring any bells? I’ve searched for this a time or two and can’t find it again.
As I recall it was to change the current mirrored read strategy to be aware of the speed of the underlying devices, and prefer the faster one if it has capacity. Perhaps a fixed pool property to always read from a given device was also discussed; it's been a while, so my memory is hazy.
The use-case was similar IIRC, where a customer wanted to combine local SSD with remote block device.
So, might come to ZFS.
(Somehow the name "SuperDisks" was burned into my brain for this. Although Discord's post does use 'Super-Disks' in a section header, if you search the Internet for SuperDisks, everything you find is about the LS-120 floppies that went by that name.)
I've used it before for a low-downtime migration of VMs between two machines. It was a personal project and I could have just kept the VM offline for the migration, but it was fun to play around with it.
You give it a read-only backing device and a writable device that's at least as big. It will slowly copy the data from the read-only device to the writable device. If a read is issued to the dm-clone target it's either gotten from the writable device if it's already cloned or forwarded to the read-only device. Writes are always going to the writable device and afterwards the read-only device is ignored for that block.
It's not the fastest, but it's relatively easy to set up, even though using device mapper directly is a bit clunky. It's also not super efficient: IIRC, if a read goes to a chunk that hasn't been copied yet, the data is fetched from the read-only device and handed to the reading program, but it isn't stored on the writable device, so it has to be fetched again later. If the file system being copied isn't full, it's a good idea to run trimming after creating the dm-clone target, as discarded blocks are marked as not needing to be fetched.
[1] https://docs.kernel.org/admin-guide/device-mapper/dm-clone.h...
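Going off the dm-clone kernel documentation linked above, here is a rough sketch of what creating a clone target looks like with dmsetup, wrapped in Python. The device paths, the target name, and the 4 KiB region size are hypothetical; the small metadata device has to be provisioned separately.

```python
import subprocess

# Hypothetical devices: SOURCE is the read-only (e.g. network-backed) device,
# DEST is a local writable device at least as large, and META is a small
# device for dm-clone's metadata.
SOURCE = "/dev/mapper/remote-ro"
DEST = "/dev/nvme0n1p2"
META = "/dev/nvme0n1p1"

def run(cmd, capture=False):
    print("+", " ".join(cmd))
    result = subprocess.run(cmd, check=True, capture_output=capture, text=True)
    return result.stdout.strip() if capture else None

# The table wants the device length in 512-byte sectors.
sectors = run(["blockdev", "--getsz", SOURCE], capture=True)

# Table format per the dm-clone doc:
#   <start> <len> clone <metadata dev> <destination dev> <source dev> <region size>
# Region size is in sectors too; 8 sectors = 4 KiB regions.
table = f"0 {sectors} clone {META} {DEST} {SOURCE} 8"
run(["dmsetup", "create", "cloned-disk", "--table", table])

# Hydration (background copying from source to destination) is on by default;
# it can be paused and resumed at runtime via messages, e.g.:
run(["dmsetup", "message", "cloned-disk", "0", "disable_hydration"])
```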
That said, for this use, I would be very concerned about coherency issues putting any cache in front of the actual distributed filesystem. (Unless this is the only node doing writes, I guess?)
For local disks though? bcache
1. How is the cache invalidated to avoid reading stale data?
2. If the multi-AZ setup is for high availability, then I guess the only traffic between zones must be replication from the active zone to the standby zones; in such a setup a read cache doesn't make much sense.