Blockdiff: We Built Our Own File Format for VM Disk Snapshots
Posted 3 months ago · Active 3 months ago
cognition.ai · Tech · story
Sentiment: calm, positive · Debate score: 60/100
Key topics
Virtualization
File Systems
Storage Optimization
The Cognition team developed a custom file format, Blockdiff, for efficient VM disk snapshots, sparking discussion of potential applications, comparisons with existing solutions, and technical trade-offs.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion · First comment: 3h after posting
Peak period: 16 comments in 0-12h
Average per period: 4.8
Comment distribution: 24 data points (based on 24 loaded comments)
Key moments
- 01 Story posted: Sep 30, 2025 at 11:13 PM EDT (3 months ago)
- 02 First comment: Oct 1, 2025 at 2:39 AM EDT (3h after posting)
- 03 Peak activity: 16 comments in the 0-12h window, the hottest period of the conversation
- 04 Latest activity: Oct 8, 2025 at 4:12 PM EDT (3 months ago)
ID: 45433926 · Type: story · Last synced: 11/20/2025, 12:29:33 PM
(We have since removed the fiemap code from cp and replaced it with lseek's SEEK_DATA/SEEK_HOLE.)
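For readers who haven't used that interface: below is a minimal sketch of hole-aware extent scanning with lseek(2)'s SEEK_DATA/SEEK_HOLE, as an illustration only; it is not the code from cp or blockdiff, and the helper name is hypothetical.

```python
import errno
import os

def scan_extents(path):
    """Yield (offset, length, is_data) runs for a file, using lseek(2)
    with SEEK_DATA/SEEK_HOLE to skip over unallocated regions."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                data = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains before EOF
                    yield (offset, size - offset, False)
                    return
                raise
            if data > offset:               # the gap before the data is a hole
                yield (offset, data - offset, False)
            hole = os.lseek(fd, data, os.SEEK_HOLE)  # implicit hole at EOF
            yield (data, hole - data, True)
            offset = hole
    finally:
        os.close(fd)

# e.g. copy only the data runs of a sparse disk image:
# for off, length, is_data in scan_extents("disk.img"):
#     if is_data:
#         ...  # read `length` bytes at `off` and write them out
```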
I was curious about a few things:
* Have you considered future extensions where you can start the VM before the FS copy has completed?
* You picked XFS over ZFS and BTRFS. Any reason why XFS in particular?
* You casually mention that you wrote 'otterlink', your own hypervisor. Isn't that by itself a complicated effort worthy of a blog post? Or is it just mixing and matching existing libraries from the Rust ecosystem?
> Any reason why XFS in particular?
XFS is still the default filesystem of choice for many enterprise systems. For instance, Red Hat states in their manual [1]:
"There are good reasons to choose other file systems, but if you just want good performance on simple storage, XFS is a pretty good default."

[1] https://docs.redhat.com/en/documentation/red_hat_enterprise_...
Ideally, I would like to use something like this without being forced onto a specific file system. This is essentially what qcow2 does, and it's a shame it isn't supported by all hypervisors. But then your implementation would need to be much more complex and reimplement what CoW filesystems give you for free, so I appreciate that this is possible in 600 LOC.
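The "for free" part here is reflink cloning. As a rough illustration (hypothetical helper, Linux-specific ioctl constant, not from the post), user space can exercise it through the FICLONE ioctl, the same mechanism cp --reflink uses:

```python
import fcntl
import os

# Linux-specific: FICLONE = _IOW(0x94, 9, int) from <linux/fs.h>.
FICLONE = 0x40049409

def reflink_copy(src, dst):
    """Clone `src` into `dst` so both files share extents on disk: the
    copy moves no data, and the two diverge block-by-block only as
    either side is written (CoW). Needs a filesystem with reflink
    support, e.g. Btrfs or XFS formatted with reflink enabled."""
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            fcntl.ioctl(dst_fd, FICLONE, src_fd)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```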
Also, your repo doesn't have a license, which technically makes it unusable.
Created an issue for this on their GitHub so it's tracked there too; let's hope they add a permissive license like MIT or Apache.
Here's the implementation I did for OpenBSD; it's around 700 lines, including the gunk to interface with the hypervisor.
https://github.com/openbsd/src/blob/master/usr.sbin/vmd/vioq...
It's not a good choice for computing diffs, but you can run your VM directly off a read-only base qcow2, with all deltas going into a separate file. That file can either be shipped around or discarded. And multiple VMs can share the same read only base.
So, it probably would have been better to write the code for the hypervisor, and end up with something far more efficient overall.
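For concreteness, the read-only-base setup described above is a standard qemu-img invocation; a small sketch with illustrative file names (the wrapper is hypothetical, the flags are standard qemu-img):

```python
import subprocess

def make_overlay(base, overlay):
    """Create a copy-on-write overlay on top of a read-only base image.
    The guest writes only to `overlay`; `base` is never modified and can
    be shared by many VMs. (-F names the backing format explicitly,
    which newer qemu-img releases expect.)"""
    subprocess.run(
        ["qemu-img", "create", "-f", "qcow2",
         "-b", base, "-F", "qcow2", overlay],
        check=True,
    )

# e.g. make_overlay("base.qcow2", "vm1-delta.qcow2"); the delta file can
# then be shipped around or discarded, as described above.
```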
I use it to back up external disks, USB sticks, etc. Because the resulting qcow2 images are sparse and compressed, they use less storage, which is great for backups.
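Presumably that workflow is something along the lines of qemu-img convert with compression enabled; a sketch with hypothetical paths:

```python
import subprocess

def backup_disk(device, dest):
    """Convert a raw block device into a compressed qcow2 image.
    qemu-img allocates clusters only where there is data, so all-zero
    regions stay sparse, and -c compresses the clusters it does write."""
    subprocess.run(
        ["qemu-img", "convert", "-O", "qcow2", "-c", device, dest],
        check=True,
    )

# e.g. backup_disk("/dev/sdb", "usb-stick-backup.qcow2")
```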
Also, a few years ago I implemented a VM management tool called 'vmess', in which the concept is to maintain a tree of QCOW2 files: R/W snapshots are the leaves and R/O snapshots are the interior nodes of the tree. The connection up to the root is made via the QCOW2 backing-file mechanism, so a newly created leaf starts at 0 space. I did this because libvirt+qemu impose various annoying limitations around snapshots-within-qcow2, and I liked the idea of file-per-snapshot.
VDO: https://docs.kernel.org/admin-guide/device-mapper/vdo.html (original project URL: https://github.com/dm-vdo/kvdo )
vmess: https://github.com/da-x/vmess
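As an illustration of that tree idea (a guess at the mechanics, not vmess's actual code): freezing the current image into a read-only node and hanging fresh leaves off it might look like this, with hypothetical names throughout:

```python
import os
import subprocess

def freeze_and_branch(node, leaves):
    """Freeze `node` as a read-only interior snapshot, then hang any
    number of fresh zero-sized R/W leaves off it via qcow2 backing
    files."""
    os.chmod(node, 0o444)  # interior nodes are never written again
    for leaf in leaves:
        subprocess.run(
            ["qemu-img", "create", "-f", "qcow2",
             "-b", node, "-F", "qcow2", leaf],
            check=True,
        )

# freeze_and_branch("base.qcow2", ["vm-a.qcow2", "vm-b.qcow2"]) yields
# the shape described above: R/O nodes inside, R/W snapshots at leaves.
```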
It would be helpful to share the command lines and details of how the benchmarks were run.