Frozen Ducklakes for Multi-User, Serverless Data Access
Posted 2 months ago · Active 2 months ago
ducklake.select · Tech story
Sentiment: supportive, positive
Debate: 20/100
Key topics
Data Management
Duckdb
Serverless Architecture
The post introduces 'Frozen DuckLakes', a multi-user, serverless data access solution built on top of DuckDB, with the discussion highlighting its simplicity, flexibility, and potential for innovation in data management.
Snapshot generated from the HN discussion
Discussion Activity
- Light discussion
- First comment: 5d after posting
- Peak period: 3 comments in 120-132h
- Avg / period: 2.5
Key moments
- 01 Story posted: Oct 25, 2025 at 6:57 AM EDT (2 months ago)
- 02 First comment: Oct 30, 2025 at 12:57 PM EDT (5d after posting)
- 03 Peak activity: 3 comments in 120-132h (hottest window of the conversation)
- 04 Latest activity: Oct 31, 2025 at 1:09 AM EDT (2 months ago)
ID: 45702831 · Type: story · Last synced: 11/20/2025, 1:08:48 PM
Similar to how git can serve a repo from a simple HTTP server with no git installed on the server (`git update-server-info`).
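To make the analogy concrete, here is a minimal sketch (assuming `git` is installed locally; the temp-dir path is illustrative). After `git update-server-info`, a bare repo becomes servable by any static file server, with no git running on the serving side:

```python
import pathlib
import subprocess
import tempfile

# Create a bare repo and generate the static metadata that "dumb HTTP"
# clients rely on. No git needs to run on the server afterwards.
repo = pathlib.Path(tempfile.mkdtemp()) / "demo.git"
subprocess.run(["git", "init", "--bare", str(repo)], check=True, capture_output=True)
subprocess.run(["git", "update-server-info"], cwd=repo, check=True)

# info/refs is the static "manifest" a plain-HTTP client fetches first;
# serving the directory (e.g. `python3 -m http.server`) is all that's left.
print((repo / "info" / "refs").exists())
```

The parallel to a Frozen DuckLake: the "catalog" is reduced to static files that any object store or web server can hand out.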
The frozen part is what Iceberg promised in the beginning: a move away from Hive's mutable metastore.
Point to a manifest file plus Parquet/ORC files, and all you need to query it is S3 API calls (there is no metadata/table server; the server is the client).
> Creating and publishing a Frozen DuckLake with about 11 billion rows, stored in 4,030 S3-based Parquet files took about 22 minutes on my MacBook
Hard to pin down how much of it is CPU and how much is I/O from S3, but doing something like HyperLogLog (HLL) sketches over all the columns and rows is pretty heavy on the CPU.
- create your frozen ducklake
- run whatever "normal" mutation query you want to run (DELETE, UPDATE, MERGE INTO)
- use `ducklake_rewrite_data_files` to make new files w/ mutations applied, then optionally run `ducklake_merge_adjacent_files` to compact the files as well (though this might cause all files to change).
- call `ducklake_list_files` to get the new set of active files.
- update your upstream "source of truth" with this new list, optionally deleting any files no longer referenced.
The net result should be that any files "touched" by your updates will have new updated versions alongside them, while any that were unchanged should just be returned by the list-files operation as-is.
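The steps above, written out as the SQL they map to. The statements are collected as strings rather than executed, since they need a live DuckLake attachment; the catalog name `my_lake`, table name `events`, and the exact argument shapes of the `ducklake_*` calls are assumptions here, so check them against the DuckLake docs:

```python
# Sketch of the republish loop: mutate, rewrite, compact, re-list.
steps = [
    # 1. Run a normal mutation against the writable copy of the lake.
    "DELETE FROM my_lake.events WHERE event_date < DATE '2024-01-01'",
    # 2. Materialize the mutation into fresh Parquet files.
    "CALL ducklake_rewrite_data_files('my_lake', 'events')",
    # 3. Optionally compact small neighbours (may rewrite every file).
    "CALL ducklake_merge_adjacent_files('my_lake')",
    # 4. The new active-file list becomes the manifest you publish upstream.
    "SELECT * FROM ducklake_list_files('my_lake', 'events')",
]
print("\n".join(steps))
```

Publishing then means swapping in the new file list atomically and garbage-collecting anything no longer referenced.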
In all seriousness though, this seems like a great idea.