Building a Distributed Database in Elixir, Part 3: Storage Layer and Why RocksDB
Source: medium.com
Story posted: Nov 28, 2025 at 3:49 PM EST
Latest activity: Dec 2, 2025 at 6:35 AM EST
ID: 46082638 (story)
Following on from that project, I might build other things, such as a stream/queue broker, which would probably also fit well with RocksDB.
Main topics: how key encoding lets a single ordered KV store emulate document/graph/time-series models, LSM-tree vs B-tree trade-offs, and the benchmark that killed my pure Elixir dreams.
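The key-encoding idea can be illustrated with a minimal sketch. It is written in Python, with a sorted in-memory list standing in for the ordered KV store; it is not the post's actual encoding. The key layouts, the one-byte model tags (`d`, `g`, `t`), the `\x00` separator, and every name below are assumptions made for illustration, and the sketch assumes that names never contain a `\x00` byte.

```python
import bisect
import struct

SEP = b"\x00"  # hypothetical separator; assumes names never contain this byte


def doc_key(collection: str, doc_id: str) -> bytes:
    """Document model: d | collection | doc_id."""
    return b"d" + SEP + collection.encode() + SEP + doc_id.encode()


def edge_key(src: str, label: str, dst: str) -> bytes:
    """Graph model: g | source | edge label | destination."""
    return b"g" + SEP + src.encode() + SEP + label.encode() + SEP + dst.encode()


def edge_prefix(src: str, label: str) -> bytes:
    """Prefix shared by all outgoing edges of one label from one vertex."""
    return b"g" + SEP + src.encode() + SEP + label.encode() + SEP


def ts_key(series: str, ts_ms: int) -> bytes:
    """Time-series model: t | series | timestamp.

    The timestamp is packed as a big-endian unsigned 64-bit integer so that
    byte-wise key order matches numeric timestamp order.
    """
    return b"t" + SEP + series.encode() + SEP + struct.pack(">Q", ts_ms)


class OrderedKV:
    """Toy stand-in for an ordered KV store: keys kept sorted byte-wise."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []
        self._vals: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan(self, start: bytes, end: bytes) -> list[tuple[bytes, bytes]]:
        """Return pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]

    def scan_prefix(self, prefix: bytes) -> list[tuple[bytes, bytes]]:
        """Return all pairs whose key starts with prefix, in key order."""
        lo = bisect.bisect_left(self._keys, prefix)
        out = []
        for k in self._keys[lo:]:
            if not k.startswith(prefix):
                break
            out.append((k, self._vals[k]))
        return out
```

With this layout, "all `follows` edges from `alice`" becomes `scan_prefix(edge_prefix("alice", "follows"))`, and a time window on one series becomes `scan(ts_key("cpu", t0), ts_key("cpu", t1))`. Both are contiguous range reads, which is the access pattern an ordered store serves cheaply.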
I wanted to use CubDB (pure Elixir, no NIF risks, easy debugging). The benchmarks said otherwise: RocksDB was 177x faster on writes and used 26,000x less memory during batch operations. For a distributed database, that gap is insurmountable.
The post also covers living with NIFs in Elixir. A NIF runs native code inside the BEAM's own OS process, outside the VM's per-process isolation, so a crash kills your whole VM instead of just one process. You architect around it: shard isolation, replication, aggressive monitoring.
Also discussed: RocksDB column families (underrated feature for multi-model storage), write amplification as the LSM-tree tax, and why this approach handles time-series data but won't compete with columnar engines like ClickHouse for pure analytics.
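To make the "LSM-tree tax" concrete, here is a back-of-the-envelope sketch of a commonly used rough upper bound for leveled compaction. It is not a measurement from the post and not RocksDB's actual accounting. The model assumes each user byte is written once to the WAL, once when the memtable is flushed, and up to `fanout` times for every level-to-level compaction it passes through. The function name, its parameters, and the example numbers are all hypothetical.

```python
def leveled_write_amp(compaction_hops: int, fanout: int, wal: bool = True) -> int:
    """Rough upper bound on bytes written to disk per byte of user data.

    compaction_hops: number of level-to-level compactions a byte goes
                     through before it reaches the bottom level (assumed).
    fanout:          size ratio between adjacent levels (assumed).
    wal:             whether the write-ahead log is enabled.
    """
    if compaction_hops < 0 or fanout < 1:
        raise ValueError("compaction_hops must be >= 0 and fanout >= 1")
    wal_cost = 1 if wal else 0    # one write to the write-ahead log
    flush_cost = 1                # one write when the memtable is flushed
    # Merging into the next level may rewrite up to ~fanout bytes of
    # existing data for each byte pushed down.
    compaction_cost = fanout * compaction_hops
    return wal_cost + flush_cost + compaction_cost


# Hypothetical example: 4 compaction hops, size ratio 10, WAL enabled.
estimate = leveled_write_amp(compaction_hops=4, fanout=10)  # 1 + 1 + 40 = 42
```

Real write amplification depends on compaction style, key distribution, and tuning, and is often well below this bound. The sketch only shows why the cost grows with both the number of levels and the size ratio between them.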
Next post will cover Raft consensus for metadata and how the CP metadata plane coordinates with the AP data plane.
Happy to discuss storage engine choices, NIF risk mitigation, or whether the CubDB benchmarks surprised anyone else who's used it.