Building a Distributed Database in Elixir, Part 3: Storage Layer and Why RocksDB

Posted about 1 month ago · Active about 1 month ago

gawry

1 point

1 comment

medium.com · Tech Discussion · story


Topics: Elixir Programming, Distributed Databases, RocksDB

Story posted: Nov 28, 2025 at 3:49 PM EST
Latest activity: Dec 2, 2025 at 6:35 AM EST


Discussion (4 comments shown)

hasante

about 1 month ago

1 reply

I really love these pure store tools, man. The things you can do with them are insane. The bad part is that most application developers don't know about them.

gawry (Author)

about 1 month ago

Yes! RocksDB is incredibly powerful with proper setup and key schemas.

Following on from that project, I might build other things like a stream/queue broker, which will probably fit decently with RocksDB as well.

tommica

about 1 month ago

Love the series! Very educational, and it helps me think better in Elixir!

gawry (Author)

about 1 month ago

Part 3 of my distributed database series. This one covers the storage engine decision - the foundation everything else sits on.

Main topics: how key encoding lets a single ordered KV store emulate document/graph/time-series models, LSM-tree vs B-tree trade-offs, and the benchmark that killed my pure Elixir dreams.
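To make the key-encoding idea concrete, here is a minimal sketch in Python (not code from the article). A sorted in-memory list stands in for the ordered KV store, and the key layouts (the `d`/`g`/`t` type tags, the zero-byte separators, and the helper names `doc_key`, `edge_key`, `ts_key`) are hypothetical choices made for illustration only. It assumes names never contain a zero byte.

```python
import bisect
import struct


class OrderedKV:
    """Tiny in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []      # keys kept in sorted (bytewise) order
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


# Hypothetical key layouts: a one-byte model tag, then zero-separated parts.
def doc_key(collection: str, doc_id: str) -> bytes:
    return b"d\x00" + collection.encode() + b"\x00" + doc_id.encode()


def edge_key(src: str, label: str, dst: str) -> bytes:
    return b"g\x00" + src.encode() + b"\x00" + label.encode() + b"\x00" + dst.encode()


def ts_key(series: str, ts_ms: int) -> bytes:
    # Big-endian unsigned 64-bit, so bytewise order matches numeric order.
    return b"t\x00" + series.encode() + b"\x00" + struct.pack(">Q", ts_ms)


kv = OrderedKV()
kv.put(doc_key("users", "alice"), b'{"name": "Alice"}')
kv.put(edge_key("alice", "follows", "bob"), b"")
kv.put(ts_key("cpu", 2000), b"0.7")
kv.put(ts_key("cpu", 1000), b"0.4")

# Outgoing edges of "alice" and the "cpu" series are each one prefix scan.
edges = list(kv.scan_prefix(b"g\x00alice\x00"))
cpu_points = [value for _, value in kv.scan_prefix(b"t\x00cpu\x00")]
```

The point of the sketch is that each "model" is just a key prefix: documents, graph edges, and time-series points share one sorted keyspace, and range scans over a prefix recover the model-specific access pattern.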

I wanted to use CubDB (pure Elixir, no NIF risks, easy debugging). The benchmarks said otherwise: RocksDB was 177x faster on writes and used 26,000x less memory during batch operations. For a distributed database, that gap is insurmountable.

The post also covers living with NIFs in Elixir - they run native code inside the VM, outside the BEAM's process isolation, so a crash kills your VM instead of just a process. You architect around it: shard isolation, replication, aggressive monitoring.

Also discussed: RocksDB column families (underrated feature for multi-model storage), write amplification as the LSM-tree tax, and why this approach handles time-series data but won't compete with columnar engines like ClickHouse for pure analytics.
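As a rough illustration of the write-amplification "tax" mentioned above, here is a back-of-envelope model in Python. It is not from the article: the fanout of 10, the 256 MiB base level, and the formula itself (one WAL write, one memtable flush, plus up to `fanout` rewrites per level transition under leveled compaction) are simplifying assumptions, and real numbers depend heavily on workload and tuning.

```python
def levels_needed(total_bytes: int, base_level_bytes: int, fanout: int) -> int:
    """Number of levels until the last level can hold total_bytes,
    assuming each level is `fanout` times larger than the previous one."""
    levels = 1
    capacity = base_level_bytes
    while capacity < total_bytes:
        capacity *= fanout
        levels += 1
    return levels


def estimated_write_amplification(
    total_bytes: int,
    base_level_bytes: int = 256 * 2**20,  # assumed base level size (256 MiB)
    fanout: int = 10,                     # assumed size ratio between levels
) -> int:
    """Rough upper-bound estimate: 1 (WAL) + 1 (flush) + fanout per level transition."""
    levels = levels_needed(total_bytes, base_level_bytes, fanout)
    return 2 + fanout * (levels - 1)


# Example: 1 TiB of data under the assumptions above.
wa = estimated_write_amplification(2**40)
```

Under this simplified model, every extra level adds roughly another `fanout` worth of rewrites per byte ingested, which is why write amplification grows with data size in an LSM-tree while a single logical write stays cheap.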

Next post will cover Raft consensus for metadata and how the CP metadata plane coordinates with the AP data plane.

Happy to discuss storage engine choices, NIF risk mitigation, or whether the CubDB benchmarks surprised anyone else who's used it.


ID: 46082638 · Type: story · Last synced: 11/28/2025, 8:50:07 PM
