We Built the Fastest Data Replication Tool in the World Using Go
As part of that journey, we’ve been contributing upstream to the Apache Iceberg Go ecosystem. This week, our PR to enable writing into partitioned tables was merged: https://github.com/apache/iceberg-go/pull/524
That may sound niche, but it unlocks a very practical path: Go services can now write straight to Iceberg (no Spark/Flink detour) and be query-ready in Trino, Spark, or DuckDB right away.
What we added:
- A partitioned fan-out writer that splits incoming data across partitions, with each partition getting its own rolling data writer
- Efficient Parquet flush/roll as the target file size is reached
- All the usual Iceberg transforms: identity, bucket, truncate, year/month/day/hour
- Arrow-based writes for stable memory usage and fast columnar handling
Why are we bullish on Go for building our platform, OLake?
- The runtime's concurrency model makes it straightforward to coordinate partition writers, batching, and backpressure.
- Small static binaries make it easy to ship edge and sidecar ingestors.
- A great ops story (observability, profiling, and sane resource usage), which is a big deal when you're replicating at high rates.

Where this helps right now:
- Building micro-ingestors that stream changes from databases to Iceberg in Go
- Edge or on-prem capture where you don't want a big JVM stack
- Teams that want cleaner tables (fewer tiny files) without a separate compaction job for every write path
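The coordination point above can be sketched with nothing but goroutines and bounded channels. This is an illustrative shape, not OLake's actual code: one goroutine drains each partition's channel and "rolls a file" every `flushEvery` rows, while the small channel buffer makes senders block when a partition lags, which is the backpressure.

```go
package main

import (
	"fmt"
	"sync"
)

type record struct {
	partition int
	payload   string
}

// fanOut routes records to one goroutine per partition and returns
// how many file flushes each partition performed.
func fanOut(records []record, partitions, flushEvery int) map[int]int {
	chans := make([]chan record, partitions)
	flushes := make(map[int]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for p := 0; p < partitions; p++ {
		chans[p] = make(chan record, 4) // small buffer => senders feel backpressure
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			n := 0
			for range chans[p] {
				n++
				if n%flushEvery == 0 { // roll to a new file at the size target
					mu.Lock()
					flushes[p]++
					mu.Unlock()
				}
			}
			if n%flushEvery != 0 { // final partial flush on close
				mu.Lock()
				flushes[p]++
				mu.Unlock()
			}
		}(p)
	}

	for _, r := range records {
		chans[r.partition] <- r // blocks when that partition's writer lags
	}
	for _, c := range chans {
		close(c)
	}
	wg.Wait()
	return flushes
}

func main() {
	var recs []record
	for i := 0; i < 10; i++ {
		recs = append(recs, record{partition: i % 2, payload: fmt.Sprintf("row-%d", i)})
	}
	// Each partition receives 5 rows; with flushEvery=3 that is 2 flushes each.
	fmt.Println(fanOut(recs, 2, 3))
}
```

The real writer rolls on bytes written (target Parquet file size) rather than row count, but the control flow is the same.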
For data teams still wary of Go, our case study may help: check the benchmarks we're hitting thanks to the language's lightweight concurrency model. See the numbers here: https://olake.io/docs/benchmarks
If you’re experimenting with Go + Iceberg, we’d love to collaborate as we believe in open source :)
repo: https://github.com/datazip-inc/olake/
The OLake team shares their experience building a high-throughput data replication tool in Go, highlighting its benefits for data engineering and their contributions to the Apache Iceberg Go ecosystem.