Any Pipeline Tool for Clickhouse, Similar to Snowflake's Dynamic Tables

Posted3 months agoActive3 months ago

tingfirst

2 points

6 comments

snowflake.comTechstory

calmneutral

Debate

20/100

ClickhouseData PipelinesSnowflakeDatabase Management

Key topics

Clickhouse

Data Pipelines

Snowflake

Database Management

The post inquires about pipeline tools for ClickHouse similar to Snowflake's Dynamic Tables, sparking a discussion on existing solutions and potential alternatives.

Snapshot generated from the HN discussion

Discussion Activity

Light discussion

First comment

N/A

Peak period

0-3h

Avg / period

Key moments

01Story posted
Sep 21, 2025 at 10:16 PM EDT
3 months ago
Step 01
02First comment
Sep 21, 2025 at 10:16 PM EDT
0s after posting
Step 02
03Peak activity
3 comments in 0-3h
Hottest window of the conversation
Step 03
04Latest activity
Sep 23, 2025 at 7:12 PM EDT
3 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (6 comments)

Showing 6 comments

tingfirstAuthor

3 months ago

1 reply

Is there a native SQL pipeline tool for ClickHouse that processes real-time data incrementally, with low latency, large throughput and high efficiency, similar to Snowflake’s Dynamic Tables?

[1] Dynamic Tables: One of Snowflake’s Fastest-Adopted Features: https://www.snowflake.com/en/blog/reimagine-batch-streaming-...

Sep142324

3 months ago

1 reply

Dynamic Tables are interesting for declarative streaming. In the ClickHouse ecosystem, you might want to look at materialized views combined with streaming engines.

For real-time transformations, there are a few approaches: - Native ClickHouse MaterializedViews with AggregatingMergeTree - Stream processors that write to ClickHouse (Flink, Spark Streaming) - Streaming SQL engines that can read/write ClickHouse

We've been working on streaming SQL at Proton (github.com/timeplus-io/proton) which handles similar use cases - continuous queries that maintain state and can write results back to ClickHouse. The key difference from Dynamic Tables is handling unbounded streams vs micro-batches.

What's your specific use case? Happy to discuss the tradeoffs.

tingfirstAuthor

3 months ago

1 reply

Data sources are usually in Kafka, or other operational databases like Postgres or MySQL

1. Table A : fact events, high-throughput (10k~1M eps), high-cardinality

2. Table B, C, D : couple of dimension tables (fast or slow changing).

The use case is straightforward : join/enrich/lookup everything into one big flattened, analytics-friendly table into ClickHouse.

What’s the best pipeline approach to achieve this in real-time and efficiently?

tingfirstAuthor

3 months ago

1 reply

Consistently we heard about ClickHouse has very limited materialized views that can't handle real-time pipeline fast efficiently enough. would love to see more comments here.

gangtao

3 months ago

there are some limitations as I know:

1. Insert Performance Degradation

Users frequently complain that materialized views significantly slow down insert performance, especially when having multiple MVs on a single table.

2. Streaming Data Patterns

This is critical for ClickHouse materialized views. Streaming data often arrives in frequent, small batches, but ClickHouse performs best when ingesting data in larger batches. The materialized views' transformation query runs synchronously within the INSERT transaction for every single batch, making the fixed overhead disproportionately large for small batches

3. Block-Level Processing Limitations

MVs in ClickHouse operate only on the data blocks being inserted at that moment. When performing aggregation, a single group from the original dataset may have multiple entries in the target table since grouping is applied only to the current insert block.

4. JOIN Limitations and Memory Issues

Materialized views with JOINs are problematic because MVs only trigger on the left-most table. It's also inefficient to update the view upon the right join table since it needs to recreate a hash table each time, or else keeping a large hash table and consuming a lot of memory.

5. Reprocessing historical data requires manual ALTER TABLE operations.

6. Each materialized view will create a new part from the block over which it runs - potentially causing the "Too Many Parts" issue

gangtao

3 months ago

you can check https://github.com/timeplus-io/proton which provides streaming processing pipeline.

View full discussion on Hacker News

ID: 45328498Type: storyLast synced: 11/17/2025, 1:06:07 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN