LoPA: Scaling Diffusion LLM Single-Sample Throughput to 1000 TPS
Posted 10 days ago · zhijie-group.github.io · Research · story
Key topics: Diffusion Models, LLM, AI Research
Key moments
- Story posted: Dec 23, 2025 at 8:37 AM EST (10 days ago)
- First comment: 0s after posting
- Peak activity: 1 comment
- Latest activity: Dec 23, 2025 at 8:37 AM EST
ID: 46365226 · Type: story · Last synced: 12/23/2025, 1:40:36 PM
LoPA-Dist: Engineered for Scale

The algorithm is only half the battle. We built LoPA-Dist with Branch Parallelism (BP) to handle the load:
- NVIDIA GPUs: implements a two-phase update protocol (Pre-Write / Commit-Winner) to ensure KV cache consistency across branches (sketched below).
- Ascend 910C: utilizes graph compilation and block-wise masking for high-throughput serving (see the second sketch below).
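To make the two-phase idea concrete, here is a minimal sketch assuming a branch-parallel decoder where several candidate branches decode concurrently against a shared KV cache. The names (`BranchKVCache`, `pre_write`, `commit_winner`) are hypothetical illustrations, not the LoPA-Dist API: branches stage speculative KV entries privately in phase one, and only the winning branch's entries become visible in phase two.

```python
from dataclasses import dataclass, field

@dataclass
class BranchKVCache:
    """Hypothetical sketch of a Pre-Write / Commit-Winner KV cache."""
    committed: list = field(default_factory=list)  # entries visible to all branches
    staged: dict = field(default_factory=dict)     # branch_id -> speculative entries

    def pre_write(self, branch_id: int, kv_entries: list) -> None:
        # Phase 1 (Pre-Write): each branch stages its speculative KV
        # entries privately; the shared cache is never mutated here, so
        # concurrent branches cannot observe each other's partial writes.
        self.staged.setdefault(branch_id, []).extend(kv_entries)

    def commit_winner(self, winner_id: int) -> None:
        # Phase 2 (Commit-Winner): once a winning branch is selected,
        # only its staged entries are appended to the shared cache;
        # every other branch's staged entries are discarded.
        self.committed.extend(self.staged.pop(winner_id, []))
        self.staged.clear()

# Usage: three branches decode one step in parallel; branch 2 wins.
cache = BranchKVCache()
for b in range(3):
    cache.pre_write(b, [f"kv(branch={b}, step=0)"])
cache.commit_winner(2)
print(cache.committed)  # ['kv(branch=2, step=0)']
```

For block-wise masking, one common construction (an assumption here; the post does not spell out the exact Ascend 910C scheme) lets tokens attend bidirectionally within their own block and causally to earlier blocks:

```python
import torch

def blockwise_mask(seq_len: int, block_size: int) -> torch.Tensor:
    # mask[i, j] is True when query i may attend to key j: full
    # attention inside a block, block-causal across blocks.
    blocks = torch.arange(seq_len) // block_size
    return blocks.unsqueeze(0) <= blocks.unsqueeze(1)

print(blockwise_mask(4, 2).int())
# tensor([[1, 1, 0, 0],
#         [1, 1, 0, 0],
#         [1, 1, 1, 1],
#         [1, 1, 1, 1]])
```

Because the mask depends only on sequence length and block size, it can be built once per shape and reused, which is what makes it friendly to ahead-of-time graph compilation.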