A Short Introduction to Optimal Transport and Wasserstein Distance (2020)
Posted 4 months ago · Active 4 months ago
alexhwilliams.info · Science · story
Key topics
Optimal Transport
Wasserstein Distance
Generative AI
Statistics
The post introduces optimal transport and Wasserstein distance, sparking discussion on their applications in generative AI and parameter estimation.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 3d after posting
Peak period: 3 comments in 60-66h
Avg / period: 1.7
Key moments
- 01 Story posted: Aug 21, 2025 at 7:15 PM EDT
- 02 First comment: Aug 24, 2025 at 10:40 AM EDT (3d after posting)
- 03 Peak activity: 3 comments in 60-66h (hottest window of the conversation)
- 04 Latest activity: Aug 25, 2025 at 3:27 AM EDT
ID: 44979301 · Type: story · Last synced: 11/20/2025, 11:26:10 AM
But!
Wasserstein distances are used instead of a KL divergence inside all kinds of VAEs and diffusion models, because while the Wasserstein distance is hard to compute, it is easy to construct sample-based estimators whose expectation is the gradient with respect to the Wasserstein distance. So you can easily get unbiased gradients, and that is all you need to train big neural networks. [0] Pretty much any time you sample from your current distribution and the target distribution and take the gradient of the distance between the points, you will be minimizing a Wasserstein distance.
[0] https://arxiv.org/abs/1711.01558
[1] https://deepgenerativemodels.github.io/
[2] https://youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXa...
[1] https://en.wikipedia.org/wiki/Earth_mover's_distance#More_th...
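To make the "sample from both distributions and take the gradient of the distance between the points" idea from the comment above concrete, here is a minimal 1D sketch in NumPy. It is only a toy under stated assumptions, not the method of the linked paper: the model is a hypothetical one-parameter shift `x = z + theta`, the target is assumed to be N(3, 1), and the helper names `w2_squared_1d`, `fit_shift` and `target` are made up for illustration. It relies on the fact that in one dimension the optimal coupling between two equal-size samples simply pairs them in sorted order, so each minibatch gives a noisy estimate of the gradient.

```python
import numpy as np


def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between two equal-size 1D samples.

    In 1D the optimal transport plan matches sorted samples, so the
    empirical distance is the mean squared gap between order statistics.
    """
    xs, ys = np.sort(x), np.sort(y)
    return np.mean((xs - ys) ** 2)


def fit_shift(target_sampler, steps=500, batch=256, lr=0.05, seed=0):
    """Fit theta in the toy model x = z + theta, z ~ N(0, 1) (an assumption)."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(steps):
        x = rng.standard_normal(batch) + theta  # samples from the current model
        y = target_sampler(rng, batch)          # samples from the target
        # Shifting every x by the same amount does not change their order,
        # so d/dtheta of mean((sort(x) - sort(y))**2) is the line below.
        grad = 2.0 * np.mean(np.sort(x) - np.sort(y))
        theta -= lr * grad
    return theta


def target(rng, n):
    # Hypothetical target distribution for the demo: N(3, 1).
    return rng.normal(3.0, 1.0, n)


theta_hat = fit_shift(target)
print(f"estimated shift: {theta_hat:.3f}")
```

The estimated shift should end up close to the target mean of 3. In higher dimensions the sorted pairing is no longer available, and one would need an actual optimal transport solver or an approximation in its place.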