
Out-of-Distribution Generalization in Transformers via Latent Space Reasoning

1 point

1 comment

Mood: thoughtful

Sentiment: positive

Category: science

Key topics: transformers, out-of-distribution generalization, latent space reasoning, machine learning

A research paper on improving out-of-distribution generalization in transformers via latent space reasoning is shared; the lone commenter praises its clarity and the way it composes disparate threads into strong length-generalization results.

Snapshot generated from the HN discussion

Discussion Activity

Light discussion

First comment: N/A

Peak period: Hour 1 (1 comment)

Avg per period: 1

Comment distribution: 1 data point (based on 1 loaded comment)

Key moments

  1. Story posted: 11/18/2025, 1:15:27 AM (20h ago)
  2. First comment: 11/18/2025, 1:15:27 AM (0s after posting)
  3. Peak activity: 1 comment in Hour 1 (the hottest window of the conversation)
  4. Latest activity: 11/18/2025, 1:15:27 AM (20h ago)


Discussion (1 comment)
marojejian
20h ago
openreview (https://openreview.net/forum?id=Wjgq9ISdP0)

I found this paper both really interesting and clear. No one part is very novel, but it composes disparate threads to obtain what look like strong results in OOD length generalization. Even for the toy task, and using a DSL (vs. being an LM), length-generalizing on simple math by more than 4x is impressive, from what I've read.

This also fits my priors about the key elements for unlocking better OOD compositional generalization: variable recurrence, step-wise curriculum training to build depth-invariant algorithms, and discrete bottlenecks. Finally, it's very interesting to compare this to the recent article below arguing for the benefits of continuous latent spaces: Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought (https://arxiv.org/abs/2505.12514)
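A minimal sketch of how those three ingredients can fit together, in hypothetical PyTorch (the RecurrentVQReasoner class and all names in it are illustrative, not taken from the paper): a single weight-tied layer is applied a variable number of steps, with a vector-quantized discrete bottleneck between steps.

```python
# Editor's sketch, not code from the paper: variable recurrence +
# discrete bottleneck, with a curriculum over depth implied by n_steps.
import torch
import torch.nn as nn


class RecurrentVQReasoner(nn.Module):
    def __init__(self, d_model: int = 256, codebook_size: int = 512):
        super().__init__()
        # Weight-tied: the same layer is reused at every recurrence step.
        self.step = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.codebook = nn.Embedding(codebook_size, d_model)

    def quantize(self, h: torch.Tensor) -> torch.Tensor:
        # Snap each hidden state to its nearest codebook entry (discrete
        # bottleneck); the straight-through trick keeps gradients flowing.
        d = torch.cdist(h, self.codebook.weight)   # (B, T, K) distances
        z = self.codebook(d.argmin(dim=-1))        # nearest discrete code
        return h + (z - h).detach()

    def forward(self, x: torch.Tensor, n_steps: int) -> torch.Tensor:
        # n_steps varies with problem size (variable recurrence); a
        # step-wise curriculum would grow it during training so the
        # learned update rule is depth-invariant, not tied to one depth.
        h = x
        for _ in range(n_steps):
            h = self.quantize(self.step(h))
        return h
```

With a setup like this, training would grow n_steps alongside problem length, so the model has to learn an update rule that works at any depth rather than a fixed-depth shortcut.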

My take is that both papers are right: continuous spaces are more expressive and can handle tougher problem spaces (e.g. shortest graph path), whereas discrete spaces provide a better inductive bias for elegant algorithms that can scale OOD. And I bet the two can be combined / balanced.
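One hedged way to picture that "combined / balanced" middle ground, again in illustrative PyTorch rather than anything from either paper: a Gumbel-softmax bottleneck whose temperature tau interpolates between a continuous mixture of codebook vectors (high tau, more expressive) and a hard discrete code (low tau, stronger inductive bias), with straight-through gradients in the hard regime.

```python
# Editor's sketch (hypothetical): annealing tau moves the latent from
# continuous superposition toward a single discrete code.
import torch
import torch.nn.functional as F


def soft_to_hard_bottleneck(logits: torch.Tensor,
                            codebook: torch.Tensor,
                            tau: float) -> torch.Tensor:
    # logits: (B, T, K) scores over K codes; codebook: (K, D) code vectors.
    # High tau -> soft weights, a nearly continuous blend of codes;
    # low tau with hard=True -> a one-hot pick, i.e. a fully discrete
    # latent (gradients pass through the soft weights either way).
    weights = F.gumbel_softmax(logits, tau=tau, hard=(tau < 0.5))
    return weights @ codebook   # (B, T, D) bottlenecked latent
```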

ID: 45960311 · Type: story · Last synced: 11/18/2025, 1:18:04 AM
