Continuous Autoregressive Language Models
Posted 2 months ago · Active about 2 months ago
arxiv.org · Tech · story
calm · positive
Debate: 40/100
Key topics
Artificial Intelligence
Language Models
Machine Learning
The HN community discusses a new paper on Continuous Autoregressive Language Models, exploring its potential to improve the efficiency and capabilities of LLMs while also raising concerns about possible failure modes and limitations.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement
First comment: 8d
Peak period: 10 comments in 180-192h
Avg / period: 10
Key moments
- 01 Story posted: Nov 5, 2025 at 4:49 PM EST (2 months ago)
- 02 First comment: Nov 13, 2025 at 5:01 AM EST (8d after posting)
- 03 Peak activity: 10 comments in 180-192h (hottest window of the conversation)
- 04 Latest activity: Nov 13, 2025 at 3:11 PM EST (about 2 months ago)
ID: 45828523 · Type: story · Last synced: 11/20/2025, 8:00:11 PM
- Diversity: This term encourages the model to generate a diverse set of samples, preventing mode collapse.
- Fidelity: This term rewards the model for making predictions that are close to the ground truth.
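These two terms read like an energy-score-style objective. Here is a minimal sketch of how such a fidelity-plus-diversity loss could look in PyTorch, assuming `samples` are stochastic draws from the model and `target` is the ground-truth vector; the function name, shapes, and the 0.5 weighting are assumptions for illustration, not the paper's exact formulation:

```python
import torch

def energy_style_loss(samples: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # samples: (S, D) stochastic draws from the model for one step
    # target:  (D,)   ground-truth next vector
    # Fidelity: reward predictions close to the ground truth.
    fidelity = torch.linalg.norm(samples - target, dim=-1).mean()
    # Diversity: reward pairwise spread among samples to prevent collapse.
    diversity = torch.cdist(samples, samples).mean()
    # Minimizing this pulls samples toward the target while keeping them spread out.
    return fidelity - 0.5 * diversity
```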
I'm wondering if a continuous next-vector generative approach would also increase the innate "reasoning" capabilities of the model, since it could potentially capture more of the semantics of the data than tokens alone do.
Obviously, you can't do it in pre-training. But you can add it later as an optional 'extra' vector, I think, e.g. `input_embedding + MLP(prev_output) * alpha`, where alpha is zero during pre-training.
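A minimal sketch of that gated feedback path, assuming a small MLP and a learnable `alpha` initialized to zero so pre-training behavior is unchanged; the module and parameter names are hypothetical:

```python
import torch
import torch.nn as nn

class GatedFeedback(nn.Module):
    """Mix the previous output vector back into the input embedding
    through a small MLP, gated by alpha (zero during pre-training)."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.alpha = nn.Parameter(torch.zeros(1))  # starts at zero: no-op at first

    def forward(self, input_embedding: torch.Tensor, prev_output: torch.Tensor) -> torch.Tensor:
        # input_embedding + MLP(prev_output) * alpha, as in the comment above
        return input_embedding + self.mlp(prev_output) * self.alpha
```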
What if you trained a separate thinking phase using the autoencoder, though? Might be more efficient, and then you've got it using neuralese internally.
Actually, reading the (summary) paper, they tried your idea and ran into trouble with it for a different reason:
When I'm thinking about math proofs, sometimes I can have a single idea which can be unfolded into a hundred lines of proof.
Maybe I'm getting the wrong analogy here, but if vectors = ideas, then K should depend on the vector.
Still, props to the team for going after the real root of inefficiency, not just piling on more layers. If nothing else, this is one to watch if you care about scaling models smarter.
I also wonder how far they can push K if other aspects are tweaked. Just doubling the parameter each time leaves a lot of space between the chosen value and the next value known not to work.
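One way to close that gap would be a bisection between the last K known to work and the first K known to fail. A hypothetical sketch, where `works(k)` stands in for whatever train-and-evaluate probe you would run at K = k:

```python
def refine_k(last_good: int, first_bad: int, works) -> int:
    # Bisect between the last K known to work and the first K known to fail,
    # rather than stopping at the coarse doubling grid.
    while first_bad - last_good > 1:
        mid = (last_good + first_bad) // 2
        if works(mid):        # hypothetical train/evaluate probe at K = mid
            last_good = mid
        else:
            first_bad = mid
    return last_good  # largest K that still passed the probe
```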