LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
Mood: thoughtful
Sentiment: mixed
Category: science
Key topics: self-supervised learning, AI research, machine learning
The paper 'LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics' presents a new approach to self-supervised learning, sparking discussion on its potential and limitations compared to existing methods like autoregressive LLMs.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 2h after posting
Peak period: 4 comments in Hour 3
Avg / period: 1.8
Based on 16 loaded comments
Key moments
- Story posted: 11/18/2025, 2:58:31 AM (1d ago)
- First comment: 11/18/2025, 4:55:46 AM (2h after posting)
- Peak activity: 4 comments in Hour 3 (hottest window of the conversation)
- Latest activity: 11/18/2025, 3:46:43 PM (1d ago)
In a probability distribution model, the model is always forced to output a probability for a set of tokens, even if all of the states are nonsense. In an energy-based model, the model can infer that a state makes no sense at all and can backtrack by itself.
Notice that diffusion models, DINO and other successful models are energy-based models, or end up being good proxies of the data density (density is a proxy of entropy ~ information).
Finally, all probability models can be thought of as energy-based, but not all EBMs output probability distributions.
So, his argument is not against transformers or the architectures themselves, but more about the learned geometry.
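To make that last point concrete (standard framing, not taken from the LeJEPA paper): a normalized probability model always defines an energy, while an energy-based model only yields a probability distribution if its partition function can be computed.

```latex
% Any probability model p_\theta(x) induces an energy E_\theta(x) := -\log p_\theta(x),
% so every probability model is an EBM that happens to be normalized (Z_\theta = 1).
% The converse direction requires the partition function, which is often intractable:
p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta},
\qquad
Z_\theta = \int e^{-E_\theta(x')} \, dx' .
```

When Z_\theta is intractable, the EBM can still compare or reject states by their energy, which is one way to read the "backtracking" behaviour described above, without ever committing to a normalized distribution over them.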
A lot of people say "LLMs are fundamentally flawed, a dead end, and can never become AGI", but on deeper examination the arguments are weak at best and completely bogus at worst. And then the suggested alternatives fail to outperform the baseline.
I think by now it's clear that pure next-token prediction as a training objective is insufficient in practice (it might be sufficient in theory, in the limit?) - which is why we see things like RLHF, RLAIF and RLVR in post-training instead of just SFT. But that says little about the limitations of next-token prediction as an architecture.
Next-token prediction as a training objective still allows an LLM to learn an awful lot of useful features and representations in an unsupervised fashion, so it's not going away any time soon. But I do expect to see modified pre-training, with other objectives alongside it, start steering the models towards features that are useful for inference early on.
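For concreteness, the pre-training objective under discussion is just token-level cross-entropy on shifted targets. A minimal sketch in PyTorch (illustrative code of mine, with `model` left as a hypothetical network mapping token ids to vocabulary logits):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Next-token prediction loss.

    logits: (batch, seq_len, vocab_size) model outputs at each position
    tokens: (batch, seq_len) integer token ids
    """
    # Position t predicts token t+1: drop the last logit and the first target.
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)

# Illustrative usage with the hypothetical `model`:
# loss = next_token_loss(model(tokens), tokens)
# loss.backward()
```

RLHF, RLAIF and RLVR then layer reward-driven objectives on top of a model initialized with this loss; they change what gets optimized after pre-training, not the autoregressive factorization itself.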
LeCun still can't show JEPA competitive at scale with autoregressive LLMs.
Source: Y. LeCun.
Does anybody understand why that benchmark might still be reasonable?
The idea is to show that unsupervised pre-training on your target data, even if you don't have a lot of it, can beat transfer learning from a larger, but less focused dataset.
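A hedged sketch of that evaluation protocol (my own framing and function names, not code from the paper): freeze each encoder, fit the same linear probe on its features, and compare test accuracy for (a) an encoder pre-trained with SSL directly on the target-domain data and (b) an encoder transferred from a larger, less focused dataset.

```python
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(encode, X_train, y_train, X_test, y_test):
    """Freeze the encoder, fit a linear classifier on its features, report test accuracy."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(encode(X_train), y_train)
    return clf.score(encode(X_test), y_test)

# `encode_in_domain`: hypothetical encoder from SSL pre-training on the (small) target dataset.
# `encode_generic`:   hypothetical encoder transferred from a larger, less focused dataset.
# acc_in_domain = linear_probe_accuracy(encode_in_domain, X_tr, y_tr, X_te, y_te)
# acc_transfer  = linear_probe_accuracy(encode_generic, X_tr, y_tr, X_te, y_te)
# The claim under discussion is that acc_in_domain can beat acc_transfer.
```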