
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

68 points
17 comments

Mood: thoughtful
Sentiment: mixed
Category: science
Key topics: self-supervised learning, AI research, machine learning
Debate intensity: 60/100

The paper 'LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics' presents a new approach to self-supervised learning, sparking discussion on its potential and limitations compared to existing methods like autoregressive LLMs.

Snapshot generated from the HN discussion

Discussion Activity

Light discussion

First comment: 2h after posting
Peak period: 4 comments in Hour 3
Avg per period: 1.8
Comment distribution: 16 data points (based on 16 loaded comments)

Key moments

  1. Story posted: 11/18/2025, 2:58:31 AM (1d ago)
  2. First comment: 11/18/2025, 4:55:46 AM, 2h after posting
  3. Peak activity: 4 comments in Hour 3, the hottest window of the conversation
  4. Latest activity: 11/18/2025, 3:46:43 PM (1d ago)


Discussion (17 comments)
Showing 16 of 17 comments
cl42
1d ago
1 reply
This Yann LeCun lecture is a nice summary of the conceptual model behind JEPA (+ why he isn't a fan of autoregressive LLMs): https://www.youtube.com/watch?v=yUmDRxV0krg
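For context, the core idea the lecture covers can be put in a minimal sketch: a context encoder plus a predictor are trained so that the prediction matches the latent embedding of a target view, meaning the loss lives in representation space rather than pixel space. The PyTorch-style code below is illustrative only; module names and sizes are assumptions, and it omits the anti-collapse machinery that papers like LeJEPA are actually concerned with.

```python
# Minimal, hypothetical sketch of the generic JEPA objective: predict the
# latent representation of a target view from a context view, instead of
# reconstructing pixels. Names and dimensions are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim_in, dim_latent = 256, 64

# Context branch (trained) and target branch (in practice often an EMA copy
# of the context encoder; kept as a frozen twin here for brevity).
context_encoder = nn.Sequential(nn.Linear(dim_in, dim_latent), nn.ReLU(), nn.Linear(dim_latent, dim_latent))
target_encoder = nn.Sequential(nn.Linear(dim_in, dim_latent), nn.ReLU(), nn.Linear(dim_latent, dim_latent))
predictor = nn.Linear(dim_latent, dim_latent)

optimizer = torch.optim.Adam(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

# Stand-ins for two views of the same input, e.g. visible vs. masked patches.
context_view = torch.randn(32, dim_in)
target_view = torch.randn(32, dim_in)

pred = predictor(context_encoder(context_view))   # predict in latent space
with torch.no_grad():
    target_latent = target_encoder(target_view)   # no pixel reconstruction
loss = F.mse_loss(pred, target_latent)            # match embeddings, not pixels

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```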
krackers
1d ago
3 replies
Is there a summary? Every time I try to understand more about what LeCun is saying all I see are strawmans of LLMs (like claims that LLMs cannot learn a world model or that next token prediction is insufficient for long-range planning)
sbinnee
1d ago
You don’t sound like a layman knowing the looped latents and others :)
estebarb
1d ago
The criticisms are not strawmans; they are actually well grounded in math. For instance, promoting energy-based models.

In a probability distribution model, the model is always forced to output a probability for a set of tokens, even if all the states are nonsense. In an energy-based model, the model can infer that a state makes no sense at all and can backtrack by itself.

Notice that diffusion models, DINO and other successful models are energy-based models, or end up being good proxies of the data density (density is a proxy of entropy ~ information).

Finally, all probability models can be thought of as energy-based, but not all EBMs output probability distributions.

So, his argument is not against transformers or the architectures themselves, but more about the learned geometry.
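To make that contrast concrete, here is a tiny illustrative sketch (not from the paper, and with made-up numbers): a softmax head is forced to return a normalized distribution over candidates even for a nonsensical input, while an energy head can simply assign every candidate a high energy, which a search procedure could treat as a signal to backtrack.

```python
# Illustrative contrast only: softmax always normalizes, energies need not.
import torch

logits = torch.tensor([0.1, 0.2, 0.15, 0.05])   # hypothetical scores for 4 tokens
probs = torch.softmax(logits, dim=0)
print(probs, probs.sum())   # sums to 1.0 no matter how implausible the state is

energies = torch.tensor([9.7, 9.9, 9.8, 10.1])  # uniformly high energy: "this state makes no sense"
print(energies.min())       # no candidate looks plausible; a search loop could backtrack here
```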

ACCount37
1d ago
That's the issue I have with criticism of LLMs.

A lot of people say "LLMs are fundamentally flawed, a dead end, and can never become AGI", but on deeper examination? The arguments are weak at best, and completely bogus at worst. And then the suggested alternatives fail to outperform the baseline.

I think by now, it's clear that pure next token prediction as a training objective is insufficient in practice (might be sufficient in theory in the limit?) - which is why we see things like RLHF, RLAIF and RLVR in post-training instead of just SFT. But that says little about the limitations of next token prediction as an architecture.

Next token prediction as a training objective still allows an LLM to learn an awful lot of useful features and representations in an unsupervised fashion, so it's not going away any time soon. But I do expect to see modified pre-training, with other objectives alongside it, to start steering the models towards features that are useful for inference early on.
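As a hedged sketch of what "other objectives alongside next-token prediction" could look like mechanically: a standard cross-entropy term combined with some auxiliary representation-level loss. The tensors, the auxiliary target, and the 0.1 weight below are all placeholders for illustration, not any lab's actual recipe.

```python
# Hypothetical composition of pre-training objectives: next-token
# cross-entropy plus an auxiliary representation-level term.
import torch
import torch.nn.functional as F

vocab, batch, seq, hidden = 100, 2, 8, 16

logits = torch.randn(batch, seq, vocab, requires_grad=True)          # stand-in LM head output
targets = torch.randint(0, vocab, (batch, seq))
hidden_states = torch.randn(batch, seq, hidden, requires_grad=True)  # stand-in internal representations
aux_targets = torch.randn(batch, seq, hidden)                        # whatever the auxiliary objective predicts

next_token_loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
aux_loss = F.mse_loss(hidden_states, aux_targets)   # placeholder auxiliary objective
total_loss = next_token_loss + 0.1 * aux_loss       # 0.1 is an arbitrary illustrative weight
total_loss.backward()
print(float(next_token_loss), float(aux_loss))
```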

byyoung3
1d ago
1 reply
JEPA shows little promise over traditional objectives in my own experiments
eden-u4
1d ago
1 reply
what type of experiments did you run in less than a week to be so dismissive? (seriously curious)
hodgehog11
1d ago
JEPA has been around for quite a while now, so many labs have had time to assess its viability.
rfv6723
1d ago
1 reply
> using imagenet-1k for pretraining

LeCun still can't show JEPA being competitive at scale with autoregressive LLMs.

welferkj
1d ago
It's ok, autoregressive LLMs are a dead end anyway.

Source: Y. LeCun.

suthakamal
1d ago
1 reply
A more optimistic signal: it's very early innings on the architectural side of AI, with many more orders of magnitude of power-to-intelligence efficiency to come, and less certainty that today's giants' advantages will be durable.
ACCount37
1d ago
I've seen too many "architectural breakthroughs" that failed to accomplish anything at all to be this bullish on architectural gains.
artitars
1d ago
1 reply
I am a bit confused by the benchmark comparison they are doing. The comparison of a domain-specific "LeJEPA" on astronomy images against general models that are not explicitly fine-tuned on astronomy images seems misleading to me.

Does anybody understand why that benchmark might still be reasonable?

yorwba
1d ago
The comparison is against general models which are explicitly fine-tuned. Specifically, they pre-train their models on unlabeled in-domain images and take DINO models pre-trained on internet-scale general images, then fine-tune both of them on a small number of labeled in-domain images.

The idea is to show that unsupervised pre-training on your target data, even if you don't have a lot of it, can beat transfer learning from a larger, but less focused dataset.
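The protocol described above can be sketched as follows. Function names and data are placeholders rather than the paper's code, but the structure shows the key point: both models are fine-tuned on the same small labeled set and differ only in their pre-training data.

```python
# Sketch of the evaluation protocol as described in the thread; all names
# and data below are stubs for illustration, not the paper's pipeline.

def pretrain_ssl(unlabeled_images):
    """Self-supervised pre-training on unlabeled in-domain images (stub)."""
    return {"backbone": "in-domain SSL"}

def load_general_pretrained():
    """A backbone pre-trained on internet-scale general images, e.g. DINO (stub)."""
    return {"backbone": "general pre-trained"}

def finetune_and_score(model, labeled_images):
    """Supervised fine-tuning on a small labeled in-domain set, then eval (stub score)."""
    return 0.0  # placeholder accuracy

unlabeled_astro = ["img"] * 1000      # unlabeled in-domain images (placeholder)
labeled_astro = [("img", 0)] * 100    # small labeled in-domain set (placeholder)

in_domain_model = pretrain_ssl(unlabeled_astro)
general_model = load_general_pretrained()

score_in_domain = finetune_and_score(in_domain_model, labeled_astro)
score_transfer = finetune_and_score(general_model, labeled_astro)
print(score_in_domain, score_transfer)  # the claim: in-domain SSL can beat transfer learning
```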

ml-anon
1d ago
lolJEPA
estebarb
1d ago
I'm a bit confused about the geometry. I'm not sure if the result ends up being like a fuzzy hypersphere or more like a "spiky hyperstar".

1 more comment available on Hacker News

ID: 45960922 · Type: story · Last synced: 11/19/2025, 1:57:15 PM
