Scaffolding to Superhuman: How Curriculum Learning Solved 2048 and Tetris
Key topics
The art of teaching machines to master complex games like 2048 and Tetris just got a whole lot more fascinating, thanks to a clever approach called curriculum learning. As commenters dug in, it became clear that this method, which involves gradually increasing the difficulty of training data, is not just a neat trick, but a potentially game-changing strategy that's already being used in other domains, like sports training. Some commenters questioned whether starting with simpler tasks is "cheating," but others pointed out that it's a common-sense approach that's been used to solve other complex problems, like the Rubik's Cube. The real takeaway here is that achieving "superhuman" performance may not require massive resources, but rather a smart training regimen.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 26m after posting
- Peak period: 21 comments in 0-6h
- Avg / period: 5.7 comments
Based on 34 loaded comments
Key moments
- 01 Story posted: Dec 31, 2025 at 10:52 AM EST (9 days ago)
- 02 First comment: Dec 31, 2025 at 11:18 AM EST (26m after posting)
- 03 Peak activity: 21 comments in 0-6h (hottest window of the conversation)
- 04 Latest activity: Jan 3, 2026 at 4:17 PM EST (6d ago)
If you sat down to solve a problem you’ve never seen before, you wouldn’t even know what a valid “later state” looks like.
This is precisely how RL worked for learning Atari games: you don't start with the game halfway solved and then claim the AI solved the end-to-end problem on its own.
The goal in these scenarios is for the machine to solve the problem with no prior information.
Indeed, this is a key to teaching people how to advance: don't focus on one side, but learn to advance a whole layer.
e.g. DeepCubeA, a 2019 (!) paper on solving the Rubik's Cube.
Start with the solved state and train the network on successively harder states. This is so "obvious" and "unhelpful in real domains" that perhaps they haven't heard of this paper.
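A minimal sketch of that data-generation loop, with a toy permutation puzzle standing in for the cube (the names below are illustrative assumptions, not DeepCubeA's actual code):

import random

def scramble(state, k):
    # Toy stand-in for "k random cube moves": k random adjacent swaps
    # applied to a solved permutation.
    s = list(state)
    for _ in range(k):
        i = random.randrange(len(s) - 1)
        s[i], s[i + 1] = s[i + 1], s[i]
    return tuple(s)

def reverse_curriculum(solved_state, max_depth, samples_per_depth):
    # Yield (state, depth) pairs starting one move from solved and getting
    # progressively harder -- the "start from the solved state" idea.
    for depth in range(1, max_depth + 1):
        for _ in range(samples_per_depth):
            yield scramble(solved_state, depth), depth

# Example: states 1 move from solved first, then 2, ... up to 20.
for state, depth in reverse_curriculum(tuple(range(8)), max_depth=20, samples_per_depth=4):
    pass  # feed (state, depth) into the value/policy network here

The network first sees states one move from solved, so value and policy targets can bootstrap outward toward harder scrambles.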
The happy Tetris bug is also a neat example of how “bad” inputs can act like curriculum or data augmentation. Corrupted observations forced the policy to be robust to chaos early, which then paid off when the game actually got hard. That feels very similar to tricks in other domains where we deliberately randomize or mask parts of the input. It makes me wonder how many surprisingly strong RL systems in the wild are really powered by accidental curricula that nobody has fully noticed or formalized yet.
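For what it's worth, the deliberate version of that trick is easy to sketch: random observation masking during training. The function name and rates below are illustrative assumptions, not anything from the article.

import numpy as np

def mask_observation(obs, mask_prob, rng):
    # Zero out a random fraction of the observation -- a deliberate version
    # of the "corrupted inputs" that accidentally toughened the Tetris policy.
    # Annealing mask_prob over training turns this into an explicit curriculum
    # rather than an accidental one.
    keep = rng.random(obs.shape) >= mask_prob
    return obs * keep

rng = np.random.default_rng(0)
board = np.ones((20, 10))                     # toy Tetris-sized observation
noisy = mask_observation(board, mask_prob=0.2, rng=rng)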
The interesting tasks, however, tend to take a lot more effort.
Greasemonkey / Tampermonkey / User Scripts, with:

// dim HN submissions whose title mentions "AI"
Array.from(document.querySelectorAll(".submission > .title"))
  .filter(e => e.innerText.includes("AI"))
  .forEach(e => e.parentElement.style.opacity = 0.1)
> Believe it or not, you can visit more than 1 website.
Does any of the energy used for this benefit any other problem?
Also using "Superhuman" in the title is absurd given this paltry outcome.
There are real challenges here: reward scales and horizon lengths may vary across tasks of different difficulty; policy space has to be explored effectively (keeping multimodal strategy distributions for exploration rather than overfitting on the small problems); and there is catastrophic forgetting when mixing curriculum levels or introducing them too late.
Does any reader (or the author) have good heuristics for these? Or is it still so problem-dependent that a hyperparameter search for something that works in spite of these challenges is the go-to?
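No universal answer, but two heuristics that come up repeatedly are easy to sketch: keep replaying a fraction of earlier levels to limit forgetting, and normalize returns per level so reward scales stay comparable. Everything below (class name, ratios) is an illustrative assumption, not a recipe from the article.

import random
from collections import defaultdict

class LevelMixer:
    # Combines the two heuristics: a persistent "replay" share of earlier
    # curriculum levels, plus per-level return normalization.

    def __init__(self, levels, replay_frac=0.2):
        self.levels = levels            # ordered easy -> hard
        self.replay_frac = replay_frac
        self.current = 0
        self.stats = defaultdict(lambda: {"n": 0, "sum": 0.0, "sumsq": 0.0})

    def sample_level(self):
        # Mostly train on the current level, but keep revisiting earlier ones
        # so the policy doesn't forget them.
        if self.current > 0 and random.random() < self.replay_frac:
            return random.choice(self.levels[: self.current])
        return self.levels[self.current]

    def normalize_return(self, level, ret):
        # Running per-level mean/std so a "good" score means the same thing
        # on an easy level and a hard one.
        s = self.stats[level]
        s["n"] += 1
        s["sum"] += ret
        s["sumsq"] += ret * ret
        mean = s["sum"] / s["n"]
        var = max(s["sumsq"] / s["n"] - mean * mean, 1e-8)
        return (ret - mean) / var ** 0.5

    def advance(self):
        # Call when the success rate on the current level clears a threshold.
        self.current = min(self.current + 1, len(self.levels) - 1)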
If one can frame the problem as a competition, then self-play has been shown to work repeatedly.
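One reason the competitive framing helps is that it yields a curriculum for free: the opponent improves in lockstep with the learner. A hedged sketch, assuming a generic agent with a train method and a play_game function (both placeholders, not any particular library):

import copy

def self_play(agent, play_game, iterations, games_per_iter):
    # The learner plays a frozen copy of itself, trains on those games,
    # then the copy is refreshed -- so difficulty rises automatically.
    opponent = copy.deepcopy(agent)
    for _ in range(iterations):
        games = [play_game(agent, opponent) for _ in range(games_per_iter)]
        agent.train(games)                 # learn from wins and losses
        opponent = copy.deepcopy(agent)    # opponent keeps pace with the learner
    return agent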
What you get is an iterator over the dataset that samples based on how far you are in the training.
0: https://github.com/omarkamali/curriculus
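The library's actual interface isn't shown in the thread; a minimal sketch of the general idea (all names here are hypothetical, not the curriculus API) might look like this:

import random

def curriculum_iterator(dataset, total_steps, seed=0):
    # The dataset is assumed sorted easy -> hard; the sampling window widens
    # as training progresses, so early steps see mostly easy examples and
    # late steps see the full range.
    rng = random.Random(seed)
    n = len(dataset)
    for step in range(total_steps):
        progress = (step + 1) / total_steps    # 0 -> 1 over training
        window = max(1, int(progress * n))     # portion of the dataset "unlocked"
        yield dataset[rng.randrange(window)]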