Scaffolding to Superhuman: How Curriculum Learning Solved 2048 and Tetris
Key topics
The art of teaching machines to master complex games like 2048 and Tetris just got a whole lot more fascinating, thanks to a clever approach called curriculum learning. As commenters dug in, it became clear that this method, which involves gradually increasing the difficulty of training data, is not just a neat trick, but a potentially game-changing strategy that's already being used in other domains, like sports training. Some commenters questioned whether starting with simpler tasks is "cheating," but others pointed out that it's a common-sense approach that's been used to solve other complex problems, like the Rubik's Cube. The real takeaway here is that achieving "superhuman" performance may not require massive resources, but rather a smart training regimen.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 26m after posting
- Peak period: 21 comments in 0-6h
- Avg / period: 5.7 comments
Based on 34 loaded comments
Key moments
- 01 Story posted: Dec 31, 2025 at 10:52 AM EST (9 days ago)
- 02 First comment: Dec 31, 2025 at 11:18 AM EST (26m after posting)
- 03 Peak activity: 21 comments in 0-6h (hottest window of the conversation)
- 04 Latest activity: Jan 3, 2026 at 4:17 PM EST (6d ago)
If you sat down to solve a problem you’ve never seen before, you wouldn’t even know what a valid “later state” looks like.
This is precisely how RL worked for learning Atari games: you don't start with the game halfway solved and then claim the AI solved the end-to-end problem on its own.
The goal in these scenarios is for the machine to solve the problem with no prior information.
Indeed, this is a key to teaching people how to advance: don't focus on one side, but learn to advance a whole layer.
e.g. DeepCubeA, a 2019 (!) paper on solving the Rubik's Cube.
Start with the solved state and train the network on successively harder states. This is so "obvious" and "unhelpful in real domains" that perhaps they haven't heard of this paper.
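A minimal sketch of that data-generation loop, with a toy permutation puzzle standing in for the cube (the names below are illustrative assumptions, not DeepCubeA's actual code):

import random

def scramble(state, k):
    # Toy stand-in for "k random cube moves": k random adjacent swaps
    # applied to a solved permutation.
    s = list(state)
    for _ in range(k):
        i = random.randrange(len(s) - 1)
        s[i], s[i + 1] = s[i + 1], s[i]
    return tuple(s)

def reverse_curriculum(solved_state, max_depth, samples_per_depth):
    # Yield (state, depth) pairs starting one move from solved and getting
    # progressively harder -- the "start from the solved state" idea.
    for depth in range(1, max_depth + 1):
        for _ in range(samples_per_depth):
            yield scramble(solved_state, depth), depth

# Example: states 1 move from solved first, then 2, ... up to 20.
for state, depth in reverse_curriculum(tuple(range(8)), max_depth=20, samples_per_depth=4):
    pass  # feed (state, depth) into the value/policy network here

The network first sees states one move from solved, so value and policy targets can bootstrap outward toward harder scrambles.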
The happy Tetris bug is also a neat example of how “bad” inputs can act like curriculum or data augmentation. Corrupted observations forced the policy to be robust to chaos early, which then paid off when the game actually got hard. That feels very similar to tricks in other domains where we deliberately randomize or mask parts of the input. It makes me wonder how many surprisingly strong RL systems in the wild are really powered by accidental curricula that nobody has fully noticed or formalized yet.
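For what it's worth, the deliberate version of that trick is easy to sketch: random observation masking during training. The function name and rates below are illustrative assumptions, not anything from the article.

import numpy as np

def mask_observation(obs, mask_prob, rng):
    # Zero out a random fraction of the observation -- a deliberate version
    # of the "corrupted inputs" that accidentally toughened the Tetris policy.
    # Annealing mask_prob over training turns this into an explicit curriculum
    # rather than an accidental one.
    keep = rng.random(obs.shape) >= mask_prob
    return obs * keep

rng = np.random.default_rng(0)
board = np.ones((20, 10))                     # toy Tetris-sized observation
noisy = mask_observation(board, mask_prob=0.2, rng=rng)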
The interesting tasks, however, tend to take a lot more effort.
Greasemonkey / Tampermonkey / User Scripts, with:

// dim HN submissions whose title mentions "AI"
Array.from(document.querySelectorAll(".submission > .title"))
  .filter(e => e.innerText.includes("AI"))
  .forEach(e => e.parentElement.style.opacity = 0.1)
> Believe it or not, you can visit more than 1 website.
Does any of the energy used for this benefit any other problem?
Also using "Superhuman" in the title is absurd given this paltry outcome.
There are real challenges here: reward scales and horizon lengths may vary across tasks of different difficulty; policy space has to be explored effectively (keeping multimodal strategy distributions for exploration rather than overfitting on the small problems); and there is catastrophic forgetting when mixing curriculum levels or introducing them too late.
Does any reader (or the author) have good heuristics for these? Or is it still so problem-dependent that a hyperparameter search for something that works in spite of these challenges is the go-to?
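No universal answer, but two heuristics that come up repeatedly are easy to sketch: keep replaying a fraction of earlier levels to limit forgetting, and normalize returns per level so reward scales stay comparable. Everything below (class name, ratios) is an illustrative assumption, not a recipe from the article.

import random
from collections import defaultdict

class LevelMixer:
    # Combines the two heuristics: a persistent "replay" share of earlier
    # curriculum levels, plus per-level return normalization.

    def __init__(self, levels, replay_frac=0.2):
        self.levels = levels            # ordered easy -> hard
        self.replay_frac = replay_frac
        self.current = 0
        self.stats = defaultdict(lambda: {"n": 0, "sum": 0.0, "sumsq": 0.0})

    def sample_level(self):
        # Mostly train on the current level, but keep revisiting earlier ones
        # so the policy doesn't forget them.
        if self.current > 0 and random.random() < self.replay_frac:
            return random.choice(self.levels[: self.current])
        return self.levels[self.current]

    def normalize_return(self, level, ret):
        # Running per-level mean/std so a "good" score means the same thing
        # on an easy level and a hard one.
        s = self.stats[level]
        s["n"] += 1
        s["sum"] += ret
        s["sumsq"] += ret * ret
        mean = s["sum"] / s["n"]
        var = max(s["sumsq"] / s["n"] - mean * mean, 1e-8)
        return (ret - mean) / var ** 0.5

    def advance(self):
        # Call when the success rate on the current level clears a threshold.
        self.current = min(self.current + 1, len(self.levels) - 1)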
If one can frame the problem as a competition, then self-play has been shown to work repeatedly.
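One reason the competitive framing helps is that it yields a curriculum for free: the opponent improves in lockstep with the learner. A hedged sketch, assuming a generic agent with a train method and a play_game function (both placeholders, not any particular library):

import copy

def self_play(agent, play_game, iterations, games_per_iter):
    # The learner plays a frozen copy of itself, trains on those games,
    # then the copy is refreshed -- so difficulty rises automatically.
    opponent = copy.deepcopy(agent)
    for _ in range(iterations):
        games = [play_game(agent, opponent) for _ in range(games_per_iter)]
        agent.train(games)                 # learn from wins and losses
        opponent = copy.deepcopy(agent)    # opponent keeps pace with the learner
    return agent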
What you get is an iterator over the dataset that samples based on how far you are in the training.
0: https://github.com/omarkamali/curriculus
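The library's actual interface isn't shown in the thread; a minimal sketch of the general idea (all names here are hypothetical, not the curriculus API) might look like this:

import random

def curriculum_iterator(dataset, total_steps, seed=0):
    # The dataset is assumed sorted easy -> hard; the sampling window widens
    # as training progresses, so early steps see mostly easy examples and
    # late steps see the full range.
    rng = random.Random(seed)
    n = len(dataset)
    for step in range(total_steps):
        progress = (step + 1) / total_steps    # 0 -> 1 over training
        window = max(1, int(progress * n))     # portion of the dataset "unlocked"
        yield dataset[rng.randrange(window)]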