Fantastic Pretraining Optimizers and Where to Find Them
Posted 4 months ago · Active 4 months ago
arxiv.org · Science · story
calm · mixed
Debate: 30/100
Key topics
Deep Learning
Optimizers
Pretraining
The paper 'Fantastic pretraining optimizers and where to find them' explores alternative optimizers for deep learning pretraining, sparking discussion on the importance of factors like speed and memory usage.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 6h after posting
Peak period: 2 comments (6-7h)
Avg / period: 1.3
Key moments
- 01 Story posted: Sep 5, 2025 at 2:15 PM EDT (4 months ago)
- 02 First comment: Sep 5, 2025 at 8:02 PM EDT (6h after posting)
- 03 Peak activity: 2 comments in 6-7h (hottest window of the conversation)
- 04 Latest activity: Sep 5, 2025 at 10:37 PM EDT (4 months ago)
ID: 45141762 · Type: story · Last synced: 11/20/2025, 7:45:36 PM
This may all seem like good fun, but a title riffing on "Fantastic Beasts and Where to Find Them" makes a real difference when you have to introduce the paper to students and other academics getting into the discipline. It's just embarrassing, and it always garners a reaction somewhere between a facepalm and disgust, especially now that Rowling is a controversial figure.
As for the paper itself, it provides a good source to reference, but the conclusions drawn seem to be common knowledge in the folklore. I think we're finally starting to see a healthy and meaningful shift in tone from the optimization community, which has been obsessed with early convergence rates for years. It's good to have options in optimizers, but the decision on which optimizer to use is rarely so straightforward and usually comes down to prior experience. Most will stick with AdamW.
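For readers new to the topic, here is a minimal sketch of the default the commenter refers to: AdamW as exposed in PyTorch. The model and hyperparameter values below are illustrative assumptions, not numbers taken from the paper or the thread.

```python
import torch

# Toy model standing in for a network being pretrained (illustrative only).
model = torch.nn.Linear(1024, 1024)

# AdamW is the common default optimizer for pretraining. The hyperparameters
# here are typical illustrative choices, not values from the paper.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,            # peak learning rate (assumed)
    betas=(0.9, 0.95),  # momentum coefficients often used in practice (assumed)
    weight_decay=0.1,   # decoupled weight decay
)
```

Alternative optimizers discussed in this literature are typically swapped in at exactly this point in a training script, which is part of why comparisons hinge so heavily on how the rest of the setup (schedule, batch size, tuning budget) is held fixed.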