Schedule-Free Lion Optimizer
A 'Schedule-Free Lion Optimizer' was released on GitHub and posted to Hacker News on Oct 7, 2025. The thread drew minimal discussion, suggesting neither strong opinions nor controversy. The author's announcement follows.
Here is Lion Schedule-Free, a version of the Lion optimizer that requires no learning-rate scheduler. It uses sign agreement, the absolute value of the cross-correlation between the momentum sign and the gradient sign, to scale the effective update step. Not only does it converge about 3x faster on my model, but by eliminating the LR scheduler it also allows hot training resume and restart. It also stabilizes training, especially late in training, eliminating the need for gradient clipping and the like. The effective update depends on the training regime and can decrease or increase over the course of training.

In this implementation, the sign agreement is calculated per module. It would probably be more logical and stable to calculate it per parameter group, but that is more code, and module-wise already works well enough.
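Roughly, the mechanism can be sketched like this. This is my own simplification, not the repository's code: the helper below computes agreement per tensor rather than per module, and the exact way the scale enters the update is an assumption.

import torch

def sign_agreement(momentum: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    # Mean of sign(m) * sign(g) is the cross-correlation of the two sign vectors;
    # its absolute value is near 1 when the signs agree consistently, near 0 when not.
    return (momentum.sign() * grad.sign()).mean().abs()

def lion_sf_step(param, grad, exp_avg, lr=5e-4, beta1=0.9, beta2=0.99, weight_decay=1e-2):
    # Standard Lion update direction: sign of the interpolated momentum/gradient.
    update = (exp_avg * beta1 + grad * (1 - beta1)).sign_()
    # Scale the effective step by the sign agreement instead of an external LR schedule.
    scale = sign_agreement(exp_avg, grad).item()
    param.mul_(1 - lr * weight_decay)                 # decoupled weight decay, as in Lion
    param.add_(update, alpha=-lr * scale)             # agreement-scaled step
    exp_avg.mul_(beta2).add_(grad, alpha=1 - beta2)   # momentum update, as in Lion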
The optimizer is provided as is. There will be no paper, no convergence guarantees, no ablation studies and no time to do any of that.
Install it:
pip install git+https://github.com/govorunov/lion-sf.git
And use it as a normal optimizer:
from lion_pytorch import LionSF
optimizer = LionSF(model.parameters(), lr=5e-4, betas=(0.9, 0.99), weight_decay=1e-2)

Give it a generous base learning rate, like 5e-4 or more, and ditch the LR scheduler completely. You can also ditch gradient clipping (as I did).
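For instance, a bare training loop with no scheduler and no gradient clipping might look like this (model, train_loader, and compute_loss are placeholders for your own setup):

for batch in train_loader:                 # your data loader
    optimizer.zero_grad()
    loss = compute_loss(model, batch)      # your forward pass / loss
    loss.backward()
    optimizer.step()                       # no scheduler.step(), no clip_grad_norm_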
If you want to resume or restart training later from a checkpoint, keep the optimizer state and do a hot restart. There is no need for a warm-up; it will ramp back up gently on its own. The ability to do a hot restart, together with the increased training stability, is probably more important (to me) than the faster convergence, although faster convergence looks better on plots.
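In practice that is just the standard PyTorch checkpoint pattern, saving and restoring the optimizer state dict alongside the model (the file name here is arbitrary):

import torch

# Save model and optimizer state together.
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, "checkpoint.pt")

# ...later: hot-restart with the same base lr; no warm-up schedule needed.
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer = LionSF(model.parameters(), lr=5e-4, betas=(0.9, 0.99), weight_decay=1e-2)
optimizer.load_state_dict(ckpt["optimizer"])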