11/18/2025, 6:31:11 PM

Show HN: Aion-Torch – Adaptive residual scaling for deep Transformers

1 point
0 comments

Mood: thoughtful
Sentiment: positive
Category: tech
Key topics: deep learning, Transformers, PyTorch

Hello HN, I’ve turned my Master’s research on stabilizing very deep Transformers into an open-source PyTorch library called AION-Torch. Instead of a fixed residual connection, it uses an adaptive residual that looks at how “energetic” the block’s input and output are and dials the residual strength up or down to keep things stable. On my small setup (RTX 4060) it seemed to help very deep Transformer stacks keep gradients under control and reach lower loss without special tuning.

The repo has a drop-in AionResidual module, some basic tooling to log what’s happening inside the network, and small examples to show how to plug it into existing models. I’d love feedback on whether this idea makes sense beyond toy setups, how you would benchmark it against standard residuals/DeepNorm on real tasks, and if the API feels natural for people who train larger models.
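To make the idea concrete, here is a minimal sketch of what such an energy-based adaptive residual could look like in PyTorch. This is an assumption-laden illustration, not the library's actual `AionResidual` implementation: the class name `AdaptiveResidual`, the per-token RMS "energy" heuristic, and the `eps`/`max_scale` parameters are all hypothetical.

```python
import torch
import torch.nn as nn

class AdaptiveResidual(nn.Module):
    """Hypothetical sketch of an energy-based adaptive residual.
    The residual branch f(x) is rescaled so its "energy" does not
    overwhelm the input stream x. The real AionResidual may differ."""

    def __init__(self, eps: float = 1e-6, max_scale: float = 1.0):
        super().__init__()
        self.eps = eps              # avoids division by zero
        self.max_scale = max_scale  # never amplify the residual branch

    def forward(self, x: torch.Tensor, fx: torch.Tensor) -> torch.Tensor:
        # Per-token RMS as a crude energy measure (assumption).
        x_energy = x.pow(2).mean(dim=-1, keepdim=True).sqrt()
        fx_energy = fx.pow(2).mean(dim=-1, keepdim=True).sqrt()
        # Dial the residual strength down when the block output is much
        # more energetic than its input, keeping the stream norm stable.
        scale = (x_energy / (fx_energy + self.eps)).clamp(max=self.max_scale)
        return x + scale * fx

# Drop-in use inside a Transformer block: replace `x = x + sublayer(x)`
# with `x = residual(x, sublayer(x))`.
residual = AdaptiveResidual()
x = torch.randn(2, 16, 64)           # (batch, seq, d_model)
fx = 10.0 * torch.randn(2, 16, 64)   # an unusually energetic block output
y = residual(x, fx)
```

Under this heuristic, a block whose output is ten times as energetic as its input contributes roughly as much to the stream as the input itself, rather than dominating it, which is one plausible way to keep gradients in very deep stacks under control.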

The author shares their open-source PyTorch library, Aion-Torch, which implements adaptive residual scaling for deep Transformers, and seeks feedback on its effectiveness and usability.

Snapshot generated from the HN discussion


Discussion (0 comments)

Discussion hasn't started yet.

ID: 45970113 · Type: story · Last synced: 11/18/2025, 6:32:40 PM
