Writing an LLM From Scratch, Part 20 – Starting Training, and Cross Entropy Loss
Posted 3 months ago · Active 3 months ago
gilesthomas.com · Tech story
Tone: calm, positive · Debate: 20/100
Key topics
LLM
Machine Learning
Deep Learning
The author continues their series on building an LLM from scratch, discussing the start of training and cross-entropy loss, with commenters providing feedback and suggestions on the implementation.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 16h after posting
Peak period: 2 comments in 16-18h
Avg per period: 1.5
Key moments
- Story posted: Oct 2, 2025 at 5:14 PM EDT (3 months ago)
- First comment: Oct 3, 2025 at 9:42 AM EDT (16h after posting)
- Peak activity: 2 comments in 16-18h (hottest window of the conversation)
- Latest activity: Oct 3, 2025 at 12:09 PM EDT (3 months ago)
ID: 45455648 · Type: story · Last synced: 11/20/2025, 4:29:25 PM
This is the approach the author has taken.
Hugging Face's Trainer class takes a different approach: the label is the same as the input shifted left by one position and padded with the <ignore> token (-1). Cross entropy is then calculated between the output logits and the shifted labels. At least, that's my understanding after reviewing the code.
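As a rough sketch of that shifting scheme (not the actual Trainer code; a minimal PyTorch illustration, with the function name invented here and the -1 ignore value taken only from the comment above):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -1  # the "<ignore>" value from the comment; note PyTorch's own default ignore_index is -100

def shifted_cross_entropy(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Cross entropy where the labels are the input ids shifted left by one.

    logits:    (batch, seq_len, vocab_size) model outputs
    input_ids: (batch, seq_len) token ids fed to the model
    """
    # Labels = input shifted left by 1; the final position has no next token,
    # so it is filled with the ignore value and excluded from the loss.
    labels = torch.full_like(input_ids, IGNORE_INDEX)
    labels[:, :-1] = input_ids[:, 1:]

    # Flatten to (batch * seq_len, vocab_size) vs. (batch * seq_len,) and
    # let cross_entropy skip the ignored positions.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```

Either way, the logit at position t is scored against the token at position t+1, so shifting the targets when building the dataset (the article's approach) and shifting inside the loss (the Trainer-style approach sketched above) should produce the same loss.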