Writing an LLM From Scratch, Part 20 – Starting Training, and Cross Entropy Loss
Posted 3 months ago · Active 3 months ago
gilesthomas.com · Tech story
Tone: calm, positive · Debate: 20/100
Key topics
LLM
Machine Learning
Deep Learning
The author continues their series on building an LLM from scratch, discussing the start of training and cross-entropy loss, with commenters providing feedback and suggestions on the implementation.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 16h after posting
Peak period: 2 comments in 16-18h
Avg per period: 1.5
Key moments
- Story posted: Oct 2, 2025 at 5:14 PM EDT (3 months ago)
- First comment: Oct 3, 2025 at 9:42 AM EDT (16h after posting)
- Peak activity: 2 comments in 16-18h (hottest window of the conversation)
- Latest activity: Oct 3, 2025 at 12:09 PM EDT (3 months ago)
ID: 45455648 · Type: story · Last synced: 11/20/2025, 4:29:25 PM
This is the approach the author has taken.
Hugging Face's Trainer class takes a different approach: the label is the same as the input shifted left by one position and padded with the <ignore> token (-1). Cross entropy is then calculated between the output logits and the shifted labels. At least, that's my understanding after reviewing the code.
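As a rough sketch of that shifting scheme (not the actual Trainer code; a minimal PyTorch illustration, with the function name invented here and the -1 ignore value taken only from the comment above):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -1  # the "<ignore>" value from the comment; note PyTorch's own default ignore_index is -100

def shifted_cross_entropy(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Cross entropy where the labels are the input ids shifted left by one.

    logits:    (batch, seq_len, vocab_size) model outputs
    input_ids: (batch, seq_len) token ids fed to the model
    """
    # Labels = input shifted left by 1; the final position has no next token,
    # so it is filled with the ignore value and excluded from the loss.
    labels = torch.full_like(input_ids, IGNORE_INDEX)
    labels[:, :-1] = input_ids[:, 1:]

    # Flatten to (batch * seq_len, vocab_size) vs. (batch * seq_len,) and
    # let cross_entropy skip the ignored positions.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```

Either way, the logit at position t is scored against the token at position t+1, so shifting the targets when building the dataset (the article's approach) and shifting inside the loss (the Trainer-style approach sketched above) should produce the same loss.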