Writing an LLM From Scratch, Part 22 – Training Our LLM
Posted 3 months ago · gilesthomas.com (Tech story)
Key topics: LLM, AI, Machine Learning, Deep Learning
The author shares their journey of building a Large Language Model (LLM) from scratch, with the latest part focusing on training the model, sparking discussions on the learning process, cost comparisons, and the value of hands-on experience.
Snapshot generated from the HN discussion
Discussion Activity
- Light discussion; first comment after 2h
- Peak period: 2 comments in the 6-8h window
- Average per period: 1.3 comments
Key moments
1. Story posted: Oct 15, 2025 at 7:42 PM EDT (3 months ago)
2. First comment: Oct 15, 2025 at 9:24 PM EDT (2h after posting)
3. Peak activity: 2 comments in the 6-8h window, the hottest stretch of the conversation
4. Latest activity: Oct 17, 2025 at 1:52 AM EDT (3 months ago)
HN story ID: 45599727
Want the full context? Read the primary article or dive into the live Hacker News thread:
[1] https://www.gilesthomas.com/2024/12/llm-from-scratch-1
I think it is a great guide, an extended tutorial if you will (at least up to this point in my reading). Also, having the code right in front of you helps a lot. For example, I was under the impression that embedding vectors were static, as in word2vec. It turns out they are learnable parameters too. I wouldn't have been able to tell for sure without the code in front of me.
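To see the point about embeddings being trainable rather than fixed word2vec-style lookups, here is a dependency-free toy sketch (a hypothetical illustration, not code from the article): the embedding table is just a matrix of parameters, and gradient descent nudges one row toward a target, the same way backprop moves it inside a real model.

```python
# Toy demonstration that embedding vectors are learnable parameters:
# a 3-word vocabulary with 2-d embeddings, where word 0's vector is
# trained by plain gradient descent. (Real LLMs do this via backprop
# through the whole network; this isolates just the embedding update.)
import random

random.seed(0)
vocab_size, dim = 3, 2
# The embedding "layer" is nothing but a matrix of parameters.
emb = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

target = [1.0, -1.0]  # pretend the loss wants word 0's vector here
before = list(emb[0])
lr = 0.1
for _ in range(100):
    # gradient of the MSE loss 0.5 * sum((emb - target)^2) w.r.t. emb[0]
    grad = [e - t for e, t in zip(emb[0], target)]
    emb[0] = [e - lr * g for e, g in zip(emb[0], grad)]

print("before:", before)
print("after: ", emb[0])  # has moved essentially onto the target
```

The vector starts random and ends up wherever the loss pushes it, which is exactly what makes it a parameter rather than a static lookup.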
There isn't really much intuition to begin with, and I don't really think building intuition will be useful, anyway. Even when looking at something as barebones as perceptrons, it's hard to really see "why" they work. Heck, even implementing a Markov chain from scratch (which can be done in an afternoon with no prior knowledge) can feel magical when it starts outputting semi-legible sentences.
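The afternoon-sized Markov chain mentioned above really is this small; here is one possible version (a hypothetical sketch with a made-up corpus, not from the article): count which word follows which, then sample a random walk through those transitions.

```python
# Minimal bigram Markov chain text generator: learn transitions from a
# toy corpus, then generate text by sampling successors.
import random
from collections import defaultdict

random.seed(42)
corpus = "the cat sat on the mat the cat ran on the rug".split()

# Count which word follows which in the corpus.
transitions = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    transitions[a].append(b)

def generate(start, n):
    """Random-walk n steps from `start`, falling back to a random
    corpus word when the current word has no known successor."""
    word, out = start, [start]
    for _ in range(n):
        word = random.choice(transitions.get(word, corpus))
        out.append(word)
    return " ".join(out)

print(generate("the", 8))
```

Even this tiny model produces locally plausible word order, which is the "semi-legible sentences" effect the comment describes.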
It's like trying to build intuition when it comes to technical results like the Banach-Tarski paradox or Löb's theorem. Imo, understanding the math (which in the case of LLMs is actually quite simple) is orders of magnitude more valuable than "building intuition," whatever that might mean.
I was thinking something like "it is trying to approximate a non-linear function" (which is what it is in the case of MLPs).
Check out the Karpathy "Zero to Hero" videos, and try to follow along by building an MLP implementation in your own language of choice. He does a good job of building intuition because he doesn't skip much of anything.
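In the spirit of those videos, a from-scratch MLP with hand-written backprop fits in a page (a hypothetical sketch, not code from the series or the article): one tanh hidden layer learning XOR, the classic non-linear function that no single linear layer can represent, which makes the "approximating a non-linear function" framing concrete.

```python
# Tiny MLP trained by hand-coded backprop to learn XOR:
# 2 inputs -> H tanh hidden units -> 1 sigmoid output.
import math
import random

random.seed(1)
H = 6  # hidden units

def rand():
    return random.uniform(-1, 1)

W1 = [[rand(), rand()] for _ in range(H)]
b1 = [rand() for _ in range(H)]
W2 = [rand() for _ in range(H)]
b2 = rand()

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def forward(x):
    h = [math.tanh(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(H)]
    z = sum(W2[j] * h[j] for j in range(H)) + b2
    return h, 1 / (1 + math.exp(-z))  # hidden activations, sigmoid output

lr = 0.5
for _ in range(8000):
    for x, y in data:
        h, p = forward(x)
        dz = p - y  # gradient of cross-entropy loss w.r.t. pre-sigmoid z
        for j in range(H):
            dh = dz * W2[j] * (1 - h[j] ** 2)  # chain rule through tanh
            W2[j] -= lr * dz * h[j]
            b1[j] -= lr * dh
            W1[j][0] -= lr * dh * x[0]
            W1[j][1] -= lr * dh * x[1]
        b2 -= lr * dz

for x, y in data:
    print(x, round(forward(x)[1], 3))
```

Writing the backward pass by hand like this, rather than calling an autograd library, is exactly the exercise that builds the intuition the thread is debating.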
Feeling nostalgic about the days of building LFS (Linux From Scratch) in college.
Learning by building won't help you remember all the details, but many things make more sense after going through the process step by step. And it's fun.