LLM Visualization
Posted 4 months ago · Active 4 months ago
bbycroft.net · Tech · story
excited · positive · Debate: 20/100
Key topics
LLM Visualization
AI Explainability
Machine Learning
A highly visual representation of Large Language Models (LLMs) has been shared, sparking discussion of its potential as a teaching tool and of the limitations of our current understanding of LLMs.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion. First comment: 2h after posting
Peak period: 14 comments in the 18-24h window
Average per period: 6.4
Comment distribution: 45 data points
Based on 45 loaded comments
Key moments
- 01 Story posted: Sep 4, 2025 at 2:06 PM EDT (4 months ago)
- 02 First comment: Sep 4, 2025 at 4:17 PM EDT (2h after posting)
- 03 Peak activity: 14 comments in the 18-24h window (hottest period of the conversation)
- 04 Latest activity: Sep 8, 2025 at 5:12 AM EDT (4 months ago)
ID: 45130260 · Type: story · Last synced: 11/22/2025, 11:47:55 PM
On a more serious note, this highlights a deeper issue with HN, similar sites, and the attention economy. When an article takes a lot of time to read:
- The only people commenting at first have not read it.
- By the time you are done reading it, it's no longer visible on the front page, so new people are not coming in anymore and the discussion appears dead. This discourages people who read the article from making thoughtful comments, because few people will read them.
- Some people wait for the discussion to die down so they can read it without missing the later thoughtful comments, but they are discouraged from participating earlier, while the discussion is alive, because then they'd have to wade through the constantly changing thread and separate what they have already seen from what they haven't.
---
Back on topic, I'd love to see this with weights from an actual working model and a customizable input text, so we could see how both the seed and the input affect the output. I'd also like a way to explore vectors representing "meanings" the way 3blue1brown did in his LLM videos.
https://youtu.be/KSovbSkARYw
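As a rough sketch of that "directions of meaning" idea, here is one way to poke at a real model's token-embedding table with Hugging Face transformers. The choice of GPT-2 and the expectation of clean analogies are assumptions on my part; GPT-2's input embeddings are not trained as word2vec-style analogy vectors, so treat the output as illustrative only.

```python
# Probe "directions of meaning" in a real model's token-embedding table.
# Uses GPT-2's input embeddings via Hugging Face transformers; these are not
# trained as word2vec-style analogy vectors, so the output is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
E = model.get_input_embeddings().weight.detach()    # (vocab_size, d_model)

def vec(word):
    ids = tokenizer(" " + word)["input_ids"]         # leading space -> one BPE token
    assert len(ids) == 1, f"{word!r} is not a single token"
    return E[ids[0]]

# king - man + woman: which tokens sit near that direction?
query = vec("king") - vec("man") + vec("woman")
sims = torch.nn.functional.cosine_similarity(query.unsqueeze(0), E)
for idx in sims.topk(5).indices:
    print(repr(tokenizer.decode(int(idx))), float(sims[int(idx)]))
```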
"Adding numbers. the green line are the weights.
At the top: the red circle indicates an incorrect answer. the green circle indicates a correct answer.
As the NN learns, the weights adjust and the green circle appears more often. "
"Guys, if this hammer works as advertised, you'll totally be fired"
"Ok, boss! Let me figure it out for you"
Man, kids these days.
My suggestion would be one of the gemma3 models:
https://ollama.com/library/gemma3/tags
Picking one whose size is smaller than your VRAM (or system memory, if you don't have a dedicated GPU) is a good rule of thumb. But you can always do more with less if you get into the settings for Ollama (or other tools like it).
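For context, a minimal sketch of querying a locally served Gemma 3 model through Ollama's HTTP API; the `gemma3:4b` tag and the prompt are placeholders, and it assumes Ollama is running on its default port with that model already pulled.

```python
# Minimal sketch: ask a locally served Gemma 3 model a question through
# Ollama's HTTP API. Assumes `ollama serve` is running on the default port
# (11434) and that a tag such as gemma3:4b has already been pulled;
# both the tag and the prompt are placeholders.
import json
import urllib.request

payload = {
    "model": "gemma3:4b",            # pick a tag that fits in your VRAM/RAM
    "prompt": "Explain attention in one paragraph.",
    "stream": False,                 # one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])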
It's already possible to run an LLM off chips, depending of course on the LLM and the chip.
I find the model to be extremely simple; you can write the attention equation on a napkin.
This is the core idea:
Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V
The attention process itself is based on an all-to-all similarity calculation, Q * K^T.
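To make the napkin equation concrete, here is a small NumPy sketch of single-head scaled dot-product attention; the shapes and random inputs are made up for illustration.

```python
# Single-head scaled dot-product attention, a direct transcription of
# softmax(Q K^T / sqrt(d_k)) V. Shapes and inputs are toy values.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # all-to-all similarities, (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
n, d_k, d_v = 4, 8, 8                                # 4 tokens, toy dimensions
Q = rng.normal(size=(n, d_k))
K = rng.normal(size=(n, d_k))
V = rng.normal(size=(n, d_v))
print(attention(Q, K, V).shape)                      # (4, 8)
```

Each row of the softmax output is a distribution over the keys, so the result mixes the value vectors according to how similar each query is to each key.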
Having said that, it's interesting to point out that the modules are what allow CPU offload. It's fairly common to run some parts on the CPU and others on the GPU/NPU/TPU, depending on your configuration. This has some performance cost but allows more flexibility.
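As a hedged illustration of that module-level offload, here is a sketch using the Hugging Face transformers + accelerate stack; the model id, the memory budgets, and the presence of a single GPU are assumptions, not details from the comment above.

```python
# Sketch of module-level CPU/GPU offload with Hugging Face transformers +
# accelerate: device_map="auto" places whole modules (embeddings, each
# transformer block, the LM head) on the GPU until the max_memory budget is
# hit, then spills the remaining modules to CPU RAM.
# The model id and memory limits below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; any causal LM from the Hub works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                        # per-module placement
    max_memory={0: "4GiB", "cpu": "16GiB"},   # GPU 0 budget, then CPU
)
print(model.hf_device_map)                    # which module landed where

inputs = tokenizer("Attention is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```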
Where does this come from, in the abstract/math sense? Did we not have it before, or did we just not consider it an avenue worth going down? Or is it simply that the idea of scraping the entirety of human knowledge wasn't considered until someone said, "well, we could just scrape everything"?
Were there recent breakthroughs in our understanding of ML that have led to this current explosion of research, pattern discovery, and refinement?
That's the stage we're at now, and it is the whole "scrape the entirety of human knowledge" thing. Compute has gotten good enough and data has become readily accessible enough to do all this, plus we have architectures like transformers that scale really nicely.
For multilayer backpropagation. It skips all the obtuse subscript jargon, which differs in every text out there!
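In that same matrix-form spirit (whole-matrix gradients, no per-weight subscripts), here is a minimal NumPy sketch of backpropagation through a two-layer network; the architecture, data, and learning rate are made up for illustration.

```python
# Two-layer network trained with matrix-form backpropagation: every gradient
# is a whole matrix, so there is no per-weight subscript bookkeeping.
# The architecture, data, and learning rate are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                   # 64 samples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy binary target

W1, b1 = rng.normal(scale=0.5, size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lr = 0.5

for step in range(500):
    # Forward pass
    H = np.tanh(X @ W1 + b1)                   # hidden activations, (64, 8)
    P = 1 / (1 + np.exp(-(H @ W2 + b2)))       # sigmoid outputs, (64, 1)

    # Backward pass for the loss L = mean((P - y)^2) / 2
    dP  = (P - y) / len(X)                     # dL/dP
    dZ2 = dP * P * (1 - P)                     # through the sigmoid
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
    dH  = dZ2 @ W2.T
    dZ1 = dH * (1 - H**2)                      # through the tanh
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

    # Gradient-descent update (in place)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= lr * grad

print("final loss:", float(((P - y) ** 2).mean() / 2))
```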
LLM Visualization - https://news.ycombinator.com/item?id=38505211 - Dec 2023 (131 comments)
How does it get from the ideas to the intelligence? What if we saw intelligence as the ideas themselves?
The Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/
Sebastian Raschka, PhD has a post on the architectures: https://magazine.sebastianraschka.com/p/from-gpt-2-to-gpt-os...
This HN comment has numerous resources: https://news.ycombinator.com/item?id=35712334