Matrix Core Programming on AMD GPUs
Posted 3 months ago · Active 3 months ago
salykova.github.io · Tech · story
Tone: calm, mixed
Debate: 40/100
Key topics: GPU Programming · Matrix Multiplication · AMD Hardware
The article discusses programming matrix cores on AMD GPUs, sparking a discussion on the suitability of GPUs for matrix multiplication and the complexities of parallel processing.
Snapshot generated from the HN discussion
Discussion Activity
- Light discussion
- First comment: 6h after posting
- Peak period: 2 comments in the 6-8h window
- Avg comments per period: 1.3
Key moments
- 01 Story posted: Oct 4, 2025 at 5:22 PM EDT (3 months ago)
- 02 First comment: Oct 4, 2025 at 11:42 PM EDT (6h after posting)
- 03 Peak activity: 2 comments in the 6-8h window, the hottest stretch of the conversation
- 04 Latest activity: Oct 5, 2025 at 12:41 PM EDT (3 months ago)
ID: 45476821 · Type: story · Last synced: 11/20/2025, 12:59:45 PM
You're pretending that each streaming multiprocessor can handle independent threads, when in reality you're feeding something that exists only once or twice per SM. It's like independently controlling one of 32 cars on a 32-lane highway where the cars aren't allowed to switch lanes and the controls of one car are replicated to all the others, when in reality everyone is sitting in the same bus.
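The "everyone is sitting in the same bus" point is the SIMT execution model: all 32 lanes of a warp share one program counter, and a branch is handled by running both paths with per-lane masks. Here is a toy Python model of that lockstep-with-masking behavior (an illustration of the semantics only, not of any real hardware; `WARP_SIZE` and the branch body are made up for the example):

```python
# Toy model of SIMT lockstep execution. All 32 lanes share one set of
# "controls" (one program counter); on a branch the warp executes BOTH
# paths, and a mask decides which lanes keep which result.

WARP_SIZE = 32

def simt_execute(values):
    """Run `x * 2 if x is even else x + 1` on all lanes in lockstep."""
    mask = [v % 2 == 0 for v in values]           # lanes taking the 'then' path
    then_result = [v * 2 for v in values]         # whole warp runs this path
    else_result = [v + 1 for v in values]         # ...and this one too
    # Each lane keeps the result of the path its mask bit selects.
    return [t if m else e for m, t, e in zip(mask, then_result, else_result)]

lanes = list(range(WARP_SIZE))
print(simt_execute(lanes)[:4])  # → [0, 2, 4, 4]
```

Note that both list comprehensions run over every lane regardless of the mask, which is exactly why divergent branches waste throughput on real GPUs.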
My mental model of SMs has always been "assume AVX-512 is the default ISA" and "tensor cores are another layer alongside this" (kind of like AMX), so you end up with this heterogeneous "thing" to program. I don't know if that helps. The CUDA programming model hides a lot, and looking at the PTX code in Nsight Compute is most enlightening.
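The "tensor cores as another layer" idea maps to the article's subject: a matrix-core (MFMA-style) instruction lets the whole warp/wavefront issue one tile multiply-accumulate, D = A·B + C, rather than lane-wise scalar FMAs. A toy sketch of what one such instruction computes (the 16x16x4 FP32 tile shape is one real MFMA variant on AMD hardware, but this Python model only illustrates the semantics, not the register layout or the hardware):

```python
# Toy model of one MFMA-style matrix-core "instruction":
# a full tile multiply-accumulate D = A @ B + C in a single issue.
# Tile shape M x N x K = 16 x 16 x 4; larger matmuls are built by
# looping this over tiles and accumulating into C.

M, N, K = 16, 16, 4

def mfma_tile(A, B, C):
    """D[m][n] = sum_k A[m][k] * B[k][n] + C[m][n] for one tile."""
    return [[sum(A[m][k] * B[k][n] for k in range(K)) + C[m][n]
             for n in range(N)] for m in range(M)]

A = [[1.0] * K for _ in range(M)]   # 16x4 tile of ones
B = [[1.0] * N for _ in range(K)]   # 4x16 tile of ones
C = [[0.0] * N for _ in range(M)]   # 16x16 accumulator
D = mfma_tile(A, B, C)
print(D[0][0])  # → 4.0 (K ones accumulated)
```

Seen this way, the vector units are the "AVX-512 default" layer and the matrix cores are a coarser-grained layer stacked beside them, much like AMX tiles sit beside AVX-512 on recent Intel CPUs.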