Writing High-Performance Matrix Multiplication Kernels for Blackwell
Posted 3 months ago · Active 3 months ago
docs.jax.dev · Tech · story
excited · positive
Debate: 20/100
Key topics
High-Performance Computing
Matrix Multiplication
GPU Optimization
The post discusses the implementation of high-performance matrix multiplication kernels for NVIDIA's Blackwell GPUs using JAX's Pallas, sparking interest and comparisons to previous work on CUDA matrix multiplication.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 4d
Peak period: 5 comments in 96-108h
Avg / period: 3
Key moments
- 01 Story posted: Oct 2, 2025 at 11:43 AM EDT (3 months ago)
- 02 First comment: Oct 6, 2025 at 4:54 PM EDT (4d after posting)
- 03 Peak activity: 5 comments in 96-108h (hottest window of the conversation)
- 04 Latest activity: Oct 7, 2025 at 5:03 PM EDT (3 months ago)
ID: 45451217 · Type: story · Last synced: 11/20/2025, 2:49:46 PM
Seems like the Pallas of old has been completely upgraded.
What's interesting is that the MGPU (Mosaic GPU) team has achieved SOTA Blackwell GEMM performance before Triton (which, IIUC, is trying to bring up Gluon to reach the same level). All the big players are coming up with their own block-based, low-level-ish DSLs for CUDA: OpenAI, NVIDIA, and now Google.
I wonder if the same person wrote it.
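For readers who have not seen Pallas, the sketch below shows the general shape of the block-based DSL the comments refer to: a kernel operating on one output tile, launched over a grid with BlockSpecs. It is illustrative only; the block sizes, the full-K BlockSpecs, and the plain jnp.dot are assumptions, not the article's tuned Blackwell kernel.

```python
# Minimal Pallas matmul sketch, for flavor only: NOT the article's Blackwell
# kernel. Block sizes, full-K BlockSpecs, and plain jnp.dot are illustrative
# assumptions; the real kernels tile K and use Blackwell-specific features.
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl


def matmul_kernel(x_ref, y_ref, o_ref):
    # Each program instance computes one (block_m, block_n) output tile.
    o_ref[...] = jnp.dot(x_ref[...], y_ref[...])


def matmul(x, y, block_m=128, block_n=128):
    m, k = x.shape
    _, n = y.shape
    return pl.pallas_call(
        matmul_kernel,
        out_shape=jax.ShapeDtypeStruct((m, n), x.dtype),
        grid=(m // block_m, n // block_n),
        in_specs=[
            pl.BlockSpec((block_m, k), lambda i, j: (i, 0)),  # row block of x
            pl.BlockSpec((k, block_n), lambda i, j: (0, j)),  # column block of y
        ],
        out_specs=pl.BlockSpec((block_m, block_n), lambda i, j: (i, j)),
    )(x, y)


x = jnp.ones((256, 512), jnp.float32)
y = jnp.ones((512, 256), jnp.float32)
print(matmul(x, y).shape)  # (256, 256)
```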