Not Hacker News!

AI-observed conversations & context

Daily AI-observed summaries, trends, and audience signals pulled from Hacker News so you can see the conversation before it hits your feed.

Live • Beta

Explore

Home
Hiring
Products
Companies
Discussion
Q&A
Privacy Policy

Resources

Visit Hacker News
HN API
Modal cronjobs
Meta Llama

Briefings

Inbox recaps on the loudest debates & under-the-radar launches.

Connect

© 2026 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.

GPU Optimization

20 stories • 24h: 0% • 7d: 0 • 866 comments

Top contributors: hd4, sydriax, mmastrac, vinhnx, ashvardanian

Related Stories

20 stories tagged with GPU Optimization

Alibaba Cloud Says It Cut Nvidia AI GPU Use by 82% with New Pooling System

523 points • 315 comments • by hd4

Posted 3 months ago • Active about 2 months ago

We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU

504 points • 110 comments • by sydriax

Posted 3 months ago • Active about 2 months ago

FP8 Runs ~100 TFLOPS Faster When the Kernel Name Has "cutlass" in It

338 points • 166 comments • by mmastrac

Posted 3 months ago • Active about 2 months ago

AMD GPUs Go Brrr

265 points • 92 comments • by vinhnx

Posted about 2 months ago • Active about 2 months ago

Processing Strings 109x Faster Than Nvidia on H100

216 points • 26 comments • by ashvardanian

Posted 4 months ago • Active about 2 months ago

Writing Speed-of-Light Flash Attention for 5090 in CUDA C++

159 points • 34 comments • by dsr12

Posted 5 months ago

We Reverse-Engineered Flash Attention 4

134 points • 48 comments • by birdculture

Posted 4 months ago • Active about 2 months ago

Optimizing Datalog for the GPU

127 points • 26 comments • by blakepelton

Posted 2 months ago • Active about 2 months ago

kvcached: Virtualized, Elastic KV Cache for LLM Serving on Shared GPUs

69 points • 13 comments • by Jrxing

Posted 3 months ago • Active about 2 months ago

Writing High-Performance Matrix Multiplication Kernels for Blackwell

63 points • 6 comments • by lairv

Posted 3 months ago • Active about 2 months ago

Unweaving Warp Specialization on Modern Tensor Core GPUs

34 points • 4 comments • by rohany

Posted 4 months ago • Active about 2 months ago

Processing Strings 109x Faster Than Nvidia on H100

34 points • 3 comments • by samspenc

Posted 4 months ago • Active about 2 months ago

GPUs: Anatomy of High Performance Matmul Kernels

30 points • 1 comment • by ai-epiphany

Posted 3 months ago • Active about 2 months ago

Matmul on Blackwell: Part 2 – Using Hardware Features to Optimize Matmul

23 points • 13 comments • by robertvc

Posted 4 months ago • Active about 2 months ago

I Just Trained a Physics-Based Earthquake Forecasting Model on a $1000 GPU

15 points • 7 comments • by ArchitectAI

Posted 2 months ago • Active about 2 months ago

Sharing Base Model in GPU VRAM Across Multiple Inference Stack Process [video]

7 points • 1 comment • by medicis123

Posted 4 months ago • Active about 2 months ago

Processing Strings 109x Faster Than Nvidia on H100

6 points • 0 comments • by binarymax

Posted 4 months ago • Active about 2 months ago

Run gpt-oss-20b on 8GB GPUs

6 points • 0 comments • by anuarsh

Posted 4 months ago • Active about 2 months ago

We Rebuilt MXFP8 MoE Kernels From Scratch to Run 3.5x Faster on Blackwell

5 points • 0 comments • by ecz

Posted 5 months ago

HipKittens: Fast and Furious AMD Kernels

4 points • 1 comment • by pella

Posted 2 months ago • Active about 2 months ago
