Not Hacker News!
GPU Optimization
20 stories • 24h: 0% • 7d: 0 • 866 comments
Top contributors: hd4, sydriax, mmastrac, vinhnx, ashvardanian
Stories
20 stories tagged with gpu optimization
Alibaba Cloud Says It Cut Nvidia AI GPU Use by 82% with New Pooling System
523 points • 315 comments • by hd4
Posted 3 months ago • Active about 2 months ago
Tags: Artificial Intelligence, GPU optimization, cloud computing
We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU
504 points • 110 comments • by sydriax
Posted 3 months ago • Active about 2 months ago
Tags: GPU optimization, CUDA, LLM performance
FP8 Runs ~100 TFLOPS Faster When the Kernel Name Has "cutlass" in It
338 points • 166 comments • by mmastrac
Posted 3 months ago • Active about 2 months ago
Tags: GPU optimization, compiler controversy, NVIDIA
AMD GPUs Go Brrr
265 points • 92 comments • by vinhnx
Posted about 2 months ago • Active about 2 months ago
Tags: AMD GPUs, GPU optimization, AI hardware
Processing Strings 109x Faster Than Nvidia on H100
216 points • 26 comments • by ashvardanian
Posted 4 months ago • Active about 2 months ago
Tags: GPU optimization, string processing, performance engineering
Writing Speed-of-Light Flash Attention for 5090 in CUDA C++
159 points • 34 comments • by dsr12
Posted 5 months ago
Tags: CUDA, NVIDIA, GPU Optimization, Machine Learning
We Reverse-Engineered Flash Attention 4
134 points • 48 comments • by birdculture
Posted 4 months ago • Active about 2 months ago
Tags: Flash Attention 4, GPU optimization, deep learning
Optimizing Datalog for the GPU
127 points • 26 comments • by blakepelton
Posted 2 months ago • Active about 2 months ago
Tags: Datalog, GPU Optimization, High-Performance Computing
kvcached: Virtualized, Elastic KV Cache for LLM Serving on Shared GPUs
69 points • 13 comments • by Jrxing
Posted 3 months ago • Active about 2 months ago
Tags: GPU optimization, LLM serving, cache management
Writing High-Performance Matrix Multiplication Kernels for Blackwell
63 points • 6 comments • by lairv
Posted 3 months ago • Active about 2 months ago
Tags: high-performance computing, matrix multiplication, GPU optimization
Unweaving Warp Specialization on Modern Tensor Core GPUs
34 points • 4 comments • by rohany
Posted 4 months ago • Active about 2 months ago
Tags: GPU optimization, warp specialization, multi-stage pipelining
Processing Strings 109x Faster Than Nvidia on H100
34 points • 3 comments • by samspenc
Posted 4 months ago • Active about 2 months ago
Tags: GPU optimization, string processing, CUDA
GPUs: Anatomy of High Performance Matmul Kernels
30 points • 1 comment • by ai-epiphany
Posted 3 months ago • Active about 2 months ago
Tags: GPU optimization, matrix multiplication, high-performance computing
Matmul on Blackwell: Part 2 – Using Hardware Features to Optimize Matmul
23 points • 13 comments • by robertvc
Posted 4 months ago • Active about 2 months ago
Tags: GPU Optimization, Matrix Multiplication, NVIDIA Blackwell
I Just Trained a Physics-Based Earthquake Forecasting Model on a $1000 GPU
15 points • 7 comments • by ArchitectAI
Posted 2 months ago • Active about 2 months ago
Tags: earthquake forecasting, physics-based modeling, GPU optimization
Sharing Base Model in GPU VRAM Across Multiple Inference Stack Process [video]
7 points • 1 comment • by medicis123
Posted 4 months ago • Active about 2 months ago
Tags: GPU optimization, AI inference, VRAM management
Processing Strings 109x Faster Than Nvidia on H100
6 points • 0 comments • by binarymax
Posted 4 months ago • Active about 2 months ago
Tags: GPU optimization, string processing, performance benchmarking
Run gpt-oss-20b on 8GB GPUs
6 points • 0 comments • by anuarsh
Posted 4 months ago • Active about 2 months ago
Tags: Artificial Intelligence, LLMs, GPU Optimization
We Rebuilt MXFP8 MoE Kernels From Scratch to Run 3.5x Faster on Blackwell
5 points • 0 comments • by ecz
Posted 5 months ago
Tags: GPU optimization, AI performance, MoE kernels
HipKittens: Fast and Furious AMD Kernels
4 points • 1 comment • by pella
Posted 2 months ago • Active about 2 months ago
Tags: GPU Optimization, AMD Kernels, High-Performance Computing