Not Hacker News!
GPU Optimization | Trending Topic on Hacker News | Not Hacker News!
GPU Optimization
20 stories • 24h: 0% • 7d: 0 • 865 comments
Top contributors: hd4, sydriax, mmastrac, vinhnx, ashvardanian
20 stories tagged with GPU optimization

Alibaba Cloud Says It Cut Nvidia AI GPU Use by 82% with New Pooling System
523 • 315 comments • by hd4 • 1mo ago
Tags: AI, GPU optimization, cloud computing

We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU
504 • 110 comments • by sydriax • 1mo ago
Tags: GPU optimization, CUDA, LLM performance

FP8 Runs ~100 TFLOPS Faster When the Kernel Name Has "cutlass" in It
338 • 166 comments • by mmastrac • 1mo ago
Tags: GPU optimization, compiler controversy, NVIDIA

AMD GPUs Go Brrr
265 • 92 comments • by vinhnx • 12d ago
Tags: AMD GPUs, GPU optimization, AI hardware

Processing Strings 109x Faster Than Nvidia on H100
216 • 26 comments • by ashvardanian • 2mo ago
Tags: GPU optimization, string processing, performance engineering

Writing Speed-of-Light Flash Attention for 5090 in CUDA C++
159 • 34 comments • by dsr12 • 3mo ago
Tags: CUDA, NVIDIA, GPU Optimization, Machine Learning

We Reverse-Engineered Flash Attention 4
134 • 48 comments • by birdculture • 2mo ago
Tags: Flash Attention 4, GPU optimization, deep learning

Optimizing Datalog for the GPU
127 • 26 comments • by blakepelton • 22d ago
Tags: Datalog, GPU Optimization, High-Performance Computing

kvcached: Virtualized, Elastic KV Cache for LLM Serving on Shared GPUs
69 • 13 comments • by Jrxing • 1mo ago
Tags: GPU optimization, LLM serving, cache management

Writing High-Performance Matrix Multiplication Kernels for Blackwell
63 • 6 comments • by lairv • 1mo ago
Tags: high-performance computing, matrix multiplication, GPU optimization

Unweaving Warp Specialization on Modern Tensor Core GPUs
34 • 4 comments • by rohany • 2mo ago
Tags: GPU optimization, warp specialization, multi-stage pipelining

Processing Strings 109x Faster Than Nvidia on H100
34 • 3 comments • by samspenc • 2mo ago
Tags: GPU optimization, string processing, CUDA

GPUs: Anatomy of High Performance Matmul Kernels
30 • 1 comment • by ai-epiphany • 1mo ago
Tags: GPU optimization, matrix multiplication, high-performance computing

Matmul on Blackwell: Part 2 – Using Hardware Features to Optimize Matmul
23 • 13 comments • by robertvc • 2mo ago
Tags: GPU Optimization, Matrix Multiplication, NVIDIA Blackwell

I Just Trained a Physics-Based Earthquake Forecasting Model on a $1000 GPU
15 • 7 comments • by ArchitectAI • 23d ago
Tags: earthquake forecasting, physics-based modeling, GPU optimization

Sharing Base Model in GPU VRAM Across Multiple Inference Stack Process [video]
7 • 1 comment • by medicis123 • 2mo ago
Tags: GPU optimization, AI inference, VRAM management

Run gpt-oss-20b on 8GB GPUs
6 • 0 comments • by anuarsh • 2mo ago
Tags: AI, LLMs, GPU Optimization

Processing Strings 109x Faster Than Nvidia on H100
6 • 0 comments • by binarymax • 2mo ago
Tags: GPU optimization, string processing, performance benchmarking

We Rebuilt MXFP8 MoE Kernels From Scratch to Run 3.5x Faster on Blackwell
5 • 0 comments • by ecz • 3mo ago
Tags: GPU optimization, AI performance, MoE kernels

Implementing a Fast Tensor Core Matmul on the Ada Architecture
4 • 0 comments • by skidrow • 1mo ago
Tags: GPU Optimization, Tensor Cores, NVIDIA Ada Architecture