Writing Speed-of-Light Flash Attention for 5090 in CUDA C++ | Not Hacker News!