We Reverse-Engineered Flash Attention 4
Key topics: Flash Attention 4, GPU Optimization, Deep Learning
The post discusses the reverse-engineering of Flash Attention 4, a GPU optimization technique, sparking a discussion on the meaning of 'reverse-engineering' and the complexity of writing optimized GPU kernels.
Snapshot generated from the HN discussion
Discussion activity
- First comment: 1h after posting
- Peak period: 14 comments in the 2-4h window
- Average per period: 3.7 comments
- Comment distribution: 48 data points (based on 48 loaded comments)
Key moments
- Story posted: Sep 27, 2025 at 5:50 PM EDT (4 months ago)
- First comment: Sep 27, 2025 at 7:15 PM EDT (1h after posting)
- Peak activity: 14 comments in the 2-4h window
- Latest activity: Sep 29, 2025 at 3:36 AM EDT (3 months ago)
ID: 45399637 · Type: story · Last synced: 11/20/2025, 3:29:00 PM
Reductively, software engineering means taking an idea and mapping it into code. So one form of "reverse" engineering would be taking the code and extracting the ideas. That's what we did here.
Because the source is public, there's quite a lot to work with from the start -- the warp specializations are named and there are helpful comments in many places.
But for many components, we didn't have much. Maybe the clearest case of "reverse engineering" explained in the post is with the cubic approximation for the rational part of the exponentiation. That required staring at some inline assembly and doing math.
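To give a flavor of what that kind of approximation looks like, here is a generic sketch of the technique (these are not the coefficients actually recovered from the inline assembly, just an illustrative least-squares fit):

```python
# Rough sketch of the general trick, not the actual FA4 coefficients:
# approximate 2^x with a cubic polynomial on a reduced range.
import numpy as np

# Least-squares cubic fit to 2^f for f in [0, 1).
f_grid = np.linspace(0.0, 1.0, 1024)
coeffs = np.polyfit(f_grid, np.exp2(f_grid), deg=3)

def exp2_approx(x):
    # Range reduction: 2^x = 2^n * 2^f with n = floor(x) and f = x - n in [0, 1),
    # so only the fractional part needs the polynomial.
    n = np.floor(x)
    f = x - n
    return 2.0 ** n * np.polyval(coeffs, f)

x = np.linspace(-8.0, 8.0, 1001)
rel_err = np.abs(exp2_approx(x) - np.exp2(x)) / np.exp2(x)
print(rel_err.max())  # worst-case relative error of this illustrative fit
```

In a kernel, evaluating a cubic like this collapses to a handful of fused multiply-adds on the already range-reduced input, which is what made the inline assembly recognizable once we stared at it long enough.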
Not trying to be uncharitable; I found your article informative. Reverse engineering has historically been reserved for cases where there is an adversarial aspect, as with binaries or server APIs. Anyhow, cheers and thank you, sincerely.
Certainly I can't get on board with reverse engineered.
If you had reverse engineered it, you would have tried to recreate something that doesn't exist in order to do the same thing.
So, if you have binary code, you recreate source code that could, in theory, let you reproduce that binary.
If you have the source code, I suppose it would only count when you're missing pieces of information needed to run that code the way others do...
For example, simple hardware reversing can just be learning what something does, how, and why it works; you don't need to "recreate" anything other than the ideas.
Thus it was natural to call the process of producing design documents from undocumented software "reverse engineering". These days coding without any formal design documents is so common that it seems the original meaning of reverse engineering has become obscured.
https://en.wikipedia.org/wiki/IBM_Rational_Rose
> cudnn kernels are closed source, so Jensen only knows what’s going on in there.
Starting from high level source code is like starting from engineering drawings or the CAD model. You've already been handed most or all of the info that reverse engineering is attempting to recover.
I would get some pretty weird looks if I changed my CV to replace "maintained legacy application that I did not write" with "reverse engineering".
Similarly, I would get instant hoots of laughter if I told my dev managers over the last 28 years that I reverse engineered the legacy applications I was hired to work on.
I mean, I get what you're saying, but when you use the term "reverse engineering" in the context of software, you're just going to confuse everyone who already knows what it means.
Because I'm pretty sure most devs would not just read the code and go "ah yes, of course".
[1]: https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...
According to Wikipedia[1], "In 1990, the Institute of Electrical and Electronics Engineers (IEEE) defined (software) reverse engineering (SRE) as "the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction" in which the "subject system" is the end product of software development." It goes on to clarify that "Reverse engineering can be performed from any stage of the product cycle, not necessarily from the functional end product."
Further, "There are two components in reverse engineering: redocumentation and design recovery."
Are you arguing that the work here does not fit the definition or that the definition is wrong? In the latter case, could you please share your definition, and maybe even explain why it is superior to IEEE's?
[1] https://en.wikipedia.org/wiki/Reverse_engineering#Software
Though password cracking is not necessarily the best example, some (very bad!) hashing algorithms can actually be reversed that way. Figuring out the reverse is reverse engineering: you would reverse engineer the algorithm to figure out how to create a collision. In the same way, superoptimizers sort of reverse engineer the behavior you want in order to come up with a very efficient implementation. I'm using the term reverse engineer a bit loosely there, but you get the point. It has nothing to do with source code really; you can just as easily reverse engineer physical objects. Or artwork. Or the psyche.
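To make the "very bad hash" point concrete, here's a toy sketch (the constants and names are made up, and no real hash is this weak): a "hash" built only from invertible 32-bit steps can simply be run backwards.

```python
# Toy example: a deliberately terrible "hash" made of invertible 32-bit steps,
# so it can be reversed step by step. Purely illustrative.
MASK = 0xFFFFFFFF

def bad_hash(x: int) -> int:
    x = (x + 0x9E3779B9) & MASK   # add a constant (invertible: subtract it)
    x = (x ^ (x >> 16)) & MASK    # xor-shift (this one is its own inverse)
    x = (x * 0x85EBCA6B) & MASK   # multiply by an odd constant (invertible mod 2^32)
    return x

def un_bad_hash(h: int) -> int:
    h = (h * pow(0x85EBCA6B, -1, 2**32)) & MASK  # modular inverse undoes the multiply
    h = (h ^ (h >> 16)) & MASK                   # undo the xor-shift
    h = (h - 0x9E3779B9) & MASK                  # undo the addition
    return h

assert un_bad_hash(bad_hash(0xDEADBEEF)) == 0xDEADBEEF
```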
So yes, you can reverse engineer source code to understand on a deeper level how it works. Sometimes reading it over once or twice is enough for this, sometimes even reading the API documentation or observing behavior is enough, but sometimes you have to do a bit of thinking and/or testing to fully understand it.
If reverse engineering is reserved for cases without source code, which I assume also means no decompilation which often is an option, then what do we call figuring out what some piece of code does and why it does what it does? And why is it sufficiently different from reverse engineering to warrant a separate term?
As a fellow Tri Dao groupie and lucky duck who gets to build on Hopper/Blackwell clusters, I find it amazing how difficult it is becoming to write kernels that saturate GPU hardware.
When I squint, there appears to be a trend emerging across work like FA4, monolithic (mega) kernels, etc. Namely, a subversion of the classic CUDA programming model in the form of fine-grained, task-based parallelism, managed entirely in "user space".
Not exactly sure what’s ahead but I’m strapping in for a wild ride…
I was also reminded of HazyResearch's MegaKernels. Didn't want to distract from the main thrust of the post, but definitely think that's a promising approach.
A couple of years ago I did some experiments using a feed-forward network (MLP) as a surrogate for attention, to avoid the quadratic explosion.
It worked but had problems at the time, and my mind wasn't really in it.
This has dug it back out again with the benefit of time and additional insights.
So now I'm thinking: you can use a lot of the insights in the work here, but also shoot for a fully linear-scaling surrogate.
The trick is to use the surrogate as a discriminator under an RL regime during training.
Instead of just applying better/faster math and optimizations, have the model learn to work with a fundamentally better inference approach during training.
If you do that, you can turn the approximation error present in the FFN surrogate inference method into a recovery signal encoded into the model itself.
I haven't tried it, but don't see a reason it shouldn't work. Will give it a go on a GPT-2 model ASAP.
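For what it's worth, here is a minimal sketch of the kind of linear-scaling surrogate I have in mind; the module name, shapes, and pooling choice are all made up for illustration, and the RL/discriminator training signal described above isn't shown:

```python
# Hypothetical sketch of a linear-scaling attention surrogate: instead of the
# O(n^2) QK^T interaction, each token is combined with a pooled summary of the
# whole sequence and passed through a small MLP. One possible shape of the
# idea, not a tested design.
import torch
import torch.nn as nn

class MLPAttentionSurrogate(nn.Module):
    def __init__(self, d_model: int, d_hidden: int = 256):
        super().__init__()
        self.to_summary = nn.Linear(d_model, d_model)  # per-token features to pool
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        summary = self.to_summary(x).mean(dim=1, keepdim=True)  # (batch, 1, d_model), O(n)
        summary = summary.expand(-1, x.size(1), -1)             # broadcast to every token
        return self.mlp(torch.cat([x, summary], dim=-1))        # O(n) in sequence length

x = torch.randn(2, 128, 512)
print(MLPAttentionSurrogate(512)(x).shape)  # torch.Size([2, 128, 512])
```

The point of the RL step would be to train the base model against a surrogate like this so the approximation error becomes a signal the model learns to absorb, rather than pure degradation at inference time.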
Thanks again for the awesome article.
That question aside, I'm not a fan of this blog post's content at all. It might just be me being too slow, but I don't find it easy to understand: very little concrete information and a lot of digressions, like the constant references to research articles and related topics. It reads like those low-value research papers that try to show the authors did their work by piling on references.