Fp8 Runs ~100 Tflops Faster When the Kernel Name Has "cutlass" in It
Mood: heated
Sentiment: negative
Category: other
Key topics: A GitHub PR reveals that NVIDIA's GPU compiler optimizes code based on kernel names, specifically those containing 'cutlass', sparking controversy over unfair optimization practices and vendor lock-in.
Snapshot generated from the HN discussion
Discussion activity: very active
First comment: 1h after posting
Peak period: 143 comments in Day 1
Avg / period: 32
Based on 160 loaded comments
Key moments
1. Story posted: Oct 3, 2025 at 12:21 AM EDT (about 2 months ago)
2. First comment: Oct 3, 2025 at 1:39 AM EDT (1h after posting)
3. Peak activity: 143 comments in Day 1 (hottest window of the conversation)
4. Latest activity: Oct 7, 2025 at 1:29 AM EDT (about 2 months ago)
> By disassembly of ptxas, it is indeed hard-coded that they have logic like: strstr(kernel_name, "cutlass").
> it is likely that, this is an unstable, experimental, aggressive optimization by NVIDIA, and blindly always enabling it may produce some elusive bugs.
An optimization with a universal >=0 speedup across your entire suite of tests is a really hard thing to come by. Something is always going to have a negative speedup.
My experience is with non-Nvidia GPU systems, but this feels like a familiar situation. They probably found something that has great outcomes for one set of kernels, terrible outcomes for another, and no known reliable heuristic or modeling they could use to automatically choose.
> An optimization with a universal >=0 speedup across your entire suite of tests is a really hard thing to come by. Something is always going to have a negative speedup.
Maybe a common example of this is that people can write matrix-matrix multiplication kernels that outperform standard implementations (also in BLAS for CPU). But that's not a General Matrix Matrix multiply. Is the speedup still there for sparse matrices? Larger ones? Small ones? Ones that aren't powers of 2? Non-square? And so on. You can beat the official implementation in any one of these, but good luck doing it everywhere. In fact, you should beat the official method, because you don't have the overhead of checking which optimization to use. It's easy to oversimplify a problem and not even realize you have done so. There are always assumptions being made, and you should not let them be invisible.
Nvidia seems to get a pass. Why's that?
Nvidia are disabling optimisations on their own hardware. The motivation appears to be related to these optimisations being unsafe to apply to general code.
If they're intentionally slowing non-CUTLASS shaders, sure, pitchfork time.
If it's an option that /technically/ breaks the CUDA shader compatibility contract, then enabling it in specific "known good" situations is just business as usual for GPU drivers.
That can be for all kinds of reasons - straightforward bugs or incomplete paths in the optimization implementation, the app not actually needing the stricter parts of the contract so can have a faster path, or even bugs in apps that need workarounds.
Though piggybacking into these without understanding can be extremely fragile - you don't know why they've limited it, and you run the risk of tripping over some situation that will simply fail, either with incorrect results or something like a crash. And possibly in rather unexpected, unpredictable situations.
Fix dynamic channel list by passing auth via metadata
- Pass userId and userEmail in metadata when calling HTTP transport
- AuthenticatedToolsProviderFactory now reads from context.metadata
- Each tools/list request creates a fresh ToolsProvider with authentication
- Execute command description now correctly shows currently online machines
- Tested locally and working correctly

In our case the agent has access to Jira and has wider knowledge. For commit messages I don't bother that much anymore (I realise, typing this), but for the MRs I do. Here I have to instruct it to remove implementation details.
The agent can't do that if you told Claudepilotemini directly to make some change without telling it why you were prompting it to make such a change. LLMs might appear magic, but they aren't (yet) psychic.
He's saying that he likely has an MCP connected to jira on the LLM he's developing with.
Hence the prompt will have already referenced the jira ticket, which will include the why - and if not, you've got a different issue. Now the LLM will only need something like "before committing, check the jira ticket we're working on and create a commit message ...
But whether you actually want that is a different story. You're of the opinion it's useful; I'd say it's rarely going to be valuable, because requirements change, making this point-in-time rationale mostly interesting in an academic sense, but not actually valuable for the development you're doing.
It depends on a ton of factors, and I'd put so little stock in the validity of the commit message that it might as well not exist. (And this is from the perspective of human-written ones, not AI.)
∞ Try make the code sensible & readable so it can be the documentation.
∞ Comment well anyway, just in case it isn't as obvious to the reader (which might be me in a few months time) as it is to me when making the change. Excess comments can always be removed later (and, unless some idiot rewrites history, can potentially be referred to after removal if you have a “why t f” moment), comments you never write can't be found later.
∞ Either a directly meaningful commit message, or at very least ticket references to where more details can be found.
For personal tinkering, I'm a lot less fastidious.
It never actually is at any non-minimal scale (and not even the code authored by the people who claim code is self-documenting).
My comment was rhetorical and sarcastic.
/s
When the "why" isn't explained, you end up with things like someone refactoring code and spending time (at best) trying to figure out why some tests now fail or (at worst) breaking something in production.
I'd argue that even the "how" sometimes is better explained in plain words than in code (even if that opens the door for outdated comments when code is changed).
This is completely meaningless and just pollutes the log.
"[Tangentially related emoji] I have completed this fully functional addition to the project that is now working perfectly! There are now zero bugs and the system is ready for deployment to production! [Rocketship emoji]"
Then of course you test it out and it doesn't work at all! It's very grating. It would be more bearable if it hedged its claims a bit more (though maybe that would negatively affect the quality of the results: if training a model to output insecure code also makes it a murderous Hitler admirer, then, since humans hedge when their output is less likely to be perfect, training it to hedge may push the model to output code that is less than perfect).
This made me laugh so hard. Never trust an AI model saying “There are now zero bugs”! Weaponized incompetence? :)
As a side note, I absolutely am in love with GPT-5 and GPT-5-codex. When I talk to it, it feels like talking to a peer and not an over enthusiastic (but talented) junior with potential. GPT-5-codex on high has been exceptional at debugging insidious bugs.
If a human puts that, I doubt it. If I know they are using “AI” to fill in commit message I'll just assume it is a complete hallucination.
I use conventional commit formats for a reason, and the AI can’t even attempt it. I’m not even sure I’d trust it to get the right designation, like “fix(foo)!: increase container size”.
Mogball merged commit ade3d49 into main on Jul 9
They "made something ~100 tflops faster" and people's comments are "their commit messages are bad"? You guys would hate how John Carmack worked, too.
Would love to see Carmack's commit messages. Just the other day I unsuccessfully tried to look for pictures of his office newer than the Quake III era. Want to figure out his ergonomics for working (presumed) 10h days well into middle age.
They're mostly not exactly prose, but remember this was decades ago, when the dominant style of writing code in some places was still ye olde K&R C with one-letter variable names and goto everywhere.
backup="backup/$(git branch --show-current)/$(date +%s)"
git branch "$backup"
# do whatever you fancy
git reset --hard "$backup"
You can't lose anything as long as you have a pointer to it (which doubles as making it easy to find); the only thing that does keep a git object is having a ref that (directly or transitively) points to it.
git-bisect works best when every commit works, contains a single idea, and stacks into a linear history. These features are of most use in a publicly visible branch, which is why it is helpful to squash an entire pull request into a single, atomic commit, one which clearly defines the change from before- to after-this-feature.
You’re welcome to do whatever you like in your private branch of course, but once you are presenting work for someone else to review then it’s consistent with “I believe this is now complete, correct, working, and ready for review” to squash everything into a single commit. (The fact that code review tools show the sum of all the minor commits is a workaround for people that don’t do this, not a feature to support them!)
In terms of ‘git commit -m wip’: no one is saying you should wear a suit and tie around the house, but when you show up for your senate hearing, presenting yourself formally is as necessary as it is to leave the slides, sweat pants, and tee shirt at home.
Yes, commit early and often while in the flow of putting together a new idea or piece of work. When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
Or to use a different analogy: they don’t want every draft of your master’s thesis from start to finish and they’ll be annoyed if they have to fix basic typos for you that should’ve been caught before the final draft. They don’t care about the typos you already found either, nor how you fixed them. They just want the final draft and to discuss the ideas of the final draft!
Conversely if your master’s thesis or git branch contains multiple semantically meaningful changes — invent calculus then invent gravity / add foo-interface to lib_bar then add foo-login to homepage — then it probably ought to be two code reviews.
Disagree; git-bisect works best when every commit is small and most commits work. In particular, it works as long as any broken commit is likely to have a neighbour that works: isolated bad commits aren't a problem (that's what skip is for, and it's easy enough to include that in your script; you do automate your bisects, right?), but long chains of bad commits are. Squashing means your bisect will land on a squashed commit, when it's only really done half the job. (In particular, the very worst case, where every single one of your intermediate commits was broken, is the same as the case you get when you squash.)
> When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
And that's what the PR description is for! You don't have to destroy all your history to make one.
You’re right that GitHub, GitLab et al let you use their tooling to write the final commit message (for the merge commit or squash commit). My preference has always been to do that in git itself.
In both cases you end up with a single atomic commit that represents the approved change and its description. For me, the commit is created the moment a review is requested, instead of from the moment it is approved and landed. One reason this is particularly useful is that you can now treat the commit as if it had already landed on the main branch. (It is easier to share, cherry-pick, rebase, etc. — easier than doing so with a branch of many commits, in my experience.)
Prospective changes do not change type (from branch to squashed commit or merge commit) either, when they are approved, which simplifies these workflows.
> Prospective changes do not change type (from branch to squashed commit or merge commit) either, when they are approved, which simplifies these workflows.
You can do that even earlier if you simply never squash or otherwise edit history, which is my approach - any pushed feature branch is public as far as I'm concerned, and my colleagues are encouraged to pull them if they're e.g. working on the same area at the same time. It comes at the cost of having to actually revert when you need to undo a pushed commit (and, since cherry-pick is not an option, if you're making e.g. a standalone fix and want your colleagues to be able to pull it in unrelated branches, you have to think ahead a little and make it on a new branch based from master rather than sticking it in the middle of your incomplete feature branch), but it's very much worth it IME.
```
#!/bin/sh
# Bisect helper: skip commits whose messages are too short to be meaningful.
N=20
msg_len=$(git log -1 --pretty=%B | wc -c)
if [ "$msg_len" -lt "$N" ]; then
  exit 125  # tells git bisect to skip this commit
fi
# Here you would run your actual test and report 0 (good), 1 (bad) as needed
exit 0
```
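With that script saved as, say, `bisect-test.sh`, the whole bisect can be driven automatically via `git bisect run` (exit code 125 makes it auto-skip the "wip" commits). The following is a self-contained sketch: the repository contents, file names, and commit messages are all invented for the demo.

```shell
#!/bin/sh
# Demo: automate a bisect that skips short-message commits. Builds a
# throwaway repo where the bug lands in a well-described commit,
# followed by a "wip" commit that gets skipped.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo ok > f
git add f
git commit -qm "initial commit, known to be good"
echo broken > f
git commit -qam "introduce the bug with a descriptive message"
echo broken > g
git add g
git commit -qm "wip"
echo more >> g
git commit -qam "another descriptive commit message here"

# The bisect script: skip (125) commits with short messages, otherwise
# report good/bad based on the actual test (here: is f still "ok"?).
cat > bisect-test.sh <<'EOF'
#!/bin/sh
N=20
msg_len=$(git log -1 --pretty=%B | wc -c)
if [ "$msg_len" -lt "$N" ]; then
  exit 125
fi
grep -qx ok f
EOF
chmod +x bisect-test.sh

git bisect start HEAD HEAD~3
git bisect run ./bisect-test.sh   # lands on the descriptive bug commit
git bisect reset
```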
This is an extremely opinionated and time consuming way of working. Maybe in this context it makes sense (nvidia driver kernel somethings), but I don't think it's universally the best way to write code together.
Naturally, I have Opinions on the right way to use git, having used it since inception within various different contexts at various places, along with other VCSs. What works at one place won't be right for another, and vice versa, especially given the different skill levels of individuals and teams, the tools involved, and how much weight I have to review code and commits before they get accepted. What's important is that it should work for you, not the other way around.

Regardless of where I'm working, though, my local commit messages are total crap, "wip" being the most common, but I commit frequently. Importantly, before I do e.g. a slightly involved refactor, going back to see what it was before I started is trivial. Being a skilled operator of git is important to make it easy to run newly written tests against the old code. Being efficient at rebase -i, sorting commits into understandable chunks, and squashing minor commits to keep things clean is key.
I don't think every patch in a series has to work totally independently for every git repo, but what it comes down to is maintenance. There's nothing worse than digging around in git history, trying to figure out why things are how they are, only to dead end at a 3000 line commit from 5 years ago with the message "poop, lol". It's even worse when the person who did that was you!
Universally, what it comes down to is maintenance. That totally rushed prototype that was just for a demo has now been in production for years, and there's this weird bug with the new database. If you hate yourself, your job, future you, your colleagues, and everybody that comes after you, and you're no good at git, by all means, shit out 300 commits, don't squash, and have the PR message be totally useless. Also believe you're hot shit after one semester of boot camp and that no one else cares just because you want to go home, get high, and play xbox. (Not remotely saying that's you, but those people are out there.)
We could get all philosophical and try and answer the question of if there are any universal truths, nevermind universally best git commit practices.
I don't work where you work, don't know your team, or anybody's skill levels on it, so I'll just close with a couple thoughts. the tool is there to work for you, so learn to work with it not against it. Git bisect is your friend. And that it really sucks 4 years later to be confronted by totally useless commit messages on inappropriately sized commits (too big or too small) and have to guess at things in order to ship fixes to prod (on the prototype that was supposed to get thrown away but never did) and just hope and pray that you've guessed correctly.
Why do I care? Interesting question. I'm generally a person who cares, I guess. In this specific case it seems analogous to a mechanic having a tidy workshop. Would you leave your bicycle with someone head to toe in grease and tools strewn all over the place? I wouldn't.
If someone is git bisecting trying to track down a bug they want to land on the smallest code diff possible, not the whole feature change.
> In this specific case it seems analogous to a mechanic having a tidy workshop. Would you leave your bicycle with someone head to toe in grease and tools strewn all over the place? I wouldn't.
Generally I'd be suspicious of a mechanic whose workspace looked too tidy. There's a balance, but it's very easy to keep your tools organised if you're not actually using them much.
A code reviewer doesn't care that you spent 10 days across 100 commits tweaking some small piece of code, the code reviewer cares about what you ended up with.
The bisecter probably wants to compile the code at every commit to try and see if they can reproduce a bug. Commits which don't compile force the bisecter to step through commits one by one until they find one which compiles, linearizing a process which should be logarithmic.
Those are bad, don't do that. (At least don't do long chains of them. The occasional isolated one is ok)
I still find writing a "good" commit message to be a much heavier process than making sure that it compiles.
- Human readers don't care about the details of everything you tried
- Commits that both don't compile and have no useful message are of very little value
But I don't think that's an argument against making small commits with uninformative messages, because making those commits without breaking compilation - or even explicitly checking that it compiles before committing - is much easier than coming up with a full commit message. And small commits that do compile - or even small commits of which a significant proportion compile - are very useful for bisect.
Not unless they're literally just e.g. making a change and undoing it. In general 100 tiny commits with messages like "x" don't make human PR review any harder and they make (automated) bisect to find a bug better.
If you change 2 lines 8 times to check something, just squash the commits; it saves everyone the hassle.
Message doesn’t need to be fancy, but it should describe what you did. Being unable to articulate your actions is a thought smell. It’s often seen when the developer is trying stuff until it sticks and needs a commit because the only way to test the fix is in a testing environment, two bad practices.
Sometimes when people speak rhetorically I'm baffled, because I feel they literally do not understand what they're saying: they end up supporting the opposing rhetorical purpose. Yes, you're 100% correct that well-structured commits are a luxury most of us do not have the privilege of experiencing, because we work in high-pressure, deadline-driven environments where no points are awarded for beautifully crafted commit messages.
So in effect your argument is like "people that haven't had the luxury of a Michelin star restaurant don't appreciate amuse bouche and they should strive to rectify that".
You claimed that “literally no one” has a different review workflow than yours. I do, and my experience is that clear commits make reviews both faster and deeper, which is very helpful specifically in a high-pressure, deadline driven environment where being slow and wrong is costly. You’re of course free to disagree and work differently.
To use restaurants as the analogy, Michelin star-grade dining might be unavailable, and we might have to live the Olive Garden, or even McDonald's life. Regardless of which restaurant we're at though, if the food is moldy and gross, we shouldn't eat it.
Isn't your interpretation backwards in some cases? What I mean is that _because_ you see the intermediate commits are garbage, you _then_ decide not to review the individual commits (because you are interested in the contribution anyway).
I certainly do care for the hobby FOSS projects I maintain, and bad commit messages + mega-commits won't fly at my day job.
Squash-merging has the advantage of making one PR equal one commit with the PR ID in the commit message, sure, but it unfortunately promotes bad Git hygiene (and works around it).
For an illustration of the scale of this, search GitHub for 'commit by commit': https://github.com/search?q=%22commit+by+commit%22&type=pull... (2M results)
At some point, you will have something working that makes sense to clean up. Then use interactive rebase to create one or a few commits that "make sense". What "makes sense" is one of those topics that could fill a whole bike garage, but you and your team will have some agreement on it. One thing that I like is to keep pure refactorings by themselves. No one cares to review that you've fixed typos in old variable names and things like that. If it's a separate commit, you can just skip over it.
Depending on if you are completely done or not, the resulting branch can be sent as a PR/MR. Make sure that all commits have a reason why the change was made. No reason to repeat what the code says or generate some AI slop message. Your knowledge of why a change was done in a certain way is the most valuable part.
Of course, this way of working fits my work, that is not cloud based in any way and with lots of legacy code. It creates git history that I would like to have if I have to take over old projects or if I have to run git bisect on an unfamiliar code base and figure out some obscure bug. You might have a completely different technology stack and business, where it makes sense to work in some other way with git.
You can do whatever you like locally. That's the luxury of git: it's distributed so you always get your own playground. You can make commits with short names if you find that useful. Personally I prefer to track my progress using a todo system, so I don't need commits to tell me where I am. I use `stash` instead of committing broken stuff if I need to switch branches.
I've found the idea of "rebase later" always easier said than done. In my experience if you work like that you'll more often than not end up just squashing everything into one commit. I prefer to rebase as I go. I'll rebase multiple times a day. Rearranging commits, amending the most recent commit etc. Keeping on top of it is the best way to achieve success at the end. It's like spending that bit of extra time putting your tools away or sweeping the floor. It pays off in the long run.
https://web.archive.org/web/20230929180112/https://techrepor...
https://web.archive.org/web/20011108190056/https://hardocp.c...
https://web.archive.org/web/20011118183932/www.3dcenter.de/a...
ATI did not rename Quake to Quack as I originally thought from this! :)
Page 1 https://web.archive.org/web/20071028172853/http://techreport...
Page 2 https://web.archive.org/web/20111130162817/http://techreport...
Page 3 https://web.archive.org/web/20080213212637/http://techreport...
Page 4 https://web.archive.org/web/20101110031431/http://techreport...
Page 5 https://web.archive.org/web/20101108144857/http://techreport...
When it comes to computer graphics, iirc it's pretty normalized now - graphics drivers all seem to have tweaks, settings, optimizations and workarounds for every game.
(As an aside, I hate that I have to link to archive.org, there's a lot of dead links nowadays but these are important things to remember).
[0] https://web.archive.org/web/20250306120819/https://www.anand...
[1] https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
[2] https://web.archive.org/web/20051218120547/http://techreport...
[3] https://www.servethehome.com/impact-of-intel-compiler-optimi...
Maybe hyperbole, but obviously they can't do this for literally every game; that would require huge personnel resources. At least looking at Mesa (linked elsewhere), only ~200 games are patched, out of what, 100k PC games? So <1%.
Even Mesa has them: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/uti...
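For a sense of what those Mesa workarounds look like, entries in that file follow the driconf XML format, roughly as below. The application name and executable here are invented for illustration; the option name is a real driconf knob, but which games it is applied to is not claimed here.

```xml
<driconf>
  <device>
    <!-- Per-application workaround: matched on executable name, exactly
         the kind of name-based special-casing discussed in the thread. -->
    <application name="Some Game" executable="somegame.exe">
      <option name="glsl_correct_derivatives_after_discard" value="true"/>
    </application>
  </device>
</driconf>
```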
These changes are supposed to have minimal to no impact on the actual output, but sometimes vendors are really aggressive, and significantly degrade the outputs so that the game can run faster on their hardware.
Do you have a source for this? I’d like to see some examples
E.g. Frostpunk has antialiasing for transparency layers on; Slay the Spire does not. I never set these settings. Nvidia literally does a lookup on first run for what they judge as the best defaults and sets these appropriately.
Every single game/program you install has different options from a huge list of possible optimizations.
[1]: https://github.com/KhronosGroup/Vulkan-Headers/blob/main/inc...
Funnily, it's under an older submission of the same cutlass optimizations.
And despite it not being nice, some optimizations rely on type or function name schemas/substrings/etc.
It sucks, but that's how it works.
It doesn't have to be malicious; sometimes it is safer to deploy an optimization only for your own libs than to risk breaking stuff.
Or your frontend is not giving you more data you can rely on.
E.g "__nvidia_experimental_feature_xyz_v1"
if(AskLLM("Does function signature+name look like error handling code")) {
TurnOffInliner();
}
is actually probably a lot more effective than you'd think (generating PGO traces with a machine learning tool is apparently a thing that sort of works)... until somebody randomly chooses the same name for some reason and gets hosed.
You're not helping.
The marginal costs per user are very small or even zero for desktop applications. This means that software needs a funding structure with periodic payments, but at the same time the payments shouldn't grow with the number of users. There also needs to be a way for the initial investors who pay for the creation of new features or entire code bases to get their money back as the product becomes popular.
This in itself is not problematic, but it is not covered by traditional crowdfunding. The problem is that the funding goal needs to be met no matter what, and the contribution per user shrinks as more users contribute. You can't expect everyone to chip in 100%, 10% or even 1% of the funding cost, since that could be thousands of dollars even at the minimum. You need some sort of auctioning process where people can pledge a fixed quantity and if the user count is low enough, their pledge counts, otherwise it doesn't.
This has one problem, though: the transition from the exclusive to the non-exclusive mode. There will be freeloaders: someone might pitch in five dollars knowing that five big corporations have already chipped in and covered the full development cost, leading to the entire codebase being open sourced. Everyone else is a freeloader too, including cheapskate corporations.
When you talk to the compiler team about it, they may offer a range of solutions, some of which may not be applicable to open source code. Picture proprietary #pragmas, intrinsics, or whatnot. What do you do? You can't ship a high-performance library that doesn't deliver high performance. It is then that you rely on things like function names to enable specific code transformations that can't be used in general, because they would sometimes break third-party code.
I never worked on Cutlass, but this is the sort of thing that is done in the real world.
There is nothing nefarious about this sort of optimization. People comparing this to cheating on benchmarks by rendering lower-quality images are not on the right track.
- Intel faced a "cheating compiler" controversy when SPEC, the benchmark standard-setter, invalidated over 2,600 benchmark results for Intel Xeon processors in early 2024. ( https://www.tomshardware.com/pc-components/cpus/spec-invalid... )
- Microsoft doing similar things (Java benchmarks, C compiler benchmarks)
- and everybody cheating on AI benchmarks (https://www.thestack.technology/ai-benchmarking-scandal-were...)