Fp8 Runs ~100 Tflops Faster When the Kernel Name Has "cutlass" in It
Mood: heated
Sentiment: negative
Category: other
Key topics: A GitHub PR reveals that NVIDIA's GPU compiler optimizes code based on kernel names, specifically those containing 'cutlass', sparking controversy over unfair optimization practices and vendor lock-in.
Snapshot generated from the HN discussion
Discussion activity: very active
First comment: 1h after posting
Peak period: 143 comments in Day 1
Avg / period: 32
Based on 160 loaded comments
Key moments
1. Story posted: Oct 3, 2025 at 12:21 AM EDT (about 2 months ago)
2. First comment: Oct 3, 2025 at 1:39 AM EDT (1h after posting)
3. Peak activity: 143 comments in Day 1 (hottest window of the conversation)
4. Latest activity: Oct 7, 2025 at 1:29 AM EDT (about 2 months ago)
> By disassembly of ptxas, it is indeed hard-coded that they have logic like: strstr(kernel_name, "cutlass").
> it is likely that, this is an unstable, experimental, aggressive optimization by NVIDIA, and blindly always enabling it may produce some elusive bugs.
An optimization with a universal >=0 speedup across your entire suite of tests is a really hard thing to come by. Something is always going to have a negative speedup.
My experience is with non-Nvidia GPU systems, but this feels like a familiar situation. They probably found something that has great outcomes for one set of kernels, terrible outcomes for another, and no known reliable heuristic or modeling they could use to automatically choose.
> An optimization with a universal >=0 speedup across your entire suite of tests is a really hard thing to come by. Something is always going to have a negative speedup.
Maybe a common example of this is that people can write matrix-matrix multiplication kernels that outperform standard implementations (also in BLAS for CPU). But that's not a General Matrix Matrix multiply. Is the speedup still there for sparse matrices? Larger ones? Small ones? Ones that aren't powers of 2? Non-square? And so on. You can beat the official implementation in any one of these, but good luck doing it everywhere. In fact, you should beat the official method, because you don't have the overhead of checking which optimization to use. It's easy to oversimplify a problem and not even realize you have done so. There are always assumptions being made, and you should not let them be invisible.
Nvidia seems to get a pass. Why's that?
Nvidia are disabling optimisations on their own hardware. The motivation appears to be related to these optimisations being unsafe to apply to general code.
If they're intentionally slowing non-CUTLASS shaders, sure, pitchfork time.
If it's an option that /technically/ breaks the CUDA shader compatibility contract, then enabling it in specific "known good" situations is just business as usual for GPU drivers.
That can be for all kinds of reasons - straightforward bugs or incomplete paths in the optimization implementation, the app not actually needing the stricter parts of the contract so can have a faster path, or even bugs in apps that need workarounds.
Though piggybacking into these without understanding can be extremely fragile - you don't know why they've limited it, and you run the risk of tripping over some situation that will simply fail, either with incorrect results or something like a crash. And possibly in rather unexpected, unpredictable situations.
Fix dynamic channel list by passing auth via metadata
- Pass userId and userEmail in metadata when calling HTTP transport
- AuthenticatedToolsProviderFactory now reads from context.metadata
- Each tools/list request creates a fresh ToolsProvider with authentication
- Execute command description now correctly shows currently online machines
- Tested locally and working correctly

In our case the agent has access to Jira and has wider knowledge. For commit messages I don't bother that much anymore (I realise, typing this), but for the MRs I do. Here I have to instruct it to remove implementation details.
The agent can't do that if you told Claudepilotemini directly to make some change without telling it why you were prompting it to make such a change. LLMs might appear magic, but they aren't (yet) psychic.
He's saying that he likely has an MCP connected to jira on the LLM he's developing with.
Hence the prompt will have already referenced the jira ticket, which will include the why - and if not, you've got a different issue. Now the LLM will only need something like "before committing, check the jira ticket we're working on and create a commit message ...
But whether you actually want that is a different story. You're of the opinion it's useful; I'd say it's rarely going to be valuable, because requirements change, making this point-in-time rationale mostly interesting in an academic sense, but not actually valuable for the development you're doing.
It depends on a ton of factors, and I'd put so little stock in the validity of the commit message that it might as well not exist. (And this is from the perspective of human-written ones, not AI.)
∞ Try make the code sensible & readable so it can be the documentation.
∞ Comment well anyway, just in case it isn't as obvious to the reader (which might be me in a few months time) as it is to me when making the change. Excess comments can always be removed later (and, unless some idiot rewrites history, can potentially be referred to after removal if you have a “why t f” moment), comments you never write can't be found later.
∞ Either a directly meaningful commit message, or at very least ticket references to where more details can be found.
For personal tinkering, I'm a lot less fastidious.
It never actually is at any non-minimal scale (and not even the code authored by the people who claim code is self-documenting).
My comment was rhetorical and sarcastic.
/s
When the "why" isn't explained, you end up with things like someone refactoring code and spending time (at best) trying to figure out why some tests now fail or (at worst) breaking something in production.
I'd argue that even the "how" sometimes is better explained in plain words than in code (even if that opens the door for outdated comments when code is changed).
This is completely meaningless and just pollutes the log.
"[Tangentially related emoji] I have completed this fully functional addition to the project that is now working perfectly! There are now zero bugs and the system is ready for deployment to production! [Rocketship emoji]"
Then of course you test it out and it doesn't work at all! It's very grating. It would be more bearable if it hedged its claims a bit more (though maybe that would negatively affect the quality of the results: if training a model to output insecure code also makes it a murderous Hitler admirer, then, since humans hedge when their output is less likely to be perfect, training it to hedge may push the model to output code that is less than perfect).
This made me laugh so hard. Never trust an AI model saying “There are now zero bugs”! Weaponized incompetence? :)
As a side note, I absolutely am in love with GPT-5 and GPT-5-codex. When I talk to it, it feels like talking to a peer and not an over enthusiastic (but talented) junior with potential. GPT-5-codex on high has been exceptional at debugging insidious bugs.
If a human puts that, I doubt it. If I know they are using “AI” to fill in commit message I'll just assume it is a complete hallucination.
I use conventional commit formats for a reason, and the AI can’t even attempt it. I’m not even sure I’d trust it to get the right designation, like “fix(foo)!: increase container size”.
Mogball merged commit ade3d49 into main on Jul 9
They "made something ~100 tflops faster" and people's comments are "their commit messages are bad"? You guys would hate how John Carmack worked, too.
Would love to see Carmack's commit messages. Just the other day I unsuccessfully tried to look for pictures of his office newer than the Quake III era. Want to figure out his ergonomics for working (presumed) 10h days well into middle age.
They're mostly not exactly prose, but remember this was decades ago, when the dominant style of writing code in some places was still ye olde K&R C with one-letter variable names and goto everywhere.
backup="backup/$(git branch --show-current)/$(date +%s)"
git branch "$backup"
# do whatever you fancy
git reset --hard "$backup"
You can't lose anything as long as you have a pointer to it (which doubles as making it easy to find); the only thing that does keep a git object is having a ref that (directly or transitively) points to it.
git-bisect works best when every commit works, contains a single idea, and stacks into a linear history. These features are of most use in a publicly visible branch, which is why it is helpful to squash an entire pull request into a single, atomic commit, one which clearly defines the change from before- to after-this-feature.
You’re welcome to do whatever you like in your private branch of course, but once you are presenting work for someone else to review then it’s consistent with “I believe this is now complete, correct, working, and ready for review” to squash everything into a single commit. (The fact that code review tools show the sum of all the minor commits is a workaround for people that don’t do this, not a feature to support them!)
In terms of ‘git commit -m wip’: no one is saying you should wear a suit and tie around the house, but when you show up for your senate hearing, presenting yourself formally is as necessary as it is to leave the slides, sweat pants, and tee shirt at home.
Yes, commit early and often while in the flow of putting together a new idea or piece of work. When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
Or to use a different analogy: they don’t want every draft of your master’s thesis from start to finish and they’ll be annoyed if they have to fix basic typos for you that should’ve been caught before the final draft. They don’t care about the typos you already found either, nor how you fixed them. They just want the final draft and to discuss the ideas of the final draft!
Conversely if your master’s thesis or git branch contains multiple semantically meaningful changes — invent calculus then invent gravity / add foo-interface to lib_bar then add foo-login to homepage — then it probably ought to be two code reviews.
Disagree; git-bisect works best when every commit is small and most commits work. In particular, it works as long as any broken commit is likely to have a neighbour that works: isolated bad commits aren't a problem (that's what skip is for, and it's easy enough to include that in your script; you do automate your bisects, right?), but long chains of bad commits are. Squashing means your bisect will land on a squashed commit, when it's only really done half the job. (In particular, the very worst case, where every single one of your intermediate commits was broken, is the same as the case you get when you squash.)
> When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
And that's what the PR description is for! You don't have to destroy all your history to make one.
You’re right that GitHub, GitLab et al let you use their tooling to write the final commit message (for the merge commit or squash commit). My preference has always been to do that in git itself.
In both cases you end up with a single atomic commit that represents the approved change and its description. For me, the commit is created the moment a review is requested, instead of from the moment it is approved and landed. One reason this is particularly useful is that you can now treat the commit as if it had already landed on the main branch. (It is easier to share, cherry-pick, rebase, etc. — easier than doing so with a branch of many commits, in my experience.)
Prospective changes do not change type (from branch to squashed commit or merge commit) either, when they are approved, which simplifies these workflows.
> Prospective changes do not change type (from branch to squashed commit or merge commit) either, when they are approved, which simplifies these workflows.
You can do that even earlier if you simply never squash or otherwise edit history, which is my approach - any pushed feature branch is public as far as I'm concerned, and my colleagues are encouraged to pull them if they're e.g. working on the same area at the same time. It comes at the cost of having to actually revert when you need to undo a pushed commit (and, since cherry-pick is not an option, if you're making e.g. a standalone fix and want your colleagues to be able to pull it in unrelated branches, you have to think ahead a little and make it on a new branch based from master rather than sticking it in the middle of your incomplete feature branch), but it's very much worth it IME.
```
#!/bin/sh
# Bisect helper: skip commits whose messages are too short to be meaningful.
N=20
msg_len=$(git log -1 --pretty=%B | wc -c)
if [ "$msg_len" -lt "$N" ]; then
  exit 125  # tells git bisect to skip this commit
fi
# Here you would run your actual test and report 0 (good), 1 (bad) as needed
exit 0
```
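With that script saved as, say, `bisect-test.sh`, the whole bisect can be driven automatically via `git bisect run` (exit code 125 makes it auto-skip the "wip" commits). The following is a self-contained sketch: the repository contents, file names, and commit messages are all invented for the demo.

```shell
#!/bin/sh
# Demo: automate a bisect that skips short-message commits. Builds a
# throwaway repo where the bug lands in a well-described commit,
# followed by a "wip" commit that gets skipped.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo ok > f
git add f
git commit -qm "initial commit, known to be good"
echo broken > f
git commit -qam "introduce the bug with a descriptive message"
echo broken > g
git add g
git commit -qm "wip"
echo more >> g
git commit -qam "another descriptive commit message here"

# The bisect script: skip (125) commits with short messages, otherwise
# report good/bad based on the actual test (here: is f still "ok"?).
cat > bisect-test.sh <<'EOF'
#!/bin/sh
N=20
msg_len=$(git log -1 --pretty=%B | wc -c)
if [ "$msg_len" -lt "$N" ]; then
  exit 125
fi
grep -qx ok f
EOF
chmod +x bisect-test.sh

git bisect start HEAD HEAD~3
git bisect run ./bisect-test.sh   # lands on the descriptive bug commit
git bisect reset
```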
This is an extremely opinionated and time consuming way of working. Maybe in this context it makes sense (nvidia driver kernel somethings), but I don't think it's universally the best way to write code together.
Naturally, I have Opinions on the right way to use git, having used it since inception within various different contexts at various places, along with other VCSs. What works at one place won't be right for another, and vice versa, especially given the different skill levels of individuals and teams, the tools involved, and how much weight I have to review code and commits before they get accepted. What's important is that it should work for you, not the other way around.

Regardless of where I'm working, though, my local commit messages are total crap, "wip" being the most common, but I commit frequently. Importantly, before I do e.g. a slightly involved refactor, going back to see what it was before I started is trivial. Being a skilled operator of git is important to make it easy to run newly written tests against the old code. Being efficient at rebase -i, sorting commits into understandable chunks, and squashing minor commits to keep things clean is key.
I don't think every patch in a series has to work totally independently for every git repo, but what it comes down to is maintenance. There's nothing worse than digging around in git history, trying to figure out why things are how they are, only to dead end at a 3000 line commit from 5 years ago with the message "poop, lol". It's even worse when the person who did that was you!
Universally, what it comes down to is maintenance. That totally rushed prototype that was just for a demo has now been in production for years, and there's this weird bug with the new database. If you hate yourself, your job, future you, your colleagues, and everybody that comes after you, and you're no good at git, by all means, shit out 300 commits, don't squash, and have the PR message be totally useless. Also believe you're hot shit after one semester of boot camp and that no one else cares just because you want to go home, get high, and play xbox. (Not remotely saying that's you, but those people are out there.)
We could get all philosophical and try and answer the question of if there are any universal truths, nevermind universally best git commit practices.
I don't work where you work, don't know your team, or anybody's skill levels on it, so I'll just close with a couple thoughts. the tool is there to work for you, so learn to work with it not against it. Git bisect is your friend. And that it really sucks 4 years later to be confronted by totally useless commit messages on inappropriately sized commits (too big or too small) and have to guess at things in order to ship fixes to prod (on the prototype that was supposed to get thrown away but never did) and just hope and pray that you've guessed correctly.
Why do I care? Interesting question. I'm generally a person who cares, I guess. In this specific case it seems analogous to a mechanic having a tidy workshop. Would you leave your bicycle with someone head to toe in grease and tools strewn all over the place? I wouldn't.
If someone is git bisecting trying to track down a bug they want to land on the smallest code diff possible, not the whole feature change.
> In this specific case it seems analogous to a mechanic having a tidy workshop. Would you leave your bicycle with someone head to toe in grease and tools strewn all over the place? I wouldn't.
Generally I'd be suspicious of a mechanic whose workspace looked too tidy. There's a balance, but it's very easy to keep your tools organised if you're not actually using them much.
A code reviewer doesn't care that you spent 10 days across 100 commits tweaking some small piece of code, the code reviewer cares about what you ended up with.
The bisecter probably wants to compile the code at every commit to try and see if they can reproduce a bug. Commits which don't compile force the bisecter to step through commits one by one until they find one which compiles, linearizing a process which should be logarithmic.
Those are bad, don't do that. (At least don't do long chains of them. The occasional isolated one is ok)
I still find writing a "good" commit message to be a much heavier process than making sure that it compiles.
- Human readers don't care about the details of everything you tried
- Commits that both don't compile and have no useful message are of very little value
But I don't think that's an argument against making small commits with uninformative messages, because making those commits without breaking compilation - or even explicitly checking that it compiles before committing - is much easier than coming up with a full commit message. And small commits that do compile - or even small commits of which a significant proportion compile - are very useful for bisect.
Not unless they're literally just e.g. making a change and undoing it. In general 100 tiny commits with messages like "x" don't make human PR review any harder and they make (automated) bisect to find a bug better.
If you change 2 lines 8 times to check something, just squash the commits; it saves everyone the hassle.
Message doesn’t need to be fancy, but it should describe what you did. Being unable to articulate your actions is a thought smell. It’s often seen when the developer is trying stuff until it sticks and needs a commit because the only way to test the fix is in a testing environment, two bad practices.
Sometimes when people speak rhetorically I'm baffled, because I feel they literally do not understand what they're saying: they end up supporting the opposing rhetorical purpose. Yes, you're 100% correct that well-structured commits are a luxury most of us do not have the privilege of experiencing, because we work in high-pressure, deadline-driven environments where no points are awarded for beautifully crafted commit messages.
So in effect your argument is like "people that haven't had the luxury of a Michelin star restaurant don't appreciate amuse bouche and they should strive to rectify that".
You claimed that “literally no one” has a different review workflow than yours. I do, and my experience is that clear commits make reviews both faster and deeper, which is very helpful specifically in a high-pressure, deadline driven environment where being slow and wrong is costly. You’re of course free to disagree and work differently.
To use restaurants as the analogy, Michelin star-grade dining might be unavailable, and we might have to live the Olive Garden, or even McDonald's life. Regardless of which restaurant we're at though, if the food is moldy and gross, we shouldn't eat it.
Isn't your interpretation backwards in some cases? What I mean is that _because_ you see the intermediate commits are garbage, you _then_ decide not to review the individual commits (because you are interested in the contribution anyway).
I certainly do care for the hobby FOSS projects I maintain, and bad commit messages + mega-commits won't fly at my day job.
Squash-merging has the advantage of making one PR equal one commit with the PR ID in the commit message, sure, but it unfortunately promotes bad Git hygiene (and works around it).
For an illustration of the scale of this, search GitHub for 'commit by commit': https://github.com/search?q=%22commit+by+commit%22&type=pull... (2M results)
At some point, you will have something working that makes sense to clean up. Then use interactive rebase to create one or a few commits that "make sense". What "makes sense" is one of those topics that could fill a whole bike garage, but you and your team will have some agreement on it. One thing that I like is to keep pure refactorings by themselves. No one cares to review that you've fixed typos in old variable names and things like that. If it's a separate commit, you can just skip over it.
Depending on if you are completely done or not, the resulting branch can be sent as a PR/MR. Make sure that all commits have a reason why the change was made. No reason to repeat what the code says or generate some AI slop message. Your knowledge of why a change was done in a certain way is the most valuable part.
Of course, this way of working fits my work, that is not cloud based in any way and with lots of legacy code. It creates git history that I would like to have if I have to take over old projects or if I have to run git bisect on an unfamiliar code base and figure out some obscure bug. You might have a completely different technology stack and business, where it makes sense to work in some other way with git.
You can do whatever you like locally. That's the luxury of git: it's distributed so you always get your own playground. You can make commits with short names if you find that useful. Personally I prefer to track my progress using a todo system, so I don't need commits to tell me where I am. I use `stash` instead of committing broken stuff if I need to switch branches.
I've found the idea of "rebase later" always easier said than done. In my experience if you work like that you'll more often than not end up just squashing everything into one commit. I prefer to rebase as I go. I'll rebase multiple times a day. Rearranging commits, amending the most recent commit etc. Keeping on top of it is the best way to achieve success at the end. It's like spending that bit of extra time putting your tools away or sweeping the floor. It pays off in the long run.
https://web.archive.org/web/20230929180112/https://techrepor...
https://web.archive.org/web/20011108190056/https://hardocp.c...
https://web.archive.org/web/20011118183932/www.3dcenter.de/a...
ATI did not rename Quake to Quack as I originally thought from this! :)
Page 1 https://web.archive.org/web/20071028172853/http://techreport...
Page 2 https://web.archive.org/web/20111130162817/http://techreport...
Page 3 https://web.archive.org/web/20080213212637/http://techreport...
Page 4 https://web.archive.org/web/20101110031431/http://techreport...
Page 5 https://web.archive.org/web/20101108144857/http://techreport...
When it comes to computer graphics, iirc it's pretty normalized now - graphics drivers all seem to have tweaks, settings, optimizations and workarounds for every game.
(As an aside, I hate that I have to link to archive.org, there's a lot of dead links nowadays but these are important things to remember).
[0] https://web.archive.org/web/20250306120819/https://www.anand...
[1] https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
[2] https://web.archive.org/web/20051218120547/http://techreport...
[3] https://www.servethehome.com/impact-of-intel-compiler-optimi...
Maybe hyperbole, but obviously they can't do this for literally every game; that would require huge personnel resources. At least looking at Mesa (linked elsewhere), only ~200 games are patched, out of what, 100k PC games? So <1%.
Even Mesa has them: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/uti...
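For a sense of what those Mesa workarounds look like, entries in that file follow the driconf XML format, roughly as below. The application name and executable here are invented for illustration; the option name is a real driconf knob, but which games it is applied to is not claimed here.

```xml
<driconf>
  <device>
    <!-- Per-application workaround: matched on executable name, exactly
         the kind of name-based special-casing discussed in the thread. -->
    <application name="Some Game" executable="somegame.exe">
      <option name="glsl_correct_derivatives_after_discard" value="true"/>
    </application>
  </device>
</driconf>
```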
These changes are supposed to have minimal to no impact on the actual output, but sometimes vendors are really aggressive, and significantly degrade the outputs so that the game can run faster on their hardware.
Do you have a source for this? I’d like to see some examples
E.g. Frostpunk has antialiasing for transparency layers on; Slay the Spire does not. I never set these settings. Nvidia literally does a lookup on first run for what they judge as the best defaults and sets these appropriately.
Every single game/program you install has different options from a huge list of possible optimizations.
[1]: https://github.com/KhronosGroup/Vulkan-Headers/blob/main/inc...
Funnily, it's under an older submission of the same cutlass optimizations.
And despite it not being nice, some optimizations rely on type or function name schemas/substrings/etc.
It sucks, but that's how it works.
It doesn't have to be malicious; sometimes it is safer to deploy an optimization only for your own libs than to risk breaking stuff.
Or your frontend is not giving you more data you can rely on.
E.g "__nvidia_experimental_feature_xyz_v1"
if(AskLLM("Does function signature+name look like error handling code")) {
TurnOffInliner();
}
is actually probably a lot more effective than you'd think (generating PGO traces with a machine learning tool is apparently a thing that sort of works)... until somebody randomly chooses the same name for some reason and gets hosed.
You're not helping.
The marginal costs per user are very small or even zero for desktop applications. This means that software needs a funding structure with periodic payments, but at the same time the payments shouldn't grow with the number of users. There also needs to be a way for the initial investors who pay for the creation of new features or entire code bases to get their money back as the product becomes popular.
This in itself is not problematic, but it is not covered by traditional crowdfunding. The problem is that the funding goal needs to be met no matter what, and the contribution per user shrinks as more users contribute. You can't expect everyone to chip in 100%, 10% or even 1% of the funding cost, since that could be thousands of dollars even at the minimum. You need some sort of auctioning process where people can pledge a fixed quantity and if the user count is low enough, their pledge counts, otherwise it doesn't.
This has one problem, though: the transition from the exclusive to the non-exclusive mode. There will be freeloaders: someone might pitch in five dollars knowing that five big corporations have already chipped in and covered the full development cost, leading to the entire codebase being open sourced. Everyone else is a freeloader too, including cheapskate corporations.
When you talk to the compiler team about it, they may offer a range of solutions, some of which may not be applicable to open source code. Picture proprietary #pragmas, intrinsics, or whatnot. What do you do? You can't ship a high-performance library that doesn't deliver high performance. It is then that you rely on things like function names to enable specific code transformations that can't be used in general, because they would sometimes break third-party code.
I never worked on Cutlass, but this is the sort of thing that is done in the real world.
There is nothing nefarious about this sort of optimization. People comparing this to cheating on benchmarks by rendering lower-quality images are not on the right track.
- Intel faced a "cheating compiler" controversy when SPEC, the benchmark standard-setter, invalidated over 2,600 benchmark results for Intel Xeon processors in early 2024. ( https://www.tomshardware.com/pc-components/cpus/spec-invalid... )
- Microsoft doing similar things (Java benchmarks, C compiler benchmarks)
- and everybody cheating on AI benchmarks (https://www.thestack.technology/ai-benchmarking-scandal-were...)