When Compiler Optimizations Hurt Performance
Posted 3 months ago · Active 3 months ago
nemanjatrifunovic.substack.com · Tech · story
Key topics
Compiler Optimizations
Performance
Low-Level Programming
The article discusses how compiler optimizations can sometimes hurt performance; the discussion highlights the complexity of those optimizations and the importance of measuring before and after optimizing.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion · First comment: 7d after posting
Peak period: 20 comments in the 168-180h window
Avg / period: 9 comments
Comment distribution: 36 data points (based on 36 loaded comments)
Key moments
- 01 Story posted: Oct 14, 2025 at 5:49 AM EDT (3 months ago)
- 02 First comment: Oct 21, 2025 at 2:52 AM EDT (7d after posting)
- 03 Peak activity: 20 comments in the 168-180h window (hottest stretch of the conversation)
- 04 Latest activity: Oct 22, 2025 at 4:07 PM EDT (3 months ago)
ID: 45578080 · Type: story · Last synced: 11/20/2025, 5:45:28 PM
For Zen5, rustc creates the following:
https://rust.godbolt.org/z/hz1eKjnaG
We can assume the count has already been done.
Struggling proc
    lzcnt  ecx, ecx
    test   cl, cl
    setz   al
    lea    edx, [ecx-2]   ; [0,1,2]
    cmp    rdx, 2
    cmovle eax, ecx
    ret
Struggling endp
A shorter alternative:
    lzcnt ecx, ecx
    mov   eax, 00100 0011 0010 0000 0001b
    shl   ecx, 2
    shrx  eax, eax, ecx
    and   eax, 15
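For readers unfamiliar with the trick, the mov constant acts as a lookup table packed four bits per entry, so indexing is just a shift plus a mask. A minimal Rust sketch of the same idea (the table values below are assumptions for illustration, not taken from the godbolt link):

    // Sketch of a packed-nibble lookup table: five 4-bit entries in one
    // u32. Indexing lowers to shift + mask (shrx/and on x86-64).
    // Table values here are illustrative assumptions.
    const TABLE: u32 = 0b0100_0011_0010_0000_0001; // entries from index 0: 1, 0, 2, 3, 4

    fn lookup(index: u32) -> u32 {
        debug_assert!(index < 5); // only five entries are packed
        (TABLE >> (index * 4)) & 0xF
    }

    fn main() {
        for i in 0..5 {
            println!("lookup({i}) = {}", lookup(i));
        }
    }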
Intuitively I understand why it is a hard problem. Micro-optimizations have deterministic properties that are simple enough that optimality is all but certain. Macro-optimization heuristics create incidental minor pessimization effects that are buried below the noise floor by major optimization effects on average.
In the middle are optimizations that are too complex to guarantee effective optimization but too small in effect to offset any incidental pessimization. It is easy to inadvertently make things worse in these cases. Most cases of surprising performance variances I see fall into this gap.
It is also where the codegen from different compilers seems to disagree the most, which lends evidence to the idea that the “correct” codegen is far from obvious to a compiler.
Can you give some examples?
Perhaps you mean small granular choices which occur widely throughout the code?
For example: the decision to unroll, inline, or vectorize code has an unpredictable effect on performance.
Given a specific program, yes, it is measurable and predictable.
A compiler, however, has to deal with arbitrary programs; in fact, with small fragments of arbitrary programs at a time. Even if you had an oracle giving you very precise measurements for the fragment the compiler is currently working on, those measurements wouldn't hold up in practice.
Consider a small representative example. Suppose the compiler is compiling a function, and it did the best possible job.
Now, inlining this function at two callsites could have completely different second-order effects, such as enabling further optimizations or increasing register pressure. There are partial solutions, of course; people have spent their entire careers dealing with this specific problem. Rest assured that whatever you can come up with after thinking for a minute was already tried in the 1970s.
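A hypothetical Rust illustration of those second-order effects (the function and callsite names are made up):

    #[inline(always)]
    fn clamp_add(a: u32, b: u32) -> u32 {
        a.saturating_add(b).min(100)
    }

    // Callsite 1: after inlining, the whole call folds to the constant
    // 100 at compile time.
    fn callsite_constants() -> u32 {
        clamp_add(60, 70)
    }

    // Callsite 2: after inlining into a hot loop, the body competes with
    // the loop for registers; whether that helps or hurts depends on
    // everything else live in the loop.
    fn callsite_hot_loop(xs: &[u32]) -> u32 {
        xs.iter().fold(0, |acc, &x| clamp_add(acc, x))
    }

    fn main() {
        assert_eq!(callsite_constants(), 100);
        println!("{}", callsite_hot_loop(&[1, 2, 3, 250]));
    }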
There can be competition between optimizations. At a given moment, the preconditions for both optimization O1 and optimization O2 may hold, but if you then apply O1, the precondition for O2 may no longer hold and vice versa.
In such cases, the best solution is to pick the better of O1 and O2.
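A toy illustration of such competition, written out by hand (these are not real rustc passes): for (a * 2) / 2, strength reduction (O1) and mul/div cancellation (O2) both apply, but each destroys the other's precondition.

    // Both rewrites are valid when a * 2 cannot overflow (a < 2^31).
    fn original(a: u32) -> u32 {
        (a * 2) / 2
    }

    // O1 first (strength reduction): the (x * k) / k pattern O2 needs is gone.
    fn after_o1(a: u32) -> u32 {
        (a << 1) / 2
    }

    // O2 first (mul/div cancellation): nothing left for O1 to do.
    fn after_o2(a: u32) -> u32 {
        a
    }

    fn main() {
        for a in [0u32, 1, 41, 1000] {
            assert_eq!(original(a), after_o1(a));
            assert_eq!(after_o1(a), after_o2(a));
        }
        println!("all three forms agree on the sampled inputs");
    }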
Cranelift has an optimiser based on this idea (e-graphs / equality saturation), used for classic optimisations within an IR.
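To get a feel for the approach, a minimal sketch using the egg crate (not Cranelift's actual implementation), reusing the O1/O2 example above: the e-graph keeps the results of both rewrites simultaneously, and an extractor picks the cheapest form afterwards.

    // Equality saturation with egg (add `egg = "0.9"` to Cargo.toml).
    use egg::{rewrite as rw, AstSize, Extractor, RecExpr, Rewrite, Runner, SymbolLang};

    fn main() {
        // O1 and O2 from above, as non-destructive e-graph rewrites.
        let rules: &[Rewrite<SymbolLang, ()>] = &[
            rw!("strength-reduce"; "(* ?a 2)" => "(<< ?a 1)"),
            rw!("mul-div-cancel"; "(/ (* ?a 2) 2)" => "?a"),
        ];

        let start: RecExpr<SymbolLang> = "(/ (* a 2) 2)".parse().unwrap();
        let runner = Runner::default().with_expr(&start).run(rules);

        // Extract the smallest equivalent expression; here it is just "a".
        let extractor = Extractor::new(&runner.egraph, AstSize);
        let (_cost, best) = extractor.find_best(runner.roots[0]);
        println!("best form: {best}");
    }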
Kaze Emanuar did a great job analyzing exactly this so we don't have to!
https://www.youtube.com/watch?v=Ca1hHC2EctY
The whole thing reminds me, in a fuzzy way I don't yet fully comprehend, of register-memory architectures.
It’s also a reminder that compilers aren’t magic and optimizers aren’t all-knowing.
People have a tendency to ascribe magical properties to anything they don’t understand. Compilers and optimizers more so.
Measure before and after any optimization. You’d be surprised at what does and doesn’t improve efficiency.
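A minimal sketch of that habit in Rust (a single wall-clock timing like this is noisy; a statistics-aware harness such as criterion is better for real decisions):

    use std::hint::black_box;
    use std::time::Instant;

    // Time a closure many times; black_box keeps the optimizer from
    // deleting the work being measured.
    fn time_it<F: FnMut() -> u64>(label: &str, mut f: F) {
        let start = Instant::now();
        let mut checksum = 0u64;
        for _ in 0..1_000 {
            checksum = checksum.wrapping_add(black_box(f()));
        }
        println!("{label}: {:?} (checksum {checksum})", start.elapsed());
    }

    fn main() {
        let data: Vec<u64> = (0..10_000).collect();
        time_it("before", || data.iter().sum());
        time_it("after", || data.iter().fold(0, |a, &x| a + x));
    }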
Today’s CPUs are more complex than any one human can fully understand.
Measure twice, cut once.
Only because it used to be true in the past, because someone said it was so during a coffee break, or because it appeared in some computer-related article.
Branch predictors are a lot bigger and more capable now than they used to be. There is every chance that when the jump-table optimisation was first tested against the equivalent branch code it outperformed it, but that on more recent CPUs the situation reverses.
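As an assumed illustration (the comment doesn't give one): a dense match like the one below is a classic jump-table candidate in rustc/LLVM, while a sparse one usually lowers to compare-and-branch; which form wins can differ between CPU generations, so it is worth re-measuring.

    // Dense, contiguous cases: a classic jump-table / lookup-table
    // candidate. Sparse cases usually become compare-and-branch chains.
    pub fn dense(x: u8) -> u32 {
        match x {
            0 => 13,
            1 => 7,
            2 => 29,
            3 => 3,
            4 => 52,
            5 => 11,
            _ => 0,
        }
    }

    fn main() {
        // To check the lowering, paste this into godbolt with -O and see
        // whether your rustc emits a table or a branch chain.
        println!("{}", dense(3));
    }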