Quantifying Pass-by-Value Overhead
Posted 2 months ago · Active 2 months ago
owen.cafe · Tech · story
calm · mixed
Debate: 60/100
Key topics
- C++
- Performance Optimization
- Compiler Design
The article discusses the overhead of passing structs by value in C++, and the discussion revolves around the implications of the findings and the limitations of microbenchmarks.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement
First comment: 10h after posting
Peak period: 10 comments in 42-48h
Avg / period: 3.5
Comment distribution: 28 data points
Based on 28 loaded comments
Key moments
- 01 Story posted: Oct 27, 2025 at 8:52 PM EDT (2 months ago)
- 02 First comment: Oct 28, 2025 at 6:44 AM EDT (10h after posting)
- 03 Peak activity: 10 comments in 42-48h (hottest window of the conversation)
- 04 Latest activity: Oct 30, 2025 at 10:05 AM EDT (2 months ago)
ID: 45728170 · Type: story · Last synced: 11/20/2025, 5:33:13 PM
What a fascinating CPU bug. I am quite curious as to how that came to pass.
It would be great to repeat the author’s tests on other CPU models.
Apple's M2 uses 128-byte cache lines.
I wonder if it has to do with a non-ideal implementation of virtual address resolution for the next page.
By the way, the fastest approach was a branchless linear scan for up to 32-64 elements, as far as I remember.
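For reference, a minimal sketch of the kind of branchless scan being described, assuming int keys; the function name and the "return n for not found" convention are illustrative, not from the article:

```cpp
#include <cstddef>

// Scan with no early exit, so there is no data-dependent branch for the
// predictor to miss. Iterating backwards and overwriting on every match
// leaves the *first* occurrence in `result`; compilers typically lower
// the ternary to a conditional move.
std::size_t branchless_find(const int* data, std::size_t n, int key) {
    std::size_t result = n;  // n signals "not found"
    for (std::size_t i = n; i-- > 0; ) {
        result = (data[i] == key) ? i : result;
    }
    return result;
}
```

Whether this beats an early-exit loop or a hash lookup depends on the element count and how predictable the match position is, which is exactly the disagreement in the replies.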
I don't know about this; whenever I've benchmarked it on my use cases, unordered_map started to become faster than vector at well below 100 elements.
Yet remembering how to measure time with nanosecond precision is the burden?
> By the way, the fastest way was branchless linear scan up to 32-64 elements, as far as I remember.
The analysis presented in the article is far more interesting, qualified, and useful than what you've produced here.
Pass by value describes the semantics of a function call, not the implementation. Passing a const reference in C++ is pass-by-value. If the user opts to pass "a copy" instead, nothing requires the compiler to actually copy the data. The compiler is required only to supply the actual parameter as if it were copied.
[1] There is also call-by-push-value, but I was never able to wrap my mind around it.
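A hedged sketch of the distinction being drawn, with hypothetical names; note that on common ABIs a separately compiled callee will still receive a real copy, but the language itself only demands as-if behavior:

```cpp
struct Config {
    double thresholds[16];  // 128 bytes, large enough for the copy to matter
};

double score(Config c);  // by-value signature: semantically, a fresh copy

double caller(const Config& cfg) {
    // The program must behave as if `cfg` were copied into `c`. If the
    // compiler can prove the copy is unobservable (e.g. after inlining
    // `score`), it is free to drop the memcpy entirely.
    return score(cfg);
}
```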
I can cast the const away. The implementation does not hide this detail. The semantics therefore must be understood by the programmer.
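A tiny sketch of that point (names hypothetical): the const on a reference parameter is a promise at the interface, not a property of the object, so the caller can observe that no copy was ever made:

```cpp
#include <cassert>

void tweak(const int& x) {
    // Compiles fine; undefined behavior only if the referent was
    // originally declared const.
    const_cast<int&>(x) = 42;
}

int main() {
    int a = 1;        // not originally const, so the write is defined
    tweak(a);
    assert(a == 42);  // the caller sees the change: x was never a copy
}
```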
The semantics of pass by const reference are also not exactly the same as pass by value in C++. The compiler can't in general assume a const reference doesn't alias other arguments or global variables and so has to be more conservative with certain optimizations than with pass by value.
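A minimal illustration of that aliasing constraint, with hypothetical names: a store through an unrelated pointer can, as far as the compiler knows, modify the object behind a const reference, but it cannot touch a by-value copy:

```cpp
struct Params { double scale; };

double sum_ref(const Params& p, double* out, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        out[i] = 1.0;      // might alias p.scale...
        total += p.scale;  // ...so p.scale must be reloaded each trip
    }
    return total;
}

double sum_val(Params p, double* out, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        out[i] = 1.0;      // cannot alias the callee's private copy...
        total += p.scale;  // ...so p.scale can stay in a register
    }
    return total;
}
```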
Presumably this means for all arguments combined? If for example you pass four pointers each pointing to a 256-byte struct, you probably don’t want to pass all four structs (or even just one or two of the four?) by value instead.
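To make the arithmetic concrete, a sketch assuming a hypothetical 256-byte struct:

```cpp
#include <cstddef>

struct Block { std::byte bytes[256]; };

// By value: roughly 1 KiB of argument data copied on every call.
void combine_val(Block a, Block b, Block c, Block d);

// By reference: four pointer-sized arguments, no copies.
void combine_ref(const Block& a, const Block& b,
                 const Block& c, const Block& d);
```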
In real world code, your caches and the CPU’s pipeline are influenced by some complex combination of what happens at the call site and what else the program is doing. So, a particular kind of call will perform better or worse than another kind of call depending on what else is happening.
The version of this benchmark that would have had predictive power is one that compared different kinds of call across a sufficiently diverse sample of large programs that used those calls and also did other interesting things.