The Cost of a Closure in C
Key topics
The debate around implementing closures in C sparked a lively discussion, with some commenters embracing the idea of explicit context management, à la POSIX's ucontext.h, as a more "C-like" approach. Others drew parallels with existing features in languages like Raku and Rust, highlighting how they handle stateful functions and async operations. While some questioned the testing methodology behind the original article's findings, others pointed out that a "Sufficiently Good" compiler could potentially optimize away differences between various lambda implementations. As commenters explored the complexities of closure allocation and state management, a consensus emerged that a well-designed compiler could make a significant difference in performance.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 1h after posting
Peak period: 29 comments in 6-12h
Avg / period: 7.5
Based on 98 loaded comments
Key moments
- Story posted: Dec 11, 2025 at 2:21 AM EST (22 days ago)
- First comment: Dec 11, 2025 at 3:32 AM EST (1h after posting)
- Peak activity: 29 comments in 6-12h (hottest window of the conversation)
- Latest activity: Dec 15, 2025 at 4:52 AM EST (18 days ago)
- where does the automatically defined struct live? Data segment might work for static, but doesn't allow dynamic use. Stack will be garbage if closure outlives function context (ie. callback, future). Heap might work, but how do you prevent leaks without C++/Rust RAII?
- while a function pointer may be copied or moved, the state area probably cannot. It may contain pointers to stack object or point into itself (think Rust's pinning)
- you already mention recursion, compilation
- ...
Another complication is that it would be beneficial to be able to optimize state storage in the same way that stack frame resources are optimized, including things like coalescing equal values in conceptually distinct state instances. This would (I think) preclude things like sizeof(statetype(f)) which you really want for certain types of manual memory management, or it would require multiple compiler passes.
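To make the "where does the state live" question concrete, here is a minimal sketch of the heap option, written by hand rather than generated by a compiler. The names `counter_closure`, `counter_new` and `counter_free` are hypothetical and only for illustration. The state can outlive the function that created it, but since C has no RAII, freeing it is entirely the owner's job.

```c
#include <stdlib.h>

/* Hypothetical hand-written closure: a code pointer plus captured state. */
typedef struct counter_closure {
    int (*call)(struct counter_closure *self); /* the code */
    int step;                                  /* captured by value */
    int total;                                 /* mutable state */
} counter_closure;

static int counter_call(counter_closure *self) {
    self->total += self->step;
    return self->total;
}

/* The state lives on the heap, so it may outlive the creating function. */
counter_closure *counter_new(int step) {
    counter_closure *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->call = counter_call;
    c->step = step;
    c->total = 0;
    return c;
}

/* No RAII in C: forgetting this call is a leak. */
void counter_free(counter_closure *c) { free(c); }
```

A caller would do `counter_closure *c = counter_new(3); c->call(c); counter_free(c);`. Because the struct is spelled out by hand, `sizeof(counter_closure)` is available, which is exactly what an automatically laid-out state type might not be able to promise.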
[1] https://github.com/ThePhD/future_cxx/issues/55#issuecomment-...
Raku (née Perl 6) has this! https://docs.raku.org/language/variables#The_state_declarato...
C++Builder’s entire UI system is built around __closure and it is remarkably efficient: effectively, a very neat fat pointer of object instance and method.
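For readers who have not seen this kind of fat pointer, a rough C approximation looks like the sketch below. It is not how C++Builder implements `__closure`; the `button`, `click_handler`, `bind_button` and `fire` names are made up for illustration. The point is only that the handler is two words wide: one for the object, one for the code.

```c
#include <stddef.h>

/* Hypothetical widget type standing in for an object instance. */
typedef struct button {
    int clicks;
} button;

/* A "bound method": object pointer plus function pointer. */
typedef struct click_handler {
    void *self;
    void (*method)(void *self, int count);
} click_handler;

static void button_on_click(void *self, int count) {
    button *b = self;
    b->clicks += count;
}

/* Pair an instance with its method; nothing is allocated. */
click_handler bind_button(button *b) {
    click_handler h = { b, button_on_click };
    return h;
}

/* The event source only sees the fat pointer, not the concrete type. */
void fire(click_handler h, int count) {
    if (h.method != NULL) h.method(h.self, count);
}
```

The handler can be copied freely, but it does not own `self`, so the object has to stay alive for as long as the handler may be fired.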
[*] Edit: two dates on the paper, but “bound pointer to member” and they note the connection to events too: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n13...
Practically speaking all lambda options except for the one involving allocation (why would you even do that) are equivalent modulo inlining.
In particular, the caveat with the type erasure/helper variants is precisely that it prevents inlining, but given everything is in the same translation unit and isn't runtime-driven, it's still possible for the compiler to devirtualize.
I think it would be more interesting to make measurements when controlling explicitly whether inlining happens or the function type can be deduced statically.
In a simple test I see that GCC has no problem with completely removing the overhead of std::function_ref, but plain std::function is a huge mess.
Eventually we will get there [1], but in the meantime I prefer not to rely on devirtualization, and heap elision is more of a party trick.
[1] for example 25 years ago compilers were terrible at removing abstraction overhead of the STL, today there is very little cost.
I have a case where I need to create a static templated lambda to be passed to C as a pointer. Such a thing is impossible in Rust, which I considered at first.
Unfortunately a lot of existing C APIs won't have the user arg in the place you need it, it's a mix of first, last, and sometimes even middle.
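A small sketch of what that looks like in practice, assuming two made-up iteration APIs (`each_user_last` and `each_user_first` are hypothetical, not from any real library) that disagree on where the user pointer goes. The logic is written once and each convention gets its own thin shim.

```c
#include <stddef.h>

/* Two hypothetical APIs that disagree on the position of the user pointer. */
void each_user_last(const int *xs, size_t n,
                    void (*cb)(int value, void *user), void *user) {
    for (size_t i = 0; i < n; i++) cb(xs[i], user);
}

void each_user_first(const int *xs, size_t n,
                     void (*cb)(void *user, int value), void *user) {
    for (size_t i = 0; i < n; i++) cb(user, xs[i]);
}

/* The real logic, written once with a typed context. */
static void add_to_sum(long *sum, int value) { *sum += value; }

/* One thin shim per calling convention. */
void sum_cb_last(int value, void *user)  { add_to_sum(user, value); }
void sum_cb_first(void *user, int value) { add_to_sum(user, value); }
```

With a `long total = 0;`, both `each_user_last(xs, n, sum_cb_last, &total)` and `each_user_first(xs, n, sum_cb_first, &total)` accumulate into the same variable. The shims are trivial, but they have to be written per API, which is the annoyance being described.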
This isn’t fully accurate. In your example, `&mut C` actually has the same layout as usize. It’s not a fat pointer. `C` is a concrete type and essentially just an anonymous struct with FnMut implemented for it.
You’re probably thinking of `&dyn FnMut` which is a fat pointer that pairs a pointer to the data with a pointer to a VTable.
You can call the local functions directly and get the benefits of the specialized code.
There's no way to spell out this function's type, and no way to store it anywhere. This is true of regular functions too!
To pass it around you need to use the type-erased "fat pointer" version.
I don't see how anything else makes sense for C.
I'm a fan of nested functions but don't think the executable stack hack is worth it, and using a 'display' is a better solution.
See the Dragon Book or Compiler Construction: Principles and Practice (1984) by Louden
https://news.ycombinator.com/item?id=46243298
https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Functio...
I do like the trampoline trick in 3.2.4, however, neat alternative to a fat pointer!
well regular functions decay to function pointers. You could have the moral equivalent of std::function_ref (or similarly, borland __closure) in C of course and have closures decay to it.
The most striking surprise is the magnitude of the gap between std::function and std::function_ref. It turns out std::function (the owning container) forces a "copy-by-value" semantics deeply into the recursion. In the "Man-or-Boy" test, this apparently causes an exponential explosion of copying the closure state at every recursive step. std::function_ref (the non-owning view) avoids this entirely.
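For comparison, here is a sketch of the Man-or-Boy test in plain C with non-owning references, in the spirit of the function_ref variant. This is not the article's benchmark code; the `thunk` layout and the `man_or_boy` wrapper are assumptions for illustration. Every thunk lives in a stack frame and is passed by pointer, so no closure state is ever copied.

```c
/* A thunk: code pointer, pointer to its frame's k, and five captured thunks. */
typedef struct thunk {
    int (*fn)(struct thunk *self);
    int *k;
    struct thunk *x1, *x2, *x3, *x4, *x5;
} thunk;

static int const_1(thunk *t)  { (void)t; return 1; }
static int const_m1(thunk *t) { (void)t; return -1; }
static int const_0(thunk *t)  { (void)t; return 0; }

static int A(thunk *a);

/* B decrements the k of the frame it was created in, then calls A with
   itself as the new x1 and the old x1..x4 shifted down by one. */
static int B(thunk *a) {
    *a->k -= 1;
    int k = *a->k;              /* the new activation gets its own k */
    thunk next = { B, &k, a, a->x1, a->x2, a->x3, a->x4 };
    return A(&next);            /* next is only referenced by deeper frames */
}

static int A(thunk *a) {
    return *a->k <= 0 ? a->x4->fn(a->x4) + a->x5->fn(a->x5) : B(a);
}

int man_or_boy(int k) {
    thunk one = { .fn = const_1 };
    thunk minus_one = { .fn = const_m1 };
    thunk zero = { .fn = const_0 };
    thunk top = { B, &k, &one, &minus_one, &minus_one, &one, &zero };
    return A(&top);
}
```

`man_or_boy(10)` gives the classic answer of -67. The references are only valid while the frames that hold them are live, which is the same restriction a non-owning view has.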
Differently from GCC14, GCC15 itself does seem to be able to optimize the allocation (and the whole std::function) in trivial cases though (independently
Therefore it's very jarring with this text after the first C code example:
> This uses a static variable to have it persist between both the compare function calls that qsort makes and the main call which (potentially) changes its value to be 1 instead of 0
This feels completely made up, and/or some confusion about things that I would expect an author of a piece like this to really know.
In reality, in this usage (at the global outermost scope level) `static` has nothing to do with persistence. All it does is make the variable "private" to the translation unit (C parlance, read as "C source code file"). The value will "persist" since the global outermost scope can't go out of scope while the program is running.
It's different when used inside a function, then it makes the value persist between invocations, in practice typically by moving the variable from the stack to the "global data" which is generally heap-allocated as the program loads. Note that C does not mention the existence of a stack for local variables, but of course that is the typical implementation on modern systems.
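A minimal illustration of the two meanings, with made-up names (`visible_counter`, `private_counter`, `next_id` and friends are not from the article). At file scope both variables have static storage duration and the keyword only changes linkage; inside a function it changes the lifetime.

```c
/* File scope: static storage duration with or without the keyword. */
int visible_counter = 0;          /* external linkage */
static int private_counter = 0;   /* internal linkage: only the name is hidden */

int next_id(void) {
    /* Block scope: here `static` changes the lifetime of the variable. */
    static int id = 0;
    return ++id;
}

int next_id_broken(void) {
    int id = 0;                   /* automatic: re-initialized on every call */
    return ++id;
}

void bump_both(void) { visible_counter++; private_counter++; }
int read_private(void) { return private_counter; }
```

Successive calls to `next_id()` return 1, 2, 3 and so on, while `next_id_broken()` returns 1 every time. Both file-scope counters keep their values across calls regardless of `static`.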
If I follow your comment, you mean that he could have used a non-static global variable instead and avoided the "static" keyword altogether?
Yes, the `static` can simply be dropped, it does no additional work for a single-file snippet like this.
I tried diving into Compiler Explorer to examine this, and it actually produces slightly different code for the with/without `static` cases, but it was confusing to deeply understand quickly enough to use the output here. Sorry.
Also, the difference manifests in the symbols table, not the assembly.
This doesn’t mean that it’s impossible to make mistakes, but still.
The only misleading thing here is that ‘static’ is monospaced in the article (this can’t be seen on HN). Other than that, ‘static variable’ can plausibly refer to an object with a static storage duration, which is what the C standard would call it.
The fact that you are questioning the use of the term shows that you are not familiar with the ISO C standard. What the author alludes to is static storage duration. And whether or not you use the "static" keyword in that declaration, the storage duration of the object remains "static". People mostly call those things "global variables", but the proper standardese is "static storage duration". In that sense, the author was right to use "static" for the lifetime of the object.
It’s confusing to me that thread locals are “not the best idea outside small snippets” meanwhile the top solution is templating on recursion depth with a constexpr limit of 11.
(You could solve that with a manually maintained stack for the context in a thread local, but you'd have to do that case-by-case)
I think the times you need to do this are few. And this version is much more prudent.
Anyway, the larger point is that a re-entrant general solution is desirable. The sort example might be a bit misguided, because who calls sort-inside-sort[0]? Nobody, realistically, but these types of issues are prevalent in the "how to do closures" area... and in C every API does it slightly differently, even if they're even aware of the issues.
[0] Because there's no community that likes nitpicking like the C (or C++) community. I considered preempting that objection :). C++ has solved this, so there's that.
That you do not call it recursively by checking that the thread local is nil before invocation.
> a re-entrant general solution is desirable.
I know what you mean, but I just don't know why you want to emulate that in C. There is a real problem of people writing APIs that don't let you pass in data with your function pointer - the thread local method can solve 99% of those without changes to the original API.
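A rough sketch of the thread-local method, for anyone who wants to see it spelled out. The name `qsort2` comes from this thread, but its signature, the `cmp_ctx_fn` type and the -1 return code on a nested call are my assumptions. It needs C11 for `_Thread_local`.

```c
#include <stdlib.h>

/* Hypothetical comparator type that also receives a user pointer. */
typedef int (*cmp_ctx_fn)(const void *a, const void *b, void *ctx);

/* Per-thread slot that carries the callback and its data through qsort. */
static _Thread_local cmp_ctx_fn tl_cmp = NULL;
static _Thread_local void *tl_ctx = NULL;

static int trampoline(const void *a, const void *b) {
    return tl_cmp(a, b, tl_ctx);
}

/* Returns 0 on success, -1 if called re-entrantly on the same thread. */
int qsort2(void *base, size_t n, size_t size, cmp_ctx_fn cmp, void *ctx) {
    if (tl_cmp != NULL) return -1;   /* slot busy: refuse the nested call */
    tl_cmp = cmp;
    tl_ctx = ctx;
    qsort(base, n, size, trampoline);
    tl_cmp = NULL;                   /* release the slot for the next call */
    tl_ctx = NULL;
    return 0;
}

/* Example comparator: ctx points to an int flag, nonzero means descending. */
int cmp_int_dir(const void *a, const void *b, void *ctx) {
    int x = *(const int *)a, y = *(const int *)b;
    int r = (x > y) - (x < y);
    return *(const int *)ctx ? -r : r;
}
```

The caller never declares or touches the thread-local; it just passes `&descending` as `ctx`. The nil check is what turns a sort-inside-sort into an error instead of silent corruption, which is the limitation being argued about here.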
But if you really want to do all kinds of first class functions with data, do you want to use C?
Then it's clearly only half a solution.
The example I gave above should work fine in any language with first-class closures.
No I do not. It will be reassigned on the next call.
> But again you are reinventing dynamic scoping
No. I’m not reinventing anything. I’m using the existing feature of thread local variables.
The usage of such is entirely an implementation detail of qsort2 with the exception of recursion.
Dynamic scoping typically refers to defining variables which have scope outside of their call stack. No usage of this API requires it.
Can you just try to learn something new?
Once again, the caller of the API does not declare any variables so there is no dynamic scoping.
With -ftrampoline-impl=heap, GCC automatically inserts[1] pairs of constructor/destructor routines from libgcc which were built around mmap/munmap.
[1] https://godbolt.org/z/7s5nooMPz
I've used lambdas extensively in modern C++. I hate them with a passion.
I've also used OCaml. An awesome language where this stuff is super natural and beautiful.
I don't understand why people want to shoehorn functional programming into C. C++ was terrible already, and is now worse for it.
> we’re going to be focusing on and looking specifically at Closures in C and C++, since this is going to be about trying to work with and – eventually – standardize something for ISO C that works for everyone.
Sigh. My heart sinks.
https://dlang.org/spec/betterc.html
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3654.pdf
(and I am not impressed by micro benchmarks)
I disagree because I've seen closures shine (in OCaml) and suck terribly (in C++). Same concept, extremely different programming experience and debuggability. Syntax matters, language psychology matters. Closures naturally and obviously suit programming languages that are genuinely functional, or at least manage memory transparently for you. C is neither, and so "alloca", "defer", the cleanup attribute, closures -- all stick out sorely. The whole selling point of C is explicitness. The tedium of C is the price we pay for the great control that C offers us with a relatively simple vocabulary. I couldn't be more content with that deal.
C++ is impossible for any single person to learn, there is so much insanely complicated implicit behavior in it. C can mostly be learned by a single (persistent) person, but it's been getting harder.
I don't want C to be fashionable, or attractive. I want it to remain minimal. If someone feels hamstrung by it, there are so many other languages to choose from. I simply want the particular tradeoff that C offers (or used to offer) to remain in existence. And that is what's been going away, with each issue of ISO C being the only official standard (obsoleting/superseding all earlier issues of the standard).
Why give people closures or "defer" or ... whatever ... when they can't even remember the concept of the usual arithmetic conversions? Which has been standard since C89? Have you met a "practitioner" (= any C programmer with no particular interest in the standard proper) that could explain the effective type rules? Why make it more complicated?
I apologize -- I guess this is just my semi-diplomatic way to say, "please, get off my lawn". (Not to you personally, of course!) I'm very sorry.
Well, I also do not want to see C++ style closures in C and I fully agree about your point regarding control and explicitness. I also agree that some of the initiatives we see now are regrettably motivated by the attempt to make C fashionable, and sometimes by poorly adopting C++ features.
Yet, I think nested functions fit C perfectly well and I have used them for a long time in some projects. They exist in very similar languages (PASCAL, Ada, D, ...) and even in C's ancestor ALGOL. This also shows that this type of nested function is not a functional programming concept. There is not really anything to learn, as syntax and semantics follow very naturally from all existing rules and the improvement in code quality for callbacks or higher-level iteration over data types is very real.
The usual arithmetic conversions have seen unfair criticism in my opinion. The effective type rules are a mess, to some degree also because compilers invented their own rules or simply ignore the C standard. But this is a different topic. From a programmer's point of view, the rule that each variable should have one type that is used consistently is enough to know to stay on the safe side.
There they allowed nested functions, but also what they termed "full function values", being a form of fat pointer. Certainly I came across it in High-C v1.7 in 1990, and the full manual for an earlier version (1.5?) from around '85 can be found on Bitsavers.
It had a syntax like:
The above is an extract from their language reference, which you can find here: https://archive.org/download/Yoshizuki_UnRenamed_Files__D-V/...
I believe the High-C compiler with this support is still available, for modern embedded CPUs.
https://www.synopsys.com/dw/ipdir.php?ds=arc-metaware-mx
With "static", it is implemented as an ordinary function, but the name is local to the function that contains it; it cannot access stuff within the function containing it unless those things are also declared as "static".
With "register", the address of the function cannot be taken, and if the function accesses other stuff within the function that contains it then the compiler will add additional arguments to the function so that its type does not necessarily match the type which is specified in the program.
This is not good enough for many uses though, so having the other extensions would also be helpful (possibly including implementing Apple Blocks in GCC).
(I can't be bothered to run his benchmarks)