Closures as Win32 Window Procedures
Key topics
The debate around using closures as Win32 window procedures has sparked a lively discussion, with commenters weighing the pros and cons of the approach. Some, like cyberax, point out that the technique was used in the past, for example in the Active Template Library, but was ultimately deemed a bad idea because it requires generating executable code, which conflicts with non-executable memory protections. Others, like kmeisthax, express surprise at Microsoft's historical workaround for the lack of a context pointer in window procedures: JIT-compiling a trampoline that holds the context pointer. As the conversation unfolds, it becomes clear that the technique has been used in various contexts, including Delphi in the 90s. Some developers, like Philpax, have used similar approaches in their own work, while others, like RossBencina, question the practicality of building executable trampolines at runtime.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 2h after posting
Peak period: 23 comments in Day 1
Avg / period: 5.8
Based on 29 loaded comments
Key moments
- Story posted: Dec 13, 2025 at 6:39 PM EST (20 days ago)
- First comment: Dec 13, 2025 at 8:10 PM EST (2h after posting)
- Peak activity: 23 comments in Day 1 (hottest window of the conversation)
- Latest activity: Dec 23, 2025 at 5:22 PM EST (10 days ago)
Windows actually had a workaround in its NX-bit implementation that recognized the byte patterns of these trampolines from the fault handler: https://web.archive.org/web/20090123222148/http://support.mi...
Windows x64 and ARM64 do use register passing, with 4 registers for x64 (rcx/rdx/r8/r9) and 8 registers for ARM64 (x0-x7). Passing an additional parameter on the stack would be cheap compared to the workarounds that everyone has to do now.
They designed window classes to be reusable, and assumed many developers would reuse window classes across windows.
Consider the following use case. Programmer creates a window class for a custom control, registers the class. Designs a dialog template with multiple of these custom controls in a single dialog. Then creates the dialog by calling DialogBoxW or similar.
These custom controls are created automatically, several at once, so it is hard to provide a context pointer for each control.
Indeed, aside from a party trick, why build an executable trampoline at runtime when you can store and retrieve the context, or a pointer to the context, with SetWindowLong() / GetWindowLong() [1]?
Slightly related: in my view Win32 windows are a faithful implementation of the Actor Model. The window proc of a window is mutable, it represents the current behavior, and can be changed in response to any received message. While I haven't personally seen this used in Win32 programs it is a powerful feature as it allows for implementing interaction state machines in a very natural way (the same way that Miro Samek promotes in his book.)
[1] https://learn.microsoft.com/en-us/windows/win32/api/winuser/...
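To make the two ideas in the comment above concrete, here is a minimal, portable sketch. It is not Win32 code: `Window`, `Proc`, `dispatch`, `DragState` and the message ids are hypothetical stand-ins. In a real program the `user_data` slot would correspond to `GWLP_USERDATA` and the `proc` slot to `GWLP_WNDPROC`, both read and written with `GetWindowLongPtr` / `SetWindowLongPtr` (the pointer-sized variants needed on 64-bit Windows).

```cpp
#include <cassert>

// Stand-in for a Win32 window. NOT the real API, just enough to model it.
struct Window;
using Proc = long (*)(Window& w, unsigned msg);

struct Window {
    Proc  proc;       // current behavior; Win32 analogue: GWLP_WNDPROC
    void* user_data;  // context slot;     Win32 analogue: GWLP_USERDATA
};

// Hypothetical message ids, for illustration only.
constexpr unsigned kButtonDown = 1;
constexpr unsigned kButtonUp   = 2;

// Per-window context that the procedures retrieve from the window itself.
struct DragState {
    int drags_completed = 0;
};

long idle_proc(Window& w, unsigned msg);
long dragging_proc(Window& w, unsigned msg);

// "Idle" behavior: a button press switches the window to the dragging behavior.
long idle_proc(Window& w, unsigned msg) {
    if (msg == kButtonDown) {
        w.proc = &dragging_proc;
    }
    return 0;
}

// "Dragging" behavior: a button release records the drag and switches back.
long dragging_proc(Window& w, unsigned msg) {
    if (msg == kButtonUp) {
        auto* state = static_cast<DragState*>(w.user_data);  // retrieve context
        ++state->drags_completed;
        w.proc = &idle_proc;
    }
    return 0;
}

// Delivers a message to whatever behavior is current.
long dispatch(Window& w, unsigned msg) {
    assert(w.proc != nullptr);
    return w.proc(w, msg);
}
```

The state machine lives entirely in which procedure is installed, so neither procedure needs a `switch` on a state variable, and the context travels with the window rather than with generated code.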
The code as written, though, is missing a call to FlushInstructionCache() and might not work in processes that prohibit dynamic code generation. An alternative is to just pregenerate an array of trampolines in a code segment, each referencing a mutable pointer in a parallel array in the data segment. These can be generated straightforwardly with a little template magic. This adds size to the executable unlike an empty RWX segment, but doesn't run afoul of any dynamic codegen restrictions or require I-cache flushing. The number of trampolines must be predetermined, but the RWX segment has the same limitation.
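A rough sketch of the pregenerated-trampoline idea, assuming a simplified callback shape instead of the real `WNDPROC` signature. All names here (`Callback`, `Handler`, `Slot`, `kSlots`, `bind_slot`) are made up for illustration; the commenter did not post code, so this is only one way the "template magic" might look.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical context-free callback shape, standing in for WNDPROC.
using Callback = long (*)(unsigned msg);

// The context-taking handler we actually want to run.
using Handler = long (*)(void* ctx, unsigned msg);

struct Slot {
    Handler handler = nullptr;
    void*   ctx     = nullptr;
};

// The number of trampolines is fixed at compile time.
constexpr std::size_t kSlots = 8;

// Mutable array in the data segment, parallel to the trampolines below.
std::array<Slot, kSlots> g_slots{};

// One ordinary compiled function per index; each knows its own slot.
template <std::size_t I>
long trampoline(unsigned msg) {
    const Slot& s = g_slots[I];
    return s.handler ? s.handler(s.ctx, msg) : 0;
}

// Expands 0..N-1 into a table of pointers to trampoline<0>..trampoline<N-1>.
template <std::size_t... Is>
constexpr std::array<Callback, sizeof...(Is)>
make_table(std::index_sequence<Is...>) {
    return {{ &trampoline<Is>... }};
}

// Table of plain function pointers; no code is generated at run time.
constexpr auto g_trampolines = make_table(std::make_index_sequence<kSlots>{});

// Claims a free slot and returns the context-free callback bound to it,
// or nullptr when every slot is already in use.
Callback bind_slot(Handler h, void* ctx) {
    assert(h != nullptr);
    for (std::size_t i = 0; i < kSlots; ++i) {
        if (g_slots[i].handler == nullptr) {
            g_slots[i] = Slot{h, ctx};
            return g_trampolines[i];
        }
    }
    return nullptr;
}
```

Each call to `bind_slot` hands back a distinct function pointer that can be passed wherever a context-free callback is expected. The cost is `kSlots` small functions in the binary, and nothing needs to be writable and executable at once.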
This two-step approach is the only way I found to use Rust closures for wndproc without double allocation and additional indirection.
Another seemingly underutilised feature closely related to {Get,Set}WindowLong is cbClsExtra/cbWndExtra, which lets you allocate additional data associated with a window and store whatever you want there. The indices to the GWL/SWL functions are quite revealing of how this mechanism works:
https://learn.microsoft.com/en-us/windows/win32/api/winuser/...
I guess I need to prove a point on my Github during next week.
C++ lambdas are basically old-style C++ functors that are compiler generated, with the calling address being the operator().
https://github.com/pjmlp/LambdaWndProc/blob/main/LambdaWndPr...
https://github.com/pjmlp/LambdaWndProc/blob/main/LambdaWndPr...
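For readers who have not seen the lambda/functor equivalence spelled out, here is a small sketch. It is not taken from the linked repository; `Callback`, `AddBase`, `make_lambda` and `stateless` are hypothetical names, and `Callback` is a simplified stand-in for `WNDPROC`. It also shows why the equivalence alone does not solve the window-procedure problem: only a capture-less lambda converts to a plain function pointer.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical context-free callback type, standing in for WNDPROC.
using Callback = long (*)(unsigned msg);

// Hand-written functor: roughly what the compiler generates for the
// capturing lambda in make_lambda() below.
struct AddBase {
    long base;
    long operator()(unsigned msg) const {
        return base + static_cast<long>(msg);
    }
};

// Capturing lambda: the captured value becomes a member of the closure object.
inline auto make_lambda(long base) {
    return [base](unsigned msg) { return base + static_cast<long>(msg); };
}

using Capturing = decltype(make_lambda(0));

// A capture-less lambda has no state, so it converts to a function pointer.
inline Callback stateless() {
    return [](unsigned msg) -> long { return static_cast<long>(msg) * 2; };
}

// A capturing lambda keeps its state in the object, so no such conversion exists.
static_assert(!std::is_convertible<Capturing, Callback>::value,
              "a capturing lambda is not convertible to a function pointer");

// Checks that the functor and the lambda compute the same result.
inline void check_equivalence(long base, unsigned msg) {
    assert(AddBase{base}(msg) == make_lambda(base)(msg));
}
```

The `static_assert` is the crux: state has to live somewhere, and a bare function pointer has no room for it, which is what pushes people toward per-window storage or trampolines.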
And that's why I generally don't consider C to have closures: getting them requires a JIT/dynamic code generation approach, as this article has actually done (using shadow stacks). There is also a hack in GNU C that introduces local (nested) functions as a kind of lambda, but it is not in ISO C, and obviously won't be in the next decade or so.
[^1]: https://en.wikipedia.org/wiki/Closure_(computer_programming)
Probably not useful for most of my use cases (I'm usually injecting a payload, so I'd still have the pointer-distance issue between the executable and my payload), but it's still potentially handy. Will have to keep that around!
This is cool, but isn’t runtime code generation pretty frowned upon nowadays?