Closures as Win32 Window Procedures

Posted 20 days ago · Active 10 days ago

Submitted by ibobev · 95 points · 24 comments

Source: nullprogram.com · Tech Discussion story

Tone: informative, neutral · Debate: 20/100

Key topics: Win32, Programming, Closures

The article on using closures as Win32 window procedures drew a discussion weighing the pros and cons of the approach. Some, like cyberax, point out that the technique was used in the past, for example in the Active Template Library, but was ultimately deemed a bad idea because it requires generating executable code, which interferes with non-executable (NX) memory protections. Others, like kmeisthax, are surprised that Microsoft's answer to window procedures lacking a context pointer was to JIT-compile a trampoline that holds the pointer. As the conversation unfolds, it becomes clear that the technique has appeared in several places, including Delphi in the 90s, and that some developers, like Philpax, use similar approaches in their own work, while others, like RossBencina, question the practicality of building executable trampolines at runtime.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

Peak period: Day 1 · Average per period: 5.8 comments

Comment distribution: 29 data points, based on 29 loaded comments

Key moments

01 Story posted: Dec 13, 2025 at 6:39 PM EST (20 days ago)
02 First comment: Dec 13, 2025 at 8:10 PM EST (2h after posting)
03 Peak activity: 23 comments on Day 1, the hottest window of the conversation
04 Latest activity: Dec 23, 2025 at 5:22 PM EST (10 days ago)


Discussion (24 comments)


cyberax

20 days ago

2 replies

This approach was used in ATL/WTL (Active Template Library, Windows Template Library) in the early 2000s. It was a bad idea, because you need to generate executable code, interfering with NX-bit memory protection.

Windows actually had a workaround in its NX-bit implementation that recognized the byte patterns of these trampolines from the fault handler: https://web.archive.org/web/20090123222148/http://support.mi...

kmeisthax

20 days ago

3 replies

I'm genuinely surprised Microsoft's attitude towards "wndprocs don't have a context pointer" was "let's JIT compile a trampoline to hold the context pointer" and not to add support for a five-parameter wndproc into USER.dll, or have a wrapper that grabs GWLP_USERDATA and copies it to the register this lives in.

pjc50

19 days ago

2 replies

Doesn't x32 only have four registers available in the calling convention, AX-DX?

ack_complete

19 days ago

The stdcall calling convention used by APIs and API callbacks on Windows x86 doesn't use registers at all; all parameters are passed on the stack. MSVC does support thiscall/fastcall/vectorcall conventions that pass some values in registers, but the system APIs and COM interfaces all use stdcall.

Windows x64 and ARM64 do use register passing, with 4 registers for x64 (rcx/rdx/r8/r9) and 8 registers for ARM64 (x0-x7). Passing an additional parameter on the stack would be cheap compared to the workarounds that everyone has to do now.

pwdisswordfishy

19 days ago

The x32 ABI uses the same fastcall convention as the regular x86-64 ABI. It's mostly syscall numbers that are affected by the shrunk pointers.

tonyedgecombe

20 days ago

[delayed]

Const-me

19 days ago

> I'm genuinely surprised Microsoft's attitude towards "wndprocs don't have a context pointer"

They designed window classes to be reusable, and assumed many developers were going to reuse window classes across windows.

Consider the following use case. A programmer creates a window class for a custom control and registers the class, designs a dialog template with multiple of these custom controls in a single dialog, then creates the dialog by calling DialogBoxW or similar.

These custom controls are created automatically, several at once, so it's hard to provide a context pointer for each control.

barrkel

19 days ago

It was also used by Delphi in the 90s.

RossBencina

20 days ago

2 replies

> This is more work than going through GWLP_USERDATA

Indeed, aside from a party trick, why build an executable trampoline at runtime when you can store and retrieve the context, or a pointer to the context, with SetWindowLong() / GetWindowLong() [1]?
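For comparison with the trampoline, here is a minimal sketch of the store-and-retrieve approach. It is a portable simulation, not real Win32 code: `Window`, `WndProc`, `set_userdata` and `get_userdata` are hypothetical stand-ins for `HWND`, `WNDPROC` and `SetWindowLongPtr`/`GetWindowLongPtr` with `GWLP_USERDATA`, and `kClick` is an invented message id.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for Win32 types; this is not the <windows.h> API.
using LRESULT = std::intptr_t;
struct Window;  // plays the role of an HWND
using WndProc = LRESULT (*)(Window*, unsigned msg, std::uintptr_t wp, std::intptr_t lp);

struct Window {
    WndProc proc = nullptr;
    std::intptr_t userdata = 0;  // analogous to the GWLP_USERDATA slot
};

// Mock equivalents of SetWindowLongPtr/GetWindowLongPtr(GWLP_USERDATA).
void set_userdata(Window* w, void* p) { w->userdata = reinterpret_cast<std::intptr_t>(p); }
void* get_userdata(Window* w) { return reinterpret_cast<void*>(w->userdata); }

// Per-window state the procedure needs: the "context".
struct Counter {
    std::string name;
    int clicks = 0;
};

constexpr unsigned kClick = 1;  // hypothetical message id

// One static procedure shared by every window of the class; the context is
// fetched from the window itself on every message instead of being captured.
LRESULT counter_proc(Window* w, unsigned msg, std::uintptr_t, std::intptr_t) {
    auto* self = static_cast<Counter*>(get_userdata(w));
    if (self && msg == kClick) return ++self->clicks;
    return 0;  // stands in for DefWindowProc
}
```

The cost of this design is one lookup per message, and the context is unavailable for any message delivered before the slot is set.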

Slightly related: in my view Win32 windows are a faithful implementation of the Actor Model. The window proc of a window is mutable; it represents the current behavior, and can be changed in response to any received message. While I haven't personally seen this used in Win32 programs, it is a powerful feature, as it allows for implementing interaction state machines in a very natural way (the same way that Miro Samek promotes in his book).

[1] https://learn.microsoft.com/en-us/windows/win32/api/winuser/...
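The "mutable behavior" idea can be illustrated without Win32 at all. In this sketch, assumed rather than taken from any real program, `Actor`, `Msg` and the `idle`/`dragging` handlers are hypothetical; swapping `behavior` plays the role of replacing a window procedure (for example with `SetWindowLongPtr` and `GWLP_WNDPROC`).

```cpp
#include <string>

// Hypothetical messages for a simple drag interaction.
enum class Msg { ButtonDown, MouseMove, ButtonUp };

struct Actor;
using Behavior = void (*)(Actor&, Msg);

struct Actor {
    Behavior behavior;   // the current "window procedure"
    int drag_moves = 0;
    std::string log;
};

void idle(Actor& a, Msg m);
void dragging(Actor& a, Msg m);

// Idle state: only a button press matters, and it changes the behavior.
void idle(Actor& a, Msg m) {
    if (m == Msg::ButtonDown) {
        a.log += "start;";
        a.behavior = dragging;  // later messages go to a different handler
    }
}

// Dragging state: count moves until the button is released.
void dragging(Actor& a, Msg m) {
    if (m == Msg::MouseMove) ++a.drag_moves;
    if (m == Msg::ButtonUp) {
        a.log += "drop;";
        a.behavior = idle;
    }
}

// Message delivery always goes through whatever behavior is current.
void send(Actor& a, Msg m) { a.behavior(a, m); }
```

Each state is a separate function, so no state variable or nested switch is needed; the current handler is the state.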

ack_complete

20 days ago

2 replies

There's an annoying corner case when using SetWindowLongPtr/GetWindowLongPtr() -- Windows sends WM_GETMINMAXINFO before WM_NCCREATE. This can be worked around with a thread local, but a trampoline inherently handles it. Trampolines are also useful for other Win32 user functions that don't have an easy way to store context data, such as SetWindowsHookEx(). They're also slightly faster, though GetWindowLongPtr() at least seems able to avoid a syscall.
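The thread-local workaround can be sketched as follows. This is a simulation under stated assumptions: `mock_create_window` is a hypothetical stand-in for `CreateWindowEx` that delivers messages synchronously with the min/max query first, and `Window`, `Context` and `g_pending` are illustrative names. The procedure adopts the pending context on whichever message arrives first, so the early `WM_GETMINMAXINFO` analogue is not lost.

```cpp
#include <vector>

// Hypothetical message ids mirroring the order described in the comment.
enum Msg : unsigned { kGetMinMaxInfo, kNcCreate, kCreate };

struct Window;
using Proc = void (*)(Window*, Msg);

struct Window {
    Proc proc = nullptr;
    void* userdata = nullptr;  // analogous to the GWLP_USERDATA slot
};

struct Context { std::vector<Msg> seen; };

// Context for the window currently being created on this thread.
thread_local Context* g_pending = nullptr;

void window_proc(Window* w, Msg m) {
    if (!w->userdata && g_pending) {
        w->userdata = g_pending;  // adopt the context on the very first message
        g_pending = nullptr;
    }
    if (auto* ctx = static_cast<Context*>(w->userdata)) ctx->seen.push_back(m);
}

// Stand-in for CreateWindowEx: the min/max query arrives before NCCREATE.
void mock_create_window(Window* w, Proc p) {
    w->proc = p;
    p(w, kGetMinMaxInfo);
    p(w, kNcCreate);
    p(w, kCreate);
}

// Caller side: publish the context before creation, clear it afterwards.
void create_with_context(Window* w, Proc p, Context* ctx) {
    g_pending = ctx;
    mock_create_window(w, p);
    g_pending = nullptr;
}
```

Using `thread_local` keeps two threads creating windows at the same time from picking up each other's context.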

The code as written, though, is missing a call to FlushInstructionCache() and might not work in processes that prohibit dynamic code generation. An alternative is to just pregenerate an array of trampolines in a code segment, each referencing a mutable pointer in a parallel array in the data segment. These can be generated straightforwardly with a little template magic. This adds size to the executable unlike an empty RWX segment, but doesn't run afoul of any dynamic codegen restrictions or require I-cache flushing. The number of trampolines must be predetermined, but the RWX segment has the same limitation.
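The pregenerated-trampoline alternative can be sketched with a little template magic, as described. Assumptions: `Callback` is a simplified one-argument stand-in for `WNDPROC`, the pool size `kSlots` is arbitrary, and `bind_closure`/`release_closure` are hypothetical helper names. Every trampoline is an ordinary compiled function, so nothing is written to executable memory. The sketch is not thread-safe.

```cpp
#include <array>
#include <cstddef>
#include <utility>

using Callback = long (*)(int msg);  // simplified stand-in for WNDPROC

struct Closure {
    long (*fn)(void* ctx, int msg) = nullptr;  // the real handler
    void* ctx = nullptr;                       // its context pointer
};

constexpr std::size_t kSlots = 4;  // fixed up front, like the reserved RWX region
Closure g_slots[kSlots];           // data segment: mutable, never executed

// Trampoline I forwards to whatever closure currently occupies slot I.
template <std::size_t I>
long trampoline(int msg) {
    return g_slots[I].fn(g_slots[I].ctx, msg);
}

template <std::size_t... I>
constexpr std::array<Callback, sizeof...(I)> make_table(std::index_sequence<I...>) {
    return {{&trampoline<I>...}};
}

// Code-segment table: one distinct plain function pointer per slot.
constexpr auto g_table = make_table(std::make_index_sequence<kSlots>{});

// Claim a free slot and return a plain function pointer bound to (fn, ctx).
Callback bind_closure(long (*fn)(void*, int), void* ctx) {
    for (std::size_t i = 0; i < kSlots; ++i) {
        if (!g_slots[i].fn) {
            g_slots[i] = {fn, ctx};
            return g_table[i];
        }
    }
    return nullptr;  // pool exhausted
}

// Return a slot to the pool so its trampoline can be reused.
void release_closure(Callback cb) {
    for (std::size_t i = 0; i < kSlots; ++i)
        if (g_table[i] == cb) g_slots[i] = {};
}
```

As the comment notes, the number of trampolines is fixed at compile time; `bind_closure` returns `nullptr` once all slots are taken.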

201984

19 days ago

1 reply

FlushInstructionCache isn't needed on x86_64. I-cache and D-cache are coherent.

ack_complete

18 days ago

I'm not convinced this is always guaranteed for a Windows x64 program. When running on bare x64 hardware, FlushInstructionCache() does seem to be an (inefficient) noop on Windows 11 x64, but when running in emulation on Windows 11 ARM64, it's running a significantly larger amount of ARM64 native code -- it looks like it might be ensuring that stale JIT code is flushed.

rovingeye

19 days ago

I wasn't aware of the thread-local trick. I solve this problem by not setting WS_VISIBLE and calling SetWindowPos & ShowWindow after CreateWindow returns (this solves some other problems as well).

timokr

19 days ago

The combination `GWLP_USERDATA` to pass state and `GWLP_WNDPROC` to update the actual wnd procedure is what I used in my [rust wrapper](https://github.com/timokroeger/winmsg-executor/blob/main/src...).

This two-step approach is the only way I found to use Rust closures for wndproc without double allocation and additional indirection.
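A rough C++ analogue of the two-step idea (not a translation of the linked Rust code): store the closure's address in the user-data slot, then install a procedure instantiated for that exact closure type. `Window`, `Proc` and `attach_closure` are hypothetical stand-ins for `HWND`, `WNDPROC` and the `GWLP_USERDATA`/`GWLP_WNDPROC` updates. Because the procedure knows the closure type statically, no type-erased wrapper such as `std::function` (and no second heap allocation) is needed.

```cpp
struct Window;
using Proc = long (*)(Window*, unsigned msg);

struct Window {
    Proc proc = nullptr;       // analogous to GWLP_WNDPROC
    void* userdata = nullptr;  // analogous to GWLP_USERDATA
};

// One plain function per closure type F; the call through userdata is direct.
template <class F>
long closure_proc(Window* w, unsigned msg) {
    return (*static_cast<F*>(w->userdata))(msg);
}

// Step 1: publish the context. Step 2: swap in the typed procedure.
// The closure is borrowed, so it must outlive the window.
template <class F>
void attach_closure(Window* w, F* closure) {
    w->userdata = closure;
    w->proc = &closure_proc<F>;
}
```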

userbinator

19 days ago

1 reply

This somewhat reminds me of the old MakeProcInstance mechanism in Win16, which was quickly rendered obsolete by someone who made an important realisation: https://www.geary.com/fixds.html

Another seemingly underutilised feature closely related to {Get,Set}WindowLong is cbClsExtra/cbWndExtra which lets you allocate additional data associated with a window, and store whatever you want there. The indices to the GWL/SWL function are quite revealing of how this mechanism works:

https://learn.microsoft.com/en-us/windows/win32/api/winuser/...
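The index convention can be illustrated with a small simulation: negative indices name built-in fields, while non-negative ones are byte offsets into the block reserved by `cbWndExtra`. `MockWindow`, `set_long_ptr` and `get_long_ptr` are hypothetical; only the value -21 for `GWLP_USERDATA` mirrors the real header, and the range check is this sketch's own rule.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int kUserData = -21;  // same value as GWLP_USERDATA

struct MockWindow {
    std::intptr_t userdata = 0;
    std::vector<unsigned char> extra;  // sized from cbWndExtra at creation
    explicit MockWindow(std::size_t cb_wnd_extra) : extra(cb_wnd_extra) {}
};

// Non-negative index: byte offset into the extra bytes.
bool set_long_ptr(MockWindow& w, int index, std::intptr_t value) {
    if (index == kUserData) { w.userdata = value; return true; }
    if (index < 0 || static_cast<std::size_t>(index) + sizeof value > w.extra.size())
        return false;  // out of range: reject
    std::memcpy(w.extra.data() + index, &value, sizeof value);
    return true;
}

std::intptr_t get_long_ptr(const MockWindow& w, int index) {
    if (index == kUserData) return w.userdata;
    std::intptr_t value = 0;
    if (index >= 0 && static_cast<std::size_t>(index) + sizeof value <= w.extra.size())
        std::memcpy(&value, w.extra.data() + index, sizeof value);
    return value;
}
```

With this layout a window class can keep its own pointers at fixed offsets and leave the user-data slot free, as rovingeye describes.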

rovingeye

19 days ago

All my window classes use cbWndExtra, and I leave GWLP_USERDATA for the user who is creating windows.

pjmlp

19 days ago

1 reply

Or I don't know, just use C++ lambdas instead?

LegionMammal978

19 days ago

1 reply

You can't turn a capturing C++ lambda into a WNDPROC, which is an ordinary function pointer. You'd still have to ferry the lambda via a context pointer, which is what this blog post and the other solutions in the comments are all about.
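The "ferry the lambda via a context pointer" pattern looks roughly like this. `Callback`, `Registration`, `register_callback` and `fire` are hypothetical stand-ins for a C-style API that, unlike `WNDPROC`, does accept a `void*` context; with a real window procedure that pointer would still have to come from somewhere such as `GWLP_USERDATA`.

```cpp
// Hypothetical C-style callback API that carries a context pointer.
using Callback = int (*)(void* ctx, int value);

struct Registration { Callback fn; void* ctx; };

Registration register_callback(Callback fn, void* ctx) { return {fn, ctx}; }
int fire(const Registration& r, int value) { return r.fn(r.ctx, value); }

// Captureless thunk instantiated per lambda type; it restores the real type.
template <class F>
int thunk(void* ctx, int value) {
    return (*static_cast<F*>(ctx))(value);
}

// The capturing lambda travels as ctx; f must outlive the registration.
template <class F>
Registration register_lambda(F& f) {
    return register_callback(&thunk<F>, &f);
}
```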

pjmlp

19 days ago

2 replies

You kind of can, that is one of their design points, naturally you need to move the context into the body and know what to cast back from.

I guess I need to prove a point on my Github during next week.

rovingeye

19 days ago

1 reply

I assume by "move the context into the body" you mean using GetWindowLongPtr? Why not just use a static wndproc at that point?

pjmlp

19 days ago

1 reply

I mean using a static C++ lambda that moves the context into the lambda body via capture specifier.

C++ lambdas are basically old-style C++ functors that are compiler-generated, with the calling address being the operator().
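The functor view also explains the disagreement in this thread. A minimal sketch (not a reconstruction of the linked repository; `Fn` is a simplified stand-in for `WNDPROC` and `HandWrittenClosure` is illustrative) showing that only a lambda with an empty capture list converts to a plain function pointer:

```cpp
#include <type_traits>

using Fn = long (*)(unsigned msg);  // simplified stand-in for WNDPROC

// Roughly what the compiler generates for [scale](unsigned msg) { ... }.
struct HandWrittenClosure {
    long scale;  // the captured state
    long operator()(unsigned msg) const { return static_cast<long>(msg) * scale; }
};

long demo() {
    long scale = 3;
    auto capturing = [scale](unsigned msg) -> long { return static_cast<long>(msg) * scale; };
    auto captureless = [](unsigned msg) -> long { return static_cast<long>(msg) * 3; };

    static_assert(!std::is_convertible<decltype(capturing), Fn>::value,
                  "a capture removes the function-pointer conversion");
    static_assert(std::is_convertible<decltype(captureless), Fn>::value,
                  "an empty capture list allows it");
    static_assert(!std::is_convertible<HandWrittenClosure, Fn>::value,
                  "a hand-written functor has no such conversion either");

    Fn fp = captureless;  // fine: there is no state to carry
    HandWrittenClosure manual{scale};
    return capturing(2) + manual(2) + fp(2);
}
```

Once the capture list is non-empty, the state has to reach the function some other way, which is the context-pointer problem discussed above.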

rovingeye

19 days ago

1 reply

That doesn't sound like a valid wndproc

pjmlp

13 days ago

1 reply

It certainly is,

https://github.com/pjmlp/LambdaWndProc/blob/main/LambdaWndPr...

rovingeye

10 days ago

It looks like you missed the part where you "move the context into the lambda body via capture specifier."

LegionMammal978

19 days ago

1 reply

If you mean that you can ship a C++ lambda through a static C function via a context pointer, of course you can do that, it's not that special. Rust programs also have to do that trick all the time to turn a closure into a C callback. The primary problem with WNDPROC is how to get that context pointer in the first place, to which there are a few different possible solutions.

pjmlp

13 days ago

No need for context pointer,

https://github.com/pjmlp/LambdaWndProc/blob/main/LambdaWndPr...

stevefan1999

19 days ago

I hate to say it (and I know a lot of C apologists will downvote it), but C has no native closures: all you have is a function pointer, and you need to manually add a "context" pointer to make it a closure in the strict (textbook) sense. That's because C does not have the concept of "data ownership", only automatic memory (on the stack or in registers) or manual memory (malloc/sbrk'd blocks), but a (again, textbook) closure requires access to the data of the caller/"parent"/enclosing scope [^1].

And that's why I generally don't consider C to have closures, and why it requires a JIT/dynamic code generation approach like the one this article takes (using shadow stacks). There is also a GNU C extension for local (nested) functions, but it is not in ISO C, and obviously won't be in the next decade or so.

[^1]: https://en.wikipedia.org/wiki/Closure_(computer_programming)
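The manually built closure the comment refers to is a function pointer paired with an explicit environment pointer. A minimal sketch in a C-like style (written as C++ to match the other examples; `Closure`, `CounterEnv` and `make_counter` are illustrative names):

```cpp
#include <cstdlib>

struct CounterEnv { int count; int step; };

// Code plus captured data: the pair a language with closures builds for you.
struct Closure {
    int (*fn)(void* env);
    void* env;  // owned by whoever created the closure
};

static int counter_next(void* env) {
    CounterEnv* e = static_cast<CounterEnv*>(env);
    e->count += e->step;
    return e->count;
}

Closure make_counter(int step) {
    CounterEnv* e = static_cast<CounterEnv*>(std::malloc(sizeof *e));
    if (!e) std::abort();  // keep the sketch simple on allocation failure
    e->count = 0;
    e->step = step;
    return Closure{counter_next, e};
}

int call(Closure c) { return c.fn(c.env); }
void free_closure(Closure c) { std::free(c.env); }
```

Ownership of `env` is explicit here, which is exactly the bookkeeping the comment says C leaves to the programmer.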

Philpax

20 days ago

Hah! I usually allocate trampolines at runtime, as the article suggests, but reserving R/W space for them within the application's memory space is a cute trick.

Probably not useful for most of my use cases (I'm usually injecting a payload, so I'd still have the pointer-distance issue between the executable and my payload), but it's still potentially handy. Will have to keep that around!

solarkraft

18 days ago

> Taking this idea further, I’d like to generate these new functions on demand at run time akin to a JIT compiler

This is cool, but isn’t runtime code generation pretty frowned upon nowadays?


ID: 46259334 · Type: story · Last synced: 12/15/2025, 6:40:29 PM
