Pointer Tagging in C++: the Art of Packing Bits Into a Pointer
vectrx.substack.com · Tech story
Key topics
C++
Pointer Tagging
Low-Level Programming
Memory Management
The article discusses the technique of packing bits into a pointer in C++, and the discussion revolves around its usefulness, potential pitfalls, and alternatives.
Snapshot generated from the HN discussion
Key moments
- Story posted: Sep 21, 2025 at 9:46 PM EDT
- First comment: Sep 21, 2025 at 11:58 PM EDT, 2h after posting
- Peak activity: 25 comments in 4-6h
- Latest activity: Sep 22, 2025 at 3:20 PM EDT
ID: 45328335 · Type: story · Last synced: 11/20/2025, 5:27:03 PM
I've done my own exploration of what I can get away with across 64-bit x86 and ARM in this regard. It has been a while, but the maximum number of bits I have been able to determine are reliably taggable across all environments and use cases is six. Can you get away with more? Probably, but there are identifiable environments where it will explode if you do. That may not apply to your use case.
Reliable pointer tagging is not trivial.
Would be great to hear some actionable details.
Note also that this is the intersection of bits that are available on both ARM and x86. If you want it to be portable, you need both architectures. Just because ARM64 doesn’t use a bit doesn’t mean that x86 doesn’t and vice versa.
Both x86 and ARM have proposed standards for pointer tagging in the high bits. However, those bits don’t perfectly overlap. Also, some platforms don’t fully conform to this reservation of high bits for pointer tagging, so there is a backward compatibility issue.
Across all of that, I found six high bits that were guaranteed to be safe for all current and future platforms. In practice you can probably use more, but there is a portability risk.
Your mask/tag doesn't need to use the same bits on x86 and ARM to be portable, though.
My perspective is biased by the requirements of high-assurance systems.
Otherwise they're right: it's not the intersection that matters, just the total number of bits available.
You can’t cram 8 bits of tag in 7 bits if the latter is all the architecture has available. Hence why you have to design for the smallest reliable target.
Kinda weird to materialize pointers across architectures rather than indices.
But in any case surely the relevant consideration is “fewest number of free pointer bits on any single platform”. And not “intersection of free bits across all platforms”. Right?
IIRC, the 6 high bits I mentioned was the intersection of every tag reservation implementation and/or proposal. In other words, it was the set of bits that Intel, AMD, and ARM agreed would be safe for tagging for the foreseeable future.
Fewer bits than I would like and can probably exploit, but nonetheless the number I can reasonably rely on. If a consistent standard is ever agreed upon, the number of bits may increase.
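To make the six-bit budget concrete, here is a minimal C++ sketch that packs a 6-bit tag into the high bits of a 64-bit pointer and strips it in software before use. The thread doesn't say which six bits; this sketch assumes bits 57 to 62, which as far as I can tell is where Intel LAM57 (bits 62:57), AMD UAI (bits 63:57) and ARM TBI (bits 63:56) overlap, so treat that choice as an assumption. The names pack_high, tag_of and pointer_of are hypothetical. Because the tag is always masked off before the pointer is rebuilt, the sketch doesn't rely on any of those hardware features being enabled.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

// Assumption: the tag occupies bits 57..62 (six bits); bit 63 is left alone.
constexpr unsigned kTagShift = 57;
constexpr unsigned kTagBits = 6;
constexpr std::uint64_t kTagMask = ((std::uint64_t{1} << kTagBits) - 1) << kTagShift;

// Combine a pointer and a 6-bit tag into one 64-bit word.
template <typename T>
std::uint64_t pack_high(T* ptr, unsigned tag) {
    auto bits = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(ptr));
    assert((bits & kTagMask) == 0);      // the address must not already use the tag bits
    assert(tag < (1u << kTagBits));      // the tag must fit in six bits
    return bits | (static_cast<std::uint64_t>(tag) << kTagShift);
}

// Extract the tag.
inline unsigned tag_of(std::uint64_t packed) {
    return static_cast<unsigned>((packed & kTagMask) >> kTagShift);
}

// Rebuild the untagged pointer; always do this before dereferencing.
template <typename T>
T* pointer_of(std::uint64_t packed) {
    return reinterpret_cast<T*>(static_cast<std::uintptr_t>(packed & ~kTagMask));
}
```

The assert in pack_high catches the case where an address already has one of the chosen bits set, which is exactly the portability failure the thread is worried about.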
The original article already contains a note that "Some more recent x64 CPUs use 5-level paging, which would increase this number to 57 bits [0]"
Apparently server-level "Sunny Cove" Intel CPUs implement this extension [1].
[0]: https://en.wikipedia.org/wiki/Intel_5-level_paging
[1]: https://en.wikipedia.org/wiki/Sunny_Cove_(microarchitecture)
It is generally kinda sad, though, that there's no way to ask mmap (or an equivalent) for a result within a specific range of memory ((0; 1<<48) here). That would also be useful for JIT-compiled code that needs to call into precompiled functions.
The fact that Linux does this isn't nice; it's a huge mistake. It means the kernel can't automatically use 5-level page tables on processors that support them, because backwards-compatibility guarantees mean programs must remain able to use those bits in a pointer. AMD was smart enough to throw an exception if programs use those bits in a pointer (thus guaranteeing forward compatibility), so why Linux didn't follow suit is puzzling.
It is somewhat unfortunate to restrict the larger address space to explicit mmap requests, but I find it hard to imagine many programs actually needing more than 256TB of virtual memory that aren't already doing so in a very specialized way.
That is certainly much less frequent than the already infrequent (but very much existing, and significant! incl. both Firefox/SpiderMonkey and WebKit/JavaScriptCore) cases of programs relying on the top 16 bits being zero.
Then there's the option of mmap returning ranges from the low 2^48 while possible, and using larger addresses only when that space completely runs out. That should mean existing software works fine until it needs more than 256TB of address space, and software that checks that the top bits of mmap's result are zero isn't negatively affected either.
Really, the proper solution is to go back in time and give mmap separate lower- and upper-bound addresses.
It is if you use alignment bits. Not always possible if you don't control the data though.
Also: Caching.
I wonder whether you could use the MMU to make these upper bits irrelevant, by mapping every combination of tag bits to the same memory as the address with those bits clear.
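A minimal POSIX sketch of the aliasing such a scheme would rely on: the same page of a temporary file is mapped twice with MAP_SHARED, and a write through one virtual address is visible through the other. It assumes Linux or another POSIX system with a writable /tmp, and views_alias is a hypothetical name. Actually placing the views at addresses that differ only in their tag bits would mean choosing the addresses explicitly (for example with MAP_FIXED), which is platform-specific and not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>
#include <unistd.h>

// Maps one page of a temporary file at two virtual addresses and reports
// whether a write through the first view is visible through the second.
bool views_alias() {
    char path[] = "/tmp/multimap-XXXXXX";  // assumption: /tmp is writable
    int fd = mkstemp(path);
    if (fd < 0) return false;
    unlink(path);  // the file stays alive while the descriptor or mappings exist

    const auto len = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    if (ftruncate(fd, static_cast<off_t>(len)) != 0) {
        close(fd);
        return false;
    }

    void* a = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* b = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mappings keep the file contents alive
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return false;
    }

    // Two different virtual addresses, one physical page.
    static_cast<int*>(a)[0] = 42;
    const bool same = (a != b) && static_cast<int*>(b)[0] == 42;

    munmap(a, len);
    munmap(b, len);
    return same;
}
```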
Most of what hardware STM provided could be achieved by designing the software better.
Enabled on most iPhones even!
https://en.wikipedia.org/wiki/Tagged_architecture#Architectu...
Today in C++, the way to avoid technical UB for such tricks is to go via an integer type that is guaranteed to round-trip a pointer, such as std::uintptr_t. In practice the compiler won't generate anything different (types evaporate during compilation), but it isn't UB.
Say you have a pool of objects, when you allocate one you give out a pointer, but in the packed bits you add a 'generation' counter. Every time you free that object from the pool you increment the counter. When you dereference a pointer you can check the generation in the packed pointer and the one in the pool. If they don't match you're using a pointer to an invalid object.
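A minimal sketch of the generation-counter idea, assuming a 64-bit platform where user-space addresses fit in the low 48 bits, so a 16-bit generation can live in bits 48 to 63 of the handle. Pool, Object, acquire, release and get are hypothetical names; a real pool would also need to handle exhaustion and thread safety.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

struct Object { int value = 0; };  // hypothetical pooled type

class Pool {
public:
    using Handle = std::uint64_t;  // generation in bits 48..63, address below
    static constexpr std::size_t kSize = 4;

    // Returns a packed handle, or 0 if the pool is exhausted.
    Handle acquire() {
        for (std::size_t i = 0; i < kSize; ++i) {
            if (!used_[i]) {
                used_[i] = true;
                auto addr = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(&slots_[i]));
                assert((addr & ~kAddrMask) == 0);  // address must fit in 48 bits
                return (std::uint64_t{gen_[i]} << kGenShift) | addr;
            }
        }
        return 0;
    }

    // Frees the slot and bumps its generation; stale handles are ignored.
    void release(Handle h) {
        std::size_t i = live_index(h);
        if (i == kSize) return;
        used_[i] = false;
        ++gen_[i];  // std::uint16_t wraps to 0 after 65535
    }

    // Returns the object only if the handle's generation matches the slot's.
    Object* get(Handle h) {
        std::size_t i = live_index(h);
        return i == kSize ? nullptr : &slots_[i];
    }

private:
    static constexpr unsigned kGenShift = 48;
    static constexpr std::uint64_t kAddrMask = (std::uint64_t{1} << kGenShift) - 1;

    // Slot index if the handle is live and current, otherwise kSize.
    std::size_t live_index(Handle h) const {
        auto addr = static_cast<std::uintptr_t>(h & kAddrMask);
        auto base = reinterpret_cast<std::uintptr_t>(slots_.data());
        std::size_t i = (addr - base) / sizeof(Object);  // huge if addr < base
        if (i >= kSize || !used_[i]) return kSize;
        if (static_cast<std::uint16_t>(h >> kGenShift) != gen_[i]) return kSize;
        return i;
    }

    std::array<Object, kSize> slots_{};
    std::array<std::uint16_t, kSize> gen_{};
    std::array<bool, kSize> used_{};
};
```

With only 16 bits, a slot's generation wraps after 65,536 reuses, so a sufficiently old stale handle can match again; the check catches use-after-free with high probability, not with certainty.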
There's some Bugblatter Beast logic here. They assume that if they can't imagine a counter/pointer value ever recurring, malicious actors won't be able to imagine it either.
Here's an idea: how about fixing the buffer overflows and lack of input sanitization that allow code injection in the first place?
Even if you can squeeze in 32 tag bits (you would need to leverage the virtual memory system in some platform-specific way), counting to 4-billion-and-change takes very little time on modern CPUs, so you would realistically overflow the tag within seconds.
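Rough arithmetic for how quickly a 32-bit tag wraps, assuming (hypothetically) one tag increment per nanosecond, or a slower one per 20 ns for a full allocate/free cycle:

```latex
\begin{aligned}
2^{32} &= 4\,294\,967\,296 \ \text{increments} \\
2^{32} \times 1\ \text{ns} &\approx 4.3\ \text{s} \\
2^{32} \times 20\ \text{ns} &\approx 86\ \text{s}
\end{aligned}
```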
But if your system is aware of virtual memory, you can still do interesting things, like remapping a whole arena with a different "tag" and using that information in GC write barriers.
Another technique is to use the least significant bits: pointers to 8-byte-aligned (64-bit-aligned) objects always have their lowest 3 bits set to zero, so you can pack 3 bits of information there.
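A minimal sketch of low-bit tagging in C++. The alignment premise is made explicit with a static_assert, because only pointers to objects whose alignment is at least 8 bytes have three free low bits (a char* can point anywhere). LowTaggedPtr and Node are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Stores a 3-bit tag in the low bits of a pointer to an 8-byte-aligned T.
template <typename T>
class LowTaggedPtr {
    static_assert(alignof(T) >= 8, "T must be at least 8-byte aligned to free 3 low bits");

public:
    static constexpr std::uintptr_t kTagMask = 0x7;

    LowTaggedPtr(T* ptr, unsigned tag) {
        auto bits = reinterpret_cast<std::uintptr_t>(ptr);
        assert((bits & kTagMask) == 0);  // alignment guarantees these bits are zero
        assert(tag <= kTagMask);
        bits_ = bits | tag;
    }

    // Clear the tag bits to recover the original pointer value.
    T* get() const { return reinterpret_cast<T*>(bits_ & ~kTagMask); }
    unsigned tag() const { return static_cast<unsigned>(bits_ & kTagMask); }

    void set_tag(unsigned tag) {
        assert(tag <= kTagMask);
        bits_ = (bits_ & ~kTagMask) | tag;
    }

private:
    std::uintptr_t bits_;
};

struct alignas(8) Node { int value; };  // hypothetical payload type
```

The conversions go through std::uintptr_t rather than doing arithmetic on the pointer itself. Memory from malloc or operator new is aligned for any fundamental type (typically 16 bytes on x86-64 and AArch64), so heap objects usually have at least 3 free low bits even without alignas.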
And there's a third technique in C++, which is to use placement new to morph an object in place. This can be used to toggle between two classes with a boolean method, where one class's method returns true and the other's returns false. This creates per-object state that is really stored in the bits of the vptr. Obviously this can be extended to a whole set of classes to store more bits. I have used this successfully to store refcount values (each AddRef/Release using placement new to morph between RefCounted1/RefCounted2/../RefCountedN classes), implementing reference counting without a counter.
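A minimal sketch of the boolean case: Off and On derive from a common polymorphic base and add no data members, so destroying the object and placement-new'ing the other class into the same storage changes only the vptr. Flag, Off, On and toggle are hypothetical names. The sketch returns the pointer produced by placement new and expects callers to use it, since a pointer to the old object is not guaranteed to refer to the new one when its dynamic type differs.

```cpp
#include <cassert>
#include <new>

struct Flag {
    virtual ~Flag() = default;
    virtual bool is_set() const = 0;
};

// Neither state adds data, so the vptr is the only difference between them.
struct Off final : Flag { bool is_set() const override { return false; } };
struct On final : Flag { bool is_set() const override { return true; } };

static_assert(sizeof(Off) == sizeof(On), "both states must fit the same storage");
static_assert(alignof(Off) == alignof(On), "both states must share alignment");

// Destroys *f and constructs the opposite state in the same storage.
// Callers must switch to the returned pointer.
Flag* toggle(Flag* f) {
    const bool was_set = f->is_set();
    void* storage = dynamic_cast<void*>(f);  // start of the complete object
    f->~Flag();                              // virtual dtor ends the complete object's lifetime
    if (was_set) {
        return new (storage) Off;
    }
    return new (storage) On;
}
```

A caller owns suitably sized storage, for example alignas(On) unsigned char buf[sizeof(On)], constructs the initial state with new (buf) Off, and calls f->~Flag() when done. The refcount variant in the comment generalizes this to N classes, one per count value.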