Pointer Tagging in C++: the Art of Packing Bits Into a Pointer

Posted4 months agoActive4 months ago

signa11

69 points

51 comments

vectrx.substack.comTechstory

calmmixed

Debate

60/100

C++Pointer TaggingLow-Level ProgrammingMemory Management

Key topics

C++

Pointer Tagging

Low-Level Programming

Memory Management

The article discusses the technique of packing bits into a pointer in C++, and the discussion revolves around its usefulness, potential pitfalls, and alternatives.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

Peak period

4-6h

Avg / period

6.4

Comment distribution51 data points

Loading chart...

Based on 51 loaded comments

Key moments

01Story posted
Sep 21, 2025 at 9:46 PM EDT
4 months ago
Step 01
02First comment
Sep 21, 2025 at 11:58 PM EDT
2h after posting
Step 02
03Peak activity
25 comments in 4-6h
Hottest window of the conversation
Step 03
04Latest activity
Sep 22, 2025 at 3:20 PM EDT
4 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (51 comments)

Showing 51 comments

jandrewrogers

4 months ago

3 replies

As someone that uses pointer tagging, I must point out that this article is insufficiently defensive.

I've done my own exploration of what I can get away with across 64-bit x86 and ARM in this regard. It has been a while but the maximum number of bits that are reliably taggable across all environments and use cases that I have been able to determine is six. Can you get away with more? Probably yes, but there are identifiable environments where it will explode if you do so. That may not apply to your use case.

Reliable pointer tagging is not trivial.

forrestthewoods

4 months ago

4 replies

Can you share details? What modern platforms/environments does this not work on? Are you saying the intersection of available bits on all platforms is just 6? Or are there platforms that actually use 58 bits?

Would be great to hear some actionable details.

jandrewrogers

4 months ago

2 replies

It wasn’t anything clever. A couple years ago I did a dive into x86 and ARM literature to determine what bits of a pointer were in use in various environments or were on a roadmap to be used in the future. To be honest, it was more bits than I was expecting.

Note also that this is the intersection of bits that are available on both ARM and x86. If you want it to be portable, you need both architectures. Just because ARM64 doesn’t use a bit doesn’t mean that x86 doesn’t and vice versa.

Both x86 and ARM have proposed standards for pointer tagging in the high bits. However, those bits don’t perfectly overlap. Also, some platforms don’t fully conform to this reservation of high bits for pointer tagging, so there is a backward compatibility issue.

Across all of that, I found six high bits that were guaranteed to be safe for all current and future platforms. In practice you can probably use more but there is a portability risk.

loeg

4 months ago

1 reply

> Note also that this is the intersection of bits that are available on both ARM and x86. If you want it to be portable, you need both architectures. Just because ARM64 doesn’t use a bit doesn’t mean that x86 doesn’t and vice versa.

Your mask/tag doesn't need to use the same bits on x86 and ARM to be portable, though.

jandrewrogers

4 months ago

2 replies

It depends on the application, those bits may be materialized across architectures. The objective was maximizing safety in all contexts.

My perspective is biased by the requirements of high-assurance systems.

rhdjebejdbd

4 months ago

1 reply

It doesn't depend on the application unless the application shares the same pointers between x86 and arm which doesn't make any sense to me.

Otherwise they're right, it's not the intersection that matters but just the total bits available

jandrewrogers

4 months ago

1 reply

Eh? The values you can store in the tags are absolutely dependent on the number of available bits. That’s a simple type safety problem. This requirement is architecture independent.

You can’t cram 8 bits of tag in 7 bits if the latter is all the architecture has available. Hence why you have to design for the smallest reliable target.

injidup

4 months ago

You are agreeing with each other whilst still arguing

forrestthewoods

4 months ago

1 reply

If one platform uses the upper 56 bits and another uses the lower 56 bits that doesn’t mean you have 0 bits available for tagging. It means you have 8 bits and have to go through a conversation when moving from one platform to another. This is perhaps annoying but perfectly fine.

Kinda weird to materialize pointers across architectures rather than indices.

But in any case surely the relevant consideration is “fewest number of free pointer bits on any single platform”. And not “intersection of free bits across all platforms”. Right?

loeg

4 months ago

Right.

menaerus

4 months ago

1 reply

How about taking the advantage of max_align_t pointer alignment guarantees, which on x86-64 Linux (glibc) is 16-bytes? This would leave you with the 4 lowest bits to be used.

jandrewrogers

4 months ago

1 reply

Unfortunately, low bits vary considerably from a large number to zero. If you need a minimum number of bits to be reliably available then you have to look at the high bits of a pointer. Naturally, implementations use the low bits of alignment makes them available.

menaerus

4 months ago

Hm, what I have seen so far is that pointers returned by system malloc are usually aligned either to 8-byte boundary (windows) or 16-byte boundary (linux). I think jemalloc interprets the C standard guarantees a bit differently and will return the 8-byte aligned pointer for allocations whose size <= 8. But even this still leaves us with 3 LBSs to use.

nervoir

4 months ago

If you include ARM then PAC and MTE will consume a few of those precious bits. Don’t think any platforms use PAC for pointers to allocated objects though unless they’re determined to be exceptionally important like creds structure pointers in task structures in the kernel.

jandrewrogers

4 months ago

Page tables can optionally consume a very large number of bits on x86 (57?). Not every platform enables it but your code may run on a platform that uses it. There are a bunch of proposals from Intel, AMD, ARM, et al about officially recognizing some set of high bits in 64-bit pointers as tags in user space, with implementations. Unfortunately, these “standards” don’t agree on which high bits can be safely reserved for tagging.

IIRC, the 6 high bits I mentioned was the intersection of every tag reservation implementation and/or proposal. In other words, it was the set of bits that Intel, AMD, and ARM agreed would be safe for tagging for the foreseeable future.

Fewer bits than I would like and can probably exploit, but nonetheless the number I can reasonably rely on. If a consistent standard is ever agreed upon, the number of bits may increase.

joz1-k

4 months ago

> are there platforms that actually use 58 bits?

The original article already contains a note that "Some more recent x64 CPUs use 5-level paging, which would increase this number to 57 bits [0]"

Apparently server-level "Sunny Cove" Intel CPUs implement this extension [1].

[0]: https://en.wikipedia.org/wiki/Intel_5-level_paging

[1]: <https://en.wikipedia.org/wiki/Sunny_Cove_(microarchitecture)>

dzaima

4 months ago

1 reply

At least on Linux, you're guaranteed to get the top 16 bits free on at least x86[1] and ARM[2]. Maybe other OSes are less nice, but generally an OS doesn't really have a reason to force down the full 57-or-whatever-bit range before the program actually requests 256 terrabytes of virtual address space.

It is generally kinda sad though that there's not a way to request from mmap or equivalent that the result is in a specific range of memory (in (0; 1<<48) here). Would be useful for JIT-compiling code that needs to call into precompiled functions too.

[1]: https://www.kernel.org/doc/html/v5.8/x86/x86_64/5level-pagin...

[2]: https://www.kernel.org/doc/html/v5.8/arm64/memory.html#bit-u...

bigstrat2003

4 months ago

1 reply

> Maybe other OSes are less nice, but generally an OS doesn't really have a reason to force down the full 57-or-whatever-bit range before the program actually requests 256 terrabytes of virtual address space.

The fact that Linux does this isn't nice, it's a huge mistake. It means that the kernel can't automatically use 5-level page tables on processors that support it, because backwards compatibility guarantees mean the programs must be able to use those bits in a pointer. AMD was smart enough to throw an exception if programs use those bits in a pointer (thus guaranteeing forward compatibility), so why Linux didn't follow suit is puzzling.

dzaima

4 months ago

Indeed most architectures error if the top bits are non-zero, but that's trivially worked around by just a manual `x & 0xffffffffffff` or `x << 16 >> 16` or similar, and software already utilizes that (and indeed must do that to not immediately break on x86).

It is somewhat unfortunate to just force the larger address space to specific mmap usage, but it's hard for me to imagine that many programs actually needing more than 256TB of virtual memory that aren't doing so in a very-specialized way.

Certainly much less frequent than the already-infrequent (but very much existing, and significant! incl. both Firefox/SpiderMonkey and WebKit/JavaScriptCore) cases of programs utilizing top 16 bits being zeroes.

Then there's the option of mmap returning ranges from from the low 2^48 while possible, and using larger addresses only when that completely runs out; should mean existing software works fine before it needs more than 256TB of RAM, and, if the software checks the top bits of mmap's result being zeroes, it's not negatively affected anyway.

Really the proper solution is to go back in time and make mmap have separate lower and upper bounds addresses though.

orlp

4 months ago

> Reliable pointer tagging is not trivial.

It is if you use alignment bits. Not always possible if you don't control the data though.

sema4hacker

4 months ago

4 replies

In the 70's when memory was always small and expensive, I had to keep things as packed and tight as possible. But now memory is so huge and cheap that it's been a long time since I had to worry about things like packing bits, which is incredibly bug-prone anyway.

RossBencina

4 months ago

1 reply

Cache is not huge. If you want good performance you need to keep your working set in cache. Sometimes that requires storing offsets (say 32-bit) rather than pointers. If you can also pack info into spare bits, all the better. Perhaps not everyone needs to care about this, but there are still reasons to care.

gpderetta

4 months ago

Indeed, you might have hundreds of GBs, or even TBs of memory, but caches are still (relatively) small. L1 is going to be in the order of 64k of memory!

dh2022

4 months ago

A benefit for packing pointers is when the data needed is already packed-this will avoid a pointer reference.

archi42

4 months ago

Depending on what you're building, memory is still scarce. Of course in the past that was much more common, but even today just doubling memory doesn't always fly: Embedded is still not a land of plenty; and not every computational expensive task is a money earner, which might put you in a difficult spot when arguing with controlling about a bigger server or at least a memory upgrade.

Also: Caching.

orlp

4 months ago

I take it you never wrote code involving atomic pointers. Regardless of memory usage, a lot of platforms only provide single-word atomics (efficiently), making bitpacking crucial for lockfree algorithms.

gblargg

4 months ago

2 replies

What's old is new again. The original 68000 processor only had a 24-bit physical address bus, so the MacOS used the upper 8 bits for tag information, and didn't even need to clear it when accessing. Once they started using CPUs and more RAM that needed these upper bits, they had to make "32-bit clean" versions of programs.

I wonder whether you could use the MMU to ignore these upper bits, by mapping each combination of bits to the same address as with them clear.

scheme271

4 months ago

Pointer tagging has been around for a while in the Java / JVM context. It lets you add information for bookkeeping when doing GC operations. There might be some usages elsewhere (unboxed vs boxed values, etc).

praptak

4 months ago

Pointer tagging is at least as old as Lisp: https://www.snellman.net/blog/archive/2017-09-04-lisp-number...

mlhpdx

4 months ago

1 reply

Brings back memories. I cut my teeth in C++ working on a system that used pointer tagging and transactional memory. Good times. That mind mending experience perhaps made my career.

brcmthrowaway

4 months ago

2 replies

Whatever happened to STM?

RossBencina

4 months ago

Clojure has transactional memory primitives. I've no idea how often they are used.

jandrewrogers

4 months ago

My understanding is that it didn’t provide much practical benefit that could justify the cost. I was interested at the time but the zeitgeist was that no one could come up with a scenario where it was worth it.

Most of what hardware STM provided could be achieved by designing the software better.

fooker

4 months ago

1 reply

This sort of stuff is starting to have hardware support now, so you no longer have to perform this wizardry.

Enabled on most iphones even!

praptak

4 months ago

1 reply

Hurray, we are getting Lisp machines back!

https://en.wikipedia.org/wiki/Tagged_architecture#Architectu...

fooker

4 months ago

What's old is new again :)

int_19h

4 months ago

3 replies

I find it rather concerning that an article of this nature has not a single mention of the words "undefined behavior", even though that is something one has to be very much aware of when doing such shenanigans.

wavemode

4 months ago

1 reply

I don't think pointer tagging (in and of itself, when implemented correctly for a particular platform) is ever undefined behavior. Rather, it is implementation-defined behavior - the result of converting a pointer to an integer is not specified, and could in theory be anything, so you have to know how your compiler and platform behave.

RossBencina

4 months ago

1 reply

We pray that it is implementation defined. According to some [1] even arithmetic on intptr_t is UB, let alone bit operations.

[1] https://stackoverflow.com/a/64863331/2013747

gpderetta

4 months ago

Eh, that's about strictly conforming programs. In practice the vast majority of programs are going to use platform facilities and extensions (POSIX for example) while still being conforming.

tialaramex

4 months ago

IIUC Because C++ has pointer provenance but no explicit mechanism for tagging or clear instructions about what is guaranteed, this is basically always all Undefined Behaviour and yet in practice lots of this will work on real C++ compilers.

Today in C++ the way to avoid technical UB for such tricks will be going via an integer which is promised to have the same width as a pointer. In practice the compiler won't do anything different (types evaporate during compilation) but that isn't UB.

RossBencina

4 months ago

What exactly do you propose such articles say about potential UB? If you are aware of UB, you are also aware that (1) it is extremely difficult (if not impossible) to conclusively determine whether such a thing is UB without retaining a language lawyer at great expense, or ending up in an infinite regress of bickering, (2) simultaneously you know that in practice, arithmetic and bit-wise pointer manipulations need to work for many intended C use cases, so it is highly unlikely that such a thing will stop working in production compilers. Perhaps this places unrealistic trust in the ISO working groups, but the only valid alternative, in my view, is to give up and go back to writing low-level code in assembly where such things have well-defined semantics.

lentil_soup

4 months ago

2 replies

One cool technique is to use the packed bits to check if a raw pointer is still valid.

Say you have a pool of objects, when you allocate one you give out a pointer, but in the packed bits you add a 'generation' counter. Every time you free that object from the pool you increment the counter. When you dereference a pointer you can check the generation in the packed pointer and the one in the pool. If they don't match you're using a pointer to an invalid object.

RossBencina

4 months ago

1 reply

A similar technique is sometimes used to mitigate the ABA problem in lock-free data structures.

j_seigh

4 months ago

Only using way more bits. The original IBM lock-free stack algorithm assumed 32 bits was safe because it would take 100 years for a 32 bit counter to wrap at the time. Now it's less than 1 second.

There's some Bugblatter Beast logic here. They assume that if they can't imagine a counter/pointer value reuse ever reoccurring then malicious actors won't be able to imagine it either.

Here's an idea. How about fixing the buffer overflows and lack of data sanitation that allow code injection to occur in the first place.

simonask

4 months ago

1 reply

It's a cool technique, but only useful in very limited scenarios. In particular, it is unsuitable when the number of reallocations is unbounded, because the generation counter can easily overflow.

Even if you can squeeze in 32 tag bits (you would need to leverage the virtual memory system in some platform-specific way), counting to 4-billion-and-change takes no time on modern CPUs, so you would realistically overflow the tag in milliseconds.

But if your system is aware of virtual memory, you can still do interesting things, like remapping a whole arena with a different "tag" and use the information in GC write barriers.

lentil_soup

4 months ago

1 reply

ah for sure, the context will matter but I'd say that if you're generating and destroying 4 billion objects very fast then probably storing pointers to those objects is already a bad idea!

simonask

4 months ago

It's mostly a question of whether a user can accidentally or maliciously cause 4 billion objects to be constructed in the same location. Maybe it happens accidentally after the system has been running for a year, and everything comes tumbling down.

jbn

4 months ago

This uses the Most Significant Bits to pack some bits into pointers..

Another technique is to use the Least Significant Bits: indeed since pointers are 64 bit aligned, the lowest 3 bits are always zero, therefore it is possible to pack 3 bits of information there.

And there's a third technique in C++, which is to use placement new to morph an object in-place. This can be use to toggle between 2 classes with a boolean method, where one has this method return true and the other class has this method return false. This creates per-object state that really uses bits in the vptr to store this boolean state. Obviously this can be used with a whole set of classes to stores more bits. I have used this successfully to store values of refcount (each AddRef/Release using placement new to morph from RefCounted1/RefCounted2/../RefCountedN classes) to implement reference counting without a counter.

Panzerschrek

4 months ago

Sadly on 32-bit platforms there is not so many unused bits in pointers - typically only 3 least significant bits. But for some cases even 3 bits may be useful.

billpg

4 months ago

I wonder if the language could support a tagged pointer type, where you ask the compiler for n extra bits along with the pointer. If the pointer's internal structure has space for those bits, it uses them, otherwise the compiler allocates space for the pointer and the tag as a larger structure.

View full discussion on Hacker News

ID: 45328335Type: storyLast synced: 11/20/2025, 5:27:03 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN