My Favourite Small Hash Table
Key topics
The quest for an efficient small hash table has sparked a lively discussion, with commenters diving into the code and implementation details of the author's favorite data structure. One key debate revolves around the author's decision to store key-value pairs as a single `uint64_t` instead of a struct, with some suggesting it might be to avoid struct padding, while others point out that alignment constraints are the likely reason. The discussion sheds light on the intricacies of data structure design and the trade-offs involved in optimizing for performance. As commenters share their insights and alternative implementations, the thread becomes a fascinating exploration of the nuances of low-level programming.
Snapshot generated from the HN discussion
Discussion Activity
Active discussionFirst comment
2h
Peak period
15
0-6h
Avg / period
5.6
Based on 28 loaded comments
Key moments
- 01Story posted
Dec 9, 2025 at 9:47 AM EST
24 days ago
Step 01 - 02First comment
Dec 9, 2025 at 11:20 AM EST
2h after posting
Step 02 - 03Peak activity
15 comments in 0-6h
Hottest window of the conversation
Step 03 - 04Latest activity
Dec 11, 2025 at 5:07 PM EST
22 days ago
Step 04
Generating AI Summary...
Analyzing up to 500 comments to identify key contributors and discussion patterns
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
https://www.corsix.org/content/fast-crc32c-4k
https://github.com/corsix/fast-crc32/
You could work around that with a union or casts with explicit alignment constraints, but this is the shortest way to express that.
This constraint allows simply making a linear array of all the 4 billion values for each key with the key as index, which fits in 16 GiB. Another 500 MiB is enough to have a bit indicating present or not for each.
Perhaps strings as keys and values would give a more interesting example...
The hash table has the significant advantage of having a much smaller minimum size.
> Perhaps text strings as keys and values would give a more interesting example
Keep reading to "If keys and values are larger than 32 bits"
Yay, we've got an equivalent of SIB byte but as three (six?) separate opcodes. Well, sub-opcodes.
It's a shame though that Zcmp extension didn't get into RVA23 even as an optional extension.
Zcmp is only for embedded applications without D support.
You wouldn't want an instruction with up to 13 destinations in high performance designs anyways.
If you want load/store pair, we already have that, you can just interpret two adjacent 16-bit load or stores as a single 32-bit instruction.
Why not? Code density matters even in high-performance designs although I guess the "millicode routines" can help with that somewhat. Still, the ordering of stores/loads is undefined, and they are allowed to be re-done however many times, so... it shouldn't be onerous to implement? Expanding it into μops during the decoding stages seems straightforward.
I wouldn't say so, because if you want to be able to crack an instruction into up to N uops, now the second instruction could be placed in any slot from the 2nd to the 1+Nth and you now have to create huge shuffle hardware tk support this.
Apple for example can only crack instructions that generate up to 3 μops at decode (or before rename) anything beyond needs to be microcoded and stall decoding other instructions.
https://gemini.google.com/share/5add15a1c12f
One observation: the implementation is a simple in-memory, single-threaded hash table. It’s perfect for its intended use, but concurrent access or very large tables would need further adaptation.
https://en.wikipedia.org/wiki/Cuckoo_hashing
> To meet property (3) [if the key 0 is present, its value is not 0] ... "array index plus one" can be stored rather than "array index".
If hash code can take any value in [0,2^32), how to define a special value for empty buckets? The more common solution is to have a special key (not hash code) for empty slots which is easier to achieve. In addition, as the author points out, supporting generic keys requires to store 32-bit hash values. With the extra 4 bytes per bucket, it is not clear if this implementation is better than plain linear probing. The fastest hash table implementations like boost and abseil don't often use Robin Hood hashing.
8 more comments available on Hacker News