UUIDv47: Store UUIDv7 in DB, Emit UUIDv4 Outside (SipHash-Masked Timestamp)
Key topics
The post introduces UUIDv47, a scheme that stores UUIDv7 internally for database indexing but emits UUIDv4-looking identifiers externally to mask timing information, sparking discussion on its utility, security, and potential alternatives.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion, based on 88 loaded comments (average 11 per period)
Key moments
- Story posted: Sep 17, 2025 at 10:02 AM EDT
- First comment: Sep 17, 2025 at 10:04 AM EDT (2m after posting)
- Peak activity: 55 comments in the 0-6h window
- Latest activity: Sep 19, 2025 at 4:53 PM EDT
How it works: the 48-bit timestamp is XOR-masked with a keyed SipHash-2-4 stream derived from the UUID’s random field. The random bits are preserved, the version flips between 7 (inside) and 4 (outside), and the RFC variant is kept. The mapping is injective: (ts, rand) → (encTS, rand). Decode is just encTS ⊕ mask, so round-trip is exact.
Security: SipHash is a PRF, so observing façades doesn’t leak the key. Wrong key = wrong timestamp. Rotation can be done with a key-ID outside the UUID.
Performance: one SipHash over 10 bytes + a couple of 48-bit loads/stores. Nanosecond overhead, header-only C11, no deps, allocation-free.
Tests: SipHash reference vectors, round-trip encode/decode, and version/variant invariants.
Curious to hear feedback!
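A minimal sketch of the encode/decode round trip described in the post, written in Python for readability rather than taken from the C library. The packing of the random bits into the 10-byte SipHash message (the low nibble of byte 6, byte 7, the low 6 bits of byte 8, then bytes 9 to 15) is an assumption made for illustration, as are the names `encode_as_v4` and `decode_to_v7`; check the project's source before relying on bit-for-bit compatibility.

```python
import uuid

MASK64 = (1 << 64) - 1
MASK48 = (1 << 48) - 1


def _rotl(x: int, b: int) -> int:
    return ((x << b) | (x >> (64 - b))) & MASK64


def siphash24(key: bytes, msg: bytes) -> int:
    """SipHash-2-4 of msg under a 16-byte key, as a 64-bit integer."""
    if len(key) != 16:
        raise ValueError("key must be 16 bytes")
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    v = [k0 ^ 0x736F6D6570736575, k1 ^ 0x646F72616E646F6D,
         k0 ^ 0x6C7967656E657261, k1 ^ 0x7465646279746573]

    def sipround() -> None:
        v[0] = (v[0] + v[1]) & MASK64
        v[1] = _rotl(v[1], 13) ^ v[0]
        v[0] = _rotl(v[0], 32)
        v[2] = (v[2] + v[3]) & MASK64
        v[3] = _rotl(v[3], 16) ^ v[2]
        v[0] = (v[0] + v[3]) & MASK64
        v[3] = _rotl(v[3], 21) ^ v[0]
        v[2] = (v[2] + v[1]) & MASK64
        v[1] = _rotl(v[1], 17) ^ v[2]
        v[2] = _rotl(v[2], 32)

    # Zero-pad to a multiple of 8 bytes; the last byte holds len(msg) mod 256.
    padded = msg + b"\x00" * (7 - len(msg) % 8) + bytes([len(msg) & 0xFF])
    for i in range(0, len(padded), 8):
        m = int.from_bytes(padded[i:i + 8], "little")
        v[3] ^= m
        sipround()
        sipround()
        v[0] ^= m
    v[2] ^= 0xFF
    for _ in range(4):
        sipround()
    return v[0] ^ v[1] ^ v[2] ^ v[3]


def _mask48(key: bytes, raw: bytes) -> int:
    # Assumed message layout: random bits only, version and variant bits cleared,
    # so the mask is identical for the v7 form and the v4 facade.
    msg = bytes([raw[6] & 0x0F, raw[7], raw[8] & 0x3F]) + raw[9:16]
    return siphash24(key, msg) & MASK48


def _rewrite(raw: bytes, ts48: int, version: int) -> uuid.UUID:
    out = bytearray(raw)
    out[0:6] = ts48.to_bytes(6, "big")          # 48-bit timestamp field
    out[6] = (version << 4) | (out[6] & 0x0F)   # flip the version nibble only
    return uuid.UUID(bytes=bytes(out))


def encode_as_v4(key: bytes, v7: uuid.UUID) -> uuid.UUID:
    raw = v7.bytes
    ts = int.from_bytes(raw[:6], "big")
    return _rewrite(raw, ts ^ _mask48(key, raw), 4)


def decode_to_v7(key: bytes, facade: uuid.UUID) -> uuid.UUID:
    raw = facade.bytes
    enc_ts = int.from_bytes(raw[:6], "big")
    return _rewrite(raw, enc_ts ^ _mask48(key, raw), 7)


demo_key = bytes(range(16))  # demo key only, never use a fixed key like this
internal = uuid.UUID("01890a5d-ac96-774b-bcce-b302099a8057")
external = encode_as_v4(demo_key, internal)
print(external, decode_to_v7(demo_key, external) == internal)
```

Because the mask is computed from bits that are identical in both forms, decoding is the same XOR as encoding, which is why the round trip is exact.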
1. You implicitly take away someone else's hypothetical benefit of leveraging UUID v7, which is disappointing for any consumer of your API.
2. By storing the UUIDs differently on your API service from internally, you're going to make your life just a tiny bit harder because now you have to go through this indirection of conversion, and I'm not sure if this is worth it.
Usually if you see an id in your http logs you can simply search your database for that id. The v4 to v7 indirection creates a small inconvenience.
The mismatch would be resolved if this were available as a fully transparent database optimization.
1. Not leaking timestamp data (security/regulations)
2. Having easily time-sortable primary keys (DB performance/etc.)
If you don't have both of these needs, the tool is an unnecessary indirection, as you've identified in (2).
However, where you do have both needs, some indirection is necessary. Whether this is the correct one is a different question.
Similarly, if you _must not_ leak timestamps for some real-world reason, (1) is an intrinsic requirement, consumers be damned.
Also, why use UUIDs in that case?
So it is not generally fit for that purpose either.
UUIDs are often generated client-side. Am I right in thinking that this isn’t possible with this approach? Even if you let clients give you UUIDs and you gave them back the masked versions, wouldn't you be vulnerable to a client providing two UUIDs with different ts and the same rand? So this is only designed for when you are generating the UUIDv7s yourself?
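The worry about repeated rand values can be illustrated with a short sketch. It assumes only what the post states, namely that the mask is a keyed function of the random field alone; `mask48` below uses keyed BLAKE2b as a stand-in PRF (not SipHash-2-4), and the key and timestamps are made-up values.

```python
import hashlib


def mask48(key: bytes, rand: bytes) -> int:
    # Stand-in keyed PRF (BLAKE2b), NOT SipHash-2-4. Any mask that depends
    # only on (key, rand) behaves the same way for this argument.
    digest = hashlib.blake2b(rand, key=key, digest_size=6).digest()
    return int.from_bytes(digest, "big")


server_key = b"hypothetical-key"      # unknown to the client
rand = bytes(10)                      # client submits the same rand twice
ts1, ts2 = 1_758_117_720_000, 1_758_117_725_000

enc1 = ts1 ^ mask48(server_key, rand)
enc2 = ts2 ^ mask48(server_key, rand)

# The shared mask cancels out: the facades reveal ts1 XOR ts2.
print(enc1 ^ enc2 == ts1 ^ ts2)

# A client that chose ts1 itself recovers the mask for this rand value
# and can unmask any other ID that reuses the same rand.
recovered_mask = enc1 ^ ts1
print(enc2 ^ recovered_mask == ts2)
```

The leak is limited to IDs that share a rand value, which is why it matters mainly when untrusted parties are allowed to choose the random bits.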
Of course, UUIDv4 on the client side is not without risk either: you need to validate uniqueness and make sure it is not a re-use of some other ID. For UUIDv7 on the client side you could add some sanity validation, but really I think it’s best avoided.
creating them server-side risks a network error dropping the response after the client has requested that a resource be created, so the client never receives its id. that risks double submissions and generally bad recovery options in the UI.
if you need users to provide uuids for consistent network operations, you can have an endpoint responsible for generating signed uuids that expire after a short interval, thereby controlling uuid-time drift (must be used within 1-5 minutes, perhaps), ensuring the client can't forge them to mess with your backend, and still provide a nice and stable client-side-uuid system.
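A rough sketch of such a signing endpoint, assuming an HMAC-SHA256 signature over the ID and its issue time. The token format, `SECRET`, `MAX_AGE_S`, `issue` and `verify` are all hypothetical, and `uuid.uuid4()` is only a placeholder for whatever generator the server would actually use.

```python
import hashlib
import hmac
import time
import uuid
from typing import Optional

SECRET = b"hypothetical-server-secret"
MAX_AGE_S = 300  # token must be used within 5 minutes


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue(now: Optional[int] = None) -> str:
    """Return 'uuid.issued_at.signature' for the client to submit later."""
    issued = int(time.time()) if now is None else now
    payload = f"{uuid.uuid4()}.{issued}"   # placeholder ID generator
    return f"{payload}.{_sign(payload)}"


def verify(token: str, now: Optional[int] = None) -> Optional[uuid.UUID]:
    """Return the UUID if the token is authentic and fresh, else None."""
    current = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        return None
    uid_s, issued_s, sig = parts
    expected = _sign(f"{uid_s}.{issued_s}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None                        # forged or altered
    if not 0 <= current - int(issued_s) <= MAX_AGE_S:
        return None                        # expired
    return uuid.UUID(uid_s)


token = issue()
print(verify(token) is not None)
```

The client can retry a failed create request with the same token, which gives the server a stable ID for de-duplication without trusting a client-chosen timestamp.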
for the uuidv47 thing, you would apply their XOR trick prior to sending the UUID to the user. you presumably just reverse the XOR trick to get the UUIDv7 back from the UUIDv4 you passed them.
since when?
Of course, adding two IDs for a resource complicates things. But so too does trusting client-generated IDs to be universally unique.
https://www.postgresql.org/docs/18/functions-uuid.html
So there's definitely some gotchas with relying on rand_a and rand_b in UUIDv7 for seeding a PRF, and when ingesting data from devices outside of your trust boundary (as may be the case with high-volume telemetry), even if you wrote the code they basically can't be trusted for this purpose, and if those bits are undisturbed in the output it's certainly a problem if the idea was to obfuscate serialisation, timing, or correlation.
Even generations we might assume are safe may not be completely safe; for example, the new uuidv7() in PostgreSQL 18 fills rand_a entirely from the high precision part of the timestamp, and this is RFC compliant. So if an import routine generates a big batch of such UUIDs, this v7-to-v4 scheme discloses output bits that can be used to relate individual records as part of the same group. That might be fine for data points pertaining to a vehicle engine. It might not be fine for identifiers that relate to people.
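To make the point about rand_a concrete, here is a small sketch of what an observer could read out of a façade if the generator filled rand_a with a sub-millisecond fraction. It assumes the fraction is scaled to 12 bits (fraction of a millisecond times 4096), as in the RFC 9562 increased-precision option; the helper names and the sample UUID are made up.

```python
import uuid


def rand_a(u: uuid.UUID) -> int:
    """The 12 bits that follow the version nibble (low bits of bytes 6-7)."""
    return (u.int >> 64) & 0xFFF


def sub_ms_offset_ns(u: uuid.UUID) -> int:
    # Assumption: rand_a = floor(fraction_of_millisecond * 4096).
    return rand_a(u) * 1_000_000 // 4096


# A v4-looking facade: the first 48 bits are masked, but rand_a is untouched.
facade = uuid.UUID("5f3c9a10-77aa-4800-9abc-def012345678")
print(rand_a(facade), sub_ms_offset_ns(facade))
```

Records created in one tight batch would then carry related rand_a values even after the millisecond timestamp has been masked.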
So, since not all UUIDv7 is created alike, I'd add a strong caveat: unless you generate the rand_a and rand_b bits entirely yourself, with a high degree of confidence in their unguessability, this scheme may still leak information regarding timing, sequence, or correlation of records, and you will have to read the source code of your UUIDv7 implementation to know for sure.
If they are suitably random then this scheme seems to check out, but you're going to need some barbed wire and some inspiration from these https://en.wikipedia.org/wiki/Long-term_nuclear_waste_warnin... on anything that can generate v7 IDs.
- "Ex-spouse: I looked you up on a dating website, and your userID indicates it was created while you were at Tom's party where you swear nothing happened."
- "You say you are in XYZ timezone, but all your imageIDs (that are unique to the image upon creation) are timestamped at what would be 3am in your timezone."
Granted, for individual messages that are near-real-time, or for transactions that need to be timestamped anyway, it's probably fine, but for user-account-creation or "evergreen" asset-creation, it could leak the time to a sufficiently curious individual (or an organized group that is doing data-trawling and cross-correlation)
Can you expand on this? I don't see a situation where it's actually leaking. You either have a photo with EXIF data, or an image whose ID was generated when the post was created, and created_at is usually exposed anyway.
For analysis reasons, you want to share this dataset (e.g. for diagnostics on the machine) but first must strip it of potentially identifying information.
The uuidv7 timestamp could be used to re-identify the data through correlation - "I know this person got an MRI on this day, there's only one record with a matching datestamp, thus I know it's their MRI."
It's pretty simple: unless you are also willing to give a party the timestamp of when a GUID was created, use UUIDv4.
A simpler example is a URL for, say, a file / photo share service. You allow users to upload images, and you return them back website.com/GUID. That's it. You don't provide a way to see when that photo / file was uploaded, but because you use a UUIDv7 you just did.
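For illustration, this is roughly all it takes to read the creation time out of an unmasked UUIDv7, since its first 48 bits are a Unix timestamp in milliseconds. The function name and the sample UUID are made up for the example.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def v7_created_at(u: uuid.UUID) -> datetime:
    """Creation time embedded in a UUIDv7 (top 48 bits, Unix milliseconds)."""
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    unix_ms = u.int >> 80
    return EPOCH + timedelta(milliseconds=unix_ms)


shared_link_id = uuid.UUID("01890a5d-ac96-774b-bcce-b302099a8057")
print(v7_created_at(shared_link_id).isoformat())
```

Anyone holding the link can run the same few lines, so no cooperation from the service is needed.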
Is this a security risk? Maybe or maybe not? But it's an unintended disclosure of information.
Although I finished it, I never quite published it properly for some reason, probably partly because I shelved the projects where I had been going to use it (I might unshelve one of them next year).
Well, I might as well share it, because it’s quite relevant here and interesting:
https://temp.chrismorgan.info/2025-09-17-tesid/
My notes on its construction, pros and cons are fairly detailed.
Maybe I’ll go back and publish it properly next year.
https://sqids.org/
(Ah, it’s fun reading through that document a bit again. A few things I’d need to update now, like the Hashids name, or in the UUID section how UUIDv7 is no longer a draft, and of sidenote 12 I moved to India and got married and so took a phone number ending in 65536, replacing my Australian 32768. :-) )
It’s lasted for three years of use and three years of disuse, and I hope to replace it with something utterly different (stylistically and technically) by the end of this year, though it may slip to next year. The replacement will be based on handwriting.
(I’m not a fan of handwriting fonts either. They’re never truly satisfying, though some with quite a few variants for each character get past the point of feeling transparently inauthentic. But when you can write and draw what you choose, where you choose, that’s liberating.)
However, it is fit for purpose if your purpose is showing user-facing ids that can't be trivially incremented. For example, in a url, or in an api response. It does, in fact, "protect" against the "attack" of "Oh, I see in the url that my id is 19563, I wonder what I get if I change it to 19564."
Now, the system should absolutely have authorization boundaries around data, but that doesn't mean there's no value in avoiding putting an "attractive nuisance" in front of users.
If it's not a real attack, it's not worth protecting against even in the slightest. If it's a real attack, it doesn't matter if it's trivial or not, does it?
The most likely purpose for this kind of encoding is to discourage users (as in other developers) from trying to derive meaning from the values that is not actually there.
This happens all the time: Another developer using your API observes sequential IDs, for example, and soon they start building their software on top of that observation, assuming it to be an intended property of the system. It even works perfectly for a while... until you want to change your implementation and break those assumptions. Which you now can't do, because breaking users is the cardinal sin of software development, leaving you forever beholden to implementation details that were never intended to leak out. That's not a good place to be. Making the IDs "opaque" indicates to the user that there is no other meaning.
That they are guessable doesn't matter. I dare say it may even be beneficial to be able to easily reverse the strings back into their original form to aid with things like debugging. Software development is primarily about communicating with other people, and using IDs that, at first glance, look random communicates a lot — even if they aren't actually random.
There may be a time and place for actually secure IDs, but more often than not you don't really need them. What you do regularly need, though, especially in large organizations, is a way to effectively work with others who don't read the documentation.
> It’s just bad
This is the first I've heard of Hashids, so I'll take your word for it, but I'm not sure you actually articulated why. I'll grant you that excluding profanity is a stupid need, but it is understandable why one might have to accept that as a necessary feature even if ultimately ridiculous.
https://github.com/noreastergroup/active_record_pretty_key
I wanted to use it many times in projects for non-iterable IDs but never found it again.
Look at https://git.chrismorgan.info/tesid/blob/HEAD:/rust/src/fpeck..., it’s very simple.
And maybe I misunderstand how the hashing works, but it seems if you're looking things up by the hashed uuid, you're still going to want two columns anyway.
timestamp + readability
UUIDv8 gives you timestamp + counter + random.
The advantage is that lexical order and chronological order are the same and you still retain enough random bits that guessing the next generated timestamp is not easy.
This library converts a uuidv7 into a cryptographically random but deterministic uuidv4 recoverable with a shared key. For all intents and purposes the external view is a uuidv4, the internal representation is a v7, which has better index block locality and orderability.
Add some random large value to your ints periodically - they’ll still be monotonic, but you’ll throw off the dastardly spies stealing your super valuable business intelligence.
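A toy sketch of that idea, assuming an in-process counter; `GappedSequence`, `jump_every` and `max_jump` are invented names, and a real database sequence would need the jump applied on the database side.

```python
import secrets


class GappedSequence:
    """Monotonic integer IDs with a random jump every `jump_every` values."""

    def __init__(self, jump_every: int = 1000, max_jump: int = 1_000_000):
        self._value = 0
        self._since_jump = 0
        self._jump_every = jump_every
        self._max_jump = max_jump

    def next(self) -> int:
        self._value += 1
        self._since_jump += 1
        if self._since_jump >= self._jump_every:
            # Skip ahead by a random amount so ID deltas no longer equal counts.
            self._value += secrets.randbelow(self._max_jump)
            self._since_jump = 0
        return self._value


seq = GappedSequence(jump_every=3, max_jump=50)
print([seq.next() for _ in range(8)])
```

The IDs stay strictly increasing, so index locality is kept, but the difference between two IDs only gives an upper bound on how many records were created in between.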
For example, by only scraping the date and author of an online newspaper‘s articles over a period of time, you can deduce when every author is typically on vacation. Compare that against every other author and you can find patterns indicating, say, workplace affairs.
Source: a talk by David Kreisel called SpiegelMining (in German), or at least what I remember.
This way the DBs can use simple sequence numbers instead of timestamp based IDs.
Consider what you'll do if someone ever gets root in your web server and leaks the key.
Suddenly all your UUIDs need to be replaced. That tends to be impossible since they're probably part of published URLs etc.
Big companies have made similar mistakes. That's probably why, for example, all private YouTube videos and Google Docs had their links invalidated a few years back, when the security of a decade-old key could no longer be assured and the key wasn't rotatable.
TL;DR: Never use anything where you cannot rotate a key, including this.
https://github.com/n2p5/uuid47
refs: https://github.com/dchest/siphash