Exploring PostgreSQL 18's New UUIDv7 Support
Source: aiven.io (Hacker News story)
Key topics: PostgreSQL, UUIDv7, database design
The article discusses PostgreSQL 18's new UUIDv7 support, sparking a discussion on the trade-offs between UUIDv7 and other ID generation strategies, including performance, security, and privacy concerns.
Snapshot generated from the HN discussion
Discussion activity: very active (based on 160 loaded comments; about 32 comments per period on average)
- Story posted: Oct 15, 2025 at 10:40 AM EDT
- First comment: Oct 17, 2025 at 4:32 PM EDT (2 days after posting)
- Peak activity: 135 comments on day 3
- Latest activity: Oct 26, 2025 at 4:35 PM EDT
Hacker News item ID: 45593358 (story; last synced 11/20/2025, 8:32:40 PM)
Is a collision possible? Yes, but the likelihood of a collision is so low that it's not worth agonizing over (although I did when I was designing the system).
Of course you need to be sure the server will accept the ID, but that is practically guaranteed by the uniqueness property of UUIDs.
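To put a number on "so low", here is a rough birthday-bound estimate. It assumes IDs come from a uniformly random source with UUIDv4's 122 random bits. A UUIDv7 has 74 random bits, and only IDs minted in the same millisecond can collide, so its math differs.

```python
import math

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate P(at least one duplicate) among n uniformly random IDs.

    Birthday bound: p ~ 1 - exp(-n(n-1) / 2N), with N = 2**random_bits.
    UUIDv4 has 122 random bits (the other 6 are fixed version/variant bits).
    """
    space = 2 ** random_bits
    # expm1 keeps precision when the exponent is tiny; 1 - exp(x) would round to 0
    return -math.expm1(-n * (n - 1) / (2 * space))

for n in (10**6, 10**9, 10**12):
    print(f"{n:>17,} IDs -> p ~ {collision_probability(n):.2e}")
```

The estimate is only as good as the "uniformly random" assumption. A weak entropy source breaks it entirely.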
With multiple servers talking to a single database, I'd still prefer to let the database handle generating IDs.
Speaking of Google, Spanner recommends uuid4, and specifically not any uuid that includes a timestamp at the start like uuid7.
Now, the index on the public IDs would be faster with a uuid7 than a uuid4, but you have a similar info leak risk that the article mentions.
bigserial must be generated by the DB.
UUIDv7 lets anyone learn the time a record was created, but not (approximately) how many records were created in a particular time frame. It leaks data about the record itself, but not about other records.
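The leak is easy to demonstrate: the top 48 bits of a UUIDv7 are the Unix timestamp in milliseconds (RFC 9562). Below is a minimal sketch. `make_uuid7` is a hypothetical helper that follows the RFC bit layout, not a library API. `uuidv7_created_at` shows what any holder of the ID can recover.

```python
import os
import time
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Minimal UUIDv7 following the RFC 9562 bit layout (illustrative only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 used
    value = (unix_ms & ((1 << 48) - 1)) << 80     # unix_ts_ms: top 48 bits
    value |= 0x7 << 76                            # ver: 4 bits
    value |= ((rand >> 62) & 0xFFF) << 64         # rand_a: 12 bits
    value |= 0b10 << 62                           # var: 2 bits
    value |= rand & ((1 << 62) - 1)               # rand_b: 62 bits
    return uuid.UUID(int=value)

def uuidv7_created_at(u: uuid.UUID) -> datetime:
    """Recover the creation time, to the millisecond, from the ID alone."""
    if u.version != 7:
        raise ValueError(f"expected a UUIDv7, got version {u.version}")
    return EPOCH + timedelta(milliseconds=u.int >> 80)

u = make_uuid7()
print(u, "->", uuidv7_created_at(u).isoformat())
```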
- Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case by case basis. It can be harmless or disastrous depending on your application.
- Serial keys are required to be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic.
- Because UUIDs can be generated anywhere, sharding is easier.
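The "created anywhere" point mostly matters when related records must reference each other before anything is written. Here is a sketch with hypothetical `Order` and `LineItem` types. The application assigns the keys itself, so the child rows can point at the parent before either is inserted, and both can go out in one batch or to different services.

```python
import uuid
from dataclasses import dataclass, field
from typing import List

@dataclass
class LineItem:
    order_id: uuid.UUID
    sku: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)

@dataclass
class Order:
    customer: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)  # assigned by the app, not the DB
    items: List[LineItem] = field(default_factory=list)

    def add_item(self, sku: str) -> LineItem:
        # No database round trip is needed to learn the parent's key.
        item = LineItem(order_id=self.id, sku=sku)
        self.items.append(item)
        return item

order = Order("alice")
order.add_item("sku-123")
order.add_item("sku-456")
```

With a serial key, the parent would have to be inserted first, for example with `INSERT ... RETURNING id`, before the children could be built.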
The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time.
I've found that the data leaked by serial keys is problematic often enough; whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed.
World’s easiest hack. You’re looking at /customers/3836/bills? What happens if you change that to 4000? They’re a big company. I bet that exists.
Did they put proper security checks EVERYWHERE? Easy to test.
But if you’re at /customers/{big-long-hex-string}/bill the chances of you guessing another valid ID are basically zero.
Yeah it’s security through obscurity. But it’s really good obscurity.
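Whatever the ID format, the check that actually protects `/customers/.../bills` is an ownership test on every lookup. Below is a minimal sketch with a hypothetical in-memory store in place of a real table. It returns the same error for "missing" and "not yours", so a guessed ID reveals nothing about whether a record exists. That echoes a point made further down the thread.

```python
import uuid
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Bill:
    id: uuid.UUID
    customer_id: uuid.UUID
    amount_cents: int

class NotFound(Exception):
    """Raised for both missing and forbidden records."""

# Hypothetical store standing in for a database table.
BILLS: Dict[uuid.UUID, Bill] = {}

def get_bill(current_customer: uuid.UUID, bill_id: uuid.UUID) -> Bill:
    bill = BILLS.get(bill_id)
    # The ownership check is what protects the record; an unguessable ID
    # only makes enumeration slower.
    if bill is None or bill.customer_id != current_customer:
        raise NotFound(str(bill_id))
    return bill
```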
In some use cases it is possible to exclude or anonymize the PK, but in other cases a PK is necessary. Once you start building APIs to allow others to access your system, a UUIDv4 is the best ID.
There are some performance issues with very large tables though. If you have very large tables (think billions of rows) then UUIDv7 offers some performance benefits at a small security cost.
Personally I use v4 for almost all my tables because only a very small number of them will get large enough to matter. But YMMV.
In a well designed application, you shouldn't be able to guess whether a record exists or not simply by accessing a protected URL. As a counter argument - normal BIGINT or serial PKs are performant and are more than enough for most applications.
Systems must be _structurally architected_ with security in mind.
Security is layered: using a random key from a 128-bit space (122 random bits for UUIDv4) makes guessing UUIDs infeasible. But you should _also_ be doing AuthZ on the records, and rate limiting on the API so IDs can't be brute-forced either.
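The rate-limiting layer can be as simple as a per-client token bucket. This sketch uses arbitrary parameters and is not tied to any framework. The clock is injectable so the behavior is easy to test.

```python
import time
from typing import Callable, Dict, Tuple

class TokenBucket:
    """Per-client token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.state: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def allow(self, client: str) -> bool:
        now = self.clock()
        tokens, last = self.state.get(client, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)  # refill
        if tokens >= 1:
            self.state[client] = (tokens - 1, now)
            return True
        self.state[client] = (tokens, now)
        return False
```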
So the common answer is sequential ID crawling by bad actors. UUIDs are generally unguessable, and you can throw them into slop DBs like Mongo or storage like S3 as primary identifiers without worrying about permissions or about a clever, interested party pwning your whole database. A common case of security through obscurity.
Being "naturally sortable" is a good thing for Postgres, and for most people who want to use UUIDs, because Postgres has no sorted distribution buckets where the last bucket always grows on insert.
I'd like to see what happens with something like HBase or S3 paths when UUIDv7 gets used.
It's no worse for privacy than other UUID variants if the "privacy" you're worried about leaking is the creation time of the UUID.
As for range partitioning, you can of course choose to partition on the hash of the UUIDv7, at the cost of giving up cheaper writes / faster indices. On the other hand, that of course gives up locality, which is a common challenge of partitioning schemes. It depends on the end-to-end design of the system, but I wouldn't say that UUIDv7 is inherently good or bad, or better or worse than other UUID schemes.
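The trade-off can be seen on a batch of UUIDv7s created within one second. Range-partitioning on the embedded timestamp (daily partitions here, an arbitrary choice) puts them all in the same "hot" partition. Hashing spreads them out but scatters rows created together. `make_uuid7` is a hypothetical helper following RFC 9562. `blake2b` stands in for whatever hash the database uses; PostgreSQL's own hash partitioning uses its internal hash functions.

```python
import hashlib
import os
import uuid

def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Minimal RFC 9562 UUIDv7 (illustrative helper, not a library call)."""
    rand = int.from_bytes(os.urandom(10), "big")
    value = (unix_ms << 80) | (0x7 << 76) | (((rand >> 62) & 0xFFF) << 64)
    value |= (0b10 << 62) | (rand & ((1 << 62) - 1))
    return uuid.UUID(int=value)

MS_PER_DAY = 86_400_000

def range_partition(u: uuid.UUID) -> int:
    """Daily range partition from the embedded timestamp: recent rows cluster."""
    return (u.int >> 80) // MS_PER_DAY

def hash_partition(u: uuid.UUID, partitions: int = 8) -> int:
    """Hash partition: writes spread evenly, but time locality is lost."""
    digest = hashlib.blake2b(u.bytes, digest_size=8).digest()
    return int.from_bytes(digest, "big") % partitions

batch = [make_uuid7(1_700_000_000_000 + i) for i in range(1_000)]
print("range partitions touched:", len({range_partition(u) for u in batch}))
print("hash partitions touched: ", len({hash_partition(u) for u in batch}))
```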
I assume there would be some kind of index on the timestamp portion and the UUID portion?
Wouldn't that make it better for partitioning, since we'd only need to query the partitions that match the timestamp portion?
> What can go wrong with using UUIDv7? Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time.
> This leakage is primarily a privacy concern. Attackers can use the timing data as metadata for de-anonymization or account correlation, potentially revealing activity patterns or growth rates within an organization. While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach. Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.
So then what's the point? How I always did things in the past was to use an auto-increment bigint as the internal primary key, and then a separate random UUID for the external-facing key. I think this recommendation from "experts" is pretty dumb, because you get very little benefit from using UUIDv7 (beyond some portability improvements) if you're still using a separate internal key.
While I wouldn't use UUIDv7 as a secure token like I would UUIDv4, I don't see anything wrong with using UUIDv7 as externally exposed object keys; you're still going to need permission checks anyway.
Or where, for some reason, the ID needs to be created before being inserted into the database. Like you're inserting into multiple services at once.
UUIDv4 lets us sidestep this.
Is it bad design? Probably. Is it going to happen at huge companies? Yes.
I honestly don't see how.
What experts? For what scenarios specifically? When do they consider time-of-creation to be sensitive?
So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
In other words, you can only use UUIDv7 for rows that never need to be looked up by any data coming from the user. And maybe that exists sometimes for certain data in JOINs... but it seems like it might be more the exception than the rule, and you never know when an internal ID might need to become an external one in the future.
In most cases this forms a compliance matter rather than an open attack vector, but it nevertheless remains that one has to answer any question along the lines "did you minimise the privacy surface?" in the negative, or at least, with a caveat.
Whether creation date is PHI…I could see the argument being yes, since it correlates to medical information (when someone sought treatment, which could be when symptoms present.)
The only thing that seems to be protected is ‘reason for appointment’, and not all systems do that.
Everyone signs paperwork to authorize this when they first engage with the medical providers!
Dates are specifically cited as potential vectors for de-anonymization. For example, you can't disclose that "Bob H presented to the clinic on October 10th" because that's a lot of information that can be used to find out who Bob H is.
Here's a practical example of what I'm talking about. Suppose you have an app for physicians that allows them to message each other to discuss a case. They can share relevant information for diagnostic purposes, e.g., "34y/o male from southern Louisiana presented with a rash." They share de-identified photos and chat about ddx, treatment protocol, etc. All of that is cool. However, if the record of that visit is identified with a UUIDv7, and that ID is used as part of the URL, you've exposed the time of the visit, and that would be a problem.
You wouldn't be publishing patient visits publicly. The only people who would legitimately see that record are those with access to that visit, and they'd most likely need to know the time of said visit. This access should be controlled via AuthN and AuthZ, and audited.
You'd also generally do a lot of time-based lookups on this data: what visits do I have today, this week, and so on. You might also want an additional DateTime field for timezones and offsets, but v7 is probably better than v4 for this use case.
Always thought that was elegant (the attack, not using the time as the seed).
Of course, it's always possible for someone to do something stupid, like use a weak RNG.
UUIDv4 removes all three of those vectors. UUIDv7 still removes two of three. It doesn't leak record count or the rate at which you create them, only creation time. And you still can't guess adjacent keys. It's a pretty narrow information leakage for something you routinely reveal on purpose.
With UUIDv7 the creation time is always leaked, without any sampling. A casual attacker could quite easily look up the time and become motivated to probe and link the account further.
When sequential integer IDs are externalized, an attacker does not need creation times to perform predictive attacks. All they need to do is apply deltas to known identifiers.
I can see it being bad for tracking IDs, but not order IDs, unless you are allowed to view orders that do not belong to your account, which is just fundamentally bad security; using UUIDv4 or a random string there would simply be security through obscurity.
Knowing approximate age is a relatively small leak compared to that.
Bank security does not depend on your bank account being private information. Pretty much all bank security rounds to the bank having a magic undo button, so they can undo any bad transactions after it comes to light that it was a bad transaction. Sure they do some filtering on the front-end now to eliminate the need to use the magic undo button, but that's just extra icing to keep the undo button's use to a dull roar.
>> So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
> This is only really true if leaking the creation time of the record is itself a security concern.
No, as "leaking the creation time" is not a concern when APIs return resources with properties representing creation/modification timestamps.
Where exposing predictable identifiers, such as UUIDv7 or serial[0] values used as database primary keys, does create a security risk is that it lets attackers synthesize identifiers matching arbitrary resources much more quickly than when random identifiers are employed.
0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...
> If your security relies on attacker don't know your ID (you don't do proper data permission check), your security is flawed.
That qualification is doing a lot of work in this sentence. For supporting evidence as to why this is the case, a quick search for "CVE PHP security vulnerabilities" or "CVE NodeJS security vulnerabilities" will produce voluminous results.
> And UUIDv7's random part is large enough so that it's much harder to predict than auto increment id.
Usually. One common scenario where using UUIDv7 for primary keys in a persistent store can be exploited similarly to sequential integer IDs is when there are queries supporting pagination and/or queries leveraging the temporal ordering that UUIDv7 supports intrinsically. For example:
Note that this does not require successful identification of either the `rand_a` or `rand_b` UUIDv7 fields[0].
> If your security relies on attacker don't know your ID (you don't do proper data permission check), your security is flawed.
Again, I agree with this in theory. But as the saying[1] goes:
0 - https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-versio...
1 - https://quoteinvestigator.com/2018/04/14/theory/
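The pagination scenario can be sketched as follows. Assume, hypothetically, an endpoint `list_after` that offers keyset pagination (`id > cursor`) over a UUIDv7 primary key and performs no per-record checks. A client can then forge a cursor for any instant by setting only the timestamp bits and zeroing the rest, and walk the records created after that instant without knowing any `rand_a`/`rand_b` value. `RECORDS`, `list_after` and `make_uuid7` are all illustrative names.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Minimal RFC 9562 UUIDv7 (illustrative helper)."""
    rand = int.from_bytes(os.urandom(10), "big")
    value = (unix_ms << 80) | (0x7 << 76) | (((rand >> 62) & 0xFFF) << 64)
    value |= (0b10 << 62) | (rand & ((1 << 62) - 1))
    return uuid.UUID(int=value)

def to_ms(dt: datetime) -> int:
    return (dt - EPOCH) // timedelta(milliseconds=1)

# Hypothetical server side: one record per day, paginated by "id > cursor".
RECORDS: List[uuid.UUID] = sorted(
    make_uuid7(to_ms(datetime(2025, 10, d, 9, tzinfo=timezone.utc))) for d in range(1, 31)
)

def list_after(cursor: uuid.UUID, limit: int = 5) -> List[uuid.UUID]:
    return [r for r in RECORDS if r > cursor][:limit]

# Client side: forge a cursor for an arbitrary instant; no random bits needed.
forged = uuid.UUID(int=to_ms(datetime(2025, 10, 20, tzinfo=timezone.utc)) << 80)
page = list_after(forged)
```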
Primary keys using UUID v7 are (potentially) an HR violation.
https://mikenotthepope.com/primary-keys-using-uuid-v7-are-po...
I submit my application in 2025 and get rejected.
20 years later I submit another application to the same company, using my existing 20 years old user profile, and now get rejected because somebody figures out I'm old by looking at my user id?
I don't understand why you considered UUIDv7 in the first place.
If it's the latter (which, reading Wikipedia's summary, suggests it is), then the entire premise that k-sortable UUIDs are an "HR violation" is bunk.
The problem with arguing about timestamps leaking this kind of information is that _anything_ can leak this kind of vaguely dated information.
- Seen on a website that ceased to exist after 2010? Gotchya!
- Indexed by Waybackmachine? Gotchya!
- Used <different uuid scheme> for records created before 2022? Gotchya!
The only way to prevent divulging temporal clues about an entity is to never reveal its existence in any kind of correlatable way (which, as far as I’m prepared to think right now, seems to defeat the point of revealing it to a UI at all).
When using v7, I need some sort of audit that checks every API contract for the usage of v7 and potential information leakage.
Detecting v7 UUIDs in the API contract would probably require me to enforce a special key name (uuidv7, and uuid for v4) for easier auditing.
Engineers will get this wrong more than once - especially in a mixed team of Jr/sr.
Also, the API contracts will look a bit inconsistent: some resources will get addressed by v7, others by v4. On top, by using v4 on certain resources, I'd leak the information that those resources addressed by v4 will contain sensitive information.
By sticking to v4, I'd have the same identifier for all resources across the API. When needed, I can expose the creation timestamp in the response separately. Audit is much simpler since the fields state explicitly what they will contain.
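Part of that audit could be automated without special key names, because the version nibble is inside the value itself. Below is a hypothetical contract-test helper. It walks a decoded JSON response and reports every string that parses as a UUIDv7, together with its path.

```python
import uuid
from typing import Any, Iterator, Tuple

def find_uuidv7_fields(payload: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, value) for every string in a decoded JSON payload that is a UUIDv7."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            yield from find_uuidv7_fields(value, f"{path}.{key}")
    elif isinstance(payload, list):
        for i, value in enumerate(payload):
            yield from find_uuidv7_fields(value, f"{path}[{i}]")
    elif isinstance(payload, str):
        try:
            parsed = uuid.UUID(payload)
        except ValueError:
            return  # not a UUID at all
        if parsed.variant == uuid.RFC_4122 and parsed.version == 7:
            yield path, payload

response = {
    "id": "0190b2a0-0000-7000-8000-000000000000",
    "owner": {"id": str(uuid.uuid4())},
    "tags": ["x", "019a0000-0000-7abc-9def-0123456789ab"],
}
for where, value in find_uuidv7_fields(response):
    print("UUIDv7 exposed at", where, value)
```

As the commenter notes, this only catches what is actually exposed in the tested responses. It does not replace deciding up front which resources may carry a timestamped ID.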
Good luck if you're operating at a decent scale, and need to worry about db maintenance/throughput. Ask the DBA at your company what they would prefer.
Unless I'm missing something, check it on receipt, and reject it if it doesn't match. `uuid.replace("-", "")[12]` or `uuid >> 76 & 0xf`.
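In Python the standard `uuid` module already exposes the version and variant, so the check on receipt is short. Both one-liners from the comment read the same nibble. Note that `>>` binds tighter than `&` in Python, so `uuid >> 76 & 0xf` works as written; the parentheses below are only for readability. The expected version of 4 is an assumption about what this API issues.

```python
import uuid

def parse_external_id(raw: str, expected_version: int = 4) -> uuid.UUID:
    """Reject IDs whose variant/version don't match what this API issues."""
    u = uuid.UUID(raw)  # raises ValueError on malformed input
    if u.variant != uuid.RFC_4122 or u.version != expected_version:
        raise ValueError(f"expected a version-{expected_version} UUID")
    return u

sample = "0190b2a0-0000-7000-8000-000000000000"
assert sample.replace("-", "")[12] == "7"          # string form: 13th hex digit
assert (uuid.UUID(sample).int >> 76) & 0xF == 7     # integer form: bits 76..79
```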
Regardless of difficulty, this comes down to priorities. Potential security concerns aside (I maintain this really does not matter nearly as much as people think for the majority of companies), it's whether or not you care about performance at scale. If your table is never going to get over a few million rows, it doesn't matter. If you're going to get into the hundreds of millions, it matters a great deal, especially if you're using them as PKs, and doubly so if you're using InnoDB.
UUIDv4 is explicitly forbidden in some high-reliability/high-assurance environments because there is a long history of engineers using weak entropy sources to generate UUIDv4 despite the warnings to use a strong entropy source, which is only discovered when it causes bugs in production. Apparently some engineers don't understand what "strong entropy source" means.
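The difference between a strong and a weak source looks trivial in code, which is part of the problem. In CPython, `uuid.uuid4()` already draws from `os.urandom`. The failure mode is hand-rolled generators. A sketch of both follows. The weak version is the anti-pattern, shown only for contrast.

```python
import random
import secrets
import uuid

def uuid4_strong() -> uuid.UUID:
    # 16 bytes from the OS CSPRNG; version=4 makes the uuid module overwrite
    # the version and variant bits as the spec requires.
    return uuid.UUID(bytes=secrets.token_bytes(16), version=4)

def uuid4_weak(rng: random.Random) -> uuid.UUID:
    # Anti-pattern: Mersenne Twister output is well-formed but predictable,
    # especially when seeded from something guessable like the time or PID.
    return uuid.UUID(int=rng.getrandbits(128), version=4)
```

Both functions return IDs that pass a version check, so validation on receipt cannot tell them apart. Only code review of the generator can.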
Mixing UUID types should be detectable because type is part of the UUID. But then many companies have non-standard UUID that overwrite the type field mixed with standard UUID across their systems. In practice, you often have to treat UUID as an opaque 128-bit integer with no attached semantics.
There was previously an article linked here about recovering access to some bitcoin by feeding all possible timestamps in a date range to the password creation tool they used, and trying all of those passwords.
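That attack works whenever a secret is derived from a seed the attacker can enumerate. Here is a scaled-down sketch. It assumes, hypothetically, a token produced by Python's `random` module seeded with a Unix timestamp in seconds. Searching an hour either side of a rough guess is only 7,201 candidates.

```python
import random
from typing import Optional

def make_token(seed: int) -> int:
    # Anti-pattern under test: the token depends only on a guessable seed.
    return random.Random(seed).getrandbits(128)

def recover_seed(token: int, window_start: int, window_end: int) -> Optional[int]:
    """Try every candidate seed, e.g. each Unix second in a date range."""
    for candidate in range(window_start, window_end + 1):
        if make_token(candidate) == token:
            return candidate
    return None

created_at = 1_700_000_000  # unknown to the attacker, but roughly guessable
leaked = make_token(created_at)
found = recover_seed(leaked, created_at - 3_600, created_at + 3_600)
```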
There is no need to put the privacy-preserving ID in a database index when you can calculate the mapping on the fly.
You put in 128 bits, you get out 128 bits. The encryption is strong, so the clients won't be able to infer anything from it, and your backend can still get all the advantages of sequential IDs.
You also can future-proof yourself by reserving a few bits from the UUID for the version number (using cycle-walking).
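The comment presumably has a standard 128-bit block cipher such as AES in mind. Python's standard library has no AES, so the sketch below swaps in a 4-round Feistel network with HMAC-SHA256 as the round function (a Luby-Rackoff construction). It is illustrative, not audited. The output fills all 128 bits, so the version/variant nibbles come out arbitrary. The cycle-walking idea above addresses that and is not shown here.

```python
import hashlib
import hmac
import uuid

ROUNDS = 4
MASK64 = (1 << 64) - 1

def _round(key: bytes, i: int, half: int) -> int:
    # Keyed round function: HMAC-SHA256 over (round index, 64-bit half), truncated to 64 bits.
    mac = hmac.new(key, bytes([i]) + half.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac[:8], "big")

def encrypt_id(key: bytes, internal: int) -> uuid.UUID:
    """Map a 128-bit internal ID (e.g. a serial) to an opaque 128-bit external UUID."""
    left, right = internal >> 64, internal & MASK64
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return uuid.UUID(int=(left << 64) | right)

def decrypt_id(key: bytes, external: uuid.UUID) -> int:
    """Invert encrypt_id by running the rounds backwards."""
    left, right = external.int >> 64, external.int & MASK64
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << 64) | right
```

The backend keeps sequential keys internally. Clients only ever see `encrypt_id(key, id)`, and no second index is needed to map back.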
UUIDv7 still works great in distributed systems and has algorithmic advantages as you have mentioned.
A UUIDv7 primary key seems to reduce / eliminate those problems.
If there is also an indexed UUIDv4 column for external id, I suspect it would not be used as often as the primary key index so would not cancel out the performance improvements of UUIDv7.
[1] https://www.cybertec-postgresql.com/en/unexpected-downsides-...
That doesn't matter because it's the creation of the index entry that matters, not how often it's used for lookup. The lookup cost is the same anyways.
> Since workloads commonly are interested in recently inserted rows
That's only true for very specific types of applications. There's nothing general about that.
Plenty of applications grab rows from all time, and there's nothing special about the most recent ones. The most recent might also be the least popular rows, since few things reference them.
Very true, as detailed by the link you kindly provided. Which is why a technique I have found useful is to have both an internal `id` PK `serial`[0] column (never externalized to other processes) and another column with a unique constraint having a UUIDv4 value, such as `external_id`, explicitly for providing identifiers to out-of-process collaborators.
0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...
If your UUIDv4 is cached, you're still paying for extra storage and an extra index. Not an issue on a million-row system, but imagine a billion, or 10 billion.
And what if it's not cached? Great, now you're hitting the disk.
Computers do not suffer from a lack of CPU performance, especially when you can take advantage of dedicated CPU instruction sets. Hell, you don't even need encryption. How about a simple bit shift where you include a simple lookup identifier? It's a black box, sure, and not great if leaked, but you have other things to worry about if your actual shift pattern leaks. Use an extra byte or two to identify the pattern.
Obfuscating your IDs is easy. No need for full encryption.
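Here is one possible shape for such a scheme. It swaps the bit shift for a multiply-by-odd-constant plus XOR, which is still a cheap bijection on 64-bit integers, and prefixes a one-byte tag that says which pattern to undo. The constants and the tag table are arbitrary assumptions. This is obfuscation, not encryption: anyone who collects a few (ID, token) pairs can recover the constants, which matches the comment's "not great if leaked".

```python
MASK64 = (1 << 64) - 1

# Hypothetical pattern table, kept server-side: tag byte -> (odd multiplier, xor mask).
SCHEMES = {
    1: (0x9E3779B97F4A7C15, 0x5DEECE66DD0B1A2F),
}

def obfuscate(n: int, tag: int = 1) -> str:
    mult, mask = SCHEMES[tag]
    x = ((n * mult) & MASK64) ^ mask   # odd multiplier => invertible mod 2**64
    return f"{tag:02x}{x:016x}"        # leading byte identifies the pattern

def deobfuscate(token: str) -> int:
    tag, x = int(token[:2], 16), int(token[2:], 16)
    mult, mask = SCHEMES[tag]
    inverse = pow(mult, -1, 1 << 64)   # modular inverse exists because mult is odd
    return ((x ^ mask) * inverse) & MASK64
```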