Avoid Uuid Version 4 Primary Keys in Postgres
Key topics
The debate rages on: should you avoid using UUID Version 4 as primary keys in Postgres? Several commenters point out that the issue isn't unique to Postgres but plagues relational databases generally. As one commenter quips, "Math is math" — the concerns around UUIDv4, such as index fragmentation and space requirements, are universal. While some argue that UUIDv7 or alternatives like obfuscated integers are viable, others counter that the choice ultimately depends on the specific use case and requirements, such as the need for uniqueness across multiple servers or the ability to handle high concurrency.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 39m after posting. Peak period: 140 comments in the first 0-12h. Average per period: 32 (based on 160 loaded comments).
Key moments
- Story posted: Dec 15, 2025 at 5:08 AM EST
- First comment: Dec 15, 2025 at 5:47 AM EST (39m after posting)
- Peak activity: 140 comments in the 0-12h window
- Latest activity: Dec 20, 2025 at 8:29 AM EST
(in the scientific reporting world this would be the perennial "in mice")
If you use UUIDv7, you can partition your table by the key prefix. Then the bulk of your data can be efficiently skipped when applying updates.
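Since UUIDv7 sorts by its leading 48-bit millisecond timestamp, partition bounds can be derived from plain datetimes. A minimal sketch of the idea — table and column names here are hypothetical:

```python
import uuid
from datetime import datetime, timezone

def uuid7_lower_bound(dt: datetime) -> uuid.UUID:
    """Smallest UUIDv7-shaped value for a given instant: the 48-bit
    millisecond timestamp in the top bits, version/variant fixed,
    all random bits zero."""
    ms = int(dt.timestamp() * 1000)
    return uuid.UUID(int=(ms << 80) | (0x7 << 76) | (0x2 << 62))

# Hypothetical monthly bounds for a table declared PARTITION BY RANGE (id):
jan = uuid7_lower_bound(datetime(2026, 1, 1, tzinfo=timezone.utc))
feb = uuid7_lower_bound(datetime(2026, 2, 1, tzinfo=timezone.utc))
# CREATE TABLE events_2026_01 PARTITION OF events
#     FOR VALUES FROM ('<jan>') TO ('<feb>');
```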
Just the other day I delivered significant performance gains to a client by converting ~150 million UUIDv4 PKs to good old BIGINT. They were using a fairly recent version of MariaDB.
If they can live with MariaDB, OK, but I wouldn't choose it in the first place these days. Postgres will likely also perform better in most scenarios.
Had the requirements been different, UUIDv7 would have worked well, too, because fragmentation is the biggest problem here.
It would be the equivalent of "if you're a middle-aged man" or "you're an American".
P.S. I think some of the considerations may be true for any system that uses B-tree indexes, but several will be Postgres-specific.
Now, this doesn't work if you actually have enough data that the randomness of UUIDv4 keys is a practical database performance issue. But I think you really have to think long and hard about every single use of identifiers in your application before concluding that v7 is the solution. Maybe v7 works well for some things (e.g. identifiers for resources whose creation times are visible to everyone with access to the resource) but not others (such as users or orgs that are publicly visible but without publicly visible creation times).
But there are cases where it matters. Using a UUIDv7 for identifiers means you need to carefully consider the security and privacy implications every time you create a new table identified by a UUID, and you'll possibly end up with some tables where you use v4 and some where you use v7. Worst case, you'll end up with painful migrations from v7 to v4 as security review flags timestamped identifiers as a concern.
For one example, say you were making voting-booth software. You really don't want a (hidden) timestamp attached to each vote (much less an incrementing id) because that would break voter confidentiality.
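The leak is trivially exploitable: the creation instant can be read straight out of any UUIDv7 with a couple of bit operations. A sketch:

```python
import uuid
from datetime import datetime, timezone

def uuid7_creation_time(u: uuid.UUID) -> datetime:
    """The top 48 bits of a UUIDv7 are a millisecond Unix timestamp,
    readable by anyone who ever sees the identifier."""
    ms = u.int >> 80
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```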
More generally, it's an underlying principle of data management. Not leaking ancillary data is easier to justify than "sure, we leak the date and time of record creation, but we can't think of a reason why that matters."
Personally I think the biggest issue is "clever" programmers who treat the UUID as data and start displaying the date and time. This leads to complications ("that which is displayed, the customer wants to change"). It's only a matter of time before someone declares the date "wrong" and demands it be "fixed" — to say nothing of time zone and daylight-saving conversions.
I'm sorry. How?
However, because the online ordering system assigned order numbers sequentially, it would have been trivial for that company to determine how important their business was.
For example, over the course of a month, they could order something at the start of the month and something at the end of the month. That would give them the total number of orders in that period. They already know how many orders they have placed during the month, so company_orders / total_orders = percentage_of_business
It doesn't even have to be accurate, just an approximation. I don't know if they figured out that they could do that but it wouldn't surprise me if they had.
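Worked through with toy numbers, the inference looks like this:

```python
# Purely illustrative figures for the sequential-order-number leak.
order_seen_day_1 = 100_000    # order number observed at month start
order_seen_day_30 = 104_000   # order number observed at month end
total_orders = order_seen_day_30 - order_seen_day_1   # ~4,000 that month

our_orders = 200              # orders the customer placed themselves
share = our_orders / total_orders
print(f"we are roughly {share:.0%} of their business")  # ~5%
```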
You can find more details here, in the section "Complete invoice":
https://sede.agenciatributaria.gob.es/Sede/en_gb/iva/factura...
https://www.boe.es/buscar/act.php?id=BOE-A-2012-14696#a6 (Spanish only)
I've read people suggest using a UUIDv7 as the primary key and a UUIDv4 as a user-visible one as a remedy.
My first thought when reading the suggestion was, "well but you'll still need an index on the v4 IDs, so what does this actually get you?" But the answer is that it makes joins less expensive; you only require the index once, when constructing the query from the user-supplied data, and everything else operates with the better-for-performance v7 IDs.
I'm still not sure either way if I like the idea, but it's certainly not the craziest thing I've ever heard.
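A minimal sketch of that two-column scheme, with hypothetical table and column names:

```python
# The v7 PK stays internal; the v4 id is the only one ever exposed.
DDL = """
CREATE TABLE accounts (
    id        uuid PRIMARY KEY,      -- UUIDv7: index-friendly, used by all FKs
    public_id uuid NOT NULL UNIQUE,  -- UUIDv4: opaque, shown to users
    name      text NOT NULL
);
"""
# The public_id index is consulted exactly once, at the edge of the query;
# every join after that runs on the cache-friendly v7 key.
LOOKUP = "SELECT id FROM accounts WHERE public_id = %s"
```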
But if you are doing that, why not just use an incrementing integer instead of a uuidv7?
The benefit of uuid in this case is that it allows horizontally scalable app servers to construct PKs on their own without risk of collisions. In addition to just reducing database load by doing the ID generation on the app server (admittedly usually a minor benefit), this can be useful either to simplify insert queries that span multiple tables with FK relationships (potentially saving some round trips in the process) or in very niche situations where you have circular dependencies in non-nullable FKs (with the constraint deferred until the end of the transaction).
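A sketch of the round-trip saving (table names hypothetical): because the app server mints both keys up front, the child insert doesn't have to wait for a RETURNING id from the parent insert.

```python
import uuid

order_id = uuid.uuid4()   # minted on the app server, no DB round trip
item_id = uuid.uuid4()

# Both statements can go to the database in a single batch/transaction:
statements = [
    ("INSERT INTO orders (id, customer) VALUES (%s, %s)",
     (str(order_id), "acme")),
    ("INSERT INTO order_items (id, order_id, sku) VALUES (%s, %s, %s)",
     (str(item_id), str(order_id), "SKU-1")),
]
```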
See perhaps "UUIDv47 — UUIDv7-in / UUIDv4-out (SipHash‑masked timestamp)":
* https://github.com/stateless-me/uuidv47
* Sept 2025: https://news.ycombinator.com/item?id=45275973
UUIDv47 might have a place if you need keys generated on multiple backend servers without synchronization, but it feels very niche to me.
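The rough idea, as I understand it: store a v7 in the database but XOR its 48-bit timestamp with a keyed hash of the stable random bits before showing it, so the external form looks random. A toy sketch with BLAKE2b standing in for SipHash (the real library also fixes up the version/variant nibbles, which this omits):

```python
import hashlib
import uuid

KEY = b"hypothetical 16B"  # server-side secret

def mask(u: uuid.UUID) -> uuid.UUID:
    """XOR the 48-bit timestamp with a keyed hash of the random tail.
    Applying mask() twice returns the original id, since the tail
    (and hence the digest) never changes."""
    ts, tail = u.int >> 80, u.int & ((1 << 80) - 1)
    digest = hashlib.blake2b(tail.to_bytes(10, "big"),
                             key=KEY, digest_size=6).digest()
    return uuid.UUID(int=((ts ^ int.from_bytes(digest, "big")) << 80) | tail)
```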
1: https://wiki.postgresql.org/wiki/XTEA_(crypt_64_bits)
(What do you think Youtube video IDs are?)
I actually have no idea. What are they?
(Also what is the format of their `si=...` thing?)
I am much more interested in the `si` parameter, but I am fairly sure nobody outside of Google knows what it is exactly.
I shared this article a few weeks ago, discussing the problems with this kind of approach: https://notnotp.com/notes/do-not-encrypt-ids/
I believe it can make sense in some situations, but do you really want to implement such crypto-related complexity?
That being said, while fine for obfuscation, this should not be relied on for security purposes such as hidden/unlisted links, confirmation links and so on. Those should use actual, long-ish random keys for access, because the inability to enumerate them is a security feature.
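For those cases something like this is enough — no UUID machinery needed (the route shown is hypothetical):

```python
import secrets

token = secrets.token_urlsafe(16)  # ~128 bits of CSPRNG output, URL-safe
confirm_url = f"https://example.com/confirm/{token}"
# Unguessable by construction; carries no timestamp or sequence to leak.
```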
Yes of course everyone should check and unit test that every object is owned by the user or account loading it, but demanding more sophistication from an attacker than taking "/my_things/23" and loading "/my_things/24" is a big win.
Also, if most of your endpoints require auth, this is not typically a problem.
It really depends on your application. But yes, that's something to be aware of. If you need some ids to be unguessable, make sure they are not predictable :-)
Many systems are not sparse, and separately, that's simply wrong. Unguessable names are not a primary security measure, but a passive remediation for bugs or bad code. Broken access control remains in the OWASP Top 10, and IDOR is a piece of that. Companies still get popped for this.
See, e.g., Google having a bug in Feb 2025, made significantly less impactful by unguessable names: https://infosecwriteups.com/google-did-an-oopsie-a-simple-id...
If you're using the latest version of PG, there is a plugin for it.
That's it.
And for those using ints as keys... you'd be surprised how many databases in the wild won't come close to consuming that many IDs or are for workloads where that sort of volume isn't even aspirational.
Now, to be fair, I'm usually in the UUID camp and am using UUIDv7 in my current designs. I think the parent article makes good points, but I'm after a different set of trade-offs where UUIDs are worth their overhead. Your mileage and use-cases may vary.
UUIDv4 is great for when you add sharding, and UUIDs in general prevent issues with mixing IDs from different tables. But if you reach the kind of scale where you have 2 billion of anything, UUIDs are probably not the best choice either.
Once you encode the shard number into the ID, you get:
- instantly* knowing which shard to query
- each shard having its own ticker
* programmatically, and maybe visually as well, depending on implementation
I had IDs that encoded: entity type (IIRC 4 bits?), timestamp, shard, and sequence per shard. We even had an admin page where you could paste an ID and it would decode it (see the sketch below).
id % n is fine for a cache, because you can just throw the whole thing away and repopulate, or when 'n' never changes — but it usually does.
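A bit-packing sketch along those lines — the field widths are hypothetical, and the timestamp counts milliseconds from a custom epoch so it fits in 40 bits:

```python
EPOCH_MS = 1_577_836_800_000  # hypothetical custom epoch: 2020-01-01 UTC

# Layout: 4 bits entity type | 40 bits ms since epoch | 8 bits shard | 12 bits seq
def pack_id(entity: int, ts_ms: int, shard: int, seq: int) -> int:
    rel = (ts_ms - EPOCH_MS) & ((1 << 40) - 1)
    return (entity << 60) | (rel << 20) | (shard << 12) | seq

def unpack_id(value: int) -> dict:
    return {
        "entity": value >> 60,
        "ts_ms": ((value >> 20) & ((1 << 40) - 1)) + EPOCH_MS,
        "shard": (value >> 12) & 0xFF,
        "seq": value & 0xFFF,
    }
```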
But the author does not say timestamp ordering; he says ordering. I think he actually means, and believes, that there is some problem with ordering UUIDv4s.
But if you need UUID-based lookup, then you might as well have it as a primary key, as that will save you an extra index on the actual primary key. If you also need a date and the remaining bits in UUIDv7 suffice for randomness, then that is a good option too (though this does essentially amount to having a composite column made up of datetime and randomness).
Auto-incrementing keys can work, but what happens when you run out of integers? Also, distributed DBs probably make this hard, and they can't generate a key on the client.
There must be something in Postgres that wants to store the records in PK order, which, while it could be an okay default, I'm pretty sure you can change, as this isn't great for write-heavy workloads.
Just to complement this with a point: there isn't any mainstream database management system out there that is distributed in the sense that it requires UUIDs to generate its internal keys.
There exist some you can find on the internet, and some institutions have internal systems that behave this way. But as a near-universal rule, the thing people know as a "database" isn't distributed in this sense, and if the column creation is done inside the database, you don't need them.
Accessing data in totally random locations can be a performance issue.
Depends on lots of things ofc but this is the concern when people talk about UUID for primary keys being an issue.
Values of the same type can be sorted if an order is defined on the type.
It's also strange to contrast "random values" with "integers". You can generate random integers, and they have a "sorting" (depending on what that means though)
Edit: just saw your edit, sounds like we're on the same page!
One more reason to stay away from microservices, if possible.
If you separate them (i.e. microservices), then they no longer try to use one DB.
Sometimes it might even be for a good reason.
Of course this is not always bad. For example, if you have a lot of relations, you can have just one table with the UUID field (and thus the expensive index), and the relations can use the more efficient int key: the user entity gets both int and uuid keys, and a user attribute references the user via the int key — at the expense of a join if you need to retrieve one user attribute when retrieving the user itself is not needed.
https://www.postgresql.org/docs/current/hash-index.html
Also, is it necessary to show UUIDs at all to customers of an API? Or could it be a valid pattern to hide all the querying complexity behind named identifiers, even if it costs a bit in terms of joining and indexing?
The context is the classic B2B SaaS, but feel free to share your experiences even if it comes from other scenarios!
I'm tired of midwit arguments like "Tech X is 50% faster than tech Y at performing operation Z, therefore it's the only logical choice in all situations!"
It's an infuriatingly silly argument because operation Z may only represent about 10% of the total CPU usage of the whole system, so what is promoted as a 50% gain may in fact be a 5% gain. Negligible. If everyone looked at this performance 'advantage' rationally, nobody would think it worth sacrificing important security or operational properties.
I don't know what happened to our industry; we're supposed to be intelligent people, but I see developers falling for these obviously deceptive arguments over and over.
If the client generates a UUID and POSTs a new resource to insert it into the database, and there is a connection failure so the client never receives a success response from the server, the client cannot know whether the record was inserted without making an expensive and cumbersome additional call to check. And if IDs are auto-incremented on the server and the client posts the same object again without a UUID on it, a duplicate record appears in the table.
On the other hand, if the client generates a UUID for the objects it wants to create, then it can safely resend any object after a timeout and there is no risk of double insertion.
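A sketch of that retry pattern — the `execute` callable and the table are hypothetical:

```python
import uuid

INSERT = """
INSERT INTO events (id, payload) VALUES (%s, %s)
ON CONFLICT (id) DO NOTHING   -- a replayed request is a harmless no-op
"""

def save_with_retry(execute, payload, attempts=3):
    record_id = uuid.uuid4()          # minted once, reused on every retry
    for _ in range(attempts):
        try:
            execute(INSERT, (str(record_id), payload))
            return record_id
        except TimeoutError:          # stand-in for "no response received"
            continue                  # safe to resend: at most one row exists
    raise RuntimeError("could not confirm insert")
```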
Using an idempotency identifier is a last resort in my book.
To your original point, these are heuristics; there isn't always time to dig into every little architectural decision, so having a set of rules of thumb on hand helps to preempt problems at minimal cognitive cost. "Avoid using a GUID as a primary key if you can" is one of mine.
Aren't people using (big)ints as primary keys, and UUIDs as logical keys for import/export, solving portability across different machines?
Consider, say, weather hardware: 5 stations all feeding into a central database. They're all creating rows and uploading them. Using sequential integers for that is unnecessarily complex (if even possible).
Given the amount of data created on phones and tablets, this affects more situations than first assumed.
It's also very helpful in export / edit / update situations. If I export a subset of the data (let's say to Excel), the user can edit all the other columns and I can safely import the result. With integers they might change the ID field (which would be bad). With UUIDs they can change it, but I can ignore that row (or the whole file) because whatever they changed it to will be invalid.
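A sketch of that split (names hypothetical): the bigint PK does the joining, the UUID survives the export/edit/import round trip.

```python
DDL = """
CREATE TABLE measurements (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    logical_id uuid NOT NULL UNIQUE,  -- stable across machines and exports
    station    text NOT NULL,
    reading    numeric NOT NULL
);
"""
# On import, rows are matched by logical_id, never by the local bigint;
# a tampered logical_id simply fails to match and the row is rejected.
UPSERT = "UPDATE measurements SET reading = %s WHERE logical_id = %s"
```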
0: https://bsky.app/profile/hugotunius.se/post/3m7wvfokrus2g
https://github.com/blitss/typeid-postgres
I have slightly different goals for my version. I want everything to fit in 128 bits, so I'm sacrificing some of the random bits, and I'm also making sure the representation inside Postgres is exactly 128 bits. My initial version ended up using CBOR encoding and being 160 bits.
Mine dedicates 16 bits for the prefix allowing up to 3 characters (a-z alphabet).
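One way to fit up to three lowercase letters in 16 bits is base-27 packing (0 meaning "no character"), since 27^3 = 19,683 < 2^16. A sketch of that encoding — not necessarily what either implementation above does:

```python
def pack_prefix(prefix: str) -> int:
    """Base-27 pack: digits 1-26 are a-z, 0 pads a short prefix."""
    assert len(prefix) <= 3 and (not prefix or prefix.islower())
    value = 0
    for ch in prefix:
        value = value * 27 + (ord(ch) - ord("a") + 1)
    return value * 27 ** (3 - len(prefix))

def unpack_prefix(value: int) -> str:
    digits = []
    for _ in range(3):
        value, d = divmod(value, 27)
        digits.append(d)
    return "".join(chr(d + ord("a") - 1) for d in reversed(digits) if d)

assert unpack_prefix(pack_prefix("usr")) == "usr"
```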
Permanent identifiers should not carry data. This is like the cardinal sin of data management. You always run into situations where you thought, "surely this never changes, so it's safe to squeeze into the ID to save a lookup." Then people suddenly find out they have a new gender identity, and they need a new final digit in their ID numbers too.
Even if nothing changes, you can run into trouble. Norwegian PNs have your birth date (in DDMMYY format) as the first six digits. Surely that doesn't change, right? Well, wrong: although the date doesn't change, your knowledge of it might. Immigrants who didn't know their exact date of birth got assigned 1 Jan by default... and then people with actual birthdays on 1 Jan got told, "sorry, you can't have that as a birth date, we've run out of numbers in that series!"
Librarians in the analog age can be forgiven for cramming data into their identifiers, to save a lookup. When the lookup is in a physical card catalog, that's somewhat understandable (although you bet they could run into trouble over it too). But when you have a powerful database at your fingertips, use it! Don't make decisions you will regret just to shave off a couple of milliseconds!
I don't agree with the absolute statement, though. Permanent identifiers should not generally carry data. But there are situations where you want a way to reconcile, or you have space or speed constraints, so you may accept the trade-off: md5 your data and store it in a primary index as a UUID. Your index will fragment and thus you will vacuum, but life will still be good overall.
While that is often a neat solution, do not do it by simply XORing the numbers with a constant. Use a block cipher in ECB mode (if you want the ID to be short, something like NSA's Speck comes in handy here, as it can be instantiated with a 32- or 48-bit block).
And do not even think about using RC4 for this (I've seen that multiple times), because it is completely equivalent to XORing with a constant.
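The shape of a proper keyed permutation is a small Feistel network. A toy sketch over 32-bit IDs using BLAKE2b as the round function — illustration only, not Speck; use a vetted cipher (Speck, or the XTEA approach linked above) in production:

```python
import hashlib

KEY = b"hypothetical secret"

def _round(half: int, i: int) -> int:
    data = half.to_bytes(2, "big") + bytes([i])
    return int.from_bytes(
        hashlib.blake2b(data, key=KEY, digest_size=2).digest(), "big")

def encrypt32(n: int) -> int:
    """4-round balanced Feistel permutation over 32-bit ids."""
    left, right = n >> 16, n & 0xFFFF
    for i in range(4):
        left, right = right, left ^ _round(right, i)
    return (left << 16) | right

def decrypt32(n: int) -> int:
    """Inverse: run the rounds in reverse order."""
    left, right = n >> 16, n & 0xFFFF
    for i in reversed(range(4)):
        left, right = right ^ _round(left, i), left
    return (left << 16) | right

assert decrypt32(encrypt32(123456)) == 123456
# Unlike XOR-with-a-constant, encrypt32(n) and encrypt32(n + 1)
# share no visible structure.
```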
https://www.postgresql.org/docs/current/functions-uuid.html
[1] https://gist.github.com/mikelehen/3596a30bd69384624c11
The issue is that this is true for more or less all capability URLs. I wouldn't recommend UUIDs per se here; probably better to just use a random number. I have seen UUIDs used for this in practice, though, and those systems weren't compromised because of it.
I hate the tendency for password-recovery flows, for example, to leave the URL valid for only 5 minutes. Of course these URLs need a limited lifetime, but mail isn't a real-time communication medium. There is very little security benefit in reducing it from 30 minutes to 5; you are not getting "securer" this way.
277 more comments available on Hacker News