Avoid Uuid Version 4 Primary Keys in Postgres
Key topics
The debate rages on: should you avoid using UUID Version 4 as primary keys in Postgres? Several commenters point out that the issue isn't unique to Postgres but plagues relational databases generally. As one commenter quips, "Math is math" — the concerns around UUIDv4, such as index fragmentation and space requirements, are universal. While some argue that UUIDv7 or alternatives like obfuscated integers are viable, others counter that the choice ultimately depends on the specific use case and requirements, such as the need for uniqueness across multiple servers or the ability to handle high concurrency.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 39m after posting. Peak period: 140 comments in the first 0-12h. Average per period: 32 (based on 160 loaded comments).
Key moments
- Story posted: Dec 15, 2025 at 5:08 AM EST
- First comment: Dec 15, 2025 at 5:47 AM EST (39m after posting)
- Peak activity: 140 comments in the 0-12h window
- Latest activity: Dec 20, 2025 at 8:29 AM EST
(in the scientific reporting world this would be the perennial "in mice")
If you use UUIDv7, you can partition your table by the key prefix. Then the bulk of your data can be efficiently skipped when applying updates.
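Since UUIDv7 sorts by its leading 48-bit millisecond timestamp, partition bounds can be derived from plain datetimes. A minimal sketch of the idea — table and column names here are hypothetical:

```python
import uuid
from datetime import datetime, timezone

def uuid7_lower_bound(dt: datetime) -> uuid.UUID:
    """Smallest UUIDv7-shaped value for a given instant: the 48-bit
    millisecond timestamp in the top bits, version/variant fixed,
    all random bits zero."""
    ms = int(dt.timestamp() * 1000)
    return uuid.UUID(int=(ms << 80) | (0x7 << 76) | (0x2 << 62))

# Hypothetical monthly bounds for a table declared PARTITION BY RANGE (id):
jan = uuid7_lower_bound(datetime(2026, 1, 1, tzinfo=timezone.utc))
feb = uuid7_lower_bound(datetime(2026, 2, 1, tzinfo=timezone.utc))
# CREATE TABLE events_2026_01 PARTITION OF events
#     FOR VALUES FROM ('<jan>') TO ('<feb>');
```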
Just the other day I delivered significant performance gains to a client by converting ~150 million UUIDv4 PKs to good old BIGINT. They were using a fairly recent version of MariaDB.
If they can live with MariaDB, OK, but I wouldn't choose it in the first place these days. Postgres will likely also perform better in most scenarios.
Had the requirements been different, UUIDv7 would have worked well, too, because fragmentation is the biggest problem here.
It would be the equivalent of "if you're a middle-aged man" or "you're an American".
P.S. I think some of the considerations may be true for any system that uses B-tree indexes, but several will be Postgres-specific.
Now, this doesn't work if you actually have enough data that the randomness of UUIDv4 keys is a practical database performance issue. But I think you really have to think long and hard about every single use of identifiers in your application before concluding that v7 is the solution. Maybe v7 works well for some things (e.g. identifiers for resources whose creation times are visible to everyone with access to the resource) but not others (such as users or orgs that are publicly visible but without publicly visible creation times).
But there are cases where it matters. Using a UUIDv7 for identifiers means you need to carefully consider the security and privacy implications every time you create a new table identified by a UUID, and you'll possibly end up with some tables where you use v4 and some where you use v7. Worst case, you'll end up with painful migrations from v7 to v4 as security review flags timestamped identifiers as a concern.
For one example, say you were making voting-booth software. You really don't want a (hidden) timestamp attached to each vote (much less an incrementing id) because that would break voter confidentiality.
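The leak is trivially exploitable: the creation instant can be read straight out of any UUIDv7 with a couple of bit operations. A sketch:

```python
import uuid
from datetime import datetime, timezone

def uuid7_creation_time(u: uuid.UUID) -> datetime:
    """The top 48 bits of a UUIDv7 are a millisecond Unix timestamp,
    readable by anyone who ever sees the identifier."""
    ms = u.int >> 80
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```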
More generally, it's an underlying principle of data management. Not leaking ancillary data is easier to justify than "sure, we leak the date and time of record creation, but we can't think of a reason why that matters."
Personally I think the biggest issue is "clever" programmers who treat the UUID as data and start displaying the date and time. This leads to complications ("that which is displayed, the customer wants to change"). It's only a matter of time before someone declares the date "wrong" and demands it be "fixed" — to say nothing of time zone and daylight-saving conversions.
I'm sorry. How?
However, because the online ordering system assigned order numbers sequentially, it would have been trivial for that company to determine how important their business was.
For example, over the course of a month, they could order something at the start of the month and something at the end of the month. That would give them the total number of orders in that period. They already know how many orders they have placed during the month, so company_orders / total_orders = percentage_of_business
It doesn't even have to be accurate, just an approximation. I don't know if they figured out that they could do that but it wouldn't surprise me if they had.
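Worked through with toy numbers, the inference looks like this:

```python
# Purely illustrative figures for the sequential-order-number leak.
order_seen_day_1 = 100_000    # order number observed at month start
order_seen_day_30 = 104_000   # order number observed at month end
total_orders = order_seen_day_30 - order_seen_day_1   # ~4,000 that month

our_orders = 200              # orders the customer placed themselves
share = our_orders / total_orders
print(f"we are roughly {share:.0%} of their business")  # ~5%
```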
You can find more details here, in the section "Complete invoice":
https://sede.agenciatributaria.gob.es/Sede/en_gb/iva/factura...
https://www.boe.es/buscar/act.php?id=BOE-A-2012-14696#a6 (Spanish only)
I've read people suggest using a UUIDv7 as the primary key and a UUIDv4 as a user-visible one as a remedy.
My first thought when reading the suggestion was, "well but you'll still need an index on the v4 IDs, so what does this actually get you?" But the answer is that it makes joins less expensive; you only require the index once, when constructing the query from the user-supplied data, and everything else operates with the better-for-performance v7 IDs.
I'm still not sure either way if I like the idea, but it's certainly not the craziest thing I've ever heard.
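A minimal sketch of that two-column scheme, with hypothetical table and column names:

```python
# The v7 PK stays internal; the v4 id is the only one ever exposed.
DDL = """
CREATE TABLE accounts (
    id        uuid PRIMARY KEY,      -- UUIDv7: index-friendly, used by all FKs
    public_id uuid NOT NULL UNIQUE,  -- UUIDv4: opaque, shown to users
    name      text NOT NULL
);
"""
# The public_id index is consulted exactly once, at the edge of the query;
# every join after that runs on the cache-friendly v7 key.
LOOKUP = "SELECT id FROM accounts WHERE public_id = %s"
```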
But if you are doing that, why not just use an incrementing integer instead of a uuidv7?
The benefit of uuid in this case is that it allows horizontally scalable app servers to construct PKs on their own without risk of collisions. In addition to just reducing database load by doing the ID generation on the app server (admittedly usually a minor benefit), this can be useful either to simplify insert queries that span multiple tables with FK relationships (potentially saving some round trips in the process) or in very niche situations where you have circular dependencies in non-nullable FKs (with the constraint deferred until the end of the transaction).
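A sketch of the round-trip saving (table names hypothetical): because the app server mints both keys up front, the child insert doesn't have to wait for a RETURNING id from the parent insert.

```python
import uuid

order_id = uuid.uuid4()   # minted on the app server, no DB round trip
item_id = uuid.uuid4()

# Both statements can go to the database in a single batch/transaction:
statements = [
    ("INSERT INTO orders (id, customer) VALUES (%s, %s)",
     (str(order_id), "acme")),
    ("INSERT INTO order_items (id, order_id, sku) VALUES (%s, %s, %s)",
     (str(item_id), str(order_id), "SKU-1")),
]
```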
See perhaps "UUIDv47 — UUIDv7-in / UUIDv4-out (SipHash‑masked timestamp)":
* https://github.com/stateless-me/uuidv47
* Sept 2025: https://news.ycombinator.com/item?id=45275973
UUIDv47 might have a place if you need keys generated on multiple backend servers without synchronization, but it feels very niche to me.
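The rough idea, as I understand it: store a v7 in the database but XOR its 48-bit timestamp with a keyed hash of the stable random bits before showing it, so the external form looks random. A toy sketch with BLAKE2b standing in for SipHash (the real library also fixes up the version/variant nibbles, which this omits):

```python
import hashlib
import uuid

KEY = b"hypothetical 16B"  # server-side secret

def mask(u: uuid.UUID) -> uuid.UUID:
    """XOR the 48-bit timestamp with a keyed hash of the random tail.
    Applying mask() twice returns the original id, since the tail
    (and hence the digest) never changes."""
    ts, tail = u.int >> 80, u.int & ((1 << 80) - 1)
    digest = hashlib.blake2b(tail.to_bytes(10, "big"),
                             key=KEY, digest_size=6).digest()
    return uuid.UUID(int=((ts ^ int.from_bytes(digest, "big")) << 80) | tail)
```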
1: https://wiki.postgresql.org/wiki/XTEA_(crypt_64_bits)
(What do you think Youtube video IDs are?)
I actually have no idea. What are they?
(Also what is the format of their `si=...` thing?)
I am much more interested in the `si` parameter, but I am fairly sure nobody outside of Google knows what it is exactly.
I shared this article a few weeks ago, discussing the problems with this kind of approach: https://notnotp.com/notes/do-not-encrypt-ids/
I believe it can make sense in some situations, but do you really want to implement such crypto-related complexity?
That being said, while fine for obfuscation, this should not be relied on for security purposes such as hidden/unlisted links, confirmation links and so on. Those should use actual, long-ish random keys for access, because the inability to enumerate them is a security feature.
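For those cases something like this is enough — no UUID machinery needed (the route shown is hypothetical):

```python
import secrets

token = secrets.token_urlsafe(16)  # ~128 bits of CSPRNG output, URL-safe
confirm_url = f"https://example.com/confirm/{token}"
# Unguessable by construction; carries no timestamp or sequence to leak.
```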
Yes of course everyone should check and unit test that every object is owned by the user or account loading it, but demanding more sophistication from an attacker than taking "/my_things/23" and loading "/my_things/24" is a big win.
Also, if most of your endpoints require auth, this is not typically a problem.
It really depends on your application. But yes, that's something to be aware of. If you need some ids to be unguessable, make sure they are not predictable :-)
Many systems are not sparse, and separately, that's simply wrong. Unguessable names are not a primary security measure, but a passive remediation for bugs or bad code. Broken access control remains in the OWASP Top 10, and IDOR is a piece of that. Companies still get popped for this.
See, e.g., Google having a bug in Feb 2025, made significantly less impactful by unguessable names: https://infosecwriteups.com/google-did-an-oopsie-a-simple-id...
If you're using the latest version of PG, there is a plugin for it.
That's it.
And for those using ints as keys... you'd be surprised how many databases in the wild won't come close to consuming that many IDs or are for workloads where that sort of volume isn't even aspirational.
Now, to be fair, I'm usually in the UUID camp and am using UUIDv7 in my current designs. I think the parent article makes good points, but I'm after a different set of trade-offs where UUIDs are worth their overhead. Your mileage and use-cases may vary.
UUIDv4 is great for when you add sharding, and UUIDs in general prevent issues with mixing IDs from different tables. But if you reach the kind of scale where you have 2 billion of anything, UUIDs are probably not the best choice either.
Once you encode the shard number into the ID, you get:
- instantly* knowing which shard to query
- each shard having its own ticker
* programmatically, and maybe visually as well, depending on implementation
I had IDs that encoded: entity type (IIRC 4 bits?), timestamp, shard, and sequence per shard. We even had an admin page where you could paste an ID and it would decode it (see the sketch below).
id % n is fine for a cache, because you can just throw the whole thing away and repopulate, or when 'n' never changes — but it usually does.
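A bit-packing sketch along those lines — the field widths are hypothetical, and the timestamp counts milliseconds from a custom epoch so it fits in 40 bits:

```python
EPOCH_MS = 1_577_836_800_000  # hypothetical custom epoch: 2020-01-01 UTC

# Layout: 4 bits entity type | 40 bits ms since epoch | 8 bits shard | 12 bits seq
def pack_id(entity: int, ts_ms: int, shard: int, seq: int) -> int:
    rel = (ts_ms - EPOCH_MS) & ((1 << 40) - 1)
    return (entity << 60) | (rel << 20) | (shard << 12) | seq

def unpack_id(value: int) -> dict:
    return {
        "entity": value >> 60,
        "ts_ms": ((value >> 20) & ((1 << 40) - 1)) + EPOCH_MS,
        "shard": (value >> 12) & 0xFF,
        "seq": value & 0xFFF,
    }
```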
But the author does not say timestamp ordering; he says ordering. I think he actually means, and believes, that there is some problem with ordering UUIDv4s.
But if you need UUID-based lookup, then you might as well have it as a primary key, as that will save you an extra index on the actual primary key. If you also need a date and the remaining bits in UUIDv7 suffice for randomness, then that is a good option too (though this does essentially amount to having a composite column made up of datetime and randomness).
Auto-incrementing keys can work, but what happens when you run out of integers? Also, distributed DBs probably make this hard, and they can't generate a key on the client.
There must be something in Postgres that wants to store the records in PK order, which, while it could be an okay default, I'm pretty sure you can change, as this isn't great for write-heavy workloads.
Just to complement this with a point: there isn't any mainstream database management system out there that is distributed in the sense that it requires UUIDs to generate its internal keys.
There exist some you can find on the internet, and some institutions have internal systems that behave this way. But as a near-universal rule, the thing people know as a "database" isn't distributed in this sense, and if the column creation is done inside the database, you don't need them.
Accessing data in totally random locations can be a performance issue.
Depends on lots of things ofc but this is the concern when people talk about UUID for primary keys being an issue.
Values of the same type can be sorted if an order is defined on the type.
It's also strange to contrast "random values" with "integers". You can generate random integers, and they have a "sorting" (depending on what that means though)
Edit: just saw your edit, sounds like we're on the same page!
One more reason to stay away from microservices, if possible.
If you separate them (i.e. microservices), then they no longer try to use one DB.
Sometimes it might even be for a good reason.
Of course this is not always bad. For example, if you have a lot of relations, you can have just one table with the UUID field (and thus the expensive index), and the relations can use the more efficient int key: the user entity gets both int and uuid keys, and a user attribute references the user via the int key — at the expense of a join if you need to retrieve one user attribute when retrieving the user itself is not needed.
https://www.postgresql.org/docs/current/hash-index.html
Also, is it necessary to show UUIDs at all to customers of an API? Or could it be a valid pattern to hide all the querying complexity behind named identifiers, even if it costs a bit in terms of joining and indexing?
The context is the classic B2B SaaS, but feel free to share your experiences even if it comes from other scenarios!
I'm tired of midwit arguments like "Tech X is 50% faster than tech Y at performing operation Z, therefore it's the only logical choice in all situations!"
It's an infuriatingly silly argument because operation Z may only represent about 10% of the total CPU usage of the whole system, so what is promoted as a 50% gain may in fact be a 5% gain. Negligible. If everyone looked at this performance 'advantage' rationally, nobody would think it worth sacrificing important security or operational properties.
I don't know what happened to our industry; we're supposed to be intelligent people, but I see developers falling for these obviously deceptive arguments over and over.
If the client generates a UUID and POSTs a new resource to insert it into the database, and there is a connection failure so the client never receives a success response from the server, the client cannot know whether the record was inserted without making an expensive and cumbersome additional call to check. And if IDs are auto-incremented on the server and the client posts the same object again without a UUID on it, a duplicate record appears in the table.
On the other hand, if the client generates a UUID for the objects it wants to create, then it can safely resend any object after a timeout and there is no risk of double insertion.
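A sketch of that retry pattern — the `execute` callable and the table are hypothetical:

```python
import uuid

INSERT = """
INSERT INTO events (id, payload) VALUES (%s, %s)
ON CONFLICT (id) DO NOTHING   -- a replayed request is a harmless no-op
"""

def save_with_retry(execute, payload, attempts=3):
    record_id = uuid.uuid4()          # minted once, reused on every retry
    for _ in range(attempts):
        try:
            execute(INSERT, (str(record_id), payload))
            return record_id
        except TimeoutError:          # stand-in for "no response received"
            continue                  # safe to resend: at most one row exists
    raise RuntimeError("could not confirm insert")
```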
Using an idempotency identifier is a last resort in my book.
To your original point, these are heuristics; there isn't always time to dig into every little architectural decision, so having a set of rules of thumb on hand helps to preempt problems at minimal cognitive cost. "Avoid using a GUID as a primary key if you can" is one of mine.
Aren't people using (big)ints as primary keys, and UUIDs as logical keys for import/export, solving portability across different machines?
Consider, say, weather hardware: 5 stations all feeding into a central database. They're all creating rows and uploading them. Using sequential integers for that is unnecessarily complex (if even possible).
Given the amount of data created on phones and tablets, this affects more situations than first assumed.
It's also very helpful in export / edit / update situations. If I export a subset of the data (let's say to Excel), the user can edit all the other columns and I can safely import the result. With integers they might change the ID field (which would be bad). With UUIDs they can change it, but I can ignore that row (or the whole file) because whatever they changed it to will be invalid.
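A sketch of that split (names hypothetical): the bigint PK does the joining, the UUID survives the export/edit/import round trip.

```python
DDL = """
CREATE TABLE measurements (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    logical_id uuid NOT NULL UNIQUE,  -- stable across machines and exports
    station    text NOT NULL,
    reading    numeric NOT NULL
);
"""
# On import, rows are matched by logical_id, never by the local bigint;
# a tampered logical_id simply fails to match and the row is rejected.
UPSERT = "UPDATE measurements SET reading = %s WHERE logical_id = %s"
```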
0: https://bsky.app/profile/hugotunius.se/post/3m7wvfokrus2g
https://github.com/blitss/typeid-postgres
I have slightly different goals for my version. I want everything to fit in 128 bits, so I'm sacrificing some of the random bits, and I'm also making sure the representation inside Postgres is exactly 128 bits. My initial version ended up using CBOR encoding and being 160 bits.
Mine dedicates 16 bits for the prefix allowing up to 3 characters (a-z alphabet).
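One way to fit up to three lowercase letters in 16 bits is base-27 packing (0 meaning "no character"), since 27^3 = 19,683 < 2^16. A sketch of that encoding — not necessarily what either implementation above does:

```python
def pack_prefix(prefix: str) -> int:
    """Base-27 pack: digits 1-26 are a-z, 0 pads a short prefix."""
    assert len(prefix) <= 3 and (not prefix or prefix.islower())
    value = 0
    for ch in prefix:
        value = value * 27 + (ord(ch) - ord("a") + 1)
    return value * 27 ** (3 - len(prefix))

def unpack_prefix(value: int) -> str:
    digits = []
    for _ in range(3):
        value, d = divmod(value, 27)
        digits.append(d)
    return "".join(chr(d + ord("a") - 1) for d in reversed(digits) if d)

assert unpack_prefix(pack_prefix("usr")) == "usr"
```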
Permanent identifiers should not carry data. This is like the cardinal sin of data management. You always run into situations where you thought, "surely this never changes, so it's safe to squeeze into the ID to save a lookup." Then people suddenly find out they have a new gender identity, and they need a new final digit in their ID numbers too.
Even if nothing changes, you can run into trouble. Norwegian PNs have your birth date (in DDMMYY format) as the first six digits. Surely that doesn't change, right? Well, wrong: although the date doesn't change, your knowledge of it might. Immigrants who didn't know their exact date of birth got assigned 1 Jan by default... and then people with actual birthdays on 1 Jan got told, "sorry, you can't have that as a birth date, we've run out of numbers in that series!"
Librarians in the analog age can be forgiven for cramming data into their identifiers, to save a lookup. When the lookup is in a physical card catalog, that's somewhat understandable (although you bet they could run into trouble over it too). But when you have a powerful database at your fingertips, use it! Don't make decisions you will regret just to shave off a couple of milliseconds!
I don't agree with the absolute statement, though. Permanent identifiers should not generally carry data. But there are situations where you want a way to reconcile, or you have space or speed constraints, so you may accept the trade-off: md5 your data and store it in a primary index as a UUID. Your index will fragment and thus you will vacuum, but life will still be good overall.
While that is often a neat solution, do not do it by simply XORing the numbers with a constant. Use a block cipher in ECB mode (if you want the ID to be short, something like NSA's Speck comes in handy here, as it can be instantiated with a 32- or 48-bit block).
And do not even think about using RC4 for this (I've seen that multiple times), because it is completely equivalent to XORing with a constant.
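The shape of a proper keyed permutation is a small Feistel network. A toy sketch over 32-bit IDs using BLAKE2b as the round function — illustration only, not Speck; use a vetted cipher (Speck, or the XTEA approach linked above) in production:

```python
import hashlib

KEY = b"hypothetical secret"

def _round(half: int, i: int) -> int:
    data = half.to_bytes(2, "big") + bytes([i])
    return int.from_bytes(
        hashlib.blake2b(data, key=KEY, digest_size=2).digest(), "big")

def encrypt32(n: int) -> int:
    """4-round balanced Feistel permutation over 32-bit ids."""
    left, right = n >> 16, n & 0xFFFF
    for i in range(4):
        left, right = right, left ^ _round(right, i)
    return (left << 16) | right

def decrypt32(n: int) -> int:
    """Inverse: run the rounds in reverse order."""
    left, right = n >> 16, n & 0xFFFF
    for i in reversed(range(4)):
        left, right = right ^ _round(left, i), left
    return (left << 16) | right

assert decrypt32(encrypt32(123456)) == 123456
# Unlike XOR-with-a-constant, encrypt32(n) and encrypt32(n + 1)
# share no visible structure.
```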
https://www.postgresql.org/docs/current/functions-uuid.html
[1] https://gist.github.com/mikelehen/3596a30bd69384624c11
The issue is that this is true for more or less all capability URLs. I wouldn't recommend UUIDs per se here; probably better to just use a random number. I have seen UUIDs used for this in practice, though, and those systems weren't compromised because of it.
I hate the tendency for password-recovery flows, for example, to leave the URL valid for only 5 minutes. Of course these URLs need a limited lifetime, but mail isn't a real-time communication medium. There is very little security benefit in reducing it from 30 minutes to 5; you are not getting "securer" this way.
277 more comments available on Hacker News