Bypass PostgreSQL Catalog Overhead with Direct Partition Hash Calculations
Posted 4 months ago · Active 4 months ago
shayon.dev · Tech · story
Sentiment: skeptical / mixed
Debate: 70/100
Key topics
PostgreSQL
Database Optimization
Partitioning
The article discusses bypassing PostgreSQL catalog overhead by directly calculating partition hashes, sparking debate among commenters about the usefulness and potential drawbacks of this approach.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
First comment: 4d
Peak period: 12 comments in 84-96h
Avg / period: 6.5
Comment distribution: 13 data points
Based on 13 loaded comments
Key moments
1. Story posted: Aug 23, 2025 at 3:05 PM EDT (4 months ago)
2. First comment: Aug 27, 2025 at 3:31 AM EDT (4d after posting)
3. Peak activity: 12 comments in 84-96h (hottest window of the conversation)
4. Latest activity: Aug 28, 2025 at 12:32 AM EDT (4 months ago)
ID: 44998276 · Type: story · Last synced: 11/20/2025, 6:42:50 PM
This benchmark seems to be pure computation of the hash value, which I don’t think is helpful to test the hypothesis. A lot can happen at actual query time that this benchmark does not account for.
More to come soon.
If you are stuck on a specific pg version for a while, maybe it's worth it.
And who in their right mind would calculate a hash using a static SQL query that isn't even using the pg catalog hashing routine but a reimplementation?
I'm baffled.
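For reference, PostgreSQL does expose its catalog hashing routine to SQL, so the check can be done without any reimplementation. A minimal sketch, assuming an events table hash-partitioned on a bigint column with modulus 16 (the table name and value here are illustrative):

```sql
-- Ask PostgreSQL's own hashing routine whether a value lands in a
-- given hash partition: returns true if 12345 hashes to the child
-- declared FOR VALUES WITH (modulus 16, remainder 0) of events.
SELECT satisfies_hash_partition('events'::regclass, 16, 0, 12345::bigint);
```

Because this goes through the server's actual hash functions, it stays correct across the partitioning details of whatever version is running.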
So you can precompute the partitions in the app, and then explicitly specify the partitions in the query. Though there isn’t a ton of value in this for any large range of dates, since you’ll end up hitting all partitions anyway.
For something like a user id, it might make sense. If you’re using something alphanumeric as a user id, you can pass it through CRC32() first, or just use KEY partitioning on the column directly.
This approach might be a better option, but sadly the app needs to be modified to make use of it.
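As a sketch of what explicitly specifying the partition could look like, assuming the app has already computed that a given user_id maps to the child table events_3 (both the partition name and the id are illustrative):

```sql
-- Query the precomputed child partition directly, bypassing the
-- planner's partition pruning on the parent table.
SELECT * FROM events_3 WHERE user_id = 12345;

-- To verify which child a row actually lives in, tableoid can be used:
SELECT tableoid::regclass, * FROM events WHERE user_id = 12345;
```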
-- Parent table
CREATE TABLE events (
    id bigint,
    user_id bigint,
    event_type integer,
    payload text,
    created_at timestamp
) PARTITION BY HASH (user_id);

-- First level: 16 partitions by user_id
CREATE TABLE events_0 PARTITION OF events
    FOR VALUES WITH (modulus 16, remainder 0)
    PARTITION BY HASH (event_type);
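For context, the second level of the scheme above would then be declared under each first-level child; a sketch, where the modulus of 4 and the table name are assumptions, not from the original comment:

```sql
-- Second level (illustrative): sub-partition of events_0 by event_type.
CREATE TABLE events_0_0 PARTITION OF events_0
    FOR VALUES WITH (modulus 4, remainder 0);
```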
shouldn't that be by user_id for the first 16 tables?