Bypass PostgreSQL Catalog Overhead with Direct Partition Hash Calculations
Posted 4 months ago · Active 4 months ago
shayon.dev · Tech · story
Sentiment: skeptical / mixed
Debate: 70/100
Key topics
PostgreSQL
Database Optimization
Partitioning
The article discusses bypassing PostgreSQL catalog overhead by directly calculating partition hashes, sparking debate among commenters about the usefulness and potential drawbacks of this approach.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
First comment: 4d
Peak period: 12 comments in 84-96h
Avg / period: 6.5
Comment distribution: 13 data points
Based on 13 loaded comments
Key moments
1. Story posted: Aug 23, 2025 at 3:05 PM EDT (4 months ago)
2. First comment: Aug 27, 2025 at 3:31 AM EDT (4d after posting)
3. Peak activity: 12 comments in 84-96h (hottest window of the conversation)
4. Latest activity: Aug 28, 2025 at 12:32 AM EDT (4 months ago)
ID: 44998276 · Type: story · Last synced: 11/20/2025, 6:42:50 PM
This benchmark seems to be pure computation of the hash value, which I don’t think is helpful to test the hypothesis. A lot can happen at actual query time that this benchmark does not account for.
More to come soon.
If you are stuck on a specific pg version for a while, maybe it's worth it.
And who in their right mind would calculate a hash using a static SQL query that isn't even using the pg catalog hashing routine but a reimplementation?
I'm baffled.
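For reference, PostgreSQL does expose its catalog hashing routine to SQL, so the check can be done without any reimplementation. A minimal sketch, assuming an events table hash-partitioned on a bigint column with modulus 16 (the table name and value here are illustrative):

```sql
-- Ask PostgreSQL's own hashing routine whether a value lands in a
-- given hash partition: returns true if 12345 hashes to the child
-- declared FOR VALUES WITH (modulus 16, remainder 0) of events.
SELECT satisfies_hash_partition('events'::regclass, 16, 0, 12345::bigint);
```

Because this goes through the server's actual hash functions, it stays correct across the partitioning details of whatever version is running.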
So you can precompute the partitions in the app, and then explicitly specify the partitions in the query. Though there isn’t a ton of value in this for any large range of dates, since you’ll end up hitting all partitions anyway.
For something like a user id, it might make sense. If you’re using something alphanumeric as a user id, you can pass it through CRC32() first, or just use KEY partitioning on the column directly.
This approach might be a better option, but sadly the app needs to be modified to make use of it.
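As a sketch of what explicitly specifying the partition could look like, assuming the app has already computed that a given user_id maps to the child table events_3 (both the partition name and the id are illustrative):

```sql
-- Query the precomputed child partition directly, bypassing the
-- planner's partition pruning on the parent table.
SELECT * FROM events_3 WHERE user_id = 12345;

-- To verify which child a row actually lives in, tableoid can be used:
SELECT tableoid::regclass, * FROM events WHERE user_id = 12345;
```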
-- Parent table
CREATE TABLE events (
    id bigint,
    user_id bigint,
    event_type integer,
    payload text,
    created_at timestamp
) PARTITION BY HASH (user_id);

-- First level: 16 partitions by user_id
CREATE TABLE events_0 PARTITION OF events
    FOR VALUES WITH (modulus 16, remainder 0)
    PARTITION BY HASH (event_type);
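For context, the second level of the scheme above would then be declared under each first-level child; a sketch, where the modulus of 4 and the table name are assumptions, not from the original comment:

```sql
-- Second level (illustrative): sub-partition of events_0 by event_type.
CREATE TABLE events_0_0 PARTITION OF events_0
    FOR VALUES WITH (modulus 4, remainder 0);
```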
shouldn't that be by user_id for the first 16 tables?