Is Postgres Read Heavy or Write Heavy?
Posted 3 months ago · Active 2 months ago
crunchydata.com · Tech · Story · High profile
Controversial · Negative
Debate: 80/100
Key topics
Postgresql
Database Performance
SQL
The article 'Is Postgres read heavy or write heavy?' sparks debate on HN, with commenters criticizing its title, content, and perceived AI-generated quality, while discussing Postgres performance and workload analysis.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 1d after posting
Peak period: 52 comments (Day 2)
Avg / period: 18
Comment distribution: 54 data points (based on 54 loaded comments)
Key moments
- 01 Story posted: Oct 17, 2025 at 1:06 PM EDT (3 months ago)
- 02 First comment: Oct 18, 2025 at 5:33 PM EDT (1d after posting)
- 03 Peak activity: 52 comments in Day 2 (hottest window of the conversation)
- 04 Latest activity: Oct 29, 2025 at 10:12 PM EDT (2 months ago)
ID: 45619108 · Type: story · Last synced: 11/20/2025, 5:11:42 PM
Postgres is an OLTP database, and OLTP databases are designed for write-heavy workloads.
That being said, I agree most people have read-heavy needs.
Most applications are used operationally or have a mix of reads and writes. Even in applications where users can only consume content, just tracking page history often captures enough data to make the workload relatively write heavy.
I think a very small amount of data is read very often. However, until your DB gets very large, that data is going to end up as data pages cached in memory. So that data is extremely cheap to read.
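A rough way to sanity-check that on your own database is to look at buffer-cache hit ratios. A sketch using the standard pg_statio_user_tables view (nothing from the article):

    -- Sketch: buffer-cache hit ratio per table; values near 1.0 mean the
    -- hot data is being served from memory rather than disk.
    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_hit + heap_blks_read DESC
    LIMIT 10;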
I also think a significant amount of data that is generated in an OLTP system is written and never read, but you still had to pay the cost to write it. If you have an audit log, chances are you never need to look at it for any one piece of data. But you definitely had to write all the metadata for it.
But I'm also assuming that writes are at least 10 times as expensive as reads. More so if what you're modifying has indexes, since indexes are often functionally identical to a partial copy of the entire table. Indeed, I think that 10 times mark is conservative. Most RDBMSs use transaction logging and some kind of locking on writes. There's data validation and integrity checks on inserts and updates (and deletes if you have foreign keys).
I think 1 write to 10 reads is still write-heavy.
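To put a rough number on the index-amplification point, here is a sketch using standard catalog size functions (the table names will be whatever is in your schema):

    -- Sketch: heap size vs. total index size per table; a large index share
    -- means each row write is amplified into several additional page writes.
    SELECT relname,
           pg_size_pretty(pg_relation_size(relid)) AS heap_size,
           pg_size_pretty(pg_indexes_size(relid))  AS index_size
    FROM pg_stat_user_tables
    ORDER BY pg_indexes_size(relid) DESC
    LIMIT 10;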
Pretty easy to tune the supplied SQL query to suit your opinion.
Pretty sure you just need to tweak the 2nd line:
ratio_target AS (SELECT 5 AS ratio),
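For context, a minimal sketch of how a ratio_target CTE like that could gate the verdict. This is not the article's exact query; it uses the tuple counters from pg_stat_database:

    -- Sketch: classify the current database as read or write heavy by
    -- comparing tuple counters against an adjustable ratio threshold.
    WITH ratio_target AS (SELECT 5 AS ratio),  -- the line to tweak
    activity AS (
        SELECT tup_returned + tup_fetched               AS tuples_read,
               tup_inserted + tup_updated + tup_deleted AS tuples_written
        FROM pg_stat_database
        WHERE datname = current_database()
    )
    SELECT tuples_read,
           tuples_written,
           CASE WHEN tuples_read > tuples_written * (SELECT ratio FROM ratio_target)
                THEN 'read heavy'
                ELSE 'write heavy'
           END AS verdict
    FROM activity;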
The underlying traffic may be very write heavy, but then you have many, many users who need to see that data. The question is whether the database itself, or a copy of the data from the database, is what serves that interest.
If you mediate all reads through a cache, then you have split the source of truth from the system of record. And then the read traffic on the system of record is a much lower ratio.
https://youtu.be/PvDBGqEykvc?t=7
Also: HN needs to upgrade its bot upvoting detection tech. This is embarrassing. It was proper ownage of the HN #1 position for like 15 minutes straight. And then like #2-3 for an hour or so.
Personally, I agree both that it's possible to detect this better and that it would drastically improve the quality of this site if that weren't the meta, and I think it's something that should be openly discussed (in terms of practical suggestions).
But afaik this site doesn't even track that many metrics (e.g. does it even count clicks on articles?) so I think what I'm proposing is a bit antithetical to the minimalist nature.
But then who is left to look at the recruitment ads if the quality of the content, comments, and community degrades enough that everyone stops coming?
All I know is that pretty much nobody here knows enough about the whole picture to have a meaningfully informed opinion. So a lot of these opinions are going to be driven by their imagination of the whole picture.
https://news.ycombinator.com/newsguidelines.html
Indeed. On your schema. On your usage. On your app. On your users.
I’ve been mostly using the `pg_stat_statements` view (the second technique) to find out whether my workload is read or write heavy; it’s plenty good in most situations.
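For anyone who hasn't tried it, a sketch of that kind of check (an illustration, not the article's query; total_exec_time is the PG 13+ column name, older versions call it total_time):

    -- Sketch: rough read/write split from pg_stat_statements, bucketing
    -- statements by their leading keyword. Requires the extension to be
    -- loaded via shared_preload_libraries.
    SELECT CASE WHEN query ~* '^\s*(insert|update|delete|copy|merge)'
                THEN 'write' ELSE 'read' END AS kind,
           sum(calls) AS calls,
           sum(rows)  AS rows,
           round(sum(total_exec_time)::numeric, 1) AS total_ms
    FROM pg_stat_statements
    GROUP BY 1
    ORDER BY total_ms DESC;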
For a start, the article does not mention any other database. I don't know how you can say something is read or write heavy without comparing it to something else. It doesn't even compare different queries on the same database. Like, they just wrote a query and it does a lot of reads - so what? There's nothing here. Am I going mad? Why does this article exist?
Queries that need to operate on more data than fits in the allocated working memory will write to a temporary file on disk, in some cases perform an operation there like sorting the whole thing, and finally delete it when done, which is even more disk write stress.
It's not really about whether it's read heavy or write heavy; it's about whether its usage creates disk I/O stress.
You can write millions of integer increments, and while technically that's "write heavy", there's no stress involved because you're just changing a value in a defined space that's already been allocated. Update space that is more dynamic, like a TEXT or JSON field that grows frequently, and it's a different story.
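One cheap way to see whether those spills are happening is the temp-file counters in pg_stat_database; a sketch (not from the article):

    -- Sketch: databases where queries have spilled to temporary files on
    -- disk, i.e. work_mem was too small for some sorts or hashes.
    SELECT datname,
           temp_files,
           pg_size_pretty(temp_bytes) AS temp_spilled
    FROM pg_stat_database
    WHERE temp_files > 0
    ORDER BY temp_bytes DESC;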
For a data lake or analytics, prefer columns / read heavy?
What would a row-based DB be better for? OLTP / write heavy?
Things get muddied when HTAP systems get bandied about, promising the best of both worlds.
Reporting on your Ordering System is an OLAP problem though. Generally an OLAP database stores data on disk in a way that it only needs to read the selected columns, and performance is better with wider tables, i.e. lots of duplicated data (JOINs are slow).
So, you select * from Customer, Order, Items, Device, Staff, stick it in your OLAP database, and that's where customers should generate reports. This both makes reporting more performant and removes the problem from the critical path of your POS device syncing and working.
This has the added benefit that updating your product name won't update the historical log of what was done at the time, because what was done at the time was done at the time (but you can still map on something like productId if you think the data is relevant).
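As a sketch of that shape (all table and column names here are made up for illustration, not taken from the thread):

    -- Sketch: a denormalized reporting table for the OLAP side, snapshotting
    -- names and prices at order time so later edits don't rewrite history.
    CREATE TABLE report_order_lines AS
    SELECT o.order_id,
           o.ordered_at,
           c.name       AS customer_name,
           p.product_id,
           p.name       AS product_name,  -- captured as of order time
           i.quantity,
           i.unit_price,
           s.name       AS staff_name,
           d.label      AS device_label
    FROM orders o
    JOIN customers   c ON c.customer_id = o.customer_id
    JOIN order_items i ON i.order_id    = o.order_id
    JOIN products    p ON p.product_id  = i.product_id
    JOIN staff       s ON s.staff_id    = o.staff_id
    JOIN devices     d ON d.device_id   = o.device_id;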
At scale you want to pop the writes on a queue and design those devices to be as async as possible.
This is what happens when you just build it pure OLTP.
https://www.linkedin.com/pulse/nobody-expects-thundering-her...
This was an ~£19m ARR POS company dying because of architecture, now doing £150m+ ARR. (The GTV of the workloads is multiple times that, but I can't remember the figures.)
https://www.oracle.com/database/in-memory/
https://www.oracle.com/database/technologies/exadata/softwar...
That eliminates the complexity of running two databases and keeping them in sync.
Disclosure: work part time for them, own stock
This sounds like the one-big-table approach, which in my experience is very difficult to do right and only makes sense in a data mart context.
Google’s AdSense data model in BigQuery is like this and works well, but it gets so wide it’s difficult. Then again, when you embed things like arrays and structs and can unnest as needed, avoiding joins can be nice.
I’ve found star schemas to work out just fine in data marts. Just do them properly. Join as needed. And a good engine will handle the rest. We’ve had no issues with a similar model in Snowflake, for example. Of course YMMV.
Also, in my experience the "Faster SSD Storage" point applies to both read-heavy and write-heavy workloads.
IMO this submission needs flags not upvotes.
But title aside, I found this post very useful for better understanding PG reads and writes (under the hood) and how to actually measure your workload.
Curious whether the tuning actions are any different if you're using a non-vanilla storage engine like AWS Aurora, GCP AlloyDB, or Neon?
1 more comment available on Hacker News