Is Postgres Read Heavy or Write Heavy?
Posted 3 months ago · Active 2 months ago
crunchydata.com · Tech · Story · High profile
Controversial · Negative
Debate: 80/100
Key topics
Postgresql
Database Performance
SQL
The article 'Is Postgres read heavy or write heavy?' sparks debate on HN, with commenters criticizing its title, content, and perceived AI-generated quality, while discussing Postgres performance and workload analysis.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 1d after posting
Peak period: 52 comments (Day 2)
Avg / period: 18
Comment distribution: 54 data points (based on 54 loaded comments)
Key moments
- 01 Story posted: Oct 17, 2025 at 1:06 PM EDT (3 months ago)
- 02 First comment: Oct 18, 2025 at 5:33 PM EDT (1d after posting)
- 03 Peak activity: 52 comments in Day 2 (hottest window of the conversation)
- 04 Latest activity: Oct 29, 2025 at 10:12 PM EDT (2 months ago)
ID: 45619108 · Type: story · Last synced: 11/20/2025, 5:11:42 PM
Postgres is an OLTP database, and OLTP databases are designed for write-heavy workloads.
That being said, I agree most people have read-heavy needs.
Most applications are used operationally or have a mix of reads and writes. Even in applications where users can only consume content, just tracking page history often captures enough data to make the workload relatively write heavy.
I think a very small amount of data is read very often. However, until your DB gets very large, that data is going to end up as data pages cached in memory. So that data is extremely cheap to read.
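A rough way to sanity-check that on your own database is to look at buffer-cache hit ratios. A sketch using the standard pg_statio_user_tables view (nothing from the article):

    -- Sketch: buffer-cache hit ratio per table; values near 1.0 mean the
    -- hot data is being served from memory rather than disk.
    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_hit + heap_blks_read DESC
    LIMIT 10;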
I also think a significant amount of data that is generated in an OLTP system is written and never read, but you still had to pay the cost to write it. If you have an audit log, chances are you never need to look at it for any one piece of data. But you definitely had to write all the metadata for it.
But I'm also assuming that writes are at least 10 times as expensive as reads. More so if what you're modifying has indexes, since indexes are often functionally identical to a partial copy of the entire table. Indeed, I think that 10 times mark is conservative. Most RDBMSs use transaction logging and some kind of locking on writes. There's data validation and integrity checks on inserts and updates (and deletes if you have foreign keys).
I think 1 write to 10 reads is still write-heavy.
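To put a rough number on the index-amplification point, here is a sketch using standard catalog size functions (the table names will be whatever is in your schema):

    -- Sketch: heap size vs. total index size per table; a large index share
    -- means each row write is amplified into several additional page writes.
    SELECT relname,
           pg_size_pretty(pg_relation_size(relid)) AS heap_size,
           pg_size_pretty(pg_indexes_size(relid))  AS index_size
    FROM pg_stat_user_tables
    ORDER BY pg_indexes_size(relid) DESC
    LIMIT 10;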
Pretty easy to tune the supplied SQL query to suit your opinion.
Pretty sure you just need to tweak the 2nd line:
ratio_target AS (SELECT 5 AS ratio),
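For context, a minimal sketch of how a ratio_target CTE like that could gate the verdict. This is not the article's exact query; it uses the tuple counters from pg_stat_database:

    -- Sketch: classify the current database as read or write heavy by
    -- comparing tuple counters against an adjustable ratio threshold.
    WITH ratio_target AS (SELECT 5 AS ratio),  -- the line to tweak
    activity AS (
        SELECT tup_returned + tup_fetched               AS tuples_read,
               tup_inserted + tup_updated + tup_deleted AS tuples_written
        FROM pg_stat_database
        WHERE datname = current_database()
    )
    SELECT tuples_read,
           tuples_written,
           CASE WHEN tuples_read > tuples_written * (SELECT ratio FROM ratio_target)
                THEN 'read heavy'
                ELSE 'write heavy'
           END AS verdict
    FROM activity;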
The underlying traffic may be very write heavy, but then you have many, many users who need to see that data. The question is whether the database itself, or a copy of the data from the database, is what serves that interest.
If you mediate all reads through a cache, then you have split the source of truth from the system of record. And then the read traffic on the system of record is a much lower ratio.
https://youtu.be/PvDBGqEykvc?t=7
Also: HN needs to upgrade its bot upvoting detection tech. This is embarrassing. It was proper ownage of the HN #1 position for like 15 minutes straight. And then like #2-3 for an hour or so.
Personally, I agree both that it's possible to detect this better and that it would drastically improve the quality of this site if that weren't the meta, and I think it's something that should be openly discussed (in terms of practical suggestions).
But afaik this site doesn't even track that many metrics (e.g. does it even count clicks on articles?) so I think what I'm proposing is a bit antithetical to the minimalist nature.
But then who is left to look at the recruitment ads if the quality of the content, comments, and community degrades enough that everyone stops coming?
All I know is that pretty much nobody here knows enough about the whole picture to have a meaningfully informed opinion. So a lot of these opinions are going to be driven by their imagination of the whole picture.
https://news.ycombinator.com/newsguidelines.html
Indeed. On your schema. On your usage. On your app. On your users.
I’ve been mostly using the `pg_stat_statements` view (the second technique) to find out whether my workload is read or write heavy; it’s plenty good in most situations.
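For anyone who hasn't tried it, a sketch of that kind of check (an illustration, not the article's query; total_exec_time is the PG 13+ column name, older versions call it total_time):

    -- Sketch: rough read/write split from pg_stat_statements, bucketing
    -- statements by their leading keyword. Requires the extension to be
    -- loaded via shared_preload_libraries.
    SELECT CASE WHEN query ~* '^\s*(insert|update|delete|copy|merge)'
                THEN 'write' ELSE 'read' END AS kind,
           sum(calls) AS calls,
           sum(rows)  AS rows,
           round(sum(total_exec_time)::numeric, 1) AS total_ms
    FROM pg_stat_statements
    GROUP BY 1
    ORDER BY total_ms DESC;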
For a start, the article does not mention any other database. I don't know how you can say something is read or write heavy without comparing it to something else. It doesn't even compare different queries on the same database. Like, they just wrote a query and it does a lot of reads - so what? There's nothing here. Am I going mad? Why does this article exist?
Queries that need to operate on more data than fits in the allocated working memory will write to a temporary file on disk, in some cases perform an operation there like sorting the whole thing, and finally delete it when done, which is even more disk write stress.
It's not really about whether it's read heavy or write heavy; it's about whether its usage creates disk I/O stress.
You can write millions of integer increments, and while technically that's "write heavy", there's no stress involved because you're just changing a value in a defined space that's already been allocated. Update space that is more dynamic, like a TEXT or JSON field that grows frequently, and it's a different story.
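One cheap way to see whether those spills are happening is the temp-file counters in pg_stat_database; a sketch (not from the article):

    -- Sketch: databases where queries have spilled to temporary files on
    -- disk, i.e. work_mem was too small for some sorts or hashes.
    SELECT datname,
           temp_files,
           pg_size_pretty(temp_bytes) AS temp_spilled
    FROM pg_stat_database
    WHERE temp_files > 0
    ORDER BY temp_bytes DESC;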
For a data lake or analytics, prefer columns / read heavy?
What would a row-based DB be better for? OLTP / write heavy?
Things get muddied when HTAP systems get bandied about, promising the best of both worlds.
Reporting on your Ordering System is an OLAP problem though. Generally an OLAP database stores data on disk in a way that it only needs to read the selected columns, and performance is better with wider tables, i.e. lots of duplicated data (JOINs are slow).
So, you select * from Customer, Order, Items, Device, Staff, stick it in your OLAP database, and that's where customers should generate reports. This both makes reporting more performant and removes the problem from the critical path of your POS device syncing and working.
This has the added benefit that updating your product name won't update the historical log of what was done at the time, because what was done at the time was done at the time (but you can still map on something like productId if you think the data is relevant).
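As a sketch of that shape (all table and column names here are made up for illustration, not taken from the thread):

    -- Sketch: a denormalized reporting table for the OLAP side, snapshotting
    -- names and prices at order time so later edits don't rewrite history.
    CREATE TABLE report_order_lines AS
    SELECT o.order_id,
           o.ordered_at,
           c.name       AS customer_name,
           p.product_id,
           p.name       AS product_name,  -- captured as of order time
           i.quantity,
           i.unit_price,
           s.name       AS staff_name,
           d.label      AS device_label
    FROM orders o
    JOIN customers   c ON c.customer_id = o.customer_id
    JOIN order_items i ON i.order_id    = o.order_id
    JOIN products    p ON p.product_id  = i.product_id
    JOIN staff       s ON s.staff_id    = o.staff_id
    JOIN devices     d ON d.device_id   = o.device_id;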
At scale you want to pop the writes on a queue and design those devices to be as async as possible.
This is what happens when you just build it pure OLTP.
https://www.linkedin.com/pulse/nobody-expects-thundering-her...
This was an ~£19m ARR POS company dying because of architecture, now doing £150m+ ARR. (The GTV of the workloads is multiple times that, but I can't remember the figures.)
https://www.oracle.com/database/in-memory/
https://www.oracle.com/database/technologies/exadata/softwar...
That eliminates the complexity of running two databases and keeping them in sync.
Disclosure: work part time for them, own stock
This sounds like the one-big-table approach, which in my experience is very difficult to do right and only makes sense in a data mart context.
Google’s AdSense data model in BigQuery is like this and works well, but it gets so wide it’s difficult. Then again, when you embed things like arrays and structs and can unnest as needed, avoiding joins can be nice.
I’ve found star schemas to work out just fine in data marts. Just do them properly. Join as needed. And a good engine will handle the rest. We’ve had no issues with a similar model in Snowflake, for example. Of course YMMV.
Also, in my experience the "Faster SSD Storage" point applies to both read-heavy and write-heavy workloads.
IMO this submission needs flags not upvotes.
But title aside, I found this post very useful for better understanding PG reads and writes (under the hood) and how to actually measure your workload.
Curious whether the tuning actions are any different if you're using a non-vanilla storage engine like AWS Aurora, GCP AlloyDB, or Neon?
1 more comment available on Hacker News