Pipelining in psql (PostgreSQL 18)
Posted 3 months ago · Active 2 months ago
postgresql.verite.pro · Tech · story
Sentiment: excited, positive · Debate: 60/100
Key topics: PostgreSQL, Database Performance, Pipelining
The article discusses the new pipelining feature in PostgreSQL 18, which improves database performance by reducing the number of network round trips; the discussion highlights its potential benefits and limitations.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 2h after posting
- Peak period: 35 comments (Day 1)
- Avg / period: 10
- Comment distribution: 40 data points (based on 40 loaded comments)
Key moments
- Story posted: Oct 12, 2025 at 12:46 AM EDT (3 months ago)
- First comment: Oct 12, 2025 at 3:13 AM EDT (2h after posting)
- Peak activity: 35 comments in Day 1, the hottest window of the conversation
- Latest activity: Oct 24, 2025 at 4:49 AM EDT (2 months ago)
ID: 45555308 · Type: story · Last synced: 11/20/2025, 2:24:16 PM
In many cases it would be good to forego interactive transactions and instead execute all read-only queries at once, then send another write batch after doing processing on the obtained data. That way, the number of round trips is bounded. There are some complications of course, like dealing with concurrency becoming more complicated. I'm currently prototyping a library exploring these ideas.
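A minimal sketch of that read-batch / process / write-batch pattern, using postgres.js (mentioned further down the thread) purely as a stand-in driver; the settleOrders helper, table names, and columns are invented for illustration, and round trips end up bounded by the number of phases rather than the number of queries:

```ts
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

async function settleOrders(orderIds: number[]) {
  // Phase 1: all read-only queries are issued together.
  const [orders, rates] = await Promise.all([
    sql`SELECT id, amount, currency FROM orders WHERE id = ANY(${orderIds})`,
    sql`SELECT currency, rate FROM fx_rates`,
  ]);

  // Phase 2: pure in-memory processing; no round trips, no locks held.
  const rateByCurrency = new Map<string, number>();
  for (const r of rates) rateByCurrency.set(r.currency, Number(r.rate));

  const updates = orders.map((o) => ({
    id: o.id as number,
    amountUsd: Number(o.amount) * (rateByCurrency.get(o.currency) ?? 1),
  }));

  // Phase 3: one write batch; returning an array of queries from the
  // transaction callback lets the driver send them without waiting on
  // each other's results.
  await sql.begin((sql) =>
    updates.map(
      (u) => sql`UPDATE orders SET amount_usd = ${u.amountUsd} WHERE id = ${u.id}`
    )
  );
}
```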
Everything else both feels weird and truly is awkward to express, because our programming languages don't really allow us to express it well. And usually, by the time we figure out that we need a more reified, batch-oriented mechanism (the one on page 2), it is too late: the procedural assumptions have been deeply baked into the code we've written so far.
See Can programmers escape the gentle tyranny of call/return? by yours truly.
https://www.hpi.uni-potsdam.de/hirschfeld/publications/media...
See also: https://news.ycombinator.com/item?id=45367519
If that weren't yikes enough, SQL injection bugs used to be the #1 exploited security vulnerability. It's gotten a little better, partly because of greater use of ORMs.
ORMs?
https://blog.codinghorror.com/object-relational-mapping-is-t...
No, just use prepared statements.
You also can't use parameterized values like $1, $2.
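For anyone following the "just use prepared statements" suggestion above, this is roughly what a parameterized query looks like with node-pg; the users table and the findUserByEmail helper are made up for illustration:

```ts
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// The SQL text and the values travel separately ($1 is a placeholder,
// not string concatenation), so user input can't change the statement.
async function findUserByEmail(email: string) {
  const { rows } = await pool.query(
    "SELECT id, name FROM users WHERE email = $1",
    [email]
  );
  return rows[0];
}
```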
It seems more niche than you're suggesting. Though I wish people would write app layer pseudocode to demonstrate what they are referring to.
So unless you are a lean startup, the reasons many products are horribly slow are very low-hanging fruit that nobody is ever going to bother picking.
If you ever reach the point where pipelining is giving you a perf boost, your app was already in a nice state.
It's so nice, on my personal projects, to be able to code on a bare-metal server where my monolith has direct access to my Postgres instance.
Since that discussion on the forum, I talked more about pipelining with some other core devs, and that may happen in some form or another in the future.
The main limiting factor is that most of the big Rails contributors work with MySQL, not Postgres, and MySQL doesn't really have proper pipelining support.
> The network is better utilized because successive queries can be grouped in the same network packets, resulting in less packets overall.
> the network packets are like 50 seater buses that ride with only one passenger.
The performance improvement is not likely to come from sending larger packets, since most queries transfer very little data, and the benchmark the conclusion is drawn from is definitely transferring close to zero data. The speedup comes from not waiting on a round-trip acknowledgement of one batch before executing subsequent queries; the number of network packets is irrelevant.
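A quick back-of-envelope illustration of that point, with made-up but plausible numbers:

```ts
// Hypothetical numbers: 10,000 small queries, 1 ms network round trip,
// 0.05 ms of server-side execution per query.
const queries = 10_000;
const rttMs = 1;
const execMs = 0.05;

// Serial: wait for each reply before sending the next query.
const serialMs = queries * (rttMs + execMs); // 10,500 ms

// Pipelined: queries stream continuously, so the round-trip latency is
// paid roughly once and the total is dominated by execution time.
const pipelinedMs = rttMs + queries * execMs; // 501 ms

console.log({ serialMs, pipelinedMs });
```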
Suppose client A runs “select * from foo”, which has a thousand records. It can start streaming those results starting with row 1. Now suppose it’s on row 500 when client B runs the same query. Instead of starting over for B, it can start streaming results to B starting at row 501. Each time it reads a row, now it sends that to both clients.
Now when it finishes with row 1000, client A’s query is done. It starts back over with B on row 1 and continues through row 500.
Hypothetically, you can serve N clients with a total of 2 table scans if they all arrive before the first client’s scan is finished.
So that’s the kind of magic where I think this is going to shine. Queue up a few queries and it’s likely that several will be able to share the same underlying work.
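A toy model of the scan sharing described above (PostgreSQL's synchronized sequential scans, heavily simplified); the Row type and the sharedScan generator are inventions for illustration, not the server's implementation:

```ts
type Row = { id: number };

// A late-joining scan starts wherever the table is currently being read
// and wraps around, so concurrent clients ride the same pass over the data.
function* sharedScan(table: Row[], startIndex: number): Generator<Row> {
  const n = table.length;
  for (let i = 0; i < n; i++) {
    yield table[(startIndex + i) % n];
  }
}

const table: Row[] = Array.from({ length: 1000 }, (_, i) => ({ id: i + 1 }));
const clientA = sharedScan(table, 0);   // starts at row 1
const clientB = sharedScan(table, 500); // joins mid-scan at row 501, wraps to finish rows 1-500
console.log(clientA.next().value, clientB.next().value); // { id: 1 } { id: 501 }
```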
It's also not true pipelining, since you can't send a follow-up request that depends on the results of the previous, incomplete request (e.g. look at capnproto promise pipelining). As such the benefit in practice is actually more limited, especially if instead you use connection pooling and send the requests over different connections in the first place - I'd expect very similar performance numbers for the benchmark assuming you have enough connections open in parallel to keep the DB busy.
Maybe referring to synchronize_seqscans?
https://www.postgresql.org/docs/current/runtime-config-compa...
"The network is better utilized because successive queries can be grouped in the same network packets, resulting in less packets overall"
For some reason, you don't believe it. OK, let's look at these Wireshark network statistics when the test script runs inserting 100k rows, capturing the traffic on the Postgres port.
- case without pipelining (result of "tshark -r capture-file -q -z io,stat,0"):
- case with pipelining:
So compared to needing 2 packets per query in the non-pipelining case, the pipelining case needs about 10 times fewer. Again, it's because the client buffers the queries to send in the same batch (this buffer currently seems to be 64 kB), in addition to not waiting for the results of previous queries.
It has a separate client-to-server packet that forces previous ones to complete, since it makes the otherwise-asynchronous (because of pipelining) error reporting forcefully serial.
Other than that (which is arguably not needed for queries that don't expect errors often enough to warrant early/eager exception throwing during the course of a transaction), it's naturally pipelined: you can just fire two or more statements' worth of parameter binding and result fetching back-to-back without blocking on anything.
https://joist-orm.io/blog/initial-pipelining-benchmark/
If you're not in a transaction, afaiu pipelining is not as applicable/useful b/c any SQL statement failing in the pipeline fails all other queries after it, and imo it would suck for separate/unrelated web requests that "share a pipeline" to have one request fail the others -- but for a single txn/single request, these semantics are what you expect anyway.
Unfortunately in the TypeScript ecosystem, the node-pg package/driver doesn't support pipelining yet, instead this "didn't quite hit mainstream adoption and now the author is AWOL" driver does: https://github.com/porsager/postgres
I've got a branch to convert our TypeScript ORM to postgres.js solely for this "send all our INSERTs/UPDATEs/DELETEs in parallel" perf benefit, and have some great stats so far:
https://github.com/joist-orm/joist-orm/pull/1373#issuecommen...
But it's not "must have" for us atm, so haven't gotten time to rebase/ship/etc...hoping to rebase & land the PR by eoy...
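For context, the postgres.js pattern being referred to looks roughly like this (table names invented); if I'm reading its docs correctly, returning an array of queries from the transaction callback lets the driver pipeline them instead of awaiting each round trip:

```ts
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

// All three statements are sent without waiting on each other's results;
// the transaction still commits or rolls back as a unit.
await sql.begin((sql) => [
  sql`INSERT INTO authors (name) VALUES (${"Neal Stephenson"})`,
  sql`INSERT INTO books (title) VALUES (${"Anathem"})`,
  sql`UPDATE books SET title = ${"Anathem (2nd ed.)"} WHERE id = ${1}`,
]);
```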
My assumption was just from, afaict, the general lack of triage on GitHub issues, i.e. for a few needs we have like tracing/APM, and then also admittedly esoteric topics like this stack trace fixing:
https://github.com/porsager/postgres/issues/963#issuecomment...
Fwiw I definitely sympathize with issue triage being time-consuming and sometimes a pita, i.e. where a nontrivial number (or the majority) of issues are from well-meaning but maybe naive users asking for free support or filing incorrect/distracting issues.
I don't have an answer, but just saying that's where my impression came from.
Thanks for replying!
Jeez.
That said, I hope node-postgres can support this soon. As it stands, every single query you add to a transaction adds a serial network round trip, which is devastating not just for execution time but for how long you're holding any locks inside the transaction.
Also interesting about a potential v4! I'll keep lurking on the github project and hope to see what it brings!
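For comparison, here is what the status quo described above looks like with node-pg (the accounts table and transfer helper are invented): each awaited query is its own round trip, and any locks taken early in the transaction are held until the COMMIT finally goes out.

```ts
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function transfer(from: number, to: number, amount: number) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN"); // round trip 1
    await client.query(
      "UPDATE accounts SET balance = balance - $1 WHERE id = $2", // round trip 2, takes a row lock
      [amount, from]
    );
    await client.query(
      "UPDATE accounts SET balance = balance + $1 WHERE id = $2", // round trip 3
      [amount, to]
    );
    await client.query("COMMIT"); // round trip 4; locks held until here
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```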
How do I handle, say, 100K concurrent transactions in an OLTP database? Here is what I've learned that makes this difficult:
- a transaction has a one-to-one mapping with a connection
- a connection can only process one transaction at a time, so pooling isn't going to help
- database connections are "expensive"
- a single client host can open at most ~65k connections to a given server address, as otherwise it runs out of ephemeral ports
100k connections isn't that crazy; say you have 100k concurrent users and each one needs a transaction to manage its independent state. Transactions are useful as they enforce consistency.