Kafka Is Fast – I'll Use Postgres
Posted 2 months ago · Active about 2 months ago
topicpartition.io · Tech · story · High profile
heated · mixed
Debate: 85/100
Key topics
Postgres
Kafka
Database
Message Queue
Tech Stack
The article discusses using Postgres as a message queue instead of Kafka, sparking a debate on the appropriateness of using a database for messaging and the trade-offs between different technologies.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 51m
Peak period: 151 comments (Day 1)
Avg / period: 32
Comment distribution: 160 data points
Based on 160 loaded comments
Key moments
1. Story posted: Oct 29, 2025 at 10:06 AM EDT (2 months ago)
2. First comment: Oct 29, 2025 at 10:57 AM EDT (51m after posting)
3. Peak activity: 151 comments in Day 1 (hottest window of the conversation)
4. Latest activity: Nov 8, 2025 at 6:33 AM EST (about 2 months ago)
ID: 45747018 · Type: story · Last synced: 11/22/2025, 11:00:32 PM
Kafka, GraphQL... These are the two technologies where my first question is always this: does the person who championed/led this project still work here?
The answer is almost always "no, they got a new job after we launched".
Resume Architecture is a real thing. Meanwhile the people left behind have to deal with a monster...
They’ll be fine if you made something that works, even if it was a bit faddish. Make sure you take care of yourself along the way (they won’t).
My point stands: company loyalty tallies up to very little when you’re looking for your next job; no interviewer will care much to hear how you stood firm and ignored the siren song of tech and practices more modern than the ones you were handed (the tech and practices they’re hiring for).
The moment that reverses, I will start advising people not to skill up, as it will look bad in their resumes.
Is there some reason GraphQL gets so much hate? It always feels to me like it's mostly just a normal RPC system, but with some incredibly useful features (pipelining, and it's super easy not to request data you don't need), with obvious perf issues in code and obvious room for perf abuse because it's easy to let callers do N+1 nonsense.
So I can see why it's not popular to get stuck with for public APIs unless you have infinite money, since it's relatively wide open for abuse, but for private APIs it seems pretty useful because you can just smack the people abusing it. Or is it more due to specific frameworks being frustrating, or stuff like costly parsing and serialization and difficult validation?
Which I fully understand is more work than the "it's super easy, just X" it gets presented as, but that's always the cost of super flexible things. Does GraphQL (or the ecosystem, since that's part of the daily life of using it) make that substantially worse somehow? Because I've dealt with people using protobuf to avoid GraphQL, then trying to reimplement parts of its features, and the resulting API is always an utter abomination.
And yes, you don't want to use it for public APIs. But if you have private APIs that are so complex that you need a query language, and you still want to use those over web services, you are very likely doing something really wrong.
"check that the user matches the data they're requesting by comparing the context and request field by hand" is ultra common - there are some real benefits to having authorization baked into the language, but it seems very rare in practice (which is part of why it's often flawed, but following the overwhelming standard is hardly graphql's mistake imo). I'd personally think capabilities are a better model for this, but that seems likely pretty easy to chain along via headers?
The problem is that GraphQL doesn't behave like all other general purpose RPC systems. As a rule, authorization does not work on the same abstraction level as GraphQL.
And that explanation you quoted is disingenuous, because GraphQL middleware and libraries don't usually expose places where you can do anything by hand.
Instead of having just one thing we don't know before launch... let's pick as many new-to-us things as possible; that will increase the chances of success.
Personally I’d expect some kind of internal interface to abstract away such an external dependency and to build reusable components around it, which readily enables having relational data stores mirroring the broker's functionality. Handy for testing and some specific local scenarios, and those database-backed stores can easily pull from the main cluster(s) later to mirror data as needed.
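A rough sketch of the kind of internal interface that comment suggests, assuming a small publish/consume abstraction; all names here are hypothetical, and a Kafka-backed class could implement the same protocol.

```python
from typing import Iterable, Protocol


class EventBus(Protocol):
    def publish(self, topic: str, payload: bytes) -> None: ...
    def consume(self, topic: str, group: str, limit: int = 100) -> Iterable[bytes]: ...


class PostgresEventBus:
    """Backs the abstraction with an event-log table; handy for tests and small local setups."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn

    def publish(self, topic: str, payload: bytes) -> None:
        ...  # e.g. INSERT INTO events (topic, payload) VALUES (%s, %s)

    def consume(self, topic: str, group: str, limit: int = 100) -> Iterable[bytes]:
        ...  # e.g. SELECT payload FROM events WHERE topic = %s AND id > <group offset> ORDER BY id LIMIT %s
        return []
```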
Postgres really is a startup's best friend most of the time. I'm building a new product that's going to deal with a good bit of reporting, so I began to look at OLAP DBs, but hesitated to leave PG. This kind of seals it for me (and of course the reference to the classic "Just Use Postgres for Everything" post helps) that I should Just Use Postgres (R).
On top of being easy to host and already being familiar with it, the resources out there for something like PG are near endless. Plus the team working on it is doing constant good work to make it even more impressive.
I've been heads-down building a scheduling tool, and the number of times I've had to talk myself out of over-engineering is embarrassing. "Should I use Kafka for event streaming?" No. "Do I need microservices?" Probably not. "Can Postgres handle this?" Almost certainly yes.
The real skill is knowing when you've actually outgrown something vs. when you're just pattern-matching what Big Tech does. Most products never get to the scale where these distinctions matter—but they DO die from complexity-induced paralysis.
What's been your experience with that inflection point where you actually needed to graduate to more complex tooling? How did you know it was time?
I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.
Wouldn't OrioleDB solve that issue though?
If you rent a cloud DB then it can scale elastically which can make this cheaper than Postgres, believe it or not. Cloud databases are sold at the price the market will bear not the cost of inputs+margin, so you can end up paying for Postgres as much as you would for an Oracle DB whilst getting far fewer features and less scalability.
Source: recently joined the DB team at Oracle, was surprised to learn how much it can do.
It often doesn't.
[1] https://rubyonrails.org/2024/11/7/rails-8-no-paas-required
And the thing is, a server from 10 years ago running postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale on the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)
When you’re only doing hundreds or thousands of transactions to begin with, it doesn’t really have much impact out of the gate.
Of course there will be someone who will pull out something that won’t work but such examples can likely be found for anything.
We don’t need to fear simplification, it is easy to complicate later when the actual complexities reveal themselves.
Of course the other 99% is the remaining 1%.
But anytime you treat a database, or a queue, like a black box dumpster, problems will ensue.
Also, LISTEN/NOTIFY do not scale, and they introduce locks in areas you aren't expecting - https://news.ycombinator.com/item?id=44490510
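For context, a minimal psycopg2 sketch of the LISTEN/NOTIFY pattern that comment is criticizing (table, channel, and DSN are hypothetical). The commonly cited scaling issue, which the linked thread discusses, is that notifying transactions serialize on a shared queue lock at commit time, which is where the unexpected locking shows up under write-heavy load.

```python
import select

import psycopg2
import psycopg2.extensions


def enqueue(payload: str) -> None:
    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        cur.execute("INSERT INTO jobs (payload) VALUES (%s)", (payload,))
        # The notification is delivered when this transaction commits.
        cur.execute("NOTIFY jobs_channel")


def wait_for_jobs() -> None:
    conn = psycopg2.connect("dbname=app")
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        cur.execute("LISTEN jobs_channel")
    while True:
        # Block up to 5 seconds waiting for the socket to become readable.
        if select.select([conn], [], [], 5) != ([], [], []):
            conn.poll()
            while conn.notifies:
                note = conn.notifies.pop(0)
                print("woken up by", note.channel)
```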
Postgres isn’t meant to be a guaranteed permanent replacement.
It’s a common starting point for a simpler stack which can retain a greater deal of flexibility out of the box and increased velocity.
Starting with Postgres lets the bottlenecks reveal themselves, and then optimize from there.
Maybe a tweak to Postgres or resources, or consider a jump to Kafka.
> ...
> The other camp chases common sense
I don't really like these simplifications. Like one group obviously isn't just dumb, they're doing things for reasons you maybe don't understand. I don't know enough about data science to make a call, but I'm guessing there were reasons to use Kafka due to current hardware limits or scalability concerns, and while the issues may not be as present today that doesn't mean they used Kafka just because they heard a new word and wanted to repeat it.
https://github.com/mongomock/mongomock Extrapolating from my personal usage of this library to others, I'm starting to think that mongodb's 25 billion dollar valuation is partially based on this open source package :)
Otherwise, SQLite :)
I prefer not to start with a nosql database and then undertake odysseys to make it into a relational database.
In my company most of our topics need to be consumed by more than one application/team, so this feature is a must have. Also, the ability to move the offset backwards or forwards programmatically has been a life saver many times.
Does Postgres support this functionality for their queues?
So if you want an individual offset, then yes, the consumer could just maintain their own… however, if you want a group’s offset, you have to do something else.
Is a queuing system baked into Postgres? Or are there client libraries that make it look like one?
And do these abstractions allow for arbitrarily moving the offset for each consumer independently?
If you're writing your own queuing system using pg for persistence obviously you can architect it however you want.
I don't know what kind of native support PG has for queue management, the assumption here is that a basic "kill the task as you see it" is usually good enough and the simplicity of writing and running a script far outweighs the development, infrastructure and devops costs of Kafka.
But obviously, whether you need stuff to happen in 15 seconds instead of 5 minutes, or 5 minutes instead of an hour is a business decision, along with understanding the growth pattern of the workload you happen to have.
Here is one: https://pgmq.github.io/pgmq/
Some others: https://github.com/dhamaniasad/awesome-postgres
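As a sketch of what the pgmq route linked above looks like in practice (queue name and DSN are hypothetical, and the SQL function signatures should be checked against the pgmq version you install):

```python
import json

import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

with conn, conn.cursor() as cur:
    cur.execute("SELECT pgmq.create(%s)", ("orders",))
    cur.execute("SELECT pgmq.send(%s, %s::jsonb)", ("orders", json.dumps({"order_id": 42})))

    # Read up to 10 messages, making them invisible to other readers for 30 seconds.
    cur.execute("SELECT msg_id, message FROM pgmq.read(%s, 30, 10)", ("orders",))
    for msg_id, message in cur.fetchall():
        print("processing", msg_id, message)
        cur.execute("SELECT pgmq.delete(%s, %s)", ("orders", msg_id))  # ack by deleting
```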
Most of my professional life I have considered Postgres folks to be pretty smart… while I, by chance, happened to go with MySQL, and it became the RDBMS I think in by default.
Heavily learning about Postgres recently has been okay, not much different than learning the tweaks for MSSQL, Oracle, or others. You just have to be willing to slow down a little for a bit and enjoy it instead of expecting to rush through everything.
But it looks like a queue, which is a fundamentally different data structure from an event log, and Kafka is an event log.
They are very different use cases: work distribution vs pub/sub.
The article talks about both use cases, assuming the reader is very familiar with the distinction.
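To make the distinction concrete, here is a rough Postgres-flavored sketch of the two patterns, with hypothetical table names: a work queue where each job is claimed once (the FOR UPDATE SKIP LOCKED style mentioned later in the thread), and an event log where each consumer group keeps its own offset and nothing is deleted.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN


def claim_one_job():
    """Work distribution: claim a single pending job, skipping rows other workers hold."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, payload FROM jobs
            WHERE done = false
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
            """
        )
        row = cur.fetchone()
        if row:
            cur.execute("UPDATE jobs SET done = true WHERE id = %s", (row[0],))
        return row


def read_events(group: str, batch: int = 100):
    """Pub/sub: every consumer group reads the whole log at its own pace."""
    with conn, conn.cursor() as cur:
        cur.execute("SELECT last_id FROM consumer_offsets WHERE group_name = %s", (group,))
        last_id = (cur.fetchone() or (0,))[0]
        # Note: reading by a plain sequence id has the visibility caveat discussed further down.
        cur.execute(
            "SELECT id, payload FROM events WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, batch),
        )
        rows = cur.fetchall()
        if rows:
            cur.execute(
                """
                INSERT INTO consumer_offsets (group_name, last_id) VALUES (%s, %s)
                ON CONFLICT (group_name) DO UPDATE SET last_id = EXCLUDED.last_id
                """,
                (group, rows[-1][0]),
            )
        return rows
```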
Unless you're a five man shop where everybody just agrees to use that one table, make sure to manage transactions right, cron job retention, YOLO clustering, etc. etc.
Performance is probably last on the list of reasons to choose Kafka over Postgres.
There are several implementations of queues, increasing the chance of finding what one is after. https://github.com/dhamaniasad/awesome-postgres
I truly miss a good standard client-side library following the Kafka-in-SQL philosophy. I started one in my previous job and we used it internally, but it never got good enough that it would be widely used elsewhere, and now I work somewhere else...
(PS: Talking about the pub/sub Kafka-like usecase, not the work queue FOR UPDATE usecase)
A naive approach with a sequence (or the serial type, which uses a sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits; now the table contains rows "122" and "124", and readers can start to process it. Then transaction "one" commits with its "123" number, but readers are already past "124". And transaction "one" might never commit for various reasons (e.g. the client just had its power cut), so simply waiting for "123" forever does not cut it either.
Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).
So it is a much more serious issue at stake here than event ordering/consistency.
As it happens, if you use event log tables in SQL "the Kafka way" you actually get guarantee on event ordering too as a side effect, but that is not the primary goal.
More detailed description of problem:
https://github.com/vippsas/mssql-changefeed/blob/main/MOTIVA...
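A small sketch of the interleaving described above (table and column names hypothetical), showing how a reader that polls by sequence id can permanently skip a row that commits late:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
last_seen = 122  # reader has processed everything up to id 122


def poll():
    global last_seen
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, payload FROM events WHERE id > %s ORDER BY id",
            (last_seen,),
        )
        for event_id, payload in cur.fetchall():
            print("processing", event_id)
            last_seen = event_id


# Timeline that loses an event:
#   t1: writer A begins, INSERT ... -> id 123 (not yet committed)
#   t2: writer B begins, INSERT ... -> id 124, COMMIT
#   t3: reader polls: sees only 124, advances last_seen to 124
#   t4: writer A commits id 123
#   t5: reader polls WHERE id > 124 -> id 123 is never processed
```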
Another approach which I used in the past was to assign sequence numbers after committing. Basically a separate process periodically scans the set of un-sequenced rows, applies any application-defined ordering constraints, and writes SNs into them. This can be surprisingly fast, like tens of thousands of rows per second. In my case, the ordering constraints were simple, basically that for a given key, increasing versions get increasing SNs. But I think you could have more complex constraints, although it might get tricky with batch boundaries.
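A rough sketch of that post-commit sequencer, under the stated assumptions: a single sequencer process, an events table with a nullable sn column, and a per-key/version ordering constraint. All table and column names are hypothetical.

```python
import time

import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN


def assign_sequence_numbers(batch: int = 10_000) -> int:
    with conn, conn.cursor() as cur:
        # Single sequencer process, so a plain counter row is enough.
        cur.execute("SELECT next_sn FROM sequencer_state FOR UPDATE")
        next_sn = cur.fetchone()[0]

        # Application-defined constraint: per key, higher versions get higher SNs.
        cur.execute(
            "SELECT id FROM events WHERE sn IS NULL ORDER BY key, version LIMIT %s",
            (batch,),
        )
        rows = cur.fetchall()
        for (row_id,) in rows:
            cur.execute("UPDATE events SET sn = %s WHERE id = %s", (next_sn, row_id))
            next_sn += 1

        cur.execute("UPDATE sequencer_state SET next_sn = %s", (next_sn,))
        return len(rows)


while True:
    if assign_sequence_numbers() == 0:
        time.sleep(0.5)  # nothing new to stamp; back off briefly
```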
There needs to be serialization happening somewhere, either by writers or readers waiting for their turn.
What Kafka "is" in my view is simply the component that assigns sequential event numbers. So if you publish to Kafka, Kafka takes the same locks...
The way to increase throughput is to add more shards in a topic.
In a sense this is what Kafka IS architecturally: The component that assigns event sequence numbers.
Isn't it a bit of a white whale thing that a umion can solve all one's subscriber problems? Afaik even with kafka this isn't completely watertight.
If you would rather have readers waiting and parallel writers there is a more complex scheme here: https://blog.sequinstream.com/postgres-sequences-can-commit-...
https://www.oreilly.com/library/view/designing-data-intensiv...
You can generate distributed monotonic number sequences with a Lamport Clock.
https://en.wikipedia.org/wiki/Lamport_timestamp
The wikipedia entry doesn't describe it as well as that book does.
It's not the end of the puzzle for distributed systems, but it gets you a long way there.
See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock
Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":
https://ia904606.us.archive.org/32/items/distributed-systems...
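For reference, a minimal Lamport clock sketch: each process bumps a local counter on every event or send, and on receive takes max(local, received) + 1, which yields timestamps consistent with the happened-before order (not a total wall-clock order).

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge the timestamp carried on an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
t_send = a.tick()           # A does something and stamps an outgoing message
t_recv = b.receive(t_send)  # B's clock jumps past A's, so the receive orders after the send
assert t_recv > t_send
```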
Another way to speed it up is to grab unique numbers in batches instead of just getting them one at a time. No idea why you want your numbers to be in absolute sequence. That's hard in a distributed system. Probably best to relax that constraint and find some other way to track individual pieces of data. Or even better, find a way so you don't have to track individual rows in a distributed system.
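A sketch of the batching idea, assuming a Postgres sequence named event_id_seq: one round trip reserves a block of ids that the application then hands out locally.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN


def reserve_ids(n: int = 100) -> list[int]:
    with conn, conn.cursor() as cur:
        # Each nextval() call consumes one value; generate_series makes n calls in one query.
        cur.execute("SELECT nextval('event_id_seq') FROM generate_series(1, %s)", (n,))
        return [row[0] for row in cur.fetchall()]


block = reserve_ids(100)  # unique across processes, but with gaps and no strict global order
```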
There are two poles.
1. Folks constantly adopting the new tech, whatever the motivation, and 2. "I learned a thing and shall never learn anything else, ever."
Of course nobody actually sits at either pole, but the closer you are to either, the less pragmatic you are likely to be.
I think it's still just two poles. However, I probably shouldn't have ascribed a motivation to the latter pole, as I purposely did not with the former.
Pole 2 is simply never adopt anything new ever, for whatever the motivation.
If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.
Obviously it's possible to build, for example, a machine with 2 cores, a 10Gbps network link, and a single HDD that would falsify my statement.
Even if you somehow get everyone to follow best-practices, you most likely still won't get to saturate the network on "minimal hardware". The number of client connections and requests per second will likely saturate your "minimal CPU".
It's true that minimal hardware on Kafka can saturate the network, but this mostly happens in low-digit client scenarios. In practice, orgs pushing serious data have serious client counts.
This is literally the point the author is making.
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."
It's not so hard. You interpret it how it is written. Yes, they say one camp chases buzzwords and another chases common sense. Critique that if you want to. That's fine.
But what's not written in the OP is some sort of claim that Postgres performs better than Kafka. The opposite is written. The OP acknowledges that Kafka is fast. Right there in the title! What's written is OP's experiments and data that shows Postgres is slow but can be practical for people who don't need Kafka. Honestly I don't see anything bewildering about it. But if you think they're wrong about Postgres being slow but practical that's something nice to talk about. What's not nice is to post snarky comments insinuating that the OP is asking you to design unscalable solutions.
In this case, you do just need a single fuel truck. That's what it was built for. Avoiding a purpose-built tool to achieve the same result actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.
[Disclosure: I work for Redpanda]
Kafka et al definitely have their place, but I think most people would be much better off reaching for a simpler queue system (or for some things, just using Postgres) unless you really need the advanced features.
* Lack of interest for other team members, which translated to doing what they thought was a sufficiently minimal amount of knowledge transfer
* An (unwise) attitude that "it's already set up and configured, and terraformed, so we can just acquire that knowledge if and when it's needed"
* Kafka guy left a lot faster than anybody really expected, not leaving much time and practically no documentation
* The rest of the team was already overwhelmed with other responsibilities and didn't have much bandwidth available
* Nobody wanted to be the person/people that ended up "owning" it, so there was a reverse incentive
Postgres is the solution in question of the article because I simply assume the majority of companies will start with Postgres as their first piece of infra. And it is often the case. If not - MySQL, SQLite, whatever. Just optimize for the thing you know, and see if it can handle your use case (often you'll be surprised)
You should be able to install within minutes.
None of this applies to Redpanda.
Yet to also be fair to the Kafka folks, Zookeeper is no longer default and hasn't been since April 2025 with the release of Apache Kafka 4.0:
"Kafka 4.0's completed transition to KRaft eliminates ZooKeeper (KIP-500), making clusters easier to operate at any scale."
Source: https://developer.confluent.io/newsletter/introducing-apache...
> This is literally the point the author is making.
Exactly! I just don't understand why HN invariably bubbles up the most dismissive comments to the top, ones that don't even engage with the actual subject matter of the article!
https://www.youtube.com/watch?v=7CdM1WcuoLc
Getting even less than that throughput on 3x c7i.24xlarge (a total of 288 vCPUs) is bafflingly wasteful.
Just because you can do something with Postgres doesn't mean you should.
> 1. One camp chases buzzwords.
> 2. The other camp chases common sense
In this case, is "Postgres" just being used as a buzzword?
[Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]
Kafka is a full on steaming solution.
Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.
Freudian slip? ;)
"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."
However, for all that said, Redpanda is still blazingly fast.
https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...
To be fair, since without fsync you don't have any ordering guarantees for your writes, a crash has a good chance of corrupting your data, not just losing recent writes.
That's why in PostgreSQL it's feasible to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G... but not to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G....
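For illustration, a sketch of that distinction in practice (table name hypothetical): fsync stays on server-wide to protect on-disk consistency, while synchronous_commit can be relaxed for an individual transaction whose loss you can tolerate on a crash.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Only this transaction's commit skips waiting for the WAL flush;
    # the server still fsyncs data and WAL files through its normal path.
    cur.execute("SET LOCAL synchronous_commit = off")
    cur.execute("INSERT INTO metrics (name, value) VALUES (%s, %s)", ("queue_depth", 42))
```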
Can you lose one Postgres instance?
I'm not saying they're useless, but if I see something like that lying around, it's more likely that someone put it there based on vibes rather than an actual engineering need. Postgres is good enough for OpenAI, chances are it's good enough for you.
It's built on pgmq and not married to supabase (nearly everything is in the database).
Postgres is enough.
241 more comments available on Hacker News