Kafka Is Fast – I'll Use Postgres
Posted 2 months ago · Active about 2 months ago
topicpartition.io · Tech · story · High profile
heated · mixed
Debate: 85/100
Key topics
Postgres
Kafka
Database
Message Queue
Tech Stack
The article discusses using Postgres as a message queue instead of Kafka, sparking a debate on the appropriateness of using a database for messaging and the trade-offs between different technologies.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 51m
Peak period: 151 comments (Day 1)
Avg / period: 32
Comment distribution: 160 data points
Based on 160 loaded comments
Key moments
1. Story posted: Oct 29, 2025 at 10:06 AM EDT (2 months ago)
2. First comment: Oct 29, 2025 at 10:57 AM EDT (51m after posting)
3. Peak activity: 151 comments in Day 1 (hottest window of the conversation)
4. Latest activity: Nov 8, 2025 at 6:33 AM EST (about 2 months ago)
ID: 45747018 · Type: story · Last synced: 11/22/2025, 11:00:32 PM
Kafka, GraphQL... These are the two technologies where my first question is always this: does the person who championed/led this project still work here?
The answer is almost always "no, they got a new job after we launched".
Resume Architecture is a real thing. Meanwhile the people left behind have to deal with a monster...
They’ll be fine if you made something that works, even if it was a bit faddish. Make sure you take care of yourself along the way (they won’t).
My point stands: company loyalty tallies up to very little when you’re looking for your next job; no interviewer will care much to hear how you stood firm and ignored the siren song of tech and practices more modern than the ones you were handed (the tech and practices they’re hiring for).
The moment that reverses, I will start advising people not to skill up, as it will look bad in their resumes.
Is there some reason GraphQL gets so much hate? It always feels to me like it's mostly just a normal RPC system, but with some incredibly useful features (pipelining, and it's super easy not to request data you don't need), with obvious perf issues in code and obvious room for perf abuse because it's easy to let callers do N+1 nonsense.
So I can see why it's not popular to get stuck with for public APIs unless you have infinite money, since it's relatively wide open for abuse, but for private APIs it seems pretty useful because you can just smack the people abusing it. Or is it more due to specific frameworks being frustrating, or stuff like costly parsing and serialization and difficult validation?
Which I fully understand is more work than the "it's super easy, just X" it gets presented as, but that's always the cost of super flexible things. Does GraphQL (or the ecosystem, since that's part of the daily life of using it) make that substantially worse somehow? Because I've dealt with people using protobuf to avoid GraphQL, then trying to reimplement parts of its features, and the resulting API is always an utter abomination.
And yes, you don't want to use it for public APIs. But if you have private APIs that are so complex that you need a query language, and you still want to use those over web services, you are very likely doing something really wrong.
"check that the user matches the data they're requesting by comparing the context and request field by hand" is ultra common - there are some real benefits to having authorization baked into the language, but it seems very rare in practice (which is part of why it's often flawed, but following the overwhelming standard is hardly graphql's mistake imo). I'd personally think capabilities are a better model for this, but that seems likely pretty easy to chain along via headers?
The problem is that GraphQL doesn't behave like all other general purpose RPC systems. As a rule, authorization does not work on the same abstraction level as GraphQL.
And that explanation you quoted is disingenuous, because GraphQL middleware and libraries don't usually expose places where you can do anything by hand.
Instead of having just one thing we don't know before launch... let's pick as many new-to-us things as possible; that will increase the chances of success.
Personally I’d expect some kind of internal interface to abstract away such an external dependency and to build reusable components around it, which readily enables having relational data stores mirroring the broker's functionality. Handy for testing and some specific local scenarios, and those database-backed stores can easily pull from the main cluster(s) later to mirror data as needed.
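A rough sketch of the kind of internal interface that comment suggests, assuming a small publish/consume abstraction; all names here are hypothetical, and a Kafka-backed class could implement the same protocol.

```python
from typing import Iterable, Protocol


class EventBus(Protocol):
    def publish(self, topic: str, payload: bytes) -> None: ...
    def consume(self, topic: str, group: str, limit: int = 100) -> Iterable[bytes]: ...


class PostgresEventBus:
    """Backs the abstraction with an event-log table; handy for tests and small local setups."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn

    def publish(self, topic: str, payload: bytes) -> None:
        ...  # e.g. INSERT INTO events (topic, payload) VALUES (%s, %s)

    def consume(self, topic: str, group: str, limit: int = 100) -> Iterable[bytes]:
        ...  # e.g. SELECT payload FROM events WHERE topic = %s AND id > <group offset> ORDER BY id LIMIT %s
        return []
```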
Postgres really is a startup's best friend most of the time. I'm building a new product that's going to deal with a good bit of reporting, so I began to look at OLAP DBs, but hesitated to leave PG. This kind of seals it for me (and of course the reference to the classic "Just Use Postgres for Everything" post helps) that I should Just Use Postgres (R).
On top of being easy to host and already being familiar with it, the resources out there for something like PG are near endless. Plus the team working on it is doing constant good work to make it even more impressive.
I've been heads-down building a scheduling tool, and the number of times I've had to talk myself out of over-engineering is embarrassing. "Should I use Kafka for event streaming?" No. "Do I need microservices?" Probably not. "Can Postgres handle this?" Almost certainly yes.
The real skill is knowing when you've actually outgrown something vs. when you're just pattern-matching what Big Tech does. Most products never get to the scale where these distinctions matter—but they DO die from complexity-induced paralysis.
What's been your experience with that inflection point where you actually needed to graduate to more complex tooling? How did you know it was time?
I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.
Wouldn't OrioleDB solve that issue though?
If you rent a cloud DB then it can scale elastically which can make this cheaper than Postgres, believe it or not. Cloud databases are sold at the price the market will bear not the cost of inputs+margin, so you can end up paying for Postgres as much as you would for an Oracle DB whilst getting far fewer features and less scalability.
Source: recently joined the DB team at Oracle, was surprised to learn how much it can do.
It often doesn't.
[1] https://rubyonrails.org/2024/11/7/rails-8-no-paas-required
And the thing is, a server from 10 years ago running postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale on the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)
When you’re only doing hundreds or thousands of transactions to begin with, it doesn’t really have much impact out of the gate.
Of course there will be someone who will pull out something that won’t work but such examples can likely be found for anything.
We don’t need to fear simplification, it is easy to complicate later when the actual complexities reveal themselves.
Of course the other 99% is the remaining 1%.
But anytime you treat a database, or a queue, like a black box dumpster, problems will ensue.
Also, LISTEN/NOTIFY do not scale, and they introduce locks in areas you aren't expecting - https://news.ycombinator.com/item?id=44490510
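For context, a minimal psycopg2 sketch of the LISTEN/NOTIFY pattern that comment is criticizing (table, channel, and DSN are hypothetical). The commonly cited scaling issue, which the linked thread discusses, is that notifying transactions serialize on a shared queue lock at commit time, which is where the unexpected locking shows up under write-heavy load.

```python
import select

import psycopg2
import psycopg2.extensions


def enqueue(payload: str) -> None:
    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        cur.execute("INSERT INTO jobs (payload) VALUES (%s)", (payload,))
        # The notification is delivered when this transaction commits.
        cur.execute("NOTIFY jobs_channel")


def wait_for_jobs() -> None:
    conn = psycopg2.connect("dbname=app")
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        cur.execute("LISTEN jobs_channel")
    while True:
        # Block up to 5 seconds waiting for the socket to become readable.
        if select.select([conn], [], [], 5) != ([], [], []):
            conn.poll()
            while conn.notifies:
                note = conn.notifies.pop(0)
                print("woken up by", note.channel)
```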
Postgres isn’t meant to be a guaranteed permanent replacement.
It’s a common starting point for a simpler stack which can retain a greater deal of flexibility out of the box and increased velocity.
Starting with Postgres lets the bottlenecks reveal themselves, and then optimize from there.
Maybe a tweak to Postgres or resources, or consider a jump to Kafka.
> ...
> The other camp chases common sense
I don't really like these simplifications. Like one group obviously isn't just dumb, they're doing things for reasons you maybe don't understand. I don't know enough about data science to make a call, but I'm guessing there were reasons to use Kafka due to current hardware limits or scalability concerns, and while the issues may not be as present today that doesn't mean they used Kafka just because they heard a new word and wanted to repeat it.
https://github.com/mongomock/mongomock Extrapolating from my personal usage of this library to others, I'm starting to think that mongodb's 25 billion dollar valuation is partially based on this open source package :)
Otherwise, SQLite :)
I prefer not to start with a nosql database and then undertake odysseys to make it into a relational database.
In my company most of our topics need to be consumed by more than one application/team, so this feature is a must have. Also, the ability to move the offset backwards or forwards programmatically has been a life saver many times.
Does Postgres support this functionality for their queues?
So if you want an individual offset, then yes, the consumer could just maintain their own… however, if you want a group’s offset, you have to do something else.
Is a queuing system baked into Postgres? Or are there client libraries that make it look like one?
And do these abstractions allow for arbitrarily moving the offset for each consumer independently?
If you're writing your own queuing system using pg for persistence obviously you can architect it however you want.
I don't know what kind of native support PG has for queue management, the assumption here is that a basic "kill the task as you see it" is usually good enough and the simplicity of writing and running a script far outweighs the development, infrastructure and devops costs of Kafka.
But obviously, whether you need stuff to happen in 15 seconds instead of 5 minutes, or 5 minutes instead of an hour is a business decision, along with understanding the growth pattern of the workload you happen to have.
Here is one: https://pgmq.github.io/pgmq/
Some others: https://github.com/dhamaniasad/awesome-postgres
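As a sketch of what the pgmq route linked above looks like in practice (queue name and DSN are hypothetical, and the SQL function signatures should be checked against the pgmq version you install):

```python
import json

import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

with conn, conn.cursor() as cur:
    cur.execute("SELECT pgmq.create(%s)", ("orders",))
    cur.execute("SELECT pgmq.send(%s, %s::jsonb)", ("orders", json.dumps({"order_id": 42})))

    # Read up to 10 messages, making them invisible to other readers for 30 seconds.
    cur.execute("SELECT msg_id, message FROM pgmq.read(%s, 30, 10)", ("orders",))
    for msg_id, message in cur.fetchall():
        print("processing", msg_id, message)
        cur.execute("SELECT pgmq.delete(%s, %s)", ("orders", msg_id))  # ack by deleting
```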
Most of my professional life I have considered Postgres folks to be pretty smart… while I, by chance, happened to go with MySQL, and it became the RDBMS I think in by default.
Heavily learning about Postgres recently has been okay, not much different than learning the tweaks for MSSQL, Oracle, or others. You just have to be willing to slow down a little for a bit and enjoy it instead of expecting to rush through everything.
But it looks like a queue, which is a fundamentally different data structure from an event log, and Kafka is an event log.
They are very different use cases: work distribution vs pub/sub.
The article talks about both use cases, assuming the reader is very familiar with the distinction.
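To make the distinction concrete, here is a rough Postgres-flavored sketch of the two patterns, with hypothetical table names: a work queue where each job is claimed once (the FOR UPDATE SKIP LOCKED style mentioned later in the thread), and an event log where each consumer group keeps its own offset and nothing is deleted.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN


def claim_one_job():
    """Work distribution: claim a single pending job, skipping rows other workers hold."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, payload FROM jobs
            WHERE done = false
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
            """
        )
        row = cur.fetchone()
        if row:
            cur.execute("UPDATE jobs SET done = true WHERE id = %s", (row[0],))
        return row


def read_events(group: str, batch: int = 100):
    """Pub/sub: every consumer group reads the whole log at its own pace."""
    with conn, conn.cursor() as cur:
        cur.execute("SELECT last_id FROM consumer_offsets WHERE group_name = %s", (group,))
        last_id = (cur.fetchone() or (0,))[0]
        # Note: reading by a plain sequence id has the visibility caveat discussed further down.
        cur.execute(
            "SELECT id, payload FROM events WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, batch),
        )
        rows = cur.fetchall()
        if rows:
            cur.execute(
                """
                INSERT INTO consumer_offsets (group_name, last_id) VALUES (%s, %s)
                ON CONFLICT (group_name) DO UPDATE SET last_id = EXCLUDED.last_id
                """,
                (group, rows[-1][0]),
            )
        return rows
```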
Unless you're a five man shop where everybody just agrees to use that one table, make sure to manage transactions right, cron job retention, YOLO clustering, etc. etc.
Performance is probably last on the list of reasons to choose Kafka over Postgres.
There are several implementations of queues, increasing the chance of finding what one is after. https://github.com/dhamaniasad/awesome-postgres
I truly miss a good standard client-side library following the Kafka-in-SQL philosophy. I started one in my previous job and we used it internally, but it never got good enough that it would be widely used elsewhere, and now I work somewhere else...
(PS: Talking about the pub/sub Kafka-like usecase, not the work queue FOR UPDATE usecase)
A naive approach with a sequence (or the serial type, which uses a sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits; now the table contains rows "122" and "124", and readers can start to process it. Then transaction "one" commits with its "123" number, but readers are already past "124". And transaction "one" might never commit for various reasons (e.g. the client just had its power cut), so simply waiting for "123" forever does not cut it either.
Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).
So it is a much more serious issue at stake here than event ordering/consistency.
As it happens, if you use event log tables in SQL "the Kafka way" you actually get guarantee on event ordering too as a side effect, but that is not the primary goal.
More detailed description of problem:
https://github.com/vippsas/mssql-changefeed/blob/main/MOTIVA...
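A small sketch of the interleaving described above (table and column names hypothetical), showing how a reader that polls by sequence id can permanently skip a row that commits late:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
last_seen = 122  # reader has processed everything up to id 122


def poll():
    global last_seen
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, payload FROM events WHERE id > %s ORDER BY id",
            (last_seen,),
        )
        for event_id, payload in cur.fetchall():
            print("processing", event_id)
            last_seen = event_id


# Timeline that loses an event:
#   t1: writer A begins, INSERT ... -> id 123 (not yet committed)
#   t2: writer B begins, INSERT ... -> id 124, COMMIT
#   t3: reader polls: sees only 124, advances last_seen to 124
#   t4: writer A commits id 123
#   t5: reader polls WHERE id > 124 -> id 123 is never processed
```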
Another approach which I used in the past was to assign sequence numbers after committing. Basically a separate process periodically scans the set of un-sequenced rows, applies any application-defined ordering constraints, and writes SNs into them. This can be surprisingly fast, like tens of thousands of rows per second. In my case, the ordering constraints were simple, basically that for a given key, increasing versions get increasing SNs. But I think you could have more complex constraints, although it might get tricky with batch boundaries.
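A rough sketch of that post-commit sequencer, under the stated assumptions: a single sequencer process, an events table with a nullable sn column, and a per-key/version ordering constraint. All table and column names are hypothetical.

```python
import time

import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN


def assign_sequence_numbers(batch: int = 10_000) -> int:
    with conn, conn.cursor() as cur:
        # Single sequencer process, so a plain counter row is enough.
        cur.execute("SELECT next_sn FROM sequencer_state FOR UPDATE")
        next_sn = cur.fetchone()[0]

        # Application-defined constraint: per key, higher versions get higher SNs.
        cur.execute(
            "SELECT id FROM events WHERE sn IS NULL ORDER BY key, version LIMIT %s",
            (batch,),
        )
        rows = cur.fetchall()
        for (row_id,) in rows:
            cur.execute("UPDATE events SET sn = %s WHERE id = %s", (next_sn, row_id))
            next_sn += 1

        cur.execute("UPDATE sequencer_state SET next_sn = %s", (next_sn,))
        return len(rows)


while True:
    if assign_sequence_numbers() == 0:
        time.sleep(0.5)  # nothing new to stamp; back off briefly
```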
There needs to be serialization happening somewhere, either by writers or readers waiting for their turn.
What Kafka "is" in my view is simply the component that assigns sequential event numbers. So if you publish to Kafka, Kafka takes the same locks...
The way to increase throughput is to add more shards in a topic.
In a sense this is what Kafka IS architecturally: The component that assigns event sequence numbers.
Isn't it a bit of a white whale thing that a umion can solve all one's subscriber problems? Afaik even with kafka this isn't completely watertight.
If you would rather have readers waiting and parallel writers there is a more complex scheme here: https://blog.sequinstream.com/postgres-sequences-can-commit-...
https://www.oreilly.com/library/view/designing-data-intensiv...
You can generate distributed monotonic number sequences with a Lamport Clock.
https://en.wikipedia.org/wiki/Lamport_timestamp
The wikipedia entry doesn't describe it as well as that book does.
It's not the end of the puzzle for distributed systems, but it gets you a long way there.
See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock
Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":
https://ia904606.us.archive.org/32/items/distributed-systems...
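For reference, a minimal Lamport clock sketch: each process bumps a local counter on every event or send, and on receive takes max(local, received) + 1, which yields timestamps consistent with the happened-before order (not a total wall-clock order).

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge the timestamp carried on an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
t_send = a.tick()           # A does something and stamps an outgoing message
t_recv = b.receive(t_send)  # B's clock jumps past A's, so the receive orders after the send
assert t_recv > t_send
```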
Another way to speed it up is to grab unique numbers in batches instead of just getting them one at a time. No idea why you want your numbers to be in absolute sequence. That's hard in a distributed system. Probably best to relax that constraint and find some other way to track individual pieces of data. Or even better, find a way so you don't have to track individual rows in a distributed system.
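A sketch of the batching idea, assuming a Postgres sequence named event_id_seq: one round trip reserves a block of ids that the application then hands out locally.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN


def reserve_ids(n: int = 100) -> list[int]:
    with conn, conn.cursor() as cur:
        # Each nextval() call consumes one value; generate_series makes n calls in one query.
        cur.execute("SELECT nextval('event_id_seq') FROM generate_series(1, %s)", (n,))
        return [row[0] for row in cur.fetchall()]


block = reserve_ids(100)  # unique across processes, but with gaps and no strict global order
```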
There are two poles.
1. Folks constantly adopting the new tech, whatever the motivation, and 2. "I learned a thing and shall never learn anything else, ever."
Of course nobody actually sits at either pole, but the closer you are to either, the less pragmatic you are likely to be.
I think it's still just two poles. However, I probably shouldn't have ascribed a motivation to the latter pole, as I purposely did not with the former.
Pole 2 is simply never adopt anything new ever, for whatever the motivation.
If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.
Obviously it's possible to build, for example, a machine with 2 cores, a 10Gbps network link, and a single HDD that would falsify my statement.
Even if you somehow get everyone to follow best-practices, you most likely still won't get to saturate the network on "minimal hardware". The number of client connections and requests per second will likely saturate your "minimal CPU".
It's true that minimal hardware on Kafka can saturate the network, but this mostly happens in low-digit client scenarios. In practice, orgs pushing serious data have serious client counts.
This is literally the point the author is making.
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."
It's not so hard. You interpret it how it is written. Yes, they say one camp chases buzzwords and another chases common sense. Critique that if you want to. That's fine.
But what's not written in the OP is some sort of claim that Postgres performs better than Kafka. The opposite is written. The OP acknowledges that Kafka is fast. Right there in the title! What's written is OP's experiments and data that shows Postgres is slow but can be practical for people who don't need Kafka. Honestly I don't see anything bewildering about it. But if you think they're wrong about Postgres being slow but practical that's something nice to talk about. What's not nice is to post snarky comments insinuating that the OP is asking you to design unscalable solutions.
In this case, you do just need a single fuel truck. That's what it was built for. Avoiding a purpose-built tool to achieve the same result actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.
[Disclosure: I work for Redpanda]
Kafka et al definitely have their place, but I think most people would be much better off reaching for a simpler queue system (or for some things, just using Postgres) unless you really need the advanced features.
* Lack of interest for other team members, which translated to doing what they thought was a sufficiently minimal amount of knowledge transfer
* An (unwise) attitude that "it's already set up and configured, and terraformed, so we can just acquire that knowledge if and when it's needed"
* Kafka guy left a lot faster than anybody really expected, not leaving much time and practically no documentation
* The rest of the team was already overwhelmed with other responsibilities and didn't have much bandwidth available
* Nobody wanted to be the person/people that ended up "owning" it, so there was a reverse incentive
Postgres is the solution in question of the article because I simply assume the majority of companies will start with Postgres as their first piece of infra. And it is often the case. If not - MySQL, SQLite, whatever. Just optimize for the thing you know, and see if it can handle your use case (often you'll be surprised)
You should be able to install within minutes.
None of this applies to Redpanda.
Yet to also be fair to the Kafka folks, Zookeeper is no longer default and hasn't been since April 2025 with the release of Apache Kafka 4.0:
"Kafka 4.0's completed transition to KRaft eliminates ZooKeeper (KIP-500), making clusters easier to operate at any scale."
Source: https://developer.confluent.io/newsletter/introducing-apache...
> This is literally the point the author is making.
Exactly! I just don't understand why HN invariably bubbles up the most dismissive comments to the top, ones that don't even engage with the actual subject matter of the article!
https://www.youtube.com/watch?v=7CdM1WcuoLc
Getting even less than that throughput on 3x c7i.24xlarge (a total of 288 vCPUs) is bafflingly wasteful.
Just because you can do something with Postgres doesn't mean you should.
> 1. One camp chases buzzwords.
> 2. The other camp chases common sense
In this case, is "Postgres" just being used as a buzzword?
[Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]
Kafka is a full on steaming solution.
Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.
Freudian slip? ;)
"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."
However, for all that said, Redpanda is still blazingly fast.
https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...
To be fair, since without fsync you don't have any ordering guarantees for your writes, a crash has a good chance of corrupting your data, not just losing recent writes.
That's why in PostgreSQL it's feasible to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G... but not to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G....
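For illustration, a sketch of that distinction in practice (table name hypothetical): fsync stays on server-wide to protect on-disk consistency, while synchronous_commit can be relaxed for an individual transaction whose loss you can tolerate on a crash.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Only this transaction's commit skips waiting for the WAL flush;
    # the server still fsyncs data and WAL files through its normal path.
    cur.execute("SET LOCAL synchronous_commit = off")
    cur.execute("INSERT INTO metrics (name, value) VALUES (%s, %s)", ("queue_depth", 42))
```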
Can you lose one Postgres instance?
I'm not saying they're useless, but if I see something like that lying around, it's more likely that someone put it there based on vibes rather than an actual engineering need. Postgres is good enough for OpenAI, chances are it's good enough for you.
It's built on pgmq and not married to supabase (nearly everything is in the database).
Postgres is enough.
241 more comments available on Hacker News