DBOS: Durable Workflow Orchestration with Go and PostgreSQL
Key topics
The HN community discusses DBOS, a Go library for durable workflow orchestration using PostgreSQL, with comments highlighting its potential, comparisons to similar projects like Temporal, and concerns about its production readiness.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 4d after posting
- Peak period: 39 comments in the 84-96h window
- Avg per period: 20 comments
Based on 60 loaded comments
Key moments
- Story posted: Sep 29, 2025 at 8:35 AM EDT (3 months ago)
- First comment: Oct 2, 2025 at 10:39 PM EDT (4d after posting)
- Peak activity: 39 comments in 84-96h, the hottest window of the conversation
- Latest activity: Oct 4, 2025 at 7:42 AM EDT (3 months ago)
The big difference, as that blog post (https://www.dbos.dev/blog/durable-execution-coding-compariso...) describes, is the operational model. DBOS is a library you can install into your app, whereas Temporal et al. require you to rearchitect your app to run on their workers and external orchestrator.
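To make the "just a library" point concrete, here is a minimal sketch of what in-app durable workflow code can look like. The types and function names are invented for illustration and are not DBOS's actual Go API:

```go
package main

import "fmt"

// Hypothetical workflow context -- the names here are illustrative,
// not DBOS's real Go interface; see the docs for the actual API.
type WorkflowCtx struct{}

// RunStep would checkpoint the step's output in Postgres; on recovery,
// already-completed steps return their recorded result instead of re-running.
func (c *WorkflowCtx) RunStep(name string, fn func() (string, error)) (string, error) {
	return fn() // a real implementation would consult/write the checkpoint table
}

// checkoutWorkflow is plain application code: no separate worker fleet or
// external orchestrator, just functions running inside your own service.
func checkoutWorkflow(ctx *WorkflowCtx, orderID string) error {
	payment, err := ctx.RunStep("charge", func() (string, error) {
		return "payment-for-" + orderID, nil // call your payment provider here
	})
	if err != nil {
		return err
	}
	_, err = ctx.RunStep("ship", func() (string, error) {
		return "shipment-" + payment, nil // schedule shipment here
	})
	return err
}

func main() {
	if err := checkoutWorkflow(&WorkflowCtx{}, "order-42"); err != nil {
		fmt.Println("workflow failed:", err)
	}
}
```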
For example, is there a Rust library? Am I missing how a Go library is useful for non-Go applications?
No Rust yet, but we'll see!
1. How much work is it to add bindings for new languages?
2. I know you provide Conductor as a service. What are my options for workflow recovery if I don't have outbound network access?
3. Considering this came out of https://dbos-project.github.io/, do you guys have plans beyond durable workflows?
2. There are built-in APIs for managing workflow recovery, documented here: https://docs.dbos.dev/production/self-hosting/workflow-recov...
3. We'll see! :)
Also, how is DBOS handling workflow versioning?
Looking forward to your Java implementation. Thanks!
DBOS naturally scales to distributed environments, with many processes/servers per application and many applications running together. The key idea is to use the database's concurrency control to coordinate multiple processes. [1]
When a DBOS workflow starts, it’s tagged with the version of the application process that launched it. This way, you can safely change workflow code without breaking existing ones. They'll continue running on the older version. As a result, rolling updates become easy and safe. [2]
[1] https://docs.dbos.dev/architecture#using-dbos-in-a-distribut...
[2] https://docs.dbos.dev/architecture#application-and-workflow-...
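The thread doesn't spell out the coordination mechanism, but the standard way to let many processes share work through Postgres's concurrency control is FOR UPDATE SKIP LOCKED. A sketch with an assumed table layout (not DBOS's actual schema):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

// claimWorkflow atomically claims one enqueued workflow for this worker.
// FOR UPDATE SKIP LOCKED lets many processes poll the same table without
// contending: each worker only sees rows no other transaction has locked.
// Table and column names are illustrative, not DBOS's actual schema.
func claimWorkflow(ctx context.Context, db *sql.DB, workerID string) (string, error) {
	var id string
	err := db.QueryRowContext(ctx, `
		UPDATE workflow_status
		SET executor_id = $1, status = 'PENDING'
		WHERE workflow_id = (
			SELECT workflow_id FROM workflow_status
			WHERE status = 'ENQUEUED'
			ORDER BY created_at
			LIMIT 1
			FOR UPDATE SKIP LOCKED
		)
		RETURNING workflow_id`, workerID).Scan(&id)
	return id, err
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	id, err := claimWorkflow(context.Background(), db, "worker-1")
	if err != nil {
		log.Fatal(err) // sql.ErrNoRows if nothing is enqueued
	}
	fmt.Println("claimed workflow", id)
}
```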
So applications continuously poll the database for work? Have you done any benchmarking to evaluate the throughput of DBOS when running many workflows, activities, etc.?
Throughput mainly comes down to database writes: executing a workflow = 2 writes (input + output), each step = 1 write. A single Postgres instance can typically handle thousands of writes per second, and a larger one can handle tens of thousands (or even more, depending on your workload size). If you need more capacity, you can shard your app across multiple Postgres servers.
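A rough back-of-the-envelope using that cost model, with assumed numbers:

```go
package main

import "fmt"

func main() {
	// Illustrative numbers only -- measure your own workload.
	const stepsPerWorkflow = 10
	const pgWritesPerSec = 5000.0 // a modest single Postgres instance

	// Per the cost model above: 2 writes per workflow + 1 write per step.
	writesPerWorkflow := 2 + stepsPerWorkflow
	fmt.Printf("~%.0f workflows/sec\n", pgWritesPerSec/float64(writesPerWorkflow))
	// Output: ~417 workflows/sec
}
```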
I use NATS to achieve this type of durable processing. It works well. Of course, idempotent code is needed, but I don't think that can be avoided.
The library seems fantastic, but my team didn't adopt it: they believe that at scale, the number of DB reads and writes becomes very significant for a large number of workflows with many steps, and that with Postgres (vs. Cassandra/ScyllaDB) it wouldn't be feasible for our throughput. I tried to convince them otherwise, but it's difficult to quantify from the current documentation.
The cost of DBOS durable execution is 1 write per step (checkpointing the outcome) and 2 additional writes per workflow (upserting the workflow status, checkpointing the outcome). The write size is the size of your workflow/step outputs.
Postgres can support several thousand writes per second (influenced by the write size, of course), so DBOS can support several thousand workflows/steps per second.
Postgres scales remarkably well. In fact, most orgs will never outgrow a single, vertically scaled Postgres instance. There's a very good write-up by Figma on how they scaled Postgres horizontally: https://www.figma.com/blog/how-figmas-databases-team-lived-t...
What's your input on these two topics, i.e. pull vs. push, and working well with serverless workflows?
This sounds... impossible? If you have some step in your workflow, either you 1) record it as completed when you start, but then you can crash halfway through, and when you restore, the step is marked done without the work actually having happened, or 2) record it as completed after you're done, but then you can crash in between completing and recording, and when you restore, you run the step twice.
#2 sounds like the obvious right thing to do, and what I assume is happening, but is not exactly once and you'd need to still be careful that all of your steps are idempotent.
For step processing, what you say is true--steps are restarted if they crash mid-execution, so they should be idempotent.
The DBOS workflow execution itself is idempotent (assuming each step is idempotent). When DBOS starts a workflow, the "start" (workflow inputs) is durably logged first. If the app crashes, on restart DBOS reloads from Postgres and resumes from the last completed step. Steps are checkpointed so they don't re-run once recorded.
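A toy, in-memory illustration of that checkpoint-and-replay behavior (the real system logs to Postgres; all names here are invented):

```go
package main

import "fmt"

// checkpoints maps step name -> recorded output, standing in for the
// Postgres step-output table described above.
type Workflow struct {
	checkpoints map[string]string
}

// RunStep executes fn only if no checkpoint exists; otherwise it replays
// the recorded result, which is what makes re-running the workflow safe.
func (w *Workflow) RunStep(name string, fn func() string) string {
	if out, ok := w.checkpoints[name]; ok {
		fmt.Printf("replaying %s -> %s\n", name, out)
		return out
	}
	out := fn()
	w.checkpoints[name] = out // durably logged in the real system
	fmt.Printf("executed %s -> %s\n", name, out)
	return out
}

func main() {
	w := &Workflow{checkpoints: map[string]string{}}
	w.RunStep("fetch", func() string { return "data" })
	// Simulate a crash after step 1 and a restart: the same checkpoints
	// are reloaded (from Postgres in the real system) and execution resumes.
	w.RunStep("fetch", func() string { return "data" })   // replayed, not re-run
	w.RunStep("process", func() string { return "done" }) // runs for the first time
}
```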
You specifically need exactly once when the action you are doing is not idempotent.
Where DBOS really shines (vs. Temporal and other workflow systems) is a radically simpler operational model--it's just a library you can install in your app instead of a big heavyweight cluster you have to rearchitect your app to work with. This blog post goes into more detail: https://www.dbos.dev/blog/durable-execution-coding-compariso...
Golem [1] is an interesting counterexample to this. They run your code in a WASM runtime and essentially checkpoint execution state at every interaction with the outside world.
But it seems they are having trouble selling into the workflow orchestration market. Perhaps due to the preconception above? Or are there other drawbacks with this model that I’m not aware of?
1. https://www.golem.cloud/post/durable-execution-is-not-just-f...
Plus, if the crash happens in the outside world (where you have no control), then checkpointing at finer granularity won't help.
For example, if you call an API (the outside world) to charge the user’s credit card, and the WASM host fails and the process is restarted, you’ll need to be careful to not charge again. This can happen after the request is issued, but before the response is received/processed.
This is no different than any other workflow library or service.
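A common mitigation, not specific to DBOS, is to pass an idempotency key derived from the workflow/step identity to the external API, so a retried call after a crash is deduplicated on the provider's side. A sketch with an invented payments client:

```go
package main

import "fmt"

// PaymentsAPI stands in for an external processor (e.g., Stripe supports
// idempotency keys); this client is invented for illustration.
type PaymentsAPI struct {
	seen map[string]bool // server-side dedup, keyed by idempotency key
}

func (p *PaymentsAPI) Charge(idempotencyKey string, cents int) error {
	if p.seen[idempotencyKey] {
		return nil // duplicate request: already charged, do nothing
	}
	p.seen[idempotencyKey] = true
	fmt.Printf("charged %d cents (key=%s)\n", cents, idempotencyKey)
	return nil
}

func main() {
	api := &PaymentsAPI{seen: map[string]bool{}}
	// Derive the key from stable workflow identity, so a crashed-and-retried
	// step sends the same key and the charge happens at most once.
	key := "workflow-123-step-charge"
	api.Charge(key, 4999) // first attempt
	api.Charge(key, 4999) // retry after crash: deduplicated
}
```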
The WASM idea is interesting, and maybe lets you be more granular in how you checkpoint (e.g., for complex business logic that is self-contained but expensive to repeat). The biggest win is probably for general preemption or resource management, but those are generally wins for the provider, not the user. Also, this requires compiling your application to WASM, which restricts which languages, libraries, etc. you can use.
From what I can tell, though, NF just runs a single workflow at a time, with no queue or database. It relies on filesystem caching for "durability". That has been changing recently with some optional add-ons.
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu... - Not even 3 full pages worth over the past 5 years, though the first page is entirely from this year. It's maybe 2-3 a month on average this year, and a lot are dupes.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... - Nim, for comparison, which doesn't really make a dent in the programming world but shows up a lot. The first 15 pages cover the same time period.
Being able to see the state of workflows and their histories is a key part of having an application in production. Without a control plane, my understanding is that DBOS can't offer the same kind of failure recovery as Temporal, though it's unclear to me how the "Transact" engine does this.
A big benefit that Temporal's architecture provides is separation of concerns. Temporal can coordinate workflows across many apps, whereas with DBOS each app (as far as I understand it, at least) is a silo managing its own queues.
A postgres server can host many databases, and multiple applications can use the same server. The same dashboard can be used to monitor them all.
With respect to recovery: A new Transact process will run a round of recovery at startup. Transact also exposes an admin server with a recovery endpoint.
For more elaborate scenarios, we have control plane options commercially available.
You can share a database server with DBOS, but it's common to give applications dedicated database resources (e.g., one Postgres cluster per app in different regions). In that setup, a shared dashboard won't work with DBOS unless you write your own federated control layer that can speak to multiple instances, which also isn't offered out of the box. Sharing one DBOS-specific server across all apps would introduce a single point of failure.
Again, I like DBOS, but right now the value proposition isn't that great given that Temporal has already nailed this.
Even better if the interface is also embeddable into a Go HTTP handler.
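Something along these lines is presumably what's meant; startWorkflow here is a hypothetical stand-in for whatever enqueue API the library exposes:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// startWorkflow is a stand-in for the library's real enqueue API; it would
// persist the workflow input to Postgres and return a workflow ID.
func startWorkflow(name string, input string) (string, error) {
	return "wf-123", nil // placeholder
}

func main() {
	http.HandleFunc("/checkout", func(w http.ResponseWriter, r *http.Request) {
		orderID := r.URL.Query().Get("order")
		// Enqueue durably and respond immediately; the workflow itself
		// survives process restarts because its state lives in the database.
		wfID, err := startWorkflow("checkout", orderID)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintf(w, "started workflow %s\n", wfID)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```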
https://www.pgflow.dev
They use Supabase (demo functions) as the client, but it could be your language of choice.