Litestream V0.5.0
Posted3 months agoActive3 months ago
fly.ioTechstoryHigh profile
excitedpositive
Debate
40/100
LitestreamSqliteDatabase Replication
Key topics
Litestream
Sqlite
Database Replication
The release of Litestream v0.5.0 has generated significant interest and enthusiasm among HN users, with discussions focusing on its features, use cases, and potential applications.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussionFirst comment
20h
Peak period
124
18-24h
Avg / period
17.8
Comment distribution160 data points
Loading chart...
Based on 160 loaded comments
Key moments
- 01Story posted
Oct 2, 2025 at 3:02 PM EDT
3 months ago
Step 01 - 02First comment
Oct 3, 2025 at 10:47 AM EDT
20h after posting
Step 02 - 03Peak activity
124 comments in 18-24h
Hottest window of the conversation
Step 03 - 04Latest activity
Oct 6, 2025 at 10:51 AM EDT
3 months ago
Step 04
Generating AI Summary...
Analyzing up to 500 comments to identify key contributors and discussion patterns
ID: 45453936Type: storyLast synced: 11/20/2025, 8:32:40 PM
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
> But the market has spoken! Users prefer Litestream. And honestly, we get it: Litestream is easier to run and to reason about. So we’ve shifted our focus back to it.
at the end of the day with litestream, when you respond back to a client with a successful write you are only guaranteeing a replication factor of 1.
SQLite is designed for one local client at a time. Client-server relational databases are designed for many clients at a time.
If you have lots of concurrent writes SQLite isn't the right solution. For concurrent reads it's fine.
SQLite also isn't a network database out-of-the-box. If you want to be able to access it over the network you need to solve that separately.
(Don't try and use NFS. https://sqlite.org/howtocorrupt.html#_filesystems_with_broke... )
That becomes an instant problem if users ever write to your database. You can't duplicate the environment unless it's read-only.
And even if the database is read-only for users, the fact that every time you update it you need to redeploy the database to every client, is pretty annoying.
That's why it's usually better to start with Postgres or MySQL. A single source of truth for data makes everything vastly easier.
If you have a read-heavy app (99% of saas) that runs on one server and dont have millions of users then sqlite is a great option.
Usually you want to be able to run multiple webservers against a single database though, since that's the first thing you'll usually need to scale.
Now if I'm building a local app then absolutely sqlite makes the most sense but I don't see it otherwise.
Systemctl's only in there to restart it if it crashes; litestream itself is (iirc) a single cli binary.
I love postgres but in no way is it as simple to run as sqlite (pretty sure even postgres core team would agree that postgres is more complex than sqlite).
People care about things like long-term support. Postgres 13, from 2020, is still officially supported. Litestream 0.1.0 was the first release, also from 2020, but I can't tell if it is supported still. Worrying about the maturity, stability, and support of an application database is very reasonable in risk adverse projects.
Besides the question wasnt litestream vs postgres backup apps. It was sqlite vs postgres.
More than once I've started a project with sqlite and then had to migrate to postgres. In my experience it's because of the way sqlite has to lock the whole database file while writing to it, blocking concurrent reads - this isn't a problem in postgres. (There's WAL mode, but it still serialises all writes, and requires periodic explicit checkpointing IME)
You may also find you really want a feature postgres has, for example more advanced data types (arrays, JSON), more advanced indices (GIN inverted index on array members), replication...
If you’re building something as a hobby project and you know it will always fit on one server, sqlite is perfect.
If it’s meant to be a startup and grow quickly, you don’t want to have to change your database to horizontally scale.
Deploying without downtime is also much easier with multiple servers. So again, it depends whether you’re doing something serious enough that you can’t tolerate dropping any requests during deploys.
For sure downtime is easier with kubernete etc but again overkill for 99,99% of apps.
On the other side, why not just store everything in memory and flush to a local json file if you won't have any users? sqlite is overkill!
It's definitely really nice though that if you do choose sqlite initially to keep things as small and simple as possible, you don't immediately need to switch databases if you want to scale.
This is essentially what GraphQL does instead of crafting each of these super tailored API endpoints for each of your screens, you use their query language to ask for the data you want, it queries the DB for you and get you the data back in a single network roundtrip from the user perspective.
(Not an expert, so I trust comments to correct what I got wrong)
No, JOINs should be orders of magnitude faster.
> What are you really supposed to do if you need to, e.g. fetch a list of posts along with the number of comments on each post?
You're really supposed to do a JOIN, together with a GROUP BY and a COUNT(). This is elementary SQL.
of course, it's all possible with custom SQL but it gets complicated quick.
The filtering you describe is trivial with COUNT(flag IN (...) AND state=...) etc.
If you want to retrieve data on the last comment, as opposed to an aggregate function of all comments, you can do that with window functions (or with JOIN LATERAL for something idiomatic specifically to Postgres).
Learning how to do JOIN's in SQL is like learning pointers in C -- in that it's a basic building block of the language. Learning window functions is like learning loops. These are just programming basics.
For anything that isn't just a basic row lookup from a single table, you should really just be writing the SQL yourself.
Alternately, if you can separate query issuance from result parsing, you can make N+1 palletable. Ex, do your query to get the ids, wait for the results, loop and issue the N queries, then loop and wait for results in order. That will be two-ish round trips rather than N+1 round trips. But you have to search to find database apis that allow that kind of separation.
[1] You can almost always express the query you want in SQL, but that doesn't mean it will have a reasonable runtime. Sometimes server side join and client side join have about the same runtime... if it's significant and you have the usual case where clients are easier to scale than database servers, it might be worthwhile to have the join run on the client to reduce server load.
For the latter, it's along the lines of `select * from posts where ...` and `select * from authors where id in {posts.map(author_id)}`. And then once it's in memory you manually work out the associations (or rely on your ORM to do it).
More formally, the number of queries should be constant and not linearly scaling with the number of rows you're processing.
In particular, JSON aggregations mean you can have a single query that does things like fetch a blog entry and the earliest 10 comments in a single go. I wrote up some patterns for doing that in SQLite and PostgreSQL here: https://github.com/simonw/til/blob/main/sqlite/related-rows-...
Here's an example PostgreSQL query that does this to fetch tags for posts: https://simonwillison.net/dashboard/json-agg-example
The alternative, more common path is the pattern that Django calls "prefetch_related". Effectively looks like this: Once you have a list of e.g. 20 IDs you can run a bunch of cheap additional queries to fetch extra data about all 20 of those items.In general, you want to ask the remote server once for all the data you need, then read all the results. It applies to databases as well as APIs.
Pipelined requests also solve the problem and can be more flexible.
Also, joins can be optimised in different ways. Sometimes the optimal way to do a join isn't to query each row one-by-one, but to do something like (when the rows you want are a large fraction of the rows that exist) making a bloom filter of the rows you want and then sequentially reading all the rows in the table.
You can install MySQL/PostgreSQL on the application server, connect over a unix socket and get the same benefits as if you'd used SQLite on the application server (no network latency, fast queries). Plus the other benefits that come from using these database servers (Postgres extensions, remote connections, standard tooling etc). I'm guessing more RAM is required on the application server than if you used SQLite but I haven't benchmarked it.
In my opinion if you have an easy way to run postgres,MySQL,... - just run that.
There's usually a lot of quirks in the details of DB usage (even when it doesn't immediately seem like it - got bitten by it a few times). Features not supported, different semantics, ...
IMO every project has an "experimental stuff" budget and if you go over it it's too broken to recover, and for most projects there's just not that much to win by spending them on a new database thing
This is an interesting take; why do you see recent hype around the most boring and stone-age of technologies, SQLite?
Not super sure who followed who but there was all of a sudden a lot of excitement
SQLite's "buzz" isn't new, type "sqlite" into my https://tools.simonwillison.net/hacker-news-histogram tool and you'll see interest (on HN at least) has been pretty stable since 2021.
Bit more stretched out than I thought it had been
While I don't have stats about every conference talk for the last decade, my experience has been that SQLite has been featured more in Rails conference talks. There's a new book titled "SQLite on Rails: The Workbook" that I don't think would have had an audience five years ago. And I've noticed more blog posts and more discussion in Rails-related discussion platforms. Moreover, I expect we'll see SQLite gain even more in popularity as it simplifies multi-agent development with multiple git worktrees.
1- https://law-theorem.com/
So this fills that gap by giving you a database as a service level of QOL without needing to provision a database as a service backend. Otherwise you're dicking about maintaining a service with all that comes with that (provisioning, updating, etc) when really all you need is a file that is automagically backed up or placed somewhere on the web to avoid the drawbacks of the local file system.
> If there are many client programs sending SQL to the same database over a network, then use a client/server database engine instead of SQLite.
https://www.sqlite.org/whentouse.html
MySQL, Postgres, etc. have a much greater overhead for setup, unless you want to pay for a managed database, which is not going to be worth the price for small quantities of data.
In every case where I had a SQLite vertical that required resilience, the customer simply configured the block storage device for periodic snapshots. Litestream is approximately the same idea, except you get block device snapshots implicitly as part of being in the cloud. There is no extra machinery to worry about and you won't forget about a path/file/etc.
Also, streaming replication to S3 is not that valuable an idea to me when we consider the recovery story. All other solutions support hot & ready replicas within seconds.
But most apps should just use a classic n-tier database architecture like Postgres. We mostly do too (though Litestream does back some stuff here like our token system).
If you have access to a database that is well managed on your behalf I would definitely still go with that for many usecases.
But I'd love to hear more from someone more well-versed in the use cases for reliable sql-lite
I once was responsible for migrating a legacy business app to Azure, and the app had a local MSSQL server co-running with the app (the same pattern that Litestream is using).
As have been mentioned below, the app had been developed assuming the local access (and thus <1ms latency), so it had a ton of N+1 everywhere.
This made it almost impossible to migrate/transition to another configuration.
So, if this style of app hosting doesn't take off and you're at all worried about this being a dead end storage once you reach a certain scale, I'd recommend not doing this, otherwise your options will be very limited.
Then again - I bet you could get very very far on a single box, so maybe it'd be a non factor! :)
It was a Rails app, therefore easy to get into the N+1 but also somewhat easy to fix.
Probably some of the most valuable code I've ever written on a per LOC basis lol.
But anyhow, merging that into a new project was always a fun day. But on the other side of the cleanup the app stops falling down due to memory leaks.
A naive ORM setup will often end up doing a 1 query to get a list of object it needs, and then perform N queries, one per object, usually fetching each object individually by ID or key.
So for example, if you wanted to see “all TVs by Samsung” on a consumer site, it would do 1 query to figure out the set of items that match, and then if say 200 items matched, it would do 200 queries to get those individual items.
ORMs are better at avoiding it these days, depending on the ORM or language, but it still can happen.
I really wish there was a way to compose SQL so you can actually write the dumb/obvious thing and it will run a single query. I talked with a dev once who seemed to have the beginnings of a system that could do this. It leveraged async and put composable queryish objects into a queue and kept track of what what callers needed what results, merged and executed the single query, and then returned the results. Obviously far from generalizable for arbitrary queries but it did seem to work.
e.g. for ActiveRecord there's ar_lazy_preloader[0] or goldiloader[1] which fix many N+1s by keeping track of a context: you load a set of User in one go, and when you do user.posts it will do a single query for all, and when you then access post.likes it will load all likes for those and so on. Or, if you get the records some other way, you add them to a shared context and then it works.
Doesn't solve everything, but helps quite a bit.
[0] https://github.com/DmitryTsepelev/ar_lazy_preload
[1] https://github.com/salsify/goldiloader
It used to be a very common pitfall - and often not at all obvious. You’d grab a collection of objects from the ORM, process them in a loop, and everything looked fine because the objects were already rehydrated in memory.
Then later, someone would access a property on a child object inside that loop. What looked like a simple property access would silently trigger a database query. The kicker was that this could be far removed from any obvious database access, so the person causing the issue often had no idea they were generating dozens (or hundreds) of extra queries.
Then, the framework will be used to do something like this
which results in N SQL queries with and there's your N+1We ran into this while building, funnily enough, a database management app called DB Pro (https://dbpro.app) At first we were doing exactly that: query for all schemas, then for each schema query its tables, and then for each table query its columns. On a database with hundreds of tables it took ~3.8s.
We fixed it by flipping the approach: query all the schemas, then all the tables, then all the columns in one go, and join them in memory. That dropped the load time to ~180ms.
N+1 is one of those things you only really “get” when you hit it in practice.
It's usually a big problem for database performance because each query carries additional overhead for the network round trip to the database server.
SQLite queries are effectively a C function call accessing data on local disk so this is much less of an issue - there's an article about that in the SQLite docs here: https://www.sqlite.org/np1queryprob.html
With single instances topping out at 20+ TBs of RAM and hundreds of cores, I think this is likely very under-explored as an option
Even more if you combine this with cell-based architecture, splitting on users / tenants instead of splitting the service itself.
Use k9s (not a misspelling) and headlamp to observe your cluster if you need a gui.
Then you are off to races. you can add more nodes etc later to give it a try.
Configure the init flags to disable all controllers and other doodads, deploy them yourself with Helm. Helm sucks to work with but someone has already gone through the pain for you.
AI is GREAT at K8s since K8s has GREAT docs which has been trained on.
A good mental model is good: It's an API with a bunch of control loops
I would say the biggest downside is that managing databases is less rigid than using something like RDS, but the flip side is that my DB is far more performant and far cheaper (I own the CPU cycles! no noisy neighbors.), and I still run daily backups to external object storage.
Once you get k8s running, it kind of just works. And when I want to do something funky or experimental (like splitting AI bots to separate pods), I can go ahead and do that with ease.
I run two separate k8s "clusters" (both single node) and I kind of love it. k9s (obs. tool) is amazing. I built my own logging platform because I hated all the other ones, might release that into its own product one day (email in my profile if you're interested).
If you need to deploy it elsewhere, you just install k3s/k8s or whatever and apply the yamls (except for stateful things like db).
IT also handles name resolution with service names, restarts etc.
IT's amazing.
Its not its fault. :)
Such a system would also require a side channel propagating WAL updates (over Kafka or similar) to replicas, so that the running replicas can update themselves incrementally and stay fresh without loading anything from object storage.
https://litestream.io/guides/s3/
I think this is also roughly what Turso is, although it's becoming a SQLite-compatible db rather than vanilla
https://docs.turso.tech/features/embedded-replicas/introduct...
https://docs.turso.tech/cloud/durability
Turso looks cool and is a project I will keep an eye on, but it's replica mode seems to be designed for apps rather than mechanism to rapidly scale out read replicas in a server cluster. Also, the web site warns that it's not ready for production use.
Litestream is working on that now - the code is already in https://github.com/benbjohnson/litestream/blob/v0.5.0/vfs.go but it's not yet a working, documented feature.
They already have a prototype, and... it's pretty rough on the edges.
I'm porting it to my Go SQLite driver and already ran into a bunch of issues. But it seems at least feasible to get it into a working shape.
https://github.com/benbjohnson/litestream/issues/772
https://github.com/ncruces/go-sqlite3/compare/main...litestr...
What exactly are you trying to port?
After some testing, I expect it to be working correctly in the vast majority of cases, but the performance may disappoint. The original still needs work.
https://github.com/ncruces/go-sqlite3/tree/litestream
When I saw the v0.5.0 tag, I dived into just porting it. It's just over a couple hundred lines, and I have more experience with SQLite VFSes than most, so why not.
But it's still pretty shaky.
https://developers.cloudflare.com/d1/best-practices/read-rep...
>But the market has spoken! Users prefer Litestream.
>Yeah, we didn’t like those movies much either.
The NATS Jetstream use case is something I'm curious about.
Cheers and keep up the great work on Litestream.
I love Litestream and use it in every app I build now.
They advertise it as costing "pennies per day," but it's even less expensive than that. It obviously varies depending on how much storage you need, but I had a real app in production, and Litestream replication to S3 only cost me 2-3 cents ($0.02-$0.03) per month.[0]
[0] https://mtlynch.io/litestream/#using-logpaste-in-production
https://fly.io/blog/litestream-revamped/ https://news.ycombinator.com/item?id=44045292
29 more comments available on Hacker News