Replace PostgreSQL with Git for Your Next Project
Posted 4 months ago · Active 4 months ago
devcenter.upsun.com · Tech · story
Sentiment: skeptical/negative · Debate · 80/100
Key topics
Git
Database
Version Control
The article suggests using Git as a replacement for PostgreSQL, sparking a heated discussion among commenters who question the practicality and performance of such an approach.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion · First comment: 12m · Peak period: 11 comments in 0-1h · Avg / period: 4.8
Based on 29 loaded comments
Key moments
- Story posted: Sep 24, 2025 at 11:16 AM EDT (4 months ago)
- First comment: Sep 24, 2025 at 11:28 AM EDT (12m after posting)
- Peak activity: 11 comments in 0-1h, the hottest window of the conversation
- Latest activity: Sep 25, 2025 at 3:46 AM EDT (4 months ago)
ID: 45361574 · Type: story · Last synced: 11/20/2025, 1:39:00 PM
I assume you'd struggle to get a few hundred commits per second even on good hardware?
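Nobody posts numbers for that, but it's easy to poke at with a throwaway benchmark. A minimal sketch, assuming one git process per commit (so it mostly measures CLI overhead and understates what a long-lived process or libgit2 could manage):

    import subprocess, tempfile, time

    def commits_per_second(n=200):
        # Time n empty commits in a scratch repo; results swing a lot with
        # disk speed and fsync settings.
        with tempfile.TemporaryDirectory() as repo:
            run = lambda *args: subprocess.run(
                ["git", "-C", repo, *args], check=True, capture_output=True)
            run("init")
            run("config", "user.email", "bench@example.invalid")
            run("config", "user.name", "bench")
            start = time.time()
            for i in range(n):
                run("commit", "--allow-empty", "-m", f"c{i}")
            return n / (time.time() - start)

    if __name__ == "__main__":
        print(f"{commits_per_second():.0f} commits/s on this machine")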
With respect to querying Git repos, I was pleasantly surprised with how usable git cat-file --batch was as a programmatic way to walk around the Git graph in http://canonical.org/~kragen/sw/dev3/threepowcommit.py, which reads about 8000 commits per second on my laptop—not fast, but not as slow as you'd expect.
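For anyone who hasn't used it: git cat-file --batch keeps a single git process alive and answers object requests over a pipe, which is what makes this kind of traversal tolerably fast. A minimal sketch of such a walk (an illustration of the batch protocol, not the linked threepowcommit.py):

    import subprocess

    def walk_commits(start="HEAD", repo="."):
        # One long-lived "git cat-file --batch" process; we feed it object
        # names and it answers with "<sha> <type> <size>" plus the raw body.
        proc = subprocess.Popen(["git", "-C", repo, "cat-file", "--batch"],
                                stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        seen, queue = set(), [start]
        while queue:
            proc.stdin.write(queue.pop().encode() + b"\n")
            proc.stdin.flush()
            header = proc.stdout.readline().split()
            if header[1] == b"missing":
                continue
            sha, objtype, size = header[0].decode(), header[1], int(header[2])
            body = proc.stdout.read(size)
            proc.stdout.read(1)                  # trailing newline
            if sha in seen or objtype != b"commit":
                continue
            seen.add(sha)
            for line in body.split(b"\n"):
                if line.startswith(b"parent "):
                    queue.append(line.split()[1].decode())
            yield sha
        proc.stdin.close()
        proc.wait()

    # Example: count commits reachable from HEAD.
    # print(sum(1 for _ in walk_commits()))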
> While Git makes an interesting database alternative for specific use cases, your production applications deserve better. Upsun provides managed PostgreSQL, MySQL, and other database services
1. Built-in audit trails. Git's a hash tree, right? So you can't tamper with it? Wrong. You can force push and delete commits. By default, git garbage-collects dangling commits and old branches. To make a cryptographic audit trail, you'd need to publish the hash tree publicly and non-deniably so your auditor can know if you tampered with the logs (a small anchoring sketch follows this list).
2. Atomic transactions. Many databases support these, with a lot of granular control over the details. Postgres is one of them. (The ones that don't may have good reasons.) Git does not have much granular control; merges are not guaranteed to keep any data intact.
3. Distributed architecture. Good point. But wait, that's not really true. How do we decide what the real "main" is? It's whatever main is on your upstream, which in almost all cases is a centralized server. And if it's not, git doesn't provide any automated tooling for dealing with conflicts between multiple sources of truth (no algorithm for byzantine generals, for example). So this is a red herring.
4. Content addressing. Does this mean a commit hash? Like, say, an _index_ but without the ability to index over any other data?
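On point 1, a minimal sketch of the external anchoring it calls for: record the tip hash somewhere the repo owner can't rewrite, then check that every recorded hash is still an ancestor of the current tip. The "ledger" here is just a local file standing in for a real third party:

    import subprocess

    def tip(repo):
        return subprocess.check_output(
            ["git", "-C", repo, "rev-parse", "HEAD"], text=True).strip()

    def anchor(repo, ledger="anchors.log"):
        # Append the current tip hash to the external ledger.
        with open(ledger, "a") as f:
            f.write(tip(repo) + "\n")

    def verify(repo, ledger="anchors.log"):
        # Every anchored commit must still be an ancestor of HEAD; a force
        # push that rewrote published history (or a gc that dropped it)
        # makes this fail.
        with open(ledger) as f:
            for sha in f.read().split():
                ok = subprocess.run(
                    ["git", "-C", repo, "merge-base", "--is-ancestor",
                     sha, "HEAD"], capture_output=True).returncode == 0
                if not ok:
                    return False
        return True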
In my career, I have seen git used twice as a "database" in a sensible way: for tracking source code changes (that is, changes to structured text files). This is precisely the use case it was designed for. And in both cases, it was actually underutilized: no real merge capability, just file versioning, so they could have gone simpler.
But git is not even good for most "collaborative text editing" since merge handling is not automatic. You wouldn't use it for a google-docs style platform, for example.
Edit: formatting
Which is exactly what they claim to use it for.
> so they could have gone simpler.
Maybe, but they also claim that it is a desirable, maybe even necessary, property to integrate with existing git use. So you can only go simpler until you have to actually interface with git, and then any possible downsides of git come rearing their ugly heads at that point anyway. So what have you gained by not just using git from top to bottom?
Contextual.
You are right that the article is a bit hard to follow where it goes down roads that it also says you shouldn't actually do, but that is there just to explain that git is a database, which isn't as obvious as it should be to some people. There are no doubt better ways to get that idea across, but someone in the business of providing managed Postgres hosting isn't apt to have writing long-form text as a core competency. Nor should one need to have it as their core competency. Writing is for everyone — even those who aren't particularly good at it.
So this is kinda the contrary to the "use Postgres for everything" vibe. Consider using Git as an actual database if you were considering using SQLite or simple text files, or consider actually wrapping those with Git, because it will give you quite a bit of extra oomph with very little effort.
I personally used git as a database on a project where we had just 200 records, with infrequent changes and some bulk updates, that needed (for weird reasons) to be synchronized every ~12 months, and yeah, some of the nodes were air-gapped. As far as I know, more than 15 years later this is still running with basically zero maintenance (other than git being upgraded with OS updates).
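Not that exact system, of course, but the shape of it is easy to sketch: one small JSON file per record, one commit per change, and a git bundle as the file you carry to the air-gapped nodes (repo path and layout are made up for illustration):

    import json, pathlib, subprocess

    REPO = pathlib.Path("records-repo")   # assumed to already be initialized with git init

    def put(record_id, data):
        # One record per file; sorted keys keep diffs stable.
        path = REPO / f"{record_id}.json"
        path.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n")
        subprocess.run(["git", "-C", REPO, "add", path.name], check=True)
        subprocess.run(["git", "-C", REPO, "commit", "-m",
                        f"update {record_id}"], check=True)

    def export_bundle(out="sync.bundle"):
        # The whole history as a single file, suitable for sneakernet sync.
        subprocess.run(["git", "-C", REPO, "bundle", "create", out, "--all"],
                       check=True)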
The fact is that basically every data structure can be abused as a database (python dicts, flat files, wave in mercury traveling down a long tube). Don’t reinvent the wheel, learn to use a power tool like Postgres.
I never did the next part, but yeah, I was hoping to write some agents running remotely to actively follow the repo and then rewrite history with better commit names. It was pre-AI-boom, but auto-summarizers & taggers running as their own agents were a fantasy ask, something the architecture felt like it would have been conducive to if/when that disaggregated work became possible.
Git as event sourcing is kind of the ground floor. Being able to decouple the processing is the loftier hope for centralizing around git, imo.
There are a lot of people yelling pretty loudly that "git would never work", "the performance wouldn't be enough", "write your own app-specific version control if you need it", but being in the world's best ecosystem, with the world's best tools for version-controlled things, feels like a no-brainer. There are so many fun distributed computing tasks folks get up to, so many ways of doing sync, but we developers already know a really good standard option, and the world of tools around it is enviable beyond imagining.
Performance might also have some big upsides, by virtue of processing being so seamlessly decentralizable here too. I do think the giant "huge tables of all intermixed customer data" monoliths / dataliths that most apps use are not a good fit and won't scale nicely. But if you can give up some coherency and take advantage of the scale-out ability, letting any machine or process loosely couple its way into the transaction-processing or analysis pipelines, it becomes more interesting and compelling. I've worked at a place where each customer had their own data tables, and BlueSky/atproto's Personal Data Store's one-SQLite-per-user also rather mirrors this architecture (lots of data stores, with separate systems doing any necessary aggregation into more typical big-table architectures): scaling out a lot of gits feels like the place where git would win.
Wandering off possible core architectural upsides briefly: the biggest likely challenge I see is figuring out how to save state nicely into files. To really take advantage of diffs and merges, we need to walk away from treating git like a blob store, and imo likely walk away from JSON stores too. Finer-grained updates, with the file system exposing the structure of the data, feel quasi-mandatory if you want to really let git help you store data: you have to come down to its level and leverage the filesystem to file your data.
json2dir, for example, nicely expands a single JSON file into a directory tree of simple files, which git is far more able to diff and track. https://github.com/alurm/json2dir https://news.ycombinator.com/item?id=44840307
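I haven't checked json2dir's exact on-disk format or CLI, but the idea is simple enough to sketch: explode one JSON document into one-scalar-per-file leaves, so git diffs and merges touch individual fields instead of one big blob:

    import json, pathlib

    def explode(value, path: pathlib.Path):
        # Objects and arrays become directories; scalars become tiny files.
        if isinstance(value, dict):
            path.mkdir(parents=True, exist_ok=True)
            for key, child in value.items():
                explode(child, path / key)
        elif isinstance(value, list):
            path.mkdir(parents=True, exist_ok=True)
            for i, child in enumerate(value):
                explode(child, path / str(i))
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(value) + "\n")

    # explode(json.loads(pathlib.Path("state.json").read_text()),
    #         pathlib.Path("state"))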
It was just awkward to use. Diffs were weird and hard to wrap a UI around, search wasn't great, it was hard to make ids sequential (the EE team hated uuids), etc., and conflict resolution usually still had to be done in a code editor. Even the killer app, the audit trail, didn't work quite the way the EE team wanted. Code to work around the disparities was probably half the codebase.
I ended up migrating to a normal database after a few months and everything was a lot better.
A recent example of this was using Temporal for state management of multi-page user-driven workflows. In happy path use cases it made the code incredibly simple. But when CS had to get involved for a user, or when there was a bug and we had to patch a bunch of invalid user states in the backend after fixing it, there's often a need for some kind of direct access to that state, to update it in ways that hadn't necessarily been planned for. In Temporal, much of the state is an implicit "function pointer", which is what makes the code so nice in the happy path, but that you can't touch if you need to. Yeah you could rewrite the Temporal logic in terms of a state machine and event loop so that the function pointer always ends up at the same spot, but then you're essentially writing a state machine anyway, except without a good way to do bulk updates, and just paying extra latency, cost, and system overhead.
So it ended up that a database was still the best option. A little more boilerplate in happy path cases, but a lot more flexible in cases where unexpected things happened.
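For contrast, a minimal sketch of that "plain database" shape (table and state names made up): workflow progress is an explicit, queryable column, so support staff or a backfill script can repair stuck users with ordinary SQL:

    import sqlite3

    TRANSITIONS = {                        # allowed edges of the signup flow
        "started": {"email_verified"},
        "email_verified": {"profile_done"},
        "profile_done": {"completed"},
    }

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE signup_flow (user_id TEXT PRIMARY KEY, state TEXT)")
    db.execute("INSERT INTO signup_flow VALUES ('u1', 'started')")

    def advance(user_id, new_state):
        (old,) = db.execute("SELECT state FROM signup_flow WHERE user_id = ?",
                            (user_id,)).fetchone()
        if new_state not in TRANSITIONS.get(old, set()):
            raise ValueError(f"illegal transition {old} -> {new_state}")
        db.execute("UPDATE signup_flow SET state = ? WHERE user_id = ?",
                   (new_state, user_id))

    advance("u1", "email_verified")
    # The escape hatch implicit workflow state makes awkward: a one-line
    # bulk repair of everyone stuck at a given step.
    db.execute("UPDATE signup_flow SET state = 'profile_done' "
               "WHERE state = 'email_verified'")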
More seriously, I agree that making a use case fit seems too much of a stretch... Yes, it's cool that git can be used in this fashion, as a neat experiment! But even for non-serious needs, I'm not sure I would ever do this. Still, very clever for even thinking of and doing this; kudos!
https://github.com/projekt-dgb/dgb-server/blob/master/API.md...
In the current system, rights, owners and debts of any land parcel in Germany are simply recorded in PDF files, with an ID for each parcel. So, when adding a "record", the govt employees literally just open the PDF file in a PDF editor and draw lines in the PDF, then save it again. Some PDF files are simply scanned pages of typewriter text (often even pre-WW2), so the lines are just added on top. It's a state-of-the-art "digital" workflow for our wonderful, modern country.
Anyway, so I wrote an entire system to digitize it properly (using libtesseract + a custom verifying tool, one PDF file => one JSON file) and track changes to parcels using git. The system was also able to generate Änderungsmitteilungen (change notices) using "git diff" and automatically send them via e-mail (or invoke a webhook to notify banks and other legal actors that a parcel had changed; currently this is done with paper letters).
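Not the actual dgb-server code (its API doc is linked above), but the change-notice flow described here reads roughly like this as a sketch, with a hypothetical webhook URL on the receiving side:

    import json, subprocess, urllib.request

    def change_notice(repo, parcel_file):
        # Diff the two most recent commits that touched this parcel's JSON.
        shas = subprocess.check_output(
            ["git", "-C", repo, "log", "-2", "--format=%H", "--", parcel_file],
            text=True).split()
        if len(shas) < 2:
            return None                    # first version, nothing to report
        return subprocess.check_output(
            ["git", "-C", repo, "diff", shas[1], shas[0], "--", parcel_file],
            text=True)

    def notify(webhook_url, parcel_file, diff_text):
        # POST the diff to a bank's (hypothetical) endpoint as JSON.
        body = json.dumps({"parcel": parcel_file, "diff": diff_text}).encode()
        req = urllib.request.Request(webhook_url, data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)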
It was a really cool system (desktop digitizing tool + web server for data management), but the German government didn't care, although some called it "quite impressive". So it's now just archived on GitHub. Their current problem with "digitization" is that every state uses different formats and extra fields, and they are still (after 20+ years) debating what the "standardized" database schema should be (I tried to solve that with an open, extensible JSON schema, but nah [insert guy flying out of window meme]). I'm a one-man show, not a multi-billion-dollar company, so I didn't have the "power" to change much.
Instead, their "digital Grundbuch" (dabag) project has been a "work in progress" for 20+ years: https://www.grundbuch.eu/nachrichten/ because 16 states cannot standardize on a unified DB schema. So it's back to PDF files. Why change a working system, I guess. Germans, this is what your taxes are spent on. Oh well, the project was still very cool.
See https://gerrit-review.googlesource.com/Documentation/note-db...