Against SQL (2021)
Posted 2 months ago · Active 2 months ago
scattered-thoughts.net · Tech story
Key topics
SQL
Database Query Languages
Data Management
The article 'Against SQL' critiques the SQL language, listing its shortcomings and proposing traits for a potential replacement, sparking a discussion on the merits and limitations of SQL.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 2h after posting
- Peak period: 50 comments (0-12h)
- Avg / period: 9.7
- Comment distribution: 68 data points
Based on 68 loaded comments
Key moments
1. Story posted: Oct 25, 2025 at 11:00 AM EDT (2 months ago)
2. First comment: Oct 25, 2025 at 12:50 PM EDT (2h after posting)
3. Peak activity: 50 comments in 0-12h (hottest window of the conversation)
4. Latest activity: Oct 31, 2025 at 12:24 PM EDT (2 months ago)
ID: 45704419 · Type: story · Last synced: 11/20/2025, 5:57:30 PM
Everything may have been true at the time of writing, but details may be obsolete. For example, this article refers to Neo4j. Knowing the article is four years old helps me understand that such references may not be current.
The landscape can change quickly; the older an article, the more one takes that into account. Given that this article promotes an alternative technique, knowing the article is old lets me wonder whether any of the suggestions have gelled, and if so, with what success.
In this case, since SQL has been around since the 70s, it's not surprising that the complaints are not novel, and are all likely to be true for years to come. SQL has truly enormous inertia on its side though.
On one hand, SQL is the query language of the most established relational databases, and I'm not sure what might drastically change about that.
Python and JavaScript are from the '90s and have each evolved as languages in their own way, just as SQL and others have.
I was asking about the year, as an individual comment here, to understand what significance the year bore relative to the content of the topic.
Having to think about it per-topic is just making it more complicated for no good reason. Especially since SQL does get new additions.
People post comments like that as a reminder because the title originally didn't include the year; someone edited it in after the comment was made.
Against SQL (2021) - https://news.ycombinator.com/item?id=43777515 - April 2025 (1 comment)
Against SQL - https://news.ycombinator.com/item?id=40454627 - May 2024 (1 comment)
Against SQL (2021) - https://news.ycombinator.com/item?id=39777515 - March 2024 (1 comment)
Against SQL - https://news.ycombinator.com/item?id=27791539 - July 2021 (339 comments)
* a list of things they don't like in sql
* a list of traits they think a replacement should exhibit by negating the first list
I was kind of hoping for some examples of what this much better language should look like.
It's not hyper-performant or mega web scale, but the object database and Prolog-like query language that come with PicoLisp are quite fun and sometimes rather useful, and they have helped me think differently about how to model things in the default SQL database engines.
Pilog is similar to both; the basics are fairly easy to learn. It's essentially a bit of Lisp-ish syntax and keywords for "give me a subset from these sets", but over a graph of objects instead of tables or atoms.
The closest existing database to this ideal is probably FoundationDB although it also externalizes the query planner, which I don't necessarily consider a downside.
I don't love SQL, but somehow the alternatives haven't beaten it yet
A few top-line items:
There is nothing too ground-breaking about it; it just streamlines some logic into a more holistic experience. [0]
[0] https://prql-lang.org/
Postgres already has this.
SQL isn't for everything.
Neither is starting with NoSQL thinking it might be better, then proceeding to spend way too many man-years making it a relational database, when learning a bit of SQL would have handled it fine.
> The relational model is great ... but SQL is the only widely-used implementation of the relational model ...
I'm not too familiar with GraphQL, but on the surface it seems like another bad idea. Shouldn't you always have some proper API abstraction between your components? My sense has been that GraphQL was invented out of the frustration of frontend teams needing to rely on backend teams to add or change APIs. But the answer can't be to have no APIs?
All that said there might be some situations where your goal is to query raw/tabular data from the client. If that's your application then APIs that enable that can make sense. But most applications are not that.
EDIT: FWIW I do think SQL is pretty good at the job it is designed to do. Trying to replace it seems hard and with unclear value.
GraphQL was supposed to help front-end and back-end meet in the middle by letting front-end write specific queries to satisfy specific UX while back-end could still constrain and optimize performance. Front-end could do their work without having to coordinate with back-end, and back-end could focus on more important things than adding fields to some JSON output.
I think it's important to keep this context in mind to appreciate what problem GraphQL is solving.
This is also the motivation that would lead me to advocate for adopting GraphQL for a product. Moreso than a technical decision, it is an organizational decision regarding resource trade-offs, and where the highest iteration or code churn is expected to be located.
As I was saying, there might be some situations where that's the right thing, but in general it seems you want a well-controlled layer there that specifies the contract between these pieces.
I was not intending to dodge your questions, but nor was I trying to comprehensively answer them, because they felt a bit unclear. I will make an attempt, combining snippets within your two posts that seem to be related:
>Shouldn't you always have some proper API abstraction between your components?
>But those endpoints are abstractions. Don't we want control over the surface of the API and our abstractions?
I can't answer this unless I know what concepts/layers you are referring to when you say "abstraction between components". If you mean "between the client and server", then yes, and GraphQL does this by way of the schema, types, and resolvers that the server supports, along with the query language itself. The execution is still occurring on the server, and the server still chooses what to implement and support.
If by "abstraction between components" you mean "URL endpoints and HTTP methods" then no, GraphQL chose to not have the abstraction be defined by the URL endpoint. If you use GraphQL, you do so having accepted that the decision point where resources are named is not at the URL or routing level. That doesn't make it not an abstraction, or not "proper" in some way.
>But the answer can't be have no APIs?
I don't understand what you mean by "No APIs"? You also mention "control over the surface"...
Is your concern that, because the client can ask the server "Please only respond with this subset of nodes, edges and properties: _______", the server has "no API"? Or it doesn't have "control"? I assure you that you can implement a server with whatever controls you desire. That doesn't mean it will always be easy, or be organized the way you are used to, or have the same performance profile you are used to, but the server can still implement whatever behavior it wants.
>...in general it seems you want to have a well-controlled layer there that specifies the contract between these pieces.
I think this wording brings me closer to understanding your main concern.
First, let me repeat: I am not a big GraphQL fan, and am only explaining my understanding after implementing it on both clients and servers. I am not attempting to convince you this is good, only to explain a GraphQL approach to these matters.
The "well-controlled layer" is the edge between nodes, implemented as resolvers. This was the "aha" moment for me in implementing GraphQL the first time: edges are a first-class concept, not just the nodes/entities. If you try using GraphQL in a small project whose domain model has lots of "ifs" and "buts", you will be forced to reach for that layer of control, and get a sense of it. It is simply located in a different place than you are used to.
This "edges are first-class concepts" has an analogue in proper hypermedia REST APIs, but most organizations don't implement REST that way, so except for the five people who fully implement true HATEOAS, it is mostly beside the point.
Perhaps unfettered write access has its problems, and GQL has permissions that handle this issue plenty gracefully, but I don’t see why your data model should be obfuscated from your clients which rely on that data.
Let's say your software is HR software and you can add and remove employees. The abstraction is "Add an employee with these details". The data model should be completely independent of the abstraction. I.e. nobody should care how the model is implemented (even if in practice it's maybe some relational model that's more or less standard). Similarly for querying employees. Queries should not be generic, they should be driven by your application use cases, and presumably the underlying implementation and data model is optimized for those as well.
But I get that GQL can be that thing in a more generic, schema-driven way. It still feels like a layer where you can inadvertently create the wrong contract, especially if, as I think is the case, different teams control the schema and the underlying models/implementation. So what it seems to save teams/developers is the need to spell out the exact requirements/implementation details of the API. But don't you want to do that?
How do people end up using GQL in practice? What is the layer below GQL? Is it actually a SQL database?
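For concreteness, the use-case-driven abstraction described upthread could be sketched like this (all names are hypothetical, using Python's sqlite3 purely for illustration):

```python
import sqlite3

# Hypothetical HR example: clients call named operations and never see the
# underlying relational model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, title TEXT)")

def add_employee(name, title):
    """The abstraction: 'add an employee with these details'."""
    cur = conn.execute(
        "INSERT INTO employees (name, title) VALUES (?, ?)", (name, title)
    )
    conn.commit()
    return cur.lastrowid

def find_employees_by_title(title):
    """A query shaped by an application use case, not a generic query surface."""
    rows = conn.execute(
        "SELECT id, name FROM employees WHERE title = ?", (title,)
    ).fetchall()
    return [{"id": r[0], "name": r[1]} for r in rows]

emp_id = add_employee("Ada", "Engineer")
print(find_employees_by_title("Engineer"))  # [{'id': 1, 'name': 'Ada'}]
```

The point of the sketch: callers depend on `add_employee`, not on the `employees` table, so the data model can change without breaking clients.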
Taking an HR example, you could query for an employee, their PTO status and accrual history, their manager, and their reports all in one nice easy query that no one has to write any business logic for, just a schema set up with employees, manager, reports, and PTO tables joined on ID keys.
And in such a case, what abstraction does the backend team need to put in front of the schema? I can't see the motivation for one myself. A well-designed DB schema is truly a beautiful contract, and with table and column comments you can even get IntelliSense docs in the IDE for the frontend team building the client.
On the flip side, I agree the write operation should be done through an API when there is complexity and there are requirements beyond just writing one row to one table, but read operations are much more graceful and speedier to define in GQL than in REST.
It's no different ideologically from gRPC, OpenAPI, or OData -- except for the ability to select subsets of fields, which not all of those provide.
Just a type-documented API that the server allows clients to introspect and ask for a listing of operations + schema types.
GQL resolvers are the same code that you'd find behind endpoint handlers for REST "POST /users/1", etc
When it comes to written English, perhaps that could do with some reforms just as with SQL. Yet the way we write remains mostly unchanged.
IME, the majority of responses sent to the client are tabular data hammered into a JSON tree.
If you generalise all your responses to tabular data, that lets you return scalar values (a table of exactly one row and one column), arrays (a table of exactly one row with multiple columns) or actual tables (a table of multiple rows with multiple columns).
The problem comes in when some of the values within those cells are trees themselves, but I suspect that can be solved by having a response contain multiple tables, with pointer-chasing on the client side reconstructing the trees within cells using the other tables in the response.
That would still leave the 1% of responses that actually are trees, though.
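The multi-table response with client-side pointer chasing could be sketched like this (a toy encoding invented for illustration, not any real protocol):

```python
# Toy sketch: a response carries several flat tables; cells may hold
# pointers of the form "ref:<table>:<id>" that the client chases to
# reconstruct the tree.
response = {
    "orders": [
        {"id": 1, "total": 30, "customer": "ref:customers:7"},
    ],
    "customers": [
        {"id": 7, "name": "Ada"},
    ],
}

def resolve(value, tables):
    """Recursively replace pointer cells with the rows they reference."""
    if isinstance(value, str) and value.startswith("ref:"):
        _, table, row_id = value.split(":")
        row = next(r for r in tables[table] if r["id"] == int(row_id))
        return {k: resolve(v, tables) for k, v in row.items()}
    return value

orders = [
    {k: resolve(v, response) for k, v in row.items()}
    for row in response["orders"]
]
print(orders)  # [{'id': 1, 'total': 30, 'customer': {'id': 7, 'name': 'Ada'}}]
```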
Generally, generating the JSON response for consumption directly in the DB is faster.
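As a sketch of what "generating the JSON in the DB" looks like, here is SQLite's JSON1 flavor (assuming JSON1 is compiled in, as it is in most modern builds; Postgres has analogues such as json_agg and json_build_object; table and column names are made up):

```python
import sqlite3
import json

# Build the JSON payload inside the database rather than in application code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT, value TEXT)")
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [("env", "prod"), ("tier", "web")])

# json_group_array aggregates per-row json_object values into one JSON array.
(payload,) = conn.execute(
    "SELECT json_group_array(json_object('name', name, 'value', value)) FROM tags"
).fetchone()

print(json.loads(payload))
```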
Think use cases like: allowing users to write configuration rules, or lists of custom tag <-> value pairs for whatever, things of that sort.
For instance, analytics use cases favor SQL stores, as slicing and dicing is better done with row or column stores than with document databases.
Also, Postgres is getting more popular for a lot of use cases, so SQL is here to stay.
That's not my impression. A decision maker today should typically make the decision to use SQL. I'm pretty sure the author would agree with that.
I think the target audience is language designers and tool builders. The author is urging people to envision and build new better interfaces to interact with relational data.
Newer platforms like Arrow ADBC/FlightSQL are better-suited to high-volume, OLAP style data queries we're seeing become commonplace today but the ecosystem and adoption haven't caught up.
https://arrow.apache.org/adbc/current/index.html
https://arrow.apache.org/docs/format/FlightSql.html
(And on top of that they need to clearly perceive the value of Strange New Thing, and clearly perceive the relative lack of value of the thing they have been emotionally invested in for decades...)
Standard SQL is not helpful, though. If that (failed) experiment was ended, database implementations would have even more freedom to explore superior syntax. Prescriptive language standards are a mistake.
The stuff that is more painful is building any kind of interesting application on top of a database. For example, as far as I know, it's very hard to "type check" a query (to get the "type" returned by a given query). It's also hard to efficiently compose SQL. And as far as I know, there's no standard, bulletproof way to escape SQL ("named parameters" is fine when you need to escape parameters, but most of SQL isn't parameters). There's also no good way to express sum types (a "place" can be a "park" or a "restaurant" or a "library", and each of those have different associated data--I don't need a "has_cycling_trails" boolean column for a restaurant, but I do for a park). There are various workarounds, all deeply unsatisfying.
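For concreteness, here is one of those unsatisfying sum-type workarounds, sketched with SQLite: a single table with a discriminator column plus CHECK constraints so variant-specific columns can only be set for the right variant (columns follow the park/restaurant example above; names otherwise made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE places (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('park', 'restaurant', 'library')),
        has_cycling_trails INTEGER,  -- meaningful only for parks
        cuisine TEXT,                -- meaningful only for restaurants
        CHECK (has_cycling_trails IS NULL OR kind = 'park'),
        CHECK (cuisine IS NULL OR kind = 'restaurant')
    )
""")

conn.execute("INSERT INTO places (kind, has_cycling_trails) VALUES ('park', 1)")

try:
    # A restaurant with cycling trails violates the CHECK and is rejected.
    conn.execute(
        "INSERT INTO places (kind, has_cycling_trails) VALUES ('restaurant', 1)"
    )
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

It works, but every new variant widens the table with more nullable columns, which is exactly the dissatisfaction described above.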
I’ve written basic custom report-writer functionality using this technique that lets users (usually me, the developer, or a super user) do custom sanitised SQL selects.
I assume similar functionality exists in all the different vendors databases.
> I’ve written basic custom report-writer functionality using this technique that lets users (usually me, the developer, or a super user) do custom sanitised SQL selects.
I’m not sure how having the column metadata helps you sanitize SQL.
Columns are added, removed, pivoted, summed etc. by the user running the report. This can’t be static, but the OP was mentioning how you can’t get column metadata easily.
By sanitised SQL I mean the query is fed to the MSSQL parser and only SELECT and UNION ALL are allowed as far as queries go (e.g. no deletes, drops, updates, etc.).
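The same "SELECT-only sandbox" idea can be sketched on SQLite using Python's authorizer hook; this is an analogous mechanism, not the MSSQL parser approach described above, and the table is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (region TEXT, sales INTEGER)")
conn.execute("INSERT INTO report VALUES ('EU', 100)")

# Allow only the actions needed to evaluate a SELECT; deny everything else.
ALLOWED = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}

def read_only(action, *args):
    return sqlite3.SQLITE_OK if action in ALLOWED else sqlite3.SQLITE_DENY

conn.set_authorizer(read_only)

print(conn.execute("SELECT region, sales FROM report").fetchall())
# [('EU', 100)]

try:
    conn.execute("DELETE FROM report")  # rejected by the authorizer
except sqlite3.DatabaseError as e:
    print("blocked:", e)
```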
Statically refers to taking some metadata about the database as well as the query and being able to anticipate the shape of the output columns without needing to run a query against a Postgres database. I'm not sure what you mean about "in the context of a report writer"--I don't think that's the use case I was describing in my original comment.
> Columns are added, removed, pivoted, summed etc. by the user running the report. This can’t be static, but the OP was mentioning how you can’t get column metadata easily.
I think I am the OP, right? And I think the report-writer stuff is confusing this conversation.
> By sanitised SQL I mean the query is fed to the MSSQL parser and only SELECT and UNION ALL are allowed as far as queries go (e.g. no deletes, drops, updates, etc.).
Enforcing read-only is helpful, but it's insufficient. Firstly, reads aren't always safe (e.g., reading confidential data) and secondly we also want the ability to safely generate queries that write or delete data.
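One workaround for anticipating a query's output shape without touching real data: execute it against an empty clone of the schema and inspect cursor.description (a sketch with a made-up schema; it recovers column names, not full types):

```python
import sqlite3

schema = "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)"

probe = sqlite3.connect(":memory:")
probe.executescript(schema)  # empty tables, so running queries is side-effect free

query = "SELECT name, salary * 12 AS annual FROM employees"
cols = [d[0] for d in probe.execute(query).description]
print(cols)  # ['name', 'annual']
```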
The real problem is not that "it is good enough"; it's that SQL is still better than many of the newer proposals.
I mean, sure, if newcomer tech $BAR was slightly better than existing tech $FOO, then maybe $FOO might be eventually replaced. What we are seeing is that the newcomers are simply not better than the existing $FOO.
Specifically: the connector bits that deal w/ translating Relational Algebra IR expressed as GraphQL nodes -> SQL engine-specific code.
The author's comments about lack of standardization and portability might not get across just how nightmarishly different SQL dialects are.
I might put together a list of some of the batshit-insane bugs we've run into, even between version upgrades of the same engine.
I really think folks would raise an eyebrow if they understood just how much variance exists between implementations in what might be considered "common" functionality, and the sorts of contortions you have to do to get proper shims/polyfills/emulations.
Worrying about whether your data query language works across multiple vendors' DBs is not a concern I've ever had to consider, imho.
"Because SQL is so inexpressive, incompressible and non-porous it was never able to develop a library ecosystem. Instead, any new functionality that is regularly needed is added to the spec, often with it's own custom syntax. So if you develop a new SQL implementation you must also implement the entire ecosystem from scratch too because users can't implement it themselves.
This results in an enormous language."
The article simultaneously complains that the SQL standard is not universally implemented (fair) and that SQL is not easily extensible (also fair). But taken together it seems odd to me in that if you make SQL very extensible, then not only will it vary between databases, it will vary between every single application.
Also, the line between SQL and database feels a little fuzzy to me, but don’t a lot of postgresql extensions effectively add new functionality to SQL?
"In modern programming languages, the language itself consists of a small number of carefully chosen primitives. Programmers combine these to build up the rest of the functionality, which can be shared in the form of libraries. This lowers the burden on the language designers to foresee every possible need and allows new implementations to reuse existing functionality. Eg if you implement a new javascript interpreter, you get the whole javascript ecosystem for free."
You have languages like JavaScript which are very “expressive” in that it comes with very little functionality but there are a wealth of libraries you can use to augment this. And this tradeoff is often lamented on HN since it’s never enough to just know JS; you have to know the particular libraries being used by the project.
Contrast that with batteries included languages like python or go.
And like I said above, Postgres extensions add features to the language, usually without any syntax changes (just new functions or operators). Isn’t this like a “library” in another language?
...Or is that the joke?
Feel free to innovate and bring forth other RDBMS/Data query languages and tools, perhaps something may succeed and stick as long as SQL has.
Cheers
Even though SQL has flaws, maybe a lot of them, it has one big upside: it's so easy to onboard people onto it. In the data ecosystem (warehousing etc.) this means we can do much more, faster than before, and hire less technical people, which is great.
9 more comments available on Hacker News