Adding OR Logic Forced Us to Confront Why Users Preferred Raw SQL
Posted 4 months ago · Active 4 months ago
signoz.io · Tech · story
calm · mixed
Debate: 80/100
Key topics
- Query Builders
- SQL
- ORMs
- Data Querying
The article discusses the challenges of building a query builder that can handle complex queries, and the discussion revolves around the trade-offs between using SQL, ORMs, and custom DSLs for data querying.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 4d
Peak period: 66 comments in 84-96h
Avg / period: 17.4
Comment distribution: 87 data points
Based on 87 loaded comments
Key moments
- 01Story posted
Sep 10, 2025 at 9:00 AM EDT
4 months ago
Step 01 - 02First comment
Sep 13, 2025 at 9:20 PM EDT
4d after posting
Step 02 - 03Peak activity
66 comments in 84-96h
Hottest window of the conversation
Step 03 - 04Latest activity
Sep 16, 2025 at 11:33 AM EDT
4 months ago
Step 04
ID: 45196991 · Type: story · Last synced: 11/20/2025, 5:48:27 PM
Not a fan of all the proxy-object circus ORMs do, but I'd leave row -> domain object mapping and filter building to some library. Sweet spot is probably something akin to Android Room / Micronaut Data JDBC.
And if you're testing, you've got to test every query combination anyways. It's not just syntax that can be wrong, but logic and performance.
String concatenation isn't really a major source of that. Just make sure your parentheses match, as you need to do no matter what, and include a space at the start and end of each string to make sure you don't accidentally smush terms together likethis.
Parameter management in some languages is unpleasant, like how JDBC only has positional arguments; and any time you do string concat in a language, you start getting in danger of malformed SQL.
Ultra basic ORMs, like Exposed (Kotlin), are well-tested frameworks that do exactly what I want. Want a single value in the =? Or want it to be an “in”? Or what if it’s null? Handled. No special string management. Want parameters? Handled.
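For anyone who hasn't used it, the shape is roughly this (table and helper names are made up, and the DSL details shift a bit between Exposed versions):

    import org.jetbrains.exposed.sql.*
    import org.jetbrains.exposed.sql.transactions.transaction

    object Users : Table("users") {
        val id = integer("id")
        val city = varchar("city", 64)
    }

    // One filter that adapts to its input: null -> IS NULL, one value -> '=', many -> IN (...).
    fun cityFilter(cities: List<String>?): Op<Boolean> = Op.build {
        when {
            cities == null -> Users.city.isNull()
            cities.size == 1 -> Users.city eq cities.single()
            else -> Users.city inList cities
        }
    }

    // Assumes Database.connect(...) has been called somewhere at startup.
    fun findUserIds(cities: List<String>?): List<Int> = transaction {
        Users.select(Users.id)              // Exposed 0.46+ style; older versions use slice()/select { }
            .where { cityFilter(cities) }
            .map { it[Users.id] }
    }

Either branch ends up as a parameterized query, so there's no string handling to get wrong.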
When I see pure ORM’d code, I can feel safe expecting it to be protected from injection and formatting issues. It’s reduced cognitive load and greater safety.
When I see raw string SQL management, I have to put another layer of care and attention to try and make sure (and maybe still fail) there’s no application-crashing mistakes in that part of code.
It’s kinda like working with typed and compiled code. Greater protection from error.
This is another reason why the ORM is a leaky abstraction - it hides all the best features from you.
I suspect it’s because people never learned to use them, but they did learn to use the ORM.
You might choose to have a set of views and functions that are versioned separately from your app, for the same reasons you might choose to set up a microservice. But as with microservices, it should be a deliberate decision that you make because you're getting something out of it that outweighs the complexity costs and version-skew headaches; it should not be the default way to do composition.
SQL can be stored in version control just as well as any other code. This can include application-level queries as well as the DDL SQL which defines the actual structure of your database.
It's sad that tooling for this kind of workflow doesn't seem to be particularly good across the board, Visual Studio being somewhat of an exception.
I think Postgres sort of supports this but it's really clunky, and also I think you'd have to run the function-creation code on every connection; I don't know whether that would create any performance problems.
What does Visual Studio do?
I don't really see a "problem" here. Having everything in the same repo is probably the easiest way to ensure that the client cannot go out of sync with the database. When making a release, create a Git tag and deploy both client and database from there.
But you must make sure you know which tag is deployed at each customer. We use naming conventions for that (the name of the customer is part of the Git tag name), or you could hold that mapping externally if necessary.
Once you have that, making a client-level hotfix for a specific customer is (relatively) easy - just branch from the customer's release tag, do the changes you need, run the tests, and there is high probability everything will work properly. Once you release the hotfix, you create another tag and remember that this tag is now installed at the customer, and so on...
If you make changes to the database, then of course you still need to have an upgrade procedure from one version of the database to another, but you can be confident that the client will never query the "wrong" database version. And since both old and new database structure are just SQL files under different Git tags, you can discover exactly what changed just by diffing.
> What does Visual Studio do?
Visual Studio has a special SQL project type, where you can keep your base table definitions, as well as all the SQL code (stored procedures, functions, views, user-defined types, indexes etc.). You can group the SQL project together with client projects, tools, automated tests and benchmarks in the same "solution" (kind of a project-of-projects). When working on the product, you load that solution, and you have all dependencies in one place, right in the Solution Explorer. This eases navigation and minimizes dependencies whose source code you cannot (easily) see.
After you make your changes, you deploy to the local database (we use the free SQL Server Developer Edition, other databases have their own free editions), run/debug tests, run/debug clients as needed. You can even start debugging a C# test or client application and step into a T-SQL stored procedure seamlessly, which can be a very powerful tool for ferreting out difficult bugs. When you are done with your changes, commit to Git, let the CI double-check the tests and make the build for the last-row-of-defense manual QA testing. Then deploy, associate the new Git tag with the customer, rinse-and-repeat...
Basically, we treat SQL like every other code, and manage SQL dependencies not fundamentally unlike any other dependencies.
The database is a separate component, and the biggest mistake nearly every developer makes is trying to make a single application own it. Let me tell you, from experience, the chances that this one single application will be the only thing that ever connects to your database past initial development is slim. "Oh, but we're doing microservices and everything only ever talks over HTTP or gRPC" - yeah, sure, and I don't have a backlog of tech debt a mile long.
ADO.NET has full support for table-valued parameters.
In any case, it's just one framework; previous comment said "all major languages". And it's useful to be able to abstract and compose over expressions and predicates and such, not just data values, which this still doesn't help with.
ADO.NET is available both in the legacy Windows-only .NET Framework and in the new cross-platform .NET (previously known as .NET Core).
> In any case, it's just one framework; previous comment said "all major languages".
Well, you are not implementing a piece of code in "all major languages" - you can pick the one that fits the problem best.
> And it's useful to be able to abstract and compose over expressions and predicates and such, not just data values, which this still doesn't help with.
You can do that via LINQ - there is even special query-like syntax built right into C# for that, which looks like this:
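(names are illustrative - assume an EF Core DbContext db with a Customers DbSet)

    // Query syntax; EF translates the expression tree into a parameterized SQL
    // statement, so the WHERE clause runs on the database server.
    var londonCustomers =
        from c in db.Customers
        where c.City == "London"
        orderby c.Name
        select new { c.Id, c.Name };

    var results = londonCustomers.ToList();   // SQL is sent only when you enumerate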
This does NOT load the entire table in memory just to filter on City. It actually transpiles to SQL which does the filtering on the server. But anything non-trivial is much better done in SQL proper, IMO. Most of the time, at least for OLTP, you'll be using static SQL - that is, you will not need to change the text of the SQL query, just the parameters. But dynamic SQL is a thing and can be very useful on occasion - which is string concatenation, with all the problems that might bring.
Even for simple SELECTs, I lost count of how many times I had to enable runtime DEBUG logging in my query builders to get a query that I can copy-paste into my DB client for debugging, data exploring, etc. I should be able to copy-paste my query from code and run it in `psql` with minimal changes.
Raw SQL is the way to go[0]. Period. You just need a thin wrapper on top of it that adds escaping, arguments, type-safety and other QoL.
[0] https://gajus.medium.com/stop-using-knex-js-and-earn-30-bf41...
ORMs are not query builders. The problem with ORMs is that they hide the query logic from you. It's not clear what's getting joined in, how it's getting joined in, or if it's actually 1 or N queries. The further you get from a straight query builder, too, the fewer SQL features you have access to, such as parameterized joins, CTEs, window functions, etc. Sometimes you can hack those into the ORM, but often you have to resort to string concat and build the parameterized query and arguments manually.
I've never used Exposed, but from what I can tell it's kind of a hybrid? The query builder parts look great, but I'd still be wary of the ORM parts.
I've never had a good experience debugging performance issues in ORM-generated queries. Maybe I haven't used the right ones, but the libraries I've used have gone out of their way to hide how the query is actually being executed and only have obtuse levers to tweak that execution. Sure you can see the raw logs of what queries the ORM executed, but you can't easily figure out why it's chosen a particular strategy. Or you can't easily make it stop using a pathological join ordering.
The key is to encapsulate most of the (possibly complex) CRUD logic in DB functions (for retrieval these would be table-valued functions) and access these from the application side as virtual tables.
I also have capable but easy to use filtering/sorting/paging operators and combinators in the ORM that are translated into SQL where/sort/limit clauses.
Because the heavy lifting is already done by the db functions (which you have full control of to use whatever SQL you need), the pattern is actually quite powerful but easy to use.
You can define virtual read-only tables, several virtual tables that access the same actual table in different ways, custom operators that transcend SQL, etc.
These types do not necessarily have to map to the underlying table types.
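Stripped of the ORM layer, the raw shape of the pattern is just this (names, columns and connection details are made up for illustration):

    import java.sql.DriverManager

    // The database exposes a set-returning function, e.g.
    //   CREATE FUNCTION active_orders(p_customer_id int)
    //     RETURNS TABLE(order_id int, total numeric) ...
    // and the application queries it exactly like a table, composing its own
    // filtering / sorting / paging on top of that virtual table.
    fun main() {
        DriverManager.getConnection("jdbc:postgresql://localhost/app", "app", "secret").use { conn ->
            conn.prepareStatement(
                "SELECT order_id, total FROM active_orders(?) ORDER BY total DESC LIMIT 20"
            ).use { ps ->
                ps.setInt(1, 42)
                ps.executeQuery().use { rs ->
                    while (rs.next()) println("${rs.getInt("order_id")} -> ${rs.getBigDecimal("total")}")
                }
            }
        }
    }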
They also seem quite restrictive compared to what raw SQL can do.
I use ORMs so that I can write the thing I want to do in a vaguely reasonable language, just like I manipulate XML data structures in code instead of writing XSLT.
It'd be neat if you could let more users just have SQL but there's so many ways for that to go terribly wrong nowadays, with all those nice juicy SQL features that so many engines now support.
This only works when your customers are of a reasonable size (e.g. small businesses or individuals) but could provide arbitrary analytics power.
It's also a safe target for AIs to write sql against, if you're into that sort of thing.
At a high level, we use DuckDB as an in-memory OLAP cache that’s invalidated via Postgres logical replication.
We compute a lot of derived data to offer a read-heavy workload as a product.
Possibly the most dubious choice I tried out recently was letting the front end just execute SQL directly against DuckDB with a thin serialization layer in the backend.
The best part about it is the hot-reload feedback loop. The backend doesn't have to rebuild to iterate quickly.
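The serialization layer really is thin - conceptually something like this (a sketch of the idea only, assuming the DuckDB JDBC driver org.duckdb:duckdb_jdbc; not our actual code):

    import java.sql.DriverManager

    // Run a client-supplied SQL string against an in-memory DuckDB and return
    // rows as plain maps, ready to serialize as JSON. In reality the DuckDB
    // instance is long-lived and kept in sync from Postgres logical replication;
    // here it's opened fresh just to show the query/serialize path.
    fun runReadOnlyQuery(sql: String): List<Map<String, Any?>> =
        DriverManager.getConnection("jdbc:duckdb:").use { conn ->   // "jdbc:duckdb:" = in-memory
            conn.createStatement().use { st ->
                st.executeQuery(sql).use { rs ->
                    val meta = rs.metaData
                    val cols = (1..meta.columnCount).map { meta.getColumnLabel(it) }
                    val rows = mutableListOf<Map<String, Any?>>()
                    while (rs.next()) rows += cols.associateWith { rs.getObject(it) }
                    rows
                }
            }
        }

    fun main() {
        println(runReadOnlyQuery("SELECT 42 AS answer, 'hello' AS greeting"))
    }

Locking it down to read-only statements and per-tenant data is the part you'd have to take seriously before exposing it to a front end.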
If you're afraid of users tanking performance, read replicas. As instantaneous as it gets, and no customer can tank others.
Or... depending on whether your database layout allows it, you might be able to achieve that with a per-tenant read-replica server and MySQL replication filters [1] or Postgres row filters [2].
A SQLite db is effectively the safest option because there is no way to bypass an export step... but it might also end up seriously corrupting your data (e.g. columns with native timezones) or lack features like Postgres's spatial stuff.
[1] https://dev.mysql.com/doc/refman/8.4/en/change-replication-f...
[2] https://www.postgresql.org/docs/current/logical-replication-...
(In case you're wondering, https://www.knime.com/)
You cannot use JQL to pull up anything but a list of issues. You cannot aggregate fields within those issues. Many fields aren't even an option. The built-in functions are garbage. Some are redundant. Some are named similarly but do wildly different things. You can't even use subqueries or regex. Is there documentation disproving me somewhere that I'm not aware of?
These are all huge problems with endless unresolved tickets on Atlassian's support pages that point out very good and very common use cases that people have desperately wanted for over a decade. The answer is always yet another plugin. Security and data integrity can't be the reason Jira search sucks since many of those plugins are buggy and have more access to the backend.
As far as I can tell "JQL" was a name devised by their marketing department to make it seem more powerful than it really is and otherwise prop up a nasty plugin ecosystem that nobody wants. I sometimes feel like my outlook inbox rules are more sophisticated and that's pretty sad.
That's right. JQL (Jira Query Language) is based on AQL (Atlassian Query Language), same as CQL (Confluence Query Language), and targets Lucene (IIRC), not SQL, and CQL and JQL are (I suppose!) translated to Lucene queries.
These query languages are a subset of what you can do with a Solr / Lucene query string (basically the same thing as Solr is a fork of Lucene; Lucene is what is behind Atlassian products' search features IIRC). If you squint a little, you can liken it to some limited SQL where clause without the joins and the sub queries, but the target is a search platform, not an SQL database.
AQL fields map to Lucene fields, and there are functions that are also (likely) translated to Lucene expressions or replaced with some stuff from the context (currentContent() for instance in CQL is replaced to the id of the current Confluence page, and parentContent() to the id of the parent of the current page - again, this is a guess).
I suppose they invented this family of query languages to provide something more high-level / user-friendly than raw Lucene query strings, or possibly didn't want to give full access to the Lucene search capabilities for some reason.
There are Maven packages provided by Atlassian to deal with AQL [1], but they are not open source (and I have not studied their code, disassembled or otherwise).
I have reimplemented an AQL parser [2] from scratch using their limited docs and guesses, and, with this, implemented a CQL to Solr query translator for XWiki, mapping Confluence features to corresponding XWiki features [3]. Because people are migrating from Confluence to XWiki and their content is sprinkled with CQL-based macros that we needed to make work.
If you are curious about how a CQL statement can be translated to a Solr query (although querying fields in XWiki, not Confluence), you can look at the tests [4].
I actually find AQL pretty elegant, although I'll admit I haven't used it much.
[1] https://mvnrepository.com/artifact/com.atlassian.querylang
[2] https://github.com/xwiki-contrib/cql/tree/master/aqlparser
[3] https://github.com/xwiki-contrib/cql/tree/master/query
[4] https://github.com/xwiki-contrib/cql/blob/master/query/src/t...
a bit tangential, but don't Solr (and Elasticsearch and probably others) all use Lucene "indexes" (as in, Lucene is a library like LevelDB)?
what's strange is that most JQL that people want would translate to the most boring Hibernate query (or whatever Jira uses for relational DB access)
The very complaints you are making are kind of my point; that you can even consider making them means they've put a lot of work into a search interface that few other search interfaces have. The vast majority of "search fields" I run into can't even do a single AND or OR; one does not get so far as to wish for those things because clearly we're miles away from them.
GitLab has GLQL[1]. ServiceNow has CMDB[2]. Both offer more flexible queries on any data the users can see and can return arbitrary tables. The dashboards, reports, APIs, etc. are actually useful because of that. On Jira, all you get are lists of issues and dashboard widgets that are barely more helpful than the list of issues they obscure.
This is expected to be all available out-of-the-box without plugins or other gotchas. Nobody should have to rewrite their scripts to run the same basic recurring jobs on another project or part of the company just because they're on a different instance and don't want to buy a plugin that is seldom used. The only reason anyone tracks projects with Jira is its familiarity. Hardly anyone uses the newer features that Atlassian has shifted focus towards. All anyone wants is the core UI, for that UI to not be so bloated and slow (and it's only getting worse), and a better JQL.
[1]: https://docs.gitlab.com/user/glql/
[2]: https://www.servicenow.com/products/servicenow-platform/conf...
Seems to be 3978 lines of Go at present.
This wasn't a minor limitation; it was a fundamental capability gap. Users were forced to learn ClickHouse SQL, write raw queries, and maintain them as our schemas evolved. We'd built a query builder that couldn't handle real-world queries.
What is it with the LinkedIn style?
No X
No Y
No Z
It isn't A, it's B
Heh. But seriously, all frontier models do it, it's in the top 3 of tells that even someone with zero LLM experience can spot.
Our articles are now being read by stakeholders beyond our ICP.
I agree that LinkedIn/ChatGPT style isn't the best route. We cringe on it too. We are experimenting to find a middle ground between what gets more reach, while not giving into the trending writing styles.
Can I please get some more feedback from you?
- would you prefer more technical details in this article?
- or just a change in the sentence structure that is more natural (like this response)?
- or both?
I would rather not read other people's slop. I could pass your article through an LLM myself, if I wanted that. Here's just one of the most tired snowclones that current LLMs love, everywhere in your content:
>This wasn't a minor limitation; it was a fundamental capability gap
>context-switch not just between data types, but between entirely different mental models of how to query data.
>This wasn't something we asked them to do. They discovered that the query builder could now handle their complex cases, and they preferred it over raw SQL.
>That's not just a technical achievement. That's validation that we finally understood the problem we were trying to solve.
It wasn't just a minor stylistic issue; it was a signal to close the page.
Overall I found it a decent piece, a few too many "<term>: <explanation>" blocks for my taste but better than what I can write - and than most of the tech-industry blogging I come across.
Everyone else managed to read it fine.
But you could survey more people as maybe it is just me!
yes! please stop making webpage backgrounds dark. It's a terrible design for ppl with astigmatism like me...
We are revamping the design. I'll ensure I understand more about this and make it more accessible.
Devs seem to prefer dark theme across the brand (eg Supabase, Linear). Hence, the current choice.
I think “SQL is the interface” even for telemetry is the thing that truly makes sense, but it is messy with logs compared to Splunk, for example.
Can't do a query builder in one step, or must write Apex.
> Currently, logs and traces live in separate worlds. You can see that a trace has an error, and you can see related logs, but you can't query them together.
I've looked into SigNoz a few times but still using Grafana. The former does look promising, and features like this would start to make the case for maybe switching.
This is a chronic problem in software. What I do instead is either omit the complexity or make it as visible as everything else. If it's not worth making it discoverable, it's not worth having. If you omit it, you get customer feedback about its importance instead of them struggling in silence.