ThalamusDB: Query Text, Tables, Images, and Audio
Posted 3 months ago · Active 3 months ago
github.com · Tech · story
Sentiment: calm, mixed
Debate: 40/100
Key topics
Database
AI
Multimodal Search
ThalamusDB is a new database that allows querying text, tables, images, and audio, sparking discussion on its design, functionality, and potential applications.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement
First comment: 3d after posting
Peak period: 9 comments in 84-96h
Avg / period: 4.5
Comment distribution: 18 data points
Based on 18 loaded comments
Key moments
- 01 Story posted: Oct 7, 2025 at 3:34 PM EDT (3 months ago)
- 02 First comment: Oct 10, 2025 at 8:14 PM EDT (3d after posting)
- 03 Peak activity: 9 comments in 84-96h, the hottest window of the conversation
- 04 Latest activity: Oct 12, 2025 at 5:57 AM EDT (3 months ago)
ID: 45507753 · Type: story · Last synced: 11/20/2025, 1:30:03 PM
A problem I have with LLMs and the way they are marketed is that they are being treated, and offered, as if they were toys.
You’ve given a few tantalizing details, but what I would really admire is a link to full details about exactly what you did to collect sufficient evidence that this system can be trusted and in what ways it can be trusted.
In general, when using LLMs, there are no formal guarantees on output quality anymore (but the same applies when using, e.g., human crowd workers for comparable tasks like image classification etc.).
Having said that, we did run experiments evaluating output accuracy for a prior version of ThalamusDB; the results are here: https://dl.acm.org/doi/pdf/10.1145/3654989. We will publish more results with the new version within the next few months as well. But, again, no formal guarantees.
But LLMs routinely make errors that if made by a human would cause us to believe that human is utterly incompetent, acting in bad faith, or dangerously delusional. So we should never just shrug and say nobody’s perfect. I have to be responsible for what my product does.
Thanks for the link!
What's the advantage of this over using llamaindex?
Although, even asking that question, I'll be honest: the last thing I used llamaindex for, it seemed like mostly everything had to be shoehorned in because using that library was a foregone conclusion, even though ChromaDB was doing just about all the work in the end; the built-in vector store that llamaindex ships has strangely bad performance at any scale.
I do like how simple the llamaindex DocumentStore (or whatever it's called) is: you can just point it at a directory. But it seems that when using a specific vector DB you often can't do that.
I guess the other thing people do is put everything in postgres. Do people use pgvector to store image embeddings?
It's less applicable if the answer cannot be extracted from a small data subset. E.g., you want to count the number of pictures showing red cars in your database (rather than retrieving a few pictures of red cars). Or, let's say you want to tag beach holiday pictures with all the people who appear in them. That's another scenario where you cannot easily work with RAG. ThalamusDB supports such scenarios, e.g., you could use the query below in ThalamusDB:
SELECT H.pic
FROM HolidayPictures H, ProfilePictures P
WHERE NLFILTER(H.pic, 'this is a picture of the beach')
AND NLJOIN(H.pic, P.pic, 'the same person appears in both pictures');
ThalamusDB handles scenarios where the LLM has to look at large data sets and uses a few techniques to make that more efficient. E.g., see here (https://arxiv.org/abs/2510.08489) for the implementation of the semantic join algorithm.
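To make the cost problem concrete, here is a minimal sketch of the naive baseline a semantic join has to beat: a nested loop that evaluates the expensive match predicate (in practice an LLM call) on every candidate pair. The function name and the stub predicate are illustrative assumptions, not ThalamusDB's API; the algorithm in the linked paper is precisely about avoiding most of these evaluations.

```python
from typing import Callable, Iterable


def semantic_join(
    left: Iterable[str],
    right: Iterable[str],
    match: Callable[[str, str], bool],
) -> list[tuple[str, str]]:
    """Naive nested-loop semantic join: calls the (expensive)
    match predicate once per candidate pair, i.e. O(|left| * |right|)
    LLM invocations in the worst case."""
    return [(l, r) for l in left for r in right if match(l, r)]


# Stub predicate standing in for an LLM call like NLJOIN(...)
pairs = semantic_join(
    ["beach1.jpg", "beach2.jpg"],
    ["alice.jpg", "bob.jpg"],
    match=lambda l, r: (l, r) == ("beach1.jpg", "alice.jpg"),
)
print(pairs)  # [('beach1.jpg', 'alice.jpg')]
```

With real LLM calls, the quadratic number of predicate evaluations is exactly what makes techniques like the one in the paper worthwhile.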
A few other things to consider:
1) ThalamusDB supports SQL with semantic operators. Lay users may prefer the natural language query interfaces offered by other frameworks. But people who are familiar with SQL might prefer writing SQL-style queries for maximum precision.
2) ThalamusDB offers various ways to restrict the per-query processing overheads, e.g., time and token limits. If the limit is reached, it actually returns a partial result (e.g., lower and upper bounds for query aggregates, subsets of result rows ...). Other frameworks do not return anything useful if query processing is interrupted before it's complete.
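The bounds idea for interrupted aggregates can be illustrated with a tiny sketch (this is my own illustration of the general principle, not ThalamusDB's code): for a COUNT under a semantic filter, rows confirmed so far give the lower bound, and every row not yet evaluated might still match, which gives the upper bound.

```python
def count_bounds(total: int, evaluated: int, matched: int) -> tuple[int, int]:
    """Bounds on a filtered COUNT(*) when processing stops after
    `evaluated` of `total` rows, with `matched` confirmed hits."""
    unevaluated = total - evaluated
    lower = matched                 # only confirmed matches count
    upper = matched + unevaluated   # pessimistic: every remaining row matches
    return lower, upper


# 1,000 images, 300 checked, 42 matched before a token limit was hit
print(count_bounds(1000, 300, 42))  # (42, 742)
```

As processing continues, the two bounds converge; once every row is evaluated they coincide with the exact answer.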