Supporting Our AI Overlords: Redesigning Data Systems to Be Agent-First
Posted 4 months ago · Active 3 months ago
Source: arxiv.org
Key topics: AI, Data Systems, Agent-First Design
A research paper proposes redesigning data systems to be 'agent-first' to support the growing use of AI systems, sparking discussion on the implications and potential solutions.
Snapshot generated from the HN discussion
Discussion activity: moderate engagement · first comment after 2h · peak of 8 comments in the 8-10h window · average 2.5 comments per period · 20 comments loaded
Key moments
- Story posted: Sep 19, 2025 at 11:41 PM EDT (4 months ago)
- First comment: Sep 20, 2025 at 1:53 AM EDT (2h after posting)
- Peak activity: 8 comments in the 8-10h window
- Latest activity: Sep 21, 2025 at 2:21 AM EDT (3 months ago)
I thought AI could do anything, so why do I have to help it if it's so smart and powerful and intelligent and useful? Is it really just a complex computer program that has actually been trained to do very narrowly defined activities?
The abstract blurb (linked) doesn't mention AI overlords in either context, so I think it's mostly just an edgy title.
Would be a shame to invest all those billions of dollars and resources to get unreliable mediocre results.
A few decades ago, people visited each other's machines directly over the IP protocol; it was people themselves who collected news, read information, and published new data.
After that, browsers visited each site over the HTTP protocol; it was browsers that collected data, rendered pages, and interacted with the user.
Nowadays it seems quite possible that AI will work its way into our daily lives and the pattern will rewrite itself again: AIs will request each <what> using a <new> protocol, and it will be AIs that <do a lot of things> and interact with the user.
Information never becomes unavailable, but the main method of retrieving it does change. We could of course have kept driving command-line utilities after browsers became popular, and we can of course keep searching and clicking around in a browser instead of using AI-enhanced search now that AI is hot. But the trend is that AI will carry us into a new stage of evolution in this fast-paced information era.
Users are the ones who sit behind the screen; they never change, but their methods/agents/proxies change over time.
Next step is enforced bias.
https://fortune.com/2025/07/08/elon-musk-grok-ai-conservativ...
https://www.tomshardware.com/tech-industry/artificial-intell...
Overlord indeed, but not interesting. The West isn't interesting, or interested; it is biased toward claiming status from the get-go.
How should we rethink query interfaces, query processing techniques, and long- and short-term data stores to handle the far greater volume of agentic queries we will likely see in the coming years, whether we want it or not, as people and organizations adopt AI systems for more and more tasks?
The authors study the characteristics they identify in agentic queries (scale, heterogeneity, redundancy, and steerability) and outline several new research opportunities for an agent-first data systems architecture, ranging from new query interfaces to new query processing techniques to new agentic memory stores.
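As a toy illustration of the redundancy point (my sketch, not anything from the paper): swarms of agents tend to re-issue near-identical queries, so an agent-first system could short-circuit repeats with an answer cache keyed on a normalized form of the query.

```python
# Toy sketch (not from the paper): deduplicate redundant agentic queries
# with an answer cache keyed on a normalized form of the query text.
class QueryCache:
    def __init__(self, execute):
        self._execute = execute               # the expensive underlying query path
        self._answers: dict[str, object] = {}

    @staticmethod
    def _normalize(query: str) -> str:
        # Crude textual normalization; a real system might cluster by embedding.
        return " ".join(query.lower().split())

    def run(self, query: str):
        key = self._normalize(query)
        if key not in self._answers:          # only hit the data system on a miss
            self._answers[key] = self._execute(query)
        return self._answers[key]

# Usage (hypothetical backend): cache = QueryCache(db.run_sql); cache.run("SELECT ...")
```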
That could hinder other, maybe better, approaches.
If people and organizations don't do that, the research evidently becomes pointless.
I think the real problem here is answering the question, and there's currently no way to intelligently get information out of the internet. (I assume Google is building one, but it apparently hasn't shipped yet, and even if it had, it's not what OpenAI would use.)
Hammering every WP site with infinite queries every time someone asks a question seems like the wrong solution to the problem. I'm not sure what the right solution looks like.
I got an 80% solution in maybe ten lines of Python by doing "just Google it, then look at the top 10 search results" (i.e., dump them into GPT). That works surprisingly well, although the top-n results are increasingly AI-generated.
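Something like this, presumably (my reconstruction, not the commenter's actual code; `search_web` is a stub for whatever search API you have access to, and it assumes an OpenAI API key in the environment):

```python
import requests
from openai import OpenAI

def search_web(query: str, n: int = 10) -> list[str]:
    """Stub: return the top-n result URLs from whatever search API you use."""
    raise NotImplementedError

def fetch_text(url: str, limit: int = 5000) -> str:
    # Crude: dumps raw HTML, truncated; real code would strip markup.
    return requests.get(url, timeout=10).text[:limit]

def answer(question: str) -> str:
    pages = [fetch_text(u) for u in search_web(question)]
    prompt = question + "\n\nSearch results:\n" + "\n---\n".join(pages)
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```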
I had a funny experience when Bard first came out (the original name for Gemini). I asked it a question, it gave me the precise opposite of the truth (the truth but negated). It even cited sources. The sources were both AI blogspam. That still makes me laugh.
Can you source that claim? It sounds absolutely ridiculous and costly/wasteful. It would be nigh impossible to ingest 1000s of webpages into a single chat.
https://news.ycombinator.com/item?id=42726827
However, upon further investigation, this is a special case triggered by a security researcher, and not the normal mode of operation.
Someone else here recommended sites maintaining their own CLAUDE.md file. Good idea, but too vendor-specific. Ten months ago someone online recommended the name llms.txt for a generic markdown file intended for agent use, and I added one: https://markwatson.com/llms.txt. I stopped collecting web page visit statistics, so I have no idea how often that file is discovered.
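For reference, a minimal llms.txt in the spirit of the llmstxt.org proposal looks something like this (the proposal suggests an H1 site name, a one-line blockquote summary, and H2 sections of markdown links; the entries below are invented):

```markdown
# Example Site

> One-sentence summary of what this site is about.

## Docs

- [Getting started](https://example.com/start.md): quick orientation
- [API reference](https://example.com/api.md): full endpoint list
```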
At a high level, 90% of the complexity of their data retrieval system could be deleted by simply attaching to every data store a `CLAUDE.md` file that is automatically kept up to date and that the agents can read.
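As a sketch of how cheap "automatically kept up to date" could be (my example, run from a migration hook or cron; SQLite stands in for any data store): introspect the store's schema and rewrite the `CLAUDE.md` on every change.

```python
import sqlite3

def write_claude_md(db_path: str, out_path: str = "CLAUDE.md") -> None:
    """Regenerate a CLAUDE.md describing the tables in a SQLite store."""
    conn = sqlite3.connect(db_path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = ["# Data store guide", "", "Tables and their columns:", ""]
    for table in tables:
        # table names come from sqlite_master, so interpolation here is safe
        cols = [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]
        lines.append(f"- `{table}`: {', '.join(cols)}")
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    conn.close()
```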
High-throughput queries from agents don't feel much different from the high-throughput querying that large-scale systems like Instagram and YouTube have to service every day. Whatever works for 10M active users per second on IG would also work for 50 agents making 1M queries per second.
I can still see a need for innovation in data stores, though. My little startup probably can't afford the same AWS bill as Meta, but that tide would lift all boats, not just AI-specific use cases.