Supporting Our AI Overlords: Redesigning Data Systems to Be Agent-First
Posted 4 months ago · Active 3 months ago
Source: arxiv.org
Key topics: AI, Data Systems, Agent-First Design
A research paper proposes redesigning data systems to be 'agent-first' to support the growing use of AI systems, sparking discussion on the implications and potential solutions.
Snapshot generated from the HN discussion
Discussion activity: moderate engagement · first comment after 2h · peak of 8 comments in the 8-10h window · average 2.5 comments per period · 20 comments loaded
Key moments
- Story posted: Sep 19, 2025 at 11:41 PM EDT (4 months ago)
- First comment: Sep 20, 2025 at 1:53 AM EDT (2h after posting)
- Peak activity: 8 comments in the 8-10h window
- Latest activity: Sep 21, 2025 at 2:21 AM EDT (3 months ago)
I thought AI could do anything, so why do I have to help it if it's so smart and powerful and intelligent and useful? Is it really just a complex computer program that has actually been trained to do very narrowly defined activities?
The abstract blurb (linked) doesn't mention AI overlords in either context, so I think it's mostly just an edgy title.
Would be a shame to invest all those billions of dollars and resources to get unreliable mediocre results.
A few decades ago, people visited each other's machines directly over the IP protocol; it was people themselves who collected news, read information, and published new data.
After that, browsers visited each site over the HTTP protocol; it was browsers that collected data, rendered pages, and interacted with the user.
Nowadays it seems quite possible that AI will work its way into our daily lives and the pattern will rewrite itself again: AIs will request each <what> using a <new> protocol, and it will be AIs that <do a lot of things> and interact with the user.
Information never becomes unavailable, but the main method of retrieving it does change. We could of course have kept driving command-line utilities after browsers became popular, and we can of course keep searching and clicking around in a browser instead of using AI-enhanced search now that AI is hot. But the trend is that AI will carry us into a new stage of evolution in this fast-paced information era.
Users are the ones who sit behind the screen; they never change, but their methods/agents/proxies change over time.
Next step is enforced bias.
https://fortune.com/2025/07/08/elon-musk-grok-ai-conservativ...
https://www.tomshardware.com/tech-industry/artificial-intell...
Overlord indeed, but not interesting. The West isn't interesting, or interested; it is biased toward claiming status from the get-go.
How should we rethink query interfaces, query processing techniques, and long- and short-term data stores to handle the far greater volume of agentic queries we will likely see in the coming years, whether we want it or not, as people and organizations adopt AI systems for more and more tasks?
The authors study the characteristics they identify in agentic queries (scale, heterogeneity, redundancy, and steerability) and outline several new research opportunities for an agent-first data systems architecture, ranging from new query interfaces to new query processing techniques to new agentic memory stores.
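As a toy illustration of the redundancy point (my sketch, not anything from the paper): swarms of agents tend to re-issue near-identical queries, so an agent-first system could short-circuit repeats with an answer cache keyed on a normalized form of the query.

```python
# Toy sketch (not from the paper): deduplicate redundant agentic queries
# with an answer cache keyed on a normalized form of the query text.
class QueryCache:
    def __init__(self, execute):
        self._execute = execute               # the expensive underlying query path
        self._answers: dict[str, object] = {}

    @staticmethod
    def _normalize(query: str) -> str:
        # Crude textual normalization; a real system might cluster by embedding.
        return " ".join(query.lower().split())

    def run(self, query: str):
        key = self._normalize(query)
        if key not in self._answers:          # only hit the data system on a miss
            self._answers[key] = self._execute(query)
        return self._answers[key]

# Usage (hypothetical backend): cache = QueryCache(db.run_sql); cache.run("SELECT ...")
```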
That could hinder other, maybe better, approaches.
If people and organizations don't do that, the research evidently becomes pointless.
I think the real problem here is answering the question, and there's currently no way to intelligently get information out of the internet. (I assume Google is building one, but it apparently hasn't shipped yet, and even if it had, it's not what OpenAI would use.)
Hammering every WP site with infinite queries every time someone asks a question seems like the wrong solution to the problem. I'm not sure what the right solution looks like.
I got an 80% solution in maybe ten lines of Python by doing "just Google it, then look at the top 10 search results" (i.e., dump them into GPT). That works surprisingly well, although the top-n results are increasingly AI-generated.
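Something like this, presumably (my reconstruction, not the commenter's actual code; `search_web` is a stub for whatever search API you have access to, and it assumes an OpenAI API key in the environment):

```python
import requests
from openai import OpenAI

def search_web(query: str, n: int = 10) -> list[str]:
    """Stub: return the top-n result URLs from whatever search API you use."""
    raise NotImplementedError

def fetch_text(url: str, limit: int = 5000) -> str:
    # Crude: dumps raw HTML, truncated; real code would strip markup.
    return requests.get(url, timeout=10).text[:limit]

def answer(question: str) -> str:
    pages = [fetch_text(u) for u in search_web(question)]
    prompt = question + "\n\nSearch results:\n" + "\n---\n".join(pages)
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```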
I had a funny experience when Bard first came out (the original name for Gemini). I asked it a question, it gave me the precise opposite of the truth (the truth but negated). It even cited sources. The sources were both AI blogspam. That still makes me laugh.
Can you source that claim? It sounds absolutely ridiculous and costly/wasteful. It would be nigh impossible to ingest 1000s of webpages into a single chat.
https://news.ycombinator.com/item?id=42726827
However, upon further investigation, this is a special case triggered by a security researcher, and not the normal mode of operation.
Someone else here recommended sites maintaining their own CLAUDE.md file. Good idea, but too vendor-specific. Ten months ago someone online recommended the name llms.txt for a generic markdown file intended for agent use, and I added one: https://markwatson.com/llms.txt. I stopped collecting web page visit statistics, so I have no idea how often that file is discovered.
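For reference, a minimal llms.txt in the spirit of the llmstxt.org proposal looks something like this (the proposal suggests an H1 site name, a one-line blockquote summary, and H2 sections of markdown links; the entries below are invented):

```markdown
# Example Site

> One-sentence summary of what this site is about.

## Docs

- [Getting started](https://example.com/start.md): quick orientation
- [API reference](https://example.com/api.md): full endpoint list
```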
At a high level, 90% of the complexity of their data retrieval system could be deleted by simply attaching to every data store a `CLAUDE.md` file that is automatically kept up to date and that the agents can read.
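As a sketch of how cheap "automatically kept up to date" could be (my example, run from a migration hook or cron; SQLite stands in for any data store): introspect the store's schema and rewrite the `CLAUDE.md` on every change.

```python
import sqlite3

def write_claude_md(db_path: str, out_path: str = "CLAUDE.md") -> None:
    """Regenerate a CLAUDE.md describing the tables in a SQLite store."""
    conn = sqlite3.connect(db_path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = ["# Data store guide", "", "Tables and their columns:", ""]
    for table in tables:
        # table names come from sqlite_master, so interpolation here is safe
        cols = [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]
        lines.append(f"- `{table}`: {', '.join(cols)}")
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    conn.close()
```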
High-throughput queries from agents don't feel much different from the high-throughput querying that large-scale systems like Instagram and YouTube have to service every day. Whatever works for 10M active users per second on IG would also work for 50 agents making 1M queries per second.
I can still see a need for innovation in data stores, though. My little startup probably can't afford the same AWS bill as Meta, but that tide would lift all boats, not just AI-specific use cases.