Effective Context Engineering for AI Agents
Mood: calm
Sentiment: mixed
Category: other
Key topics: effective context engineering for AI agents; challenges and strategies for optimizing context windows for large language models
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 4 days after posting
Peak period: 30 comments (Day 5)
Average per period: 10.7 comments
Based on 32 loaded comments
Key moments
- Story posted: Sep 29, 2025 at 4:18 PM EDT (about 2 months ago)
- First comment: Oct 3, 2025 at 7:00 PM EDT (4 days after posting)
- Peak activity: 30 comments on Day 5, the hottest window of the conversation
- Latest activity: Oct 11, 2025 at 11:50 AM EDT (about 2 months ago)
Create evals from previous issues and current tests. Use DSPy on prompts. Create hypotheses for the value of different context packs, and run an eval matrix to see what actually works and what doesn't. Instrument your agents with Otel and stratify failure cases to understand where your agents are breaking.
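A minimal sketch of the OTel instrumentation idea above, assuming the standard opentelemetry-sdk Python package and a hypothetical `run_tool` dispatcher; the failure-category attribute is what you would later group by to stratify failures:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export spans to the console for illustration; a real setup would use an OTLP exporter.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent")

def instrumented_step(tool_name: str, args: dict) -> str:
    """Wrap one agent tool call in a span, tagging outcomes so failures can be stratified later."""
    with tracer.start_as_current_span("agent.tool_call") as span:
        span.set_attribute("agent.tool", tool_name)
        try:
            result = run_tool(tool_name, args)  # hypothetical tool dispatcher
            span.set_attribute("agent.outcome", "ok")
            return result
        except Exception as exc:
            # Coarse failure category you can slice on in your tracing backend.
            span.set_attribute("agent.outcome", "error")
            span.set_attribute("agent.failure_category", type(exc).__name__)
            raise
```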
Isn't it a programming language type thing?
Can you even integrate that into an existing codebase easily?
No algorithms, no Linux, no open protocols, maybe not even the internet.
RoPE is great and all, but doesn't magically give 100% performance over the lengthened context; that takes more work.
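For readers unfamiliar with it, here is a minimal sketch of rotary position embeddings in the split-half convention (pairing conventions vary between implementations); the usual context-extension tricks amount to rescaling `pos` or `base`, which is exactly why longer context does not come for free:

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate one token's query/key vector by position-dependent angles (split-half convention)."""
    d = x.shape[-1]
    half = d // 2
    # Per-pair frequencies: theta_i = base ** (-2i / d)
    freqs = base ** (-2.0 * np.arange(half) / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    # Each (x1_i, x2_i) pair is rotated by its own angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

# Position interpolation squeezes longer sequences into the trained angle range,
# e.g. rope(x, pos / scale), but the model still sees angle patterns it was never
# trained on densely -- hence the extra fine-tuning needed to recover full quality.
```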
I’m not saying it won’t eventually be known, but not in these initial stages.
The only thing separating Claude, Gemini and ChatGPT is their context and prompt engineering, assuming the frontier models belong to the same class of capability. You can absolutely release a competitor to these things that could perform better for certain things (or even all things, if you introduce brand new context engineering ideas), if you wanted to.
Even if someone fine-tuned an LLM with this type of data, DeepSeek has shown that they can just use a teacher-student strategy to steal from whatever model you trained (exfiltrate your value-add, which is how they stole from OpenAI). Stealing is already a thing in this space, so don’t be shocked if over time you see a lot more protectionism (something we already see geopolitically on the hardware front).
I don’t know what’s going to happen, but I can confidently say that if humans are involved at this stage, there will absolutely be some level of information siloing, and stealing.
——
But to directly answer your question:
"… instead of becoming something with standard practices that work well for most use cases?"
In no uncertain terms, the answer is because of money.
We can't really do much with the information that x amount is reserved for MCP, tool calling or the system prompt.
I actually think this is pretty useful information. It helps you evaluate whether an MCP server is worth its context cost, and gives you a feel for how much context particular tool calls use up. And since you can change the system prompt, knowing its share helps you judge whether what you've put there is worth it too.
What we need is a way to manage the dynamic part of the context without just starting from zero each time.
https://platform.openai.com/docs/guides/function-calling#con...
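One way to act on those numbers is to measure them yourself. A rough sketch, assuming tiktoken and an OpenAI-style function definition (providers serialize tool schemas differently, so treat this as an estimate):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def tool_schema_tokens(tools: list[dict]) -> int:
    """Approximate how many context tokens a set of tool definitions will occupy."""
    return len(enc.encode(json.dumps(tools)))

# Example: a single hypothetical MCP-style tool definition.
search_tool = {
    "type": "function",
    "function": {
        "name": "search_issues",
        "description": "Search the issue tracker for matching tickets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}
print(tool_schema_tokens([search_tool]), "tokens (approx.) reserved by this one tool")
```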
1. Have agents emit chatter in a structured format. Have them emit hypotheses, evidence, counterfactuals, invariants, etc. The fully natural language agent chatter is shit for observability, and if you have structured agent output you can actually run script hooks that are very powerful in response to agent input/output.
2. Have agents summarize key evidence from toolcalls, then just drop the tool call output from context (you can give them a tool to retrieve the old value without recomputation, cache tool output in redis and give them a key to retrieve it later if needed). Tool calls dominate context expansion bloat, and once you've extracted evidence the original tool output is very low value.
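A sketch of both points, assuming pydantic for the structured turn format and a local Redis instance for the out-of-band tool-output cache; the schema fields and key format are illustrative, not any particular framework's API:

```python
import hashlib
import json

import redis
from pydantic import BaseModel

class AgentTurn(BaseModel):
    """Structured 'chatter' instead of free-form prose, so script hooks can react to specific fields."""
    hypotheses: list[str] = []
    evidence: list[str] = []
    counterfactuals: list[str] = []
    invariants: list[str] = []

r = redis.Redis()  # assumes a local Redis instance

def stash_tool_output(tool_name: str, args: dict, output: str, ttl: int = 3600) -> str:
    """Cache raw tool output out of band; only the short key and a summary stay in context."""
    key = "tool:" + hashlib.sha256(
        f"{tool_name}:{json.dumps(args, sort_keys=True)}".encode()
    ).hexdigest()[:16]
    r.set(key, output, ex=ttl)
    return key

def fetch_tool_output(key: str) -> str | None:
    """Retrieval tool exposed to the agent: re-read cached output without re-running the call."""
    raw = r.get(key)
    return raw.decode() if raw is not None else None
```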
Agents are stateless, hence the need for context. This means that all they know about the ongoing session is what's in that context (generally speaking). As the context grows any particular element within it becomes a smaller and smaller percentage of the whole. The LLM is not 'losing focus'; it's being diluted with more tokens. But then I suppose anthropomorphism comes naturally to a company named Anthropic, and 'losing focus' does make it sound more human.
They didn't need a study and article, but it likely contributes towards the mystique. Hence the use of phrases like "this results in n² pairwise relationships for n tokens" to make it sound more erudite and revelatory.
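For what it's worth, the n² figure is simply the shape of the attention score matrix:

```latex
% Self-attention scores every query against every key, so for n tokens
% the score matrix has one entry per ordered token pair:
S = \frac{Q K^{\top}}{\sqrt{d_k}} \in \mathbb{R}^{n \times n}
\quad\Rightarrow\quad n^{2} \text{ pairwise scores per attention head.}
```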
Another interesting thought might be that long-horizon tasks need different tooling, and with the shift to long-running tasks you can use cheaper models as well. None of the big providers have good tools for that at the moment, so the only thing they can tell us is to fix our contexts but keep using their models.
Then you will have a good starting point, with less chance of running out of space before solving the task.
If you can’t give it full context at the beginning, you can give it a tree listing of the files involved, and maybe a couple of READMEs (if there are any), and ask it to see if it can work out what files are needed, giving it a couple of files at a time, at its suggestion.
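A minimal sketch of generating such a tree listing with a plain directory walk (the depth limit keeps the listing itself from eating the context you are trying to save):

```python
import os

def tree_listing(root: str, max_depth: int = 2) -> str:
    """Compact directory tree the model can use to decide which files it actually needs."""
    lines = []
    root = root.rstrip(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        indent = "  " * depth
        lines.append(f"{indent}{os.path.basename(dirpath) or dirpath}/")
        for name in sorted(filenames):
            lines.append(f"{indent}  {name}")
        if depth >= max_depth:
            dirnames[:] = []  # prune: don't descend past max_depth
    return "\n".join(lines)

# e.g. print(tree_listing("src")) and paste the result into the first prompt
```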