How to Fix Your Context
Posted 4 months ago · Active 4 months ago
Source: dbreunig.com
Key topics: AI, LLMs, Context Management
The article discusses strategies for improving the context of large language models (LLMs), and the discussion revolves around the practical applications and challenges of implementing these strategies.
Snapshot generated from the HN discussion
Discussion activity: moderate engagement · first comment after 3h · peak of 9 comments in the 36-42h window · average 2.5 comments per period (based on 20 loaded comments)
Key moments
- Story posted: Aug 24, 2025 at 3:09 AM EDT (4 months ago)
- First comment: Aug 24, 2025 at 6:13 AM EDT (3h after posting)
- Peak activity: 9 comments in the 36-42h window, the hottest stretch of the conversation
- Latest activity: Aug 27, 2025 at 7:43 PM EDT (4 months ago)
ID: 45002048 · Type: story · Last synced: 11/20/2025, 2:18:13 PM
However, it uses various terms whose definitions I'm not sure of.
E.g. the word "Tool".
How can I learn the definitions of these words? Will reading prior articles in the series help? Please advise.
One who lacks the mental capacity to know he is being used. A fool. A cretin. Characterized by low intelligence and/or self-esteem.
Disclaimer: I only cite definitions from Urban Dictionary, but I remain firmly convinced they are correct definitions in context.
https://youtu.be/owDd1CJ17uQ?si=Z2bldI8IssG7rGON&t=1330
It's an _incredibly_ important concept to understand if you have even a passing interest in LLMs. You need to understand it if you want to have any kind of mental model for how LLM-powered agents are even possible.
That's the whole thing.
For a counter-example, consider Claude Code:
- 1 long context window, with (at most) 1 sub-agent
- same tools available at all times, and to sub-agent (except: spawning a sub-sub-agent)
- Full context stays in conversation, until you hit the context window limit. Compaction is automatic but extremely expensive. Quality absolutely takes a dive until everything is re-established.
- Deterministic lookup of content. Claude reads files with tools rather than including random chunks selected by RAG cosine similarity.
I could go on. In my experience, if you're going to use these techniques: 1) maybe don't, and 2) turn the determinism up to 11. Get really specific about _how_ you're going to use them, and why, in each specific case.
For example, we're working on code migrations [0]. We have a tool that reads changelogs, migration guides, and OSS source. Those can be verbose, so they blow the context window even on 200k-token models. But we're not just randomly deleting things from the "plan my migration" context; we're exposing a tool that deliberately lets the model pull out the breaking changes. This is "Context Summarization," but before using it, we had to figure out that _those_ bits were breaking the context, and _then_ summarize them. All our attempts at generically pre-summarizing content just resulted in poor performance, because we were hiding information from the agent.
[0] https://tern.sh
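That "expose summarization as a tool" pattern could be sketched roughly like this (a hypothetical illustration, not Tern's actual code; `llm` stands in for any completion function):

```python
# Hypothetical sketch: instead of pre-summarizing every document, expose
# a tool the planning agent can call deliberately, so the full changelog
# text never has to sit in the planning context.
def extract_breaking_changes(changelog_text: str, llm) -> str:
    """Ask a model to pull only the breaking changes out of a changelog."""
    prompt = (
        "List only the breaking changes from this changelog, one per line. "
        "Ignore new features and bug fixes.\n\n" + changelog_text
    )
    return llm(prompt)

# Tool schema the planning agent sees (OpenAI-style function definition):
BREAKING_CHANGES_TOOL = {
    "name": "extract_breaking_changes",
    "description": "Summarize a dependency changelog down to its breaking changes.",
    "parameters": {
        "type": "object",
        "properties": {"changelog_text": {"type": "string"}},
        "required": ["changelog_text"],
    },
}
```

The key point of the comment survives in the shape of the code: the summarization is targeted (breaking changes only) and model-initiated, not applied generically up front.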
This is different from a lot of the context-preserving sub-agents, which have fully different toolsets and prompts. It's much more general.
This is an itch I've been wanting to scratch myself, but the space has so many entrants that it's hard to justify the time investment.
That plus babysitting Claude Code's context is annoying as hell.
>That plus babysitting Claude Code's context is annoying as hell.
It's crazy to me that—last I checked—its context strategy was basically tool use of ls and cat. Despite the breathtaking amount of engineering resources major AI companies have, they're eschewing dense RAG setups for dirt simple tool calls.
To their credit it was good enough to fuel Claude Code's spectacular success, and is fine for most use cases, but it really sucks not having proper RAG when you need it.
On the bright side, now that MCP has taken off I imagine one can just provide their preferred RAG setup as a tool call.
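Plugging your own retrieval in as a tool call might look like this minimal sketch (hypothetical names; naive keyword overlap stands in for a real embedding index):

```python
# Hypothetical sketch: a search "tool" the agent calls instead of ls/cat.
# Real RAG would rank by embedding similarity; keyword overlap is a
# stand-in that keeps the example self-contained.
def search_docs(query: str, corpus: dict[str, str], k: int = 3) -> list[str]:
    """Return ids of the k documents whose text best overlaps the query."""
    terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(terms & set(corpus[doc_id].lower().split())),
        reverse=True,
    )
    return ranked[:k]
```

Registered as an MCP tool (or a plain function tool), the agent can reach for `search_docs` only when simple file reads aren't enough.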
If I understand you correctly, doesn't this break prefix KV caching?
This does reduce the context cache hit rate a bit, but I'm cache aware so I try to avoid repacking the early parts if I can help it. The tradeoff is 100% worth it though.
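The cache-aware packing mentioned here amounts to keeping everything stable in a fixed prefix; a minimal sketch (hypothetical, relying on the fact that provider prefix caches match on exact token prefixes):

```python
# Hypothetical sketch of cache-aware prompt packing: content that is
# identical across calls goes first, so the provider's prefix KV cache
# keeps hitting; anything that changes per request is appended last.
def pack_prompt(stable_system: str, stable_tools: str, volatile: list[str]) -> str:
    prefix = stable_system + "\n" + stable_tools  # byte-identical across calls
    suffix = "\n".join(volatile)                  # per-request, after the prefix
    return prefix + "\n" + suffix
```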
> For smaller models, the problems begin long before we hit 30 tools. One paper we touched on last post, "Less is More," demonstrated that Llama 3.1 8b fails a benchmark when given 46 tools, but succeeds when given only 19 tools. The issue is context confusion, not context window limitations.
A high number of tools is a bit of a "smell" to me and often makes me wonder whether the agent has too much responsibility, a bit like a method with so many parameters that it can do almost anything.
Have folks had success with agents like that? I've found the fewer tools the better, e.g. fewer than 10 as a ballpark.
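One way to keep the tool count low without hard-coding a single small set is to filter the registry per task; a minimal sketch (hypothetical names and tag scheme):

```python
# Hypothetical sketch: narrow the tool list per task before handing it
# to the model, capping it well under the confusion threshold the
# "Less is More" paper observed.
def select_tools(task_tags: set[str], registry: list[dict], limit: int = 10) -> list[dict]:
    """Keep only tools whose declared tags overlap the task's tags."""
    relevant = [tool for tool in registry if task_tags & set(tool["tags"])]
    return relevant[:limit]
```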
6 more comments available on Hacker News