Recursive Language Models
Key topics
The debate around recursive language models is heating up, with researchers exploring the idea of treating long prompts as part of the environment that a large language model (LLM) can interact with symbolically. Commenters are drawing parallels to existing techniques like Retrieval-Augmented Generation (RAG), but highlighting key differences, such as the recursive nature of this new approach and its more "agentic" behavior. As one commenter quipped, it's "LLMs all the way down," while others are calling for greater transparency around techniques like compaction, with some even speculating about potential implementations. The discussion is sparking interesting insights into the evolving landscape of LLMs and their potential applications.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 5h after posting
Peak period: 4 comments in 5-6h
Avg / period: 2.4 comments
Based on 24 loaded comments
Key moments
- 01 Story posted: Jan 3, 2026 at 6:29 AM EST (5d ago)
- 02 First comment: Jan 3, 2026 at 11:51 AM EST (5h after posting)
- 03 Peak activity: 4 comments in 5-6h, the hottest window of the conversation
- 04 Latest activity: Jan 3, 2026 at 10:06 PM EST (4d ago)
How is this fundamentally different from RAG? Looking at Figure 4, it seems like the key innovation here is that the LLM is responsible for implementing the retrieval mechanism as opposed to a human doing it.
1. RAG (as commonly used) is more of a workflow; this thing is more "agentic"
2. The recursive nature of it
First, the way I see workflow vs. agentic: the difference is where the "agency" is. In a workflow, the coder decides each step (e.g. question -> embed -> retrieve -> (optional) llm_call("rerank these parts with the question {q} in mind") -> select chunks -> llm_call("given question {q} and context {c}, answer the question to the best of your knowledge"))
The "agentic" stuff has the agent decide what to search for, how many calls to make and so on, and it then decides when to answer (i.e. if you've seen claude code / codex work on a codebase, you've seen them read files, ripgrep a repo, etc).
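The contrast the commenter draws can be sketched in a few lines. This is a toy illustration, not anyone's real implementation: `embed`, `llm_call`, and both answer functions are stand-ins invented for the sketch (word overlap in place of a vector model and an actual LM).

```python
def embed(text):
    # Toy "embedding": a bag of lowercased words.
    return set(text.lower().split())

def llm_call(prompt, docs):
    # Stand-in for a real LM call: pick the doc with the most word overlap.
    return max(docs, key=lambda d: len(embed(prompt) & embed(d)))

def rag_workflow(question, corpus, k=2):
    # Workflow: the coder fixed every step in advance (embed -> retrieve -> answer).
    q = embed(question)
    chunks = sorted(corpus, key=lambda d: len(q & embed(d)), reverse=True)[:k]
    return llm_call(f"Given question {question!r} and this context, answer.", chunks)

def agentic_answer(question, corpus, max_steps=3):
    # Agentic: the "model" chooses what to fetch next and when to stop.
    gathered = []
    for _ in range(max_steps):
        remaining = [d for d in corpus if d not in gathered]
        if not remaining:
            break
        gathered.append(llm_call(question, remaining))
        if embed(question) & embed(gathered[-1]):  # "decides" it has enough
            break
    return llm_call(question, gathered)
```

The point is where the control flow lives: in `rag_workflow` the sequence of calls is fixed by the coder, while in `agentic_answer` the loop's exit and the next retrieval depend on the model's own outputs.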
The second thing, recurrence, has been tried before (babyagi was one of the first that I remember, circa '23), but the models weren't up to it. So there was a lot of glue around them to make them kinda sorta work. Now they do.
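The recursive shape being discussed can be sketched roughly as follows. Everything here is illustrative: `lm` is a crude stand-in for a real model call, and the character budget stands in for a token-based context window.

```python
BUDGET = 200  # characters, standing in for a context-window limit

def lm(prompt):
    # Crude stand-in for a real model call: "summarize" by truncating.
    return prompt.replace("\n", " ")[:80]

def recursive_call(question, context):
    # If the context fits the window, answer directly; otherwise split it,
    # recurse on each half, then recurse once more over the partial answers.
    if len(context) <= BUDGET:
        return lm(question + "\n" + context)
    mid = len(context) // 2
    left = recursive_call(question, context[:mid])
    right = recursive_call(question, context[mid:])
    return recursive_call(question, left + "\n" + right)
```

The base case is an ordinary LM call; the recursive case lets the system handle contexts far larger than any single call could, which is the part earlier scaffolds had to fake with hand-written glue.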
This technique should be something you could swap in for whatever Claude Code bakes in — but I don’t think the correct hooks or functionality are exposed.
I have read the gemini source, and it’s a pretty simple prompt to summarize everything when the context window is full.
0: https://github.com/apple/ml-clara
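That kind of compaction reduces to something like the sketch below. This is a guess at the general shape, not the actual gemini-cli code; `compact` and its parameters are made up for illustration, and `lm` is a stub for a real model call.

```python
def compact(history, lm, max_turns=6, keep_recent=3):
    # Once the transcript exceeds the budget, fold everything except the
    # most recent turns into one model-written summary.
    if len(history) <= max_turns:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = lm("Summarize the conversation so far:\n" + "\n".join(old))
    return ["[compacted] " + summary] + recent

# Usage with a stub in place of a real model call:
history = [f"turn {i}" for i in range(10)]
print(compact(history, lambda prompt: "user and agent discussed RLMs"))
```

The recursive-LM framing differs in that the model, not a fixed length trigger, decides when and how to decompose the context.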
Neat idea, but not a new idea.
> RLMs are not agents, nor are they just summarization. The idea of multiple LM calls in a single system is not new — in a broad sense, this is what most agentic scaffolds do. The closest idea we’ve seen in the wild is the ROMA agent that decomposes a problem and runs multiple sub-agents to solve each problem. Another common example is code assistants like Cursor and Claude Code that either summarize or prune context histories as they get longer and longer. These approaches generally view multiple LM calls as decomposition from the perspective of a task or problem. We retain the view that LM calls can be decomposed by the context, and the choice of decomposition should purely be the choice of an LM.
@summarizable(recursive=True)
def long_running_task(Subagent):
    ...
on my long horizon tasks, where the hierarchy is determined at agent execution time…
[0] https://github.com/adagradschool/scope
[1] https://www.deepclause.ai