Claude Code
We built a shared memory layer you can drop in as a Claude Code Skill. It's basically a tiny memory DB with recall that remembers your sessions.
Not magic. Not AGI. Just state.
Install in Claude Code
/plugin marketplace add https://github.com/mutable-state-inc/ensue-skill
/plugin install ensue-memory
# restart Claude Code
What it does:
1. persists context between sessions
2. semantic & temporal search (not just string grep)
basically git for your Claude brain
What it doesn't do:
- it won't read your mind
- it's alpha; it might break if you throw a couch at it
Repo: https://github.com/mutable-state-inc/ensue-skill
If you try it and it sucks, tell me why so I can fix it. If you like it, a star would be fun
Don't be kind, tia
Very clearly AI written
I'm sold.
With that said, I can't think of a way that this would work. How does this work? I took a very quick glance, and it's not obvious at first glance.
The whole problem is, the AI is short on context; it has limited memory. Of course, you can store lots of memory elsewhere, but how do you solve the problem of the AI not knowing what's in that memory as it goes from step to step? How does it find the relevant memory at the moment that relevance actually matters?
Could you just walk through the sort of conceptual mechanism of action of this thing?
I think of it like a file tree with proper namespacing, and keep abstract concepts in separate directories. So my food preferences will live in something like /preferences/sandos, or you can even do things like /system-design preferences and then load them into a relevant conversation next time.
Then Claude uses the MCP tools according to the [SKILL.md](https://github.com/mutable-state-inc/ensue-skill/blob/main/s...) definition.
You call it using a skill by saying: “look over our past conversations for when we talked about caching user profiles” and it lets the frontier model query a database of your conversations locally.
It is incredibly efficient because you only retrieve what you need, and it does not rely on embeddings, because the AI can quickly fuzz queries to RAG in on what's useful. It works.
The skill is called Total Recall, and the CLI is distributed with a macOS app called Contextify.
It works by ingesting all of your text transcripts from your conversations with Claude Code or Codex as they happen, and exposing AI friendly tools for searching them.
I shipped the feature a few days ago, with an intro here: https://contextify.sh/blog/
It can also do some pretty nifty reporting if you ask for reports.
I’ve got a Linux client in development if anyone is not on macOS and wants to give it a try.
1. Embeds the current request.
2. Runs a semantic + timestamp-weighted search over your past sessions. Returns only the top N items that look relevant to this request.
3. Those get injected into the prompt as context (like extra system/user messages), so Claude sees just enough to stay oriented without blowing context limits.
Think of it as attention over your historical work rather than brute-force recall: context on demand, which effectively gives you an unbounded context window. Bookmark + semantic grep + temporal rank. It doesn't "know everything all the time." It just knows how to ask its own past: "what from memory might matter for this?"
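If it helps, here is a rough Python sketch of that ranking step. It's purely illustrative: the record shape, the half-life weighting, and the idea that the query is already embedded are all my assumptions, not the skill's actual code.

```python
import math
import time

def recency_weight(ts: float, now: float, half_life_days: float = 30.0) -> float:
    """Exponentially decay older memories (hypothetical weighting)."""
    age_days = (now - ts) / 86400
    return 0.5 ** (age_days / half_life_days)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recall(query_vec: list[float], memories: list[dict], top_n: int = 5) -> list[str]:
    """Rank past-session snippets by semantic similarity x recency, keep top N."""
    now = time.time()
    scored = [
        (cosine(query_vec, m["embedding"]) * recency_weight(m["ts"], now), m["text"])
        for m in memories
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_n]]

# The top-N snippets then get prepended to the prompt as extra context messages.
```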
When you try it, I’d love to hear where the mechanism breaks for you.
Text Index of past conversations, using prompt-like summaries.
Or is this better than just continuing the same session and compacting?
The project is still in alpha, so you could shape what we build next - what do you need to see, or what gets you comfortable sending proprietary code to other external services?
Honestly? It just has to be local.
At work, we have contracts with OpenAI, Anthropic, and Google with isolated/private hosting requirements, coupled with internal, custom, private API endpoints that enforce our enterprise constraints. Those endpoints perform extensive logging of everything, and reject calls that contain even small portions of code if it's identified as belonging to a secret/critical project.
There's just no way we're going to negotiate, pay for, and build something like that for every possible small AI tooling vendor.
And at home, I feed AI a ton of personal/private information, even when just writing software for my own use. I also give the AI relatively wide latitude to vibe-code and execute things. The level of trust I need in external services that insert themselves in that loop is very high. I'm just not going to insert a hard dependency on an external service like this -- and that's putting aside the whole "could disappear / raise prices / enshittify at any time" aspect of relying on a cloud provider.
The app works. When I feel like working on it, I just open a CLI coding agent and say 'start working'. Then every so often I say 'commit and push' or 'find opportunities to improve the code base by refactoring, and create an issue for each opportunity'.
(I followed the instructions to add the boilerplate instructions for both bd and bv to AGENTS.md.)
We use Cursor where I work and I find it a good medium for still being in control and knowing what is happening with all of the changes being reviewed in an IDE. Claude feels more like a black box, and one with so many options that it's just overwhelming, yet I continue to try and figure out the best way to use it for my personal projects.
It's always interesting reading other people's approaches, because I just find them all so very different than my experience.
I need Agents and Skills to perform well.
Things that need special settings now won’t in the future and vice versa.
It’s not worth investing a bunch of time into learning features and prompting tricks that will be obsoleted soon
They do get better, but not enough to change any of the configuration I have.
But you are correct, there is a real possibility that the time invested will be obsolete at some point.
For sure, the work that went into MCPs is basically obsolete via skills. These things happen.
how would that be a "skill"? just wrap the mcp in a cli?
fwiw this may be a skill issue, pun intended, but i can't seem to get claude to trigger skills, whereas it reaches for mcps more... i wonder if im missing something. I'm plenty productive in claude though.
So a Skill is just a smaller granularity level of that concept. It's just one of the individual things an MCP can do.
This is about context management at some level. When you need to do a single thing within that full list of potential things, you don't need the instructions about a ton of other unrelated things in the context.
So it's just not that deep. It would be having a Python script or whatever that the skill calls, which returns the runtime dependencies and gives them back to the LLM so they can refactor without blindly grepping.
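Roughly the kind of script I mean, as an illustration only: a Python-only import scan with a hypothetical entry point, where a real one would lean on the language's own tooling.

```python
#!/usr/bin/env python3
"""List which project files import a given module, so the LLM can see
runtime dependencies before refactoring (illustrative heuristic only)."""
import ast
import sys
from pathlib import Path

def importers_of(target: str, root: str = ".") -> list[str]:
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue
        for node in ast.walk(tree):
            names = []
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            if any(n == target or n.startswith(target + ".") for n in names):
                hits.append(str(path))
                break
    return sorted(hits)

if __name__ == "__main__":
    # e.g. python list_importers.py mypackage.cache
    for f in importers_of(sys.argv[1]):
        print(f)
```

The skill's markdown then just tells the LLM when to run this script and to read its output, instead of grepping blindly.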
Does that make sense?
So in my nano banana image generation skill, it contains a python script that does all the actual work. The skill just knows how to call the python script.
We're attaching tools to the md files. This is at the granular level of how to hammer a nail, how to use a screwdriver, etc. And then the agent, the handyman, has his toolbox of skills to call depending on what he needs.
But for sure, there are places it makes sense, and there are places it doesn't. I'm arguing to maximally use it for the places that make sense.
People are not doing this. They are leaving everything to the LLM. I am arguing it is better to move everything possible into tools, and have the LLM focus only on the bits that a program doesn't make sense for.
Again, going to my example: a skill to do a dependency graph would have to do a complex search, and in some languages the dependency might be hidden by macros/reflection etc.
How would you do this with a skill, which is just a text file nudging the LLM, whereas the MCP server goes out and does things?
Not that I care too much about small amounts of tokens, but depleting your context rapidly seems bad. What is the positive tradeoff here?
That uses fewer tokens. The LLM is just calling the script, getting the response, and then using that to continue to reason.
So I'm not exactly following.
Consider more when you're 50+ hours in and understand what more you want.
Agree with the other comments: pretty much running vanilla everything and only the Playwright MCP (IMO way better than the native chrome integration) and ccstatusline (for fun). Subagents can be as simple as saying "do X task(s) with subagent(s)". Skills are just self @-ing markdown files.
Two of the most important things are 1) maintaining a short (<250 lines) CLAUDE.md and 2) having a /scratch directory where the agent can write one-off scripts to do whatever it needs to.
I've TL'd and PM'd as well as IC'd. Now my IC work feels a lot more like a cross between being a TL and being a senior with a handful of exuberant and reasonably competent juniors. Lots of reviewing, but still having to get into the weeds quickly and then get out of their way.
Interesting... I've been in management for a few years now and recently doing some AI coding work. I've found my skills as a manager/TL are far more adaptable to getting the best out of AI agents than my skills as a coder.
That said, it's well known that Anthropic uses CC for production. You just slow things down a bit, spend more time on the spec/planning stage, and manually approve each change. IMO the main hurdle to broader Claude Code adoption is getting over the "that's not how I would have written it" mindset.
This helps it organize temporary things it does, like debugging scripts, and lets it (or me) reference and build on them later without filling the context window. Nothing fancy, just a bit of organization that collects in a repo (git-ignored).
I've been trying to write blogs explaining it recently, but I don't think I'm very good at making it sound interesting to people.
What can I explain that you would be interested in?
Here was my latest attempt today.
https://vexjoy.com/posts/everything-that-can-be-deterministi...
Here is what I don't get: it's trivial to do this. Mine is of course customized to me and what I do.
The idea is to communicate the ideas, so you can use them in your own setup.
It's trivial to put, for example, my do-router blog post into Claude Code and generate one customized for you.
So what does it matter to see my exact version?
These are the type of things I don't get. If I give you my details, it's less approachable for sure.
The most approachable thing I could do would be to release individual skills.
Like I have skills for generating images with google nano banana. That would be approachable and easy.
But it doesn't communicate the why. I'm trying to communicate the why.
When you've tried 10 ways of doing it but they all end up getting into a "feed the error back into the LLM and see what it suggests next" you aren't that motivated to put that much effort into trying out an 11th.
The current state of things is extremely useful for a lot of things already.
I'm not sure if the problems you run into with using LLMs will be solved if you do it my way. My problems are solved doing it my way. If I heard more about your problems, I would have a specific answer to them.
These are the solutions to where I have run into issues.
For sure, but my solutions are not feed the error back into the LLM. My solutions are varied, but as the blog shows, they are move as much as possible into scripts, and deterministic solutions, and keep the LLM to the smallest possible scope.
The current state of things is extremely useful for a subset of things. That subset of things feels small to me. But it may be every thing a certain person wants to do exists in that subset of things.
It just depends. We're all doing radically different things, and trying very different things.
I certainly understand and appreciate your perspective.
In some sense, computers and digital things have now just become a part of reality, blending in by force.
But the things I am doing might not be the things you are doing.
If you want proof, I intend to release a game to the App Store and steam soon. At that point you can judge if it built a thing adequately.
I hope you're just one of the ones who figured it out early and all the hype isn't fake bullshit. I'd much rather be proven wrong than for humanity to have wasted all this time and resources.
I think of this stuff as trivial to understand from my point of view. I am trying to share that.
I have nothing to sell, I don’t want anyone to use my exact setup.
I just want to communicate the value as I see it, and be understood.
The vast majority of it all is complete bullshit, so of course I am not offended that I may sound like 1000 other people trying to get you to download my awesome Claude Code Plugins repo.
Except I’m not actually providing one lol
My basic problem is: "first-run" LLM agent output frequently does one or more of the following: fails to compile/run, fails existing test coverage, or fails manual verification. The first two steps have been pretty well automated by agents: inspect output, try to fix, re-run. IME this works really well for things like Python, less-well for things like certain Rust edge cases around lifetimes and such, or goroutine coordination, which require a different sort of reasoning than "typical" procedural programming.
But let's assume that the agents get even better at figuring out the deal with the more specialized languages/features and are able to iterate w/o interaction to fix things.
If the first-pass output still has issues, I still have concerns. They aren't "I'm not going to use these tools" concerns, because I also sometimes write bugs, and they can write the vast majority of code faster than I can.
But they are "I'm not gonna vibe-code my day job" concerns because the existence of trivially-catchable issues suggests that there's likely harder-to-catch issues that will need manual review to make sure (a) test coverage is sufficient, (b) the mental model being implemented is correct, (c) the outside world is interacted with correctly. And I still find bugs in these areas that I have to fix manually.
This all adds up to "these tools save me 20-30% of my time" (the first-draft coding) vs "these agents save me 90% of my time."
So I'm kinda at a plateau for a few months where it'll be hard to convince me to try new things to try to close that 20-30% -> 90% number.
The real issue is I don’t know the issues ahead of time. So each experience is an iteration stopping things I didn’t know would happen.
Thankfully, I’m not trying to sell anyone anything. I don’t even want people to use what I use. I only want people to understand the why of what I do, and how it adds me value.
I think it’s important to understand this thing we use as best we can.
The personal value you can get, is entirely up to your tolerance for it.
I just enjoy the process
For large codebases (my own has 500k lines and my company has a few tens of millions) you need something better like RPI.
If nothing else just being able to understand code questions basically instantly should give you a large speed up, even without any fancy stuff.
I agree that this level of fine-tuning feels overwhelming and might leave you doubting whether you're using Claude to its optimum. The beauty is that fine-tuning and macro usage don't interfere when you stay in your lane.
For example, I don't use the planning agent anymore; instead I incorporated that process into the normal agents, much to the project's advantage. Consistency is key. Anthropic did the right thing.
Codex is quite a different beast and comes from the opposite direction, so to speak.
I use both Codex and Claude Opus in my daily work and have found them complementary, not mutually exclusive. It is like two evangelists who are on par, exercising different tools to achieve a goal that both share.
It's also deeply interesting because it's essentially unsolved space. It's the same excitement as the beginning of the internet.
None of us know what the answers will be.
1. Current directory ./CLAUDE.md
2. User directory ~/.claude/CLAUDE.md
I stick general preferences in what it calls "user memory" and project-specific preferences in the working directory. The PMs were right all along!
The docs, if you are curious: https://www.ensue-network.ai/docs
I’ll give this a go though and let you know!
If you're using them though, we no longer have the problem of Claude forgetting things.
Agents are an md file with instructions.
Skills are an md file with instructions.
Commands are.. you get the point.
We're just dealing with instructions. Claude.md is handled by Claude Code. It is often forgotten almost entirely when the context fills.
Okay, what is an agent? An agent is basically a Claude.md file, but you make it extremely granular. So it only has instructions for, let's say, TypeScript.
We're all just doing context management here. We're trying to make sure our instructions that matter stay.
To do that, we have to remove all other instructions from the picture.
When you're doing TypeScript, you only know TypeScript things.
Okay, what's a skill? A skill is doing a single thing with TypeScript. Why? So that the context is even smaller.
Instead of the agent having every single instruction you need about TypeScript, you put them in skills so they only get put into context when that thing is needed.
But skills are also where you connect deterministic programs. For example, I have a skill for creating images in nano banana.
So when the TypeScript agent needs to create an image, it calls the skill, which calls the Python script, to create images in nano banana.
We're managing all the context to only be available when it's needed, keeping all other instructions out.
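To make that concrete, here is a hedged sketch of the kind of script such a skill might call. The generate_image() function is a deliberate placeholder (wire in whatever image API you actually use; nano banana here refers to Google's image model); the point is only that the skill shells out to a small deterministic CLI.

```python
#!/usr/bin/env python3
"""Deterministic wrapper the image skill invokes.
The skill's markdown only says: run this script with a prompt and an output path."""
import argparse
from pathlib import Path

def generate_image(prompt: str) -> bytes:
    # Placeholder: call your image model's SDK here. Kept abstract on purpose
    # so this stays a sketch rather than a claim about any particular API.
    raise NotImplementedError("wire up your image API of choice")

def main() -> None:
    parser = argparse.ArgumentParser(description="Generate an image from a prompt.")
    parser.add_argument("prompt")
    parser.add_argument("--out", default="out.png")
    args = parser.parse_args()

    Path(args.out).write_bytes(generate_image(args.prompt))
    # Print the path so the calling agent knows exactly what was produced.
    print(args.out)

if __name__ == "__main__":
    main()
```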
Does that help?
What I'd think would help, would be doing things in smaller chunks: an agent to do this small subtask, another to do that small subtask, and the parent task context only grows with the sub-agents reporting back, not with their chain-of-thought.
Could be I should have a much bigger, more detailed Claude.md. Mine tend to be small project overviews and a list of TODOs, not that different from README.md: https://github.com/lkbm/sideways_math/blob/main/CLAUDE.md
This has been found in all sorts of variations, and is accepted. It's not just my word, it's the standard understanding. But also, it's true in my experience.
You also limit the session work when you are offloading to agents, which are calling skills. So it does do that as well.
But that's what we're talking about, having agents do things in smaller chunks. Maybe I didn't explain that clearly enough?
It's just a complex topic. It's hard to cover everything. But yet, that's part of the idea. We're making context very specific to the task, and cutting things up into smaller chunks with dedicated context windows only for that task.
I almost completely ignore Claude.md - there are very few rules I want every subtask to know about.
The main system is for coordination of the many agents, each with their own dedicated Claude.mds, which call their own skills with their own dedicated instructions.
It's like Russian dolls. Now, I do use much, much MUCH larger Agent mds than anyone else. I've been doing this for 10 months, and I believe it's the correct way to do it.
I intend to write a blog post on it; it's a topic in its own right.
Even your small Claude.md I would have in several files. The TypeScript agent is the only one that needs to know about the TypeScript details. This is the kind of thing I mean.
This wasn't mentioned in the first post, but the use case we’re focused on isn’t really “Claude forgetting,” but context living beyond a single agent or tool. Even if Claude remembers well within a session, that context is still owned by that agent instance.
The friction shows up when you switch tools or models (Claude → Codex / Cursor / etc.), run multiple agents in parallel, or want context created in one place to be reused elsewhere without re-establishing it.
In those cases, the problem isn’t forgetting so much as fragmentation. If someone is happy with one agent and one tool, there are probably a bunch of memory solutions to choose from. The value of this external memory network that you can plug into any model or agent shows up once context needs to move across tools and people.
Otherwise, the ability to search back through history is well served by a simple grep/jq combo over the session directory.
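A minimal Python equivalent of that grep/jq pass, assuming the common layout of one JSONL file per session under ~/.claude/projects/; the path and record shape may differ on your install, so treat this as a sketch.

```python
#!/usr/bin/env python3
"""Crude full-text search over local Claude Code session transcripts."""
import json
import sys
from pathlib import Path

SESSIONS = Path.home() / ".claude" / "projects"  # assumed default location

def search(term: str) -> None:
    for f in SESSIONS.rglob("*.jsonl"):
        for line in f.read_text(encoding="utf-8", errors="ignore").splitlines():
            if term.lower() not in line.lower():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            # Record shape varies by version; fall back to the raw record.
            snippet = record.get("message", record) if isinstance(record, dict) else record
            print(f"{f.name}: {str(snippet)[:200]}")

if __name__ == "__main__":
    search(sys.argv[1])
```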
I feel that way too. I have a lot of these things.
But the reality is, it doesn't really happen that often in my actual experience. Everyone is very slow as a whole to understand what these things mean, so far you get quite a bit of time just with an improved, customized system of your own.
So... it's tough. I think memory abstractions are generally a mistake, and generally not needed, however I also think that compacting has gotten so wrong recently that they are also required until Claude Code releases a version with improved compacting.
But I don't do memory abstraction like this at all. I use skills to manage plans, and the plans are the memory abstraction.
But that is more than memory. That is also about having a detailed set of things that must occur.
I think planning is a critical part of the process. I just built https://github.com/backnotprop/plannotator for a simple UX enhancement
I am working alone. So I am instead having plans automatically update. Same conception, but without a human in the mix.
But I am utilizing skills heavily here. I also have a python script which manages how the LLM calls the plans so it's all deterministic. It happens the same way every time.
That's my big push right now. Every single thing I do, I try to make as much of it as deterministic as possible.
I'm just also working on real projects as well, so a lot of my priority is focused on new skills building, and not worrying about managing the current ones I have as github repos.
My approach is literally just a top-level, local, git version controlled memory system with 3 commands:
- /handoff - End of session, capture into an inbox.md
- /sync - Route inbox.md to custom organised markdown files
- /engineering (or /projects, /tasks, /research) - Load context into next session
I didn't want a database or an MCP server or embeddings or auto-indexing when I can build something frictionless that works with git and markdown.
Repo: https://github.com/ossa-ma/double (just published it publicly, but it's about the idea imo)
I will typically make multiple '/handoff's per day as I use Claude code whereas I typically use '/sync' at the end of the day to organise them all at once.
I think at this point in time, we both have it right.
Why did you need to use AI to write this post?
Though I have found that a repo-level claude.md that is updated every time Claude makes a mistake, plus using --restore to select a previous relevant session, works well.
There is no way for Anthropic to optimize Claude code or the underlying models for these custom setups. So it’s probably better to stick with the patterns Anthropic engineers use internally.
And also - I genuinely worry about vendor lock-in, do you?
None, that's what I'm trying to say. My favorite is just storing project context locally in docs that agents can discover on their own, or that I can point to if needed. This doesn't require me to upload sensitive code or information to anonymous people's side projects, and it has an equivalent amount of hard evidence for efficacy (zero), but at least it has my own anecdotal evidence of helping and doesn't invite additional security risk.
People go way overboard with MCPs and armies of subagents built on wishes and unproven memory systems because no one really knows for sure how to get past the spot we all hit where the agentic project that was progressing perfectly hits a sharp downtrend in progress. Doesn't mean it's time to send our data to strangers.
FWIW, I find this eventual degradation point comes much later and with fewer consequences when there are strict guardrails inside and outside of the LLM itself.
From what I've seen, most people try to fix only the "inside" part: tweaking the prompts, installing 500 MCPs (that ironically pollute the context and make the problem worse), yelling in uppercase in hopes that it will remember, etc., and ignore that automated compliance checks existed way before LLMs.
Throw the strictest and most masochistic linting rules at it in a language that is masochistic itself (e.g. Rust), add tons of integration tests that encode intent, add a stop hook in CC that runs all these checks, and you've got a system that is simply not allowed to silently drift and can put itself back on track with the feedback it gets.
Basically, rather than trying to hypnotize an agent to remember everything by writing a 5000 line agents.md, just let the code itself scream at it and feed the context.
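As a sketch of that stop-hook idea: a small script the hook runs, assuming the Claude Code hook convention that a "blocking" exit code (2, at the time of writing) feeds stderr back to the model; the Rust commands are just examples matching the lint/test setup above, so verify the details against the hooks docs for your version.

```python
#!/usr/bin/env python3
"""Stop-hook sketch: refuse to let the agent finish while checks fail.
Assumes exit code 2 = block, with stderr fed back to the model."""
import subprocess
import sys

CHECKS = [
    ["cargo", "clippy", "--", "-D", "warnings"],  # strict lint
    ["cargo", "test"],                            # intent-encoding tests
]

def main() -> int:
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Scream the failure back into the agent's context.
            sys.stderr.write(f"{' '.join(cmd)} failed:\n{result.stdout}{result.stderr}")
            return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```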
So a better question to ask is: do you have any ideas for an objective way to measure the performance of agentic coding tools, so we can truly determine what improves performance and what doesn't?
I would hope that internal to OpenAI and Anthropic they use something similar to the harness/test cases they use for training their full models to determine if changes to claude code result in better performance.
I often use restore conversation checkpoint after successfully completing a side quest.
All of these systems are for managing context.
You can generally tell which ones are actually doing something if they are using skills, with programs in them.
Because then, you're actually attaching some sort of feature to the system.
Otherwise, you're just feeding in different prompts and steps, which can add some value, but okay, it doesn't take much to do that.
Like adding image generation to Claude Code with Google nano banana: a Python script that does it.
That's actually adding something claude code doesn't have, instead of just saying "You are an expert in blah"
Another is one Claude Code ships with, using ripgrep.
Those are actual features. It's adding deterministic programs that the llm calls when it needs something.
> Otherwise, you're just feeding in different prompts and steps
"skills" are literally just .md files with different prompts and steps.
> That's actually adding something claude code doesn't have, instead of just saying "You are an expert in blah"
It's not adding anything but a prompt saying "when asked to do X, invoke script Y".
You're packaging the tool with the skill, or multiple tools to do a single thing.
There's no inherent magic to skills, or any fundamental difference between them and "just feeding in different prompts and steps". It literally is just feeding different prompt and steps.
Also, the whole may-pick-it-up-or-not, may-discover-it-or-not problem is solved as well. It's handled by my router, which I wrote about here - https://vexjoy.com/posts/the-do-router/
So these are solved problems to me. There are many more problems which are not solved, which are the interesting space to continue with.
I’m not sure where the ‘despite’ comes in. Experts and vets have opinions and this is probably the best online forum to express them. Lots of experts and vets also dislike extremely popular unrelated tools like VB, Windows, “no-code” systems, and Google web search… it’s not a personality flaw. It doesn’t automatically mean they’re right, either, but ‘expert’ and ‘vet’ are earned statuses, and that means something. We’ve seen trends come and go and empires rise and fall, and been repeatedly showered in the related hype/PR/FUD. Not reflexively embracing everything that some critical mass of other people like is totally fine.
I run it in automatic mode with decent namespacing, so thoughts, notes, and whole conversations just accumulate in a structured way. As I work, it stores the session and builds small semantic, entity-based hypergraphs of what I was thinking about.
Later I’ll come back and ask things like:
what was I actually trying to fix here?
what research threads exist already?
where did my reasoning drift?
Sometimes I’ll even ask Claude to reflect on its own reasoning in a past session and point out where it was being reactive or missed connections.
Claude itself can just update the claude.md file with whatever you might have forgot to put in there.
You can stick it in git and it lives with the project.