Skills Officially Comes to Codex
Key topics
The AI world is buzzing as Codex officially rolls out "Skills," a feature that lets users extend its capabilities with task-specific instructions and resources. Some commenters poke fun at the concept while others dive in, sharing their own experiences and creations, like custom skills for back-testing services. The conversation also draws parallels to Anthropic's similar "Agent Skills" feature, sparking curiosity about the overlap and potential applications in agentic workflows. As users explore the new functionality, the discussion reveals a mix of excitement and skepticism: some rave about the possibilities while others joke about the "particular set of skills" required to make the most of it.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: N/A
- Peak period: 101 comments (0-12h)
- Avg / period: 21.7
- Based on 130 loaded comments
Key moments
- 01 Story posted: Dec 20, 2025 at 3:09 AM EST (21 days ago)
- 02 First comment: Dec 20, 2025 at 3:09 AM EST (0s after posting)
- 03 Peak activity: 101 comments in 0-12h (the hottest window of the conversation)
- 04 Latest activity: Dec 24, 2025 at 8:02 PM EST (16 days ago)
Skills are available in both the Codex CLI and IDE extensions.
I also like to make skills for more niche tools, like marimo (a very nice Jupyter replacement). The model probably does know some things about it, but not enough, and the agent could find enough online or in context7, but it would waste a lot of time and context figuring it out every time. So instead I have a deep-thinking agent do all that research up front and build a skill from it. I might customize it to be more specific to my environment, but it's mostly the condensed research of the agent, so I don't need to redo that every time.
That said, for many tasks (summaries and data extraction) I do use Gemini 2.5 Flash, as it's cheap and fast. Excited to try Gemini 3 Flash as well.
Just the format would be. There's no rigid structure that gets any preferential treatment by the LLM, even if it did accept one. In the end it's just instructions, no different in any way from the prompt text.
And nothing stops you from making something "parameterized and normalized to some agreed-upon structure" and passing it directly to the LLM as skill content, or parsing it and dumping it as a skill's regular text content.
The issue isn't the LLM; it's that verification is actually the hard part. In any case, it's typically called "evals", and you can probably craft a test harness to evaluate these if you think about it hard enough.
With Skills, however, you just selectively append more text to the prompt and pray.
The non-deterministic, statistical nature of LLMs means it's inherently an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted skills file or prompt format.
Besides, YAML or JSON or XML or free-form text: for the LLM it's all just tokens.
At best you could parse the more structured docs with external tools more easily, but that's about it; there's not much difference when it comes to LLM consumption.
There you go, you're welcome.
[1]: https://news.ycombinator.com/item?id=46338371
Easy to author (at its most basic, just a markdown file), context efficient by default (only preloads yaml front-matter, can lazy load more markdown files as needed), can piggyback on top of existing tooling (for instance, instead of the GitHub MCP, you just make a skill describing how to use the `gh` cli).
Compared to purpose-tuned system prompts they don't require a purpose-specific agent, and they also compose (the agent can load multiple skills that make sense for a given task).
Part of the effectiveness of this is that AI models are heavy enough that running a sandboxed VM for them on the side is likely irrelevant cost-wise, so now the major chat UI providers all give the model such a sandboxed environment. That means skills can also contain Python and/or JS scripts, which is again much simpler, more straightforward, and more flexible than e.g. requiring the target to expose remote MCPs.
Finally, you can use a skill to tell your model how to properly approach using your MCP server - which previously often required either long prompting, or a purpose-specific system prompt, with the cons I've already described.
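For the "skill instead of the GitHub MCP" example above, a minimal sketch of what such a skill file might look like (front-matter keys follow the common name/description convention; details vary by harness):

```markdown
---
name: github-cli
description: Use the `gh` CLI for GitHub tasks (PRs, issues, CI status) instead of an MCP server.
---

# GitHub via `gh`

- Check auth first: `gh auth status`
- Open a PR: `gh pr create --fill`
- Inspect CI on a PR: `gh pr checks`

Prefer `gh` subcommands over raw API calls; fall back to `gh api <endpoint>` for anything not covered.
```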
I'm having a hard time figuring out how I could leverage skills in a medium-size web application project.
It's Python, PostgreSQL, Django.
Thanks in advance.
I wonder if skills are more useful for non-CRUD-like projects. Maybe data science and DevOps.
But if it's something more involved or less frequently used (perhaps some debugging methodology, or designing new data schemas), skills are probably a good fit.
- listTables
- getTableSchema
- executeQuery (blocks destructive queries, like anything containing DROP, DELETE, etc.)
I wouldn't trust textual instructions to prevent LLMs from dropping a table.
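One way to make that block real rather than textual is to enforce it in the tool itself. A minimal sketch (hypothetical executeQuery implementation; assumes a psycopg-style connection):

```python
import re

# Deny-list of statement keywords the executeQuery tool refuses outright.
DESTRUCTIVE = re.compile(r"\b(DROP|DELETE|TRUNCATE|ALTER|UPDATE|INSERT)\b", re.IGNORECASE)

def execute_query(conn, sql: str):
    """Run a read-only query; reject anything that looks destructive."""
    if DESTRUCTIVE.search(sql):
        raise PermissionError(f"Refusing potentially destructive statement: {sql!r}")
    with conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchall()
```

Even then, a deny-list is easy to sidestep (e.g. via functions or DO blocks), so connecting with a read-only database role is the more robust fix.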
The key here is “on demand”. Not every agent or conversation needs to know kung fu. But when they do, a skill is waiting to be consumed. This basic idea is “progressive disclosure”, and it composes nicely to keep context windows focused. E.g. I have a Metabase skill to query analytics. Within that, I conditionally refer to how to generate authentication if they aren't authenticated. If they are authenticated, that information need not be consumed.
Some practical “skills”: writing tests, fetching Sentry info, using Playwright (a lot of local MCPs are just flat-out replaced by skills), submitting a PR according to team conventions (e.g. run lint, review code for X, check that the title matches the format, etc.).
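On disk, the Metabase example might look like this (a guess at one common layout; only SKILL.md's front-matter is preloaded, the rest is read on demand):

```
skills/
  metabase/
    SKILL.md           # front-matter + top-level query instructions (always visible)
    authentication.md  # referenced from SKILL.md, loaded only when not yet authenticated
```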
Maybe you have a custom auth backend that needs an annoying local proxy setup before it can be tested—you don’t need all of those instructions in the primary agents.md bloating the context on every request, a skill would let you separate them so they’re only accessed when needed.
Or if you have a complex testing setup and a multi-step process for generating realistic fixtures and mocks: the AI maybe only needs some basic instructions on how to run the tests 90% of the time, but when it’s time to make significant changes it needs info about your whole workflow and philosophy.
I have a django project with some hardcoded constants that I source from various third party sites, which need to be updated periodically. Originally that meant sitting down and visiting a few websites and copy pasting identifiers from them. As AI got better web search I was able to put together a prompt that did pretty well at compiling them. With a skill I can have the AI find the updated info, update the code itself, and provide it some little test scripts to validate it did everything right.
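The "little test scripts" part can be as simple as a sanity check the skill tells the agent to run after updating (module and constant names here are hypothetical):

```python
# check_constants.py: run after the agent refreshes the hardcoded identifiers.
# Assumes they live in myapp/constants.py as THIRD_PARTY_IDS (hypothetical names).
from myapp.constants import THIRD_PARTY_IDS

assert THIRD_PARTY_IDS, "identifier list must not be empty"
assert all(isinstance(i, str) and i.strip() for i in THIRD_PARTY_IDS), \
    "every identifier must be a non-empty string"
print(f"OK: {len(THIRD_PARTY_IDS)} identifiers look sane")
```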
Poor man's "skills" is just manually adding different .md files to the context.
Also every time you instruct the agent to do something correctly that it did incorrectly before, you ask it to revise a relevant .md file/"skill", so it has that correction from now on.
Compared to MCPs, this is a much faster and more approachable flow to add "capabilities" to your agents.
I think that would be a really really interesting thing to do on a bunch of different tasks involving developer tooling (e.g. git, jj, linters, etc.)
The path to recursive self-improvement seems to be emerging.
On the other hand, from a purely functional-coding angle, new skills that don't leak roles can be more atomic and efficient in the long run. Both have their pros and cons.
Otherwise, why not just keep the password in a .env file, and state “grab the password from the .env file” in your Postgres skill?
Why not the filesystem?
I would create a local file (e.g. .env) in each project using postgres, then in my postgres skill, tell the agent to check that file for credentials.
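A sketch of how that instruction might read in the skill (the PG* names are the standard libpq environment variables; the front-matter convention is assumed):

```markdown
---
name: postgres
description: How to query this project's Postgres database.
---

Connection credentials live in the project-root `.env` file as
PGHOST / PGPORT / PGDATABASE / PGUSER / PGPASSWORD. Read them from there;
never ask the user to paste a password into the chat.
```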
I have many "folders"... each with a README.md, a scripts folder, and an optional GUIDE.md.
Whenever I arrived at some code that I knew could be reused easily (for example: a clerk.dev integration that spans both frontend and backend), I used to create a "folder" for it.
When needed, I used to just copy-paste all the folder content using my https://www.npmjs.com/package/merge-to-md package.
This has worked flawlessly for me up until now.
Glad we are bringing such capability natively into these coding agents.
It’s also interesting to see how instead of a plan mode like CC, Codex is implementing planning as a skill.
(To clarify, I meant that some engineers mostly use CC while others mostly use Codex, as opposed to engineers using both at the same time.)
Some paths are emerging popular, but in a lot of cases we’re still not sure even these are the long term paths that will remain. It doesn’t help that there’s not a good taxonomy (that I’m aware of) to define and organize the different approaches out there. “Agent” for example is a highly overloaded term that means a lot of things and even in this space, agents mean different things to different groups.
For LLMs, we're just about at the stage where we've realized we can jam a sharp thing in the spinny part and use it to cut things. The race is on not only to improve the motors (models) themselves, but to invent ways of holding and manipulating and taking advantage of this fundamental thing that feel so natural that they seem obvious in hindsight.
Tools are useful so the AI can execute commands, but beyond that it's just ways to help you build the context for your prompt: either pulling in premade prompts that provide certain instructions or documentation, or providing more specialized tools for the model to use, along with instructions on using those tools.
More like a gallery than a marketplace
Not ranked with comments, but I'd expect solid quality from these, and they should “just work” in Codex etc.
- you will be getting a TON of spam. Just look at all the MCP folks, and how they're spamming everywhere with their Claude-vibed MCP implementations of something trivial.
- the security implications are enormous. You'd need a way to vet stuff, moderate, keep track of things and so on. This only compounds with more traffic, so it'd probably be untenable really fast.
- there's probably zero money in this. So you'd have to put a lot of work into maintaining a platform that attracts a lot of abuse/spam/prompt kiddies, while getting nothing in return. This might make sense for some companies that can justify the cost, but at that point you'd wonder what's in it for them, and what control they exert over moderation/curation, etc.
I think the best we'll get in this space is from "trusted" entities (i.e. recognised coders / personalities / etc), from companies themselves (having skills in repos for known frameworks might be a thing, like it is with agents.md), and maybe from the token providers themselves.
Imagine having Skills available that implement authentication systems, multi-tenancy, etc. in your codebase without having to know all the details of how to do this securely and correctly. This would probably boost code quality a lot and prevent insecure, buggy vibe-coded products.
A lot of the things we want continuous learning for can actually be provided by the ability to obtain skills on the fly.
I have this mental map:
- Front-matter <---> name and arguments of the function
- Text part of the skill .md <---> description field of the function
- Code part of the skill <---> body of the function
But the function wouldn't look as organised as the .md; also, a skill can have multiple function definitions.
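Concretely, the analogy might look like this (illustrative only; a skill isn't literally compiled to a function):

```python
def query_metabase(question: str, dashboard: str) -> str:  # front-matter: name + arguments
    """Fetch analytics from Metabase for a natural-language question."""  # text part: description
    ...  # code part: the scripts bundled with the skill
```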
So if you have subtle logic in a Skill that’s not mentioned in a description, or you use the skill body to describe use-cases not obvious from the front-matter, it may never be discovered or used.
Additionally, skill descriptions are all essentially prompt injections, whether relevant/vector-adjacent to your current task or not; if they nudge towards a certain tone, that may apply to your general experience with the LLM. And, of course, they add to your input tokens on every agentic turn. (This feature was proudly brought to you by Big Token.) So be thoughtful about what you load in what context.
See e.g. https://github.com/openai/codex/blob/a6974087e5c04fc711af68f...
1. Open-Skills: https://github.com/BandarLabs/open-skills
This is really an agentic-harness issue, not an LLM issue per se.
In 2026, I think we'll see agentic harnesses much more tightly integrated with their respective LLMs.
Close enough, welcome back index.htm, can't wait to see the first ads being served in my skills
Obviously they are empowering Codex and Claude etc, and many will be open source or free.
But for those who have commercial resources or tools to add to the skills choice, is there documentation for doing that smoothly, or a pathway to it?
I can see at least a couple of ways - skills requiring API keys or other similar approaches, but this adds friction to an otherwise smooth skill integration process.
Having instead a transparent commission on usage sent to registered skill suppliers would be much cleaner but I'm not confident that would be offered fairly, and I've seen no guidance yet on plans in that regard.
What do "skills" look like, generically, in this framework?
<Skills>
  (one entry per skill: name + description + file path)
</Skills>

The harness then may periodically resend this notification so that the LLM doesn't "forget" that skills are available. Because the notification is only name + description + file, this is cheap token-wise. The harness's ability to tell the LLM "IMPORTANT: this is a skill, so pay attention and use it when appropriate", and then to periodically remind it of this, is what differentiates a proper Anthropic-style skill from just sticking "If you need to do postgres stuff, read skills/postgres.md" in AGENTS.md. Just how valuable is this? Not sure. I suspect that a sufficiently smart LLM won't need the special skill infrastructure.
(Note that the skill name is not technically required; it's just a vanity/convenience thing.)
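For illustration, such a notification might render along these lines (hypothetical attribute names; the actual wrapper format is harness-specific):

```xml
<Skills>
  <Skill name="postgres"
         description="How to query this project's Postgres database"
         file="skills/postgres/SKILL.md"/>
</Skills>
```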
... And do we know how it does that? To my understanding there is still no out-of-band signaling.
So it's basically a standard way to bring prompts/scripts into the LLM's context, with support from the tooling directly.
One difference is the model might have been trained/fine-tuned to be better at "read all the front-matters from a given folder on your filesystem and then decide..." compared to a model with those instructions only in its context.
I see there might be advantages. The manual alternative could be tweaked further though. For example you might make it hierarchical.
Or you could create a "howTo" MCP with more advanced search capabilities (or a grandma MCP to ask for advice after a failure).
Interesting topic. I guess nobody has found a real best practice yet; everybody is still exploring.
Anthropic: https://www.anthropic.com/engineering/equipping-agents-for-t...
Copilot: https://github.blog/changelog/2025-12-18-github-copilot-now-...
can we use notepad or something free and not proprietary?
that's all there is to it.
If you want to go deeper, then Skills are dynamically unfolding prompts.
If you want a large library of skills and don't want to fill up your context window, then check out opencode-skillful.
As of this week, this also applies to Hacker News.