OpenAI Are Quietly Adopting Skills, Now Available in ChatGPT and Codex CLI
Posted 21 days ago · Active 18 days ago
simonwillison.net · Tech Discussion · story · High profile
Key topics
- AI Research
- Large Language Models
- Command Line Tool
- AI
Discussion Activity
- Very active discussion
- First comment: 25m after posting
- Peak period: 118 comments in 0-6h
- Avg / period: 26.7
- Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
- 01 Story posted: Dec 12, 2025 at 6:30 PM EST (21 days ago)
- 02 First comment: Dec 12, 2025 at 6:55 PM EST (25m after posting)
- 03 Peak activity: 118 comments in 0-6h, the hottest window of the conversation
- 04 Latest activity: Dec 15, 2025 at 8:31 AM EST (18 days ago)
ID: 46250332 · Type: story · Last synced: 12/15/2025, 11:15:25 PM
(I'm not just about pelicans.)
The foreplay starts around the 1 minute mark.
Good thinking, I actually agree, however...
> Skills are based on a very light specification, if you could even call it that, but I still think it would be good for these to be formally documented somewhere.
Like a lot of posts around AI (and I hope OP can speak to it), surely you can agree that while this can be used for a good, cool idea, it can also be used for the inverse, and probably to more detrimental ends. Why would they formally document a feature that may be consumed in unmanageable ways?
Have you tried, or would you try, this on a local LLM instead?
The OpenAI GPT OSS models can drive Codex CLI, so they should be able to do this.
I have high hopes for Mistral's Devstral 2 but I've not run that locally yet.
That's actually super interesting, maybe something I'll try to investigate to find the minimum requirements, because as cool as they seem, personalized 'skills' might be a more useful application of AI overall.
Nice article, and thanks for answering.
Edit: My thinking is that consumer-grade hardware could be good enough to run this soon.
Local LLMs are better for long batch jobs, not for things you want immediately, or your flow gets killed.
Services can provide an MCP-like layer that provides semantic definitions of everything you can do with said service (API + docs).
Skills can then be built that combine some subset of the 3rd party interfaces, some bespoke code, etc. and then surface these more context-focused skills to the LLM/agent.
Couldn’t we just use APIs?
Yes, but not every API is documented in the same way. An “MCP-like” registry might be the right abstraction for 3rd parties to expose their services in a semantic-first way.
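One hypothetical way that combination could look, sketched as a skill that pairs a condensed slice of a vendor's API docs with a bespoke script. All names here are invented for illustration and are not part of any existing spec:

```markdown
---
name: invoice-reports
description: Generate monthly invoice reports from the billing service. Use when the user asks for billing or invoice summaries.
---

- API surface: see references/billing-api.md, a condensed, semantically
  described subset of the vendor's published docs (the two endpoints this
  skill needs, not all forty).
- Aggregation: run `python scripts/summarise_invoices.py`, a small bespoke
  wrapper that calls those endpoints and prints a markdown summary.
```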
So you read about skills (prompt + scripts) to make this more repeatable and reduce time spent thinking. At that point there are two paths you can go down -- write the skill and prompt yourself for the agent to execute -- or better -- just tell the agent to write the skill and prompt and then you lightly edit it and commit it.
This may seem obvious to some, but I've seen engineers create skills from scratch because they have a mental model around skills being something that people must build for the agent, whereas IMO skills are you just bridging a productivity gap that the agent can't figure out itself (for now).
feels like the right layer of abstraction for remote APIs
Computability (scripts) means being able to build documents, access remote data, retrieve data from packaged databases, and a bunch of other fundamentally useful things, not just "code things". Computability makes up for many of the LLM's weaknesses and gives it autonomy to perform tasks independently.
On top of that, we can provide the documentation and examples in the skill that help the LLM execute computability effectively.
And if the LLM gets hung up on something while executing the skill, we can ask it why and then have it write better documentation or examples for a new skill version. So skills can self-improve.
It's still so early. We need better packaging, distribution, version control, sharing, composability.
But there's definitely something simple, elegant, and effective here.
Bloat has a new name and it's AI integration. You thought Chrome using GB per tab was bad, wait until you need a whole datacenter to use your coding environment.
Oops--you're absolutely right! I did--in fact--fail to remember not to kill the patient after you expressly told me not to.
So, to the AI sceptics, I say: have you tried my VBA program? If you haven't tested it on actual patients, how do you know it doesn't work? Don't allow your prejudice to stand in the way of progress: prescribe more mouse bites!
But perhaps an LLM could write an adapter that gets cached until something changes?
So companies are really trying to deliver value. This is the right pivot. If you gave me an AGI with a 100 IQ, that seems pretty much worthless in today’s world. But domain expertise - that I’ll take.
This remains an open problem for LLMs - we don't have true AGI benchmarks, and the LLMs are frequently learning the benchmark problems without necessarily getting that much better in the real world. Gemini 3 has been hailed precisely because it's delivered huge gains across the board that aren't overfitting to benchmarks.
AI companies have a high incentive to make scores go up. They may employ humans to write similar-to-benchmark training data to hack the benchmark (while not directly training on the test set).
Throwing your hard problems at work at an LLM is a better metric than benchmarks.
Not really. I have a set of disclosures on my blog here: https://simonwillison.net/about/#disclosures
I'm beginning to pick up a few more consulting opportunities based on my writing and my revenue from GitHub sponsors is healthy, but I'm not particularly financially invested in the success of AI as a product category.
The counter-incentive here is that my reputation and credibility is more valuable to me than early access to models.
This very post is an example of me taking a risk of annoying a company that I cover. I'm exposing the existence of the ChatGPT skills mechanism here (which I found out about from a tip on Twitter - it's not something I got given early access to via an NDA).
It's very possible OpenAI didn't want that story out there yet and aren't happy that it's sat at the top of Hacker News right now.
I'm not sure English is a bad way to outline what the system should do. It has tradeoffs. I'm not sure library functions are a 1:1 analogy either.
What is AGI? Artificial. General. Intelligence. Applying domain independent intelligence to solve problems expressed in fully general natural language.
It’s more than a pedantic point though. What people expect from AGI is the transformative capabilities that emerge from removing the human from the ideation-creation loop. How do you do that? By systematizing the knowledge work process and providing deterministic structure to agentic processes.
Which is exactly what these developments are doing.
I actually kind of love this comparison — it demonstrates the point that just like “human flight”, “true AGI” isn’t a single point in time, it’s a many-decade (multi-century?) process of refinement and evolution.
Scholars a millennium from now will be debating when each of these was actually "truly" achieved.
To me, we have both achieved and not achieved human flight. Can humans themselves fly? No. Can people fly in planes across continents? Yes.
But, does it really matter if it counts as “human flight” if we can get from point A to point B faster? You’re right - this is an argument that will last ages.
It’s a great turn of phrase to describe AGI.
Even if this is true, which I disagree with, it simply creates a new bar: AGCI. Artificial Generally Correct Intelligence
Because right now it is more like Randomly Correct.
If they did calculations as sloppily as AI currently produces information, they would not be as useful
Here's the thing, I get it, and it's easy to argue for this and difficult to argue against it. BUT
It's not intelligent. It just is not. It's tremendously useful and I'd forgive someone for thinking the intelligence is real, but it's not.
Perhaps it's just a poor choice of words. What a LOT of people really mean would go along the lines more like Synthetic Intelligence.
That is, however difficult it might be to define, REAL intelligence that was made, not born.
Transformer and Diffusion models aren't intelligent, they're just very well trained statistical models. We actually (metaphorically) have a million monkeys at a million typewriters for a million years creating Shakespeare.
My efforts manipulating LLMs into doing what I want are pretty darn convincing that I'm cajoling a statistical model and not interacting with an intelligence.
A lot of people won't be convinced that there's a difference; it's hard to do when I'm saying it might not be possible to have a definition of "intelligence" that is satisfactory and testable.
Can ChatGPT solve problems? It is trivial to see that it can. Ask it to sort a list of numbers, or debug a piece of segfaulting code. You and I both know that it can do that, without being explicitly trained or modified to handle that problem, other than the prompt/context (which is itself natural language that can express any problem, hence generality).
What you are sneaking into this discussion is the notion of human-equivalence. Is GPT smarter than you? Or smarter than some average human?
I don’t think the answer to this is as clear-cut. I’ve been using LLMs on my work daily for a year now, and I have seen incredible moments of brilliance as well as boneheaded failure. There are academic papers being released where AIs are being credited with key insights. So they are definitely not limited to remixing their training set.
The problem with the “AI are just statistical predictors, not real intelligence” argument is what happens when you turn it around and analyze your own neurons. You will find that to the best of our models, you are also just a statistical prediction machine. Different architecture, but not fundamentally different in class from an LLM. And indeed, a lot of psychological mistakes and biases start making sense when you analyze them from the perspective of a human being like an LLM.
But again, you need to define “real intelligence” because no, it is not at all obvious what that phrase means when you use it. The technical definitions of intelligence that have been used in the past, have been met by LLMs and other AI architectures.
I think there’s a set of people whose axioms include ‘I’m not a computer and I’m not statistical’ - if that’s your ground truth, you can’t be convinced without shattering your world view.
Let's put it this way: language written or spoken, art, music, whatever... a primary purpose of these things is to be a sort of serialization protocol to communicate thought states between minds. When I say I struggle to come to a definition, I mean I think these tools are inadequate to do it.
I have two assertions:
1) A definition in English isn't possible
2) Concepts can exist even when a particular language cannot express them
It isn't, as these are how stakeholders convey needs to those charged with satisfying same (a.k.a. "requirements"). Where expectations become unrealistic is believing language models can somehow "understand" those outlines as if a human expert were doing so in order to produce an equivalent work product.
Language models can produce nondeterministic results based on the statistical model derived from their training data set(s), with varying degrees of relevance as determined by persons interpreting the generated content.
They do not understand "what the system should do."
Precisely my point:
> You can say they don't understand, but I'm sitting here with Nano Banana Pro creating infographics, and it's doing as good of a job as my human designer does with the same kinds of instructions. Does it matter if that's understanding or not?

Understanding, when used in its unqualified form, implies people possessing same. As such, it is a metaphysical property unique to people and defined wholly therein.
Excel "understands" well-formed spreadsheets by performing specified calculations. But who defines those spreadsheets? And who determines the result to be "right?"
Nano Banana Pro "understands" instructions to generate images. But who defines those instructions? And who determines the result to be "right?"
"They" do not understand.
You do.
And generally the point is that it does not matter whether we call what they do "understanding" or not. It will have the same kind of consequences in the end, economic and otherwise.
This is basically the number one hangup that people have about AI systems, all the way back since Turing's time.
The consequences will come from AI's ability to produce certain types of artifacts and perform certain types of transformations of bits. That's all we need for all the scifi stuff to happen. Turing realized this very quickly, and his famous Turing test is exactly about making this point. It's not an engineering kind of test. It's a thought experiment trying to prove that it does not matter whether it's just "simulated understanding". A simulated cake is useless, I can't eat it. But simulated understanding can have real world effects of the exact same sort as real understanding.
I understand the general use of the phrase and used same as an entryway to broach a deeper discussion regarding "understanding."
> And generally the point is that it does not matter whether we call what they do "understanding" or not. It will have the same kind of consequences in the end, economic and otherwise.
To me, when the stakes are significant enough to already see the economic impacts of this technology, it is important for people to know where understanding resides. It exists exclusively within oneself.
> A simulated cake is useless, I can't eat it. But simulated understanding can have real world effects of the exact same sort as real understanding.
I agree with you in part. Simulated understanding absolutely can have real world effects when it is presented and accepted as real understanding. When simulated understanding is known to be unrelated to real understanding and treated as such, its impact can be mitigated. To wit, few believe parrots understand the sounds they reproduce.
African grey parrots do understand the words they use; they don't merely reproduce them. Once mature they have the intelligence (and temperament) of a 4- to 6-year-old child.
There's a good chance of that.
> African grey parrots do understand the words they use; they don't merely reproduce them. Once mature they have the intelligence (and temperament) of a 4- to 6-year-old child.
I did not realize I could discuss with an African grey parrot the shared experience of how difficult it was to learn how to tie my shoelaces and what the feeling was like to go to a place every day (school) which was not my home.
I stand corrected.
> You can, of course, define understanding as a metaphysical property that only people have.
This is not what I said.
What I said was that unqualified use of "understanding" implies the understanding people possess. Thus it is, by definition, a metaphysical property existing strictly within a person.
Many other entities possess their own form of understanding. Most would agree mammals do. Some would say any living creature does.
I would make the case that every program compiler (C, C#, C++, D, Java, Kotlin, Pascal, etc.) possesses understanding of a particular sort.
All of the aforementioned examples differ from the kind of understanding people possess.
https://simstek.fandom.com/wiki/SimAntics
Just saw your profile and it reminded me of a book my mentor bequeathed to me, which we both referred to as "the real blue book" [0]. Thanks for bringing back fond memories.

[0] https://www.goodreads.com/book/show/2297758.Starting_FORTH
So basically your thesis is also your assumption.
When I ask Claude Code to "look for bugs in my code and list issues ranked by severity and confidence" and it does just that, you'll have to elaborate on how this is excluded from your definition of understanding.
Human language is imprecise and allows unclear and logically contradictory things, besides not being checkable. That's literally why we have formal languages and programming languages, and why things like COBOL failed: https://alexalejandre.com/languages/end-of-programming-langs...
Most languages do.
"x = true, x = false"
What does that mean? It's unclear. It looks contradictory.
Human language allows for clarification to be sought and adjustments made.
> besides not being checkable.
It's very checkable. I check claims and assertions people make all the time.
> That's literally why we have formal languages,
"Formal languages" are at some point specified and defined by human language.
Human language can be as precise, clear, and logical as a speaker intends. All the way to specifying "formal" systems.
> programming languages and things like COBOL failed: https://alexalejandre.com/languages/end-of-programming-langs...
https://pauseai.info/pdoom
Top HN comments sometimes read like a random generator:
return random_criticism_of_ai_companies() + " " + unrelated_trivia_fact()
Instead, we're getting a clear division of labor where the most sensitive agentic behavior is reserved for humans and the AIs become a form of cognitive augmentation of human agency. This is always the most likely outcome and the best we can hope for, as it precludes dangerous types of AI from emerging.
Takeoff is here, human-in-the-loop assisted for now… hopefully for much longer.
Is the technology continuing to be more applicable?
Is the way the technology is continuing to be more applicable leading to frameworks of usage that could lead to the next leap? :)
Some frameworks/languages move really fast unfortunately.
The clever part is that the markdown file has a section in it like this: https://github.com/datasette/skill/blob/a63d8a2ddac9db8225ee...
On startup, Claude Code / Codex CLI etc. scan all available skills folders and extract just those descriptions into the context. Then, if you ask them to do something that's covered by a skill, they read the rest of that markdown file on demand before going ahead with the task. The models are really good at driving those environments now, which makes skills the right idea at the right time.
But yes. Other agent platforms will adopt this pattern.
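As a rough sketch of the shape being described, assuming the frontmatter convention Anthropic's skills use (a `name` plus a `description`); the linked datasette skill may differ in its details:

```markdown
---
name: example-skill
description: One or two sentences the agent always sees, saying what this skill does and when to use it.
---

# Example skill

Everything below the frontmatter is the body. It stays on disk and is only
read on demand, once the agent decides the description matches the task.

## Steps
1. ...
2. ...
```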
I find it powerful how it can leverage a CLI and self-discover the best way to use it and its parameters to achieve its goals.
It feels more powerful than providing a pre-defined set of functions via MCP, which has less flexibility than a CLI.
"Skills require the Code Execution Tool beta, which provides the secure environment they need to run."
https://claude.com/blog/skills
It is useful in a user-education sense to communicate that it's good to actively document useful procedures like this, and it is likely a performance / utilization boost that the models are tuned or prompt-steered toward discovering this stuff in a conventional location.
But honestly reading about skills mostly feels like reading:
> # LLM provider has adopted a new paradigm: prompts
> What's a prompt?
> You tell the LLM what you'd like to do, and it tries to do it. OR, you could ask the LLM a question and it will answer to the best of its ability.
Obviously I'm missing something.
Maybe I still don't understand the mechanics - this happens "on startup", every time a new conversation starts? Models go through the trouble of doing ls/cat/extraction of descriptions to bring into context? If so it's happening lightning fast and I somehow don't notice.
Why not just include those descriptions within some level of system prompt?
Reading a few dozen files takes on the order of a few ms. They add enough tokens per skill to fit the metadata description, so probably less than 100 for each skill.
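As a rough worked example under those numbers: a few dozen skills at under 100 tokens of metadata each comes to only a few thousand tokens of always-present context, a small fraction of a modern agent's context window.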
> The body can contain any Markdown; it is not injected into context.
Does that just mean it's not injected into the context until the skill is used, or that it's never injected into the context?
https://github.com/openai/codex/blob/main/docs/skills.md
I had thought that once the skill is selected the whole file would be read, but it looks like that's not the case: https://github.com/openai/codex/blob/ad7b9d63c326d5c92049abd...
So you could have a skill file that's thousands of lines long, but if the first part of the file provides an outline, Codex may stop reading at that point. Maybe you could have a skill that says "see migrations section further down if you need to alter the database table schema" or similar.

Reason I ask is because a while back I had similar sections in my CLAUDE.md and it would sometimes either acknowledge them and not use them, or just ignore them. I'm assuming that's more of an issue of too much context, and now skill-level files like this will reduce that effect?
Skills are nice because they offload all the detailed prompts to files that the LLM can ask for. It's getting even better with Anthropic's recent switchboard operator (tool search tool) that doesn't clutter the system prompt but tries to cut the tool list down to those the LLM will need.
There's an instruction about that in the Codex CLI skills prompt: https://simonwillison.net/2025/Dec/13/openai-codex-cli/
Can those markdown files in the references also, in turn, tell the model to lazily load more references only if the model deems them useful?
I don’t know what this is and Google isn’t finding anything. Can you clarify?
https://www.anthropic.com/engineering/advanced-tool-use talks more about the why
You can hack together a shell, Python, whatever script that fetches build results from your CI server, dumps them to stdout in a semi-structured format like markdown, then add a 10-15 line SKILL.md and you have the same functionality -- the skill just executes the one-off script and reads the output.
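A rough sketch of that pattern, assuming a hypothetical CI endpoint, token, and script path (swap in your own server's API); the SKILL.md just tells the agent when to run the script and what to do with its output:

```python
#!/usr/bin/env python3
"""Fetch recent build results from a CI server and print them as markdown.

CI_URL and CI_TOKEN are placeholders for your own CI server's API; the
response is assumed to be a JSON list of build objects.
"""
import json
import os
import urllib.request

CI_URL = os.environ.get("CI_URL", "https://ci.example.com/api/builds?limit=10")
CI_TOKEN = os.environ.get("CI_TOKEN", "")

req = urllib.request.Request(CI_URL, headers={"Authorization": f"Bearer {CI_TOKEN}"})
with urllib.request.urlopen(req) as resp:
    builds = json.load(resp)

# Dump a semi-structured markdown table the agent can read from stdout.
print("| build | branch | status |")
print("|-------|--------|--------|")
for b in builds:
    print(f"| {b.get('id')} | {b.get('branch')} | {b.get('status')} |")
```

```markdown
---
name: ci-results
description: Check recent CI build results. Use when the user asks whether the build is green or why it failed.
---

Run `python scripts/fetch_ci_results.py` and read the markdown table it
prints to stdout. Summarise any failures and relate them to recent commits.
```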
It’s straightforward for cloud services
Now SKILL.md can have references to more fine-grained behaviors or capabilities of our skill. My skills generally tend to have a references/{workflows,tools,standards,testing-guide,routing,api-integration}.md. These references are what then get "progressively loaded" into the context.
Say I asked claude to use the wireframe-skill to create profileView mockup. While creating the wireframe, claude will need to figure out what API endpoints are available/relevant for the profileView and the response types etc. It's at this point that claude reads the references/api-integration.md file from the wireframe skill.
After a while I found I didn't like the progressive loading so I usually direct claude to load all references in the skill before proceeding - this usually takes up maybe 20k to 30k tokens, but the accuracy and precision (imagined or otherwise ha!) is worth it for my use cases.
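Roughly, the layout being described looks something like this (assuming Claude Code's project-level skills directory; the reference file names are the commenter's examples, not a required structure):

```
.claude/skills/wireframe-skill/
├── SKILL.md                   # frontmatter + high-level instructions
└── references/
    ├── workflows.md
    ├── tools.md
    ├── standards.md
    ├── testing-guide.md
    ├── routing.md
    └── api-integration.md     # read only when API details are needed
```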
You shouldn't do this, it's generally considered bad practice.
You should be optimizing your skill description. Oftentimes, if I am working with Claude Code and it doesn't load a skill, I ask it why it missed the skill. It will guide me to improving the skill description so that it is picked up properly next time.
This iteration on skill descriptions has, for me so far, allowed skills to stay out of context until they are needed, rather predictably.
Maybe they get compacted out of the context.
But you can call upon them manually. I often do something like “using your Image Manipulation skill, make the icons from image.png”
Or “use your web design skill to create a design for the front end”
Tbh i do like that.
I also get Claude to write its own skills. "Using what we learned from this task, write a skill document called /whatever/ using your writing-skills skill"
I have a GitHub template including my skills and commands, if you want to see them.
https://github.com/lawless-m/claude-skills
Just like you I don't edit much in these files on my own. Mostly I just ask the model to update an md file whenever I think we've figured out something new, so the learning sticks. I have files for test writing, backend route writing, db migration writing, frontend component writing, etc. Whenever a section gets too big to live in agents.md it gets its own file.
I have mine in a GitHub template so I can even use them in Claude Code for the web, and synchronise them across my various machines (which is about 6 machines atm).
But think of your dad or grandma using a generic agent, and simply selecting that they want to have certain skills available to it. Don't even think of it as a chat interface. This is just some option that they set in their phone assistant app. Or, rather, it may be that they actually selected "Determine the best skills based on context", and the assistant has "skill packs" which it periodically determines it needs to enable based on key moments in the conversation or latest interactions.
These are all workarounds for the problems of learning, memory...and, ultimately, limited context. But they for sure will be extremely useful.
So when it's time to commit, make sure you run these checks, write a good commit message, etc.
Debugging is especially useful since AI agents can often go off the rails and go into loops rewriting code - so it's in a skill where I can push for "read the log messages. Insert some more useful debug assertions to isolate the failure. Write some more unit tests that are more specific." Etc.
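A hedged sketch of how those debugging instructions could be captured in the same SKILL.md shape as above (the wording is illustrative, not the commenter's actual file):

```markdown
---
name: debugging
description: Systematic debugging workflow. Use when a test fails, the cause of a bug is unclear, or the agent starts looping on rewrites.
---

1. Read the log messages before changing any code.
2. Insert some more useful debug assertions to isolate the failure.
3. Write more specific unit tests around the suspected code path.
4. Only then propose a fix, and re-run the checks before committing.
```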
160 more comments available on Hacker News