Claude Skills
Posted 3 months ago · Active 3 months ago
anthropic.com · Tech story · High profile
calm · mixed
Debate: 60/100
Key topics
- AI
- LLMs
- Claude
- Anthropic
Anthropic released 'Claude Skills', a feature allowing users to create and manage repeatable instructions for Claude, sparking discussion on its utility, complexity, and potential overlap with existing features.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 42m after posting
- Peak period: 129 comments in 0-6h
- Avg / period: 20 comments
- Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
- 01 Story posted: Oct 16, 2025 at 12:05 PM EDT (3 months ago)
- 02 First comment: Oct 16, 2025 at 12:47 PM EDT (42m after posting)
- 03 Peak activity: 129 comments in 0-6h (hottest window of the conversation)
- 04 Latest activity: Oct 19, 2025 at 6:42 AM EDT (3 months ago)
ID: 45607117 · Type: story · Last synced: 11/22/2025, 11:17:55 PM
You too can win a jackpot by spinning the wheel just like these other anecdotal winners. Pay no attention to your dwindling credits every time you do though.
I have been using Claude Code to create and organize some, but they can have diminishing returns.
It’s not exactly wrong, but it leaves out a lot of intermediate steps.
This stuff is like front end devs building fad add-ons which call into those core elements and falsely market themselves as fundamental advancements.
Plugins include:
- Commands
- MCPs
- Subagents
- Now, Skills
Marketplaces aggregate plugins.
But context engineering is very much not going anywhere as a discipline. Bigger and better models will by no means make it obsolete. In fact, raw model capability is pretty clearly leveling off into the top of an S-curve, and most real-world performance gains over the last year have come precisely from innovations in how to better leverage context.
From a technical perspective, it seems like unnecessary complexity in a way. Of course, I recognize there are a lot of product decisions that seem to layer on 'unnecessary' abstractions but still have utility.
In terms of connecting with customers, it seems sensible, under the assumption that Anthropic is triaging customer feedback well and leading customers to where they want to go (even if they don't know it yet).
Update: a sibling comment just wrote something quite similar: "All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs." I think I agree.
Of course this is why the model providers keep shipping new ones; without them their product is a commodity.
This is where waiting for this stuff to stabilize/standardize, and then writing a "skill" based on an actual RFC or standard protocol, makes more sense, IMO. I've been burned too many times building vendor-locked chatbot extensions.
Not mine! I made a few when they first opened it up to devs, but I was trying to use Azure Logic Apps (something like that?) at the time which was supremely slow and finicky with F#, and an exercise in frustration.
Building a new one that works well is a project, but then it will scale up as much as you like.
This is bringing some of the advantages of software development to office tasks, but you give up some things like reliable, deterministic results.
On the one hand, AI doesn’t classify as labor in a traditional sense, even though some aspire to replace labor with AI.
On the other hand, if it classified as labor under some new definition, it isn’t free when you consider the external costs of outsourcing basic brain activity, as an individual and as a society.
"Equipping agents for the real world with Agent Skills" https://www.anthropic.com/engineering/equipping-agents-for-t...
Specifically, it looks like skills are a different structure than MCP, but overlap in what they provide? Skills seem to be just a markdown file and then scripts (instead of prompts & tool calls defined in MCP?).
Question I have is why would I use one over the other?
Skills can merge together like lego.
Agents might be more separated.
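To make the distinction concrete: going by Anthropic's announcement and the engineering post linked above, a skill is just a folder containing a SKILL.md whose frontmatter tells Claude when to pull the rest into context, with optional scripts alongside. The skill name, wording, and helper script in this sketch are illustrative, not taken from Anthropic's examples.

```markdown
---
name: jira-helper
description: Use when the user asks to create, update, or comment on Jira issues.
---

# Jira helper

Prefer the bundled script over ad-hoc curl calls:

1. Read the issue key from the user's request.
2. Run `scripts/add_comment.sh <ISSUE-KEY> "<comment text>"`.
3. Report the resulting comment URL back to the user.
```

Only the name and description sit in context up front; the body and any scripts get loaded when the task matches, which is the practical difference from an MCP server exposing tools over a protocol.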
http://github.com/ryancnelson/deli-gator I’d love any feedback
I've been emulating this in claude code by manually @tagging markdown files containing guides for common tasks in our repository. Nice to see that this step is now automatic as well.
https://github.com/anthropics/skills/blob/main/document-skil...
I was dealing with two issues this morning getting Claude to produce a .xlsx, both of which are covered in the doc above.
On the other hand, LLMs have a programmatic context with consistent storage and the ability to have perfect recall; they just don't always generate the expected output in practice, as the cost to go through ALL context is prohibitive in terms of power and time.
Skills, or really just context insertion, are simply a way to prioritize their output generation manually. LLM "thinking mode" is the same, for what it's worth - it really is just reprioritizing context - so not "starting from scratch" per se.
When you start thinking about it that way, it makes sense - and it helps using these tools more effectively too.
I’d been re-teaching Claude to craft REST API calls with curl every morning for months before I realized that skills would let me delegate that to cheaper models, re-use cached-token queries, and save my context window for my actual problem-space CONTEXT.
what the fuck, there is absolutely no way this was cheaper or more productive than just learning to use curl and writing curl calls yourself. Curl isn't even hard! And if you learn to use it, you get WAY better at working with HTTP!
You're kneecapping yourself to expend more effort than it would take to just write the calls, helping to train a bot to do the job you should be doing
You are bad at reading comprehension. My comment meant I can tell Claude "update Jira with that test outcome in a comment", and Claude can eventually figure that out with just a key and curl, but that's way too low level.
What I linked to literally explains that, with code and a blog post.
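For anyone unfamiliar with the low-level call being delegated here, it is roughly the following; the site URL, issue key, and credentials are placeholders, and this targets the Jira Cloud REST API v2 comment endpoint rather than anything specific from the linked repo.

```bash
# Illustrative only: post a test outcome as a comment on a Jira issue.
# JIRA_USER / JIRA_TOKEN and the site URL are placeholders.
curl -sS -X POST \
  -u "$JIRA_USER:$JIRA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"body": "Integration tests passed on build 1234."}' \
  "https://your-site.atlassian.net/rest/api/2/issue/PROJ-123/comment"
```

Wrapping a call like this in a skill means the main conversation only ever sees "update Jira with that test outcome", not the curl plumbing.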
Not really. It's a consequential issue. No matter how big or small the context window is, LLMs simply do not have the concept of goals and consequences. Thus, it's difficult for them to acquire dynamic and evolving "skills" like humans do.
Which is precisely why Richard Sutton doesn't think LLMs will evolve to AGI[0]. LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).
More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence. So, to your point, the idea of a "skill" is more akin to a reference manual, than it is a skill building exercise that can be applied to developing an instrument, task, solution, etc.
[0] https://www.youtube.com/watch?v=21EYKqUsPfg
> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.
Citation?
And I associate that part to AGI being able to do cutting edge research and explore new ideas like humans can. Where, when that seems to “happen” with LLMs it’s been more debatable. (e.g. there was an existing paper that the LLM was able to tap into)
I guess another example would be to get an AGI doing RL in realtime to get really good at a video game with completely different mechanics in the same way a human could. Today, that wouldn’t really happen unless it was able to pre-train on something similar.
He is right that non-RL'd LLMs are just mimicry, but the field already moved beyond that.
But this is easier said than done. Current models require vastly more learning events than humans, making direct supervision infeasible. One strategy is to train models on human supervisors, so they can bear the bulk of the supervision. This is tricky, but has proven more effective than direct supervision.
But, in my experience, AIs don't specifically struggle with the "qualitative" side of things per se. In fact, they're great at things like word choice, color theory, etc. Rather, they struggle to understand continuity and consequence, and to combine disparate sources of input. They also suck at differentiating fact from fabrication. To speculate wildly, it feels like they're missing the RL of living in the "real world". In order to eat, sleep, and breathe, you must operate within the bounds of physics and society and live forever with the consequences of an ever-growing history of choices.
Which eventually forces you to take a step back and start questioning basic assumptions until (hopefully) you get a spark of realization of the flaws in your original plan, and then recalibrate based on that new understanding and tackle it totally differently.
But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was corrupted mysteriously and therefore it needs to wipe everything node related and manually rebuild the project config by vague memory.
Because no big deal, if it’s wrong it’s the human's problem to untangle and Anthropic gets paid either way so why not try?
In fairness, I have on many an occasion worked with real-life software developers who really should know better, deciding the problem lies anywhere but in their initial model of how this should work. Quite often that developer has been me, although I like to hope I've learned to be more skeptical when that thought crosses my mind now.
Which is why I think the parent post had a great observation about human problem solving having evolved in a universe inherently formed by the additive effect of every previous decision you've ever made in your life.
There's a lot of variance in humans, sure, but inescapable stakes/skin in the game from an instinctual understanding that you can't just revert to a previous checkpoint any time you screw up. That world model of decisions and consequences helps ground abstract problem solving ability with a healthy amount of risk aversion and caution that LLMs lack.
While we might agree that language is foundational to what it is to be human, it's myopic to think it's the only thing. LLMs are based on training sets of language (period).
Coding is an interesting example because as we change levels of abstraction from the syntax of a specific function to, say, the architecture of a software system, the ability to measure verifiable correctness declines. As a result, RL-tuned LLMs are better at creating syntactically correct functions but struggle as the abstraction layer increases.
In other fields, it is very difficult to verify correctness. What is good art? Here, LLMs and their ilk can still produce good output, but it becomes hard to produce "superhuman" output, because in nonverifiable domains their capability is dependent on mimicry; it is RL that gives the AI the ability to perform at superhuman levels. With RL, rather than merely fitting its parameters to a set of extant data it can follow the scent of a ground truth signal of excellence. No scent, no outperformance.
While that might be true, it fundamentally means it's never going to replicate humans or provide superintelligence.
Many people would argue that's a good thing
https://cursor.com/blog/tab-rl
At the very end of an extremely long and sophisticated process, the final mapping is softmax transformed and the distribution sampled. That is one operation among hundreds of billions leading up to it.
It’s like saying a Jeopardy player is a random word generating machine — they see a question and generate "what is" followed by a random word — random because there is some uncertainty in their mind even in the final moment. That is technically true, but incomplete, and entirely misses the point.
That might be true, but we're talking about the fundamentals of the concept. His argument is that you're never going to reach AGI/superintelligence on an evolution of the current concepts (mimicry), even through fine tuning and adaptations - it'll likely be different (and likely based on some RL technique). At least we have NO history to suggest this will be the case (hence his argument for "the bitter lesson").
You may disagree with this take, but it's not uninformed. Many LLMs use self-supervised pretraining followed by RL-based fine-tuning, but that's essentially it - it's fine tuning.
Also how do you think the most successful RL models have worked? AlphaGo/AlphaZero both use Neural Networks for their policy and value networks which are the central mechanism of those models.
ChatGPT broke open the dam to massive budgets for AI/ML, and LLMs will probably be a puzzle piece of AGI. But otherwise?
I mean, it should be clear that we have so much work to do, like RL (which, by the way, now happens at massive scale because you thumb up or down every day), thinking, Mixture of Experts, tool calling and, super super critical: architecture.
Compute is a hard upper limit too.
And the math isn't done either. Context-length performance has advanced, and we've also seen other approaches like diffusion-based models.
Whenever you hear the leading experts talking, they mention world models.
We are still in a phase where we have plenty of very obvious ideas people need to try out.
But the quality of Whisper alone, LLMs as an interface, and tool calling can solve problems in robotics and elsewhere that no one was able to solve this easily before.
Not OP, but this is the part that I take issue with. I want to forget what tools are there and have the LLM figure out on its own which tool to use. Having to remember to add special words to encourage it to use specific tools (required a lot of the time, especially with esoteric tools) is annoying. I’m not saying this renders the whole thing “useless” because it’s good to have some idea of what you’re doing to guide the LLM anyway, but I wish it could do better here.
Ooh, it does call make when I ask it to compile, and is able to call a couple of other popular tools without having to refer to them by name. If I ask it to resize an image, it'll call ImageMagick, or run ffmpeg, and I don't need to refer to ffmpeg by name.
so at the end of the day, it seems they are their training data, so better write a popular blog post about your one-off MCP and the tools it exposes, and maybe the next version of the LLM will have your blog post in the training data and will automatically know how to use it without having to be told
I installed ImageMagick on Windows.
Created a ".claude/skills/Image Files/" folder
Put an empty SKILLS.md file in it
and told Claude Code to fill in the SKILLS.md file itself with the path to the binaries.
and it created all the instructions itself including examples and troubleshooting
and in my project prompted
"@image.png is my base icon file, create all the .ico files for this project using your image skill"
and it all went smoothly
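Under the hood, a skill like that presumably reduces to an ImageMagick one-liner along these lines; the file names and icon sizes are illustrative.

```bash
# Illustrative: pack several resolutions of a base PNG into a single .ico.
magick image.png -define icon:auto-resize=256,128,64,48,32,16 app.ico
```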
You probably mean "starting from square one" but yeah I get you
The description is equivalent to your short term memory.
The skill is like your long term memory which is retrieved if needed.
These should both be considered as part of the AI agent. Not external things.
Of course OpenAI and Anthropic want to be able to reuse the same servers/memory for multiple users, otherwise it would be too expensive.
Could we have "personal" single-tenant setups? Where the LLM incorporates every previous conversation?
For folks for whom this seems elusive, it's worth learning how the internals actually work; it helps a great deal in how to structure things in general, and then over time, as the parent comment said, for specific individual cases.
This phase of LLM product development feels a bit like the Tower of Babel days with Cloud services before wrapper tools became popular and more standardization happened.
I noticed the general tendency for overlap also when trying to update claude since 3+ methods conflicted with each other (brew, curl, npm, bun, vscode).
Might this be the handwriting of AI? ;)
For coding in particular, it would be super-nice if they could just live in a standard location in the repo.
> You can also manually install skills by adding them to ~/.claude/skills.
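Putting the paths mentioned in this thread together, the layout looks something like the sketch below; the personal location comes from the quote above and the project-level one from the ImageMagick anecdote earlier, while the skill names themselves are made up.

```text
# Personal skills, available across projects (quoted above):
~/.claude/skills/
  image-files/
    SKILL.md

# Project-level skills, kept alongside the repo (as in the ImageMagick example):
<repo>/.claude/skills/
  xlsx-reports/
    SKILL.md
    scripts/build_report.py
```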
The idea is interesting and something I shall consider for our platform as well.
Yes, this can only end well.
We used to call that a programming language. Here, they are presumably repeatable instructions how to generate stolen code or stolen procedures so users have to think even less or not at all.
You get the best of both worlds if you can select tokens by problem rather than by folder.
The key question is how effective this will be with tool calling.
267 more comments available on Hacker News