Claude Skills
Posted 3 months ago · Active 3 months ago
anthropic.com · Tech story · High profile
calm · mixed
Debate: 60/100
Key topics
- AI
- LLMs
- Claude
- Anthropic
Anthropic released 'Claude Skills', a feature allowing users to create and manage repeatable instructions for Claude, sparking discussion on its utility, complexity, and potential overlap with existing features.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 42m after posting
- Peak period: 129 comments in 0-6h
- Avg / period: 20 comments
- Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
- 01 Story posted: Oct 16, 2025 at 12:05 PM EDT (3 months ago)
- 02 First comment: Oct 16, 2025 at 12:47 PM EDT (42m after posting)
- 03 Peak activity: 129 comments in 0-6h (hottest window of the conversation)
- 04 Latest activity: Oct 19, 2025 at 6:42 AM EDT (3 months ago)
ID: 45607117 · Type: story · Last synced: 11/22/2025, 11:17:55 PM
You too can win a jackpot by spinning the wheel just like these other anecdotal winners. Pay no attention to your dwindling credits every time you do though.
I have been using Claude Code to create and organize some, but they can have diminishing returns.
It’s not exactly wrong, but it leaves out a lot of intermediate steps.
This stuff is like front end devs building fad add-ons which call into those core elements and falsely market themselves as fundamental advancements.
Plugins include:
- Commands
- MCPs
- Subagents
- Now, Skills
Marketplaces aggregate plugins.
But context engineering is very much not going anywhere as a discipline. Bigger and better models will by no means make it obsolete. In fact, raw model capability is pretty clearly leveling off into the top of an S-curve, and most real-world performance gains over the last year have come precisely from innovations in how to better leverage context.
From a technical perspective, it seems like unnecessary complexity in a way. Of course, I recognize there are a lot of product decisions that seem to layer on 'unnecessary' abstractions but still have utility.
In terms of connecting with customers, it seems sensible, under the assumption that Anthropic is triaging customer feedback well and leading customers to where they want to go (even if they don't know it yet).
Update: a sibling comment just wrote something quite similar: "All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs." I think I agree.
Of course this is why the model providers keep shipping new ones; without them their product is a commodity.
This is where waiting for this stuff to stabilize/standardize, and then writing a "skill" based on an actual RFC or standard protocol, makes more sense, IMO. I've been burned too many times building vendor-locked chatbot extensions.
Not mine! I made a few when they first opened it up to devs, but I was trying to use Azure Logic Apps (something like that?) at the time which was supremely slow and finicky with F#, and an exercise in frustration.
Building a new one that works well is a project, but then it will scale up as much as you like.
This is bringing some of the advantages of software development to office tasks, but you give up some things like reliable, deterministic results.
On the one hand, AI doesn’t classify as labor in a traditional sense, even though some aspire to replace labor with AI.
On the other hand, if it classified as labor under some new definition, it isn’t free when you consider the external costs of outsourcing basic brain activity, as an individual and as a society.
"Equipping agents for the real world with Agent Skills" https://www.anthropic.com/engineering/equipping-agents-for-t...
Specifically, it looks like skills are a different structure than MCP, but overlap in what they provide? Skills seem to be just a markdown file and then scripts (instead of prompts & tool calls defined in MCP?).
Question I have is why would I use one over the other?
Skills can merge together like lego.
Agents might be more separated.
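To make the distinction concrete: going by Anthropic's announcement and the engineering post linked above, a skill is just a folder containing a SKILL.md whose frontmatter tells Claude when to pull the rest into context, with optional scripts alongside. The skill name, wording, and helper script in this sketch are illustrative, not taken from Anthropic's examples.

```markdown
---
name: jira-helper
description: Use when the user asks to create, update, or comment on Jira issues.
---

# Jira helper

Prefer the bundled script over ad-hoc curl calls:

1. Read the issue key from the user's request.
2. Run `scripts/add_comment.sh <ISSUE-KEY> "<comment text>"`.
3. Report the resulting comment URL back to the user.
```

Only the name and description sit in context up front; the body and any scripts get loaded when the task matches, which is the practical difference from an MCP server exposing tools over a protocol.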
http://github.com/ryancnelson/deli-gator I’d love any feedback
I've been emulating this in claude code by manually @tagging markdown files containing guides for common tasks in our repository. Nice to see that this step is now automatic as well.
https://github.com/anthropics/skills/blob/main/document-skil...
I was dealing with two issues this morning getting Claude to produce a .xlsx, both of which are covered in the doc above.
On the other hand, LLMs have a programmatic context with consistent storage and the ability to have perfect recall; they just don't always generate the expected output in practice, as the cost to go through ALL context is prohibitive in terms of power and time.
Skills, or really just context insertion, are simply a way to prioritize their output generation manually. LLM "thinking mode" is the same, for what it's worth - it really is just reprioritizing context - so not "starting from scratch" per se.
When you start thinking about it that way, it makes sense - and it helps using these tools more effectively too.
I’d been re-teaching Claude to craft REST API calls with curl every morning for months before I realized that skills would let me delegate that to cheaper models, re-use cached-token queries, and save my context window for my actual problem-space CONTEXT.
what the fuck, there is absolutely no way this was cheaper or more productive than just learning to use curl and writing curl calls yourself. Curl isn't even hard! And if you learn to use it, you get WAY better at working with HTTP!
You're kneecapping yourself to expend more effort than it would take to just write the calls, helping to train a bot to do the job you should be doing
You are bad at reading comprehension. My comment meant I can tell Claude "update Jira with that test outcome in a comment", and Claude can eventually figure that out with just a key and curl, but that's way too low level.
What I linked to literally explains that, with code and a blog post.
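For anyone unfamiliar with the low-level call being delegated here, it is roughly the following; the site URL, issue key, and credentials are placeholders, and this targets the Jira Cloud REST API v2 comment endpoint rather than anything specific from the linked repo.

```bash
# Illustrative only: post a test outcome as a comment on a Jira issue.
# JIRA_USER / JIRA_TOKEN and the site URL are placeholders.
curl -sS -X POST \
  -u "$JIRA_USER:$JIRA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"body": "Integration tests passed on build 1234."}' \
  "https://your-site.atlassian.net/rest/api/2/issue/PROJ-123/comment"
```

Wrapping a call like this in a skill means the main conversation only ever sees "update Jira with that test outcome", not the curl plumbing.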
Not really. It's a consequential issue. No matter how big or small the context window is, LLMs simply do not have the concept of goals and consequences. Thus, it's difficult for them to acquire dynamic and evolving "skills" like humans do.
Which is precisely why Richard Sutton doesn't think LLMs will evolve to AGI[0]. LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).
More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence. So, to your point, the idea of a "skill" is more akin to a reference manual, than it is a skill building exercise that can be applied to developing an instrument, task, solution, etc.
[0] https://www.youtube.com/watch?v=21EYKqUsPfg
> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.
Citation?
And I associate that part to AGI being able to do cutting edge research and explore new ideas like humans can. Where, when that seems to “happen” with LLMs it’s been more debatable. (e.g. there was an existing paper that the LLM was able to tap into)
I guess another example would be to get an AGI doing RL in realtime to get really good at a video game with completely different mechanics in the same way a human could. Today, that wouldn’t really happen unless it was able to pre-train on something similar.
He is right that non-RL'd LLMs are just mimicry, but the field already moved beyond that.
But this is easier said than done. Current models require vastly more learning events than humans, making direct supervision infeasible. One strategy is to train models on human supervisors, so they can bear the bulk of the supervision. This is tricky, but has proven more effective than direct supervision.
But, in my experience, AIs don't specifically struggle with the "qualitative" side of things per se. In fact, they're great at things like word choice, color theory, etc. Rather, they struggle to understand continuity and consequence, and to combine disparate sources of input. They also suck at differentiating fact from fabrication. To speculate wildly, it feels like they're missing the RL of living in the "real world". In order to eat, sleep, and breathe, you must operate within the bounds of physics and society and live forever with the consequences of an ever-growing history of choices.
Which eventually forces you to take a step back and start questioning basic assumptions until (hopefully) you get a spark of realization of the flaws in your original plan, and then recalibrate based on that new understanding and tackle it totally differently.
But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was corrupted mysteriously and therefore it needs to wipe everything node related and manually rebuild the project config by vague memory.
Because no big deal, if it’s wrong it’s the human's problem to untangle and Anthropic gets paid either way so why not try?
In fairness, I have on many an occasion worked with real-life software developers who really should know better, deciding the problem lies anywhere but in their initial model of how this should work. Quite often that developer has been me, although I like to hope I've learned to be more skeptical when that thought crosses my mind now.
Which is why I think the parent post had a great observation about human problem solving having evolved in a universe inherently formed by the additive effect of every previous decision you've ever made in your life.
There's a lot of variance in humans, sure, but inescapable stakes/skin in the game from an instinctual understanding that you can't just revert to a previous checkpoint any time you screw up. That world model of decisions and consequences helps ground abstract problem solving ability with a healthy amount of risk aversion and caution that LLMs lack.
While we might agree that language is foundational to what it is to be human, it's myopic to think it's the only thing. LLMs are based on training sets of language (period).
Coding is an interesting example because as we change levels of abstraction from the syntax of a specific function to, say, the architecture of a software system, the ability to measure verifiable correctness declines. As a result, RL-tuned LLMs are better at creating syntactically correct functions but struggle as the abstraction layer increases.
In other fields, it is very difficult to verify correctness. What is good art? Here, LLMs and their ilk can still produce good output, but it becomes hard to produce "superhuman" output, because in nonverifiable domains their capability is dependent on mimicry; it is RL that gives the AI the ability to perform at superhuman levels. With RL, rather than merely fitting its parameters to a set of extant data it can follow the scent of a ground truth signal of excellence. No scent, no outperformance.
While that might be true, it fundamentally means it's never going to replicate humans or provide superintelligence.
Many people would argue that's a good thing
https://cursor.com/blog/tab-rl
At the very end of an extremely long and sophisticated process, the final mapping is softmax transformed and the distribution sampled. That is one operation among hundreds of billions leading up to it.
It’s like saying a Jeopardy player is a random word generating machine — they see a question and generate "what is" followed by a random word — random because there is some uncertainty in their mind even in the final moment. That is technically true, but incomplete, and entirely misses the point.
That might be true, but we're talking about the fundamentals of the concept. His argument is that you're never going to reach AGI/superintelligence on an evolution of the current concepts (mimicry), even through fine tuning and adaptations - it'll likely be different (and likely based on some RL technique). At least we have NO history to suggest this will be the case (hence his argument for "the bitter lesson").
You may disagree with this take, but it's not uninformed. Many LLMs use self-supervised pretraining followed by RL-based fine-tuning, but that's essentially it - it's fine tuning.
Also how do you think the most successful RL models have worked? AlphaGo/AlphaZero both use Neural Networks for their policy and value networks which are the central mechanism of those models.
ChatGPT broke open the dam to massive budgets for AI/ML, and LLMs will probably be a puzzle piece of AGI. But otherwise?
I mean, it should be clear that we have so much work to do, like RL (which, by the way, now happens at massive scale because you thumb up or down every day), thinking, Mixture of Experts, tool calling and, super super critical: architecture.
Compute is a hard upper limit too.
And the math isn't done either. Context-length performance has advanced, and we've also seen other approaches like diffusion-based models.
Whenever you hear the leading experts talking, they mention world models.
We are still in a phase where we have plenty of very obvious ideas people need to try out.
But the quality of Whisper alone, LLMs as an interface, and tool calling can solve problems in robotics and elsewhere that no one was able to solve this easily before.
Not OP, but this is the part that I take issue with. I want to forget what tools are there and have the LLM figure out on its own which tool to use. Having to remember to add special words to encourage it to use specific tools (required a lot of the time, especially with esoteric tools) is annoying. I’m not saying this renders the whole thing “useless” because it’s good to have some idea of what you’re doing to guide the LLM anyway, but I wish it could do better here.
Ooh, it does call make when I ask it to compile, and is able to call a couple of other popular tools without having to refer to them by name. If I ask it to resize an image, it'll call ImageMagick, or run ffmpeg, and I don't need to refer to ffmpeg by name.
so at the end of the day, it seems they are their training data, so better write a popular blog post about your one-off MCP and the tools it exposes, and maybe the next version of the LLM will have your blog post in the training data and will automatically know how to use it without having to be told
I installed ImageMagick on Windows.
Created a ".claude/skills/Image Files/" folder
Put an empty SKILLS.md file in it
and told Claude Code to fill in the SKILLS.md file itself with the path to the binaries.
and it created all the instructions itself including examples and troubleshooting
and in my project prompted
"@image.png is my base icon file, create all the .ico files for this project using your image skill"
and it all went smoothly
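Under the hood, a skill like that presumably reduces to an ImageMagick one-liner along these lines; the file names and icon sizes are illustrative.

```bash
# Illustrative: pack several resolutions of a base PNG into a single .ico.
magick image.png -define icon:auto-resize=256,128,64,48,32,16 app.ico
```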
You probably mean "starting from square one" but yeah I get you
The description is equivalent to your short term memory.
The skill is like your long term memory which is retrieved if needed.
These should both be considered as part of the AI agent. Not external things.
Of course OpenAI and Anthropic want to be able to reuse the same servers/memory for multiple users, otherwise it would be too expensive.
Could we have "personal" single-tenant setups? Where the LLM incorporates every previous conversation?
For folks for whom this seems elusive, it's worth learning how the internals actually work; it helps a great deal in how to structure things in general, and then over time, as the parent comment said, for specific individual cases.
This phase of LLM product development feels a bit like the Tower of Babel days with Cloud services before wrapper tools became popular and more standardization happened.
I noticed the general tendency for overlap also when trying to update claude since 3+ methods conflicted with each other (brew, curl, npm, bun, vscode).
Might this be the handwriting of AI? ;)
For coding in particular, it would be super-nice if they could just live in a standard location in the repo.
> You can also manually install skills by adding them to ~/.claude/skills.
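Putting the paths mentioned in this thread together, the layout looks something like the sketch below; the personal location comes from the quote above and the project-level one from the ImageMagick anecdote earlier, while the skill names themselves are made up.

```text
# Personal skills, available across projects (quoted above):
~/.claude/skills/
  image-files/
    SKILL.md

# Project-level skills, kept alongside the repo (as in the ImageMagick example):
<repo>/.claude/skills/
  xlsx-reports/
    SKILL.md
    scripts/build_report.py
```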
The idea is interesting and something I shall consider for our platform as well.
Yes, this can only end well.
We used to call that a programming language. Here, they are presumably repeatable instructions how to generate stolen code or stolen procedures so users have to think even less or not at all.
You get the best of both worlds if you can select tokens by problem rather than by folder.
The key question is how effective this will be with tool calling.
267 more comments available on Hacker News