Toolkit to Help You Get Started with Spec-Driven Development
Posted 2 months ago · Active 2 months ago
github.com · Tech · story
skeptical · mixed · Debate · 80/100
Key topics
- Spec-Driven Development
- AI-Assisted Coding
- Software Development Methodologies
GitHub released Spec-Kit, a toolkit for Spec-Driven Development, sparking debate on the role of AI in coding and the viability of spec-driven approaches.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 6d after posting
- Peak period: 31 comments in the 144-156h window
- Average per period: 8.4 comments
- Comment distribution: 42 data points, based on 42 loaded comments (chart omitted)
Key moments
1. Story posted: Nov 3, 2025 at 7:48 AM EST (2 months ago)
2. First comment: Nov 9, 2025 at 6:04 AM EST (6d after posting)
3. Peak activity: 31 comments in the 144-156h window, the hottest stretch of the conversation
4. Latest activity: Nov 11, 2025 at 2:52 AM EST (2 months ago)
ID: 45798473 · Type: story · Last synced: 11/20/2025, 4:44:33 PM
Want the full context?
Read the primary article or dive into the live Hacker News thread when you're ready.
https://github.com/github/spec-kit/tree/main?tab=readme-ov-f...
If you want to go all in on specs, you must fully commit to allowing the AI to regenerate the codebase from scratch at any point. I'm an AI optimist, but this is a laughable stance with current tools.
That said, the idea of operating on the codebase as a mutable, complex entity, at arm's length, makes a TON of sense to me. I love touching and feeling the code, but as soon as there's 1) schedule pressure and 2) a company's worth of code, operating at a systems level of understanding just makes way more sense. Defining what you want done, using a mix of user-centric intent and architecture constraints, seems like a super high-leverage way to work.
The feedback mechanisms are still pretty tough, because you need to understand what the AI is implicitly doing as it works through your spec. There are decisions you didn't realize you needed to make, until you get there.
We're thinking a lot about this at https://tern.sh, and I'm currently excited about the idea of throwing an agentic loop around the implementation itself. Adversarially have an AI read through that huge implementation log and surface where it's struggling. It's a model that gives real leverage, especially over the "watch Claude flail" mode that's common in bigger projects/codebases.
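For illustration, a minimal sketch of what that adversarial log-review pass could look like, assuming a structured transcript of the agent's tool calls (the entry format, field names, and thresholds here are all hypothetical):

```python
# Hypothetical sketch: scan an agent's implementation log for "struggle"
# signals, so a reviewer (human or another model) can zero in on them.
from collections import Counter
from dataclasses import dataclass

@dataclass
class LogEntry:
    tool: str      # e.g. "read_file", "edit_file", "run_tests"
    target: str    # file path or command line
    ok: bool       # whether the step succeeded

def find_struggles(log: list[LogEntry],
                   reread_threshold: int = 3,
                   failure_threshold: int = 2) -> list[str]:
    """Flag repeated exploration of the same file and repeated failures,
    both of which usually mean the spec left a decision unmade."""
    rereads = Counter(e.target for e in log if e.tool == "read_file")
    failures = Counter(e.target for e in log if not e.ok)

    findings = [f"re-read {path} {n}x (repeated exploration)"
                for path, n in rereads.items() if n >= reread_threshold]
    findings += [f"{target} failed {n}x (likely a decision the spec never made)"
                 for target, n in failures.items() if n >= failure_threshold]
    return findings
```

The flagged excerpts are what you would hand to a reviewing model (or a human) rather than the whole transcript.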
> There are decisions you didn't realize you needed to make, until you get there.

Is the key insight and biggest stumbling block for me at the moment.
At the moment (encouraged by my company) I'm experimenting with as hands-off as possible agent usage for coding. And it is _unbelievably_ frustrating to see the agent get 99% of the code right in the first pass, only to misunderstand why a test is now failing and then completely mangle both its own code and the existing tests as it tries to "fix" the "problem". And if I'd just given it a better spec to start with, it probably wouldn't have started producing garbage.
But I didn't know that before working with the code! So to develop a good spec I either have to have the agent stopping all the time so I can intervene, or dive into the code myself to begin with, and at that point I may as well write the code anyway, as writing the code is not the slow bit.
And my process now (and what we're baking into the product) is:
- Make a prompt
- Run it in a loop over N files. Full agentic toolkit, but don't be wasteful (no "full typecheck, run the test suite" on every file).
- Have an agent check the output. Look for repeated exploration, look for failures. Those imply confusion.
- Iterate the prompt to remove the confusion.
First pass on the current project (a Vue 3 migration) went from 45 min of agentic time on 5 files to 10 min on 50 files, and the latter passed tests/typecheck/my own scrolling through it.
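A rough sketch of that loop, assuming a Vite project where `vue-tsc` and Vitest are available, and with `run_agent` as a stand-in for whatever agent runner is actually in use (it is not a real API): cheap per-file passes, expensive checks once at the end.

```python
# Sketch: run one focused prompt over N files, defer the expensive checks
# (full typecheck, whole test suite) to a single pass over the batch.
import subprocess
from pathlib import Path
from typing import Callable

PROMPT = "Migrate this Vue 2 single-file component to the Vue 3 composition API."

def migrate_batch(files: list[Path],
                  run_agent: Callable[[str, Path], str]) -> dict[Path, str]:
    transcripts = {}
    for f in files:
        # Cheap per-file work only: the agent edits one file per invocation
        # and its transcript is kept for the later review pass.
        transcripts[f] = run_agent(PROMPT, f)
    # Validate once for the whole batch instead of after every file.
    subprocess.run(["npx", "vue-tsc", "--noEmit"], check=True)
    subprocess.run(["npx", "vitest", "run"], check=True)
    return transcripts
```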
On your homepage there is a mention that Tern “writes its own tools”. Could you give an example of how this works?
Tern can write that tool for you, then use it. It gives you more control in certain cases than simply asking the AI to do something that might appear hundreds of times in your code.
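As a hypothetical illustration of the general idea (not a claim about Tern's actual mechanism): instead of asking the model to hand-edit a prop rename that appears in hundreds of components, the agent can generate a small deterministic codemod and then run it.

```python
# Hypothetical one-off codemod of the kind an agent might write for itself:
# rename a `visible` binding to `open` across every Vue component, in one
# deterministic pass instead of hundreds of free-hand LLM edits.
import re
from pathlib import Path

PATTERN = re.compile(r'(:|v-bind:)visible=')

def rewrite(path: Path) -> bool:
    src = path.read_text()
    out = PATTERN.sub(r'\1open=', src)
    if out != src:
        path.write_text(out)
        return True
    return False

changed = [p for p in Path("src").rglob("*.vue") if rewrite(p)]
print(f"rewrote {len(changed)} files")
```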
For spec driven development to truly work, perhaps what’s needed is a higher level spec language that can express user intent precisely, at the level of abstraction where the human understanding lives, while ensuring that the lower level implementation is generated correctly.
A programmer could then use LLMs to translate plain English into this “spec language,” which would then become the real source of truth.
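One possible shape such a spec could take, sketched as plain structured data purely for illustration (this is not an existing spec language, and the field names are invented): user intent and observable behaviors up top, architectural constraints alongside, and everything below that level left to generation.

```python
# Illustrative sketch of a "spec as source of truth": precise enough to
# review and diff, abstract enough to stay at the level of human intent.
from dataclasses import dataclass, field

@dataclass
class Behavior:
    given: str   # precondition, in domain terms
    when: str    # user action
    then: str    # observable outcome

@dataclass
class FeatureSpec:
    name: str
    intent: str                   # why the feature exists
    behaviors: list[Behavior]     # what "correct" means, testably
    constraints: list[str] = field(default_factory=list)  # architecture rules

checkout = FeatureSpec(
    name="guest-checkout",
    intent="Let a shopper buy without creating an account.",
    behaviors=[
        Behavior(given="a cart with items",
                 when="the shopper chooses 'checkout as guest'",
                 then="an order is created with only an email on file"),
    ],
    constraints=["no new database tables", "reuse the existing payments service"],
)
```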
That's a good idea: have a specification, divide it into chunks, have an army of agents each implementing a chunk, have an agent identify weak points, incomplete implementations, and bugs, and have an army of agents fixing the issues.
Reminds me of the TDD bandwagon, which was all the rage when I started programming. It took years for it to slowly die out and for people to realize how overhyped it really was. Nothing against AI, I love it as a tool, but this "you-don't-need-code" approach shows similar signs. Quick wins at first, lots of hype because of those wins, and then reaching a point where doing even tiny changes becomes absurdly difficult.
You need code. You will need it for a long time.
"The readymade components we use are essentially compressed bundles of context—countless design decisions, trade-offs, and lessons are hidden within them. By using them, we get the functionality without the learning, leaving us with zero internalized knowledge of the complex machinery we've just adopted. This can quickly lead to sharp increase in the time spent to get work done and sharp decrease in productivity."
https://martinfowler.com/articles/llm-learning-loop.html
I think in terms of building features. TDD generally requires thinking in terms of proving behavior. I still can't wrap my head around first writing a test that fails and then writing minimal code to make it pass (I know I am simplifying it).
Different strokes for different folks. I'm sure it works great for some people but not for me.
So you build feature A. Great.
Why not write a test to instantiate the feature? It will fail, because you haven’t built it yet. Now go build it.
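For instance, a minimal red-green sketch in pytest style (`slugify` and the `articles` module are made up): the test is written first and fails because the function doesn't exist yet, and the smallest implementation then makes it pass.

```python
# Red: this fails on first run because `articles.slugify` doesn't exist yet.
def test_slugify_turns_titles_into_urls():
    from articles import slugify
    assert slugify("Hello, World!") == "hello-world"

# Green: the minimal implementation that makes it pass, in articles.py:
#
#   import re
#   def slugify(title: str) -> str:
#       return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
```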
I assume you build the interface first already, before every little detail of the methods?
Also, you really don’t have to stick to the TDD principles that much. It’s basically to ensure:
1. You have a test which can fail (it actually tests something)
2. You actually have a test for your unit
3. You actually only code what you’re supposed to
This is great for juniors, but as you gain more experience the individual steps lose value; the principles remain, I think.
Also, I would never write tests for every method (as TDD might have you believe), because that’s not my “unit” in unit testing.
It never really went away. The problem is that there is a dearth of teaching materials telling people how to do it properly:
* E2E test first
* Write high level integration tests which match requirements by default
* Only start writing lower level unit tests when a clear and stable API emerges.
and most people when they tried it didn't do that. They mostly did the exact opposite:
* Write low level unit tests which match the code by default.
* Never write higher level tests (some people don't even think it's possible to write an integration or e2e test with TDD because "it has to be a unit test").
it makes it really hard to recommend TDD when people believe they already know what it is but are doing it ass backwards.
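A deliberately tiny, self-contained illustration of that difference: the first test states a requirement and survives refactoring, while the second mirrors the code's internal structure and breaks the moment it is reorganised, even though user-visible behaviour is unchanged.

```python
from unittest.mock import patch

# Toy system under test.
def normalize(text: str) -> str:
    return " ".join(text.split())

def greeting_for(name: str) -> str:
    return f"Hello, {normalize(name)}!"

# Requirement-level test: asserts observable behaviour only.
def test_users_are_greeted_by_name_despite_messy_whitespace():
    assert greeting_for("  Ada   Lovelace ") == "Hello, Ada Lovelace!"

# Code-level test: asserts which internal helper got called, so it breaks
# as soon as normalize() is inlined or renamed.
def test_greeting_for_calls_normalize():
    with patch(__name__ + ".normalize", return_value="Ada Lovelace") as spy:
        greeting_for("Ada Lovelace")
        spy.assert_called_once()
```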
https://htmx.org/essays/codin-dirty/
For something complex, it’s kinda hard to write and debug high level tests when all the lower level functionality is missing and just stubbed out.
We don’t expect people to write working software without being able to execute it as they go, yet we expect people to write (and complete) all tests before the actual implementation exists.
Sure for trivial things, it’s definitely doable. But then extensive tests wouldn’t be needed for such either!
Imagine someone developing an application where the standard C library was replaced with a stub implementation… That wouldn’t work… Yet TDD says one should be able to do pretty much the same thing…
No, it doesn't say you should do that. TDD says red-green-refactor, that is all. You can and should do that with an e2e test or integration test and a real libc; to do otherwise would be ass backwards.
Yours is the exact unit testing dogma that I was referring to that people have misunderstood as being part of TDD due to bad education.
[1] https://github.com/github/spec-kit/blob/main/spec-driven.md
Worth reading before jumping in.
> We need to avoid at all costs the "great specs - no MVP" problem.
This issue doesn't seem useful or helpful at all.
I’m perhaps less sold on the idea of the spec being the source of truth — would have to do some design iterations and see if that holds up. I do like that it imposes some structure/rigor on the design process.
It’s higher level programming, perhaps, but it’s still programming.