AI Is Forcing Us to Write Good Code
Key topics
As AI becomes increasingly integrated into development workflows, a provocative question is being debated: is AI forcing us to write better code? Some commenters are skeptical, with one dryly referencing "drinking the Kool-Aid" while others, like imron, appreciate the insights, acknowledging the challenges of harnessing AI's potential. A fascinating dynamic emerged around the difficulties solo devs versus teams face when working with AI agents, with justatdotin noting that solo devs tend to have an easier time. The author, sgk284, clarifies that many teams struggle to make AI work for them, expecting a more magical solution than the reality.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 40m after posting
- Peak period: 45 comments in the 4-6h window
- Avg / period: 12.3
Based on 160 loaded comments
Key moments
- 01 Story posted: Dec 29, 2025 at 2:11 PM EST
- 02 First comment: Dec 29, 2025 at 2:51 PM EST (40m after posting)
- 03 Peak activity: 45 comments in 4-6h (hottest window of the conversation)
- 04 Latest activity: Dec 30, 2025 at 4:32 PM EST
some (many?) devs don't want agents. Either because the agent takes away the 'fun' part of their work, or because they don't trust the agent, or because they truly do not find a use for it in their process.
I remember being on teams which only remained functional because two devs tried very hard to stay out of one another's way. Nothing wrong with either of them, their approach to the work was just not very compatible.
In the same way, I expect diverse teams to struggle with finding a mode of adoption that does not negatively impact the existing styles of some members.
i was thinking it was more like llms when used personally can make huge refactorings and code changes that you review yourself and just check it in, but with a team its harder to make sweeping changes that an llm might make more possible cause now everyone's changes start to conflict... but i guess thats not much of an issue in practice?
100% coverage for AI generated code is a very different value proposition than 100% coverage for human generated code (for the reasons outlined in the article).
All praise drunk-driving for increased seatbelt use.
That's a negative correlation signal for me (as are all the other weird TLDs that I have not seen besides SEO spam results and perhaps the occasional HN submission.) On the other hand, .com, .net, and .org are a positive signal.
It's a footnote on the post, but I expand on this with:
Seems like even if people could potentially die, industry standards are not really 100% realistic. (Also, redundancy in production is more of a solution than having some failures and recalls, which are solved with money.)
But what I care about is code breaking (or rather, it not breaking). I'd rather put effort into ensuring my test suite does provide a useful benefit in that regard, rather than measure an arbitrary target which is not a good measure of that.
jmo, but tests should be a chance to improve the implementation of functions not just one-off "write and forget" confirmations of the happy path only... automating all that just short-circuits that whole process... but maybe i'm missing something.
How are LLMs going to stay on top of new design concepts, new languages, really anything new?
Can LLMs be trained to operate "fluently" with regards to a genuinely new concept?
I think LLMs are good for writing certain types of "bad code", i.e. if you're learning a new language or trying to quickly create a prototype.
However to me it seems like a security risk to try to write "good code" with an LLM.
This is true now, but it can't stay true, given the enormous costs of training. Inference is expensive enough as is, the training runs are 100% venture capital "startup" funding and pretty much everyone expects them to go away sooner or later
Can't plan a business around something that volatile
Of course, some of those predictions may also turn out to be true. But either way, we have abundant empirical evidence that the reasoning is not sound.
Especially with the massive context windows modern LLMs have. The core idea that the GPT-3 paper introduced was (summarizing):
(A thing I think is under-explored is how much LLMs change where the value of tests is. Back in the artisan hand-crafted code days, unit tests were mostly useful as scaffolding: Almost all the value I got from them was during the writing of the code. If I'd deleted the unit tests before merging, I'd've gotten 90% of the value out of them. Whereas now, the AI doesn't necessarily need unit tests as scaffolding as much as I do, _but_ having them put in there makes future agentic interactions safer, because they act as reified context.)
The tests I have for systems that keep evolving while being production critical over a decade are invaluable. I cannot imagine touching a thing without the tests. Many of which reference a ticket they prove remains fixed: a sometimes painfully learned lesson.
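A regression test that names the ticket it guards might look roughly like the sketch below. The function, the bug, and the ticket id `TICKET-1234` are all hypothetical and only illustrate the naming habit described above.

```typescript
// Hypothetical function and ticket id, for illustration only.
function normalizeUsername(raw: string): string {
  // TICKET-1234 (hypothetical): trailing whitespace once produced duplicate accounts.
  return raw.trim().toLowerCase();
}

// Regression check named after the ticket, so a future reader (or agent)
// can see why this behavior must not change.
function regressionTicket1234TrailingWhitespace(): boolean {
  return normalizeUsername("Alice ") === normalizeUsername("alice");
}
```

The name carries the context: deleting or weakening the check means knowingly reopening that ticket.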
Writing, art, creative output, that's nothing at all like code, which puts the software industry in a more particular spot than anything else in automation.
I don't think anyone wants production code paths that have never been tried, right?
But not all bad tests come from a goal of 100% coverage.
I've worked at one (1) place that, whilst not quite fully that, did have a spare dev environment that you could claim temporarily for deploying changes, doing integration tests, etc. Super handy when people are working on (often wildly) divergent projects and you need at least one stable dev environment + integration testing.
Been trying to push this at $CURRENT without much success but that's largely down to lack of cloudops resources (although we do have a sandbox environment, it's sufficiently different to dev that it's essentially worthless.)
I use devenv.sh to give me quick setup of individual environments, but I'm spending a bit of my break trying to extend that (and its processes) to easily run inside containers that I can attach Zed/VSCode remoting to.
It strikes me that (as the article points out) this would also be useful for using Agents a bit more safely, but as a regular old human it'd also be useful.
(If the analogy, in the first paragraph, of a Roomba dragging poop around the house didn't convince you)
Gentlemen, the dog writes poetry and music, but it is not even good poetry. Nothing to see.
Some people are amazed the dog can write poetry. Some people complain that the poetry isn't good enough.
It's like having the power to turn water into gross, undrinkable wine.
At Qlty, we are going as far as to rewrite hundreds of thousands of lines of code to ensure full test coverage and end-to-end type checking (including database-generated types).
I’ll add a few more:
1. Zero thrown errors. These effectively disable the type checker and act as goto statements. We use neverthrow for Rust-like Result types in TypeScript.
2. Fast auto-formatting and linting. An AI code review is not a substitute for a deterministic result in sub-100ms to guarantee consistency. The auto-formatter is set up as a post-tool use Claude hook.
3. Side-effect free imports and construction. You should be able to load all the code files and construct an instance of every class in your app without a network connection spawning. This is harder than it sounds and without it you run into all sorts of trouble with the rest.
4. Zero mocks and shared global state. By mocks, I mean mocking frameworks which override functions on existing types or globals. These effectively are injecting lies into the type checker.
Shout out to tsgo which has dramatically lowered our type checking latency. As the tok/sec of models keeps going up, all the time is going to get bottlenecked on tool calls (read: type checking and tests).
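As a rough sketch of the "zero thrown errors" item above: the comment uses neverthrow, but the version below hand-rolls a tiny `Result` type so it stays self-contained. The `parsePort` function and its `PortError` type are hypothetical, and this is not neverthrow's API.

```typescript
// Hand-rolled Result type (an assumption for illustration; neverthrow offers a richer one).
type Result<T, E> =
  | { readonly kind: "ok"; readonly value: T }
  | { readonly kind: "err"; readonly error: E };

// Hypothetical error type: every failure mode is visible in the signature.
type PortError = "not_an_integer" | "out_of_range";

function parsePort(input: string): Result<number, PortError> {
  const port = Number(input);
  if (!Number.isInteger(port)) return { kind: "err", error: "not_an_integer" };
  if (port < 1 || port > 65535) return { kind: "err", error: "out_of_range" };
  return { kind: "ok", value: port };
}

function describePort(input: string): string {
  const result = parsePort(input);
  if (result.kind === "err") {
    // The compiler narrows `result` here, so `error` is a PortError.
    return `invalid port: ${result.error}`;
  }
  return `listening on ${result.value}`;
}
```

Because failures are ordinary return values, the type checker forces callers to handle them, which a `throw` would not.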
With this approach we now have near 100% coverage with a test suite that runs in under 1,000ms.
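One way to read the "side-effect free construction" and "zero mocks" items together is plain dependency injection, sketched below. The `Clock` interface and `SessionService` class are hypothetical names, not anything from the commenter's codebase.

```typescript
// Hypothetical interface: the only thing the service needs from the outside world.
interface Clock {
  now(): number; // milliseconds since epoch
}

// Constructing this class does no I/O; it only stores its dependencies.
class SessionService {
  private readonly clock: Clock;
  private readonly ttlMs: number;

  constructor(clock: Clock, ttlMs: number) {
    this.clock = clock;
    this.ttlMs = ttlMs;
  }

  isExpired(issuedAtMs: number): boolean {
    return this.clock.now() - issuedAtMs > this.ttlMs;
  }
}

// A test passes a plain object that satisfies the interface. Nothing patches
// Date globally, so the types the checker sees are the types that run.
const fixedClock: Clock = { now: () => 10_000 };
const service = new SessionService(fixedClock, 5_000);
```

Production wiring would pass something like `{ now: () => Date.now() }` instead. Tests built this way touch no shared global state, which is presumably part of why a suite of them can run quickly.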
But the weird thing is: those things have always been important to me.
And it has always been a good idea to invest in those, for my team and me.
Why am I doing this 200% now?
With no contention for shared resources and no async/IO, it's just function calls running on Bun (JavaScriptCore), which measures function calling latency in nanoseconds. I haven't measured this myself, but the internet seems to suggest JavaScriptCore function calls can run in 2 to 5 nanoseconds.
On a computer with 10 cores, fully concurrent, that would imply 10 billion nanoseconds of CPU time in one wall clock second. At 5 nanoseconds per function call, that would imply a theoretical maximum of 2 billion function calls per second.
Real world is not going to be anywhere close to that performance, but where is the time going otherwise?
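The back-of-envelope estimate above can be written out as follows. The inputs are the figures quoted in the comment (10 cores, 5 ns per call), not measurements.

```typescript
// Figures taken from the comment above; none of these are measured values.
const cores = 10;
const nanosPerSecond = 1_000_000_000;
const nanosPerCall = 5; // upper end of the quoted 2 to 5 ns range

// CPU time available across all cores in one wall-clock second.
const cpuNanosPerWallSecond = cores * nanosPerSecond;

// Theoretical ceiling on function calls per second, ignoring everything else.
const maxCallsPerSecond = cpuNanosPerWallSecond / nanosPerCall;
```

At the lower 2 ns figure the same arithmetic gives a ceiling 2.5 times higher, so the quoted number is the conservative end of the estimate.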
That's why those engineering fields have strict rules and often require formal education, and someone can even end up in prison if they screw up badly enough.
Software is so much easier and safer, till very recently anonymous engineering was the norm and people are very annoyed with Apple pushing for signing off the resulting product.
Highly paid software across the board must have been an anomaly that is ending now. Maybe in the future only those who code actually novel solutions or high risk software will be paid very well - just like engineers in the other fields.
Apple is very much welcome to push for signing off of software that appears on their own store. That is nothing new.
What people are annoyed about is Apple insisting that you can only use their store, a restriction that has nothing to do with safety or quality and everything to do with the stupendous amounts of money they make from it.
Also, people complain all the time about rules and regulations for making stuff. Especially in the EU, you can't just create products however you like and let people decide if it is safe to use; you are required to make your products meet certain criteria and avoid using certain chemicals and methods, you are required to certify certain things and you can't be anonymous. If you are making and selling cupcakes, for example, and something goes wrong, you will be held responsible. Not only when things go wrong: often local governments will do inspections before letting you start making the cupcakes, and every now and then they can check you out.
Software appears to be headed in that direction. Of course, due to the nature of software it probably wouldn't be exactly like that, but IMHO it is very likely that at least having someone responsible for the things a piece of software does will become the norm.
Maybe in the future if your software leaks sensitive information for example, you may end up being investigated and fined if not following best practices that can be determined by some institute etc.
It turns out that Apple is not, in fact, the government.
This is already the case in the UK, and the EU too as far as I’m aware.
Just to be clear, apps have to be notarized/signed to run on an Apple device. For macOS, notarized apps aren't required to be distributed in the App Store. Due to sandbox restrictions, some dev tools are distributed independently.
Or there are two versions: a less capable version for the App Store and a more capable version distributed independently.
Skill and strictness required are only vaguely related to pay; if there are enough people for the job it won't pay amazingly, regardless of how hard it is.
> Software is so much easier and safer, till very recently anonymous engineering was the norm and people are very annoyed with Apple pushing for signing off the resulting product.
that has nothing to do with engineering quality, that is just to make it harder to go around their ecosystem (and skip paying the shop fee). With additional benefit of signed package being harder to attack. You can still deliver absolute slop, but the slop will be from you, not the middleman that captured the delivery process
Wow. What about, I don't know, self-teaching? In general, you have to be very arrogant to say that you've applied all the "real" applications.
LLMs MAY be a version of office hours or asking the TA, if you only have the book and no actual teacher. I have seen nothing that convinces me they are anything more than the latest version of the hammer in our toolbox. Not every problem is a nail.
In my experience, most TAs are not great at explaining things to students. They were often the best student in their class, and they can't relate to students who don't grasp things as easily as they do--"this organic chemistry problem set is so easy; I don't know why you're not getting it."
But an LLM has infinite patience and can explain concepts in a variety of ways, in different languages and at different levels. Bilingual students may speak English just fine, but they often think and reason in their native language. Not a problem for an LLM.
A teacher in an urban school system with 30 students, 20 of whom need customized lesson plans due to neurological divergence, can use LLMs to create these lesson plans.
Sometimes you need things explained to you like you're five years old and sometimes you need things explained to you as an expert.
On deeper topics, LLMs give their references, so a student can and should confirm what the LLM is telling them.
That’s complete bogus. And LLMs are yes-men by default; nothing stops you from overriding the initial setting.
So why aren't you a master of, I don't know, reupholstery? Because the barrier isn't information, it's you. You're the bottleneck, we all are, because we're humans.
One of the most important factors in actually learning something is humility. Unfortunately, LLM chatbots are designed to discourage this in their users. So many people think they’re experts because they asked a chatbot. They aren’t.
It's delusional and very arrogant of you to confidently assert anything without proof: A topic like RLC circuits has got a body of rigorous theorems and proofs underlying it*, and nothing stops you from piecing it together using an LLM.
* - See "Positive-Real Functions", "Schwarz-Pick Theorem", "Schur Class". These are things I've been mulling over.
It's a very useful tool of course, but not as good as the software situation.
The response "that's why we shouldn't have done it like that!" sounds like a variation on the usual "You're absolutely right! I apologize for any confusion". Why would we want to get stuck in a loop where an AI produces loads of absolute nonsense for us to painstakingly debug and debunk, after which the AI switches track to some different nonsense, which we again have to debug and debunk, and so on? That doesn't sound like a good loop.
I literally just had this exact argument; biggest issue is they can be tested for smaller amperages but you just downsize the breaker.
>CEO of an AI company
Many such cases
What I notice is that Claude stumbles more on code that is illogical, unclear or has bad variable names. For example, if a variable is named "iteration_count" but actually contains a sum, that will "fool" the AI.
So keeping the code tidy gives the AI clearer hints on what's going on which gives better results. But I guess that's equally true for humans.
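A small, hypothetical illustration of the naming problem described above: both functions below compute the same thing, but only the second one says so.

```typescript
// Misleading: the name promises a count, but the variable accumulates a sum.
function totalLatencyMisnamed(latenciesMs: number[]): number {
  let iterationCount = 0;
  for (const latencyMs of latenciesMs) {
    iterationCount += latencyMs;
  }
  return iterationCount;
}

// Clearer: the name matches the value, so neither a human nor a model
// has to read the loop body to learn what is being tracked.
function totalLatency(latenciesMs: number[]): number {
  let totalMs = 0;
  for (const latencyMs of latenciesMs) {
    totalMs += latencyMs;
  }
  return totalMs;
}
```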
An LLM has a very high chance of one-shotting this and doing it well.
So eventually it gets to the point where I'm basically explaining to it what interfaces to abstract, what should be an implementation detail and what can be exposed to the wider system, what the method signatures should look like, etc.
So I had a better experience when I just wrote the code myself at a very high level. I know what the big picture look of the software will be. What types I need, what interfaces I need, what different implementations of something I need. So I'll create them as stubs. The types will have no fields, the functions will have no body, and they'll just have simple comments explaining what they should do. Then I ask the LLM to write the implementation of the types and functions.
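The stub-first workflow described above might look something like this before the LLM is asked to fill anything in. The invoice domain and every name here are hypothetical, and TypeScript is used only as an example language.

```typescript
// Hypothetical domain, for illustration only. Types have no fields yet and
// functions have no real body; the comments are the spec handed to the LLM.

/** A parsed invoice. Fields are intentionally left for the implementer. */
interface Invoice {}

/** Where invoices are persisted; several implementations are expected. */
interface InvoiceStore {
  /** Save an invoice and return its assigned id. */
  save(invoice: Invoice): string;
}

/** Parse raw CSV text into invoices; should skip blank lines. */
function parseInvoices(csv: string): Invoice[] {
  throw new Error("not implemented");
}

/** In-memory store intended for tests. */
class InMemoryInvoiceStore implements InvoiceStore {
  save(invoice: Invoice): string {
    throw new Error("not implemented");
  }
}
```

The human decides the interfaces, the signatures, and what stays an implementation detail; the model only fills in bodies that already have a fixed shape.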
And to be fair, this is the approach I have taken for a very long time now. But when a new more powerful model is released, I will try and get it to solve these types of day to day problems from just prompts alone and it still isn't there yet.
It's one of the biggest issues with LLM first software development from what I've seen. LLMs will happily just build upon bad foundations and getting them to "think" about refactoring the code to add a new feature takes a lot of prompting effort that most people just don't have. So they will stack change upon change upon change and sure, it works. But the code becomes absolutely unmaintainable. LLM purists will argue that the code is fine because it's only going to be read by an LLM but I'm not convinced. Bad code definitely confuses the LLMs more.
I tend to use a shotgun approach, and then follow with an aggressive refactor. It can actually take a lot of time to prune and restructure the code well compared to the firehose of code generation.
This seems to work well for me. I write a lot of model training code, and it works really well for the breadth of experiments I can run. But by the end it looks like a graveyard of failed ideas.
It is kind of backwards because it would have been great to do it before. But it was never prioritized. Now good internal documentation is seen as essential because it feeds the models.
Surely they know 100% code coverage is not a magical bullet because the code flow and the behavior can differ depending on the input. Just because you found a few examples which happen to hit every line of code you didn't hit every possible combination. You are living in a fool's paradise which is not a surprise because only fools believe in LLMs. You are looking for a formal proof of the codebase which of course no one does because the costs would be astronomical (and LLMs are useless for it which is not at all unique because they are useless for everything software related but they are particularly unusable for this).
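The narrow technical point here, that executing every line is not the same as exercising every combination of inputs, can be shown with a small hypothetical function:

```typescript
// Hypothetical function: each flag is harmless alone, but together they
// drive the divisor to zero.
function scale(value: number, halve: boolean, subtractOne: boolean): number {
  let divisor = 2;
  if (halve) divisor = divisor / 2; // 2 -> 1
  if (subtractOne) divisor = divisor - 1; // 2 -> 1, or 1 -> 0
  return value / divisor;
}

// These two calls together execute every line and both branches of each `if`.
const onlyHalve = scale(10, true, false);
const onlySubtract = scale(10, false, true);

// The combination that neither call tried divides by zero.
const both = scale(10, true, true);
```

A coverage report for the first two calls would show 100%, yet the third path was never run.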
1. Clearly explain the massive harm LLMs cause society and the environment to everyone. (Mass media should do this instead of parroting every nonsense the promptfondlers feed them.)
2. Ban them all. Don't tell me it's impossible just because it's widespread. Asbestos was everywhere.
Seems actively harmful, and the AI hype died out faster than I thought it would.
> Agents will happily be the Roomba that rolls over dog poop and drags it all over your house
There it is, folks!
If you can make good tests the AI shouldn't be able to cheat them. It will either churn forever or pass them.
Since it sticks pretty close to the spec and since TLA+ is about modifying state, the code it generates is pretty ugly, but ugly-and-correct code beats beautiful code that's not verified.
It's not perfect; something that naively adheres to a spec is rarely optimized, and I've had to go in and replace stuff with Tokio or Mio or optimize a loop because the resulting code is too slow to be useful, and sometimes the code is just too ugly for me to put up with so I need to rewrite it, but the amount of time to do that is generally considerably lower than if I were doing the translation myself entirely.
The reason I started doing this: the stuff I've been experimenting with lately has been lock-free data structures, and I guess what I am doing is novel enough that Codex does not really appear to generate what I want; it will still use locks and lock files and when I complain it will do the traditional "You're absolutely right", and then proceed to do everything with locks anyway.
In a sense, this is close to the ideal case that I actually wanted: I can focus on the high-level mathey logic while I let my metaphorical AI intern deal with the minutiae of actually writing the code. Not that I don't derive any enjoyment out of writing Rust or something, but the code is mostly an implementation detail to me. This way, I'm kind of doing what I'm supposed to be doing, which is "formally specify first, write code second".
That said, we're not talking about vibe coding here, but properly reviewed code, right? So the human still goes "no, this is wrong, delete these tests and implement for these criteria"?
Now I can leave an agent running, come back an hour or two later, and it's written almost perfect, typed, extremely well tested code.
It's funny how on one side you have people using AI to write worse code than ever, and on the other side people use AI as an extension of their engineering discipline.
> For decades, we’ve all known what “good code” looks like.
When relatively trivial concerns such as the ideal length of methods haven't achieved consensus, I doubt there can be any broadly accepted standard for software quality. There are plenty of metrics such as test coverage, but anyone with experience could tell you how easy it is to game those and that enforcing arbitrary standards can even cause harm.
Very little effort expended on tests, and very little gain from them. Just useless formality. But management was happy, because 90% coverage is something good, must be.
Including using more rigidly typed languages, making sure things are covered with tests, using code analysis tools to spot anti-patterns and addressing all the warnings, etc. That was always a good idea but we now have even fewer excuses to skip all that.
No explanation of how this isn't chasing a metric, or gaming it.
> When a model adds or changes code, we force it to demonstrate how that line behaves. It can’t stop at “this seems right.” It has to back it up with an executable example.
More often IME, the test is just making basic smoke tests that assert useless details. That sure sounds like gaming the metric to me. Really, nothing in this article about PR reviews of AI-generated tests?
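To make the distinction concrete, here is a hypothetical sketch of a smoke-style check versus a check that pins behavior. The discount functions and both checks are invented for illustration.

```typescript
type Discount = (priceCents: number, percent: number) => number;

// Deliberately buggy: the discount is added instead of subtracted.
const applyDiscountBuggy: Discount = (priceCents, percent) =>
  priceCents + Math.round((priceCents * percent) / 100);

const applyDiscount: Discount = (priceCents, percent) =>
  priceCents - Math.round((priceCents * percent) / 100);

// Smoke-style check: runs every line, asserts nothing about the result.
function smokeCheck(fn: Discount): boolean {
  return typeof fn(1000, 10) === "number";
}

// Behavioral check: pins the value a reader would actually expect.
function behaviorCheck(fn: Discount): boolean {
  return fn(1000, 10) === 900;
}
```

Both checks yield the same coverage number, but only the second one fails on the buggy implementation, which is the sort of thing a PR review of generated tests would need to look for.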
54 more comments available on Hacker News