Getting AI to Work in Complex Codebases

Posted3 months agoActive3 months ago

dhorthy

517 points

418 comments

github.comTechstoryHigh profile

calmmixed

Debate

70/100

AI-Assisted CodingSoftware DevelopmentLarge Codebases

Key topics

AI-Assisted Coding

Software Development

Large Codebases

The article discusses strategies for effectively using AI in complex codebases, sparking a discussion on the benefits and challenges of AI-assisted coding, including its impact on developer productivity and code quality.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

Peak period

106

0-12h

Avg / period

22.9

Comment distribution160 data points

Loading chart...

Based on 160 loaded comments

Key moments

01Story posted
Sep 23, 2025 at 10:27 AM EDT
3 months ago
Step 01
02First comment
Sep 23, 2025 at 1:20 PM EDT
3h after posting
Step 02
03Peak activity
106 comments in 0-12h
Hottest window of the conversation
Step 03
04Latest activity
Sep 28, 2025 at 5:04 AM EDT
3 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (418 comments)

Showing 160 comments of 418

malfist

3 months ago

3 replies

This article bases its argument on the predicate that AI _at worst_ will increase developer productivity be 0-10%. But several studies have found that not to be true at all. AI can, and does, make some people less effective

dhorthyAuthor

3 months ago

2 replies

definitely - the standford video has a slide about how many cases caused people to be even slower than without AI

mcny

3 months ago

2 replies

Question for discussion - what steps can I take as a human to set myself up for success where success is defined by AI made me faster, more efficient etc?

0xblacklight

3 months ago

In many cases (though not all) it's the same thing that makes for great engineering managers:

smart generalists with a lot of depth in maybe a couple of things (so they have an appreciation for depth and complexity) but a lot of breadth so they can effectively manage other specialists,

and having great technical communication skills - be able to communicate what you want done and how without over-specifying every detail, or under-specifying tasks in important ways.

Peritract

3 months ago

>where success is defined by AI made me faster, more efficient etc?

I think this attitude is part of the problem to me; you're not aiming to be faster or more efficient (and using AI to get there), you're aiming to use AI (to be faster and more efficient).

A sincere approach to improvement wouldn't insist on a tool first.

keeda

3 months ago

According to the Stanford video the only cases (statistically speaking) where that happened was high-complexity tasks for legacy / low popularity languages, no? I would imagine that is a small minority of projects. Indeed, the video cites the overall productivity boost at 15 - 20% IIRC.

simonw

3 months ago

1 reply

"AI can, and does, make some people less effective"

So those people should either stop using it or learn to use it productively. We're not doomed to live in a world where programmers start using AI, lose productivity because of it and then stay in that less productive state.

bgwalter

3 months ago

If managers are convinced by stakeholders who relentlessly put out pro-"AI" blog posts, then a subset of programmers can be forced to at least pretend to use "AI".

They can be forced to write in their performance evaluation how much (not if, because they would be fired) "AI" has improved their productivity.

telliott1984

3 months ago

There's also the more insidious gap between perceived productivity and actual productivity. Doesn't help that nobody can agree on how to measure productivity even without AI.

tschellenbach

3 months ago

1 reply

I wrote this blogpost on the same topic: https://getstream.io/blog/cursor-ai-large-projects/

It's super effective with the right guardrails and docs. It also works better on languages like Go instead of Python.

dhorthyAuthor

3 months ago

2 replies

why do you think go is better than python (i have some thoughts but curious your take)

polishdude20

3 months ago

1 reply

probably because it's typed?

0xblacklight

3 months ago

1 reply

Among other things; coding agents that can get feedback by running a compile step on top of the linter will tend to produce better output.

Also, strongly-typed languages tend to catch more issues through the language server which the agent can touch through LSP.

heavyset_go

3 months ago

Python is strongly typed

mholm

3 months ago

1 reply

imo:

1. Go's spec and standard practices are more stable, in my experience. This means the training data is tighter and more likely to work.

2. Go's types give the llm more information on how to use something, versus the python model.

3. Python has been an entry-level accessible language for a long time. This means a lot of the code in the training set is by amateurs. Go, ime, is never someone's first language. So you effectively only get code from someone who has already has other programming experience.

4. Go doesn't do much 'weird' stuff. It's not hard to wrap your head around.

dhorthyAuthor

3 months ago

yeah i love that there is a lot of source data for "what is good idiomatic go" - the model doesn't have it all in the training set but you can easily collect coding standards for go with deep research or something

And then I find models try to write scripts/manual workflows for testing, but Go is REALLY good for doing what you might do in a bash script, and so you can steer the model to build its own feedback loop as a harness in go integration tests (we do a lot of this in github.com/humanlayer/humanlayer/tree/main/hld)

ath3nd

3 months ago

4 replies

Why though. Why should we do that?

If AI is so groundbreaking, why do we have to have guides and jump through 3000 hoops just so we can make it work?

ej88

3 months ago

1 reply

why do we have guides and lessons on how to use a chainsaw when we can hack the tree with an axe?

leptons

3 months ago

1 reply

The chainsaw doesn't sometimes chop off your arm when you are using it correctly.

crent

3 months ago

1 reply

If you swing an axe with a lack of hand eye coordination you don't think it's possible to seriously injure yourself?

leptons

3 months ago

Was the axe or the chainsaw designed in such a way that guarantees that it will definitely miss the log and hit your hand fair amount of the times you use it? If it were, would you still use it? Yes, these hand tools are dangerous, but they were not designed so that it would probably cut off your hand even 1% of the time. "Accidents happen" and "AI slop" are not even remotely the same.

So then with "AI" we're taking a tool that is known to "hallucinate", and not infrequently. So let's put this thing in charge of whatever-the-fuck we can?

I have no doubt "AI" will someday be embedded inside a "smart chainsaw", because we as humans are far more stupid than we think we are.

logicchains

3 months ago

1 reply

Even if we had perfectly human-level AI it'd still need management, just like human workers do, and turns out effective management is actually nontrivial.

bluefirebrand

3 months ago

I don't want to effectively manage the idiot box

I want to do the work

spaniard89277

3 months ago

1 reply

Because now your manager will measure on LOCs against other engineers again and it's only software engineers worrying about complexity, maintainability, and, in summary, the health of the very creature it's going to pay your salary.

This is the new world we live in. Anyone who actually likes coding should seriously look for other venues because this industry is for other type of people now.

I use AI in my job. I went from tolerable (not doing anything fancy) to unbearable.

I'm actually looking to become a council employee with a boring job and code my own stuff, because if this is what I have to do moving forward, I rather go back to non-coding jobs.

dhorthyAuthor

3 months ago

1 reply

i strongly disagree with this - if anything, using AI to code real production code in real complex codebase is MORE technical than just writing software.

Staff/Principal engineers already spend a lot more time designing systems than writing code. They care a lot about complexity, maintainability, and good architecture.

The best people I know who have been using these techniques are former CTOs, former core Kubernetes contributors, have built platforms for CRDTs at scale, and many other HIGHLY technical pursuits.

svieira

3 months ago

This is actually where the "myth" of the 10x engineer comes from - there do exist such people and they always could do more than the rest of us ... because they knew what to build. It's not 10K lines of code, it's _the right_ 10K lines of code. Whether using LLMs or LLVM to produce bytes the bytes produced are not the "τέχνη".

That said, I don't think it takes MORE τέχνη to use the machine, merely a distinct ἐμπειρία. That said, both ἐμπειρία and τέχνη aren't σοφία.

0xblacklight

3 months ago

if nuclear power is so much better than coal, why do we need to learn how to safely operate a reactor just to make it work? Coal is so much easier

shafyy

3 months ago

1 reply

> Heck even Amjad was on a lenny's podcast 9 months ago talking about how PMs use Replit agent to prototype new stuff and then they hand it off to engineers to implement for production.

Please kill me now

cube00

3 months ago

1 reply

I got lectured this week that I wasn't working fast enough because the client had already vibe coded (a broken, non-functional prototype) in under an hour.

They saw the the first screen assembled by Replit and figured everything they could see would work with some "small tweaks" which is where I was allegedly to come into the picture.

They continued to lecture me about how the app would need Web Workers for maximum client side performance (explanations full of em-dashes so I knew they were pasting in AI slop at me) and it must all be browser based with no servers because "my prototype doesn't need a server"

Meanwhile their "prototype" had a broken Node.js backend running alongside the frontend listening on a TCP port.

When I asked about this backend they knew nothing about it be assured me their prototype was all browser based with no "servers".

Needless to say I'm never taking on any work from that client again, one of the small joys of being a contractor.

shafyy

3 months ago

Sounds like hell

cheschire

3 months ago

1 reply

As an aside, this single markdown file as an entire GitHub repo is a unique approach to blog posts.

dhorthyAuthor

3 months ago

s/unique/lazy

merlincorey

3 months ago

1 reply

It seems we're still collectively trying to figure out the boundaries of "delegation" versus "abstraction" which I personally don't think are the same thing, though they are certainly related and if you squint a bit you can easily argue for one or the other in many situations.

> We've gotten claude code to handle 300k LOC Rust codebases, ship a week's worth of work in a day, and maintain code quality that passes expert review.

This seems more like delegation just like if one delegated a coding task to another engineer and reviewed it.

> That in two years, you'll be opening python files in your IDE with about the same frequency that, today, you might open up a hex editor to read assembly (which, for most of us, is never).

This seems more like abstraction just like if one considers Python a sort of higher level layer above C and C a higher level layer above Assembly, except now the language is English.

Can it really be both?

dhorthyAuthor

3 months ago

2 replies

I would say its much more about abstraction and the leverage abstractions give you.

You'll also note that while I talk about "spec driven development", most of the tactical stuff we've proven out is downstream of having a good spec.

But in the end a good spec is probably "the right abstraction" and most of these techniques fall out as implementation details. But to paraphrase sandy metz - better to stay in the details than to accidentally build against the wrong abstraction (https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction)

I don't think delegation is right - when me and vaibhav shipped a week's worth of work in a day, we were DEEPLY engaged with the work, we didn't step away from the desk, we were constantly resteering and probably sent 50+ user messages that day, in addition to some point-edits to markdown files along the way.

sothatsit

3 months ago

1 reply

I continue to write codebases in programming languages, not English. LLM agents just help me manipulate that code. They are tools that do work for me. That is delegation, not abstraction.

To write and review a good spec, you also need to understand your codebase. How are you going to do that without reading the code? We are not getting abstracted away from our codebases.

For it to be an abstraction, we would need our coding agents to not only write all of our code, they would also need to explain it all to us. I am very skeptical that this is how developers will work in the near future. Software development would become increasingly unreliable as we won't even understand what our codebases actually do. We would just interact with a squishy lossy English layer.

dhorthyAuthor

3 months ago

3 replies

You don’t think early c programmers spent a lot of time reading the assembly that was produced?

rsynnott

3 months ago

I mean, the ones who were actually _writing_ a C compiler, sure, and to some who were in performance critical spaces (early C compilers were not _good_). But normal programmers, checking for correctness, no, absolutely not. Where did you get that idea?

(The golden age of looking at compiler-generated assembly would've been rather later, when processors added SIMD instructions and compilers started trying to get clever about using them.)

sarchertech

3 months ago

No not really. They didn’t need to spend a lot of time looking at the output because (especially back then) they mostly knew exactly what the assembly was going to look like.

With an LLM, you don’t need to move down to the code layer so you can optimize a tight loop. You need to look at the code so you can verify that the LLM didn’t write a completely different program that what you asked it to write.

sothatsit

3 months ago

Probably at first when the compiler was bad at producing good assembly. But even then, the compiler would still always produce code that matches the rules of the language. This is not the case with LLMs. There is no indication that in the future LLMs will become deterministic such that we could literally write codebases in English and then "compile" them using an LLM into a programming language of our choice and rely on the behaviour of the final program matching our expectations.

This is why LLMs are categorically not compilers. They are not translating English code into some other type of code. They are taking English direction and then writing/editing code based upon that. They are working on a codebase alongside us, as tools. And then you still compile that code using an actual compiler.

We will start to trust these tools more and more, and probably spend less time reviewing the code they produce over time. But I do not see a future where professional developers completely disregard the actual codebase and rely entirely on LLMs for code that matters. That would require a completely different category of tools than what we have today.

sarchertech

3 months ago

It’s definitely not abstraction. You don’t watch a compiler output machine code and constantly “resteer” it.

fusslo

3 months ago

5 replies

Maybe I am just misunderstanding. I probably am; seems like it happens more and more often these days

But.. I hate this. I hate the idea of learning to manage the machine's context to do work. This reads like a lecture in an MBA class about managing certain types of engineers, not like an engineering doc.

Never have I wanted to manage people. And never have I even considered my job would be to find the optimum path to the machine writing my code.

Maybe firmware is special (I write firmware)... I doubt it. We have a cursor subscription and are expected to use it on production codebases. Business leaders are pushing it HARD. To be a leader in my job, I don't need to know algorithms, design patterns, C, make, how to debug, how to work with memory mapped io, what wear leveling is, etc.. I need to know 'compaction' and 'context engineering'

I feel like a ship corker inspecting a riveted hull

dolebirchwood

3 months ago

1 reply

Guess it boils down to personality, but I personally love it. I got into coding later in life, and coming from a career that involved reading and writing voluminous amounts of text in English. I got into programming because I wanted to build web applications, not out of any love for the process of programming in and of itself. The less I have to think and write in code, the better. Much happier to be reading it and reviewing it than writing it myself.

skydhash

3 months ago

3 replies

No ones like programming that much. That's like saying someone love speaking English. You have an idea and you express it. Sometimes there's additional complexity that got in the way (initializing the library, memory cleanup,...), but I put those at the same level as proper greetings in a formal letter.

It also helps starting small, get something useful done and iterate by adding more features overtime (or keeping it small).

kranner

3 months ago

1 reply

> No ones like programming that much. That's like saying someone love speaking English. You have an idea and you express it.

I can assure you both kinds of people exist. Expressing ideas as words or code is not a one-way flow if you care enough to slow down and look closely. Words/clauses and data structures/algorithms exert their own pull on ideas and can make you think about associated and analogous ideas, alternative ways you could express your solution, whether it is even worth solving explicitly and independently of a more general problem, etc.

skydhash

3 months ago

1 reply

IMO, that’s a sign of overthinking (and one thing I try hard to not get caught in). My process is usually:

- What am I trying to do?

- What data do I have available?

- Where do they come from?

- What operations can I use?

- What’s the final state/output?

Then it’s a matter of shifting into the formal space, building and linking stuff.

What I did observe is a lot of people hate formalizing their thoughts. Instead they prefer tweaking stuff until something kinda works and they can go on to the next ticket/todo item. There’s no holistic view about the system. And they hate the 5 why’s. Something like:

- Why is the app displaying “something went wrong” when I submit the form?

- Why is the response is an error when the request is valid?

- Why is the data is persisted when the handler is failing and giving a stack trace in the log?

- Why is it complaining about missing configuration for Firebase?

- …

Ignorance is te default state of programming effort. But a lot of people have great difficulty to say I don’t know AND to go find the answer they lack.

kranner

3 months ago

1 reply

None of this is excluded by my statement. And arguably someone else can draw a line in the sand and say most of this is overthinking somehow and you should let the machine worry about it.

skydhash

3 months ago

I would love to let the computer do the investigative work for me, but I have to double check it, and there's not much mental energy and time saved (if you care about work quality). When I use `pgrep` to check if a process is running, I don't have to inspect the kernel memory to see if it's really there.

It's very much faster, cognitively, to just understand the project and master the tooling. Then it just becomes routine, like playing a short piano piece for the 100th time.

yeasku

3 months ago

The greatest people in each field love what they do.

The best climber in the world loves climbing. Same with drives, cheffs, and yes, people who write code.

procaryote

3 months ago

I know lots of programmers (usually the good ones) who do love programming.

yeasku

3 months ago

Same kind of people wanted to sell smart contracts, blockchain, bitcoin are the ones selling AI.

For them is the world, for us it means nothing.

jmkni

3 months ago

I imagine the LLM's have been trained on a lot less firmware code than say, HTML

jnwatson

3 months ago

I've started to use agents on some very low-level code, and have middling results. For pure algorithmic stuff, it works great. But I asked it to write me some arm64 assembly and it failed miserably. It couldn't keep track of which registers were which.

qweiopqweiop

3 months ago

Honestly - if it's such a good technique it should be built into the tool itself. I think just waiting for the tools to mature a bit will mean you can ignore a lot of the "just do xyz" crap.

It's not at senior engineer level until it asks relevant questions about lacking context instead of blindly trying to solve problems IMO.

daxfohl

3 months ago

2 replies

> And yeah sure, let's try to spend as many tokens as possible

It'd be nice if the article included the cost for each project. A 35k LOC change in a 350k codebase with a bunch of back and forth and context rewriting over 7 hours, would that be a regular subscription, max subscription, or would that not even cover it?

CharlesW

3 months ago

1 reply

From a cost perspective, you would definitely want a Claude Max subscription for this.

dhorthyAuthor

3 months ago

1 reply

yes - correct. For the record, if spending raw tokens, the 2 prs to baml cost about $650.

but yes we switched off per-token this week because we ran out of anthropic credits, we're on max plan now

daxfohl

3 months ago

Haha, when I asked claude the question, it estimated $20-45. https://claude.ai/share/5c3b0592-7bc9-4c40-9049-459058b16920

Horrible, right? When I asked gemini, it guessed 37 cents! https://g.co/gemini/share/ff3ed97634ba

daxfohl

3 months ago

Oh, oops it says further down

> oh, and yeah, our team of three is averaging about $12k on opus per month

I'll have to admit, I was intrigued with the workflow at first. But emm, okay, yeah, I'll keep handwriting my open source contributions for a while.

vanillax

3 months ago

1 reply

Doesnt githubs new speckit solve this? https://github.com/github/spec-kit

0xblacklight

3 months ago

how does this solve it?

ooopakaj

3 months ago

2 replies

I’m not an expert in either language, but seeing a 20k LoC PR go up (linked in the article) would be an instant “lgtm, asshole” kind of review.

> I had to learn to let go of reading every line of PR code

Ah. And I’m over here struggling to get my teammates to read lines that aren’t in the PR.

Ah well, if this stuff works out it’ll be commoditized like the author said and I’ll catch up later. Hard to evaluate the article given the authors financial interest in this succeeding and my lack of domain expertise.

ActionHank

3 months ago

1 reply

I dunno man, I usually close the PR when someone does that and tell them to make more atomic changes.

Would you trust an colleague who is over confident, lies all the time, and then pushes a huge PR? I wouldn't.

Our_Benefactors

3 months ago

6 replies

Closing someone else’s PR is an actively hostile move. Opening a 20k LOC isn’t great either, but going ahead and closing it is rude as hell.

wwweston

3 months ago

1 reply

A 20k LOC PR isn’t reviewable in any normal workflow/process.

The only moves are refusing to review it, taking it up the chain of authority, or rubber stamping it with a note to the effect that it’s effectively unreviewable so rubber stamping must be the desired outcome.

smrtinsert

3 months ago

2 replies

Sure it is if the great majority is tests.

rimunroe

3 months ago

1 reply

I don't understand this attitude. Tests are important parts of the codebase. Poorly written tests are a frequent source of headaches in my experience, either by encoding incorrect assumptions, lying about what they're testing, giving a false sense of security, adding friction to architectural changes/refactors, etc. I would never want to review even 2k lines of test changes in one go.

yojo

3 months ago

1 reply

Preach. Also, don't forget making local testing/CI take longer to run, which costs you both compute and developer context switching.

I've heard people rave about LLMs for writing tests, so I tried having Claude Code generate some tests for a bug I fixed against some autosave functionality - (every 200ms, the auto-saver should initiate a save if the last change was in the previous 200ms). Claude wrote five tests that each waited 200ms (!) adding a needless entire second to the run-time of my test suite.

I went in to fix it by mocking out time, and in the process realized that the feature was doing a time stamp comparisons when a simpler/non-error prone approach was to increment a logical clock for each change instead.

The tests I've seen Claude write vary from junior-level to flat-out-bad. Tests are often the first consumer of a new interface, and delegating them to an LLM means you don't experience the ergonomics of the thing you just wrote.

dhorthyAuthor

3 months ago

i think the general take away for all of this is the model can write the code but you still have to design it. I don't disagree with anything you've said, and I'd say my advice is engage more, iterate more, and work in small steps to get the right patterns and rules laid out. It wont work well on day one if you don't set up the right guidelines and guardrails. That's why it's still software engineering, despite being a different interaction medium.

sensanaty

3 months ago

And if the 10k lines of tests are all garbage, now what? Because tests are the 1 place you absolutely should not delegate to AI outside of setting up the boilerplate/descriptions.

lispisok

3 months ago

Dumping a 20k LOC PR on somebody to review especially if all/a lot of it was generated with AI is disrespectful. The appropriate response is to kick that back and tell them to make it more digestible.

sensanaty

3 months ago

I'm the owner of some of my work projects/repos. I will absolutely without a 2nd thought close a 20k LoC PR, especially an AI generated one, because the code that ends up in master is ultimately my responsibility. Unless it's something like a repo-wide linter change or whatever, there's literally never a reason to have such a massive PR. Break it down, I don't care if it ends up being 200 disparate PRs, that's actually possible to properly review compared to a single 20k line PR.

bak3y

3 months ago

Opening a 20k LOC PR is an actively hostile move worthy of an appropriate response.

Closed > will not review > make more atomic changes.

ActionHank

3 months ago

Dumping a huge PR across a shared codebase wherein everyone else also has to deal with the risk of you monumental changes is pretty rude as well, I would even go so far as to say that it is likely selfishly risky.

GoatInGrey

3 months ago

If somebody did this, it means they ignored their team's conventions and offloaded work onto colleagues for their own convenience. Being considered rude by the offender is not a concern of mine when dealing with a report who pulls this kind of antisocial crap.

CuriouslyC

3 months ago

If this stuff works out, you'll be behind the curve and people who were on the ball will have your job.

koakuma-chan

3 months ago

3 replies

Context has never been the bottleneck for me. AI just stops working when I reach certain things that AI doesn't know how to do.

jmkni

3 months ago

1 reply

My problem is it keeps working, even when it reaches certain things it doesn't know how to do.

I've been experimenting with Github agents recently, they use GPT-5 to write loads of code, and even make sure it compiles and "runs" before ending the task.

Then you go and run it and it's just garbage, yeah it's technically building and running "something", but often it's not anything like what you asked for, and it's splurged out so much code you can't even fix it.

Then I go and write it myself like the old days.

koakuma-chan

3 months ago

I have same experience with CC. It loves to comment out code, add a "fallback" implementation that returns mock data, and act like the thing works.

0xblacklight

3 months ago

1 reply

> Context has never been the bottleneck for me. AI just stops working when I reach certain things that AI doesn't know how to do.

It's context all the way down. That just means you need to find and give it the context to enable it to figure out how to do the thing. Docs, manuals, whatever. Same stuff that you would use to enable a human that doesn't know how to do it to figure out how.

koakuma-chan

3 months ago

1 reply

At that point it's easier to implement the thing yourself, and then let AI work with that.

bluefirebrand

3 months ago

Or just forget the AI entirely, if you can build it yourself then do it yourself

I treat "uses AI tools" as a signal that a person doesn't know what they are doing

lacy_tinpot

3 months ago

3 replies

Specifically what did you have difficulty implementing where it "just stops working"?

koakuma-chan

3 months ago

1 reply

Anything it has not been trained on. Try getting AI to use OpenAI's responses API. You will have to try very hard to convince it not to use the chat completions API.

anthonypasq

3 months ago

1 reply

in cursor you can index docs by just adding a url and then reference it like file context in the editor

dhorthyAuthor

3 months ago

1 reply

yeah once again you need the right context to override what's in the weights. It may not know how to use the responses api, so you need to provide examples in context (or tools to fetch them)

anthonypasq

3 months ago

1 reply

im struggling to understand what the issue with that is

incoming1211

3 months ago

1 reply

This is just an issue with people who expect AI to solve all of lifes problems before they get out of bed not realising they have no idea how AI works or what it produces and decide "it stops working because it sucks" instead of "it stops working because I don't know what I'm doing"

yeasku

3 months ago

The issue is people who cant code.

saberience

3 months ago

I've had AI totally fail several times on Swift concurrency issues, i.e. threads deadlocking or similar issues. I've also had AI totally fail on memory usage issues in Swift. In both cases I've had to go back to reasoning over the bugs myself and debugging them by hand, fixing the code by hand.

nicklaf

3 months ago

In my limited experiments with Gemini: it stops working when presented with a program containing fundamental concurrency flaws. Ask it to resolve a race condition or deadlock and it will flail, eventually getting caught in a loop, suggesting the same unhelpful remedies over and over.

I imagine this has to to with concurrency requiring conceptual and logical reasoning, which LLMs are known to struggle with about as badly as they do with math and arithmetic. Now, it's possible that the right language to work with the LLM in these domains is not program code, but a spec language like TLA+. However, at that point, I'd probably just spend less effort to write the potentially tricky concurrent code myself.

philipp-gayret

3 months ago

1 reply

Can't agree with the formula for performance, on the "/ size" part. You can have a huge codebase, but if the complexity goes up with size then you are screwed. Wouldn't a huge but simple codebase be practical and fine for AI to deal with?

The hierarchy of leverage concept is great! Love it. (Can't say I like the 1 bad line of CLAUDE.md is 100K lines of bad code; I've had some bad lines in my CLAUDE.md from time to time - I almost always let Claude write it's own CLAUDE.md.).

dhorthyAuthor

3 months ago

i mean there's also the fact that claude code injects this system message into your claude.md which means that even if your claude.md sucks you will probably be okay:

<system-reminder> IMPORTANT: this context may or may not be relevant to your tasks. You should not respond to this context or otherwise consider it in your response unless it is highly relevant to your task. Most of the time, it is not relevant. </system-reminder>

lots of others have written about this so i won't go deep but its a clear product decision, but if you don't know what's in your context window, you can't respond/architect your balance between claude.md and /commands well.

hellovai

3 months ago

2 replies

if you haven't tried the research -> plan -> implementation approach here, you are missing out on how good LLMs are. it completely changed my perspective.

the key part was really just explicitly thinking about different levels of abstraction at different levels of vibecoding. I was doing it before, but not explicitly in discrete steps and that was where i got into messes. The prior approach made check pointing / reverting very difficult.

When i think of everything in phases, i do similar stuff w/ my git commits at "phase" levels, which makes design decision easier to make.

I also do spend ~4-5 hours cleaning up the code at the very very end once everything works. But its still way faster than writing hard features myself.

0xblacklight

3 months ago

3 replies

tbh I think the thing that's making this new approach so hard to adopt for many people is the word "vibecoding"

Like yes vibecoding in the lovable-esque "give me an app that does XYZ" manner is obviously ridiculous and wrong, and will result in slop. Building any serious app based on "vibes" is stupid.

But if you're doing this right, you are not "coding" in any traditional sense of the word, and you are *definitely* not relying on vibes

Maybe we need a new word

dhorthyAuthor

3 months ago

1 reply

alex reibman proposed hyperengineering

i've also heard "aura coding", "spec-driven development" and a bunch of others I don't love.

but we def need a new word cause vibe coding aint it

Xss3

3 months ago

Vibe coding is accepting ai output based on vibes. Simple as that.

You can vibe code using specs or just by having a conversation.

simonw

3 months ago

2 replies

I'm sticking to the original definition of "vibe coding", which is AI-generated code that you don't review.

If you're properly reviewing the code, you're programming.

The challenge is finding a good term for code that's responsibly written with AI assistance. I've been calling it "AI-assisted programming" but that's WAY too long.

dhorthyAuthor

3 months ago

we can come up with something better :)

greymalik

3 months ago

It’s just programming. We don’t use a different word for writing code in an IDE either.

chickensong

3 months ago

AI is the new pAIr programming.

giancarlostoro

3 months ago

> but not explicitly in discrete steps and that was where i got into messes.

I've said this repeatedly, I mostly use it for boilerplate code, or when I'm having a brain fart of sorts, I still love to solve things for myself, but AI can take me from "I know I want x, y, z" to "oh look I got to x, y, z in under 30 minutes, which could have taken hours. For side projects this is fine.

I think if you do it piecemeal it should almost always be fine. When you try to tell it to do two much, you and the model both don't consider edge cases (Ask it for those too!) and are more prone for a rude awakening eventually.

iambateman

3 months ago

3 replies

I built a package which I use for large codebase work[0].

It starts with /feature, and takes a description. Then it analyzes the codebase and asks questions.

Once I’ve answered questions, it writes a plan in markdown. There will be 8-10 markdowns files with descriptions of what it wants to do and full code samples.

Then it does a “code critic” step where it looks for errors. Importantly, this code critic is wrong about 60% of the time. I review its critique and erase a bunch of dumb issues it’s invented.

By that point, I have a concise folder of changes along with my original description, and it’s been checked over. Then all I do is say “go” to Claude Code and it’s off to the races doing each specific task.

This helps it keep from going off the rails, and I’m usually confident that the changes it made were the changes I wanted.

I use this workflow a few times per day for all the bigger tasks and then use regular Claude code when I can be pretty specific about what I want done. It’s proven to be a pretty efficient workflow.

[0] GitHub.com/iambateman/speedrun

j45

3 months ago

1 reply

This looks very cool.

I see it has a pseudo code step, was it helpful at all to try to define a workflow, process or procedure beforehand?

I've also heard that keeping each file down to 100 lines is critical before connecting them. Noticed the same but haven't tried it in depth.

dhorthyAuthor

3 months ago

File size matters if you don’t have strategically placed “read the entire file” instructions for certain parts of the workflow (we do)

eddywebs

3 months ago

4 replies

The biggest challenge i found with LLMs on large codebase is making the same mistakes again and again How do keep track of the architecture decisions in context of every tasks on the large codebase ?

tom_m

3 months ago

2 replies

Very very clear, unambiguous, prompts and agent rules. Use strong language like "must" and "critical" and "never" etc. I would also try working on smaller sections of a large codebase at a time too if things are too inaccurate.

The AI coding tools are going to be looking at other files in the project to help with context. Ambiguity is the death of AI effectiveness. You have to keep things clear and so that may require addressing smaller sections at a time. Unless you can really configure the tools in ways to isolate things.

This is why I like tools that have a lot of control and are transparent. If you ask a tool what the full system and user prompt is and it doesn't tell you? Run away from that tool as fast as you can.

You need to have introspections here. You have to be able to see what causes a behavior you don't want and be able to correct it. Any tool that takes that away from you is one that won't work.

sarchertech

3 months ago

3 replies

> Use strong language like "must" and "critical" and "never" etc.

Truly we live in the stupidest timeline. Imagine if you had a domestic robot but when you asked it make you breakfast you had to preface your request with “it’s critical that you don’t kill me.”

Or when you asked it to do the laundry you had to remember to tell it that it “must not make its own doors by knocking holes in the wall” and hope that it listens.

sally_glance

3 months ago

There is even a chance our timeline might include that robot too at some point...

Book recommendation no one asked for but which is essentially about some guy living through multiple more or less stupid timelines: Count to Eschaton series by John C. Wright

phs318u

3 months ago

> Truly we live in the stupidest timeline.

Wholeheartedly agree. Truly, I look around and see no shortage of evidence for this assertion.

EDITED to make it clear I am agreeing with parent.

filoeleven

3 months ago

"Better living through prompt fondling."

mattmanser

3 months ago

1 reply

I always chuckle at these rain dance posts.

INVOKE THE ULTRATHINK OH MIGHTY CLAUDE AND BLESS MY CODE.

Have you tried kissing the keyboard before you press enter? It makes the code 123% more flibbeled.

alecco

3 months ago

Cool. Added it to the top of all my CLAUDE.md files and my web UI preferences. But I'll kiss the mic instead, as I'm a voice-only vivecoder.

athrowaway3z

3 months ago

1 reply

`opencode` will read any amount of `!cmd` output.

I start my sessions with something like `!cat ./docs/*` and I can start asking questions. Make sure you regularly ask it to point out any inconsistencies or ambiguity in the docs.

eddywebs

3 months ago

nice !

loandbehold

3 months ago

Whenever I see Claude Code make same mistake multiple times I add instructions to clade.md to avoid it in the future.

hyperadvanced

3 months ago

In some sense “the same mistakes again and again” is either a prompting problem or a “you” problem insofar as your expectations differ from the machine overlords.

scuff3d

3 months ago

6 replies

I will never understand why anyone wants to go through all this. I don't believe for a second this is more productive than regular coding with a little help from the LLM.

mattigames

3 months ago

2 replies

There is a chunk of devs using AI that do it not because they believe it makes them more productive in the present but because it might do so in the near future thanks to advances on AI tech/models, and then some do it because they think it might be required from them to do it this way by their bosses at some point in the future, so they can show preparedness and give the impression of being up to date with how the field evolves, even if at the end it turns out it doesn't speed up things that much.

scuff3d

3 months ago

2 replies

That line of thinking makes no sense to me honestly.

We are years into this, and while the models have gotten better, the guard rails that have to be put on these things to keep the outputs even semi useful are crazy. Look into the system prompts for Claude sometime. And then we have to layer all these additional workflows on top... Despite the hype I don't see any way we get to this actually being a more productive way to work anytime soon.

And not only are we paying money for the privilege to work slower (in some cases people are shelling out for multiple services) but we're paying with our time. There is no way working this way doesn't degrade your fundamental skills, and (maybe) worse the understanding of how things actually work.

Although I suppose we can all take solice in the fact that our jobs aren't going anywhere soon. If this is what it takes to make these things work.

athrowaway3z

3 months ago

1 reply

I'm sorry to be blunt here, but the fact you're looking at idiotic use of Claude.md system prompts tells me you're not actually looking at the most productive users, and your opinion doesn't even cover 'where we are'.

I don't blame people who think this. I've stopped visiting Ai Subreddits because the average comment and post is just terrible, with some straight up delusional.

But broadly speaking - in my experience - either you have your documentation set up correctly and cleanly such that a new junior hire could come in and build or fix something in a few days without too many questions. Or you don't. That same distinction seems to cut between teams who get the most out of AI and those that insist everybody must be losing more time than it costs.

---

I suspect we could even flip it around: the cost it takes to get an AI functioning in your code base is a good proxy for technical debt.

scuff3d

3 months ago

I wasn't talking about the system prompts provided by users. I was talking about what the companies have to put between the users and the LLM.

ktzar

3 months ago

And most importantly, we're paying with our brain and skills degradation. Once all these services stop being subsidised there will be a massive amount of programmers who no longer can code.

rsynnott

3 months ago

1 reply

Using something becuase it _might one day be useful_ is pretty weird. Just use it if and when it is useful.

faeyanpiraat

3 months ago

1 reply

But _everything_ you do initially falls into the category of _might one day be useful_, since you haven't yet learned how to do the thing well.

rsynnott

3 months ago

The claim I was responding to you was that some people use our friends the magic robots not because they think they are useful now, but because they think they might be useful in the future.

the_duke

3 months ago

3 replies

It absolutely can be, by a huge margin.

You spend a few minutes generating a spec, then agents go off and do their coding, often lasting 10-30 minutes, including running and fixing lints, adding and running tests, ...

Then you come back and review.

But you had 10 of these running at the same time!

You become a manager of AI agents.

For many, this will be a shitty way to spend their time.... But it is very likely the future of this profession.

noodletheworld

3 months ago

1 reply

You didnt have 10 of them running though.

You want to do that, but Ill bet money you arent doing it.

Thats the problem: this is speculative; maybe it scales sometimes, but mostly people do not work on ten things at once.

“Fix the landing page”

“I’ll make you ten new ones!”

“No. Calm down. Fix this one, and do it now, not when youre finished playing with your prompts”

There are legitimate times when complex pieces of work decompose into parallel tasks, but its the exception not the norm.

Most complex work has linked dependencies that need to be done in order.

Remember the mythical man month? Anyone? Anyone???!!??

You can't just add “more parallel” to get things done faster.

the_duke

3 months ago

1 reply

I definitely did have 10 running sometimes, mostly just based on copy-pasting issues from the issue tracker.

Codex / Jules etc make this pretty easy.

It's often not a sustainable pace with where the current tooling is at, though.

Especially because you still need to do manual fixes and cleanups quite often.

noodletheworld

3 months ago

1 reply

> sometimes

Mhm. Money -> to the dealer.

Anyway… watch the videos the OP has of the coding live streams. Thats the most interesting part of this post: actual real examples of people really using these tools in a way that is transferable and specifically detailed enough to copy and do yourself.

quikoa

3 months ago

1 reply

Could you share a link to the coding live streams? I can't find it.

pauletienney

3 months ago

I found this in the article: https://www.youtube.com/watch?v=42AzKZRNhsk

OtherShrezzing

3 months ago

1 reply

For each process, say you spend 3 minutes generating a spec. Presumably you also spend 5 minutes in PR and merging.

You can’t do 10 of these processes at once, because there’s 8 minutes of human administration which can’t be parallelised for every ~20min block of parallelisable work undertaken by Claude. You can have two, and intermittently three, parallel process at once under the regime described here.

Ianjit

3 months ago

1 reply

Parallel vs serial computation... its something you would hope software engineers understand...

scuff3d

3 months ago

From the code I've seen surprisingly few do.

scuff3d

3 months ago

2 replies

The number you have running is irrelevant. Primarily because humans are absolutely terrible at multitasking and context switching. An endless number of studies have been done on this. Each context switch cost you a non-trivial amount of time. And yes, even in the same project, especially big ones, you will be context switching each time one of these finishes it's work.

That coupled with the fact that you have to meticulously review every single thing the AI does is going to obliterate any perceived gains you get from going through all the trouble to set this up. And on top of that it's going to be expensive as fuck quick on a non trivial code base.

And before someone says "well you don't have to be that thorough with reviews", in a professional settings absolutely you do. Every single AI policy in every single company out there makes the employee using the tool solely responsible for the output of the AI. Maybe you can speed run when you're fucking around on your own, but you would have to be a total moron to risk your job by not being thorough. And the more mission critical the software the more thorough you have to be.

At the end of the day a human with some degree of expertise is the bottleneck. And we are decades away from these things being able to replace a human.

oblio

3 months ago

> you have to meticulously review every single thing the AI does

Joke's on you (and me? and I guess on us as a profession?).

brulard

3 months ago

How about a bug fixing use case? Let agents pick bugs from Jira and let it do some research and thinking, setting up data and environment for reproduction. Let it write a unit test manifesting the bug (making it failing test). Let it take a shot at implementing the fix. If it succeeds, let it make a PR.

This can all be done autonomously without user interaction. Now many bugs can be few lines of code and might be relatively easy to review. Some of these bug fixes may fail, may be wrong etc. but even if half of them were good, this is absolutely worth it. In my specific experience the success rate was around 70%, and the rest of the fixes were not all worthless but provided some more insight into the bug.

ByteDrifter

3 months ago

2 replies

Maybe the real question isn’t whether AI is useful, but whether we’ve designed workflows that let humans and AI collaborate effectively.

scuff3d

3 months ago

You can build the greatest house ever built, but if you build it on top of sand it's still going to collapse.

oblio

3 months ago

> but whether we’ve designed workflows that let humans and AI collaborate effectively.

In my experience with workflows that let humans and humans (let alone AIs) collaborate effectively, they are NP-hard problems.

procaryote

3 months ago

1 reply

I've always assumed it is because they can't do the regular coding themselves. If you compare spending months on trying to shake a coding agent into not exploding too much with spending years on learning to code, the effort makes more sense

iambateman

3 months ago

1 reply

As a counterpoint, I’ve been coding professionally since 2010.

Every feature I’ve asked Claude Code to write was one I could’ve written myself.

And I’m quite certain it’s faster for my use case.

I won’t be bothered if you choose to ignore agents but the “it’s just useful for the inept” argument is condescending.

brulard

3 months ago

I'm in the same boat. I'm 20 years into my SWE career, I can write all the things Claude Code writes for me now but it still makes me faster and deliver better quality features (Like accessibility features, transitions, nice to have bells and whistles) I may not had time or even thought of to do otherwise. And all that with documentation and tests.

KaiMagnus

3 months ago

2 replies

I got access to Kiro from Amazon this week and they’re doing something similar. First a requirements document is written based on your prompt, then a design document and finally a task list.

At first I thought that was pretty compelling, since it includes more edge cases and examples that you otherwise miss.

In the end all that planning still results in a lot of pretty mediocre code that I ended up throwing away most of the time.

Maybe there is a learning curve and I need to tweak the requirements more tho.

For me personally, the most successful approach has been a fast iteration loop with small and focused problems. Being able to generate prototypes based on your actual code and exploring different solutions has been very productive. Interestingly, I kind of have a similar workflow where I use Copilot in ask mode for exploration, before switching to agent mode for implementation, sounds similar to Kiro, but somehow it’s more successful.

Anyways, trying to generate lots of code at once has almost always been a disaster and even the most detailed prompt doesn’t really help much. I’d love to see how the code and projects of people claiming to run more than 5 LLMs concurrently look like, because with the tools I’m using, that would be a mess pretty fast.

scuff3d

3 months ago

I doubt there's much you could do to make the output better. And I think that's what really bothers me. We are layering all this bullshit on to try and make these things more useful then they are, but it's like building a house on sand. The underlying tech is impressive for what it is, and has plenty of interesting use cases in specific areas, but it flat out isn't what these corporations want people to believe it is. And none of it justifies the massive expenditure of resources we've seen.

peanutz454

3 months ago

> At first I thought that was pretty compelling, since it includes more edge cases and examples that you otherwise miss.

So, is it good for writing requirements, and creating design, if not for coding?

sneilan1

3 months ago

1 reply

It’s not necessarily faster to do this for a single task. But it’s faster when you can do 2-3 tasks at the same time. Agentic coding increases throughout.

scuff3d

3 months ago

1 reply

Until you reach the human bottle neck of having to context switch, verify all the work, presumably tell them to fix it, and then switch back to what you were doing or review something else.

I believe people are being honest when they say these things speed them up, because I'm sure it does seem that way to them. But reality doesn't line up with the perception.

sneilan1

3 months ago

1 reply

True, if you are in a big company with lots of people, you won't benefit much from the improved throughput of agentic coding.

A greenfield startup however with agentic coding in it's DNA will be able to run loops around a big company with lots of human bottlenecks.

The question becomes, will greenfield startups, doing agentic coding from the ground up, replace big companies with these human bottlenecks like you describe?

What does a startup, built using agentic coding with proper engineering practices, look like when it becomes a big corporation & succeeds?

scuff3d

3 months ago

That's not my point at all. Doesn't matter where you work, if a developer is working in a code base with a bunch of agents, they are always going to be the bottleneck. All the agent threads have to merge back to the developer thread at some point. The more agent threads the more context switching that has to occur, the smaller and smaller the productivity improvement gets, until you eventually end up in the negative.

I can believe a single developer with one agent doing some small stuff and using some other LLM tools can get a modest productivity boost. But having 5 or 10 of these things doing shit all at once? No way. Any gains are offset by having to merge and quality check all that work.

iagooar

3 months ago

2 replies

I am working on a project with ~200k LoC, entirely written with AI codegen.

These days I use Codex, with GPT-5-Codex + $200 Pro subscription. I code all day every day and haven't yet seen a single rate limiting issue.

We've come a long way. Just 3-4 months ago, LLMs would start doing a huge mess when faced with a large codebase. They would have massive problems with files with +1k LoC (I know, files should never grow this big).

Until recently, I had to religiously provide the right context to the model to get good results. Codex does not need it anymore.

Heck, even UI seems to be a solved problem now with shadcn/ui + MCP.

My personal workflow when building bigger new features:

1. Describe problem with lots of details (often recording 20-60 mins of voice, transcribe)

2. Prompt the model to create a PRD

3. CHECK the PRD, improve and enrich it - this can take hours

4. Actually have the AI agent generate the code and lots of tests

5. Use AI code review tools like CodeRabbit, or recently the /review function of Codex, iterate a few times

6. Check and verify manually - often times, there are a few minor bugs still in the implementation, but can be fixed quickly - sometimes I just create a list of what I found and pass it for improving

With this workflow, I am getting extraordinary results.

AMA.

mentos

3 months ago

2 replies

What platform are you developing for, web?

Did you start with Cursor and move to Codex or only ever Codex?

drewnick

3 months ago

1 reply

Not OP, but I use Codex for back-end, scripting, and SQL. Claude Code for most front-end. I have found that when one faces a challenge, the other often can punch through and solve the problem. I even have them work together (moving thoughts and markdown plans back and fourth) and that works wonders.

My progression: Cursor in '24, Roo code mid '25, Claude Code in Q2 '25, Codex CLI in Q3 `25.

iagooar

3 months ago

Cursor for me until 3-4 weeks ago, now Codex CLI most of the time.

These tools change all the time, very quickly. Important to stay open to change though.

iagooar

3 months ago

Yes, it is a web project with next.js + Typescript + Tailwind + Postgres (Prisma).

I started with Cursor, since it offers a well-rounded IDE with everything you need. It also used to be the best tool for the job. These days Codex + GPT-5-Codex is king. But I sometimes go back to Cursor, especially when reading / editing the PRDs or if I need the ocasional 2nd opinion by Claude.

Mockapapella

3 months ago

1 reply

This sounds very similar to my workflow. Do you have pre-commits or CI beyond testing? I’ve started thinking about my codebase as an RL environment with the pre-commits as hyperparameters. It’s fascinating seeing what coding patterns emerge as a result.

iagooar

3 months ago

Yes, I do have automated linting (a bit of a PITA at this scale). On the CI side I am using Github Actions - it does the job, but haven't put much work into it yet.

Generally I have observed that using a statically typed language like Typescript helps catching issues early on. Had much worse results with Ruby.

r2ob

3 months ago

I refactored CPython using GPT-5, turning the compiler bilingual for english and portuguese keywords.

https://github.com/ricardoborges/cpython

what web programming task GPT-5 can't handle?

jascha_eng

3 months ago

Except for ofc pushing their own product (humanlayer) and some very complex prompt template+agent setups that are probably overkill for most, the basics in this post about compaction and doing human review at the correct level are pretty good pointers. And giving a bit of a framework to think within is also neat

jb2403

3 months ago

It’s refreshing to read a full article this was written by a human. Content +++

258 more comments available on Hacker News

View full discussion on Hacker News

ID: 45347532Type: storyLast synced: 11/22/2025, 11:00:32 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN