AI Was Supposed to Help Juniors Shine. Why Does It Mostly Make Seniors Stronger?
Posted 4 months ago · Active 4 months ago
elma.dev · Tech story · High profile
Heated, mixed debate · 85/100
Key topics
AI in Software Development
Junior vs Senior Developers
Productivity and AI
The article discusses how AI is making senior developers more productive, but not necessarily helping junior developers shine as initially expected, sparking a debate on the role of AI in software development.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 44m after posting
Peak period: 43 comments in 3-6h
Avg / period: 12.3
Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
- 01Story posted
Sep 20, 2025 at 8:56 PM EDT
4 months ago
Step 01 - 02First comment
Sep 20, 2025 at 9:41 PM EDT
44m after posting
Step 02 - 03Peak activity
43 comments in 3-6h
Hottest window of the conversation
Step 03 - 04Latest activity
Sep 22, 2025 at 1:28 PM EDT
4 months ago
Step 04
ID: 45319062 · Type: story · Last synced: 11/20/2025, 8:09:59 PM
Those are two different narratives. One implies that everyone will be able to code and build: "English as a programming language", etc. The other is one of those headless-chicken, apocalyptic scenarios where AI has already made (or will very shortly make) human programmers obsolete.
"AI taking jobs" means everyone's job. I won't even comment on the absurdity of that idea; to me, it only comes from people who've never worked professionally.
At the end of the day, companies will take any vaguely reasonable excuse to cull juniors and save money. It's just business. LLMs are simply the latest excuse, though yes, they do improve productivity, to varying degrees depending on what exactly you work on.
Also, those two narratives are sometimes deployed as a false dichotomy, where both just make the same assumption: that LLM weaknesses will vanish and that dramatic improvement will continue indefinitely.
A historical analogy:
* A: "Segway™ balancing vehicles will be so beneficially effective that private vehicles will be rare in 2025."
* B: "No, Segways™ will be so harmfully effective that people will start to suffer from lower body atrophy by 2025."
Once you've worked professionally, it's not so absurd. I mean, you really have to see it to believe the extreme compromises in quality that upper management is often willing to tolerate to save a buck in the short term.
I work professionally (I am even a bit renowned) and still believe AI will take my (and everyone's) job.
It's wrong maybe 40-50% of the time, so I can't even imagine the disasters I'm averting by recognising when it's giving me completely bonkers suggestions.
To me, that has never been more true.
Most junior devs ask GeminiPiTi to write the JavaScript code for them, whereas I ask it for an explanation of the underlying model of async/await and the execution model of a JavaScript engine.
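A minimal sketch of the kind of execution-model question I mean (assuming Node or a browser console): why does an awaited continuation, a microtask, run before a zero-delay timer, a macrotask?

```typescript
// Minimal sketch: asking "why does this print sync, microtask, macrotask?"
// teaches more about the JS execution model than asking for the code itself.
async function demo(): Promise<void> {
  setTimeout(() => console.log("macrotask"), 0); // queued on the task (macrotask) queue
  await Promise.resolve();                       // suspends; the rest becomes a microtask
  console.log("microtask");                      // microtasks drain before the next macrotask
}

console.log("sync"); // plain synchronous code runs first
demo();
// Output order: sync, microtask, macrotask
```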
There is a similar issue when you learn piano. Your immediate wish is to play Chopin, whereas the true path is to identify, name, and study all the tricks in his pieces.
Chopin has beginner pieces too; many in our piano studio were first-year pianists doing the Raindrop Prelude, the E minor Prelude, or other beginner works like Bach.
But my important point was the «identify and name» part: name the elements of your problem [piece of music, or whatever].
The learning process has usually been, as you mention, to follow the path first and only name things afterwards, which is a highly uncomfortable process.
Another path that AI might force us to follow is to quickly identify and name the proper concepts, ahead of practical experience.
I have never heard that before
In the end, AI is a tool that helps everyone get better, but the knowledge and creativity are still in the people, not in the input files of ChatGPT.
1. Unconsciously incompetent
2. Consciously incompetent
3. Consciously competent
4. Unconsciously competent
The challenge with AI is that it will give you “good enough” output; without feedback loops you never move to 2, 3, or 4 and assume you are doing OK. Hence it stunts learning, so juniors and the inexperienced stay inexperienced, without knowing what they don’t know.
You have to use it as an expert thinking partner: tell it to ask you questions and not give you the answer.
Similarly, it takes experience to spot when the LLM is going in the wrong direction or making mistakes.
I think for supercharging a junior, it should be used more like a pair programmer, not for code generation. It can help you quickly gain knowledge and troubleshoot. But relying on a junior's prompts and guidance to get good code gen is going to be suboptimal.
For seniors, AI fills gaps such as:
- Techs they understand but have not yet mastered. AI aids with implementation details only experts know about.
- No time for long coding tasks. It aids with fast implementations and automatic tests.
- No time for learning techs that address well-understood problems. AI helps with quick intros, fast demos, and resolving learners' misunderstandings.
In essence, for seniors it impacts productivity.
In the case of juniors, AI fills gaps too, but these are different from seniors' gaps and AI does not excel at them because the gaps are wider and broader:
- Understanding the problems of the business domain. AI helps, but not that much.
- Understanding how the organization works. AI is not very helpful here.
- Learning the techs to be used. AI helps, but it doesn't know how to guide a junior in a specific organisational context and specific business domain.
In essence it helps, but not that much, because the gaps are wider and more difficult to fill.
Only when you know the basic notions of the field you want to work in can AI be productive. This is not only valid for coding but also for other fields in science and the humanities.
I’ve literally asked for details about libraries I know exist by name, and had every LLM I’ve tried (Claude, Gemini Pro, ChatGPT) just make shit up that sounded about right, but was actually just-wrong-enough-to-lead-me-on-a-useless-rabbit-hole-search.
At least most people on stackoverflow saying that kind of thing were somewhat obviously kind of dumb or didn’t know what they were doing.
Like function calls with wrong args (or spelled slightly differently), capitalization being wrong (but one of the ‘okay’ ways), wrong paths and includes.
I've lost count of how many times I've asked whether some command line tool has an option or config available for some niche case and ChatGPT or Gemini shouts "Yes! Absolutely! just use '--use-external-mode' to get the behavior you want, it's that simple!" and it's 100% hallucination created by mangling together my intent with a real option in the docs but which in reality does not actually exist nor has it ever existed. It's even worse with GUI/menu navigation questions I'm guessing because it's even less grounded by text-based docs and trivially easy to bullshit that an option is buried in Preferences, the External tab maybe, somewhere, probably.
The desperate personality tuning to please the user at all costs combined with LLMs inherently fuzzy averaging of reality produces negative value whenever I truly need a binary yes/no "Does X exist in Y or not?" answer to a technical question. Then I waste a bunch of time falling back to Google trying to definitively prove or disprove whether "--use-external-mode" is a real thing and sure enough, it's not.
It does occasionally lead to hilariously absurd exchanges where when challenged instead of admitting its mistake the LLM goes on to invent an elaborate entirely fabricated backstory about the implementation of the "--use-external-mode" command to explain why despite appearing to not exist, it actually does but due to conflicts with X and Y it isn't supported on my environment, etc, etc.
I use Claude Code, Roo Code, Codex and Gemini CLI constantly, so I'm no kneejerk LLM hater, to be clear. But for all the talk about being "a better version of Google", I have had so much of my time wasted by being sent down endless rabbit holes where I ignored my sneaking suspicion I was being lied to because the answer sounded just so plausibly perfect. I've had the most success by far with it as a code generation tool vs. a Google replacement.
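One cheap way to ground-truth a claimed option before going down the rabbit hole is to grep the tool's own --help output. A rough sketch (Node, reusing the made-up "--use-external-mode" flag from above; some tools print help to stderr or exit non-zero, so treat this as a heuristic, not a guarantee):

```typescript
// Rough sketch: verify a flag against the tool's own --help output rather than
// trusting the model. "--use-external-mode" is the hallucinated example above.
import { execFileSync } from "node:child_process";

function flagMentionedInHelp(tool: string, flag: string): boolean {
  try {
    const help = execFileSync(tool, ["--help"], { encoding: "utf8" });
    return help.includes(flag);
  } catch {
    return false; // tool not installed, or it doesn't support --help cleanly
  }
}

// e.g. with GNU grep: the real flag shows up, the invented one doesn't
console.log(flagMentionedInHelp("grep", "--line-number"));       // likely true
console.log(flagMentionedInHelp("grep", "--use-external-mode")); // false
```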
Yeah I've had that one a lot. Or, it's a real option that exists in a different, but similar product, but not in this one.
Man I don't miss that place or those people. Glad AI's basically destroyed it.
This is all a pretty well-trodden debate at this point though. AI works as a Copilot which you monitor and verify and task with specific things, it does not work as a pilot. It's not about junior or senior, it's about whether you want to use this thing to do your homework/write your essay/write your code for you or whether you use it as an assistant/tutor, and whether you are able to verify its output or not.
Edit: interesting thread: https://news.ycombinator.com/item?id=27678424
Edit: an example of the kind of comment I was talking about: https://news.ycombinator.com/item?id=27677690
Senior engineers already know exactly where the changes need to be made and can suggest what to do. They probably know the pitfalls and have established patterns, architectures, and designs in their head. Juniors, on the other hand, don't have that, so they go with whatever. Nowadays a lot of them also "ask ChatGPT about its opinion on architecture" when told to refactor (a real quote from real junior/mid engineers), ending up using whatever sloppypasta they get provided.
Senior devs earned their experience of what is good/bad through writing code, understanding how hard and annoying it is to make a change, then reworking those parts or making them better the next time. The feedback loop was impactful because it was based on that code and on them working with that code, so they knew exactly what the annoying parts were.
Vibe-coding juniors do not know that; their conversation context knows that. Once things get buggy and changes are hard, they will fill up their context with tries/retries until it works, leading to their feedback loop being trained on prompts and coding tools, not code itself.
Even if they read the outputted code, they have no experience using it, so they are not aware of the issues - e.g. something would be better as a typed state (see the sketch after this comment), but they don't really use it so they will not care; they do not have to handle the edge cases, they will not understand the DX from an IDE, they will not build a full mental model of how it works, just a shallow one.
This leads to insane inefficiencies - wasting 50 prompt cycles instead of 10, not understanding cross-codebase patterns, lack of learning transfer from codebase to codebase, etc.
With a minor understanding of state modeling and architecture, a vibe-coding junior can be made 100x more efficient, but due to the vibe-coding itself, they will probably never learn state modeling and architecture, or learn to refactor and properly manipulate abstractions, leading to an eternal cycle of LLM-driven sloppypasta code, trained on millions of terrible GitHub repositories, old outdated APIs, and Stack Overflow answers.
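On the "typed state" point, a minimal sketch (hypothetical example) of what gets skipped when you never live with the code: a discriminated union forces every consumer to handle each case explicitly, instead of a loose bag of optional fields that a generated change can leave silently inconsistent.

```typescript
// Minimal sketch of "typed state" (hypothetical example): each state carries
// only the fields that are valid for it, and the compiler enforces the cases.
type RequestState =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "success"; data: string }
  | { status: "error"; message: string };

function render(state: RequestState): string {
  switch (state.status) {
    case "idle":
      return "Nothing requested yet";
    case "loading":
      return "Loading...";
    case "success":
      return state.data; // `data` only exists on the success branch
    case "error":
      return `Failed: ${state.message}`; // `message` only exists on the error branch
  }
}

console.log(render({ status: "error", message: "timeout" })); // "Failed: timeout"
```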
I don't think this is necessarily a massive moat for senior programmers. I feel it's not a massive jump to teach AI architecture patterns and good data modelling.
I feel that Anthropic et al. just haven't got to that training stage yet.
That then leaves you with the mental model problem. Yes, there's then a large context problem, but again I was wondering if setting up an MCP that presented the AI a meaningful class map or something might help.
Essentially, give the AI a mental model of the code. I personally find class maps useless as they tend to clash with my own mental model, but it might work with AI. The class map can obviously be built without AI, but then you might even get AI to go through the code function by function and annotate the class map with comments about any oddities of each function. The MCP server could even limit the size of the map, depending on what part of the code it's looking to change (working on the email sending? don't bother sending it the UI layer).
I'm guessing someone's already tried it given some of the ridiculous .Claude folders I've seen[1] but I've seen no-one talking about whether it works or not yet in the discussions I follow.
[1] That I suspect are pointlessly over complicated and make CC worse not better
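Purely as a sketch of what such a class map might look like (all names hypothetical, not a real MCP server): a compact, annotated index that gets trimmed to the area being worked on before it is handed to the model.

```typescript
// Hypothetical sketch of a "class map": a compact, annotated index of the
// codebase, filtered by the area the model is currently being asked to change.
interface ClassMapEntry {
  name: string;        // class or module name
  file: string;        // where it lives
  area: "email" | "ui" | "billing"; // used to trim the map per task
  summary: string;     // one-line description, possibly AI-annotated
  oddities?: string[]; // the per-function quirks mentioned above
}

const classMap: ClassMapEntry[] = [
  {
    name: "EmailSender",
    file: "src/email/sender.ts",
    area: "email",
    summary: "Wraps the SMTP client and retries transient failures.",
    oddities: ["silently drops attachments over 10 MB"],
  },
  {
    name: "ShareIcons",
    file: "src/ui/share-icons.tsx",
    area: "ui",
    summary: "Renders the social share SVG icons.",
  },
];

// Working on email sending? Don't bother sending the model the UI layer.
const slice = (area: ClassMapEntry["area"]) =>
  classMap.filter((entry) => entry.area === area);

console.log(slice("email").map((e) => `${e.name}: ${e.summary}`));
```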
The issue is that having them learn that on their own is currently an inaccurate process with a lot of overlooking. I recently tried applying some of the techniques that fared well on smaller repositories to a giant monorepo, and while they sometimes did yield improvements, most often things got overlooked, dependencies were forgotten, and testing suites got confused. And it wastes a ton of compute in the end for smaller yields.
It will get better, of that I am sure, but currently the best way is to introduce it to an architecture and give it some samples so it can do what it does best: follow text patterns. But people are mostly trying to one-shot things with this magical AI they heard about, without any proper investment of time and mindshare into it.
While some might say "oh, that won't work well in legacy repositories, we've got 6 architectures here", pointing that out and adding a markdown file explaining each helps a ton. And not "hey claude, generate me an architecture.md", but transferring the actual knowledge you have, together with all the thorny bits, into documentation, which will improve both your AI usage and your organisation.
We are at the level of the original Waymo cars where they had to have a person behind the wheel ready to take the controls, just in case it inexplicably decided to drive off a bridge.
As Claude's safety driver I have to intervene in perhaps 25-50% of tasks. Some days I don't even open Claude because I'm working on a part of the codebase that has to be just so and I know Claude would mess it up.
The moat is real. I don't have any illusions that one day Claude will catch up but that's at least 5 years away.
Or until it does not. On numerous occasions I've observed LLMs get stuck in the endless loop of fix one thing, break the other. A senior is capable of fixing it themselves; juniors may not even have a clue how the code works.
It was quite interesting to have discussions with him after his code check-ins and I think the whole process was a good educational experience for everybody who was involved. It would not have worked this way without a combination of AI and experienced people involved.
It can be really, really hard to tell when what it's producing is a bag of ** and it's leading you down the garden path. I've been a dev for 20 years (which isn't to imply I'm any good at it yet) and it's not uncommon I'll find myself leaning on the AI a bit too hard and then you realise you've lost a day to a pattern that wasn't right, or an API it hallucinated, in the first place.
It basically feels like I'm being gaslit constantly, even though I've changed my tools to some that feel like they work better with AIs. I expect it's difficult for junior devs to cope with that and keep up with senior devs, who normally would have offloaded tasks to them instead of AI.
If you have good tests and a good sense for design and you know how to constrain and direct the AI, you can avoid a lot of boring work. That is something.
For instance, I've been working on an app recently with some social share icon logos in svg.
Whenever I get it to tweak bits of code elsewhere in the same file, 80% of the time it goes and changes those svg icons, completely corrupting some of the logos, curiously consistent in how it does it. Several times I've had that slip through and had to go and undo it again, at which point it starts to feel like the superfast junior dev you're using has malign intent!
That’s the whole issue in a nutshell.
Can the output of a generative system be verified as accurate by a human (or ultimately verified by a human)?
Experts who can look at an output and verify if it is valid are the people who can use this.
For anyone else it’s simply an act of faith, not skill.
It would be great if responses were tagged with uncertainty estimates.
It is much more difficult and time-consuming to build a mental model of AI-generated code and verify it than to build the damn thing yourself and verify it while it is fresh in your memory.
So in the end, it's code that I know very, very well. I could have written it but it would have taken me about 3x longer when all is said and done. Maybe longer. There are usually parts that have difficult functions but the inputs and outputs of those functions are testable so it doesn't matter so much that you know every detail of the implementation, as long as it is validated.
This is just not junior stuff.
I am surprising myself these days with how fast I am when using AI as a glorified Stack Overflow.
We are also seeing studies and posts come out showing that, when actually tried side by side, the AI-writes-the-code route is slower, even though the developer perceives it as faster.
I have this pattern while driving.
Using the main roads, when there is little to no traffic, the commute is objectively, measurably the fastest.
However, during peak hours, I find myself in traffic jams, so I divert to squiggly country roads which are both slower and longer, but at least I’m moving all the time.
The thing is, when I did have to take the main road during the peak traffic, the difference between it and squiggly country roads was like two to three minutes at worst, and not half an hour like I was afraid it would be. Sure, ten minutes crawling or standing felt like an hour.
Maybe coding with LLMs makes you think you are doing something productive the whole time, but the actual output is little different from the old way? But hey, at least it’s not like you’re twiddling your thumbs for hours, and the bossware measuring your productivity by your keyboard and mouse activity is happy!
Meanwhile, we adults can do real work on a separate, real computer. Never use their laptop more than the absolute minimum.
1) the bossware might take screenshots too
2) the bosses pay for the whole LLM so they expect you to use the whole LLM
3) you may not want to contaminate your spare computer with whatever shit you're working on at the job, and indeed it may be considered a breach of security (as if feeding OpenAI/Anthropic isn't, lol, but that's beside the point).
So you continue to feel miserable, but you get your paycheck, and it's better than unemployment, and your kids are fed and clothed, so there's that.
I think the mixed reports on utility have a lot to do with the very different ways the tool is used and how much 'magic' the end-user expects versus how much the end-user expects to guide the tool to do the work.
To get the best out of it, you do have to provide a significant amount of scaffolding (though it can help with that too). If you're just pointing it at a codebase and expecting it to figure it out, you're going to have mixed results at best. If you guide it well, it can save a significant amount of manual effort and time.
It doesn’t work. The only way it could is if the LLM has a testing loop itself. I guess in web world it could, but in my world of game dev, not so much.
So I stick with the method I outlined in OP and it is sometimes useful.
Yeah, this is a big thing I'm noticing a lot of people miss.
I have tons of people ask me "how do I get claude to do <whatever>?"
"Ask claude" is the only response I can give.
You can get the LLM to help you figure out how to get to your goal and write the right prompt before you even ask the LLM to get to your goal.
But I think, and this is just conjecture, that if you measure over a longer timespan, the ai assisted route will be consistently faster.
And for me, this is down to momentum and stamina. Paired with the ai, I’m much more forward looking, always anticipating the next architectural challenge and filling in upcoming knowledge and resource gaps. Without the ai, I would be expending much more energy on managing people and writing code myself. I would be much more stop-and-start as I pause, take stock, deal with human and team issues, and rebuild my capacity for difficult abstract thinking.
Paired with a good ai agent and if I consistently avoid the well known pitfalls of said agent, development feels like it has the pace of cross country skiing, a long pleasant steady and satisfying burn.
If I were to attack the same system myself without any LLM assist, I'd make a lot of choices to optimize for my speed and knowledge base. The code would end up much simpler. For something that would be handed off to another person (including future me) that can be a win. But if the system is self contained then going bigger and fancier in that moment can be a win. It all depends on the exact goals.
All in all, there's a lot of nuance to this stuff and it's probably not really replacing anyone except people who A) aren't that skilled to start with and B) spend more time yelling about how bad AI is than actually digging in and trying stuff.
Really does not sound like that from your description. It sounds like coaching a noob, which is a lot of work in itself.
Wasn’t there a study that said that using LLMs makes people feel more productive while they actually are not?
And if this is true, you will have to coach AI each time whereas a person should advance over time.
edit: because people are stupid, 'competitively' in this sense isn't some theoretical number pulled from an average, it's 'does this person feel better off financially working with you than others around them who don't work with you, and is this person meeting their own personal financial goals through working with you'?
Junior is a person, not your personal assistant like LLM.
Also it is never a policy to pay competitively for the existing employees, only for the new hires.
As for humans, they might not have the motivation or technical writing skill to document what they learnt. And even if they did, the next person might not have the patience to actually read it.
Also, a good few times, if it were a human doing the task, I would have said they both failed to follow the instructions and lied about it and attempted to pretend they didn’t. Luckily their lying abilities today are primitive, so it’s easy to catch.
I've been trying out Codex the last couple days and it's much more adherent and much less prone to lying and laziness. Anthropic says they're working on a significant release in Claude Code, but I'd much rather have them just revert back to the system as it was ~a month ago.
I've never had a model lie to me as much as Claude. It's insane.
It's funny. Just yesterday I had the experience of attending a concert under the strong — yet entirely mistaken — belief that I had already been to a previous performance of the same musician. It was only on the way back from the show, talking with my partner who attended with me (and who had seen this musician live before), trying to figure out what time exactly "we" had last seen them, with me exhaustively listing out recollections that turned out to be other (confusingly similar) musicians we had seen live together... that I finally realized I had never actually been to one of this particular musician's concerts before.
I think this is precisely the "experience" of being one of these LLMs. Except that, where I had a phantom "interpolated" memory of seeing a musician I had never actually seen, these LLMs have phantom actually-interpolated memories of performing skills they have never actually themselves performed.
Coding LLMs are trained to replicate pair-programming-esque conversations between people who actually do have these skills, and are performing them... but where those conversations don't lay out the thinking involved in all the many implicit (thinking, probing, checking, recalling) micro-skills involved in actually performing those skills. Instead, all you get in such a conversation thread is the conclusion each person reaches after applying those micro-skills.
And this leads to the LLM thinking it "has" a given skill... even though it doesn't actually know anything about "how" to execute that skill, in terms of the micro-skills that are used "off-screen" to come up with the final response given in the conversation. Instead, it just comes up with a prediction for "what someone using the skill" looks like... and thinks that that means it has used the skill.
Even after a hole is poked in its use of the skill, and it realizes it made a mistake, that doesn't dissuade it from the belief that it has the given skill. Just like, even after I asked my partner about the show I recall us attending, and she told me that that was a show for a different (but similar) musician, I still thought I had gone to the show.
It took me exhausting all possibilities for times I could have seen this musician before, to get me to even hypothesize that maybe I hadn't.
And it would likely take similarly exhaustive disproof (over hundreds of exchanges) to get an LLM to truly "internalize" that it doesn't actually have a skill it believed itself to have, and so stop trying to use it. (If that meta-skill is even a thing that LLMs have ever learned from their training data — which I doubt. And even if they did, you'd be wasting 90% of a Transformer's context window on this. Maybe something that's worth keeping in mind if we ever switch back to basing our LLMs on RNNs with true runtime weight updates, though!)
This is probably because the LLM is trained on millions of lines of Go with nested error checks vs. a few lines of contrary instructions in the instructions file.
I keep fighting this because I want to understand my tools, not because I care that much about this one preference.
These models are only going to get better and cheaper per watt.
What do you base this claim on? They have only gotten exponentially more expensive for decreasing gain so far - quite the opposite of what you say.
Humans aren’t tools.
Even if you do it by yourself, you need to do the same thinking and iterative process by yourself. You just get the code almost instantly and mostly correct, if you are good at defining the initial specification.
The trick is knowing where the particular LLM sucks. I expect that at first there is no productivity gain, but when you start to understand the limitations and strengths - holy moly.
It's more like x units of time thinking and y units of time coding, whereas I see people spend x/2 thinking, x typing the specs, y correcting the specs, and y giving up and correcting the code.
These are not _tools_ - they are like cool demos. Once you have a certain mass of functional code in place, intuition - which for me took decades of programming to develop - kicks in and you get these spider-sense tinglings: "ahh, umm, this does not feel right, something's wrong".
My advice would be don’t use LLM until you have the ”spider-sense” level intuition.
On a tangent: that study is brought up a lot. There are some issues with it, but I agree with the main takeaway to be wary of the feeling of productivity vs. actual productivity.
But most of the time it's brought up by AI skeptics who conveniently gloss over the fact that it's about averages.
Which, while organizationally interesting, is far less interesting than to discover what is and isn't currently possible at the tail end by the most skillful users.
Productivity is something that creates business value. In that sense an engineer who writes 10 lines of code but that code solves a $10M business problem or allows the company to sign 100 new customers may be the most productive engineer in your organization.
Taken along with the dozens of other studies that show that humans are terrible at estimating how long it will take them to complete a task, you should be very skeptical when someone says an LLM makes them x% more productive.
There’s no reason to think that the most skillful LLM users are not overestimating productivity benefits as well.
There have been many more studies showing productivity gains across a variety of tasks that preceded that one.
That study wasn't necessarily wrong about the specific methodology they had for onboarding people to use AI. But if I remember correctly it was funded by an organization that was slightly skeptical of AI.
AI coding assistant trial: UK public sector findings report: https://www.gov.uk/government/publications/ai-coding-assista... - UK government. "GDS ran a trial of AI coding assistants (AICAs) across government from November 2024 to February 2025. [...] Trial participants saved an average of 56 minutes a working day when using AICAs"
Human + AI in Accounting: Early Evidence from the Field: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5240924 - "We document significant productivity gains among AI adopters, including a 55% increase in weekly client support and a reallocation of approximately 8.5% of accountant time from routine data entry toward high-value tasks such as business communication and quality assurance."
OECD: The effects of generative AI on productivity, innovation and entrepreneurship: https://www.oecd.org/en/publications/the-effects-of-generati... - "Generative AI has proven particularly effective in automating tasks that are well-defined and have clear objectives, notably including some writing and coding tasks. It can also play a critical role for skill development and business model transformation, where it can serve as a catalyst for personalised learning and organisational efficiency gains, respectively [...] However, these potential gains are not without challenges. Trust in AI-generated outputs and a deep understanding of its limitations are crucial to leverage the potential of the technology. The reviewed experiments highlight the ongoing need for human expertise and oversight to ensure that generative AI remains a valuable tool in creative, operational and technical processes rather than a substitute for authentic human creativity and knowledge, especially in the longer term.".
> On average, users reported time savings of 56 minutes per working day [...] It is also possible that survey respondents overestimated time saved due to optimism bias.
Yet in conclusion, this self-reported figure is stated as an independently observed fact. When people without ADHD take stimulants they also self-report increased productivity, higher accuracy, and faster task completion but all objective measurements are negatively affected.
The OECD paper supports their programming-related findings with the following gems:
- A study that measures productivity by the time needed to implement a "hello world" of HTTP servers [27]
- A study that measures productivity by the number of lines of code produced [28]
- A study co-authored by Microsoft that measures productivity of Microsoft employees using Microsoft Copilot by the number of pull requests they create. Then the code is reviewed by their Microsoft coworkers and the quality of those PRs is judged by the acceptance rate of those PRs. Unbelievably, the code quality doesn't only remain the same, it goes up! [30]
- An inspirational pro-AI paper co-authored by GitHub and Microsoft that's "shining a light on the importance of AI" aimed at "managers and policy-makers". [31]
Interesting analogy, because all those studies with objective measurements are defied by US students year by year, come finals seasons.
Regardless, I'm not saying it's a cheap or practical to get high this way, especially over the long term. People probably try stimulants because folk wisdom tells them that they'll get better grades. Then they get high and they feel like a superman from the dopamine rush, so they keep using them because they think it's materially improving their grades but really they're just getting high.
I don't need a study to tell me about the five projects that have been stuck slowly plodding along for nearly ten years, waiting for me to ever have time or resources. These are now nearing completion after only two months of picking up Claude Code, and with high-quality implementations that were fever dreams before.
My background is academic science not professional programming though and the output quality and speed of Claude Code is vastly better than what grad students generate. But you don't trust grad student code either. The major difference here is that suggestions for improvement loop in minutes rather than weeks or months. Claude will get the science wrong, but so do grad students.
(But sure technically they are not finished yet ... but yeah)
But in all seriousness, completion is not the only metric of productivity. I could easily break it down into a mountain of subtasks that have been fully completed for the bean counters. In the meantime, the code that did not exist 2 months ago does exist.
It takes an LLM 2-20 minutes to give me the next stage of output not 1-2 days (week?). As a result, I have higher context the entire time so my side of the iteration is maybe 10x faster too.
That's a significant difference. There are a lot of tasks that can be done by a n00b with some advice, especially when you can say "copy the pattern when I did this same basic thing here and here".
And there are a lot of things a n00b, or an LLM, can't do.
The study you reference was real, and I am not surprised — because accurately gauging the productivity win, or loss, obtained by using LLMs in real production coding workflows is also not junior stuff.
I don't have to worry about managing the noob's emotions or their availability, I can tell the LLM to try 3 different approaches and it only takes a few minutes... I can get mad at it and say "fuck it I'll do this part myself", the LLM doesn't have to be reminded of our workflow or formatting (I just tell the LLM once)
I can tell it that I see a code smell and it will usually have an idea of what I'm talking about and attempt to correct, little explanation needed
The LLM can also: do tons of research in a short amount of time, traverse the codebase and answer questions for me, etc
it's a noob savant
It's no replacement for a competent person, but it's a very useful assistant
I just spent some time cleaning up AI code where it lied about the architecture, so it wrote the wrong thing. The architecture is wonky, sure, but finding the wonks earlier would have been better.
But that was just said by crappy influencers whose opinion doesn't matter, as they are impressed by examples that are the result of overfitting.
Like 19% weaker, according to the only study to date that measured their productivity.
306 more comments available on Hacker News