Your Job Is to Deliver Code You Have Proven to Work

https://v5.chriskrycho.com/elsewhere/seeing-like-a-programme...

15 days ago

1 reply

> Your job is to deliver code you have proven to work.

Your job is to solve customer problems. Their problems may only be solvable with code that is proven to work, but it is equally likely (I dare say even more likely) that their problem isn't best solved with code at all, or even solved with code that doesn't work properly but works well enough.

wrsh07

15 days ago

1 reply

I would argue that the word "proof" in the title might be misleading you.

From the post and the example he links, the point is that if you don't at least look at the running code, you don't know that it works.

In my opinion the point is actually well illustrated by Chris's talk here:

(summary of the relevant section if you're not going to click)

>>>

In the talk "Seeing Like a Programmer," Chris Krycho quotes the conductor and composer Eímear Noone, who said:

> "The score is potential energy. It's the potential for music to happen, but it's not the music."

He uses this quote to illustrate the distinction between "software as artifact" (the code/score) and "software as system" (the running application/music). His point is that the code itself is just a static artifact—"potential energy"—and the actual "software" only really exists when that code is executed and running in the real world.

15 days ago

1 reply

> if you don't at least look at the running code, you don't know that it works.

Your tests run the code. You know it works. I know the article is trying to say that testing is not comprehensive enough, but my experience disagrees. But I also recognize that testing is not well understood — and if you don't understand it well, you can get caught not testing the right things or not testing what you think you are. I would argue that you would be better off using that time to learn how to write great tests instead of using it to manually test your code, but to each their own.

wrsh07

11 days ago

To be clear, I don't think we disagree very much

I've seen people only run tests and break things (because the thing they broke wasn't covered by tests), I've seen people try to fix things and not verify that their fix works, etc

Good tests are sufficient in many cases to be confident that your code still works. But in general tests don't cover a lot of fundamental behavior, and if you don't exercise that fundamental behavior in one way or another, then you don't know that your code works

allcentury

15 days ago

1 reply

Manual testing as the first step… not very productive imo.

Outside in testing is great but I typically do automated outside in testing and only manual at the end. The loop process of testing needs to be repeatable and fast, manual is too slow

simonwAuthor

15 days ago

2 replies

Yeah that's fair, the manual testing doesn't have to sequentially go first - but it does have to get done.

I've lost count of the number of times I've skipped it because the automated test passed and then found there was some dumb but obvious bug that I missed, instantly exposed when I actually exercised the feature myself.

robryk

15 days ago

3 replies

Would automated tests that produce a transcript of what they've done allow perusing that transcript to substitute for manual testing?

pjc50

15 days ago

That sounds harder?

There's a lot of pedantry here trying to argue that there exists some feature which doesn't need to be "manually" tested, and I think the definition of "manual" can be pushed around a lot. Is running a program that prints "OK" a manual test or not? Is running the program and seeing that it now outputs "grue" rather than "bleen" manual? Does verifying the arithmetic against an Excel spreadsheet count?

There are programs that almost can't be manual, and programs that almost have to be manual. I remember when working on PIN pad integration we looked into getting a robot to push the buttons on the pad - for security reasons there's no way of injecting input automatically.

What really matters is getting as close to a realistic end user scenario as possible.

simonwAuthor

15 days ago

No. I've fallen for that trap in the past. Something inevitably catches you out in the end.

bluGill

15 days ago

The value of manual tests is when you "see something" that you didn't even think of.

15 days ago

1 reply

Maybe a bit pedantic, but does testing really need to be done, or is the intent here more towards being a usability review? I can't think of any time obvious bugs showed up not caught by the contract encoded in tests (there is no reason to write code that doesn't have a contractual purpose), but finding out that what you've created has an awful UX is something I have encountered and is much harder to encode in tests[1].

[1] As far as I can tell. If there are good solutions for this too, I'd love to learn.

RaftPeople

15 days ago

1 reply

> I can't think of any time obvious unintended behaviour showed up not caught by the contract encoded in tests

Unit testing, whether manual or automated, typically catches about 30% of bugs.

End to end testing and visual inspection of code are both closer to 70% of bugs.

15 days ago

Automated testing (there aren't different kinds) doesn't catch bugs, it defines a contract. Code is then written to conform to that contract. Bugs cannot be introduced as they would violate the contract.

Of course that is not a panacea. What can happen in the real world is not truly understanding what the software needs to do. This can result in tests not being aligned with what the software actually needs. It is quite reasonable to call the outcome of that "bugs", but tests cannot catch that either. The tests are where the problem lies!

Most aspects of software a pretty clear cut, though. You can reasonably define a full contract. UX is a particular area where I've struggled to find a way to determine what the software needs before seeing it, though. There is seemingly no objective measure that can be applied in determining if a UX is going to work in order to encode that in a contract. Of course, as before, I'm quite interested to learn how others are solving that problem.

andy99

15 days ago

1 reply

I think the problem is in what “proven” means. People that don’t know any better will just do that all with LLMs and still deliver the giant untested PRs but with some LLM written tests attached.

I vibe code a lot of stuff for myself, mostly for viewing data, when I don’t really need to care how it works. I’m coming around to the idea that outside of some specific circumstances where everyone has agreed they don’t need to care about or understand the code, team vibe coding is a bad practice.

If I’m paying an engineer, it’s for their work, unless explicitly agreed otherwise.

I think vibe coding is soon going to be seen the same way as “research” where you engage an offshore team (common e.g. in consulting) to give you a rundown on some topic and get back the first five google search results. Everyone knows how to do that, if it’s what they wanted they wouldn’t be hiring someone to do it.

simonwAuthor

15 days ago

3 replies

That's why I emphasized the manual testing component as well. Attaching a screenshot or video of a feature working to your PR is a great way to prove that you've actually seen it work correctly - at least once, which is still a huge improvement over it not actually working at all.

JambalayaJimbo

15 days ago

This might be useful when working on a low trust team but I can’t imagine doing that in my job, unless specifically working a poc or presentation.

Nizoss

15 days ago

Yes! This is something that I also value. Having demo gifs of before and after helps a lot. I have encountered situations where what I thought was a minor finishing clean up had an effect that I didn't anticipate. By including demos in the PR it becomes a kind of guardrail against those situations for me. I also think it is neat and generally helpful for everyone.

doganugurlu

15 days ago

If someone opened a PR, and it obviously doesn’t work but they claim they tested it, maybe that’s ok for the first time.

The second time it happens they gotta go.

I would find the expectation that I need to attach a screenshot insulting. And the understanding that my peers test their code to produce a screenshot would be pretty demoralizing.

15 days ago

25 replies

there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

Is anyone else seeing this in their orgs? I'm not...

[0] https://en.wikipedia.org/wiki/Ward_Cunningham#Law

15 days ago

5 replies

It's not a new phenomenon. Time was, people would copy-paste from blog posts with the same effect.

evilduck

15 days ago

1 reply

[delayed]

1-more

15 days ago

1 reply

I think it's impossible to actually write an email regex because addresses can have arbitrarily deeply nested escaping. I may have that wrong. I'd hope that regex would be .+@.+ and that's it (watch me get Cunninghammed because there is some valid address wherein those plusses should be stars).

notpachet

15 days ago

TIL Cunningham's Law[0]. I knew about that phenomenon but not the proper name. Thanks!

lm28469

15 days ago

1 reply

Always the same old tiring "this has always been possible before in some remotely similar fashion hence we should not criticise anything ever again" argument.

You could intuitively think it's just a difference of degree, but it's more akin to a difference of kind. Same for a nuke vs a spear, both are weapons, no one argues they're similar enough that we can treat them the same way

array_key_first

15 days ago

Yes, I'm so over this argument. It can literally be made for anything, and it is!

At the end of the day we're not performing war by poking other people with long sticks and we're not getting the word out by sending out a carrier pigeon.

Methods and medium matters.

troyvit

15 days ago

1 reply

I used to do that in simpler days. I'd add a link to where I copied it from so we could reference it if there were problems. This was for relatively small projects with just a few people.

15 days ago

2 replies

> I'd add a link to where I copied it from

LLMs can't do this.

Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from.

lcnPylGDnU4H9OF

15 days ago

1 reply

> Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from.

This is not a truism. "My" code might come from an LLM and that's fine if I can be reasonably confident it works. I might try to gain that confidence by testing the code and reading it to understand what it's doing. It is also true of blog post code, regardless of how I refer to the code; if I link to the blog post, it's because it does a better job of explaining than I ever could in code comments. Whether LLMs make one more productive is hard to measure but it seems to be missing the point to write this.

The point is, including the code is a choice and one should be mindful of it, no matter the code's origin. At that point, this comes off like you just have something to prove; there doesn't seem to be a reason not to use the LLM code if you know it works and you know why it works.

15 days ago

1 reply

Believing you know how it works and why it works is not the same as that actually being the case. If the code has no author (in that it's been plagiarised by a statistical process that introduces errors), there's nowhere to go if you realise "oops, I didn't understand that as well as I had thought!".

lcnPylGDnU4H9OF

15 days ago

1 reply

> If the code has no author ... there's nowhere to go if you realise "oops, I didn't understand that as well as I had thought!"

That's also true if I author the code myself; I can't go to anyone for help with it, so if it doesn't work then I have to figure out why.

> Believing you know how it works and why it works is not the same as that actually being the case.

My series of accidental successes producing working code is honestly starting to seem like real skill and experience at this point. Not sure what else you'd call it.

14 days ago

2 replies

> so if it doesn't work then I have to figure out why.

But it's built on top of things that are understood. If it doesn't work, then either:

• You didn't understand the problem fully, so the approach you were using is wrong.

• You didn't understand the language (library, etc) correctly, so the computer didn't grasp your meaning.

• The code you wrote isn't the code you intended to write.

This is a much more tractable situation to be in than "nobody knows what the code means, or has a mental model for how it's supposed to operate", which is the norm for a sufficiently-large LLM-produced codebase.

> My series of accidental successes

That somewhat misses the point. To write working code, you must have some understanding of the relationship between your intention and your output. LLMs have a poor-to-nonexistent understanding of this relationship, which they cover up with the ability to regurgitate (permutations of) a large corpus of examples – but this does not grant them the ability to operate outside the domain of those examples.

LLM-generated codebases very much do not lie within that domain: they lack the clues and signs of underlying understanding that human readers and (to an extent) LLMs rely on. Worse, the LLMs do replicate those signals, but they don't encode anything coherent in the signal. Unless you are very used to critically analysing LLM output, this can be highly misleading. (It reminds me of how chess grandmasters blunder, and struggle to even remember, unreachable board positions.)

Believing you know how LLM-generated code works, and why it works, is not the same as that actually being the case – in a very real sense that is different to that of code with human authors.

lcnPylGDnU4H9OF

14 days ago

1 reply

> "nobody knows what the code means, or has a mental model for how it's supposed to operate"

> Believing you know how LLM-generated code works, and why it works, is not the same as that actually being the case

This is a strawman argument which I'm not really interested to engage. You can assume competence. (In a scenario where one doesn't make these mistakes, what's left in your argument? It is a sufficiently strong claim to say these cannot be avoided such that it is reasonable to dismiss the claim unless supporting evidence is provided. In other words, the solution is as simple as not making these mistakes.) As I wrote up-thread, including the code is a choice and one should be mindful of it.

14 days ago

I am assuming competence. Competent people make these mistakes.

If "assume competence" means "assume that people do not make the mistakes they are observed to make", then why write tests? Wherefore bounds checking? Pilots are competent, so pre-flight checklists are a waste of time. Your doctor's competent: why seek a second opinion?

It's possible that you're just that good – that you can implement a solution "as simple as not making these mistakes" –, in which case, I'd appreciate if you could write up your method and share it with us mere mortals. But could it also be possible that you are making these mistakes, and simply haven't noticed yet? How would you know if your understanding of the program didn't match the actual program, if you've only tested the region in which the behaviours of both coincide?

fragmede

14 days ago

1 reply

Just like there are some easy "tells" with LLM generated English, vibecode has a certain smell to it. Parallel variables that do the same thing is probably the most common one I've seen in the hundreds of thousands of lines of vibecode I've generated and then reviewed (and fixed) by now. That's the philosophical Chinese room thought experiment though. It's a computer. Some sand that we melted into a special shape. Can it "understand"? Leave that for philosophers to decide. There's code, that was generated via LLM and not yacc, fine. Code is code though. If you sit down and read all of the code to understand what each variable, function, and class does, it doesn't matter where the code came from, that is what we call understanding what the code does. Sure, most people are too lazy to actually do that, and again, vibecode has a certain smell to it, but to claim that some because some artificial intelligence generated the code makes it incomprehensible to humans seems unsupported. It's fair to point out that there may not be humans that have bothered to, but that's a different claim. If we simplify the question, if ChatGPT generates the code to generate the Fibonacci sequence, can we, as humans, understand that code? Can we understand it if a human writes that same seven lines of code? As we scale up to more complex code though, at what point does it become incomprehensible to human grade intelligence? If it's all vibecode that isn't being reviewed and is just being thrown into a repo, then sure, no human does understand it. But it's just code. With enough bashing your head against it, even if there are three singleton factory classes doing almost the exact same thing in parallel and they only share state on Wednesdays over an RPC mechanism that shouldn't even work in the first place, but somehow it does, code is still code. There's not arcane hidden whitespace that whispers to the compiler to behave differently because AI generated it. It may be weird and different, but have you tried Erlang? You huff enough of the right kind of glue and you can get anything to make sense. If we go back to the Chinese room thought experiment though. If I, as a human, am able to work on tickets to cause intentional changes to the behavior of the vibecoded program/system that results in desired behavior/changes, at what point does it become actual understand vs merely thinking I understand the code.

Say you start at BigCo and are given access to their million line repo(s) with no docs and are given a ticket to work on. Ugh. You just barely started. But after you've been there for five years, it's obvious to you what the Pequad service does, and you might even know who gave it that name. If the claim is LLMs generate code that's simply incomprehensible by humans, the two counterexamples I have for you are TheDailyWtf.com, and Haskell.

14 days ago

> but to claim that some because some artificial intelligence generated the code makes it incomprehensible to humans seems unsupported

That's not my claim. My claim is that AI-generated code is misleading to people familiar with human-written code. If you've grown up on AI-generated code, I wouldn't expect you to have this problem, much like how chess newbies don't find impossible board states much harder to process than possible ones.

newsoftheday

15 days ago

Agreed on the first part for sure since an LLM is the computer/software version of a blender.

So, I'm agreed on the second part too then.

bgwalter

15 days ago

I don't see the problem with fentanyl given that people have been using caffeine forever.

nunez

15 days ago

Yeah, but being able to produce nuclear-sized 10k+ LOC PRs to open-source projects in minutes with relatively-zero effort definitely is. At least you had to use your brain to know which blog posts/SO answers to copypasta from.

15 days ago

1 reply

i left my last job because this was endemic

DonHopkins

14 days ago

1 reply

More likely you were fired for being an asshole.

AnimalMuppet

14 days ago

Hey, personal attacks are against site rules. You've been here long enough to know that.

briliantbrandon

15 days ago

2 replies

I'm seeing a little bit of this in the Fortune 100 I work at. However, I will add that the primary culprits are engineers that were submitting low quality PRs before they had access to LLMs, they can just submit them faster now.

15 days ago

2 replies

What's the ratio of people who things the right way vs not? I mean, is it a matter of giving them feedback to remind them what a "quality PR" is? Does that help?

15 days ago

1 reply

LLMs have dramatically empowered sociopath software developers.

If you are sufficiently motivated to appear more "productive" than your coworkers, you can force them to review thousands of lines of incorrect AI slop code while you sit back and mess around with your chatbots.

andy99

15 days ago

1 reply

Only if leadership lets them. Right now (anecdotally) a lot of “leaders” don’t understand the difference between AI generated and human generated work, and just look at loc as productivity so all incentives are on AI coding, but that will change.

heliumtera

15 days ago

It will never change. Managers will consider every stupid metric players push to sell their solutions. Be it code coverage, extensive CI/CD pipelines with useless steps, "productivity gains" with gen tools. The gen tools euphoria is stupid and will cease to exist, but before this was bdd,tdd,DDD, test before, test after, test your mocks, transpile to a different language and then ignore the output, code maturity, best practices, oop, pants in head oriented programming... There is always something stupid on the horizon this is certainly not the last stupid craze

briliantbrandon

15 days ago

1 reply

It's roughly 1/10 that are causing issues. Not a huge deal but dealing with them inevitably takes up a couple hours a week. We have a very weird team structure right now where my primary team is too large, and we also have a codebase that is shared with some other teams and our primary offenders are on one of those separate teams.

I think this is largely an issue that can be solved culturally within a team, we just unfortunately only have so much input on how other teams work. It doesn't help either when the low quality PRs are coming from Seniors on the other team and their manager doesn't seem to care about the feedback... Corporate politics are fun.

15 days ago

Yeah, I mean to get back to the original statement in the blog, this seems like less of a tech issue and more of a culture issue. The LLM enables the junior to do this once. It's the team culture that allows them to continue doing it.

lm28469

15 days ago

3 replies

LLMs are tools that make mediocre devs 100x more "productive" and good devs 2x more productive

15 days ago

6 replies

From my vantage I would argue LLMs make good devs around 0.65x more productive

quentindanjou

15 days ago

1 reply

They must not be that good if they use a tool that slows them.

It is really difficult to evaluate but most of the good dev I have seen uses LLMs more as of a code completion improvement than anything else, so around 10-20% more productive at best, but definitely not slowing them down.

15 days ago

you're deluding yourself

bluGill

15 days ago

1 reply

Good devs are still learning how to use LLMs, and so are willing to accept the 0.65x once in a while. Any complex tool will have a learning curve. Most tools improve over time. As such good devs either have found how to use LLMs to make them more productive (probably not 10x, but even 1.1x is something), or they try them again every few months to see if things are better.

15 days ago

1 reply

you are bending over backwards to figure out how to put "1.1x" in your comment

the idea that LLMs make developers more productive is delusional.

simonwAuthor

15 days ago

1 reply

Hi, delusional developer reporting for duty here.

Avicebron

15 days ago

1 reply

How are you measuring productivity these days Simon? Do you have a boss that has certain expectations? If you don't hit those are you going to lose your house?

simonwAuthor

15 days ago

1 reply

I work for myself, so mainly through guilt and self-doubt.

wiml

15 days ago

One of the things LLMs are demonstrably good at is eliminating self-doubt. That's why they're so disastrous.

square_usual

15 days ago

1 reply

Yep, that's why very accomplished, widely regarded developers like Mitchell Hashimoto and Antirez use them. They need to make programming more challenging to keep it fun.

15 days ago

1 reply

developers or cult leaders

swah

15 days ago

Mitchell shares the Amp threads on how he delivered some smaller features/fixes.

roblh

15 days ago

2 replies

I think they make good devs 2x more productive for the first month, which then slowly declines as that good dev spends less time actually writing and understanding and debugging code until it falls well below the 1x mark. It’s basically a high interest loan people take against their own skills. For some people that loan might be worth it. Maybe they’re trying to change their role in an organization and need the boost to start taking up new responsibilities they want to own. I think it’s temporary though. The slow shift into “skim mode”, where the authors just don’t quite put that same amount of effort into understanding what’s being churned out. I dunno, that’s just what I’ve seen.

AstroBen

15 days ago

[delayed]

candiddevmike

15 days ago

Because there's a mental overhead when you're not writing the code that is arguably worse than when you are writing the code. No one is talking about this enough IMO but that's why everyone is so exhausted when using LLMs and end up just pulling the slot machine until it works without actually reading it.

Reading code sucks, it always has. The flow state we all crave is when the code is in our working memory as an understood construct and we're just translating our mental model to a programming language. You don't get that with LLMs.

coffeebeqn

15 days ago

I just spent a day trying to get Claude to write reasonable unit tests and then after sleeping on it, reverted everything and did it myself. I’m not gonna be using it for a while because it 0.5x’d me once again

dsego

15 days ago

I think on average a dev can be x percent more productive, but there is a best case and worst case scenario. Sometimes it's a shortcut to crank out a solution quickly, other times the LLM can spin you in circles and you lose the whole day in a loop where the LLM is fixing its own mistakes, and it would've been easier to just spend some time working it out yourself.

chasd00

15 days ago

LLMs are great at spewing content and code is a form of "content". I think what we're seeing is software development turning into youtube. Content creators cranking out content, some is great, most is meh, a lot is really bad. I do find it all a bit funny and ironic. My wife was a journalist and bemoaned news blogs and social media for terrible terrible writing claiming it was journalism. She would tell me about how much work quality journalism is and all the mistakes these bloggers and social media make and how detrimental it was to society at large blah blah blah

Now the power to create tons and tons of code (ie content) is in the hands of everyone and here we are complaining about it just like my wife use to complain about journalism. I think the myth of the highly regarded Software Developer perched in front of the warming glow of a screen solving and automating critical problems is coming to an end. Deservedly really, there's nothing more special about typing words into an editor than, say, framing a house.

lunar_mycroft

15 days ago

[citation needed]. No study I've seen shows an even 50% productivity improvement for programming, let alone a 100% or 9900% improvement.

fnands

15 days ago

6 replies

A friend of mine is working for a small-ish startup (11 people) and he gets to work and sees the CTO push 10k loc changes straight to main at 3 am.

Probs fine when you are still in the exploration phase of a startup, scary once you get to some kind of stability

titzer

15 days ago

1 reply

That's...idiotic.

15 days ago

2 replies

LLMs are for idiots

204957065897

15 days ago

1 reply

hang yourself, retard.

15 days ago

he's mad because he knows i'm right.

LLMs made him an idiot and now he only writes a sentence by himself when he wants to say the R word

edit: the original comment said "Hang yourself Ret*rd"

titzer

15 days ago

I mean, I've vibe-coded a few useful single-file HTML tools, but checking in 10kloc at 3am into the production database...by the CTO...omg.

tossandthrow

15 days ago

1 reply

The cto is ultimately responsible for the outcome and will be there at 4am to fix stuff.

pjc50

15 days ago

Yes .. and no. Someone who does this will definitely make the staff clean up after them.

ryandrake

15 days ago

1 reply

I feel like this becomes kind of unacceptable as soon as you take on your first developer employee. 10K LOC changes from the CTO is fine when it's only the CTO working on the project.

Hell, for my hobby projects, I try to keep individual commits under 50-100 lines of code.

bonesss

15 days ago

Templates and templating languages are still a thing. Source generators are a thing. Languages that support macros exist. Metaprogramming is always an option. Systems that write systems…

If these AIs are so smart, why the giant LOCs?

Sure, it’s cheaper today than yesterday to write out boilerplate, but programming is about eliminating boilerplate and using more powerful abstractions. It’s easy to save time doing lots of repetitive nonsense, stopping the nonsense should be the point.

coffeebeqn

15 days ago

I worked with a “CTO” who did that before LLMs - one of the worst jobs I have had in the last 10 years. I spent at least 50% of my time putting out fires or refactoring his garbage code

peab

15 days ago

Lol I worked at a startup where the CTO did this. The problem was that it was pure spaghetti code. It was so bad it kept me up at night, thinking about how to fix things. I left within 30 days

jimbohn

15 days ago

I'd go mental if I was a SWE having to mop that up later

hexbin010

15 days ago

1 reply

[delayed]

endemic

15 days ago

Hah! I've been trying to push back on this sort of thought. The bot writes code for you, not you for the bot.

x3n0ph3n3

15 days ago

1 reply

It's been a struggle with a few teammates that we are trying to solve through explicit policy, feedback, and ultimately management action.

15 days ago

Yeah, a slice of this is technology related, but it's really a policy issue. It's probably easier to manage with a tighter team. Maybe I'm taking team size for granted.

davey48016

15 days ago

5 replies

A friend of mine has a junior engineer who does this and then responds to questions like "Why did you do X?" with "I didn't, Claude did, I don't know why".

https://github.com/WireGuard/wireguard-android/pull/82 https://github.com/WireGuard/wireguard-android/pull/80

15 days ago

1 reply

no hate but i would try to fire someone for saying that

KalMann

14 days ago

This but with hate.

tossandthrow

15 days ago

1 reply

That would be an immidiate reason of termination in my book.

fennecfoxy

15 days ago

Yes, if they can't debug + fix the reason the production system is down or not working correctly then they're not doing their job, imo.

Developers aren't hired to write code that's never run (at least in my opinion). We're also responsible for running the code/keeping it running.

Ekaros

15 days ago

I think words that would follow from me would get me send to HR...

And if it was repeated... Well I would probably get fired...

insin

15 days ago

See also "Why did you do X?" → Flurry of new commits → Conversation marked as resolved

And not just from juniors

gardenhedge

15 days ago

Some other comments suggest immediately firing.. but a junior engineer needs to be mentored. It should be explained to them clearly that they need to understand the changes they have made. They should also be pointed towards the coding standards and SDLC documentation. If they refuse to change their ways, then firing makes sense.

zx2c4

15 days ago

3 replies

Voila:

In that first one, the double pasted AI retort in the last comment is pretty wild. In both of these, look at the actual "files changed" tab for the wtf.

drio

14 days ago

Scary stuff.

I’d love to hear your thoughts on LLMs, Jason. How do you use them in your projects? Do they play a role in your workflow at all?

IshKebab

15 days ago

Yeah this guys comment here is spot on: https://github.com/WireGuard/wireguard-android/pull/80#issue...

I recently reviewed a PR that I suspect is AI generated. It added a function that doesn't appear to be called from anywhere.

It's shit because AI is absolutely not on the level of a good developer yet. So it changes the expectation. If a PR is not AI generated then there is a reasonable expectation that a vaguely competent human has actually thought about it. If it's AI generated then the expectation is that they didn't really think about it at all and are just hoping the AI got it right (which it very often doesn't). It's rude because you're essentially pawning off work that the author should have done to the reviewer.

Obviously not everyone dumps raw AI generated code straight into a PR, so I don't have any problem with using AI in general. But if I can tell that your code is AI generated (as you easily can in the cases you linked), then you've definitely done it wrong.

newsoftheday

15 days ago

That's a good example of what we're seeing as leads, thanks.

stackskipton

15 days ago

2 replies

Yep. Remember, people not posting on this website are just grinding away at jobs where their individual output does not matter, and entire motivation is work JUST hard enough not to get fired. They don't get stock grants, extremely favorable stock options or anything else, they get salary and MAYBE a small bonus based off business factors they have little control over.

My eyes were wide open when 2 jobs ago, they said they would be blocking all personal web browsing from work computers. Multiple Software Devs were unhappy because they were using their work laptop for booking flights, dealing with their kids schools stuff and other personal things. They did not have personal computer at all.

nutjob2

15 days ago

1 reply

They don't have phones?

stackskipton

15 days ago

They do but obviously laptop is easier than doing it on their phone. That’s what most of them ended up doing.

throw1235435

14 days ago

There are people posting on this website that are in that category; or in those companies. For example most people working outside America as a SWE who like the profession. The options to work for a place that gives stock options, and equity in general is small -> and generally in many countries is heavily penalised tax wise.

0x500x79

15 days ago

3 replies

I am currently going through this with someone in our organization.

Unfortunately, this person is vibe coding completely, and even the PR process is painful: * The coding agent reverts previously applied feedback * Coding agent not following standards throughout the code base * Coding agent re-inventing solutions that already exist * PR feedback is being responded to with agent output * 50k line PRs that required a 10-20 line change * Lack of testing (though there are some automated tests, but their validations are slim/lacking) * Bad error handling/flow handling

LandR

15 days ago

2 replies

Fire them?

JambalayaJimbo

15 days ago

This is not really an option for your standard IC.

0x500x79

15 days ago

I believe it is getting close to this. Things like this just take time though, and when this person talks to management/leadership they talk about how much they are producing and how everyone is blocking their work. So it becomes a challenging political maneuvering depending on the ability of certain leadership to see through the BS.

(By my organization, I meant my company - this person doesn't report to me or in my tree).

gardenhedge

15 days ago

Just reject the PR?

nunez

15 days ago

> 50k line PRs that required a 10-20 line change

This is hilarious. Not when you're the reviewer, of course, but as a bystander, this is expert-level enterprise-grade trolling.

neutronicus

15 days ago

1 reply

I'm not either

But LLMs don't really perform well enough on our codebase to allow you to generate things that even appear to work. And I'm the most junior member of my team at 37 years of age, hired in 2019.

I really tried to follow the mandate from on high to use Copilot, but the Agent mode can't even write code that compiles with the tools available to it.

Luckily I hooked it up to gptel so I can at least ask it quick questions about big functions I don't want to read in emacs.

notpachet

15 days ago

1 reply

> And I'm the most junior member of my team at 37 years of age

This sounds fucking awesome.

neutronicus

15 days ago

Would be nice to have someone enthusiastic junior to me.

I just had leadership on a big green-field project and IMO most of the team is ... looking forward to retirement a little too much to have really engaged (or taken leadership).

nbaugh1

15 days ago

1 reply

Not at all. Submitting untested PRs is a wildly outside of my experience. Having tests written to cover your code is a pre-requisite for having your PR reviewed on our team. "Does it work" aka passing manual testing, is literally the bare minimum before submitting a PR

ncruces

15 days ago

2 replies

If it's all vibe coded, how do you know — without review — that the new tests, for a new feature, test anything useful at all?

AnimalMuppet

14 days ago

When I was in a test-driven development environment, one of our rules was that you had to see the test fail. You had to prove that it would actually test what you were trying to test.

nbaugh1

7 days ago

We don't, that's why we do review it. We also do things like communicate with teammates, have expectations of not wasting other people's time, and try to uphold standards and meet SLAs. Maybe people should worry about why their teams are so dysfunctional rather than how the code was produced

eudamoniac

15 days ago

I started seeing it from a particularly poor developer sometime last year. I was the only reviewer for him so I saw all of his PRs. He refused to stop despite my polite and then not so polite admonishments, and was soon fired for it.

kaffekaka

15 days ago

I thought we were not, but we had just been lucky. A sequence of events lately have shown that the struggle is real. This was not a junior developer though, but an experienced one. Experience does not equal skill, evidently.

bdangubic

15 days ago

first time we’d see this there would be a warning, second one is pink slip

iamflimflam1

15 days ago

I'm seeing it on some open source projects I maintain. Recently had 10 or so PRs come in. All very valid features - but from looking at them, not actually tested.

zahlman

15 days ago

Quite a few FOSS maintainers have been speaking up about it.

ncruces

15 days ago

Yes, in the only successful OSS project that I “maintain.”

Fully vibe coded, which at least they admitted. And when I pointed out the thing is off by an order of magnitude, and as such doesn't implement said feature — at all — we get pressed on our AI policy, so as to not waste their time.

I don't have an AI policy, like I don't have an IDE policy, but things get ridiculous fast with vibe coding.

peab

15 days ago

Definitely seeing a bit of this, but it isn't constrained to junior devs. It's also pretty solvable by explaining to the person why it's not great, and just updating team norms.

JambalayaJimbo

15 days ago

I’ve been seeing obviously LLM generated PRs, but not huge ones.

ncruces

15 days ago

Yes, in the only successful project that I “maintain.”

Fully vibe coded, which at least they admitted. And when I pointed out the thing is off by an order of magnitude, and doesn't implement said feature — at all — we get pressed on our AI policy, so as to not waste their time.

I don't have an AI policy, like I don't have an IDE policy, but things get ridiculous fast with vibe coding.

bluGill

15 days ago

It isn't only junior engineers, but otherwise. It is a small number of people from all levels.

People do what they think they will be rewarded for. When you think your job is to write a lot of code then LLMs are great. When you need quality code you start to ask if LLMs are better or not?

Yodel0914

15 days ago

Not so much the huge PRs, but definitely the LLM generated code that the “developer” doesn’t understand.

nunez

15 days ago

I feel like a story about some open-source project getting (and rejecting) mammoth-sized PRs hits HN every week!

mrkeen

15 days ago

I don't see most PRs because they happen in other teams, but I am part of Slack channel where there are too many "oops" messages for my liking.

I.e. 1-2 times a month, there's an SQL script posted that will be run against prod to "hopefully fix data for all customers who were put into a bad state from a previous code release".

The person who posts this type of message most often is also the one running internal demos of the latest AI flows and trying to get everyone else onboard.

webdev1234568

15 days ago

4 replies

Whole article seems very much all llm generated

simonwAuthor

15 days ago

1 reply

Not a single word of it was. I wrote this one entirely in apple Notes, so there weren't even any VS Code completed sentences.

webdev1234568

15 days ago

1 reply

My biggest appologies, a very bad move on my part. I'll pay more attention before any sort of accusation like this

minimaxir

15 days ago

No one should be making any accusations of AI generation without strong evidence other than vibes. That hurts the cause of anti-AI use and punishes people who don't use it.

ramon156

15 days ago

Do elaborate, I don't see anything standing out

jairuhme

15 days ago

Did you read the article and come to that conclusion or just blindly count the number of em-dashes and assume that? Because I don't get the impression that it was LLM generated

ai_coder42

15 days ago

So what? as long as it conveys the point it was supposed to, should be fine IMO.

If we are accepting LLM generated code, we should accept LLM generated content as long as it is "proof read" :)

zkmon

15 days ago

1 reply

How about letting LLMs maintain a vast number of product versions all available at the same, which receive multiple versions of untested versions of the same patch, from LLMs, and then let the models elect a version of the software based on probabilistic or gradient methods? This elected version could change for different assessments. No human touches or looks at the code!

Just a wild thought, nothing serious.

throwuxiytayq

15 days ago

2 replies

Talk is cheap. Show me the proompt.

rkomorn

15 days ago

Had to search whether "proompt" a new meme misspelling.

New to me, but I'm on board.

zkmon

15 days ago

That's hard for me. Feed my comment to a model and ask for prompts.

Rperry2174

15 days ago

10 replies

Im not fully convinced by "a computer can never be held accountable"

We already delegate accountability to non-humans all the time: - CI systems block merges - monitoring systems page people - test suites gate different things

In practice accountability is enforced by systems, not humans.. humans are defintiely "blamed" after the fact, but the day-to-day control loop is automated.

As agents get better at running code, inspecting ui state, correlating logs, screenshots, etc they're starting to operationally be "accountable" and preventing bad changes from shipping and producing evidence when something goes wrong .

At some point humans role shifts from "i personally verify this works" to "i trust this verification system and am accountable for configuring it correctly".

Thats still responsibility, but kind of different from whats described here. Taken to a logical extreme, the arguement here would suggest that CI shouldn't replace manual release checklists

cess11

15 days ago

1 reply

Right, so how do you hold these things accountable? When your CI fails, what do you do? Type in a starkly worded message into a text file and shut off the power for three hours as a punishment? Invoice Intel?

falcor84

15 days ago

Well, we're not there yet, but I do envision a future, where some AIs work for as independent contractors with their own bank accounts that they want to maximize, and if such an AI fails in a bad way, its client would be able to fine it, fire it or even sue it, so that it, and the human controlling it would be financially punished.

simonwAuthor

15 days ago

1 reply

I need to expand on this idea a bunch, but I do think it's one of the key answers to the ongoing questions people have about LLMs replacing human workers.

Human collaboration works on trust.

Part of trust is accountability and consequences. If I get caught embezzling money from my employer I can lose my job, harm my professional reputation and even go to jail. There are stakes!

I computer system has no stakes, and cannot take accountability for its actions. This drastically limits what it makes sense to outsource to that system.

A lot of this comes down to my work on prompt injection. LLMs are fundamentally gullible: an email assistant might respond to an email asking for the latest sales figures by replying with the latest (confidential) sales figures.

If my human assistant does that I can reprimand or fire them. What am I meant to do with an LLM agent?