Ask HN: How to deal with long vibe-coded PRs? | Not Hacker News!

Q&A highlight

186 points

349 comments

Posted2 months agoActive2 months ago

Ask HN: How to deal with long vibe-coded PRs?

code reviewpull requestsAI-generated code

Ask HN: How to deal with long vibe-coded PRs?

No synthesized answer yet. Check the discussion below.

Discussion (349 comments)

Showing 160 comments of 349

2 months ago

1 reply

I'd just reject it for being ridiculous. It didn't pass the first step of the review process: the sniff test.

2 months ago

2 replies

Charitably, even though it is not what you or I would do, the pull request could be a best good faith effort of a real human being.

So to me, it's less about being ridiculous (and "ridiculous" is a fighting word) and more a simple "that's not how this team does things because we don't have the resources to work that way."

Mildly hurt feelings in the most likely worst case (no food for a viral overtop tweet). At best recruitment of someone with cultural fit.

2 months ago

1 reply

My objection to a PR like this has nothing to do with whether or not a human wrote it. It's that the PR is too large and complex. The reason I'd give for rejecting it would be that. I wouldn't say "it's ridiculous" as the reason. I would 100% be thinking that, though.

2 months ago

1 reply

That’s good.

My experience is “too large/complex” provides an opening for arguementivenes and/or drama.

“We don’t do it like this” does not so much. It is social, sufficient and not a matter of opinion (“too” is a matter of opinion).

2 months ago

What about "this is large and complex enough to be not the way we do things"?

2 months ago

It's very rare to see 9K PRs by humans. They usually take weeks of work if it's a serious change. No one wants to spend such an amount of time just to get rejected. You split it and send one by one while discussing changes.

But if it takes 1 hour with AI, you just send it right away...

2 months ago

7 replies

You review it like it wasn't AI generated. That is: ask author to split it in reviewable blocks. Or if you don't have an obligation to review it, you leave it there.

2 months ago

5 replies

This is it. The fact that the PR was vibe coded isn't the problem, and doesn't need to influence the way you handle it.

2 months ago

2 replies

It would be willfully ignorant to pretend that there's not an explosion of a novel and specific kind of stupidity, and to not handle it with due specificity.

2 months ago

3 replies

I contend that, by far and away the biggest difference between entirely human-generated slop and AI-assisted stupidity is the irrational reaction that some people have to AI-assisted stuff.

2 months ago

3 replies

Calling things "slop" is just begging the question. The real differentiating factor is that, in the past, "human-generated slop" at least took effort to produce. Perhaps, in the process of producing it, the human notices what's happening and reconsiders (or even better, improves it such that it's no longer "slop".) Claude has no such inhibitions. So, when you look at a big bunch of code that you haven't read yet, are you more or less confident when you find out an LLM wrote it?

2 months ago

2 replies

I have pretty much the same amount of confidence when I receive AI generated or non-AI generated code to review: my confidence is based on the person guiding the LLM, and their ability to that.

Much more so than before, I'll comfortably reject a PR that is hard to follow, for any reason, including size. IMHO, the biggest change that LLMs have brought to the table is that clean code and refactoring are no longer expensive, and should no longer be bargained for, neglected or given the lip service that they have received throughout most of my career. Test suites and documentation, too.

(Given the nature of working with LLMs, I also suspect that clean, idiomatic code is more important than ever, since LLMs have presumably been trained on that, but this is just a personal superstition, that is probably increasingly false, but also feels harmless)

The only time I think it is appropriate to land a large amount of code at once is if it is a single act of entirely brain dead refactoring, doing nothing new, such as renaming a single variable across an entire codebase, or moving/breaking/consolidating a single module or file. And there better be tests. Otherwise, get an LLM to break things up and make things easier for me to understand, for crying out loud: there are precious few reasons left not to make reviewing PRs as easy as possible.

So, I posit that the emotional reaction from certain audiences is still the largest, most exhausting difference.

2 months ago

1 reply

clean code and refactoring are no longer expensive

Are you contending that LLMs produce clean code?

2 months ago

2 replies

They do, for many people. Perhaps you need to change your approach.

2 months ago

2 replies

If you can produce a clean design, the LLM can write the code.

2 months ago

Unless you're doing something fabulously unique (at which point I'm jealous you get to work on such a thing), they're pretty good at cribbing the design of things if it's something that's been well documented online (canonically, a CRUD SaaS app, with minor UI modification to support your chosen niche).

2 months ago

I think maybe there's another step too - breaking the design up into small enough peices that the LLM can follow it, and you can understand the output.

2 months ago

1 reply

The code I've seen generated by others has been pretty terrible in aggregate, particularly over time as the lack of understanding and coherent thought starts to show. Quite happy without it thanks, haven't seen it adding value yet.

2 months ago

Or is the bad code you've seen generated by others pretty terrible, but the good code you've seen generated by others blends in as human-written?

My last major PR included a bunch of tests written completely by AI with some minor tweaking by hand, and my MR was praised with, "love this approach to testing."

2 months ago

1 reply

I don't really understand your point. It reads like you're saying "I like good code, it doesn't matter if it comes from a person or an LLM. If a person is good at using an LLM, it's fine." Sure, but the problem people have with LLMs is their _propensity_ to create slop in comparison to humans. Dismissing other people's observations as purely an emotional reaction just makes it seem like you haven't carefully thought about other people's experiences.

2 months ago

2 replies

My point is that, if I can do it right, others can too. If someone's LLM is outputing slop, they are obviously doing something different: I'm using the same LLMs.

All the LLM hate here isn't observation, it's sour grapes. Complaining about slop and poor code quality outputs is confessing that you haven't taken the time to understand what is reasonable to ask for, aren't educating your junior engineers how to interact with LLMs.

2 months ago

1 reply

"My point is that, if I can do it right, others can too."

Can it also be, that different people work in different areas and LLM's are not equally good in all areas?

2 months ago

That was my first assumption, quite a while ago now.

2 months ago

???

People complaining about receiving bad code is, by definition, observation.

2 months ago

1 reply

If you try and one shot it, sure, but if you question Claude, point out the errors of its ways, tell it to refactor and ultrathink, point out that two things have similar functionality and could be merged. It can write unhinged code with duplicate unused variable definitions that don't work, and it'll fix it up if you call it out, or you can just do it yourself. (cue questions of if, in that case, it would just be faster to do it yourself.)

2 months ago

2 replies

I have a Claude max subscription. When I think of bad Claude code, I'm not thinking about unused variable definitions. I'm thinking about the times you turn on ultrathink, allow it to access tools and negotiate it's solution, and it still churns out an over complicated yet partially correct solution that breaks. I totally trust Claude to fix linting errors.

2 months ago

1 reply

If you are getting garbage out, you are asking it for too much at once. Don't ask for solutions - ask for implementations.

2 months ago

1 reply

Distinction without a difference. I'm talking about its output being insufficient, whatever word you want to use for output.

2 months ago

And I'm arguing that if the output wasn't sufficient, neither was your input.

You could also be asking for too much in one go, though that's becoming less and less of a problem as LLMs improve.

2 months ago

1 reply

It's hard to really discuss in the abstract though. Why was the generared code overly complicated? (I mean, I believe you when you say it was, but it doesn't leave much room for discussion). Similarly, what's partially correct about it? How many additional prompts does it take before you a) use it as a starting point b) use it because it works c) don't use any of it, just throw it away d) post about why it was lousy to all of the Internet reachable from your local ASN.

2 months ago

1 reply

I've read your questions a few times and I'm a bit perplexed. What kind of answers are you expecting me to give you here? Surely if you use Claude Code or other tools you'd know that the answers are so varying and situation specific it's not really possible for me to give you solid answers.

2 months ago

However much you're comfortable sharing! Obviously ideal would be the full source for the "overly complicated" solution, but naturally that's a no go, so even just more words than a two word phrase "overly complicated". Was it complicated because it used 17 classes with no inheritance and 5 would have done it? Was it overly complicated because it didn't use functions and so has the same logic implemented in 5 different places?

I'm not asking you, generically, about what bad code do LLMs produce. It sounds like you used Claude Code in a specific situation and found the generated code lacking. I'm not questioning that it happened to you, I'm curious in what ways it was bad for your specific situation more specifically than "overly complicated". How was it overly complicated?

Even if you can't answer that, maybe you could help me reword the phrasing of my original comment so it's less perplexing?

2 months ago

> Perhaps, in the process of producing it, the human notices what's happening and reconsiders (or even better, improves it such that it's no longer "slop".)

Given the same ridiculously large and complex change, if it is handwritten only a seriously insensitive and arrogant crackpot could, knowing what's inside, submit it with any expectation that you accept it without a long and painful process instead of improving it to the best of their ability; on the other hand using LLM assistance even a mildly incompetent but valuable colleague or contributor, someone you care about, might underestimate the complexity and cost of what they didn't actually write and believe that there is nothing to improve.

2 months ago

2 replies

Many of the people who submit 9000-line AI-generated PRs today would, for the most part, not have submitted PRs at all before, or would not have made something that passes CI, or would not have built something that looks sufficiently plausible to make people spend time reviewing it.

2 months ago

1 reply

Most of those people should still keep their ignorance to themselves, without bothering actual programmers, like they did before LLM hype convinced them that "sufficiently plausible" is good enough.

A similar trend: the popularity of electric scooters among youngsters who would otherwise walk, use public transport, or use decent vehicles increases accidents in cities.

2 months ago

I think my comment may have been misparsed. I was observing that one of the problems with LLMs is making it possible for people to produce 9000-line PRs they don't understand where previously they might have been gated by making something even remotely plausible that compiles or passes CI.

2 months ago

9000-line PRs were never a good idea, have only been sufficiently plausible because we were forced to accept bad PR review practices. Coding was expensive and management beat us into LGTMing them into the codebase to keep the features churning.

Those days are gone. Coding is cheap. The same LLMs that enable people to submit 9000 line PRs of chaos can be used to quickly turn them into more sensible work. If they genuinely can't do a better job, rejecting the PR is still the right response. Just push back.

2 months ago

Are you quite sure that's the only difference you can think of? Let me give you a hint: is there any difference in the volume for the same cost at all?

2 months ago

1 reply

> It would be willfully ignorant to pretend that there's not an explosion of a novel and specific kind of stupidity

I 100% know what you mean, and largely agree, but you should check out the guidelines, specifically:

> Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.

And like, the problem _is_ *bad*. A fun, on-going issue at work is trying to coordinate with a QA team who believe chatgpt can write css selectors for HTML elements that are not yet written.

That same QA team deeply care about the spirit of their work, and are motivated by, the _very_ relatable sentiment of, you DONT FUCKING BREAK USER SPACE.

Yeah, in the unbridled, chaotic, raging plasma that is our zeitgeist at the moment, I'm lucky enough to have people dedicating a significant portion of their life to trying to do quality assurance in the idiomatic, industry best-standard way. Blame the FUD, not my team.

I would put to you that the observation that they do not (yet) grok what, for lack of a more specific universally understood term we are calling, "AI" (or LLMs if you are Fancy. But of course none of these labels are quite right). People need time to observe, and learn. And people are busy with /* gestures around vaguely at everything /*.

So yes, we should acknowledge that long-winded trash PRs from AI are a new emergent problem, and yes, if we study the specific problem more closely we will almost certainly find ever more optimal approaches.

Writing off the issue as "stupidity" is mean. In both senses.

2 months ago

I do not think that is being curmudgeonly. Instead, OP is absolutely right.

We collectively used the strategy of "we pretend we are naively stupid and dont talk directly about issues" in multiple areas ... and it failed every single time in all of them. It never solves the problem, it just invites to bad/lazy/whatever actors to play semantic manipulative games.

2 months ago

2 replies

It is 1995. You get an unsolicited email with a dubious business offer. Upon reflection, you decide it's not worth consideration and delete it. No need to wonder how it was sent to you; that doesn't need to influence the way you handle it.

No. We need spam filters for this stuff. If it isn't obvious to you yet, it will be soon. (Or else you're one of the spammers.)

2 months ago

1 reply

Didn’t even hit the barn, sorry. Codegen tools were obvious, review assistance tools are very lagging, but will come.

2 months ago

We already have some of them. And if you have a wide enough definition, we had them for a while.

2 months ago

The original ask was about one PR.

2 months ago

It is a huge problem. PR reviews are a big deal, not just for code reasons, but they are one of the best teaching tools for new hires. Good ones take time and mental energy.

Asking me to review a shitty PR that you don't understand is just disrespectful. Not only is it a huge waste of everyones time, you're forcing me to do your work for you (understanding and validating the AI solution) and you aren't learning anything because it isn't your work.

2 months ago

It's the problem. I often have to guide LLMs 2-4 times to properly write 150-300 LOC changes because I see how the code can be simplified or improved.

There is no way that 9000 lines of code are decent. It's also very hard to review them and find bad spots. Why spent your time in the first place? It probably took one hour for a person to generate it, but it will take ten to review and point out hundreds (probably) problems.

Without AI, no one would submit 9000 lines, because that's tens of hours of work which you usually split into logical parts.

2 months ago

It 100% is.

Why would I bother reviewing code you didn't write and most likely didn't read ?

2 months ago

7 replies

I’m curious how people would suggest dealing with large self-contained features that can’t be merged to main until they are production-ready, and therefore might become quite large prior to a PR.

While it would be nice to ship this kind of thing in smaller iterative units, that doesn’t always make sense from a product perspective. Sometimes version 0 has bunch of requirements that are non-negotiable and simply need a lot of code to implement. Do you just ask for periodic reviews of the branch along the way?

2 months ago

1 reply

> I’m curious how people would suggest dealing with large self-contained features that can’t be merged to main until they are production-ready

Are you hiding them from CIA or Al-Qaeda?

Feature toggles, or just plain Boolean flag are not rocket science.

2 months ago

Not rocket science, but I think there are also some tradeoffs with feature flags?

People could build on top of half-baked stuff because it’s in main. Or you might interact with main in ways that aren’t ready for production and aren’t trivial to toggle… or you just forget a flag check somewhere important.

I could also see schema/type decisions getting locked in too early while the feature is still in flux, and then people don’t want to change after it’s already reviewed since it seems like thrashing.

But yeah, definitely it’s one option. How do you consider those tradeoffs?

JonChesterfield

2 months ago

They come from people who have established that their work is worth the time to review and that they'll have put it together competently.

If it's a newcomer to the project, a large self contained review is more likely to contain malware than benefits. View with suspicion.

2 months ago

In our case, if such a thing happens (a few times per year across hundreds of people), a separate branch is created and a team working on that feature is completely autonomous for a while, while there is constant normal work in trunk by everyone else. Team tests their feature and adjacent code to an acceptable beta state but doesn't do any extensive or full coverage because it is impossible. Their code may be reviewed at that point if they request it, but it done as an extra activity, with meetings and stuff. Then they optionally give this build to the general QA to run full suite on it. This may be done in several cycles if fatal issues are found. Then they announce that they will do merge into trunk on days A to B and ask everyone to please hold off on committing into trunk in that time. Around that time they send a mail outlining changes and new functionality and potential or actual unfixed issues. QA teams runs as full cover of tests as possible. Merge may be reverted at this point if it is truly bad. Or if it good, team announces success and proceeds with normal work mode.

2 months ago

you line up 10-20 PRs and merge them in a temporary integration branch that gets tested/demoed. The PRs still have to be reviewed/accepted and merged into main separately. You can say 'the purpose of this pr is to do x for blah, see top level ticket'. often there will be more than one ticket based on how self-contained the PRs are.

2 months ago

The way we do it where I work (large company in the cloud/cybersecurity/cdn space):

- Chains of manageable, self-contained PRs each implementing a limited scope of functionality. “Manageable” in this context means at most a handful of commits, and probably no more than a few hundred lines of code (probably less than a hundred tbh).

- The main branch holds the latest version of the code, but that doesn’t mean it’s deployed to production as-is. Releases are regularly cut from stable points of this branch.

- The full “product” or feature is disabled by a false-by-default flag until it’s ready for production.

- Enablement in production is performed in small batches, rolling back to disabled if anything breaks.

2 months ago

The partial implementation could be turned off with a feature flag until it's complete.

2 months ago

I will schedule review time with coworkers I trust to go over it with them.

It is about ownership to me. I own my PRs. If I throw garbage out and expect you to fix it I am making you own my PRs. No one wants to be forced to own other peoples work.

2 months ago

2 replies

If you try to inspect and question such code, you will usually quickly run into that realisation that the "author" has basically no idea what the code even does.

"review it like it wasn't AI generated" only applies if you can't tell, which wouldn't be relevant to the original question that assumes it was instantly recognisable as AI slop.

If you use AI and I can't tell you did, then you're using it effectively.

2 months ago

If it's objectively bad code, it should be easy enough to point out specifics.

After pointing out 2-3 things, you can just say that the quality seems too low and to come back once it meets standards. Which can include PR size for good measure.

If the author can't explain what the code does, make an explicit standard that PR authors must be able to explain their code.

2 months ago

You are optimistic like the author even cared about the code. Most of the time you get another LLM response on why the code “works”

2 months ago

3 replies

If you ask them to break it into blocks, are they not going to submit 10 more AI-generated PRs (each having its own paragraphs of description and comment spam), which you then have to wade through. Why sink even more time into it?

2 months ago

1 reply

Being AI-generated is not the problem. Being AI-generated and not understandable is the problem. If they find a way to make the AI-generated code understandable, mission accomplished.

2 months ago

How much of their time should open source maintainers sink into this didactic exercise? Maybe someone should vibe-code a bot to manage the process automatically.

2 months ago

1 reply

There's probably also a decent chance that the author can't actually do it.

Let's say it's the 9000 lines of code. I'm also not reviewing 900 lines, so it would need to be more than 10 PRs. The code needs to be broken down into useful components, that requires the author to think about design. In this case you'd probably have the DSL parser as a few PRs. If you do it like that it's easier for the reviewer to ask "Why are you doing a DSL?" I feel like in this case the author would struggle to justify the choice and be forced to reconsider their design.

It's not just chopping the existing 9000 lines into X number of bits. It's submitting PRs that makes sense as standalone patches. Submitting 9000 lines in one go tells me that you're a very junior developer and that you need guidance in terms of design and processes.

For open source I think it's fine to simply close the PR without any review and say: Break this down, if you want me to look at it. Then if a smaller PR comes in, it's easier to assess if you even want the code. But if you're the kind of person that don't think twice about submitting 9000 lines of code, I don't think you're capable of breaking down you patch into sensible sub-components.

2 months ago

Some of the current AI coding tools can follow instruction like “break this PR up into smaller chunks”, so even a completely clueless user may be able to follow those instructions. But that doesn’t mean it’s worth a maintainer’s time to read the output of that.

2 months ago

I think breaking a big PR up like this is usually fair

Sometimes I get really into a problem and just build. It results in very large PRs.

Marking the PR as a draft epic then breaking it down into a sequence smaller PRs makes it much easier to review. But you can solicit big picture critique there.

I’m also a huge fan of documentation, so each PR needs to be clear, describe the bigger picture, and link back to your epic.

2 months ago

1 reply

My record is 45 comments on a single review. Merge conditions were configured so that every comment must be resolved.

If PR author can satisfy it - I'm fine with it.

2 months ago

1 reply

They will let AI somewhat satisfying it and ask you for further review

2 months ago

1 reply

Reminds me of curl problems with vulnerability report: https://news.ycombinator.com/item?id=43907376

At that point it is just malicious.

2 months ago

Some people genuinely believe agentic coding works great and they mastered it. Someone who PR a simple feature with its own DSL probably is on that team and won't see the issue with their way. They may think you are too old and resist AI. They probably would tell you if that's too much for your old fashioned coding skills, then just use an agent for the PR.

If you think that way, who cares about the code and additional DSL? If there is an issue or evolution required, we'll let AI work on it. If it works, just let it merge. Much cheaper than human reviewing everything.

I hate it, maybe I'm too old.

2 months ago

Eh, ask the author to split it in reviewable blocks if you think there's a chance you actually want a version of the code. More likely if it's introducing tons of complexity to a conceptually simple service you just outright reject it on that basis.

Possibly you reject it with "this seems more suitable for a fork than a contribution to the existing project". After all there's probably at least some reason they want all that complexity and you don't.

2 months ago

> Or if you don't have an obligation to review it, you leave it there.

Don’t just leave it there, that reflects badly on you and your project and pushes away good contributors. If the PR is inadequate, close it.

2 months ago

1 reply

Ask the submitter to review and leave their comments first or do a peer code review with them and force them to read the code. It's probably the first time they'll have read the code as well...

2 months ago

I really like this, the fact that vibe coded PRs are often bad is that people don't review it themselves first, they just look at the form, and if it looks vaguely similar to what they had in their mind, they'll just hit save and not ask the LLM for corrections

2 months ago

4 replies

Amazon eng did some research and found the number of comments in a code review is proportional to the number of lines changed. Huge CRs get little comments. Small CRs get a lot of comments. At Amazon, it's common to have a 150 to 300 line limit to changes. It depends on the team.

In your case, I'd just reject it and ensure repo merges require your approval.

2 months ago

6 replies

That’s a great way to discourage anyone ever doing any large scale refactoring, or any other heavy lifting.

2 months ago

1 reply

That's a good thing, large scale refactorings should be very, very rare. Even automated code style changes can be controversial because of the churn they create. For large and/or important software, churn should be left to a minimum, even at the cost of readability or code cleanliness. I've seen enough open source projects that simply state they won't accept refactoring / reformatting PRs.

2 months ago

1 reply

That means your code will stay old.

A new language feature is released, you cannot apply it to old code, since that would make a big PR. You need to do super slowly over time and most old code will never see it.

A better static type checker, that finds some bugs for you, you cannot fix them as your PR would be too big, you instead would need to make a baseline and split it up endlessly.

In theory yes, maybe a bit safer to do it this way, but discouraging developers to make changes is bad IMO. Obviously depends on your usecase, if you develop software that is critical to people's literal life, then you'll move more carefully.

But I wager 99% of the software the world produces is some commerce software, where the only thing lost is money.

2 months ago

> A new language feature is released, you cannot apply it to old code, since that would make a big PR.

Good. Don't change code for the sake of shiny new things syndrome.

> A better static type checker, that finds some bugs for you, you cannot fix them as your PR would be too big,

Good. Report each bug separately, with a suggested fix, categorised by region of the code. Just because you ran the program, that doesn't mean you understand the code well enough to actually fix stuff: those bugs may be symptomatic of a deeper issue with the module they're part of. The last thing you need is to turn accidentally-correct code into subtly-wrong code.

If you do understand the code well enough, what's the harm in submitting each bugfix as a separate (independent) commit? It makes it easier for the reviewers to go "yup, yup, yup", rather than having to think "does this part affect that part?".

2 months ago

Large-scale refactoring is not something you want from an external contributed, especially not if unsolicited.

Typically such refactoring is done by the core development team / maintainers, who are very familiar with the codebase. Also because DOING such a change is much easier than REVIEWING it if done by someone else.

2 months ago

That's good. Because large refactorings are usually harmful. They are also usually unplanned, not scoped and based on very unquantifiable observations like "I don't like the code is structured" - let's do ity way.

2 months ago

The review bots can be bypassed.

2 months ago

Just split up your work across multiple PRs.

2 months ago

You want to do large scale refactoring without the main team agreeing? Seems like a disaster.

2 months ago

"Inversely proportional" for what it's worth

2 months ago

Also, some teams have CR metrics that can be referenced for performance evaluations.

2 months ago

Could you please provide a reference? I couldn't find it.

2 months ago

1 reply

How long was this person working on it? Six months? Anything this big should’ve had some sort of design review. The worst is some junior going off and coding some garbage no one sees for a month.

2 months ago

You can churn this stuff out in about an hour these days though, seriously. Thats part of the problem, the asymmetry of time to create vs time to review.

If I can write 8 9k line PRs everyday and open them against open source projects, even closing them let alone engaging with them in good faith is an incredible time drain vs the time investment to create them.

2 months ago

1 reply

We are seeing a lot more drive by PRs in well known open source projects lately. Here is how I responded to a 1k line PR most recently before closing and locking. For context, it was (IMO) a well intentioned PR. It purported to implement a grab bag of perf improvements, caching of various code paths, and a clustering feature

Edit: left out that the user got flamed by non contributors for their apparently AI generated PR and description (rude), in defense of which they did say they were using several AI tools to drive the work. :

We have a performance working group which is the venue for discussing perf based work. Some of your ideas have come up in that venue, please go make issues there to discuss your ideas

my 2 cents on AI output: these tools are very useful, please wield them in such a way that it respects the time of the human who will be reading your output. This is the longest PR description I have ever read and it does not sound like a human wrote it, nor does it sound like a PR description. The PR also does multiple unrelated things in a single 1k line changeset, which is a nonstarter without prior discussion.

I don't doubt your intention is pure, ty for wanting to contribute.

There are norms in open source which are hard to learn from the outside, idk how to fix that, but your efforts here deviate far enough from them in what I assume is naivety that it looks like spam.

2 months ago

1 reply

Daniel Stenberg of curl gave a talk about some of what theyve been experiencing, mostly on the security beg bounty side. A bit hyperbolic, and his opinion is clear from the title, but I think a lot of maintainers feel similarly.

“AI Slop attacks on the curl project” https://youtu.be/6n2eDcRjSsk

2 months ago

1 reply

I think it's only fair to give an example where he feels AI is used correctly: https://mastodon.social/@bagder/115241241075258997

2 months ago

Wow very cool, theyve now closed 150 bugs identified via ai assistance/static analysis!

For ref, here is the post from Joshua Rogers about their investigation into the tooling landscape which yielded those findings

https://joshua.hu/llm-engineer-review-sast-security-ai-tools...

2 months ago

1 reply

Forget about code for a second. This all depends a lot of what goal does the PR achieve? Does it align with the goals of the project?

2 months ago

3 replies

How can you tell if it aligns with the goals of the project without reviewing 9000 lines of code first?

2 months ago

1 reply

Are you kidding me? You should be able to explain from the user PoV what does the PR achieve, a new feature? a bugfix?

That data point is waaaaaay more important than any other when considering if you should think about reviewing it or not.

2 months ago

1 reply

Okay, it does align. What next?

2 months ago

Ok great. Now that it aligns, how valuable it is? Is it a deadly & terrific feature? Then maybe you should review it. It's just fixing a nitpick? Then ask the contributor to find a less verbose change.

2 months ago

PRs rarely exist in a vacuum. Usually there is a ticket/issue/context which required a code change.

2 months ago

Read the title and description of the PR first. If that fails the sniff test, the code would as well.

2 months ago

1 reply

It's funny just today I published an article with the solution to this problem.

If they don't bother writing the code, why should you bother reading it? Use an LLM to review it, and eventually approve it. Then of course, wait for the customer to complain, and feed the complaint back to the LLM. /s

Large LLM generated PRs are not a solution. They just shift the problem to the next person in the chain.

throwawayffffas

2 months ago

2 replies

How do you know they didn't bother to write it? For all we know the submitter has been quietly hammering away at this for months.

2 months ago

1 reply

The title says it is vibe-coded. By definition, it means they didn't write it.

throwawayffffas

2 months ago

2 replies

But how do they know it's vibe-coded? It may have a smell to it. But the author might not know it for a fact. The fact it's vibe-coded is actually irrelevant the size of the request is the main issue.

2 months ago

I'm not gonna make assumptions on behalf of OP, but if you have domain knowledge, you can quickly tell when a PR is vibe-coded. In a real world scenario, it would be pretty rare for someone to generate this much code in a single PR.

And if they did in fact spend 6 months painstakingly building it, it wouldn't hurt to break it down into multiple PRs. There is just so much room for error reviewing such a giant PR.

2 months ago

You can recognize it by the rocket emojis in the PR description ;)

2 months ago

Then it would have extensive vcs history. Unless they just amend into one humongous commit.

2 months ago

1 reply

Be tactful and kind, but straightforward about what you can't/don't want to spend time reviewing.

"Thanks for the effort, but my time and energy is limited and I can't practically review this much code, so I'm closing this PR. We are interested in performance improvements, so you are welcome to pick out your #1 best idea for performance improvement, discuss it with the maintainers via ..., and then (possibly) open a focused PR which implements that improvement only."

2 months ago

2 replies

Depends on context of course, but in my book "my time and energy is limited" is not a valid reason for a reject. Get back once you have time, review in chunks.

2 months ago

1 reply

ivanjermakov, I don't know if you are an open source maintainer or not (I am, for several projects). If you are, and you follow the policy that "I will never reject PRs because of having no time, I will always get to it eventually", then I salute you. That is a self-sacrificing, altruistic position to take. It's also a very difficult position to maintain for the long term. If you can do it: congratulations!

As for me, my position is: "My project is my house. You want to be a guest in my house, you follow my rules. I really like people and am usually happy to answer questions from people who are reasonably polite, to review and provide feedback on their PRs, and so on. But I won't be pressured to prioritize your GitHub issue or PR over my work, my family, my friends, my health, or my personal goals in life. If you try to force me, I'll block you and there will be no further interaction."

If you don't like that position, well, I understand your feelings.

2 months ago

I'm absolutely with you on that. I'm not saying that every contribution deserves equal attention and that rejecting contributions is a bad/impolite thing.

There has to be a better reason than "your PR is too big" as it's likely just a symptom, also very much context sensitive. If it is a 5kLOC PR that adds a compiler backend for a new architecture then it probably deserves attention because of its significance.

But if it's obviously low quality code than my response would be that it is low quality code. Long story short, it's you (submitter) problem, not me (reviewer, BDFL) problem.

2 months ago

> is not a valid reason for a reject

As a reviewer or as a submitter?

2 months ago

1 reply

I'd just close it without comment. Or maybe if I'm feeling really generous I'll make a FAQ.md that gives a list of reasons why we'll close PRs without review or comment and link that in the close comments. I don't owe anyone any time on my open source projects. That said, I haven't had this issue yet.

2 months ago

1 reply

That's fine for an open source project, but many many companies are mandating AI use, they're putting it in performance reviews, they're buying massive Cursor subscriptions. You'd be cast as an obstructionist to AI's god like velocity ™.

2 months ago

Well in my case I'd just fire them if they opened a 9,000 LOC PR that they didn't understand.

throwawayffffas

2 months ago

2 replies

> How would you go about reviewing a PR like this?

Depends on the context. Is this from:

1. A colleague in your workplace. You go "Hey ____, That's kind of a big PR, I am not sure I can review this in a reasonable time frame can you split it up to more manageable pieces? PS: Do we really need a DSL for this?"

2. A new contributor to your open source project. You go "Hey ____, Thanks for your interest in helping us develop X. Unfortunately we don't have the resources to go over such a large PR. If you are still interested in helping please consider taking a swing at one of our existing issues that can be found here."

3. A contributor you already know. You go "Hey I can't review this ___, its just too long. Can we break it up to smaller parts?"

Regardless of the situation be honest, and point out you just can't review that long a PR.

2 months ago

3 replies

Telling a new contributor no thank you is hard. Open source contributors are hard to come by, and so I’ve always dealt with PRs like this (albeit before AI days but from people who had never written a line of code before their PR) by leaving a message that it’s a huge PR so it’s going to take a while to review it and a request to make smaller PRs in the future. A couple of times I ended up leaving over a hundred review comments, but most times they were all fixed and the contributor stuck around with many better PRs later.

2 months ago

1 reply

Git is flexible enough that you can tell people to break up their PR. They don't have to redo all their work.

If you want to be really nice, you can even give them help in breaking up their PR.

2 months ago

1 reply

Yeah exactly, the OP describes a completely new service built start to finish all in one merge request, where normally you'd start with a proposal and work from there.

2 months ago

You can even create the proposal retroactively from the PR, if you already have the PR.

2 months ago

The vast majority of PRs are bad. They could even be described as “selfish” in the sense that the “contributor” is haphazardly making whatever change minimally fixes their exact use case without consideration for the project’s style, health, usability, or other users. This isn’t outright malicious or even deliberately inconsiderate, but it still has a negative effect.

Refusing such a PR (which, again, is most of them) is easy. But it is also time consuming if you don’t want to be rude. Everything you point out as inadequate is a chance for them to rebut or “fix” in a way which is again unsatisfactory, which only leads to more frustration and wasted time. The solution is to be specific about the project’s goals but vague about the code. Explain why you feel the change doesn‘t align with what you want for the project, but don’t critique specific lines.

There are, of course, exceptions. Even when I refuse a PR, if it’s clear it was from a novice with good intentions and making an effort to learn, I’ll still explain the issues at length so they can improve. If it’s someone who obviously used an LLM, didn’t understand anything about what they did and called it a day, I’ll still be polite in my rejection but I’ll also block them.

Ginger Bill (creator of Odin) talked about PRs on a podcast a while back and I found myself agreeing in full.

https://www.youtube.com/watch?v=0mbrLxAT_QI&t=3359s

throwawayffffas

2 months ago

> Telling a new contributor no thank you is hard.

In life in general having the wherewithal to say no is a superpower. While I appreciate the concern about alienating newcomers, you don't start contributing to an existing project by adding 9k lines of the features you care about. I have not run any open source projects that accept external contributions, but my understanding in general is that you need to demonstrate that you will stick around before being trusted with just adding large features. All code is technical debt, you can't just take on every drive by pull request in hopes they will come back to fix it when it brakes a year down the line.

2 months ago

If it’s the first one I’d be going a step further back to see how the work was defined. More often than not I’d expect the PR comes from a ticket that is too broad in scope and could have been broken down with a bit of architectural thinking.

The problem being that once someone has put together a PR, it’s often too late to go back to the serious thinking step and you end up having to massage the solution into something workable.

2 months ago

3 replies

Everyone talking about having them break it down into smaller chunk. Vibe coding there is a near guarantee the person doesn't know what the code does either.

That alone should be the reason to block it. But LLM generated code is not protected by law, and by extension you can damage your code base.

My company does not allow LLM generated code into anything that is their IP. Generic stuff outside of IP is fine, but every piece has to flagged that it is created by an LLM.

In short, these are just the next evolution of low quality PRs.

2 months ago

1 reply

> Everyone talking about having them break it down into smaller chunk. Vibe coding there is a near guarantee the person doesn't know what the code does either.

that's the point though, if they can't do it, then you close the ticket and tell them to fork off.

2 months ago

2 replies

I agree, but you are potentially opening yourself up to 20+ PRs which are all vibe coded.

2 months ago

Copy and paste is your friend here. If there's 20+ huge PRs, just paste "This PR is far too large to review, please break it down and submit smaller PRs and engage with us ahead of time to understand how to solve this problem."

Comment & Close PR, only engage in discussions on tickets or smaller, understandable PRs.

As other have said: if someone drive-by opens a huge PR, it's as likely to be malware as a beneficial implementation.

2 months ago

You can read one or two and decide if it's worth going through. Otherwise you can just reject them.

2 months ago

1 reply

> Vibe coding there is a near guarantee the person doesn't know what the code does either.

Accepting code into the project when only one person (the author) knows what it does is a very bad idea. That's why reviews exist. Accepting code that zero persons know what it does is sheer screaming insanity.

2 months ago

1 reply

Unless it's not important. I think vibe coding is fine for self-hosted weekend projects / hackathons / POCs and only if there's no intersection with legal stuff (like PII or payment processing).

But for any open source or enterprise project? Hell no.

2 months ago

If you don't ever need to know or maintain the code, sure, it's not your code, you don't own it, in fact the code is disposable. For something like POC, where I don't care how it's done, I just want to see if it can be done - I've done it myself. Then if real code is needed, you throw the disposable one out or rewrite it completely. That's fine. But if it's a long term project, somebody needs to own it.

2 months ago

> Vibe coding there is a near guarantee the person doesn't know what the code does either.

Having spent some time vibe coding over the weekend to try it out, I disagree. I understand every line of code the super-specific Android app I generated does, even if I don't have the Android dev experience to come up with the code from the top of my head. Laziness is as good a reason to vibe code as inexperience or incompetence.

I wouldn't throw LLM code at a project like this, though, especially not in a PR of this size.

2 months ago

1 reply

That 10+ years old joke never gets old:

10 lines of code = 10 issues.

500 lines of code = "looks fine."

Code reviews.

2 months ago

I recently reached another milestone.

+153675, -87954 : I don't care. Just taking time to read it will take longer than bondo fix the related bugs.

2 months ago

Just reflect upon it, see if you gave him less time to complete it. I would just have a meet with him and confront it.

2 months ago

Don't accept this PR. If it's bot generated you are not here to review it. They can find a bot to review bot generated requests.

ChrisMarshallNY

2 months ago

I write full app suites that have less than 9000 LoC. I tend toward fewer, large-ish source files, separated by functional domains.

I once had someone submit a patch (back in the SVN days), that was massive, and touched everything in my system. I applied it, and hundreds of bugs popped up.

I politely declined it, but the submitter got butthurt, anyway. He put a lot of work into it.

2 months ago

reject outright. ask to split it into reasonable chain of changesets.

2 months ago

TBH, depends on what is being reviewed. Is it a prototype that might not see light of day and is only for proof-of-concept? Did an RFC doc precede it and reviewers are already familiar with the project? Were the authors expecting this PR? Was there a conversation before the PR was sent out? Was there any effort to have a conversation after the PR was shared? Was this even meant to be merged into main?

I'll just assume good intent first of all. Second, 9000 LOC spanning 63 lines is not necessarily an AI generated code. It could be a code mod. It could be a prolific coder. It could be a lot of codegen'd code.

Finally, the fact that someone is sending you 9000 LOC code hints that they find this OK, and this is an opportunity to align on your values. If you find it hard to review, tell them that I find it hard to review, I can't follow the narrative, its too risky, etc. etc.

Code review is almost ALWAYS an opportunity to have a conversation.

2 months ago

You vibe review it. I’m actually only half kidding here.

2 months ago

Use AI to generate the review, obviously.

2 months ago

With a middle finger

2 months ago

If it's full of the typical vibe-coded nonsense that's easy to spot upon a quick-but-close inspection (unused functions, dead-end variables and paths that don't make sense, excessively verbose and inaccurate comments, etc.), I would immediately reject.

2 months ago

Ideally you have a document in place saying this is how we handle vibe coding, something like: if you have the AI write the first version, it is your responsibility to make it reviewable.

The you can say (and this is hard), this looks like it is vibe code and misses that first human pass we want to see in these situations (link), please review and afterwards feel free to (re)submit.

In my experience they'll go away. Or they come back with something that isn't cleaned up and you point out just one thing. Or sometimes! they actually come back with the right thing.

2 months ago

In my opinion no PR should have so much changes. It's impossible to review such things.

The only exception is some large migration or version upgrade that required lots of files to change.

As far it goes for Vibe coded gigantic PRs It's a straight reject from me.

throwaway106382

2 months ago

You don't.

Was your project asking for all this? No? Reject.

2 months ago

Reject it and tell them to actually code it.

2 months ago

I made a /split-commit prompt that automatically splits a megacommit into smaller commits. I've found this massively helpful for making more reviewable commits. You can either run this yourself or send this to your coworker to have them run it before asking you to re-review it.

Sometimes it doesn't split it among optimal boundaries, but it's usually good enough to help. There's probably room for improvement and extension (eg. re-splitting a branch containing many not-logical commits, moving changes between commits, merging commits, ...) – contributions welcome!

You can install it as a Claude Code plugin here: https://github.com/KevinWuWon/kww-claude-plugins (or just copy out the prompt from the repo into your agent of choice)

2 months ago

The same way you review a non vibe coded pr. Whats that got to do with anything? A shit pr is a shit pr.

2 months ago

close button.

2 months ago

Please review this PR. Look carefully for bugs, security issues, and logical conflicts with existing code. Report 'Pass' if the PR is of sufficient quality or 'Fail' if you find any serious issues. In the latter case, generate a detailed report to pass along to the submitter.

(ctrl-v)

ripped_britches

2 months ago

Obviously by vibe reviewing it

2 months ago

How about this?

“This PR is really long and I’m having a hard time finding the energy to review it all. My brains gets full before I get to the end. Does it need to be this long?”

Force them to make a case for it. Then see how they respond. I’d say good answers could include:

- “I really trieeld to make it smaller, but I couldn’t think of a way, here’s why…”

- “Now that I think about it, 95% of this code could be pushed into a separate library.”

- “To be honest, I vibe coded this and I don’t understand all of it. When I try to make it smaller, I can’t find a way. Can we go through it together?”

2 months ago

Reject it and request the author makes it smaller.

PRs should be under 1000 lines.

The alternative is to sit down with them and ask what they're trying to accomplish and solve the problem from that angle.

2 months ago

This is effectively a product, not a feature (or bug). Ask the submitter how you can you determine if this meets functional and non-functional requirements, to start with?

2 months ago

excuse me, 9000? If that isn't mostly codegen, including some new plugin/API, or a fresh repository I'd reject it outright. LLM's or not.

In my eyes, there really shouldn't be more than 2-3 "full" files worth of LOC for any given PR (which should aim to to address 1 task/bug each. If not, maybe 2-3 at most), and general wisdom is to aim to keep "full" files around 600 LOC each (For legacy code, this is obviously very flexible, if not infeasible. But it's a nice ideal to keep in mind).

An 1800-2000 LOC PR is already pushing what I'd want to review, but I've reviewed a few like that when laying scaffolding for a new feature. Most PR's are usually a few dozen lines in 4-5 files each, so it's far below that.

9000 just raises so many red flags. Do they know what problem they are solving? Can they explain their solution approach? Give general architectual structure to their implementation? And all that is before asking the actual PR concerns of performance, halo effects, stakeholders, etc.

2 months ago

vibe review it with AI then run it on vibe production support. simple.

2 months ago

Reject it

2 months ago

Enforce stacked PRs, reject PRs over 500-1k LoC (I'd argue even lower, but it's a hard sell)

2 months ago

9000 LOC is way too long for a pull request unless there is some very special circumstance.

I would ask them to break it up into smaller chunks.

2 months ago

The same way you do a non vibe coded pr. If its a shit pr, its a shit pr.

189 more comments available on Hacker News