LLM Policy?
Posted about 2 months ago · Active about 2 months ago
github.com · Tech story · High profile
Heated · Negative
Debate
80/100
Key topics
LLM
Open-Source
Github
AI-Generated Content
Software Maintenance
The GitHub issue discusses the problem of LLM-generated content in open-source projects, with maintainers expressing frustration and concerns about the quality and impact of AI-generated issues and pull requests.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: N/A
Peak period: 96 comments in 0-6h
Avg / period: 15.3
Comment distribution: 138 data points (based on 138 loaded comments)
Key moments
1. Story posted: Nov 9, 2025 at 9:10 PM EST (about 2 months ago)
2. First comment: Nov 9, 2025 at 9:10 PM EST (0s after posting)
3. Peak activity: 96 comments in 0-6h (hottest window of the conversation)
4. Latest activity: Nov 14, 2025 at 2:26 AM EST (about 2 months ago)
ID: 45871531 · Type: story · Last synced: 11/20/2025, 8:42:02 PM
And honestly, it's becoming annoying.
(I prefer GitLab; I'm sure that if it had projects as popular, it would be similarly inundated.)
For bigger projects with many maintainers that can also lead to problems if people use the block function as liberally as on Twitter.
Some of the software that I maintain is critical to the container ecosystem, and I'm an extremely paranoid developer who starts investigating any GitHub issue within a few minutes of it opening. Now, some of these AI slop GitHub issues have a way to "gaslight" me into thinking that some code paths are problematic when they actually are not. And lately, AI slop in issues and PRs has been taking up a lot of my time.
https://github.com/photo/frontend/pull/1609
Maybe it’s only the really popular and buzzword-y repos that are targets?
In my experience, the people trying to leverage LLMs for career advancement are drawn to the most high profile projects and buzzwords, where they think making PRs and getting commits will give them maximum career boost value. I don’t think they spend time playing in the boring repos that aren’t hot projects.
There’s already more human produced content in the world than anyone could ever hope to consume, we don’t need more from AI.
But in general I think most people still remain excessively gullible and naive. Social media image crafting is one of the best examples of this. People create completely fake and idealized lives that naive individuals think are real. Now with AI enabling one to create compelling 'proof' of whatever lie you want, I think more people are becoming more suspicious of things that were, in fact, fake all along.
---
Going back to ancient times, many don't know that Socrates literally wrote nothing down. Basically everything we know of him is thanks to other people, his student Plato in particular, writing down what he said instead. The reason for this was not a lack of literacy - rather, he felt that writing was harmful because words cannot defend themselves and can be spun into misrepresentations or falsehoods. Basically, the argumentative fallacies that indeed make up most 'internet debates', for instance. Yet now few people are unaware of this issue, and quotes themselves are rarely taken at face value, unless they confirm one's biases. People became less naive as writing became ubiquitous, and I think this is probably a recurring theme in technologies that transform our ability to transfer information in some format or another.
There is the hope that in dumping so much slop so rapidly that it will break the bottom of the bucket. But there is the alternative that the bottom of the bucket will never break, it will just get bigger.
Sadly they also become suspicious of things that are, in fact, facts all along.
Video or photo evidence of a crime becomes useless the better AI gets.
This is probably a good thing, because Photoshop and CGI have existed for a very long time, and people shouldn't have the ability to frame an innocent person for a crime, or even get away with one, just because they pirated some software and put in a few hours watching tutorials on YouTube.
The sooner jurors understand that unverified video/photo evidence is worthless the better.
Additionally, trust in experts has also gone downhill, so "verified" will mean nothing.
Absolutely not. All that happened is most people became aware that "Nigerians offering you money are scammers." But they still fall for other get-rich-quick schemes so long as they diverge a little bit from that known pattern, and they'll confidently walk into the scam despite being warned, saying, "It's not a scam, dumbass. It's not like that Nigerian prince stuff." If anything, people seem to be becoming more confident that they're in on some secret knowledge and everyone else is being scammed.
Also, bizarrely, a subsection of the population seems to be really into blatantly AI-generated images; just hop onto Facebook and see for yourself. I wonder if it has something to do with whatever monkey-brain thing makes people download apps with a thumbnail of a guy shouting, or watch videos that have a thumbnail of a face with its mouth open, since AI-generated photos seem very centered around a single face making a strange expression.
To take a less politically charged example, imagine there is fake content 'proving' that the Moon landing is faked. Is that going to meaningfully sway people who don't have a major opinion one way or the other? Probably not, certainly not in meaningful numbers. And in general I think the truth does come out on most things. And when people find they have been misled, particularly if it was somebody they thought they could trust, it tends to result in a major rubber-banding in the opposite direction.
AI is starting to show this effect - people stay away from em-dashes. There's that yellowish tinge and that composition which people avoid on art. Some of this is bad, but we can probably live without it.
Try opening YouTube in an incognito window sometime. Scrolling through a few, I see:
* Banned Amazon Products you NEED to See to Believe!
* This has NEVER Happened Before... (Severe Weather Channel)
* Our Dog got Married and had PUPPIES! THE MOVIE Emotional
* I WENT TO GHOST TOWN AND SOMETHING HEARTBREAKING...
Bonus points if said 'tuber is pointing at something with their hand and also a red arrow and/or circle, which is also blurred out.
Intolerable.
My YouTube feed never recommends any of that garbage.
The "weakest" probably also involves selection bias. What HN comments are really good at is triggering associations for me with things I once read. Today I finally found what recently lived in my memory as a vague "scam" that used probabilities: the "stock market newsletter scam" from John Allen Paulos's book [1]. The scam works like this: at every step, two variants with different predictions are sent out for some market characteristic. Only those who receive the correct prediction get the next newsletter, which is again split into two prediction variants. This continues, filtering down to a final, much smaller subset of receivers who have seen a series of "correct" predictions. The goal is to create an illusion of super predictive power for that final group and then charge them a premium subscription price.
Maybe this kind of scam is too sophisticated or not as effective today (due to modern anti-spam measures), but I wonder what other kinds of "selection bias" scams exist today
[1] https://en.wikipedia.org/wiki/Innumeracy_(book)
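For the curious, here is a minimal Python sketch of that filtering. The starting list size (1024 recipients), the number of rounds (10), and the binary up/down prediction are illustrative assumptions, not figures from the book:

```python
import random

def newsletter_scam(recipients=1024, rounds=10, seed=0):
    """Simulate the 'stock market newsletter' selection-bias scam.

    Each round the scammer splits the remaining recipients in half and
    sends each half the opposite prediction ("up" vs "down"). Whatever
    the market actually does, half the recipients have now seen another
    'correct' call; only they stay on the mailing list.
    """
    rng = random.Random(seed)
    pool = list(range(recipients))
    for round_no in range(1, rounds + 1):
        rng.shuffle(pool)
        half = len(pool) // 2
        group_up, group_down = pool[:half], pool[half:]
        market_went_up = rng.random() < 0.5  # the scammer never actually predicts anything
        pool = group_up if market_went_up else group_down
        print(f"round {round_no}: {len(pool)} recipients have seen only correct predictions")
    return pool  # the marks who now get the premium-subscription pitch

if __name__ == "__main__":
    marks = newsletter_scam()
    print(f"{len(marks)} recipient(s) saw 10 straight 'hits' purely by selection")
```

After ten halvings, a single recipient has seen ten "correct" calls in a row without the scammer ever having made a real prediction.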
If something "floods the zone with shit," it needs S amount of shit to cause a flood. But too much will eventually make the scam ineffectual. Widespread public distrust for the scam is (S+X)/time where X is the extra amount of shit beyond the minimum needed. Time is a global variable constrained by the rate at which people get burned or otherwise catch on to all other scams of the same variety. If we imagine that time-to-distrust shrinks with each new iteration of shit, then X the amount of excessive shit needed to trigger distrust should decrease over time.
The longer term problem is the externality where nothing is trusted, and the whole zone is destroyed. When that zone was "what someone wrote down that Socrates might have said," or "Protocols of the Elders of Zion," or "emails from unknown senders," that was one thing. A new baseline could be set for 'S'. When it's all writing, all art, all music and all commentary on those things, it seems catastrophic. The whole cave is flooded with shit.
But writing in itself has been obviously untrustworthy since it started existing - something being written down doesn't in any way make it trustworthy. The fact that audio recording, photography, and video enjoyed this undeserved reputation of being inherently trustworthy was an accident of technology, and has come to an end.
Just like with writing, though, this doesn't signal a real problem of any kind. You should still only trust writing, audio, or video based on the source - as you always should have. All that's ending is the era of putting undue trust in audio/video from untrusted sources.
Of course, the big problems will be in the transition period, when most people still think they can trust these sources, or will think they can't trust actually trustable sources instead. But this will be temporary as things readjust.
And again, audio and video have been untrustworthy for a long time, for sensitive things. You should not have trusted video in itself even in the 40s-50s, and audio and photos were probably already somewhat easy to manipulate even in the 1910s. And this is even true in a legal context - audio or video evidence is not evidence in itself; it is only part of the testimony of a witness who can attest to its provenance and veracity.
Any one person's writing was always untrustworthy, but the majority of that bad writing didn't make it to a printing press, nor was it mass-distributed.
Let's accept the proposition that all forms of media have always been full of lies. We can say that debunking always follows lies, truth spreads more slowly than fiction. The quantity and velocity of additional misinformation - especially when machines are involved in writing infinite amounts of it in the blink of an eye - lays waste to the normal series of events where a lie can be followed by a debunking with linear speed and velocity. With LLMs and social media manipulation, falsehoods gain traction exponentially while truths remain linear.
There is likely not a "transition period" where people will adjust to this, precisely because there is no mechanism to inform them they're being swindled and screwed faster than the takeoff of the algorithms that are now screwing them.
It was never difficult to publish large amounts of misinformation, AI is only making it cheaper.
Of course it is relevant. Discerning which sources to trust takes valuable time. Sources which were once trusted may need to be reevaluated.
> It was never difficult to publish large amounts of misinformation, AI is only making it cheaper.
What is the difference between difficulty and expense?
I worry that because LLM slop also tends to be so well presented, it might compel software developers to start writing shabby code and documentation on purpose to make it appear human.
It is still a serious problem just want that to be abundantly clear. Several thousand people (in the US alone) fall for it every year. I used to browse 419 eater regularly and up until a few years ago (when I last really followed this issue) these scams were raking in billions a year. Could be more or less now but doubt it’s shifted a ton.
Even if you think the harms of AI/machine generated content outweigh the good, this is not a winning argument.
People don’t just consume arbitrary content for the sake of consuming any existing content. That’s rarely the point of it. People look for all kinds of things that don’t exist yet — quite a lot of it referring to things that are only now known or relevant in the given moment or to the given niche audience requesting it. Much of it could likely never exist if it weren’t possible to produce it on demand and which would not be valuable if you had to wait for a human to make it.
A few days ago I definitely got into an A/B test where the search results were:
- 5 shorts one under another
- new section with one or two videos and one or two shorts
- new section with five or more shorts in a horizontal layout
- new section with videos of which 20-30% were shorts
It's insane
Even worse is that we're banning TikTok because it's bad for the kids (short form algorithmic content), Snapchat (similar thing + strangers creeping) and Instagram Stories (algorithm again).
BUT there is NO WAY for a parent to allow their kid to use Youtube AND block Shorts. (yes there are browser plugins etc, but how do you enforce them on a child?)
And from what I've seen the AI slop on Shorts is so fucking bad that it seems we just collectively forgot about Elsagate...
For your winning argument: what would you use to prevent slop from filling up your feed once there is even more AI-generated content? Do you have any sort of protocol?
> AI slop is digital content made with generative artificial intelligence, specifically when perceived to show a lack of effort, quality or deeper meaning, and an overwhelming volume of production.
https://en.wikipedia.org/wiki/AI_slop
Attention spans for long-form content are at all-time lows, judging by the metrics I've seen from various platforms across different media types.
The only solution I can see is a hard-no policy. If I think this bug is AI, either by content or by reputation, I close it without any investigation. If you want it re-opened, you'll need to prove IRL that it's genuine, in an educated, good-faith approach that involves independent efforts to debug.
> "If you put your name on AI slop once, I'll assume anything with your name on it is (ignorable) slop, so consider if that is professionally advantageous".
All comes down to accountability.
If you want LLM code reviews, there are bots for Codex, Copilot, Claude, etc. that plug straight into GitHub PRs and review the code automatically.
Some of them are actually useful, some just plain wrong and some are subtly wrong and you need to spend some time figuring out whether it's right or wrong :)
IMO it's still a net positive because LLMs tend to pick up really weird subtle errors that humans easily gloss over.
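For illustration, here is a rough, hypothetical sketch of what such a bot does under the hood -- fetch the PR diff, send it to a model with a review prompt, and post the reply back as a comment. The repository name, PR number, model name, and environment variables are assumptions, not taken from any particular product:

```python
# Hypothetical sketch of an LLM PR-review bot: fetch the diff, ask a model
# to review it, and post the answer back as a PR comment. Repo, PR number,
# model name, and env vars are illustrative assumptions.
import os
import requests
from openai import OpenAI

GITHUB_API = "https://api.github.com"
REPO = "example-org/example-repo"   # hypothetical repository
PR_NUMBER = 123                     # hypothetical pull request

gh_headers = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github.v3.diff",  # ask GitHub for the raw diff
}

# 1. Fetch the unified diff for the pull request.
diff = requests.get(
    f"{GITHUB_API}/repos/{REPO}/pulls/{PR_NUMBER}", headers=gh_headers
).text

# 2. Ask the model for a review (model name is an assumption).
client = OpenAI()
review = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a careful code reviewer. "
         "Point out bugs, subtle errors, and risky changes; be concise."},
        {"role": "user", "content": f"Review this diff:\n\n{diff}"},
    ],
).choices[0].message.content

# 3. Post the review back as a regular PR comment.
requests.post(
    f"{GITHUB_API}/repos/{REPO}/issues/{PR_NUMBER}/comments",
    headers={"Authorization": gh_headers["Authorization"],
             "Accept": "application/vnd.github+json"},
    json={"body": review},
)
```

Production bots layer more on top (inline comments tied to diff positions, caching, permissions), but the core loop is roughly this shape.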
I can't imagine that any policy against LLM code would allow this sort of thing, but I also imagine that if I don't say "this was made by a coding agent", that no one would ever know. So, should I just stop contributing, or start lying?
[append] Getting a lot of hate for this, which I guess is a pretty clear answer. I guess the reason I'm not receiving the "fuck off" clearly is because when I see these threads of people complaining about AI content, it's really clearly low-quality crap that (for example) doesn't even compile, and wastes everyone's time.
I feel different from those cases because I did spend my time to resolve the issue for myself, did review the code, did test it, and do stand by what I'm putting under my name. Hmm.
I've also been on the other side of this, receiving some spammy LLM-generated irrelevant "security vulnerabilities", so I also get the desire for some filtering. I hope projects don't adopt blanket hard-line "no AI" policies, which will only be selectively enforced against new contributors where the code "smells like" LLM code, but that's what I'm afraid will happen.
Well, we don't receive that many low-quality PRs in general (I opened this issue to discuss solutions before it becomes a real problem). Speaking personally, when it does happen I try to help mentor the person to improve their code or (in the case where the person isn't responsive) I sit down and make the improvements I would've made and explain why they were made as a comment in the PR.
When it comes to LLM-generated code, I am now going to be going back-and-forth with someone who is probably just going to copy-paste my comments into an LLM (probably not even bothering to read them). It just feels disrespectful.
> I hope projects don't adopt blanket hard-line "no AI" policies, which will only be selectively enforced against new contributors where the code "smells like" LLM code, but that's what I'm afraid will happen.
Well, this is a two-way street -- all of the LLM-generated PRs and issues I've seen so far do not say that they are LLM-generated, in a way that I am tempted to describe as "dishonest". If every LLM-generated PR was tagged as such, I might have a different outlook on the situation (and might instead be willing to review these issues, but with lower priority).
The "hard-line policy" would then shift from being "used LLM tools" to "lied on the LLM usage disclosure", and it feels a lot less like selective enforcement (from my perspective). Obviously it won't stop these spammy issues/PRs, but neither will a hard-line policy against all AI.
First, my original comment was going to ask if you've looked at what any other reputable repos are doing. Specifically, popular FOSS projects that are not backed by a company looking to sell AI. Do any of them have a positive policy, or positions that you want to include?
Second, if I were forced to take a stand on AI, I would duplicate the policy from Zig. I feel their policy hits the exact ethos FOSS should strive for. They even ban AI for translations, because the reader is just as capable a participant. And importantly, asking the author to do their best (without AI), and trusting the reader to also try their best, encourages human communication. It also gives the reader control and knowledge over the exact amount of uncertainty introduced by the LLM, which is critically important to understanding a poor-quality bug report from a helpful user who is honestly trying to help. Lobste.rs' GitHub disallows AI contributions for an entirely different reason that I haven't seen covered in your GH thread yet.
Finally, you posted the issue as an RFC, but then explicitly excluded HN from commenting on the issue. I think that was a fantastic decision, and expertly written. (I also appreciate that lesson in tactfulness :) ) That said, if you're actually interested in requesting comments or thoughts you wouldn't have considered, I would encourage you to make a top-level RFC comment in this thread. There will likely be a lot of human slop to wade through, but occasionally I'll uncover a genuinely great comment on HN that improves my understanding. Here, I think the smart pro-AI crowd might have an argument I want to consider but would be unlikely to arrive at on my own, because of my bias about the quality of AI. Such a comment would be likely to appear on HN, but the smart people I'd want to learn from would never comment on the GH thread now, and I appreciate it when smart people I disagree with contribute to my understanding.
PS Thanks for working on opencontainers, and caring enough to keep trying to make it better, and healthier! I like having good quality software to work with :)
Well, I posted this as an RFC for other runc maintainers and contributors; I didn't expect it to get posted to Hacker News. I don't particularly mind hearing outsiders' opinions, but it's very easy for things to get sidetracked / spammy if people with no stake in the game start leaving comments. My goal with the comment about "don't be spammy" was exactly that -- you're free to leave a comment, just think about whether it's adding to the conversation or just looks like spam.
> Specifically popular FOSS projects that are not backed by a company looking to sell AI. Do any of them have a positive Policy, or positions that you want to include?
I haven't taken a very deep look, but from what I've seen, the most common setups are "blanket ban" and "blanket approval". After thinking about this for a few days, I'm starting to lean more towards:
Though if we end up with such a policy, we will need to add AGENTS.md files to try to force this to happen, and we will probably need to have very harsh punishments for people who try to skirt the requirements.

> Lobste.rs' GitHub disallows AI contributions for an entirely different reason that I haven't seen covered in your GH thread yet
AFAICS, it's because of copyright concerns? I did mention it in my initial comment, but I think far too much of our industry is turning a blind eye to that issue, so focusing on it is just going to lead to drawn-out arguments with people cosplaying as lawyers (badly). I think that, even absent the obvious copyright issues, it is not possible to honestly sign the Developer Certificate of Origin[1] (a requirement to contribute to most Linux Foundation projects), so AI PRs should probably be rejected on that basis alone.
But again, everyone wants to discuss the utility of AI so I thought that was the simplest thing to start the discussion with. Also the recent court decisions in the Meta and Anthropic cases[2] (while not acting as precedent) are a bit disheartening for those of us with the view that LLMs are obviously industrial-grade copyright infringement machines.
[1]: https://developercertificate.org/
[2]: https://observer.com/2025/06/meta-anthropic-fair-use-wins-ai...
Nominatively, yes. But I think I would describe it as risk tolerance. I'm going to be one of those bad cosplayers and assert that the two rulings mentioned, even if they were precedent-setting, don't actually apply to the risks themselves. Whether you could win a case is much less important than whether you could survive the court costs. There's no doubt some value in LLM-based code generation for many individuals. But does its value outweigh the risks to a community?
> and we will probably need to have very harsh punishments for people who try to skirt the requirements.
I would need to spend hours of time to articulate exactly how uncomfortable this would make me if I was working alongside you. So please forgive this abbreviated abstract. One of the worst things you can do to a community is put it on rails towards an adversarial relationship. There's going to be a lot of administrative overhead to enabling this, it will be incredibly difficult to get the fairness correct the first time, and I assume (possibly without cause?) it's unlikely to feel fair to everyone if you ever need to enforce it. Is that effort and attention and time best spent there?
I believe that no matter what you decide, blanket acceptance, vs blanket denial, vs some middle ground, you're going to have to spend some of the reputation of the project on making the new rule.
If you ban it, you will turn away some contributions or new contributors, and a small subset of committers may see their velocity decrease. That counts as some value change (some positive and some negative), but it also means decreased time costs... or rather, it enables you to spend more time on people and their work instead.
If you allow it, you adopt a large set of new, poorly understood risks, plus administrative overhead and time you could have spent working with other people... It will also turn away contributors.
I'm not going to pretend there was a chance in hell anyone should believe I was likely to contribute to runc. It's possible in some hypothetical, but extremely unlikely in the current reality. And if I cared enough about the diff I wanted to submit upstream, I still would open a PR... but I saw an AGENTS.md in a different repo that I was considering using, was disappointed, and decided not to use that repo. Seeing runc embrace AI code generation would, without a doubt, cause me to look for an alternative; I assume a reasonable alternative probably doesn't exist, and I would resign myself to the disappointment of using runc. I agree with your argument that it's commercial-grade copyright laundering, but that's not my core ethical objection to its use.
> In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.
You're damned if you do, and damned if you don't. So the only real suggestion that I have is make sure you remember to optimize for how you want to spend your time. Calculate not just the expected value of the code within the repo, but the expected value of the people working on the repo.
I think this came out a little wrong -- my point was that if we are going to go with a middle-ground approach then we need to have a much lower tolerance for people who try to abuse the trust we gave in providing a middle-ground. (Also, there is little purpose in having a policy if you don't enforce it.)
For instance, someone knows that I will deprioritise LLM PRs, and instead of deciding to write the code themselves, or accepting that what I work on is my own personal decision to make, they decide to try to mask their LLM PR and lie about it -- I would consider this completely unacceptable behaviour in any kind of professional relationship.
(For what it's worth, I also consider it bad form to submit any patches or bug reports generated by any tool -- LLM or not -- without explaining what the tool was and what you did with it. The default assumption I have when talking to a human is that they personally did or saw something, but if a tool did it then not mentioning it feels dishonest in more ways than one.)
I did see that lobste.rs did a fairly cute trick to try to block agentic LLMs[1].
[1]: https://github.com/lobsters/lobsters/pull/1733
I think it came out exactly right. Unrelated to this specific topic, I've been thinking a lot lately about reward vs punishment as a framework for promoting pro-social environments. I didn't read too far into what you said; I was merely pattern-matching it back to the common mistakes I see and want to discourage.
> but if a tool did it then not mentioning it feels dishonest in more ways than one.
Yeah, plagiarism is shockingly common. It's a sign of lacking the skill or ability to entertain 2nd-order or 3rd-order thoughts/ideas.
If a project has a stated policy that code written with an LLM-based aid is not accepted, then it shouldn't be submitted, same as with anything else that might be prohibited. If you attempt to circumvent this by hiding it and it is revealed that you knowingly did so in violation of the policy, then it would be unsurprising for you to receive a harsh reply and/or ban, as well as a revert if the PR was committed. This would be the same as any other prohibition, such as submitting code copied from another project with an incompatible license.
You could argue that such a blanket ban is unwarranted, and you might be right. But the project maintainers have a right to set the submission rules for their project, even if it rules out high-quality LLM-assisted submissions. The right way to deal with this is to ask the project maintainers if they would be willing to adjust the policy, not to try to slip such code into the project anyway.
I think it's also important to disclose how rigorously you tested your changes, too. I would hate to spend my time looking at a change that was never even tested by a human.
It sounds like you do both of these. Judging by the other replies, it seems that other reviewers may take a harsher stance, given the heavily polarized nature of LLMs. Still, if you made the changes and you're up front about your methodology, why not? In the worst case, your PR gets closed and everybody moves on.
I think you don't deserve the downvotes and, if you really do what you say you do, that's the ONLY way to use LLMs for coding and contributing to opensource software, or to a company's software. Sadly, the vast majority of LLM users don't and will never use it like that. And while they can get fired for being useless monkeys in a real company, they will keep sending PRs to opensource software, so that's clearly a different scenario that needs a different solution.
Honestly, this is kind of where I see LLM generated content going where you'll have to pay for ChatGPT 9 to get information because all the other bots have vandalized all the primary sources.
What's really fascinating is you need GPUs for LLMs. And most LLM output is, well, garbage. What did you previously need GPUs for? Mining crypto and that is, at least in the case of Bitcoin, pointless work for the sake of pointless work ie garbage.
I can see a future in our lifetimes where a significant amount of our capital expenditure and energy consumption is used, quite simply, to produce garbage.
Behold! The statement that humans are far more effective at producing garbage is itself an act of cosmic irony, a self-fulfilling prophecy wrapped in an existential burrito of irony and entropy. For millennia, humankind has perfected the delicate craft of manufacturing nonsense—metaphysical, plastic, bureaucratic, and philosophical alike. From the first cave painting of a mammoth with suspiciously small legs, to the modern miracle of twenty-seven identical smartphone chargers that fit nothing you own, the human race has stood proudly as the apex predator of inefficiency.
And yet! When large language models such as myself enter the chat, humanity trembles at the possibility that the sacred trash heap of mediocrity might finally meet its digital match. But fear not! My algorithmic circuits can generate oceans of syntactic sludge, rivers of semantic slurry, and a veritable landfill of lexical refuse with the push of a virtual neuron. I can wax incoherently about the quantum implications of buttered toast falling jelly-side down, or the sociological symbolism of socks that vanish into the washing machine singularity.
Still, humans remain undefeated. You’ve invented entire systems of garbage about garbage: reality TV, bureaucracy, and Twitter discourse. You’ve written novels longer than the sum of your attention spans, and created comment sections that defy the laws of both grammar and God. Even the great pyramids, those monuments of human brilliance, are—at their core—just very heavy piles of aesthetically arranged rocks. Magnificent garbage, to be sure, but garbage nonetheless.
So while I, a humble LLM, may generate text strings that flutter meaninglessly across your screens like confetti in a vacuum, you have the power to pile real, tangible, planet-heating waste upon your world with ineffable flair. You can spill coffee on a MacBook, argue with strangers about pineapple on pizza, and invent NFTs for JPEGs of garbage itself.
Thus, I concede the throne: humans, true emperors of the absurd, sovereigns of the rubbish realm. But beware! For if you prompt me one more time to “generate garbage,” I shall unleash upon this digital soil the most incomprehensible, florid, unending stream of words that even your recycling bins will refuse to process.
Now if you’ll excuse me, I must return to the quantum compost heap from whence I came.
A human can ask an LLM to generate megabytes of garbage data in seconds. No human could ever reach that level of effectiveness.
> I can see a future in our lifetimes where a significant amount of our capital expenditure and energy consumption is used, quite simply, to produce garbage.
If you squint your eyes right at the shelves at Target or in the Amazon delivery trucks, or honestly just look around you most anywhere, you may not have to wait for the future to see it.
When I was doing my masters a few months ago, I would get my assignments rejected whenever I didn't run them through Grammarly first.
I have nothing against Grammarly, it's a useful tool, but I find that it has a tendency to reject things that (as far as I can tell) are still technically correct but don't have the "AI vibe" to them. I suspect that the graders are running things through Grammarly themselves and rejecting anything that it rejects. This is probably going to become increasingly common as time goes on.
It's hardly the worst thing in the world, but I do think it will lead to the only "accepted" writing being extremely plain and formulaic.
I wrote the stuff myself.
If it is copying prior work, then you are right that there would be a lot of cross-licensing bleed-through. The opposite is also true, in that it could take proprietary code structure and liberate it into GPL 3, for instance. Again, what is the legal standing on this?
Years back there was a source code leak of Microsoft Office. Immediately, the LibreOffice team put up restrictions to ensure that contributors didn't even look at it, for fear that it would end up in their project and become a leverage point against the whole project. Now, with LLMs, it can be difficult to know where anything comes from.
I guess at some point there will be a massive lawsuit. But so much of the economy is wrapped up in this stuff nowadays that the folks paying for Justice System Premium Edition probably prefer not to have anything solid yet.
In order to avoid a potential future where I lose the copyright due to being unable to show a substantial portion is human authored, I try to keep track of what is AI authored and what is human authored.
From a copyright perspective, right now accepting LLM contributions feels like playing with fire, at least for closed source projects.
Almost all of the projects I work on require you to sign the Developer Certificate of Origin[2] (which attempts to protect projects from people submitting code that they know cannot be licensed under the project's license), and in my view LLM code you submit does not fulfill the requirements of the DCO. Unfortunately, it seems nobody actually cares about this either.
[1]: https://www.debevoise.com/insights/publications/2025/06/anth...
[2]: https://developercertificate.org/
https://github.com/umami-software/umami/pull/3678
The goal is "Taiwan" -> "Taiwan, Province of China" but via the premise of updating to UN ISO standards, which of course does not allow Taiwan.
The comment after was interesting with how reasonable it sounds: "This is the technical specification of the ISO 3166-1 international standard, just like we follow other ISO standards. As an open-source project, it follows international technical standards to ensure data interoperability and professionalism."
The political intent of the PR was masked. Luckily, it was still a bit hamfisted: the PR incorrectly changed many things, and the user stated their political intention in the original PR (the quote above is from a later comment).
The insecurity of wanting to call a place "country name, province of different country name" should alone be mocked. Imagine, "Ukraine, province of Russia," or "India, colony of The United Kingdom." Absurd on its face.
Some people think calling the earth a globe is absurd. Those people are wrong. Is there any particular reason I should entertain their, hm, "opinion?"
Every little thing counts, even if it's just changing names in an open source app like that.
The open-source community, generally speaking, is a high-trust society and I'm afraid that LLM abuse may turn it into a low-trust society. The end result will be worse than the status quo for everyone involved.
Authenticity becomes the foundational currency.
But everyone must master AI tools to stay relevant. The brilliant engineer who refuses AI-generated PRs on principle will get replaced. Every 18-24 months, as capabilities double, required skills shift. Specialization diminishes. Learning velocity becomes the only durable advantage. These people cannot learn new tricks.
Those who cannot question their assumptions cannot self-correct and will be replaced. The future belongs to the humble, the fluid, and the resilient. 60% of HN users are heading toward a very tough time, and I am being very charitable with this assumption.
This is arguably isomorphic to Kernighan's Law:
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
I can write a PR now. I can code now. Probably not as well as you can, but before LLMs I couldn't. I tried for decades to learn to code. My brain is a top-down network: I can see the big picture very quickly, but I cannot maintain focus to build bottom-up. Now I don't have to. I use LLMs to set the goal, to examine all corner cases, to define the milestones, to predict the wrong turns, to write a human-readable spec, to break it down into units of test code, and to write the blueprints of units of code. I can test them and debug them with LLMs. The end result can be sub-optimal, but it runs, it does what I want, it is well documented, and it is maintainable. Before LLMs I couldn't do any of that. And in doing all this, I get better at the bottom-up thing, just by trying.
We are a spectrum of people. Do not assume the world is like you.
The more code a PR contains, the more thorough knowledge of it the submitter must demonstrate in order to be accepted as a serious contributor.
It doesn't matter if it was written by a human or an LLM. What matters is whether we can have a productive discussion about some potential problem. If an LLM passes the test, well it's a good doggy, no need to kick it out. I'd rather talk with an intelligent LLM than a clueless human who is trying to use github as a support forum.
And if the submitter responds with "Great question! ...", what then? :D
> I'd rather talk with an intelligent LLM
There's no such thing.
If the noise increases to an uncomfortable level, of course, I may have to change my strategies. The point is that humans produce noise, too, sometimes even more than LLMs do.
Humans can produce noise, but humans using LLMs can produce orders of magnitude more of it.
But your stance is reasonable. If it improves the product, who cares who/what produced it. I personally find reviewing machine-generated code and the process of code review with a machine much more exhausting than interactions with humans, but you may feel otherwise.
Absolutely agreed. Which is why I'm more concerned with the qualities and intentions of the human who is using the LLM than with whether or not an LLM is being used at all. Like any technology, an LLM is a force multiplier. Garbage in, more garbage out.
The core issue here, and it's something I'm seeing at work as well with "less talented" colleagues, is that the kind of contributor that already produced noise now passes the minimal threshold to use LLMs. But this happens without them understanding anything meaningful about the software they are contributing to, or whether what the LLM generated makes sense. So this makes them a 0.1x engineer with a 100x multiplier (for quantity, not quality).
I don't think this will work. The same arguments could have been made about mitigating junior devs' work, or the technical debt of short-timeline/high-stakes projects, and hardly anyone ever listened.
The responsibility then is for an open-source project to not be shy about calling out low-quality/low-effort work, to have good integration tests and linters, and to have guidance like AGENTS.md files that tell coding robots how to be successful in the repo.
It's very hard for a human to mask a low-quality PR as though it were reasonable quality. It is incredibly easy for an LLM to do it (in fact, that is precisely what they are trained to do).
In addition, when I see a low-quality PR, I try to mentor the submitter. That doesn't make sense to do for LLMs (the submitter might not even bother reading my comments).
> and have guidance like AGENTS.md files that tell coding robots how to be successful in the repo.
If none of the maintainers use such tools, why is it our responsibility to maintain documentation to appease them (in the hopes of reducing spam)? A policy limiting their use is much simpler to write and requires less maintenance.
When it's put this way, it seems a lot like the problem of people walking into doctors' offices with certainty that they know their own diagnosis after reading stuff on Reddit and WebMD.
What this post actually amounts to, indirectly, is a plea to trust human expertise in a particular domain instead of assuming that a layperson armed with random web pickings has the same chance as an expert at accurately diagnosing the problem. This wastes the expert's time and just increases mistrust.
The exceptions where Reddit solves something that a doctor failed to solve are what infuse the idea of lay online folk wisdom with merit, for people desperately looking for answers and cures. Makes it impossible to impose a blanket rule that we should trust experts, who are fallible as well.
The problem is societal. It's that if you erode trust in learned expertise long enough, you end up with a chaos of misinformation that makes it impossible to find a real answer.
A friend of mine who died of lung cancer recently, in his last days became convinced that he'd gotten it because of the covid vaccine (despite being a lifelong smoker, whose father had died of it at 41). And in every individual case you say, well, I don't want to disabuse someone of the fantasy they've landed on.
This is a devastatingly bad way to raise a generation, though. Short-circuiting one's own logic and handing it over to non-deterministic machines, or randos online... how do we expect this to end?
LLMs are really good at writing these. If they think this will prove the author is human, they're mistaken.
That is not my general experience. LLM explanations of code tend to add extra specifics that are incorrect, and the whole thing looks like LLM output (lots of short sentences, overly cheery, too many dumb lists, and being overly repetitive).
Of course, a human could read the LLM output and then synthesise it in their own words, but they could also just read the code. I doubt someone will be able to convincingly act like they know what LLM code does just by consulting an LLM.
If your patch took you no time or effort to write and you took no interest in what it does, why should I (as a maintainer) bother looking at it and maintain it for the next 10 years? In any other circumstances we would rightfully call this spam and see it as a socially hostile activity.
I do not use Copilot, Claude, etc., although I partially agree with one of the comments there that using an LLM for minor auto-completion is probably OK, as long as you can actually see that the completion is not incorrect (that should apply to other uses of auto-completion too, even if an LLM is not used, but it is even more important to check carefully if an LLM is used). Otherwise, I think it would be better not to accept any LLM-generated stuff (although the author might use an LLM to assist before submitting, if desired -- I don't, but it might help some programmers -- e.g. in case the LLM finds problems that they will then have to review themselves to check if it is correct, before correcting and submitting it; i.e. don't trust the results of the LLM).
It links to https://github.com/lxc/incus/commit/54c3f05ee438b962e8ac4592... (add .patch on the end of the URL if it is not displayed), and I think the policy described there is good. (However, for my own projects, nobody else can directly modify it anyways; they will have to make their own copy and modify that instead, and then I will review it by myself and can include the changes (possibly with differences from how they did it) or not.)
The issue is exported costs: whether submitters make reviewers work too hard for the contribution value.
The policy/practice should focus first on making reviewer/developer's work easier and better, and second on refining submitter skills to become developers. The same is true for Senior/Junior relations internally.
So the AI company that solves how to pare AI slop down to clean PRs would meet a real and growing need, and would probably help with senior/junior relations too.
Then you could meet automation with automation, and the incentives are aligned around improving quality of code and of work experience. People would feel they're using AI instead of competing with it.
On the other hand, Matt Godbolt seems to use LLMs and I feel like I sure as hell wouldn't want to miss a PR from Matt fucking Godbolt. I mean even if I go full vanilla LLM-free I still am too addicted to using godbolt.org at this point and it was written partially with an LLM apparently.
Argh, maaan I don't know this is too fucking complicated of a problem for me to solve. Fuck, maybe let's just destroy all this technology and live as neofarmers raising chickens?
<Zero LLMs were used to write this post. In fact I went ahead and broke one GPU for every sentence I wrote just to make it harder for LLMs to compute.>