The Use of LLM Assistants for Kernel Development
Posted 4 months ago · Active 4 months ago
lwn.net · Tech · story
calm · mixed · Debate · 70/100
Key topics
LLM
Kernel Development
AI in Software Development
Open Source
The article discusses the potential use of Large Language Models (LLMs) in kernel development, sparking a debate among commenters about the benefits and risks of integrating AI-generated code into critical projects.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
First comment: 5h
Peak period: 14 comments (12-24h)
Avg / period: 4.6
Comment distribution: 32 data points (based on 32 loaded comments)
Key moments
- 01 Story posted: Aug 22, 2025 at 7:02 PM EDT (4 months ago)
- 02 First comment: Aug 23, 2025 at 12:04 AM EDT (5h after posting)
- 03 Peak activity: 14 comments in 12-24h, the hottest window of the conversation
- 04 Latest activity: Aug 27, 2025 at 10:43 AM EDT (4 months ago)
ID: 44990981 · Type: story · Last synced: 11/20/2025, 8:42:02 PM
Ain't that anticipatory obedience?
There is no reason why I can't sue every single developer who has ever used an LLM and published and/or distributed that code for AGPLv3 violations. They cannot prove to the court that their model did not use AGPLv3 code, as they did not make the model. I can also, independently, sue the creator of the model, for any model that was made outside of China.
No wonder the model makers don't want to disclose who they pirated content from.
If their model reproduces enough of an AGPLv3 codebase near verbatim, and it cannot be simply handwaved away as a phonebook situation, then it is a foregone conclusion that they either ingested the codebase directly, or did so through somebody or something that did (which dooms purely synthetic models, like what Phi does).
I imagine a lot of lawyers are salivating over the chance of bankrupting big tech.
TL;DR: you have not discovered an infinite money glitch in the legal system.
If you have a VHS deck, copy a VHS tape, and start handing out copies of it, and I then pick up a copy from you and see, lo and behold, that it contains my copyrighted work, I have sufficient proof to sue you and most likely win.
If you train an LLM on pirated works and start handing out copies of that LLM, and I pick up a copy, ask it to reproduce my work, and it can do so, even partially, I have sufficient proof to sue you and most likely win.
Technically, even bringing "which license" into it is a bit moot: AGPLv3 or not, it's a copyright violation to reproduce the work without a license. GPL just makes the problem worse for them: anything involving any flavor of GPLv3 can end up snowballing, with major GPL rightsholders enforcing the GPLv3 curing clause, as they will most likely also be able to convince the LLM to reproduce their works as well.
The real TL;DR is: they have not discovered an infinite money glitch. They must play by the same rules everyone else does, and they are not warning their users of the risk of using these.
BTW, if I was wrong about this, (IANAL after all), then so are the legal departments at companies across the world. Virtually all of them won't allow AGPLv3 programs in the door just because of the legal risk, and many of them won't allow the use of LLMs with the current state of the legal landscape.
A recent Anthropic lawsuit decision also reaffirms that training on copyrighted works is not a violation of copyright.[1]
However, outputting copyrighted material would still be a violation, the same as a person doing it.
Most artists can draw a Batman symbol. Copyright means they can't monetize that ability. It doesn't mean they can't look at bat symbols.
[1]https://www.npr.org/2025/06/25/nx-s1-5445242/federal-rules-i...
As for the Anthropic lawsuit, the piracy part of the case is continuing. Most models are built with pirated or unlicensed inputs. The part that was decided, although the decision imo was wrong, only covers whether someone CAN train a model.
At no point have I claimed you can't train one. The question is can you distribute one, and then use one. An LLM is not simplistic enough to be considered a phonebook, so they can't just handwave that away.
Saying an LLM can do that is like saying an artist can make a JPEG of a Batman symbol, and that's totally okay for them to distribute because the JPEG artifacts are transformative. LLMs ultimately are just a clever way of compressing data, and compressors are not transformative under the law, but possessing a compressor is not inherently illegal, nor is using one on copyrighted material for your own personal use.
Again, it's illegal for artists to recreate copyrighted works; it's not illegal for them to see or know them. It's not like you cannot hire a guy because he can perfectly visualize Pikachu in his head.
The conflation of training on copyrighted works with distributing copyrighted works is so disingenuous, and thankfully the courts so far recognize that.
It's illegal for artists to distribute recreated copyrighted works in a way that is not transformative. It isn't illegal to produce them and keep them to themselves.
People also distribute models, they don't merely offer them as a service. However, if someone asks their model to produce a copyright violation, and it does so, the person who created and distributed the model (it's the distribution that is the problem), the service that ran it (assuming it isn't local inference), and the person who asked for the violation to be created can all be looped into the legal case.
This has happened before, before the world of AI. Even companies that fully participated in the copyright regime, quickly performed takedowns, and ran copyright detection to the best of their ability were sued and lost because their users committed copyright violations using their services, even though the company did everything right and absolutely above board.
The law is stacked against service providers on the Internet, as it essentially requires them to be omniscient and omnipotent. Such requirements are not levied against other service providers in other industries.
> There is no reason why I can't sue every single developer who has ever used an LLM and published and/or distributed that code.
Simply proving that it's possible to reproduce your work with an LLM doesn't prove that I did, in fact, reproduce your work with an LLM. Just like you can't sue me for owning a VHS — even though it's possible that I could reproduce your work with one. The onus is on you to show that the person using the LLM has actually used it to violate your copyrighted work.
And running around blindly filing lawsuits claiming someone violated your copyright with no proof other than "they used an LLM to write their code!" will get your case thrown out immediately, and if you do it enough you'd likely get your lawyer disbarred (not that they'd agree to do it; there's no value in it for them, since you'll constantly lose). Just like blindly running around suing anyone who owns a VHS doesn't work. You have not discovered an infinite money glitch, or an infinite lawsuit glitch.
If you think you have, go talk to a lawyer. It's infinite free money, after all.
The onus would be on the toolmaker/service provider to prove there are legal uses of that tool/service and that their tool/service should not be destroyed. This is established case law, and people have lost those cases, and the law is heavily tilted in favor of the copyright holders.
The majority of LLMs are trained on pirated works. The companies are not disclosing this (as they would be immediately sued if they did so), and letting their users twist in the wind. Again, if those users use the LLM to reproduce a copyrighted work, all involved parties can be sued.
See the 1984 Betamax case (Sony Corp. of America v. Universal City Studios) for how the case law around this works: Sony was able to prove there are legitimate and legal uses for being able to record things, and thus can still produce Betamax products and cannot be sued for pirates pirating with Betamax products...
... but none of the LLM distributors or inference service providers have reached that bar (and may not even be able to). That ruling also didn't make it legal to pirate things with Betamax; those people were still sued and sometimes even put in prison. Similarly, it would not free LLM users to continue pirating works using LLMs; it would only prevent OpenAI, Anthropic, etc., from being shut down.
If you still think this is an infinite money glitch, then it is exactly as you say, and this glitch has been being used against the American people by the rich for our entire lives.
In an even greater misunderstanding of the American legal system, you're using the Sony case to argue that you would win court cases against LLM users. The plaintiffs in the Sony case lost! This makes your pretend case even harder: the established precedent is in fact the opposite of what you want to do, which is randomly sue everyone who uses LLMs based on a shaky analysis that since it's possible to use them to infringe, everyone is guilty of infringement until proven innocent.
Moreover, at this point you're heavily resorting to motte and bailey, where you originally claimed you could sue anyone who used LLMs, and are now trying to back up and reduce that claim to just being able to sue OpenAI, Anthropic, and training companies.
Continuing this discussion feels pointless. Your claim was wrong. You can't blindly sue anyone who uses LLMs. If you think you can, go talk to a lawyer, since you seem to believe you've found a cheat code for money.
>In an even greater misunderstanding of the American legal system, you're using the Sony case to argue that you would win court cases against LLM users.
Not at all. I said this is the only actual path for the companies to survive, if they can thread that legal needle. The users do not get the benefit of this. The FBI spent the better part of 3 decades busting small time pirates reproducing VHS tapes using perfectly legal (as per the case I quoted) tape decks.
Notice that not everybody has won this challenge; the Sony case merely shows you how high you have to jump. Many companies have been found liable for producing a tool or service whose primary use is to commit crimes or other illegal acts.
Companies that literally bent over backwards to comply with the law still got absolutely screwed; see what happened to Megaupload, and all they did was provide an encrypted offsite file storage system and comply with all applicable laws promptly and without challenge.
Absolutely nothing stops the AI companies from being railroaded like that. However, I believe that they will attempt a Sony-like ruling to save their bacon, but throw their users under the bus.
>the established precedent is in fact the opposite of what you want to do,
Nope, just want to sue the code pirates. Everyone else can go enjoy their original AI slop as long as it comes from a 100% legally trained model and everybody keeps their hands clean.
>and are now trying to back up and reduce that claim
No, I literally just gave the Sony case as an example of reducing the claim into the other direction. The companies may in fact find a way to weasel out of this, but the users never will.
Another counter-example, btw, not that you asked for one, is Napster. Napster was ordered by a court to shut down their service as its primary use was to facilitate piracy. While it is most likely OpenAI et al. will try to Sony their way out, they could end up like Napster instead, or worse, end up like Megaupload.
>everyone is guilty of infringement until proven innocent.
Although you are saying this in plain language, this is largely how copyright cases work in the US, even though, in theory, it should be innocent until proven guilty. However, that exact phrase is only meaningful in criminal cases. It is much more loose in civil cases, and the bar for winning a civil case is much lower.
In a copyright case, the copyright owner is usually the plaintiff (although not always!), and copyright-owner plaintiffs usually win these cases, even in cases where they really shouldn't have.
>Continuing this discussion feels pointless.
Yes it really does. Many people on HN clearly think it is okay to copyright-wash through LLMs, and that the output of LLMs are magically free of infringement by some unexplained handwaving.
You still have not explained how a user can have an LLM reproduce a copyrighted work, and then distribute it, and somehow the copyright owners cannot sue everyone involved, which is standard practice in such cases.
As one commenter notes, we seem to be heading towards a “don't ask, don't tell” policy. I do find that unfortunate, because there is great potential in sharing solutions and ideas more broadly among experienced developers.
The worst case for AI and OSS is a flood of vibe-coded PRs that increase bugs/burden on project maintainers; the best case is that talented but time-starved engineers are more likely to send the occasional high-quality PR as the time investment per PR decreases.
This is not running a prompt, which is probabilistic and so doesn’t guarantee anything! This is having an agent create a self-contained check that becomes part of the codebase and runs in milliseconds. It could do anything: walk the AST of the code looking for one anti-pattern, check code conventions... a linter on steroids.
Building and refining a library of such checks relieves maintainers’ burden and lets submitters check their own code.
I’m not just saying it - it’s worked super well for me. I am always adding checks to my codebase. They enforce architecture (“routes are banned from directly importing the DB; they must go via the service layer”, “no new dependencies”), and they inspect frontend code to find all the fetch calls & hrefs and then flag dead API routes and unlinked pages. With informative error messages, agents can tell when they’ve half-finished/half-assed an implementation. My favorite prompt is “keep going til the checks pass”.
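To make that concrete, here is a minimal sketch of what one such check might look like; the `src/routes` directory, the banned `app.db` modules, and the exact service-layer rule are hypothetical stand-ins, not details from the comment above:

```python
#!/usr/bin/env python3
"""Architecture check: route modules must not import the DB layer directly.

Minimal sketch; ROUTES_DIR and BANNED_MODULES are hypothetical and would be
adjusted to the real codebase.
"""
import ast
import pathlib
import sys

ROUTES_DIR = pathlib.Path("src/routes")       # hypothetical location of route handlers
BANNED_MODULES = {"app.db", "app.db.models"}  # modules routes must reach only via the service layer

def banned_imports(path: pathlib.Path) -> list[str]:
    """Return the banned modules imported by the file at `path`."""
    tree = ast.parse(path.read_text(), filename=str(path))
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names if a.name in BANNED_MODULES]
        elif isinstance(node, ast.ImportFrom) and node.module in BANNED_MODULES:
            hits.append(node.module)
    return hits

def main() -> int:
    failures = 0
    for path in ROUTES_DIR.rglob("*.py"):
        for mod in banned_imports(path):
            # The message is written for an agent as much as for a human reviewer.
            print(f"{path}: imports {mod} directly; go via the service layer instead")
            failures += 1
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

Wired into CI, the failure message doubles as the instruction an agent needs to finish the job, which is what makes “keep going til the checks pass” work.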
What kernel reviewers do is complex - but I wonder how much can be turned into lore in this way. Refined over time to make kernel development even more foolproof as it becomes more complex.
I have a repo with several libraries where I need error codes to be globally unique, as well as adhere to a set of prefixes attributed to each library. This was enforced by carefully reviewing any commits that touched the error code headers.
I’ve had a ticket open for years to write a tool to do this and the general idea of the tool’s architecture but never got around to implementing it.
I used the LLMs to research design alternatives (clang tools, tree-sitter, etc.) and eventually implement a tree-sitter based Python tool that, given a JSON config of the library prefixes, checks that they all adhere and that there are no duplicate error codes within a library.
This would probably have taken me at least a few days to do on my own (or probably would just sit in the backlog forever), took about 3 hours.
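For a sense of scale, here is a rough sketch of that kind of check, using a plain regex over the headers instead of the tree-sitter parser the commenter actually built; the JSON config shape, the directory layout, and the `#define <PREFIX>_ERR_* <code>` naming convention are all assumptions:

```python
#!/usr/bin/env python3
"""Check that error codes carry the right per-library prefix and are globally unique.

Simplified regex-based stand-in for the tree-sitter tool described above; the
config format, layout, and macro naming convention are assumptions.
"""
import json
import pathlib
import re
import sys

# Hypothetical config, e.g. {"libfoo": {"prefix": "FOO", "path": "libfoo/include"}, ...}
CONFIG = json.loads(pathlib.Path("error_codes.json").read_text())

# Matches lines like: #define FOO_ERR_TIMEOUT 1203
DEFINE_RE = re.compile(r"^\s*#define\s+([A-Z0-9]+)_ERR_\w+\s+(-?\d+)", re.MULTILINE)

def main() -> int:
    seen: dict[int, str] = {}  # error code -> file that first defined it
    failures = 0
    for lib, cfg in CONFIG.items():
        for header in pathlib.Path(cfg["path"]).rglob("*.h"):
            for prefix, code_str in DEFINE_RE.findall(header.read_text()):
                code = int(code_str)
                if prefix != cfg["prefix"]:
                    print(f"{header}: prefix {prefix}_ not allowed in {lib} (expected {cfg['prefix']}_)")
                    failures += 1
                if code in seen:
                    print(f"{header}: error code {code} already defined in {seen[code]}")
                    failures += 1
                else:
                    seen[code] = str(header)
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

A real version would parse the headers properly, as the tree-sitter tool does, rather than trusting a regex; but even this much is enough to run in CI and let submitters check their own patches.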
> LLMs are particularly effective for language-related tasks - obviously. For example, they can proof-read text, generate high-quality commit messages, or at least provide solid drafts.
> LLMs are not so strong for programming, especially when it comes to creating something totally new. They usually need very limited and specific context to work well.
The big takeaway, regardless of who generated the code, is: "...it is the human behind the patch who will ultimately be responsible for its contents." This implies they need to understand what the code does and ensure no regressions are introduced.
https://www.linuxfoundation.org/blog/blog/welcoming-pytorch-...
Kernel developers work for large "AI" booster corporations and may or may not experience indirect pressure. It is encouraging that there is still dissent despite all of this. Perhaps there should be anonymous proposals and secret voting.
The Linux Foundation's "AI" policy is sketchy. It allows content generated "whole or in part using AI tools".
https://www.linuxfoundation.org/legal/generative-ai
Code generated "in whole" definitely cannot be copyrighted or put under the GPL, as discussed here recently; see the top comment for example:
https://news.ycombinator.com/item?id=44976568
12 more comments available on Hacker News