Anthropic Judge Rejects $1.5b AI Copyright Settlement
Posted 4 months ago · Active 4 months ago
Source: news.bloomberglaw.com · Tech story · High profile
Tone: heated, mixed · Debate: 85/100
Key topics: Artificial Intelligence, Copyright, Intellectual Property
A judge rejected a $1.5B settlement between Anthropic and copyright holders over AI training data, sparking debate about the fairness of AI companies using copyrighted materials and the value of intellectual property.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion: first comment 1 minute after posting, peak of 135 comments in Day 1, averaging 26.7 comments per period (based on 160 loaded comments).
Key moments
- Story posted: Sep 9, 2025 at 4:46 AM EDT (4 months ago)
- First comment: Sep 9, 2025 at 4:47 AM EDT (1 minute after posting)
- Peak activity: 135 comments in Day 1, the hottest window of the conversation
- Latest activity: Sep 21, 2025 at 1:16 PM EDT (4 months ago)
ID: 45179304 · Type: story · Last synced: 11/20/2025, 8:18:36 PM
Thus, I stand to receive about $9,000 as a result of this settlement.
I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.
Where can I check if I'm eligible?
Infringement was supposed to imply substantial similarity. Now it is supposed to mean statistical similarity?
Given that books can be imitated by humans with no compensation, this isn't as strong an argument as you think. Moreover, AFAIK the training itself has been ruled legal, so Anthropic could theoretically have bought the book for $20 (or whatever) and been in the clear, which would obviously bring less revenue than the $9k settlement.
And in general, when an LLM is able to recreate text, that's a training error. Recreating text is not the purpose. Which is not to excuse it happening, but the distinction matters.
Real-world absurd example: A company hires a bunch of workers. It gives them access to millions of books and has them read the books all day. The workers copy the books word by word, but after each word they try to guess the next word that will appear. Eventually, they collectively become quite good at guessing the next word given a prompt text, even reproducing large swaths of text almost verbatim. The company's owner claims they owe nothing to the book owners, because it doesn't count as reading the book, and any reproduction is "coincidental" (even though this is the explicit task of the readers). The company then uses these workers to produce works that compete with the authors of the books it never paid for.
It seems many people feel this is "fair use" when it happens on a computer, but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style. If you feel this is still fair use, then you should agree all books should be free to everyone (as well as art, code, music, and any other training material).
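To make the "guess the next word" setup in that analogy concrete, here is a minimal toy sketch in Python, using simple bigram counts instead of a neural network (the corpus and names are made up for illustration):

    from collections import Counter, defaultdict

    def train_bigrams(text):
        # Count, for each word, which words tend to follow it.
        words = text.split()
        follows = defaultdict(Counter)
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
        return follows

    def guess_next(follows, word):
        # Guess the most frequent successor seen during "reading".
        if word not in follows:
            return None
        return follows[word].most_common(1)[0][0]

    corpus = "the boy who lived asked the boy who waved"
    model = train_bigrams(corpus)
    print(guess_next(model, "boy"))  # -> "who", learned purely from frequency

A real LLM replaces the frequency table with billions of parameters, but the objective the analogy describes, predicting the next word of the text it was trained on, is the same.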
Can you provide an example of someone being successfully sued for "mimicking style", presumably in the US judicial system?
I won't rehash the many arguments as to why the output is also a violation, but my point was more the absurd view that stealing and using all the data in the world isn't a problem because the output is a lossy encoding (but the explicit training objective is to reproduce the training text / image).
However, AI has been shown to copy a lot more than what people consider style.
Music has had this happen numerous times in the US. The test isn't whether it's an exact replica; it's whether it could be confused with the original.
George Harrison lost a case for one of his songs. There are many others.
https://ultimateclassicrock.com/george-harrison-my-sweet-lor...
That's called extreme overfitting. Proper training is supposed to give subtle nudges toward matching each source of text, and zillions of nudges slowly bring the whole thing into shape based on overall statistics and not any particular sources. (But that does require properly removing duplicate sources of very popular text which seems to be an unsolved problem.)
So your analogy is far enough off that I can't give it a good reply.
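For reference, the duplicate-removal step mentioned above is often approximated by hashing. A minimal sketch that catches only exact duplicates (near-duplicate detection, the genuinely hard part for very popular text, needs techniques such as MinHash):

    import hashlib

    def dedup_exact(documents):
        # Keep the first occurrence of each byte-identical document.
        seen = set()
        unique = []
        for doc in documents:
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                unique.append(doc)
        return unique

    docs = ["to be or not to be", "call me Ishmael", "to be or not to be"]
    print(len(dedup_exact(docs)))  # -> 2: the repeated text is dropped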
> It seems many people feel this is "fair use" when it happens on a computer, but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style.
I haven't seen anyone defend the piracy, and the piracy is what this settlement is about.
People are defending the training itself.
And I don't think anyone would seriously say the AI version is fair use but the human version isn't. You really think "many people" feel that way?
To generate working code the output must follow the API exactly. Nothing separates code and natural language as far as the underlying algorithm is concerned.
Companies slightly randomize output to minimize the likelihood of direct reproduction of source material, but that’s independent of what the neural network is doing.
And it's not really about randomizing output. The model gives you a list of likely words, often with no clear winner. You have to pick one somehow. It's not like it's taking some kind of "real" output and obfuscating it.
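A rough sketch of that "pick one somehow" step as it is commonly implemented (temperature scaling, then sampling from the resulting distribution; the numbers are illustrative):

    import numpy as np

    def sample_next(logits, temperature=0.8, seed=None):
        # Convert raw model scores into probabilities, then sample an index.
        rng = np.random.default_rng(seed)
        scaled = np.asarray(logits, dtype=float) / temperature
        scaled -= scaled.max()                      # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return rng.choice(len(probs), p=probs)

    # Three candidate tokens with no clear winner:
    print(sample_next([2.0, 1.9, 1.7], seed=0))

Temperature near zero collapses this to always taking the top token; higher values spread the probability mass, which is the "randomization" referred to above.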
But copyright was based on substantial similarity, not causal links. That is the subtle change. Copyright is expanding more and more.
In my view, unless there is substantial similarity to the infringed work, copyright should not be invoked.
Even the substantial similarity concept is already an expanded concept from original "protected expression".
It makes no sense to attack gen-AI for infringement: if we wanted the originals, we would get the originals; you can copy anything you like on the web. Generating bootleg Harry Potter is slow, expensive, and unfaithful to the original. We use gen-AI for creating things different from the training data.
Copyright isn’t supposed to apply if you happen to write a story that bears an uncanny similarity to a story you never read, written in 1952 in a language you don’t know, that sold 54 copies.
(They can still sue for damages, but they can't claim copyright over your game itself.)
But otherwise, you're essentially asking if you can somehow bypass license agreements by simply refusing to read them, which would obviously render all licensing useless.
In the event that you try to play games to get around that acknowledgement: Courts aren't machines, they can tell that you're acting in bad faith to avoid license restrictions and can punish you appropriately.
Thus, isn't the settlement essentially Anthropic admitting that they don't really have an effective defense against the piracy claim?
The authors can still sue for damages though (and did, and had a strong enough case Anthropic is trying to settle for over a billion dollars).
Or you could sue him on a theory of unjust enrichment, in which case, if he lost, he'd owe you nothing, and if he won, he'd owe you all of his winnings.
It's not clear to me why the same theory wouldn't be available to Adobe, though the copyright question wouldn't be the main thrust of the case then.
So you're agreeing with me? The courts have been pretty clear on what's copyrightable. Copyrights only protect specific expressions of an idea. You can copyright your specific writing of a recipe, but not the concept of the dish or the abstract instructions itself.
The suit isn't about Anthropic training its models using copyrighted materials. Courts have generally found that to be legal.
The suit is about Anthropic procuring those materials from a pirated dataset.
The infringement, in other words, happened at the time of procurement, not at the time of training.
If it had procured them from a legitimate source (e.g. licensed them from publishers) then the suit wouldn't be happening.
https://www.documentcloud.org/documents/26084996-proposed-an...
> reproducing purchased and scanned books to train AI constituted fair use
Library Genesis has one copy. It then sends you one copy and keeps its own. The entity that violated the _copy_right is the one that copied it, not the one with the copy.
Of course, American law is different. But is it the case that copies made for the purpose of using illegally obtained works are not infringing?
Well, the question here is "who made the copy?"
If you advertise in seedy locations that you will send Xeroxed copies of books by mail order, and I order one, and you then send me the copy I ordered, how many of us have committed a copyright violation?
The portion the court said was bad was not Anthropic getting books from pirated sites to train its model. The court opined that training the model was fair use and did not distinguish between getting the books from pirated sites or hard copy scans. The part the court said was bad, which was settled, was Anthropic getting books from a pirate site to store in a general purpose library.
Questions:
As an author, do you think it matters where the book was copied from? Presumably, a copyright gives the author the right to control when a text is reproduced and distributed. If the AI company buys a book and scans it, they are reproducing the book without a license, correct? And fair use is the argument that even though they violated the copyright, they are excused. In a pure sense, if the AI company copied from a "pirate source" (assuming they didn't torrent the book back out), why is that copy worse than if they copied from a physical book?
Isn't digitizing your own copies for backup and personal use fine, so long as you don't give away the original while keeping the backups? Similarly, don't give away the digital copies.
No? I think there are a lot more details that need to be known before answering this question. It matters what they do with it after they scan it.
Yes
> it means that yes what I did was technically a violation but is forgiven
Not at all. All "affirmative defence" means is that, procedurally, the burden is on me to establish that I was not violating the law. The law isn't "you can't do the thing"; rather, it is "you can't do the thing unless it's like this". There is no violation and no forgiveness, as there is nothing to forgive: it was done "like this", and doing it "like this" doesn't violate the law in the first place.
The entire point of deep learning is to copy aspects from training materials, which is why it’s unsurprising when you can reproduce substantial material from a copyrighted work given the right prompts. Proving damages for individual works in court is more expensive than the payout but that’s what class action lawsuits are for.
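Concretely, the objective the parent describes is next-token cross-entropy: the model is scored on how much probability it assigns to the exact token that appears next in the training text. A minimal numpy sketch (not any particular lab's code):

    import numpy as np

    def next_token_loss(logits, target_index):
        # Cross-entropy: small when the model predicts the real next token.
        logits = np.asarray(logits, dtype=float)
        logits -= logits.max()                     # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())
        return -log_probs[target_index]

    # The model strongly predicts token 2, and token 2 is what the text says:
    print(next_token_loss([0.1, 0.2, 5.0], target_index=2))  # small loss

Minimizing this across a corpus is, literally, "get better at reproducing the next word of the training text", which is why memorized passages can surface given the right prompts.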
There were no issues with the physical copies of books they purchased and scanned.
I believe the use of these texts for AI training is a separate issue/case(s).
It may be fair to you but how about other authors? Maybe it's not fair at all to them.
I don't think $3k is likely a bad deal, but I still think you're over simplifying things.
You're treating the system as isolated when it is not.
I think you are confused. Yes, it is piracy, but not like the typical piracy most of us do. There's no loss in pirating a movie if you would never have paid to see the movie in the first place. But there are future costs here, as people will use LLMs to generate books, which is competition. The cost of generating such a book is much lower, allowing for a much cheaper product.
In your effort to simplify things you have only complicated them. There are "statutory damages", which account for a wide range of things[0].
Not to mention you just completely ignored what I argued!
Seriously, you've been making a lot of very confident claims in this thread, and they are easy to verify as false. Just google some of your assumptions before you respond. Hell, ask an LLM and it'll tell you! Just don't make assumptions and do zero vetting. It's okay to be wrong, but you're way off base, buddy.
[0] https://en.wikipedia.org/wiki/Statutory_damages
What about Meta, who did the same thing?
What about Google, who did the same thing?
What about Nvidia, who did the same thing?
Clearly something should be done because it's not like these companies can't afford the cost of the books. I mean Meta recently hired people giving out >$100m packages and bought a data company for $15bn. Do you think they can't afford to buy the books, videos, or even the porn? We're talking about trillion dollar companies.
It's been what, a year since Eric Schmidt said to steal everything and let the lawyers figure it out if you become successful?[1] Personally, I'm not a big fan of "the ends justify the means" arguments. They've led to a lot of unrest, theft, wars, and death.
Do you really not think it's possible to make useful products ethically?
[0] https://news.ycombinator.com/newsguidelines.html
[1] https://www.theverge.com/2024/8/14/24220658/google-eric-schm...
One of the consequences of retaining their rights is that they can also sue Meta and Google and OpenAI etc for the same thing.
[0] https://news.ycombinator.com/item?id=45190232
> Clearly something should be done because it's not like these companies can't afford the cost of the books
Yes indeed it should, and it has. They have been forced to pay $3000 per book they pirated, which is more than 100x what they would have gained if they had gotten away with it.
IMO a fine of 100x the value of a copy of the pirated work is more than sufficient as a punishment for piracy. If you want to argue that the penalty should be more, you can do that, but it is completely missing my point. You are talking about what is fair punishment to the companies, and my comment was talking about what is fair compensation to the authors. Those are two completely different things.
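For the arithmetic behind that multiplier (assuming a retail price in the $20-$30 range, as mentioned elsewhere in the thread):

    $3,000 per work / $30 per copy = 100×
    $3,000 per work / $20 per copy = 150×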
But that is a different topic altogether. I still think you've vastly oversimplified the conversation and are thus unintentionally making some naive assumptions. It's the whole reason I said "probably" in [16]. The big difference is just that you're smart enough to figure out how law works and I'm smart enough to know that neither of us is a lawyer.
And please don't ask me for more citations unless they are difficult to Google... I think I already set some kinda record here...
Anti-piracy groups use scare letters on pirates where they threaten to sue for tens of thousands of dollars per instance of piracy. Why should it be lower for a company?
Yes. Nemotron:
https://www.nvidia.com/en-gb/ai-data-science/foundation-mode...
If there's evidence of this that will stand up in court, they should be sued as well, and they'll presumably lose. If this hasn't happened, or isn't in the works, then I guess they covered their tracks well enough. That's unfortunate, but that's life.
[0] https://gprivate.com/6ib6y
This is what generative AI essentially is.
Maybe the payment should be $500/h (say $5k a page) to cover the cost of preparing a human-verified dataset for Anthropic.
Don't get me wrong: I think this is an incredibly bad deal for authors. That said, I would be horrified if it wasn't treated as fair use. It would be incredibly destructive to society, since people would try to use such rulings to chisel away at fair use. Imagine schools that had to pay yearly fees to use books. We know they would do that; they already try to (single-use workbooks, online value-added services). Or look at software. It is already going to be problematic for people who use LLMs. It is already problematic due to patents. Now imagine what would happen if reformulating algorithms that you read in a book was not considered fair use. Or look at books themselves. A huge chunk of non-fiction consists of doing research and re-expressing ideas in non-original terms. Is that fair use? The main difference between that and generative AI is that we can say a machine did it, but is that enough to protect fair use in the conventional sense?
I feel like we aren't far from that. Wouldn't be surprised if new books get published (in whatever medium) that are licensed out instead of sold.
Thus the $3k per violation is still punitive at (conservatively) 100x the cost of the book.
Given that it is fair use, Authors do not have rights to restrict training on their works under copyright law alone.
…especially given the US “fair use” doctrine takes into account the effect that a particular use might have on the market for similar works, so the authors are bound to argue that the existence of AI that can reproduce fanfiction-like facsimiles of works at scale is going to poison the well and reduce the market for people spending actual money on future works (whether or not that’s true is another question).
So in my view the court is going to say that buying a book doesn’t give them the right to train on the contents because that is mechanical reproduction which is explicitly disallowed by the copyright notice and they don’t fall under the “fair use” carveout because they affect the future market. There isn’t anywhere else where they were granted the right to use the authors’ works so the work is disallowed. Obviously no court finding is ever 100% guaranteed but that really seems the only logically-consistent conclusion they could come to.
Publishers get exclusive print publishing rights for a given market, typically get digital and audio publication rights for the same, and frequently get a handful of other rights like the ability to license it for publication in other markets. But ownership of the work is almost always retained by the author.
> Statutory penalties are found at 18 U.S.C. § 2319. A defendant, convicted for the first time of violating 17 U.S.C. § 506(a) by the unauthorized reproduction or distribution, during any 180-day period, of at least 10 copies or phonorecords, of 1 or more copyrighted works, with a retail value of more than $2,500 can be imprisoned for up to 5 years and fined up to $250,000, or both. 18 U.S.C. §§ 2319(b), 3571(b)(3).
If you broaden it to include DMCA violations you could spend a lot of time in jail. It's even worse in some other countries.
With a typical torrenter, it would be straightforward to make some truly monumental penalties.
The reality is, they rarely care.
Granted, the motivation was the copyright infringement, but to do what they did they needed to dress it up.
And this is why it is correct to say that he was persecuted for copyright infringement. Noting that he wasn't charged with anything related to copyright doesn't change the story, it only makes it less agreeable.
This settlement has nothing to do with any criminal liability Anthropic might have, only tort liability (and it involves damages, not fines).
John Doe McDrugUser 2
John Doe McDrugUser 3
John Doe McDrugUser 4
John Doe McDrugUser 5
John Doe McDrugUser 6
John Doe McDrugUser 7
- Sam Bankman-Fried (FTX): Sentenced to 25 years in prison in 2024 for orchestrating a massive fraud involving the misappropriation of billions in customer funds.
- Elizabeth Holmes (Theranos): Began an 11-year prison sentence in 2023 after being convicted of defrauding investors with false claims about her blood-testing technology.
- Ramesh "Sunny" Balwani (Theranos): The former president of Theranos was sentenced to nearly 13 years in prison for his role in the same fraud as Elizabeth Holmes.
- Trevor Milton (Nikola Corporation): Convicted of securities and wire fraud, he was sentenced to four years in prison in 2023.
- Ippei Mizuhara: The former translator for MLB star Shohei Ohtani was charged in April 2024 with bank fraud for illegally transferring millions from the athlete's account.
- Sergei Potapenko and Ivan Turogin: Convicted in February 2025 for a $577 million cryptocurrency fraud scheme.
- Bernard Madoff: Sentenced to 150 years in prison in 2009 for running the largest Ponzi scheme in history. He died in prison in 2021.
- Jeffrey Skilling (Enron): The former CEO of Enron was sentenced to 24 years in prison in 2006 for fraud and conspiracy. His sentence was later reduced, and he was released in 2019.
- Dennis Kozlowski (Tyco International): The former CEO served over six years in prison after being convicted in 2005 for looting millions from the company.
- Bernard "Bernie" Ebbers (WorldCom): Sentenced to 25 years in prison for orchestrating an $11 billion accounting fraud. He was granted early release in 2019 and died shortly after.
Apart from this list, I know Nissan's ex-CEO was put into solitary confinement for months.
Who went to prison from Exxon for the Valdez oil spill[1], or from BP for the Deepwater Horizon[2] debacle?
Who went to prison from Norfolk-Southern for the East Palestine train derailment[3]?
Who went to prison from Boeing for the 737Max debacle[4]?
[0] https://en.wikipedia.org/wiki/Bhopal_disaster
[1] https://en.wikipedia.org/wiki/Exxon_Valdez
[2] https://en.wikipedia.org/wiki/Deepwater_Horizon_oil_spill
[3] https://en.wikipedia.org/wiki/East_Palestine%2C_Ohio%2C_trai...
[4] https://en.wikipedia.org/wiki/Boeing_737_MAX_groundings
> - Elizabeth Holmes (Theranos): Began an 11-year prison sentence in 2023 after being convicted of defrauding investors with false claims about her blood-testing technology.
Many from your list went to jail because they robbed the rich, not the poor.
“Greyball”: https://www.nytimes.com/2017/03/03/technology/uber-greyball-...
My uncle went to jail for picking up someone at an airport in his taxi. He didn't have the airport permit (he could only drop off, not pick up). Travis Kalanick industrialized that crime on a grand scale and got billions of dollars instead of jail.
The lesson is clear: don't make things that don't make money for the already rich.
Remarkably similar: bulk copying of data for other use, except Swartz wanted to make it free, vs. Anthropic, who wants to make it available via its "AI" repackaging. One was federally prosecuted with the possibility of decades of jail time and million-dollar fines; the other is a mere civil action.
To actually get convicted of anything as a corporate officer, you have to have substantially defrauded your own shareholders, who are senior to the public's interest in justice. Most such crimes involve financial malfeasance.
And please don't assume a "you wouldn't if it was your own employer" - no, I very much would, despite the struggles it would cause.
Entire company shut down. All employees fired. All servers shut down. All Windows computers stop working. All companies using Azure get their stuff turned off. And so on.
Is the world a better place?
1. Hit them with fines or punitive damages high enough to wipe out all their operating profit and executive pay for as many years as a person would be in prison.
2. Seize the company (receivership?), replace its executives, and make the new leaders sign off on not doing that thing again. That's in addition to a huge fine.
3. Dissolve it. Liquidate its assets.
They usually just let the big companies off while throwing everything they have at many individuals who aren't corporations.
For settlement-type deals, maybe see if they'll also give all the authors they ripped off access to Claude models: at cost, with a certain amount of free credits. Anthropic reaps the benefits of what those authors produced.
Give the government partial ownership. This dilutes the other owners and ties them to the government. This gives the government more "oversight" power over the business, just like jail. Give the government an oversight seat on the board.
There are many ways you can put a business in jail, we're just told you can't because that would inconvenience the current business models of breaking the laws/rules/obligations to 'streamline' business and 'innovate'.
I don’t get fined $7,000 for illegally downloading 3 books, for example; it's much less. Although if I'm a repeat offender, it can go up to prison, I think.
While I'm sure it feels good and validating to have this called copyright infringement, and be compensated, it's a mixed blessing at best. Remember, this also means that your works will owe compensation to anyone you "trained" off of. Once we accept that simply "learning from previous copyrighted works to make new ones" is "infringement", then the onus is on you to establish a clean creation chain, because you'll be vulnerable to the exact same argument, and you will owe compensation to anyone whose work you looked at in learning your craft.
This point was made earlier in this blog post:
https://blog.giovanh.com/blog/2025/04/03/why-training-ai-can...
HN discussion of the post: https://news.ycombinator.com/item?id=43663941
It remains to be seen, but typically this forms a moat. Other companies can't bring together the investment resources to duplicate the effort and they die.
The only reasons why this wouldn't be a moat:
1. Too many investment dollars and companies chasing the same goal, and none of them consolidate. (Non-consolidation feels impractical.)
2. Open source / commoditize-my-complement offerings that devalue foundation models. We have a few of these, but the best still require H100s and they're not building product.
I think there's a moat. I think Anthropic is well positioned to capitalize from this.
1. Getting the maximum statutory damages for copyright infringement, which would be something like $250,000 per instance of infringement; you can be generous and call their training and reproduction of your works a single instance, though it's probably many more than that.
2. An admission of wrongdoing plus withdrawal from the market and permanent deletion of all models trained on infringed works.
3. A perpetual agreement to only train new models on content licensed for such training going forward, with safeguards to prevent wholesale reproduction of works.
It’s no less than what they would do if they thought you were infringing their copyrights. It’s only fair that they be subject to the same kind of serious penalties, instead of something they can write off as a slap on the wrist.
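For scale, a back-of-envelope using figures already in this thread (the $1.5B settlement at $3,000 per work, and the $250,000-per-instance figure above):

    $1,500,000,000 / $3,000 per work        ≈ 500,000 works
    500,000 works × $250,000 per instance   = $125,000,000,000

That is roughly $125B of theoretical exposure under that reading, against the $1.5B that was on the table.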