Anthropic Judge Rejects $1.5b AI Copyright Settlement
Posted 4 months ago · Active 4 months ago
Source: news.bloomberglaw.com · Tech story · High profile
Tone: heated, mixed · Debate: 85/100
Key topics: Artificial Intelligence, Copyright, Intellectual Property
A judge rejected a $1.5B settlement between Anthropic and copyright holders over AI training data, sparking debate about the fairness of AI companies using copyrighted materials and the value of intellectual property.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion: first comment 1 minute after posting, peak of 135 comments in Day 1, averaging 26.7 comments per period (based on 160 loaded comments).
Key moments
- Story posted: Sep 9, 2025 at 4:46 AM EDT (4 months ago)
- First comment: Sep 9, 2025 at 4:47 AM EDT (1 minute after posting)
- Peak activity: 135 comments in Day 1, the hottest window of the conversation
- Latest activity: Sep 21, 2025 at 1:16 PM EDT (4 months ago)
ID: 45179304 · Type: story · Last synced: 11/20/2025, 8:18:36 PM
Thus, I stand to receive about $9,000 as a result of this settlement.
I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.
Where can I check if I'm eligible?
Infringement was supposed to imply substantial similarity. Now it is supposed to mean statistical similarity?
Given that books can be imitated by humans with no compensation, this isn't as strong an argument as you think. Moreover, AFAIK the training itself has been ruled legal, so Anthropic could theoretically have bought the book for $20 (or whatever) and been in the clear, which would obviously bring less revenue than the $9k settlement.
And in general, when an LLM is able to recreate text, that's a training error. Recreating text is not the purpose. Which is not to excuse it happening, but the distinction matters.
Real-world absurd example: A company hires a bunch of workers. It gives them access to millions of books and has them read the books all day. The workers copy the books word by word, but after each word they try to guess the next word that will appear. Eventually, they collectively become quite good at guessing the next word given a prompt text, even reproducing large swaths of text almost verbatim. The company's owner claims they owe nothing to the book owners, because it doesn't count as reading the book, and any reproduction is "coincidental" (even though this is the explicit task of the readers). The company then uses these workers to produce works that compete with the authors of the books it never paid for.
It seems many people feel this is "fair use" when it happens on a computer, but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style. If you feel this is still fair use, then you should agree all books should be free to everyone (as well as art, code, music, and any other training material).
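To make the "guess the next word" setup in that analogy concrete, here is a minimal toy sketch in Python, using simple bigram counts instead of a neural network (the corpus and names are made up for illustration):

    from collections import Counter, defaultdict

    def train_bigrams(text):
        # Count, for each word, which words tend to follow it.
        words = text.split()
        follows = defaultdict(Counter)
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
        return follows

    def guess_next(follows, word):
        # Guess the most frequent successor seen during "reading".
        if word not in follows:
            return None
        return follows[word].most_common(1)[0][0]

    corpus = "the boy who lived asked the boy who waved"
    model = train_bigrams(corpus)
    print(guess_next(model, "boy"))  # -> "who", learned purely from frequency

A real LLM replaces the frequency table with billions of parameters, but the objective the analogy describes, predicting the next word of the text it was trained on, is the same.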
Can you provide an example of someone being successfully sued for "mimicking style", presumably in the US judicial system?
I won't rehash the many arguments as to why the output is also a violation, but my point was more the absurd view that stealing and using all the data in the world isn't a problem because the output is a lossy encoding (but the explicit training objective is to reproduce the training text / image).
However, AI has been shown to copy a lot more than what people consider style.
Music has had this happen numerous times in the US. The test isn't whether it's an exact replica; it's whether it could be confused with the original.
George Harrison lost a case for one of his songs. There are many others.
https://ultimateclassicrock.com/george-harrison-my-sweet-lor...
That's called extreme overfitting. Proper training is supposed to give subtle nudges toward matching each source of text, and zillions of nudges slowly bring the whole thing into shape based on overall statistics and not any particular sources. (But that does require properly removing duplicate sources of very popular text which seems to be an unsolved problem.)
So your analogy is far enough off that I can't give it a good reply.
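For reference, the duplicate-removal step mentioned above is often approximated by hashing. A minimal sketch that catches only exact duplicates (near-duplicate detection, the genuinely hard part for very popular text, needs techniques such as MinHash):

    import hashlib

    def dedup_exact(documents):
        # Keep the first occurrence of each byte-identical document.
        seen = set()
        unique = []
        for doc in documents:
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                unique.append(doc)
        return unique

    docs = ["to be or not to be", "call me Ishmael", "to be or not to be"]
    print(len(dedup_exact(docs)))  # -> 2: the repeated text is dropped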
> It seems many people feel this is "fair use" when it happens on a computer, but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style.
I haven't seen anyone defend the piracy, and the piracy is what this settlement is about.
People are defending the training itself.
And I don't think anyone would seriously say the AI version is fair use but the human version isn't. You really think "many people" feel that way?
To generate working code the output must follow the API exactly. Nothing separates code and natural language as far as the underlying algorithm is concerned.
Companies slightly randomize output to minimize the likelihood of direct reproduction of source material, but that’s independent of what the neural network is doing.
And it's not really about randomizing output. The model gives you a list of likely words, often with no clear winner. You have to pick one somehow. It's not like it's taking some kind of "real" output and obfuscating it.
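A rough sketch of that "pick one somehow" step as it is commonly implemented (temperature scaling, then sampling from the resulting distribution; the numbers are illustrative):

    import numpy as np

    def sample_next(logits, temperature=0.8, seed=None):
        # Convert raw model scores into probabilities, then sample an index.
        rng = np.random.default_rng(seed)
        scaled = np.asarray(logits, dtype=float) / temperature
        scaled -= scaled.max()                      # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return rng.choice(len(probs), p=probs)

    # Three candidate tokens with no clear winner:
    print(sample_next([2.0, 1.9, 1.7], seed=0))

Temperature near zero collapses this to always taking the top token; higher values spread the probability mass, which is the "randomization" referred to above.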
But copyright was based on substantial similarity, not causal links. That is the subtle change. Copyright is expanding more and more.
In my view, unless there is substantial similarity to the infringed work, copyright should not be invoked.
Even the substantial similarity concept is already an expanded concept from original "protected expression".
It makes no sense to attack gen-AI for infringement: if we wanted the originals, we would get the originals; you can copy anything you like on the web. Generating bootleg Harry Potter is slow, expensive, and unfaithful to the original. We use gen-AI for creating things different from the training data.
Copyright isn’t supposed to apply if you happen to write a story that bears an uncanny similarity to a story you never read, written in 1952 in a language you don’t know, that sold 54 copies.
(They can still sue for damages, but they can't claim copyright over your game itself.)
But otherwise, you're essentially asking if you can somehow bypass license agreements by simply refusing to read them, which would obviously render all licensing useless.
In the event that you try to play games to get around that acknowledgement: Courts aren't machines, they can tell that you're acting in bad faith to avoid license restrictions and can punish you appropriately.
Thus, isn't the settlement essentially Anthropic admitting that they don't really have an effective defense against the piracy claim?
The authors can still sue for damages though (and did, and had a strong enough case Anthropic is trying to settle for over a billion dollars).
Or you could sue him on a theory of unjust enrichment, in which case, if he lost, he'd owe you nothing, and if he won, he'd owe you all of his winnings.
It's not clear to me why the same theory wouldn't be available to Adobe, though the copyright question wouldn't be the main thrust of the case then.
So you're agreeing with me? The courts have been pretty clear on what's copyrightable. Copyrights only protect specific expressions of an idea. You can copyright your specific writing of a recipe, but not the concept of the dish or the abstract instructions itself.
The suit isn't about Anthropic training its models using copyrighted materials. Courts have generally found that to be legal.
The suit is about Anthropic procuring those materials from a pirated dataset.
The infringement, in other words, happened at the time of procurement, not at the time of training.
If it had procured them from a legitimate source (e.g. licensed them from publishers) then the suit wouldn't be happening.
https://www.documentcloud.org/documents/26084996-proposed-an...
> reproducing purchased and scanned books to train AI constituted fair use
Library Genesis has one copy. It then sends you one copy and keeps its own. The entity that violated the _copy_right is the one that copied it, not the one with the copy.
Of course, American law is different. But is it the case that copies made for the purpose of using illegally obtained works are not infringing?
Well, the question here is "who made the copy?"
If you advertise in seedy locations that you will send Xeroxed copies of books by mail order, and I order one, and you then send me the copy I ordered, how many of us have committed a copyright violation?
The portion the court said was bad was not Anthropic getting books from pirated sites to train its model. The court opined that training the model was fair use and did not distinguish between getting the books from pirated sites or hard copy scans. The part the court said was bad, which was settled, was Anthropic getting books from a pirate site to store in a general purpose library.
Questions:
As an author, do you think it matters where the book was copied from? Presumably, a copyright gives the author the right to control when a text is reproduced and distributed. If the AI company buys a book and scans it, they are reproducing the book without a license, correct? And fair use is the argument that even though they violated the copyright, they are excused. In a pure sense, if the AI company copied from a "pirate source" (assuming they didn't torrent the book back out), why is that copy worse than if they copied from a physical book?
Isn't digitizing your own copies for backup and personal use fine, so long as you don't give away the original while keeping the backups? Similarly, don't give away the digital copies.
No? I think there are a lot more details that need to be known before answering this question. It matters what they do with it after they scan it.
Yes
> it means that yes what I did was technically a violation but is forgiven
Not at all. All "affirmative defence" means is that, procedurally, the burden is on me to establish that I was not violating the law. The law isn't "you can't do the thing"; rather, it is "you can't do the thing unless it's like this". There is no violation and no forgiveness, as there is nothing to forgive: it was done "like this", and doing it "like this" doesn't violate the law in the first place.
The entire point of deep learning is to copy aspects from training materials, which is why it’s unsurprising when you can reproduce substantial material from a copyrighted work given the right prompts. Proving damages for individual works in court is more expensive than the payout but that’s what class action lawsuits are for.
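Concretely, the objective the parent describes is next-token cross-entropy: the model is scored on how much probability it assigns to the exact token that appears next in the training text. A minimal numpy sketch (not any particular lab's code):

    import numpy as np

    def next_token_loss(logits, target_index):
        # Cross-entropy: small when the model predicts the real next token.
        logits = np.asarray(logits, dtype=float)
        logits -= logits.max()                     # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())
        return -log_probs[target_index]

    # The model strongly predicts token 2, and token 2 is what the text says:
    print(next_token_loss([0.1, 0.2, 5.0], target_index=2))  # small loss

Minimizing this across a corpus is, literally, "get better at reproducing the next word of the training text", which is why memorized passages can surface given the right prompts.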
There were no issues with the physical copies of books they purchased and scanned.
I believe the use of these texts for AI training is a separate issue/case(s).
It may be fair to you but how about other authors? Maybe it's not fair at all to them.
I don't think $3k is likely a bad deal, but I still think you're over simplifying things.
You're treating the system as isolated when it is not.
I think you are confused. Yes, it is piracy, but not like the typical piracy most of us do. There's no loss in pirating a movie if you would never have paid to see the movie in the first place. But there are future costs here, as people will use LLMs to generate books, which is competition. The cost of generating such a book is much lower, allowing for a much cheaper product.
In your effort to simplify things you have only complicated them. There are "statutory damages", which account for a wide range of things[0].
Not to mention you just completely ignored what I argued!
Seriously, you've been making a lot of very confident claims in this thread, and they are easy to verify as false. Just google some of your assumptions before you respond. Hell, ask an LLM and it'll tell you! Just don't make assumptions and do zero vetting. It's okay to be wrong, but you're way off base, buddy.
[0] https://en.wikipedia.org/wiki/Statutory_damages
What about Meta, who did the same thing?
What about Google, who did the same thing?
What about Nvidia, who did the same thing?
Clearly something should be done because it's not like these companies can't afford the cost of the books. I mean Meta recently hired people giving out >$100m packages and bought a data company for $15bn. Do you think they can't afford to buy the books, videos, or even the porn? We're talking about trillion dollar companies.
It's been what, a year since Eric Schmidt said to steal everything and let the lawyers figure it out if you become successful?[1] Personally, I'm not a big fan of "the ends justify the means" arguments. They've led to a lot of unrest, theft, wars, and death.
Do you really not think it's possible to make useful products ethically?
[0] https://news.ycombinator.com/newsguidelines.html
[1] https://www.theverge.com/2024/8/14/24220658/google-eric-schm...
One of the consequences of retaining their rights is that they can also sue Meta and Google and OpenAI etc for the same thing.
[0] https://news.ycombinator.com/item?id=45190232
> Clearly something should be done because it's not like these companies can't afford the cost of the books
Yes indeed it should, and it has. They have been forced to pay $3000 per book they pirated, which is more than 100x what they would have gained if they had gotten away with it.
IMO a fine of 100x the value of a copy of the pirated work is more than sufficient as a punishment for piracy. If you want to argue that the penalty should be more, you can do that, but it is completely missing my point. You are talking about what is fair punishment to the companies, and my comment was talking about what is fair compensation to the authors. Those are two completely different things.
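For the arithmetic behind that multiplier (assuming a retail price in the $20-$30 range, as mentioned elsewhere in the thread):

    $3,000 per work / $30 per copy = 100×
    $3,000 per work / $20 per copy = 150×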
But that is a different topic altogether. I still think you've vastly oversimplified the conversation and are thus unintentionally making some naive assumptions. It's the whole reason I said "probably" in [16]. The big difference is just that you're smart enough to figure out how law works and I'm smart enough to know that neither of us is a lawyer.
And please don't ask me for more citations unless they are difficult to Google... I think I already set some kinda record here...
Anti-piracy groups use scare letters on pirates where they threaten to sue for tens of thousands of dollars per instance of piracy. Why should it be lower for a company?
Yes. Nemotron:
https://www.nvidia.com/en-gb/ai-data-science/foundation-mode...
If there's evidence of this that will stand up in court, they should be sued as well, and they'll presumably lose. If this hasn't happened, or isn't in the works, then I guess they covered their tracks well enough. That's unfortunate, but that's life.
[0] https://gprivate.com/6ib6y
This is what generative AI essentially is.
Maybe the payment should be $500/h (say $5k a page) to cover the cost of preparing a human-verified dataset for Anthropic.
Don't get me wrong: I think this is an incredibly bad deal for authors. That said, I would be horrified if it wasn't treated as fair use. It would be incredibly destructive to society, since people would try to use such rulings to chisel away at fair use. Imagine schools that had to pay yearly fees to use books. We know they would do that; they already try to (single-use workbooks, online value-added services). Or look at software. It is already going to be problematic for people who use LLMs. It is already problematic due to patents. Now imagine what would happen if reformulating algorithms that you read in a book was not considered fair use. Or look at books themselves. A huge chunk of non-fiction consists of doing research and re-expressing ideas in non-original terms. Is that fair use? The main difference between that and generative AI is that we can say a machine did it, but is that enough to protect fair use in the conventional sense?
I feel like we aren't far from that. Wouldn't be surprised if new books get published (in whatever medium) that are licensed out instead of sold.
Thus the $3k per violation is still punitive at (conservatively) 100x the cost of the book.
Given that it is fair use, Authors do not have rights to restrict training on their works under copyright law alone.
…especially given the US “fair use” doctrine takes into account the effect that a particular use might have on the market for similar works, so the authors are bound to argue that the existence of AI that can reproduce fanfiction-like facsimiles of works at scale is going to poison the well and reduce the market for people spending actual money on future works (whether or not that’s true is another question).
So in my view the court is going to say that buying a book doesn’t give them the right to train on the contents because that is mechanical reproduction which is explicitly disallowed by the copyright notice and they don’t fall under the “fair use” carveout because they affect the future market. There isn’t anywhere else where they were granted the right to use the authors’ works so the work is disallowed. Obviously no court finding is ever 100% guaranteed but that really seems the only logically-consistent conclusion they could come to.
Publishers get exclusive print publishing rights for a given market, typically get digital and audio publication rights for the same, and frequently get a handful of other rights like the ability to license it for publication in other markets. But ownership of the work is almost always retained by the author.
> Statutory penalties are found at 18 U.S.C. § 2319. A defendant, convicted for the first time of violating 17 U.S.C. § 506(a) by the unauthorized reproduction or distribution, during any 180-day period, of at least 10 copies or phonorecords, of 1 or more copyrighted works, with a retail value of more than $2,500 can be imprisoned for up to 5 years and fined up to $250,000, or both. 18 U.S.C. §§ 2319(b), 3571(b)(3).
If you broaden it to include DMCA violations you could spend a lot of time in jail. It's even worse in some other countries.
With a typical torrenter, it would be straightforward to make some truly monumental penalties.
The reality is, they rarely care.
Granted, the motivation was the copyright infringement, but to do what they did they needed to dress it up.
And this is why it is correct to say that he was persecuted for copyright infringement. Noting that he wasn't charged with anything related to copyright doesn't change the story, it only makes it less agreeable.
This settlement has nothing to do with any criminal liability Anthropic might have, only tort liability (and it involves damages, not fines).
John Doe McDrugUser 2
John Doe McDrugUser 3
John Doe McDrugUser 4
John Doe McDrugUser 5
John Doe McDrugUser 6
John Doe McDrugUser 7
- Sam Bankman-Fried (FTX): Sentenced to 25 years in prison in 2024 for orchestrating a massive fraud involving the misappropriation of billions in customer funds.
- Elizabeth Holmes (Theranos): Began an 11-year prison sentence in 2023 after being convicted of defrauding investors with false claims about her blood-testing technology.
- Ramesh "Sunny" Balwani (Theranos): The former president of Theranos was sentenced to nearly 13 years in prison for his role in the same fraud as Elizabeth Holmes.
- Trevor Milton (Nikola Corporation): Convicted of securities and wire fraud, he was sentenced to four years in prison in 2023.
- Ippei Mizuhara: The former translator for MLB star Shohei Ohtani was charged in April 2024 with bank fraud for illegally transferring millions from the athlete's account.
- Sergei Potapenko and Ivan Turogin: Convicted in February 2025 for a $577 million cryptocurrency fraud scheme.
- Bernard Madoff: Sentenced to 150 years in prison in 2009 for running the largest Ponzi scheme in history. He died in prison in 2021.
- Jeffrey Skilling (Enron): The former CEO of Enron was sentenced to 24 years in prison in 2006 for fraud and conspiracy. His sentence was later reduced, and he was released in 2019.
- Dennis Kozlowski (Tyco International): The former CEO served over six years in prison after being convicted in 2005 for looting millions from the company.
- Bernard "Bernie" Ebbers (WorldCom): Sentenced to 25 years in prison for orchestrating an $11 billion accounting fraud. He was granted early release in 2019 and died shortly after.
Apart from this list, I know Nissan's ex-CEO was put into solitary confinement for months.
Who went to prison from Exxon for the Valdez oil spill[1], or from BP for the Deepwater Horizon[2] debacle?
Who went to prison from Norfolk-Southern for the East Palestine train derailment[3]?
Who went to prison from Boeing for the 737Max debacle[4]?
[0] https://en.wikipedia.org/wiki/Bhopal_disaster
[1] https://en.wikipedia.org/wiki/Exxon_Valdez
[2] https://en.wikipedia.org/wiki/Deepwater_Horizon_oil_spill
[3] https://en.wikipedia.org/wiki/East_Palestine%2C_Ohio%2C_trai...
[4] https://en.wikipedia.org/wiki/Boeing_737_MAX_groundings
> - Elizabeth Holmes (Theranos): Began an 11-year prison sentence in 2023 after being convicted of defrauding investors with false claims about her blood-testing technology.
Many from your list went to jail because they robbed the rich, not the poor.
“Greyball”: https://www.nytimes.com/2017/03/03/technology/uber-greyball-...
My uncle went to jail for picking up someone at an airport in his taxi. He didn't have the airport permit (he could only drop off, not pick up). Travis Kalanick industrialized that crime on a grand scale and got billions of dollars instead of jail.
The lesson is clear: don't make things that don't make money for the already rich.
Remarkably similar: bulk copying of data for other use, except Swartz wanted to make it free, vs. Anthropic, who wants to make it available via its "AI" repackaging. One was federally prosecuted with the possibility of decades of jail time and million-dollar fines; the other is a mere civil action.
To actually get convicted of anything as a corporate officer, you have to have substantially defrauded your own shareholders, who are senior to the public's interest in justice. Most such crimes involve financial malfeasance.
And please don't assume a "you wouldn't if it was your own employer" - no, I very much would, despite the struggles it would cause.
Entire company shut down. All employees fired. All servers shut down. All Windows computers stop working. All companies using Azure get their stuff turned off. And so on.
Is the world a better place?
1. Hit them with fines or punitive damages high enough to wipe out all their operating profit and executive pay for as many years as a person would be in prison.
2. Seize the company (receivership?), replace its executives, and make the new leaders sign off on not doing that thing again. That's in addition to a huge fine.
3. Dissolve it. Liquidate its assets.
They usually just let the big companies off while throwing everything they have at many individuals who aren't corporations.
For settlement-type deals, maybe see if they'll also give all the authors they ripped off access to Claude models: at cost, with a certain amount of free credits. Anthropic reaps the benefits of what those authors produced.
Give the government partial ownership. This dilutes the other owners and ties them to the government. This gives the government more "oversight" power over the business, just like jail. Give the government an oversight seat on the board.
There are many ways you can put a business in jail, we're just told you can't because that would inconvenience the current business models of breaking the laws/rules/obligations to 'streamline' business and 'innovate'.
I don’t get fined $7,000 for illegally downloading 3 books, for example; it's much less. Although if I'm a repeat offender, it can go up to prison, I think.
While I'm sure it feels good and validating to have this called copyright infringement, and be compensated, it's a mixed blessing at best. Remember, this also means that your works will owe compensation to anyone you "trained" off of. Once we accept that simply "learning from previous copyrighted works to make new ones" is "infringement", then the onus is on you to establish a clean creation chain, because you'll be vulnerable to the exact same argument, and you will owe compensation to anyone whose work you looked at in learning your craft.
This point was made earlier in this blog post:
https://blog.giovanh.com/blog/2025/04/03/why-training-ai-can...
HN discussion of the post: https://news.ycombinator.com/item?id=43663941
It remains to be seen, but typically this forms a moat. Other companies can't bring together the investment resources to duplicate the effort and they die.
The only reasons why this wouldn't be a moat:
1. Too many investment dollars and companies chasing the same goal, and none of them consolidate. (Non-consolidation feels impractical.)
2. Open source / commoditize-my-complement offerings that devalue foundation models. We have a few of these, but the best still require H100s and they're not building product.
I think there's a moat. I think Anthropic is well positioned to capitalize from this.
1. Getting the maximum statutory damages for copyright infringement, which would be something like $250,000 per instance of infringement; you can be generous and call their training and reproduction of your works a single instance, though it's probably many more than that.
2. An admission of wrongdoing plus withdrawal from the market and permanent deletion of all models trained on infringed works.
3. A perpetual agreement to only train new models on content licensed for such training going forward, with safeguards to prevent wholesale reproduction of works.
It’s no less than what they would do if they thought you were infringing their copyrights. It’s only fair that they be subject to the same kind of serious penalties, instead of something they can write off as a slap on the wrist.
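For scale, a back-of-envelope using figures already in this thread (the $1.5B settlement at $3,000 per work, and the $250,000-per-instance figure above):

    $1,500,000,000 / $3,000 per work        ≈ 500,000 works
    500,000 works × $250,000 per instance   = $125,000,000,000

That is roughly $125B of theoretical exposure under that reading, against the $1.5B that was on the table.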