OpenAI May Not Use Lyrics Without License, German Court Rules
Source: reuters.com
Key topics
AI
Copyright
Music Industry
A German court rules that OpenAI may not use copyrighted lyrics without a license, sparking debate about AI's impact on creative industries and the future of copyright law.
Snapshot generated from the HN discussion
Discussion activity
- Story posted: Nov 11, 2025 at 6:20 AM EST
- First comment: Nov 11, 2025 at 7:06 AM EST (46m after posting)
- Peak activity: 145 comments in the 0-12h window
- Average per period: 22.9 comments
- Latest activity: Nov 18, 2025 at 4:41 AM EST
- Based on 160 loaded comments
ID: 45886131 | Type: story | Last synced: 11/20/2025, 8:32:40 PM
However, OpenAI will of course ignore this: at worst nothing will change, and at best they get a slap on the wrist and a fine and continue scraping.
You can’t take that stuff out of the models at this point anyway.
But realistically, all that will happen is that the "Pauschalabgabe" is extended to AI subscriptions, making stuff more expensive for everyone.
And if you are not capable of doing this, you will likely not succeed with the ChatGPT instructions.
Soon the music industry will be begging OpenAI for exposure of their content, just like the media industry is begging Google for scraping.
I guess the main difference between the situation with language models and humans is one of scale.
I think the question should be viewed like this: if I as a corporation did the same thing but just with humans, would it be legal or not? Take a hypothetical of hiring a bunch of people, having them read a bunch of lyrics, and then having them answer questions about lyrics. If no law prohibits the hypothetical with people, then I don't see why it should be prohibited with language models, and if it is prohibited with people, then there should be no specific AI ruling needed.
All this being said, Europe is rapidly becoming even more irrelevant than it was, living off the largesse of the US and China. It's like some uncontacted tribe ruling that satellites can't take aerial photos of them. It's all good and well, just irrelevant. I guess Germany can always go the route of North Korea if they want.
I think the difference here is that your example is what a search engine might do, whereas AI is taking the lyrics, using them to create new lyrics, and then passing them off as its own.
Is this not something every single creative person ever has done? Is this not what creating is? We take in the world, and then create something based on that.
If you sell tickets to an event where you read the lyrics aloud, it's commercial performance and you need to pay the author. (Usually a cover artist would be singing, but that's not a requirement.)
So it's not like a human can recite the lyrics anywhere freely either.
If they hire me primarily to recite lyrics, then sure, that would probably be some manner of infringement if I don't license them. But I feel like the case with a language model is much more the former than the latter.
But then with the analogy, if I'm a secretary and the copyright holder of lyrics calls me and asks if I know the lyrics of one of their songs, I don't think it's infringement to say yes and then repeat it back to them.
The LLM is not publicising anything, it's just doing what you ask it to do, it's the humans using it publicising the output.
With all major models now basically trained on nearly all available data, beyond the financial AI bubble about to burst there’s also a big content bubble that’s about exhausted, as folks are just pumping out slop vs producing original creative human output. That may be the ultimate long term tragedy of the present AI hype cycle. Expect “made by a human” to soon be a tag associated with premium brands and customer experiences.
I think people would still produce original things as long as they have the means for doing it. I guess we could say it is our nature. My fear is AI monopolizing the wealth that once would go to support people producing art.
I went to a grammar school and I write in mostly pretty high-quality sentences with a bit of British English colloquialism. I spell well, spend time thinking about what I am saying and try to speak clearly, etc.
I've always tried to be kind about people making errors but I am currently retraining my mind to see spelling mistakes and grammar errors as inherent authenticity. Because one thing ChatGPT and its ilk cannot do (I guess architecturally) is act convincingly like those who misspell, accidentally coin new eggcorns, accidentally use malapropisms, or use novel but terrible grammar.
And you're right: IMO the rage against the cultural damage AI will do is only just beginning, and I don't think people have clocked on to the fact that economic havoc is built-in, success or failure.
The web/AI/software-tech industry will be loathed even more than it is now (and this loathing is increasingly justified)
Just wait a few more years until the majority of ChatGPT training data is filled with misspellings, accidental eggcorns, malapropisms and terrible grammar.
That, and AI slop itself.
Tastes will mature, society will more vocally mock this crap, and we’ll stop seeing the sloppier stuff come out of reputable locations.
The BBC truly was ahead of its time with their deletion of TV shows.
We have cars, buses and planes, yet people do partake in pilgrimages. The process matters, even if only personally.
Plastic/synthetics are the slop of the physical world. They're a side product of extracting oil and gas so they're extremely cheap.
Yet if you look at synthetics by volume, probably 99% of them are used just because they're cheaper than the natural alternative. Yes, some have characteristics that are novel, but by and large everything we do with plastics is ultimately based on "they're cheaper".
Plastics, unfortunately, aren't going away.
Honestly if your only motivation for creating art was “computers can’t do what I do” then… I don’t want to be too gatekeepy about it, but that doesn’t sound like you’re a ‘real’ artist to me. Real artists create art because they enjoy doing it, not because it’s the exclusive domain of humans.
You don’t need to be special, you don’t need to be the best, you don’t need to even be good or successful or recognized or appreciated (although of course all those things are nice) - you just have to be creating art.
But I'd be surprised if that was generally the case. It's easy to see why ChatGPT 1:1 reproducing a song's lyrics would be a copyright issue. But creating a derivative work based on the song?
What if I made a website that counts the number of alliterations in certain songs' lyrics? Would that be copyright infringement, because my algorithm uses the original lyrics to derive its output?
If this ruling really applied to any algorithm deriving content from copyright-protected works, it would be pretty absurd.
But absurd copyright laws would be nothing new, so I won't discount the possibility.
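To make the alliteration-counter hypothetical above concrete, here is a minimal sketch in Python. The function name `count_alliterations`, the `min_run` parameter, and the definition of an alliteration (a run of consecutive words on one line that share a first letter) are all assumptions made for illustration. Nobody in the thread specified an algorithm, and the sketch says nothing about how a court would treat such a site.

```python
import re

def count_alliterations(text: str, min_run: int = 2) -> int:
    """Count runs of at least `min_run` consecutive words, within one line,
    that start with the same letter. (Hypothetical definition.)"""
    total = 0
    for line in text.splitlines():
        # Words always start with a letter; an inner apostrophe is allowed.
        words = re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower())
        run = 1
        for prev, cur in zip(words, words[1:]):
            if cur[0] == prev[0]:
                run += 1
            else:
                if run >= min_run:
                    total += 1
                run = 1
        # Close a run that reaches the end of the line.
        if run >= min_run:
            total += 1
    return total

# A tongue twister stands in for lyrics here.
print(count_alliterations("Peter Piper picked a peck of pickled peppers"))
```

Note that the function returns only a count: the input text is read, but none of it appears in the output, which is the distinction the question above is probing.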
1. it wouldn't matter, as a derivative work still needs a license for the original
2. except if it's not derivative but just inspired,
and the court case was about it being pretty much _the same work_
OpenAI's defense also wasn't that it's derived or inspired but, to quote
> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.
and the court order said more or less
- if it can reproduce the song lyrics it means it stored a copy of the song lyrics somehow somewhere (memorization), but storing copies requires a license and OpenAI has no license
- if it outputs a copy of the song lyrics it means it's making another copy of them and giving them to the user, which is copyright infringement
And this makes sense: if a human memorizes a song and then writes it down when asked, it still is and always has been copyright infringement (else you could just launder copyright by hiring people to memorize things and then write them down, which would be ridiculous).
And technically speaking, LLMs are at their core a lossy compressed store of their training content plus statistical models of it. And to be clear, that isn't some absurd, convoluted reasoning. It's a pretty core aspect of their design. These things were well known even before LLMs became a big deal and OpenAI got huge investment. OpenAI pretty much knew about this being a problem from the get-go. But as with any recent big US "startup", following the law doesn't matter.
Its technically being an unusual form of lossy compressed storage means that the memorization counts as copyright infringement (under current law).
But I would argue the law should be improved in that case, so that under some circumstances "memorization" in LLMs is treated like "memorization" in humans (i.e. not an illegal copy until you make it one by writing it down). But you can't make that apply in all circumstances because, as mentioned, you can use the same tech to basically do lossy file compression, and you don't want people to launder copyright by training an LLM on a single text/song/movie and then distributing that...
FYI most do. Have a look at many software licenses. In particular Microsoft (who as we know invested lots into OpenAI), will argue it is so.
I would also say it makes sense. If it wasn't the case we can just load a program into lots of computers using only a single license/installation medium.
Is running a program making a copy? If I run it on some distributed system is it then making more copies than allowed? This gets insane quickly.
I think it's just a bandaid for fixing removable drive installations. These should have had their own laws/rules/etc.
It has knock-on effects like being able to enforce other IP law against someone you just licensed your software to.
Similarly I think this is more an "interpret words to get the desired outcome instead of the likely spirit or meaning of the words".
The law doesn't care what technical trickery you use to encode/compress copyrighted material. If you take data and then create an equation based on it which contains it and can reproduce the data trivially, then yes, IMHO obviously, this form of embedding copyrighted data is still embedding copyrighted data.
Think about it: if that weren't the case, I could just transform a video into an equation system and then distribute the latest movies, books, whatever to everyone without permission and without violating copyright, even though de facto I'm doing exactly what copyright law is supposed to prevent... (1)
Just because you come up with a clever technical trick to encode copyrighted content doesn't mean you can launder/circumvent copyright law, or any law at that. Law mostly doesn't care about technical tricks but the outcomes.
Maybe even more importantly, LLMs under the hood are basically compression systems at their core: by not giving them enough entropy to store the information, you force them to generalize and, with that, happen to create an illusion of sentience.
E.g. what is the simplest case of training a transformer? You put in data to create the transformer state (which has much smaller entropy) and then output it from that state and then you find a "transformation" where this works as well as possible for a huge amount of different data. That is a compression algorithm!!! And sure in reality it's more complex you don't train to compress a specific input but more like a dictionary of "expected" input->output mappings where the output parts need to be fully embedded i.e. memorized in the algorithm in some form.
LLMs are basically obscure, multi-layered, hyper-dimensional lossy compression systems which compress a simple input->output mapping (i.e. a database) defined by all the entries in their training data. It is a compressed mapping which, due to the forced entropy limit, needs to do its compression through generalization....
And since when is compression allowing you to avoid copyright??
So if you want it to be handled differently by law because it isn't used as a compressed database, you have to special-case it in law.
But it is used as a compressed database, in that case e.g. it was used to look up lyrics based on some clues. That's basically a lookup in a lossy compressed obscure database system no matter how you would normally think about LLMs.
(1): And in case it's not clear, this doesn't mean every RNG is a violation because under some unknown seed it probably would reproduce copyrighted content. The RNG wasn't written "based on" the copyrighted content.
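The "model as lookup table" argument in this comment can be illustrated with a toy that is much simpler than a transformer. The sketch below is a character-level n-gram predictor, not an LLM; `train`, `generate`, the `order` parameter and the sample sentence are all made up for illustration. It only shows that a next-token predictor fitted to a single text can hand that text back verbatim from a short prompt, even though the text is never stored as one string.

```python
from collections import Counter, defaultdict

def train(text: str, order: int = 8):
    """Build a table mapping each `order`-character context to counts of
    the character that followed it in the training text."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, prompt: str, order: int = 8, max_len: int = 200) -> str:
    """Greedy decoding: repeatedly append the most frequent next character."""
    out = prompt
    while len(out) < max_len:
        context = out[-order:]
        if context not in model:  # unseen context: the toy model stops
            break
        out += model[context].most_common(1)[0][0]
    return out

# A made-up sentence stands in for the training text.
training_text = "the quick brown fox jumps over the lazy dog near the quiet riverbank at dawn"
model = train(training_text)
reproduced = generate(model, training_text[:8])
print(reproduced == training_text)
```

With one training text and long contexts the table memorizes everything. Training on much more data with a smaller table is what forces the generalization described above, and real models sit somewhere between those two extremes.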
Does that mean I can distribute the seed if I find one and this RNG wasn't trained on that content?
Does it prevent me from sharing that number on the internet?
It seems like there's a lot of subjective intent here, which I'm extremely skeptical of.
For an LLM also:
If it's lossy enough that it needs RAG to fix the results is that okay?
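The seed question is easy to try out at toy scale. This is a rough sketch using Python's standard `random` module; `text_from_seed` and `find_seed` are hypothetical helper names and the two-letter target is arbitrary. It brute-forces a seed whose output happens to spell a given string.

```python
import random
import string

def text_from_seed(seed: int, length: int) -> str:
    """Deterministically expand a seed into `length` lowercase letters."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def find_seed(target: str, max_seed: int = 1_000_000):
    """Brute-force search for a seed that reproduces `target`, or None."""
    for seed in range(max_seed):
        if text_from_seed(seed, len(target)) == target:
            return seed
    return None

seed = find_seed("hi")
print(seed, text_from_seed(seed, 2))
```

The catch is cost: with 26 letters the expected number of seeds to try grows like 26^n for an n-letter target, so a seed that reproduces a whole song would have to carry about as much information as the lyrics themselves. That may be one way to think about whether "sharing the number" differs from sharing the text.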
-------------------
In my opinion, actually getting the output is where the infringement happens. Having and distributing the LLM weights shouldn't be infringement (in my head) because of the enforceability of results. Otherwise you risk banning RNGs or forcing them all to prove they didn't train on copyrighted content.
You need a license to create derivative works.
The lawsuit was also not about whether it is or isn't copyright infringement. It was about who is responsible (OpenAI or the user who tries to bait it into making another illegal copy of song lyrics).
A model outputting song lyrics means it has them stored somehow, somewhere. Just because the storage is in a lossy, compressed, obscure hyper-dimensional transformation of some kind doesn't mean it didn't store an illegal copy. Otherwise it wouldn't have been able to output it. _Technical details do not protect from legal responsibilities (in general)_
You could (maybe should) add new laws which in some form treat LLM-memorized things the same as if a human had memorized them, but currently LLMs have no special legal treatment when it comes to them storing copies of things.
Did I suggest either of those things?
They're not saying no LLMs, they're saying no LLMs using lyrics without a license. OpenAI simply need to pay for a license, or train an LLM without using lyrics.
If it's really essential that they train their models on song lyrics, or books, or movie scripts, or articles, or whatever, they should pay license fees.
(Vis a vis, I take it you write a certified letter to Universal before reproducing Happy Birthday in public? ;) That is actually a far more egregious violation indeed, as it is both a performance of the copyrighted work and in front of an audience - neither of which are the case for the chatbot - yet one we all seem to understand to be fair use.
They already "filter" the code to prevent it from happening (reproducing exact works). My guess is that it is just superficially changing things around so it is harder to prove copyright violations.
It doesn't appear that modern LLMs are really that hard to build, expensive perhaps, but if you have monopoly on a large enough market, price isn't really your main concern.
That's not how laws and regulations work in European or even EU countries. Courts/the legal system in Germany can not set legal precedents for other countries, and countries don't use legal precedents from other countries, as they obviously have different laws. It could be cited as an authority, but no one is obligated to follow that.
What could happen for example, would be that EU law is interpreted through the CJEU (Court of Justice of the European Union), and its rulings bind EU member states, but that's outside of what individual countries do.
Sidenote, I'm not a native English speaker, but I think it's "precedent", not "precedence": similar words, but the first one is specifically what I think you meant.
The seminal authority for all copyright laws, the Berne Convention, is ratified by 181 countries. Its latest revisions are TRIPS (concerning authorship of music recordings) and the WIPO Copyright Treaty (concerning digital publication), both of which are ratified by the European Union as a whole. It's not directly obvious to me that EU member states have different laws in this particular area.
That said, the EU uses the civil law model and precedent doesn't quite have the same weight here as it does under common law.
Do you have some sort of different understanding of copyright law where it's legal to commercially use lyrics (verbatim, mind you) without a license?
Some places have a concept of de minimis as applied to copyright. Playing music on a park bench with an acoustic guitar and an open case is often not prosecuted. You may need a license for busking in some places, but that's not tied to the music that you play (it could be your own or it could be covers).
I am not saying that it is legal, but rather that it is beneath the notice of the courts.
Yes, even just looking at other court cases within Germany, the role of precedent is "in general" not quite as powerful (as courts are supposed to follow what the law says, not what other courts say). To be clear, this is quite a bit oversimplified. Other court rulings do still matter in practice, especially if they are from higher courts. But it's very different from how it is commonly presented to work in the US (can't say if it actually works that way).
But EU member states also synchronize the general workings of many laws to make a unified market practically possible, and this does include the general way copyright works (by implementing different country-specific laws which all follow the same general framework, so details can differ),
and the parts which are the same are pretty clear about that:
- if you distribute a copy of something, it's a copyright violation no matter the technical details
A human memorizing the content and then reproducing it would still make it copyright infringement, so it should be pretty obvious that this applies to LLMs too, where you could potentially even argue that it's not just "memorizing it" but storing it compressed and a bit lossy....
And that honestly isn't just the case in Germany or the EU. The main reason AI companies have mostly got away with it so far is judges being pressured to rule leniently as "it's the future of humanity", "the country wouldn't be able to compete" etc. etc. Or in other words corruption (as politicians are supposed to change laws if things change, not tell judges to not do their job properly).
I think you're right, also not native English speaker.
No, you're right that a German court can't influence e.g. the similar lawsuit against Suno in Denmark, but as you point out, it can, and most likely will, be cited, and I think it's often the case that this carries a lot of weight.
German student performance may plateau, but when student performance in other countries falls, that still leaves them in a better place.
God I can only hope
> I don’t think a country’s government can justify no commercial LLMs to its populace.
Counter-argument: can any country's government justify allowing its population and businesses to become completely dependent on an overseas company which does not comply with its laws? (For Americans, think "China" in this case)
second it probably would be good for the EU and even US as it would de-monopolize the market a bit before that becomes fully impossible
> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.
Another glimpse into the "mind" of a tech corporation allowing itself full freedom to profit from the vast body of human work available online, while explicitly declining any societal responsibility at all. It's the user's fault, he wrote an illegal prompt! We're only providing the "technology"!
It's an interesting observation that the big AI corps very much argue that learning "is the same that humans do", so fair use. But then when it comes to using that learning they argue the other way, i.e. "this is just a machine, it's the person asking who is doing the infringement".
Your prompt may be asking for something illegal (i.e. reproducing the lyrics), but the one reproducing the lyrics is the AI company, not you yourself.
In your example you are asking Adobe to draw Mickey Mouse and Adobe happily draws a perfect rendition of Mickey Mouse for you and you have to pay Adobe for that image.
Reproduction (again, IANAL) seems to consist of a lot more than "I made it", it consists of how you use it and whether that usage constitutes infringement.
EDIT: To add, genuine question, what does "asking" come down to? I can ask Photoshop to draw Mickey Mouse through a series of clever Mickey-Mouse-shaped brush strokes. I can ask Microsoft Word to reproduce lyrics by typing them in. At what gradient between those actions and text prompting am I (or OpenAI, or Adobe) committing copyright infringement?
- You asking the painter to create a Mickey Mouse painting: not illegal. You still are asking for a derivative work without permission, but if used privately you're good (this is different per jurisdiction)
- The artist creating the painting of a derivative work is acting illegally - they are selling you the picture and hence this is a commercial act and trademark infringement
- Displaying the bought Mickey Mouse image publicly is likely infringement, but worse is if you would charge admission to show the picture, that would definitely be illegal
- If you were to hide the image in your basement and look at it privately, it would most likely not be illegal (private use - but see first point since this is different per jurisdiction)
Comparing violations doesn't really make sense (the artist creating it vs. you displaying it) - the act of creating the image for money is illegal. If it were the artist creating the image for him/herself - that would be fine.
Now getting back to the LLM and your question, which the court also answered (jurisdiction: Germany). The court's opinion is that the AI recreating these lyrics by itself is illegal (think about the artist creating the image for you for money).
Personally I would think the key part and similarity is the payment. You pay for using OpenAI. You pay for it creating those lyrics/texts. In my head I can create a similar reasoning to your Mickey Mouse example. If we'd take open source LLMs and THEY would create perfect lyrics, I think the court would have a much harder case to make. Who would you be suing and for what kind of money? It would all be open source and nobody is paying anyone anything to recreate the lyrics. It would be and is very hard to prove that the LLMs were trained on copyrighted material - in the lyrics example, they may have ingested illegal lyrics-sharing sites, but they may also just have ingested Twitter or Reddit where people talk about the lyrics - how could any LLM know that these contents were illegal or not to be ingested.
https://www.digitaltrends.com/social-media/rap-genius-deserv... (2013)
Long ago the first site I remember to do this was lyrics.ch, which was long since shut down by litigation. I'm not endorsing the status quo here, but if the licensing system exists it is obviously unfair to exempt parties from it simply because they're too big to comply.
E.g. why offer lame chat agents as a service when you can keep the value generation in-house? E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding, music generation. Just cut off the end users/public from model access, and flood the market with AI generated apps/content/works yourself (or with selected partners). Then have a lawyer checking right before publishing.
So this court decision may turn everything worse? I don't know.
If there was a lot of gold to find they wouldn't sell the shovels.
There is a reason that Cisco doesn't offer websites, and you are probably actively ignoring whatever websites your ISP has. ASML isn't making chips, and TSMC isn't making chip designs
A media generation company that is forced to publish uncopyrightable works, because it cannot make the usage to these media generators public, since that would violate copyright - that does sound like a big win for everyone but that company.
How is that worse?
Because that's the only business model that the management of these model provider companies suspect to have a chance of generating income, at the current state.
> While I partially understand (but not support) the hate against AI due to possible plagiarism
There's no *possible* plagiarism, all AI slop IS the result of plagiarism.
> E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding, music generation.
Having lame chat agents as a service does not preclude them from doing this. The fact that they are only selling the shovels should be somewhat insightful.
For AI to have a positive ROI, it has to be highly applicable to basically every industry, and has to be highly available.
Your cheap app just got really expensive
Of course, maybe OpenAI et al should have gotten a license before training on the lyrics, or avoided training on copyrighted content. But the former would be expensive and the latter would require them to develop actual intelligence.
Same goes for websites where you can watch piracy streams. "The action is the user pressing play" sounds like it might win you an internet argument, but I'm 99% sure none of the courts will play those games, you as the operator who enabled whatever the user could do ends up liable.
My concern is where we are going to draw the line: If I type a copyrighted song in Word, is Microsoft liable? If I upload a lyric to ChatGPT and ask it to analyze or translate it, is it a copyright violation?
I totally understand your line of thinking. However, the one I'm suggesting could be applied as well and it has precedents in law (intellectual authors of crimes are punishable, not only the perpetrators).
Well...YouTube is liable for any copyrighted material on their site, and do 'more than one thing'
The problem is that if OpenAI is liable for reproducing copyrighted content, so will other products be, such as word processors, video editors and so on. So, as a society, where will we draw the line?
Are we going to tolerate some copyright infringement in these tools or are we going to pursue copyright infringements even in other tools as we already got the tools to detect it?
We cannot have double standards, law should be applied equally to everyone.
I do think that overall making OpenAI liable for output is a bad precedent, because of repercussions beyond AI tools. I'm all fine with making them liable for having trained on copyrighted content and so on...
> Not really. Youtube is not liable as long as they remove the content after a copyright complaint and other mechanisms.
They have to take action precisely because they're liable for the material on their platform.
At the very least, the users being liable instead of OpenAI makes no sense. Like arresting only drug users and not dealers.
I'm just asking where are we going to put the line and why.
> However, the lyrics are shown because the user requested them, shouldn't the user be liable instead?
I would imagine the sociological rationale for allowing sex work would not map to a multi-billion-dollar company.
And to add, the social network example doesn't map because the user is producing the content and sharing it with the network. In OpenAI's case, they are creating and distributing copyrighted works.
The social networks are distributing such content AND benefiting from selling ads on them. Adding ads on top is a derivative work.
Personally I'm on the side of penalizing the side that provides the input, not the output:
- OpenAI training on copyrighted works.
- Users requesting custom works based on copyrighted IP
That is my opinion on how it should be layered, that's it. I'm happy to discuss why it should be that way or why not. As I put in another comment, my concern is that mandating copyright filtering on each generative tool would end up propagating to every single digital tool, which as a society we don't really want.
If that were the case, Google wouldn't receive DMCA takedowns for piracy links and would instead offer up the users searching for pirated content. The former is more prevalent than the latter because, one, the latter requires an invasion of privacy (you have to serve up everyone's search results) and,
two, it requires understanding of intent.
The same issue applies here. First, OpenAI then needs to share all chats for courts to sift through, and second, how do you judge intent? If someone asks for a German pop song and OpenAI decides to output Bochum - whose fault is that?
Everyone knows that these LLMs were trained on copyrighted material, and as a next-token prediction model, LLMs are strongly inclined to reproduce text they were trained on.
There’s nothing law breaking about quoting publicly available information. Google isn’t breaking the law when it displays previews of indexed content returned by the search algorithm, and that’s clearly the approach being taken here.
Most LLMs were trained on vast troves of pirated copyrighted material. Folks point this out, but they don't ever talk about what the alternative was. The content industries, like music, movies, and books, have done nothing to research or make their works available for analysis and innovation, and have in fact fought industries that seek to do so tooth and nail.
Further, they use the narrative that people that pirate works are stealing from the artists, where the vast majority of money that a customer pays for a piece of copyrighted content goes to the publishing industry. This is essentially the definition of rent seeking.
Those industries essentially tried to stop innovation entirely, and they tried to use the law to do that (and still do). So, other companies innovated over the copyright holder's objections, and now we have to sort it out in the courts.
The authors have been abused by the publishing industry for many decades. I think they're just caught in the middle, because they were never going to get a payday, whether from AI or selling books. I think the percentage of authors that are commercially successful is sub 1%.
We have laws and rules, but those are intended to work for society. When they fail to do so, society routes around them. Copyright in particular has been getting steadily weaker in practice since the advent of the Internet, because the mechanisms it uses to extract value are increasingly impractical since they are rooted in the idea of printed media.
Copyright is fundamentally broken for the modern world, and this is just a symptom of that.
Nothing in the ruling says it is legal to start outputting and selling content based off the results of that training process.
Your second paragraph is not what I'm discussing right now, and was not ruled on in the case you're referring to. I fully expect that, generally speaking, infringement will be on the users of the AI, rather than the models themselves, when it all gets sorted out.
Nothing says it's illegal, either. If anything the courts are leaning towards it being legal, assuming it's not trained on pirated materials.
>A federal judge dealt the case a mixed ruling in June, finding that training AI chatbots on copyrighted books wasn't illegal but that Anthropic wrongfully acquired millions of books through pirate websites.
https://www.npr.org/2025/09/05/g-s1-87367/anthropic-authors-...
That LLMs would be priced as expensively as they really are, in societal and energy costs? A lot of things are possible; whether they are economically feasible is determined by giving them a price. When that price doesn't reflect the real costs, society starts to waste work on weird things, like building large AI centers, because of a financial bubble. And yes, putting people out of business does come with a cost.
"Innovation" is not an end goal.
I run my AI models locally, paying for the hardware and electricity myself, precisely to ensure the unit economics of the majority of my usage are something I can personally support. I do use hosted models regularly, though not often these days, which is why I say "the majority of my usage".
In terms of the concerns you express, I'm simply not worried. Time will sort it out naturally.
I think they try to expand copyright from "protected expression" to "protected patterns and abstractions", or in other words "infringement without substantial similarity". Otherwise why would they sue AI companies? It makes no sense:
1. If I wanted a specific author, I would get the original works, it is easy. Even if I am cheap it is still much easier to pirate than use generative models. In fact AI is the worst infringement tool ever invented - it almost never reproduces faithfully, it is slow and expensive to use. Much more expensive than copying which is free, instant and makes perfect replicas.
2. If I wanted AI, it means I did not want the original, I wanted something else. So why sue people who don't want the originals? The only reason to use AI is when you want to steer the process to generate something personalized. It is not to replace the original authors; if that is what I needed, no amount of AI would be able to compare to the originals. If you look carefully, almost all AI outputs get published in closed chat rooms, with a small fraction being shared online, and even then not in the same venues as the original authors. So the market substitution logic is flimsy.
So you assign zero value to the process of creation?
Zero value to the process of production?
So people who write and produce books, shows and films should all do what? Give up their craft?
Process of creation itself is gratifying and valuable to those who will pursue it. No reason to additionally reward it.
Lamp lighters had to give up their craft, I suppose, and made way for a better world.
spoken like someone who has never made anything in the real world
Holding a boom mic in the air is not gratifying and valuable to anyone who has to do it.
The fruits of your labour are not your labour.
You mean like, murder?
the code models should also be banned, and all output they've generated subject to copyright infringement lawsuits
the sloppers (OpenAI, etc) may get away with it in the US, but the developed world has far more stringent copyright laws
and the countries that have massive industries based on copyright aren't about to let them evaporate for the benefit of a handful of US tech-bros
because other than public domain they all require at least displaying the license, which "AI" ignores
Pirating material is a violation of copyright, which some labs have done, but that has nothing to do with training AI and everything to do with piracy.
Why wouldn’t training be illegal? It’s illegal for me to acquire and watch movies or listen to songs without paying for them*. If consuming copyrighted material isn’t fair use, then it doesn’t make sense that AI training would be fair use.
* I hope it’s obvious but I feel compelled to qualify that, of course, I’m talking about downloading (for example torrenting) media, and not about borrowing from the library or being gifted a DVD, CD, book or whatever, and not listening/watching one time with friends. People have been successfully prosecuted for consuming copyrighted material, and that’s what I’m referring to.
> When building its tool, Ross sought to license Westlaw’s content as training data for its AI search engine. As the two are competitors, Thomson Reuters refused. Instead, Ross hired a third party, LegalEase, to provide training data in the form of “Bulk Memos,” which were created using Westlaw headnotes. Thomson Reuters’s suit followed, alleging that Ross had infringed upon its copyrighted Westlaw headnotes by using them to train the AI tool.
And the VC ecosystem and valuations are built around this assumption.
The result was mostly comical, the commentaries for vacuous pop music all sounded more or less the same: “‘Shake Your Booty’ by KC and the Sunshine Band expresses the importance of letting one’s hair down and letting loose. The song communicates to listeners how liberating it is to gyrate one’s posterior and dance.” Definitely one of the first signs that this new tech was not going to be good for the web.