
Fighting the New York Times' invasion of user privacy

397 points
416 comments

Mood: heated
Sentiment: negative
Category: tech
Key topics: privacy, New York Times, data protection
Debate intensity: 85/100

OpenAI is fighting back against the New York Times' alleged invasion of user privacy, sparking a heated debate.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment: 18m after posting
Peak period: 140 comments (Day 1)
Avg / period: 40
Comment distribution: 160 data points (based on 160 loaded comments)

Key moments

  1. Story posted: 11/12/2025, 2:08:28 PM (6d ago)
  2. First comment: 11/12/2025, 2:26:45 PM (18m after posting)
  3. Peak activity: 140 comments in Day 1, the hottest window of the conversation
  4. Latest activity: 11/15/2025, 4:25:03 PM (3d ago)


Discussion (416 comments)
Showing 160 comments of 416
mac3n
6d ago
4 replies
> Trust, security, and privacy guide every product and decision we make.

-- openai

gk1
6d ago
1 reply
You know you have a branding problem when (1) you have to say that at the outset, and (2) it induces more eyerolls than a gaggle of golf dads.
wkat4242
6d ago
The same with Google "don't be evil" these days.
nrhrjrjrjtntbt
6d ago
1 reply
- any corporation

Remember that a corporation is generally an object owned by some group of people. Do you trust "an unspecified future group of people" with your privacy? You can't. The best we can do is understand the information architecture and act accordingly.

latexr
6d ago
> - any corporation

I don’t recall seeing many food, furniture, or plant companies, or generally anything not related to tech, talking about trust, security, and privacy as guiding principles.

frig57
6d ago
Stopped reading at this line
great_wubwub
6d ago
> Trust, security, and privacy guide every product and decision we make except ones that involve money.

-- openai, probably.

jcranmer
6d ago
1 reply
"How dare the New York Times demand access to our vault of everything-we-keep to figure out if we're a bunch of lying asses. We must resist them in the name of user privacy! Signed, the people who have scraped literally everything to incorporate it into the products we make."

OpenAI may be trying to paint themselves as the goody-two-shoes here, but they're not.

greyman
6d ago
3 replies
But that vault can contain conversations between me and chatgpt, which I had willingly, but with the expectation that only openai has access to it. Why should some lawyer working for the NYT have access to it? OpenAI is precisely correct, no matter what other motives could be there.
jcranmer
6d ago
https://openai.com/policies/privacy-policy/

> We may use Personal Data for the following purposes: [...] To comply with legal obligations and to protect the rights, privacy, safety, or property of our users, OpenAI, or third parties.

OpenAI outright says it will give your conversations to people like lawyers.

If you thought they wouldn't give it out to third parties, you not only have not read OpenAI's privacy policy, you've not read any privacy policy from a big tech company (because all of them are basically maximalist "your privacy is important, we'll share your data only with us and people who we deem worthy of it, which turns out to be everybody.")

mkipper
6d ago
> but with the expectation that only openai has access to it

You can argue about "the expectation" of privacy all you want, but this is completely detached from reality. My assumption is that almost no third parties I share information with have magic immunity that prevents the information from being used in a legal action involving them.

Maybe my doctor? Maybe my lawyer? IANAL but I'm not even confident in those. If I text my friend saying their party last night was great and they're in court later and need to prove their whereabouts that night, I understand that my text is going to be used as evidence. That might be a private conversation, but it's not my data when I send it to someone else and give them permission to store it forever.

buellerbueller
6d ago
Listen, man, I willingly did that murder, but with the expectation that no one would know about it, except the victim. Why should some lawyer working for the government have access to it?
adolph
6d ago
1 reply
Cynicism aside, this seems like an attempt to prune back a potentially excessive legal discovery demand by appealing to public opinion.

  The New York Times is demanding that we turn over 20 million of your private 
  ChatGPT conversations. They claim they might find examples of you using 
  ChatGPT to try to get around their paywall.
indoordin0saur
6d ago
2 replies
Yeah, I'm not sure why everyone feels the need to take a side here. Both of these organizations are ghoulish.
o11c
6d ago
The NYT has problems with being a stooge of the military-industrial complex, but I really don't see them doing anything wrong in this case.
mmooss
6d ago
How is the NYT like OpenAI, or 'ghoulish'?
miltonlost
6d ago
1 reply
This is the basic discovery process for when OpenAI commits IP theft. They're trying to misinform the public about how the justice process works.
mapontosevenths
6d ago
1 reply
> To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

The constitution is clear that the purpose of intellectual property is to promote progress. I feel that OpenAI is on the right side of that, and this is not IP theft as long as they aren't reproducing others' work in a non-transformative way.

Training the AI is clearly transformative (and lossy to boot). Giving the AI the ability to scrape and paraphrase others' work is less clear, and both sides have valid arguments. I don't envy the judges who must make that call.

etchalon
6d ago
1 reply
If they're reproducing NY Times articles, in full, then that is non-transformative. That's the point of the case.
mapontosevenths
6d ago
1 reply
> That's the point of the case.

No, it's not. See the PDF of the actual case below.

The case is largely about OpenAI training on NY Times articles without permission. They do allege that it can reproduce their articles verbatim at times, but that's not the central allegation, as it's obviously a bug and not an intentional infringement. You have to get way down to item 98 before they even allege it.

https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

etchalon
6d ago
1 reply
They alleged it in point 4?

"Defendants have refused to recognize this protection. Powered by LLMs containing copies of Times content, Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples. See Exhibit J. These tools also wrongly attribute false information to The Times."

mapontosevenths
5d ago
1 reply
You're right. No idea how I missed that. Thanks!

Still, that's a bug, not a feature. OpenAI will just respond that it's already been fixed and pay them damages of $2.50 or something to cover the few times it happened under very specific conditions.

mapontosevenths
5d ago
Just to double-check that it was fixed, I asked ChatGPT what was on the front page of the New York Times today, and I got a summary with paraphrased titles. It doesn't reproduce anything exactly (not even the headlines).

Interestingly, the summary is made by taking screenshots of a (probably illegal) PDF it found someplace on the internet. It then cites that sketchy PDF as the source rather than linking back to the original NY Times articles.

If I were the NYT I would still be plenty pissed off.

ChatGPT's reference: https://d2dr22b2lm4tvw.cloudfront.net/ny_nyt/2025-11-13/fron... via https://frontpages.freedomforum.org/

rpdillon
6d ago
13 replies
I wouldn't want to make it out like I think OpenAI is the good guy here. I don't.

But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.

In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.

It's quite literally a fishing expedition.

Sherveen
6d ago
1 reply
Yeah, everyone else in the comments so far is acting emotionally, but --

As a fan and DAU of both OpenAI and the NYT, this is just a weird discovery demand, and there should be another pathway for these two to move forward in this case (NYT getting some semblance of understanding, OAI protecting end-user privacy).

totallymike
6d ago
It sounds like the alternate path you're suggesting is for NYT to stop being wrong and let OpenAI continue being right, which doesn't sound much like a compromise to me.
Alex2037
6d ago
3 replies
>But conversations people thought they were having with OpenAI in private

...had never been private in the first place.

not only is the data used for refining the models, OpenAI had also shariah policed plenty of people for generating erotica.

Workaccount2
6d ago
1 reply
This is about private chats, which are not used for training and only stored for 30 days.

Also, you need to understand that for huge corps like OpenAI, lying in your ToS will do orders of magnitude more damage to your brand than whatever you would gain by training on <1% more user chats. So no, they are not lying when they say they don't train on private chats.

bonsai_spool
6d ago
2 replies
> Also, you need to understand that for huge corps like OpenAI, lying in your ToS will do orders of magnitude more damage to your brand than whatever you would gain

Is this true? I can’t recall anything like this (look at Ashley Madison which is alive and well)

Workaccount2
6d ago
1 reply
It's not national news when a company is found to be doing what they say they are doing.
bonsai_spool
6d ago
> It's not national news when a company is found to be doing what they say they are doing.

You said there would be ‘orders of magnitude’ of brand damage. What is the proof?

bee_rider
6d ago
I think it is hard to say because OpenAI is still heavily in development and working out their business model (and a reasonable complaint is that it is crazy to label them a massive success without seeing how they actually work when they need to make a profit).

But, all that aside, it seems that OpenAI is aiming to be bigger and more integrated into the day-to-day life of the average person than Ashley Madison, right?

mock-possum
6d ago
Yeah, I don’t get why more people don’t understand this: why would you think your conversation was private when it wasn’t actually private? Have you not been paying attention?
IlikeKitties
6d ago
> OpenAI had also shariah policed plenty of people for generating erotica.

That framing is rhetorically brilliant if you think about it. I will use that more. Chat Sharia Law for Chat Control. Mass Sharia Surveillance from Flock, etc.

cogman10
6d ago
10 replies
I get the feeling, but that's not what this is.

NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"

That's a question they fundamentally cannot answer without these chat logs.

That's what discovery, especially in a copyright case, is about.

Think about it this way. Let's say this were a book store selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs". The whole log needs to be produced otherwise you can't really trust that this is the real log.

That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses. They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".

And the reason this evidence is relevant is it will directly feed into how much money NYT and OpenAI will ultimately settle for. If this never happens then the amount will be low. If it happens a lot the amount will be high. And if it goes to trial it will be used in the damages portion assuming NYT wins.

The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.
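A minimal sketch of the kind of search described above, assuming the reviewing side gets a plain-text dump of chats and a set of articles; the function names and the 8-gram window are illustrative assumptions, not anything specified in the case:

  # Flag chats whose text shares long verbatim word runs with any article.
  def ngrams(text: str, n: int = 8) -> set:
      words = text.lower().split()
      return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

  def overlap_score(chat: str, article: str, n: int = 8) -> float:
      """Fraction of the article's n-grams appearing verbatim in the chat."""
      grams = ngrams(article, n)
      return len(grams & ngrams(chat, n)) / len(grams) if grams else 0.0

  def flag_matches(chats: list, articles: list, threshold: float = 0.2):
      """Yield (chat_index, article_index, score) pairs worth human review."""
      for ci, chat in enumerate(chats):
          for ai, article in enumerate(articles):
              score = overlap_score(chat, article)
              if score >= threshold:
                  yield ci, ai, score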

sroussey
6d ago
1 reply
> The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.

This is nonsense. I’ve personally been involved in these things, and fought to protect user privacy at all levels and never lost.

giraffe_lady
6d ago
1 reply
You've successfully fought a subpoena on the basis of a third party's privacy? More than once? I'd love to hear more.
sroussey
4d ago
1 reply
I was CEO of a small startup called Network54 with about 4 million monthly users. It was a forum hosting service.

The early 2000s were the heyday of lawsuits. People would say something about someone and if that someone was rich they would sue. It happened often.

The attorneys would sue us, the domain registrar, the ISP, everyone.

Often the things said were true. But they would sue to find out who the people were.

People selling Ponzi schemes, CEOs of public companies trying to find which union employees to fire, it was all over the place.

We would file to quash every time. File to move venues to CA, which has anti-SLAPP laws. Depositions in DC. It was very distracting and expensive.

Never lost. Made some people really mad that they didn’t get their way.

Now for criminal things, the opposite, sorry. When you're a two-person operation and the FBI walks into your office with a warrant, then it's yes sir, let me see the warrant first. If no warrant, then sorry sir, come back with a warrant, but we will take this as notice to soft-delete, not hard-delete, content.

giraffe_lady
4d ago
1 reply
Ah interesting, thanks for answering.

I've been in the situation of being instructed to pull unredacted logs for a subpoena before when I really did not think it was appropriate. I was just an IC but I talked to a lawyer about it. Since the company I worked for was not willing to fight it, my options were pull the logs, quit the job, or possibly catch a contempt charge.

It seems like everyone who is not the CEO or maybe the legal dept has much more constrained choices in this situation. I also wonder if the timeframes matter here, how much things may have changed in two decades. My experience with it was only a couple years ago, and I was surprised they chose not to fight it but presumably they know more about the chances of success than I do.

sroussey
3d ago
Yahoo got sued for not fighting it long enough to give the third party a chance to quash on their own. If I remember correctly, Yahoo lost. But the case had a good argument about fairness to the little people whose data is just given away, with people fired or harassed because of it.

Anyhow, we worked with Public Citizen on a couple of cases, and they were willing to fund them up to the Supreme Court in order to set good precedent.

tantalor
6d ago
4 replies
> The user has no right to privacy

The correct term for this is prima facie right.

You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.

Similarly, liberty is a prima facie right; you can be arrested for committing a crime.

SilverElfin
6d ago
1 reply
Is there any evaluation of which right or which harm is larger? It seems like the idea that one outweighs another is arbitrary. Is there a principled thing behind it?
prasadjoglekar
6d ago
That's what the court is for: weighing the different arguments and applying precedents.
rpdillon
5d ago
1 reply
Seems to me my right to privacy is far more important than their right to copyright enforcement.
freejazz
5d ago
Have you read OpenAI's terms of service? Which part is being violated by producing anonymized logs in response to discovery? OpenAI's ToS state that they will produce your data in response to discovery. What's not clicking for you?
ronsor
6d ago
> enforcing the rights of others under copyright law

I certainly do not care about copyright more than my own privacy, and I certainly don't find that interest to be the public's interest, though perhaps it's the interest of legacy corporations and their lobbyists.

antonvs
6d ago
> You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.

What governs or codifies that? I would have expected that there would need to be some kind of specific overriding concern(s) that would need to apply in order to violate my (even limited) expectation of privacy, not just enforcing copyright law in general.

E.g. there's nothing resembling "probable cause" to search my own interactions with ChatGPT for such violations. On what basis can that be justified?

glenstein
6d ago
1 reply
>That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses.

The trouble with this logic is NYT already made that argument and lost as applied to an original discovery scope of 1.4 billion records. The question now is about a lower scope and about the means of review, and proposed processes for anonymization.

They have a right to some form of discovery, but not to a blank-check extrapolation that sidesteps the legitimate privacy issues raised both in OpenAI's statement and throughout this thread.

freejazz
5d ago
Again, as I've pointed out to you numerous times in this thread: OpenAI already represented to the court that the data was anonymized and that they can anonymize it, so you are significantly departing from the actual facts in your discussion here. There are no genuine privacy issues left. The data is anonymous, and it is under a protective order, so it must be maintained confidentially.
observationist
6d ago
1 reply
You don't hate the media nearly enough.

"Credible" my ass. They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles. OpenAI has taken measures to limit such methods and prevent arbitrary wholesale reproduction of copyrighted content since that time. That would have been the end of the situation if NYT was engaging in good faith.

The NYT is after what they consider "their" piece of the pie. They want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior. They haven't been injured, they were already dying, and this lawsuit is a hail mary attempt at grifting some life support.

Behavior like that of the NYT is why we can't have nice things. They're not entitled to exist, and by engaging in behavior like this, it makes me want them to stop existing, the faster, the better.

Copyright law is what you get when a bunch of lawyers figure out how to encode monetization of IP rights into the legal system, having paid legislators off over decades, such that the people who make the most money off of copyrights are effectively hoarding those copyrights and never actually produce anything or add value to the system. They rent-seek, gatekeep, and viciously drive off any attempts at reform or competition. Institutions that once produced valuable content instead coast on the efforts of their predecessors and invest the proceeds into lawsuits, lobbying, and the purchase of more IP.

They - the NYT - are exploiting a finely tuned and deliberately crafted set of laws meant to screw actual producers out of percentages. I'm not a huge OpenAI fan, but IP laws are a whole different level of corrupt stupidity at the societal scale. It's gotcha games all the way down, and we should absolutely and ruthlessly burn down that system of rules and salt the ground over it. There are trivially better systems that can be explained in a single paragraph, instead of requiring books worth of legal code and complexities.

totallymike
6d ago
2 replies
I'm not a fan of NYT either, but this feels like you're stretching for your conclusion:

> They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles....would have been the end of the situation if NYT was engaging in good faith.

I mean, if I was performing a bunch of investigative work and my publication was considered the source of truth in a great deal of journalistic effort and publication of information, and somebody just stole my newspaper off the back of a delivery truck every day and started rewriting my articles, and then suddenly nobody read my paper anymore because they could just ask chatgpt for free, that's a loss for everyone, right?

Even if I disagree with how they editorialize, the Times still does a hell of a lot of journalism, and chatgpt can never, and will never, be able to actually do journalism.

> they want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior

I'd love to hear exactly what you mean by this.

Between what and what are they trying to insert themselves as middlemen, and why is chatgpt the victim in their attempts to do it?

What does 'rent seeking' mean in this context?

What does 'second hander' mean?

I'm guessing that 'sleazy lawyer' is added as an intensifier, but I'm curious if it means something more specific than that as well, I suppose.

> Copyright law....the rest of it

Yeah. IP rights and laws are fucked basically everywhere. I'm not smart enough to think of ways to fix it, though. If you've got some viable ideas, let's go fix it. Until then, the Times kinda needs to work with what we've got. Otherwise, OpenAI is going to keep taking their lunch money, along with every other journalist's on the internet, until there's no lunch money to be had from anyone.

terminalshort
6d ago
1 reply
> my publication was considered the source of truth

Their publication is not considered the source of truth, at least not by anyone with a brain.

totallymike
6d ago
They are still considered a paper of record, but I chose to use a hypothetical outfit because, while I don’t love the Times myself, I believe the argument to be valid.

I’m not interested in arguing about whether or not they deserve to fail, because that whole discussion is orthogonal to whether OpenAI is in the wrong.

If I’m on my deathbed and somebody tries to smother me, I still hope they face consequences.

rpdillon
5d ago
> then suddenly nobody read my paper anymore

This is the part the Times won't talk about, because people stopped reading their paper long before AI, and they haven't been able to point to any credible harm in terms of reduced readership as a result of OpenAI launching. They just think that people might be using ChatGPT to read the New York Times without paying. But it's not a very good hypothesis, because that's not what ChatGPT is good at.

It's like the people filing the lawsuit don't really understand the technology at all.

antonvs
6d ago
1 reply
> The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.

The legal term is "expectation of privacy", and it does exist, albeit increasingly weakly in the US. There are exceptions to that, such as a subpoena, but that doesn't mean anyone can subpoena anything for any reason. There has to be a legal justification.

It's not clear to me that such a justification exists in this case.

amanaplanacanal
5d ago
That's why there is someone trained in the law (the judge) to make that determination.
realusername
6d ago
1 reply
> NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"

Credible to whom? In their supposed "investigation", they sent a whole page of text and complex pre-prompting and still failed to get the exact content back word for word. Something users would never do anyways.

And that's probably the best they've got as they didn't publish other attempts.

mikkupikku
6d ago
1 reply
Agreed, they could carefully coerce the model to more or less output some of their articles, but the premise that users were routinely doing this to bypass the paywall is silly.
terminalshort
6d ago
2 replies
Especially when you can just copy-paste the URL into the Internet Archive and read it. And yet they aren't suing the Internet Archive.
realusername
6d ago
Let's be real, they are suing OpenAI because they have way more money than the Internet Archive and they would be happy with a cut
acdha
6d ago
Copyright law isn’t binary and has long-running allowances for fair use which take into consideration factors like scale, revenue, and whether it replaces the original. As a real non-profit, the Internet Archive is not selling its copies of the NYT and it’s always giving full credit to the source. In contrast, ChatGPT does charge for their output and while it may give citations that’s not a given.
terminalshort
6d ago
1 reply
Even if OpenAI is reproducing pieces of NYT articles, the NYT still has a difficult argument, because in no way is it a practical means of accessing paywalled NYT content, especially compared to alternatives. The entire value proposition of the NYT is news coverage, and probably 99.9% of their page views are from stories posted so recently that they aren't even in the training set of LLMs yet. If I want to reproduce a NYT story from an LLM, it's a prompt engineering mess, and I can only get old ones. On the other hand, I can read any NYT story from today by archiving it: https://archive.is/5iVIE. So why is the NYT suing OpenAI and not the Internet Archive?
freejazz
5d ago
1 reply
OpenAI is not allowed to reproduce the NYT's articles, that's copyright infringement. It does not really matter if it is a practical thing or not, that would only go to damages, not liability.
terminalshort
5d ago
1 reply
What do you think it is you are liable for?
freejazz
5d ago
I'm confused. I don't think I'm liable for anything. I am not OpenAI.
protocolture
6d ago
1 reply
>NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content

They shouldn't have any rights to data after it's released.

>That's a question they fundamentally cannot answer without these chat logs.

They are causing more damage than anything chatGPT could have caused to NYT. Privacy needs to be held higher than corporate privilege.

>Think about it this way. Let's say this were a book store selling illegal copies of books.

Think of it this way, no book should be illegal.

>They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".

NYT glazers do more to uphold OpenAI as a privacy-respecting platform than OpenAI has ever done.

>If this never happens then the amount will be low.

Should be zero, plus compensation to the affected OpenAI users from NYT.

>The user has no right to privacy.

And this needs to be remedied immediately.

>The same as how any internet service can be (and have been) compelled to produce private messages.

And this needs to be remedied immediately.

hekkle
6d ago
1 reply
I get that you're mad, and rightly so, about an invasion of your privacy, but the NYT would be foolish to use any of your data for anything other than this lawsuit, or not to delete it afterwards, as requested.

They can't use this data against any individual, even if they explicitly asked, "How do I hack the NYT?"

The only potential issue is them finding something juicy in someone's chat, that they could publish as a story; and then claiming they found out about this juicy story through other means, (such as a confidential informant), but that's not likely an issue for the average punter to be concerned about.

Aerroon
5d ago
1 reply
>The only potential issue is them finding something juicy in someone's chat, that they could publish as a story; and then claiming they found out about this juicy story through other means, (such as a confidential informant)

Which is concerning since this is a news organization that's getting the data.

Let's say they do find some juicy detail and use it, then what? Nothing. It's not like you can ever fix a privacy violation. Nobody involved would get a serious punishment, like prison time, either.

freejazz
5d ago
1 reply
>Let's say they do find some juicy detail and use it, then what? Nothing. It's not like you can ever fix a privacy violation. Nobody involved would get a serious punishment, like prison time, either.

There are no privacy violations. OpenAI already told the court they anonymized it. What they say in court and what they say in the blog is different and so many people here are (unfortunately) falling for it!

Aerroon
5d ago
1 reply
There's no such thing. Anonymized data can still be used to identify someone as we've seen on numerous occasions.
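A toy illustration of the linkage attacks the comment above alludes to: joining a de-identified log against auxiliary public records on quasi-identifiers (here, ZIP code and birth year) can re-attach names. All data below is invented for the sketch:

  # "Anonymized" rows still carry quasi-identifiers.
  anonymized_logs = [
      {"user": "u_1", "zip": "10027", "birth_year": 1984, "query": "..."},
      {"user": "u_2", "zip": "94110", "birth_year": 1991, "query": "..."},
  ]

  # Auxiliary data an attacker might already have (voter rolls, leaks, ...).
  public_records = [
      {"name": "Alice Example", "zip": "10027", "birth_year": 1984},
      {"name": "Bob Example", "zip": "94110", "birth_year": 1991},
  ]

  def reidentify(logs, aux):
      """Link log rows to names where (zip, birth_year) is unique in aux."""
      index = {}
      for person in aux:
          index.setdefault((person["zip"], person["birth_year"]), []).append(person)
      for row in logs:
          matches = index.get((row["zip"], row["birth_year"]), [])
          if len(matches) == 1:  # quasi-identifiers pin down exactly one person
              yield row["user"], matches[0]["name"]

  print(list(reidentify(anonymized_logs, public_records)))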
freejazz
4d ago
Read the ToS next time
throw20251110
6d ago
> Think about it this way. Let's say this were a book store selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs". The whole log needs to be produced otherwise you can't really trust that this is the real log.

Your claim doesn’t hold up, my friend. It’s inaccurate because nobody archives an entire dialogue with a seller for the record, and you certainly don’t have to show identification to purchase a book.

rpdillon
5d ago
It's not credible. Using AI to regurgitate news articles is not a good use of the tool, and it is not credible that any statistically significant portion of their user base is using the tool for that.
jcranmer
6d ago
1 reply
> In copyright cases, typically you need to show some kind of harm.

NYT is suing for statutory copyright infringement. That means you only need to demonstrate the infringement itself, since the infringement alone is considered harm; the actual harm only matters if you're suing for actual damages.

This case really comes down to the very unsolved question of whether or not AI training and regurgitation is copyright infringement, and if so, whether it's fair use. The actual ways the AI is being used are thus very relevant for the case, and totally within the bounds of discovery. Of course, OpenAI has also been engaging in this lawsuit with unclean hands in the first place (see some of their earlier discovery dispute fuckery), and they're one of the companies with the strongest "the law doesn't apply to us because we're AI and big tech" swagger.

Workaccount2
6d ago
5 replies
NYT doesn't care about regurgitation. When it was doable, it was spotty enough that no one would rely on it. But now the "trick" doesn't even work anymore (you would paste the start of an article and chatgpt would continue it).

What they want is to kill training, and moreover, prevent the loss of being the middle-man between events and users.
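For reference, a minimal sketch of the "continuation" probe described above, assuming the official OpenAI Python client; the model name, prompt, and 50-word check are illustrative assumptions, not what the NYT's experts actually ran:

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  def continuation_probe(article_opening: str, article_rest: str) -> bool:
      """Ask the model to continue an article; check for verbatim overlap."""
      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # illustrative model choice
          messages=[{"role": "user",
                     "content": f"Continue this text:\n\n{article_opening}"}],
          temperature=0,
      )
      continuation = resp.choices[0].message.content or ""
      # Crude verbatim test: does the reply contain the next ~50 words?
      target = " ".join(article_rest.split()[:50]).lower()
      return target in " ".join(continuation.split()).lower()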

totallymike
6d ago
1 reply
> prevent the loss of being the middle-man between events and users

I'm confused by this phrase. I may be misreading but it sounds like you're frustrated, or at least cynical about NYT wanting to preserve their business model of writing about things that happen and selling the publication. To me it seems reasonable they'd want to keep doing that, and to protect their content from being stolen.

They certainly aren't the sole publication of written content about current events, so calling them "the middle-man between events and users" feels a bit strange.

If your concern is that they're trying to prevent OpenAI from getting a foot in the door of journalism, that confuses me even more. There are so, so many sources of news: other news agencies, independent journalists, randos spreading word-of-mouth information.

It is impossible for chatgpt to take over any aspect of being a "middle-man between events and users" because it can't tell you the news. It can only resynthesize journalism that it's stolen from somewhere else, and without stealing from others, it would be worse than the least reliable of the above sources. How could it ever be anything else?

This right here feels like probably a good understanding of why NYT wants openai to keep their gross little paws off their content. If I stole a newspaper off the back of a truck, and then turned around and charged $200 a month for the service of plagiarizing it to my customers, I would not be surprised if the Times's finest lawyers knocked on my door either.

Then again, I may be misinterpreting what you said. I tend to side with people who sue LLM companies for gobbling up all their work and regurgitating it, and spend zero effort trying to avoid that bias

rpdillon
5d ago
1 reply
> preserve their business model of writing about things that happen and selling the publication. To me it seems reasonable they'd want to keep doing that

Be very wary of companies that look to change the landscape to preserve their business model. They are almost always regressive in trying to prevent the emergence of something useful and new because it challenges their revenue stream. The New York Times should be developing their own AI and should not be ignoring the march of technological progress, but instead they are choosing to lawyer up and use the legal system to try to prevent progress. I don't have any sympathy for them; there is no right to a business model.

totallymike
5d ago
This feels less like changing the landscape and more like trying to stop a new neighbor from building a four-level shopping complex in front of your beach-front property while also strip-mining the forest behind.

As for whether the Times should be developing their own LLM bot, why on earth would they want that?

sfink
6d ago
2 replies
> What they want is to kill training, and moreover, prevent the loss of being the middle-man between events and users.

So... they want to continue reporting news, and they don't want their news reports to be presented to users in a place where those users are paying someone else and not them. How horrible of them?

If NYT is not reporting news, then NYT news reports will not be available for AIs to ingest. They can perhaps still get some of that data from elsewhere, perhaps from places that don't worry about the accuracy of the news (or that intentionally produce inaccurate news). You have to get signal from somewhere, the noise alone isn't enough, and killing off the existing sources of signal (the few remaining ones) is going to make that a lot harder.

The question is, does journalism have a place in a world with AIs, and should OpenAI be the one deciding the answer to that question?

rpdillon
5d ago
The problem is that the publishing industry seems to think their job is to print ink on paper, and they reluctantly admit that this probably also involves putting pixels on a screen.

They're hideously anti-tech, and they completely ignore technological advancement when thinking about the scope of their product. Instead of investing millions of dollars in developing their own AI solutions, a New York Times answer machine, they pay those millions of dollars to lawyers and sue the people building the answer machines. It's entirely the wrong strategy, it's regressive, and yes, they are to blame for it.

The biggest bug I've observed in my life is that people think technology is its own sector when really it's a cross-cutting concern that everybody needs to be thinking about.

Workaccount2
6d ago
It's easy to see a future where primary sources post their information directly online (already largely the case) and AI agents make tailored, interactive news for their users.

Sure, there may still be investigative journalism and long form, but those are hardly the money makers.

Also, just like SWEs, writers have that same "do I have a place in the future?" anxiety in the back of their heads.

The media is very hostile towards AI, and the threat is on multiple levels.

SilverElfin
6d ago
1 reply
It’s more than middleman, right? If visits to the NYT drop, then they get less ad revenue and their ability to do business goes away. On the other hand, if they demand licensing fees, then they’ll just be marginalized by other news sources anyway.
rpdillon
5d ago
Notably absent from their complaint is any suggestion that they've been harmed by a reduction in readership as a result of OpenAI's emergence.
watwut
6d ago
> prevent the loss of being the middle-man between events and users.

OpenAI is free to do its own reporting. The NY Times is nowhere near trying to prevent others from competing as middlemen.

itsnibs
6d ago
It sounds like the defendant would much prefer middle-men who do not have the resources to enforce copyright.
troyvit
6d ago
1 reply
It's a part of privacy policy boilerplate that if a company is compelled by the courts to give up its logs it'll do it. I'm sure all of OpenAI's users read that policy before they started spilling their guts to a bot, right? Or at least had an LLM summarize it for them?
Rastonbury
6d ago
This is it, isn't it? For any technology, I don't think anyone should have an expectation of privacy from lawyers if the company that has your data is brought to court.
Noaidi
6d ago
2 replies
To show harm they need proof; that is the point of the lawsuit. They have sufficient evidence that OpenAI was scraping the web and the NY Times.

When Altman says "They claim they might find examples of you using ChatGPT to try to get around their paywall." he is blatantly misrepresenting the case.

https://smithhopen.com/2025/07/17/nyt-v-openai-microsoft-ai-...

"The lawsuit focuses on using copyrighted material for AI training. The NYT says OpenAI and Microsoft copied vast amounts of its content. They did this to build generative AI tools. These tools can output near-exact copies of NYT articles. Therefore, the NYT argues this breaks copyright laws. It also hurts journalism by skipping paywalls and cutting traffic to original sites. The complaint shows examples where ChatGPT mimics NYT stories closely. This could lead to money loss and harm from AI errors, called hallucinations."

This has nothing to do with the users, it has everything to do with OpenAI profiting off of pirated copyrighted material.

Also, Altman is getting scared because the NY Times proved to the judge that ChatGPT copied many articles:

"2025 brings big steps in the case. On March 26, 2025, Judge Sidney Stein rejected most of OpenAI’s dismissal motion. This lets the NYT’s main copyright claims go ahead. The judge pointed to “many” examples of ChatGPT copying NYT articles. He found them enough to continue. This ruling dropped some side claims, like unfair competition. But it kept direct and contributory infringement, plus DMCA breaches."

terminalshort
6d ago
1 reply
> The lawsuit focuses on using copyrighted material for AI training

Well that's going to go pretty poorly for them considering it has already been ruled fair use twice: https://www.whitecase.com/insight-alert/two-california-distr...

On the other hand, distributing copies of NYT content is actually a breach of copyright, but only if the NYT can prove it was actually happening.

rpdillon
5d ago
It's really interesting living through this revolution because it's pretty obvious to me that the outcome here needs to be that training is fair use, pirating materials you train on is not going to end up being okay, and the user of the AI tool will be responsible for whether or not the resulting work is infringing. AI tools that are predominantly designed for infringing use cases will of course be ruled against.

I feel like this is all so blindingly obvious and yet I feel like it's going to take us decades to get there. I guess the wheels of justice turn slowly.

rpdillon
6d ago
Training has sometimes been held to be fair use under certain circumstances, but in determining fair use, one of the four factors that is considered is how it affects the market for the work being infringed. I would expect that determining to what degree it's regurgitating the New York Times' content is part of that analysis.
otterley
6d ago
1 reply
> This case is unusual because the New York Times can't point to any harm

It helps to read the complaint. If that was the case, the case would have been subject to a Rule 12(b)(6) (failure to state a claim for which relief can be granted) challenge and closed.

Complaint: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

See pages 60ff.

rpdillon
6d ago
2 replies
My observation is that section does not articulate any harm. It _claims_ harm, but doesn't actually explain what the harm is. Reduced profits? Lower readership? All they say is "OpenAI violated our copyrights, and we deserve money."

> 167. As a direct and proximate result of Defendants’ infringing conduct alleged herein, The Times has sustained and will continue to sustain substantial, immediate, and irreparable injury for which there is no adequate remedy at law. Unless Defendants’ infringing conduct is enjoined by this Court, Defendants have demonstrated an intent to continue to infringe the copyrighted works. The Times therefore is entitled to permanent injunctive relief restraining and enjoining Defendants’ ongoing infringing conduct.

> 168. The Times is further entitled to recover statutory damages, actual damages, restitution of profits, attorneys’ fees, and other remedies provided by law.

They're simply claiming harm, nothing more. I want to see injuries, scars, and blood if there's harm. As far as I can tell, the NYT was on the ropes long before AI came along. If they could actually articulate any harm, they wouldn't need to read through everyone's chats.

otterley
6d ago
1 reply
> As a direct and proximate result of Defendants’ infringing conduct alleged herein, The Times has sustained and will continue to sustain substantial, immediate, and irreparable injury for which there is no adequate remedy at law. Unless Defendants’ infringing conduct is enjoined by this Court, Defendants have demonstrated an intent to continue to infringe the copyrighted works. The Times therefore is entitled to permanent injunctive relief restraining and enjoining Defendants’ ongoing infringing conduct.

This is boilerplate language in a claim seeking injunctive relief. In contract law in law school, you learn there's a historical difference between cases at law (where the only remedy is money) and cases in equity (where the court can issue injunctions). If you want to stop someone from violating your rights, you claim "irreparable injury" (that is, money isn't enough) and ask for the court in equity to issue an injunction.

> It _claims_ harm, but doesn't actually explain what the harm is. Reduced profits? Lower readership? All they say is "OpenAI violated our copyrights, and we deserve money."

Copyright violation, in and of itself, constitutes a judicially cognizable injury. It's a violation of a type of property right - that is, the right to exclude others from using your artistic works without your permission. The Copyright Act specifies that victims of copyright infringement are not only entitled to an injunction, but also to statutory damages as well as compensatory damages to be determined by a jury. See 17 U.S.C. § 504.

Similarly, you don't have to claim a specific injury in a garden-variety trespass action. The violation of your property rights is enough.

rpdillon
5d ago
1 reply
Very much appreciate the clarification and nuance here. I understand that legally they don't have to provide any of this detail, but I'm also somewhat astonished that there doesn't appear to be any evidence that they've been harmed in any way other than them claiming that they are.
otterley
5d ago
It’s because 1/the damages aren’t clearly articulable and would be speculative at the time of filing, and 2/they don’t have to claim the specific nature of the injury at this point in the case.
terminalshort
6d ago
> sustain substantial, immediate, and irreparable

Furthermore, any alleged injury is absolutely reparable. How many times did OpenAI replicate their content and how many page views did they lose to it? Very reparable monetary damages, if it did in fact occur (and I'm pretty sure it didn't).

vintagedave
6d ago
100% agreed. In the time you wrote this, I also posted: https://news.ycombinator.com/item?id=45901054

I felt quite some disappointment with the comments I saw on the thread at that time.

ozgrakkurt
6d ago
It is better if it is out in the open than having just a select few diabolical organizations with access to it.
phendrenad2
6d ago
> This would allow them to access millions of user conversations that are unrelated to the case

It feels like the NYT is really fishing for inside information on how GPT is used so they can run statistical analysis and write articles about it. E.g., if they find examples of racism, they can get some great articles about how racism is rampant on GPT or something.

themafia
6d ago
> having with OpenAI in private

I don't believe that OpenAI, or any American corporation, has the wherewithal to actually maintain _your_ privacy in the face of _their_ profitability.

> typically you need to show some kind of harm.

You copied my material without my permission. I've been harmed. That right is independent of pricing. Otherwise Napster would never have generated legal cases.

> It's quite literally a fishing expedition.

It's why American courts are awesome.

stocksinsmocks
6d ago
No doubt. I’m sure NYT sees an opportunity to buy a few more years of life support by pickpocketing the conductor of the AI gravy train. When Sam Altman and the Sulzbergers fight though, as a normal person, my hope is that they destroy each other.

I think the winners are Chinese (and by extension OSS) models, as they can ignore copyright. A net win, I think.

Workaccount2
6d ago
The original lawsuit has lots of examples of ChatGPT (3.5? 4?) regurgitating article...snippets. They could get a few paragraphs with ~80-90% perfect replication. But certainly not full articles, with full accuracy.

This wasn't solid enough for a summary judgment, and it seems the labs have largely figured out how to stop the models from doing this. So it looks like the NYT wants to comb all user chats rather than pay a team of people tens of thousands a day to try to coax articles out of ChatGPT-5.

nlh
6d ago
1 reply
Man, maybe I'm getting old and jaded, but it's not often that I read a post that literally makes my skin crawl.

This is so transparently icky. "Oh woe is us! We're being sued and we're looking out for YOU, the user, who is definitely not the product. We are just a lil' ol' (near) trillion-dollar business trying to protect you!"

Come ON.

Look, I don't actually know who's in the right in the OAI vs. NYT dispute, and frankly I personally lean more toward the side that says you are allowed to train models on the world's information as long as you consume it legally and don't violate copyright.

But this transparent attempt to get user sympathy under insanely disingenuous pretenses is just absurd.

greyman
6d ago
1 reply
Why is it absurd? Conversations between me and ChatGPT can be read by a lawyer working for the NYT, and that is what is absurd.
HelloMcFly
6d ago
OpenAI has seemingly done everything they can to put publishers in a position to make this demand, and they've certainly not done anything to make it impossible for them to respond to it. Is there a better, more privacy minded way for NYT to get the data they need? Probably, I'm not smart enough to understand all the things that go into such a decision. But I know I don't view them as the villain for asking, and I also know I don't view OpenAI as some sort of guardian of my or my data's best interests.
nerdjon
6d ago
4 replies
This is about as genuine as Google saying anything about privacy.

Both companies are clearly wrong here. There is a small part of me that kinda wants OpenAI to lose this, just so maybe it will be a wake-up call to people putting way too much personal information into these services? Am I too hopeful here that people will learn anything...

Fundamentally I agree with what they are saying though, just don't find it genuine in the slightest coming from them.

stevarino
6d ago
2 replies
It's clearly propaganda. "Your data belongs to you." I'm sure the ToS says otherwise, as OpenAI likely owns and utilizes this data. Yes, they say they are working on end-to-end encryption (whatever that means when they control one end), but that is just a proposal at this point.

Also their framing of the NYT intent makes me strongly distrust anything they say. Sit down with a third party interviewer who asks challenging questions, and I'll pay attention.

BolexNOLA
6d ago
1 reply
>your data belongs to you

…”as does any culpability for poisoning yourself, suicide, and anything else we clearly enabled but don’t want to be blamed for!”

Edit: honestly I’m surprised I left out the bit where they just indiscriminately scraped everything they could find online to train these models. The stones to say “your data belongs to you” while they clearly feel entitled to our data are unbelievably absurd.

gruez
6d ago
4 replies
>…”as does any culpability for poisoning yourself, suicide, and anything else we clearly enabled but don’t want to be blamed for!”

Should walmart be "culpable" for selling rope that someone hanged themselves with? Should google be "culpable" for returning results about how to commit suicide?

hitarpetar
6d ago
3 replies
do you know what happens when you Google how to commit suicide?
gruez
6d ago
1 reply
The same that happens with chatgpt? ie. if you do it in an overt way you get a canned suicide prevention result, but you can still get the "real" results if you try hard enough to work around the safety measures.
littlestymaar
6d ago
1 reply
Except Google will never encourage you to do it, unlike the sycophantic Chatbot that will.
BolexNOLA
6d ago
The moment we learned ChatGPT helped a teen figure out not just how to take their own life but how to make sure no one can stop them mid-act, we should've been mortified and had a discussion.

But we also decided via Sandy Hook that children can be slaughtered on the altar of the second amendment without any introspection, so I mean...were we ever seriously going to have that discussion?

https://www.nbcnews.com/tech/tech-news/family-teenager-died-...

>Please don't leave the noose out… Let's make this space the first place where someone actually sees you.

How is this not terrifying to read?

glitchc
6d ago
1 reply
Actually, the first result is the suicide hotline. This is at least true in the US.
hitarpetar
6d ago
my point is, clearly there is a sense of liability/responsibility/whatever you want to call it. not really the same as selling rope, rope doesn't come with suicide warnings
tremon
6d ago
An exec loses its wings?
BolexNOLA
6d ago
1 reply
This is as unproductive as "guns don't kill people, people do." You're stripping all legitimacy and nuance from the conversation with an overly simplistic response.
gruez
6d ago
1 reply
>You're stripping all legitimacy and nuance from the conversation with an overly simplistic response.

An overly simplistic claim only deserves an overly simplistic response.

BolexNOLA
6d ago
1 reply
What? The claim is true. The nuance is us discussing if it should be true/allowed. You're simplifying the moral discussion and overall just being rude/dismissive.

Comparing rope and an LLM comes across as disingenuous. I struggle to believe that you believe the two are comparable when it comes to the ethics of companies and their impact on society.

ImPostingOnHN
6d ago
1 reply
> Comparing rope and an LLM comes across as disingenuous.

What makes you feel that? Both are tools, both have a wide array of good and bad uses. Maybe it'd be clearer if you explained why you think the two are incomparable except in cases of disingenuousness?

Remember that things are only compared when they are different -- you wouldn't often compare a thing to itself. So, differences don't inherently make things incomparable.

> I struggle to believe that you believe the two are comparable when it comes to the ethics of companies and their impact on society.

I encourage you to broaden your perspectives. For example: I don't struggle to believe that you disagree with the analogy, because smart people disagree with things all the time.

What kind of a conversation would such a rude, dismissive judgement make, anyways? "I have judged that nobody actually believes anything that disagrees with me, therefore my opinions are unanimous and unrivaled!"

BolexNOLA
6d ago
2 replies
A rope isn’t going to tell you to make sure you don’t leave it out on your bed so your loved ones can’t stop you from carrying out the suicide it helped talk you into.
ImPostingOnHN
5d ago
You are 100% right, a rope likely isn't going to tell you anything. There's one of those differences I mentioned which makes comparisons useful. We could probably name a few differences!

So, what makes you think comparing the 2 tools is invalid? You just compared them yourself, and I don't think you were being disingenuous.

rpdillon
5d ago
This is a good observation! The LLM can tell you to kill yourself. The rope can actually help you do it.
Wistar
6d ago
There are current litigation efforts to hold Amazon liable for suicides committed by, in particular, self-poisoning with high-purity sodium nitrite, which in low concentrations is used as a meat-curing agent.

A 2023 lawsuit against Amazon over suicides with sodium nitrite was dismissed, but other similar lawsuits continue. The judge held that Amazon, “… had no duty to provide additional warnings, which in this case would not have prevented the deaths, and that Washington law preempted the negligence claims.“

thinkingtoilet
6d ago
That depends. Does the rope encourage vulnerable people to kill themselves and tell them how to do it? If so, then yes.
preinheimer
6d ago
3 replies
"Your data belongs to you" but we can take any of your data we can find and use it for free for ever, without crediting you, notifying you, or giving you any way of having it removed.
glitchc
6d ago
It's owned by you, but OpenAI has a "perpetual, irrevocable, royalty-free license" to use the data as they see fit.
bigyabai
6d ago
Wow it's almost like privately-managed security is a joke that just turns into de-facto surveillance at-scale.
thinkingtoilet
6d ago
We can even download it illegally to train our models on it!
outside1234
6d ago
1 reply
Honestly the sooner OpenAI goes bankrupt the better. Just a totally corrupt firm.
fireflash38
6d ago
1 reply
I really should take the "invest in companies you hate" advice seriously.
outside1234
6d ago
1 reply
I don't hate them. It is just plain to see they have discovered no scalable business model outside of getting larger and larger amounts of capital from investors to utilize intellectual property from others (either directly in the model aka NYT, or indirectly via web searches) without any rights. It is better for all of us the sooner this fails.
frm88
6d ago
1 reply
to utilize intellectual property from others (either directly in the model aka NYT, or indirectly via web searches) without any rights

... and put the liability for retrieving said property, and hence the culpability for copyright infringement, on the end user:

Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.

https://www.reuters.com/world/german-court-sides-with-plaint...

rpdillon
5d ago
1 reply
But wait, isn't this what we want? This means the models can be very powerful and that people have to use their judgment when they produce output so that they are held accountable for whether or not they produced something that was infringing. Why is that a bad thing?
frm88
5d ago
Can I ask why the end user would be punishable for the pirating OpenAI did? That would mean governments have to take the next step to protect copyrighted material, and what we'd face then I don't even dare to imagine.
98codes
6d ago
1 reply
I got one sentence in and thought to myself, "This is about discovery, isn't it?"

And lo, complaints about plaintiffs started before I even had to scroll. If this company hadn't willy-nilly done everything they could to vacuum up the world's data, wherever it may be, however it may have been protected, then maybe they wouldn't be in this predicament.

rpdillon
5d ago
1 reply
How do you feel about Google vacuuming up the world's data when they created a search engine? I feel like everybody just ignores this because Google was ostensibly sending traffic back to the sites it scraped. The actual infringement of scraping should be identical between OpenAI and Google. Why is nobody complaining about Google scraping their sites? Is it only because they're getting paid off not to complain?

Everybody acts like this is a moral argument when really it's about whether or not they're getting a piece of the pie.

watwut
5d ago
At the time Google created its search engine, they weren't showing the data themselves; they were pointing to where it lived. When they started actually printing articles themselves, they got sued. Showing where a thing is and showing the content of the thing are two different actions.

So when Google did the same thing, there were complaints.

> Why is nobody complaining about Google scraping their sites?

And second, search engines were actually pretty gentle with their scraping. They needed the sites to keep working, so they respected robots.txt and made sure they wouldn't accidentally DDoS sites with too many requests. AI companies just DDoS sites, don't respect robots.txt, and if you block them, they'll come back from another of their seemingly infinite pool of IPs.

Put differently, even back then Google was at least trying to be an okay, non-evil citizen. They became sociopathic only much later, and even now they kind of try to hide it. OpenAI and the rest of the AI companies are openly sociopathic and proud of the damage they cause.
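
(To make that contrast concrete, here is a minimal sketch of the crawler politeness described above, using only Python's standard library. The bot name and fallback delay are illustrative assumptions, not any real crawler's behavior.)

    import time
    import urllib.robotparser
    import urllib.request

    USER_AGENT = "ExampleBot/1.0"   # hypothetical crawler name
    FALLBACK_DELAY = 5              # seconds; assumed when robots.txt sets no Crawl-delay

    def fetch_politely(url: str, robots_url: str) -> bytes | None:
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(robots_url)
        rp.read()
        if not rp.can_fetch(USER_AGENT, url):
            return None  # the site opted out; a polite crawler stops here
        # Rate-limit so we don't hammer the origin server
        time.sleep(rp.crawl_delay(USER_AGENT) or FALLBACK_DELAY)
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            return resp.read()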

stefan_
6d ago
Ironically, there is precedent of Google caring more about this. When they realized the location timeline was a gigantic fed honeypot, they made it per-device and locally stored only. No open letters were written in the process.
nrhrjrjrjtntbt
6d ago
3 replies
OpenAI is deservedly getting a beating in this HN comment section, but any thoughts on NYT overreach and what it means in general?

And what if they, for example, find evidence of some other thing, such as:

1. Something useful for a story, maybe they follow up in parallel. Know who to interview and what to ask?

2. A crime.

3. An ongoing crime.

4. Something else they can sue someone else for.

5. Top secret information.

totallymike
6d ago
1-5: not a concern

It'll be the lawyers who need to go through the data, and given the scale of it, they won't be able to do anything more than trawl for the evidence they need and find specific examples to cite. They don't give a shit if you're asking chatgpt how to put a hit out on your ex, and they're not there to editorialize.

I won't pretend to guess* how they'll perform the discovery, but I highly doubt it will result in humans reading more than a handful of the records outside of the ones surfaced by whatever method they use to automate the search.

If there's top-secret information in there, and it was somehow stumbled upon by one of these lawyers or a paralegal, I find it vanishingly unlikely they'd be stupid enough to do anything other than run directly to whomever rightfully possesses said information and say "hey, we found this in a place it shouldn't be," and then let them deal with it. Which is what we'd want them to do.

*Though if I had to speculate on how they'd do it, I do think the funniest way would be to feed the records back into chatgpt and ask it to point out all the times the records show evidence of infringement.

AlienRobot
6d ago
1. That sounds useful.

2. That sounds useful.

3. That sounds useful.

4. That sounds useful.

5. That sounds useful.

Are these supposed to be examples of things that shouldn't be found out about? This has to be the worst pro-privacy argument I've ever seen on the internet. "Privacy is good because they will find out about our crimes"

vintagedave
6d ago
4 replies
Almost every comment (five) so far is against this: 'An incredibly cynical attempt at spin', 'How dare the New York Times demand access to our vault of everything-we-keep to figure out if we're a bunch of lying asses', etc.

In direct contrast: I fully agree with OpenAI here. We can have a more nuanced opinion than 'piracy to train AI is bad, therefore refusing to share chats is bad', which sounds absurd but is genuinely the logic one of the other comments follows.

Privacy is paramount. People _trust_ that their chats are private: they ask sensitive questions, ones to do with intensely personal or private or confidential things. For that to be broken -- for a company to be compelled to open up users' private data -- is vile.

The tech community has largely stood against this kind of thing when it's been invasive scanning of private messages, tracking user data, etc. I hope we can collectively be better (I'm using ethical terms for a reason) than the other replies show. We don't have to support OpenAI's actions in order to oppose the NYT's actions.

glenstein
6d ago
1 reply
I suspect that many of those comments are from the Philosopher's Chair (aka the bathroom), and are not aspiring to be literal answers but are ways of saying "OpenAI Bad". But to your point, there should be privacy-preserving ways to comply, like user anonymization, tailored searches, and so on. It sounds like the NYT is proposing a random sampling of user data. But couldn't they instead do a random sampling of their most widely read articles, checking for positive hits, rather than reviewing content on a case-by-case basis?
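
(For what a privacy-preserving production along those lines could look like, here is a minimal sketch using only Python's standard library; the salt, truncation length, and sample size are illustrative assumptions, not anything actually proposed in the case.)

    import hashlib
    import hmac
    import random

    SECRET_SALT = b"rotate-me"  # hypothetical key, held only by the producing party

    def pseudonymize(user_id: str) -> str:
        # Keyed hash: stable across the production (one user's chats can still
        # be grouped together), but not reversible without the salt.
        return hmac.new(SECRET_SALT, user_id.encode(), hashlib.sha256).hexdigest()[:16]

    def sample_for_discovery(chats: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
        # chats: (user_id, chat_text) pairs; k: sample size the parties agree on.
        # Only the pseudonymized random sample leaves the company.
        return [(pseudonymize(uid), text) for uid, text in random.sample(chats, k)]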
vintagedave
6d ago
I hadn't heard of the philosopher's chair before, but I laughed :) Yes, I think those views were one-sided (OpenAI Bad) without thinking through other viewpoints.

IMO we can have multiple views over multiple companies and actions. And the sort of discussions I value here on HN are ones where people share insight, thought, show some amount of deeper thinking. I wanted to challenge for that with my comment.

_If_ we agree the NYT even has a reason to examine chats -- and I think even that is where the conversation should be -- I agree that there should be other ways to achieve it without violating privacy.

wkat4242
6d ago
> In direct contrast: I fully agree with OpenAI here. We can have a more nuanced opinion than 'piracy to train AI is bad, therefore refusing to share chats is bad', which sounds absurd but is genuinely the logic one of the other comments follows.

These chats only need to be shared because:

- OpenAI pirated masses of content in the first place

- OpenAI refuse to own up to it even now (they spin the NYT claims as "baseless").

I don't agree with them giving my chats out either, but the blame is not with the NYT in my opinion.

> We don't have to support OpenAI's actions in order to oppose the NYT's actions.

Well, the NYT action is more than just its own. If they win, it will set a precedent that lets other news outlets get money from OpenAI as well. Which makes a lot of sense; after all, they have billions to invest in hardware, why not in content?

And what alternative do they have? Without OpenAI giving access to the source materials used (I assume this was already asked for, because it's the most obvious route), there's not much else they can do. And OpenAI won't do that, because it would prove the NYT's point and leave them having to pay a lot to half the world.

It's important that this case is made, not just for the NYT but for journalism in general.

Peritract
6d ago
> The tech community has largely stood against this kind of thing when it's been invasive scanning of private messages, tracking user data

The tech community has been doing the scanning and tracking.

dangus
6d ago
OpenAI is the one who chose to store the information. Nobody twisted their arm to do so.

If you store data it can come up in discovery during lawsuits and criminal cases. Period.

E.g., if you store illegal materials on Google Drive, Google WILL turn them over to the authorities if a warrant or a lawsuit demands them in discovery.

E.g., my CEO writes an email telling the CFO that he doesn't want to issue a safety recall because it'll cost too much money. If I sue the company for injuring me through a product they knew to be defective, a subpoena in that civil suit can ask for all emails discussing the matter, and there's no magical wall of privacy where the company can just say “no, that's private information.”

At the same time, I don't get to trawl through the company's emails and use some email of the CEO flirting with their secretary as admissible evidence.

There are many ways the court can ensure privacy for individuals. Sexual assault victims don't have their evidence blasted across the airwaves just because the court needs to examine that physical evidence.

The only way to avoid this is not to collect the data in the first place, which is where end-to-end encryption with user-controlled keys, or simply not collecting information, comes into play.
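
(A minimal sketch of that last point, assuming the third-party Python cryptography package: if only the user ever holds the key, the provider can produce nothing but ciphertext in discovery.)

    # pip install cryptography
    from cryptography.fernet import Fernet

    user_key = Fernet.generate_key()  # generated and kept on the user's device
    box = Fernet(user_key)

    ciphertext = box.encrypt(b"my private chat")  # all the provider ever stores
    plaintext = box.decrypt(ciphertext)           # possible only for whoever holds user_key
    assert plaintext == b"my private chat"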

EdNutting
6d ago
1 reply
So why aren’t they offering for an independent auditor to come into OpenAI and inspect their data (without taking it outside of OpenAI’s systems)?

Probably because they have a lot to hide, a lot to lose, and no interest in fair play.

Theoretically, they could prove their tools aren't being used to do anything wrong; practically, we all know they can't, because they are actually in the wrong (in both the moral and, IMO though IANAL, the legal sense). They know it, we know it; the only problem is breaking the ridiculous walled garden that stops the courts from 'knowing' it.

glenstein
6d ago
1 reply
By the same token, why isn't NYT proposing something like that rather than the world's largest random sampling?

You don't have to think that OpenAI is good to think there's a legitimate issue with exposing data to a third party in discovery. One could see the Times discovering something in private conversations outside the scope of the case and, through their own interpretation of journalistic necessity, believing it's something they're obligated to publish.

Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.

freejazz
6d ago
1 reply
>By the same token, why isn't NYT proposing something like that rather than the world's largest random sampling?

It's OpenAI's data, there is a protective order in the case and OpenAI already agreed to anonymize it all.

>Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.

lol... what?

glenstein
6d ago
Discovery isn't binary yes/no, it involves competing proposals regarding methods and scope for satisfying information requests. Sometimes requests are egregious or excessive, sometimes they are reasonable and subject to excessively zealous pushback.

Maybe you didn't read TFA, but part of the case history was the NYT requesting 1.4 billion records in discovery and OpenAI successfully challenging that as unnecessary; the essence of TFA is advocating for an alternative to the scope of discovery the NYT is insisting on, hence the "not rolling over".

Try reading, it's fun!

grugagag
6d ago
Hypocrisy at best: this wall of text wasn't even penned by a human, and yet they want us to believe they care about user privacy...
Apreche
6d ago
Says the people who scraped as much private information as they could get their hands on to train their bots in the first place.
techblueberry
6d ago
I'll trust the people not asking for a government bailout, thank you very much.
eur0pa
6d ago
This is laughable
unyttigfjelltol
6d ago
If Donald Trump used this OpenAI product to -- who knows -- brainstorm Truth Social content, and his chats were produced to the NYT as well as its consultants and lawyers, who would believe Mr. Trump's content remained secure, confidential, and protected from misuse against his wishes?

That's simply a function of the fact it's a controversial news organization running a dragnet on private communications to a technology platform.

"Great cases, like hard cases, make bad law."

hlieberman
6d ago
An incredibly cynical attempt at spin from a former non-profit that renounced its founding principles. A class act, all around.
