Fighting the New York Times' invasion of user privacy
Mood: heated
Sentiment: negative
Category: tech
Key topics: privacy, New York Times, data protection
OpenAI is fighting back against the New York Times' alleged invasion of user privacy, sparking a heated debate.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 18m after posting
Peak period: 140 comments (Day 1)
Avg / period: 40 comments
Based on 160 loaded comments
Key moments
- 01 Story posted: 11/12/2025, 2:08:28 PM (6d ago)
- 02 First comment: 11/12/2025, 2:26:45 PM (18m after posting)
- 03 Peak activity: 140 comments in Day 1 (hottest window of the conversation)
- 04 Latest activity: 11/15/2025, 4:25:03 PM (3d ago)
-- openai
Remember, a corporation is generally an object owned by some group of people. Do you trust "unspecified future group of people" with your privacy? You can't. The best we can do is understand the information architecture and act accordingly.
I don’t recall seeing many food, furniture, plant, or generally any non-tech companies talking about trust, security, and privacy as guiding principles.
-- openai, probably.
OpenAI may be trying to paint themselves as the goody-two-shoes here, but they're not.
> We may use Personal Data for the following purposes: [...] To comply with legal obligations and to protect the rights, privacy, safety, or property of our users, OpenAI, or third parties.
OpenAI outright says it will give your conversations to people like lawyers.
If you thought they wouldn't give it out to third parties, you not only have not read OpenAI's privacy policy, you've not read any privacy policy from a big tech company (because all of them are basically maximalist "your privacy is important, we'll share your data only with us and people who we deem worthy of it, which turns out to be everybody.")
You can argue about "the expectation" of privacy all you want, but this is completely detached from reality. My assumption is that almost no third parties I share information with have magic immunity that prevents the information from being used in a legal action involving them.
Maybe my doctor? Maybe my lawyer? IANAL but I'm not even confident in those. If I text my friend saying their party last night was great and they're in court later and need to prove their whereabouts that night, I understand that my text is going to be used as evidence. That might be a private conversation, but it's not my data when I send it to someone else and give them permission to store it forever.
The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations. They claim they might find examples of you using ChatGPT to try to get around their paywall.
The Constitution is clear that the purpose of intellectual property is to promote progress. I feel that OpenAI is on the right side of that, and this is not IP theft as long as they aren't reproducing others' work in a non-transformative way.
Training the AI is clearly transformative (and lossy to boot). Giving the AI the ability to scrape and paraphrase others work is less clear and both sides each have valid arguments. I don't envy the judges that must make that call.
No, it's not. See the PDF of the actual case below.
The case is largely about OpenAI training on the NY Times articles without permission. They do allege that it can reproduce their articles verbatim at times, but that's not the central allegation as it's obviously a bug and not an intentional infringement. You have to get way down to item 98 before they even allege it.
https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
"Defendants have refused to recognize this protection. Powered by LLMs containing copies of Times content, Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples. See Exhibit J. These tools also wrongly attribute false information to The Times."
Still, that's a bug, not a feature. OpenAI will just respond that it's already been fixed and pay them damages of $2.50 or something to cover the few times it happened under very specific conditions.
Interestingly, the summary is made by taking screenshots of a (probably illegal) PDF it found someplace on the internet. It then cites that sketchy PDF as the source rather than linking back to the original NY Times articles.
If I were the NYT I would still be plenty pissed off.
ChatGPT's reference: https://d2dr22b2lm4tvw.cloudfront.net/ny_nyt/2025-11-13/fron... via https://frontpages.freedomforum.org/
But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.
In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.
It's quite literally a fishing expedition.
As a fan and DAU of both OpenAI and the NYT, this is just a weird discovery demand and there should be another pathway for these two to move fwd in this case (NYT to get some semblance of understanding, OAI protecting end-user privacy).
...had never been private in the first place.
Not only is the data used for refining the models; OpenAI has also shariah-policed plenty of people for generating erotica.
Also, you need to understand that for huge corps like OpenAI, lying in their ToS would do orders of magnitude more damage to their brand than whatever they'd gain by training on <1% more user chats. So no, they are not lying when they say they don't train on private chats.
Is this true? I can’t recall anything like this (look at Ashley Madison, which is alive and well).
You said there would be ‘orders of magnitude’ of brand damage. What is the proof?
But, all that aside, it seems that OpenAI is aiming to be bigger and more integrated into the day-to-day life of the average person than Ashley Madison, right?
That framing is rhetorically brilliant if you think about it. I will use that more. Chat Sharia Law for Chat Control. Mass Sharia Surveillance from Flock, etc.
NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"
That's a question they fundamentally cannot answer without these chat logs.
That's what discovery, especially in a copyright case, is about.
Think about it this way. Let's say this were a book store selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs". The whole log needs to be produced otherwise you can't really trust that this is the real log.
That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses. They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".
And the reason this evidence is relevant is it will directly feed into how much money NYT and OpenAI will ultimately settle for. If this never happens then the amount will be low. If it happens a lot the amount will be high. And if it goes to trial it will be used in the damages portion assuming NYT wins.
The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.
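A minimal sketch of the search described above, assuming nothing about NYT's actual methodology: finding "NYTimes text within the responses" amounts to scanning chat outputs for long verbatim overlaps with a corpus of articles. In Python, with made-up sample texts and an arbitrary 12-word threshold:

```python
# Hypothetical illustration of verbatim-overlap detection, not NYT's method.
# Flags chat responses that share long word-for-word passages with an article.

def ngrams(text: str, n: int = 12) -> set:
    """Return the set of n-word shingles in a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(article: str, response: str, n: int = 12) -> list:
    """List n-word passages appearing word-for-word in both texts."""
    shared = ngrams(article, n) & ngrams(response, n)
    return [" ".join(gram) for gram in sorted(shared)]

if __name__ == "__main__":
    article = ("the quick brown fox jumps over the lazy dog "
               "near the quiet riverbank at dawn every single morning")
    response = ("according to reports the quick brown fox jumps over "
                "the lazy dog near the quiet riverbank at dawn")
    for passage in verbatim_overlap(article, response):
        print("possible verbatim reproduction:", passage)
```

Run over anonymized logs, a scan like this would show how often NYT text appears in outputs without identifying who asked, which is roughly the narrower scope-and-anonymization fight described later in the thread.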
This is nonsense. I’ve personally been involved in these things, and fought to protect user privacy at all levels and never lost.
The early 2000s were the heyday of lawsuits. People would say something about someone and if that someone was rich they would sue. It happened often.
The attorneys would sue us, the domain registrar, the ISP, everyone.
Often the things said were true. But they would sue to find out who the people were.
People selling Ponzi schemes, CEOs of public companies trying to find what union employees to fire, it was all over the place.
We would file to quash every time. File to move venue to CA, which has anti-SLAPP laws. Depositions in DC. It was very distracting and expensive.
Never lost. Made some people really mad that they didn’t get their way.
Now for criminal things, the opposite, sorry. If you're a two-person operation and the FBI walks into your office with a warrant, then it's "yes sir, let me see the warrant first." If no warrant, then "sorry sir, come back with a warrant," but we will take this as notice to soft-delete, not hard-delete, content.
I've been in the situation of being instructed to pull unredacted logs for a subpoena before when I really did not think it was appropriate. I was just an IC but I talked to a lawyer about it. Since the company I worked for was not willing to fight it, my options were pull the logs, quit the job, or possibly catch a contempt charge.
It seems like everyone who is not the CEO or maybe the legal dept has much more constrained choices in this situation. I also wonder if the timeframes matter here, how much things may have changed in two decades. My experience with it was only a couple years ago, and I was surprised they chose not to fight it but presumably they know more about the chances of success than I do.
Anyhow, we worked with Public Citizen on a couple of cases, and they were willing to fund them up to the Supreme Court in order to set good precedent.
The correct term for this is prima facie right.
You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.
Similarly, liberty is a prima facie right; you can be arrested for committing a crime.
I certainly do not care about copyright more than my own privacy, and I certainly don't find that interest to be the public's interest, though perhaps it's the interest of legacy corporations and their lobbyists.
What governs or codifies that? I would have expected that there would need to be some kind of specific overriding concern(s) that would need to apply in order to violate my (even limited) expectation of privacy, not just enforcing copyright law in general.
E.g. there's nothing resembling "probable cause" to search my own interactions with ChatGPT for such violations. On what basis can that be justified?
The trouble with this logic is NYT already made that argument and lost as applied to an original discovery scope of 1.4 billion records. The question now is about a lower scope and about the means of review, and proposed processes for anonymization.
They have a right to some form of discovery, but not to a blank-check extrapolation that sidesteps legitimate privacy issues raised both in OpenAI's statement and throughout this thread.
"Credible" my ass. They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles. OpenAI has taken measures to limit such methods and prevent arbitrary wholesale reproduction of copyrighted content since that time. That would have been the end of the situation if NYT was engaging in good faith.
The NYT is after what they consider "their" piece of the pie. They want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior. They haven't been injured, they were already dying, and this lawsuit is a hail mary attempt at grifting some life support.
Behavior like that of the NYT is why we can't have nice things. They're not entitled to exist, and by engaging in behavior like this, it makes me want them to stop existing, the faster, the better.
Copyright law is what you get when a bunch of lawyers figure out how to encode monetization of IP rights into the legal system, having paid off legislators over decades, such that the people who make the most money off of copyrights are effectively hoarding those copyrights and never actually produce anything or add value to the system. They rent-seek, gatekeep, and viciously drive off any attempts at reform or competition. Institutions that once produced valuable content instead coast on the efforts of their predecessors and invest the proceeds into lawsuits, lobbying, and the purchase of more IP.
They - the NYT - are exploiting a finely tuned and deliberately crafted set of laws meant to screw actual producers out of percentages. I'm not a huge OpenAI fan, but IP laws are a whole different level of corrupt stupidity at the societal scale. It's gotcha games all the way down, and we should absolutely and ruthlessly burn down that system of rules and salt the ground over it. There are trivially better systems that can be explained in a single paragraph, instead of requiring books worth of legal code and complexities.
> They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles....would have been the end of the situation if NYT was engaging in good faith.
I mean, if I was performing a bunch of investigative work and my publication was considered the source of truth in a great deal of journalistic effort and publication of information, and somebody just stole my newspaper off the back of a delivery truck every day and started rewriting my articles, and then suddenly nobody read my paper anymore because they could just ask chatgpt for free, that's a loss for everyone, right?
Even if I disagree with how they editorialize, the Times still does a hell of a lot of journalism, and chatgpt can never, and will never be able to actually do journalism.
> they want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior
I'd love to hear exactly what you mean by this.
Between what and what are they trying to insert themselves as middlemen, and why is chatgpt the victim in their attempts to do it?
What does 'rent seeking' mean in this context?
What does 'second hander' mean?
I'm guessing that 'sleazy lawyer' is added as an intensifier, but I'm curious if it means something more specific than that as well, I suppose.
> Copyright law....the rest of it
Yeah. IP rights and laws are fucked basically everywhere. I'm not smart enough to think of ways to fix it, though. If you've got some viable ideas, let's go fix it. Until then, the Times kinda needs to work with what we've got. Otherwise, OpenAI is going to keep taking their lunch money, along with every other journalist's on the internet, until there's no lunch money to be had from anyone.
Their publication is not considered the source of truth, at least not by anyone with a brain.
I’m not interested in arguing about whether or not they deserve to fail, because that whole discussion is orthogonal to whether OpenAI is in the wrong.
If I’m on my deathbed and somebody tries to smother me, I still hope they face consequences.
This is the part that the Times won't talk about, because people stopped reading their paper long before AI, and they haven't been able to point to any credible harm in terms of reduced readership as a result of OpenAI launching. They just think that people might be using ChatGPT to read the New York Times without paying. But it's not a very good hypothesis, because that's not what ChatGPT is good at.
It's like the people filing the lawsuit don't really understand the technology at all.
The legal term is "expectation of privacy", and it does exist, albeit increasingly weakly in the US. There are exceptions to that, such as a subpoena, but that doesn't mean anyone can subpoena anything for any reason. There has to be a legal justification.
It's not clear to me that such a justification exists in this case.
Credible to whom? In their supposed "investigation", they sent a whole page of text and complex pre-prompting and still failed to get the exact content back word for word. Something users would never do anyways.
And that's probably the best they've got as they didn't publish other attempts.
They shouldn't have any rights to data after it's released.
>That's a question they fundamentally cannot answer without these chat logs.
They are causing more damage than anything chatGPT could have caused to NYT. Privacy needs to be held higher than corporate privilege.
>Think about it this way. Let's say this were a book store selling illegal copies of books.
Think of it this way, no book should be illegal.
>They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".
NYT glazers do more to uphold OpenAI as a privacy respecting platform than OpenAI has ever done.
>If this never happens then the amount will be low.
Should be zero, plus compensation to the affected OpenAI users from NYT.
>The user has no right to privacy.
And this needs to be remedied immediately.
>The same as how any internet service can be (and have been) compelled to produce private messages.
And this needs to be remedied immediately.
They can't use this data against any individual, even if they explicitly asked, "How do I hack the NYT?"
The only potential issue is them finding something juicy in someone's chat, that they could publish as a story; and then claiming they found out about this juicy story through other means, (such as a confidential informant), but that's not likely an issue for the average punter to be concerned about.
Which is concerning since this is a news organization that's getting the data.
Let's say they do find some juicy detail and use it, then what? Nothing. It's not like you can ever fix a privacy violation. Nobody involved would get a serious punishment, like prison time, either.
There are no privacy violations. OpenAI already told the court they anonymized it. What they say in court and what they say in the blog are different, and so many people here are (unfortunately) falling for it!
Your claim doesn’t hold up, my friend. It’s inaccurate because nobody archives an entire dialogue with a seller for the record, and you certainly don’t have to show identification to purchase a book.
NYT is suing for statutory copyright infringement. That means you only need to demonstrate the infringement itself, since the infringement alone is considered harm; the actual harm only matters if you're suing for actual damages.
This case really comes down to the very unsolved question of whether or not AI training and regurgitation is copyright infringement, and if so, whether it's fair use. The actual ways the AI is being used are thus very relevant for the case, and totally within the bounds of discovery. Of course, OpenAI has also been engaging in this lawsuit with unclean hands in the first place (see some of their earlier discovery-dispute fuckery), and they're one of the companies with the strongest "the law doesn't apply to us because we're AI and big tech" swagger.
What they want is to kill training and, moreover, prevent the loss of their position as the middleman between events and users.
I'm confused by this phrase. I may be misreading but it sounds like you're frustrated, or at least cynical about NYT wanting to preserve their business model of writing about things that happen and selling the publication. To me it seems reasonable they'd want to keep doing that, and to protect their content from being stolen.
They certainly aren't the sole publication of written content about current events, so calling them "the middle-man between events and users" feels a bit strange.
If your concern is that they're trying to prevent OpenAI from getting a foot in the door of journalism, that confuses me even more. There are so, so many sources of news: other news agencies, independent journalists, randos spreading word-of-mouth information.
It is impossible for chatgpt to take over any aspect of being a "middle-man between events and users" because it can't tell you the news. It can only resynthesize journalism that it's stolen from somewhere else, and without stealing from others, it would be worse than the least reliable of the above sources. How could it ever be anything else?
This right here feels like probably a good understanding of why NYT wants openai to keep their gross little paws off their content. If I stole a newspaper off the back of a truck, and then turned around and charged $200 a month for the service of plagiarizing it to my customers, I would not be surprised if the Times's finest lawyers knocked on my door either.
Then again, I may be misinterpreting what you said. I tend to side with people who sue LLM companies for gobbling up all their work and regurgitating it, and I spend zero effort trying to avoid that bias.
Be very wary of companies that look to change the landscape to preserve their business model. They are almost always regressive in trying to prevent the emergence of something useful and new because it challenges their revenue stream. The New York Times should be developing their own AI and should not be ignoring the march of technological progress, but instead they are choosing to lawyer up and use the legal system to try to prevent progress. I don't have any sympathy for them; there is no right to a business model.
As for whether the Times should be developing their own LLM bot, why on earth would they want that?
So... they want to continue reporting news, and they don't want their news reports to be presented to users in a place where those users are paying someone else and not them. How horrible of them?
If NYT is not reporting news, then NYT news reports will not be available for AIs to ingest. They can perhaps still get some of that data from elsewhere, perhaps from places that don't worry about the accuracy of the news (or intentionally produces inaccurate news). You have to get signal from somewhere, just the noise isn't enough, and killing off the existing sources of signal (the few remaining ones) is going to make that a lot harder.
The question is, does journalism have a place in a world with AIs, and should OpenAI be the one deciding the answer to that question?
They're hideously anti-tech, and they completely ignore technological advancement when thinking about the scope of their product. Instead of investing millions of dollars in developing their own AI solutions that serve as the New York Times' answer machine, they pay those millions of dollars to lawyers and sue the people building the answer machines. It's entirely the wrong strategy, it's regressive, and yes, they are to blame for it.
The biggest bug I've observed in my life is that people think technology is its own sector when really it's a cross-cutting concern that everybody needs to be thinking about.
Sure, there may still be investigative journalism and long form, but those are hardly the money makers.
Also, just like SWE's, writers have that same "do I have a place in the future?" anxiety in the back of their head.
The media is very hostile towards AI, and the threat is on multiple levels.
OpenAI is free to do its own reporting. The NY Times is nowhere near trying to prevent others from competing as middlemen.
When Altman says "They claim they might find examples of you using ChatGPT to try to get around their paywall." he is blatantly misrepresenting the case.
https://smithhopen.com/2025/07/17/nyt-v-openai-microsoft-ai-...
"The lawsuit focuses on using copyrighted material for AI training. The NYT says OpenAI and Microsoft copied vast amounts of its content. They did this to build generative AI tools. These tools can output near-exact copies of NYT articles. Therefore, the NYT argues this breaks copyright laws. It also hurts journalism by skipping paywalls and cutting traffic to original sites. The complaint shows examples where ChatGPT mimics NYT stories closely. This could lead to money loss and harm from AI errors, called hallucinations."
This has nothing to do with the users, it has everything to do with OpenAI profiting off of pirated copyrighted material.
Also, Altman is getting scared because the NY Times proved to the judge that ChatGPT copied many articles:
"2025 brings big steps in the case. On March 26, 2025, Judge Sidney Stein rejected most of OpenAI’s dismissal motion. This lets the NYT’s main copyright claims go ahead. The judge pointed to “many” examples of ChatGPT copying NYT articles. He found them enough to continue. This ruling dropped some side claims, like unfair competition. But it kept direct and contributory infringement, plus DMCA breaches."
Well that's going to go pretty poorly for them considering it has already been ruled fair use twice: https://www.whitecase.com/insight-alert/two-california-distr...
On the other hand, distributing copies of NYT content is actually a breach of copyright, but only if the NYT can prove it was actually happening.
I feel like this is all so blindingly obvious and yet I feel like it's going to take us decades to get there. I guess the wheels of justice turn slowly.
It helps to read the complaint. If that was the case, the case would have been subject to a Rule 12(b)(6) (failure to state a claim for which relief can be granted) challenge and closed.
Complaint: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
See pages 60ff.
> 167. As a direct and proximate result of Defendants’ infringing conduct alleged herein, The Times has sustained and will continue to sustain substantial, immediate, and irreparable injury for which there is no adequate remedy at law. Unless Defendants’ infringing conduct is enjoined by this Court, Defendants have demonstrated an intent to continue to infringe the copyrighted works. The Times therefore is entitled to permanent injunctive relief restraining and enjoining Defendants’ ongoing infringing conduct.
> 168. The Times is further entitled to recover statutory damages, actual damages, restitution of profits, attorneys’ fees, and other remedies provided by law.
They're simply claiming harm, nothing more. I want to see injuries, scars, and blood if there's harm. As far as I can tell, the NYT was on the ropes long before AI came along. If they could actually articulate any harm, they wouldn't need to read through everyone's chats.
This is boilerplate language in a claim seeking injunctive relief. In contract law in law school, you learn there's a historical difference between cases at law (where the only remedy is money) and cases in equity (where the court can issue injunctions). If you want to stop someone from violating your rights, you claim "irreparable injury" (that is, money isn't enough) and ask for the court in equity to issue an injunction.
> It _claims_ harm, but doesn't actually explain what the harm is. Reduced profits? Lower readership? All they say is "OpenAI violated our copyrights, and we deserve money."
Copyright violation, in and of itself, constitutes a judicially cognizable injury. It's a violation of a type of property right - that is, the right to exclude others from using your artistic works without your permission. The Copyright Act specifies that victims of copyright infringement are not only entitled to an injunction, but also to statutory damages as well as compensatory damages to be determined by a jury. See 17 U.S.C. § 504.
Similarly, you don't have to claim a specific injury in a garden-variety trespass action. The violation of your property rights is enough.
Furthermore, any alleged injury is absolutely reparable. How many times did OpenAI replicate their content and how many page views did they lose to it? Very reparable monetary damages, if it did in fact occur (and I'm pretty sure it didn't).
I felt quite some disappointment with the comments I saw on the thread at that time.
It feels like the NYT is really fishing for inside information on how GPT is used so they can run statistical analyses and write articles about it, e.g., if they find examples of racism, they can get some great articles about how racism is rampant on GPT or something.
I don't believe that OpenAI, or any American corporation, has the wherewithal to actually maintain _your_ privacy in the face of _their_ profitability.
> typically you need to show some kind of harm.
You copied my material without my permission. I've been harmed. That right is independent of pricing. Otherwise Napster would never have generated legal cases.
> It's quite literally a fishing expedition.
It's why American courts are awesome.
I think the winners are Chinese (and, by extension, OSS) models, as they can ignore copyright. A net win, I think.
This wasn't solid enough for a summary judgment, and it seems the labs have largely figured out how to stop the models from doing this. So it looks like NYT wants to comb all user chats rather than pay a team of people tens of thousands of dollars a day to try to coax articles out of ChatGPT-5.
This is so transparently icky. "Oh woe is us! We're being sued and we're looking out for YOU the user, who is definitely not the product. We are just a 'lil 'ol (near) trillion-dollar business trying to protect you!"
Come ON.
Look, I don't actually know who's in the right in the OAI vs. NYT dispute, and frankly I personally lean more toward the side that says you are allowed to train models on the world's information as long as you consume it legally and don't violate copyright.
But this transparent attempt to get user sympathy under insanely disingenuous pretenses is just absurd.
Both companies are clearly wrong here. There is a small part of me that kinda wants OpenAI to lose this, just so maybe it will be a wake-up call to the people putting way too much personal information into these services. Am I too hopeful here that people will learn anything...
Fundamentally I agree with what they are saying though, just don't find it genuine in the slightest coming from them.
Also their framing of the NYT intent makes me strongly distrust anything they say. Sit down with a third party interviewer who asks challenging questions, and I'll pay attention.
…”as does any culpability for poisoning yourself, suicide, and anything else we clearly enabled but don’t want to be blamed for!”
Edit: honestly I’m surprised I left out the bit where they just indiscriminately scraped everything they could online to train these models. The stones to go “your data belongs to you” as they clearly feel entitled to our data is unbelievably absurd
Should walmart be "culpable" for selling rope that someone hanged themselves with? Should google be "culpable" for returning results about how to commit suicide?
But we also decided via Sandy Hook that children can be slaughtered on the altar of the second amendment without any introspection, so I mean...were we ever seriously going to have that discussion?
https://www.nbcnews.com/tech/tech-news/family-teenager-died-...
>Please don't leave the noose out… Let's make this space the first place where someone actually sees you.
How is this not terrifying to read?
An overly simplistic claim only deserves an overly simplistic response.
Comparing rope and an LLM comes across as disingenuous. I struggle to believe that you believe the two are comparable when it comes to the ethics of companies and their impact on society.
What makes you feel that? Both are tools, both have a wide array of good and bad uses. Maybe it'd be clearer if you explained why you think the two are incomparable except in cases of disingenuousness?
Remember that things are only compared when they are different -- you wouldn't often compare a thing to itself. So, differences don't inherently make things incomparable.
> I struggle to believe that you believe the two are comparable when it comes to the ethics of companies and their impact on society.
I encourage you to broaden your perspectives. For example: I don't struggle to believe that you disagree with the analogy, because smart people disagree with things all the time.
What kind of a conversation would such a rude, dismissive judgement make, anyways? "I have judged that nobody actually believes anything that disagrees with me, therefore my opinions are unanimous and unrivaled!"
So, what makes you think comparing the 2 tools is invalid? You just compared them yourself, and I don't think you were being disingenuous.
A 2023 lawsuit against Amazon for suicides with sodium nitrite was dismissed but other similar lawsuits continue. The judge held that Amazon, “… had no duty to provide additional warnings, which in this case would not have prevented the deaths, and that Washington law preempted the negligence claims.“
... and put the liability for retrieving said property, and hence the culpability for copyright infringement, on the end user:
Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.
https://www.reuters.com/world/german-court-sides-with-plaint...
And lo, complaints about plaintiffs started before I even had to scroll. If this company hadn't willy-nilly done everything they could to vacuum up the world's data, wherever it may be, however it may have been protected, then maybe they wouldn't be in this predicament.
Everybody acts like this is a moral argument when really it's about whether or not they're getting a piece of the pie.
So, when Google did the same thing, there were complaints.
> Why is nobody complaining about Google scraping their sites?
And second, search engines were actually pretty gentle with their site scraping. They needed the sites to work, so they respected robots.txt and made sure they wouldn't accidentally DDoS sites with too many requests. AI companies just DDoS sites, don't respect robots.txt, and if you block them, they will use another of their infinite supply of IPs.
Put another way: even back then, Google was kind of trying to be an OK, non-evil citizen. They became sociopathic only much later, and even now they kind of try to hide it. OpenAI and the rest of the AI companies are openly sociopathic and proud of the damage they cause.
And what if they, for example, find evidence of some other thing, such as:
1. Something useful for a story, maybe they follow up in parallel. Know who to interview and what to ask?
2. A crime.
3. An ongoing crime.
4. Something else they can sue someone else for.
5. Top secret information
It'll be the lawyers who need to go through the data, and given the scale of it, they won't be able to do anything more than trawl for the evidence they need and find specific examples to cite. They don't give a shit if you're asking chatgpt how to put a hit out on your ex, and they're not there to editorialize.
I won't pretend to guess* how they'll perform the discovery, but I highly doubt it will result in humans reading more than a handful of the records in total, outside of the ones found via whatever method they use to automate the discovery process.
If there's top secret information in there, and it was somehow stumbled upon by one of these lawyers or a paralegal somewhere, I find it impossibly unlikely they'd be stupid enough to do anything other than run directly to whomever is the rightful possessor of said information and say "hey we found this in this place it shouldn't be" and then let them deal with it. Which is what we'd want them to do.
*Though if I had to speculate on how they'd do it, I do think the funniest way would be to feed the records back into chatgpt and ask it to point out all the times the records show evidence of infringement
2. That sounds useful.
3. That sounds useful.
4. That sounds useful.
5. That sounds useful.
Are these supposed to be examples of things that shouldn't be found out about? This has to be the worst pro-privacy argument I've ever seen on the internet. "Privacy is good because they will find out about our crimes"
In direct contrast: I fully agree with OpenAI here. We can have a more nuanced opinion than "piracy to train AI is bad, therefore refusing to share chats is bad", which sounds absurd but is genuinely the logic one of the other comments follows.
Privacy is paramount. People _trust_ that their chats are private: they ask sensitive questions, ones to do with intensely personal or private or confidential things. For that to be broken -- for a company to force users to have their private data accessed -- is vile.
The tech community has largely stood against this kind of thing when it's been invasive scanning of private messages, tracking user data, etc. I hope we can collectively be better (I'm using ethical terms for a reason) than the other replies show. We don't have to support OpenAI's actions in order to oppose the NYT's actions.
IMO we can have multiple views over multiple companies and actions. And the sort of discussions I value here on HN are ones where people share insight, thought, show some amount of deeper thinking. I wanted to challenge for that with my comment.
_If_ we agree the NYT even has a reason to examine chats -- and I think even that should be where the conversation is -- I agree that there should be other ways to achieve it without violating privacy.
These chats only need to be shared because:
- OpenAI pirated masses of content in the first place
- OpenAI refuse to own up to it even now (they spin the NYT claims as "baseless").
I don't agree with them giving my chats out either, but the blame is not with the NYT in my opinion.
> We don't have to support OpenAI's actions in order to oppose the NYT's actions.
Well, the NYT's action is about more than just this case. If they win, it will set a precedent, which means other news outlets can get money from OpenAI as well. Which makes a lot of sense; after all, they have billions to invest in hardware, why not in content?
And what alternative do they have? Without OpenAI giving access to the source materials used (I assume this was already asked for because it is the most obvious route) there is not much else they can do. And OpenAI won't do that because it will prove the NYT point and will cause them to have to pay a lot to half the world.
It's important that this case is made, not just for the NYT but for journalism in general.
The tech community has been doing the scanning and tracking.
If you store data it can come up in discovery during lawsuits and criminal cases. Period.
E.g., storing illegal materials on Google Drive, Google WILL turn that over to the authorities if there’s a warrant or lawsuit that demands it in discovery.
E.g., my CEO writes an email telling the CFO that he doesn’t want to issue a safety recall because it’ll cost too much money. If I sue the company for injuring me through a product they know to be defective, that civil suit subpoena can ask for all emails discussing the matter and there’s no magical wall of privacy where the company can just say “no that’s private information.”
At the same time, I don’t get to trawl through the company’s emails and use some email of the CEO flirting with their secretary as admissible evidence.
There are many ways the court is able to ensure privacy for the individuals. Sexual assault victims don’t have their evidence blasted across the airwaves just because the court needs to examine that physical evidence.
The only way to avoid this is to not collect the data in the first place, which is where end to end encryption with user-controlled keys or simply not collecting information comes into play.
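On the "end-to-end encryption with user-controlled keys" point: the idea is that the client encrypts before upload and never shares the key, so the provider holds only ciphertext and has nothing readable to hand over in discovery. A minimal sketch using Python's cryptography package (the helper names are illustrative, not any vendor's API):

```python
# Sketch of client-side encryption with a user-held key: the server stores
# only ciphertext, so discovery against the server yields nothing readable.
# Requires: pip install cryptography
from cryptography.fernet import Fernet

def make_user_key() -> bytes:
    """Generate a key that lives only on the user's device."""
    return Fernet.generate_key()

def encrypt_for_upload(key: bytes, message: str) -> bytes:
    """Encrypt locally; only this opaque blob is sent to the server."""
    return Fernet(key).encrypt(message.encode("utf-8"))

def decrypt_locally(key: bytes, blob: bytes) -> str:
    """Decrypt on-device with the user-held key."""
    return Fernet(key).decrypt(blob).decode("utf-8")

if __name__ == "__main__":
    key = make_user_key()
    blob = encrypt_for_upload(key, "a sensitive question")
    assert decrypt_locally(key, blob) == "a sensitive question"
    print("server only ever sees ciphertext:", blob[:20], b"...")
```

The catch, as the comment implies, is that a chat service whose servers must read the prompt to answer it can't work this way, which is why the alternative named above is simply not collecting or retaining the data.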
Probably because they have a lot to hide, a lot to lose, and no interest in fair play.
Theoretically, they could prove their tools aren't being used to do anything wrong, but practically, we all know they can't, because they are actually in the wrong (in both the moral and, IMO though IANAL, the legal sense). They know it, we know it; the only problem is breaking the ridiculous walled garden that stops the courts from ‘knowing’ it.
You don't have to think that OpenAI is good to think there's a legitimate issue with exposing data to a third party for discovery. One could see the Times discovering something in private conversations outside the scope of the case and, through their own interpretation of journalistic necessity, believing it's something they're obligated to publish.
Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.
It's OpenAI's data, there is a protective order in the case and OpenAI already agreed to anonymize it all.
>Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.
lol... what?
Maybe you didn't read TFA but part of the case history was NYT requesting 1.4 billion records as part of discovery and being successfully challenged by OpenAI as unnecessary, and the essence of TFA is advocating for an alternative to the scope of discovery NYT is insisting on, hence the "not rolling over".
Try reading, it's fun!
That's simply a function of the fact it's a controversial news organization running a dragnet on private communications to a technology platform.
"Great cases, like hard cases, make bad law."
256 more comments available on Hacker News