AI Assistants Misrepresent News Content 45% of the Time
Posted 2 months ago · Active 2 months ago
bbc.co.uk · Tech · Story · High profile
Skeptical / mixed
Debate: 80/100
Key topics
AI
News
Accuracy
LLMs
A study by the BBC and EBU found that AI assistants misrepresent news content 45% of the time, sparking a discussion on the reliability of both AI and human news reporting.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 32m
Peak period: 65 comments in 0-2h
Avg / period: 12.3
Comment distribution: 160 data points
Based on 160 loaded comments
Key moments
- 01 Story posted: Oct 22, 2025 at 9:39 AM EDT (2 months ago)
- 02 First comment: Oct 22, 2025 at 10:11 AM EDT (32m after posting)
- 03 Peak activity: 65 comments in 0-2h (hottest window of the conversation)
- 04 Latest activity: Oct 23, 2025 at 5:54 PM EDT (2 months ago)
ID: 45668990 · Type: story · Last synced: 11/20/2025, 8:18:36 PM
Obviously, AI isn't an improvement, but people who blindly trust the news have always been credulous rubes. It's just that the alternative is being completely ignorant of the worldviews of everyone around you.
Peer-reviewed science is as close as we can get to good consensus and there's a lot of reasons this doesn't work for reporting.
But, technology also gave us the internet, and social media. Yes, both are used to propagate misinformation, but it also laid bare how bad traditional media was at both a) representing the world competently and b) representing the opinions and views of our neighbors. Manufacturing consent has never been so difficult (or, I suppose, so irrelevant to the actions of the states that claim to represent us).
You just give up on uneconomical efforts at accuracy and you sell narratives that work for one political party or the other.
It is a model that has been taken up the world over. It just works. “The world is too complex to explain, so why bother?”
And what will you or I do about it? Subscribe to the NYT? Most of us would rather spend that money on a GenAI subscription, because that is bucketed differently in our heads.
Or against people in general.
It's a pet peeve of mine that we get these kinds of articles without any baseline established for how people do on the same measure.
Is misrepresenting news content 45% of the time better or worse than the average person? I don't know.
By extension: Would a person using an AI assistant misrepresent news more or less after having read a summary of the news provided by an AI assistant? I don't know that either.
When they have a "Why this distortion matters" section, those things matter. They've not established if this will make things better or worse.
(The cynic in me wants another question answered too: how often do reporters misrepresent the news? Would it be better or worse if AI reviewed the facts and presented them vs. letting reporters do it? Again: no idea.)
I don’t have a personal human news summarizer?
The comparison is between a human reading the primary source and the same human reading an LLM hallucination mixed with an LLM referring to the primary source.
> the cynic in me wants another question answered too: how often do reporters misrepresent the news?
The fact that you mark as cynical a question answered pretty reliably for most countries sort of tanks the point.
Not a personal one. You do however have reporters sitting between you and the source material a lot of the time, and sometimes multiple levels of reporters playing games of telephone with the source material.
> The comparison is between a human reading the primary source and the same human reading an LLM hallucination mixed with an LLM referring to the primary source.
In modern news reporting, a fairly substantial proportion of what we digest is not primary sources. It's not at all clear whether an LLM summarising primary sources would be better or worse than reading a reporter passing on primary sources. And in fact, in many cases the news is not even secondary sources - e.g. a wire service report on primary sources getting rewritten by a reporter is not uncommon.
> The fact that you mark as cynical a question answered pretty reliably for most countries sort of tanks the point.
It's a cynical point within the context of this article to point out that it is meaningless to report on the accuracy of AI in isolation because it's not clear that human reporting is better for us. I find it kinda funny that you dismiss this here, after having downplayed the games of telephone that news reporting often is earlier in your reply, thereby making it quite clear I am in fact being a lot more cynical than you about it.
In cases where a reporter is just summarising e.g. a court case, sure. Stock market news has been automated since the 2000s.
More broadly, AI assistants summarising news content may sometimes directly reference a court case. But they often don't. And even where they can, that covers only a small fraction of the news; for much of the rest, the AI will need to rely on reporters detailing the primary sources they're interfacing with.
Reporter error is somewhat orthogonal to AI assistants' accuracy.
It is not at all. Journalists are wrong all the time, but you still treat news like a record and not a sample. In fact I'd put money that AI mischaracterizes events at a LOWER rate than journalists do: narratives shift over time, and journalists are more likely to succumb to this shift.
Straw man. Everyone educated constantly argues over sourcing.
> I'd put money that AI mischaracterizes events at a LOWER rate than journalists do
Maybe it does. But an AI sourcing journalists is demonstrably worse. Source: TFA.
> narratives shift over time, and journalists are more likely to succumb to this shift
Lol, we’ve already forgotten about MechaHitler.
At the end of the day, a lot of people consume news to be entertained. They’re better served by AI. The risk is folks of consequence start doing that, at which point I suppose the system self resolves by making them, in the long run, of no consequence compared to those who own and control the AI.
Is this not the editorial board and journalist? I'm not sure what the gripe is here.
I think we're on the same side of this, but I just want to say that we can do a lot better. As per studies around the Replication Crisis over the last decade [0], and particularly this 2016 survey conducted by Monya Baker from Nature [1]:
> 1,576 researchers who took a brief online questionnaire on reproducibility found that more than 70% of researchers have tried and failed to reproduce another scientist's experiment results (including 87% of chemists, 77% of biologists, 69% of physicists and engineers, 67% of medical researchers, 64% of earth and environmental scientists, and 62% of all others), and more than half have failed to reproduce their own experiments.
We need to expect better, which means both better incentives and better evaluation, and I think that AI can help with this.
[0] https://en.wikipedia.org/wiki/Replication_crisis
[1] https://www.nature.com/articles/533452a
How could a candidate who yells "Fake News" like an idiot get elected? Because of the state of journalism.
How could people turn to AI slop? Because of the state of human slop.
> 31% of responses showed serious sourcing problems – missing, misleading, or incorrect attributions.
> 20% contained major accuracy issues, including hallucinated details and outdated information.
I'm generally against whataboutism, but here I think we absolutely have to compare it to human-written news reports. Famously, Michael Crichton introduced the "Gell-Mann amnesia effect" [0], saying:
> Briefly stated, the Gell-Mann Amnesia effect works as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.
This has absolutely been my experience. I couldn't find proper figures, but I would put good money on significantly over 45% of human-written news articles having "at least one significant issue".
[0] https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
Regarding scientific reporting, there's as usual a relevant xkcd ("New Study") [0], and in this case even better, there's a fabulous one from PhD Comics ("Science News Cycle") [1].
[0] https://xkcd.com/1295/
[1] https://phdcomics.com/comics/archive.php?comicid=1174
It's also not clear whether humans do better when consuming either, or whether the effect of an AI summary, even one with substantial issues, is to make the human reading it better or worse informed.
E.g. if it helps a person digest more material by getting more focused reports, it's entirely possible that flawed summaries would still in aggregate lead to a better understanding of a subject.
On its own, this article is just pure sensationalism.
Why stop at what humans can do? AND to not be fettered by any expectations of accuracy, or even feasibility of retractions.
Truly, efficiency unbound.
https://www.pewresearch.org/journalism/fact-sheet/news-media...
However, 79% of Brits trust the BBC as per this chart:
https://legacy.pewresearch.org/wp-content/uploads/sites/2/20...
AI summaries are good for getting a feel for whether you want to read an article or not. Even with Kagi News I verify key facts myself.
Never share information about an article you have not read. Likewise, never draw definitive conclusions from an article that is not of interest.
If you do not find a headline interesting, the take away is that you did not find the headline interesting. Nothing more, nothing less. You should read the key insights before dismissing an article entirely.
I can imagine AI summaries being problematic for the class of people who do not cross-check whether an article is of value to them.
I feel like that’s “the majority of people” or at least “a large enough group for it to be a societal problem”.
We're in a weird time. It's always been like this, it's just much.. more, now. I'm not sure how we'll adapt.
I don't know if I can agree with that. I think we make an error when we aggregate news in the way we do. We claim that "the right-wing media" says something when a single outlet associated with the right says a thing, and vice versa. That's not how I enjoy reading the news. I have a couple of newspapers I like reading, and I follow the arguments they make. I don't agree with what they say half the time, but I enjoy their perspective. I get a sense of the "editorial personality" of the paper. When we aggregate the news, we don't get that sense, because there's no editorial. I think that makes the news poorer, and I think it makes people's views of what newspapers can be poorer.
The news shouldn't be a stream of happenings. The newspaper is best when it's a coherent day-to-day conversation. Like a pen pal you don't respond to.
Here is a sample:
> [1] Google DeepMind and Harvard researchers propose a new method for testing the ‘theory of mind’ of LLMs - Researchers have introduced a novel framework for evaluating the "theory of mind" capabilities in large language models. Rather than relying on traditional false-belief tasks, this new method assesses an LLM’s ability to infer the mental states of other agents (including other LLMs) within complex social scenarios. It provides a more nuanced benchmark for understanding if these systems are merely mimicking theory of mind through pattern recognition or developing a more robust, generalizable model of other minds. This directly provides material for the construct_metaphysics position by offering a new empirical tool to stress-test the computational foundations of consciousness-related phenomena.
> https://venturebeat.com/ai/google-deepmind-and-harvard-resea...
The link does not work, the title is not found in Google Search either.
Then they're not very good at search.
It's like saying the proverbial million monkeys at typewriters are good at search because eventually they type something right.
Do you have an in-depth understanding of how those "agentic powers" are implemented? If not, you should probably research it yourself. Understanding what's underneath the buzzwords will save you some disappointment in the future.
Not every LLM app has access to web / news search capabilities turned on by default. This makes a huge difference in what kind of results you should expect. Of course, the AI should be aware that it doesn't have access to web / news search, and it should tell you as much rather than hallucinating fake links. If access to web search was turned on, and it still didn't properly search the web for you, that's a problem as well.
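To make that concrete, here is a minimal sketch using a hypothetical ask(prompt, tools) stand-in rather than any specific provider's API; the point is how tool availability plus a simple prompt guard change what you should expect back:

    # Minimal sketch of the "search on vs. off" difference, using a hypothetical
    # ask(prompt, tools) client. Real providers expose this differently
    # (tool/function-calling configs, "browse" toggles, etc.).

    NO_FABRICATION_RULE = (
        "If you do not have live web or news search available, say so explicitly "
        "and do not produce URLs or citations you cannot verify."
    )

    def ask(prompt: str, tools: list[str]) -> str:
        """Stand-in for a provider call; returns canned answers for the sketch."""
        if "web_search" in tools:
            return "Searched the web; found 3 recent articles: ..."
        return "I don't have web search access, so I can't cite live sources."

    question = "Summarize today's top science news with links."

    # With a search tool attached, links can be grounded in retrieved pages.
    print(ask(f"{NO_FABRICATION_RULE}\n\n{question}", tools=["web_search"]))

    # Without one, the guard above should make the assistant decline to produce
    # plausible-looking URLs rather than hallucinate them.
    print(ask(f"{NO_FABRICATION_RULE}\n\n{question}", tools=[]))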
I've felt it myself. Recently I was looking at some documentation without a clear edit history. I thought about feeding it into an AI and having it generate one for me, but didn't because I didn't have the time. To think, if I had done that, it probably would have generated a perfectly acceptable edit history, but one that would have obscured what changes were actually made. I wouldn't just lack knowledge (like I do now); I would have obtained anti-knowledge.
I do sales meetings all day every day, and I've tried different AI note takers that send a summary of the meeting afterwards. I skim them when they get dumped into my CRM and they're almost always quite accurate. And I can verify it, because I was in the meeting.
Agreed, it's generally quite accurate. I find for hectic meetings, it can get some things wrong... But the notes are generally still higher quality than human generated notes.
Is it perfect? No. Is it good enough? IMO absolutely.
Similar to many other things, the key is that you don't just blindly trust it. Have the LLM take notes and summarize, and then _proofread_ them, just as you would if you were writing them yourself...
Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report and the one on research conducted a year ago, leaving out critical context that... things have changed.
This article contains significant issues.
No... the problem is that it cites Wikipedia articles that don't exist.
> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.
Also, is attributing ChatGPT's preference for Wikipedia, without any citation, to reprisal over an active lawsuit a significant issue? Or do the authors get off scot-free because they couched it in "we don't know, but maybe it's the case"?
And the worst part about people unironically thinking they can use it for "research" is that it essentially supercharges confirmation bias.
The inefficient sidequests you go on while researching are generally what actually give you the ability to really reason about a topic.
If you instead just laser focus on the tidbits you prompted with... Well, your opinion is a lot less grounded.
https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...
I have seen a few cases before of "hallucinations" that turned out to be things that did exist, but no longer do.
It's not bad when they use the Internet at generation time to verify the output.
Pre prompting to cite sources is obviously a better way of going about things.
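One mechanical slice of that verification can be automated: checking that every cited URL actually resolves before trusting the summary. A minimal sketch, assuming the Python requests library; it catches dead or invented links, not misattributed ones:

    # Quick link sanity check for an LLM-generated summary: flag citations that
    # don't resolve. This catches fabricated or dead URLs, not wrong attributions.
    import requests

    def check_citations(urls: list[str], timeout: float = 5.0) -> dict[str, bool]:
        results = {}
        for url in urls:
            try:
                resp = requests.head(url, timeout=timeout, allow_redirects=True)
                # Some sites reject HEAD; fall back to a lightweight GET.
                if resp.status_code >= 400:
                    resp = requests.get(url, timeout=timeout, stream=True)
                results[url] = resp.status_code < 400
            except requests.RequestException:
                results[url] = False
        return results

    cited = [
        "https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect",
        "https://example.com/eu-enlargement-goals-2040",  # hypothetical hallucinated link
    ]
    for url, ok in check_citations(cited).items():
        print(("OK   " if ok else "DEAD ") + url)

This only raises the floor: a link that exists can still fail to support the claim, which still takes a human or a second pass to check.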
IDK how you people go through that experience more than a handful of times before you get pissed off and stop using these tools. I've wasted so much time because of believable lies from these bots.
Sorry, not even lies, just bullshit. The model has no conception of truth so it can't even lie. Just outputs bullshit that happens to be true sometimes.
That seems to be the real challenge with AI for this use case. It has no real critical thinking skills, so it's not really competent to choose reliable sources. So instead we're lowering the bar to just asking that the sources actually exist. I really hate that. We shouldn't be lowering intellectual standards to meet AI where it's at. These intellectual standards are important and hard-won, and we need to be demanding that AI be the one to rise to meet them.
For example, having a single central arbiter of source bias is inescapably the most biased thing you could possibly do. Bias has to be defined within an intellectual paradigm. So you'd have to choose a paradigm to use for that bias evaluation, and de facto declare it to be the one true paradigm for this purpose. But intellectual paradigms are inherently subjective, so doing that is pretty much the most intellectually biased thing you can possibly do.
I've seen a certain sensationalist news source write a story that went like this.
Site A: Bad thing is happening, cite: article Site B
* follow the source *
Site B: Bad thing is happening, cite different article on Site A
* follow the source *
Site A: Bad thing is happening, no citation.
I fear that's the current state of a large news bubble that many people subscribe to. And when these sensationalist stories start circulating there's a natural human tendency to exaggerate.
I don't think AI has any sort of real good defense against this sort of thing. One level of citation is already hard enough, and recognizing that it is ultimately citing the same source is harder still (sketched below).
There was another example from the Kagi News stuff which exemplified this: a whole article written with three citations that were ultimately spawned from the same news briefing, published by different outlets.
I've even seen an example of a national political leader who fell for the same sort of sensationalization. One who should have known better. They repeated what was later found to be a lie by a well-known liar but added that "I've seen the photos in a classified debriefing". IDK that it was necessarily even malicious, I think people are just really bad at separating credible from uncredible information and that it ultimately blends together as one thing (certainly doesn't help with ancient politicians).
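As an illustration of why spotting circular sourcing is genuinely hard, here is a toy sketch (hypothetical URLs, plain Python) that follows a citation chain like the Site A / Site B example above and flags when it loops back to an outlet already seen:

    # Toy citation-loop detector: follow each article's "source" link and flag
    # chains that circle back to a domain already seen. Hypothetical data; a real
    # pipeline would have to extract these links from the articles themselves.

    CITES = {
        "siteA.com/bad-thing-1": "siteB.com/bad-thing",
        "siteB.com/bad-thing":   "siteA.com/bad-thing-2",
        "siteA.com/bad-thing-2": None,  # no citation at all
    }

    def domain(url: str) -> str:
        return url.split("/", 1)[0]

    def trace(start: str) -> tuple[list[str], bool]:
        """Return the citation chain and whether it loops back to a seen domain."""
        chain, seen = [start], {domain(start)}
        current = CITES.get(start)
        while current is not None:
            chain.append(current)
            if domain(current) in seen:
                return chain, True   # circular sourcing: the same outlet reappears
            seen.add(domain(current))
            current = CITES.get(current)
        return chain, False

    chain, circular = trace("siteA.com/bad-thing-1")
    print(" -> ".join(chain), "| circular:", circular)
    # siteA.com/bad-thing-1 -> siteB.com/bad-thing -> siteA.com/bad-thing-2 | circular: True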
These grifters simply were not attracted to these gigs in these quantities prior to AI, but now the market incentives have changed. Should we "blame" the technology for its abuse? I think AI is incredible, but market endorsement is different from intellectual admiration.
A recent Kurzgesagt goes into the dangers of this, and they found the same thing happening with a concrete example: They were researching a topic, tried using LLMs, found they weren't accurate enough and hallucinated, so they continued doing things the manual way. Then some weeks/months later, they noticed a bunch of YouTube videos that had the very hallucinations they were avoiding, and now their own AI assistants started to use those as sources. Paraphrased/remembered by me, could have some inconsistencies/hallucinations.
https://www.youtube.com/watch?v=_zfN9wnPvU0
Right. Let's talk about statistics for a bit. Or let's put it differently: they found in their report that 45% of the answers to the 30 questions they "developed" had a significant issue, e.g. a nonexistent reference.
I can pull 30 questions out of my sleeve where 95% of the answers will not have any significant issue.
Neither is my bucket of 30 questions statistically significant, but it goes to show that I could "disprove" their hypothesis just by giving them my sample.
I think the report is being disingenuous, and I don't understand for what reason. It's funny that they say "misrepresent" when that's exactly what they are doing.
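For a sense of scale on the sample-size argument above, taking the commenter's figure of 30 questions at face value, a back-of-the-envelope normal-approximation interval shows how wide the uncertainty on such a proportion is (a rough sketch, not the study's actual methodology):

    # Rough 95% confidence interval for "45% of 30 answers had a significant
    # issue", using the normal approximation to the binomial. Takes the parent
    # comment's n=30 at face value; the real study may aggregate more responses.
    from math import sqrt

    p_hat, n = 0.45, 30
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
    margin = 1.96 * se                   # ~95% normal-approximation margin
    low, high = p_hat - margin, p_hat + margin
    print(f"point estimate: {p_hat:.2f}, 95% CI roughly [{low:.2f}, {high:.2f}]")
    # -> point estimate: 0.45, 95% CI roughly [0.27, 0.63]

    # Same check for a hand-picked set where only 5% of 30 answers show issues:
    p2 = 0.05
    m2 = 1.96 * sqrt(p2 * (1 - p2) / 30)
    print(f"point estimate: {p2:.2f}, 95% CI roughly [{max(0.0, p2 - m2):.2f}, {p2 + m2:.2f}]")
    # -> roughly [0.00, 0.13]: a different question set gives a very different
    #    estimate, so question selection drives the headline number, and either
    #    way n=30 leaves roughly an 18-point margin around the point estimate.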
A critical human reader can go as deep as they like in examining claims there: can look at the source listed for a claim, can often click through to read the claim in the source, can examine the talk page and article history, can search through the research literature trying to figure out where the claim came from or how it mutated in passing from source to source, etc. But an AI "reader" is a predictive statistical model, not a critical consumer of information.
I have one example that I check periodically just to see if anybody else has noticed. I've been checking it for several years and it's still there; the SDI page claims that Brilliant Pebbles was designed to use "watermelon sized" tungsten projectiles. This is completely made up; whoever wrote it up was probably confusing "rods from god" proposals that commonly use tungsten and synthesizing that confusion with "pebbles". The sentence is cited but the sources don't back it up. It's been up like this for years. This error has been repeated on many websites now, all post-dating the change on wikipedia.
If you're reading this and are the sort to edit wikipedia.. Don't fix it. That would be cheating.
Imagine if this was the ethos regarding open source software projects. Imagine Microsoft saying 20 years ago, "Linux has this and that bug, but you're not allowed to go fix it because that detracts from our criticism of open source." (Actually, I wouldn't be surprised if Microsoft or similar detractors literally said this.)
Of course Wikipedia has wrong information. Most open source software projects, even the best, have buggy, shite code. But these things are better understood not as products but as processes, and in many (but not all) contexts the product at any point in time has generally proven, in a broad sense, to outperform its cathedral alternatives. The process breaks down, though, when pervasive cynicism and nihilism reduce the number of well-intentioned people who positively engage and contribute rather than complain from the sidelines. Then we land right back at square zero. And maybe you're too young to remember what the world was like at square zero, but it sucked in terms of knowledge accessibility, notwithstanding the small number of outstanding resources, which were often inaccessible because of cost or other barriers.
Yep.
Including, if not especially, the ones actively worked on by the most active contributors.
The process for vetting sources (both in terms of suitability for a particular article, and general "reliable sources" status) is also seriously problematic. Especially when it comes to any topic which fundamentally relates to the reliability of journalism and the media in general.
Not to mention, the AI companies have been extremely abusive to the rest of the internet so they are often blocked from accessing various web sites, so it's not like they're going to be able to access legitimate information anyways.
One way to successfully use LLMs is to do the initial research legwork. Run the 40 Google searches and follow links. Evaluate sources according to some criteria. Summarize. And then give the human a list of links to follow.
You quickly learn to see patterns. Sonnet will happily give a genuinely useful rule of thumb, phrasing it like it's widely accepted. But the source will turn out to be "one guy on a forum."
There are other tricks that work well. Have the LLM write an initial overview with sources. Tell it to strictly limit itself to information in the sources, etc. Then hand the report off to a fresh LLM and tell it to carefully check each citation in the report, removing unsourced information. Then have the human review the output, following links.
None of this will get you guaranteed truth. But if you know what you're doing, it can often give you a better starting point than Wikipedia or anything on the first two pages of Google Search results. Accurate information is genuinely hard to get, and it always has been.
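A minimal sketch of the workflow described above; search() and llm() are hypothetical stand-ins for whatever retrieval and model APIs are in use, and the point is the structure, not the specific calls: draft strictly from gathered sources, audit the citations with a fresh pass, then hand the links to a human.

    # Sketch of the workflow described above: search -> draft-with-sources ->
    # independent citation check -> human review. `search` and `llm` are
    # hypothetical stand-ins for real retrieval and model APIs.
    from dataclasses import dataclass

    @dataclass
    class Doc:
        url: str
        text: str

    def search(query: str) -> list[Doc]:
        """Placeholder for a real search/scrape step."""
        return [Doc(url=f"https://example.com/{query.replace(' ', '-')}", text="...")]

    def llm(prompt: str) -> str:
        """Placeholder for a real model call."""
        return "DRAFT WITH [1]-style citations..."

    def research(question: str, queries: list[str]) -> dict:
        sources = [doc for q in queries for doc in search(q)]
        corpus = "\n\n".join(f"[{i+1}] {d.url}\n{d.text}" for i, d in enumerate(sources))

        # Pass 1: draft strictly from the gathered sources, with numbered citations.
        draft = llm(
            "Answer using ONLY the numbered sources below. Cite every claim as [n]. "
            f"If the sources don't cover something, say so.\n\nQuestion: {question}\n\n{corpus}"
        )

        # Pass 2: a fresh call audits the draft against the same sources and
        # removes anything it cannot attribute to one of them.
        audited = llm(
            "Check each citation in this draft against the sources. Remove or flag "
            f"any claim not supported by its cited source.\n\nDRAFT:\n{draft}\n\nSOURCES:\n{corpus}"
        )

        # Final step is human: return the links so a person can follow them.
        return {"report": audited, "links": [d.url for d in sources]}

    print(research("How reliable are AI news summaries?", ["EBU BBC AI assistant news study"]))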
Still not enough as I find the LLM will not summarize all the relevant facts, sometimes leaving out the most salient ones. Maybe you'll get a summary of some facts, maybe the ones you explicitly ask for, but you'll be left wondering if the LLM is leaving out important information.
>...it finally told me that it generated « the most probable » urls for the topic in question based on the ones he knows exists.
smrq is asking why you would believe that explanation. The LLM doesn't necessarily know why it's doing what it's doing, so that could be another hallucination.
Your answer:
> ...I wanted to know if it was old link that broke or changed but no apparently
Leads me to believe that you misunderstood smrq's question.
Disclaimer: Started my career in online journalism/aggregation. Had a 4-week internship with dpa's online subsidiary some 16 years ago.
Imo at least
There’s no such thing as unbiased.
I would expect this isn't the on-off switch they conceptualized, but I don't know enough about how different LLM providers handle news search and retrieval to say for sure.
With this in mind, 45% doesn't seem so bad anymore
If that is the case with a task so simple, why would we rely on these tools for high risk applications like medical diagnosis or analyzing financial data?
Optimistically, that could be extended "Twitter-style" with mandatory basic fact checking: when they just copy a statement by some politician or misrepresent some science ("X cures cancer", xkcd 1217), report it and add the corrections.
But yeah... in my country, with all the 5G-danger craze, we had TV debates with a PhD in telecommunications on one side, and a "building biologist" on the other, so yeah...
> This time, we used the free/consumer versions of ChatGPT, Copilot, Perplexity and Gemini.
IOW, they tested ChatGPT twice (Copilot uses ChatGPT's models) and didn't test Grok (or others).
131 more comments available on Hacker News