AI Models Are Using Material From Retracted Scientific Papers
Posted 3 months ago | Active 3 months ago
technologyreview.com | Tech | story
Key topics
AI
Scientific Research
Data Quality
AI models are trained on data that includes retracted scientific papers, raising concerns about their reliability, but some commenters see potential benefits in understanding what doesn't work through these papers.
Snapshot generated from the HN discussion
Key moments
- Story posted: Sep 23, 2025 at 5:41 PM EDT
- First comment: Sep 23, 2025 at 6:42 PM EDT (1h after posting)
- Peak activity: 1 comment in the 1-2h window
- Latest activity: Sep 23, 2025 at 6:42 PM EDT
ID: 45353124 | Type: story | Last synced: 11/20/2025, 6:03:33 PM
I think this may actually be positive; it just needs to get better at advising caution on the retracted papers.
It is important that negative or null results be understood. In some ways retracted papers lead to a better understanding of what didn't work: why not, and what were the errors in the paper?
Was it complete fraud? OK, good to know. Was a mistake made, or have we learned something else as the science has progressed?
I wrote about this recently on our blog, in "Beyond Cherrypicking Data" [1].
We work in neurotech/sleeptech: we increase slow-wave activity during sleep, or what we refer to as enhancing restorative function.
We are building on more than a decade of research and 50+ published peer-reviewed papers. We haven't seen retractions of papers, but we've seen a few null results, which helped us understand where the implementation of the technology has struggled in the past.
The article focuses on scientific papers for public consumption, but that doesn't really matter to the AI atm. There is already so much junk pretend science on the internet that has no citations. I see this as something that can help the research community as it improves. Maybe with that background, AI can get better at recognizing when junk science is being promoted.
[1] https://www.affectablesleep.com/blog/beyond-cherry-picking-m...