Fake Research Is Doubling Every 1.5 Years
Key topics
A new study shows paper mills now double their output every 1.5 years. Real research, by contrast, only doubles every 15 years. If you follow the math, fake science eventually outnumbers real science.
This is a huge problem because scientific literature is upstream of everything: drug discovery, clinical guidelines, and increasingly, AI training data.
If fake papers keep scaling, we’re polluting downstream systems, including scientific AI.
Peer review was designed for a world where misconduct was rare and individual. It has no defenses against industrialized fraud.
Scientific publishing is collapsing. So what comes next?
The proliferation of fake research papers, produced by 'paper mills', is outpacing real research and threatening the integrity of scientific literature and downstream applications like AI training data.
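The crossover claim can be sketched with back-of-the-envelope arithmetic. This is only an illustration: the doubling times come from the article, but the starting ratio (fake output at 1% of real output) is an assumption, not a figure from the study.

```python
# Doubling times from the article: fake papers double every 1.5 years,
# real papers every 15 years.
fake_doubling, real_doubling = 1.5, 15.0

# Assumed starting ratio (hypothetical): fake output is 1% of real output.
fake, real = 1.0, 100.0

# Step forward in half-year increments until fake output overtakes real.
years, step = 0.0, 0.5
while fake < real:
    years += step
    fake *= 2 ** (step / fake_doubling)
    real *= 2 ** (step / real_doubling)

print(f"Under these assumptions, fake papers overtake real ones after ~{years:.1f} years")
```

Even from a 100:1 deficit, the faster doubling time closes the gap in roughly a decade; a larger head start for real research only delays the crossover, it doesn't prevent it.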
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 5m after posting
Peak period: 5 comments in 0-1h
Avg / period: 3 comments
Key moments
1. Story posted: Sep 29, 2025 at 6:43 PM EDT (3 months ago)
2. First comment: Sep 29, 2025 at 6:48 PM EDT (5m after posting)
3. Peak activity: 5 comments in the 0-1h window, the hottest period of the conversation
4. Latest activity: Sep 29, 2025 at 10:28 PM EDT (3 months ago)
Papers published in bad journals are simply ignored by serious researchers, including those working on drug discovery and clinical guidelines.
I'm not sure about AI training, but if they don't filter it out, they will soon have a nasty surprise and will have to reinvent the impact factor, the h-index, and a few other stupid metrics that everyone uses to try to ignore all the crap that is published in bad journals (and not-so-bad journals).
Since you provided a source for your numbers, I'll bite. Formalize results into mechanical proofs that can be verified by computers, so that over time we build a library of machine-checked proofs. You can't bullshit a computer.
I'm not sure how it would work for statistical results, but defining a formalized standard might be a good first step for deriving numbers from raw data instead of relying on the authors to calculate the statistics themselves.
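One hedged sketch of what such a standard could look like: reported summary statistics are never copied from the manuscript but recomputed from deposited raw data and checked for agreement. The function names and the data schema here are hypothetical, purely to illustrate the idea.

```python
from math import sqrt
from statistics import mean, stdev


def recompute_summary(raw_data):
    """Derive summary statistics from raw data instead of
    trusting author-supplied numbers (hypothetical schema)."""
    n = len(raw_data)
    s = stdev(raw_data)  # sample standard deviation
    return {"n": n, "mean": mean(raw_data), "sd": s, "sem": s / sqrt(n)}


def verify_claim(raw_data, reported, tol=1e-2):
    """For each reported statistic, check whether the raw data
    actually yields that value (within a tolerance)."""
    derived = recompute_summary(raw_data)
    return {k: abs(derived[k] - reported[k]) <= tol for k in reported}


# Toy example: the reported mean matches the data, the reported SD does not.
data = [4.8, 5.1, 5.0, 4.9, 5.2]
report = {"mean": 5.0, "sd": 0.35}
print(verify_claim(data, report))  # {'mean': True, 'sd': False}
```

A real standard would need far more (units, exclusion criteria, the exact test used), but even this level of mechanical recomputation would catch statistics that were never derived from the data at all.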