Why You Can't Trust Most AI Studies
Posted 2 months ago · Active 2 months ago
thealgorithmicbridge.com · Tech story
Sentiment: skeptical / negative
Debate score: 40/100
Key topics
AI Research
Benchmarking
Scientific Validity
The article argues that many AI studies are untrustworthy due to biased benchmarking practices; commenters discuss the implications for the field's credibility.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion · First comment: 2h after posting · Peak period: 1 comment in 1-2h · Avg per period: 1
Key moments
- Story posted: Nov 10, 2025 at 1:54 AM EST (2 months ago)
- First comment: Nov 10, 2025 at 3:47 AM EST (2h after posting)
- Peak activity: 1 comment in 1-2h (hottest window of the conversation)
- Latest activity: Nov 10, 2025 at 5:22 AM EST (2 months ago)
ID: 45873160 · Type: story · Last synced: 11/17/2025, 5:59:00 AM
Chatbot vendors routinely make up a new benchmark, then brag about how well their hot new chatbot does on it. Like that time OpenAI's o3 model trounced the FrontierMath benchmark, and it's just a coincidence that OpenAI paid for the benchmark and got access to the questions ahead of time.
Every new model is trained hard against all the benchmarks. There is no such thing as real-world performance — there are only benchmark numbers.
https://pivot-to-ai.com/2025/11/06/oxford-pretends-ai-benchm...
I find this phrase really amusing every time I see it, since "homo unius libri" originally meant "someone who has studied one thing and mastered it," rather than "someone with narrow knowledge."