OpenAI Has Trained Its LLM to Confess to Bad Behavior
Posted 28 days ago · Active 28 days ago
technologyreview.com · News story
informative · neutral
Debate: 60/100
Key topics
Bioethics
LLM Development
AI Transparency
Discussion Activity
Light discussion
First comment: 17s after posting
Peak period: 2 comments in 0-1h
Avg / period: 2
Key moments
- 01 Story posted: Dec 6, 2025 at 10:22 AM EST (28 days ago)
- 02 First comment: Dec 6, 2025 at 10:22 AM EST (17s after posting)
- 03 Peak activity: 2 comments in 0-1h, the hottest window of the conversation
- 04 Latest activity: Dec 6, 2025 at 10:53 AM EST (28 days ago)
ID: 46173998 · Type: story · Last synced: 12/6/2025, 3:35:11 PM
"Wants to"
I find I have a weird take on this kind of anthropomorphism... It doesn't bother me when people say a rock "wants to" roll downhill. But it does when someone says an LLM "wants to"... Equally, I'm not nearly as bothered by something like the LLM "tries to"... It's a strange rule set for correct communication...
Anyway, please forgive that preemptive tangent. My primary point is:
[ citation needed ]
I remember reading a paper praising GPT for being able to explain its decision-making process. That paper provided no evidence, no arguments, and no citations for this exceptionally wild claim. How is this not just a worse version of that claim? I ask that as a real question: so many people willingly believe and state that an LLM is able to (correctly) explain its decision-making process. How? Why isn't it better to assume that's just another hallucination, especially given that the claim would be non-falsifiable?