Claude Opus 4.5, and Why Evaluating New LLMs Is Increasingly Difficult
Posted about 1 month ago | Active about 1 month ago
Original: Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult
simonwillison.net | Tech Discussion | story
informative | neutral
Debate
20/100
Key topics
Large Language Models
AI Performance Analysis
Claude Opus
Discussion Activity
Light discussion
First comment: 38m
Peak period: 1 comment in 0-1h
Avg / period: 1
Key moments
- 01 Story posted: Nov 27, 2025 at 4:04 AM EST (about 1 month ago)
- 02 First comment: Nov 27, 2025 at 4:41 AM EST (38m after posting)
- 03 Peak activity: 1 comment in 0-1h (hottest window of the conversation)
- 04 Latest activity: Nov 27, 2025 at 4:41 AM EST (about 1 month ago)
ID: 46067294 | Type: story | Last synced: 11/27/2025, 9:06:08 AM