Claude Opus 4.5, and Why Evaluating New LLMs Is Increasingly Difficult

Posted about 1 month ago | Active about 1 month ago

Original: Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult

1 point

0 comments

simonwillison.net | Tech Discussion | story

informative | neutral

Debate

20/100

Key topics

Large Language Models

AI Performance Analysis

Claude Opus

Light discussion

First comment

38m

Peak period

0-1h

Discussion (0 comments)

Discussion hasn't started yet.

ID: 46067294 | Type: story | Last synced: 11/27/2025, 9:06:08 AM
