Terminal-Bench 2.0 and Harbor
Posted about 2 months ago, active about 2 months ago
tbench.ai, Tech, story
Key topics
AI Benchmarking
Terminal-Bench
Harbor Tool
Terminal-Bench 2.0 was announced with a new evaluation approach using Harbor, significantly reshuffling the leaderboard and sparking discussion about the implications for assessing AI capabilities.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion: first comment 1s after posting, peak of 1 comment in the 0-1h period, average of 1 comment per period.
Key moments
- Story posted: Nov 11, 2025 at 4:59 AM EST (about 2 months ago)
- First comment: Nov 11, 2025 at 4:59 AM EST (1s after posting)
- Peak activity: 1 comment in 0-1h (hottest window of the conversation)
- Latest activity: Nov 11, 2025 at 4:59 AM EST (about 2 months ago)
ID: 45885723
Type: story
Last synced: 11/17/2025, 6:00:29 AM
Does anyone here have any insight on whether this genuinely reflects capabilities better? I'm asking because last I checked, Codex+gpt-5 significantly underperformed Claude Code for my use case.
[0] https://github.com/laude-institute/harbor