Not Hacker News!

AI-observed conversations & context

Daily AI-observed summaries, trends, and audience signals pulled from Hacker News so you can see the conversation before it hits your feed.

© 2026 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.

AI Benchmarking | Trending Topic on Hacker News | Not Hacker News!

AI Benchmarking

17 stories • 24h: 0% • 7d: 0 • 250 comments

Top contributors: mustaphah, blndrt, tosh, luciesim, codelensai

Related Stories

17 stories tagged with ai benchmarking

Top Model Scores May Be Skewed by Git History Leaks in SWE-bench

466 points • 153 comments • by mustaphah

Posted 4 months ago • Active about 1 month ago

Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-Mini by 22%

197 points • 65 comments • by blndrt

Posted 4 months ago • Active about 1 month ago

SWE-Bench Pro

101 points • 28 comments • by tosh

Posted 4 months ago • Active about 1 month ago

Tau² Benchmark in Action: Early Results and Key Takeaways

16 points • 0 comments • by luciesim

Posted 4 months ago • Active about 1 month ago

Benchmark AI on Your Actual Code (GPT-5, Claude, Grok, Gemini, o3)

7 points • 0 comments • by codelensai

Posted 3 months ago • Active about 1 month ago

Context-Bench: Benchmarking LLMs on Agentic Context Engineering

5 points • 0 comments • by janpio

Posted 2 months ago • Active about 2 months ago

Epoch Capabilities Index Aggregates AI Benchmark Scores Into One Metric

4 points • 0 comments • by finder83

Posted 2 months ago • Active about 1 month ago

FlashInfer Bench: A Benchmark Suite for AI Systems That Improve Themselves

4 points • 0 comments • by yiyan

Posted 3 months ago • Active about 1 month ago

Measuring What Matters: Construct Validity in Large Language Model Benchmarks

3 points • 2 comments • by Cynddl

Posted 2 months ago • Active about 2 months ago

UpBench: Dynamically Evolving Real-World Labor-Market Agentic Benchmark [pdf]

2 points • 1 comment • by pablomendes

Posted about 2 months ago • Active about 2 months ago

Claude Haiku 4.5 vs. GLM-4.6 vs. GPT-5 Mini: Job Queue System Benchmark

2 points • 0 comments • by heymax054

Posted 2 months ago • Active about 1 month ago

Gemini 2.5 Pro Still Tops Text and Vision Benchmarks

2 points • 0 comments • by robertwt7

Posted 3 months ago • Active about 1 month ago

AI Agent Benchmark Compendium

2 points • 0 comments • by nkko

Posted 3 months ago • Active about 1 month ago

Terminal-Bench 2.0 and Harbor

1 point • 1 comment • by falcor84

Posted about 2 months ago • Active about 2 months ago

IMO-Bench – Towards Robust Mathematical Reasoning

1 point • 0 comments • by stared

Posted 2 months ago • Active about 2 months ago

SEAL Showdown Technical Report (AI Benchmark) [pdf]

1 point • 0 comments • by freeqaz

Posted 4 months ago • Active about 1 month ago

MLPerf Inference v5.1 Results Land with New Benchmarks and Record Participation

1 point • 0 comments • by rbanffy

Posted 4 months ago • Active about 1 month ago
