Not Hacker News!
AI Evaluation | Trending Topic on Hacker News | Not Hacker News!
AI Evaluation
20 stories • 24h: 0% • 7d: 0 • 212 comments
Top contributors: pseudolus, jxmorris12, zlatkov, mpavlov, capybarahi
20 stories tagged with AI evaluation
Study Identifies Weaknesses in How AI Systems Are Evaluated
416 points • 192 comments • by pseudolus • Posted about 2 months ago • Active about 1 month ago
Tags: AI evaluation, LLM benchmarking, machine learning
Evals in 2025: Going Beyond Simple Benchmarks to Build Models People Can Use
80 points • 8 comments • by jxmorris12 • Posted 4 months ago • Active about 1 month ago
Tags: AI evaluation, benchmarking, model performance
Deep Dive Into G-Eval: How LLMs Evaluate Themselves
11 points • 6 comments • by zlatkov • Posted about 2 months ago • Active about 1 month ago
Tags: Large Language Models, G-Eval, AI Evaluation
Why Alpha Arena Was a Bad Benchmark
6 points • 0 comments • by mpavlov • Posted about 2 months ago • Active about 1 month ago
Tags: benchmarking, AI evaluation, Alpha Arena
Why Your AI Evals Keep Breaking
6 points • 1 comment • by capybarahi • Posted 2 months ago • Active about 1 month ago
Tags: AI evaluation, Large Language Models, machine learning
To Solve the Benchmark Crisis, Evals Must Think
6 points • 0 comments • by hsikka • Posted 2 months ago • Active about 1 month ago
Tags: AI evaluation, benchmarking, machine learning
Verse AI – Catch the AI Failures Your Evals Miss
5 points • 0 comments • by 4thabang • Posted about 2 months ago • Active about 1 month ago
Tags: AI development, AI evaluation, software observability
New Eval From SWE-bench Team Evaluates LMs Based on Goals, Not Tickets
5 points • 1 comment • by lieret • Posted about 2 months ago • Active about 1 month ago
Tags: AI evaluation, software development, reinforcement learning
Codelens.ai – Community Benchmark Comparing 6 LLMs on Real Code Tasks
5 points • 0 comments • by skrid • Posted 3 months ago • Active about 1 month ago
Tags: LLM benchmarking, AI evaluation, software development
Gaia2 and ARE: Empowering the Community to Evaluate Agents
5 points • 1 comment • by mortimerp9 • Posted 3 months ago • Active about 1 month ago
Tags: AI evaluation, open-source, machine learning
Emotional Intelligence Leaderboard for LLMs
5 points • 0 comments • by surprisetalk • Posted 4 months ago • Active about 1 month ago
Tags: LLMs, Emotional Intelligence, AI Evaluation
We Built Convolytic Because Nobody Knows If Their Voice AI Works
3 points • 2 comments • by argamd • Posted about 2 months ago • Active about 1 month ago
Tags: Voice AI, Conversational Systems, AI Evaluation
Evaluating LLM-Generated Detection Rules in Cybersecurity
3 points • 0 comments • by ianthiel • Posted 3 months ago • Active about 1 month ago
Tags: LLMs, cybersecurity, AI evaluation
Are Large Language Models Worth It?
2 points • 0 comments • by vinhnx • Posted about 1 month ago • Active about 1 month ago
Tags: large language models, AI evaluation, machine learning
Are Large Language Models Worth It?
2 points • 0 comments • by freediver • Posted about 1 month ago • Active about 1 month ago
Tags: large language models, AI evaluation, machine learning
AGCI Benchmark: Evaluating Long-Term and Adaptive Intelligence in AI Systems
2 points • 0 comments • by tempinst5 • Posted about 2 months ago • Active about 1 month ago
Tags: AI evaluation, long-term intelligence, adaptive systems
Multi-Domain Rubrics Requiring Professional Knowledge to Answer and Judge
2 points • 0 comments • by PaulHoule • Posted 2 months ago • Active about 1 month ago
Tags: AI evaluation, education technology, natural language processing
LLMs Often Know When They're Being Evaluated
2 points • 0 comments • by lawrenceyan • Posted 2 months ago • Active about 1 month ago
Tags: Large Language Models, AI evaluation, machine learning
Agentic AI: Why Evaluation Is the Make-or-Break Factor
2 points • 0 comments • by paperplaneflyr • Posted 3 months ago • Active about 1 month ago
Tags: Agentic AI, AI Evaluation, Artificial Intelligence
Thoughts on Evals
2 points • 1 comment • by chw9e • Posted 3 months ago • Active about 1 month ago
Tags: AI evaluation, LLM, testing frameworks