GDPval: Measuring the Performance of Our Models on Real-World Tasks

Posted 3 months ago · Active 3 months ago

BGyss

42 points

9 comments

openai.com · Tech · story

calm, mixed

Debate

40/100

Key topics

AI Model Evaluation

OpenAI

LLM Performance

OpenAI introduces GDPval, a framework for measuring AI model performance on real-world, economically valuable tasks, sparking discussion of its methodology and of how other models compare.

Snapshot generated from the HN discussion

Discussion Activity

Light discussion

First comment

24m

Peak period

0-1h

Key moments

01 Story posted: Sep 25, 2025 at 12:55 PM EDT (3 months ago)
02 First comment: Sep 25, 2025 at 1:19 PM EDT (24m after posting)
03 Peak activity: 4 comments in 0-1h (hottest window of the conversation)
04 Latest activity: Sep 25, 2025 at 3:17 PM EDT (3 months ago)

Discussion (9 comments)

westurner

3 months ago

1 reply

"GDPVal: Measuring AI model performance on real world economically viable tasks" (2025) https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf1...

GDP? GlobalGoals ... The Sustainable Development Goals (SDGs) include 17 goals, 169 targets, and over 230 indicators.

For strategic alignment,

Strategic alignment: https://en.wikipedia.org/wiki/Strategic_alignment

Sustainable Development Goals: https://en.wikipedia.org/wiki/Sustainable_Development_Goals

IIUC, to produce the SDGs they clustered the world's problems in an international collaborative exercise, as the successor to the MDGs (2000-2015).

Each country voluntarily produces an annual SDG report on their progress on their Targets according to the Indicators.

IMHO, priorities should include clean energy and AI efficiency, given the growth projections for AI energy use (and for our electricity bills, given the expected continued energy supply shortages).

Which real-world SDG tasks can be AI eval'd?

Snuggly73

3 months ago

1 reply

Apparently producing a React component that returns a piece of HTML with ARIA attributes set up. Long horizon my ass.

westurner

3 months ago

1 reply

Did the LLM in that case suggest adopting an open-source UI library that already has tests for and implements support for W3C ARIA accessibility features, like React-Aria or other alternatives?

Or did it just do the job as prompted and not mention suggestions for continuous improvement like reusing tested open source components?

Snuggly73

3 months ago

Not sure how it went in their tests - I've tried Opus and GPT5 and it was a few lines of React + tests, so I guess 'no'.

nextworddev

3 months ago

1 reply

Couldn’t find their open-source evals dataset.

Snuggly73

3 months ago

1 reply

https://huggingface.co/datasets/openai/gdpval/viewer/default...