Sims with verifiable rewards for web agent benchmarking and RL

1 point | 1 comment | Posted 11/19/2025, 6:02:09 PM

Discussion (1 comment)
wujerry2000
2h ago
Hi all! Sharing some of our recent work around building RL envs and sims for agent training.

There are a lot more technical details on building the benchmark in the post. If you are interested in more RL/Post-Training, I'd highly recommend reading this super in-depth blog from our partners at Yutori: https://yutori.com/blog/introducing-navigator

Some more casual thoughts and lessons:

1) A high volume of quality RL environments/sims remains one of the largest blockers to training frontier agents, especially as labs/enterprises shift towards creating increasingly specialized AI coworkers that can do real work.

2) Building an RL env is VERY different from building a high-quality dataset. While the primary inputs to dataset creation are specialized human annotators and clear rubrics, the inputs to building a great RL env involve humans, engineers, product, data, and an orchestration of all of it together. There are a lot of greenfield problems when you move from building singular environments to SCALING 1-3 orders of magnitude.

3) There is a constant push/pull between building tasks that are easily verifiable and building tasks that are realistic. It's sort of like a 2x2 grid: the best (and most valuable) tasks are both realistic and verifiable (a rough verifier sketch follows this list). There are constant tradeoffs being made, and we often find ourselves limited in the realistic tasks we can build when they lack a clear verifier. I'm reminded of Jason Wei's post here: https://www.jasonwei.net/blog/asymmetry-of-verification-and-...

4) When it comes to building browser sims, we found the hardest challenges come NOT from mimicking the front-end components but from creating a realistic distribution of data to sit on top of them. Although not immediately obvious, this makes a lot of sense: when building Noodle Flights, the front-end UI was (although non-trivial) manageable to create, but modeling the distribution of complex flight data was infinitely harder (a toy data-seeding sketch also follows this list).

5) It's an iterative process. Building a perfect sim/verifier out of the gate is very difficult, and a large part of the RL process is shepherding/QA of specific tasks and verifiers. The best way to do this is to constantly review trajectories and spot false positives/negatives (a sketch of that audit loop is below as well). This is tedious work, but it's often front-loaded - until you see smooth gains :)
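
To make (3) a bit more concrete, here's a rough sketch of what a programmatic verifier for a flight-booking sim task can look like. The names and fields are illustrative, not our actual harness; the point is that reward comes from checking the sim's final state, not from parsing the agent's transcript:

    # Illustrative verifier for a flight-booking sim task (names are made up).
    # Reward is computed from the sim's final database state.
    from dataclasses import dataclass

    @dataclass
    class Task:
        origin: str
        destination: str
        date: str          # "YYYY-MM-DD"
        max_price: float   # budget constraint the agent must satisfy

    def verify(task: Task, final_state: dict) -> float:
        """Return 1.0 if any booking in the sim satisfies the task spec, else 0.0."""
        for b in final_state.get("bookings", []):
            if (b["origin"] == task.origin
                    and b["destination"] == task.destination
                    and b["date"] == task.date
                    and b["price"] <= task.max_price):
                return 1.0
        return 0.0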
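
On (4), the gap between "build the UI" and "build the data underneath it" shows up even in a toy version of the problem. This is a deliberately simplified sketch (the routes, fare model, and parameters are invented for illustration) of seeding a flight sim with structured rather than uniform data:

    # Toy data seeding for a flight sim: structure, not uniform noise.
    # Routes, fare model, and all parameters are illustrative assumptions.
    import random

    ROUTES = [("SFO", "JFK", 2586), ("SFO", "LAX", 337), ("JFK", "LHR", 3451)]

    def generate_flights(n_per_route: int = 50) -> list[dict]:
        flights = []
        for origin, dest, miles in ROUTES:
            base = 50 + 0.12 * miles  # fares roughly scale with distance
            for _ in range(n_per_route):
                # more departures mid-day, heavy-tailed prices
                hour = random.choices(range(24), weights=[1]*6 + [4]*14 + [2]*4)[0]
                price = round(base * random.lognormvariate(0, 0.25), 2)
                flights.append({
                    "origin": origin, "destination": dest,
                    "depart_hour": hour, "price": price,
                    "nonstop": random.random() < 0.6,
                })
        return flights

The real thing has to capture correlations the toy ignores (seasonality, layovers, fare classes, seat availability), which is exactly where the difficulty lives.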
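
And for (5), the QA loop is mostly bookkeeping: run the verifier over trajectories you've also spot-checked by hand, and triage the disagreements. A minimal sketch, with hypothetical field names:

    # Sketch of the trajectory audit loop: surface verifier/human disagreements.
    # Field names ("verifier_reward", "human_label", "id") are hypothetical.

    def audit(trajectories: list[dict]) -> dict:
        false_pos, false_neg = [], []
        for t in trajectories:
            v = t["verifier_reward"] >= 0.5   # verifier says the task succeeded
            h = t["human_label"]              # human reviewer says it succeeded
            if v and not h:
                false_pos.append(t["id"])     # verifier too lenient -> tighten checks
            elif h and not v:
                false_neg.append(t["id"])     # verifier too strict -> fix/relax checks
        return {"false_positives": false_pos, "false_negatives": false_neg}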

Have lots more thoughts, but these were just top of mind today. If this work is interesting, always happy to chat (we're also hiring)!
