Open-Sourced 3k Human Computer-Use Tasks Dataset for Training GUI Agents
Posted about 2 months ago · Source: huggingface.co · Type: Tech story · Sentiment: supportive, positive
Key topics: AI, GUI Agents, Dataset Release
The author released a dataset of 3k human computer-use tasks to train GUI agents, sparking interest and appreciation from the community.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion · First comment: N/A · Peak period: 1 comment (Start) · Avg per period: 1
Key moments
- Story posted: Nov 7, 2025 at 2:32 PM EST (about 2 months ago)
- First comment: Nov 7, 2025 at 2:32 PM EST (0s after posting)
- Peak activity: 1 comment in the opening window (hottest window of the conversation)
- Latest activity: Nov 7, 2025 at 2:32 PM EST (about 2 months ago)
ID: 45850097 · Type: story · Last synced: 11/17/2025, 7:57:02 AM
The dataset has 3,167 completed tasks: 2,220 browser tasks and 947 desktop application tasks, spanning 294 websites and 173 applications. Domains include shopping sites, research tools, productivity suites (Office, email, etc.), and other everyday software that people actually use.
For each task, we provide full screen-recording video (about 17 GB total), around 14k screenshots at key action moments, roughly 2k DOM snapshots for web tasks, detailed keyboard and mouse event logs with timestamps, and system metadata for the recording machine. In total the release is 49.2 GB and is MIT licensed.
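To illustrate how timestamped keyboard and mouse event logs like these might be consumed, here is a minimal sketch that replays a log into an ordered action trace. The field names (`t_ms`, `type`, `key`, etc.) are assumptions for illustration, not the release's actual schema.

```python
import json

# Hypothetical, simplified event records; the real dataset's schema may differ.
raw_log = """
[
  {"t_ms": 0,    "type": "mouse_move",  "x": 512, "y": 300},
  {"t_ms": 120,  "type": "mouse_click", "x": 512, "y": 300, "button": "left"},
  {"t_ms": 950,  "type": "key_press",   "key": "h"},
  {"t_ms": 1010, "type": "key_press",   "key": "i"}
]
"""

def to_action_trace(events):
    """Collapse raw events into a compact, ordered action trace:
    clicks become ('click', x, y) and consecutive key presses are
    merged into a single ('type', text) action; moves are dropped."""
    trace, buffer = [], []
    for ev in sorted(events, key=lambda e: e["t_ms"]):
        if ev["type"] == "key_press":
            buffer.append(ev["key"])
            continue
        if buffer:  # a non-key event ends the current typed-text run
            trace.append(("type", "".join(buffer)))
            buffer = []
        if ev["type"] == "mouse_click":
            trace.append(("click", ev["x"], ev["y"]))
    if buffer:
        trace.append(("type", "".join(buffer)))
    return trace

print(to_action_trace(json.loads(raw_log)))  # [('click', 512, 300), ('type', 'hi')]
```

This kind of collapsing is a common first preprocessing step before pairing action traces with the screenshots at key action moments.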
The data was captured with Captr, our own screen recorder, which we have also open sourced for both macOS and Windows: https://github.com/anaishowland/Captr_MacOS https://github.com/anaishowland/Captr_Windows
Docs and small usage examples for loading the dataset with the Hugging Face datasets library are here: https://github.com/anaishowland/computeruse-data-psai
Intended use cases are reinforcement learning from human computer interactions, training and fine-tuning GUI agents, and benchmark-style evaluation of existing models on realistic multi-step tasks.
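For the benchmark-style evaluation use case, one simple baseline metric is the exact step-match rate between a model's predicted action sequence and the recorded human one. A hedged sketch follows; the action encoding shown is an assumption for illustration, not something defined by the release.

```python
def step_match_rate(predicted, reference):
    """Fraction of reference steps the model reproduced exactly,
    compared position by position (strict and order-sensitive)."""
    if not reference:
        return 1.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Hypothetical traces: the agent matches the first two human steps.
human = [("click", "search_box"), ("type", "wireless mouse"), ("press", "Enter")]
agent = [("click", "search_box"), ("type", "wireless mouse"), ("click", "logo")]
print(step_match_rate(agent, human))  # 2 of 3 steps match -> ~0.667
```

Stricter or fuzzier variants (e.g. matching clicks by target element rather than exact coordinates) would build on the same comparison loop.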
Happy to answer questions about how we recorded, cleaned, and structured the data, and would love to hear if anyone ends up using it.