olmOCR 2: Unit Test Rewards for Document OCR
Source: allenai.org
The Allen Institute for AI is releasing olmOCR 2, a document OCR system trained with unit tests as reward signals, and the community is generally supportive of the development.
Posted Oct 22, 2025 at 1:24 PM EDT
We’re rolling out *olmOCR 2*—the next major update to our open OCR model for complex documents & scans.
olmOCR 2 turns messy files with tables, equations, handwriting, and more into clean text. Under the hood, we combine synthetic data with unit tests as verifiable rewards to push state-of-the-art performance on challenging docs.
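To make "unit tests as verifiable rewards" concrete, here is a minimal sketch of how such a reward could be computed: each page gets a set of pass/fail checks, and the reward for a candidate transcription is the fraction of checks it passes. This is an illustration, not olmOCR 2's training code. The `PageTest` type, the three check helpers, and the sample page text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageTest:
    """One pass/fail check on the OCR text of a page (hypothetical type)."""
    name: str
    check: Callable[[str], bool]


def text_present(snippet: str) -> Callable[[str], bool]:
    # Passes when the snippet appears verbatim in the transcription.
    return lambda text: snippet in text


def text_absent(snippet: str) -> Callable[[str], bool]:
    # Passes when the snippet (e.g. a running footer) was left out.
    return lambda text: snippet not in text


def appears_before(first: str, second: str) -> Callable[[str], bool]:
    # Passes when both snippets occur and `first` precedes `second`.
    def check(text: str) -> bool:
        i, j = text.find(first), text.find(second)
        return i != -1 and j != -1 and i < j
    return check


def unit_test_reward(ocr_text: str, tests: List[PageTest]) -> float:
    """Fraction of tests passed; 0.0 when there are no tests."""
    if not tests:
        return 0.0
    passed = sum(1 for t in tests if t.check(ocr_text))
    return passed / len(tests)


# Invented sample page and checks, for illustration only.
tests = [
    PageTest("date is transcribed", text_present("January 10th")),
    PageTest("footer is dropped", text_absent("Page 3 of 12")),
    PageTest("reading order", appears_before("Invoice", "Total due")),
]

candidate_a = "Invoice\nJanuary 10th\nTotal due: 40"
# Misreads "10th" as "1Oth" (letter O) and keeps the footer.
candidate_b = "Invoice\nJanuary 1Oth\nTotal due: 40\nPage 3 of 12"

reward_a = unit_test_reward(candidate_a, tests)
reward_b = unit_test_reward(candidate_b, tests)
print(reward_a, reward_b)
```

Because every check is a deterministic function of the output text, the score can be verified without a human grader or a learned judge, which is what makes it usable as a reward signal.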
*What’s new*
◆ *Stronger text recognition:* Trained with a new data mix, including 20,000 historical pages for better coverage of aged and degraded materials. Example: olmOCR 2 can now read Abraham Lincoln’s handwriting correctly, recovering the date “January 10th” in his 1864 letter to Major General Hitchcock.
◆ *Big benchmark gains:* 82.4 on olmOCR-Bench (up from 78.5), with improvements across every document category.
◆ *Faster & cheaper:* New FP8 quantized model (olmOCR-2-7B-1025-FP8) reaches ~3,400 output tokens/sec on a single H100—enough to process 10,000 pages for < $2.
◆ *Adapt to your data:* Want to fine-tune for your domain? We provide everything you need to customize and deploy.
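The "Faster & cheaper" figures can be sanity-checked with a back-of-the-envelope calculation. The sketch below takes the ~3,400 output tokens/sec and 10,000 pages from the post; the tokens-per-page figure and the hourly H100 price are assumptions chosen for illustration, since the post states neither.

```python
def cost_for_pages(pages: int, tokens_per_page: float,
                   tokens_per_sec: float, gpu_usd_per_hour: float) -> float:
    """Estimated GPU cost in USD to generate OCR output for `pages` pages."""
    seconds = pages * tokens_per_page / tokens_per_sec
    return seconds / 3600 * gpu_usd_per_hour


def max_tokens_per_page(budget_usd: float, pages: int,
                        tokens_per_sec: float, gpu_usd_per_hour: float) -> float:
    """Largest average page length (in output tokens) that fits the budget."""
    seconds = budget_usd / gpu_usd_per_hour * 3600
    return seconds * tokens_per_sec / pages


TOKENS_PER_SEC = 3400      # from the post: ~3,400 output tokens/sec on one H100
PAGES = 10_000             # from the post
TOKENS_PER_PAGE = 600      # ASSUMPTION: average output tokens per page
GPU_USD_PER_HOUR = 2.50    # ASSUMPTION: rental price of one H100

cost = cost_for_pages(PAGES, TOKENS_PER_PAGE, TOKENS_PER_SEC, GPU_USD_PER_HOUR)
limit = max_tokens_per_page(2.00, PAGES, TOKENS_PER_SEC, GPU_USD_PER_HOUR)
print(f"estimated cost: ${cost:.2f}")
print(f"a $2 budget allows up to {limit:.0f} output tokens per page")
```

Under these assumed inputs the estimate stays below the $2 mark quoted in the post; a different GPU price or denser pages would shift the result proportionally.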
olmOCR 2 is available now, including through the DeepInfra & Parasail APIs. We’re also updating our demo, so try olmOCR 2 today!
Learn more: https://allenai.org/blog/olmocr-2
Model: https://huggingface.co/allenai/olmOCR-2-7B-1025-FP8