Show HN: OCR Arena – A playground for OCR models
ocrarena.ai

I didn't expect IBM to be making relevant AI models, but this thing is priced at $1 per 4,000,000 output tokens... I'm using it to transcribe handwritten text and it works very well and very fast.
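To put that price in context, a back-of-envelope calculation (the tokens-per-page figure is my rough guess, not a measured number):

```python
price_per_token = 1 / 4_000_000   # $1 per 4M output tokens, per the listed price
tokens_per_page = 1_000           # assumption: rough guess for a dense page
pages_per_dollar = 1 / (price_per_token * tokens_per_page)
print(pages_per_dollar)           # -> 4000.0 pages per dollar
```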
It would be super nice if it worked for our use case, which is simply getting the full output.
We had Mistral previously but had to remove it because their hosted OCR API was very unstable and unfortunately returned a lot of garbage results.
Paddle, Nanonets, and Chandra are being added shortly!
Also, some of the models are prone to infinite loops, which I suspect are not being penalized as harshly as they should be, because the user will eventually get bored and leave (or their browser will lock up) before they can pick a winner.
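One cheap guardrail (a minimal sketch, not something OCR Arena necessarily does) would be to flag degenerate repetition in the output and count it as an automatic loss:

```python
from collections import Counter

def looks_degenerate(text: str, n: int = 20, threshold: float = 0.3) -> bool:
    """Heuristic: flag output where a single word n-gram accounts for
    more than `threshold` of all n-grams, which is typical of a model
    stuck repeating itself."""
    words = text.split()
    if len(words) < n * 2:
        return False
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    _, top_count = Counter(ngrams).most_common(1)[0]
    return top_count / len(ngrams) > threshold
```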
Still, a really cool resource - I'm looking forward to more models being added.
I have it verify some stamps that are quite messy and sometimes obscured; honestly, some of them I could not even read myself.
I assume that to do that you'd need another model to do language detection on the inputs and/or outputs, but a language detection model can be a lot cheaper than an OCR model or an LLM.
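For instance, a hedged sketch using the langdetect package (just one of several cheap options) to classify OCR output after the fact:

```python
# pip install langdetect
from langdetect import detect, DetectorFactory

DetectorFactory.seed = 0  # make detection deterministic across runs

def route_by_language(ocr_output: str) -> str:
    """Cheap post-hoc language check on OCR output; returns an
    ISO 639-1 code like 'en' or 'de' that could gate which models
    get shown for a given document."""
    return detect(ocr_output)
```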
The UX on mobile isn't great. It wasn't obvious to me where the second model's output was, and I was thrown off even more because the option to vote for model 1's output was presented before I had ever seen model 2's output.
Second suggestion would be to install a MathJax plugin so one can properly rate mathematical equations and formulas. Raw LaTeX is easy to misread, and it makes comparing LaTeX and Unicode outputs hard.
Working on a hobby project that interacts with user handwriting on <canvas>. Tried some CNN models for digits but had trouble with characters.
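In case it helps, a minimal sketch of moving from digits to letters by training on EMNIST instead of MNIST (torchvision ships the dataset; the architecture here is an illustrative guess, not a recommendation):

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# EMNIST "letters" split: 26 classes (a-z), same 28x28 format as MNIST
train = datasets.EMNIST(root="data", split="letters", train=True,
                        download=True, transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train, batch_size=64, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
    nn.Linear(128, 27),  # EMNIST letters labels run 1..26, so 27 outputs
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
    break  # one step shown; a real run loops over epochs
```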
I don't know what the state of the art is, but an old model built for digitizer pens might not do too badly either.
Note that I haven't tried any of them, but tesseract is still likely the leading open-source OCR that runs on CPU.
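A quick-start via the pytesseract wrapper, if anyone wants a baseline to compare against (the file name is just a placeholder):

```python
# pip install pytesseract pillow  (plus the tesseract binary itself)
from PIL import Image
import pytesseract

# Plain text extraction; lang codes like "eng+deu" combine languages
text = pytesseract.image_to_string(Image.open("scan.png"), lang="eng")
print(text)
```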
Some results look plausible but are just plain wrong. That is worse than useless.
Example: the "Table" sample document contains chemical substances and their properties. How many numbers did the LLM output and associate correctly? That is all that matters. There is no "preference" aspect that is relevant until the data is correct. Nicely formatted incorrect data is still incorrect.
I reviewed the output from Qwen3-VL-8B on this document. It contains numbers in the wrong rows. I presume using its output would be incredibly dangerous. It should not be used for this purpose. There is no winning aspect to it. Does another model produce worse results? Avoid both models at all costs.
Are there models available that are accurate enough for this purpose? I don't know; it is very time-consuming to evaluate. This particular table seems quite legible. A real production-grade OCR solution would probably need a 100% score on this example before it could be adopted. The output of such a table is not something humans can realistically review. It either needs to work completely, or it does not work at all.
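That suggests scoring these samples by cell-level exact match against ground truth rather than by preference. A minimal sketch of such a check (the ground-truth rows here are hypothetical, not taken from the actual sample):

```python
def cell_accuracy(predicted: list[list[str]], truth: list[list[str]]) -> float:
    """Fraction of ground-truth cells reproduced exactly, in the right
    row and column; for this use case anything below 1.0 is a failure."""
    total = sum(len(row) for row in truth)
    correct = 0
    for r, truth_row in enumerate(truth):
        for c, cell in enumerate(truth_row):
            try:
                if predicted[r][c].strip() == cell.strip():
                    correct += 1
            except IndexError:
                pass  # missing row or column counts as wrong
    return correct / total

# Hypothetical ground truth: substance name and boiling point (deg C)
truth = [["Ethanol", "78.37"], ["Methanol", "64.7"]]
pred  = [["Ethanol", "78.37"], ["Methanol", "64.7"]]
assert cell_accuracy(pred, truth) == 1.0
```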
I noticed that some models resisted faking data better than others. In particular, on a sentence cut off by the edge of the document, GPT-5 invented the end of the sentence while Opus correctly showed it as cut off.
I didn't try it with my own handwriting, but the playground has one example, and some models read it better than I could.
I wish the output showed the model's confidence for each part. I think it would help immensely.
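Classical engines already expose something like this; for example, Tesseract reports a per-word confidence that a UI could render as highlighting. A small pytesseract sketch (file name is a placeholder):

```python
from PIL import Image
import pytesseract

data = pytesseract.image_to_data(Image.open("scan.png"),
                                 output_type=pytesseract.Output.DICT)
for word, conf in zip(data["text"], data["conf"]):
    if word.strip():
        print(f"{word}\t{conf}")  # conf is 0-100; low values could be flagged
```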
Note that sometimes a model gets stuck in a loop, which prevents you from voting and from seeing which model is which
[see https://news.ycombinator.com/item?id=45988611 for explanation]
I've had great results locally, although you need macOS >= 13 for this.
But still, this is incredibly useful!
Just this morning I came across HunyuanOCR, which sounded very promising: https://huggingface.co/tencent/HunyuanOCR
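For anyone who wants to poke at it, a hedged sketch of just pulling the weights down; how to actually run inference will depend on the model card, which I haven't verified:

```python
from huggingface_hub import snapshot_download

# Downloads the repository contents to the local HF cache;
# check the model card for the intended inference stack.
path = snapshot_download("tencent/HunyuanOCR")
print(path)
```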