Scribeocr – Web Interface for Recognizing Text, OCR, & Creating Digitized Docs
Posted 3 months ago, active 3 months ago
github.com, Tech story
Key topics
OCR
Text Recognition
Document Digitization
ScribeOCR is a web interface for OCR and document digitization. In the discussion, users weigh its capabilities and limitations, share their experiences, and suggest improvements.
Snapshot generated from the HN discussion
Key moments
- Story posted: Oct 6, 2025 at 6:39 AM EDT
- First comment: Oct 9, 2025 at 11:27 PM EDT (4d after posting)
- Peak activity: 9 comments in Days 3-4
- Latest activity: Oct 27, 2025 at 8:25 AM EDT
ID: 45489881, Type: story, Last synced: 11/20/2025, 8:42:02 PM
Have you looked at EasyOCR?
A native macOS or Windows application could use the operating system's OCR facilities; in my experience, both produce far better results than Tesseract.
I have some lecture slides as an image-only PDF (Hungarian, with a sprinkling of English and biological Latin). I tried the tool on them, and this was my experience:
- Proofreading with the overlay seems like a good idea, but in practice it is unusable when the original text is colored and you need to check diacritical marks. Being able to show the original in grayscale or black and white would help. (Black and white worked, but grayscale left everything colored.)
- For proofreading, the ebook mode was the most useful: I immediately spotted lots of errors that I could not see with the overlay. A quick way to switch between the modes would be useful.
- Editing text is not efficient when the error rate is high (Hungarian is not supported, which I guess caused most of the errors); the interface has too much overhead for mass corrections.
Very good idea. I think that after a little polish it would even fit my use case. For more traditional OCR use cases than mine, it is probably already great.
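To make the grayscale and black-and-white display suggestion concrete, here is a minimal Python sketch of the two conversions. It is not taken from the tool's code: the function names, the list-of-rows pixel layout, the integer BT.601-style luma weights, and the threshold of 128 are all assumptions chosen for illustration.

```python
# Hypothetical helpers, not part of Scribe OCR: an image is modeled as a
# list of rows, each row a list of (r, g, b) tuples with values 0-255.

def to_grayscale(pixels):
    """Replace each pixel with its luma, using integer BT.601-style weights."""
    gray_rows = []
    for row in pixels:
        gray_row = []
        for r, g, b in row:
            # Integer arithmetic avoids floating-point rounding surprises.
            luma = (299 * r + 587 * g + 114 * b) // 1000
            gray_row.append((luma, luma, luma))
        gray_rows.append(gray_row)
    return gray_rows


def to_black_and_white(pixels, threshold=128):
    """Binarize: luma below the (assumed) threshold becomes black, else white."""
    bw_rows = []
    for row in to_grayscale(pixels):
        bw_rows.append(
            [(0, 0, 0) if luma < threshold else (255, 255, 255)
             for luma, _, _ in row]
        )
    return bw_rows


if __name__ == "__main__":
    # Colored text on a white page: red and blue glyph pixels next to paper.
    page = [[(255, 0, 0), (255, 255, 255), (0, 0, 255)]]
    print(to_grayscale(page))
    print(to_black_and_white(page))
```

With either conversion, saturated red or blue text turns dark against a light page, which is what would keep a colored recognition overlay distinguishable from the original, including small marks such as diacritics.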