I'm using tesseract-ocr-3.01 to scan many forms. The forms all follow a template, so I already know where the regions/rectangles of text are.
Is there a way to pass those regions to tesseract when using the command-line tool?
I found the answer, thanks to this thread.
It seems that tesseract suports the uzn format (used in the unvl tests).
From the thread:
Calling tesseract with parameter "-psm 4" and renaming the uzn file with the same name of the image seem works.
Example: If we have C:\input.tif
and C:\input.uzn
, we do this:
tesseract -psm 4 C:\input.tif C:\output