Tesseract: Specifying regions of text

ocr tesseract

sashoalm · Oct 19, 2012

I'm using tesseract-ocr-3.01 to scan many forms. The forms all follow a template, so I already know where the regions/rectangles of text are.

Is there a way to pass those regions to tesseract when using the command-line tool?

Answer

I found the answer, thanks to this thread.

It seems that Tesseract supports the uzn zone-file format (used in the UNLV tests).
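Since the form template already gives the rectangles, the .uzn file can be generated instead of written by hand. This is a minimal sketch. It assumes the layout I understand Tesseract's UNLV zone reader to expect: one zone per line as `left top width height label`, in pixels, measured from the top-left corner of the image, with a type label such as `Text` at the end. The helper names `format_uzn` and `write_uzn` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Tuple

# (left, top, width, height, label) in pixels, origin at the image's top-left
# corner. Assumption: this is the layout Tesseract's UNLV zone reader expects.
Zone = Tuple[int, int, int, int, str]


def format_uzn(zones: Iterable[Zone]) -> str:
    """Render zones as .uzn text: one 'left top width height label' line each."""
    lines = []
    for left, top, width, height, label in zones:
        if width <= 0 or height <= 0:
            raise ValueError(f"zone must have a positive size: {(left, top, width, height)}")
        lines.append(f"{left} {top} {width} {height} {label}")
    return "\n".join(lines) + "\n"


def write_uzn(uzn_path: str, zones: Iterable[Zone]) -> Path:
    """Write the zones to uzn_path and return it as a Path."""
    path = Path(uzn_path)
    path.write_text(format_uzn(zones))
    return path
```

For example, `write_uzn("input.uzn", [(120, 80, 600, 45, "Text"), (120, 160, 300, 45, "Text")])` would write two zones. The coordinates are made up and would come from your own template.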

From the thread:

Calling tesseract with the parameter "-psm 4" and giving the uzn file the same name as the image seems to work.

Example: If we have C:\input.tif and C:\input.uzn, we do this:

tesseract -psm 4 C:\input.tif C:\output
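When many forms are processed in a batch, the same call can be driven from a script. This is a sketch under a few assumptions: a `tesseract` executable is on the PATH, and the function names are hypothetical. It checks the naming rule from the thread (the .uzn file shares the image's base name) before running. Two further hedges. First, the option is placed after the output base, following the 3.x usage message `tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]`. Second, Tesseract 4.0 and later spell the option `--psm`, so the flag is a parameter. I believe newer releases still read the .uzn file, but that is worth verifying on your version.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def tesseract_zone_command(image: str, output_base: str, psm: int = 4,
                           legacy_flag: bool = True) -> Tuple[List[str], Path]:
    """Return the argument list and the .uzn path Tesseract should pick up."""
    image_path = Path(image)
    # The zone file must have the same base name as the image: input.tif -> input.uzn
    uzn_path = image_path.with_suffix(".uzn")
    # Tesseract 3.x spells the option "-psm"; 4.0 and later use "--psm".
    flag = "-psm" if legacy_flag else "--psm"
    cmd = ["tesseract", str(image_path), str(output_base), flag, str(psm)]
    return cmd, uzn_path


def run_with_zones(image: str, output_base: str, psm: int = 4,
                   legacy_flag: bool = True) -> None:
    """Run Tesseract on one image, refusing to start if its .uzn file is missing."""
    cmd, uzn_path = tesseract_zone_command(image, output_base, psm, legacy_flag)
    if not uzn_path.is_file():
        raise FileNotFoundError(f"no zone file next to the image: {uzn_path}")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(cmd, check=True)
```

Calling `run_with_zones(r"C:\input.tif", r"C:\output")` corresponds to the command above, with the recognized text written to C:\output.txt.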