I want to find a easy-to-use OCR python module in linux, I have found pytesser http://code.google.com/p/pytesser/, but it contains a .exe executable file.
I tried changed the code to use wine, and it really works, but it's too slow and really not a good idea.
Is there any Linux alternatives that as easy-to-use as it?
You can just wrap tesseract
in a function:
import os
import tempfile
import subprocess
def ocr(path):
temp = tempfile.NamedTemporaryFile(delete=False)
process = subprocess.Popen(['tesseract', path, temp.name], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
process.communicate()
with open(temp.name + '.txt', 'r') as handle:
contents = handle.read()
os.remove(temp.name + '.txt')
os.remove(temp.name)
return contents
If you want document segmentation and more advanced features, try out OCRopus.