Getting started with Python OCR on Windows?

Ryan · Jun 28, 2015

I have never used Python before, and I am not sure where to start. My goal is to take images of numbers on multicolored backgrounds and reliably identify the correct characters. I looked into the tools needed for this and found the Anaconda Python distribution, which included all the packages I might need, as well as tesseract-ocr and pytesser.

Unfortunately, I'm lost on how to begin. I'm using the PyCharm Community IDE and simply trying to follow this guide: http://www.manejandodatos.es/2014/11/ocr-python-easy/ to get a grasp on OCR.

This is the code I'm using:

from PIL import Image
from pytesser import *

image_file = 'menu.jpg'
im = Image.open(image_file)
text = image_to_string(im)
text = image_file_to_string(image_file)
text = image_file_to_string(image_file, graceful_errors=True)
print "=====output=======\n"
print text

I believe the Anaconda distribution that I'm using includes PIL, but I'm getting this error:

C:\Users\diego_000\Anaconda\python.exe C:/Users/diego_000/PycharmProjects/untitled/test.py
Traceback (most recent call last):
  File "C:/Users/diego_000/PycharmProjects/untitled/test.py", line 2, in <module>
    from pytesser import *
  File "C:\Users\diego_000\PycharmProjects\untitled\pytesser.py", line 6, in <module>
    import Image
ImportError: No module named Image

Process finished with exit code 1

Can anyone point me in the right direction?

Answer

msw · Jun 28, 2015

The document you point to says to use

from PIL import Image

except the line that fails (line 6 of the pytesser.py in your project, according to your traceback) uses

import Image

and so the interpreter properly says:

ImportError: No module named Image

It looks as if you reordered the lines

from PIL import Image
from pytesser import *

and that pytesser has an improperly coded dependency on PIL, but I can't be certain from the code you provided.
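If the cause is the bare `import Image` on line 6 of pytesser.py, the likely fix is to change that line to `from PIL import Image`. To check which import style your interpreter actually supports before editing anything, a minimal sketch along these lines may help. The helper name `load_image_module` is made up for illustration and is not part of PIL or pytesser, and I'm assuming your Anaconda install exposes the imaging library under the `PIL` package.

```python
import importlib


def load_image_module():
    """Return the imaging library's Image module, or None if neither style works.

    Hypothetical helper for diagnosis only; not part of PIL or pytesser.
    """
    # Try the package-style name first, then the legacy top-level name.
    for name in ("PIL.Image", "Image"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


Image = load_image_module()
if Image is None:
    print("Neither 'from PIL import Image' nor 'import Image' works here")
else:
    # The module name shows which import style succeeded.
    print("Imported image module as: " + Image.__name__)
```

If this reports `PIL.Image`, then only the package-style import works in your environment, and pytesser.py needs the same `from PIL import Image` line that your own script already uses.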