I have never used Python before, and I am not sure where to start. My goal is to take image data, numbers on a multicolored background, and reliably identify the correct characters. I looked into the tools needed for this and found the Anaconda Python distribution, which includes all the packages I might need, as well as tesseract-ocr and pytesser.
Unfortunately, I'm lost on how to begin. I'm using the PyCharm Community IDE and simply trying to follow this guide: http://www.manejandodatos.es/2014/11/ocr-python-easy/ to get a grasp of OCR.
This is the code I'm using:
from PIL import Image
from pytesser import *
image_file = 'menu.jpg'
im = Image.open(image_file)
text = image_to_string(im)
text = image_file_to_string(image_file)
text = image_file_to_string(image_file, graceful_errors=True)
print "=====output=======\n"
print text
I believe the Anaconda distribution that I'm using includes PIL, but I'm getting this error:
C:\Users\diego_000\Anaconda\python.exe C:/Users/diego_000/PycharmProjects/untitled/test.py
Traceback (most recent call last):
File "C:/Users/diego_000/PycharmProjects/untitled/test.py", line 2, in <module>
from pytesser import *
File "C:\Users\diego_000\PycharmProjects\untitled\pytesser.py", line 6, in <module>
import Image
ImportError: No module named Image
Process finished with exit code 1
Can anyone point me in the right direction?
The document you point to says to use
from PIL import Image
but line 6 of your pytesser.py, per the traceback, uses
import Image
and so the interpreter properly says:
ImportError: No module named Image
It looks as if you reordered the lines
from PIL import Image
from pytesser import *
and that pytesser has an improperly coded dependency on PIL, but I can't be certain from the code you provided.
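If the bare import at line 6 of pytesser.py is indeed the culprit, one way to make it tolerant of both layouts is to try the package-qualified name first and fall back to the bare module name. This is only a minimal sketch, assuming you can edit your local copy of pytesser.py; `import_first` is a hypothetical helper written for illustration, not part of PIL or pytesser:

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported.

    Hypothetical helper for illustration only: it tries each dotted
    module name in order and raises ImportError if none of them work.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This name is not installed; try the next candidate.
            continue
    raise ImportError("none of these modules could be imported: "
                      + ", ".join(names))
```

In pytesser.py, `Image = import_first("PIL.Image", "Image")` could then stand in for the bare `import Image`. If you only ever run against the PIL that ships with Anaconda, simply changing that line to `from PIL import Image` should be enough.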