Importing pytesseract

ComplexData · Aug 5, 2016 · Viewed 15.9k times

I am trying to use pytesseract for OCR (extracting text from an image). I installed pytesseract successfully using the command:

pip install pytesseract

When I try to install it again, pip reports:

Requirement already satisfied (use --upgrade to upgrade): pytesseract in ./site-packages

This means pytesseract is installed. But when I try to import the package in my IPython notebook using:

import pytesseract

It throws an error:

ImportError: No module named pytesseract

Why is that happening?

Answer

ajlaj25 · Aug 11, 2017

Python-tesseract requires Python 2.5+ or Python 3.x. First, install the Pillow (PIL) and pytesseract packages with pip:

pip install Pillow
pip install pytesseract
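
If the import still fails after a successful install, one possible cause is that pip installed the package into a different Python than the one the notebook is running. Here is a minimal sketch for checking that from inside the notebook, using only the standard library. The helper names pip_install_command and is_importable are made up for this illustration and are not part of pytesseract:

```python
import importlib.util
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def is_importable(module_name):
    """Return True if this interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which Python the notebook is using and whether it sees the packages.
print("Interpreter:", sys.executable)
for name in ("PIL", "pytesseract"):
    print(name, "importable:", is_importable(name))

# Command that installs into this exact interpreter (printed, not executed).
print(" ".join(pip_install_command("pytesseract")))
```

If a package is reported as not importable, running the printed command in a terminal installs it for the interpreter the notebook actually uses.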

Then download and install the Tesseract OCR engine itself, which pytesseract only wraps:

https://sourceforge.net/projects/tesseract-ocr-alt/?source=typ_redirect

As far as I know, the installer adds Tesseract to your PATH variable automatically.
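
Since I am not certain about the PATH step, here is a small sketch for verifying it with the standard library. It assumes the executable is named tesseract, and find_tesseract is a hypothetical helper name used only here:

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    # Not on PATH: set pytesseract.pytesseract.tesseract_cmd to the full path.
    print("tesseract was not found on PATH")
else:
    print("tesseract found at", path)
```

If it is not found, you can either add the install directory to PATH or set tesseract_cmd explicitly.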

Then use it like this:

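import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract executable (needed if it is not on PATH).
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

# Open the image and print the text recognized in it.
img = Image.open('Capture.PNG')
print(pytesseract.image_to_string(img))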

I hope it helps :)