pytesseract cannot find the file specified

jason m · Dec 11, 2015 · Viewed 27.3k times

My code is straightforward:

import pytesseract
from PIL import Image

img = Image.open('C:/temp/foo.jpg')
img.load()
i = pytesseract.image_to_string(img)

and the error response I get back is:

Traceback (most recent call last):
  File "img.py", line 6, in <module>
    i = pytesseract.image_to_string(img)
  File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 161, in image_to_string
  File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 94, in run_tesseract
  File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

Any guidance would be fantastic.

Adding the Tesseract directory to my PATH variable helped: C:\Program Files (x86)\Tesseract-OCR

But the code now crashes when it reaches the pytesseract call.

Answer

MaxU · Mar 6, 2016

I just hit the same error and decided to answer this question; it might save someone some time.

First, make sure you have installed (or copied) the Tesseract-OCR executables.

The error means Windows can't find the tesseract executable in any of the directories listed in your PATH environment variable. So either make sure that the directory containing tesseract is in your PATH, or override the tesseract_cmd variable in your Python script as follows (substitute your own path):

import pytesseract

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
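
To check which of the two fixes applies on a given machine, a small helper can look for the executable before pytesseract is called. This is only a sketch: find_tesseract is a hypothetical helper (not part of pytesseract), the fallback path is the example install location from this answer, and shutil.which assumes Python 3.3 or later.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return a path to the tesseract executable, or None if not found.

    Checks PATH first, then each path in `candidates` (hypothetical
    install locations supplied by the caller).
    """
    found = shutil.which("tesseract")  # searches the directories in PATH
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Example install location taken from the answer; adjust to your machine.
exe = find_tesseract([r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"])
print(exe or "tesseract not found; install it or pass the correct path")
```

If the helper returns a path, assign it to pytesseract.pytesseract.tesseract_cmd before calling image_to_string. If it returns None, Tesseract is most likely not installed, or it lives somewhere other than the locations checked.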

Besides that, make sure the TESSDATA_PREFIX Windows environment variable is set to the directory that contains the tessdata directory. For example:

TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR

if the tessdata directory is located at: C:\Program Files (x86)\Tesseract-OCR\tessdata
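
If you would rather not change the system-wide environment, the variable can probably also be set from inside the script, since processes started through subprocess inherit the parent's environment by default. The sketch below assumes that, and follows the convention described here (the variable names the parent of tessdata). set_tessdata_prefix is a hypothetical helper, and some later Tesseract versions are reported to expect the tessdata directory itself, so check the documentation for your version.

```python
import os


def set_tessdata_prefix(install_dir):
    """Set TESSDATA_PREFIX to install_dir if it contains a tessdata folder.

    Returns True when the variable was set, False otherwise.
    """
    tessdata = os.path.join(install_dir, "tessdata")
    if not os.path.isdir(tessdata):
        return False
    # Only affects this process and the child processes it starts.
    os.environ["TESSDATA_PREFIX"] = install_dir
    return True


# Example install location taken from the answer; adjust to your machine.
ok = set_tessdata_prefix(r"C:\Program Files (x86)\Tesseract-OCR")
print("TESSDATA_PREFIX set" if ok else "no tessdata directory found there")
```

Call this before the first pytesseract call so the tesseract process sees the variable.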