I am using Python 3.x with the following code to convert an image into text:
from PIL import Image
from pytesseract import image_to_string
image = Image.open('image.png', mode='r')
print(image_to_string(image))
I am getting the following error:
Traceback (most recent call last):
  File "C:/Users/hp/Desktop/GII/Image_to_text.py", line 12, in <module>
    print(image_to_string(image))
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 1220, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Please note that I have put the image in the same directory as my Python script. Also, it does not raise an error on image = Image.open('image.png', mode='r'), but it does on the line print(image_to_string(image)).
Any idea what might be wrong here? Thanks
You have to have tesseract installed and accessible in your PATH.
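A quick way to check whether the binary is reachable is to ask Python the same question the operating system would. This is only a sketch: it assumes the executable is named tesseract, and find_tesseract is a helper name made up for this example.

```python
import shutil

def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)

path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH")
else:
    print("tesseract found at:", path)
```

If this prints that nothing was found, the FileNotFoundError from the question is the expected outcome.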
According to the source, pytesseract is merely a wrapper around subprocess.Popen that runs the tesseract binary. It does not perform any kind of OCR itself.
The relevant part of the source:
def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
    '''
    runs the command:
        `tesseract_cmd` `input_filename` `output_filename_base`
    returns the exit status of tesseract, as well as tesseract's stderr output
    '''
    command = [tesseract_cmd, input_filename, output_filename_base]
    if lang is not None:
        command += ['-l', lang]
    if boxes:
        command += ['batch.nochop', 'makebox']
    if config:
        command += shlex.split(config)
    proc = subprocess.Popen(command,
            stderr=subprocess.PIPE)
    return (proc.wait(), proc.stderr.read())
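To see why the traceback ends in FileNotFoundError, here is a minimal sketch that mimics the Popen call above. The helper try_run and the executable name no-such-tesseract-binary are made up for illustration, and the name is assumed not to exist on your machine.

```python
import subprocess

def try_run(command):
    """Start command the way run_tesseract does and report what happened."""
    try:
        proc = subprocess.Popen(command, stderr=subprocess.PIPE)
    except FileNotFoundError as exc:
        # Raised by Popen itself when the executable cannot be located,
        # so the image file is never even looked at.
        return "not found: {}".format(type(exc).__name__)
    proc.communicate()  # wait for the process and drain stderr
    return "ran with exit status {}".format(proc.returncode)

print(try_run(["no-such-tesseract-binary", "image.png", "out"]))
```

The exception comes from starting the process, which matches the question: Image.open succeeds and only image_to_string fails.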
Quoting another part of source:
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'
So a quick way to change the tesseract path would be:
import pytesseract
# tesseract_cmd is a global of the pytesseract.pytesseract submodule, so set it there
pytesseract.pytesseract.tesseract_cmd = "/absolute/path/to/tesseract"  # this should be done only once
pytesseract.image_to_string(img)
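If you would rather not hard-code a single path, one option is to try a few likely locations and use the first one that exists. This is a sketch under assumptions: the paths in CANDIDATES are hypothetical install locations, and first_existing and configure are helper names invented here, not part of pytesseract.

```python
import os

# Hypothetical install locations, listed only for illustration.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
]

def first_existing(paths):
    """Return the first path that is an existing file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None

def configure(paths=CANDIDATES):
    """Point pytesseract at the first existing candidate; return it, or None."""
    found = first_existing(paths)
    if found is not None:
        # Imported lazily so the path lookup works without pytesseract installed.
        import pytesseract
        pytesseract.pytesseract.tesseract_cmd = found
    return found
```

Call configure() once at startup, before the first image_to_string call. If it returns None, none of the candidates exist and tesseract still needs to be installed.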