What is the difference between Pytesseract and Tesserocr?

Soufiane Sabiri · Feb 19, 2019 · Viewed 7.4k times

I'm using Python 3.6 on Windows 10 and already have Pytesseract installed, but I came across Tesserocr in some code and, incidentally, I can't install it. What is the difference between the two?

Answer

Houssam ASSANY · May 31, 2019

In my experience, Tesserocr is much faster than Pytesseract.

Tesserocr is a Python wrapper around the Tesseract C++ API, whereas pytesseract is a wrapper around the tesseract-ocr command-line interface (CLI).

Therefore, with Tesserocr you can load the model once at the beginning of your program and then run it on each image separately (for example, in a loop that processes video frames). With pytesseract, every call to the image_to_string function loads the model again before processing the image, which makes it slower for video processing.
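As a rough sketch of the "load once, run many times" pattern described above: the names ocr_frames, ocr_image_files and OcrApi are hypothetical and only for illustration, while SetImage and GetUTF8Text are the tesserocr calls used in this answer. It assumes tesserocr and Pillow are installed when ocr_image_files is called.

```python
from typing import Iterable, List, Protocol


class OcrApi(Protocol):
    """Minimal interface this sketch relies on (hypothetical name).

    tesserocr.PyTessBaseAPI provides both of these methods.
    """

    def SetImage(self, image) -> None: ...
    def GetUTF8Text(self) -> str: ...


def ocr_frames(api: OcrApi, frames: Iterable) -> List[str]:
    """Run OCR on every frame using one already-initialised API object."""
    texts = []
    for frame in frames:
        api.SetImage(frame)              # swap in the next image; the model stays loaded
        texts.append(api.GetUTF8Text())  # recognise the image that was just set
    return texts


def ocr_image_files(paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: load the model once, then OCR each image file."""
    import tesserocr                     # imported lazily so the sketch loads without it
    from PIL import Image

    with tesserocr.PyTessBaseAPI() as api:  # the model is loaded here, once
        return ocr_frames(api, (Image.open(p) for p in paths))
```

For video, the frames would be PIL images built from each decoded frame. The point is only that the API object is created outside the loop.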

To install tesserocr, I just typed pip install tesserocr in the terminal.

To use tesserocr:

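import tesserocr
from PIL import Image

# Load the model once; the context manager releases it when the block ends
with tesserocr.PyTessBaseAPI() as api:
    pil_image = Image.open('sample.jpg')
    api.SetImage(pil_image)     # hand the image to Tesseract
    text = api.GetUTF8Text()    # run recognition and return the text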

To install pytesseract: pip install pytesseract (the Tesseract executable itself must also be installed, since pytesseract calls it).

To run it:

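import cv2
import pytesseract

image = cv2.imread('sample.jpg')              # OpenCV loads images in BGR order
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # pytesseract assumes RGB
text = pytesseract.image_to_string(rgb)       # runs the tesseract CLI for this call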
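To make "wrapper around the CLI" more concrete, here is a simplified sketch of what such a wrapper does on each call. It is not pytesseract's actual source. The function names build_tesseract_command and cli_image_to_string are hypothetical, and it assumes the tesseract executable is on your PATH.

```python
import subprocess
from typing import List


def build_tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build the CLI call; 'stdout' as the output base makes tesseract print the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def cli_image_to_string(image_path: str, lang: str = "eng") -> str:
    """Hypothetical stand-in for a CLI wrapper: one new tesseract process per call."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect the recognised text from stdout
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

Every call starts a fresh process, which has to initialise Tesseract and load the language data again. That is the per-call overhead mentioned above.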