What is the difference between Pytesseract and Tesserocr?

Question 1

What is the difference between Pytesseract and Tesserocr?

python ocr tesseract python-tesseract

Soufiane Sabiri · Feb 19, 2019 · Viewed 7.4k times · Source

Answer

In my experience, tesserocr is much faster than pytesseract.

The reason is that tesserocr is a Python wrapper around the Tesseract C++ API, whereas pytesseract is a wrapper around the tesseract-ocr command-line tool (CLI).

With tesserocr you can therefore load the model once at the beginning of your program and then run it on each image separately (for example, in a loop that processes video frames). With pytesseract, every call to image_to_string launches the tesseract executable, which loads the model and then processes the image, so it is slower for video processing.
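As a rough sketch of the "load once, reuse in a loop" idea: the helper names ocr_frames and ocr_files below are made up for illustration, and the only tesserocr calls assumed are PyTessBaseAPI, SetImage and GetUTF8Text, the same ones used in the usage snippet in this answer.

```python
def ocr_frames(api, frames):
    """Run OCR on every frame with one already-initialised engine.

    `api` is expected to behave like tesserocr.PyTessBaseAPI, i.e. to
    provide SetImage() and GetUTF8Text(). The model is NOT reloaded here.
    """
    texts = []
    for frame in frames:
        api.SetImage(frame)              # swap in the next image
        texts.append(api.GetUTF8Text())  # recognise it with the loaded model
    return texts


def ocr_files(paths):
    """Hypothetical convenience wrapper: OCR a list of image files."""
    # Imports are deferred so ocr_frames() can be used without these packages.
    import tesserocr
    from PIL import Image

    # The model is loaded once here and released when the block exits.
    with tesserocr.PyTessBaseAPI() as api:
        return ocr_frames(api, (Image.open(p) for p in paths))
```

Calling ocr_files(['frame1.jpg', 'frame2.jpg']) would initialise Tesseract a single time for both images, which is where the speed-up over repeated image_to_string calls should come from.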

To install tesserocr, I just ran pip install tesserocr in the terminal.

To use tesserocr:

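import tesserocr
from PIL import Image

# Load the Tesseract model once; the context manager frees it on exit
with tesserocr.PyTessBaseAPI() as api:
    pil_image = Image.open('sample.jpg')
    api.SetImage(pil_image)    # hand the image to the loaded engine
    text = api.GetUTF8Text()   # run recognition and return the text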

To install pytesseract: pip install pytesseract.

To run it:

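import pytesseract
import cv2

# OpenCV loads images as BGR; pytesseract expects RGB
image = cv2.cvtColor(cv2.imread('sample.jpg'), cv2.COLOR_BGR2RGB)
# Each call starts the tesseract executable, which loads the model again
text = pytesseract.image_to_string(image)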
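
If you want to check the speed difference on your own images rather than take my word for it, a small timing helper along these lines should do. This is only a sketch: time_ocr is a made-up name, and it simply times whatever OCR callable you pass in.

```python
import time


def time_ocr(ocr_fn, images, repeats=1):
    """Return the average number of seconds ocr_fn takes per image."""
    images = list(images)
    if not images or repeats < 1:
        raise ValueError("need at least one image and one repeat")

    start = time.perf_counter()
    for _ in range(repeats):
        for img in images:
            ocr_fn(img)  # result is discarded; only the time matters
    elapsed = time.perf_counter() - start

    return elapsed / (repeats * len(images))
```

For pytesseract you could pass pytesseract.image_to_string directly as ocr_fn. For tesserocr you would pass a small function that calls SetImage and GetUTF8Text on an api object created once beforehand, so that model loading stays outside the timed loop.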

Question 2

I'm using Python 3.6 on Windows 10 and already have Pytesseract installed, but I came across some code that uses Tesserocr, which, by the way, I can't install. What is the difference?
