I'm using Python 3.6 on Windows 10 and already have Pytesseract installed, but I came across some code that uses Tesserocr, which, by the way, I can't install. What is the difference?
In my experience, Tesserocr is much faster than Pytesseract.
Tesserocr is a Python wrapper around the Tesseract C++ API, whereas pytesseract is a wrapper around the tesseract-ocr CLI.
Therefore, with Tesserocr you can load the model once at the beginning of your program and then run it separately on each image (for example, in a loop to process video frames).
With pytesseract, each call to the image_to_string function loads the model and then processes the image, which makes it slower for video processing.
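To make the "load once, run many times" idea concrete, here is a minimal sketch. The helper names ocr_frames and ocr_image_files are my own and not part of either library; the only tesserocr calls assumed are PyTessBaseAPI (used as a context manager), SetImage and GetUTF8Text.

```python
from typing import Iterable, List


def ocr_frames(api, frames: Iterable) -> List[str]:
    """OCR every frame with a single, already-initialised engine.

    `api` is expected to behave like tesserocr.PyTessBaseAPI
    (SetImage + GetUTF8Text); `frames` yields PIL images.
    """
    texts = []
    for frame in frames:
        api.SetImage(frame)              # swap in the next image; the model stays loaded
        texts.append(api.GetUTF8Text())  # run recognition on the current image
    return texts


def ocr_image_files(paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: open each path and OCR it with one engine."""
    import tesserocr                     # imported lazily, only when OCR is actually run
    from PIL import Image

    # The context manager initialises Tesseract once and releases it on exit.
    with tesserocr.PyTessBaseAPI() as api:
        return ocr_frames(api, (Image.open(p) for p in paths))
```

Calling ocr_image_files(['frame_000.png', 'frame_001.png']) (placeholder file names) would initialise the engine once for both images. Doing the same with pytesseract.image_to_string would start the tesseract executable once per image.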
To install tesserocr, I just typed this in the terminal: pip install tesserocr
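To use tesserocr:
import tesserocr
from PIL import Image

# Load the model once; this object can be reused for every image.
api = tesserocr.PyTessBaseAPI()
pil_image = Image.open('sample.jpg')
api.SetImage(pil_image)
text = api.GetUTF8Text()
api.End()  # release the native Tesseract resources when finished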
To install pytesseract: pip install pytesseract
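To run it:
import pytesseract
import cv2

# OpenCV loads images as BGR, while pytesseract assumes RGB, so convert first.
image = cv2.cvtColor(cv2.imread('sample.jpg'), cv2.COLOR_BGR2RGB)
text = pytesseract.image_to_string(image)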