Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata.
Right now I'm using this simple script:
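try:
    import Image  # legacy standalone PIL
except ImportError:
    from PIL import Image  # Pillow
import pytesseract as tes

# boxes=True asks for character bounding boxes rather than plain text
# (older pytesseract releases; newer ones expose image_to_boxes instead)
results = tes.image_to_string(Image.open('./test.jpg'), boxes=True)
# Append the OCR output to a file; the with-block closes it afterwards
with open('parsing.text', 'a') as file:
    file.write(results)
print(results)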
How do I use my traineddata file so that I can read the new font with the Python script?
Thanks!
Edit #1: I understand that *.traineddata can be used with Tesseract as a command-line program, so my question is still the same: how do I use traineddata with Python?
Edit #2: the answer to my question is here: "How to access the command line for Tesseract from Python?"
Below is a sample of pytesseract.image_to_string() with options.
pytesseract.image_to_string(Image.open("./imagesStackoverflow/xyz-small-gray.png"),
                            lang="eng", boxes=False,
                            config="--psm 4 --oem 3 "
                                   "-c tessedit_char_whitelist=-01234567890XYZ:")
To use your own trained language data, just replace "eng" in lang="eng" with your language name, that is, the name of your .traineddata file without the extension.
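As a minimal sketch of that last step, the snippet below derives the lang value from a .traineddata path and, in case the file is not in Tesseract's default tessdata directory, points Tesseract at its folder with the --tessdata-dir option. The helper names tesseract_args and ocr_with_custom_font and the file name myfont.traineddata are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def tesseract_args(traineddata_path):
    """Return (lang, config) for a given .traineddata file (hypothetical helper)."""
    path = Path(traineddata_path)
    # The lang name is the file name without the extension:
    # "myfont.traineddata" -> "myfont"
    lang = path.stem
    # Tell Tesseract which directory holds the .traineddata file
    config = f'--tessdata-dir "{path.parent.as_posix()}"'
    return lang, config


def ocr_with_custom_font(image_path, traineddata_path):
    """Run OCR on image_path using the custom traineddata (hypothetical helper)."""
    # Imported here so tesseract_args can be used without Tesseract installed
    from PIL import Image
    import pytesseract

    lang, config = tesseract_args(traineddata_path)
    return pytesseract.image_to_string(Image.open(image_path),
                                       lang=lang, config=config)
```

Assuming the trained file was saved as ./tessdata/myfont.traineddata, calling ocr_with_custom_font('./test.jpg', './tessdata/myfont.traineddata') would run the OCR with lang="myfont". If the file is copied into Tesseract's own tessdata directory instead, passing lang="myfont" alone should be enough.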