Pytesseract: set character whitelist

Minato10 · Apr 30, 2017 · Viewed 14.2k times

Does anyone know how to set the character whitelist for Pytesseract? I want it to output only letters and digits (A-Z, a-z and 0-9). Is this possible? I have the following:

from PIL import Image
import pytesseract
img = Image.open('test.jpg')
result = pytesseract.image_to_string(img, config='-psm 6')

I'm getting stray characters, such as a / where the image shows a 1, so I would like to restrict the set of characters it can output.

Answer

James Vaughn · Apr 30, 2017

You can accomplish that with the line below. Alternatively, you can set up a config file for Tesseract to do the same thing; see "Limit characters tesseract is looking for".

pytesseract.image_to_string(question_img, config="-c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz -psm 6")

I am sure there are other ways to get it to work, but this is what worked for me.
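
For anyone adapting this, here is a minimal sketch that wraps the same config string in small helpers. The names build_config, keep_allowed and ocr_with_whitelist are hypothetical and not part of pytesseract. It assumes uppercase letters should be allowed as well, since the question asks for letters and digits, and it defaults to the --psm spelling that Tesseract 4 and later expect; Tesseract 3.x uses -psm, as in the line above. The post-filter is an extra safety net added here as an assumption, not something Tesseract does itself.

```python
import string

# Letters in both cases plus digits (assumed to be what the question wants).
WHITELIST = string.ascii_letters + string.digits


def build_config(whitelist=WHITELIST, psm=6, legacy_psm_flag=False):
    """Build the config string passed to pytesseract.image_to_string.

    Tesseract 3.x takes -psm; Tesseract 4 and later expect --psm.
    """
    psm_flag = "-psm" if legacy_psm_flag else "--psm"
    return f"-c tessedit_char_whitelist={whitelist} {psm_flag} {psm}"


def keep_allowed(text, allowed=WHITELIST):
    """Fallback filter: drop any character outside the allowed set.

    Whitespace is kept so that line and word breaks survive.
    """
    return "".join(ch for ch in text if ch in allowed or ch.isspace())


def ocr_with_whitelist(path):
    """Run OCR on the image at `path` with the whitelist applied."""
    # Imported lazily so the helpers above work without Tesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        raw = pytesseract.image_to_string(img, config=build_config())
    return keep_allowed(raw)
```

Calling ocr_with_whitelist('test.jpg') would then replace the two lines in the question. If the installed Tesseract is a 3.x release, passing legacy_psm_flag=True to build_config should reproduce the -psm form used in the answer.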