How to generate a tiff/box file from an image to train Tesseract in Windows

greenlasagna · Jul 31, 2015 · Viewed 9.3k times

I'm trying to train Tesseract on Windows, and for that I need a tiff/box file pair. I tried to create it with jTessBoxEditor, but it doesn't accept images as input. I've also tried boxFactory, but it doesn't run properly. Does anyone know the best tool for creating the pair from images?

Thanks

Answer

darkpotpot · Aug 6, 2015

If you have jTessBoxEditor, then you already have the Tesseract binaries. Go to the tesseract-ocr subfolder of jTessBoxEditor and run the following command:

tesseract.exe D:\testocr\TestImage.tif D:\testocr\TestImage batch.nochop makebox

It should generate the file D:\testocr\TestImage.box. Then, in jTessBoxEditor, go to the Box Editor tab and open your image. The box file with the same base name is loaded automatically, so you can check that every box is correct and fix any mistakes.
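
If you have several images, the same command can be scripted. Below is a minimal sketch in Python, not something jTessBoxEditor ships with. The function names build_makebox_command and make_box_files are made up for illustration, and it assumes tesseract.exe is on your PATH or that you run the script from the tesseract-ocr subfolder.

```python
import subprocess
from pathlib import PureWindowsPath


def build_makebox_command(image_path, tesseract_exe="tesseract.exe"):
    """Build the makebox command from the answer for one image.

    tesseract_exe is an assumption: pass the full path to the executable
    if it is not on PATH or in the current folder.
    """
    image = PureWindowsPath(image_path)
    # Tesseract appends ".box" itself, so the output base is the image
    # path without its extension.
    output_base = image.with_suffix("")
    return [tesseract_exe, str(image), str(output_base), "batch.nochop", "makebox"]


def make_box_files(image_paths, tesseract_exe="tesseract.exe"):
    """Run the makebox command once per image; raise if Tesseract fails."""
    for image_path in image_paths:
        subprocess.run(build_makebox_command(image_path, tesseract_exe), check=True)
```

Calling make_box_files([r"D:\testocr\TestImage.tif"]) runs the same command as above for each path in the list, so it should leave a .box file next to each image.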