Emgu.cv's Tesseract object using incorrect path for OCR files

DanTheMan · Feb 29, 2012

I want to use EMGU.CV's Tesseract object to do OCR on some pictures. To start, I downloaded, compiled, and ran their OCR and LicensePlateRecognition examples.

However, Tesseract kept throwing the following exception:

Unable to create ocr model using Path 'teseract' and language 'eng'.

And I traced the source to the line:

_ocr = new Tesseract(@"tessdata", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_CUBE_COMBINED);

I tried the most obvious fixes: I gave it the full path, I copied the files directly to 'C:\', and I made sure that my program's current directory was the one containing the tessdata folder.

None of those worked, so I used procmon and discovered it was looking for the files here:

C:\Program Files (x86)\Tesseract-OCR\tessdata

No matter what I do, I cannot get it to look anywhere else. (Moving the files there worked, of course.) This location does not appear anywhere in EMGU.CV's code, so my guess is that it is compiled into Tesseract's code as some default (?).

So, how do I stop Tesseract from using this location? Presumably the Tesseract constructor should DO something with the path I pass into it, so what am I missing?

Answer

Dan Gøran Lunde · Feb 3, 2013

I tried copying the files to the directory where my application runs, I tried absolute and relative paths, and I tried the hard-coded C:\Program Files (x86)\Tesseract-OCR\tessdata. None of them worked for me.

I got it working by doing the following:

  1. Copy the tessdata folder to the directory where my app is running.
  2. Specify an empty dataPath parameter (apparently tessdata/ is appended to dataPath by default). This code worked:

_ocr = new Tesseract("", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_CUBE_COMBINED);
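Since the exception does not say which directory was actually searched, it may help to confirm that the language file is where the two steps above expect it before constructing the engine. The following is only a minimal sketch: `TessdataCheck` is a hypothetical helper (not part of Emgu CV), it assumes that "where my App is running" means the application's base directory, and it only checks for `<language>.traineddata`, which is Tesseract's usual file naming. It does not prove that Tesseract will resolve an empty dataPath to that same directory.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Emgu CV: checks whether a "tessdata"
// folder containing <language>.traineddata sits under a given directory.
public static class TessdataCheck
{
    // Builds the path where the language file is expected under baseDir.
    public static string ExpectedLanguageFile(string baseDir, string language)
    {
        return Path.Combine(baseDir, "tessdata", language + ".traineddata");
    }

    // True when the language file exists at the expected location.
    public static bool IsPresent(string baseDir, string language)
    {
        return File.Exists(ExpectedLanguageFile(baseDir, language));
    }
}

// Possible usage before the constructor call from the answer
// (assumption: the app's base directory is the relevant one):
//
//   string appDir = AppDomain.CurrentDomain.BaseDirectory;
//   if (!TessdataCheck.IsPresent(appDir, "eng"))
//       Console.Error.WriteLine("Missing: " + TessdataCheck.ExpectedLanguageFile(appDir, "eng"));
//   _ocr = new Tesseract("", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_CUBE_COMBINED);
```

If the check passes and the constructor still throws, comparing `AppDomain.CurrentDomain.BaseDirectory` with `Environment.CurrentDirectory` is a cheap next step, since the two can differ depending on how the program is launched.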