Tesseract OCR simple example

Will Robinson picture Will Robinson · May 17, 2013 · Viewed 115.9k times · Source

Hi Can you anyone give me a simple example of testing Tesseract OCR preferably in C#.
I tried the demo found here. I download the English dataset and unzipped in C drive. and modified the code as followings:

string path = @"C:\pic\mytext.jpg";
Bitmap image = new Bitmap(path);
Tesseract ocr = new Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
ocr.Init(@"C:\tessdata\", "eng", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);

Unfortunately the code doesn't work. the program dies at "ocr.Init(..." line. I couldn't even get an exception even using try-catch.

I was able to run the vietocr! but that is a very large project for me to follow. i need a simple example like above.

Thanks

Answer

Will Robinson picture Will Robinson · May 17, 2013

Ok. I found the solution here tessnet2 fails to load the Ans given by Adam

Apparently i was using wrong version of tessdata. I was following the the source page instruction intuitively and that caused the problem.

it says

Quick Tessnet2 usage

  1. Download binary here, add a reference of the assembly Tessnet2.dll to your .NET project.

  2. Download language data definition file here and put it in tessdata directory. Tessdata directory and your exe must be in the same directory.

After you download the binary, when you follow the link to download the language file, there are many language files. but none of them are right version. you need to select all version and go to next page for correct version (tesseract-2.00.eng)! They should either update download binary link to version 3 or put the the version 2 language file on the first page. Or at least bold mention the fact that this version issue is a big deal!

Anyway I found it. Thanks everyone.