How does one install Tesseract-OCR 3.03 in Ubuntu/Linux distributions?

greenteawarrior · Jun 13, 2014

A friend and I are interested in training the Tesseract OCR engine for a CV project. We tried wrappers such as PyTesser and pyocr, but the results are not yet as accurate as we need. We therefore want to train Tesseract to perform better for our purpose (identifying text on food labels), but we are having trouble installing the training tools.

What we've tried:

On Google Code, the 'Compiling' page of the tesseract-ocr wiki says the training tools are only available in version 3.03. However, the project's 'Downloads' page only has files for 3.02. The comments at the bottom of the 'Compiling' page cover installing 3.03 on Windows and OS X, but there are none yet from Linux users.

There also appears to be a 3.03 source package for Ubuntu, but we're not sure how to get it onto our machines, and the 'Compiling' page says we need to run these commands:

make training
sudo make training-install

We also found a Google Groups thread about Tesseract 3.03, but again the posts don't seem to include advice for Linux users (unless we missed something on a first read).

Is this actually a simple command-line install problem? Or is there a way to train Tesseract with 3.02, which we currently have installed? Have we been looking in the wrong places for information?
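To see which version is actually installed and which training programs came with it, a small check like the one below may help. It is a sketch under stated assumptions: the tool names come from the classic Tesseract 3.x training workflow, `text2image` is assumed to appear only in 3.03 builds, and the installed `tesseract` is assumed to accept `--version`.

```shell
#!/usr/bin/env bash
# Sketch: report the installed Tesseract version and which training
# programs are on PATH. The tool list is an assumption based on the classic
# 3.x training workflow; text2image is assumed to ship only with 3.03 tools.
set -uo pipefail

check_tesseract() {
  if command -v tesseract >/dev/null 2>&1; then
    # 3.x writes its version banner to stderr, so merge it into stdout.
    echo "tesseract: $(tesseract --version 2>&1 | head -n 1)"
  else
    echo "tesseract: not found"
  fi

  local tool
  for tool in unicharset_extractor shapeclustering mftraining cntraining \
              combine_tessdata wordlist2dawg text2image; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

check_tesseract
```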

Any advice or links to instructions for installing tesseract-ocr 3.03 for Linux distributions would be greatly appreciated! Thanks.

Answer

erluxman · Dec 23, 2014

Tesseract can be installed directly on Ubuntu 14.04 with

sudo apt-get install tesseract-ocr

I don't know whether this works on older versions of Ubuntu, since the package may only have been updated in later releases.
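On an older release, you could check which version the repositories offer before installing. The sketch below reads the candidate version from `apt-cache policy` and compares it with 3.03 using GNU `sort -V`. That comparison is a simplification (it ignores Debian epochs and other packaging details), so treat the result as a hint rather than a guarantee.

```shell
#!/usr/bin/env bash
# Sketch: check whether the tesseract-ocr package offered by your release
# is at least 3.03. Assumptions: apt-cache is available, and GNU `sort -V`
# is an acceptable stand-in for Debian version comparison (it ignores epochs).
set -uo pipefail

# Print the candidate (installable) version of tesseract-ocr, or nothing.
candidate_version() {
  apt-cache policy tesseract-ocr 2>/dev/null | awk '/Candidate:/ {print $2; exit}'
}

# Succeed if version $1 >= version $2, using GNU sort's version ordering.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

main() {
  local v
  v="$(candidate_version)"
  if [ -z "$v" ] || [ "$v" = "(none)" ]; then
    echo "tesseract-ocr is not available from the configured repositories"
  elif version_at_least "$v" 3.03; then
    echo "candidate $v is 3.03 or newer"
  else
    echo "candidate $v is older than 3.03; consider building from source"
  fi
}

main
```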