How to install leptonica+tesseract on Windows without Visual Studio to use in Anaconda?

Viktoriia · Feb 8, 2016

I want to perform text recognition on images using Python, and I have installed Anaconda. Now I want to install Tesseract, but I also need to install Leptonica. I could not find clear instructions for doing this on Windows, and for Leptonica I do not want to install Visual Studio. Could anybody provide clear instructions for installing Leptonica and Tesseract on Windows without Visual Studio, for use in Anaconda? Thanks.

Answer

c.Parsi · Apr 22, 2016

Here is a simple set of steps to get the Tesseract 3.05 dev version (as of 04/22/2016) working on both Windows 7 and Windows 8 machines:

1- Install Tesseract with the executable installer from the official tesseract-ocr page (version 3.02 for Windows will suffice).

2- Download the following two files for the Tesseract 3.05 dev version from http://domasofan.spdns.eu/tesseract/

There are 2 exe files:

  • tesseract-core-yyyymmdd.exe: the Tesseract core application, without language data.
  • tesseract-langs-yyyymmdd.exe: all the language data available for Tesseract.

(yyyymmdd is a date: 4-digit year, 2-digit month and 2-digit day.)
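If you end up with several downloads, the yyyymmdd naming convention can be read programmatically. A minimal Python sketch, assuming the file names follow exactly the pattern above (the helper names parse_package_name and newest, and the example dates, are hypothetical), that extracts the package kind and date and picks the newest download of each kind:

```python
import re
from datetime import date
from typing import Iterable, Tuple

# Assumed naming scheme: tesseract-core-yyyymmdd.exe / tesseract-langs-yyyymmdd.exe
PACKAGE_RE = re.compile(r"^tesseract-(core|langs)-(\d{4})(\d{2})(\d{2})\.exe$")


def parse_package_name(filename: str) -> Tuple[str, date]:
    """Split a package file name into its kind ('core' or 'langs') and its date."""
    match = PACKAGE_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected package name: {filename!r}")
    kind, year, month, day = match.groups()
    return kind, date(int(year), int(month), int(day))


def newest(filenames: Iterable[str], kind: str) -> str:
    """Return the most recent package of the given kind among the file names."""
    # Ignore anything that does not match the naming scheme (e.g. readme files).
    candidates = [f for f in filenames
                  if PACKAGE_RE.match(f) and parse_package_name(f)[0] == kind]
    if not candidates:
        raise ValueError(f"no tesseract-{kind} package found")
    # ISO dates compare chronologically, so max() picks the latest one.
    return max(candidates, key=lambda f: parse_package_name(f)[1])
```

For example, newest(os.listdir(download_folder), "core") would return the latest tesseract-core file in a (hypothetical) download folder.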

The app is portable, so you can install it on a USB stick or in any other location.

Sub-steps to install these:

  1. Download the tesseract-core and tesseract-langs packages.
  2. Double-click the tesseract-core package and extract it to a new temporary folder, for example "Tess_temp".
  3. Double-click the tesseract-langs package and extract it to the tessdata subfolder of that same "Tess_temp" folder. For example, if you extracted tesseract-core to C:\Tess_temp, tesseract-langs needs to go to C:\Tess_temp\tessdata.

  4. Now copy everything in "Tess_temp" to the folder where Tesseract 3.02 was installed in step 1 (usually C:\Program Files (x86)\Tesseract-OCR), replacing the 3.02 files with the 3.05 ones.

  5. Tesseract 3.05 should now work on Windows. To test it, copy a sample image test.png (containing text) to this Tesseract-OCR folder, open a command prompt (cmd) and type the following commands:

    go to the Tesseract folder: cd "C:\Program Files (x86)\Tesseract-OCR"

    run tesseract on test.png: tesseract -l eng test.png test_text -psm 6

It will show:

Tesseract Open Source OCR Engine v3.05.00dev with Leptonica

Congratulations! (Check test_text.txt for the extracted text.)
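Since the goal is to use Tesseract from Python in Anaconda, the command from step 5 can also be run with the standard subprocess module. A minimal sketch, assuming the default install folder from step 4 and the 3.05 command-line syntax shown above; check_install, build_command and run_ocr are hypothetical helper names, not part of any Tesseract API:

```python
import subprocess
from pathlib import Path
from typing import Optional

# Assumed default install folder from step 4; change it if you installed elsewhere.
TESSERACT_DIR = Path(r"C:\Program Files (x86)\Tesseract-OCR")


def check_install(tess_dir: Path, lang: str = "eng") -> Path:
    """Check the folder layout from steps 3-4 and return the tesseract.exe path."""
    exe = tess_dir / "tesseract.exe"
    traineddata = tess_dir / "tessdata" / f"{lang}.traineddata"
    for required in (exe, traineddata):
        if not required.is_file():
            raise FileNotFoundError(f"missing {required}")
    return exe


def build_command(exe: Path, image: str, out_base: str,
                  lang: str = "eng", psm: int = 6) -> list:
    """Mirror the step 5 command: tesseract -l eng test.png test_text -psm 6."""
    return [str(exe), "-l", lang, image, out_base, "-psm", str(psm)]


def run_ocr(image: str, out_base: str, lang: str = "eng",
            tess_dir: Path = TESSERACT_DIR,
            work_dir: Optional[Path] = None) -> str:
    """Run Tesseract on an image and return the text written to <out_base>.txt."""
    exe = check_install(tess_dir, lang)
    # As in step 5, run inside the Tesseract folder unless another working
    # folder is given; relative image and output paths resolve against it.
    cwd = work_dir if work_dir is not None else tess_dir
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_command(exe, image, out_base, lang), check=True, cwd=cwd)
    return (cwd / f"{out_base}.txt").read_text(encoding="utf-8")
```

For example, run_ocr("test.png", "test_text") should return the same text that ends up in test_text.txt. Passing the arguments as a list lets subprocess quote the space in "Program Files (x86)" for you instead of building the command string by hand.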