Tesseract traineddata not working in Swift 3.0 project using version 4.0

Adrian · Dec 13, 2016 · Viewed 10.7k times

I'm attempting to use Tesseract-OCR-iOS in a new Swift 3.0 project. I'm using Xcode Version 8.1 (8B62). CocoaPods is version 1.1.1.

When I attempt to use tesseract.recognize(), my app crashes and I get the following output in the console:

actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53

I found this post, which suggests I'm using the wrong version of the traineddata file. I downloaded tessdata from the tesseract-ocr/tessdata repo, so I'm baffled as to why I'd have a version mismatch.
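One way to compare candidate traineddata files is to look at the entry count each file declares, since the assertion text suggests the library read a count larger than the TESSDATA_NUM_ENTRIES it was built with. The sketch below assumes the file begins with a 32-bit little-endian entry count; that layout is an assumption and is not confirmed anywhere in this thread. The function name traineddataEntryCount is hypothetical.

```swift
import Foundation

/// Reads the leading 32-bit value of a traineddata file.
/// ASSUMPTION: the file starts with a little-endian Int32 entry count.
/// Returns nil when fewer than 4 bytes are available.
func traineddataEntryCount(in data: Data) -> Int32? {
    guard data.count >= 4 else { return nil }
    var value: UInt32 = 0
    // Assemble the four header bytes, least significant byte first.
    for (index, byte) in data.prefix(4).enumerated() {
        value |= UInt32(byte) << UInt32(8 * index)
    }
    return Int32(bitPattern: value)
}
```

Loading each file with Data(contentsOf:) and passing it to this function would show whether a file that crashes declares more entries than one that works.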

Any suggestions how to get Tesseract working are greatly appreciated. Below is additional information re: my setup.

Here's what my Podfile looks like:

# Uncomment the next line to define a global platform for your project
platform :ios, '9.0'

target 'TesseractDemo' do
  # Comment the next line if you're not using Swift and don't want to use dynamic frameworks
  use_frameworks!

  # Pods for TesseractDemo
  pod 'TesseractOCRiOS', '4.0.0'

end

I've dragged a tessdata folder containing eng.traineddata into the root directory of my project outside of Xcode and dragged a reference from Finder to Xcode's Project Navigator.
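To rule out a packaging problem before suspecting the file itself, it can help to confirm at runtime that the folder reference actually landed in the built app. This is a minimal sketch using only Foundation; the helper name hasTraineddata is hypothetical, and it assumes the file is expected at tessdata/<language>.traineddata under the given root.

```swift
import Foundation

/// Returns true if `<root>/tessdata/<language>.traineddata` exists
/// and is a regular file rather than a directory.
/// ASSUMPTION: the tessdata folder sits directly under `root`.
func hasTraineddata(language: String, under root: URL) -> Bool {
    let file = root
        .appendingPathComponent("tessdata", isDirectory: true)
        .appendingPathComponent("\(language).traineddata")
    var isDirectory: ObjCBool = false
    let exists = FileManager.default.fileExists(atPath: file.path,
                                                isDirectory: &isDirectory)
    return exists && !isDirectory.boolValue
}
```

In an app this could be called as hasTraineddata(language: "eng", under: Bundle.main.bundleURL). A false result would point to the folder not being copied into the bundle, which is a different problem from a traineddata version mismatch.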

Everything works fine up to this point: no compiler errors, no linker complaints. In a UIViewController I'm importing TesseractOCR and calling it like so:

// MARK: - OCR Methods
func scanImage(image: UIImage) {
    if let tesseract = G8Tesseract(language: "eng") {
        tesseract.delegate = self
        // Use the image passed in; convert to black and white before recognition
        tesseract.image = image.g8_blackAndWhite()
        tesseract.recognize()

        textView.text = tesseract.recognizedText
    }
}

Update: I found a link to a repo of traineddata files for version 4.0. I deleted my old eng.traineddata file and replaced it with the one from the 4.0 repo, but I get the same error referencing the same line.

Answer

Adrian · Dec 15, 2016

The current version of eng.traineddata linked above on GitHub will not work with the current version of Tesseract-OCR-iOS.

The installation instructions posted on GitHub work perfectly if you've got the right <language>.traineddata file.

I discovered this after dragging in the eng.traineddata file from Lyndsey Scott's brilliant Tesseract tutorial on Ray Wenderlich.

This repo contains the eng.traineddata file I needed to get Tesseract working. I'm not sure if that applies to all languages.