Image preprocessing with OpenCV before doing character recognition (tesseract)

Question 1

Image preprocessing with OpenCV before doing character recognition (tesseract)

java opencv tesseract tess4j anpr

DocC · May 18, 2016 · Viewed 16.1k times · Source

Answer

Answer

Here's how I suggest you should do this task.

Convert to Grayscale.
Gaussian Blur with 3x3 or 5x5 filter.
Apply Sobel Filter to find vertical edges.

Sobel(gray, dst, -1, 1, 0)
Threshold the resultant image to get a binary image.
Apply a morphological close operation using suitable structuring element.
Find contours of the resulting image.
Find minAreaRect of each contour. Select rectangles based on aspect ratio and minimum and maximum area.
For each selected contour, find edge density. Set a threshold for edge density and choose the rectangles breaching that threshold as possible plate regions.
Few rectangles will remain after this. You can filter them based on orientation or any criteria you deem suitable.
Clip these detected rectangular portions from the image after adaptiveThreshold and apply OCR.

a) Result after Step 5

b) Result after Step 7. Green ones are all the minAreaRects and the Red ones are those which satisfy the following criteria: Aspect Ratio range (2,12) & Area range (300,10000)

c) Result after Step 9. Selected rectangle. Criteria: Edge Density > 0.5

EDIT

For edge-density, what I did in the above examples is the following.

Apply Canny Edge detector directly to input image. Let the cannyED image be Ic.
Multiply results of Sobel filter and Ic. Basically, take an AND of Sobel and Canny images.
Gaussian Blur the resultant image with a large filter. I used 21x21.
Threshold the resulting image using OTSU's method. You'll get a binary image
For each red rectangle, rotate the portion inside this rectangle (in the binary image) to make it upright. Loop through the pixels of the rectangle and count white pixels. (How to rotate?)

Edge Density = No. of White Pixels in the Rectangle/Total no. of Pixels in the rectangle

Choose a threshold for edge density.

NOTE: Instead of going through steps 1 to 3, you can also use the binary image from step 5 for calculating edge density.

Question 2

I'm trying to develop simple PC application for license plate recognition (Java + OpenCV + Tess4j). Images aren't really good (in further they will be good). I want to preprocess image for tesseract, and I'm stuck on detection of license plate (rectangle detection).

My steps:

1) Source Image

Mat img = new Mat();
img = Imgcodecs.imread("sample_photo.jpg"); 
Imgcodecs.imwrite("preprocess/True_Image.png", img);

2) Gray Scale

Mat imgGray = new Mat();
Imgproc.cvtColor(img, imgGray, Imgproc.COLOR_BGR2GRAY);
Imgcodecs.imwrite("preprocess/Gray.png", imgGray);

3) Gaussian Blur

Mat imgGaussianBlur = new Mat(); 
Imgproc.GaussianBlur(imgGray,imgGaussianBlur,new Size(3, 3),0);
Imgcodecs.imwrite("preprocess/gaussian_blur.png", imgGaussianBlur);

4) Adaptive Threshold

Mat imgAdaptiveThreshold = new Mat();
Imgproc.adaptiveThreshold(imgGaussianBlur, imgAdaptiveThreshold, 255, CV_ADAPTIVE_THRESH_MEAN_C ,CV_THRESH_BINARY, 99, 4);
Imgcodecs.imwrite("preprocess/adaptive_threshold.png", imgAdaptiveThreshold);

Here should be 5th step, which is detection of plate region (probably even without deskewing for now).

I croped needed region from image (after 4th step) with Paint, and got:

Then I did OCR (via tesseract, tess4j):

File imageFile = new File("preprocess/adaptive_threshold_AFTER_PAINT.png");
ITesseract instance = new Tesseract();
instance.setLanguage("eng");
instance.setTessVariable("tessedit_char_whitelist", "acekopxyABCEHKMOPTXY0123456789");
String result = instance.doOCR(imageFile); 
System.out.println(result);

and got (good enough?) result - "Y841ox EH" (almost true)

How can I detect and crop plate region after 4th step? Have I to make some changes (improvements) in 1-4 steps? Would like to see some example implemented via Java + OpenCV (not JavaCV).
Thanks in advance.

EDIT (thanks to @Abdul Fatir's answer) Well, I provide working (for me atleast) code sample (Netbeans+Java+OpenCV+Tess4j) for those who interested in this question. Code is not the best, but I made it just for studying.
http://pastebin.com/H46wuXWn (do not forget to put tessdata folder into your project folder)

Image preprocessing with OpenCV before doing character recognition (tesseract)

Answer

Related questions