Python opencv sorting contours

user6796935 · Sep 9, 2016 · Viewed 19.8k times

I am following this question:

How can I sort contours from left to right and top to bottom?

to sort contours from left-to-right and top-to-bottom. However, my contours are found using this (OpenCV 3):

im2, contours, hierarchy = cv2.findContours(threshold,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)

and they are formatted like this:

   array([[[ 1,  1]],

   [[ 1, 36]],

   [[63, 36]],

   [[64, 35]],

   [[88, 35]],

   [[89, 34]],

   [[94, 34]],

   [[94,  1]]], dtype=int32)]

When I run the code

max_width = max(contours, key=lambda r: r[0] + r[2])[0]
max_height = max(contours, key=lambda r: r[3])[3]
nearest = max_height * 1.4
contours.sort(key=lambda r: (int(nearest * round(float(r[1])/nearest)) * max_width + r[0]))

I am getting the error

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

so I changed it to this:

max_width = max(contours, key=lambda r:  np.max(r[0] + r[2]))[0]
max_height = max(contours, key=lambda r:  np.max(r[3]))[3]
nearest = max_height * 1.4
contours.sort(key=lambda r: (int(nearest * round(float(r[1])/nearest)) * max_width + r[0]))

but now I am getting the error:

TypeError: only length-1 arrays can be converted to Python scalars

EDIT:

After reading the answer below I modified my code:

EDIT 2

This is the code that I use to "dilate" the characters and find the contours

kernel = cv2.getStructuringElement(cv2.MORPH_RECT,(35,35))

# dilate the image to get text
# binaryContour is just the black and white image shown below
dilation = cv2.dilate(binaryContour,kernel,iterations = 2)

END OF EDIT 2

im2, contours, hierarchy = cv2.findContours(dilation,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)

myContours = []

# Process the raw contours to get bounding rectangles
for cnt in reversed(contours):

    epsilon = 0.1*cv2.arcLength(cnt,True)
    approx = cv2.approxPolyDP(cnt,epsilon,True)

    if len(approx) == 4:

        rectangle = cv2.boundingRect(cnt)
        myContours.append(rectangle)

max_width = max(myContours, key=lambda r: r[0] + r[2])[0]
max_height = max(myContours, key=lambda r: r[3])[3]
nearest = max_height * 1.4
myContours.sort(key=lambda r: (int(nearest * round(float(r[1])/nearest)) * max_width + r[0]))

i=0
for x,y,w,h in myContours:

    letter = binaryContour[y:y+h, x:x+w]
    cv2.rectangle(binaryContour,(x,y),(x+w,y+h),(255,255,255),2)
    cv2.imwrite("pictures/"+str(i)+'.png', letter) # save contour to file
    i+=1

Contours before sorting:

[(1, 1, 94, 36), (460, 223, 914, 427), (888, 722, 739, 239), (35, 723, 522, 228),
(889, 1027, 242, 417), (70, 1028, 693, 423), (1138, 1028, 567, 643),
(781, 1030, 98, 413), (497, 1527, 303, 132), (892, 1527, 168, 130),
(37, 1719, 592, 130), (676, 1721, 413, 129), (1181, 1723, 206, 128),
(30, 1925, 997, 236), (1038, 1929, 170, 129), (140, 2232, 1285, 436)]

Contours after sorting:

(NOTE: This is not the order I want the contours to be sorted in. Refer to image at the bottom)

[(1, 1, 94, 36), (460, 223, 914, 427), (35, 723, 522, 228), (70, 1028, 693, 423),
(781, 1030, 98, 413), (888, 722, 739, 239), (889, 1027, 242, 417),
(1138, 1028, 567, 643), (30, 1925, 997, 236), (37, 1719, 592, 130),
(140, 2232, 1285, 436), (497, 1527, 303, 132), (676, 1721, 413, 129),
(892, 1527, 168, 130), (1038, 1929, 170, 129), (1181, 1723, 206, 128)]

Image I am working with:

[image]

I want to find the contours in the following order:

[image]

Dilation image used for finding contours:

[image]

Answer

ZdaR · Sep 12, 2016

What you actually need is a formula that converts each contour's information into a rank, and then to sort the contours by that rank. Since you need to sort the contours from top to bottom and left to right, the formula must use the origin of each contour (the top-left corner of its bounding rectangle) to calculate the rank. For example, we can use this simple method:

def get_contour_precedence(contour, cols):
    origin = cv2.boundingRect(contour)
    return origin[1] * cols + origin[0]

It gives each contour a rank based on its origin. The rank changes a lot between two contours that are stacked vertically, but only slightly between contours that sit side by side. This way, the contours are first grouped from top to bottom, and where contours share a row, the smaller horizontal term decides the order.
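
As a quick illustration of how this rank orders things, here is a minimal sketch that applies the same formula to plain (x, y, w, h) tuples instead of calling cv2.boundingRect. The helper name rank_from_rect, the sample rectangles, and the image width of 2000 are assumptions made up for the example.

```python
def rank_from_rect(rect, cols):
    # Same formula as get_contour_precedence, but it takes an (x, y, w, h)
    # tuple directly so the sketch runs without OpenCV.
    x, y, _, _ = rect
    return y * cols + x

cols = 2000  # assumed image width in pixels; must exceed every x

# Hypothetical bounding boxes: two on an upper row, two on a lower row.
rects = [(900, 500, 50, 50), (100, 500, 50, 50),
         (700, 100, 50, 50), (300, 100, 50, 50)]

ordered = sorted(rects, key=lambda r: rank_from_rect(r, cols))
print(ordered)
```

Because every x is smaller than cols, the y term always dominates, so the boxes come out row by row and left to right within each row. Note that with this plain version even a one-pixel difference in y counts as a different row.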

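import cv2

def get_contour_precedence(contour, cols):
    # Snap y down to a multiple of tolerance_factor so that contours whose
    # tops differ by only a few pixels are ranked as the same row.
    tolerance_factor = 10
    origin = cv2.boundingRect(contour)  # (x, y, w, h)
    return ((origin[1] // tolerance_factor) * tolerance_factor) * cols + origin[0]

img = cv2.imread("/Users/anmoluppal/Downloads/9VayB.png", 0)

_, img = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)

# OpenCV 3 returns three values here; OpenCV 4 returns only (contours, hierarchy).
im, contours, h = cv2.findContours(img.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# sorted() works whether findContours returned a list or a tuple.
contours = sorted(contours, key=lambda c: get_contour_precedence(c, img.shape[1]))

# For debugging purposes: label each contour with its position in the sorted order.
for i, contour in enumerate(contours):
    img = cv2.putText(img, str(i), cv2.boundingRect(contour)[:2], cv2.FONT_HERSHEY_COMPLEX, 1, [125])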

[image]

If you look closely at the third row, where contours 3, 4, 5, and 6 are placed, the 6 comes between 3 and 5. The reason is that contour 6 sits slightly below the line of contours 3, 4, and 5.
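
The same effect can be reproduced with numbers alone. The sketch below takes four bounding-box origins from the question's list purely as sample data (they are not the contours labeled in the answer's image) and uses the tolerance_factor of 10 from the code. The helper name bucket_row and the image width of 2000 are assumptions. Tops at y = 1027 and y = 1028 fall into the 1020 bucket, while y = 1030 falls into the next one, so a box only a few pixels lower is ranked after the whole row.

```python
tolerance_factor = 10  # same value as in get_contour_precedence

def bucket_row(y):
    # Floor y to a multiple of tolerance_factor, as the rank formula does.
    return (y // tolerance_factor) * tolerance_factor

# (x, y) origins of four boxes that visually share one row.
origins = [(70, 1028), (781, 1030), (889, 1027), (1138, 1028)]

cols = 2000  # assumed image width in pixels
ordered = sorted(origins, key=lambda o: bucket_row(o[1]) * cols + o[0])

print([bucket_row(y) for _, y in origins])  # [1020, 1030, 1020, 1020]
print(ordered)
```

The box at x = 781 ends up last even though it is the second one from the left, because its y crosses a bucket boundary.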

Tell me if you want the output the other way around; we can tweak get_contour_precedence to correct the ranks of contours 3, 4, 5, and 6.
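
One possible tweak, sketched here as an assumption rather than as the answer's own follow-up, is to stop flooring y to fixed buckets and instead group the boxes into rows: sort by y, start a new row whenever the next box's top is more than some threshold below the top of the current row's first box, then sort each row by x. The function name sort_reading_order and the row_gap parameter are hypothetical, and the input is a list of (x, y, w, h) tuples such as those returned by cv2.boundingRect. The sample boxes are taken from the question's list.

```python
def sort_reading_order(rects, row_gap):
    """Sort (x, y, w, h) boxes top-to-bottom, then left-to-right per row.

    A box joins the current row when its top is within row_gap pixels of
    the top of the row's first box; otherwise it starts a new row.
    """
    rows = []
    for rect in sorted(rects, key=lambda r: r[1]):
        if rows and rect[1] - rows[-1][0][1] <= row_gap:
            rows[-1].append(rect)
        else:
            rows.append([rect])
    # Order each row by x and flatten the rows back into one list.
    return [r for row in rows for r in sorted(row, key=lambda r: r[0])]

boxes = [(889, 1027, 242, 417), (70, 1028, 693, 423),
         (1138, 1028, 567, 643), (781, 1030, 98, 413),
         (888, 722, 739, 239), (35, 723, 522, 228)]

print(sort_reading_order(boxes, row_gap=10))
```

A suitable row_gap depends on the image: it should be larger than the vertical jitter within a row and smaller than the distance between the tops of neighboring rows.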