Input Image:
Expected Output:
I want to fit three (or some other number of) polygons (in this case, rectangles) around the "big" white blobs in this image. The rectangles drawn in the output image reflect my own perception of the white regions. I do not expect the algorithm to reproduce these exact bounding regions. What I want is to fit some number of tight polygons around the clusters of white pixels.
My initial solution consisted of finding contours for this image, and fitting a closed convex polygon around each contour, by finding the convex hull of the points in each contour.
However, since the white regions are highly fragmented, with black regions inside them and ridged edges, the number of contours returned by cv2.findContours is very high (around 500 or so). Because of this, fitting a convex hull to each contour does not improve the shape of the white regions; they mostly retain their original irregular shapes. My goal is to merge the many small contours of a white region into one enclosing contour, over which I can then fit a convex hull.
How do I solve this problem? Should I use a clustering algorithm on the contour points initially to find the contours that are close by each other?
First, apply a morphological closing (dilation followed by erosion) to the image. This fills the tiny "holes" in the image while roughly preserving the shape and size of the individual components. The opposite order, erosion followed by dilation (morphological opening), instead removes small noisy dots. I am working on a similar image and had to apply dilation and erosion as many as 10 times to even out my components. After that, use connected components or find contours. This should bring the contour count down considerably, from the roughly 500 you mention to something like 20-30.
Second, you mention that you need three clusters, although the two small clusters (covered by the red line) could arguably have been merged into one. My reading is that you want each cluster to fit its bounding rectangle as tightly as possible. So I would suggest setting a threshold fill efficiency (say 80%) and using hierarchical clustering to merge the connected components into clusters. Once a merge would leave the white pixels occupying less than 80% of a cluster's bounding rectangle, stop clustering and keep the clusters you have.
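One way to read that suggestion is a greedy agglomerative merge on bounding boxes, sketched below in plain Python. The function name `merge_by_fill`, the tuple layout, and the sample components are hypothetical. In practice the boxes and pixel counts could come from something like `cv2.connectedComponentsWithStats`. The sketch assumes the components do not overlap, so their white-pixel counts can simply be added.

```python
from itertools import combinations

def merge_by_fill(components, min_fill=0.8):
    """Greedily merge components while the merged box stays well filled.

    components: list of (x0, y0, x1, y1, white_pixel_count),
    with x1 and y1 exclusive.
    """
    clusters = [tuple(c) for c in components]
    while len(clusters) > 1:
        best = None
        # Find the pair whose merged bounding box has the highest fill ratio.
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            x0, y0 = min(a[0], b[0]), min(a[1], b[1])
            x1, y1 = max(a[2], b[2]), max(a[3], b[3])
            white = a[4] + b[4]
            fill = white / ((x1 - x0) * (y1 - y0))
            if best is None or fill > best[0]:
                best = (fill, i, j, (x0, y0, x1, y1, white))
        # Stop once even the best merge would drop below the threshold.
        if best[0] < min_fill:
            break
        _, i, j, merged = best
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Two adjacent, well-filled components and one far away.
components = [(0, 0, 10, 10, 100), (10, 0, 20, 10, 95), (50, 50, 60, 60, 100)]
result = merge_by_fill(components)
print(result)
```

Here the two adjacent components merge (fill 195/200), while adding the distant one would drop the fill far below 80%, so it stays separate. The pairwise search is quadratic per merge, which is fine for a few dozen components.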