Using OpenCV to match an image from a group of images for the purpose of identification in C++

2c2c · Feb 7, 2013 · Viewed 16.4k times

EDIT: I've acquired enough reputation through this post to be able to edit it with more links, which will help me get my point across better

People playing The Binding of Isaac often come across important items on little pedestals.

The goal is that a user who is confused about what an item is can press a button, which will then instruct him to "box" the item (think Windows desktop box selection). The box gives us the region of interest (the actual item plus some background environment) to compare against what will be an entire grid of items.

Theoretical user-boxed item (image)

Theoretical grid of items (image; there aren't many more, I just ripped this out of The Binding of Isaac wiki)

The location in the grid identified as the user's boxed item would correspond to a specific area of that image, which in turn maps to the proper link on The Binding of Isaac wiki with information about the item.

In the grid, the item is in the 1st column, 3rd row from the bottom. I use these two images in everything I tried below.


My goal is to create a program that can take a manual crop of an item from the game "The Binding of Isaac", identify the cropped item by comparing it against an image of a table of the game's items, and then display the proper wiki page.

This would be my first "real project" in the sense that it requires a huge amount of library learning to get what I want done. It's been a bit overwhelming.

I've messed with a few options just from googling around. (You can quickly find the tutorials I used by searching the name of the method plus "opencv"; my account is heavily restricted from posting links for some reason.)

Using BruteForceMatcher:

http://docs.opencv.org/doc/tutorials/features2d/feature_description/feature_description.html

#include <stdio.h>
#include <iostream>
#include "opencv2/core/core.hpp"
#include <opencv2/legacy/legacy.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include "opencv2/highgui/highgui.hpp"

using namespace cv;

void readme();

/** @function main */
int main( int argc, char** argv )
{
  if( argc != 3 )
   { readme(); return -1; }

  Mat img_1 = imread( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
  Mat img_2 = imread( argv[2], CV_LOAD_IMAGE_GRAYSCALE );

  if( !img_1.data || !img_2.data )
   { std::cout << " --(!) Error reading images " << std::endl; return -1; }

  //-- Step 1: Detect the keypoints using SURF Detector
  int minHessian = 400;

  SurfFeatureDetector detector( minHessian );

  std::vector<KeyPoint> keypoints_1, keypoints_2;

  detector.detect( img_1, keypoints_1 );
  detector.detect( img_2, keypoints_2 );

  //-- Step 2: Calculate descriptors (feature vectors)
  SurfDescriptorExtractor extractor;

  Mat descriptors_1, descriptors_2;

  extractor.compute( img_1, keypoints_1, descriptors_1 );
  extractor.compute( img_2, keypoints_2, descriptors_2 );

  //-- Step 3: Matching descriptor vectors with a brute force matcher
  BruteForceMatcher< L2<float> > matcher;
  std::vector< DMatch > matches;
  matcher.match( descriptors_1, descriptors_2, matches );

  //-- Draw matches
  Mat img_matches;
  drawMatches( img_1, keypoints_1, img_2, keypoints_2, matches, img_matches );

  //-- Show detected matches
  imshow("Matches", img_matches );

  waitKey(0);

  return 0;
}

/** @function readme */
void readme()
{ std::cout << " Usage: ./SURF_descriptor <img1> <img2>" << std::endl; }


This results in matches that don't look very useful. Using FLANN gives cleaner but equally unreliable results:

http://docs.opencv.org/doc/tutorials/features2d/feature_flann_matcher/feature_flann_matcher.html

#include <stdio.h>
#include <iostream>
#include "opencv2/core/core.hpp"
#include <opencv2/legacy/legacy.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include "opencv2/highgui/highgui.hpp"

using namespace cv;

void readme();

/** @function main */
int main( int argc, char** argv )
{
  if( argc != 3 )
  { readme(); return -1; }

  Mat img_1 = imread( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
  Mat img_2 = imread( argv[2], CV_LOAD_IMAGE_GRAYSCALE );

  if( !img_1.data || !img_2.data )
  { std::cout<< " --(!) Error reading images " << std::endl; return -1; }

  //-- Step 1: Detect the keypoints using SURF Detector
  int minHessian = 400;

  SurfFeatureDetector detector( minHessian );

  std::vector<KeyPoint> keypoints_1, keypoints_2;

  detector.detect( img_1, keypoints_1 );
  detector.detect( img_2, keypoints_2 );

  //-- Step 2: Calculate descriptors (feature vectors)
  SurfDescriptorExtractor extractor;

  Mat descriptors_1, descriptors_2;

  extractor.compute( img_1, keypoints_1, descriptors_1 );
  extractor.compute( img_2, keypoints_2, descriptors_2 );

  //-- Step 3: Matching descriptor vectors using FLANN matcher
  FlannBasedMatcher matcher;
  std::vector< DMatch > matches;
  matcher.match( descriptors_1, descriptors_2, matches );

  double max_dist = 0; double min_dist = 100;

  //-- Quick calculation of max and min distances between keypoints
  for( int i = 0; i < descriptors_1.rows; i++ )
  { double dist = matches[i].distance;
    if( dist < min_dist ) min_dist = dist;
    if( dist > max_dist ) max_dist = dist;
  }

  printf("-- Max dist : %f \n", max_dist );
  printf("-- Min dist : %f \n", min_dist );

  //-- Draw only "good" matches (i.e. whose distance is less than 2*min_dist,
  //-- with a small floor since min_dist can be 0 for identical descriptors)
  //-- PS.- radiusMatch can also be used here.
  std::vector< DMatch > good_matches;

  double good_thresh = ( 2*min_dist > 0.02 ) ? 2*min_dist : 0.02;

  for( int i = 0; i < descriptors_1.rows; i++ )
  { if( matches[i].distance <= good_thresh )
    { good_matches.push_back( matches[i]); }
  }

  //-- Draw only "good" matches
  Mat img_matches;
  drawMatches( img_1, keypoints_1, img_2, keypoints_2,
               good_matches, img_matches, Scalar::all(-1), Scalar::all(-1),
               vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS );

  //-- Show detected matches
  imshow( "Good Matches", img_matches );

  for( int i = 0; i < (int)good_matches.size(); i++ )
  { printf( "-- Good Match [%d] Keypoint 1: %d  -- Keypoint 2: %d  \n", i, good_matches[i].queryIdx, good_matches[i].trainIdx ); }

  waitKey(0);

  return 0;
}

/** @function readme */
void readme()
{ std::cout << " Usage: ./SURF_FlannMatcher <img1> <img2>" << std::endl; }


Template matching has been my best method so far. Across its 6 matching methods, though, it only gets 0-4 correct identifications.

http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>

using namespace std;
using namespace cv;

/// Global Variables
Mat img; Mat templ; Mat result;
const char* image_window = "Source Image";
const char* result_window = "Result window";

int match_method;
int max_Trackbar = 5;

/// Function Headers
void MatchingMethod( int, void* );

/** @function main */
int main( int argc, char** argv )
{
  /// Load image and template (with basic argument and read checks)
  if( argc != 3 )
    { printf( " Usage: ./MatchTemplate_Demo <image> <template> \n" ); return -1; }

  img = imread( argv[1], 1 );
  templ = imread( argv[2], 1 );

  if( !img.data || !templ.data )
    { printf( " --(!) Error reading images \n" ); return -1; }

  /// Create windows
  namedWindow( image_window, CV_WINDOW_AUTOSIZE );
  namedWindow( result_window, CV_WINDOW_AUTOSIZE );

  /// Create Trackbar
  const char* trackbar_label = "Method: \n 0: SQDIFF \n 1: SQDIFF NORMED \n 2: TM CCORR \n 3: TM CCORR NORMED \n 4: TM COEFF \n 5: TM COEFF NORMED";
  createTrackbar( trackbar_label, image_window, &match_method, max_Trackbar, MatchingMethod );

  MatchingMethod( 0, 0 );

  waitKey(0);
  return 0;
}

/**
 * @function MatchingMethod
 * @brief Trackbar callback
 */
void MatchingMethod( int, void* )
{
  /// Source image to display
  Mat img_display;
  img.copyTo( img_display );

  /// Create the result matrix (note: Mat::create takes rows first, then columns)
  int result_cols = img.cols - templ.cols + 1;
  int result_rows = img.rows - templ.rows + 1;

  result.create( result_rows, result_cols, CV_32FC1 );

  /// Do the Matching and Normalize
  matchTemplate( img, templ, result, match_method );
  normalize( result, result, 0, 1, NORM_MINMAX, -1, Mat() );

  /// Localizing the best match with minMaxLoc
  double minVal; double maxVal; Point minLoc; Point maxLoc;
  Point matchLoc;

  minMaxLoc( result, &minVal, &maxVal, &minLoc, &maxLoc, Mat() );

  /// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
  if( match_method  == CV_TM_SQDIFF || match_method == CV_TM_SQDIFF_NORMED )
    { matchLoc = minLoc; }
  else
    { matchLoc = maxLoc; }

  /// Show me what you got
  rectangle( img_display, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );
  rectangle( result, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );

  imshow( image_window, img_display );
  imshow( result_window, result );

  return;
}
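Since the end goal is a wiki link, once matchLoc is known (e.g. at the end of MatchingMethod above) the grid cell can be recovered with integer division, assuming the item table is a uniform grid. A sketch; ITEM_W and ITEM_H are hypothetical cell dimensions you would measure from the actual table image:

/// Sketch: map the best-match location to a grid cell (assumes a uniform grid)
const int ITEM_W = 64;  /// assumed cell width in pixels - measure from the table image
const int ITEM_H = 64;  /// assumed cell height in pixels - measure from the table image

int col = matchLoc.x / ITEM_W;  /// 0-based column of the matched item
int row = matchLoc.y / ITEM_H;  /// 0-based row of the matched item
printf( "Matched grid cell: row %d, col %d \n", row, col );
/// A (row, col) -> wiki URL lookup table would complete the pipeline.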

http://imgur.com/pIRBPQM,h0wkqer,1JG0QY0,haLJzRF,CmrlTeL,DZuW73V#3

Of the 6 methods: fail, pass, fail, pass, pass, pass.

This was sort of a best-case result, though. The next item I tried (image) resulted in fail, fail, fail, fail, fail, fail.

From item to item, each of these methods works well on some and terribly on others.

So I'll ask: is template matching my best bet, or is there a method I'm not considering that will be my holy grail?

How can I get a USER to create the crop manually? OpenCV's documentation on this is really bad, and the examples I find online are extremely old C++ or straight C.
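One standard OpenCV 2.4-era approach is a mouse callback on the display window. A minimal sketch, with made-up window names, that drags out a rectangle and crops on button release:

#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"

using namespace cv;

Mat frame;               // the image the user is cropping from
Point drag_start;
bool dragging = false;

// Mouse callback: press the left button to start, release to take the crop
void onMouse( int event, int x, int y, int, void* )
{
  if( event == CV_EVENT_LBUTTONDOWN )
    { drag_start = Point( x, y ); dragging = true; }
  else if( event == CV_EVENT_LBUTTONUP && dragging )
  {
    dragging = false;
    // Rect(Point, Point) normalizes the corners; & clamps to image bounds
    Rect selection = Rect( drag_start, Point( x, y ) ) & Rect( 0, 0, frame.cols, frame.rows );
    if( selection.area() > 0 )
    {
      Mat crop = frame( selection ).clone();  // the user's "boxed" item
      imshow( "Crop", crop );
    }
  }
}

int main( int argc, char** argv )
{
  if( argc != 2 ) { return -1; }
  frame = imread( argv[1], 1 );
  if( !frame.data ) { return -1; }

  namedWindow( "Game", CV_WINDOW_AUTOSIZE );
  setMouseCallback( "Game", onMouse, 0 );
  imshow( "Game", frame );
  waitKey(0);
  return 0;
}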

Thanks for any help. This venture has been an interesting experience so far. I had to strip all of the links that would better portray how everything has been working out, but the site says I'm posting more than 10 links even though I'm not.


Some more examples of items throughout the game:

The Rock is a rare item and one of the few that can be "anywhere" on the screen. Items like The Rock are why having the user crop the item is the best way to isolate it; otherwise, item positions are limited to a couple of specific places.


An item after a boss fight: lots of stuff everywhere, and transparency in the middle of the item. I would imagine this would be one of the harder cases to get working correctly.


A rare room. Simple background, no item transparency.


Here are the two tables all of the items in the game are in. I'll make them one image eventually, but for now they were taken directly from the Isaac wiki.

(images: the two item tables)

Answer

ffriend · Feb 7, 2013

One important detail here is that you have a clean image of every item in your table. You know the color of the background and can separate the item from the rest of the picture. For example, in addition to the matrix representing the image itself, you can store a matrix of 1s and 0s of the same size, where ones correspond to the item area and zeros to the background. Let's call this matrix the "mask" and the clean image of the item the "pattern".
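For example, a sketch of building such a mask in C++, assuming a single known background color (the white BG_COLOR here is an assumption; sample it from the actual table image):

#include "opencv2/core/core.hpp"

using namespace cv;

// Sketch: derive the 0/1 mask from a known background color.
// BG_COLOR is an assumption - sample it from your actual table image.
Mat makeMask( const Mat& pattern )           // pattern: 8-bit BGR item image
{
  const Vec3b BG_COLOR( 255, 255, 255 );     // assumed white background
  Mat mask( pattern.rows, pattern.cols, CV_8UC1 );
  for( int y = 0; y < pattern.rows; y++ )
    for( int x = 0; x < pattern.cols; x++ )
      mask.at<uchar>(y, x) = ( pattern.at<Vec3b>(y, x) == BG_COLOR ) ? 0 : 1;
  return mask;
}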

There are two ways to compare images: match the image against the pattern, or match the pattern against the image. What you have described is matching the image against the pattern - you have some cropped image and want to find the most similar pattern. Instead, think about searching for the pattern in the image.

Let's first define a function match() that takes a pattern, a mask, and an image of the same size and checks whether the area of the pattern under the mask is exactly the same as the corresponding area of the image (pseudocode):

def match(pattern, mask, image):
    for x = 0 to pattern.width:
        for y = 0 to pattern.height:
            if mask[x, y] == 1 and            # if in pattern this pixel is not part of background
               pattern[x, y] != image[x, y]:  # and pixels on pattern and image differ
                return False
    return True
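A direct C++ translation of that pseudocode, as a sketch for single-channel 8-bit images using the OpenCV types from the mask sketch above:

// Sketch: C++ version of match() for single-channel 8-bit images.
// Returns true only if every non-background pixel is identical.
bool match( const Mat& pattern, const Mat& mask, const Mat& image )
{
  for( int y = 0; y < pattern.rows; y++ )
    for( int x = 0; x < pattern.cols; x++ )
      if( mask.at<uchar>(y, x) == 1 &&                        // pixel belongs to the item
          pattern.at<uchar>(y, x) != image.at<uchar>(y, x) )  // and the pixels differ
        return false;
  return true;
}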

But the sizes of the pattern and the cropped image may differ. The standard solution for this (used, for example, in the cascade classifier) is a sliding window - just move the pattern "window" across the image and check whether the pattern matches the selected region. This is pretty much how object detection works in OpenCV.
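As a sketch, the sliding window is just two nested loops that try the match() above at every placement of the pattern over the image:

// Sketch: slide the pattern window over the image and report the first
// placement where match() succeeds.
bool findPattern( const Mat& pattern, const Mat& mask, const Mat& image, Point& hit )
{
  for( int y = 0; y + pattern.rows <= image.rows; y++ )
    for( int x = 0; x + pattern.cols <= image.cols; x++ )
    {
      Mat window = image( Rect( x, y, pattern.cols, pattern.rows ) );
      if( match( pattern, mask, window ) )
        { hit = Point( x, y ); return true; }
    }
  return false;
}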

Of course, this solution is not very robust - cropping, resizing, or any other image transformation may change some pixels, in which case match() will always return false. To overcome this, instead of a boolean answer you can use a distance between image and pattern: match() should return some similarity value, say between 0 and 1, where 1 means "exactly the same" and 0 "completely different". Then you either set a threshold for similarity (e.g. the image should be at least 85% similar to the pattern), or just select the pattern with the highest similarity.
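A similarity-returning variant of match() could, for instance, count the fraction of masked pixels that agree; a sketch, where the 10-level tolerance is an arbitrary assumption:

// Sketch: similarity in [0, 1] instead of a boolean. A pixel "agrees" if it
// is within TOLERANCE gray levels; TOLERANCE = 10 is an arbitrary choice.
double similarity( const Mat& pattern, const Mat& mask, const Mat& image )
{
  const int TOLERANCE = 10;
  int total = 0, agree = 0;
  for( int y = 0; y < pattern.rows; y++ )
    for( int x = 0; x < pattern.cols; x++ )
      if( mask.at<uchar>(y, x) == 1 )
      {
        total++;
        int diff = (int)pattern.at<uchar>(y, x) - (int)image.at<uchar>(y, x);
        if( diff < 0 ) diff = -diff;
        if( diff <= TOLERANCE ) agree++;
      }
  return total > 0 ? (double)agree / total : 0.0;
}
// Then either threshold (e.g. require >= 0.85) or pick the pattern with the
// highest similarity across all items in the table.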

Since the items in the game are artificial images and the variation in them is very small, this approach should be enough. However, for more complicated cases you will need features other than simply the pixels under the mask. As I already suggested in my comment, methods like Eigenfaces, a cascade classifier using Haar-like features, or even Active Appearance Models may be more effective for those tasks. As for SURF, as far as I know it's better suited to tasks where the angle and size of the object vary, but not to different backgrounds and all such things.