I'm using OpenCV 2.3 for keypoints detection and matching. But I am a bit confused with the size
and response
parameters given by the detection algorithm. What do they exactly mean?
Based on the OpenCV manual, I can't figure it out:
float size
: diameter of the meaningful keypoint neighborhood
float response
: the response by which the most strong keypoints have been selected. Can be used for further sorting or subsampling
I thought the best point to track would be the one with the highest response but it seems that it is not the case. So how could I subsample the set of key points returned by the surf detector to keep only the best one in term of trackability?
SURF is a blob detector, in short, the size of a feature is the size of the blob. To be more precise, the returned size by OpenCV is half the length of the approximated Hessian operator. The size is also known as scale, this is due to the way the blob detectors work, i.e., being functionally equal to first blurring the image with the Gaussian filter at several scales and then downsampling the images and finally detecting blobs with a fixed size. See the image below showing the the size of the SURF features. The size of each feature is the radius of the drawn circle. The lines going out from the center of the features to the circumference show the angles or orientations. In this image, the response strength of the blob detection filter is color coded. You can see the majority of the detected features have a weak response. (see the full size image here)
This histogram shows the distribution of the response strengths of the features in the above image:
The most robust feature tracker tracks all the detected features. The more features the more robustness. But it's impractical to track a large number of features as often we want to limit the computation time. The number of features to track often should be empirically tuned for each application. Often the image is divided into regular sub-regions and in each one the n strongest features are kept to be tracked. n is usually chosen such that in total about 500~1000 features are detected per frame.
Reading the journal paper describing SURF definitely will give you a good idea of how it works. Just try not to get stuck in the details, specially if your background isn't in machine/computer vision or image processing. The SURF detector may seem extremely novel at the first glance but the whole idea is estimating the Hessian operator (a well established filter) using integral images (which have been used by other methods long before SURF). If you want to understand SURF very well and you're not familiar with image processing, you need to go back and read some introductory material. Recently I came across a new and free book, whose chapter 13 has a good and brief introduction to feature detection. Not everything said in there is technically correct, but it's a good starting point. Here you can find another good description of SURF with several images showing how each step works. On that page you see this image:
You can see the white and black blobs, these are the blobs that SURF detects at several scales and estimates their sizes (radius in the OpenCV code).