Using Nearest Neighbour Algorithm for image pattern recognition

user293895 picture user293895 · May 2, 2010 · Viewed 9.7k times · Source

So I want to be able to recognise patterns in images (such as a number 4), I have been reading about different algorithms and I would really like to use the Nearest Neighbour algorithm, it looks simple and I do understand it based on this tutorial: http://people.revoledu.com/kardi/tutorial/KNN/KNN_Numerical-example.html Problem is, although I understand how to use it to fill in missing data sets, I don't understand how I could use it as a pattern recognition tool to aim in Image Shape Recognition. Could someone please shed some light as to how this algorithm could work for pattern recognition? I have seen tutorials using OpenCV, however I don't really want to use this library as I have the ability to do the pre-processing myself, and it seems silly that I would implement this library just for what should be a simple nearest neighbour algorithm.

Answer

leonbloy picture leonbloy · May 2, 2010

You simply (simply?) have to define a measure of "distance" for your data.

Lets asume you have already segmented you big image in small images, each one corresponding to a text character you want to classified. Lets assume that we are dealing with digital monocrome images, so each image is represented as a rectangular matrix of values (pixels) in (say) the 0-255 integer range (brightness). It is also assumed (NN is a "supervised clasification algorithm") that you have a lot of already well classified images (your training set).

Given a new small image, you must define a distance between two images so that the most close in the training set is chosen, and its "label" chosen as the recognized text character.

One naive approach would be to take the difference of pixels (sum of squares, for example). But this distance measure would be sensitive to translations (and rotations and scaling) and we usually don't want that. An alternative would be to compute the modulus of the Fourier transform, which is translation invariant (but this is not enough). From here you can start - and appreciate that the problem is difficult, and these kind of classification needs a lot of work to perform acceptably.