I have been playing around with different data clustering algorithms, working on finding clusters among random data points represented as nodes. I keep reading that data clustering is used for image recognition, but I am failing to make the connection: how does clustering data help in recognizing an image, or in facial recognition? Can someone explain this?
It's no surprise that clustering is used for pattern recognition at large, and image recognition in particular: clustering is a reducing process, and images in this megapixel era need boiling down... It is also a process which produces categories and that is of course useful.
However, there are many approaches to using clustering as a technique for image recognition. One of the reasons for this diversity is that clustering can be applied at different levels and for different purposes: from the basic pixel level to the feature level (a feature being a line, a geometric figure...), for classification or for other purposes.
At a very high level, clustering is a statistical tool: it helps discover the relative importance of various dimensions in determining whether a particular item belongs to a particular category.
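To make the pixel-level case concrete, here is a minimal sketch of clustering used as a "reducing" step: a tiny grayscale image is boiled down to two intensity levels with a hand-rolled 1-D k-means. The image values, the function names `kmeans_1d` and `quantize`, and the choice of k-means itself are my own assumptions for illustration; real pipelines typically work on color or on richer features.

```python
import random

def kmeans_1d(values, k, iterations=20, seed=0):
    """Very small k-means for scalar values (e.g. gray levels)."""
    rng = random.Random(seed)
    # Start from k distinct values picked from the data.
    centers = rng.sample(sorted(set(values)), k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

def quantize(image, centers):
    """Replace every pixel by the center of the cluster it belongs to."""
    return [[min(centers, key=lambda c: abs(p - c)) for p in row]
            for row in image]

# Hypothetical 3x5 grayscale image: a dark region and a bright region.
image = [
    [10, 12, 11, 200, 205],
    [9, 13, 198, 202, 207],
    [11, 10, 12, 201, 199],
]
pixels = [p for row in image for p in row]
centers = kmeans_1d(pixels, k=2)
reduced = quantize(image, centers)
```

After this, `reduced` holds only two distinct values, one per cluster, so the dark and bright regions can be treated as two segments by whatever stage comes next.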
One use (of many) of such a tool is in supervised learning: a set of human-selected items (say, images) is fed into the cluster-based logic, each along with a label ("this is an apple", "this is another apple", "this is a lemon"...). The clustering logic then determines how much each dimension of the input matters in helping each group of items (apples, lemons...) fit in a distinct cluster; for example, the color may matter relatively little, but the shape, or the presence of dots, or whatever may matter a lot. After this training phase, new images can be fed to the logic, and an image is "recognized" (as a banana!) by seeing how close it falls to a particular cluster.
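As a rough sketch of that train-then-recognize idea, the snippet below uses a nearest-centroid classifier: each label gets one cluster center computed from its labeled examples, and a new item takes the label of the closest center. The two features (roundness, yellowness), the sample values, and the function names are hypothetical, and this simple version weights every dimension equally rather than learning how much each one matters.

```python
import math
from collections import defaultdict

def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> mean vector."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(features, centroids):
    """Return the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids,
               key=lambda label: math.dist(features, centroids[label]))

# Hypothetical labeled training set: (roundness, yellowness) per image.
training = [
    ((0.95, 0.10), "apple"),
    ((0.90, 0.20), "apple"),
    ((0.60, 0.90), "lemon"),
    ((0.55, 0.85), "lemon"),
]
centroids = train_centroids(training)

first = classify((0.88, 0.25), centroids)   # close to the apple examples
second = classify((0.50, 0.80), centroids)  # close to the lemon examples
```

Note that a classifier like this can only answer with a label it was trained on; anything else just lands on whichever cluster happens to be nearest.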
When it comes to image processing, one needs to remember that whatever is "fed" to the clustering logic is not necessarily (in fact, rarely) the raw pixels, but various "objects" characterizing various "elements" of the original data, produced by previous stages of the process (essentially a collection of relatively high-dimension vectors, not unlike some that one may have encountered in other data clustering examples). For example, an important element of facial recognition is probably the exact distance between the centers of the eyes. In previous stages, the image is processed in a way that figures out where the eyes are (possibly relying on another clustering-based logic). Then the distance between the eyes, along with many other elements, is fed to the final clustering logic.
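To illustrate what such a derived vector might look like, here is a small sketch that turns already-located facial landmarks into a feature vector. The landmark names, the coordinates, and the second feature are made up for the example, and dividing by the face width (so the numbers do not depend on image scale) is my own assumption, not something a particular system is known to do.

```python
import math

def face_features(landmarks):
    """Build a small feature vector from (x, y) landmark positions.

    `landmarks` is assumed to come from an earlier stage that has
    already located the eyes, cheeks and nose tip in the image.
    """
    left_eye = landmarks["left_eye"]
    right_eye = landmarks["right_eye"]
    eye_distance = math.dist(left_eye, right_eye)
    face_width = math.dist(landmarks["left_cheek"], landmarks["right_cheek"])
    # Midpoint between the eyes, used as a reference for the nose position.
    eye_mid = ((left_eye[0] + right_eye[0]) / 2,
               (left_eye[1] + right_eye[1]) / 2)
    nose_drop = math.dist(eye_mid, landmarks["nose_tip"])
    # Ratios rather than raw pixel distances (an assumption of this sketch).
    return [eye_distance / face_width, nose_drop / face_width]

# Hypothetical landmark positions, in pixels.
landmarks = {
    "left_eye": (30, 40),
    "right_eye": (70, 40),
    "left_cheek": (10, 60),
    "right_cheek": (90, 60),
    "nose_tip": (50, 65),
}
features = face_features(landmarks)
```

A vector like `features`, usually with many more dimensions, is the kind of input the final clustering stage would receive instead of raw pixels.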
The preceding description is only one example of the use of clustering for image recognition. Indeed, various forms of neural networks have been used, very successfully, in this domain, and it can be argued that in a sense these neural networks are clustering information. One of the reasons for the success of neural nets may lie in their ability to be more respectful of the locality dimension as found in the original input, and also their ability to work in a hierarchical fashion.
A good conclusion to this write up would be a short list of online resources, but I'm pressed for time at the moment... "to be continued" ;-)
Next day edit: (failed attempt to provide an introductory online bibliography on the subject)
My search for literature on the topic of clustering as applied to artificial vision and image processing revealed two distinct... clusters ;-)
In short I feel ill equipped to make any specific book or article suggestion.
You may find it informative to browse titles in, say, Google Books, searching for "Artificial vision" or "Image Recognition" or some of the titles mentioned above. With the preview feature and also the tag cloud (btw, another application of clustering) found in the "about this book" link, one can get a good idea of the various books' contents and maybe decide to purchase some of them. Unfortunately, the reduced readership and the potentially lucrative applications in the field make these books relatively expensive. At the other end of the spectrum, you may download, sometimes for free, research papers discussing advanced topics in the field. These will also show up on regular (web) Google, or at specialized repositories such as CiteSeer.
Good luck with your exploration in that field!