I will explain what I am trying to do, since it seems relevant to understanding my question.
I am currently trying to do face recognition of people who step in front of a camera, based on known pictures in the database.
These known pictures are collected either from an identifying Smart Card (which contains only a single frontal face picture) or from a frontal face profile picture on a social network. From what I've read so far, good face recognition seems to require a large number of training images (50+). Since the pictures I collect are far too few to build a reliable training set, I instead tried using my live camera frame captures (currently 150) as the training set, and the previously collected identified pictures as the test set. I'm not sure whether this approach is correct, so please let me know if I'm getting it wrong.
So, the problem: after I have, say, 5 identified pictures collected from Smart Cards, I try to recognize them using the 150 frames the camera captured of my face as the training set. The confidence values for each of the 5 test faces come out EXTREMELY similar, which makes the whole program useless, because I cannot reliably recognize anyone. Often, using different camera captures as training, I even get higher confidence values for pictures of random people than for a picture of myself.
I would appreciate any help you can give me, because I'm at a loss here.
Thank you.
Note: I'm using the JavaCV wrapper for OpenCV to build my program, together with the Haar cascades that come included in the package. Eigenfaces is the algorithm being used.
I want to add this: libfacerec has been merged into the official OpenCV 2.4.2, see:
That means if you are using OpenCV 2.4.2, then you have the new cv::FaceRecognizer in the contrib module. A Python wrapper has been added lately (thanks for that!), and Java has probably been wrapped as well at the time of writing.
cv::FaceRecognizer comes with extensive documentation that shows you how to do face recognition, with lots of full source code examples:
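To give a flavor of the API, here is a minimal sketch using the Python bindings of OpenCV 2.4.x; the image file names and labels are placeholders for your own data, and the complete examples live in the documentation linked above:

```python
import cv2
import numpy as np

# Grayscale face images of equal size, with one integer label per person.
# The file names below are placeholders for your own dataset.
images = [cv2.imread(p, 0) for p in ["alice_1.pgm", "alice_2.pgm", "bob_1.pgm"]]
labels = np.array([0, 0, 1])

# Create and train an Eigenfaces model (OpenCV 2.4.x API).
model = cv2.createEigenFaceRecognizer()
model.train(images, labels)

# Predict a label for an unseen face; a lower confidence means a closer match.
label, confidence = model.predict(cv2.imread("unknown.pgm", 0))
print(label, confidence)
```

The same pattern works for cv2.createFisherFaceRecognizer() and cv2.createLBPHFaceRecognizer().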
If you want to know how the available face recognition algorithms (Eigenfaces, Fisherfaces, Local Binary Patterns Histograms) work, then especially read the Guide To Face Recognition with OpenCV. In it I explain how the algorithms work and mention their shortcomings:
Now to your original problem of recognizing faces when your training dataset is small. I'll write you a thorough answer, so it will hopefully help people coming here from Google.
Actually, Eigenfaces and Fisherfaces should not be used when you have only very few samples per person in your dataset. These models need data to work; I can't stress that enough. The more, the better. They are based on estimating the variance in your data, so give them some data to estimate your model from! A while ago I ran a small test on the AT&T Facedatabase (with the facerec framework), which shows the performance of these methods with a varying number of images per person:
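If you want to run that kind of experiment yourself, a hypothetical outline (not my original test script) with the OpenCV 2.4.x Python bindings could look like this; load_att_faces is an assumed helper that returns equal-sized grayscale images and their integer labels:

```python
import cv2
import numpy as np

def recognition_rate(images, labels, k):
    # Train on the first k images of each person, test on the remaining ones.
    train_imgs, train_lbls, test_imgs, test_lbls = [], [], [], []
    for person in np.unique(labels):
        samples = [img for img, lbl in zip(images, labels) if lbl == person]
        train_imgs += samples[:k]
        train_lbls += [person] * len(samples[:k])
        test_imgs += samples[k:]
        test_lbls += [person] * len(samples[k:])
    model = cv2.createEigenFaceRecognizer()
    model.train(train_imgs, np.array(train_lbls))
    hits = sum(model.predict(img)[0] == lbl for img, lbl in zip(test_imgs, test_lbls))
    return float(hits) / len(test_imgs)

# images, labels = load_att_faces("/path/to/att_faces")  # assumed helper
# for k in range(1, 10):  # the AT&T database has 10 images per person
#     print(k, recognition_rate(images, labels, k))
```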
I am not writing a publication here, nor will I back these figures with a detailed mathematical analysis. That has been done before, so I recommend anyone doubting these figures to look into (2) for a very detailed analysis of PCA (Eigenfaces) and LDA (Fisherfaces) on small training datasets.
So what I suggest is using Local Binary Patterns Histograms (3) for face recognition in the small-sample scenario. They are also included in the OpenCV FaceRecognizer and have been proven to perform very well on small training datasets. Combining this with a TanTriggs preprocessing (4) should give you a really robust face recognition model. The TanTriggs preprocessing is an 8-liner (or so) in Python, see https://github.com/bytefish/facerec/blob/master/py/facerec/preprocessing.py#L41 for the implementation; a rough sketch is also given below. It should be easy to adapt to Java (or I can implement it with OpenCV, if people request it).
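For reference, here is a rough NumPy/OpenCV sketch of the TanTriggs chain from (4): gamma correction, Difference-of-Gaussians filtering and contrast equalization. The parameter defaults follow the paper; treat it as an approximation of the linked implementation, not a verbatim copy:

```python
import cv2
import numpy as np

def tan_triggs(img, alpha=0.1, tau=10.0, gamma=0.2, sigma0=1.0, sigma1=2.0):
    # Work on a floating-point copy of the grayscale input.
    X = np.float32(img)
    # 1. Gamma correction: enhances dark regions while compressing bright ones.
    X = np.power(X, gamma)
    # 2. Difference of Gaussians (DoG) filtering suppresses shading gradients;
    #    a kernel size of (0, 0) lets OpenCV derive it from the sigma.
    X = cv2.GaussianBlur(X, (0, 0), sigma0) - cv2.GaussianBlur(X, (0, 0), sigma1)
    # 3. Two-stage contrast equalization, followed by a tanh squashing step.
    X = X / np.power(np.mean(np.power(np.abs(X), alpha)), 1.0 / alpha)
    X = X / np.power(np.mean(np.power(np.minimum(np.abs(X), tau), alpha)), 1.0 / alpha)
    X = tau * np.tanh(X / tau)
    # Rescale to [0, 255] so the result can feed the OpenCV face recognizers.
    X = X - X.min()
    return np.uint8(255.0 * X / X.max())
```

You would then train the recognizer on the preprocessed images, e.g. model = cv2.createLBPHFaceRecognizer() followed by model.train([tan_triggs(img) for img in images], np.array(labels)).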