I am trying to implement a Viola-Jones detector. Because I don't have enough images or time to train classifiers, I decided to use the ones that ship with OpenCV.
So far, I have been able to load the whole haarcascade_frontalface_alt.xml into structures in memory, create an integral image for fast area sums, and write a basic detector algorithm. But it is not working as expected; in fact, it is not working at all.
So, if anyone knows how the Viola-Jones detector works and how OpenCV uses its structures, please confirm or deny my assumptions:
1, the integral image is calculated from float values ranging from 0 to 1, with 1 for white
2, for every feature, you take the area sum within each of its rectangles, multiply it by the rectangle's weight, and add up the results over all rectangles
3, if the sum is > threshold, left_val is summed further; if not, right_val is used
4, if the sum for all classifiers in a stage is > stage_threshold, it might be a face, so continue with the next stage; if not, break
5, repeat for all stages, detection windows and scales...
So far I am getting all kinds of detected areas, except the ones containing faces...
Please help if my assumptions about how OpenCV uses the cascade are wrong. Thanks.
1, Whether you use float or int as the datatype for the integral image does not matter, as long as it can store values big enough to prevent arithmetic overflow. The values do not have to be normalized. Normalization is done later, during evaluation of the classifier (see 3).
2, Yes
3, If sum * inverse_area < threshold * standard_deviation, left_val is summed further, ... (see below).
4, Yes
5, Yes
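To make points 1 to 3 concrete, here is a minimal Python sketch of a summed-area table and a normalized stump evaluation. It is not OpenCV's code: the function names (compute_sat, rect_sum, window_std, eval_stump) and the rectangle tuple layout (x, y, w, h, weight) are assumptions made for illustration, and the standard deviation is taken over the whole detection window, which may differ in detail from what a given implementation does.

```python
import math

def compute_sat(img, width, height):
    """Build summed-area tables of pixel values and of squared pixel values.

    Both tables carry an extra zero row and column, so their size is
    (height + 1) x (width + 1). Plain ints are fine: no normalization here.
    """
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    sqsat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        row_sq = 0
        for x in range(width):
            p = img[y][x]
            row_sum += p
            row_sq += p * p
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
            sqsat[y + 1][x + 1] = sqsat[y][x + 1] + row_sq
    return sat, sqsat

def rect_sum(table, x, y, w, h):
    """Sum over the rectangle with top-left corner (x, y) and size w x h."""
    return table[y + h][x + w] - table[y][x + w] - table[y + h][x] + table[y][x]

def window_std(sat, sqsat, x, y, w, h):
    """Standard deviation of the pixels in the window (1.0 if variance <= 0)."""
    area = w * h
    mean = rect_sum(sat, x, y, w, h) / area
    var = rect_sum(sqsat, x, y, w, h) / area - mean * mean
    return math.sqrt(var) if var > 0 else 1.0

def eval_stump(sat, sqsat, wx, wy, ww, wh, rects, threshold, left_val, right_val):
    """Evaluate one stump for the window at (wx, wy) with size ww x wh.

    rects is a list of (x, y, w, h, weight) tuples relative to the window
    (hypothetical layout, already scaled to the window size).
    """
    inverse_area = 1.0 / (ww * wh)
    std = window_std(sat, sqsat, wx, wy, ww, wh)
    total = sum(weight * rect_sum(sat, wx + rx, wy + ry, rw, rh)
                for (rx, ry, rw, rh, weight) in rects)
    # Normalization happens here, not in the integral image.
    return left_val if total * inverse_area < threshold * std else right_val
```

The second table of squared values is what makes the per-window standard deviation cheap: it costs two rectangle lookups per window instead of a pass over all pixels.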
I recommend looking at high-level OpenCV ports to get a better understanding of the algorithm: JViolaJones, written in Java, or, in JavaScript, js-objectdetect for stump-based cascades (especially computeSat() and detectSingleScale()) and Haar.js for non-stump-based cascades. The optimized OpenCV C/C++ code is somewhat difficult to read.
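As a rough orientation before reading those ports, the stage loop from points 4 and 5 can be sketched in a few lines of Python. This is not taken from any of the libraries above: passes_cascade, scan_windows, the (stage_threshold, stumps) tuple layout and the stump_value_at callback are all hypothetical names chosen for this example, and the sketch covers a single scale only.

```python
def passes_cascade(stages, stump_value):
    """Return True if the current window survives every stage.

    stages is a list of (stage_threshold, stumps) tuples (assumed layout);
    stump_value(stump) returns that stump's left_val or right_val for the
    window being tested.
    """
    for stage_threshold, stumps in stages:
        stage_sum = sum(stump_value(s) for s in stumps)
        if stage_sum < stage_threshold:
            return False  # rejected early: later stages are never evaluated
    return True

def scan_windows(width, height, window, step, stages, stump_value_at):
    """Slide a square window over the image at one scale and collect hits.

    stump_value_at(stump, x, y) evaluates a stump for the window whose
    top-left corner is (x, y). Repeating this with larger windows (and
    correspondingly scaled features) covers the remaining scales.
    """
    hits = []
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            if passes_cascade(stages, lambda s: stump_value_at(s, x, y)):
                hits.append((x, y, window, window))
    return hits
```

Most windows are rejected in the first one or two stages, which is why the early exit matters more for speed than anything else in the loop.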