I'm trying to get tracking of moving people working with OpenCV in C++, with a camera looking at a street and people moving about in it. For a sample video I shot and am using, see here: http://akos.maroy.hu/~akos/eszesp/MVI_0778.MOV
I read up on this topic, and I tried a number of things, including:
but none of these provides a good result. For my sample code, see below. For the output of the code, based on the above video, see: http://akos.maroy.hu/~akos/eszesp/ize.avi . The contours detected against the background are drawn in red, the bounding rectangles of the contours in green, and the HOG people detector results in blue.
The specific issues I have are:
Background subtraction and then finding contours seems to work fine, although there are some false positives. The main drawback, though, is that a single person is often 'cut up' into multiple contours. Is there a simple way to 'join' these together, maybe by assuming an 'ideal' person size, or by some other means?
As for the HOG people detector, in my case it only very rarely identifies the real people in the image. What could I be doing wrong there?
All pointers and ideas are welcome!
And so, here is the code I'm using so far, which is a cut-and-paste glory of various samples I found here and there:
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main(int argc, char *argv[])
{
    if (argc < 3) {
        std::cerr << "Usage: " << argv[0] << " in.file out.file" << std::endl;
        return -1;
    }

    cv::Mat frame;
    cv::Mat back;
    cv::Mat fore;

    std::cerr << "opening " << argv[1] << std::endl;
    cv::VideoCapture cap(argv[1]);

    // MOG2 background subtractor
    cv::BackgroundSubtractorMOG2 bg;
    //bg.nmixtures = 3;
    //bg.bShadowDetection = false;

    cv::VideoWriter output;
    //int ex = static_cast<int>(cap.get(CV_CAP_PROP_FOURCC));
    int ex = CV_FOURCC('P', 'I', 'M', '1');
    cv::Size size = cv::Size((int) cap.get(CV_CAP_PROP_FRAME_WIDTH),
                             (int) cap.get(CV_CAP_PROP_FRAME_HEIGHT));
    std::cerr << "saving to " << argv[2] << std::endl;
    output.open(argv[2], ex, cap.get(CV_CAP_PROP_FPS), size, true);

    std::vector<std::vector<cv::Point> > contours;

    cv::namedWindow("Frame");
    cv::namedWindow("Fore");
    cv::namedWindow("Background");

    // simple blob detector run on the foreground mask
    cv::SimpleBlobDetector::Params params;
    params.minThreshold = 40;
    params.maxThreshold = 60;
    params.thresholdStep = 5;
    params.minArea = 100;
    params.maxArea = 8000;
    params.minConvexity = 0.3;
    params.maxConvexity = 10;
    params.minInertiaRatio = 0.01;
    params.filterByColor = false;
    params.filterByCircularity = false;
    cv::SimpleBlobDetector blobDtor(params);

    std::vector<cv::KeyPoint> keyPoints;
    cv::Mat out;

    // HOG detector with the default people detector
    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

    for (;;) {
        cap >> frame;
        if (frame.empty()) {
            break;                                      // end of the video
        }

        // update the background model and get the foreground mask
        bg(frame, fore);
        bg.getBackgroundImage(back);
        cv::erode(fore, fore, cv::Mat());
        cv::dilate(fore, fore, cv::Mat());

        blobDtor.detect(fore, keyPoints, cv::Mat());
        //cv::imshow("Fore", fore);

        // contours of the foreground mask in red,
        // their bounding rectangles in green
        cv::findContours(fore, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE);
        cv::drawContours(frame, contours, -1, cv::Scalar(0, 0, 255), 2);
        std::vector<std::vector<cv::Point> >::const_iterator it = contours.begin();
        std::vector<std::vector<cv::Point> >::const_iterator end = contours.end();
        while (it != end) {
            cv::Rect bounds = cv::boundingRect(*it);
            cv::rectangle(frame, bounds, cv::Scalar(0, 255, 0), 2);
            ++it;
        }

        cv::drawKeypoints(fore, keyPoints, out, CV_RGB(0, 255, 0),
                          cv::DrawMatchesFlags::DEFAULT);
        cv::imshow("Fore", out);

        // HOG people detector, results in blue; keep only rectangles
        // that are not entirely contained in another one
        std::vector<cv::Rect> found, found_filtered;
        hog.detectMultiScale(frame, found, 0, cv::Size(8, 8), cv::Size(32, 32), 1.05, 2);
        for (size_t i = 0; i < found.size(); ++i) {
            cv::Rect r = found[i];
            size_t j = 0;
            for (; j < found.size(); ++j) {
                if (j != i && (r & found[j]) == r) {
                    break;
                }
            }
            if (j == found.size()) {
                found_filtered.push_back(r);
            }
        }
        for (size_t i = 0; i < found_filtered.size(); ++i) {
            cv::Rect r = found_filtered[i];
            cv::rectangle(frame, r.tl(), r.br(), cv::Scalar(255, 0, 0), 3);
        }

        output << frame;

        cv::resize(frame, frame, cv::Size(1280, 720));
        cv::imshow("Frame", frame);
        cv::resize(back, back, cv::Size(1280, 720));
        cv::imshow("Background", back);

        if (cv::waitKey(30) >= 0) {
            break;
        }
    }

    return 0;
}
Actually, this is a very broad topic. There are plenty of scientific papers that attack this problem, and you should read some of them first.
Briefly: background subtraction followed by contour extraction is the easiest technique. OpenCV has very nice implementations of it, also optimized for the GPU. To refine the foreground/background blobs you can apply some morphological operations, trying to close the holes in the blobs and get better results, as sketched below. But do not expect perfect results: background subtraction is a difficult operation. You can spend hours fine-tuning the parameters for a given dataset, then try your code in the real world and... nothing works. Lighting, shadows, background changes caused by uninteresting objects, just to mention a few problems.
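For example, here is a minimal sketch of such a refinement step applied to the MOG2 foreground mask from the question's code. The helper name, the shadow threshold of 200 and the kernel sizes are my assumptions and need tuning per scene:

#include <opencv2/opencv.hpp>

// Hypothetical helper: clean up the MOG2 foreground mask before running
// findContours on it. Kernel sizes and the shadow threshold are guesses.
cv::Mat refineForegroundMask(const cv::Mat &fore)
{
    cv::Mat mask = fore.clone();

    // MOG2 marks shadow pixels as 127; keep only confident foreground (255).
    cv::threshold(mask, mask, 200, 255, cv::THRESH_BINARY);

    // Opening removes speckle noise, closing fills holes inside a
    // person-sized blob so it is less likely to split into pieces.
    cv::Mat smallKernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, smallKernel);

    cv::Mat tallKernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(7, 15));
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, tallKernel);

    return mask;
}

A vertically elongated closing kernel is a reasonable choice here because a person split at the waist or legs tends to fragment vertically.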
So... no, there is no simple, standard technique for handling the so-called "blob fragmentation" or "split-merge" problem (sometimes one person is split into several blobs, sometimes several people are merged into one single blob). Again, there are plenty of scientific papers on this topic. But there are techniques for handling the tracking of incomplete or cluttered observations. One of the easiest is to try to infer the real state of the system from incomplete observations with a Kalman filter; OpenCV has a nice implementation of it (see the sketch below). Again, if you search for "Kalman filter tracking" or "GNN data association" you'll find a lot more.
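As a rough illustration of the OpenCV side only, here is a constant-velocity Kalman filter for a single blob centroid. The noise covariances are placeholder values, and the data association step (deciding which blob belongs to which track) is left out entirely:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // State: (x, y, vx, vy), measurement: (x, y).
    cv::KalmanFilter kf(4, 2, 0);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);
    cv::setIdentity(kf.measurementMatrix);
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-2));     // placeholder
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1)); // placeholder
    cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1));

    // Per frame: predict first, then correct with the observed centroid of the
    // blob. If the blob is missing or fragmented in this frame, skip correct()
    // and keep the prediction as the estimated position.
    cv::Point2f observed(320.0f, 240.0f);            // e.g. centre of a bounding rect
    cv::Mat prediction = kf.predict();               // predicted (x, y, vx, vy)
    cv::Mat measurement = (cv::Mat_<float>(2, 1) << observed.x, observed.y);
    cv::Mat estimated = kf.correct(measurement);     // fused estimate

    std::cout << "predicted: " << prediction.at<float>(0) << ", " << prediction.at<float>(1)
              << "  estimated: " << estimated.at<float>(0) << ", " << estimated.at<float>(1)
              << std::endl;
    return 0;
}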
If you want to use some geometric information, like an estimate of a person's height, you can do it, but you need the calibration parameters of the camera. That means either having them available (the Microsoft Kinect or a standard iPhone camera have their parameters available) or computing them through a camera calibration process. This means downloading a chessboard image, printing it on paper, and taking some pictures of it; OpenCV has all the methods needed for the calibration (a rough sketch follows below). After that, you need to estimate the ground plane, and then use some simple project/unproject methods to go back and forth between 2D and 3D coordinates, and estimate the 2D bounding box of a 3D standard-sized person.
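A rough sketch of the intrinsic calibration step only; the 9x6 inner-corner board, the 25 mm square size and the calibNN.jpg file names are all assumptions:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
#include <cstdio>

int main()
{
    cv::Size boardSize(9, 6);                         // inner corners of the chessboard
    float squareSize = 0.025f;                        // square edge in metres (assumed)

    // 3D coordinates of the chessboard corners in the board's own frame.
    std::vector<cv::Point3f> boardModel;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            boardModel.push_back(cv::Point3f(x * squareSize, y * squareSize, 0));

    std::vector<std::vector<cv::Point3f> > objectPoints;
    std::vector<std::vector<cv::Point2f> > imagePoints;
    cv::Size imageSize;

    for (int i = 1; i <= 15; ++i) {                   // e.g. calib01.jpg .. calib15.jpg
        char name[64];
        sprintf(name, "calib%02d.jpg", i);
        cv::Mat img = cv::imread(name, CV_LOAD_IMAGE_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, boardSize, corners)) {
            cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                             cv::TermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, 30, 0.1));
            imagePoints.push_back(corners);
            objectPoints.push_back(boardModel);
        }
    }

    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);
    std::cout << "reprojection error: " << rms << std::endl;
    return 0;
}

With cameraMatrix and distCoeffs in hand, plus an estimate of the ground plane, you can project an assumed person height into the image to decide how big a blob at a given image position should be.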
Modern approaches to "pedestrian tracking" extract the observations with a detector. Background subtraction can provide a map of where to run the detector, so that you don't have to search the whole image, but blob detection is useless in this case. The most commonly used implementations in OpenCV here are the Haar AdaBoost detector and the HOG detector; the HOG detector seems to give better results in some cases. Classifiers already shipped with OpenCV include a face detector for Haar and a people detector for HOG (a minimal Haar-based sketch follows below). You'll find examples in both the C++ and Python samples in the OpenCV repository.
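For completeness, a minimal sketch of the Haar route using the full-body cascade that ships in OpenCV's data directory; the cascade being copied next to the executable and the minimum detection size are assumptions, and the video file is the sample from the question:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::CascadeClassifier body;
    if (!body.load("haarcascade_fullbody.xml")) {     // from OpenCV's data/haarcascades
        std::cerr << "cannot load cascade" << std::endl;
        return -1;
    }

    cv::VideoCapture cap("MVI_0778.MOV");             // the sample video from the question
    cv::Mat frame, gray;
    for (;;) {
        cap >> frame;
        if (frame.empty()) break;

        cv::cvtColor(frame, gray, CV_BGR2GRAY);
        cv::equalizeHist(gray, gray);

        std::vector<cv::Rect> people;
        body.detectMultiScale(gray, people, 1.1, 3, 0, cv::Size(30, 60));

        for (size_t i = 0; i < people.size(); ++i)
            cv::rectangle(frame, people[i], cv::Scalar(255, 0, 0), 2);

        cv::imshow("Haar people", frame);
        if (cv::waitKey(30) >= 0) break;
    }
    return 0;
}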
If the standard detectors fail (your video has a different resolution, or you have to detect objects other than pedestrians), you have to train your own detector. That means collecting some images of the object you want to detect (positive samples) and some images of something else (negative samples), and training your own classifier with machine learning techniques such as SVM, roughly as sketched below. Again, Google is your friend :)
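Very roughly, the OpenCV 2.4 ingredients for the HOG + SVM route look like this. The sample lists are placeholders you would fill with your own file names, and the samples are assumed to be croppable to the default 64x128 HOG window:

#include <opencv2/opencv.hpp>
#include <vector>
#include <string>

int main()
{
    cv::HOGDescriptor hog;                            // default 64x128 window
    std::vector<std::string> positives, negatives;    // file names, filled elsewhere

    cv::Mat trainData, labels;
    for (size_t i = 0; i < positives.size() + negatives.size(); ++i) {
        bool isPos = i < positives.size();
        const std::string &name = isPos ? positives[i] : negatives[i - positives.size()];
        cv::Mat img = cv::imread(name, CV_LOAD_IMAGE_GRAYSCALE);
        if (img.empty()) continue;
        cv::resize(img, img, hog.winSize);

        std::vector<float> descriptor;
        hog.compute(img, descriptor);                 // one HOG vector per sample
        trainData.push_back(cv::Mat(descriptor).reshape(1, 1));
        labels.push_back(isPos ? 1.0f : -1.0f);
    }

    CvSVMParams params;
    params.svm_type = CvSVM::C_SVC;
    params.kernel_type = CvSVM::LINEAR;

    CvSVM svm;
    svm.train(trainData, labels, cv::Mat(), cv::Mat(), params);
    svm.save("my_people_svm.xml");
    return 0;
}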
Good luck!