How can I stitch images from video cameras in real time?

Alex · Mar 30, 2015 · Viewed 13.3k times

I use 4 stationary cameras. The cameras do not move relative to each other, and I want to stitch the video images from them into one video image in real time.

For this I use OpenCV 2.4.10 and the cv::Stitcher class, like this:

// use 4 video-cameras
cv::VideoCapture cap0(0), cap1(1), cap2(2), cap3(3);

bool try_use_gpu = true;    // use GPU
cv::Stitcher stitcher = cv::Stitcher::createDefault(try_use_gpu);
stitcher.setWarper(new cv::CylindricalWarperGpu());
stitcher.setWaveCorrection(false);
stitcher.setSeamEstimationResol(0.001);
stitcher.setPanoConfidenceThresh(0.1);

//stitcher.setSeamFinder(new cv::detail::GraphCutSeamFinder(cv::detail::GraphCutSeamFinderBase::COST_COLOR_GRAD));
stitcher.setSeamFinder(new cv::detail::NoSeamFinder());
stitcher.setBlender(cv::detail::Blender::createDefault(cv::detail::Blender::NO, true));
//stitcher.setExposureCompensator(cv::detail::ExposureCompensator::createDefault(cv::detail::ExposureCompensator::NO));
stitcher.setExposureCompensator(new cv::detail::NoExposureCompensator());


std::vector<cv::Mat> images(4);
cap0 >> images[0];
cap1 >> images[1];
cap2 >> images[2];
cap3 >> images[3];

// call once!
cv::Stitcher::Status status = stitcher.estimateTransform(images);


cv::Mat pano_result;

while(true) {

    // lack of speed, even if I use old frames
    // std::vector<cv::Mat> images(4);
    //cap0 >> images[0];
    //cap1 >> images[1];
    //cap2 >> images[2];
    //cap3 >> images[3];

    cv::Stitcher::Status status = stitcher.composePanorama(images, pano_result);
}

I get only 10 FPS (frames per second), but I need 25 FPS. How can I accelerate this example?

When I use stitcher.setWarper(new cv::PlaneWarperGpu()); I get a very enlarged image, which I do not need.

I need only translations.

For example, I'm ready to do without:

  • Perspective transformation
  • Scale operations
  • and maybe even Rotations

How can I do this? Or how can I get the x,y translation parameters for each image from the cv::Stitcher stitcher?

UPDATE: profiling in MSVS 2013 on Windows 7 x64 [profiler screenshot]

Answer

n00dle · Apr 1, 2015

cv::Stitcher is fairly slow. If your cameras definitely don't move relative to one another and the transformation is as simple as you say, you should be able to overlay the images onto a blank canvas simply by chaining homographies.

The following is somewhat mathematical - if this isn't clear I can write it up properly using LaTeX, but SO doesn't support pretty maths :)

You have a set of 4 cameras, from left to right, (C_1, C_2, C_3, C_4), giving a set of 4 images (I_1, I_2, I_3, I_4).

To transform from I_1 to I_2, you have a 3x3 transformation matrix, called a homography. We'll call this H_12. Similarly for I_2 to I_3 we have H_23 and for I_3 to I_4 you'll have H_34.
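
Since you say your cameras only differ by a shift, each of these matrices reduces to a pure-translation homography. As a sketch (the offsets tx, ty below are hypothetical values you would measure for your own rig):

// Pure-translation homography: shifts points by (tx, ty)
double tx = 620.0, ty = 0.0;                 // hypothetical offsets for your camera pair
cv::Mat H_12 = (cv::Mat_<double>(3, 3) <<
    1, 0, tx,
    0, 1, ty,
    0, 0, 1);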

You can pre-calibrate these homographies in advance using the standard method (point matching between the overlapping cameras).
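
As a sketch of that one-off calibration (the helper name, ORB features and RANSAC threshold below are my own choices, not something prescribed here; which frame you pass as src and which as dst depends on the direction you define H_12 in):

#include <opencv2/opencv.hpp>

// One-off calibration: estimate the 3x3 homography mapping points in 'src' onto 'dst'.
// ORB + brute-force Hamming matching is just one reasonable choice in OpenCV 2.4.
cv::Mat estimateHomography(const cv::Mat& src, const cv::Mat& dst)
{
    cv::ORB orb(1000);
    std::vector<cv::KeyPoint> kpSrc, kpDst;
    cv::Mat descSrc, descDst;
    orb(src, cv::Mat(), kpSrc, descSrc);
    orb(dst, cv::Mat(), kpDst, descDst);

    cv::BFMatcher matcher(cv::NORM_HAMMING, true);   // cross-check prunes bad matches
    std::vector<cv::DMatch> matches;
    matcher.match(descSrc, descDst, matches);

    std::vector<cv::Point2f> ptsSrc, ptsDst;
    for (size_t i = 0; i < matches.size(); ++i) {
        ptsSrc.push_back(kpSrc[matches[i].queryIdx].pt);
        ptsDst.push_back(kpDst[matches[i].trainIdx].pt);
    }

    // RANSAC rejects outlier correspondences (needs at least 4 good matches)
    return cv::findHomography(ptsSrc, ptsDst, CV_RANSAC, 3.0);
}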

You'll need to create a blank matrix to act as the canvas. You can guess its size (4*image_size would suffice), or you can take the top-right corner of I_1 (call this P1_tr) and transform it by the three homographies, giving a new point at the top-right of the panorama, PP_tr (the following assumes P1_tr is in homogeneous coordinates, so its transpose P1_tr' is a 3x1 column vector):

PP_tr = H_34 * H_23 * H_12 * P1_tr'

What this does is take P1_tr and transform it first into camera 2's frame, then from C_2 to C_3, and finally from C_3 to C_4.
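
As a small sketch of that corner calculation (names as above; the homographies are assumed to be 3x3 CV_64F cv::Mats), cv::perspectiveTransform handles the homogeneous multiply-and-divide for you:

// Push the top-right corner of I_1 through the chained homographies (PP_tr above)
std::vector<cv::Point2f> corner(1), transformed(1);
corner[0] = cv::Point2f((float)(I_1.cols - 1), 0.0f);

cv::Mat H_chain = H_34 * H_23 * H_12;              // all 3x3, same depth
cv::perspectiveTransform(corner, transformed, H_chain);

cv::Point2f PP_tr = transformed[0];                // PP_tr.x suggests how wide the canvas must be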

You'll need to create one of these canvases for combining images 1 and 2, images 1, 2 and 3, and finally images 1-4; I'll refer to them as V_12, V_123 and V_1234 respectively.

Use the following to warp the image onto the canvas:

cv::warpPerspective(I_2, V_12, H_12, V_12.size());

Then do the same with the next images:

cv::warpPerspective(I_3, V_123, H_23*H_12, V_123.size());
cv::warpPerspective(I_4, V_1234, H_34*H_23*H_12, V_1234.size());

Now you have four canvases, all of which are the width of the 4 combined images, each with one of the images transformed into the relevant place on it.

All that remains is to merge the transformed images onto each other. This is easily achieved using regions of interest.

Creating the ROI masks can be done in advance, before frame capture begins.

Start with a blank (zeros) image the same size as your canvases will be. Set the leftmost rectangle the size of I_1 to white. This is the mask for your first image. We'll call it M_1.
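
A minimal sketch of that first mask (canvasSize stands in for whatever canvas size you settled on above, and the frames are assumed to be the same size as I_1):

// M_1: a white rectangle the size of I_1 at the left of an otherwise black, canvas-sized mask
cv::Mat M_1 = cv::Mat::zeros(canvasSize, CV_8UC1);
M_1(cv::Rect(0, 0, I_1.cols, I_1.rows)).setTo(cv::Scalar(255));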

Next, to get the masks for the second, third and fourth transformed images, warp M_1 by the same cumulative homographies:

cv::warpPerspective(M_1, M_2, H_12, M_1.size());
cv::warpPerspective(M_1, M_3, H_23*H_12, M_1.size());
cv::warpPerspective(M_1, M_4, H_34*H_23*H_12, M_1.size());

To bring all the images together into one panorama, you do:

cv::Mat pano = cv::Mat::zeros(M_1.size(), CV_8UC3);
I_1.copyTo(pano(cv::Rect(0, 0, I_1.cols, I_1.rows)));  // I_1 is untransformed, so place it directly at the left
V_12.copyTo(pano, M_2);
V_123.copyTo(pano, M_3);
V_1234.copyTo(pano, M_4);

What you're doing here is copying the relevant area of each canvas onto the output image, pano - a fast operation.

You should be able to do all of this on the GPU, substituting cv::gpu::GpuMat for cv::Mat and cv::gpu::warpPerspective for its CPU counterpart.
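
As a rough per-frame sketch of that GPU path (OpenCV 2.4 gpu module, CUDA-enabled build assumed; cap0..cap3, the homographies, the masks and canvasSize are the ones set up earlier, and the g-prefixed names are my own):

#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>   // OpenCV 2.4 CUDA module

// One-time setup: cumulative transforms (plain cv::Mat, as gpu::warpPerspective expects)
// and GPU copies of the pre-built masks.
cv::Mat H_to2 = H_12;
cv::Mat H_to3 = H_23 * H_12;
cv::Mat H_to4 = H_34 * H_23 * H_12;
cv::gpu::GpuMat gM_2(M_2), gM_3(M_3), gM_4(M_4);
cv::gpu::GpuMat gI_1, gI_2, gI_3, gI_4, gV_12, gV_123, gV_1234;
cv::gpu::GpuMat gPano(canvasSize, CV_8UC3);

cv::Mat f0, f1, f2, f3, pano;
while (true) {
    cap0 >> f0;  cap1 >> f1;  cap2 >> f2;  cap3 >> f3;

    gI_1.upload(f0);  gI_2.upload(f1);  gI_3.upload(f2);  gI_4.upload(f3);

    // Warp frames 2-4 into canvas coordinates; the transforms never change
    cv::gpu::warpPerspective(gI_2, gV_12,   H_to2, canvasSize);
    cv::gpu::warpPerspective(gI_3, gV_123,  H_to3, canvasSize);
    cv::gpu::warpPerspective(gI_4, gV_1234, H_to4, canvasSize);

    // Composite: frame 1 goes in untransformed, the rest are masked copies
    gPano.setTo(cv::Scalar::all(0));
    cv::gpu::GpuMat leftROI = gPano(cv::Rect(0, 0, gI_1.cols, gI_1.rows));
    gI_1.copyTo(leftROI);
    gV_12.copyTo(gPano, gM_2);
    gV_123.copyTo(gPano, gM_3);
    gV_1234.copyTo(gPano, gM_4);

    gPano.download(pano);
    cv::imshow("pano", pano);
    if (cv::waitKey(1) == 27) break;   // Esc to quit
}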