Parallel GPU computing using OpenCV

mmccullo · Jun 21, 2012

I have an application that requires processing multiple images in parallel in order to maintain real-time speed.

It is my understanding that I cannot call OpenCV's GPU functions in a multi-threaded fashion on a single CUDA device. I have tried an OpenMP code construct such as the following:

#pragma omp parallel for
for(int i=0; i<numImages; i++){
    for(int j=0; j<numChannels; j++){
        for(int k=0; k<pyramidDepth; k++){
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k]);
        }
    }
}

This seems to compile and execute correctly, but unfortunately it appears to execute the numImages threads serially on the same CUDA device.

I should be able to execute multiple threads in parallel if I have multiple CUDA devices, correct? In order to get multiple CUDA devices, do I need multiple video cards?

Does anyone know whether the NVIDIA GTX 690 dual-chip card works as two independent CUDA devices with OpenCV 2.4 or later? I found confirmation that it can work as such with OpenCL, but no confirmation with regard to OpenCV.

Answer

Martin Beckett · Jun 22, 2012

Just do the multiply by passing whole images to the cv::gpu::multiply() function.

OpenCV and CUDA will handle splitting the work and dividing the task in the best way. Generally each compute unit (i.e. core) in a GPU can run multiple threads at once (typically 16 or more in CUDA). This is in addition to having cards that can appear as multiple GPUs, or putting multiple linked cards in one machine.
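To illustrate why a single call on a whole image is already parallel work, here is a minimal CPU-only model of an element-wise multiply. It is not OpenCV's implementation; the function name multiplyPerElement and the flat std::vector layout are assumptions made for this sketch. Each output element depends only on its own pair of inputs, which is what lets a GPU hand every element to a separate thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// CPU-only model of an element-wise multiply (hypothetical helper, not an
// OpenCV function). On a GPU, each value of `idx` would be handled by its
// own lightweight thread; an ordinary loop stands in for that here.
std::vector<float> multiplyPerElement(const std::vector<float>& image,
                                      const std::vector<float>& weights) {
    if (image.size() != weights.size()) {
        throw std::invalid_argument("image and weights must have the same size");
    }
    std::vector<float> out(image.size());
    for (std::size_t idx = 0; idx < image.size(); ++idx) {
        // No iteration reads another iteration's result, so all of them
        // could run at the same time.
        out[idx] = image[idx] * weights[idx];
    }
    return out;
}
```

Because the iterations are independent, the work can be split across however many threads the hardware offers without changing the result.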

The whole point of cv::gpu is to save you from having to know anything about how the internals work.
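One thing that is still up to the application is which device each host thread uses when more than one CUDA device is present. The sketch below shows one possible pattern, assuming one host thread per device and a round-robin split of the images. The names assignImagesToDevices, runOnDevices and processImage are hypothetical, and the GPU work itself is left as a callback so the example builds without OpenCV. In an OpenCV 2.4 build with CUDA support, the device count would come from cv::gpu::getCudaEnabledDeviceCount(), and each worker would call cv::gpu::setDevice() before touching any GpuMat.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Round-robin split: image i is assigned to device i % numDevices.
// (Hypothetical helper for this sketch, not part of OpenCV.)
std::vector<std::vector<int>> assignImagesToDevices(int numImages, int numDevices) {
    std::vector<std::vector<int>> perDevice;
    if (numDevices <= 0) {
        return perDevice;  // no devices: nothing to assign
    }
    perDevice.resize(static_cast<std::size_t>(numDevices));
    for (int i = 0; i < numImages; ++i) {
        perDevice[static_cast<std::size_t>(i % numDevices)].push_back(i);
    }
    return perDevice;
}

// Starts one host thread per device and runs that device's images on it.
// `processImage(device, image)` is a placeholder for the application's
// GPU work (for example the cv::gpu::multiply loop from the question).
void runOnDevices(const std::vector<std::vector<int>>& perDevice,
                  const std::function<void(int, int)>& processImage) {
    std::vector<std::thread> workers;
    for (std::size_t d = 0; d < perDevice.size(); ++d) {
        workers.emplace_back([&perDevice, &processImage, d] {
            // Assumption: with OpenCV 2.4 + CUDA, cv::gpu::setDevice(d)
            // would be called here, once per thread.
            for (int image : perDevice[d]) {
                processImage(static_cast<int>(d), image);
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();  // wait for every device to finish its share
    }
}
```

With five images and two devices, this split gives images 0, 2 and 4 to device 0 and images 1 and 3 to device 1. Round-robin is only one choice; if images differ a lot in size, a work queue shared by the threads may balance better.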