I have an application that requires processing multiple images in parallel in order to maintain real-time speed.
It is my understanding that I cannot call OpenCV's GPU functions in a multi-threaded fashion on a single CUDA device. I have tried an OpenMP code construct such as the following:
#pragma omp parallel for
for (int i = 0; i < numImages; i++) {
    for (int j = 0; j < numChannels; j++) {
        for (int k = 0; k < pyramidDepth; k++) {
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k]);
        }
    }
}
This compiles and executes correctly, but unfortunately the numImages iterations appear to run serially on the same CUDA device.
I should be able to execute multiple threads in parallel if I have multiple CUDA devices, correct? In order to get multiple CUDA devices, do I need multiple video cards?
Does anyone know whether the NVIDIA GTX 690 dual-GPU card works as two independent CUDA devices with OpenCV 2.4 or later? I have found confirmation that it can do so with OpenCL, but no confirmation with regard to OpenCV.
Just perform the multiply by passing whole images to the cv::gpu::multiply() function.
OpenCV and CUDA will handle splitting the work across the hardware in the best way. Generally each compute unit (i.e. streaming multiprocessor) in a GPU can run many threads concurrently, so a single call already parallelizes across pixels. This is in addition to cards that can appear as multiple GPUs, or multiple linked cards in one machine.
The whole point of cv::gpu is to save you from having to know anything about how the internals work.