I have a C++ project, where I am using OpenCV and Libfreenect. I do not want to include something as big and heavy as OpenNI and create OpenCV installation dependency in the process. I want to use the calibration information provided here to undistort and align the RGB and depth images.
Undistorting the images individually based on camera matrix and distortion coefficients was easy enough. But now I am confused as to how I could use the rectification and projection matrices to align the RGB and depth image, so that they essentially show me the same things from the same perspective. After searching around for quite some time, I cannot establish a flow of how it should work with OpenCV. It is a vague estimate that reprojectImageTo3D() and warpPerspective() might be used, but I am not sure how.
How could I approach this problem? I am using the old XBOX360 Kinect (with 0-2047 raw disparity value range).
UPDATE
Here is the partial code I have written so far:
// I use callback functions to get RGB (CV_8UC3) and depth (CV_16UC1)
// I undistort them and call the following method
void AlignImages(cv::Mat& pRGB, cv::Mat& pDepth) {
rotationMat = (cv::Mat_<double_t>(3,3) << 9.9984628826577793e-01, 1.2635359098409581e-03, -1.7487233004436643e-02, -1.4779096108364480e-03, 9.9992385683542895e-01, -1.2251380107679535e-02, 1.7470421412464927e-02, 1.2275341476520762e-02, 9.9977202419716948e-01);
translationMat = (cv::Mat_<double_t>(3,1) << 1.9985242312092553e-02, -7.4423738761617583e-04, -1.0916736334336222e-02);
// make a copy in float to convert raw depth data to physical distance
cv::Mat tempDst;
pDepth.convertTo(tempDst, CV_32F);
// create a 3 channel image of precision double for the 3D points
cv::Mat tempDst3D = cv::Mat(cv::Size(640, 480), CV_64FC3, double(0));
float_t* tempDstData = (float_t*)tempDst.data;
double_t* tempDst3DData = (double_t*)tempDst3D.data;
size_t pixelSize = tempDst.step / sizeof(float_t);
size_t pixel3DSize = tempDst3D.step / sizeof(double_t);
for (int row=0; row < tempDst.rows; row++) {
for (int col=0; col < tempDst.cols; col++) {
// convert raw depth values to physical distance (in metres)
float_t& pixel = tempDstData[pixelSize * row + col];
pixel = 0.1236 * tanf(pixel/2842.5 + 1.1863);
// reproject physical distance values to 3D space
double_t& pixel3D_X = tempDst3DData[pixel3DSize * row + col];
double_t& pixel3D_Y = tempDst3DData[pixel3DSize * row + col +1];
double_t& pixel3D_Z = tempDst3DData[pixel3DSize * row + col + 2];
pixel3D_X = (row - 3.3930780975300314e+02) * pixel / 5.9421434211923247e+02;
pixel3D_Y = (col - 2.4273913761751615e+02) * pixel / 5.9104053696870778e+02;
pixel3D_Z = pixel;
}
}
tempDst3D = rotationMat * tempDst3D + translationMat;
}
I have directly used the numbers instead of assigning them to variables, but that should not be a problem in understanding the logic. At this point, I am supposed to do the following:
P2D_rgb.x = (P3D'.x * fx_rgb / P3D'.z) + cx_rgb
P2D_rgb.y = (P3D'.y * fy_rgb / P3D'.z) + cy_rgb
But I do not understand how I am to do it, exactly. Perhaps I am going in the wrong direction altogether. But I cannot find any example of this being done.
Basically, you need to change the 3D coordinate system to convert 3D points as seen by the depth camera to 3D points seen by the RGB camera.
You cannot use function reprojectImageTo3D()
because it expects a matrix Q that you do not have. Instead, you should convert your disparity map to a depthmap using the function raw_depth_to_meters
provided in the page you linked.
Then, for each pixel of the depthmap, you need to compute the associated 3D point, denoted by P3D
in the page you linked (see § "Mapping depth pixels with color pixels"). Then you need to apply the provided 3D rotation matrix R and 3D translation vector T, which represent the transformation from the depth camera to the RGB camera, to each 3D point P3D
in order to obtain the associated new 3D point P3D'
. Finally, using the calibration matrix of the RGB camera, you can project the new 3D points into the RGB image, and assign the associated depth to the obtained pixel in order to generate a new depthmap aligned with the RGB image.
Note that you are necessarily loosing accuracy in the process, since you need to handle occlusions (by keeping only the minimum depth seen by each pixel) and image interpolation (since in general, the projected 3D points won't be associated with integer pixel coordinates in the RGB image). Concerning image interpolation, I recommand you use the nearest neighbor approach, otherwise you might end up with weird behavior at depth boundaries.
Edit following the question update
Here is a model of what you should be doing in order to remap the Kinect depthmap to the RGB cam point of view:
cv::Mat_<float> pt(3,1), R(3,3), t(3,1);
// Initialize R & t here
depthmap_rgbcam = cv::Mat::zeros(height,width,CV_32FC1); // Initialize the depthmap to all zeros
float *depthmap_rgbcam_buffer = (float*)depthmap_rgbcam.data;
for(int row=0; row<height; ++row)
{
for(int col=0; col<width; ++col)
{
// Convert kinect raw disparity to depth
float raw_disparity = kinect_disparity_map_buffer[width*row+col];
float depth_depthcam = disparity_to_depth(raw_disparity);
// Map depthcam depth to 3D point
pt(0) = depth*(col-cx_depthcam)/fx_depthcam; // No need for a 3D point buffer
pt(1) = depth*(row-cy_depthcam)/fy_depthcam; // here, unless you need one.
pt(2) = depth;
// Rotate and translate 3D point
pt = R*pt+t;
// If required, apply rgbcam lens distortion to X, Y and Z here.
// Project 3D point to rgbcam
float x_rgbcam = fx_rgbcam*pt(0)/pt(2)+cx_rgbcam;
float y_rgbcam = fy_rgbcam*pt(1)/pt(2)+cy_rgbcam;
// "Interpolate" pixel coordinates (Nearest Neighbors, as discussed above)
int px_rgbcam = cvRound(x_rgbcam);
int py_rgbcam = cvRound(y_rgbcam);
// Handle 3D occlusions
float &depth_rgbcam = depthmap_rgbcam_buffer[width*py_rgbcam+px_rgbcam];
if(depth_rgbcam==0 || depth_depthcam<depth_rgbcam)
depth_rgbcam = depth_depthcam;
}
}
This is the idea, modulo possible typos. You can also change consistently the datatype as you like. Concerning your comment, I don't think there is any builtin OpenCV function for that purpose yet.