How to apply a transformation matrix?

Mylo picture Mylo · May 11, 2009 · Viewed 19.7k times · Source

I am trying to get the 2D screen coordinates of a point in 3D space, i.e. I know the location of the camera its pan, tilt and roll and I have the 3D x,y,z coordinates of a point I wish to project.

I am having difficulty understanding transformation/projection matrices and I was hoping some intelligent people here could help me along ;)

Here is my test code I have thrown together thus far:

public class TransformTest {

public static void main(String[] args) {

    // set up a world point (Point to Project)
    double[] wp = {100, 100, 1};
    // set up the projection centre (Camera Location)
    double[] pc = {90, 90, 1};

    double roll = 0;
    double tilt = 0;
    double pan = 0;

    // translate the point
    vSub(wp, pc, wp);

    // create roll matrix
    double[][] rollMat = {
            {1, 0, 0},
            {0, Math.cos(roll), -Math.sin(roll)},
            {0, Math.sin(roll), Math.cos(roll)},
    };
    // create tilt matrix
    double[][] tiltMat = {
            {Math.cos(tilt), 0, Math.sin(tilt)},
            {0, 1, 0},
            {-Math.sin(tilt), 0, Math.cos(tilt)},
    };
    // create pan matrix
    double[][] panMat = {
            {Math.cos(pan), -Math.sin(pan), 0},
            {Math.sin(pan), Math.cos(pan), 0},
            {0, 0, 1},
    };

    // roll it
    mvMul(rollMat, wp, wp);
    // tilt it
    mvMul(tiltMat, wp, wp);
    // pan it
    mvMul(panMat, wp, wp);

}

public static void vAdd(double[] a, double[] b, double[] c) {
    for (int i=0; i<a.length; i++) {
        c[i] = a[i] + b[i];
    }
}

public static void vSub(double[] a, double[] b, double[] c) {
    for (int i=0; i<a.length; i++) {
        c[i] = a[i] - b[i];
    }      
}

public static void mvMul(double[][] m, double[] v, double[] w) {

    // How to multiply matrices?
} }

Basically, what I need is to get the 2D XY coordinates for a given screen where the 3D point intersects. I am not sure how to use the roll, tilt and pan matrices to transform the world point (wp).

Any help with this is greatly appreciated!

Answer

Wesley picture Wesley · May 11, 2009

This is complicated stuff. Please read a book about this topic to get all the math and nitty gritty details. If you plan on playing with this stuff at length, you need to know these things. This answer is just so you can get your feet wet and hack around.

Multiplying matrices

First things first. Multiplying matrices is a reasonably simple affair.

Let's say you have matrices A, B, and C, where AB = C. Let's say you want to figure out the value of matrix C at row 3, column 2.

  • Take the third row of A and the second column of B. You should have the same number of values from A and B now. (If you don't matrix multiplication isn't defined for those two matrices. You can't do it.) If both are 4×4 matrices, you should have 4 values from A (row 3) and 4 values from B (column 2).
  • Multiply each value of A with each value of B. You should end up with 4 new values.
  • Add these values.

You now have the value of matrix C at row 3, column 2. The challenge is, of course, to do this programmatically.

/* AB = C

Row-major ordering
a[0][0] a[0][2] a[0][3]...
a[1][0] a[1][4] ...
a[2][0] ...
...*/
public static mmMul(double[][] a, double[][] b, double[][] c) {
    c_height = b.length; // Height of b
    c_width = a[0].length; // Width of a
    common_side = a.length; // Height of a, width of b

    for (int i = 0; i < c_height; i++) {
        for (int j = 0; j < c_width; j++) {
            // Ready to calculate value of c[i][j]
            c[i][j] = 0;

            // Iterate through ith row of a, jth col of b in lockstep
            for (int k = 0; k < common_side; k++) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
}

Homogenous coordinates

You have 3D coordinates. Let's say you have (5, 2, 1). These are Cartesian coordinates. Let's call them (x, y, z).

Homogenous coordinates mean that you write an extra 1 at the end of your Cartesian coordinates. (5, 2, 1) becomes (5, 2, 1, 1). Let's call them (x, y, z, w).

Whenever you do a transformation that makes w ≠ 1, you divide every component of your coordinates by w. This changes your x, y, and z, and it makes w = 1 again. (There is no harm in doing this even when your transformation doesn't change w. It just divides everything by 1, which does nothing.)

There is some majorly cool stuff you can do with homogenous coordinates, even if the math behind them doesn't make total sense. It is at this point that I ask you to look again at the advice at the top of this answer.


Transforming a point

I'll be using OpenGL terminology and approaches in this and following sections. If anything is unclear or seems to conflict with your goals (because this seems vaguely homework-like to me :P), please leave a comment.

I'll also start by assuming that your roll, tilt, and pan matrices are correct.

When you want to transform a point using a transformation matrix, you right-multiply that matrix with a column vector representing your point. Say you want to translate (5, 2, 1) by some transformation matrix A. You first define v = [5, 2, 1, 1]T. (I write [x, y, z, w]T with the little T to mean that you should write it as a column vector.)

// Your point in 3D
double v[4][5] = {{5}, {2}, {1}, {1}}

In this case, Av = v1, where v1 is your transformed point. Do this multiplication like a matrix multiplication, where A is 4×4 and v is 4×1. You will end up with a 4×1 matrix (which is another column vector).

// Transforming a single point with a roll
double v_1[4][6];
mmMul(rollMat, v, v_1);

Now, if you have several transformation matrices to apply, first combine them into one transformation matrix. Do this by multiplying the matrices together in the order that you want them applied.

Programmatically, you should start with the identity matrix and right-multiply each transformation matrix. Let I4 be 4×4 identity matrix, and let A1, A2, A3, ... be your transformation matrices. Let your final transformation matrix be Afinal

AfinalI4
AfinalAfinal A1
AfinalAfinal A2
AfinalAfinal A3

Note that I'm using that arrow to represent assignment. When you implement this, make sure not to overwrite Afinal while you're still using it in the matrix multiplication calculation! Make a copy.

// A composite transformation matrix (roll, then tilt)

double a_final[4][4] =
{
    {1, 0, 0, 0},
    {0, 1, 0, 0},
    {0, 0, 1, 0},
    {0, 0, 0, 1}
}; // the 4 x 4 identity matrix

double a_final_copy[4][4];
mCopy(a_final, a_final_copy); // make a copy of a_final
mmMul(rollMat, a_final_copy, a_final);
mCopy(a_final, a_final_copy); // update the copy
mmMul(tiltMat, a_final_copy, a_final);

Finally, do the same multiplication as above: Afinal v = v1

// Use the above matrix to transform v
mmMul(a_final, v, v_1);

From start to finish

Camera transformations should be represented as a view matrix. Perform your Aview v = v1 operation here. (v represents your world coordinates as a 4×1 column vector, Afinal is your Aview.)

// World coordinates to eye coordinates
// A_view is a_final from above
mmMult(a_view, v_world, v_view);

Projection transformations describe a perspective transform. This is what makes nearer objects bigger and farther objects smaller. This is performed after the camera transformation. If you don't want perspective yet, just use the identity matrix for the projection matrix. Anyway, perform A v1 = v2 here.

// Eye coordinates to clip coordinates
// If you don't care about perspective, SKIP THIS STEP
mmMult(a_projection, v_view, v_eye);

Next, you need to do a perspective divide. This delves deeper into homogenous coordinates, which I haven't described yet. Anyway, divide every component of v2 by the last component of v2. If v2 = [x, y, z, w]T, then divide each component by w (including w itself). You should end up with w = 1. (If your projection matrix is the identity matrix, like I described earlier, this step should do nothing.)

// Clip coordinates to normalized device coordinates
// If you skipped the previous step, SKIP THIS STEP
for (int i = 0; i < 4; i++) {
    v_ndc[i] = v_eye[i] / v[3];
}

Finally, take your v2. The first two coordinates are your x and y coordinates. The third is z, which you can throw away. (Later, once you get very advanced, you can use this z value to figure out which point is in front of or behind some other point.) And at this point, the last component is w = 1, so you don't need that at all anymore.

x = v_ndc[0]
y = v_ndc[1]
z = v_ndc[2]  // unused; your screen is 2D

If you skipped the perspective and perspective divide steps, use v_view instead of v_ndc above.

This is very similar to the set of OpenGL coordinate systems. The difference is that you start with world coordinates, while OpenGL starts with object coordinates. The difference is as follows:

  • You start with world coordinates
    • OpenGL starts with object coordinates
  • You use the view matrix to transform world coordinates to eye coordinates
    • OpenGL uses the ModelView matrix to transform object coordinates to eye coordinates

From there on, everything is the same.