What do the elements in a homography matrix mean?

Mattb2291 · Aug 22, 2012

I'm new to image processing, but I'm using EMGU for C# image analysis. However, I know the homography matrix isn't unique to EMGU, and so perhaps someone with knowledge of another language can explain better.

Please can someone explain, as simply as possible, what each element does? I've looked this up online but can't find an answer that I can properly understand (as I said, I'm kinda new to all this!)

I analyse 2 images, both 2 dimensional. Therefore a 3x3 matrix is needed to account for the rotation / translation of the image. If no movement is detected, the homography matrix is:

/1 0 0\
|0 1 0|
\0 0 1/

I know from research (eg OpenCV Homography, Transform a point, what is this code doing?) that:

/1 0 Tx\
|0 1 Ty|
\X X X /

The 10,01 bit is the rotation of the x and y coordinates. The Tx and Ty bits are the translational movement, but what is the XXX bit? That is the part I don't understand. Is it something to do with affine transformations? Please can someone explain:

1) whether what I say above is right so far;

2) what the XXX bit means.

Answer

phipsgabler · Aug 22, 2012

It's not that difficult to understand if you have a grasp of matrix multiplication. Assume your point x is

/a\
\b/,

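and you want to rotate the coordinate system by A (the values are arbitrary, chosen only to keep the arithmetic easy to follow; a true rotation matrix would contain sines and cosines):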

/3 4\
\5 6/

and "move it" by t

/2\
\2/.

A and t together are the components of the affine transformation that gives the new point y:

y = A*x + t = <a'; b'>T //(T means transposed).

As you know, to get that, one can construct a 3x3 matrix B and a vector x' looking like

    /3 4 2\         /a\
B = |5 6 2| ,  x' = |b|
    \0 0 1/         \1/

such that

     /a'\
y' = |b'| = B*x'
     \ 1/ 

from which you can extract y. Let's see how that works. In the original transformation (using addition), the first step would be to carry out the multiplication, i.e. the rotating part y_r:

y_r = A*x = <3a+4b; 5a+6b>T

then you add the "absolute" part:

y = y_r + t = <3a+4b+2; 5a+6b+2>T

Now look at how B works. I'll calculate y' row by row:

1) a' = 3*a + 4*b + 2*1

2) b' = 5*a + 6*b + 2*1

3) the rest: 0*a + 0*b + 1*1 = 1

Just what we expected. First, the rotation part gets calculated--addition and multiplication. Then, the x-part of the translational part gets added, multiplied by 1--it stays the same. The same thing for the second row.

In the third row, a and b are dropped (multiplied by 0). The last part is kept the same, and happens to be 1. So the whole purpose of that last row is to "drop" the values of the point and keep the 1.
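The row-by-row calculation can be checked numerically. The following is a minimal sketch in plain Python (not EMGU or OpenCV code); the sample point a = 1, b = 2 and the helper mat_vec are assumptions made up for illustration, while A, t and B are the matrices from this answer.

```python
# Illustrative sample point; a and b are arbitrary values, not from the question.
a, b = 1, 2

A = [[3, 4],
     [5, 6]]   # linear ("rotation") part from the answer
t = [2, 2]     # translation part from the answer


def mat_vec(M, v):
    """Multiply a matrix M (given as a list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Affine form: y = A*x + t
y_r = mat_vec(A, [a, b])
y = [y_r[0] + t[0], y_r[1] + t[1]]

# Homogeneous form: y' = B*x', with a 1 appended to the point
B = [[3, 4, 2],
     [5, 6, 2],
     [0, 0, 1]]
y_h = mat_vec(B, [a, b, 1])

print(y)    # [13, 19]
print(y_h)  # [13, 19, 1]
```

The first two entries of y_h match y, and the third entry is the 1 that the last row preserves.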


It could be argued, then, that a 2x3 matrix would be enough for that. That's partially true, but has one significant disadvantage: you lose composability. Suppose you are basically satisfied with B, but want to mirror one coordinate. Then you can choose another transformation matrix

    /-1 0 0\
C = | 0 1 0|
    \ 0 0 1/

and have a result

y'' = C*B*x' = <-3a-4b-2; 5a+6b+2; 1>T

This simple multiplication could not be done that easily with 2x3 matrices, simply because of the properties of matrix multiplication.
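To make the composability point concrete, here is a small sketch in plain Python, again with an assumed sample point (a = 1, b = 2) and hypothetical helpers mat_vec and mat_mul. It compares applying B and then C one after the other with applying the single pre-multiplied matrix C*B.

```python
def mat_vec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


B = [[3, 4, 2],
     [5, 6, 2],
     [0, 0, 1]]
C = [[-1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]   # mirrors the first coordinate

x_h = [1, 2, 1]   # sample point (a = 1, b = 2) with the extra 1 appended

# Apply B, then C, one step at a time.
step_by_step = mat_vec(C, mat_vec(B, x_h))

# Combine both transformations into one 3x3 matrix first, then apply it once.
CB = mat_mul(C, B)
combined = mat_vec(CB, x_h)

print(CB)            # [[-3, -4, -2], [5, 6, 2], [0, 0, 1]]
print(step_by_step)  # [-13, 19, 1]
print(combined)      # [-13, 19, 1]
```

Because both matrices are square and of the same size, their product is again a 3x3 matrix of the same kind, so any number of transformations can be collapsed into one. Two 2x3 matrices cannot be multiplied together this way.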

In principle, in the above, the last row (the XXX) could also be anything else of the form <0;0;x> with x nonzero, since it is only there to drop the point values. It does, however, have to be exactly <0;0;1> to make composition by multiplication work.

Finally, Wikipedia seems quite informative to me in this case.