Matrix multiplication: Strassen vs. Standard

multiholle picture multiholle · Nov 29, 2010 · Viewed 10.1k times · Source

I tried to implement the Strassen algorithm for matrix multiplication with C++, but the result isn't that, what I expected. As you can see strassen always takes more time then standard implementation and only with a dimension from a power of 2 is as fast as standard implementation. What went wrong? alt text

matrix mult_strassen(matrix a, matrix b) {
if (a.dim() <= cut)
    return mult_std(a, b);

matrix a11 = get_part(0, 0, a);
matrix a12 = get_part(0, 1, a);
matrix a21 = get_part(1, 0, a);
matrix a22 = get_part(1, 1, a);

matrix b11 = get_part(0, 0, b);
matrix b12 = get_part(0, 1, b);
matrix b21 = get_part(1, 0, b);
matrix b22 = get_part(1, 1, b);

matrix m1 = mult_strassen(a11 + a22, b11 + b22); 
matrix m2 = mult_strassen(a21 + a22, b11);
matrix m3 = mult_strassen(a11, b12 - b22);
matrix m4 = mult_strassen(a22, b21 - b11);
matrix m5 = mult_strassen(a11 + a12, b22);
matrix m6 = mult_strassen(a21 - a11, b11 + b12);
matrix m7 = mult_strassen(a12 - a22, b21 + b22);

matrix c(a.dim(), false, true);
set_part(0, 0, &c, m1 + m4 - m5 + m7);
set_part(0, 1, &c, m3 + m5);
set_part(1, 0, &c, m2 + m4);
set_part(1, 1, &c, m1 - m2 + m3 + m6);

return c; 
}


PROGRAM
matrix.h http://pastebin.com/TYFYCTY7
matrix.cpp http://pastebin.com/wYADLJ8Y
main.cpp http://pastebin.com/48BSqGJr

g++ main.cpp matrix.cpp -o matrix -O3.

Answer

uesp picture uesp · Nov 29, 2010

Some thoughts:

  • Have you optimized it to consider that a non-power of two sized matrix is filled in with zeroes? I think the algorithm assumes you don't bother multiplying these terms. This is why you get the flat areas where the running time is constant between 2^n and 2^(n+1)-1. By not multiplying terms you know are zero you should be able to improve these areas. Or perhaps Strassen is only meant to work with 2^n sized matrices.
  • Consider that a "large" matrix is arbitrary and the algorithm is only slightly better than the naive case, O(N^3) vs O(N^2.8). You may not see measurable gains until bigger matrices are tried. For example, I was did some Finite Element modeling where 10,000x10,000 matrices were considered "small". Its hard to tell from your graph but it looks like the 511 case may be faster in the Stassen case.
  • Try testing with various optimization levels including no optimizations at all.
  • This algorithm seems to assume that multiplications are much more expensive than additions. This was certainly true 40 years ago when it was first developed but I believe in more modern processors the difference between add and multiply has gotten smaller. This may reduce the effectiveness of the algorithm which seems to reduce multiplications but increase additions.
  • Have you looked at some of the other Strassen implementations out there for ideas? Try benchmarking a known good implementation to see exactly how much faster you can get.