I'm performing matrix multiplication with this simple algorithm. To be more flexible I used objects for the matricies which contain dynamicly created arrays.
Comparing this solution to my first one with static arrays it is 4 times slower. What can I do to speed up the data access? I don't want to change the algorithm.
matrix mult_std(matrix a, matrix b) {
matrix c(a.dim(), false, false);
for (int i = 0; i < a.dim(); i++)
for (int j = 0; j < a.dim(); j++) {
int sum = 0;
for (int k = 0; k < a.dim(); k++)
sum += a(i,k) * b(k,j);
c(i,j) = sum;
}
return c;
}
k
and j
loop iterations -> performance improvementdim()
and operator()()
as inline
-> performance improvementThe performance is now nearly the same as it was in the old porgram. Maybe there should be a bit more improvement.
But I have another problem: I get a memory error in the function mult_strassen(...)
. Why?
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
c99 main.c -o matrix -O3
g++ main.cpp matrix.cpp -o matrix -O3
.
Speaking of speed-up, your function will be more cache-friendly if you swap the order of the k
and j
loop iterations:
matrix mult_std(matrix a, matrix b) {
matrix c(a.dim(), false, false);
for (int i = 0; i < a.dim(); i++)
for (int k = 0; k < a.dim(); k++)
for (int j = 0; j < a.dim(); j++) // swapped order
c(i,j) += a(i,k) * b(k,j);
return c;
}
That's because a k
index on the inner-most loop will cause a cache miss in b
on every iteration. With j
as the inner-most index, both c
and b
are accessed contiguously, while a
stays put.