Does anybody know the fastest method for calculating convolution? Unfortunately, the matrix I deal with is very large (500x500x200), and if I use convn in MATLAB it takes a long time (I have to iterate this calculation in a nested loop). So I switched to convolution with FFT, and it is faster now. But I am still looking for a faster method. Any ideas?
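FFT convolution is faster because of the convolution theorem: convolution in the signal domain becomes element-wise multiplication in the frequency domain, provided both inputs are zero-padded to at least len(x) + len(h) - 1 samples so the circular wrap-around does not corrupt the result. The sketch below is a 1D, pure-Python illustration of that idea (the helper names `fft`, `fft_convolve` and `direct_convolve` are hypothetical). In MATLAB the same thing is normally done in N dimensions with fftn/ifftn.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With invert=True it computes the unnormalised inverse (scaled by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def fft_convolve(x, h):
    """Full linear convolution of x and h via the convolution theorem."""
    m = len(x) + len(h) - 1          # length of the full result
    n = 1
    while n < m:                      # zero-pad to a power of two >= m
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    Y = fft([a * b for a, b in zip(X, H)], invert=True)
    # Undo the factor n from the unnormalised inverse and keep m samples.
    return [y.real / n for y in Y[:m]]


def direct_convolve(x, h):
    """Reference O(len(x) * len(h)) convolution for comparison."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [0.25, 0.5, 0.25]
print([round(v, 6) for v in fft_convolve(x, h)])
# [0.25, 1.0, 2.0, 3.0, 4.0, 3.5, 1.25]
```

The FFT route costs O(n log n) regardless of kernel size, which is why it wins over direct convolution for large kernels; for small separable kernels the approach described next can be cheaper still.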
If your kernel is separable, the greatest speed gains will be realized by performing multiple sequential 1D convolutions.
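A separable kernel is an outer product K = u v', where u is a P-element column factor and v a Q-element row factor. Because convolution is associative, A * (u v') = (A * v) * u, so one 2D pass can be replaced by a 1D pass along the rows followed by a 1D pass along the columns. The pure-Python sketch below (hypothetical helper names, "full" output size like MATLAB's conv2 default) checks that both routes agree. For a 3D volume you would add a third 1D pass along the remaining axis. In MATLAB itself, conv2(u, v, A) already performs the two-pass form for 2D.

```python
def conv1d_full(x, h):
    """Full 1D convolution of sequence x with kernel h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def conv2d_full(img, kernel):
    """Direct 2D 'full' convolution: P*Q multiply-adds per input pixel."""
    rows, cols = len(img), len(img[0])
    p, q = len(kernel), len(kernel[0])
    out = [[0.0] * (cols + q - 1) for _ in range(rows + p - 1)]
    for r in range(rows):
        for c in range(cols):
            for i in range(p):
                for j in range(q):
                    out[r + i][c + j] += img[r][c] * kernel[i][j]
    return out


def conv2d_separable(img, col_kernel, row_kernel):
    """Same result as conv2d_full when kernel[i][j] == col_kernel[i] * row_kernel[j]:
    convolve every row with row_kernel, then every column with col_kernel."""
    tmp = [conv1d_full(row, row_kernel) for row in img]
    col_results = [
        conv1d_full([tmp[r][c] for r in range(len(tmp))], col_kernel)
        for c in range(len(tmp[0]))
    ]
    # col_results is column-major; transpose back to row-major.
    return [[col_results[c][r] for c in range(len(col_results))]
            for r in range(len(col_results[0]))]


u = [1, 2, 1]       # column (vertical) factor
v = [1, 0, -1]      # row (horizontal) factor
kernel = [[ui * vj for vj in v] for ui in u]   # 3x3 Sobel-like kernel
img = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
print(conv2d_full(img, kernel) == conv2d_separable(img, u, v))  # True
```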
On his blog, Steve Eddins of MathWorks describes, in a MATLAB context, how to take advantage of the associativity of convolution to speed it up when the kernel is separable. For a P-by-Q kernel, the computational advantage of performing two separate, sequential 1D convolutions instead of one 2D convolution is PQ/(P+Q), since each output sample costs PQ multiply-adds directly but only P+Q with the two 1D passes. This corresponds to 4.5x for a 9x9 kernel and 7.5x for a 15x15 kernel. EDIT: An interesting, unwitting demonstration of this difference was given in this Q&A.
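The speed-up figures follow directly from counting multiply-adds per output sample. The small sketch below (the function name `separable_speedup` is hypothetical) evaluates PQ/(P+Q) and, as an assumption-labelled extension of the same counting argument, the N-dimensional analogue, product of kernel sizes divided by their sum, which is the relevant case for a 3D volume like 500x500x200. It ignores memory traffic, boundary handling and vectorisation, so real timings will differ.

```python
def separable_speedup(*kernel_dims):
    """Ratio of multiply-adds per output sample: one direct N-D pass versus
    N sequential 1D passes (a pure operation count, not a timing)."""
    direct = 1
    for d in kernel_dims:
        direct *= d          # direct N-D convolution: product of kernel sizes
    return direct / sum(kernel_dims)   # separable: sum of kernel sizes


for dims in [(9, 9), (15, 15), (9, 9, 9)]:
    print(dims, separable_speedup(*dims))
# (9, 9) 4.5
# (15, 15) 7.5
# (9, 9, 9) 27.0
```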
To determine whether a kernel is separable (i.e. the outer product of two vectors), the blog goes on to describe how to check this with the SVD and how to extract the 1D kernels. Its example is for a 2D kernel; for N-dimensional separable convolution, check this FEX submission.
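A 2D kernel is separable exactly when it has rank 1. The blog uses the SVD for this test; the stdlib-only sketch below swaps in a simpler pivot-based rank-1 check instead (hypothetical function `separate_kernel`, with an assumed relative tolerance `tol`). It takes the largest-magnitude entry as a pivot, builds the candidate column and row factors from the pivot's column and row, and verifies that their outer product reproduces the kernel. For noisy, only approximately separable kernels, the SVD approach with a threshold on the singular values is the more robust choice.

```python
def separate_kernel(kernel, tol=1e-9):
    """Return (col, row) with kernel[i][j] == col[i] * row[j], or None if the
    kernel is not (numerically) rank 1. Uses a pivot-based test, not the SVD."""
    p, q = len(kernel), len(kernel[0])
    # Pivot: the entry with the largest absolute value (first one on ties).
    pi, pj = max(((i, j) for i in range(p) for j in range(q)),
                 key=lambda ij: abs(kernel[ij[0]][ij[1]]))
    pivot = kernel[pi][pj]
    if pivot == 0:
        # An all-zero kernel is trivially the outer product of zero vectors.
        return [0] * p, [0] * q
    col = [kernel[i][pj] for i in range(p)]           # pivot column
    row = [kernel[pi][j] / pivot for j in range(q)]   # pivot row, scaled
    scale = abs(pivot)
    for i in range(p):
        for j in range(q):
            # Any mismatch means the kernel has rank > 1.
            if abs(col[i] * row[j] - kernel[i][j]) > tol * scale:
                return None
    return col, row


sobel = [[1, 0, -1],
         [2, 0, -2],
         [1, 0, -1]]
print(separate_kernel(sobel))             # ([1, 2, 1], [1.0, 0.0, -1.0])
print(separate_kernel([[1, 2], [3, 4]]))  # None
```

The returned factors can be fed straight into two sequential 1D convolutions (or into conv2(u, v, A) in MATLAB).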
Another resource worth pointing out is Intel's SIMD (SSE3/SSE4) implementation of 3D convolution, which includes both source code and a presentation. The code is for 16-bit integers. Unless you move to a GPU (e.g. cuFFT), it is probably hard to beat Intel's implementations, which also include Intel MKL. There is an example of 3D convolution (single-precision float) at the bottom of this page of the MKL documentation (link fixed, now mirrored in https://stackoverflow.com/a/27074295/2778484).