What is the fastest FFT implementation in Python?
It seems numpy.fft and scipy.fftpack both are based on fftpack, and not FFTW. Is fftpack as fast as FFTW? What about using multithreaded FFT, or using distributed (MPI) FFT?
You could certainly wrap whatever FFT implementation that you wanted to test using Cython or other like-minded tools that allow you to access external libraries.
If you're going to test FFT implementations, you might also take a look at GPU-based codes (if you have access to the proper hardware). There are several: reikna.fft, scikits.cuda.
There's also a CPU based python FFTW wrapper pyFFTW.
(There is pyFFTW3 as well, but it is not so actively maintained as pyFFTW, and it does not work with Python3. (source))
I don't have experience with any of these. It's probably going to fall to you to do some digging around and benchmark different codes for your particular application if speed is important to you.