Explain the FFT to me

user143879 picture user143879 · Aug 13, 2009 · Viewed 8.3k times · Source

I want to take audio PCM data and find peaks in it. Specifically, I want to return the frequency and time at which a peak occurs.

My understanding of this is that I have to take the PCM data and dump it into an array, setting it as the real values with the complex parts set to 0. I then take the FFT, and I get an array back. If each number in the array is a magnitude value, how do I get the frequency associated with each one? Also, do I take the magnitude of the real & complex part or just discard the complex values?

Finally, if I wanted to find the peaks in a single song, do I just set a small window to FFT and slide it across all of the audio? Any suggestions on how large that window should be?

Answer

Han picture Han · Aug 13, 2009

If the samplerate of your PCM data is F, then the highest frequency component in the FFT is F/2. Suppose your PCM data was sampled at 44100Hz, then your FFT values will run from 0Hz (DC) to 22050Hz. If you start with N samples, (N being a power of 2), then the FFT may return N/2 values representing all positive frequencies from 0 to F/2, or it may return N values that also include the negative frequencies from -F/2 to 0. You should check the specification of your FFT algorithm to find out to which frequency each array item is mapped.

To find the peaks, you need to look at the magnitude of the FFT values. So you need to add the squared real and imaginary parts of each complex value.

Suppose your FFT of N PCM samples returns N/2 complex values representing positive frequencies. Then the distance between 2 complex samples is F/2N Hz. With F=44100Hz and N=1024 samples, this would be 21.5Hz. This is your frequency resolution. If you need to find lower frequency beats, the FFT window will need to be extended.