Python: Performing FFT on .csv values using SciPy documentation

ailsa_naismith picture ailsa_naismith · Feb 5, 2018 · Viewed 9.5k times · Source

I would like to perform Fast Fourier transform on a data series. The series contains values of daily seismic amplitude, sampled consistently across 407 days. I would like to search this dataset for any periodic cycles.

I have made an initial attempt using SciPy documentation here: https://docs.scipy.org/doc/scipy/reference/tutorial/fftpack.html. Similar to this question (link), I then changed the argument for y from an artificial sine function to my dataset.

However, I get the following error:

ValueError: x and y must have same first dimension, but have shapes (203,) and (407, 1)

I would appreciate help on understanding why I get this error, and how I can fix it.

I would also appreciate help on the correct frequency and sampling input values I need for FFT to work on my dataset. I have 407 values in my dataset, and each value represents a day. Thus I have defined N (number of sample points) = 407, and T (sample spacing) = 1 / 84600 (1 / number of seconds in a day). Is that correct?

Here is my full code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft, ifft
import pandas as pd

# Import csv file
df = pd.read_csv('rsam_2016-17_fft_test.csv', index_col=['DateTime'], parse_dates=['DateTime'])
print(df.head())

#plot data
plt.figure(figsize=(12,4))
df.plot(linestyle = '', marker = '*', color='r')
plt.show()

#FFT
#number of sample points
N = 407
#frequency of signal
T = 1 / 84600
#create x-axis for time length of signal
x = np.linspace(0, N*T, N)
#create array that corresponds to values in signal
y = df
#perform FFT on signal
yf = fft(y)
#create new x-axis: frequency from signal
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
#plot results
plt.plot(xf, yf)
plt.grid()
plt.show()

Many thanks in advance for your help!


Edit: the full error is below:

Traceback (most recent call last):

File "<ipython-input-40-c090e0039ba9>", line 1, in <module>
runfile('/Users/an16975/Desktop/PhD/Code/FFT/fft_test.py', wdir='/Users/an16975/Desktop/PhD/Code/FFT')

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/Users/an16975/Desktop/PhD/Code/FFT/fft_test.py", line 36, in <module>
plt.plot(xf, yf)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/pyplot.py", line 3317, in plot
ret = ax.plot(*args, **kwargs)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/__init__.py", line 1898, in inner
return func(ax, *args, **kwargs)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 1406, in plot
for line in self._get_lines(*args, **kwargs):

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 407, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 385, in _plot_args
x, y = self._xy_from_xy(x, y)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 244, in _xy_from_xy
"have shapes {} and {}".format(x.shape, y.shape))

ValueError: x and y must have same first dimension, but have shapes (203,) and (407, 1)

Second edit:

SleuthEye's answer allowed me to generate the plots that I was looking for. For anyone interested, the plots are below.

The unfiltered data series -

Daily seismic amplitude values for a period within 2016 - 2017

And below, the plot generated from performing FFT on the above data series -

Fast Fourier Transform of daily seismic amplitude dataset

I hope that is the correct output, and that I have correctly labelled the axes/understood what the second plot represents. I would also like to know why the upper part of the Fourier spectrum is redundant.

For reference, my full (corrected) code is below:

import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft, ifft
import pandas as pd

# Import csv file
df = pd.read_csv('rsam_2016-17_fft_test.csv', index_col=['DateTime'], parse_dates=['DateTime'])
print(df.head())

#plot data
plt.figure(figsize=(12,4))
df.plot(linestyle = '', marker = '*', color='r')
plt.savefig('rsam_2016_2017_snippetforfft.jpg')
plt.show()

#FFT
#number of sample points
N = 407
#frequency of signal (in days)
T = 1
#create x-axis for time length of signal
x = np.linspace(0, N*T, N)
#create array that corresponds to values in signal
y = df
#perform FFT on signal
yf = fft(y)
#create new x-axis: frequency from signal
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
#plot results
plt.plot(xf, yf[0:N//2], label = 'signal')
plt.grid()
plt.xlabel('Frequency (days)')
plt.ylabel(r'Spectral Amplitude')
plt.legend(loc=1)
plt.savefig('rsam_2016_2017_snippet_fft_firstresult.jpg')
plt.show()

Answer

SleuthEye picture SleuthEye · Feb 5, 2018

The fft function returns the full N points spectrum (which for real-valued inputs includes the redundant upper half of the spectrum), whereas your frequency axis xf is constructed to cover only the lower half of the spectrum with only N//2 points. Your error relates to the mismatch between those xf and yf array sizes. Due to the redundancy, you may exclude the upper half of the spectrum in yf using yf[0:N//2].

Note also that the array yf contains complex numbers. To display a spectrogram output, you should take the absolute value:

plt.plot(xf, abs(yf[0:N//2]))

Finally as far as sampling period, if you are going to use seconds for sampling period and Hz for frequencies, you should be using T = 86400 (since you have a data point every 1 day or 86400 seconds). You could also alternatively use days for sampling period and day-1 (or cycles/day) for frequencies, in which case you would use T = 1.