Fourier smoothing of data set

Py-ser picture Py-ser · Apr 15, 2014 · Viewed 10.2k times · Source

I am following this link to do a smoothing of my data set. The technique is based on the principle of removing the higher order terms of the Fourier Transform of the signal, and so obtaining a smoothed function. This is part of my code:

N = len(y)

y = y.astype(float)               # fix issue, see below
yfft = fft(y, N)

yfft[31:] = 0.0                   # set higher harmonics to zero
y_smooth = fft(yfft, N)

ax.errorbar(phase, y, yerr = err, fmt='b.', capsize=0, elinewidth=1.0)
ax.plot(phase, y_smooth/30, color='black') #arbitrary normalization, see below

However some things do not work properly. Indeed, you can check the resulting plot : enter image description here The blue points are my data, while the black line should be the smoothed curve.

First of all I had to convert my array of data y by following this discussion.

Second, I just normalized arbitrarily to compare the curve with data, since I don't know why the original curve had values much higher than the data points.

Most importantly, the curve is like "specular" to the data point, and I don't know why this happens. It would be great to have some advices especially to the third point, and more generally how to optimize the smoothing with this technique for my particular data set shape.

Answer

Davidmh picture Davidmh · Apr 15, 2014

Your problem is probably due to the shifting that the standard FFT does. You can read about it here.

Your data is real, so you can take advantage of symmetries in the FT and use the special function np.fft.rfft

import numpy as np

x = np.arange(40)
y = np.log(x + 1) * np.exp(-x/8.) * x**2 + np.random.random(40) * 15
rft = np.fft.rfft(y)
rft[5:] = 0   # Note, rft.shape = 21
y_smooth = np.fft.irfft(rft)

plt.plot(x, y, label='Original')
plt.plot(x, y_smooth, label='Smoothed')
plt.legend(loc=0)
plt.show()

If you plot the absolute value of rft, you will see that there is almost no information in frequencies beyond 5, so that is why I choose that threshold (and a bit of playing around, too).

Here the results:

enter image description here