Peak-finding algorithm for Python/SciPy

endolith picture endolith · Nov 11, 2009 · Viewed 128.5k times · Source

I can write something myself by finding zero-crossings of the first derivative or something, but it seems like a common-enough function to be included in standard libraries. Anyone know of one?

My particular application is a 2D array, but usually it would be used for finding peaks in FFTs, etc.

Specifically, in these kinds of problems, there are multiple strong peaks, and then lots of smaller "peaks" that are just caused by noise that should be ignored. These are just examples; not my actual data:

1-dimensional peaks:

FFT output with peaks

2-dimensional peaks:

Radon transform output with circled peak

The peak-finding algorithm would find the location of these peaks (not just their values), and ideally would find the true inter-sample peak, not just the index with maximum value, probably using quadratic interpolation or something.

Typically you only care about a few strong peaks, so they'd either be chosen because they're above a certain threshold, or because they're the first n peaks of an ordered list, ranked by amplitude.

As I said, I know how to write something like this myself. I'm just asking if there's a pre-existing function or package that's known to work well.

Update:

I translated a MATLAB script and it works decently for the 1-D case, but could be better.

Updated update:

sixtenbe created a better version for the 1-D case.

Answer

Basj picture Basj · Oct 2, 2018

The function scipy.signal.find_peaks, as its name suggests, is useful for this. But it's important to understand well its parameters width, threshold, distance and above all prominence to get a good peak extraction.

According to my tests and the documentation, the concept of prominence is "the useful concept" to keep the good peaks, and discard the noisy peaks.

What is (topographic) prominence? It is "the minimum height necessary to descend to get from the summit to any higher terrain", as it can be seen here:

enter image description here

The idea is:

The higher the prominence, the more "important" the peak is.

Test:

enter image description here

I used a (noisy) frequency-varying sinusoid on purpose because it shows many difficulties. We can see that the width parameter is not very useful here because if you set a minimum width too high, then it won't be able to track very close peaks in the high frequency part. If you set width too low, you would have many unwanted peaks in the left part of the signal. Same problem with distance. threshold only compares with the direct neighbours, which is not useful here. prominence is the one that gives the best solution. Note that you can combine many of these parameters!

Code:

import numpy as np
import matplotlib.pyplot as plt 
from scipy.signal import find_peaks

x = np.sin(2*np.pi*(2**np.linspace(2,10,1000))*np.arange(1000)/48000) + np.random.normal(0, 1, 1000) * 0.15
peaks, _ = find_peaks(x, distance=20)
peaks2, _ = find_peaks(x, prominence=1)      # BEST!
peaks3, _ = find_peaks(x, width=20)
peaks4, _ = find_peaks(x, threshold=0.4)     # Required vertical distance to its direct neighbouring samples, pretty useless
plt.subplot(2, 2, 1)
plt.plot(peaks, x[peaks], "xr"); plt.plot(x); plt.legend(['distance'])
plt.subplot(2, 2, 2)
plt.plot(peaks2, x[peaks2], "ob"); plt.plot(x); plt.legend(['prominence'])
plt.subplot(2, 2, 3)
plt.plot(peaks3, x[peaks3], "vg"); plt.plot(x); plt.legend(['width'])
plt.subplot(2, 2, 4)
plt.plot(peaks4, x[peaks4], "xk"); plt.plot(x); plt.legend(['threshold'])
plt.show()