Pitch detection in Python

Andrew Ravus · Sep 15, 2015 · Viewed 20.6k times

The program I'm working on is a Python module that detects certain frequencies (the human speech range, 80-300 Hz) and, by checking against a database, shows the intonation of the sentence. I use SciPy to plot the frequency content of the sound files, but I cannot restrict the analysis to a particular frequency range in order to analyze pitch. How can I do this?

More info: I would like to be able to define a pattern in speech (e.g. rising, falling) and have the program detect whether a sound file follows that pattern.

Answer

Sahil M · Sep 15, 2015

You could try the following. Keep in mind that the human voice also has harmonics that extend well beyond 300 Hz. Nevertheless, you can slide a window across your audio file and track how the frequency with the most power (as shown below), or the power in a set of frequencies, changes from window to window. The code below is only meant to give the intuition:

import numpy as np

def maxFrequency(X, F_sample, Low_cutoff=80, High_cutoff=300):
    """Find the strongest frequency of a real signal within a band, using the FFT.

    Inputs
    =======
    X: 1-D numpy array, the real time-domain audio signal (single-channel time series)
    F_sample: float, the sampling frequency of the signal (physical frequency in Hz)
    Low_cutoff: float, frequency components below this frequency are ignored (physical frequency in Hz)
    High_cutoff: float, frequency components above this frequency are ignored (physical frequency in Hz)

    Returns
    =======
    float, the frequency (in Hz) that carries the most power between Low_cutoff and High_cutoff
    """
    M = X.size  # let M be the length of the time series

    # np.fft.rfft returns one complex value per non-negative frequency bin, in order.
    # (scipy.fftpack.rfft packs real and imaginary parts into a single real array,
    # so its indices do not map directly onto frequencies.)
    Power = np.abs(np.fft.rfft(X, n=M)) ** 2
    Frequencies = np.fft.rfftfreq(M, d=1.0 / F_sample)  # physical frequency (Hz) of each bin

    # Keep only the bins that fall inside the band of interest
    inBand = (Frequencies >= Low_cutoff) & (Frequencies <= High_cutoff)
    if not inBand.any():
        raise ValueError("Window is too short to resolve the requested band")

    # Pick the in-band frequency that has the maximum power
    maximumFrequency = Frequencies[inBand][np.argmax(Power[inBand])]

    return maximumFrequency

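# fullAudio: 1-D numpy array of samples; samplingRate: its sampling rate in Hz (both assumed to be loaded already)
windowSize = int(0.05 * samplingRate)  # 50 ms window (an assumed length): four periods of an 80 Hz voice, 20 Hz bin spacing
voiceVector = []
for start in range(0, len(fullAudio) - windowSize + 1, windowSize):  # Run a non-overlapping window across the audio file
    window = fullAudio[start:start + windowSize]
    voiceVector.append(maxFrequency(window, samplingRate))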

Depending on the intonation of the voice, the maximum-power frequency may shift from window to window; you can record these shifts and map them to a given intonation. This will not always hold, and you may have to monitor shifts in many frequencies together, but it should get you started.
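
One way to map the per-window peak frequencies to a pattern such as "rising" or "falling" is to fit a straight line through them and look at the sign of the overall change. The snippet below is only a minimal sketch of that idea, not a tested intonation classifier: the function name `classify_contour` and the `min_change_hz` threshold are hypothetical, and the input is assumed to be a list of frequencies in Hz, one per window, like `voiceVector` above.

```python
import numpy as np

def classify_contour(pitch_hz, min_change_hz=10.0):
    """Label a per-window pitch track as 'rising', 'falling', or 'flat'.

    pitch_hz: sequence of per-window peak frequencies in Hz (e.g. voiceVector).
    min_change_hz: hypothetical threshold; a fitted change smaller than this counts as flat.
    """
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    if pitch_hz.size < 2:
        return "flat"  # a single window cannot show a trend

    # Least-squares straight line through the track; slope is in Hz per window.
    slope, _intercept = np.polyfit(np.arange(pitch_hz.size), pitch_hz, 1)

    # Change of the fitted line from the first window to the last one.
    total_change = slope * (pitch_hz.size - 1)

    if total_change > min_change_hz:
        return "rising"
    if total_change < -min_change_hz:
        return "falling"
    return "flat"

# Made-up pitch tracks, for illustration only
print(classify_contour([110, 118, 131, 140, 155]))  # rising
print(classify_contour([200, 190, 176, 160, 151]))  # falling
print(classify_contour([120, 122, 119, 121, 120]))  # flat
```

A straight-line fit only captures a monotonic trend. A pattern such as rise-then-fall would need the track split into segments, and silent or unvoiced windows would probably have to be dropped before fitting.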