Extracting pitch features from an audio file

Ada Xu · Dec 22, 2013 · Viewed 13.8k times

I am trying to extract pitch features from an audio file to use in a classification problem. I am using Python (SciPy/NumPy) for the classification.

I think I can get frequency features using scipy.fft but I don't know how to approximate musical notes using frequencies. I researched a bit and found that I need to get chroma features which map frequencies to 12 bins for notes of a chromatic scale.

I think there's a Chroma Toolbox for MATLAB, but I don't think there's anything similar for Python.

How should I go forward with this? Could anyone also suggest reading material I should look into?

Answer

Frank Zalkow · Dec 23, 2013

You can map frequencies to musical notes:

n = 12 * log2(f / Cp) + 69

where n is the MIDI note number to be calculated, f is the frequency in Hz, and Cp is the concert pitch, also called chamber pitch (440.0 Hz is common in modern music).
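As a minimal sketch of that formula in Python: the function names, the sharp-only note spelling, and the convention that MIDI note 60 is C4 are my own assumptions for illustration, not part of any library.

```python
import math

# Pitch-class names, sharps only (an arbitrary spelling choice for this sketch).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_midi(f, concert_pitch=440.0):
    """Return the (possibly fractional) MIDI note number for a frequency in Hz."""
    return 12 * math.log2(f / concert_pitch) + 69

def midi_to_name(n):
    """Round to the nearest semitone and return a name such as 'A4'.

    Assumes the common convention that MIDI note 60 is C4.
    """
    n = int(round(n))
    return NOTE_NAMES[n % 12] + str(n // 12 - 1)

print(freq_to_midi(440.0))                  # 69.0
print(midi_to_name(freq_to_midi(261.63)))   # C4
```

The fractional part of the result tells you how far the frequency is from the nearest equal-tempered note, which can be useful as a tuning feature in its own right.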

As you may know, a single frequency doesn't make a musical pitch. "Pitch" arises from the sensation of the fundamental of harmonic sounds, i.e. sounds that mainly consist of integer multiples of one single frequency (= the fundamental).

If you want to have Chroma Features in Python, you can use the Bregman Audio-Visual Information Toolbox. Note that chroma features don't give you information about the octave of a pitch, so you just get information about the pitch class.

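from bregman.suite import Chromagram

audio_file = "mono_file.wav"  # path to a mono WAV file
# nfft: FFT length, wfft: window length, nhop: hop size (all in samples)
F = Chromagram(audio_file, nfft=16384, wfft=8192, nhop=2205)
F.X  # all chroma features
F.X[:, 0]  # one feature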
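If the toolbox isn't an option, the basic idea behind chroma features (folding spectral energy into 12 pitch-class bins) can be sketched with NumPy alone. This is a rough illustration, not the Bregman implementation: the function name, the Hann window, the `fmin` cutoff, and the normalization are all assumptions of mine.

```python
import numpy as np

def chroma_from_frame(frame, sample_rate, concert_pitch=440.0, fmin=50.0):
    """Fold the magnitude spectrum of one frame into 12 pitch-class bins (C=0 ... B=11)."""
    window = np.hanning(len(frame))
    magnitudes = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Drop DC and very low bins (log2 of 0 is undefined).
    keep = freqs >= fmin
    midi = 12 * np.log2(freqs[keep] / concert_pitch) + 69

    # Nearest semitone, then discard the octave.
    pitch_class = np.round(midi).astype(int) % 12
    chroma = np.bincount(pitch_class, weights=magnitudes[keep], minlength=12)

    # Normalize so the 12 bins sum to 1 (skip for silent frames).
    total = chroma.sum()
    return chroma / total if total > 0 else chroma

# Usage: a synthetic 440 Hz sine should land in pitch class 9 (A).
sr = 22050
t = np.arange(4096) / sr
frame = np.sin(2 * np.pi * 440.0 * t)
chroma = chroma_from_frame(frame, sr)
print(int(np.argmax(chroma)))  # 9
```

For a whole file you would apply this to overlapping frames and stack the results into a matrix with one 12-element column per frame.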

The general problem of extracting pitch information from audio is called pitch detection.
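To give a feel for what a pitch detector does, here is a small sketch of one simple technique, autocorrelation, applied to a synthetic harmonic tone. It is only one of many approaches and is not robust on real recordings; the function name and the `fmin`/`fmax` search range are hypothetical choices of mine.

```python
import numpy as np

def detect_pitch_autocorr(signal, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a short, roughly periodic signal."""
    signal = signal - np.mean(signal)

    # Autocorrelation for non-negative lags only.
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]

    # Restrict the search to lags that correspond to plausible pitches.
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    lag = min_lag + int(np.argmax(corr[min_lag:max_lag]))
    return sample_rate / lag

# Usage: a harmonic tone with a 220 Hz fundamental and two overtones.
sr = 22050
t = np.arange(2048) / sr
tone = (np.sin(2 * np.pi * 220.0 * t)
        + 0.5 * np.sin(2 * np.pi * 440.0 * t)
        + 0.25 * np.sin(2 * np.pi * 660.0 * t))
estimate = detect_pitch_autocorr(tone, sr)
print(estimate)  # close to 220 Hz, limited by the integer lag resolution
```

The strongest autocorrelation peak sits at the period of the fundamental even though the overtones are present, which is the point made above about pitch arising from harmonic sounds.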