I am trying to extract pitch features from an audio file to use in a classification problem. I am using Python (SciPy/NumPy) for the classification.
I think I can get frequency features using scipy.fft, but I don't know how to approximate musical notes from frequencies. After some research, I found that I need chroma features, which map frequencies to 12 bins, one for each note of the chromatic scale.
I think there's a Chroma Toolbox for MATLAB, but I don't think there's anything similar for Python.
How should I go forward with this? Could anyone also suggest reading material I should look into?
You can map frequencies to musical notes:

$$n = 12 \log_2\left(\frac{f}{a}\right) + 69$$

with $n$ being the MIDI note number to be calculated, $f$ the frequency and $a$ the chamber pitch (in modern music 440.0 Hz is common).
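As a minimal sketch of this mapping in Python, assuming the standard MIDI convention in which the chamber pitch A4 is note number 69, so that the note number is 69 + 12 * log2(f / a). The function names and the sharp-only note spelling are illustrative choices, not part of any library:

```python
import math

# Pitch-class names, indexed by MIDI note number modulo 12 (sharps only).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Return the (fractional) MIDI note number for a frequency in Hz."""
    return 69 + 12 * math.log2(freq_hz / a4_hz)


def midi_to_name(midi_note):
    """Round to the nearest semitone and return a name such as 'A4'."""
    n = int(round(midi_note))
    octave = n // 12 - 1  # convention assumed here: MIDI note 60 is C4
    return NOTE_NAMES[n % 12] + str(octave)


def pitch_class(freq_hz, a4_hz=440.0):
    """Return the pitch class (0 = C, ..., 11 = B), discarding the octave."""
    return int(round(freq_to_midi(freq_hz, a4_hz))) % 12


if __name__ == "__main__":
    for f in (261.63, 440.0, 880.0):
        print(f, midi_to_name(freq_to_midi(f)), pitch_class(f))
```

The fractional part of the note number tells you how far the frequency is from the nearest equal-tempered semitone, which can be useful if the recording is slightly out of tune.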
As you may know, a single frequency doesn't make a musical pitch. "Pitch" arises from the sensation of the fundamental of harmonic sounds, i.e. sounds that mainly consist of integer multiples of one single frequency (= the fundamental).
If you want to have Chroma Features in Python, you can use the Bregman Audio-Visual Information Toolbox. Note that chroma features don't give you information about the octave of a pitch, so you just get information about the pitch class.
from bregman.suite import Chromagram
audio_file = "mono_file.wav"
F = Chromagram(audio_file, nfft=16384, wfft=8192, nhop=2205)
F.X # all chroma features
F.X[:,0] # one feature
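If you would rather stay with plain NumPy, the idea behind chroma features can be sketched by hand: take the magnitude spectrum of one frame, convert each FFT bin frequency to a MIDI note number, and sum the energy of all bins that fall into the same pitch class. This is only a rough illustration, not how the Bregman toolbox computes its chromagram; the function name, the Hann window, the 50 Hz lower cutoff, and the normalisation are all assumptions made for the example.

```python
import numpy as np


def chroma_from_frame(frame, sample_rate, a4_hz=440.0, fmin=50.0):
    """Fold the power spectrum of one audio frame into 12 pitch-class bins."""
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))  # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Drop DC and very low bins, where log2 is undefined or unreliable.
    keep = freqs >= fmin
    midi = 69 + 12 * np.log2(freqs[keep] / a4_hz)
    classes = np.round(midi).astype(int) % 12  # 0 = C, ..., 11 = B

    # Sum the power of all bins belonging to each pitch class.
    chroma = np.bincount(classes, weights=spectrum[keep] ** 2, minlength=12)
    total = chroma.sum()
    return chroma / total if total > 0 else chroma


if __name__ == "__main__":
    sr = 22050
    t = np.arange(8192) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # synthetic A4 test tone
    print(np.argmax(chroma_from_frame(tone, sr)))  # index of strongest pitch class
```

Applying this to successive overlapping frames of the signal and stacking the results column by column gives a 12-by-frames matrix comparable in shape to a chromagram.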
The general problem of extracting pitch information from audio is called pitch detection.
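One of the simplest pitch detection approaches is autocorrelation: a periodic signal correlates strongly with itself when shifted by one period, so the lag of the strongest autocorrelation peak gives an estimate of the fundamental frequency. The sketch below is a bare-bones illustration using only NumPy; the function name and the 50 to 1000 Hz search range are assumptions, and real-world detectors add refinements such as peak interpolation and voicing decisions.

```python
import numpy as np


def estimate_f0_autocorr(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame via autocorrelation."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)  # remove any DC offset

    # Keep only non-negative lags of the full autocorrelation.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]

    # Restrict the search to lags that correspond to plausible pitches.
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(corr) - 1)

    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / lag


if __name__ == "__main__":
    sr = 8000
    t = np.arange(2048) / sr
    tone = np.sin(2 * np.pi * 220.0 * t)  # synthetic test tone
    print(estimate_f0_autocorr(tone, sr))
```

Because the lag is an integer number of samples, the estimate is quantised; at low sample rates the error can be a noticeable fraction of a semitone for high pitches.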