Music Recognition and Signal Processing

Alix Axel · Jan 15, 2010 · Viewed 9.2k times

I want to build something similar to Tunatic or Midomi (try them out if you're not sure what they do), and I'm wondering what algorithms I'd have to use. My idea of how such applications work is something like this:

  1. have a big database with several songs
  2. for each song in 1. reduce quality / bit-rate (to 64kbps for instance) and calculate the sound "hash"
  3. have the sound / excerpt of the music you want to identify
  4. for the song in 3. reduce quality / bit-rate (again to 64kbps) and calculate sound "hash"
  5. if the sound hash from 4. matches any of the sound hashes from 2., return the matched music

I thought of reducing the quality / bit-rate to compensate for environmental noise and encoding differences.

Am I on the right track here? Can anyone point me to specific documentation or examples? Midomi even seems to recognize hums, which is really impressive! How do they do that?

Do sound hashes exist or is it something I just made up? If they do, how can I calculate them? And more importantly, how can I check if child-hash is in father-hash?

How would I go about building a similar system with Python (maybe a built-in module) or PHP?

Some examples (preferably in Python or PHP) will be greatly appreciated. Thanks in advance!

Answer

Steve Tjoa · Jan 15, 2010

I do research in music information retrieval (MIR). The seminal paper on music fingerprinting is the one by Haitsma and Kalker, from around 2002-03. A Google search should turn it up.

I read an early (really early; before 2000) white paper about Shazam's method. At that point, they basically just detected spectrotemporal peaks and then hashed the peaks. I'm sure that procedure has evolved.
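To make the peak-hashing idea concrete, here is a minimal NumPy sketch. It is not Shazam's actual algorithm: the frame sizes, the one-peak-per-frame rule, the pairing of nearby peaks into (f1, f2, dt) hashes, and the offset voting in `best_offset` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import Counter

import numpy as np

FRAME, HOP = 1024, 512  # assumed analysis frame and hop sizes, in samples


def spectrogram(signal):
    """Magnitude STFT, shape (n_frames, FRAME // 2 + 1)."""
    window = np.hanning(FRAME)
    n_frames = 1 + (len(signal) - FRAME) // HOP
    frames = np.stack(
        [signal[i * HOP : i * HOP + FRAME] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def peaks(spec):
    """Crude peak picking (an assumption): strongest frequency bin per frame."""
    return [(t, int(row.argmax())) for t, row in enumerate(spec)]


def fingerprint(signal, fan_out=5):
    """Map each hash (f1, f2, dt) to the frame times where it occurs."""
    pk = peaks(spectrogram(signal))
    table = {}
    for i, (t1, f1) in enumerate(pk):
        # Pair each peak with the next few peaks to form a hash.
        for t2, f2 in pk[i + 1 : i + 1 + fan_out]:
            table.setdefault((f1, f2, t2 - t1), []).append(t1)
    return table


def best_offset(db, query):
    """Vote on the frame offset at which query hashes align with db hashes."""
    votes = Counter()
    for h, query_times in query.items():
        for t_db in db.get(h, []):
            for t_q in query_times:
                votes[t_db - t_q] += 1
    return votes.most_common(1)[0] if votes else (None, 0)


# Synthetic "song": 32 tones of 0.25 s each at 8 kHz, placed on exact FFT bins.
sr = 8000
rng = np.random.default_rng(0)
t = np.arange(sr // 4) / sr
bins = rng.integers(40, 400, size=32)
song = np.concatenate([np.sin(2 * np.pi * (b * sr / FRAME) * t) for b in bins])

# Noisy one-second excerpt that starts 20 frames into the song.
start = 20 * HOP
excerpt = song[start : start + sr] + 0.05 * rng.standard_normal(sr)

offset, votes = best_offset(fingerprint(song), fingerprint(excerpt))
print(offset, votes)  # winning frame offset and its vote count
```

With many songs, you would keep one hash table per song (or one shared table keyed by hash and storing song IDs) and report the song whose best offset collects the most votes. That is also a plausible answer to the "child-hash in father-hash" question: you don't look for one hash inside another, you count how many small hashes agree on the same time offset.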

Both of these methods address music similarity at the signal level, i.e., they are robust to environmental distortions. I don't think they work well for query-by-humming (QBH). However, that is a different (yet related) problem with different (yet related) solutions, and you can find them in the literature. (There are too many to name here.)

The ISMIR proceedings are freely available online. You can find valuable stuff there: http://www.ismir.net/

I agree with using an existing library like Marsyas, depending on what you want. NumPy/SciPy is indispensable here, I think. Simple stuff can be written in Python on your own. Heck, if you need stuff like STFT or MFCC, I can email you code.
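As an example of the "simple stuff", a short-time Fourier transform (STFT) takes only a few lines of NumPy. This is a minimal sketch, not the code offered above; the function name `stft_magnitude`, the Hann window, and the frame and hop sizes are assumptions picked for illustration.

```python
import numpy as np


def stft_magnitude(signal, frame_len=1024, hop=512):
    """Magnitude spectrogram, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft keeps only the non-negative frequencies of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))


# One second of a 1000 Hz sine sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft_magnitude(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sr / 1024  # convert bin index back to Hz
print(spec.shape, peak_bin, peak_hz)
```

Each row of `spec` is the spectrum of one short frame, so peaks in this array are the "spectrotemporal peaks" that fingerprinting methods start from. MFCCs add a mel filterbank, a logarithm, and a DCT on top of the same spectrogram.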