wav-to-midi conversion

Dolphin picture Dolphin · Jan 24, 2010 · Viewed 14.8k times · Source

I'm new to this field - but I need to perform a WAV-to-MIDI conversion in java. Is there a way to know what exactly are the steps involved in WAV-to-MIDI conversion? I have a very rough idea as in you need to; sample the wav file, filter it, use FFT for spectral analysis, feature extraction and then write the extracted features on to MIDI. But I cannot find solid sources or papers as in how to do all that? Can some one give me clues as in how and where to start? Are there any Open Source APIs available for this WAV-to-MIDI conversion process?

Advance thanks

Answer

Steve Tjoa picture Steve Tjoa · Jan 24, 2010

It's a more involved process than you might imagine.

This research problem is often referred to as music transcription: the act of converting a low-level representation of music (e.g., waveform) into a higher-level representation such as MIDI or even sheet music.

The sophistication of your solution will depend upon the complexity of your input data. Tons of research papers address music transcription only on monophonic piano or drums... because they are easy to transcribe. (Relatively.) Violin is harder. Voice is even harder. Violin plus voice plus piano is much harder. A symphony is nearly impossible. You get the picture.

The basic elements of music transcription involve any of the following overlapping areas:

  1. (multi)pitch estimation
  2. instrument recognition, timbral modeling
  3. rhythm detection
  4. note onset/offset detection
  5. form/structure modeling

Search for papers on "music transcription" on Google Scholar or from the ISMIR proceedings: http://www.ismir.net. If you are more interested in one of the above subtopics, I can point you further. Good luck.

EDIT: That being said, there are existing solutions that we can all find on the web. Feel free to try them. But as you do, evaluate them with a critical eye and ear. What types of audio signals would cause transcription to fail?

EDIT 2: Ah, you are only doing this for piano. Okay, this is doable. Music transcription has advanced to the point where it can transcribe monophonic piano pretty well. A Rachmaninov concerto will still pose problems.

Our recommendations depend upon your end goal. You state "need to perform... in Java." So it sounds like you just want something to work regardless of how it gets you there. In that case, I agree 100% with others: use something that exists.

That's actually an interesting question; all of the MIR libraries I know are typically C/C++/Python/Matlab. But not Java. The EchoNest has a Java API, but I don't think it does note-level transcription. http://developer.echonest.com. (Edit: It does note-level transcription. The returned data includes pitch, timbre, beat, tatum, and more. But I find polyphony is still a problem.)

Oh, Marsyas is Java-based. Cool. I thought it was just C++. http://marsyas.info/ I recommend this. It's developed by George Tzanetakis, a professor in MIR. It does signal-level analysis and should be a good option.

Now, if this is for a fun learning experience, I think you can use the sound manipulation utilities in Java to experiment with the WAV signal and see what comes out.

EDIT: This page describes MIR software better than I can: The Tools We Use

For Matlab, you may be interested in the MIR Toolbox

Here is a nice page of common datasets: MIR Datasets