Speech Synthesis - Creating Custom Voices

Travier · Apr 8, 2014 · Viewed 8.1k times

Is it possible, programmatically, to take a sample of someone's voice and derive a unique tone/property from it that could be used to create synthesised speech?

For example, person A records himself. A unique tone is derived from this voice sample and turned into a synthetic voice. People could then use this voice in Text-to-Speech software, typing any text they want and having it read in person A's voice.

Is this possible with today's technology? I know that there are companies that do this professionally, but generally, is it possible for a piece of software to do this?

Answer

Markus Toman · Aug 29, 2014

Using speaker adaptation methods you can achieve some results with comparatively few training samples, but you should still have a few hundred sentences from the person, preferably with a phonetic transcription.

We once ran this as a small lab exercise in which students recorded their own voices and trained a voice model using HTS (http://hts.sp.nitech.ac.jp/). The simplest approach with HTS is to download the "Speaker dependent training demo" from that page and replace its training speech samples with your own recordings (of the same sentences!). We did this for another language with our own package, though.
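Since the recordings have to cover exactly the same sentences as the demo, it can help to check that before starting a training run. Below is a rough Python sketch of such a check. It is not part of HTS: the function names are made up for illustration, and the assumption that each sentence is one file whose name (without extension) is the utterance ID, with `.raw` files in the demo and `.wav` files in your own folder, may not match your copy of the demo, so adjust the suffixes and directories as needed.

```python
from pathlib import Path


def utterance_ids(directory, suffix):
    """Return the set of file stems (assumed utterance IDs) with the given suffix."""
    return {p.stem for p in Path(directory).glob(f"*{suffix}")}


def compare_recordings(demo_dir, own_dir, demo_suffix=".raw", own_suffix=".wav"):
    """Compare your own recordings against the demo's training samples.

    Returns (missing, extra):
      missing - utterance IDs present in the demo but not recorded by you
      extra   - recordings of yours that have no counterpart in the demo
    The suffixes are assumptions about the file layout, not HTS facts.
    """
    demo = utterance_ids(demo_dir, demo_suffix)
    own = utterance_ids(own_dir, own_suffix)
    missing = sorted(demo - own)
    extra = sorted(own - demo)
    return missing, extra
```

You would call it as `missing, extra = compare_recordings("path/to/demo/samples", "path/to/my/recordings")` (both paths are placeholders) and only swap in your recordings once both lists are empty. Converting your files to whatever audio format and sampling rate the demo expects is a separate step that this sketch does not cover.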

I think MaryTTS (http://mary.dfki.de/) has some more convenient tools to assist with this process, but I've never worked with it.

Still, for high-quality voices you should have thousands of recorded sentences.