I have a big batch of files I'd like to run recognition on using CMU Sphinx 4. Sphinx requires the following format:
My files are something like 44100 khz, 32 bit stereo mp3 files. I tried using Tritonus, and then its updated version JavaZoom, to convert using code from bakuzen. However, AudioSystem.getAudioInputStream(File)
throws an UnsupportedAudioFileException
, and I haven't been able to figure out why, so I have moved on.
Now I am trying ffmpeg. The command ffmpeg -i input.mp3 -ac 1 -ab 16 -ar 16000 output.wav
seems like it should do the trick (except for little endian), but when I check the output with Audacity, it still labels it as "32-bit float". The command I found on this site also uses -acodec pcm_s16le
, which from its name seems to be outputting 16 bit little endian; however, Audacity still tells me the output is 32 bit float
.
Can anyone tell me how to convert audio files into the format required by CMU Sphinx 4?
Did you actually try the output from ffmpeg in CMU Sphinx 4? 32-bit float is probably your default sampling format in Audacity (Edit > Preferences > Quality). I'm guessing it converts any imported file to these settings, so it may not be reporting the parameters of the actual file, but perhaps the working file in Audacity.
Remove -ab 16
. This would instruct the encoder to use 16 bits/s and ffmpeg will ignore it for pcm_s16le anyway. So your command will look like:
ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
To convert all mp3 files in a directory in Linux:
for f in *.mp3; do ffmpeg -i "$f" -acodec pcm_s16le -ac 1 -ar 16000 "${f%.mp3}.wav"; done
Or Windows:
for /r %i in (*) do ffmpeg -i %i -acodec pcm_s16le -ac 1 -ar 16000 %i.wav
In Windows Batch file:
for /r %%i in (*.mp3) do ffmpeg -i "%%i" -acodec pcm_s16le -ac 1 -ar 16000 "%i.wav"
You can see file information with file
, ffmpeg
, ffprobe
, mediainfo
among other utilities:
$ file hjl0bC.wav
hjl0bC.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
$ ffmpeg -i hjl0bC.wav
[...]
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s