Sox: concatenate multiple audio files without a gap in between

trainoasis · Aug 13, 2014 · Viewed 25.3k times

I am concatenating multiple (max 25) audio files using SoX with

sox first.mp3 second.mp3 third.mp3 result.mp3

which does what it is supposed to: it concatenates the given files into one file. Unfortunately, there is a small time gap between those files in result.mp3. Is there a way to remove this gap?

Before concatenating, I create first.mp3, second.mp3 and so on by mixing multiple audio files (same length, format and rate):

sox -m drums.mp3 bass.mp3 guitar.mp3 first.mp3

How can I check and ensure that no time gap is added to any of those files (both the merged and the concatenated ones)?

I need to achieve seamless playback of all the concatenated files (playing them one after another in a browser works fine).

Thank you for any help.

EDIT:

The exact example (without real file-names) of a command I am running is now:

sox "|sox -m file1.mp3 file2.mp3 file3.mp3 file4.mp3 -p" "|sox -m file1.mp3 file6.mp3 file7.mp3 -p" "|sox -m file5.mp3 file6.mp3 file4.mp3 -p" "|sox -m file0.mp3 file2.mp3 file9.mp3 -p" "|sox -m file1.mp3 file15.mp3 file4.mp3 -p" result.mp3

This merges the files and pipes them directly into the concatenation command. The resulting MP3 (result.mp3) has an ever-so-slight delay between the concatenated files. Any ideas are really appreciated.

Answer

scruss · Feb 14, 2015

The best (though least helpful) way to do this is not to use MP3 files as your source files. WAV, FLAC or M4A files don't have this problem.
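As a rough sketch of what that could look like with the file names from the question, assuming the stems exist (or can be exported) as WAV. The helper names `mix_segment` and `join_segments` are made up for illustration, and the `SOX` variable is only there so the commands can be previewed with `SOX=echo`:

```shell
# Hypothetical helpers; only "sox -m" (mix) and plain "sox" (concatenate) are real SoX usage.
# SOX can be overridden (e.g. SOX=echo) to preview the commands without running them.
SOX="${SOX:-sox}"

# Mix stems that start at the same time into one lossless segment.
# usage: mix_segment OUT.wav IN1.wav IN2.wav ...
mix_segment() {
    out="$1"; shift
    "$SOX" -m "$@" "$out"
}

# Join segments end to end.
# usage: join_segments OUT IN1.wav IN2.wav ...
join_segments() {
    out="$1"; shift
    "$SOX" "$@" "$out"
}

# Example (assumes drums.wav, bass.wav and guitar.wav exist):
#   mix_segment first.wav drums.wav bass.wav guitar.wav
#   join_segments result.wav first.wav second.wav third.wav
#   sox result.wav result.mp3    # encode to MP3 once, at the very end
```

Keeping every intermediate file lossless and encoding to MP3 only once at the end should leave padding, if any, only at the very start and end of the result rather than at every join. Note that decoding existing MP3s to WAV first will carry over whatever padding they already contain.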

MP3 audio is stored in frames that each hold a fixed number of samples rather than as a plain run of samples, so cropping out a section of arbitrary length will not work as you expect. Unless the encoder is smart (like lame), there will often be a gap at the start or end of the MP3 file's audio. I did a test with a sample 0.98 s long (which is precisely 73½ CDDA frames, and many MP3 encoders use frames for minimum sample lengths). I then encoded the sample with three different MP3 encoders (lame, sox, and the ancient shine), then decoded those files with three decoders (lame, sox, and madplay). Here's how the sample lengths compare to the original:

 Enc.→Dec.          Length     Samples  CDDA Frames
 -----------------  ---------  -------  -----------
 shine→lame         0.95"      42095    71.5901
 shine→madplay      0.97"      42624    72.4898
 shine→sox          0.97"      42624    72.4898
 lame→lame          0.98"      43218    73.5000
*Original           0.98"      43218    73.5000
 sox→sox            0.99"      43776    74.4490
 sox→lame           1.01"      44399    75.5085
 lame→madplay       1.02"      44928    76.4082
 lame→sox           1.02"      44928    76.4082
 sox→madplay        1.02"      44928    76.4082

Only the file encoded and decoded by lame ended up the same length (mostly because lame inserts a length tag to correct for these too-short samples, and knows how to decode it). Everything encoded by sox ended up with a tiny gap, no matter what decoder I used. So joining the files will result in tiny clicks.
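The numbers in the table can be checked with a little arithmetic. This is a small sketch, assuming a 44100 Hz sample rate (588 samples per CDDA frame) and the standard 1152 samples per MPEG-1 Layer III frame; the 1152 figure and the function name `describe_samples` are not from the test above. Sample counts that come out as a whole number of MP3 frames are the ones that were padded out to a frame boundary:

```shell
# Print "seconds CDDA-frames MP3-frames" for a given sample count.
# Assumes 44100 Hz audio: 588 samples per CDDA frame, 1152 per MP3 frame.
describe_samples() {
    awk -v n="$1" 'BEGIN { printf "%.2f %.4f %.4f\n", n / 44100, n / 588, n / 1152 }'
}

# Sample counts copied from the table (label:samples).
for pair in "original:43218" "shine->sox:42624" "sox->sox:43776" "lame->sox:44928"; do
    printf '%-12s %s\n' "${pair%%:*}" "$(describe_samples "${pair#*:}")"
done
```

The original sample is 37.5156 MP3 frames long, while 42624, 43776 and 44928 samples are exactly 37, 38 and 39 frames. The difference between those and 43218 is the silence (or missing audio) you hear at each join.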

Your browser is likely mixing and overlapping the source files very slightly so you don't hear the clicks. Gapless playback is hard to do correctly.