Using pyDub to chop up a long audio file

user3643227 picture user3643227 · May 19, 2014 · Viewed 13.6k times · Source

I'd like to use pyDub to take a long WAV file of individual words (and silence in between) as input, then strip out all the silence, and output the remaining chunks is individual WAV files. The filenames can just be sequential numbers, like 001.wav, 002.wav, 003.wav, etc.

The "Yet another Example?" example on the Github page does something very similar, but rather than outputting separate files, it combines the silence-stripped segments back together into one file:

from pydub import AudioSegment
from pydub.utils import db_to_float

# Let's load up the audio we need...
podcast = AudioSegment.from_mp3("podcast.mp3")
intro = AudioSegment.from_wav("intro.wav")
outro = AudioSegment.from_wav("outro.wav")

# Let's consider anything that is 30 decibels quieter than
# the average volume of the podcast to be silence
average_loudness = podcast.rms
silence_threshold = average_loudness * db_to_float(-30)

# filter out the silence
podcast_parts = (ms for ms in podcast if ms.rms > silence_threshold)

# combine all the chunks back together
podcast = reduce(lambda a, b: a + b, podcast_parts)

# add on the bumpers
podcast = intro + podcast + outro

# save the result
podcast.export("podcast_processed.mp3", format="mp3")

Is it possible to output those podcast_parts fragments as individual WAV files? If so, how?

Thanks!

Answer

Jiaaro picture Jiaaro · May 19, 2014

The example code is pretty simplified, you'll probably want to look at the strip_silence function:

https://github.com/jiaaro/pydub/blob/2644289067aa05dbb832974ac75cdc91c3ea6911/pydub/effects.py#L98

And then just export each chunk instead of combining them.

The main difference between the example and the strip_silence function is the example looks at one millisecond slices, which doesn't count low frequency sound very well since one waveform of a 40hz sound, for example, is 25 milliseconds long.

The answer to your original question though, is that all those slices of the original audio segment are also audio segments, so you can just call the export method on them :)

update: you may want to take a look at the silence utilities I've just pushed up into the master branch; especially split_on_silence() which could do this (assuming the right specific arguments) like so:

from pydub import AudioSegment
from pydub.silence import split_on_silence

sound = AudioSegment.from_mp3("my_file.mp3")
chunks = split_on_silence(sound, 
    # must be silent for at least half a second
    min_silence_len=500,

    # consider it silent if quieter than -16 dBFS
    silence_thresh=-16
)

you could export all the individual chunks as wav files like this:

for i, chunk in enumerate(chunks):
    chunk.export("/path/to/ouput/dir/chunk{0}.wav".format(i), format="wav")

which would make output each one named "chunk0.wav", "chunk1.wav", "chunk2.wav", and so on