Simultaneously record audio from the mic and play it back with an effect in Python

Leonardo Lopez · Jul 18, 2013 · Viewed 7.9k times

My goal is to record my voice through the laptop mic and simultaneously add an effect to it, in Python. What I need is similar to a music effects pedal, where you connect a guitar or mic and it adds reverb, echo, distortion, etc.

I am using 'pyaudio' and 'wave' to record and play back audio, and 'scikits.audiolab' to import the audio as an array so that I can edit it with functions such as invert, clip, tile, etc. Manipulating the audio array lets me "add" effects to the original audio.

My problem isn't really an error; I'm just not getting the effect I want. Let's say I record the word "Hello". My record function is set to record for 3 seconds. I then take the audio array and tile it once. When I play this back, it says hello twice, which is a delay effect. BUT there is an interval of 'empty space' between the two hellos, because the audio keeps recording AFTER I finish saying hello. So when it repeats, there's too much empty space between the words. I want to eliminate this empty space so that the playback says "hello hello" more quickly.

My teacher recommends threading. He says I should record and simultaneously grab the first 500 samples (to pick a number), then play those 500 samples back while the recording continues. I'm not quite sure how to implement this.

My question is: how do I simultaneously record, take the first 500 samples, and create a new array with the "effect" added to the original recording?

import scikits.audiolab as audiolab
import pyaudio
import wave

def recordAudio():

    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    RECORD_SECONDS = 3
    WAVE_OUTPUT_FILENAME = "audioOriginal.wav"

    p = pyaudio.PyAudio()

    stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

    print("* recording:")

    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)

    print("* Finished recording.")

    stream.stop_stream()
    stream.close()
    p.terminate()

    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

    # Duplicate audio and save as Actual
    frames, fs, encoder = audiolab.wavread('audioOriginal.wav')
    audiolab.wavwrite(frames,'audioActual.wav',fs)

def playAudio():

    CHUNK = 1024

    wf = wave.open('audioActual.wav', 'rb')

    p = pyaudio.PyAudio()

    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()), 
        channels=wf.getnchannels(), 
        rate=wf.getframerate(), 
        output=True)

    data = wf.readframes(CHUNK)

    # readframes() returns an empty (falsy) string at end of file
    while data:
        stream.write(data)
        data = wf.readframes(CHUNK)

    stream.stop_stream()
    stream.close()
    p.terminate()

def reverseAudio():

    frames, fs, encoder = audiolab.wavread('audioActual.wav')

    # Write back at the file's own sample rate instead of a hard-coded 44100
    audiolab.wavwrite(frames[::-1],'audioActual.wav',fs)

def revert():
    frames, fs, encoder = audiolab.wavread('audioOriginal.wav')
    audiolab.wavwrite(frames,'audioActual.wav',fs)

def errorSelection():
    print("\nERROR.") # no option in menu

def showMenu():
    print("""
    1. Record audio
    2. Play audio
    3. Reverse audio
    4. Add delay
    5. Revert to original audio

    T to end program.
    """)

# Menu
def main():
    selecciones = {"1": recordAudio, "2": playAudio, "3": reverseAudio, "5": revert}
    while True:
        showMenu()
        seleccion = raw_input(u'What do you want to do? ')
        if seleccion.lower() == "t":
            return
        toDo = selecciones.get(seleccion, errorSelection)
        toDo()

if __name__ == "__main__":
    main()

Answer

Luke · Jul 20, 2013

First, the problem you posed (being able to tile audio samples while automatically removing the quiet space between them) is not one that can be solved with threading. You need to analyze the recorded sound to determine where there is or is not silence, or simply allow the user to specify when recording should end. You can accomplish the latter with a simple loop:

  1. Open the audio hardware and start recording.
  2. Create an empty list to store chunks of audio.
  3. Request a small chunk of audio data and append it to the list.
  4. Check whether the user has requested the recording to end. If not, loop back to 3.
  5. When finished, assemble the chunks into a single array for playback.
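
As a rough sketch of steps 2 to 5, assuming nothing about your hardware: `record_until`, `read_chunk` and `should_stop` are names made up for illustration, not part of PyAudio. With a PyAudio input stream, `read_chunk` could be something like `lambda: stream.read(CHUNK)`; here a stand-in that returns silent bytes is used so the loop can be run without a microphone.

```python
def record_until(read_chunk, should_stop, max_chunks=10000):
    """Collect audio chunks until should_stop() returns True.

    read_chunk and should_stop are hypothetical callables supplied by the
    caller; max_chunks is a safety cap so the loop always terminates.
    """
    chunks = []                        # step 2: empty list for the chunks
    for _ in range(max_chunks):
        chunks.append(read_chunk())    # step 3: request a small chunk
        if should_stop():              # step 4: has the user asked to stop?
            break
    return b"".join(chunks)            # step 5: assemble for playback


# Stand-in for a sound card: each "chunk" is 4 bytes of silence.
presses = iter([False, False, True])   # the user "stops" after the third chunk
audio = record_until(lambda: b"\x00" * 4, lambda: next(presses))
```

How `should_stop` detects the request (a key press, a flag, a timer) is left open; any callable that returns True or False will do.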

In this simple example, there is no benefit to using threading.
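
For the other option, analyzing the recording for silence, here is a minimal sketch using NumPy. It assumes a mono recording held as a 1-D float array in the range -1 to 1. The function name `trim_silence` and the threshold of 0.02 are my own guesses for illustration and would need tuning for a real microphone and room.

```python
import numpy as np


def trim_silence(samples, threshold=0.02):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    loud = np.flatnonzero(np.abs(samples) >= threshold)
    if loud.size == 0:
        return samples[:0]             # nothing above the threshold
    return samples[loud[0]:loud[-1] + 1]


# Synthetic "recording": silence, a short burst of sound, then more silence.
recording = np.concatenate([np.zeros(100), 0.5 * np.ones(50), np.zeros(300)])
trimmed = trim_silence(recording)
repeated = np.tile(trimmed, 2)         # the repeat now follows without the long gap
```

A plain amplitude threshold is the simplest possible detector; background noise louder than the threshold will be treated as sound, so a real recording may need a higher threshold or some smoothing first.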

The method your teacher suggested, recording and simultaneously playing back, seems like a solution to a different and much more complex problem. In that case, there are two major difficulties:

  1. Not all consumer sound cards are capable of recording and playing simultaneously. Look for cards that claim "full duplex" instead of "half duplex".
  2. Speaking into a microphone and hearing yourself with a short delay is extremely distracting. To make this work properly, the recorded audio must be processed and sent back to the sound card in less than about 20 ms. At 44.1 kHz, this means you should be reading fewer than about 880 frames per loop cycle (0.020 s × 44,100 frames/s = 882), and if the processing can't keep up, you will have gaps in the output. This is a surprisingly difficult problem unless you have specialized software to help. If you really want to go this way, you might look at Jack (http://jackaudio.org/), which provides low-latency audio access on most platforms and has an easy-to-use Python library as well (http://sourceforge.net/projects/py-jack/). Threading will probably not be helpful in this type of program.
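
To make the frame arithmetic concrete, here is a small sketch. The helper names are invented for illustration, and rounding the chunk size down to a power of two is only a common convention I am assuming, not a requirement of any particular library.

```python
def max_frames_for_latency(rate_hz, latency_ms):
    """Largest number of frames that fits within the latency budget."""
    return rate_hz * latency_ms // 1000


def largest_power_of_two_at_most(n):
    """Largest power of two that does not exceed n."""
    size = 1
    while size * 2 <= n:
        size *= 2
    return size


budget = max_frames_for_latency(44100, 20)      # frames available in 20 ms
chunk = largest_power_of_two_at_most(budget)    # candidate chunk size
chunk_ms = 1000.0 * chunk / 44100               # duration of one chunk in ms
```

With these numbers the budget is 882 frames and the candidate chunk size is 512 frames, which lasts roughly 11.6 ms. That leaves the rest of the 20 ms for processing and for the sound card's own buffering.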