Any simple VAD implementation?

Gilad Novik picture Gilad Novik · Mar 20, 2011 · Viewed 12.6k times · Source

I'm looking for some C/C++ code for VAD (Voice Activity Detection).

Basically, my application is reading PCM frames from the device. I would like to know when the user is talking. I'm not looking for any speech recognition algorithm but only for voice detection.

I would like to know when the user is talking and when he finishes:

bool isVAD(short* pcm,size_t count);

Answer

John Wiseman picture John Wiseman · Apr 24, 2016

Google's open-source WebRTC code has a VAD module written in C. It uses a Gaussian Mixture Model (GMM), which is typically much more effective than a simple energy-threshold detector, especially in a situation with dynamic levels and types of background noise. In my experience it's also much more effective than the Moattar-Homayounpour VAD that Gilad mentions in their comment.

The VAD code is part of the much, much larger WebRTC repository, but it's very easy to pull it out and compile it on its own. E.g. the webrtcvad Python wrapper includes just the VAD C source.

The WebRTC VAD API is very easy to use. First, the audio must be mono 16 bit PCM, with either a 8 KHz, 16 KHz or 32 KHz sample rate. Each frame of audio that you send to the VAD must be 10, 20 or 30 milliseconds long.

Here's an outline of an example that assumes audio_frame is 10 ms (320 bytes) of audio at 16000 Hz:

#include "webrtc/common_audio/vad/include/webrtc_vad.h"
// ...
VadInst *vad;
WebRtcVad_Create(&vad);
WebRtcVad_Init(vad);
int is_voiced = WebRtcVad_Process(vad, 16000, audio_frame, 160);