Detect silence when recording

olyanren · Apr 27, 2011 · Viewed 21.3k times

How can I detect silence when a recording operation is started in Java? What is PCM data? How can I calculate PCM data in Java?

I found a solution:

package bemukan.voiceRecognition.speechToText;

import javax.sound.sampled.*;
import java.io.*;

public class RecordAudio {
    private File audioFile;
    // volatile: written by stopAudio() and read by the capture thread
    protected volatile boolean running;
    private ByteArrayOutputStream out;
    private AudioInputStream inputStream;
    final static float MAX_8_BITS_SIGNED = Byte.MAX_VALUE;
    final static float MAX_8_BITS_UNSIGNED = 0xff;
    final static float MAX_16_BITS_SIGNED = Short.MAX_VALUE;
    final static float MAX_16_BITS_UNSIGNED = 0xffff;
    private AudioFormat format;
    private float level;
    private int frameSize;

    public RecordAudio(){
         getFormat();
    }

    private AudioFormat getFormat() {
        File file = new File("src/Facebook/1.wav");
        AudioInputStream stream;
        try {
            stream = AudioSystem.getAudioInputStream(file);
            format=stream.getFormat();
            frameSize=stream.getFormat().getFrameSize();
            return stream.getFormat();
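        } catch (UnsupportedAudioFileException e) {
            System.err.println("Unsupported audio file: " + e.getMessage());
        } catch (IOException e) {
            System.err.println("Could not read audio file: " + e.getMessage());
        }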
        return null;
    }

    public void stopAudio() {

        running = false;
    }

    public void recordAudio() {

        try {
            final AudioFormat format = getFormat();
            DataLine.Info info = new DataLine.Info(
                    TargetDataLine.class, format);
            final TargetDataLine line = (TargetDataLine)
                    AudioSystem.getLine(info);
            line.open(format);
            line.start();
            Runnable runner = new Runnable() {
                int bufferSize = (int) format.getSampleRate()
                        * format.getFrameSize();
                byte buffer[] = new byte[bufferSize];

                public void run() {
                    out = new ByteArrayOutputStream();
                    running = true;
                    while (running) {
                        // Blocks until the buffer is full (about one second of audio)
                        int count = line.read(buffer, 0, buffer.length);
                        if (count > 0) {
                            // Only the first 'count' bytes hold valid samples
                            calculateLevel(buffer, 0, buffer.length - count);
                            System.out.println(level);
                            out.write(buffer, 0, count);
                        }
                    }
                    line.stop();
                    line.close();
                }
            };
            Thread captureThread = new Thread(runner);
            captureThread.start();
        } catch (LineUnavailableException e) {
            System.err.println("Line unavailable: " + e);
            System.exit(-2);
        }
    }

    public File getAudioFile() {
        byte[] audio = out.toByteArray();
        InputStream input = new ByteArrayInputStream(audio);
        try {

            final AudioFormat format = getFormat();
            final AudioInputStream ais =
                    new AudioInputStream(input, format,
                            audio.length / format.getFrameSize());
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, new File("temp.wav"));
            input.close();
            System.out.println("New file created!");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        return new File("temp.wav");
    }
    private void calculateLevel (byte[] buffer,
                                 int readPoint,
                                 int leftOver) {
        int max = 0;
        boolean use16Bit = (format.getSampleSizeInBits() == 16);
        boolean signed = (format.getEncoding() ==
                          AudioFormat.Encoding.PCM_SIGNED);
        boolean bigEndian = (format.isBigEndian());
        if (use16Bit) {
            for (int i=readPoint; i<buffer.length-leftOver-1; i+=2) {
                // deal with endianness
                int hiByte = (bigEndian ? buffer[i] : buffer[i+1]);
                int loByte = (bigEndian ? buffer[i+1] : buffer[i]);
                int value;
                if (signed) {
                    // hiByte keeps its sign; loByte is masked so that its
                    // sign extension cannot overwrite the high bits
                    value = (short) ((hiByte << 8) | (loByte & 0xff));
                } else {
                    value = ((hiByte & 0xff) << 8) | (loByte & 0xff);
                }
                // use the magnitude so that negative peaks count too
                max = Math.max(max, Math.abs(value));
            } // for
        } else {
            // 8 bit - no endianness issues, just sign
            for (int i=readPoint; i<buffer.length-leftOver; i++) {
                // mask unsigned samples so bytes above 0x7f are not read as negative
                int value = signed ? buffer[i] : (buffer[i] & 0xff);
                max = Math.max(max, Math.abs(value));
            } // for
        } // 8 bit
        // express max as float of 0.0 to 1.0 of max value
        // of 8 or 16 bits (signed or unsigned)
        if (signed) {
            if (use16Bit) { level = (float) max / MAX_16_BITS_SIGNED; }
            else { level = (float) max / MAX_8_BITS_SIGNED; }
        } else {
            if (use16Bit) { level = (float) max / MAX_16_BITS_UNSIGNED; }
            else { level = (float) max / MAX_8_BITS_UNSIGNED; }
        }
    } // calculateLevel


}

Answer

Andrew Thompson · Apr 27, 2011

How can I detect silence when a recording operation is started in Java?

Calculate the dB or RMS value for a group of sound frames and decide at what level it is considered to be 'silence'.
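One way to sketch that decision, assuming the samples have already been scaled to the range -1 to 1: compute the RMS of a block, convert it to decibels relative to full scale, and compare it against a threshold. The class name SilenceCheck, its method names, and the -40 dB threshold are illustrative assumptions rather than anything from the question; a usable threshold would need tuning against the noise floor of the actual microphone.

```java
/** Minimal sketch: classify a block of samples as silent or not. */
public class SilenceCheck {
    /** Hypothetical threshold in dB relative to full scale; tune for your input. */
    public static final double SILENCE_THRESHOLD_DB = -40.0;

    /** Plain RMS of samples in the range -1 to 1 (0 for an empty block). */
    public static double rms(double[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (double s : samples) {
            sumOfSquares += s * s;
        }
        return Math.sqrt(sumOfSquares / samples.length);
    }

    /** Converts an RMS value to dB relative to full scale (1.0 maps to 0 dB). */
    public static double toDecibels(double rms) {
        // log10(0) is negative infinity, which compares as quieter than any threshold
        return 20.0 * Math.log10(rms);
    }

    /** True when the block's level falls below the assumed threshold. */
    public static boolean isSilent(double[] samples) {
        return toDecibels(rms(samples)) < SILENCE_THRESHOLD_DB;
    }
}
```

In practice it may help to require several consecutive silent blocks before treating the input as silence, so that short pauses between words are not misclassified.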

What is PCM data?

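Data that is in pulse-code modulation (PCM) format: the audio waveform is sampled at regular intervals and each sample is stored as a number, for example a signed 16-bit integer.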
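As a rough illustration of what that looks like in bytes, the sketch below samples a sine wave at regular intervals and stores each sample as a signed 16-bit little-endian integer, which is one common PCM layout. The class name PcmExample, the method sineAsPcm, and its parameters are made up for this example.

```java
/** Minimal sketch: encode a sine tone as 16-bit signed little-endian PCM. */
public class PcmExample {
    public static byte[] sineAsPcm(double frequencyHz, float sampleRate, int sampleCount) {
        byte[] pcm = new byte[sampleCount * 2]; // two bytes per 16-bit sample
        for (int n = 0; n < sampleCount; n++) {
            // Amplitude of the wave at sample n, in the range -1 to 1
            double amplitude = Math.sin(2.0 * Math.PI * frequencyHz * n / sampleRate);
            // Quantize to the 16-bit signed range
            short sample = (short) Math.round(amplitude * Short.MAX_VALUE);
            pcm[2 * n] = (byte) (sample & 0xff);            // low byte first
            pcm[2 * n + 1] = (byte) ((sample >> 8) & 0xff); // then high byte
        }
        return pcm;
    }
}
```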

How can I calculate PCM data in Java?

I do not understand that question. But guessing it has something to do with the speech-recognition tag, I have some bad news. This might theoretically be done using the Java Speech API. But there are apparently no 'speech to text' implementations available for the API (only 'text to speech').


I have to calculate RMS for a speech-recognition project, but I do not know how to calculate it in Java.

For a single channel whose samples are represented as double values ranging from -1 to 1, you might use this method.

/** Computes the RMS volume of a group of signal sizes ranging from -1 to 1.
 *  The average is subtracted first, so a constant (DC) offset does not count as volume. */
public double volumeRMS(double[] raw) {
    double sum = 0d;
    if (raw.length==0) {
        return sum;
    } else {
        for (int ii=0; ii<raw.length; ii++) {
            sum += raw[ii];
        }
    }
    double average = sum/raw.length;

    double sumMeanSquare = 0d;
    for (int ii=0; ii<raw.length; ii++) {
        sumMeanSquare += Math.pow(raw[ii]-average,2d);
    }
    double averageMeanSquare = sumMeanSquare/raw.length;
    double rootMeanSquare = Math.sqrt(averageMeanSquare);

    return rootMeanSquare;
}

There is a byte buffer that stores the input values from the line. What should I do with this buffer?

If using the volumeRMS(double[]) method, convert the byte values to an array of double values ranging from -1 to 1. ;)
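A minimal sketch of that conversion, assuming the line delivers 16-bit signed, little-endian, single-channel PCM; other formats (8-bit, unsigned, big-endian, stereo) would need different decoding. The class name PcmToDouble and the method toDoubles are illustrative only.

```java
/** Minimal sketch: decode 16-bit signed little-endian mono PCM to doubles. */
public class PcmToDouble {
    /**
     * @param buffer     raw bytes read from the line
     * @param validBytes number of bytes in the buffer that hold real data
     */
    public static double[] toDoubles(byte[] buffer, int validBytes) {
        int sampleCount = validBytes / 2; // two bytes per sample
        double[] samples = new double[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            int lo = buffer[2 * i] & 0xff; // mask: the low byte carries no sign
            int hi = buffer[2 * i + 1];    // the high byte keeps its sign
            short value = (short) ((hi << 8) | lo);
            // Divide by 32768 so the result lies in -1 (inclusive) to 1 (exclusive)
            samples[i] = value / 32768.0;
        }
        return samples;
    }
}
```

Here validBytes would be the count returned by line.read(buffer, 0, buffer.length), and the resulting array can be passed straight to volumeRMS(double[]).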