Explanation of audio stat using sox

Question 1

audio sox

Nguyễn Tài Long · Apr 14, 2017 · Viewed 8.1k times · Source

Answer

I don't know how I've managed to miss stat in the SoX docs all this time; it's right there.

Length
- length of the audio file in seconds
Scaled by
- the value the input is divided by. The default, 2^31-1, maps 32-bit signed integers to the range [-1, 1]
Maximum amplitude
- maximum sample value
Minimum amplitude
- minimum sample value
Midline amplitude
- aka mid-range, midpoint between the max and minimum values.
Mean norm
- arithmetic mean of samples' absolute values
Mean amplitude
- arithmetic mean of samples' values
RMS amplitude
- root mean square, root of squared values' mean
Maximum delta
- maximum difference between two successive samples
Minimum delta
- minimum difference between two successive samples
Mean delta
- arithmetic mean of differences between successive samples
RMS delta
- root mean square of differences between successive samples
Rough frequency
- estimate of the input file's frequency, in hertz. I'm unsure of the method used
Volume adjustment
- value to pass to -v so that the peak absolute amplitude becomes 1
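
Two of these fields can be checked against the output posted in the question. The Rough frequency values there are reproduced by RMS delta / RMS amplitude × sample rate / 2π, and Volume adjustment by 1 / peak absolute amplitude. The sketch below is only an empirical check, not documented SoX behaviour. The helper names rough_freq and volume_adjustment are hypothetical, and the 16000 Hz sample rate is an assumption: both files report exactly 32000 samples per second of audio, which would fit 16 kHz stereo.

```shell
#!/usr/bin/env bash

# Hypothetical helper: frequency estimate from RMS delta, RMS amplitude and
# sample rate in Hz. This formula is an empirical fit to the posted output.
rough_freq() {
  awk -v d="$1" -v a="$2" -v rate="$3" 'BEGIN {
    pi = atan2(0, -1)
    printf "%d\n", int(d / a * rate / (2 * pi))
  }'
}

# Hypothetical helper: gain for -v that brings the peak absolute amplitude to 1.
# Arguments are the Maximum amplitude and Minimum amplitude fields.
volume_adjustment() {
  awk -v hi="$1" -v lo="$2" 'BEGIN {
    if (hi < 0) hi = -hi
    if (lo < 0) lo = -lo
    peak = (hi > lo) ? hi : lo
    printf "%.3f\n", 1 / peak
  }'
}

# Sample rate of 16000 Hz is assumed, see above.
rough_freq 0.039249 0.053763 16000      # input1.flac, stat reported 1859
rough_freq 0.123462 0.211787 16000      # input2.flac, stat reported 1484
volume_adjustment 0.999969 -1.000000    # both files, stat reported 1.000
```

If that formula is what SoX uses, Rough frequency only says how much high-frequency energy there is relative to the overall level, so on its own it is probably a weak indicator of background noise.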

Personally, I'd rather use the stats effect, whose output I find much more useful in practice.

To tell noisier audio from cleaner audio, I'd try the difference between the highest and lowest sound levels. The quietest parts can never be quieter than the background noise alone, so a small difference means the audio is either noisy or simply loud all the time, like a compressed pop song. You could take the difference between the maximum and minimum RMS values, or between the peak level and the minimum RMS. Keep the RMS window fairly short, say between 10 and 200 ms. If the audio has fade-in or fade-out sections, trim them away first; I didn't include that step in the code.

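audio="input1.flac"
width=0.01   # RMS window length in seconds (10 ms)

# Mix multi-channel files down to mono so stats prints a single column.
# stats writes its report to stderr, hence the 2>&1.
# grep keeps the three levels we need; sed strips everything but the number.
stats=$(sox "$audio" -n channels 1 stats -w "$width" 2>&1 |
  grep -E "Pk lev dB|RMS Pk dB|RMS Tr dB" |
  sed 's/[^0-9.-]*//g')

# stats prints these lines in the order: Pk lev dB, RMS Pk dB, RMS Tr dB
peak=$(head -n 1 <<< "$stats")
rmsmax=$(head -n 2 <<< "$stats" | tail -n 1)
rmsmin=$(tail -n 1 <<< "$stats")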

rmsdif=$(bc <<< "scale=3; $rmsmax - $rmsmin")
pkmindif=$(bc <<< "scale=3; $peak - $rmsmin")

echo "
  max RMS: $rmsmax
  min RMS: $rmsmin

  diff RMS: $rmsdif
  peak-min: $pkmindif
"
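
To sort a batch of files into two groups, the diff RMS figure can be compared against a cutoff. This is a minimal sketch, assuming a hypothetical classify_noise helper and a 40 dB default threshold picked purely for illustration. The right cutoff has to be calibrated on files whose noisiness is already known.

```shell
#!/usr/bin/env bash

# Hypothetical helper: label a file from its RMS difference in dB.
# Usage: classify_noise <rmsdif> [threshold]
# The 40 dB default is an illustrative assumption, not a recommended value.
classify_noise() {
  local rmsdif="$1" threshold="${2:-40}"
  # awk exits with status 0 when rmsdif is below the threshold
  if awk -v d="$rmsdif" -v t="$threshold" 'BEGIN { exit !(d < t) }'; then
    echo "noisy"    # little gap between loudest and quietest windows
  else
    echo "clean"    # quiet passages drop well below the loud ones
  fi
}

classify_noise 12.5
classify_noise 55.2
```

Inside a loop over the files, calling classify_noise "$rmsdif" after the computation above would give one label per file, which can then select the set of silence-splitting parameters.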

Question 2

I have a bunch of audio files and need to split each file on silence using SoX. However, some files have a very noisy background and some don't, so I can't use a single set of parameters to iterate over all the files and do the split. I'm trying to figure out how to separate them by background noise. Here is what I got from sox input1.flac -n stat and sox input2.flac -n stat:

Samples read:          18207744
Length (seconds):    568.992000
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.031888
Mean    amplitude:    -0.000361
RMS     amplitude:     0.053763
Maximum delta:         0.858917
Minimum delta:         0.000000
Mean    delta:         0.018609
RMS     delta:         0.039249
Rough   frequency:         1859
Volume adjustment:        1.000

and

Samples read:         198976896
Length (seconds):   6218.028000
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.156168
Mean    amplitude:    -0.000010
RMS     amplitude:     0.211787
Maximum delta:         1.999969
Minimum delta:         0.000000
Mean    delta:         0.091605
RMS     delta:         0.123462
Rough   frequency:         1484
Volume adjustment:        1.000

The former does not have a noisy background and the latter does. I suspect I can use the sample mean or the max delta because of the big gap between the two files. Can anyone explain the meaning of those stats, or at least show me where I can look them up myself? I tried the official documentation, but it doesn't explain them. Many thanks.
