PCM audio amplitude values?

stackoverflowuser2010 picture stackoverflowuser2010 · May 5, 2011 · Viewed 14.2k times · Source

I am starting out with audio recording using my Android smartphone.

I successfully saved voice recordings to a PCM file. When I parse the data and print out the signed, 16-bit values, I can create a graph like the one below. However, I do not understand the amplitude values along the y-axis.

  1. What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?

  2. If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?

  3. Why are there negative values? I would think that complete silence would result in values of 0.

If someone can point me to a website with information on what's being recorded, I would appreciate it. I found webpages on the PCM file format, but not what the data values are.

enter image description here

Answer

hotpaw2 picture hotpaw2 · May 5, 2011

Think of the surface of the microphone. When it's silent, the surface is motionless at position zero. When you talk, that causes the air around your mouth to vibrate. Vibrations are spring like, and have movement in both directions, as in back and forth, or up and down, or in and out. The vibrations in the air cause the microphone surface to vibrate as well, as in move up and down. When it moves down, that might be measured or sampled a positive value. When it moves up that might be sampled as a negative value. (Or it could be the opposite.) When you stop talking the surface settles back down to the zero position.

What numbers you get from your PCM recording data depend on the gain of the system. With common 16 bit samples, the range is from -32768 to 32767 for the largest possible excursion of a vibration that can be recorded without distortion, clipping or overflow. Usually the gain is set a bit lower so that the maximum values aren't right on the edge of distortion.

ADDED:

8-bit PCM audio is often an unsigned data type, with the range from 0..255, with a value of 128 indicating "silence". So you have to add/subtract this bias, as well as scale by about 256 to convert between 8-bit and 16-bit audio PCM waveforms.