How to change pyplot.specgram x and y axis scaling?

minerals picture minerals · Nov 12, 2015 · Viewed 8.3k times · Source

I have never worked with audio signals before and little do I know about signal processing. Nevertheless, I need to represent and audio signal using pyplot.specgram function from matplotlib library. Here is how I do it.

import matplotlib.pyplot as plt
import scipy.io.wavfile as wavfile

rate, frames = wavfile.read("song.wav")
plt.specgram(frames)

The result I am getting is this nice spectrogram below: enter image description here

When I look at x-axis and y-axis which I suppose are frequency and time domains I can't get my head around the fact that frequency is scaled from 0 to 1.0 and time from 0 to 80k. What is the intuition behind it and, what's more important, how to represent it in a human friendly format such that frequency is 0 to 100k and time is in sec?

Answer

oystein picture oystein · Dec 24, 2015

As others have pointed out, you need to specify the sample rate, else you get a normalised frequency (between 0 and 1) and sample index (0 to 80k). Fortunately this is as simple as:

plt.specgram(frames, Fs=rate)

To expand on Nukolas answer and combining my Changing plot scale by a factor in matplotlib and matplotlib intelligent axis labels for timedelta we can not only get kHz on the frequency axis, but also minutes and seconds on the time axis.

import matplotlib.pyplot as plt
import scipy.io.wavfile as wavfile

cmap = plt.get_cmap('viridis') # this may fail on older versions of matplotlib
vmin = -40  # hide anything below -40 dB
cmap.set_under(color='k', alpha=None)

rate, frames = wavfile.read("song.wav")
fig, ax = plt.subplots()
pxx, freq, t, cax = ax.specgram(frames[:, 0], # first channel
                                Fs=rate,      # to get frequency axis in Hz
                                cmap=cmap, vmin=vmin)
cbar = fig.colorbar(cax)
cbar.set_label('Intensity dB')
ax.axis("tight")

# Prettify
import matplotlib
import datetime

ax.set_xlabel('time h:mm:ss')
ax.set_ylabel('frequency kHz')

scale = 1e3                     # KHz
ticks = matplotlib.ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x/scale))
ax.yaxis.set_major_formatter(ticks)

def timeTicks(x, pos):
    d = datetime.timedelta(seconds=x)
    return str(d)
formatter = matplotlib.ticker.FuncFormatter(timeTicks)
ax.xaxis.set_major_formatter(formatter)
plt.show()

Result:

resulting image