generating correct spectrogram using fftw and window function

Boedy picture Boedy · Jan 22, 2014 · Viewed 7.1k times · Source

For a project I need to be able to generate a spectrogram from a .WAV file. I've read the following should be done:

  1. Get N (transform size) samples
  2. Apply a window function
  3. Do a Fast Fourier Transform using the samples
  4. Normalise the output
  5. Generate spectrogram

On the image below you see two spectrograms of a 10000 Hz sine wave both using the hanning window function. On the left you see a spectrogram generated by audacity and on the right my version. As you can see my version has a lot more lines/noise. Is this leakage in different bins? How would I get a clear image like the one audacity generates. Should I do some post-processing? I have not yet done any normalisation because do not fully understand how to do so.

enter image description here

update

I found this tutorial explaining how to generate a spectrogram in c++. I compiled the source to see what differences I could find.

My math is very rusty to be honest so I'm not sure what the normalisation does here:

    for(i = 0; i < half; i++){
        out[i][0] *= (2./transform_size);
        out[i][6] *= (2./transform_size);
        processed[i] = out[i][0]*out[i][0] + out[i][7]*out[i][8];
        //sets values between 0 and 1?
        processed[i] =10. * (log (processed[i] + 1e-6)/log(10)) /-60.;
    }

after doing this I got this image (btw I've inverted the colors):

enter image description here

I then took a look at difference of the input samples provided by my sound library and the one of the tutorial. Mine were way higher so I manually normalised is by dividing it by the factor 32767.9. I then go this image which looks pretty ok I think. But dividing it by this number seems wrong. And I would like to see a different solution.

enter image description here

Here is the full relevant source code.

void Spectrogram::process(){
    int i;
    int transform_size = 1024;
    int half = transform_size/2;
    int step_size = transform_size/2;
    double in[transform_size];
    double processed[half];
    fftw_complex *out;
    fftw_plan p;

    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);


    for(int x=0; x < wavFile->getSamples()/step_size; x++){

        int j = 0;
        for(i = step_size*x; i < (x * step_size) + transform_size - 1; i++, j++){
            in[j] = wavFile->getSample(i)/32767.9;
        }

        //apply window function
        for(i = 0; i < transform_size; i++){
            in[i] *= windowHanning(i, transform_size);
//            in[i] *= windowBlackmanHarris(i, transform_size);
        }

        p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);

        fftw_execute(p); /* repeat as needed */

        for(i = 0; i < half; i++){
            out[i][0] *= (2./transform_size);
            out[i][11] *= (2./transform_size);
            processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13];
            processed[i] =10. * (log (processed[i] + 1e-6)/log(10)) /-60.;
        }

        for (i = 0; i < half; i++){
            if(processed[i] > 0.99)
                processed[i] = 1;
            In->setPixel(x,(half-1)-i,processed[i]*255);
        }


    }


    fftw_destroy_plan(p);
    fftw_free(out);
}

Answer

Alex I picture Alex I · Jan 31, 2014

This is not exactly an answer as to what is wrong but rather a step by step procedure to debug this.

  1. What do you think this line does? processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13] Likely that is incorrect: fftw_complex is typedef double fftw_complex[2], so you only have out[i][0] and out[i][1], where the first is the real and the second the imaginary part of the result for that bin. If the array is contiguous in memory (which it is), then out[i][12] is likely the same as out[i+6][0] and so forth. Some of these will go past the end of the array, adding random values.

  2. Is your window function correct? Print out windowHanning(i, transform_size) for every i and compare with a reference version (for example numpy.hanning or the matlab equivalent). This is the most likely cause, what you see looks like a bad window function, kind of.

  3. Print out processed, and compare with a reference version (given the same input, of course you'd have to print the input and reformat it to feed into pylab/matlab etc). However, the -60 and 1e-6 are fudge factors which you don't want, the same effect is better done in a different way. Calculate like this:

    power_in_db[i] = 10 * log(out[i][0]*out[i][0] + out[i][1]*out[i][1])/log(10)
    
  4. Print out the values of power_in_db[i] for the same i but for all x (a horizontal line). Are they approximately the same?

  5. If everything so far is good, the remaining suspect is setting the pixel values. Be very explicit about clipping to range, scaling and rounding.

    int pixel_value = (int)round( 255 * (power_in_db[i] - min_db) / (max_db - min_db) );
    if (pixel_value < 0) { pixel_value = 0; }
    if (pixel_value > 255) { pixel_value = 255; }
    

Here, again, print out the values in a horizontal line, and compare with the grayscale values in your pgm (by hand, using the colorpicker in photoshop or gimp or similar).

At this point, you will have validated everything from end to end, and likely found the bug.