iPhone Image Processing with Accelerate Framework and vDSP

Roger · May 9, 2011

UPDATE: Please see the additional question below, with more code.

I am trying to code a category for blurring an image. My starting point is Jeff LaMarche's sample here. Whilst this (after the fixes suggested by others) works fine, it is an order of magnitude too slow for my requirements - on a 3GS it takes maybe 3 seconds to do a decent blur and I'd like to get this down to under 0.5 sec for a full screen (faster is better).

He mentions the Accelerate framework as a performance enhancement, so I've spent the last day looking at it, and in particular at vDSP_f3x3, which according to the Apple documentation:

Filters an image by performing a two-dimensional convolution with a 3x3 kernel; single precision.

Perfect - I have a suitable filter matrix, and I have an image ... but this is where I get stumped.
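
For reference, as far as I can tell the call itself operates on planar single-precision buffers and looks roughly like this (just a sketch; the box kernel and the names are mine, standing in for my real filter matrix):

#include <Accelerate/Accelerate.h>

// Sketch of how I understand vDSP_f3x3: src and dst are planar single-precision
// buffers, height rows by width columns, and the 3x3 kernel is also floats.
// (The box kernel here is only a placeholder for my real filter matrix.)
static void Blur3x3(float *src, float *dst, vDSP_Length height, vDSP_Length width)
{
    float kernel[9] = {
        1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f,
        1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f,
        1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f
    };

    // vDSP_f3x3(input, rowCount, columnCount, filter, output)
    vDSP_f3x3(src, height, width, kernel, dst);
}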

vDSP_f3x3 assumes the image data is (float *), but my image comes from:

srcData = (unsigned char *)CGBitmapContextGetData (context);

and the context comes from CGBitmapContextCreate with kCGImageAlphaPremultipliedFirst, so my srcData is really ARGB with 8 bits per component.

I suspect what I really need is a context with float components, but according to the Quartz documentation here, kCGBitmapFloatComponents is only available on Mac OS X and not on iOS :-(

Is there a really fast way, using the Accelerate framework, to convert the integer components I have into the float components that vDSP_f3x3 needs? I mean, I could do it myself with plain loops (along the lines of the sketch below), but by the time I do that, then the convolution, and then the conversion back, I suspect I'll have made things even slower than they are now, since I might as well convolve as I go.
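
The scalar version I mean is just this sort of thing (purely illustrative):

#include <stddef.h>

// The sort of scalar loops I mean: convert the 8-bit components to float,
// run the convolution, then convert back. count would be width * height * 4
// for the ARGB bitmap above.
static void bytesToFloats(const unsigned char *src, float *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = (float)src[i];
}

static void floatsToBytes(const float *src, unsigned char *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = (unsigned char)src[i];   // assumes values are already in 0-255
}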

Maybe I have the wrong approach?

Does anyone have some tips for me, having done some image processing on the iPhone using vDSP? The documentation I can find is very reference-oriented and not very newbie friendly when it comes to this sort of thing.

If anyone has a reference for really fast, high-quality blurring (not the reduce-the-resolution-and-rescale stuff I've seen, which looks pants), that would be fab!

EDIT:

Thanks @Jason. I've done this and it is almost working, but now my problem is that although the image does blur, it shifts left by 1 pixel on every invocation. It also seems to make the image black and white, but that could be something else.

Is there anything in this code that leaps out as obviously incorrect? I haven't optimised it yet and it's a bit rough, but hopefully the convolution code is clear enough.

CGImageRef CreateCGImageByBlurringImage(CGImageRef inImage, NSUInteger pixelRadius, NSUInteger gaussFactor)
{
    unsigned char *srcData, *finalData = NULL;

    CGContextRef context = CreateARGBBitmapContext(inImage);
    if (context == NULL)
        return NULL;

    size_t width = CGBitmapContextGetWidth(context);
    size_t height = CGBitmapContextGetHeight(context);
    size_t bpr = CGBitmapContextGetBytesPerRow(context);

    int componentsPerPixel = 4; // ARGB

    CGRect rect = {{0,0},{width,height}};
    CGContextDrawImage(context, rect, inImage);

    // Now we can get a pointer to the image data associated with the bitmap
    // context.

    srcData = (unsigned char *)CGBitmapContextGetData(context);

    if (srcData != NULL)
    {
        size_t dataSize = bpr * height;
        finalData = malloc(dataSize);
        memcpy(finalData, srcData, dataSize);

        // Generate the Gaussian kernel

        float *kernel;

        // Limit the pixelRadius

        pixelRadius = MIN(MAX(1, pixelRadius), 248);
        int kernelSize = pixelRadius * 2 + 1;

        kernel = malloc(kernelSize * sizeof *kernel);

        float gauss_sum = 0;

        for (int i = 0; i < pixelRadius; i++)
        {
            kernel[i] = 1 + (gaussFactor * i);
            kernel[kernelSize - (i + 1)] = 1 + (gaussFactor * i);
            gauss_sum += (kernel[i] + kernel[kernelSize - (i + 1)]);
        }

        kernel[(kernelSize - 1) / 2] = 1 + (gaussFactor * pixelRadius);

        gauss_sum += kernel[(kernelSize - 1) / 2];

        // Scale the kernel

        for (int i = 0; i < kernelSize; ++i) {
            kernel[i] = kernel[i] / gauss_sum;
        }

        float *srcAsFloat, *resultAsFloat;

        srcAsFloat = malloc(width * height * sizeof(float) * componentsPerPixel);
        resultAsFloat = malloc(width * height * sizeof(float) * componentsPerPixel);

        // Convert the uint8 source ARGB to floats

        vDSP_vfltu8(srcData, 1, srcAsFloat, 1, width * height * componentsPerPixel);

        // Convolve with the kernel (the -1 stride walks the kernel backwards,
        // turning vDSP's correlation into a convolution)

        vDSP_conv(srcAsFloat, 1, &kernel[kernelSize - 1], -1, resultAsFloat, 1, width * height * componentsPerPixel, kernelSize);

        // Copy the floats back to ints

        vDSP_vfixu8(resultAsFloat, 1, finalData, 1, width * height * componentsPerPixel);

        free(resultAsFloat);
        free(srcAsFloat);
        free(kernel);
    }

    size_t bitmapByteCount = bpr * height;

    CGDataProviderRef dataProvider = CGDataProviderCreateWithData(NULL, finalData, bitmapByteCount, &providerRelease);

    CGImageRef cgImage = CGImageCreate(width, height, CGBitmapContextGetBitsPerComponent(context),
                                       CGBitmapContextGetBitsPerPixel(context), CGBitmapContextGetBytesPerRow(context),
                                       CGBitmapContextGetColorSpace(context), CGBitmapContextGetBitmapInfo(context),
                                       dataProvider, NULL, true, kCGRenderingIntentDefault);

    CGDataProviderRelease(dataProvider);
    CGContextRelease(context);

    return cgImage;
}

I should add that if I comment out the vDSP_conv line and change the line following it to:

       vDSP_vfixu8(srcAsFloat, 1, finalData, 1, width*height*componentsPerPixel);

Then, as expected, my result is a clone of the original source, in colour and not shifted left. This implies to me that it IS the convolution that is going wrong, but I can't see where :-(

THOUGHT: Actually, thinking about this, it seems to me that the convolution needs to know the input pixels are in ARGB format, as otherwise it will be multiplying the values together with no knowledge of their meaning (i.e. it will mix R with B, etc.). I think that would explain why I get a B&W result, but not the shift. Again, I think there might need to be more to it than my naive version here ...

FINAL THOUGHT: I think the shifting left is a natural result of the filter and I need to look at the image dimensions and possibly pad it out ... so I think the code is actually working OK given what I've fed it.
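
If those two hunches are right, then presumably what I need is to pull each channel out into its own planar float buffer, blur it there, and pad the buffer so the output stays centred instead of shifting. Something like this sketch is what I'm picturing (untested, names are mine; note it still treats the whole plane as one long row, so row ends will bleed into the next row exactly as in my code above):

#include <Accelerate/Accelerate.h>
#include <stdlib.h>

// Sketch: blur one channel of interleaved 8-bit ARGB data with vDSP_conv,
// zero-padding the input so the output stays centred rather than shifting left.
// channelOffset is 0..3 (A, R, G, B); kernel holds kernelSize (odd) normalised taps.
static void BlurOneChannel(const unsigned char *src, unsigned char *dst,
                           size_t pixelCount, int channelOffset,
                           const float *kernel, int kernelSize)
{
    size_t half   = (kernelSize - 1) / 2;
    size_t padded = pixelCount + kernelSize - 1;

    float *in  = calloc(padded, sizeof(float));      // calloc gives the zero padding
    float *out = malloc(pixelCount * sizeof(float));

    // Deinterleave: a stride of 4 steps over A,R,G,B; write into the padded middle.
    vDSP_vfltu8(src + channelOffset, 4, in + half, 1, pixelCount);

    // The reversed kernel pointer with stride -1 turns vDSP_conv's correlation into
    // a convolution; the half-kernel input offset keeps output pixel n aligned with
    // source pixel n, so nothing shifts.
    vDSP_conv(in, 1, &kernel[kernelSize - 1], -1, out, 1, pixelCount, kernelSize);

    // Reinterleave the blurred channel back into its slot in the ARGB output.
    vDSP_vfixu8(out, 1, dst + channelOffset, 4, pixelCount);

    free(out);
    free(in);
}

Called three times (offsets 1, 2 and 3 for R, G and B, leaving alpha alone), I think that would keep the colours separate and kill the shift, at the cost of three passes over the data.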

Answer

Brad Larson · May 10, 2011

While the Accelerate framework will be faster than simple serial code, you'll probably never get the best possible blurring performance out of it.

My suggestion would be to use an OpenGL ES 2.0 shader (for devices that support this API) to do a two-pass box blur. Based on my benchmarks, the GPU can handle these kinds of image manipulation operations at 14-28X the speed of the CPU on an iPhone 4, versus the maybe 4.5X that Apple reports for the Accelerate framework in the best cases.

Some code for this is described in this question, as well as in the "Post-Processing Effects on Mobile Devices" chapter in the GPU Pro 2 book (for which the sample code can be found here). By placing your image in a texture, then reading values in between pixels, bilinear filtering on the GPU gives you some blurring for free, which can then be combined with a few fast lookups and averaging operations.

If you need a starting project to feed images into the GPU for processing, you might be able to use my sample application from the article here. That sample application passes AVFoundation video frames as textures into a processing shader, but you can modify it to send in your particular image data and run your blur operation. You should be able to use my glReadPixels() code to then retrieve the blurred image for later use.

Since I originally wrote this answer, I've created an open source image and video processing framework for doing these kinds of operations on the GPU. The framework has several different blur types within it, all of which can be applied very quickly to images or live video. The GPUImageGaussianBlurFilter, which applies a standard 9-tap Gaussian blur, runs in 16 ms for a 640x480 frame of video on the iPhone 4. The GPUImageFastBlurFilter is a modified 9-tap Gaussian blur that uses hardware filtering, and it runs in 2.0 ms for that same video frame. Likewise, there's a GPUImageBoxBlurFilter that uses a 5-pixel box and runs in 1.9 ms for the same image on the same hardware. I also have median and bilateral blur filters, although they need a little performance tuning.
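
For a still image, the host-side code amounts to something like the sketch below (the convenience method names have shifted slightly between versions of the framework, so treat this as illustrative):

#import "GPUImage.h"

// Blur a still UIImage on the GPU using GPUImage.
UIImage *CreateBlurredImage(UIImage *inputImage)
{
    GPUImageGaussianBlurFilter *blurFilter = [[GPUImageGaussianBlurFilter alloc] init];

    // Uploads the image as a texture, runs the blur shader passes, and reads
    // the result back into a UIImage.
    return [blurFilter imageByFilteringImage:inputImage];
}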

In my benchmarks, Accelerate doesn't come close to these kinds of speeds, especially when it comes to filtering live video.