I am working on iOS application where user can apply a certain set of photo filters. Each filter is basically set of Photoshop actions with a specific parameters. This actions are:
I've repeated all this actions in my code using arithmetic expressions looping through the all pixels in image. But when I run my app on iPhone 4, each filter takes about 3-4 sec to apply which is quite a few time for the user to wait. The image size is 640 x 640 px which is @2x of my view size because it's displayed on Retina display. I've found that my main problem is levels modifications which are calling the pow() C function each time I need to adjust the gamma. I am using floats not doubles of course because ARMv6 and ARMv7 are slow with doubles. Tried to enable and disable Thumb and got the same result.
Example of the simplest filter in my app which is runs pretty fast though (2 secs). The other filters includes more expressions and pow() calls thus making them slow.
https://gist.github.com/1156760
I've seen some solutions which are using Accelerate Framework vDSP matrix transformations for fast image modifications. I've also seen OpenGL ES solutions. I am not sure that they are capable of my needs. But probably it's just a matter of translating my set of changes into some good convolution matrix?
Any advice would be helpful.
Thanks,
Andrey.
For the filter in your example code, you could use a lookup table to make it much faster. I assume your input image is 8 bits per color and you are converting it to float before passing it to this function. For each color, this only gives 256 possible values and therefore only 256 possible output values. You could precompute these and store them in an array. This would avoid the pow()
calculation and the bounds checking since you could factor them into the precomputation.
It would look something like this:
unsigned char table[256];
for(int i=0; i<256; i++) {
float tmp = pow((float)i/255.0f, 1.3f) * 255.0;
table[i] = tmp > 255 ? 255 : (unsigned char)tmp;
}
for(int i=0; i<length; ++i)
m_OriginalPixelBuf[i] = table[m_OriginalPixelBuf[i]];
In this case, you only have to perform pow()
256 times instead of 3*640*640 times. You would also avoid the branching caused by the bounds checking in your main image loop which can be costly. You would not have to convert to float either.
Even a faster way may be to precompute the table outside the program and just put the 256 coefficients in the code.
None of the operations you have listed there should require a convolution or even a matrix multiply. They are all pixel-wise operations, meaning that each output pixel only depends on the single corresponding input pixel. You would need to consider convolution for operations like blurring or sharpening where multiple input pixels affect a single output pixel.