YUV to RGBA on Apple A4, should I use shaders or NEON?

pawlowski · Dec 9, 2011 · Viewed 7.4k times

I'm writing a media player framework for Apple TV, using OpenGL ES and ffmpeg. Rendering with OpenGL ES requires conversion to RGBA, and software conversion with swscale is unbearably slow. From what I found online, I came up with two ideas: using NEON (like here), or using fragment shaders with GL_LUMINANCE and GL_LUMINANCE_ALPHA textures.

As I know almost nothing about OpenGL, the second option still doesn't work :)

Can you give me any pointers on how to proceed? Thank you in advance.

Answer

jpap · Dec 28, 2011

It is most definitely worth learning OpenGL ES 2.0 shaders:

  1. You can load-balance between the GPU and CPU (e.g. the CPU decodes subsequent frames while the GPU renders the current frame).
  2. Video frames need to go to the GPU in any case. With 4:2:0 sampled chrominance, YCbCr takes 12 bits per pixel, which saves 25% of the bus bandwidth relative to 16-bit RGB565 and 62.5% relative to 32-bit RGBA.
  3. You get 4:2:0 chrominance up-sampling for free from the GPU's hardware texture interpolator. (Your shader should use the same texture coordinates for both the Y and C{b,r} textures, in effect stretching the chrominance texture out over the same area.)
  4. On iOS 5, pushing YCbCr textures to the GPU is fast (no data copy or swizzling) with the texture cache (see the CVOpenGLESTextureCache* API functions). You will save 1-2 data copies compared to NEON.
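The bandwidth figures in point 2 follow from simple plane arithmetic. The sketch below is only an illustration, assuming 8-bit samples; the function names are hypothetical and not part of any API.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per frame for 8-bit 4:2:0 YCbCr: one full-resolution Y plane plus
 * Cb and Cr planes at half resolution in each direction (rounded up so
 * that odd frame dimensions are still covered). */
size_t yuv420_frame_bytes(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    size_t chroma_w = (width + 1) / 2;
    size_t chroma_h = (height + 1) / 2;
    return width * height + 2 * chroma_w * chroma_h;
}

/* Bytes per frame for a packed format such as RGB565 (2 bytes per pixel)
 * or RGBA8888 (4 bytes per pixel). */
size_t packed_frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}
```

For a 1280x720 frame this gives 1,382,400 bytes for 4:2:0 YCbCr, against 1,843,200 bytes for RGB565 and 3,686,400 bytes for RGBA.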

I am using these techniques to great effect in my super-fast iPhone camera app, SnappyCam.

You are on the right track for implementation: use a GL_LUMINANCE texture for Y and a GL_LUMINANCE_ALPHA texture for CbCr if your chrominance is interleaved. If the three components are stored in separate planes (as in ffmpeg's yuv420p), use three GL_LUMINANCE textures instead.

Creating two textures for 4:2:0 bi-planar YCbCr (where CbCr is interleaved) is straightforward:

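    glBindTexture(GL_TEXTURE_2D, texture_y);
    glTexImage2D(
        GL_TEXTURE_2D,
        0,                   // Mipmap level
        GL_LUMINANCE,        // Internal format (8 bits per texel)
        width,
        height,
        0,                   // No border
        GL_LUMINANCE,        // Source format (8 bits per texel)
        GL_UNSIGNED_BYTE,    // Source data type
        NULL                 // Allocate only; pixels are uploaded later
    );
    // Video frames are rarely power-of-two sized. ES 2.0 then requires
    // non-mipmapped filtering and clamp-to-edge wrapping, otherwise the
    // texture is incomplete and samples as black.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    glBindTexture(GL_TEXTURE_2D, texture_cbcr);
    glTexImage2D(
        GL_TEXTURE_2D,
        0,                   // Mipmap level
        GL_LUMINANCE_ALPHA,  // Internal format (16 bits per texel)
        (width + 1) / 2,     // Half resolution, rounded up for odd sizes
        (height + 1) / 2,
        0,                   // No border
        GL_LUMINANCE_ALPHA,  // Source format (16 bits per texel)
        GL_UNSIGNED_BYTE,    // Source data type
        NULL                 // Allocate only; pixels are uploaded later
    );
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);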

You would then use glTexSubImage2D() or the iOS 5 texture cache to update these textures for each frame.
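One practical detail when going the glTexSubImage2D() route with ffmpeg: decoded planes are often padded, so the distance between rows (the frame's linesize) can exceed the visible row width, and OpenGL ES 2.0 has no GL_UNPACK_ROW_LENGTH to skip that padding. A possible workaround, sketched here under the assumption of a non-negative stride, is to repack each plane into a tight buffer first. The helper name pack_plane is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a plane whose rows are `stride` bytes apart into a tightly packed
 * buffer with exactly `row_bytes` bytes per row. For the Y plane,
 * row_bytes is the frame width; for an interleaved CbCr plane it is
 * 2 * chroma_width. `dst` must hold row_bytes * rows bytes. */
void pack_plane(uint8_t *dst, const uint8_t *src,
                size_t row_bytes, size_t rows, size_t stride)
{
    assert(dst != NULL && src != NULL);
    assert(stride >= row_bytes);

    if (stride == row_bytes) {
        /* Already tight: a single copy suffices (or upload src directly). */
        memcpy(dst, src, row_bytes * rows);
        return;
    }
    for (size_t r = 0; r < rows; ++r)
        memcpy(dst + r * row_bytes, src + r * stride, row_bytes);
}

/* Intended use with the textures above (not compiled here):
 *
 *   glPixelStorei(GL_UNPACK_ALIGNMENT, 1);   // rows need not be 4-byte aligned
 *   glBindTexture(GL_TEXTURE_2D, texture_y);
 *   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
 *                   GL_LUMINANCE, GL_UNSIGNED_BYTE, packed_y);
 */
```

When the stride already equals the row width, the plane can be passed to glTexSubImage2D() as-is and the extra copy disappears.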

I'd also recommend passing the texture coordinates to the fragment shader in a 2D varying that spans the texture coordinate space (x: [0,1], y: [0,1]) and sampling with it unmodified, so that you avoid any dependent texture reads in your fragment shader. The end result is super-fast and, in my experience, puts hardly any load on the GPU.
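For orientation, a shader pair along these lines might look like the sketch below, stored as C strings ready for glShaderSource(). It is an illustration rather than the exact shaders used in the app mentioned above. The attribute, uniform, and varying names are hypothetical, and the coefficients assume BT.601 video-range input (Y in 16..235), which should be checked against the actual stream. The C function applies the same arithmetic on the CPU so the constants can be verified without a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Vertex shader: forwards the texture coordinate untouched, so the
 * fragment shader's texture reads are not dependent reads. */
const char *const kVertexShaderSrc =
    "attribute vec4 a_position;\n"
    "attribute vec2 a_texcoord;\n"
    "varying highp vec2 v_texcoord;\n"
    "void main() {\n"
    "    gl_Position = a_position;\n"
    "    v_texcoord  = a_texcoord;\n"
    "}\n";

/* Fragment shader: Y comes from the GL_LUMINANCE texture (.r); Cb and Cr
 * come from the GL_LUMINANCE_ALPHA texture (.r and .a respectively). */
const char *const kFragmentShaderSrc =
    "precision mediump float;\n"
    "varying highp vec2 v_texcoord;\n"
    "uniform sampler2D u_tex_y;\n"
    "uniform sampler2D u_tex_cbcr;\n"
    "void main() {\n"
    "    float y = 1.164 * (texture2D(u_tex_y, v_texcoord).r - 16.0 / 255.0);\n"
    "    vec2  c = texture2D(u_tex_cbcr, v_texcoord).ra - vec2(128.0 / 255.0);\n"
    "    gl_FragColor = vec4(y + 1.596 * c.y,\n"
    "                        y - 0.392 * c.x - 0.813 * c.y,\n"
    "                        y + 2.017 * c.x,\n"
    "                        1.0);\n"
    "}\n";

/* Clamp to 0..255 and round to the nearest integer. */
static uint8_t clamp_u8(float v)
{
    if (v < 0.0f)   return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* CPU reference for the fragment shader's math, in 8-bit units. */
void ycbcr601_to_rgb(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    assert(rgb != NULL);
    float yf = 1.164f * ((float)y - 16.0f);
    float u  = (float)cb - 128.0f;
    float v  = (float)cr - 128.0f;
    rgb[0] = clamp_u8(yf + 1.596f * v);
    rgb[1] = clamp_u8(yf - 0.392f * u - 0.813f * v);
    rgb[2] = clamp_u8(yf + 2.017f * u);
}
```

Both samplers read with the same v_texcoord, which is what lets the hardware interpolator stretch the half-resolution chrominance texture over the full frame.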