How to render Android's YUV-NV21 camera image on the background in libgdx with OpenGLES 2.0 in real-time?

Ayberk Özgür · Mar 17, 2014 · Viewed 25.7k times

I'm experienced with Android but relatively new to GL/libgdx. The task I need to solve, namely rendering the Android camera's YUV-NV21 preview image to the screen background inside libgdx in real time, is multi-faceted. Here are the main concerns:

  1. Android camera's preview image is only guaranteed to be in the YUV-NV21 space (and in the similar YV12 space, where the U and V channels are not interleaved but grouped). Assuming that most modern devices will provide implicit RGB conversion is VERY wrong, e.g. the newest Samsung Note 10.1 2014 version only provides the YUV formats. Since nothing can be drawn to the screen in OpenGL unless it is in RGB, the color space must somehow be converted.

  2. The example in the libgdx documentation (Integrating libgdx and the device camera) uses an Android surface view that sits below everything else and draws the image on it with GLES 1.1. Since the beginning of March 2014, OpenGLES 1.x support has been removed from libgdx because it is obsolete and nearly all devices now support GLES 2.0. If you try the same sample with GLES 2.0, the 3D objects you draw on the image will be half-transparent. Since the surface behind has nothing to do with GL, this cannot really be controlled. Disabling BLENDING/TRANSLUCENCY does not work. Therefore, rendering this image must be done purely in GL.

  3. This has to be done in real-time, so the color space conversion must be VERY fast. Software conversion using Android bitmaps will probably be too slow.

  4. As a side-feature, the camera image must be accessible from the Android code in order to perform tasks other than drawing it on the screen, e.g. sending it to a native image processor through JNI.

The question is, how is this task done properly and as fast as possible?

Answer

Ayberk Özgür · Mar 17, 2014

The short answer is to load the camera image channels (Y,UV) into textures and draw these textures onto a Mesh using a custom fragment shader that will do the color space conversion for us. Since this shader runs on the GPU, it will be much faster than doing the conversion on the CPU, and certainly far faster than doing it in Java code. Since this mesh is part of GL, any other 3D shapes or sprites can be safely drawn over or under it.

I solved the problem starting from this answer: https://stackoverflow.com/a/17615696/1525238. I understood the general method using the following link: How to use camera view with OpenGL ES. It is written for Bada, but the principles are the same. The conversion formulas there were a bit weird, so I replaced them with the ones in the Wikipedia article YUV Conversion to/from RGB.

The following are the steps leading to the solution:

YUV-NV21 explanation

Live images from the Android camera are preview images. The default color space (and one of the two guaranteed color spaces) is YUV-NV21 for camera preview. The explanation of this format is very scattered, so I'll explain it here briefly:

The image data is made of (width x height) x 3/2 bytes. The first width x height bytes are the Y channel, 1 brightness byte for each pixel. The following (width / 2) x (height / 2) x 2 = width x height / 2 bytes are the UV plane. Each pair of consecutive bytes holds the V,U chroma bytes (in that order, according to the NV21 specification) for a block of 2 x 2 = 4 original pixels. In other words, the UV plane is (width / 2) x (height / 2) pixels in size and is downsampled by a factor of 2 in each dimension. In addition, the V,U chroma bytes are interleaved.
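To make the layout above concrete, here is a minimal sketch of the index arithmetic in plain Java. The class and method names (Nv21Layout, frameSize, yIndex, vIndex, uIndex) are made up for illustration and are not part of Android or libgdx. It also assumes that width and height are even and that rows carry no padding.

```java
/** Index arithmetic for an NV21 frame (hypothetical helper, assumes even width and height). */
final class Nv21Layout {
    private Nv21Layout() {}

    /** Total frame size in bytes: a full-resolution Y plane plus a half-size interleaved VU plane. */
    static int frameSize(int width, int height) {
        return width * height * 3 / 2;
    }

    /** Offset of the Y (brightness) byte of pixel (x, y). */
    static int yIndex(int width, int x, int y) {
        return y * width + x;
    }

    /** Offset of the V byte shared by the 2x2 block that contains pixel (x, y). */
    static int vIndex(int width, int height, int x, int y) {
        // The VU plane starts right after the Y plane. Each of its rows holds
        // width/2 chroma pairs of 2 bytes each, i.e. width bytes per row.
        return width * height + (y / 2) * width + (x / 2) * 2;
    }

    /** Offset of the U byte, which directly follows V in NV21. */
    static int uIndex(int width, int height, int x, int y) {
        return vIndex(width, height, x, y) + 1;
    }
}
```

For a 1280x720 frame, frameSize gives 1382400 bytes, the first V byte sits at offset 921600 (right after the Y plane), and the four pixels (0,0), (1,0), (0,1) and (1,1) all share the same V and U bytes.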

Here is a very nice image that explains YUV-NV12; NV21 is the same layout with the U and V bytes swapped:

[Image: YUV-NV12 memory layout]

How to convert this format to RGB?

As stated in the question, this conversion would take too much time to be live if done inside the Android code. Luckily, it can be done inside a GL shader, which runs on the GPU. This will allow it to run VERY fast.
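For reference, the per-pixel arithmetic looks like this in plain Java. This is only a sketch for checking shader output on a handful of pixels, not something to run on every frame. It uses the same constants as the fragment shader in this answer. The class name YuvToRgbReference and the clamping and rounding to 8 bits are my own assumptions about how to mimic what ends up in the framebuffer.

```java
/** CPU reference for the YUV to RGB arithmetic (hypothetical helper for spot checks). */
final class YuvToRgbReference {
    private YuvToRgbReference() {}

    /** Converts one pixel; inputs are the raw 0..255 byte values, output is packed ARGB. */
    static int toArgb(int yByte, int uByte, int vByte) {
        // Textures deliver bytes as floats in 0..1; chroma is then centered around zero
        float y = (yByte & 0xFF) / 255f;
        float u = (uByte & 0xFF) / 255f - 0.5f;
        float v = (vByte & 0xFF) / 255f - 0.5f;

        // Same conversion constants as in the fragment shader
        float r = y + 1.13983f * v;
        float g = y - 0.39465f * u - 0.58060f * v;
        float b = y + 2.03211f * u;

        return 0xFF000000 | (to8Bit(r) << 16) | (to8Bit(g) << 8) | to8Bit(b);
    }

    /** Clamps to 0..1 and scales to 0..255 (assumed to approximate the framebuffer write). */
    private static int to8Bit(float c) {
        float clamped = Math.max(0f, Math.min(1f, c));
        return Math.round(clamped * 255f);
    }
}
```

Note that a neutral chroma byte of 128 maps to roughly 0.002 rather than exactly 0 after subtracting 0.5, so a gray input comes out within a level or two of gray rather than bit-exact.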

The general idea is to pass our image's channels as textures to the shader and render them in a way that does the RGB conversion. For this, we first have to copy the channels in our image to buffers that can be passed to textures:

byte[] image;
ByteBuffer yBuffer, uvBuffer;

...

yBuffer.put(image, 0, width*height);
yBuffer.position(0);

uvBuffer.put(image, width*height, width*height/2);
uvBuffer.position(0);
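The snippet above leaves out the allocation. A self-contained sketch of the same copy, runnable without Android or libgdx, could look like the following. The class name Nv21Planes and its update method are hypothetical. It uses clear() and flip() instead of position(0), which additionally sets the limit to the number of bytes written.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Splits an NV21 byte array into direct Y and VU buffers (hypothetical helper). */
final class Nv21Planes {
    final ByteBuffer yBuffer;
    final ByteBuffer uvBuffer;
    private final int ySize;
    private final int uvSize;

    Nv21Planes(int width, int height) {
        ySize = width * height;
        uvSize = width * height / 2;
        // Direct buffers live in native memory, which is what GL upload calls expect
        yBuffer = ByteBuffer.allocateDirect(ySize).order(ByteOrder.nativeOrder());
        uvBuffer = ByteBuffer.allocateDirect(uvSize).order(ByteOrder.nativeOrder());
    }

    /** Copies the current frame into the two buffers and rewinds them for reading. */
    void update(byte[] image) {
        if (image.length < ySize + uvSize) {
            throw new IllegalArgumentException("NV21 frame too small: " + image.length);
        }
        yBuffer.clear();
        yBuffer.put(image, 0, ySize);        // first width*height bytes: Y plane
        yBuffer.flip();

        uvBuffer.clear();
        uvBuffer.put(image, ySize, uvSize);  // remaining width*height/2 bytes: VU plane
        uvBuffer.flip();
    }
}
```

Calling update once per frame reuses the same two buffers, so nothing is allocated inside the render loop.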

Then, we pass these buffers to actual GL textures:

/*
 * Prepare the Y channel texture
 */

//Set texture slot 0 as active and bind our texture object to it
Gdx.gl.glActiveTexture(GL20.GL_TEXTURE0);
yTexture.bind();

//Y texture is (width*height) in size and each pixel is one byte; 
//by setting GL_LUMINANCE, OpenGL puts this byte into R,G and B 
//components of the texture
Gdx.gl.glTexImage2D(GL20.GL_TEXTURE_2D, 0, GL20.GL_LUMINANCE, 
    width, height, 0, GL20.GL_LUMINANCE, GL20.GL_UNSIGNED_BYTE, yBuffer);

//Use linear interpolation when magnifying/minifying the texture to 
//areas larger/smaller than the texture size
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_MIN_FILTER, GL20.GL_LINEAR);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_MAG_FILTER, GL20.GL_LINEAR);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_WRAP_S, GL20.GL_CLAMP_TO_EDGE);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_WRAP_T, GL20.GL_CLAMP_TO_EDGE);

/*
 * Prepare the UV channel texture
 */

//Set texture slot 1 as active and bind our texture object to it
Gdx.gl.glActiveTexture(GL20.GL_TEXTURE1);
uvTexture.bind();

//UV texture is (width/2*height/2) in size (downsampled by 2 in 
//both dimensions, each pixel corresponds to 4 pixels of the Y channel) 
//and each pixel is two bytes. By setting GL_LUMINANCE_ALPHA, OpenGL 
//puts the first byte (V) into the R,G and B components of the texture
//and the second byte (U) into the A component of the texture. That's 
//why we find U and V at A and R respectively in the fragment shader code.
//Note that we could also have read V from G or B. 
Gdx.gl.glTexImage2D(GL20.GL_TEXTURE_2D, 0, GL20.GL_LUMINANCE_ALPHA, 
    width/2, height/2, 0, GL20.GL_LUMINANCE_ALPHA, GL20.GL_UNSIGNED_BYTE, 
    uvBuffer);

//Use linear interpolation when magnifying/minifying the texture to 
//areas larger/smaller than the texture size
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_MIN_FILTER, GL20.GL_LINEAR);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_MAG_FILTER, GL20.GL_LINEAR);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_WRAP_S, GL20.GL_CLAMP_TO_EDGE);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_WRAP_T, GL20.GL_CLAMP_TO_EDGE);

Next, we render the mesh we prepared earlier (it covers the entire screen). The shader will take care of rendering the bound textures on the mesh:

shader.begin();

//Set the uniform y_texture object to the texture at slot 0
shader.setUniformi("y_texture", 0);

//Set the uniform uv_texture object to the texture at slot 1
shader.setUniformi("uv_texture", 1);

mesh.render(shader, GL20.GL_TRIANGLES);
shader.end();

Finally, the shader takes over the task of rendering our textures to the mesh. The fragment shader that achieves the actual conversion looks like the following:

String fragmentShader = 
    "#ifdef GL_ES\n" +
    "precision highp float;\n" +
    "#endif\n" +

    "varying vec2 v_texCoord;\n" +
    "uniform sampler2D y_texture;\n" +
    "uniform sampler2D uv_texture;\n" +

    "void main (void){\n" +
    "   float r, g, b, y, u, v;\n" +

    //We had put the Y values of each pixel to the R,G,B components by 
    //GL_LUMINANCE, that's why we're pulling it from the R component,
    //we could also use G or B
    "   y = texture2D(y_texture, v_texCoord).r;\n" + 

    //We had put the U and V values of each pixel to the A and R,G,B 
    //components of the texture respectively using GL_LUMINANCE_ALPHA. 
    //Since U,V bytes are interleaved in the texture, this is probably 
    //the fastest way to use them in the shader
    "   u = texture2D(uv_texture, v_texCoord).a - 0.5;\n" +
    "   v = texture2D(uv_texture, v_texCoord).r - 0.5;\n" +

    //The numbers are just YUV to RGB conversion constants
    "   r = y + 1.13983*v;\n" +
    "   g = y - 0.39465*u - 0.58060*v;\n" +
    "   b = y + 2.03211*u;\n" +

    //We finally set the RGB color of our pixel
    "   gl_FragColor = vec4(r, g, b, 1.0);\n" +
    "}\n"; 

Please note that we are accessing the Y and UV textures using the same coordinate variable v_texCoord. This works because v_texCoord is a normalized texture coordinate between 0.0 and 1.0 that scales from one end of the texture to the other, as opposed to actual texture pixel coordinates, so the same value addresses the matching location in both textures even though the UV texture has half the resolution. This is one of the nicest features of shaders.
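As a rough illustration of why one coordinate works for both textures, the sketch below maps a normalized coordinate to a texel index the way a nearest-neighbour lookup would. The class name TexCoordMapping is hypothetical, and with GL_LINEAR filtering (as used in this answer) the GPU blends neighbouring texels instead of picking a single one, so take this as a simplification.

```java
/** Maps a normalized texture coordinate to a texel index (simplified, nearest-neighbour). */
final class TexCoordMapping {
    private TexCoordMapping() {}

    /** Returns the texel index along one axis for a coordinate in 0.0..1.0. */
    static int texel(float coord, int size) {
        int index = (int) Math.floor(coord * size);
        // Clamp so that coord == 1.0 still lands on the last texel
        return Math.max(0, Math.min(size - 1, index));
    }
}
```

With a 1280 pixel wide Y texture and a 640 pixel wide UV texture, a coordinate of 0.5 lands on Y texel 640 and UV texel 320, which is exactly the chroma pair covering that Y pixel.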

The full source code

Since libgdx is cross-platform, we need an object that handles the device camera and rendering and that can be implemented differently on each platform. For example, you might want to bypass YUV-RGB shader conversion altogether if you can get the hardware to provide you with RGB images. For this reason, we need a device camera controller interface that will be implemented by each different platform:

public interface PlatformDependentCameraController {

    void init();

    void renderBackground();

    void destroy();
} 

The Android version of this interface is as follows (the live camera image is assumed to be 1280x720 pixels):

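import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import android.hardware.Camera;

import com.badlogic.gdx.Gdx;
import com.badlogic.gdx.graphics.GL20;
import com.badlogic.gdx.graphics.Mesh;
import com.badlogic.gdx.graphics.Pixmap.Format;
import com.badlogic.gdx.graphics.Texture;
import com.badlogic.gdx.graphics.VertexAttribute;
import com.badlogic.gdx.graphics.VertexAttributes.Usage;
import com.badlogic.gdx.graphics.glutils.ShaderProgram;

public class AndroidDependentCameraController implements PlatformDependentCameraController, Camera.PreviewCallback {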

    private static byte[] image; //The image buffer that will hold the camera image when preview callback arrives

    private Camera camera; //The camera object

    //The Y and UV buffers that will pass our image channel data to the textures
    private ByteBuffer yBuffer;
    private ByteBuffer uvBuffer;

    ShaderProgram shader; //Our shader
    Texture yTexture; //Our Y texture
    Texture uvTexture; //Our UV texture
    Mesh mesh; //Our mesh that we will draw the texture on

    public AndroidDependentCameraController(){

        //Our YUV image is 12 bits per pixel
        image = new byte[1280*720/8*12];
    }

    @Override
    public void init(){

        /*
         * Initialize the OpenGL/libgdx stuff
         */

        //Do not enforce power of two texture sizes
        Texture.setEnforcePotImages(false);

        //Allocate textures
        yTexture = new Texture(1280,720,Format.Intensity); //A 8-bit per pixel format
        uvTexture = new Texture(1280/2,720/2,Format.LuminanceAlpha); //A 16-bit per pixel format

        //Allocate buffers on the native memory space, not inside the JVM heap
        yBuffer = ByteBuffer.allocateDirect(1280*720);
        uvBuffer = ByteBuffer.allocateDirect(1280*720/2); //We have (width/2*height/2) pixels, each pixel is 2 bytes
        yBuffer.order(ByteOrder.nativeOrder());
        uvBuffer.order(ByteOrder.nativeOrder());

        //Our vertex shader code; nothing special
        String vertexShader = 
                "attribute vec4 a_position;                         \n" + 
                "attribute vec2 a_texCoord;                         \n" + 
                "varying vec2 v_texCoord;                           \n" + 

                "void main(){                                       \n" + 
                "   gl_Position = a_position;                       \n" + 
                "   v_texCoord = a_texCoord;                        \n" +
                "}                                                  \n";

        //Our fragment shader code; takes Y,U,V values for each pixel and calculates R,G,B colors,
        //Effectively making YUV to RGB conversion
        String fragmentShader = 
                "#ifdef GL_ES                                       \n" +
                "precision highp float;                             \n" +
                "#endif                                             \n" +

                "varying vec2 v_texCoord;                           \n" +
                "uniform sampler2D y_texture;                       \n" +
                "uniform sampler2D uv_texture;                      \n" +

                "void main (void){                                  \n" +
                "   float r, g, b, y, u, v;                         \n" +

                //We had put the Y values of each pixel to the R,G,B components by GL_LUMINANCE, 
                //that's why we're pulling it from the R component, we could also use G or B
                "   y = texture2D(y_texture, v_texCoord).r;         \n" + 

                //We had put the U and V values of each pixel to the A and R,G,B components of the
                //texture respectively using GL_LUMINANCE_ALPHA. Since U,V bytes are interleaved 
                //in the texture, this is probably the fastest way to use them in the shader
                "   u = texture2D(uv_texture, v_texCoord).a - 0.5;  \n" +                                   
                "   v = texture2D(uv_texture, v_texCoord).r - 0.5;  \n" +


                //The numbers are just YUV to RGB conversion constants
                "   r = y + 1.13983*v;                              \n" +
                "   g = y - 0.39465*u - 0.58060*v;                  \n" +
                "   b = y + 2.03211*u;                              \n" +

                //We finally set the RGB color of our pixel
                "   gl_FragColor = vec4(r, g, b, 1.0);              \n" +
                "}                                                  \n"; 

        //Create and compile our shader
        shader = new ShaderProgram(vertexShader, fragmentShader);

        //Create our mesh that we will draw on, it has 4 vertices corresponding to the 4 corners of the screen
        mesh = new Mesh(true, 4, 6, 
                new VertexAttribute(Usage.Position, 2, "a_position"), 
                new VertexAttribute(Usage.TextureCoordinates, 2, "a_texCoord"));

        //The vertices include the screen coordinates (between -1.0 and 1.0) and texture coordinates (between 0.0 and 1.0)
        float[] vertices = {
                -1.0f,  1.0f,   // Position 0
                0.0f,   0.0f,   // TexCoord 0
                -1.0f,  -1.0f,  // Position 1
                0.0f,   1.0f,   // TexCoord 1
                1.0f,   -1.0f,  // Position 2
                1.0f,   1.0f,   // TexCoord 2
                1.0f,   1.0f,   // Position 3
                1.0f,   0.0f    // TexCoord 3
        };

        //The indices come in trios of vertex indices that describe the triangles of our mesh
        short[] indices = {0, 1, 2, 0, 2, 3};

        //Set vertices and indices to our mesh
        mesh.setVertices(vertices);
        mesh.setIndices(indices);

        /*
         * Initialize the Android camera
         */
        camera = Camera.open(0);

        //We set the buffer ourselves that will be used to hold the preview image
        camera.setPreviewCallbackWithBuffer(this); 

        //Set the camera parameters
        Camera.Parameters params = camera.getParameters();
        params.setFocusMode(Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO);
        params.setPreviewSize(1280,720); 
        camera.setParameters(params);

        //Start the preview
        camera.startPreview();

        //Set the first buffer, the preview doesn't start unless we set the buffers
        camera.addCallbackBuffer(image);
    }

    @Override
    public void onPreviewFrame(byte[] data, Camera camera) {

        //Send the buffer reference to the next preview so that a new buffer is not allocated and we use the same space
        camera.addCallbackBuffer(image);
    }

    @Override
    public void renderBackground() {

        /*
         * Because of Java's limitations, we can't reference the middle of an array and 
         * we must copy the channels in our byte array into buffers before setting them to textures
         */

        //Copy the Y channel of the image into its buffer, the first (width*height) bytes are the Y channel
        yBuffer.put(image, 0, 1280*720);
        yBuffer.position(0);

        //Copy the UV channels of the image into their buffer, the following (width*height/2) bytes are the UV channel; the U and V bytes are interleaved
        uvBuffer.put(image, 1280*720, 1280*720/2);
        uvBuffer.position(0);

        /*
         * Prepare the Y channel texture
         */

        //Set texture slot 0 as active and bind our texture object to it
        Gdx.gl.glActiveTexture(GL20.GL_TEXTURE0);
        yTexture.bind();

        //Y texture is (width*height) in size and each pixel is one byte; by setting GL_LUMINANCE, OpenGL puts this byte into R,G and B components of the texture
        Gdx.gl.glTexImage2D(GL20.GL_TEXTURE_2D, 0, GL20.GL_LUMINANCE, 1280, 720, 0, GL20.GL_LUMINANCE, GL20.GL_UNSIGNED_BYTE, yBuffer);

        //Use linear interpolation when magnifying/minifying the texture to areas larger/smaller than the texture size
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_MIN_FILTER, GL20.GL_LINEAR);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_MAG_FILTER, GL20.GL_LINEAR);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_WRAP_S, GL20.GL_CLAMP_TO_EDGE);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_WRAP_T, GL20.GL_CLAMP_TO_EDGE);


        /*
         * Prepare the UV channel texture
         */

        //Set texture slot 1 as active and bind our texture object to it
        Gdx.gl.glActiveTexture(GL20.GL_TEXTURE1);
        uvTexture.bind();

        //UV texture is (width/2*height/2) in size (downsampled by 2 in both dimensions, each pixel corresponds to 4 pixels of the Y channel) 
        //and each pixel is two bytes. By setting GL_LUMINANCE_ALPHA, OpenGL puts the first byte (V) into the R,G and B components of the texture
        //and the second byte (U) into the A component of the texture. That's why we find U and V at A and R respectively in the fragment shader code.
        //Note that we could also have read V from G or B. 
        Gdx.gl.glTexImage2D(GL20.GL_TEXTURE_2D, 0, GL20.GL_LUMINANCE_ALPHA, 1280/2, 720/2, 0, GL20.GL_LUMINANCE_ALPHA, GL20.GL_UNSIGNED_BYTE, uvBuffer);

        //Use linear interpolation when magnifying/minifying the texture to areas larger/smaller than the texture size
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_MIN_FILTER, GL20.GL_LINEAR);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_MAG_FILTER, GL20.GL_LINEAR);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_WRAP_S, GL20.GL_CLAMP_TO_EDGE);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_WRAP_T, GL20.GL_CLAMP_TO_EDGE);

        /*
         * Draw the textures onto a mesh using our shader
         */

        shader.begin();

        //Set the uniform y_texture object to the texture at slot 0
        shader.setUniformi("y_texture", 0);

        //Set the uniform uv_texture object to the texture at slot 1
        shader.setUniformi("uv_texture", 1);

        //Render our mesh using the shader, which in turn will use our textures to render their content on the mesh
        mesh.render(shader, GL20.GL_TRIANGLES);
        shader.end();
    }

    @Override
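    public void destroy() {
        //Stop the camera first so that no more preview callbacks arrive
        camera.stopPreview();
        camera.setPreviewCallbackWithBuffer(null);
        camera.release();

        //Free the GL resources allocated in init()
        shader.dispose();
        mesh.dispose();
        yTexture.dispose();
        uvTexture.dispose();
    }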
}

The main application part just ensures that init() is called once at the beginning, renderBackground() is called every render cycle, and destroy() is called once at the end:

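import com.badlogic.gdx.ApplicationListener;
import com.badlogic.gdx.Gdx;
import com.badlogic.gdx.graphics.GL20;

public class YourApplication implements ApplicationListener {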

    private final PlatformDependentCameraController deviceCameraControl;

    public YourApplication(PlatformDependentCameraController cameraControl) {
        this.deviceCameraControl = cameraControl;
    }

    @Override
    public void create() {              
        deviceCameraControl.init();
    }

    @Override
    public void render() {      
        Gdx.gl.glViewport(0, 0, Gdx.graphics.getWidth(), Gdx.graphics.getHeight());
        Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT | GL20.GL_DEPTH_BUFFER_BIT);

        //Render the background that is the live camera image
        deviceCameraControl.renderBackground();

        /*
         * Render anything here (sprites/models etc.) that you want to go on top of the camera image
         */
    }

    @Override
    public void dispose() {
        deviceCameraControl.destroy();
    }

    @Override
    public void resize(int width, int height) {
    }

    @Override
    public void pause() {
    }

    @Override
    public void resume() {
    }
}

The only other Android-specific part is the following extremely short main Android code. You just create a new Android-specific device camera handler and pass it to the main libgdx object:

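import android.os.Bundle;

import com.badlogic.gdx.backends.android.AndroidApplication;
import com.badlogic.gdx.backends.android.AndroidApplicationConfiguration;

public class MainActivity extends AndroidApplication {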

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        AndroidApplicationConfiguration cfg = new AndroidApplicationConfiguration();
        cfg.useGL20 = true; //This line is obsolete in the newest libgdx version
        cfg.a = 8;
        cfg.b = 8;
        cfg.g = 8;
        cfg.r = 8;

        PlatformDependentCameraController cameraControl = new AndroidDependentCameraController();
        initialize(new YourApplication(cameraControl), cfg);

        graphics.getView().setKeepScreenOn(true);
    }
}

How fast is it?

I tested this routine on two devices. While the measurements are not constant across frames, a general profile can be observed:

  1. Samsung Galaxy Note II LTE - (GT-N7105): Has ARM Mali-400 MP4 GPU.

    • Rendering one frame takes around 5-6 ms, with occasional jumps to around 15 ms every couple of seconds
    • Actual rendering line (mesh.render(shader, GL20.GL_TRIANGLES);) consistently takes 0-1 ms
    • Creation and binding of both textures consistently take 1-3 ms in total
    • ByteBuffer copies generally take 1-3 ms in total but jump to around 7 ms occasionally, probably due to the image buffer being moved around in the JVM heap
  2. Samsung Galaxy Note 10.1 2014 - (SM-P600): Has ARM Mali-T628 GPU.

    • Rendering one frame takes around 2-4 ms, with rare jumps to around 6-10 ms
    • Actual rendering line (mesh.render(shader, GL20.GL_TRIANGLES);) consistently takes 0-1 ms
    • Creation and binding of both textures take 1-3 ms in total but jump to around 6-9 ms every couple of seconds
    • ByteBuffer copies generally take 0-2 ms in total but jump to around 6 ms very rarely
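
If you want to collect similar numbers on your own device, one possible way is to wrap each section of renderBackground() with a small timer like the one below. This is only a sketch: the SectionTimer class is hypothetical, and it measures CPU-side wall-clock time only, so GPU work that is queued by a GL call may actually finish later.

```java
/** Accumulates wall-clock timings for one section of the render loop (hypothetical helper). */
final class SectionTimer {
    private long startNanos;
    private long totalNanos;
    private long maxNanos;
    private int samples;

    /** Call right before the code to be measured. */
    void begin() {
        startNanos = System.nanoTime();
    }

    /** Call right after the code to be measured. */
    void end() {
        record(System.nanoTime() - startNanos);
    }

    /** Adds one measurement in nanoseconds; also usable with externally measured values. */
    void record(long elapsedNanos) {
        totalNanos += elapsedNanos;
        maxNanos = Math.max(maxNanos, elapsedNanos);
        samples++;
    }

    /** Average over all recorded samples, in milliseconds (0 if nothing was recorded). */
    double averageMillis() {
        return samples == 0 ? 0.0 : totalNanos / (double) samples / 1_000_000.0;
    }

    /** Largest single sample, in milliseconds; useful for spotting the occasional jumps. */
    double maxMillis() {
        return maxNanos / 1_000_000.0;
    }
}
```

For example, calling begin() before the two ByteBuffer copies and end() after them, then logging averageMillis() and maxMillis() every few hundred frames, should show both the typical cost and the spikes.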

Please don't hesitate to share if you think that these profiles can be made faster with some other method. Hope this little tutorial helped.