Creating a video from images using ffmpeg libav and libx264?

marikaner · Jul 23, 2013 · Viewed 10.6k times

I am trying to create a video from images using the ffmpeg library. The images are 1920x1080 and are supposed to be encoded with H.264 in an .mkv container. I have run into various problems and thought I was getting closer to a solution, but I am really stuck on this one. With the settings I use, the first X frames of my video (around 40, depending on which and how many images I use) are not encoded. avcodec_encode_video2 does not return an error (the return value is 0), but got_packet_ptr is 0. The resulting video looks as expected overall, but the first seconds are weirdly jumpy.

So this is how I create the video file:

// m_codecContext is an instance variable of type AVCodecContext *
// m_formatCtx is an instance variable of type AVFormatContext *

// outputFileName is a valid filename ending with .mkv
AVOutputFormat *oformat = av_guess_format(NULL, outputFileName, NULL);
if (oformat == NULL)
{
    oformat = av_guess_format("mpeg", NULL, NULL);
}

// oformat->video_codec is AV_CODEC_ID_H264
AVCodec *codec = avcodec_find_encoder(oformat->video_codec);

m_codecContext = avcodec_alloc_context3(codec);
m_codecContext->codec_id = oformat->video_codec;
m_codecContext->codec_type = AVMEDIA_TYPE_VIDEO;
m_codecContext->gop_size = 30;
m_codecContext->bit_rate = width * height * 4;
m_codecContext->width = width;
m_codecContext->height = height;
m_codecContext->time_base = (AVRational){1,frameRate};
m_codecContext->max_b_frames = 1;
m_codecContext->pix_fmt = AV_PIX_FMT_YUV420P;

m_formatCtx = avformat_alloc_context();
m_formatCtx->oformat = oformat;
m_formatCtx->video_codec_id = oformat->video_codec;

snprintf(m_formatCtx->filename, sizeof(m_formatCtx->filename), "%s", outputFileName);

AVStream *videoStream = avformat_new_stream(m_formatCtx, codec);
if(!videoStream)
{
   printf("Could not allocate stream\n");
}
videoStream->codec = m_codecContext;

if(m_formatCtx->oformat->flags & AVFMT_GLOBALHEADER)
{
   m_codecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
}

if (avcodec_open2(m_codecContext, codec, NULL) < 0)
{
   printf("Could not open codec\n");
}
avio_open(&m_formatCtx->pb, outputFileName.toStdString().c_str(), AVIO_FLAG_WRITE);
avformat_write_header(m_formatCtx, NULL);

This is how the frames are added:

void VideoCreator::writeImageToVideo(const QSharedPointer<QImage> &img, int frameIndex)
{
    AVFrame *frame = avcodec_alloc_frame();

    /* alloc image and output buffer */

    int size = m_codecContext->width * m_codecContext->height;
    int numBytes = avpicture_get_size(m_codecContext->pix_fmt, m_codecContext->width, m_codecContext->height);

    uint8_t *outbuf = (uint8_t *)malloc(numBytes);
    uint8_t *picture_buf = (uint8_t *)av_malloc(numBytes);

    int ret = av_image_fill_arrays(frame->data, frame->linesize, picture_buf, m_codecContext->pix_fmt, m_codecContext->width, m_codecContext->height, 1);

    frame->data[0] = picture_buf;
    frame->data[1] = frame->data[0] + size;
    frame->data[2] = frame->data[1] + size/4;
    frame->linesize[0] = m_codecContext->width;
    frame->linesize[1] = m_codecContext->width/2;
    frame->linesize[2] = m_codecContext->width/2;

    fflush(stdout);


    for (int y = 0; y < m_codecContext->height; y++)
    {
        for (int x = 0; x < m_codecContext->width; x++)
        {
            unsigned char b = img->bits()[(y * m_codecContext->width + x) * 4 + 0];
            unsigned char g = img->bits()[(y * m_codecContext->width + x) * 4 + 1];
            unsigned char r = img->bits()[(y * m_codecContext->width + x) * 4 + 2];

            unsigned char Y = (0.257 * r) + (0.504 * g) + (0.098 * b) + 16;

            frame->data[0][y * frame->linesize[0] + x] = Y;

            if (y % 2 == 0 && x % 2 == 0)
            {
                unsigned char V = (0.439 * r) - (0.368 * g) - (0.071 * b) + 128;
                unsigned char U = -(0.148 * r) - (0.291 * g) + (0.439 * b) + 128;

                frame->data[1][y/2 * frame->linesize[1] + x/2] = U;
                frame->data[2][y/2 * frame->linesize[2] + x/2] = V;
            }
        }
    }

    int pts = frameIndex;//(1.0 / 30.0) * 90.0 * frameIndex;

    frame->pts = pts;//av_rescale_q(m_codecContext->coded_frame->pts, m_codecContext->time_base, formatCtx->streams[0]->time_base); //(1.0 / 30.0) * 90.0 * frameIndex;

    int got_packet_ptr;
    AVPacket packet;
    av_init_packet(&packet);
    packet.data = outbuf;
    packet.size = numBytes;
    packet.stream_index = m_formatCtx->streams[0]->index;
    packet.flags |= AV_PKT_FLAG_KEY;
    packet.pts = packet.dts = pts;
    m_codecContext->coded_frame->pts = pts;

    ret = avcodec_encode_video2(m_codecContext, &packet, frame, &got_packet_ptr);
    if (got_packet_ptr != 0)
    {
        m_codecContext->coded_frame->pts = pts;  // Set the time stamp

        if (m_codecContext->coded_frame->pts != (0x8000000000000000LL))
        {
            pts = av_rescale_q(m_codecContext->coded_frame->pts, m_codecContext->time_base, m_formatCtx->streams[0]->time_base);
        }
        packet.pts = pts;
        if(m_codecContext->coded_frame->key_frame)
        {
           packet.flags |= AV_PKT_FLAG_KEY;
        }

        std::cout << "pts: " << packet.pts << ", dts: "  << packet.dts << std::endl;

        av_interleaved_write_frame(m_formatCtx, &packet);
        av_free_packet(&packet);
    }

    av_free(picture_buf);  // allocated with av_malloc, so release with av_free
    free(outbuf);
    av_free(frame);
    printf("\n");
}

And this is the cleanup:

int numBytes = avpicture_get_size(m_codecContext->pix_fmt, m_codecContext->width, m_codecContext->height);
int got_packet_ptr = 1;

int ret;
//        for(; got_packet_ptr != 0; i++)
while (got_packet_ptr)
{
    uint8_t *outbuf = (uint8_t *)malloc(numBytes);

    AVPacket packet;
    av_init_packet(&packet);
    packet.data = outbuf;
    packet.size = numBytes;

    ret = avcodec_encode_video2(m_codecContext, &packet, NULL, &got_packet_ptr);
    if (got_packet_ptr)
    {
        av_interleaved_write_frame(m_formatCtx, &packet);
    }

    av_free_packet(&packet);
    free(outbuf);
}

av_write_trailer(m_formatCtx);

avcodec_close(m_codecContext);
av_free(m_codecContext);
printf("\n");

I assume it might be tied to the PTS and DTS values, but I have tried EVERYTHING. The frame index seems to make the most sense. The images are correct, I can save them to files without any problems. I am running out of ideas. I would be incredibly thankful if there was someone out there who knew better than me...

Cheers, marikaner

UPDATE:

If this is of any help, this is the output at the end of the video encoding:

[libx264 @ 0x7fffc00028a0] frame I:19    Avg QP:14.24  size:312420
[libx264 @ 0x7fffc00028a0] frame P:280   Avg QP:19.16  size:148867
[libx264 @ 0x7fffc00028a0] frame B:181   Avg QP:21.31  size: 40540
[libx264 @ 0x7fffc00028a0] consecutive B-frames: 24.6% 75.4%
[libx264 @ 0x7fffc00028a0] mb I  I16..4: 30.9% 45.5% 23.7%
[libx264 @ 0x7fffc00028a0] mb P  I16..4:  4.7%  9.1%  4.5%  P16..4: 23.5% 16.6% 12.6%  0.0%  0.0%    skip:28.9%
[libx264 @ 0x7fffc00028a0] mb B  I16..4:  0.6%  0.5%  0.3%  B16..8: 26.7% 11.0%  5.5%  direct: 3.9%  skip:51.5%  L0:39.4% L1:45.0% BI:15.6%
[libx264 @ 0x7fffc00028a0] final ratefactor: 19.21
[libx264 @ 0x7fffc00028a0] 8x8 transform intra:48.2% inter:47.3%
[libx264 @ 0x7fffc00028a0] coded y,uvDC,uvAC intra: 54.9% 53.1% 30.4% inter: 25.4% 13.5% 4.2%
[libx264 @ 0x7fffc00028a0] i16 v,h,dc,p: 41% 29% 11% 19%
[libx264 @ 0x7fffc00028a0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 16% 26% 31%  3%  4%  3%  7%  3%  6%
[libx264 @ 0x7fffc00028a0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 30% 26% 14%  4%  5%  4%  7%  4%  7%
[libx264 @ 0x7fffc00028a0] i8c dc,h,v,p: 58% 26% 13%  3%
[libx264 @ 0x7fffc00028a0] Weighted P-Frames: Y:17.1% UV:3.6%
[libx264 @ 0x7fffc00028a0] ref P L0: 63.1% 21.4% 11.4%  4.1%  0.1%    
[libx264 @ 0x7fffc00028a0] ref B L0: 85.7% 14.3%
[libx264 @ 0x7fffc00028a0] kb/s:27478.30

Answer

Hrishikesh_Pardeshi · Jul 24, 2013

Libav is probably delaying the processing of the initial frames. A good practice is to check for any delayed frames after you have finished processing all frames. This is done as follows:

int i = NUMBER_OF_FRAMES_PREVIOUSLY_ENCODED;
for (; got_packet_ptr; i++)
{
   ret = avcodec_encode_video2(m_codecContext, &packet, NULL, &got_packet_ptr);
   // If got_packet_ptr is set, write the packet to the container here.
}

The point is to pass a NULL pointer in place of the frame to be encoded and to keep doing so until the encoder stops returning packets (got_packet_ptr becomes 0). See this link for the code example - the part under "get the delayed frames".

An easier way out would be to set the number of B-frames to 0.

m_codecContext->max_b_frames = 0;

Let me know if this works fine.

Also, you haven't used the libx264 API at all. You can make use of the libx264 APIs for encoding videos, they have a simpler and cleaner syntax. Plus it offers you more control over the settings and improved performance.

For writing the video stream to an mkv container, you will still have to use the libav libraries, though.