I want to grab a frame from a video at a specific time using libav, for use as a thumbnail.
I'm using the code below. It compiles and works fine (as far as retrieving a picture at all goes), but I'm having a hard time getting it to retrieve the right picture.
I simply can't get my head around libav's apparent use of multiple time bases per video, and specifically which functions expect or return timestamps in which time base.
The docs were of basically no help whatsoever, unfortunately. SO to the rescue?
#define ABORT(x) do {fprintf(stderr, x); exit(1);} while(0)
av_register_all();
AVFormatContext *format_context = ...;
AVCodec *codec = ...;
AVStream *stream = ...;
AVCodecContext *codec_context = ...;
int stream_index = ...;
// open codec_context, etc.
AVRational stream_time_base = stream->time_base;
AVRational codec_time_base = codec_context->time_base;
printf("stream_time_base: %d / %d = %.5f\n", stream_time_base.num, stream_time_base.den, av_q2d(stream_time_base));
printf("codec_time_base: %d / %d = %.5f\n\n", codec_time_base.num, codec_time_base.den, av_q2d(codec_time_base));
AVFrame *frame = avcodec_alloc_frame();
printf("duration: %lld @ %d/sec (%.2f sec)\n", format_context->duration, AV_TIME_BASE, (double)format_context->duration / AV_TIME_BASE);
printf("duration: %lld @ %d/sec (stream time base)\n\n", format_context->duration / AV_TIME_BASE * stream_time_base.den, stream_time_base.den);
printf("duration: %lld @ %d/sec (codec time base)\n", format_context->duration / AV_TIME_BASE * codec_time_base.den, codec_time_base.den);
double request_time = 10.0; // 10 seconds. Video's total duration is ~20sec
int64_t request_timestamp = request_time / av_q2d(stream_time_base);
printf("requested: %.2f (sec)\t-> %2lld (pts)\n", request_time, request_timestamp);
av_seek_frame(format_context, stream_index, request_timestamp, 0);
AVPacket packet;
int frame_finished = 0; // must be initialized: `continue` below re-tests it before any decode
do {
    if (av_read_frame(format_context, &packet) < 0) {
        break;
    } else if (packet.stream_index != stream_index) {
        av_free_packet(&packet);
        continue;
    }

    avcodec_decode_video2(codec_context, frame, &frame_finished, &packet);
    av_free_packet(&packet); // the decoded data now lives in frame
} while (!frame_finished);
// do something with frame
int64_t received_timestamp = frame->pkt_pts;
double received_time = received_timestamp * av_q2d(stream_time_base);
printf("received: %.2f (sec)\t-> %2lld (pts)\n\n", received_time, received_timestamp);
Running this with a test movie file, I get this output:
stream_time_base: 1 / 30000 = 0.00003
codec_time_base: 50 / 2997 = 0.01668
duration: 20062041 @ 1000000/sec (20.06 sec)
duration: 600000 @ 30000/sec (stream time base)
duration: 59940 @ 2997/sec (codec time base)
requested: 10.00 (sec) -> 300000 (pts)
received: 0.07 (sec) -> 2002 (pts)
The times don't match. What's going on here? What am I doing wrong?
While searching for clues I stumbled upon this statement from the libav-users mailing list…
[...] packet PTS/DTS are in units of the format context's time_base, where the AVFrame->pts value is in units of the codec context's time_base. In other words, the container can have (and usually does) a different time_base than the codec. Most libav players don't bother using the codec's time_base or pts since not all codecs have one, but most containers do. (This is why the dranger tutorial says to ignore AVFrame->pts)
…which confused me even more, given that I couldn't find any such mention in the official docs.
Anyway, I replaced…
double received_time = received_timestamp * av_q2d(stream_time_base);
…with…
double received_time = received_timestamp * av_q2d(codec_time_base);
…and the output changed to this…
...
requested: 10.00 (sec) -> 300000 (pts)
received: 33.40 (sec) -> 2002 (pts)
Still no match. What's wrong?
It's mostly like this:
- The stream time base is what you are really interested in. It's what the packet timestamps are in, and so is pkt_pts on the output frame (since it's just copied from the corresponding packet).
- The codec time base is (if set at all) just the inverse of the frame rate that might be written in the codec-level headers. It can be useful in cases where there is no container timing information (e.g. when you're reading raw video), but otherwise it can be safely ignored.
- AVFrame.pkt_pts is the timestamp of the packet that got decoded into this frame. As already said, it's a straight copy from the packet, so it's in the stream time base. This is the field you want to use (if the container has timestamps); see the conversion sketch after this list.
- AVFrame.pts is never set to anything useful when decoding; ignore it. (It might replace pkt_pts in the future, to make the whole mess less confusing, but for now it's like this, mostly for historical reasons.)
- The format context's duration is in AV_TIME_BASE units (i.e. microseconds). It cannot be in any stream time base, since you can have three bazillion streams, each with its own time base.
- The different timestamp you get after seeking is simply due to seeking not being accurate: in most cases you can only seek to the closest keyframe, so it's common to end up a couple of seconds off. Decoding and discarding the frames you don't need must be done manually; a sketch of that follows below as well.
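To make this concrete, here is a minimal sketch of the two conversions described above; it assumes the same format_context, stream and frame variables as in the question's code, with frame already decoded:

// The container duration is in AV_TIME_BASE units (microseconds):
double duration_sec = (double)format_context->duration / AV_TIME_BASE;

// Packet timestamps -- and frame->pkt_pts, which is copied from them --
// are in the stream's time base:
double frame_time_sec = frame->pkt_pts * av_q2d(stream->time_base);

printf("duration: %.2f sec, frame at: %.2f sec\n", duration_sec, frame_time_sec);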
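And here is a minimal sketch of the seek-then-discard approach, again reusing the question's variables (format_context, codec_context, stream, stream_index, frame); error handling and the AV_NOPTS_VALUE case are left out for brevity:

double request_time = 10.0;
int64_t request_timestamp = request_time / av_q2d(stream->time_base);

// Seek to the closest keyframe at or before the requested timestamp,
// then decode forward from there.
av_seek_frame(format_context, stream_index, request_timestamp, AVSEEK_FLAG_BACKWARD);
avcodec_flush_buffers(codec_context); // drop frames buffered from before the seek

AVPacket packet;
int frame_finished = 0;
while (av_read_frame(format_context, &packet) >= 0) {
    if (packet.stream_index == stream_index) {
        avcodec_decode_video2(codec_context, frame, &frame_finished, &packet);
        // Discard decoded frames until we reach the requested timestamp.
        if (frame_finished && frame->pkt_pts >= request_timestamp) {
            av_free_packet(&packet);
            break; // frame now holds the picture at (or just past) request_time
        }
    }
    av_free_packet(&packet);
}

Without flushing the decoder, frames buffered from before the seek could come out first and throw the comparison off.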