Understanding PTS and DTS in video frames

theateist picture theateist · Nov 28, 2012 · Viewed 40.2k times · Source

I had fps issues when transcoding from avi to mp4(x264). Eventually the problem was in PTS and DTS values, so lines 12-15 where added before av_interleaved_write_frame function:

1.  AVFormatContext* outContainer = NULL;
2.  avformat_alloc_output_context2(&outContainer, NULL, "mp4", "c:\\test.mp4";
3.  AVCodec *encoder = avcodec_find_encoder(AV_CODEC_ID_H264);
4.  AVStream *outStream = avformat_new_stream(outContainer, encoder);
5.  // outStream->codec initiation
6.  // ...
7.  avformat_write_header(outContainer, NULL);

8.  // reading and decoding packet
9.  // ...
10. avcodec_encode_video2(outStream->codec, &encodedPacket, decodedFrame, &got_frame)
11. 
12. if (encodedPacket.pts != AV_NOPTS_VALUE)
13.     encodedPacket.pts =  av_rescale_q(encodedPacket.pts, outStream->codec->time_base, outStream->time_base);
14. if (encodedPacket.dts != AV_NOPTS_VALUE)
15.     encodedPacket.dts = av_rescale_q(encodedPacket.dts, outStream->codec->time_base, outStream->time_base);
16. 
17. av_interleaved_write_frame(outContainer, &encodedPacket)

After reading many posts I still do not understand:

  1. outStream->codec->time_base = 1/25 and outStream->time_base = 1/12800. The 1st one was set by me but I cannot figure out why and who set 12800? I noticed that before line (7) outStream->time_base = 1/90000 and right after it it changes to 1/12800, why? When I transcode from avi to avi, meaning changing the line (2) to avformat_alloc_output_context2(&outContainer, NULL, "avi", "c:\\test.avi"; , so before and after line (7) outStream->time_base remains always 1/25 and not like in mp4 case, why?
  2. What is the difference between time_base of outStream->codec and outStream?
  3. To calc the pts av_rescale_q does: takes 2 time_base, multiplies their fractions in cross and then compute the pts. Why it does this in this way? As I debugged, the encodedPacket.pts has value incremental by 1, so why changing it if it does has value?
  4. At the beginning the dts value is -2 and after each rescaling it still has negative number, but despite this the video played correctly! Shouldn't it be positive?

Answer

Alex I picture Alex I · Nov 30, 2012
  1. The time_base is just a unit of measurement. Different units may be used to represent the same times (approximately, if they are not exact multiples). In some cases a container format requires a certain time base and it will be set to that by the muxer. In other cases the container doesn't require a time base but it has a default that you might have to override. I'm not sure about 1/12800 specifically, I know 1/600 is a special value in mp4 spec.

  2. The two time bases are the units of measurement of time for the codec and for the container. If using constant fps, the codec unit of measurement is commonly set to the interval between each frame and the next (the duration that each frame gets displayed), so that frame times are successive integers. It doesn't have to be set to 1/fps, however, as long as the pts times are correct in whatever units are used.

  3. What you describe is simply what one would have to do to convert from one unit to another. (ie: multiply by old unit, divide by new). A time t in units of a/b can be converted to units c/d as t*(a*d)/(b*c).

  4. The dts sequence can start from any value, there is no special significance to dts 0. At start of playback, the difference between wall clock time and the starting dts is computed, and all future dts are converted to wall clock using that. A video stream with dts=-10, -9, -8, ... is perfectly ok. The difference between successive dts is what is used, the absolute values don't matter.