Can the ffmpeg av libs return an accurate PTS?

hobb0001 · Sep 18, 2008

I'm working with an MPEG stream that uses an IBBP... GOP sequence. The (DTS, PTS) values returned for the first 4 AVPackets are as follows: I=(0,3) B=(1,1) B=(2,2) P=(3,6)

The PTS on the I frame looks legit, but the PTS on the B frames cannot be right: their values say the B frames should be displayed before the I frame, which shouldn't happen. I've also tried decoding the packets and using the pts value in the resulting AVFrame, but that PTS is always set to zero.

Is there any way to get an accurate PTS out of ffmpeg? If not, what's the best way to sync audio then?

Answer

hobb0001 · Sep 19, 2008

I think I finally figured out what's going on based on a comment made in http://www.dranger.com/ffmpeg/tutorial05.html:

ffmpeg reorders the packets so that the DTS of the packet being processed by avcodec_decode_video() will always be the same as the PTS of the frame it returns

Translation: If I feed a packet into avcodec_decode_video() that has a PTS of 12, avcodec_decode_video() will not return the decoded frame contained in that packet until I feed it a later packet that has a DTS of 12. If the packet's PTS is the same as its DTS, then the packet given is the same as the frame returned. If the packet's PTS is 2 frames later than its DTS, then avcodec_decode_video() will delay the frame and not return it until I provide 2 more packets.
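To make that rule concrete, here is a toy model in C. It is only a sketch of the behavior described above, not ffmpeg code: ToyPacket and frame_returned_at() are made-up names, and each packet is reduced to its two timestamps.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a packet reduced to its two timestamps.
 * This is NOT ffmpeg's AVPacket; the names here are hypothetical. */
typedef struct {
    long dts;  /* decode timestamp: the order packets are fed in */
    long pts;  /* presentation timestamp: when the frame is shown */
} ToyPacket;

/*
 * Returns the index of the packet whose frame would come out when packet
 * number `fed` is handed to the decoder, or -1 if nothing comes out yet.
 * Rule being modelled: the returned frame's PTS equals the DTS of the
 * packet that was just fed.
 */
int frame_returned_at(const ToyPacket *pkts, size_t fed)
{
    assert(pkts != NULL);
    /* Only packets fed so far (indices 0..fed) can be returned. */
    for (size_t i = 0; i <= fed; i++) {
        if (pkts[i].pts == pkts[fed].dts)
            return (int)i;
    }
    return -1;
}
```

With illustrative packets (0,3), (1,4), (2,5), (3,6), nothing comes out for the first three calls, and feeding the fourth packet (DTS 3) returns index 0, i.e. the frame from the first packet. A packet whose PTS equals its DTS comes straight back out on the same call.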

Based on this behavior, my guess is that av_read_frame() reorders the packets from IPBB to IBBP so that avcodec_decode_video() only has to buffer each P frame for 3 frames instead of 5. For example, with this ordering the gap between feeding the P frame in and getting it back out is 3 (6 - 3):

|                  I B B P B B P
|             DTS: 0 1 2 3 4 5 6
| decode() result:       I B B P

vs. a difference of 5 with the standard ordering (6 - 1):

|                  I P B B P B B
|             DTS: 0 1 2 3 4 5 6
| decode() result:       I B B P

<shrug/> but that is pure conjecture.
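
Taking the two diagrams at face value (they are conjecture, as said), the 3 vs. 5 arithmetic can be checked with a few lines of C. This is just a sketch: p_frame_delay() is a made-up helper, and it assumes one packet per DTS tick starting at 0 and that the first P frame comes out at DTS 6 in both orderings, as drawn above.

```c
#include <assert.h>
#include <string.h>

/*
 * How many packets the first P frame waits inside the decoder.
 * packet_order:     frame types in the order packets are fed, one per DTS
 *                   tick starting at 0 (e.g. "IBBPBBP").
 * returned_at_dts:  DTS tick at which that P frame is finally returned
 *                   (an assumption taken from the diagrams).
 * Returns -1 if the order contains no P frame.
 */
int p_frame_delay(const char *packet_order, int returned_at_dts)
{
    assert(packet_order != NULL);
    const char *p = strchr(packet_order, 'P');
    if (p == NULL)
        return -1;
    /* The P packet's DTS is simply its position in the string. */
    return returned_at_dts - (int)(p - packet_order);
}
```

p_frame_delay("IBBPBBP", 6) gives 3 and p_frame_delay("IPBBPBB", 6) gives 5, matching the two differences worked out above.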