How to fragment H264 Packets in RTP compliant with RFC3984

Pierluigi Cifani · Apr 1, 2011 · Viewed 12.9k times

I have FFMPEG streaming baseline H.264 video, which I have to encapsulate in RTP and send to SIP phones for decoding. I am using Linphone with the H.264 plugin for Windows, and Mirial, to do the decoding. However, sometimes I get a huge frame (3 KB to 9 KB) from FFMPEG, which obviously doesn't fit in the MTU.

If I send these frames "as is" and trust IP fragmentation, some phones play the stream well enough, but others choke and can't decode it. I think this is because the stream is not compliant with RFC 3984, which specifies that packets that don't fit in the MTU have to be split into separate NALUs, with the end of a frame marked by the RTP marker bit.

How do I know where I can "cut" an I or P frame? I noticed that fragmented H.264 packets (the ones without the marker bit set) sometimes end in 0xF8, but I couldn't find a pattern, and RFC 3984, which describes how to send these packets over RTP, doesn't specify how to do it.

UPDATE: Does anyone know how to tell the x264 library to generate NALUs of a maximum size? That way I should be able to avoid this problem. Thanks, everyone.

Answer

jesup · Apr 7, 2011

As an author of RFC 3984bis (to become RFC 6184), I can say that it details exactly how to convert H.264 NALs into RFC 3984 packets. There are three packetization modes: 0 (single NAL unit), 1 (non-interleaved; allows fragmenting and combining NALs), and 2 (interleaved; lets you fragment, combine, and interleave the transmission order to change how a burst loss will affect a stream, among other things). The mode is signalled with the SDP packetization-mode parameter. Only mode 0 is required.
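To see which mode a peer is offering, you can read packetization-mode from the a=fmtp line of its SDP. The following is a minimal sketch, not a full SDP parser: the function name, the payload type 96, and the sample SDP are made up for illustration. It treats a missing parameter as mode 0, which is how the RFC says absence must be interpreted.

```python
def packetization_mode(sdp: str, payload_type: int) -> int:
    """Return packetization-mode from the a=fmtp line of the given payload type.

    Hypothetical helper for illustration. Returns 0 (single NAL unit mode)
    when the parameter or the whole fmtp line is absent.
    """
    prefix = f"a=fmtp:{payload_type} "
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Parameters are separated by ';' and written as name=value.
        for param in line[len(prefix):].split(";"):
            name, _, value = param.strip().partition("=")
            if name.lower() == "packetization-mode":
                return int(value)
    return 0


# Made-up SDP fragment; 96 is an arbitrary dynamic payload type.
EXAMPLE_SDP = """\
m=video 5004 RTP/AVP 96
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42e01f; packetization-mode=1
"""

if __name__ == "__main__":
    print(packetization_mode(EXAMPLE_SDP, 96))
```

If this returns 0 for a phone, that phone can only be assumed to accept one whole NAL per RTP packet, so fragmentation units are not an option for it.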

Mode 0 (single NAL) requires that you either rely on UDP fragmentation (discouraged) or tell the encoder not to generate NALs larger than MTU-X, where X covers the IP, UDP, and RTP headers. You should be able to tell the encoder this.
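As a rough sketch of the MTU-X arithmetic, assuming IPv4 without options, plain UDP, and a 12-byte RTP header with no CSRCs or extensions: the resulting number is what you would hand to the encoder. x264 has a slice-max-size option (i_slice_max_size in x264_param_t) that caps the size of each slice NAL in bytes; how you pass it through FFMPEG depends on your build, so the option string below is only formatted, not applied. The function names and the optional safety margin are my own.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, assuming no CSRC list and no header extension


def max_nal_size(mtu: int = 1500, margin: int = 0) -> int:
    """Largest NAL (in bytes) that fits in one RTP packet for the given MTU.

    'margin' is an optional safety allowance (assumption: tunnels, SRTP
    tags, or similar overhead on your path).
    """
    return mtu - IPV4_HEADER - UDP_HEADER - RTP_HEADER - margin


def x264_slice_option(mtu: int = 1500, margin: int = 0) -> str:
    """Format the x264 option that limits slice NAL size to the budget."""
    return f"slice-max-size={max_nal_size(mtu, margin)}"


if __name__ == "__main__":
    print(max_nal_size())               # budget for a plain Ethernet MTU
    print(x264_slice_option(1500, 60))  # same, with some headroom
```

With a limit like this the encoder splits a large frame into several slices, each in its own NAL, so every NAL can travel in its own RTP packet even in mode 0.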

Mode 1 lets you fragment. See the RFC for how to set up an FU-A packet; the fragmentation info (a start bit, an end bit, and the original NAL type) goes at the front of each fragment's payload. You can also use STAPs to aggregate small NALs, such as the SPS and PPS that are (normally) sent before IDRs. Each packet requires a normal RTP header with an incremented sequence number (but the same timestamp for all packets of one frame).
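A minimal sketch of both payload formats, assuming each NAL is passed in without its Annex B start code and that max_payload is the RTP payload budget in bytes. The function names are hypothetical. Note that an FU-A can be cut at any byte offset inside the NAL, so there is no special pattern to look for in the bitstream.

```python
import struct
from typing import List

FU_A = 28    # NAL type value of an FU-A fragmentation unit
STAP_A = 24  # NAL type value of a STAP-A aggregation packet


def fragment_fu_a(nal: bytes, max_payload: int) -> List[bytes]:
    """Split one NAL into RTP payloads of at most max_payload bytes."""
    if len(nal) <= max_payload:
        return [nal]  # fits: send as a single NAL unit packet
    if max_payload < 3:
        raise ValueError("max_payload must leave room for 2 FU bytes plus data")
    header = nal[0]
    fu_indicator = (header & 0xE0) | FU_A  # keep F and NRI bits, type = 28
    nal_type = header & 0x1F               # original type goes in the FU header
    body = nal[1:]                         # the original header byte is not repeated
    chunk = max_payload - 2                # room left after FU indicator + FU header
    payloads = []
    for offset in range(0, len(body), chunk):
        start = 0x80 if offset == 0 else 0                # S bit on first fragment
        end = 0x40 if offset + chunk >= len(body) else 0  # E bit on last fragment
        fu_header = start | end | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk])
    return payloads


def aggregate_stap_a(nals: List[bytes]) -> bytes:
    """Pack several small NALs (e.g. SPS and PPS) into one STAP-A payload."""
    f_bit = max(n[0] & 0x80 for n in nals)  # set if any aggregated NAL has it
    nri = max(n[0] & 0x60 for n in nals)    # highest NRI among the NALs
    out = bytes([f_bit | nri | STAP_A])
    for n in nals:
        out += struct.pack("!H", len(n)) + n  # 16-bit big-endian size, then NAL
    return out
```

The receiver rebuilds a fragmented NAL by taking the F and NRI bits from the FU indicator, the type from the FU header, and appending the fragment bodies in sequence-number order.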

The marker bit on the last RTP packet of a frame (not of a fragment or NAL) is expected, but you shouldn't count on it being there.
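To illustrate the header rules, here is a sketch that wraps all the RTP payloads belonging to one frame: the sequence number increments per packet, the timestamp stays the same, and the marker bit is set only on the frame's last packet. It assumes a fixed 12-byte RTP header (version 2, no padding, no extension, no CSRCs); the function name and the default payload type 96 are illustrative.

```python
import struct
from typing import List, Tuple


def rtp_packets_for_frame(payloads: List[bytes], seq: int, timestamp: int,
                          ssrc: int, payload_type: int = 96) -> Tuple[List[bytes], int]:
    """Prefix each payload of one frame with an RTP header.

    'payloads' must hold every RTP payload of the frame, in order (single
    NALs, FU-A fragments, STAPs). Returns the packets and the next
    sequence number to use.
    """
    packets = []
    last = len(payloads) - 1
    for i, payload in enumerate(payloads):
        marker = 0x80 if i == last else 0   # marker only on the frame's last packet
        header = struct.pack(
            "!BBHII",
            0x80,                           # V=2, P=0, X=0, CC=0
            marker | (payload_type & 0x7F),
            seq & 0xFFFF,                   # 16-bit sequence number wraps around
            timestamp & 0xFFFFFFFF,         # same timestamp for the whole frame
            ssrc,
        )
        packets.append(header + payload)
        seq += 1
    return packets, seq & 0xFFFF
```

On the receiving side, a change of timestamp between consecutive packets also signals that the previous frame has ended, which is a useful fallback when a sender omits the marker bit.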