Problem to Decode H264 video over RTP with ffmpeg (libavcodec)

bben · Aug 16, 2010 · Viewed 31.4k times

I set profile_idc, level_idc, extradata and extradata_size of AVCodecContext from the profile-level-id and sprop-parameter-sets fields of the SDP.

I decode Coded Slice, SPS, PPS and NAL_IDR_SLICE packets separately:

Init:

uint8_t start_sequence[]= {0, 0, 1};
int size= recv(id_de_la_socket,(char*) rtpReceive,65535,0);

Coded Slice :

// Prefix the NAL unit with the start sequence, then pass it to the decoder.
char *z = new char[size-16+sizeof(start_sequence)];
memcpy(z,&start_sequence,sizeof(start_sequence));
memcpy(z+sizeof(start_sequence),rtpReceive+16,size-16);
ConsumedBytes = avcodec_decode_video(codecContext,pFrame,&GotPicture,(uint8_t*)z,size-16+sizeof(start_sequence));
delete[] z; // array form, to match new[]

Result: ConsumedBytes >0 and GotPicture >0 (often)

SPS and PPS :

identical code. Result: ConsumedBytes >0 and GotPicture =0

I think this is normal.

When I find a new SPS/PPS pair, I update extradata and extradata_size with the payloads of these packets and their sizes.

NAL_IDR_SLICE :

The NAL unit type is 28, so the IDR frames are fragmented. I therefore tried two methods to decode them:

1) I prefix the first fragment (without the RTP header) with the sequence 0x000001 and send it to avcodec_decode_video. Then I send the remaining fragments to this function one by one.

2) I prefix the first fragment (without the RTP header) with the sequence 0x000001 and concatenate the remaining fragments to it. I send this single buffer to the decoder.

In both cases, I have no error (ConsumedBytes >0) but I detect no frame (GotPicture = 0) ...

What is the problem?

Answer

Cipi · Aug 17, 2010

In RTP, H264 I-frames (IDRs) are usually fragmented. When you receive an RTP packet, you must first skip the header (usually the first 12 bytes) to reach the NAL unit (the first payload byte). If the NAL unit type is 28 (0x1C), the payload that follows is one fragment of an H264 IDR (I-frame), and you need to collect all of the fragments to reconstruct it.
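The 12-byte figure covers only the fixed part of the RTP header; CSRC entries or a header extension make it longer. Below is a minimal sketch of computing the payload offset, assuming the RFC 3550 header layout (CSRC count in the low 4 bits of the first byte, extension flag in bit 0x10). The name `rtp_payload_offset` is hypothetical, and trailing padding is not handled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the offset of the first payload byte in an
// RTP packet, or 0 if the packet is too short to contain a payload.
std::size_t rtp_payload_offset(const std::uint8_t* pkt, std::size_t len) {
    if (len < 12) return 0;                        // fixed header is 12 bytes
    const std::size_t csrc_count = pkt[0] & 0x0F;  // number of 4-byte CSRC entries
    const bool has_extension = (pkt[0] & 0x10) != 0;
    std::size_t offset = 12 + 4 * csrc_count;
    if (has_extension) {
        if (len < offset + 4) return 0;
        // Extension header: 2 bytes profile-defined, 2 bytes length in 32-bit words.
        const std::size_t ext_words =
            (static_cast<std::size_t>(pkt[offset + 2]) << 8) | pkt[offset + 3];
        offset += 4 + 4 * ext_words;
    }
    return offset < len ? offset : 0;
}
```

The NAL unit byte discussed above would then be `pkt[rtp_payload_offset(pkt, len)]`, provided the returned offset is non-zero.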

Fragmentation occurs because the MTU is limited and an IDR is usually much larger than it. One fragment can look like this:

Fragment that has START BIT = 1:

First byte:  [ 3 NAL UNIT BITS | 5 FRAGMENT TYPE BITS] 
Second byte: [ START BIT | END BIT | RESERVED BIT | 5 NAL UNIT BITS] 
Other bytes: [... IDR FRAGMENT DATA...]

Other fragments (START BIT = 0):

First byte:  [ 3 NAL UNIT BITS | 5 FRAGMENT TYPE BITS]
Second byte: [ START BIT | END BIT | RESERVED BIT | 5 NAL UNIT BITS]
Other bytes: [... IDR FRAGMENT DATA...]

To reconstruct IDR you must collect this info:

int fragment_type = Data[0] & 0x1F;
int nal_type = Data[1] & 0x1F;
int start_bit = Data[1] & 0x80;
int end_bit = Data[1] & 0x40;

If fragment_type == 28, the payload that follows is one fragment of an IDR. Next, check whether start_bit is set; if it is, this fragment is the first one in the sequence. Use it to reconstruct the IDR's NAL byte: take the first 3 bits of the first payload byte (3 NAL UNIT BITS) and combine them with the last 5 bits of the second payload byte (5 NAL UNIT BITS), giving a byte of the form [3 NAL UNIT BITS | 5 NAL UNIT BITS]. Write that NAL byte into an empty buffer, followed by the remaining bytes of the fragment. Remember to skip the first two payload bytes, since they are not part of the IDR; they only identify the fragment.

If start_bit and end_bit are both 0, just append the payload to the buffer, skipping the first two payload bytes that identify the fragment.

If start_bit is 0 and end_bit is 1, this is the last fragment. Append its payload to the buffer (again skipping the first two bytes that identify the fragment), and your IDR is reconstructed.

If you need some code, just ask in a comment and I'll post it, but I think it is pretty clear how to do this. =)
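The reassembly steps can be sketched roughly as follows. This is an illustrative sketch, not the answerer's code: the class name `FuaAssembler` is hypothetical, it assumes the RTP header has already been removed, and it assumes every fragment carries the two identifying bytes described above. It also writes the 0x000001 prefix used in the question before the reconstructed NAL byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper that collects type-28 fragments into one NAL unit.
class FuaAssembler {
public:
    // Feed one RTP payload (RTP header already stripped).
    // Returns true when the end fragment arrived and nal() is complete.
    bool push(const std::uint8_t* data, std::size_t len) {
        if (len < 2 || (data[0] & 0x1F) != 28) return false;  // not a fragment
        const bool start = (data[1] & 0x80) != 0;
        const bool end = (data[1] & 0x40) != 0;
        if (start) {
            buffer_.assign({0, 0, 1});  // start sequence, as in the question
            // Rebuild the NAL byte: top 3 bits of byte 0, low 5 bits of byte 1.
            buffer_.push_back(
                static_cast<std::uint8_t>((data[0] & 0xE0) | (data[1] & 0x1F)));
        } else if (buffer_.empty()) {
            return false;  // start fragment was never seen; drop this one
        }
        // Skip the two identifying bytes and append the fragment data.
        buffer_.insert(buffer_.end(), data + 2, data + len);
        return end;
    }

    const std::vector<std::uint8_t>& nal() const { return buffer_; }
    void reset() { buffer_.clear(); }

private:
    std::vector<std::uint8_t> buffer_;
};
```

When `push` returns true, `nal()` holds the whole unit and can be handed to the decoder in one call; calling `reset()` afterwards keeps a lost start fragment from appending to stale data. Packet loss and reordering are not handled here.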

CONCERNING THE DECODING

It crossed my mind today why you might fail to decode the IDR (I presume you have reconstructed it correctly). How are you building your AVC Decoder Configuration Record (AVCDCR)? Does the lib that you use build it automatically? If not, and you haven't heard of this, continue reading...

The AVCDCR is specified so that decoders can quickly parse all the data they need to decode an H264 (AVC) video stream. It contains the following data:

  • ProfileIDC
  • ProfileIOP
  • LevelIDC
  • SPS (Sequence Parameter Sets)
  • PPS (Picture Parameter Sets)

All this data is sent in RTSP session in SDP under the fields: profile-level-id and sprop-parameter-sets.

DECODING PROFILE-LEVEL-ID

The profile-level-id string is divided into 3 substrings, each 2 characters long:

[PROFILE IDC][PROFILE IOP][LEVEL IDC]

Each substring represents one byte in base16! So, if Profile IDC is 28, that means it is actually 40 in base10. Later you will use these byte values to construct the AVC Decoder Configuration Record.
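As a rough illustration of that conversion, here is a minimal sketch that turns the six hex characters into three bytes. The struct and function names are hypothetical, and the sample values in the note below are only examples.

```cpp
#include <cstdint>
#include <string>

// Hypothetical container for the three decoded bytes.
struct ProfileLevelId {
    std::uint8_t profile_idc;
    std::uint8_t profile_iop;
    std::uint8_t level_idc;
};

// Parses a six-character hex string; returns false on malformed input.
bool parse_profile_level_id(const std::string& text, ProfileLevelId& out) {
    if (text.size() != 6) return false;
    std::uint8_t bytes[3];
    for (int i = 0; i < 3; ++i) {
        int value = 0;
        for (int j = 0; j < 2; ++j) {
            const char c = text[2 * i + j];
            int digit;
            if (c >= '0' && c <= '9') digit = c - '0';
            else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
            else return false;  // not a hex digit
            value = value * 16 + digit;
        }
        bytes[i] = static_cast<std::uint8_t>(value);
    }
    out = {bytes[0], bytes[1], bytes[2]};
    return true;
}
```

For the string "28001E", this yields profile_idc 40, profile_iop 0 and level_idc 30.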

DECODING SPROP-PARAMETER-SETS

Sprops are usually 2 strings (there could be more) that are comma separated and base64 encoded! You could parse their contents, but there is no need to. Your job here is just to convert each of them from a base64 string into a byte array for later use. Now you have 2 byte arrays: the first array is the SPS, the second one is the PPS.
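A minimal sketch of this step, assuming the standard base64 alphabet and a comma-separated value with no surrounding whitespace. Both function names are hypothetical, and a real project would more likely use an existing base64 routine than this hand-rolled one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes standard base64; stops at '=' padding.
// Returns an empty vector if an invalid character is found.
std::vector<std::uint8_t> base64_decode(const std::string& in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<std::uint8_t> out;
    std::uint32_t acc = 0;  // bit accumulator
    int bits = 0;           // number of valid bits not yet emitted
    for (char c : in) {
        if (c == '=') break;
        const std::size_t pos = alphabet.find(c);
        if (pos == std::string::npos) return {};
        acc = (acc << 6) | static_cast<std::uint32_t>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((acc >> bits) & 0xFF));
        }
    }
    return out;
}

// Splits "SPS,PPS[,...]" on commas and decodes each part.
std::vector<std::vector<std::uint8_t>> decode_sprop(const std::string& sprop) {
    std::vector<std::vector<std::uint8_t>> sets;
    std::size_t begin = 0;
    while (begin <= sprop.size()) {
        std::size_t comma = sprop.find(',', begin);
        if (comma == std::string::npos) comma = sprop.size();
        sets.push_back(base64_decode(sprop.substr(begin, comma - begin)));
        begin = comma + 1;
    }
    return sets;
}
```

With an illustrative value such as "Z0IAHg==,aM4=", the first array starts with 0x67 (an SPS NAL byte) and the second with 0x68 (a PPS NAL byte).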

BUILDING THE AVCDCR

Now you have all you need to build the AVCDCR. Start with a new, empty buffer and write the following into it, in this order:

1 - Byte that has value 1 and represents version

2 - Profile IDC byte

3 - Profile IOP byte

4 - Level IDC byte

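5 - Byte with value 0xFF (six reserved bits set to 1, then a 2-bit field holding the NAL length size minus one; 3 here, meaning 4-byte lengths)

6 - Byte with value 0xE1 (three reserved bits set to 1, then a 5-bit count of SPS arrays; 1 here)

7 - Short (16-bit, big-endian) with value of the SPS array length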

8 - SPS byte array

9 - Byte with the number of PPS arrays (you could have more of them in sprop-parameter-set)

10 - Short with the length of following PPS array

11 - PPS array
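The list above can be sketched as a small builder. This is a sketch under stated assumptions: the function name `build_avcdcr` is hypothetical, it handles exactly one SPS and one PPS, and it writes the two length fields as 16-bit big-endian values.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical builder for a record with exactly one SPS and one PPS.
std::vector<std::uint8_t> build_avcdcr(std::uint8_t profile_idc,
                                       std::uint8_t profile_iop,
                                       std::uint8_t level_idc,
                                       const std::vector<std::uint8_t>& sps,
                                       const std::vector<std::uint8_t>& pps) {
    std::vector<std::uint8_t> r;
    r.push_back(1);            // 1 - version
    r.push_back(profile_idc);  // 2
    r.push_back(profile_iop);  // 3
    r.push_back(level_idc);    // 4
    r.push_back(0xFF);         // 5 - reserved bits + 4-byte NAL lengths
    r.push_back(0xE1);         // 6 - reserved bits + one SPS
    r.push_back(static_cast<std::uint8_t>((sps.size() >> 8) & 0xFF));  // 7 - SPS length, high byte
    r.push_back(static_cast<std::uint8_t>(sps.size() & 0xFF));         //     SPS length, low byte
    r.insert(r.end(), sps.begin(), sps.end());                         // 8 - SPS bytes
    r.push_back(1);                                                    // 9 - one PPS
    r.push_back(static_cast<std::uint8_t>((pps.size() >> 8) & 0xFF));  // 10 - PPS length, high byte
    r.push_back(static_cast<std::uint8_t>(pps.size() & 0xFF));         //      PPS length, low byte
    r.insert(r.end(), pps.begin(), pps.end());                         // 11 - PPS bytes
    return r;
}
```

For a 4-byte SPS and a 2-byte PPS the record is 17 bytes long. Since byte 5 announces 4-byte length fields, a decoder that honours this record will generally expect each NAL unit to be preceded by its 4-byte length rather than by a 0x000001 start sequence, so it is worth checking which of the two forms your decoder is being fed.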

DECODING VIDEO STREAM

Now you have byte array that tells the decoder how to decode H264 video stream. I believe that you need this if your lib doesn't build it itself from SDP...