Streaming H.264 over RTP with libavformat

Jacob Peddicord · Apr 13, 2012 · Viewed 12.4k times

I've been trying over the past week to implement H.264 streaming over RTP, using x264 as an encoder and libavformat to pack and send the stream. Problem is, as far as I can tell it's not working correctly.

Right now I'm just encoding random data (from x264_picture_alloc) and extracting NAL units from libx264. This is fairly simple:

x264_picture_t pic_out;
x264_nal_t* nals;
int num_nals;
int frame_size = x264_encoder_encode(this->encoder, &nals, &num_nals, this->pic_in, &pic_out);

if (frame_size <= 0)
{
    return frame_size;
}

// push NALs into the queue
for (int i = 0; i < num_nals; i++)
{
    // create a NAL storage unit
    NAL nal;
    nal.size = nals[i].i_payload;
    nal.payload = new uint8_t[nal.size];
    memcpy(nal.payload, nals[i].p_payload, nal.size);

    // push the storage into the NAL queue
    {
        // lock and push the NAL to the queue
        boost::mutex::scoped_lock lock(this->nal_lock);
        this->nal_queue.push(nal);
    }
}

nal_queue is used for safely passing frames over to a Streamer class which will then send the frames out. Right now it's not threaded, as I'm just testing to try to get this to work. Before encoding individual frames, I've made sure to initialize the encoder.

But I don't believe x264 is the issue, as I can see frame data in the NALs it returns. Streaming the data is accomplished with libavformat, which is first initialized in a Streamer class:

Streamer::Streamer(Encoder* encoder, string rtp_address, int rtp_port, int width, int height, int fps, int bitrate)
{
    this->encoder = encoder;

    // initialize the AV context
    this->ctx = avformat_alloc_context();
    if (!this->ctx)
    {
        throw runtime_error("Couldn't initialize AVFormat output context");
    }

    // get the output format
    this->fmt = av_guess_format("rtp", NULL, NULL);
    if (!this->fmt)
    {
        throw runtime_error("Unsuitable output format");
    }
    this->ctx->oformat = this->fmt;

    // try to open the RTP stream
    snprintf(this->ctx->filename, sizeof(this->ctx->filename), "rtp://%s:%d", rtp_address.c_str(), rtp_port);
    if (url_fopen(&(this->ctx->pb), this->ctx->filename, URL_WRONLY) < 0)
    {
        throw runtime_error("Couldn't open RTP output stream");
    }

    // add an H.264 stream
    this->stream = av_new_stream(this->ctx, 1);
    if (!this->stream)
    {
        throw runtime_error("Couldn't allocate H.264 stream");
    }

    // initialize codec
    AVCodecContext* c = this->stream->codec;
    c->codec_id = CODEC_ID_H264;
    c->codec_type = AVMEDIA_TYPE_VIDEO;
    c->bit_rate = bitrate;
    c->width = width;
    c->height = height;
    c->time_base.den = fps;
    c->time_base.num = 1;

    // write the header
    av_write_header(this->ctx);
}

This is where things seem to go wrong. av_write_header above seems to do absolutely nothing; I've used Wireshark to verify this. For reference, I use Streamer streamer(&enc, "10.89.6.3", 49990, 800, 600, 30, 40000); to initialize the Streamer instance, where enc is the Encoder object used to handle x264 previously.

Now when I want to stream out a NAL, I use this:

// grab a NAL
NAL nal = this->encoder->nal_pop();
cout << "NAL popped with size " << nal.size << endl;

// initialize a packet
AVPacket p;
av_init_packet(&p);
p.data = nal.payload;
p.size = nal.size;
p.stream_index = this->stream->index;

// send it out
av_write_frame(this->ctx, &p);

At this point, I can see RTP data appearing over the network, and it looks like the frames I've been sending, even including a little copyright blob from x264. But no player I've used has been able to make any sense of the data. VLC quits, wanting an SDP description, which apparently isn't required.

I then tried to play it through gst-launch:

gst-launch udpsrc port=49990 ! rtph264depay ! decodebin ! xvimagesink

This will sit waiting for UDP data, but when it is received, I get:

ERROR: element /GstPipeline:pipeline0/GstRtpH264Depay:rtph264depay0: No RTP format was negotiated.
Additional debug info:
gstbasertpdepayload.c(372): gst_base_rtp_depayload_chain (): /GstPipeline:pipeline0/GstRtpH264Depay:rtph264depay0:
Input buffers need to have RTP caps set on them. This is usually achieved by setting the 'caps' property of the upstream source element (often udpsrc or appsrc), or by putting a capsfilter element before the depayloader and setting the 'caps' property on that. Also see http://cgit.freedesktop.org/gstreamer/gst-plugins-good/tree/gst/rtp/README

Since I'm not using GStreamer to do the streaming itself, I'm not quite sure what it means by RTP caps. But it makes me wonder if I'm not sending enough information over RTP to describe the stream. I'm pretty new to video and I feel like there's some key thing I'm missing here. Any hints?

Answer

George Skoptsov · Apr 13, 2012

H.264 is an encoding standard. It specifies how video data is compressed and stored in a format that can be decompressed into a video stream at a later point.

RTP is a transport protocol. It specifies the format and ordering of packets that carry audio or video data produced by an arbitrary encoder.
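
As a rough illustration of what an RTP packet does and does not say about its contents, here is a sketch that decodes the 12-byte fixed header defined in RFC 3550. The struct and function names are hypothetical. Note that the header only carries a numeric payload type; values in the dynamic range (96 to 127) have no fixed meaning, so the receiver has to learn from somewhere else that the payload is H.264.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the fixed RTP header (RFC 3550, section 5.1).
struct RtpHeader {
    unsigned version;              // 2 for current RTP
    bool padding;
    bool extension;
    unsigned csrc_count;
    bool marker;
    unsigned payload_type;         // 96..127 are dynamically assigned
    std::uint16_t sequence_number;
    std::uint32_t timestamp;
    std::uint32_t ssrc;
};

// Reads a 32-bit big-endian (network order) integer.
std::uint32_t read_be32(const std::uint8_t* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Decodes the fixed header; returns false if the packet is shorter than 12 bytes.
bool parse_rtp_header(const std::uint8_t* data, std::size_t size, RtpHeader& out)
{
    if (size < 12)
        return false;
    out.version = data[0] >> 6;
    out.padding = (data[0] & 0x20) != 0;
    out.extension = (data[0] & 0x10) != 0;
    out.csrc_count = data[0] & 0x0F;
    out.marker = (data[1] & 0x80) != 0;
    out.payload_type = data[1] & 0x7F;
    out.sequence_number = static_cast<std::uint16_t>((data[2] << 8) | data[3]);
    out.timestamp = read_be32(data + 4);
    out.ssrc = read_be32(data + 8);
    return true;
}
```

Running this over a packet captured from the sender would show which payload type it uses, which is the number a stream description has to map to H.264.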

GStreamer expects to receive data that conforms to the RTP protocol. Is your expectation that libavformat will produce RTP packets immediately readable by GStreamer warranted? Maybe GStreamer expects an additional stream description that would enable it to accept and decode the streamed packets using the proper decoder. Maybe it requires an additional RTSP exchange or an SDP stream descriptor file.

The error message states pretty clearly that an RTP format has not been negotiated. "Caps" is shorthand for capabilities. The receiver needs to know the transmitter's capabilities to set up its receiving and decoding machinery correctly.

I strongly suggest trying at least to create an SDP file for your RTP stream. libavformat should be able to do it for you.
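
libavformat exposes av_sdp_create for generating the description from an AVFormatContext. To show roughly what such a description contains, here is a hand-written sketch that builds a minimal SDP for a single H.264 stream. The function name make_h264_sdp is hypothetical, and the origin line, session name, and packetization-mode=1 are assumptions that should be checked against what the sender really emits.

```cpp
#include <sstream>
#include <string>

// Builds a minimal SDP description for one H.264 video stream sent over RTP.
// dest_address and port are where the RTP packets are sent; payload_type must
// match the payload type in the RTP headers (assumed to be a dynamic value
// such as 96).
std::string make_h264_sdp(const std::string& dest_address, int port, int payload_type)
{
    std::ostringstream sdp;
    sdp << "v=0\r\n"                                    // protocol version
        << "o=- 0 0 IN IP4 127.0.0.1\r\n"               // origin (placeholder values)
        << "s=H.264 test stream\r\n"                    // session name (placeholder)
        << "c=IN IP4 " << dest_address << "\r\n"        // connection address
        << "t=0 0\r\n"                                  // unbounded session time
        << "m=video " << port << " RTP/AVP " << payload_type << "\r\n"
        << "a=rtpmap:" << payload_type << " H264/90000\r\n"  // codec and 90 kHz clock
        << "a=fmtp:" << payload_type << " packetization-mode=1\r\n";
    return sdp.str();
}
```

Saving the result of make_h264_sdp("10.89.6.3", 49990, 96) to a .sdp file and opening that file in a player is one way to hand over this information. A real description may also need a sprop-parameter-sets attribute if the SPS and PPS are not sent in-band.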