Live streaming through MP4

Ivo · Oct 22, 2012

I am working on an online TV service. One of the goals is for the video to be played without any additional browser plug-ins (except for Flash).

I decided to use MP4 because it is supported by most HTML5 browsers and by Flash (as a fallback). The videos are transcoded from ASF on the server by FFmpeg.

However, I found that MP4 cannot be live-streamed as-is, because its metadata lives in a moov atom that has to describe the whole file, including its length. FFmpeg cannot stream MP4 directly to stdout, because it writes the moov at the end of the file. (Related question: Live transcoding and streaming of MP4 works in Android but fails in Flash player with NetStream.Play.FileStructureInvalid error.)
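To see where the moov actually ends up, the top-level boxes of a file can be listed. Below is a minimal Python sketch (standard library only; `iter_top_level_boxes` and `moov_before_mdat` are hypothetical helper names). It relies on the box header layout: a 32-bit big-endian size followed by a four-character type, where size 1 means a 64-bit size follows and size 0 means the box runs to the end of the file.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_top_level_boxes(f: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each top-level box of an MP4 file."""
    offset = 0
    while True:
        f.seek(offset)
        header = f.read(8)
        if len(header) < 8:
            return  # end of file (or a truncated header)
        # Each box starts with a 32-bit big-endian size and a 4-character type.
        size, box_type = struct.unpack(">I4s", header)
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            size = struct.unpack(">Q", f.read(8))[0]
        elif size == 0:
            # The box extends to the end of the file.
            size = f.seek(0, io.SEEK_END) - offset
        if size < 8:
            return  # malformed size; stop instead of looping forever
        yield box_type.decode("latin-1"), offset, size
        offset += size


def moov_before_mdat(f: BinaryIO) -> bool:
    """Return True if the moov box comes before the first mdat box."""
    types = [t for t, _, _ in iter_top_level_boxes(f)]
    if "moov" not in types or "mdat" not in types:
        raise ValueError("file has no moov or no mdat box")
    return types.index("moov") < types.index("mdat")
```

For a file FFmpeg writes to a seekable output with default settings, this would be expected to report mdat before moov. FFmpeg's `-movflags +faststart` moves the moov to the front in a second pass, but that pass also needs a seekable output, so it does not help when writing to stdout.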

Of course, MPEG-TS exists, but it is not supported by HTML5 <video>.

What I have in mind is to transcode the stream to MP4 in real time and, on each new HTTP request for it, first send a moov that declares a very long duration for the video, and then send the rest of the MP4 file.

Is it possible to use MP4 for streaming that way?

After some research and av501's answer, I understand that this cannot work unless the sizes of the frames are known in advance.
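The reason the sizes must be known: the moov's sample table includes a Sample Size box (stsz) with one 32-bit size per sample, alongside chunk offset tables such as stco, so it can only be written once every frame has been encoded. A minimal sketch of serializing stsz following the ISO base media file format layout (`build_stsz` is a hypothetical name; it always uses the per-sample table rather than the constant-size shortcut):

```python
import struct
from typing import Sequence


def build_stsz(sample_sizes: Sequence[int]) -> bytes:
    """Serialize a Sample Size box ('stsz') listing every sample's size."""
    # FullBox header fields: version (8 bits) + flags (24 bits), both zero.
    version_and_flags = 0
    # sample_size == 0 means "sizes differ, see the per-sample table".
    sample_size = 0
    body = struct.pack(">III", version_and_flags, sample_size, len(sample_sizes))
    # One unsigned 32-bit entry per sample, in decoding order.
    body += struct.pack(f">{len(sample_sizes)}I", *sample_sizes)
    # Box header: total size (including this 8-byte header) and type.
    return struct.pack(">I4s", 8 + len(body), b"stsz") + body
```

For example, one hour of 25 fps video has 90,000 video samples, so this table alone holds 360,000 bytes of sizes that only exist after encoding has finished.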

Can the MP4 file be split into smaller segments so that it can be streamed?

Of course, switching to another container/format is an option, but MP4/H.264 is the only format compatible with both Flash and HTML5, so supporting both with any other format would mean transcoding twice.

Answer

Sebastian Annies · Jan 28, 2013

You may use fragmented MP4. A fragmented MP4 file is built as follows:

moov [moof mdat]+
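The grammar above can be checked mechanically against a list of top-level box types. A minimal sketch, assuming the box types have already been read from the file: the answer's grammar omits the ftyp box that normally opens a file, so the check accepts it as an optional prefix, along with an optional trailing mfra index. Other boxes that can appear in real streams (styp, sidx, free) are not handled, and `is_fragmented_layout` is a hypothetical name.

```python
import re
from typing import Iterable

# Optional ftyp, then moov, then one or more (moof, mdat) pairs,
# then an optional mfra (movie fragment random access) box.
_FRAGMENTED = re.compile(r"(ftyp )?moov (moof mdat )+(mfra )?")


def is_fragmented_layout(box_types: Iterable[str]) -> bool:
    """Check a sequence of top-level box types against moov [moof mdat]+."""
    joined = "".join(t + " " for t in box_types)
    return _FRAGMENTED.fullmatch(joined) is not None
```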

The moov box then contains only basic information about the tracks (how many there are, their type, codec initialization data and so on) but no information about the individual samples; it also carries an mvex box announcing that movie fragments follow. The information about sample locations and sample sizes is in the moof boxes: each moof box is followed by an mdat that contains the samples described in that moof. Typically one would choose the length of a (moof, mdat) pair to be around 2, 4 or 8 seconds. The specification does not prescribe a length, but these values seem reasonable for most use cases.
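A minimal sketch of how a muxer might cut a track's samples into (moof, mdat) pairs of roughly a chosen length. Assumptions, all hypothetical: durations are given in seconds as floats (a real muxer works in integer timescale units), and a new fragment may only start at a keyframe, so that a client joining mid-stream can begin decoding at a fragment boundary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    duration: float  # seconds (a real muxer would use timescale ticks)
    size: int        # bytes of encoded data for this sample
    keyframe: bool   # True if decoding can start at this sample


def group_into_fragments(samples: List[Sample], target: float = 4.0) -> List[List[Sample]]:
    """Split samples into groups, each of which becomes one (moof, mdat) pair.

    A new group is started only at a keyframe, and only once the current
    group has reached `target` seconds, so every group after the first
    begins with a keyframe.
    """
    fragments: List[List[Sample]] = []
    current: List[Sample] = []
    elapsed = 0.0
    for s in samples:
        if current and s.keyframe and elapsed >= target:
            fragments.append(current)
            current, elapsed = [], 0.0
        current.append(s)
        elapsed += s.duration
    if current:
        fragments.append(current)
    return fragments
```

With a keyframe every 2 seconds and `target=3.0`, the fragments come out 4 seconds long, because the cut has to wait for the next keyframe; in practice the encoder's keyframe interval should divide the intended fragment length.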

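This is a way to construct a never-ending MP4 stream: since no box has to describe the total length, new (moof, mdat) pairs can be appended indefinitely.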
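On the serving side, this maps onto the per-request idea from the question: every new HTTP client first receives the init segment (ftyp + moov) and then (moof, mdat) pairs from the live edge onward. A minimal, thread-based sketch; `LiveFragmentedStream`, `publish`, `subscribe` and `max_kept` are hypothetical names, and each published fragment is assumed to be one complete moof+mdat pair that starts with a keyframe.

```python
import threading
from collections import deque
from typing import Deque, Iterator


class LiveFragmentedStream:
    """Hypothetical helper: fan one live fragmented MP4 out to many clients.

    Every client first gets the init segment (ftyp + moov), then (moof, mdat)
    fragments starting from the newest one available when it joined.
    """

    def __init__(self, init_segment: bytes, max_kept: int = 8) -> None:
        if max_kept < 1:
            raise ValueError("max_kept must be at least 1")
        self._init = init_segment
        # Only the most recent fragments are kept; older ones are dropped.
        self._fragments: Deque[bytes] = deque(maxlen=max_kept)
        self._published = 0  # total number of fragments published so far
        self._closed = False
        self._cond = threading.Condition()

    def publish(self, fragment: bytes) -> None:
        """Add one complete moof+mdat pair produced by the encoder/muxer."""
        with self._cond:
            self._fragments.append(fragment)
            self._published += 1
            self._cond.notify_all()

    def close(self) -> None:
        """Mark the end of the stream; clients finish after draining."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def subscribe(self) -> Iterator[bytes]:
        """Yield, in order, the byte chunks one client should receive."""
        yield self._init
        with self._cond:
            index = max(self._published - 1, 0)  # join at the newest fragment
        while True:
            with self._cond:
                while index >= self._published and not self._closed:
                    self._cond.wait()
                if index >= self._published:
                    return  # closed and nothing left to send
                oldest = self._published - len(self._fragments)
                index = max(index, oldest)  # skip fragments already dropped
                chunk = self._fragments[index - oldest]
            index += 1
            yield chunk
```

In a WSGI application, for example, the generator returned by `subscribe()` could be returned as the response body with `Content-Type: video/mp4`. The fragments themselves could come from FFmpeg writing fragmented MP4 to a pipe (`-movflags frag_keyframe+empty_moov`), split at box boundaries. A client that falls behind skips dropped fragments, which leaves a gap in its timeline that a given player may or may not tolerate.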