Media Source Extensions (MSE) requires MP4 video to be fragmented before it can be played back in the browser.
A fragmented MP4 contains a series of segments that can be requested individually if your server supports HTTP byte-range requests.
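A byte-range request is an ordinary HTTP GET with a Range header. Here is a minimal Python sketch using only the standard library. The function names build_range_request and fetch_range are our own, and the URL is whatever your server exposes. A server that honours the header answers with 206 Partial Content. A plain 200 means it ignored the header and is sending the whole file.

```python
import urllib.request


def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for bytes start..end (inclusive) of url."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch one byte range, failing if the server ignored the Range header."""
    with urllib.request.urlopen(build_range_request(url, start, end)) as resp:
        # 206 Partial Content: the range was honoured.
        # 200 OK: the server is sending the whole file instead.
        if resp.status != 206:
            raise RuntimeError(f"byte ranges not supported (HTTP {resp.status})")
        return resp.read()
```

Note that HTTP byte ranges are inclusive, so bytes=0-1023 asks for the first 1024 bytes.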
All MP4 files use an object-oriented format built from boxes (also called atoms).
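Every box starts with an 8-byte header: a 32-bit big-endian size, which counts the header itself, followed by a four-character type code such as ftyp, moov or mdat. The following is an illustrative sketch of walking the top-level boxes of a file already read into memory. The helpers iter_boxes and make_box are hypothetical and are not part of any MP4 library. It also handles the two special size values: 1 means a 64-bit size follows, and 0 means "to the end".

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0, end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each box found between start and end."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type code
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # the box runs to the end of the enclosing range
            size = end - pos
        if size < header:  # malformed size; stop instead of looping forever
            break
        yield box_type.decode("latin-1"), pos, size
        pos += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a simple box for experimenting: 32-bit size + type + payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# A toy non-fragmented layout: ftyp, an empty moov and one big mdat.
sample = make_box("ftyp", b"isom") + make_box("moov") + make_box("mdat", bytes(100))
print([box_type for box_type, _, _ in iter_boxes(sample)])  # ['ftyp', 'moov', 'mdat']
```

Because container boxes such as moov hold nothing but more boxes, calling iter_boxes again with start and end set to a box's payload is enough to descend one level.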
You can inspect the boxes in your MP4 with an online tool such as MP4 Parser or, if you're on Windows, MP4 Explorer. Let's compare a normal MP4 with a fragmented one:
This screenshot (from MP4 Parser) shows an MP4 that hasn't been fragmented and simply has one massive mdat (Media Data) box.
If we were building a video player that supports adaptive bitrate streaming, we might need to know the byte position of the 10-second mark in a 0.5 Mbps file and in a 1 Mbps file, so that we could switch the video source between the two files at that moment. Determining that exact byte position inside one massive mdat in each file is not trivial: it has to be worked out from the sample tables stored in the moov box.
This screenshot shows a fragmented MP4 that has been segmented using MP4Box with the onDemand profile.
You'll notice the sidx box and a series of moof+mdat box pairs. The sidx is the Segment Index: it stores metadata giving the precise byte range (and duration) of each moof+mdat segment.
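The sidx layout is defined in the ISO base media file format (ISO/IEC 14496-12). Below is a minimal Python sketch of reading it. It assumes the whole sidx box has already been fetched into memory. It also assumes every entry points directly at a moof+mdat segment (reference_type 0), which is what a single-level index should contain, but that is an assumption about your files. The names parse_sidx and SidxReference are illustrative only.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class SidxReference:
    offset: int        # absolute file offset of the segment's first byte (its moof)
    size: int          # length of the moof+mdat segment in bytes
    start_time: float  # segment start, in seconds
    duration: float    # segment duration, in seconds

    @property
    def byte_range(self) -> str:
        """Inclusive HTTP Range header value for this segment."""
        return f"bytes={self.offset}-{self.offset + self.size - 1}"


def parse_sidx(box: bytes, box_file_offset: int) -> List[SidxReference]:
    """Parse a version 0 or 1 sidx box.

    box holds exactly the bytes of the sidx box; box_file_offset is the
    position in the MP4 file where that box starts.
    """
    size, box_type, version = struct.unpack_from(">I4sB", box, 0)
    if box_type != b"sidx":
        raise ValueError(f"expected sidx, got {box_type!r}")
    # Bytes 9-11 are flags and bytes 12-15 reference_ID; neither is needed here.
    (timescale,) = struct.unpack_from(">I", box, 16)
    if version == 0:
        earliest, first_offset = struct.unpack_from(">II", box, 20)
        pos = 28
    else:  # version 1 widens both fields to 64 bits
        earliest, first_offset = struct.unpack_from(">QQ", box, 20)
        pos = 36
    (reference_count,) = struct.unpack_from(">H", box, pos + 2)  # skip 16 reserved bits
    pos += 4

    # Segment offsets are counted from the first byte after the sidx box.
    offset = box_file_offset + size + first_offset
    start = earliest / timescale
    refs: List[SidxReference] = []
    for _ in range(reference_count):
        type_and_size, duration, _sap_info = struct.unpack_from(">III", box, pos)
        pos += 12
        if type_and_size >> 31:
            # reference_type 1 points at a nested sidx; not handled in this sketch
            raise NotImplementedError("hierarchical sidx not supported")
        ref_size = type_and_size & 0x7FFFFFFF
        refs.append(SidxReference(offset, ref_size, start, duration / timescale))
        offset += ref_size
        start += duration / timescale
    return refs
```

Each SidxReference's byte_range can be sent directly as a Range header to fetch just that segment.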
Essentially, you can load the sidx on its own (its byte range is given in the accompanying .mpd, or Media Presentation Description, file) and then choose which segments you'd like to load next and append to the MSE SourceBuffer.
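In an onDemand-style MPD, each Representation typically carries a SegmentBase element. Its indexRange attribute is the sidx byte range, and its Initialization range covers the ftyp+moov initialization segment, which MSE needs appended before any moof+mdat segment. The sketch below pulls those values out with Python's xml.etree. The sample MPD, its file name and its byte ranges are made up for illustration, and segment_base_ranges is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical onDemand-style MPD: the file name and byte ranges are invented.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="500k" bandwidth="500000">
        <BaseURL>video_500k_dashinit.mp4</BaseURL>
        <SegmentBase indexRange="861-1040">
          <Initialization range="0-860"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""


def segment_base_ranges(mpd_xml: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each Representation id to its URL plus Range header values for
    its initialization segment (ftyp+moov) and its sidx."""
    root = ET.fromstring(mpd_xml)
    result: Dict[str, Dict[str, Optional[str]]] = {}
    for rep in root.iterfind(".//mpd:Representation", DASH_NS):
        seg_base = rep.find("mpd:SegmentBase", DASH_NS)
        if seg_base is None or seg_base.get("indexRange") is None:
            continue  # e.g. SegmentTemplate-based representations have no indexRange
        init = seg_base.find("mpd:Initialization", DASH_NS)
        init_range = init.get("range") if init is not None else None
        result[rep.get("id", "")] = {
            "url": rep.findtext("mpd:BaseURL", default=None, namespaces=DASH_NS),
            "init": f"bytes={init_range}" if init_range else None,
            "index": f"bytes={seg_base.get('indexRange')}",
        }
    return result


ranges = segment_base_ranges(SAMPLE_MPD)
print(ranges["500k"]["index"])  # bytes=861-1040
```

The first number of the index range is also the sidx box's position in the file. You need it to turn the sidx's relative offsets into absolute byte positions.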
Importantly, each segment is created at a regular interval of your choosing (e.g. every 5 seconds, provided the encoder places keyframes on those boundaries), so segments can be temporally aligned across files of different bitrates, which makes it easy to switch bitrate during playback.
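To make the switching idea concrete, here is a small sketch. It turns per-segment durations (as read from each file's sidx) into start times, checks that two renditions share the same boundaries, and finds the segment containing a given time. The function names and the 0.5 Mbps / 1 Mbps durations are hypothetical. With aligned segments, the 10-second mark from the earlier example falls at the start of segment index 2 in both files. A player could therefore switch by requesting that segment's byte range from the other file.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def segment_start_times(durations: Sequence[float], earliest: float = 0.0) -> List[float]:
    """Start time (s) of each segment, given per-segment durations (s)."""
    return [earliest + t for t in accumulate(durations, initial=0.0)][:-1]


def segment_index_at(starts: Sequence[float], t: float) -> int:
    """Index of the segment whose time span contains t (seconds)."""
    if not starts or t < starts[0]:
        raise ValueError("t is before the first segment")
    return bisect_right(starts, t) - 1


def is_aligned(a: Sequence[float], b: Sequence[float], tol: float = 0.001) -> bool:
    """True if both renditions have the same segment boundaries (within tol seconds)."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


# Hypothetical 0.5 Mbps and 1 Mbps renditions, both cut every 5 seconds.
low = segment_start_times([5.0, 5.0, 5.0, 4.2])
high = segment_start_times([5.0, 5.0, 5.0, 4.2])

if is_aligned(low, high):
    i = segment_index_at(low, 10.0)
    print(i, high[i])  # 2 10.0
```

If the boundaries were not aligned, the player would have to look up the index separately in each file and could end up with a gap or an overlap at the switch.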