How to make a live audio streaming website?

Sokco · Mar 21, 2016 · Viewed 8.6k times

I've been wanting to make a live audio streaming service, sort of like twitch. Now before you say this is too difficult and that I should just use a service that's already out there, I would really like to know the nitty-gritty of how to actually do this from the ground up. I've done some research, but the results I've found have been very vague, or directed me to something like Wowza. I've seen some stuff about HTTP Live Streaming and I think I understand the general idea: a microphone/camera sends its feed to an encoder, the encoder sends the feed in m3u8 format to the server, and people stream the m3u8 file from the server to their device. But how do I actually go about doing this? What is the actual programming behind this? Is it necessary to use a service like Wowza or Red5?

Answer

Brad · Mar 25, 2016

I've done some research, but the results I've found have been very vague

Unfortunately, you're asking some very vague questions, which is why you're getting vague answers. Let me take a stab at breaking down your questions into pieces. If you have questions on the specifics, you should post a separate specific question, and then link to it in the comments.

Is it necessary to use a service like Wowza or Red5?

These aren't services (well, Wowza offers some), but servers that handle streaming media. They take your source stream and effectively relay it out to all your listeners. Yes, you need a server of some sort to get your streaming media out to people over the internet, and no, it doesn't need to be Wowza or Red5. There are many other ways to do this, depending on your specific needs.

Let's talk about a simpler method... HTTP progressive streaming. Your clients (web browsers, apps, internet radios, whatever) can play back an audio stream as they receive it. They don't know or care that it's live... all they know is that they made an HTTP request, have received enough data to begin playback, and start playing it. They also don't know or care what the source was... whether it was files transcoded to the stream or someone talking into a microphone. It matters not. In this mode, an internet radio stream is basically like an audio file that never seems to end. If you look into SHOUTcast or Icecast, HTTP progressive is the protocol they speak.
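
To make the "file that never ends" idea concrete, here's a rough sketch of what a client does, assuming Python and its standard library. The names `iter_stream` and `listen` are made up for illustration, and `decoder_input` stands in for whatever actually decodes and plays the audio (for example, a pipe into a player process).

```python
import io
import urllib.request
from typing import BinaryIO, Iterator


def iter_stream(response: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield chunks as they arrive until the server closes the connection."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream ended
            return
        yield chunk


def listen(url: str, decoder_input: BinaryIO) -> None:
    """Feed a live stream into a decoder exactly as if it were a long file."""
    # `url` is a placeholder for a stream URL on your own server.
    with urllib.request.urlopen(url) as response:
        for chunk in iter_stream(response):
            decoder_input.write(chunk)
```

Nothing in this loop knows the stream is live. For a live stream the server simply never finishes the response, so the loop keeps running until someone disconnects.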

For the encoder, the original audio has to come from somewhere, such as an audio capture device (microphone, mixer, etc.) or a bunch of audio files. The raw audio data (generally PCM) is encoded with a codec (such as MP3). The output of that codec is sent to the server, these days by an HTTP PUT request (if you're using Icecast... hacky other methods for SHOUTcast, and SOURCE for old Icecast). The server receives this data, keeps a small buffer of it, and sends a copy of it to clients that connect.
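
The server side of that can be sketched in a few lines. This is only an illustration of the "small buffer plus a copy per client" idea, not how Icecast or SHOUTcast are actually implemented. The `Relay` class and its method names are hypothetical, and the network handling (accepting the source connection, writing to client sockets) is left out.

```python
import queue
import threading
from collections import deque


class Relay:
    """Fan one source stream out to many listeners, with a small buffer."""

    def __init__(self, buffer_chunks: int = 32) -> None:
        # Only the most recent chunks are kept; older ones fall off the end.
        self._recent = deque(maxlen=buffer_chunks)
        self._listeners = []
        self._lock = threading.Lock()

    def publish(self, chunk: bytes) -> None:
        """Called for each chunk of encoded audio received from the source."""
        with self._lock:
            self._recent.append(chunk)
            for listener in self._listeners:
                listener.put(chunk)

    def subscribe(self) -> "queue.Queue[bytes]":
        """Register a listener: it gets the buffered data first, then live data."""
        listener = queue.Queue()
        with self._lock:
            for chunk in self._recent:
                listener.put(chunk)
            self._listeners.append(listener)
        return listener

    def unsubscribe(self, listener: "queue.Queue[bytes]") -> None:
        """Stop sending data to a listener that has disconnected."""
        with self._lock:
            self._listeners.remove(listener)
```

Each client handler would call `subscribe()` once and then write whatever comes out of its queue to the HTTP response. The queues here are unbounded, so a real server would also need to drop clients that fall too far behind.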

If you're streaming MP3, the server just sends the data right back out to the clients as it came in. Other container formats like Ogg require headers to be sent first, before the stream catches up. At that point, the server basically dynamically muxes the stream data into a container on the fly for each client. (Typically this is done by building the header, then splicing in the rest of the stream at the right point.)
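
The header splicing might look something like this in outline. It's a simplified sketch that assumes something upstream has already split the incoming stream into whole pages and picked out the header bytes, which is the hard part and isn't shown. `HeaderedStream` and its methods are names invented for the example.

```python
from typing import Callable, List


class HeaderedStream:
    """Send the stored header to each new client, then splice in live pages."""

    def __init__(self) -> None:
        self._header = b""
        self._clients: List[Callable[[bytes], None]] = []

    def set_header(self, header: bytes) -> None:
        """Remember the container header taken from the start of the source."""
        self._header = header

    def add_client(self, send: Callable[[bytes], None]) -> None:
        """A new client always receives the header before any live data."""
        send(self._header)
        self._clients.append(send)

    def push_page(self, page: bytes) -> None:
        """Forward one complete page of live data to every connected client."""
        for send in self._clients:
            send(page)
```

Because clients are only added between calls to `push_page`, each one joins at a page boundary, which is the "right point" for the splice. With MP3 the header would just be empty and the same code degrades to a plain relay.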

HTTP progressive streaming is advantageous in that it works right out of the box in your browser, is compatible with devices old and new (my old Palm Pilot plays them just fine), and requires very little server resources.

I've seen some stuff about HTTP Live Streaming

HLS is one of the protocols available. Instead of a continually running stream like you get with HTTP progressive, the encoder records the codec output for a few seconds at a time, saves a chunk of data, and uploads it to the server. Clients can then download those chunks in order and play them back. There's a bunch of overhead with this method, but there are some key reasons people choose it:

  • Clients can switch to a different stream at the segment breaks. If the client is streaming some HD video but then finds that it doesn't have the bandwidth to support it, it can start downloading SD video instead. The encoders are typically configured to provide chunks at a variety of bitrates. The container formats used with HLS support this sort of direct stream concatenation because the codec is basically informed to ensure the stream is spliceable at those points.

  • HLS requires no special server. You can just upload files to a web server over SFTP or whatever method you normally use. Nothing to install on top of what would normally be needed for a web page.

  • Since you're storing the data on the server, you automatically can support replay back in time, if the clients can handle it and you have the disk space.

  • CDN distribution. If you want to use something like Cloudfront in front of an S3 bucket, you can, and AWS doesn't have to support you in any different way than if you were distributing any other file.
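
The glue that ties the chunks together is the m3u8 playlist, which is just a text file listing the segments. Here's a minimal sketch of generating one for a live stream, plus the master playlist that lets clients pick a bitrate. The function names, segment file names, and bitrates are all made up for illustration, and a real encoder would add more tags than this.

```python
import math
from typing import List, Tuple


def media_playlist(segments: List[Tuple[str, float]], first_sequence: int) -> str:
    """Build a live media playlist from (uri, duration in seconds) pairs.

    `first_sequence` is the sequence number of the oldest segment still listed.
    There is no end-of-list tag, so clients keep re-fetching the playlist.
    """
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    return "\n".join(lines) + "\n"


def master_playlist(variants: List[Tuple[int, str]]) -> str:
    """Build a master playlist from (bits per second, playlist uri) pairs."""
    lines = ["#EXTM3U"]
    for bandwidth, uri in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth}")
        lines.append(uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical segment names; the encoder rewrites this file every few seconds.
    print(media_playlist([("seg7.ts", 4.0), ("seg8.ts", 4.0), ("seg9.ts", 3.5)], 7))
    print(master_playlist([(64000, "low/index.m3u8"), (128000, "high/index.m3u8")]))
```

Each time the encoder finishes a segment, it uploads the segment and a rewritten playlist. Those are ordinary static files, which is why a plain web server or a CDN is enough.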

A big negative against HLS though is client support. While HTTP progressive streaming has effectively been around since HTTP, HLS is newish and clients aren't very good at it. Most browsers don't support it directly (Safari is the main exception) and require the usage of the MediaSource API and some crafty JavaScript to handle the playback. Mobile apps relying on standard frameworks often run into trouble... Android 3.0 in particular had some really nasty HLS bugs. This is getting better as time goes on.

There is another similar protocol that I won't get into, but it's MPEG-DASH. Segmentation is done similarly to HLS, and it's rapidly eating up HLS's real-world usage.

But how do I actually go about doing this? What is the actual programming behind this?

You'll have to break this problem down into pieces to decide what you want to achieve. Doing what, specifically? Do you want to make an encoder? Make a server?

I've been wanting to make a live audio streaming service, sort of like twitch.

For this, you don't need to invent any of the tech yourself. You can just assemble the pieces already out there. Let's assume "like Twitch" means the following:

  • User generated content
  • Few listeners for every user streaming
  • Some users streaming will have a lot of listeners
  • General load will be unpredictable
  • Everything needs to work in-browser

To do all this, I would say:

  • Don't host the streams on your own, use a CDN.
  • Use the MediaRecorder API for your encoding. (Not widely available yet, but will be soon.)

I'm running out of the character limit on this post... so I hope that gets you started. Please post specific questions beyond that.