How to stream live video with minimal latency (ffplay, mplayer), and what kind of wrapper could be used with ffplay?

user573014 · Jan 19, 2014 · Viewed 30.5k times

I have been testing playback of multiple live streams with different players because I want the lowest possible latency. I tried GStreamer (gst-launch-0.10), mplayer, totem, and FFmpeg's player (ffplay). For each of them I tried different configuration options to minimise latency, for example:

ffplay -fflags nobuffer 
mplayer -benchmark
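
A fuller set of low-latency ffplay options I have been experimenting with is shown below (I am not certain every flag here actually helps, so treat it as a starting point rather than a known-good recipe; udp://$HOST_IP:12345 is just a placeholder for my actual stream address):

ffplay -fflags nobuffer -flags low_delay -probesize 32 -analyzeduration 0 \
  -framedrop -sync ext udp://$HOST_IP:12345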

The streams arrive over UDP, and I am getting better values with ffplay than with mplayer or gst-launch. To be honest, I don't know what configuration GStreamer needs to achieve lower latency (see the pipeline sketch after the list below for what I have tried). Now, what I need is two things:

  1. I would like suggestions for streaming a live source with latency below 100 ms. I am currently getting more than 100 ms, which is not good enough for my use case.

  2. Since ffplay is currently giving me the best results, I would like to build a simple GUI around it with a play button, a record button, and three screens streaming from different video servers. I just don't know what kind of wrapper (which should be really fast) to use!
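
For completeness, the closest I have come to a low-latency GStreamer pipeline is sketched below (GStreamer 1.0 syntax; the element choices and sync=false are my guesses at the right knobs, assuming an H.264 stream in MPEG-TS over UDP, so corrections are welcome):

gst-launch-1.0 udpsrc port=12345 caps="video/mpegts" ! tsdemux ! h264parse ! \
  avdec_h264 ! videoconvert ! autovideosink sync=false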

Answer

Wil · Jun 10, 2015

Well, for a really low latency streaming scenario, you could try NTSC. Its latency can be under 63us (microseconds) ideally.

For digital streaming with quality approaching NTSC and a 40 ms latency budget, see rsaxvc's answer about streaming at 120 Hz. If you need over-the-air streaming, that is the best low-latency option I've seen: it's very well thought out, and the resolution scales with hardware capability.

If you mean digital streaming and you want good compression ratios, e.g. 1080p over WiFi, then you are out of luck if you want less than 100 ms of latency with today's commodity hardware, because a compression algorithm needs a lot of context to achieve a good compression ratio. For example, MPEG-1 used 12 frames in an IPBBPBBPBBPB GOP (group of pictures) arrangement, where an I-frame is an 'intra' frame (effectively a JPEG still), a P-frame is a predictive frame that encodes motion relative to earlier frames, and B-frames encode spot fix-ups where the prediction didn't work very well. Anyhow, 12 frames even at 60 fps is still 200 ms, so that's 200 ms just to capture the data, then some time to encode it, transmit it, and decode it, then some time to buffer the audio so the sound card doesn't run out of data while the CPU is sending the next block to the DMA memory region, and at the same time 2-3 frames of video need to be queued up for the display to prevent tearing. So realistically there's a minimum of about 15 frames, or 250 ms, plus whatever latency the transmission itself adds.

NTSC doesn't have such latencies because it is transmitted as analog, with the only 'compression' being two sneaky tricks. The first is interlacing: only half of each frame is transmitted at a time, as alternating rows, even rows on one field and odd rows on the next. The second is colour-space compression: the colour is carried on a subcarrier whose phase determines the hue, and it rides along with the black-and-white (luma) signal at roughly 1/3 of its bandwidth. Cool, eh?

And I guess you could say the audio has a sort of 'compression' as well, in that automatic gain control could make a 20 dB analog audio signal feel closer to a 60 dB experience, blasting our ears out of our heads during commercials because the AGC jacked the volume up during the 2-3 seconds of silence between the show and the commercial. Later, when we got higher-fidelity audio circuits, commercials were simply broadcast louder than the shows, but that was just their way of giving advertisers the same impact the older TVs had.
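
If you want to see how much context your own stream's encoder is actually buffering, you can inspect the GOP structure with ffprobe; something like the command below should print the frame types (I/P/B) as they arrive, with udp://$HOST_IP:12345 standing in for your stream:

ffprobe -select_streams v:0 -show_frames -show_entries frame=pict_type \
  -of csv udp://$HOST_IP:12345 | head -n 60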

This walk down memory lane brought to you by Nostalgia (tm). Buy Nostalgia brand toilet soap! ;-)

Here's the best I've achieved under Ubuntu 18.04 with stock ffmpeg and mpv. It requires a 3rd-generation Intel Core processor or later (for VAAPI hardware encoding). See the FFmpeg site for directions on using NVIDIA hardware encoding instead.

ffmpeg -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
  -vaapi_device /dev/dri/renderD128 \
  -vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
  -c:v h264_vaapi -qp:v 26 -bf 0 -tune zerolatency -f mpegts \
  udp://$HOST_IP:12345

And then on the Media box:

mpv --no-cache --untimed --no-demuxer-thread --video-sync=audio \
  --vd-lavc-threads=1 udp://$HOST_IP:12345 

This achieves about 250 ms latency for 1080p at 60 Hz at around 3 Mbps, which is fine for streaming shows over WiFi. mpv can adjust for lip sync (Ctrl + and Ctrl - during playback). It's tolerable for streaming desktop mouse/keyboard interaction for media control, but it's unusable for real-time gaming (see NVIDIA Shield or Google Stadia for remote gaming).

One other thing: LCD/OLED/plasma TVs, and some LCD monitors, apply frame interpolation, either as part of de-interlacing or as motion smoothing (the "Soap Opera Effect"), and this processing adds input lag. You can usually turn it off in the display's settings, or by connecting to an input marked "PC" or "Console" if the display has one. Some displays let you rename their inputs; in that case, selecting "PC" or "Console" may reduce the input lag, but you may notice colour banding, flickering, etc., as a result of the extra processing being turned off.

CRT monitors have effectively zero input lag. But you'll get baked with ionizing radiation. Pick your poison.