Superimposing two videos onto a static image?

video ffmpeg video-processing video-encoding command-line-tool

Archagon · Nov 15, 2012 · Viewed 12k times · Source

Answer

Complex filtergraphs in ffmpeg may seem complicated at first, but they make sense once you have tried them a few times. You need to be familiar with the filtergraph syntax, so start by reading Filtering Introduction and Filtergraph Description. You do not have to understand them completely, but they will help you understand the following example.

Example

two videos over static image

Use the scale video filter to scale (resize) the inputs to a specific size, and then use the overlay video filter to place the videos over the static image.

ffmpeg -loop 1 -i background.png -i video1.mp4 -i video2.mp4 -filter_complex \
"[1:v]scale=(iw/2)-20:-1[a]; \
 [2:v]scale=(iw/2)-20:-1[b]; \
 [0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c]; \
 [c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video]" \
-map "[video]" output.mkv

What this means

Non-filtering options:

-loop 1 Continuously loop the next input which is background.png.
background.png The background image. The stream specifier is [0:v]. It is sized 1280x720.
video1.mp4 The first video input (Big Buck Bunny in the example image). The stream specifier is [1:v]. It is sized 640x360.
video2.mp4 The second video input (the varmints in the example image). The stream specifier is [2:v]. It is sized 640x360.

Filtering options

-filter_complex The option to start the complex filtergraph.
[1:v]scale=(iw/2)-20:-1[a] This takes video1.mp4, referred to as [1:v], and scales it. iw is an alias for input width, which in this case is 640. We divide that in half and subtract an additional 20 pixels as padding so there will be space around each video when it is overlaid. -1 means to automatically calculate a value that will preserve the aspect ratio. Of course you can omit the fanciness and manually provide values such as scale=320:240. Then use an output link label named [a] so we can refer to this output later.
[2:v]scale=(iw/2)-20:-1[b] Same as above, but use video2.mp4 as the input and name the output link label as [b].
[0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c] Use background.png as first overlay input, and use the results of our first scale filter, referred to as [a], as the second overlay input. Place [a] over [0:v]. main_h is an alias for main height which refers to the background input ([0:v]) height. overlay_h is an alias for overlay height and refers to the height of the foreground ([a]). This example will place Big Buck Bunny on the left side. shortest=1 will force the output to terminate when the shortest input terminates; otherwise it will loop forever since background.png is looping. Name the results of this filter [c].
[c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video] Use [c] as the first overlay input and [b] as the second overlay input. main_w is the width of the main input ([c]) and overlay_w is the width of the foreground ([b]), so main_w-overlay_w-10 places [b] 10 pixels in from the right edge. The y expression centers it vertically, as in the previous overlay. This example will place the verminy varmints on the right side. Label the output as [video].
-map "[video]" Map the output from the filter to the output file. The [video] link label at the end of the filtergraph is not strictly required, but it is recommended to be explicit with mapping.
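
To make the arithmetic concrete, here is a small shell sketch that works through the scale and overlay expressions by hand for the sizes given above (a 1280x720 background and 640x360 videos). The variable names are only illustrative, and the integer division only approximates how ffmpeg resolves -1, so the real scaled height may differ by a pixel.

```shell
#!/bin/sh
# Sizes taken from the example: 1280x720 background, 640x360 videos.
bg_w=1280; bg_h=720
in_w=640;  in_h=360

# scale=(iw/2)-20:-1  -> half the input width minus 20 pixels of padding
out_w=$(( in_w / 2 - 20 ))
# -1 keeps the aspect ratio; integer division here is an approximation
# of whatever rounding ffmpeg applies.
out_h=$(( out_w * in_h / in_w ))

# overlay=10:(main_h/2)-(overlay_h/2)  -> left video, vertically centered
left_x=10
# overlay=main_w-overlay_w-10:...      -> right video, 10 pixels from the edge
right_x=$(( bg_w - out_w - 10 ))
y=$(( bg_h / 2 - out_h / 2 ))

echo "scaled size:      ${out_w}x${out_h}"
echo "left overlay at:  ${left_x},${y}"
echo "right overlay at: ${right_x},${y}"
```

With these sizes each video ends up 300 pixels wide, the left one at x=10 and the right one at x=970.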

Audio

Have two separate audio streams

By default only the first input audio stream encountered will be used in the output, as defined in Stream Selection. You can use the -map option to add an additional audio track from the second video input (the output will then have two audio streams). This example will stream copy the audio instead of re-encoding:

ffmpeg -loop 1 -i background.png -i video1.mp4 -i video2.mp4 -filter_complex \
"[1:v]scale=(iw/2)-20:-1[a]; \
 [2:v]scale=(iw/2)-20:-1[b]; \
 [0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c]; \
 [c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video]" \
-map "[video]" -map 1:a -map 2:a -codec:a copy output.mkv

Combine both audio streams

Or combine both audio inputs into one using the amerge and pan audio filters (assuming both inputs are stereo and you want stereo output):

ffmpeg -loop 1 -i background.png -i video1.mp4 -i video2.mp4 -filter_complex \
"[1:v]scale=(iw/2)-20:-1[a]; \
 [2:v]scale=(iw/2)-20:-1[b]; \
 [0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c]; \
 [c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video]; \
 [1:a][2:a]amerge,pan=stereo:c0<c0+c2:c1<c1+c3[audio]" \
-map "[video]" -map "[audio]" output.mkv

Question 2

I have two videos that I'd like to combine into a single video, in which both videos would sit on top of a static background image. (Think something like this.) My requirements are that the software I use is free, that it runs on OS X, and that I don't have to re-encode my videos an excessive number of times. I'd also like to be able to perform this operation from the command line or via script, since I'll be doing it a lot. (But this isn't strictly necessary.)

I tried fiddling with ffmpeg for a couple of hours, but it just doesn't seem very well suited for post-processing. I could potentially hack something together via the overlay feature, but so far I haven't figured out how to do it, aside from painstakingly converting the image to a video (which takes 2x as long as the length of my videos!) and then superimposing the two videos onto it in another rendering step.

Any tips? Thank you!

Update:

Thanks to LordNeckbeard's help, I was able to achieve my desired result with a single ffmpeg call! Unfortunately, encoding is quite slow, taking 6 seconds to encode 1 second of video. I believe this is caused by the background image. Any tips on speeding up encoding? Here's the ffmpeg log:

MacBook-Pro:Video archagon$ ffmpeg -loop 1 -i underlay.png -i test-slide-video-short.flv -i test-speaker-video-short.flv -filter_complex "[1:0]scale=400:-1[a];[2:0]scale=320:-1[b];[0:0][a]overlay=0:0[c];[c][b]overlay=0:0" -shortest -t 5 -an output.mp4
ffmpeg version 1.0 Copyright (c) 2000-2012 the FFmpeg developers
  built on Nov 14 2012 16:18:58 with Apple clang version 4.0 (tags/Apple/clang-421.0.60) (based on LLVM 3.1svn)
  configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --arch=x86_64 --enable-yasm --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid
  libavutil      51. 73.101 / 51. 73.101
  libavcodec     54. 59.100 / 54. 59.100
  libavformat    54. 29.104 / 54. 29.104
  libavdevice    54.  2.101 / 54.  2.101
  libavfilter     3. 17.100 /  3. 17.100
  libswscale      2.  1.101 /  2.  1.101
  libswresample   0. 15.100 /  0. 15.100
  libpostproc    52.  0.100 / 52.  0.100
Input #0, image2, from 'underlay.png':
  Duration: 00:00:00.04, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: png, rgb24, 1024x768, 25 fps, 25 tbr, 25 tbn, 25 tbc
Input #1, flv, from 'test-slide-video-short.flv':
  Metadata:
    author          : 
    copyright       : 
    description     : 
    keywords        : 
    rating          : 
    title           : 
    presetname      : Custom
    videodevice     : VGA2USB Pro V3U30343
    videokeyframe_frequency: 5
    canSeekToEnd    : false
    createdby       : FMS 3.5
    creationdate    : Mon Aug 16 16:35:34 2010
    encoder         : Lavf54.29.104
  Duration: 00:50:32.75, start: 0.000000, bitrate: 90 kb/s
    Stream #1:0: Video: vp6f, yuv420p, 640x480, 153 kb/s, 8 tbr, 1k tbn, 1k tbc
Input #2, flv, from 'test-speaker-video-short.flv':
  Metadata:
    author          : 
    copyright       : 
    description     : 
    keywords        : 
    rating          : 
    title           : 
    presetname      : Custom
    videodevice     : Microsoft DV Camera and VCR
    videokeyframe_frequency: 5
    audiodevice     : Microsoft DV Camera and VCR
    audiochannels   : 1
    audioinputvolume: 75
    canSeekToEnd    : false
    createdby       : FMS 3.5
    creationdate    : Mon Aug 16 16:35:34 2010
    encoder         : Lavf54.29.104
  Duration: 00:50:38.05, start: 0.000000, bitrate: 238 kb/s
    Stream #2:0: Video: vp6f, yuv420p, 320x240, 204 kb/s, 25 tbr, 1k tbn, 1k tbc
    Stream #2:1: Audio: mp3, 22050 Hz, mono, s16, 32 kb/s
File 'output.mp4' already exists. Overwrite ? [y/N] y
using cpu capabilities: none!
[libx264 @ 0x7fa84c02f200] profile High, level 3.1
[libx264 @ 0x7fa84c02f200] 264 - core 119 - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
  Metadata:
    encoder         : Lavf54.29.104
    Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 1024x768, q=-1--1, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 (png) -> overlay:main
  Stream #1:0 (vp6f) -> scale
  Stream #2:0 (vp6f) -> scale
  overlay -> Stream #0:0 (libx264)
Press [q] to stop, [?] for help

Update 2:

It works! One important tweak was to move the underlay.png input to the end of the input list. This increased performance substantially. Here's my final ffmpeg call. (The maps at the end aren't required for this particular arrangement, but I sometimes have a few extra audio inputs that I want to map to my output.)

ffmpeg \
    -i VideoOne.flv \
    -i VideoTwo.flv \
    -loop 1 -i Underlay.png \
    -filter_complex "[2:0] [0:0] overlay=20:main_h/2-overlay_h/2 [overlay];[overlay] [1:0] overlay=main_w-overlay_w-20:main_h/2-overlay_h/2 [output]" \
    -map "[output]" \
    -map 0:a \
    OutputVideo.m4v
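
Since this is meant to be scripted, the filtergraph from the call above could be generated by a small shell function so that the margin is easy to change. This is only a sketch: the function name overlay_graph, the [left] label and the margin parameter are made up for illustration, and it assumes the same input order as above (0 and 1 are the videos, 2 is the background) with the video as the first stream of each input.

```shell
#!/bin/sh
# Hypothetical helper: prints a filtergraph that places input 0 on the left
# and input 1 on the right of the background (input 2), both vertically
# centered, with a configurable margin in pixels (default 20).
overlay_graph() {
    margin=${1:-20}
    # printf reuses the '%s' format for each argument, so the two parts
    # are joined into a single line without a separator.
    printf '%s' \
        "[2:v][0:v]overlay=${margin}:main_h/2-overlay_h/2[left];" \
        "[left][1:v]overlay=main_w-overlay_w-${margin}:main_h/2-overlay_h/2[output]"
}

graph=$(overlay_graph 20)
echo "$graph"

# To run it for real (requires ffmpeg and the input files):
# ffmpeg -i VideoOne.flv -i VideoTwo.flv -loop 1 -i Underlay.png \
#     -filter_complex "$graph" -map "[output]" -map 0:a OutputVideo.m4v
```

Printing the graph first makes it easy to check the string before handing it to ffmpeg.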
