How can I place a still image before the first frame of a video?

Konstantin · Jun 8, 2014

When I encode videos with FFmpeg, I would like to put a JPG image before the very first video frame, because when I embed the video on a webpage with the HTML5 "video" tag, the browser shows the very first frame as a splash image. Alternatively, I could encode the image as a one-frame video and concatenate it with my encoded video. I don't want to use the "poster" attribute of the HTML5 "video" element.

Answer

Timothy Gu · Jun 9, 2014

You can use the concat filter to do that. The exact command depends on how long you want your splash screen to be. I am pretty sure you don't want a 1-frame splash screen, which lasts only about 1/25 to 1/30 of a second, depending on the video ;)

The Answer

First, you need to get the frame rate of the video. Run ffmpeg -i INPUT and find the tbr value on the video stream line. For example:

$ ffmpeg -i a.mkv
ffmpeg version N-62860-g9173602 Copyright (c) 2000-2014 the FFmpeg developers
  built on Apr 30 2014 21:42:15 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
[...]
Input #0, matroska,webm, from 'a.mkv':
  Metadata:
    ENCODER         : Lavf55.37.101
  Duration: 00:00:10.08, start: 0.080000, bitrate: 23 kb/s
    Stream #0:0: Video: h264 (High 4:4:4 Predictive), yuv444p, 320x240 [SAR 1:1 DAR 4:3], 25 fps, 25 tbr, 1k tbn, 50 tbc (default)
At least one output file must be specified

In the above example, it shows 25 tbr. Remember this number.
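If you want a script to pick that number out for you, one minimal sketch is to filter the report that ffmpeg -i prints on stderr. The function name get_tbr is hypothetical, and the pattern assumes tbr is printed as a plain number such as 25 or 29.97, as in the output above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ffmpeg -i` output on stdin and print the first
# plain-number tbr value (e.g. "25" or "29.97"). Prints nothing if none is found.
get_tbr() {
  sed -n 's/.* \([0-9][0-9.]*\) tbr.*/\1/p' | head -n 1
}

# Usage (ffmpeg -i writes its report to stderr, hence 2>&1):
#   ffmpeg -i a.mkv 2>&1 | get_tbr
# Alternative, if ffprobe is installed (prints a fraction such as 25/1):
#   ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate \
#     -of default=noprint_wrappers=1:nokey=1 a.mkv
```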

Second, you need to concatenate the image with the video. Try this command:

ffmpeg -loop 1 -framerate FPS -t SECONDS -i IMAGE \
       -t SECONDS -f lavfi -i aevalsrc=0 \
       -i INPUTVIDEO \
       -filter_complex '[0:0] [1:0] [2:0] [2:1] concat=n=2:v=1:a=1' \
       [OPTIONS] OUTPUT

If your video doesn't have audio, try this:

ffmpeg -loop 1 -framerate FPS -t SECONDS -i IMAGE \
       -i INPUTVIDEO \
       -filter_complex '[0:0] [1:0] concat=n=2:v=1:a=0' \
       [OPTIONS] OUTPUT

FPS = the tbr value from step 1

SECONDS = how long, in seconds, the image should be shown

IMAGE = the image file name

INPUTVIDEO = the original video file name

[OPTIONS] = optional encoding parameters (such as -vcodec libx264 or -b:a 160k)

OUTPUT = the output video file name
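To avoid filling in the placeholders by hand each time, the two commands can be wrapped in a small function. This is a sketch, not part of the original answer: the name prepend_splash and the NO_AUDIO / DRY_RUN switches are hypothetical, and it keeps the original assumption that stream 0 of INPUTVIDEO is video and stream 1 is audio.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two commands above.
# Usage: prepend_splash FPS SECONDS IMAGE INPUTVIDEO OUTPUT [OPTIONS...]
# NO_AUDIO=1 selects the no-audio variant; DRY_RUN=1 prints the command
# instead of running it.
prepend_splash() {
  local fps=$1 seconds=$2 image=$3 input=$4 output=$5
  shift 5   # whatever remains is passed through as [OPTIONS]

  # Input 0: the still image looped into a SECONDS-long clip at FPS.
  local cmd=(ffmpeg -loop 1 -framerate "$fps" -t "$seconds" -i "$image")
  if [ "${NO_AUDIO:-0}" = 1 ]; then
    # Input 1: the real video (no audio streams involved).
    cmd+=(-i "$input" -filter_complex '[0:0] [1:0] concat=n=2:v=1:a=0')
  else
    # Input 1: SECONDS of silence; input 2: the real video.
    cmd+=(-t "$seconds" -f lavfi -i aevalsrc=0 -i "$input"
          -filter_complex '[0:0] [1:0] [2:0] [2:1] concat=n=2:v=1:a=1')
  fi
  cmd+=("$@" "$output")   # encoding options go before the output name

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 prepend_splash 25 3 splash.jpg a.mkv out.mp4 -c:v libx264 prints the full command for a three-second splash at 25 fps without running ffmpeg.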

How Does This Work?

Let's break the command line down:

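-loop 1 -framerate FPS -t SECONDS -i IMAGE: open the image and loop it into a SECONDS-long video at FPS frames per second. Giving it the same frame rate as the input video keeps the concatenated output at a constant frame rate; the concat filter accepts segments with different frame rates, but the output then has a variable frame rate. The concat filter does require every video segment to have the same width, height, and sample aspect ratio, so the image should have the same dimensions as the video (or be scaled to them).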
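The splash therefore contains roughly FPS × SECONDS frames, which is also why a one-frame splash lasts only 1/25 of a second at 25 fps. A minimal sketch of that arithmetic follows; the function name splash_frames is hypothetical, it accepts either a plain rate (25, 29.97) or a fraction such as 30000/1001, and the count ffmpeg actually produces may differ slightly because of rounding at the -t cutoff.

```bash
#!/usr/bin/env bash
# Hypothetical helper: splash_frames FPS SECONDS
# Prints FPS * SECONDS rounded to the nearest whole frame.
# FPS may be a plain number (25, 29.97) or a fraction (30000/1001).
splash_frames() {
  awk -v fps="$1" -v sec="$2" 'BEGIN {
    n = split(fps, p, "/")
    rate = (n == 2) ? p[1] / p[2] : p[1]
    printf "%d\n", rate * sec + 0.5
  }'
}
```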

-t SECONDS -f lavfi -i aevalsrc=0: generate SECONDS of silence (aevalsrc computes each audio sample from an expression, and the constant expression 0 gives silence). The silence fills the audio track while the splash image is shown. This input isn't needed if the original video has no audio.
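FFmpeg also has an anullsrc source whose only job is producing silence, and it lets you state the channel layout and sample rate explicitly. Here is a sketch that generates such a silent track on its own; the function name make_silence and the stereo, 44.1 kHz values are assumptions to adjust to your video's audio. In the full command, the aevalsrc input would become -t SECONDS -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make_silence SECONDS OUTPUT
# Writes SECONDS of stereo, 44.1 kHz silence using anullsrc
# (an alternative to aevalsrc=0). -t before -i limits the endless source.
make_silence() {
  ffmpeg -v error -y -t "$1" -f lavfi \
    -i anullsrc=channel_layout=stereo:sample_rate=44100 "$2"
}
```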

-i INPUTVIDEO: open the video itself.

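-filter_complex '[0:0] [1:0] [2:0] [2:1] concat=n=2:v=1:a=1': this is the best part. It takes file 0 stream 0 (the image video), file 1 stream 0 (the silent audio), and file 2 streams 0 and 1 (the real video and audio), and concatenates them. The inputs are listed segment by segment, video before audio: first the splash segment, then the original video. The options n, v, and a mean that there are 2 segments, 1 output video stream, and 1 output audio stream. This assumes that stream 0 of INPUTVIDEO is video and stream 1 is audio, which is the usual layout; check the ffmpeg -i output if you are unsure.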
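Two assumptions in that filter can be relaxed inside the filtergraph itself: streams can be selected by type ([2:v:0], [2:a:0]) instead of by index, and the image can be scaled to the video's size before concatenation. A sketch under stated assumptions: the function name prepend_splash_scaled is hypothetical, and the default 320x240 size with a 1:1 SAR comes from the ffmpeg -i output above; set W and H for your own video.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend_splash_scaled FPS SECONDS IMAGE INPUTVIDEO OUTPUT [OPTIONS...]
# Scales the image to W x H (default 320x240) with a 1:1 SAR, then concatenates
# it with the first video and first audio stream of INPUTVIDEO, selected by type.
prepend_splash_scaled() {
  local fps=$1 seconds=$2 image=$3 input=$4 output=$5
  local w=${W:-320} h=${H:-240}
  shift 5
  ffmpeg -loop 1 -framerate "$fps" -t "$seconds" -i "$image" \
    -t "$seconds" -f lavfi -i aevalsrc=0 \
    -i "$input" \
    -filter_complex "[0:v]scale=${w}:${h},setsar=1[img]; [img] [1:a] [2:v:0] [2:a:0] concat=n=2:v=1:a=1" \
    "$@" "$output"
}
```

For instance, W=640 H=480 prepend_splash_scaled 25 3 splash.jpg input.mp4 output.mp4 -c:v libx264 would target a 640x480 video. Scaling straight to W:H stretches the image if its aspect ratio differs from the video's.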

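[OPTIONS] OUTPUT: the encoding parameters and the output file name. For HTML5 playback you'd probably want H.264 video and AAC audio, e.g. -c:v libx264 -crf 23 -pix_fmt yuv420p -c:a libfdk_aac -b:a 128k. libfdk_aac is only available in builds that include it; otherwise use -c:a aac (FFmpeg's built-in encoder), since libfaac was removed in later FFmpeg releases. -pix_fmt yuv420p matters when the source is yuv444p, as in the example above, because browsers generally can't play High 4:4:4 H.264.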
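Since the goal is what the browser shows before playback starts, a quick check is to extract the first decoded frame of OUTPUT and look at it; it should be the splash image. A minimal sketch; the function name first_frame is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: first_frame VIDEO PNG
# Decodes VIDEO and writes only its first video frame to PNG.
first_frame() {
  ffmpeg -v error -y -i "$1" -frames:v 1 "$2"
}
```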

Further information