I am considering using VP9 to encode my Blu-rays in the future, since it's an open-source codec. But I cannot get Handbrake or ffmpeg to use more than 50% (4) of my 8 cores. The encoding time is therefore much worse than with x264/x265, which use all cores.
In Handbrake I just set the encoder to VP9 and CQ 19. It makes no difference whether I add threads 8, threads 16 or threads 64 in the parameters field.
Testing ffmpeg on the command line (-c:v libvpx-vp9 -crf 19 -threads 16 -tile-columns 6 -frame-parallel 1 -speed 0) also does not use any more CPU threads.
Is the current encoder not capable of encoding on more than 4 threads or am I doing something wrong?
Libvpx uses tile threading, which means you can have at most as many threads as there are tiles. The -tile-columns option is in log2 format (so -tile-columns 6 requests 64 tile columns), but the actual number is also limited by the frame size. The exact details are here; it basically means that max_tiles = max(1, exp2(floor(log2(sb_cols)) - 2)), where sb_cols = ceil(width / 64.0) is the frame width in 64-pixel superblocks. You can write a small script to calculate the number of tiles for a given horizontal resolution:
Width: 320 (sb_cols: 5), min tiles: 1, max tiles: 1
Width: 640 (sb_cols: 10), min tiles: 1, max tiles: 2
Width: 1280 (sb_cols: 20), min tiles: 1, max tiles: 4
Width: 1920 (sb_cols: 30), min tiles: 1, max tiles: 4
Width: 3840 (sb_cols: 60), min tiles: 1, max tiles: 8
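A minimal sketch of such a script, in Python, could look like the following. The max_tiles part follows the formula above directly. The min tiles part assumes that a tile column may be at most 64 superblocks (4096 pixels) wide; that rule is not stated above, and the function and constant names are my own.

```python
SB_SIZE = 64           # superblock width in pixels, from sb_cols = ceil(width / 64.0)
MAX_TILE_WIDTH_SB = 64  # assumption: a tile column is at most 64 superblocks wide


def tile_column_limits(width):
    """Return (sb_cols, min_tiles, max_tiles) for a frame width in pixels."""
    sb_cols = (width + SB_SIZE - 1) // SB_SIZE  # integer ceil(width / 64)

    # floor(log2(n)) of a positive integer n is n.bit_length() - 1
    max_log2 = max(0, (sb_cols.bit_length() - 1) - 2)
    max_tiles = 1 << max_log2

    # Smallest power-of-two tile count that keeps each tile within the
    # assumed maximum tile width.
    min_log2 = 0
    while (MAX_TILE_WIDTH_SB << min_log2) < sb_cols:
        min_log2 += 1
    min_tiles = 1 << min_log2

    return sb_cols, min_tiles, max_tiles


if __name__ == "__main__":
    for width in (320, 640, 1280, 1920, 3840):
        sb_cols, min_tiles, max_tiles = tile_column_limits(width)
        print(f"Width: {width} (sb_cols: {sb_cols}), "
              f"min tiles: {min_tiles}, max tiles: {max_tiles}")
```

Run as a script, this prints one line per width in the same format as the listing above.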
So even for 1080p (1920 horizontal pixels), you get at most 4 tiles and therefore at most 4 threads. This is a limitation of the bitstream, not of the encoder settings. To get 8 tiles, you need a width of at least 1985 pixels (2048-64+1, which gives sb_cols=32). To get more threads than the maximum number of tiles at a given resolution, you need frame-level multithreading, which libvpx doesn't implement. Other encoders, like x264 and x265, do implement this.
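The 1985-pixel threshold can be derived by inverting the max_tiles formula: n tile columns need sb_cols >= 4 * n, and the smallest width with that many superblock columns is (4 * n - 1) * 64 + 1. A small sketch of that arithmetic (the function name is hypothetical, and it only covers the tile-column limit discussed here):

```python
SB_SIZE = 64  # superblock width in pixels


def min_width_for_tile_columns(tiles):
    """Smallest frame width (in pixels) that allows `tiles` tile columns.

    `tiles` must be a power of two, since tile columns are set in log2 units.
    """
    if tiles < 1 or tiles & (tiles - 1):
        raise ValueError("tiles must be a positive power of two")
    if tiles == 1:
        return 1  # a single tile column is always allowed
    # max_tiles = exp2(floor(log2(sb_cols)) - 2) >= tiles  <=>  sb_cols >= 4 * tiles
    sb_cols_needed = 4 * tiles
    # ceil(width / 64) first reaches sb_cols_needed one pixel past the
    # previous superblock boundary.
    return (sb_cols_needed - 1) * SB_SIZE + 1


if __name__ == "__main__":
    for tiles in (2, 4, 8, 16):
        print(f"{tiles} tile columns need width >= {min_width_for_tile_columns(tiles)}")
```

For 8 tile columns this gives 31 * 64 + 1 = 1985, matching the figure above.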
As some people have already pointed out in the comments and in other answers below, more recent versions of libvpx support -row-mt 1 to enable tile row multi-threading. This can increase the number of tiles by up to 4x in VP9 (since the maximum number of tile rows is 4, regardless of video height). To enable this, use -tile-rows N, where N is the number of tile rows in log2 units (so -tile-rows 1 means 2 tile rows and -tile-rows 2 means 4 tile rows). The total number of active threads will then be equal to $tile_rows * $tile_columns, using the actual tile counts rather than the log2 option values.
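To make the arithmetic concrete, here is a rough sketch that turns a frame width and the two log2 options into the resulting tile counts and a matching set of ffmpeg arguments. It only encodes the rules described in this answer (tile columns capped by width, at most 4 tile rows, threads = rows * columns with -row-mt 1); the helper name is hypothetical, and the actual behaviour of a given libvpx build may differ.

```python
def vp9_thread_plan(width, tile_columns_log2, tile_rows_log2, row_mt=True):
    """Return (tile_cols, tile_rows, threads, ffmpeg_args) for the given options."""
    sb_cols = (width + 63) // 64                      # ceil(width / 64)
    max_cols_log2 = max(0, sb_cols.bit_length() - 3)  # floor(log2(sb_cols)) - 2
    tile_cols = 1 << min(tile_columns_log2, max_cols_log2)
    tile_rows = 1 << min(tile_rows_log2, 2)           # at most 4 tile rows

    # Per the description above, rows only add threads when row-mt is on.
    threads = tile_cols * tile_rows if row_mt else tile_cols

    ffmpeg_args = [
        "-c:v", "libvpx-vp9", "-crf", "19",
        "-row-mt", "1" if row_mt else "0",
        "-tile-columns", str(tile_columns_log2),
        "-tile-rows", str(tile_rows_log2),
        "-threads", str(threads),
    ]
    return tile_cols, tile_rows, threads, ffmpeg_args


if __name__ == "__main__":
    cols, rows, threads, args = vp9_thread_plan(1920, 6, 2)
    print(f"{cols} tile columns x {rows} tile rows -> {threads} threads")
    print("ffmpeg -i input.mkv", " ".join(args), "output.webm")
```

For a 1920-wide source with -tile-columns 6 -tile-rows 2, this works out to 4 tile columns and 4 tile rows, so up to 16 threads under the rule above. The input and output file names are placeholders.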