CPU SIMD vs GPU SIMD?

carmellose · Dec 6, 2014 · Viewed 17.6k times

GPUs use the SIMD paradigm: the same portion of code is executed in parallel and applied to the various elements of a data set.

However, CPUs also use SIMD and provide instruction-level parallelism. For example, as far as I know, SSE-like instructions process several data elements in parallel.

While the SIMD paradigm seems to be used differently in GPUs and CPUs, do GPUs have more SIMD power than CPUs?

In what way are the parallel computational capabilities of a CPU 'weaker' than those of a GPU?

Answer

Ben Adams · Jul 24, 2015

Both CPUs and GPUs provide SIMD, with the most standard conceptual unit being 16 bytes (128 bits); for example, a vector of 4 floats (x, y, z, w).
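To make the 16-byte unit concrete, here is a minimal Python sketch that packs four 32-bit floats into 16 bytes and adds two such vectors lane by lane. It is only a model: Python runs the lanes one after another, whereas real SIMD hardware adds all four with a single instruction. The names `pack_vec4` and `simd_add` are made up for this illustration.

```python
import struct

def pack_vec4(x, y, z, w):
    """Pack four 32-bit floats into 16 bytes, the size of one 128-bit register."""
    return struct.pack("<4f", x, y, z, w)

def simd_add(a, b):
    """Model of one vector add: each of the four lanes is summed independently."""
    lanes_a = struct.unpack("<4f", a)
    lanes_b = struct.unpack("<4f", b)
    return struct.pack("<4f", *(p + q for p, q in zip(lanes_a, lanes_b)))

position = pack_vec4(1.0, 2.0, 3.0, 1.0)   # (x, y, z, w)
offset = pack_vec4(0.5, 0.5, 0.5, 0.0)
moved = simd_add(position, offset)

# 16 bytes * 8 = 128 bits; the lanes hold (1.5, 2.5, 3.5, 1.0)
print(len(moved) * 8, struct.unpack("<4f", moved))
```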

Simplifying:

CPUs then parallelize further by pipelining upcoming instructions so that they proceed faster through a program. The next step is multiple cores, which run independent programs.

GPUs, on the other hand, parallelize by continuing the SIMD approach and executing the same program many times. They do this in two ways. The first is pure SIMD, where a set of programs executes in lock step. This is why branching is bad on a GPU: when the programs in a set disagree on an if statement, both sides must execute and one result is thrown away, so that the lock-step programs proceed at the same rate. The second is single program, multiple data (SPMD), where groups of these sets of identical programs proceed in parallel but not necessarily in lock step.
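The cost of branching in lock step can be sketched in a few lines of Python. This is a conceptual model, not how any particular GPU is implemented: a list stands in for one lock-step set of lanes, and the function name `lockstep_branch` is hypothetical. Every lane computes both sides of the branch, and a per-lane mask then decides which result is kept.

```python
def lockstep_branch(values):
    """Model one lock-step group evaluating `x * 2 if x > 0 else -x`."""
    mask = [x > 0 for x in values]          # per-lane condition
    then_result = [x * 2 for x in values]   # every lane runs the 'then' side
    else_result = [-x for x in values]      # every lane runs the 'else' side
    # Per lane, keep one result and throw the other away.
    return [t if m else e for m, t, e in zip(mask, then_result, else_result)]

print(lockstep_branch([3, -1, 4, -5]))  # [6, 1, 8, 5]
```

Each lane pays for both sides of the branch even though it only needs one of them, which is the wasted work described above.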

The GPU approach is great where the exact same processing needs to be applied to large volumes of data; for example, a million vertices that need to be transformed in the same way, or many millions of pixels that need processing to produce their colour. Assuming they don't stall on data or in the pipeline, GPU programs generally offer more predictable, time-bounded execution because of their restrictions. That in turn is good for temporal parallelism, e.g. when the programs need to repeat their cycle at a fixed rate, such as 60 times a second (roughly 16.7 ms per cycle) for 60 fps.
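The per-frame time budget for the 60 fps example is simple arithmetic, sketched here in Python. The helper names `frame_budget_ms` and `fits_budget` are illustrative only, as are the 12 ms and 20 ms workloads.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

def fits_budget(work_ms, fps):
    """True if a frame's work finishes within the per-frame budget."""
    return work_ms <= frame_budget_ms(fps)

print(round(frame_budget_ms(60), 2))   # 16.67
print(fits_budget(12.0, 60))           # True: 12 ms fits in the budget
print(fits_budget(20.0, 60))           # False: 20 ms misses the 60 fps target
```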

The CPU approach, however, is better for decision making, for performing multiple different tasks at the same time, and for dealing with changing inputs and requests.

Apart from its many other uses and purposes, the CPU is used to orchestrate work for the GPU to perform.