Top "SIMD" questions

Single instruction, multiple data (SIMD) is the concept of having each instruction operate on a small chunk or vector of data elements.
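
As a rough illustration of the definition above, the sketch below contrasts a scalar loop with a loop that handles four floats per operation. It assumes GCC or Clang, because the `vector_size` attribute is a compiler extension rather than standard C. The names `v4f`, `add_scalar`, and `add_simd` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension (assumption: not standard C):
   a 16-byte vector holding four floats. */
typedef float v4f __attribute__((vector_size(16)));

/* Scalar version: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Vector version: each `+` adds four floats at once.
   To keep the sketch short, n must be a multiple of 4. */
void add_simd(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4f va, vb, vc;
        memcpy(&va, a + i, sizeof va);   /* load with no alignment requirement */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                    /* element-wise add of four lanes */
        memcpy(out + i, &vc, sizeof vc); /* store the four results */
    }
}
```

On targets with SIMD hardware the compiler will usually lower `va + vb` to a single vector instruction, but that depends on the target and the optimization flags.
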

How to compile TensorFlow with SSE4.2 and AVX instructions?

This is the message received from running a script to check if TensorFlow is working: I tensorflow/stream_executor/dso_…

tensorflow x86 compiler-optimization simd compiler-options
Do all 64-bit Intel architectures support SSSE3/SSE4.1/SSE4.2 instructions?

I searched the web and the Intel software manual, but I am unable to confirm whether all Intel 64 architectures support up to …

x86-64 intel cpu-architecture simd
Header files for x86 SIMD intrinsics

Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible …

x86 header-files sse simd intrinsics
How to determine if memory is aligned?

I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To …

c optimization memory sse simd
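
One common way to approach the alignment question above is to test the low bits of the pointer's integer value. This is a minimal sketch, not an answer taken from the thread. The helper name `is_aligned` is hypothetical, and it assumes `uintptr_t` is available and that the alignment is a power of two (16 bytes for aligned SSE loads and stores).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if ptr is a multiple of `alignment` bytes.
   `alignment` must be a nonzero power of two, e.g. 16 for SSE. */
bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power-of-two alignment, the low bits of an aligned
       address are all zero, so masking them must yield 0. */
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

A call such as `is_aligned(buf, 16)` can then guard the choice between aligned and unaligned load instructions.
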
SSE intrinsic functions reference

Does anyone know of a reference listing the operation of the SSE intrinsic functions for gcc, i.e. the functions …

c++ c gcc sse simd
How to vectorize with gcc?

The v4 series of the gcc compiler can automatically vectorize loops using the SIMD processor on some modern CPUs, such …

gcc compiler-optimization simd auto-vectorization vector-processing
ARM Cortex-A8: What's the difference between VFP and NEON?

In the ARM Cortex-A8 processor, I understand what NEON is: it is a SIMD co-processor. But is VFP (Vector Floating Point) …

arm simd neon cortex-a8
Fastest way to do horizontal SSE vector sum (or other reduction)

Given a vector of three (or four) floats, what is the fastest way to sum them? Is SSE (movaps, shuffle, …

assembly optimization floating-point sse simd
Implementation of __builtin_clz

What is the implementation of GCC's (4.6+) __builtin_clz? Does it correspond to some CPU instruction on Intel x86_64 (AVX)?

c gcc cpu simd
Parallel for vs omp simd: when to use each?

OpenMP 4.0 introduces a new construct called "omp simd". What is the benefit of using this construct over the old "parallel …

c++ c performance openmp simd