Top "AVX" questions

Advanced Vector Extensions (AVX) is an extension to the x86 instruction set architecture for microprocessors from Intel and AMD.

Why is this SSE code 6 times slower without VZEROUPPER on Skylake?

I've been trying to figure out a performance problem in an application and have finally narrowed it down to a …

performance x86 intel sse avx
Aligned and unaligned memory access with AVX/AVX2 intrinsics

According to Intel's Software Developer Manual (sec. 14.9), AVX relaxed the alignment requirements of memory accesses. If data is loaded directly …

gcc avx avx2
How can I exchange the low 128 bits and high 128 bits in a 256 bit AVX (YMM) register

I am porting SSE SIMD code to use the 256 bit AVX extensions and cannot seem to find any instruction that …

x86 simd avx
How to use AVX/pclmulqdq on Mac OS X

I am trying to compile a program that uses the pclmulqdq instruction present in new Intel processors. I've installed GCC 4.6 …

gcc assembly osx-lion macports avx
How to sum __m256 horizontally?

I would like to horizontally sum the components of a __m256 vector using AVX instructions. In SSE I could use _…

sse vectorization intrinsics avx
How to use the Intel AVX in Java?

How do I use the Intel AVX vector instruction set from Java? It's a simple question but the answer seems …

java simd avx
How to find the horizontal maximum in a 256-bit AVX vector

I have a __m256d vector packed with four 64-bit floating-point values. I need to find the horizontal maximum of …

x86 simd avx vector-processing avx2
Transpose an 8x8 float using AVX/AVX2

Transposing an 8x8 matrix can be achieved by making four 4x4 matrices, and transposing each of them. This is not …

simd avx avx2
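
The block decomposition mentioned in the excerpt above can be sketched in plain scalar C. This is only an illustration of the idea, not the asker's code and not an AVX implementation: the names `transpose_block4` and `transpose8x8` are hypothetical, the matrix is assumed to be stored row-major in a flat array of 64 floats, and a real AVX version would replace the inner loops with unpack/shuffle/permute intrinsics.

```c
#include <stddef.h>

enum { N = 8, B = 4 };  /* matrix size and block size (assumed for this sketch) */

/* Transpose the 4x4 block whose top-left corner is (r0, c0) in src and
 * write it to the mirrored block position (c0, r0) in dst.
 * Both matrices are row-major, 8 floats per row. */
static void transpose_block4(const float *src, float *dst, size_t r0, size_t c0)
{
    for (size_t i = 0; i < B; ++i) {
        for (size_t j = 0; j < B; ++j) {
            dst[(c0 + j) * N + (r0 + i)] = src[(r0 + i) * N + (c0 + j)];
        }
    }
}

/* Transpose an 8x8 matrix by processing its four 4x4 blocks.
 * Each block is transposed internally, and the two off-diagonal
 * blocks end up swapped because of the mirrored destination. */
static void transpose8x8(const float *src, float *dst)
{
    for (size_t r0 = 0; r0 < N; r0 += B) {
        for (size_t c0 = 0; c0 < N; c0 += B) {
            transpose_block4(src, dst, r0, c0);
        }
    }
}
```

Calling `transpose8x8(a, t)` with two distinct 64-element arrays leaves `t[c * 8 + r] == a[r * 8 + c]` for every row `r` and column `c`; the function is not meant for in-place use.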
How to get data out of AVX registers?

Using MSVC 2013 and AVX 1, I've got 8 floats in a register: __m256 foo = _mm256_fmadd_ps(a,b,c); Now I …

c++ visual-c++ avx fma
How to force gcc to use all SSE (or AVX) registers?

I'm trying to write some computationally intensive code for a Windows x64 target, with SSE or the new AVX instructions, compiling …

gcc 64-bit sse register-allocation avx