Top "SSE" questions

SSE (Streaming SIMD Extensions) was the first of many similarly named vector extensions to the x86 instruction set.

Add a constant value to an xmm register in x86

How would I add 1 or 2 to the register xmm0 (double)? I can do it like this, but surely there must …

assembly x86 sse x87
How to allocate 16-byte-aligned memory

I am trying to implement SSE vectorization on a piece of code for which I need my 1D array to …

c memory sse icc
Best cross-platform method to get aligned memory

Here is the code I normally use to get aligned memory with Visual Studio and GCC: inline void* aligned_malloc(…

c++ c performance sse memory-alignment
SSE integer division?

There is _mm_div_ps for floating-point division, and there is _mm_mullo_epi16 for integer multiplication. But is there …

c++ sse
AVX2: what is the most efficient way to pack left based on a mask?

If you have an input array, and an output array, but you only want to write those elements which pass …

c++ vectorization sse simd avx2
Is it possible to cast floats directly to __m128 if they are 16 byte aligned?

Is it safe/possible/advisable to cast floats directly to __m128 if they are 16 byte aligned? I noticed using _mm_…

c++ c alignment sse intrinsics
SIMD and difference between packed and scalar double precision

I am reading Intel's intrinsics guide while implementing SIMD support. A few things confuse me, and my questions are as …

c++ x86 sse simd intrinsics
How to take the absolute value of 2 doubles or 4 floats using the SSE instruction set? (Up to SSE4)

Here's the sample C code that I am trying to accelerate using SSE. The two arrays are 3072 elements long with …

gcc sse
Efficient 4x4 matrix vector multiplication with SSE: horizontal add and dot product - what's the point?

I am trying to find the most efficient implementation of 4x4 matrix (M) multiplication with a vector (u) using SSE. …

c performance optimization sse matrix-multiplication
Push XMM register to the stack

Is there a way of pushing a packed doubleword integer from an XMM register to the stack, and then later on …

assembly x86 simd sse