Top "SIMD" questions

Single instruction, multiple data (SIMD) is the concept of having each instruction operate on a small chunk, or vector, of data elements at once rather than on a single element.
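
As a rough illustration of the idea, here is a minimal sketch in C using x86 SSE intrinsics. The function name `add4` is made up for this example; the point is only that one vector add performs four float additions.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: adds four pairs of floats with one vector add. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va  = _mm_loadu_ps(a);     /* load 4 floats (unaligned is fine) */
    __m128 vb  = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);  /* one instruction, four additions */
    _mm_storeu_ps(out, sum);          /* write the 4 results back */
}
```

A scalar loop would issue four separate adds for the same work; other instruction sets (AVX, NEON, and so on) follow the same pattern with different vector widths.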

Fast counting the number of set bits in __m128i register

I should count the number of set bits of a __m128i register. In particular, I should write two functions …

c sse simd sse2 hammingweight
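
For the bit-counting question above, one possible approach (a sketch only, not necessarily the fastest, and not taken from the original thread) is to spill the register into two 64-bit halves and count each with a portable bit trick. The names `popcount64` and `popcount128` are hypothetical, and only SSE2 is assumed.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_storeu_si128 */
#include <stdint.h>

/* Classic SWAR population count of one 64-bit word (no POPCNT needed). */
static unsigned popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            /* 8-bit sums */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);                  /* add all bytes */
}

/* Hypothetical helper: number of set bits in a 128-bit SSE register. */
unsigned popcount128(__m128i v)
{
    uint64_t halves[2];
    _mm_storeu_si128((__m128i *)halves, v);  /* copy the register to memory */
    return popcount64(halves[0]) + popcount64(halves[1]);
}
```

For instance, `popcount128(_mm_set1_epi8(0x0F))` returns 64, since each of the 16 bytes has four bits set. On CPUs with the POPCNT instruction, the scalar step could be replaced by a hardware popcount.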
SSE multiplication 16 x uint8_t

I want to multiply with SSE4 a __m128i object with 16 unsigned 8 bit integers, but I could only find an …

x86 sse simd sse4
Using SSE instructions with gcc without inline assembly

I am interested in using the SSE vector instructions of x86-64 with gcc and don't want to use any …

c x86-64 sse simd intrinsics
Fast Vector Math in .NET - What are the options?

My 3D graphics software, written in C# using SlimDX, does a lot of vector operations on the CPU. (In this …

c# .net sse simd slimdx
Why is strcmp not SIMD optimized?

I've tried to compile this program on an x64 computer: #include <cstring> int main(int argc, char* argv[]) { …

c++ sse simd strcmp sse2
How can I exchange the low 128 bits and high 128 bits in a 256 bit AVX (YMM) register

I am porting SSE SIMD code to use the 256 bit AVX extensions and cannot seem to find any instruction that …

x86 simd avx
How to compare two vectors using SIMD and get a single boolean result?

I have two vectors of 4 integers each and I'd like to use a SIMD command to compare them (say generate …

assembly x86 sse simd
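
A common way to get a single boolean out of a four-integer comparison, sketched here with C intrinsics rather than assembly, is a lane-wise compare followed by a mask extraction. The function name `all_equal4` is hypothetical, and the sketch assumes "equal in all four lanes" is the result wanted.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi32, _mm_movemask_epi8 */
#include <stdint.h>

/* Hypothetical helper: returns 1 if all four 32-bit lanes match, else 0. */
int all_equal4(const int32_t *a, const int32_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi32(va, vb);    /* lane = all ones where equal */
    /* movemask gathers the top bit of each of the 16 bytes into an int;
       all 16 bits set means every lane compared equal. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

Testing the mask against zero instead would answer "is any lane equal", so the same two instructions cover both variants.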
How to use the Intel AVX in Java?

How do I use the Intel AVX vector instruction set from Java? It's a simple question but the answer seems …

java simd avx
How to find the horizontal maximum in a 256-bit AVX vector

I have a __m256d vector packed with four 64-bit floating-point values. I need to find the horizontal maximum of …

x86 simd avx vector-processing avx2
Transpose an 8x8 float using AVX/AVX2

Transposing an 8x8 matrix can be achieved by making four 4x4 matrices, and transposing each of them. This is not …

simd avx avx2