Top "SSE" questions

SSE (Streaming SIMD Extensions) was the first of many similarly-named vector extensions to the x86 instruction set.

How to compare two vectors using SIMD and get a single boolean result?

I have two vectors of 4 integers each and I'd like to use a SIMD command to compare them (say generate …

assembly x86 sse simd
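
For the vector comparison question above, one common approach can be sketched as follows. The excerpt is truncated, so this assumes the comparison wanted is equality of all four 32-bit lanes and that SSE2 is available. The helper name `all_equal_epi32` is made up for illustration and is not taken from the question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdbool.h>

/* Hypothetical helper: returns true only if all four 32-bit lanes of
 * a and b are equal. Assumes the desired comparison is equality. */
static bool all_equal_epi32(__m128i a, __m128i b)
{
    /* Each lane becomes all ones where equal, all zeros otherwise. */
    __m128i eq = _mm_cmpeq_epi32(a, b);

    /* Collect the top bit of each of the 16 bytes into an int mask.
     * All 16 bits set means every lane compared equal. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

Testing the mask for nonzero instead of `0xFFFF` would give an "any lane equal" result rather than "all lanes equal".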
How to sum __m256 horizontally?

I would like to horizontally sum the components of a __m256 vector using AVX instructions. In SSE I could use _…

sse vectorization intrinsics avx
SIMD prefix sum on Intel CPU

I need to implement a prefix sum algorithm and would need it to be as fast as possible. Ex: [3, 1, 7, 0, 4, 1, 6, 3] should …

c++ sse simd prefix-sum
Is it fair to compare SSE/AVX units to GPU cores?

I have a presentation to make to people who have (almost) no clue of how a GPU works. I think …

cuda hardware opencl gpu sse
SSE instructions: which CPUs can do atomic 16B memory operations?

Consider a single memory access (a single read or a single write, not read+write) SSE instruction on an x86 …

concurrency x86 thread-safety atomic sse
Is the SSE unaligned load intrinsic any slower than the aligned load intrinsic on x86_64 Intel CPUs?

I'm considering changing some high-performance code that currently requires 16-byte-aligned arrays and uses _mm_load_ps to …

c performance sse
Fastest Implementation of the Natural Exponential Function Using SSE

I'm looking for an approximation of the natural exponential function operating on an SSE element, namely __m128 exp( __m128 x ). …

c optimization vectorization sse simd
SSE: Difference between _mm_load/store vs. using direct pointer access

Suppose I want to add two buffers and store the result. Both buffers are already allocated 16-byte aligned. I found …

x86 sse simd
SIMD the following code

How do I SIMDize the following code in C (using SIMD intrinsics of course)? I am having trouble understanding SIMD …

c x86 sse simd
Benefits of x87 over SSE

I know that x87 has higher internal precision, which is probably the biggest difference that people see between it and …

x86 x86-64 sse fpu x87