Popular "sse" questions | Page 7

I have two vectors of 4 integers each and I'd like to use a SIMD command to compare them (say generate …

assembly x86 sse simd

I would like to horizontally sum the components of a __m256 vector using AVX instructions. In SSE I could use _…

sse vectorization intrinsics avx

I need to implement a prefix sum algorithm and would need it to be as fast as possible. Ex: [3, 1, 7, 0, 4, 1, 6, 3] should …

c++ sse simd prefix-sum

I have a presentation to make to people who have (almost) no clue of how a GPU works. I think …

cuda hardware opencl gpu sse

Consider a single memory access (a single read or a single write, not read+write) SSE instruction on an x86 …

concurrency x86 thread-safety atomic sse

I'm considering changing some high-performance code that currently requires 16-byte-aligned arrays and uses _mm_load_ps to …

c performance sse

I'm looking for an approximation of the natural exponential function operating on an SSE element, namely __m128 exp( __m128 x ). …

c optimization vectorization sse simd

Suppose I want to add two buffers and store the result. Both buffers are already allocated 16-byte aligned. I found …

x86 sse simd

How do I SIMDize the following code in C (using SIMD intrinsics of course)? I am having trouble understanding SIMD …

c x86 sse simd

I know that x87 has higher internal precision, which is probably the biggest difference that people see between it and …

x86 x86-64 sse fpu x87
