A common operation in my program is scaling a vector by a scalar (V*s, e.g. [1,2,3,4]*2 == [2,4,6,8]). Is there an SSE (or AVX) instruction to do this, other than first loading the scalar into every position of a vector (e.g. _mm_set_ps(2,2,2,2)) and then multiplying?
This is what I do now:
__m128 _scalar = _mm_set_ps(s,s,s,s);
__m128 _result = _mm_mul_ps(_vector, _scalar);
I'm looking for something like...
__m128 _result = _mm_scale_ps(_vector, s);
Depending on your compiler, you may be able to improve the code generation a little by using _mm_set1_ps:
const __m128 scalar = _mm_set1_ps(s);
__m128 result = _mm_mul_ps(vector, scalar);
However, scalar constants like this should only need to be initialised once, outside any loops, so the performance cost should be irrelevant. (Unless the scalar value is changing within the loop?)
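As a rough sketch of what initialising the constant outside the loop could look like when scaling a whole array: the helper name scale_array, its parameters, the unaligned loads/stores and the scalar tail loop are all assumptions made for illustration, not something taken from the question.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: multiplies n floats in data by s, in place. */
static void scale_array(float *data, size_t n, float s)
{
    /* Broadcast s into all four lanes once, before the loop. */
    const __m128 scalar = _mm_set1_ps(s);

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);  /* unaligned load of 4 floats */
        v = _mm_mul_ps(v, scalar);          /* per-lane multiply by s */
        _mm_storeu_ps(data + i, v);         /* unaligned store of 4 floats */
    }

    /* Plain scalar code for the 0..3 leftover elements. */
    for (; i < n; ++i)
        data[i] *= s;
}
```

If the data is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could be used in place of the unaligned variants.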
As always, you should look at the code your compiler generates, and also try running your code under a decent profiler to see where the hotspots really are.