I am porting SSE SIMD code to use the 256-bit AVX extensions and cannot seem to find any instruction that will blend, shuffle, or move data between the high 128 bits and the low 128 bits.
The backstory:
What I really want is for VHADDPS/_mm256_hadd_ps to act like HADDPS/_mm_hadd_ps, only on 256-bit vectors. Unfortunately, it acts like two independent HADDPS operations, one on the low 128-bit lane and one on the high 128-bit lane.
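The lane-by-lane behavior described above can be seen with a small test. This is a minimal sketch, assuming GCC or Clang on x86 (the `target("avx")` attribute lets it compile without `-mavx`) and a CPU that supports AVX; `hadd256` is a hypothetical wrapper name, not part of any library.

```c
#include <immintrin.h>

/* Hypothetical wrapper around _mm256_hadd_ps.
   Compiled for AVX via the GCC/Clang target attribute; the CPU must support AVX. */
__attribute__((target("avx")))
void hadd256(const float a[8], const float b[8], float out[8])
{
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    /* Each 128-bit lane is processed on its own:
       out = { a0+a1, a2+a3, b0+b1, b2+b3,    <- low lane
               a4+a5, a6+a7, b4+b5, b6+b7 }   <- high lane */
    _mm256_storeu_ps(out, _mm256_hadd_ps(va, vb));
}
```

With a = {0, ..., 7} and b = {10, ..., 17}, out is {1, 5, 21, 25, 9, 13, 29, 33}: the sums from b appear in the middle of the vector rather than in the upper half, which is what a full-width HADDPS would produce.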
Using VPERM2F128, one can swap the low and high 128-bit halves (among other permutations). The intrinsic call looks like this:
x = _mm256_permute2f128_ps(x, x, 1);
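The third argument is an 8-bit control word that gives the user a lot of flexibility: bits 1:0 choose which 128-bit half goes into the low half of the result (0 = low half of the first source, 1 = high half of the first source, 2 = low half of the second source, 3 = high half of the second source), bits 5:4 choose the high half of the result in the same way, and setting bit 3 or bit 7 zeroes the low or high half instead. A control word of 1 therefore moves the high half of x into the low half and the low half of x into the high half. See the Intel Intrinsics Guide for details.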
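One way to get the full-width HADDPS behavior is to use VPERM2F128 to regroup the inputs before the in-lane add, so that each lane holds the data it should sum. The sketch below assumes GCC or Clang on x86 and an AVX-capable CPU; `swap_halves` and `hadd_full256` are hypothetical helper names, and this is one possible AVX-only approach rather than the only or necessarily fastest one.

```c
#include <immintrin.h>

/* Hypothetical helper: swap the two 128-bit halves of an 8-float vector.
   Control 0x01: low half of result = high half of x (bits 1:0 = 1),
                 high half of result = low half of x (bits 5:4 = 0). */
__attribute__((target("avx")))
void swap_halves(const float in[8], float out[8])
{
    __m256 x = _mm256_loadu_ps(in);
    x = _mm256_permute2f128_ps(x, x, 1);
    _mm256_storeu_ps(out, x);
}

/* Hypothetical helper: horizontal add behaving like HADDPS widened to 256 bits,
   out = { a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7 }. */
__attribute__((target("avx")))
void hadd_full256(const float a[8], const float b[8], float out[8])
{
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    /* 0x20: low half from a's low half, high half from b's low half -> [a0..a3 | b0..b3] */
    __m256 lo = _mm256_permute2f128_ps(va, vb, 0x20);
    /* 0x31: low half from a's high half, high half from b's high half -> [a4..a7 | b4..b7] */
    __m256 hi = _mm256_permute2f128_ps(va, vb, 0x31);
    /* The in-lane add now sums all of a in the low lane and all of b in the high lane. */
    _mm256_storeu_ps(out, _mm256_hadd_ps(lo, hi));
}
```

With a = {0, ..., 7} and b = {10, ..., 17}, `hadd_full256` produces {1, 5, 9, 13, 21, 25, 29, 33}, and `swap_halves` applied to {0, ..., 7} produces {4, 5, 6, 7, 0, 1, 2, 3}. The cost is two VPERM2F128 instructions on the inputs in addition to the VHADDPS.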