While reading the CUDA 5.0 Programming Guide, I stumbled on a feature called "funnel shift" that is available on devices of compute capability 3.5 but not 3.0. It carries the annotation "see reference manual", but when I search the manual for the term "funnel shift", I don't find anything.
I tried googling for it, but only found a mention on http://www.cudahandbook.com, in chapter 8:
8.2.3 Funnel Shift (SM 3.5)
GK110 added a 64-bit “funnel shift” instruction that may be accessed with the following intrinsics:
__funnelshift_lc(): returns most significant 32 bits of a left funnel shift.
__funnelshift_rc(): returns least significant 32 bits of a right funnel shift.
These intrinsics are implemented as inline device functions (using inline PTX assembler) in sm_35_intrinsics.h.
...but it still doesn't explain what a "left funnel shift" or a "right funnel shift" is.
So, what is it and where does one need it?
In the case of CUDA, two 32-bit registers are concatenated into a 64-bit value (hi in the upper half, lo in the lower half); that value is shifted left or right, and the most significant 32 bits (for a left shift) or the least significant 32 bits (for a right shift) are returned.
The intrinsics from sm_35_intrinsics.h are as follows:
unsigned int __funnelshift_lc(unsigned int lo, unsigned int hi, unsigned int shift);
unsigned int __funnelshift_rc(unsigned int lo, unsigned int hi, unsigned int shift);
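To make the semantics concrete, here is a minimal host-side C sketch that emulates the two intrinsics with ordinary 64-bit arithmetic. It assumes, per the CUDA Math API documentation, that the "c" (clamp) variants clamp the shift amount to 32, so a shift of 32 or more returns lo (left variant) or hi (right variant) unchanged. The names funnelshift_lc_emul and funnelshift_rc_emul are hypothetical stand-ins, not part of CUDA; in device code you would call __funnelshift_lc / __funnelshift_rc directly.

```c
#include <stdint.h>

/* Hypothetical host-side emulation of __funnelshift_lc(lo, hi, shift).
   Assumption: shift is clamped to 32 (the "c" suffix). */
static uint32_t funnelshift_lc_emul(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;   /* concatenate hi:lo */
    uint32_t s = shift < 32 ? shift : 32;     /* clamp shift amount */
    return (uint32_t)((v << s) >> 32);        /* most significant 32 bits */
}

/* Hypothetical host-side emulation of __funnelshift_rc(lo, hi, shift).
   Assumption: shift is clamped to 32 (the "c" suffix). */
static uint32_t funnelshift_rc_emul(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;   /* concatenate hi:lo */
    uint32_t s = shift < 32 ? shift : 32;     /* clamp shift amount */
    return (uint32_t)(v >> s);                /* least significant 32 bits */
}
```

For example, with lo = 0x9ABCDEF0, hi = 0x12345678 and a shift of 8, the 64-bit value is 0x123456789ABCDEF0; the left variant returns 0x3456789A and the right variant returns 0x789ABCDE. On the GPU the whole operation is a single instruction rather than the 64-bit shifts used here.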
According to Andy Glew, applications of funnel shift include fast misaligned memcpy; and, as njuffa mentions in the comments above, it can be used to implement a rotate when both input words are the same.
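Both applications can be sketched in plain C on top of an emulated funnel shift (same clamping assumption as for __funnelshift_lc / __funnelshift_rc). Rotating x left by n is a left funnel shift of x:x. For misaligned copies, the sketch assumes the source bytes are packed little-endian into aligned 32-bit words and that byte_offset is 0 to 3: each output word is then a right funnel shift of two adjacent aligned words by 8 * byte_offset bits, so only aligned loads are needed. The helper names (funnel_left, funnel_right, rotl32, realign_words) are hypothetical, not CUDA APIs; on the GPU the funnel_* calls would map to the intrinsics above.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulated left funnel shift: top 32 bits of (hi:lo) << min(shift, 32). */
static uint32_t funnel_left(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;
    uint32_t s = shift < 32 ? shift : 32;
    return (uint32_t)((v << s) >> 32);
}

/* Emulated right funnel shift: low 32 bits of (hi:lo) >> min(shift, 32). */
static uint32_t funnel_right(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;
    uint32_t s = shift < 32 ? shift : 32;
    return (uint32_t)(v >> s);
}

/* Rotate left: feeding the same word as both halves turns the
   funnel shift into a rotate (valid for n in 0..32). */
static uint32_t rotl32(uint32_t x, uint32_t n)
{
    return funnel_left(x, x, n);
}

/* Misaligned copy core: produce n_out words starting byte_offset bytes
   into src, reading only aligned words. src must hold n_out + 1 words;
   assumes little-endian byte packing and byte_offset in 0..3. */
static void realign_words(const uint32_t *src, uint32_t *dst,
                          size_t n_out, unsigned byte_offset)
{
    for (size_t i = 0; i < n_out; ++i)
        dst[i] = funnel_right(src[i], src[i + 1], 8u * byte_offset);
}
```

For instance, rotl32(0x12345678, 8) gives 0x34567812, and realigning the words {0x03020100, 0x07060504, 0x0B0A0908} by one byte yields {0x04030201, 0x08070605}, i.e. the bytes starting at offset 1.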