Thrust inside user written kernels

Question 1

Thrust inside user written kernels

cuda thrust

Ashwin Nanjappa · Apr 1, 2011 · Viewed 18.8k times · Source

Answer

Answer

As it was originally written, Thrust is purely a host side abstraction. It cannot be used inside kernels. You can pass the device memory encapsulated inside a thrust::device_vector to your own kernel like this:

thrust::device_vector< Foo > fooVector;
// Do something thrust-y with fooVector

Foo* fooArray = thrust::raw_pointer_cast( fooVector.data() );

// Pass raw array and its size to kernel
someKernelCall<<< x, y >>>( fooArray, fooVector.size() );

and you can also use device memory not allocated by thrust within thrust algorithms by instantiating a thrust::device_ptr with the bare cuda device memory pointer.

Edited four and half years later to add that as per @JackOLantern's answer, thrust 1.8 adds a sequential execution policy which means you can run single threaded versions of thrust's alogrithms on the device. Note that it still isn't possible to directly pass a thrust device vector to a kernel and device vectors can't be directly used in device code.

Note that it is also possible to use the thrust::device execution policy in some cases to have parallel thrust execution launched by a kernel as a child grid. This requires separate compilation/device linkage and hardware which supports dynamic parallelism. I am not certain whether this is actually supported in all thrust algorithms or not, but certainly works with some.

Question 2

I am a newbie to Thrust. I see that all Thrust presentations and examples only show host code.

I would like to know if I can pass a device_vector to my own kernel? How? If yes, what are the operations permitted on it inside kernel/device code?

Thrust inside user written kernels

Answer

Related questions