I've searched all over for insight on how exactly to use classes with CUDA. While there's a general consensus that it can be done, and apparently people are doing it, I've had a hard time finding out how to actually do it.
I have a class which implements a basic bitset with operator overloading and the like. I need to be able to instantiate objects of this class on both the host and the device, copy between the two, etc. Do I define this class in a .cu file? If so, how do I use it in my host-side C++ code? The class's member functions don't need access to special CUDA variables like threadIdx; it just needs to be usable on both the host and the device.
Thanks for any help, and if I'm approaching this in completely the wrong way, I'd love to hear alternatives.
Define the class in a header that you #include, just like in C++.
Any method that must be called from device code should be defined with both the __device__ and __host__ declspecs, including the constructor and destructor if you plan to use new/delete on the device (note that device-side new/delete require CUDA 4.0 and a GPU of compute capability 2.0 or higher).
You probably want to define a macro like
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif
Then use this macro on your member functions
class Foo {
public:
    CUDA_CALLABLE_MEMBER Foo() {}
    CUDA_CALLABLE_MEMBER ~Foo() {}
    CUDA_CALLABLE_MEMBER void aMethod() {}
};
The reason for this is that only the CUDA compiler knows __device__ and __host__; your host C++ compiler will raise an error when it encounters them.
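For concreteness, here is a minimal sketch of exercising such a class from both sides (the file name foo.h and the kernel name useFoo are my own illustrative choices; I'm assuming the macro and Foo above live in foo.h). The device-side new/delete part needs CUDA 4.0 and an sm_20+ GPU, as noted above:
// use_foo.cu -- a sketch, not a definitive recipe; compile with nvcc -arch=sm_20
#include "foo.h"   // assumed to contain CUDA_CALLABLE_MEMBER and Foo as above

__global__ void useFoo() {
    Foo f;             // stack instance in device code
    f.aMethod();

    Foo *p = new Foo;  // device-heap allocation (CUDA 4.0+, compute capability 2.0+)
    p->aMethod();
    delete p;
}

int main() {
    Foo f;             // the very same class in ordinary host code
    f.aMethod();

    useFoo<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}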
Edit:
Note that __CUDACC__ is defined by NVCC when it is compiling CUDA files. This can be either when compiling a .cu file with NVCC or when compiling any file with the command-line option -x cu.
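Since you also asked about copying objects between host and device: if your bitset stores its bits inline (no pointers into host memory, i.e. the object is trivially copyable), a plain cudaMemcpy of the whole object is enough. Another sketch, again assuming Foo lives in foo.h:
// copy_foo.cu -- byte-wise copies like this are only safe for objects
// that contain no pointers to host memory
#include "foo.h"

int main() {
    Foo host_obj;      // constructed on the host
    Foo *dev_obj = 0;

    cudaMalloc((void **)&dev_obj, sizeof(Foo));
    // host -> device
    cudaMemcpy(dev_obj, &host_obj, sizeof(Foo), cudaMemcpyHostToDevice);

    // ... launch kernels that operate on *dev_obj here ...

    // device -> host: bring back any modifications
    cudaMemcpy(&host_obj, dev_obj, sizeof(Foo), cudaMemcpyDeviceToHost);
    cudaFree(dev_obj);
    return 0;
}
If the class does own heap-allocated storage (e.g. a dynamically sized bit array), you'd instead copy that payload separately and fix up the pointer on the device side.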