I am trying to allocate shared memory using a constant parameter, but I am getting an error. My kernel looks like this:
__global__ void Kernel(const int count)
{
    __shared__ int a[count];
}
and I am getting an error saying
error: expression must have a constant value
count is const! Why am I getting this error? And how can I get around this?
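A const parameter is read-only inside the kernel, but its value is still only known at run time, and the size of a statically declared shared memory array must be a compile-time constant. CUDA does, however, support dynamic shared memory allocation. If you define the kernel like this: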
__global__ void Kernel(const int count)
{
    extern __shared__ int a[];
}
and then pass the number of bytes required (a_size here, for example count * sizeof(int)) as the third argument of the kernel launch
Kernel<<< gridDim, blockDim, a_size >>>(count);
then the array can be sized at run time. Be aware that the runtime supports only a single dynamically sized allocation per block. If you need more than one array, you will need to use pointers to offsets within that single allocation. Also be aware when using such pointers that shared memory is organized in 32-bit words: every offset should be at least 32-bit word aligned, irrespective of the type of the shared memory allocation, and types wider than 32 bits (such as double) need offsets aligned to their own size.
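As a rough sketch of the offset approach, something like the following should work. The names smem, out, and run_demo, the int/float split, and the launch configuration of one block of 128 threads are all made up for illustration and are not part of the question. Both sub-arrays use 4-byte elements, so the second pointer stays 32-bit word aligned without any padding.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel: one dynamic shared allocation carved into
// `count` ints followed by `count` floats.
__global__ void Kernel(const int count, int *out)
{
    // The single dynamically sized allocation; its size in bytes is the
    // third launch parameter.
    extern __shared__ int smem[];

    int   *a = smem;                                     // first count ints
    float *b = reinterpret_cast<float *>(smem + count);  // next count floats

    for (int i = threadIdx.x; i < count; i += blockDim.x) {
        a[i] = i;
        b[i] = 0.5f * i;
    }
    __syncthreads();  // make every thread's writes visible to the block

    // Read an element written by another thread to exercise the sharing.
    for (int i = threadIdx.x; i < count; i += blockDim.x)
        out[i] = a[count - 1 - i] + static_cast<int>(2.0f * b[i]);
}

// Hypothetical host helper: launches one block and checks the result.
bool run_demo(int count)
{
    assert(count > 0);
    // Total bytes for both sub-arrays inside the one allocation.
    const size_t smem_bytes = count * (sizeof(int) + sizeof(float));

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, count * sizeof(int)) != cudaSuccess)
        return false;

    Kernel<<<1, 128, smem_bytes>>>(count, d_out);
    cudaError_t err = cudaGetLastError();  // catches launch failures

    std::vector<int> h_out(count, -1);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, count * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return false;

    // Each element should be (count - 1 - i) + i.
    for (int i = 0; i < count; ++i)
        if (h_out[i] != count - 1)
            return false;
    return true;
}

int main()
{
    const bool ok = run_demo(256);
    std::printf("%s\n", ok ? "ok" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

If the second array held a wider type such as double, the offset would need rounding up to a multiple of 8 bytes, and smem_bytes would have to include that padding.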