The __shared__ memory in CUDA seems to require a size known at compile time. However, in my problem, the __shared__ memory size is only known at run time, i.e.
int size=get_size();
__shared__ mem[size];
This fails to compile with "error: constant value is not known", and I'm not sure how to get around this problem.
The purpose of shared memory is to allow the threads in a block to collaborate. When you declare an array as __shared__, each thread in the block sees the same memory, so it would not make sense for a given thread to be able to set its own size for an array in shared memory.
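However, the special case of dynamically specifying the size of a single __shared__ array that is the same size for all threads IS supported: declare the array as extern __shared__ without a size, and pass its size in bytes as the third kernel launch configuration parameter. See allocating shared memory.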
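A minimal sketch of the dynamic shared memory approach, assuming a single block of at most 1024 threads. The kernel `reverseDynamic` and the host helper `reverseOnGpu` are hypothetical names used only for illustration; the size of `s` is not written anywhere in the kernel and is instead supplied as `bytes` in the `<<<...>>>` launch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: reverses n ints through a shared buffer whose size is
// chosen at launch time. `extern __shared__` declares an unsized array; the
// third <<<...>>> parameter gives its size in bytes, the same for every thread.
__global__ void reverseDynamic(int *d, int n)
{
    extern __shared__ int s[];
    int t = threadIdx.x;
    if (t < n) s[t] = d[t];
    __syncthreads();                 // all writes to s finish before any read
    if (t < n) d[t] = s[n - 1 - t];
}

// Hypothetical host helper: one block of n threads (assumes n <= 1024).
// Returns false if any CUDA call fails.
bool reverseOnGpu(std::vector<int> &h)
{
    int n = static_cast<int>(h.size());   // size only known at run time
    if (n == 0) return true;
    size_t bytes = n * sizeof(int);
    int *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // third launch parameter = dynamic shared memory size in bytes
        reverseDynamic<<<1, n, bytes>>>(d, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess;
}
```

If you need several dynamically sized arrays, the usual pattern is to request their combined size at launch and carve the single `extern __shared__` buffer into pieces with pointer offsets, minding alignment.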
If you do need to dynamically allocate memory for each thread, you can use new or malloc inside a kernel (on Fermi, i.e. compute capability 2.x, and later GPUs), but they allocate from a heap in global memory, which is likely to be slow.
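A sketch of the per-thread alternative, assuming compute capability 2.0 or later. Each thread calls device-side `malloc` with its own run-time size, so the allocations live in global memory rather than shared memory. The names `perThreadScratch` and `runScratch`, and the 16 MB heap size, are hypothetical choices for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates its own scratch array of lens[t]
// ints from the device heap (global memory), fills and sums it, then frees it.
__global__ void perThreadScratch(const int *lens, int *sums, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;
    int len = lens[t];                               // per-thread, run-time size
    int *scratch = static_cast<int *>(malloc(len * sizeof(int)));
    if (scratch == nullptr) { sums[t] = -1; return; } // heap exhausted
    int s = 0;
    for (int i = 0; i < len; ++i) { scratch[i] = i; s += scratch[i]; }
    sums[t] = s;
    free(scratch);                  // device-heap memory is freed on the device
}

// Hypothetical host helper. The device heap defaults to 8 MB; this sketch
// raises it to 16 MB before launching the kernel that uses malloc.
bool runScratch(const int *hLens, int *hSums, int n)
{
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);
    int *dLens = nullptr, *dSums = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&dLens, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dSums, bytes) != cudaSuccess) { cudaFree(dLens); return false; }
    cudaError_t err = cudaMemcpy(dLens, hLens, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        perThreadScratch<<<(n + threads - 1) / threads, threads>>>(dLens, dSums, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hSums, dSums, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dLens);
    cudaFree(dSums);
    return err == cudaSuccess;
}
```

Because every access to `scratch` goes to global memory, this is usually much slower than a shared buffer; prefer the single dynamically sized shared array whenever all threads in the block can use the same size.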