What is the maximum number of blocks in a grid that can created per kernel launch? I am slightly confused here since
Now the compute capability table here says that there can be 65535 blocks per grid dimemsion in CUDA compute capability 2.0.
Does that mean the total number of blocks = 65535*65535?
Or does it mean that you can rearrange at most 65535 into a 1d grid of 65536 blocks or 2d grid of sqrt(65535) * sqrt(65535) ?
Thank you.
65535 per dimension of the grid. On compute 1.x cards, 1D and 2D grids are supported. On compute 2.x cards, 3D grids are also supported, so 65535, 65535 x 65535, and 65535 x 65535 x 65535 are the limits for Fermi (compute 2.x) cards.