I'm new to CUDA and am trying to figure out whether PyCUDA (free) or NumbaPro CUDA Python (not free) would be better for me (assuming the library cost is not an issue).
Both seem to require that you use their own Python dialect. However, PyCUDA seems to require you to write the kernel function in C,
which would be more cumbersome than NumbaPro, which seems to let you write the kernel in Python and compiles it for you.
Is this indeed the case? Would there be notable performance differences between the two?