CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units).
I have developed a Windows application that captures video from an external device using DirectShow. The image resolution is 640x480 …
encoding directshow cuda real-time openclI have quite a good understanding about how to allocate and copy linear memory with cudaMalloc() and cudaMemcpy(). However, when …
c++ cudaI have a kernel which uses 17 registers, reducing it to 16 would bring me 100% occupancy. My question is: are there methods …
optimization cuda gpgpuI use cudaMemcpy() one time to copy exactly 1GB of data to the device. This takes 5.9s. The other way …
cuda busCUDA provides built-in vector data types like uint2, uint4 and so on. Are there any advantages to using these data …
cuda abstract-data-typeThe __shared__ memory in CUDA seems to require a known size at compile time. However, in my problem, the __shared__ …
cuda gpu-shared-memory