Top "Cuda" questions

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units).

Real-time video encoding in DirectShow

I have developed a Windows application that captures video from an external device using DirectShow. The image resolution is 640x480 …

encoding directshow cuda real-time opencl
How and when should I use pitched pointer with the cuda API?

I have quite a good understanding about how to allocate and copy linear memory with cudaMalloc() and cudaMemcpy(). However, when …

c++ cuda
Is it possible to high performance computing by Golang and CUDA?

I've googled for a while and the only useful infos are: github.com/barnex/cuda5 mumax.github.io/ Unfortunately, the …

go cuda opencl archlinux hpc
CUDA - Multiprocessors, Warp size and Maximum Threads Per Block: What is the exact relationship?

I know that there are multiprocessors on a CUDA GPU which contain CUDA cores in them. In my workplace I …

caching memory cuda textures
Running more than one CUDA applications on one GPU

CUDA document does not specific how many CUDA process can share one GPU. For example, if I launch more than …

cuda gpu gpgpu nvidia
Reducing Number of Registers Used in CUDA Kernel

I have a kernel which uses 17 registers, reducing it to 16 would bring me 100% occupancy. My question is: are there methods …

optimization cuda gpgpu
How to measure the inner kernel time in NVIDIA CUDA?

I want to measure time inner kernel of GPU, how how to measure it in NVIDIA CUDA? e.g. __global__ …

cuda gpu gpgpu nvidia
cudaMemcpy too slow

I use cudaMemcpy() one time to copy exactly 1GB of data to the device. This takes 5.9s. The other way …

cuda bus
Are there advantages to using the CUDA vector types?

CUDA provides built-in vector data types like uint2, uint4 and so on. Are there any advantages to using these data …

cuda abstract-data-type
How to define a CUDA shared memory with a size known at run time?

The __shared__ memory in CUDA seems to require a known size at compile time. However, in my problem, the __shared__ …

cuda gpu-shared-memory