How to measure the inner kernel time in NVIDIA CUDA?

Amin picture Amin · May 14, 2012 · Viewed 14.4k times · Source

I want to measure time inner kernel of GPU, how how to measure it in NVIDIA CUDA? e.g.

__global__ void kernelSample()
{
  some code here
  get start time 
  some code here 
  get stop time 
  some code here
}

Answer

talonmies picture talonmies · May 14, 2012

You can do something like this:

__global__ void kernelSample(int *runtime)
{
  // ....
  clock_t start_time = clock(); 
  //some code here 
  clock_t stop_time = clock();
  // ....

  runtime[tidx] = (int)(stop_time - start_time);
}

Which gives the number of clock cycles between the two calls. Be a little careful though, the timer will overflow after a couple of seconds, so you should be sure that the duration of code between successive calls is quite short. You should also be aware that the compiler and assembler do perform instruction re-ordering so you might want to check that the clock calls don't wind up getting put next to each other in the SASS output (use cudaobjdump to check).