I want to measure time inner kernel of GPU, how how to measure it in NVIDIA CUDA? e.g.
__global__ void kernelSample()
{
some code here
get start time
some code here
get stop time
some code here
}
You can do something like this:
__global__ void kernelSample(int *runtime)
{
// ....
clock_t start_time = clock();
//some code here
clock_t stop_time = clock();
// ....
runtime[tidx] = (int)(stop_time - start_time);
}
Which gives the number of clock cycles between the two calls. Be a little careful though, the timer will overflow after a couple of seconds, so you should be sure that the duration of code between successive calls is quite short. You should also be aware that the compiler and assembler do perform instruction re-ordering so you might want to check that the clock calls don't wind up getting put next to each other in the SASS output (use cudaobjdump
to check).