I have some code that tries to determine the execution time of a code block.
#include <time.h>
#include <stdio.h>

int main()
{
    clock_t start_t, end_t, total_t;
    int i;

    start_t = clock(); // clock start
    printf("Starting of the program, start_t = %ld\n", start_t);
    printf("Going to scan a big loop, start_t = %ld\n", start_t);

    for (i = 0; i < 10000000; i++) // trying to determine execution time of this block
    {
    }

    end_t = clock(); // clock stopped
    printf("End of the big loop, end_t = %ld\n", end_t);

    total_t = (long int)(end_t - start_t);
    printf("Total time taken by CPU: %lu\n", total_t);

    return(0);
}
The output of the code snippet on my machine is
Starting of the program, start_t = 8965
Going to scan a big loop, start_t = 8965
End of the big loop, end_t = 27259
Total time taken by CPU: 18294
So if my CPU was running at 21 MHz and assuming that this was the only thing getting executed, each machine cycle would be approximately 47 nanoseconds, so (18294 * 47) = 859818 nanoseconds.
Would this be the execution time for the for loop in my code? Am I making some incorrect assumptions here?
The unit of time used by the clock function is arbitrary. On most platforms it is unrelated to the processor speed; it is more commonly tied to the frequency of an external timer interrupt (which may be configured in software) or to a historical value kept for compatibility through years of processor evolution. You need to use the macro CLOCKS_PER_SEC to convert to real time:
printf("Total time taken by CPU: %fs\n", (double)total_t / CLOCKS_PER_SEC);
The C standard library was designed to be implementable on a wide range of hardware, including processors that don't have an internal timer and rely on an external peripheral to tell the time. Many platforms have more precise ways to measure wall clock time than time, and more precise ways to measure CPU consumption than clock. For example, on POSIX systems (e.g. Linux and other Unix-like systems), you can use getrusage, which has microsecond precision:
#include <sys/time.h>
#include <sys/resource.h>

struct timeval start, end;
struct rusage usage;

getrusage(RUSAGE_SELF, &usage);
start = usage.ru_utime;   // user CPU time consumed so far
…
getrusage(RUSAGE_SELF, &usage);
end = usage.ru_utime;

// tv_usec is in microseconds, so divide by 1e6 to get seconds
printf("Total time taken by CPU: %fs\n",
       (double)(end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6);
Where available, clock_gettime(CLOCK_THREAD_CPUTIME_ID) or clock_gettime(CLOCK_PROCESS_CPUTIME_ID) may give better precision; both report values with nanosecond resolution.
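A minimal sketch of that approach, assuming a POSIX system where CLOCK_PROCESS_CPUTIME_ID is available (older glibc versions may need linking with -lrt):

#define _POSIX_C_SOURCE 200809L // expose clock_gettime under strict -std= flags
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    // ... code to measure ...
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

    // timespec carries whole seconds plus nanoseconds
    printf("Total time taken by CPU: %fs\n",
           (double)(end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9);
    return 0;
}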
Note the difference between precision and accuracy: precision is the unit in which the values are reported, while accuracy is how close the reported values are to the real values. Unless you are working on a real-time system, there are no hard guarantees as to how long a piece of code takes, including the invocation of the measurement functions themselves.
Some processors have cycle clocks that count processor cycles rather than wall clock time, but this gets very system-specific.
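For instance, on x86-64 with GCC or Clang, a rough sketch using the __rdtsc intrinsic could look like the following; the result is in CPU cycles, which on modern CPUs with variable clock frequency do not map directly to wall clock time:

#include <stdio.h>
#include <x86intrin.h>

int main(void)
{
    unsigned long long start = __rdtsc(); // read the time-stamp counter
    // ... code to measure ...
    unsigned long long end = __rdtsc();

    printf("Elapsed: %llu cycles\n", end - start);
    return 0;
}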
Whenever making benchmarks, beware that what you are measuring is the execution of this particular executable on this particular CPU in these particular circumstances, and the results may or may not generalize to other situations. For example, the empty loop in your question will be optimized away by most compilers unless you turn optimizations off, and measuring the speed of unoptimized code is usually pointless. Even if you add real work in the loop, beware of toy benchmarks: they often don't have the same performance characteristics as real-world code. On modern high-end CPUs such as those found in PCs and smartphones, benchmarks of CPU-intensive code are often very sensitive to cache effects, and the results can depend on what else is running on the system, on the exact CPU model (due to different cache sizes and layouts), on the address at which the code happens to be loaded, and so on.
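As one illustration of the optimization caveat, a crude way to keep a toy loop from being eliminated is to route its work through a volatile object, which the compiler is not allowed to discard:

#include <stdio.h>
#include <time.h>

int main(void)
{
    volatile long sink = 0; // volatile accesses must be performed, so the loop survives optimization
    clock_t start_t = clock();
    for (long i = 0; i < 10000000; i++)
        sink += i;
    clock_t end_t = clock();

    printf("Total time taken by CPU: %fs (sink = %ld)\n",
           (double)(end_t - start_t) / CLOCKS_PER_SEC, sink);
    return 0;
}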