I am using the following code to profile my operations to optimize on cpu cycles taken in my functions.
static __inline__ unsigned long GetCC(void)
{
unsigned a, d;
asm volatile("rdtsc" : "=a" (a), "=d" (d));
return ((unsigned long)a) | (((unsigned long)d) << 32);
}
I don't think it is the best since even two consecutive calls gives me a difference of "33". Any suggestions ?
I personally think the rdtsc instruction is great and usable for a variety of tasks. I do not think that using cpuid is necessary to prepare for rdtsc. Here is how I reason around rdtsc:
As to the question of time stamp counter being accurate I would say that assuming the tsc on different cores are synchronized (which is the norm) there is the problem of CPU throttling during periods of low activity to reduce energy consumption. It is always possible to inhibit the functionality when testing. If you're executing an instruction at 1 GHz or at 10 Mhz on the same processor the elapsed cycle count will be the same even though the former completed in 1% of the time compred to the latter.