I have a loop like this:
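```c
#include <x86intrin.h>  /* __rdtsc() with GCC/Clang on x86 */

/* inside a function; N, M and tab are defined elsewhere */
unsigned long long start, stop, time;
start = __rdtsc();
unsigned long long count = 0;
for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
        count += tab[i][j];
stop = __rdtsc();
time = (stop - start) / 3;  /* same as * 1/3; e.g. cycles to ns at 3 GHz */
```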
I need to check how prefetching data affects performance. How can I force some values to be prefetched from memory into the cache before they are summed?
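With GCC (Clang provides the same builtin), you can use:
```c
/* arguments: address, rw (0 = prefetch for read), locality (0 = no temporal locality) */
__builtin_prefetch((const void *)(prefetch_address), 0, 0);
```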
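To make the two trailing arguments concrete, here is a minimal sketch (not from the original answer) that copies an int array and prefetches both streams: the source for reading (rw = 0) and the destination for writing (rw = 1). The function name `copy_with_prefetch` and the distance `COPY_PREFETCH_DIST` are hypothetical, and 256 elements is only an assumed starting value to tune. The bounds check is there because, although the prefetch itself does not fault, forming a pointer past the end of an array is still undefined behavior in C.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements (assumed value; tune per machine). */
#define COPY_PREFETCH_DIST 256

/* Copy n ints from src to dst, prefetching ahead on both streams.
   __builtin_prefetch(addr, rw, locality):
     rw       = 0 -> preparing for a read, 1 -> preparing for a write
     locality = 0 (no temporal locality, need not stay cached)
                .. 3 (high locality, keep in all cache levels)
   rw and locality must be compile-time constants. */
void copy_with_prefetch(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        /* Only form in-bounds addresses. */
        if (k + COPY_PREFETCH_DIST < n) {
            __builtin_prefetch(&src[k + COPY_PREFETCH_DIST], 0, 0); /* read, used once */
            __builtin_prefetch(&dst[k + COPY_PREFETCH_DIST], 1, 0); /* write, used once */
        }
        dst[k] = src[k];
    }
}
```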
`prefetch_address` may be an invalid address; the prefetch will not cause a segfault. If `prefetch_address` is too close to the current location, the data may not arrive before it is needed, so there may be no effect or even a slowdown from the extra instructions. Try to set it at least 1k ahead.
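A sketch of how the question's measurement could be set up, assuming x86 with GCC or Clang and treating the data as a flat array of `n` ints (the element type of `tab` isn't shown, so `int` is an assumption). `sum_prefetch` and `time_sum` are hypothetical names. `dist` is the prefetch distance in elements, and `dist = 0` gives the baseline without prefetching.

```c
#include <stddef.h>
#include <x86intrin.h>   /* __rdtsc(); assumes x86 with GCC or Clang */

/* Sum n ints, prefetching the element `dist` positions ahead.
   dist == 0 disables prefetching (baseline).
   dist is in elements: 1024 / sizeof(int) is 1 KiB ahead
   (reading the answer's "1k" as bytes is an assumption). */
unsigned long long sum_prefetch(const int *tab, size_t n, size_t dist)
{
    unsigned long long count = 0;
    for (size_t k = 0; k < n; k++) {
        if (dist != 0 && k + dist < n)   /* keep the address in bounds */
            __builtin_prefetch(&tab[k + dist], 0, 0);
        count += tab[k];
    }
    return count;
}

/* Time one sum with the TSC, as in the question.
   Returns raw cycles; the sum is stored in *sum so it is not optimized away. */
unsigned long long time_sum(const int *tab, size_t n, size_t dist,
                            unsigned long long *sum)
{
    unsigned long long start = __rdtsc();
    *sum = sum_prefetch(tab, n, dist);
    unsigned long long stop = __rdtsc();
    return stop - start;
}
```

You could call `time_sum` with, say, dist = 0, 4, 16 and 1024 / sizeof(int) on an array much larger than the last-level cache, then compare the cycle counts. For a purely sequential sum like this one, the CPU's hardware prefetcher often hides most of the latency already, so the software prefetch may show little gain.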