I am trying to figure out why a modified C program is running faster than its non modified counter part (I am adding very few lines of code to perform some additional work). In this context, I suspect "cache effects" to be the main explanation (instruction cache). Thus I reach the perf
(https://perf.wiki.kernel.org/index.php/Main_Page) profiling tool but unfortunately I am not able to understand the meaning of its outputs regarding cache misses.
Several events about cache are provided:
cache-references [Hardware event]
cache-misses [Hardware event]
L1-dcache-loads [Hardware cache event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-dcache-prefetches [Hardware cache event]
L1-dcache-prefetch-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-prefetches [Hardware cache event]
L1-icache-prefetch-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-stores [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-prefetches [Hardware cache event]
LLC-prefetch-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-prefetches [Hardware cache event]
dTLB-prefetch-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
iTLB-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
branch-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-stores [Hardware cache event]
node-store-misses [Hardware cache event]
node-prefetches [Hardware cache event]
node-prefetch-misses [Hardware cache event]
Where can I find explanation about these fields ? cache-misses event is always smaller than other events. What does this event measure ?
How to interpret the 26,760 L1-icache-load-misses for ls vs the 5,708 cache-misses in the following example ?
perf stat -e L1-icache-load-misses ls
caches caches~ out
Performance counter stats for 'ls':
26,760 L1-icache-load-misses
0.002816690 seconds time elapsed
perf stat -e cache-misses ls
caches caches~ out
Performance counter stats for 'ls':
5,708 cache-misses
0.002822122 seconds time elapsed
Some answers:
L1
is the Level-1 cache, the smallest and fastest one. LLC
on the other hand refers to the last level of the cache hierarchy, thus denoting the largest but slowest cache.i
vs. d
distinguishes instruction cache from data cache. Only L1 is split in this way, other caches are shared between data and instructions.TLB
refers to the translation lookaside buffer, a cache used when mapping virtual addresses to physical ones.