What are perf cache events meaning?

Manuel Selva picture Manuel Selva · Sep 26, 2012 · Viewed 21.3k times · Source

I am trying to figure out why a modified C program is running faster than its non modified counter part (I am adding very few lines of code to perform some additional work). In this context, I suspect "cache effects" to be the main explanation (instruction cache). Thus I reach the perf (https://perf.wiki.kernel.org/index.php/Main_Page) profiling tool but unfortunately I am not able to understand the meaning of its outputs regarding cache misses.

Several events about cache are provided:

  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-prefetches                               [Hardware cache event]
  L1-icache-prefetch-misses                          [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-prefetches                                     [Hardware cache event]
  LLC-prefetch-misses                                [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-prefetches                                    [Hardware cache event]
  dTLB-prefetch-misses                               [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  node-loads                                         [Hardware cache event]
  node-load-misses                                   [Hardware cache event]
  node-stores                                        [Hardware cache event]
  node-store-misses                                  [Hardware cache event]
  node-prefetches                                    [Hardware cache event]
  node-prefetch-misses                               [Hardware cache event]

Where can I find explanation about these fields ? cache-misses event is always smaller than other events. What does this event measure ?

How to interpret the 26,760 L1-icache-load-misses for ls vs the 5,708 cache-misses in the following example ?

perf stat -e L1-icache-load-misses ls
caches  caches~  out

 Performance counter stats for 'ls':

            26,760 L1-icache-load-misses                                       

       0.002816690 seconds time elapsed



perf stat -e cache-misses ls
caches  caches~  out

 Performance counter stats for 'ls':

             5,708 cache-misses                                                

       0.002822122 seconds time elapsed

Answer

MvG picture MvG · Sep 26, 2012

Some answers:

  • L1 is the Level-1 cache, the smallest and fastest one. LLC on the other hand refers to the last level of the cache hierarchy, thus denoting the largest but slowest cache.
  • i vs. d distinguishes instruction cache from data cache. Only L1 is split in this way, other caches are shared between data and instructions.
  • TLB refers to the translation lookaside buffer, a cache used when mapping virtual addresses to physical ones.
  • Different TLB counters depending on whether the named address referred to an instruction or some data.
  • For all data access, different counters are kept depending on whether the given memory location was read, written, or prefetched (i.e. retrieved for reading at some later time).
  • The number of misses indicates how often a given item of data was accessed but not present in the cache.