why does perf stat show "stalled-cycles-backend" as <not supported>?

oliver · Mar 28, 2014

Running perf stat ls shows this:

Performance counter stats for 'ls':

          1.388670 task-clock                #    0.067 CPUs utilized          
                 2 context-switches          #    0.001 M/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
               266 page-faults               #    0.192 M/sec                  
           3515391 cycles                    #    2.531 GHz                    
           2096636 stalled-cycles-frontend   #   59.64% frontend cycles idle   
   <not supported> stalled-cycles-backend  
           2927468 instructions              #    0.83  insns per cycle        
                                             #    0.72  stalled cycles per insn
            615636 branches                  #  443.328 M/sec                  
             22172 branch-misses             #    3.60% of all branches        

       0.020657192 seconds time elapsed
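If you want to post-process output like this, here is a minimal Python sketch that parses the default text layout shown above and recomputes the derived figures in the "#" column. The layout differs between perf versions (newer ones print units such as "msec" next to task-clock), so the regexes are assumptions tied to this sample. parse_perf_stat and derived_metrics are hypothetical helper names, not part of perf.

```python
import re

# Value column: a number (possibly with thousands separators) or a marker such
# as "<not supported>", followed by the event name.
EVENT_RE = re.compile(r"^\s*(<not supported>|<not counted>|[\d.,]+)\s+(\S+)")
ELAPSED_RE = re.compile(r"^\s*([\d.]+)\s+seconds time elapsed")


def parse_perf_stat(text):
    """Return ({event_name: count or None}, elapsed_seconds) from perf stat text output."""
    counts, elapsed = {}, None
    for line in text.splitlines():
        m = ELAPSED_RE.match(line)
        if m:
            elapsed = float(m.group(1))
            continue
        m = EVENT_RE.match(line)
        if not m:
            continue  # header, blank line or a "#"-only continuation line
        value, name = m.groups()
        # None marks events perf could not count on this CPU/kernel.
        counts[name] = None if value.startswith("<") else float(value.replace(",", ""))
    return counts, elapsed


def derived_metrics(c):
    """Recompute the ratios perf prints after '#' (task-clock assumed to be in msec)."""
    cycles, insns = c["cycles"], c["instructions"]
    return {
        "ghz": cycles / c["task-clock"] / 1e6,
        "ipc": insns / cycles,
        "frontend_idle_pct": 100 * c["stalled-cycles-frontend"] / cycles,
        # With no backend count available, this is frontend stalls / instructions.
        "stalled_per_insn": c["stalled-cycles-frontend"] / insns,
        "branch_miss_pct": 100 * c["branch-misses"] / c["branches"],
    }
```

Run against the output above, counts["stalled-cycles-backend"] comes back as None, and the recomputed ratios round to the printed 2.531 GHz, 0.83 insns per cycle, 59.64%, 0.72 stalled cycles per insn and 3.60%.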

Why is stalled-cycles-backend shown as "not supported"? What kind of CPU, hardware, kernel or user-space software do I need to see this value?

So far I have tried this on RHEL with Linux 3.12 for x86_64, with a matching perf version, on several Intel Core i5 and i7 systems (Ivy Bridge). None of them support stalled-cycles-backend.

Some more info:

$ perf list | grep stalled
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event]

$ ls /sys/devices/cpu/events/
branch-instructions  bus-cycles    cache-references  instructions  mem-stores
branch-misses        cache-misses  cpu-cycles        mem-loads     stalled-cycles-frontend

$ cat /sys/bus/event_source/devices/cpu/events/stalled-cycles-frontend
event=0x0e,umask=0x01,inv,cmask=0x01
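The sysfs string is a set of fields of Intel's IA32_PERFEVTSELx event-select register. The sketch below packs them into one raw config value, assuming the bit layout from the Intel SDM (event select in bits 0-7, unit mask in 8-15, edge 18, pc 19, any 21, inv 23, counter mask 24-31), which as far as I know is also what /sys/bus/event_source/devices/cpu/format/ describes on Intel CPUs. sysfs_to_raw is a hypothetical helper, and the field table is hard-coded rather than read from sysfs.

```python
# Bit offset and width of each IA32_PERFEVTSELx field (Intel SDM Vol. 3B).
# Assumption: these match the "config:N-M" entries under
# /sys/bus/event_source/devices/cpu/format/ on Intel CPUs.
PERFEVTSEL_FIELDS = {
    "event": (0, 8),   # event select
    "umask": (8, 8),   # unit mask
    "edge": (18, 1),
    "pc": (19, 1),
    "any": (21, 1),
    "inv": (23, 1),    # invert the counter-mask comparison
    "cmask": (24, 8),  # counter mask
}


def sysfs_to_raw(spec):
    """Turn a sysfs event string such as 'event=0x0e,umask=0x01,inv' into perf's rNNN form."""
    config = 0
    for term in spec.strip().split(","):
        name, _, value = term.partition("=")
        shift, width = PERFEVTSEL_FIELDS[name]
        val = int(value, 0) if value else 1  # a bare flag like "inv" means 1
        if not 0 <= val < (1 << width):
            raise ValueError(f"{name}={val:#x} does not fit in {width} bits")
        config |= val << shift
    return f"r{config:x}"
```

For the frontend event above this yields r180010e, which should count the same thing as perf stat -e stalled-cycles-frontend.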

Edit: I just tried this on an AMD Phenom II X6 1045T CPU under Ubuntu 12.04 with Linux 3.2 (32-bit), and there it does show values for both stalled-cycles-frontend and stalled-cycles-backend.

Answer

amdn · Mar 28, 2014

It looks like perf has not been updated to understand all the performance monitoring events that Ivy Bridge supports. Fortunately there is a generic, albeit cumbersome, interface that gives you access to the full list of performance monitoring events. I didn't see stalled-cycles-backend in the list when I gave it a quick look, but maybe I missed it, or maybe it is broken down into the individual events that can stall the backend.

We start with

perf list --help

...shows the following NOTE

    1. Intel(R) 64 and IA-32 Architectures Software Developer's Manual
       Volume 3B: System Programming Guide
       http://www.intel.com/Assets/PDF/manual/253669.pdf

...armed with that URL you end up in

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf

...you want section 19.3

19.3 PERFORMANCE MONITORING EVENTS FOR 3RD GENERATION INTEL® CORE™ PROCESSORS 3rd generation Intel® Core™ processors and Intel Xeon processor E3-1200 v2 product family are based on Intel microarchitecture code name Ivy Bridge. They support architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-5. The events in Table 19-5 apply to processors with CPUID signature of DisplayFamily_DisplayModel encoding with the following values: 06_3AH.
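To check whether Table 19-5 applies to your machine, the DisplayFamily_DisplayModel value can be derived from the CPUID leaf 1 EAX signature. Below is a sketch of the SDM's decoding rule (the extended model bits are only folded in for families 06H and 0FH); display_family_model and signature_label are hypothetical names. On Linux, /proc/cpuinfo already shows the decoded values in decimal, so 06_3AH appears as "cpu family : 6" and "model : 58".

```python
def display_family_model(signature):
    """Decode a CPUID leaf 1 EAX value into (DisplayFamily, DisplayModel) per the Intel SDM."""
    model = (signature >> 4) & 0xF
    family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF
    # The extended family only counts when the base family is 0FH.
    display_family = family + ext_family if family == 0xF else family
    # The extended model bits only count for families 06H and 0FH.
    display_model = (ext_model << 4) + model if family in (0x6, 0xF) else model
    return display_family, display_model


def signature_label(signature):
    """Format the result the way the SDM tables do, e.g. '06_3AH'."""
    fam, mod = display_family_model(signature)
    return f"{fam:02X}_{mod:02X}H"
```

For example, an illustrative signature of 0x306A9 decodes to 06_3AH, the Ivy Bridge value quoted above.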

...so for architectural events you need Table 19-1

19.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS Architectural performance events are introduced in Intel Core Solo and Intel Core Duo processors. They are also supported on processors based on Intel Core microarchitecture. Table 19-1 lists pre-defined architectural performance events that can be configured using general-purpose performance counters and associated event-select registers.

Table 19-1. Architectural Performance Events

(table image not preserved; it lists each event with its Event Num and UMask Value)
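Software does not have to guess which of these architectural events a given CPU implements, because CPUID leaf 0AH reports it. The sketch below decodes that leaf as I understand the SDM's description: in EAX, the version ID is in bits 7:0, the number of general-purpose counters in 15:8, the counter width in 23:16, and the length of the EBX bit vector in 31:24. An EBX bit set to 1 means that event is NOT available. decode_leaf_0a is a hypothetical helper, and the EAX/EBX values in the test are made up for illustration rather than read from a real CPU.

```python
# Bit i of CPUID.0AH:EBX refers to the i-th architectural event, in this order.
ARCH_EVENT_NAMES = [
    "UnHalted Core Cycles",
    "Instruction Retired",
    "UnHalted Reference Cycles",
    "LLC Reference",
    "LLC Misses",
    "Branch Instruction Retired",
    "Branch Misses Retired",
]


def decode_leaf_0a(eax, ebx):
    """Summarize CPUID leaf 0AH (architectural performance monitoring)."""
    vector_len = (eax >> 24) & 0xFF
    return {
        "version": eax & 0xFF,
        "gp_counters": (eax >> 8) & 0xFF,
        "counter_width": (eax >> 16) & 0xFF,
        # Only the first vector_len EBX bits are defined; a 1 bit means "not available".
        "available": [name for i, name in enumerate(ARCH_EVENT_NAMES)
                      if i < vector_len and not (ebx >> i) & 1],
    }
```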

... now comes the tricky part: take the UMask Value as the upper 2 hex digits and the Event Num as the lower 2 hex digits of a 4-hex-digit raw event descriptor, and pass that to perf stat.
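The same rule as a small Python sketch. The (Event Num, UMask Value) pairs are the architectural events as listed in Table 19-1 to the best of my knowledge, so check them against your copy of the manual. raw_event and ARCH_EVENTS are hypothetical names.

```python
# (Event Num, UMask Value) for the architectural events of Table 19-1.
# Verify against your SDM revision before relying on them.
ARCH_EVENTS = {
    "UnHalted Core Cycles": (0x3C, 0x00),
    "Instruction Retired": (0xC0, 0x00),
    "UnHalted Reference Cycles": (0x3C, 0x01),
    "LLC Reference": (0x2E, 0x4F),
    "LLC Misses": (0x2E, 0x41),
    "Branch Instruction Retired": (0xC4, 0x00),
    "Branch Misses Retired": (0xC5, 0x00),
}


def raw_event(event_num, umask):
    """Build perf's raw event string: UMask Value in the upper byte, Event Num in the lower byte."""
    if not (0 <= event_num <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("Event Num and UMask Value must each fit in one byte")
    return f"r{umask:02x}{event_num:02x}"
```

For instance, raw_event(*ARCH_EVENTS["LLC Misses"]) gives "r412e", the code used in the cache-misses experiment.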

perf stat --help
   -e, --event=
       Select the PMU event. Selection can be a symbolic event name (use
       perf list to list all events) or a raw PMU event (eventsel+umask) in
       the form of rNNN where NNN is a hexadecimal event descriptor.

... the help says NNN, but you can give it NNNN. To verify that this works, let's ask perf stat for cache-misses both by its symbolic event name and by its hex number from Table 19-1. We'll invoke the date command for simplicity.

$ perf stat -e r412e -e cache-misses date

Fri Mar 28 09:28:52 CDT 2014

Performance counter stats for 'date':

          2292 r412e                                                       
          2292 cache-misses                                                

   0.003322663 seconds time elapsed

$ 

As you can see, both report the same number, so far so good. Now we go to Table 19-5 for the non-architectural events. There are too many to list here, but I'll list a few:

(table image not preserved; it showed a few rows of Table 19-5)