Running perf stat ls shows this:
Performance counter stats for 'ls':
1.388670 task-clock # 0.067 CPUs utilized
2 context-switches # 0.001 M/sec
0 cpu-migrations # 0.000 K/sec
266 page-faults # 0.192 M/sec
3515391 cycles # 2.531 GHz
2096636 stalled-cycles-frontend # 59.64% frontend cycles idle
<not supported> stalled-cycles-backend
2927468 instructions # 0.83 insns per cycle
# 0.72 stalled cycles per insn
615636 branches # 443.328 M/sec
22172 branch-misses # 3.60% of all branches
0.020657192 seconds time elapsed
Why is stalled-cycles-backend shown as "not supported"? What kind of CPU, hardware, kernel or user-space software do I need to see this value?
So far I have tried this on RHEL with Linux 3.12 for x86_64, with a matching "perf" version, on different Intel Core i5 and i7 systems (Ivy Bridge type). None of them support stalled-cycles-backend.
Some more info:
$ perf list | grep stalled
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event]
$ ls /sys/devices/cpu/events/
branch-instructions bus-cycles cache-references instructions mem-stores
branch-misses cache-misses cpu-cycles mem-loads stalled-cycles-frontend
$ cat /sys/bus/event_source/devices/cpu/events/stalled-cycles-frontend
event=0x0e,umask=0x01,inv,cmask=0x01
Edit: just tried this on an AMD Phenom II X6 1045T CPU, under Ubuntu 12.04 with Linux 3.2 (32bit) - and here it does show values for both stalled-cycles-frontend and stalled-cycles-backend.
Looks like perf has not been updated to understand all the performance monitoring events that Ivy Bridge supports. Fortunately there is a generic, albeit cumbersome, interface that allows you to access the full list of performance monitoring events. I didn't see stalled-cycles-backend in the list when I gave it a quick look, but maybe I missed it, or maybe they have broken it down into the different events that could stall the backend.
We start with
perf list --help
...shows the following NOTE
1. Intel(R) 64 and IA-32 Architectures Software Developer's Manual
Volume 3B: System Programming Guide
http://www.intel.com/Assets/PDF/manual/253669.pdf
...armed with that URL you end up in
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
...you want section 19.3
19.3 PERFORMANCE MONITORING EVENTS FOR 3RD GENERATION INTEL® CORE™ PROCESSORS 3rd generation Intel® Core™ processors and Intel Xeon processor E3-1200 v2 product family are based on Intel microarchitecture code name Ivy Bridge. They support architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-5. The events in Table 19-5 apply to processors with CPUID signature of DisplayFamily_DisplayModel encoding with the following values: 06_3AH.
...so for architectural events you need Table 19-1
19.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS Architectural performance events are introduced in Intel Core Solo and Intel Core Duo processors. They are also supported on processors based on Intel Core microarchitecture. Table 19-1 lists pre-defined architectural performance events that can be configured using general-purpose performance counters and associated event-select registers.
Table 19-1. Architectural Performance Events
... now comes the tricky part: you take the UMask Value as the upper 2 hex digits and the Event Num as the lower 2 hex digits of a 4-hex-digit raw event code to be given to perf stat.
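The packing just described can be sketched as a small shell helper. This is only an illustration: the function name raw_event is made up here and is not part of perf, and it merely formats the string without running anything. The example values are the LLC Misses entry (Event Num 2EH, UMask 41H).

```shell
#!/bin/sh
# raw_event UMASK EVENT
# Hypothetical helper (not part of perf): prints the raw event string
# with the umask as the upper two hex digits and the event number as
# the lower two hex digits.
raw_event() {
    printf 'r%02x%02x\n' "$1" "$2"
}

# LLC Misses: UMask 41H, Event Num 2EH
raw_event 0x41 0x2e    # prints r412e
```

The printed string is what goes after -e on the perf stat command line.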
perf stat --help
-e, --event= Select the PMU event. Selection can be a symbolic event name (use perf list to list all events) or a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a hexadecimal event descriptor.
... it says NNN but you can give it NNNN. Let's verify that this works: we ask perf stat for cache-misses both as a symbolic event name and as a hex number from Table 19-1. We'll invoke the date command for simplicity.
$ perf stat -e r412e -e cache-misses date
Fri Mar 28 09:28:52 CDT 2014
Performance counter stats for 'date':
2292 r412e
2292 cache-misses
0.003322663 seconds time elapsed
$
As you can see, both reported the same number, so far so good. Now we go to Table 19-5 for the non-architectural hardware registers, of which there are too many to list here, but I'll list a few: