32b8af82e3
Currently, the values for all tasks given as PID arguments to the -p option are aggregated and printed as single values.

Add the --per-thread option to print values per task:

  $ perf stat -e cycles,instructions --per-thread -p 30190,30242
  ^C
   Performance counter stats for process id '30190,30242':

       cat-30190                  0      cycles
       yes-30242      3,842,525,421      cycles
       cat-30190                  0      instructions
       yes-30242     10,370,817,010      instructions

         1.143155657 seconds time elapsed

It also works in interval mode:

  $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
  #           time             comm-pid                  counts unit events
       1.000073435            cat-30190                  89,058      cycles
       1.000073435            yes-30242           3,360,786,902      cycles                   (100.00%)
       1.000073435            cat-30190                  14,066      instructions
       1.000073435            yes-30242           9,069,937,462      instructions
       2.000204830            cat-30190                       0      cycles
       2.000204830            yes-30242           3,351,667,626      cycles
       2.000204830            cat-30190                       0      instructions
       2.000204830            yes-30242           9,045,796,885      instructions
  ^C     2.771286639            cat-30190                       0      cycles
       2.771286639            yes-30242           2,593,884,166      cycles
       2.771286639            cat-30190                       0      instructions
       2.771286639            yes-30242           7,001,171,191      instructions

It works only with the -t and -p options; otherwise the following error is printed:

  $ perf stat -e cycles --per-thread -I 1000 ls
  The --per-thread option is only available when monitoring via -p -t options.
      -p, --pid <pid>       stat events on existing process id
      -t, --tid <tid>       stat events on existing thread id

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
181 lines
5.5 KiB
Text
perf-stat(1)
============

NAME
----
perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
--------
[verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]

DESCRIPTION
-----------
This command runs a command and gathers performance counter statistics
from it.


OPTIONS
-------
<command>...::
        Any command you can specify in a shell.


-e::
--event=::
        Select the PMU event. Selection can be:

        - a symbolic event name (use 'perf list' to list all events)

        - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
          hexadecimal event descriptor.

        - a symbolically formed event like 'pmu/param1=0x3,param2/' where
          param1 and param2 are defined as formats for the PMU in
          /sys/bus/event_source/devices/<pmu>/format/*

        - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
          where M, N, K are numbers (in decimal, hex, octal format).
          Acceptable values for each of 'config', 'config1' and 'config2'
          parameters are defined by corresponding entries in
          /sys/bus/event_source/devices/<pmu>/format/*

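As a rough illustration of the rNNN form: on x86 PMUs the raw descriptor is commonly built with the umask in bits 8-15 and the event select in bits 0-7. That layout is an assumption about the target PMU, not something this page specifies, and `raw_event` is a hypothetical helper that only formats the string; it does not run perf.

```shell
# Hypothetical helper: format an x86-style raw event as rNNN.
# Assumption: umask occupies bits 8-15 and eventsel bits 0-7 (x86 layout).
raw_event() {
    # $1 = eventsel, $2 = umask; each may be given in hex (0x..) or decimal
    printf 'r%02x%02x\n' "$2" "$1"
}

raw_event 0x3c 0x00   # prints r003c
raw_event 0xc4 0x20   # prints r20c4
```

The resulting string could then be passed to -e, for example `perf stat -e "$(raw_event 0x3c 0x00)" -- ls`. Whether a given code is meaningful depends on the CPU model.
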
-i::
--no-inherit::
        child tasks do not inherit counters

-p::
--pid=<pid>::
        stat events on existing process id (comma separated list)

-t::
--tid=<tid>::
        stat events on existing thread id (comma separated list)


-a::
--all-cpus::
        system-wide collection from all CPUs

-c::
--scale::
        scale/normalize counter values

-r::
--repeat=<n>::
        repeat command and print average + stddev (max: 100). 0 means forever.

-B::
--big-num::
        print large numbers with thousands' separators according to locale

-C::
--cpu=::
        Count only on the list of CPUs provided. Multiple CPUs can be provided as a
        comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
        In per-thread mode, this option is ignored. The -a option is still necessary
        to activate system-wide monitoring. Default is to count on all CPUs.

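The list syntax accepted by -C can be sketched with a small expansion routine. `expand_cpu_list` is a hypothetical helper written for this illustration, not part of perf; it only shows which CPU numbers a specification such as 0-2,5 names.

```shell
# Hypothetical helper: expand a -C style CPU list into individual CPU numbers.
expand_cpu_list() {
    # $1 = comma-separated list of CPUs and/or lo-hi ranges, no spaces
    printf '%s\n' "$1" | awk -F, '{
        out = ""
        for (i = 1; i <= NF; i++) {
            n = split($i, r, "-")
            lo = r[1] + 0                      # force numeric comparison
            hi = (n == 2) ? r[2] + 0 : lo      # single CPU: range of one
            for (c = lo; c <= hi; c++)
                out = out (out == "" ? "" : " ") c
        }
        print out
    }'
}

expand_cpu_list 0,1     # prints: 0 1
expand_cpu_list 0-2,5   # prints: 0 1 2 5
```
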
-A::
--no-aggr::
        Do not aggregate counts across all monitored CPUs in system-wide mode (-a).
        This option is only valid in system-wide mode.

-n::
--null::
        null run - don't start any counters

-v::
--verbose::
        be more verbose (show counter open errors, etc)

-x SEP::
--field-separator SEP::
        print counts using a CSV-style output to make it easy to import directly into
        spreadsheets. Columns are separated by the string specified in SEP.

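CSV-style output also lends itself to scripting. The sketch below sums the counts for one event, assuming that SEP is a comma and that the first three fields are the counter value, the unit (possibly empty) and the event name; this field order is an assumption and should be checked against the output of the perf version in use. `sum_event` is a hypothetical helper and the sample lines are made up.

```shell
# Hypothetical helper: sum the counts of one event from 'perf stat -x,' output.
# Assumption: fields are counter value, unit (may be empty), event name.
sum_event() {
    # $1 = event name; CSV lines are read from stdin
    awk -F, -v ev="$1" '$3 == ev { total += $1 } END { printf "%.0f\n", total }'
}

# Made-up sample lines standing in for real perf output.
printf '%s\n' '89058,,cycles' '1200,,cycles' '14066,,instructions' | sum_event cycles
# prints 90258
```

With real data, the helper might be fed from a file written with -o, for example `perf stat -x, -e cycles -o out.csv -- $cmd` followed by `sum_event cycles < out.csv`.
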
-G name::
--cgroup name::
        monitor only in the container (cgroup) called "name". This option is available only
        in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
        container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
        can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
        to first event, second cgroup to second event and so on. It is possible to provide
        an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
        corresponding events, i.e., they always refer to events defined earlier on the command
        line.

-o file::
--output file::
        Print the output into the designated file.

--append::
        Append to the output file designated with the -o option. Ignored if -o is not specified.

--log-fd::
        Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
        with it. --append may be used here. Examples:

             3>results  perf stat --log-fd 3          -- $cmd
             3>>results perf stat --log-fd 3 --append -- $cmd

--pre::
--post::
        Pre and post measurement hooks, e.g.:

perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage

-I msecs::
--interval-print msecs::
        Print count deltas every N milliseconds (minimum: 100ms)
        example: perf stat -I 1000 -e cycles -a sleep 5

--per-socket::
        Aggregate counts per processor socket for system-wide mode measurements. This
        is a useful mode to detect imbalance between sockets. To enable this mode,
        use --per-socket in addition to -a (system-wide). The output includes the
        socket number and the number of online processors on that socket. This is
        useful to gauge the amount of aggregation.

--per-core::
        Aggregate counts per physical processor for system-wide mode measurements. This
        is a useful mode to detect imbalance between physical cores. To enable this mode,
        use --per-core in addition to -a (system-wide). The output includes the
        core number and the number of online logical processors on that physical processor.

--per-thread::
        Aggregate counts per monitored thread, when monitoring threads (-t option)
        or processes (-p option).

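Per-thread output labels each row with a comm-pid pair, as in the sample output quoted in the commit message for this option (for instance `yes-30242 3,842,525,421 cycles`). The sketch below splits such rows into separate fields for further processing. `per_thread_rows` is a hypothetical helper; it assumes the non-interval, three-column layout shown in that sample and skips every other line.

```shell
# Hypothetical helper: turn "comm-pid count event" rows into "comm pid event count".
# Assumption: rows have exactly three whitespace-separated columns.
per_thread_rows() {
    awk 'NF == 3 && $1 ~ /-[0-9]+$/ {
        pid = $1;   sub(/^.*-/, "", pid)        # text after the last "-"
        comm = $1;  sub(/-[0-9]+$/, "", comm)   # text before it
        count = $2; gsub(/,/, "", count)        # drop thousands separators
        print comm, pid, $3, count
    }'
}

# Rows copied from the sample output in the commit message.
printf '%s\n' \
    '       cat-30190                  0      cycles' \
    '       yes-30242      3,842,525,421      cycles' | per_thread_rows
# prints:
#   cat 30190 cycles 0
#   yes 30242 cycles 3842525421
```

Because perf stat logs to stderr unless -o or --log-fd is given, live output would need a redirect, for example `perf stat --per-thread -e cycles -p 30190,30242 2>&1 | per_thread_rows`.
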
-D msecs::
--delay msecs::
        After starting the program, wait msecs before measuring. This is useful to
        filter out the startup phase of the program, which is often very different.

-T::
--transaction::
        Print statistics of transactional execution if supported.

EXAMPLES
--------

$ perf stat -- make -j

 Performance counter stats for 'make -j':

    8117.370256  task clock ticks     #      11.281 CPU utilization factor
            678  context switches     #       0.000 M/sec
            133  CPU migrations       #       0.000 M/sec
         235724  pagefaults           #       0.029 M/sec
    24821162526  CPU cycles           #    3057.784 M/sec
    18687303457  instructions         #    2302.138 M/sec
      172158895  cache references     #      21.209 M/sec
       27075259  cache misses         #       3.335 M/sec

 Wall-clock time elapsed:   719.554352 msecs

SEE ALSO
--------
linkperf:perf-top[1], linkperf:perf-list[1]