We just check that the evsel is the one we associated with the
bpf-output event associated with the "__augmented_syscalls__" eBPF map,
to show that the formatting is done properly:
# perf trace -e perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
0.000 ( ): __augmented_syscalls__:dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC
0.006 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC
0.007 ( 0.004 ms): cat/11486 openat(dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC ) = 3
0.029 ( ): __augmented_syscalls__:dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC
0.030 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC
0.031 ( 0.004 ms): cat/11486 openat(dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC ) = 3
0.249 ( ): __augmented_syscalls__:dfd: CWD, filename: 0xc3700d6
0.250 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0xc3700d6
0.252 ( 0.003 ms): cat/11486 openat(dfd: CWD, filename: 0xc3700d6 ) = 3
#
Now we just need to get the full blown enter/exit handlers to check if the
evsel being processed is the augmented_syscalls one to go pick the pointer
payloads from the end of the payload.
We also need to state somehow what is the layout for multi pointer arg syscalls.
Also handy would be to have a BTF file with the struct definitions used in
syscalls, compact, generated at kernel built time and available for use in eBPF
programs.
Till we get there we can go on doing some manual coupling of the most relevant
syscalls with some hand built beautifiers.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-r6ba5izrml82nwfmwcp7jpkm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The payload that is put in place by the eBPF script attached to
syscalls:sys_enter_openat (and other syscalls with pointers, in the
future) can be consumed by the existing sys_enter beautifiers if
evsel->priv is setup with a struct syscall_tp with struct tp_fields for
the 'syscall_id' and 'args' fields expected by the beautifiers, this
patch does just that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-xfjyog8oveg2fjys9r1yy1es@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We're calling it to setup that event, and we'll need it later to decide
if the bpf-output event we're handling is the one setup for a specific
purpose, return it using ERR_PTR, etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zhachv7il2n1lopt9aonwhu7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add an example BPF script that writes syscalls:sys_enter_openat raw
tracepoint payloads augmented with the first 64 bytes of the "filename"
syscall pointer arg.
Then catch it and print it just like with things written to the
"__bpf_stdout__" map associated with a PERF_COUNT_SW_BPF_OUTPUT software
event, by just letting the default tracepoint handler in 'perf trace',
trace__event_handler(), to use bpf_output__fprintf(trace, sample), just
like it does with all other PERF_COUNT_SW_BPF_OUTPUT events, i.e. just
do a dump on the payload, so that we can check if what is being printed
has at least the first 64 bytes of the "filename" arg:
The augmented_syscalls.c eBPF script:
# cat tools/perf/examples/bpf/augmented_syscalls.c
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
struct bpf_map SEC("maps") __augmented_syscalls__ = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = __NR_CPUS__,
};
struct syscall_enter_openat_args {
unsigned long long common_tp_fields;
long syscall_nr;
long dfd;
char *filename_ptr;
long flags;
long mode;
};
struct augmented_enter_openat_args {
struct syscall_enter_openat_args args;
char filename[64];
};
int syscall_enter(openat)(struct syscall_enter_openat_args *args)
{
struct augmented_enter_openat_args augmented_args;
probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
probe_read_str(&augmented_args.filename, sizeof(augmented_args.filename), args->filename_ptr);
perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU,
&augmented_args, sizeof(augmented_args));
return 1;
}
license(GPL);
#
So it will just prepare a raw_syscalls:sys_enter payload for the
"openat" syscall.
This will eventually be done for all syscalls with pointer args,
globally or just when the user asks, using some spec, which args of
which syscalls it wants "expanded" this way, we'll probably start with
just all the syscalls that have char * pointers with familiar names, the
ones we already handle with the probe:vfs_getname kprobe if it is in
place hooking the kernel getname_flags() function used to copy from user
the paths.
Running it we get:
# perf trace -e perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
0.000 ( ): __augmented_syscalls__:X?.C......................`\..................../etc/ld.so.cache..#......,....ao.k...............k......1.".........
0.006 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x5c600da8, flags: CLOEXEC
0.008 ( 0.005 ms): cat/31292 openat(dfd: CWD, filename: 0x5c600da8, flags: CLOEXEC ) = 3
0.036 ( ): __augmented_syscalls__:X?.C.......................\..................../lib64/libc.so.6......... .\....#........?.......=.C..../.".........
0.037 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x5c808ce0, flags: CLOEXEC
0.039 ( 0.007 ms): cat/31292 openat(dfd: CWD, filename: 0x5c808ce0, flags: CLOEXEC ) = 3
0.323 ( ): __augmented_syscalls__:X?.C.....................P....................../etc/passwd......>.C....@................>.C.....,....ao.>.C........
0.325 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0xe8be50d6
0.327 ( 0.004 ms): cat/31292 openat(dfd: CWD, filename: 0xe8be50d6 ) = 3
#
We need to go on optimizing this to avoid seding trash or zeroes in the
pointer content payload, using the return from bpf_probe_read_str(), but
to keep things simple at this stage and make incremental progress, lets
leave it at that for now.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-g360n1zbj6bkbk6q0qo11c28@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now it looks just about the same as for the trace__sys_{enter,exit}.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-y59may7zx1eccnp4m3qm4u0b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mapping "__syscall_nr" to "id" and setting up "args" from the offset of
"__syscall_nr" + sizeof(u64), as the payload for syscalls:* is the same
as for raw_syscalls:*, just the fields have different names.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ogeenrpviwcpwl3oy1l55f3m@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid having to ask libtraceevent to find a field by name when
handling each tracepoint event, we setup a struct syscall_tp with
a tp_field struct having an extractor function + the offset for the
"id", "args" and "ret" raw_syscalls:sys_{enter,exit} tracepoints.
Now that we want to do the same with syscalls:sys_{entry,exit}_NAME
individual syscall tracepoints, where we have "id" as "__syscall_nr" and
"args" as the actual series of per syscall parameters, we need more
flexibility from the routines that set up these pre-looked up syscall
tracepoint arg fields.
The next cset will use it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-v59q5e0jrlzkpl9a1c7t81ni@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Because raw_syscalls have the field for the syscall number as 'id' while
the syscalls:sys_{enter,exit}_NAME have it as __syscall_nr...
Since we want to support both for being able to enable just a
syscalls:sys_{enter,exit}_name instead of asking for
raw_syscalls:sys_{enter,exit} plus filters, make the method names for
each kind of tracepoint more explicit.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4rixbfzco6tsry0w9ghx3ktb@git.kernel.org
Signef-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were using the beautifiers only when processing the
raw_syscalls:sys_enter events, but we can as well use them for the
syscalls:sys_enter_NAME events, as the layout is the same.
Some more tweaking is needed as we're processing them straight away,
i.e. there is no buffering in the sys_enter_NAME event to wait for
things like vfs_getname to provide pointer contents and then flushing
at sys_exit_NAME, so we need to state in the syscall_arg that this
is unbuffered, just print the pointer values, beautifying just
non-pointer syscall args.
This just shows an alternative way of processing tracepoints, that we
will end up using when creating "tracepoint" payloads that already copy
pointer contents (or chunks of it, i.e. not the whole filename, but just
the end of it, not all the bf for a read/write, but just the start,
etc), directly in the kernel using eBPF.
E.g.:
# perf trace -e syscalls:*enter*sleep,*sleep sleep 1
0.303 ( ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffc93d5ecc0
0.305 (1000.229 ms): sleep/8746 nanosleep(rqtp: 0x7ffc93d5ecc0) = 0
# perf trace -e syscalls:*_*sleep,*sleep sleep 1
0.288 ( ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffecde87e40
0.289 ( ): sleep/8748 nanosleep(rqtp: 0x7ffecde87e40) ...
1000.479 ( ): syscalls:sys_exit_nanosleep:0x0
0.289 (1000.208 ms): sleep/8748 ... [continued]: nanosleep()) = 0
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-jehyd2zwhw00z3p7v7mg9632@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the vfs_getname() wannabe tracepoint is in place:
# perf probe -l
probe:vfs_getname (on getname_flags:73@acme/git/linux/fs/namei.c with pathname)
#
'perf trace' will use it to get the pathname when it is copied from
userspace to the kernel, right after syscalls:sys_enter_open, copied
in the 'probe:vfs_getname', stash it somewhere and then, at
syscalls:sys_exit_open time, if the 'open' return is not -1, i.e. a
successfull open syscall, associate that pathname to this return, i.e.
the fd.
We were not doing this for the 'openat' syscall, which would cause 'perf
trace' to fallback to using /proc to get the fd, change it so that we
use what we got from probe:vfs_getname, reducing the 'openat'
beautification process cost, ditching the syscalls performed to read
procfs state and avoiding some possible races in the process.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-xnp44ao3bkb6ejeczxfnjwsh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For instance:
$ trace -e socket* ssh sandy
0.000 ( 0.031 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3
0.052 ( 0.015 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3
1.568 ( 0.020 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3
1.603 ( 0.012 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3
1.699 ( 0.014 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3
1.724 ( 0.012 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3
1.804 ( 0.020 ms): ssh/19919 socket(family: INET, type: STREAM, protocol: TCP ) = 3
17.549 ( 0.098 ms): ssh/19919 socket(family: LOCAL, type: STREAM ) = 4
acme@sandy's password:
Just like with other syscall args, the common bits are supressed so that
the output is more compact, i.e. we use "TCP" instead of "IPPROTO_TCP",
but we can make this show the original constant names if we like it by
using some command line knob or ~/.perfconfig "[trace]" section
variable.
Also needed is to make perf's event parser accept things like:
$ perf trace -e socket*/protocol=TCP/
By using both the tracefs event 'format' files and these tables built
from the kernel sources.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-l39jz1vnyda0b6jsufuc8bz7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We may have string tables where not all slots have values, in those
cases its better to print the numeric value, for instance:
In the table below we would show "protocol: (null)" for
socket_ipproto[3]
Where it would be better to show "protocol: 3".
$ tools/perf/trace/beauty/socket_ipproto.sh
static const char *socket_ipproto[] = {
[0] = "IP",
[103] = "PIM",
[108] = "COMP",
[12] = "PUP",
[132] = "SCTP",
[136] = "UDPLITE",
[137] = "MPLS",
[17] = "UDP",
[1] = "ICMP",
[22] = "IDP",
[255] = "RAW",
[29] = "TP",
[2] = "IGMP",
[33] = "DCCP",
[41] = "IPV6",
[46] = "RSVP",
[47] = "GRE",
[4] = "IPIP",
[50] = "ESP",
[51] = "AH",
[6] = "TCP",
[8] = "EGP",
[92] = "MTP",
[94] = "BEETPH",
[98] = "ENCAP",
};
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-7djfak94eb3b9ltr79cpn3ti@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Its common to have the (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN),
so add an evsel__has_callchain(evsel) helper.
This will actually get more uses as we check that instead of
symbol_conf.use_callchain in places where that produces the same result
but makes this decision to be more fine grained, per evsel.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-145340oytbthatpfeaq1do18@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Let tools that need to have those variables with the sysctl current
values use a function that will read them.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1ljj3oeo5kpt2n1icfd9vowe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Simulate having all symbols in just one tree by searching the still
existing two trees.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-uss70e8tvzzbzs326330t83q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Out of thread__find_addr_location(..., MAP__FUNCTION, ...), idea here is to
continue removing references to MAP__{FUNCTION,VARIABLE} ahead of
getting both types of symbols in the same rbtree, as various places do
two lookups, looking first at MAP__FUNCTION, then at MAP__VARIABLE.
So thread__find_symbol() will eventually do just that, and 'struct
symbol' will have the symbol type, for code that cares about that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-n7528en9e08yd3flzmb26tth@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is a redundant ')' at the tail of each event. So remove it.
$ sudo perf trace --no-syscalls -e 'kmem:*' -a
899.342 kmem:kfree:(vfs_writev+0xb9) call_site=ffffffff9c453979 ptr=(nil))
899.344 kmem:kfree:(___sys_recvmsg+0x188) call_site=ffffffff9c9b8b88 ptr=(nil))
Signed-off-by: Changbin Du <changbin.du@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1520937601-24952-1-git-send-email-changbin.du@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It isn't necessary to pass the 'start', 'end' and 'overwrite' arguments
to perf_mmap__read_init(). The data is stored in the struct perf_mmap.
Discard the parameters.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1520350567-80082-8-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It isn't necessary to pass the 'overwrite', 'start' and 'end' argument
to perf_mmap__read_event(). Discard them.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1520350567-80082-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It isn't necessary to pass the 'overwrite' argument to
perf_mmap__consume(). Discard it.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1520350567-80082-6-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One can set a cgroup as a default cgroup to be used by all events or
set cgroups with the 'perf stat' and 'perf record' behaviour, i.e.
'-G A' will be the cgroup for events defined so far in the command line.
Here in my main machine, with a kvm instance running a rhel6 guinea pig
I have:
# ls -la /sys/fs/cgroup/perf_event/ | grep drw
drwxr-xr-x. 14 root root 360 Mar 6 12:04 ..
drwxr-xr-x. 3 root root 0 Mar 6 15:05 machine.slice
#
So I can go ahead and use that cgroup hierarchy, say lets see what
syscalls are being emitted by threads in that 'machine.slice' hierarchy
that are taking more than 100ms:
# perf trace --duration 100 -G machine.slice
0.188 (249.850 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
250.274 (249.743 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
500.224 (249.755 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
750.097 (249.934 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
1000.244 (249.780 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
1250.197 (249.796 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
1500.124 (249.859 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
1750.076 (172.900 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
902.570 (1021.116 ms): qemu-system-x8/23667 ppoll(ufds: 0x558151e03180, nfds: 74, tsp: 0x7ffc00cd0900, sigsetsize: 8) = 1
1923.825 (305.133 ms): qemu-system-x8/23667 ppoll(ufds: 0x558151e03180, nfds: 74, tsp: 0x7ffc00cd0900, sigsetsize: 8) = 1
2000.172 (229.002 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
^C #
If we look inside that cgroup hierarchy we get:
# ls -la /sys/fs/cgroup/perf_event/machine.slice/ | grep drw
drwxr-xr-x. 3 root root 0 Mar 6 15:05 .
drwxr-xr-x. 2 root root 0 Mar 6 16:16 machine-qemu\x2d2\x2drhel6.sandy.scope
#
There is just one, but lets say there were more and we would want to see
5 seconds worth of syscall summary for the threads in that cgroup:
# perf trace --summary -G machine.slice/machine-qemu\\x2d2\\x2drhel6.sandy.scope/ -a sleep 5
Summary of events:
qemu-system-x86 (23667), 143858 events, 24.2%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
ppoll 28492 4348.631 0.000 0.153 11.616 1.05%
futex 19661 140.801 0.001 0.007 2.993 3.20%
read 18440 68.084 0.001 0.004 1.653 4.33%
ioctl 5387 24.768 0.002 0.005 0.134 1.62%
CPU 0/KVM (23744), 449455 events, 75.8%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
ioctl 148364 3401.812 0.000 0.023 11.801 1.15%
futex 36131 404.127 0.001 0.011 7.377 2.63%
writev 29452 339.688 0.003 0.012 1.740 1.36%
write 11315 45.992 0.001 0.004 0.105 1.10%
#
See the documentation about how to set more than one cgroup for
different events in the same command line.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-t126jh4occqvu0xdqlcjygex@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf trace' utility still use the legacy interface.
Switch to the new perf_mmap__read_event() interface.
No functional change.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1519945751-37786-2-git-send-email-kan.liang@linux.intel.com
[ Changed bool parameters from 0 to 'false', as per Jiri comment ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To resolve some header conflicts that were preventing the build to
succeed in the Alpine Linux distribution.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-bvud0dvzvip3kibeplupdbmc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Not needed there, fixup the places where it is needed and was getting
only by luck via evlist.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-yxjpetn64z8vjuguu84gr6x6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were calculating the delta from a in-flight syscall that got its
output interrupted by another syscall, which doesn't seem like useful
information, we will print the syscall duration (sys_exit - sys_enter)
when the raw_syscalls:sys_exit event happens.
The problem here is how we're consuming the multiple ring buffers,
without using the ordered_events code used by perf_session, which may
cause some reordering of syscalls for diferent CPUs, so just stop
printing that delta, to avoid things like:
# trace --print-sample -p 9626 -e futex
raw_syscalls:sys_enter 411967179.269 Timer 9609/9626 [2]
raw_syscalls:sys_enter 411967179.213 file:// Content 9609/9609 [3]
328.038 (18446744073709.496 ms): Timer/9626 futex(uaddr: 0x7fc0d4027044, op: WAIT|PRIV, utime: 0x7fc0b0ffdb50 ) ...
raw_syscalls:sys_exit 411967179.225 file:// Content 9609/9609 [3]
327.982 ( 0.012 ms): file:// Conten/9609 futex(uaddr: 0x7fc0d4027040, op: WAKE|PRIV, val: 1 ) = 1
This is a bandaid, we should better try and use the ordered_events code,
possibly with some refactoring prep work, but for now at least we don't
show those false long deltas for the lines ending in '...'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-q6xgsqrju1sr6ltud9kjjhmb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To help with debugging, like the interrupted out of order issue that
will be dealt with in the next patch in this series, changing the code
to deal with:
raw_syscalls:sys_enter 411967179.269 Timer 9609/9626 [2]
raw_syscalls:sys_enter 411967179.213 file:// Content 9609/9609 [3]
328.038 (18446744073709.496 ms): Timer/9626 futex(uaddr: 0x7fc0d4027044, op: WAIT|PRIV, utime: 0x7fc0b0ffdb50 ) ...
raw_syscalls:sys_exit 411967179.225 file:// Content 9609/9609 [3]
327.982 ( 0.012 ms): file:// Conten/9609 futex(uaddr: 0x7fc0d4027040, op: WAKE|PRIV, val: 1 ) = 1
That long duration is the bug.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-fljqiibjn7wet24jd1ed7abc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Replace the errno_to_name() from the audit-libs with the newly
introduced arch_syscalls__strerrno() function.
With this change:
1. With replacing errno_to_name() from audit-libs, perf trace
does no longer require audit-lib interfaces.
2. In addition to 1, the audit-libs dependency can be removed
for architectures that support syscall tables in perf.
This is achieved in a follow-up commit.
3. With the architecture specific errno number/name mapping,
perf trace reports can work across architectures.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: linux-s390@vger.kernel.org
LPU-Reference: 1516352177-11106-5-git-send-email-brueckner@linux.vnet.ibm.com
Link: https://lkml.kernel.org/n/tip-xjvoqzhwmu4wn4kl9ng11rvs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If we use:
perf trace --max-stack=4
then the syscall events will use DWARF callchains, when available
(libunwind enabled in the build) and the printing will stop at 4 levels.
When we introduced support for tracepoint events this ended up not
applying for them, fix it.
Before:
# perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.058 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.058/0.058/0.058/0.000 ms
0.000 probe_libc:inet_pton:(7fc6c2a16350))
#
After:
# perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.087 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.087/0.087/0.087/0.000 ms
0.000 probe_libc:inet_pton:(7fbf9a041350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa947cb67f3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa947cb68379] (/usr/bin/ping)
#
Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-afsu9eegd43ppihiuafhh9qv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The raw_syscalls:sys_{enter,exit} were first supported in 'perf trace',
together with minor and major page faults, then we supported
--call-graph, then --max-stack, but when the other tracepoints got
supported, and bpf, etc, I forgot to make those global call-graph
settings apply to them.
Fix it by realizing that the global --max-stack and --call-graph
settings are done via:
OPT_CALLBACK(0, "call-graph", &trace.opts,
"record_mode[,record_size]", record_callchain_help,
&record_parse_callchain_opt),
And then, when we go to parse the events in -e via:
OPT_CALLBACK('e', "event", &trace, "event",
"event/syscall selector. use 'perf list' to list available events",
trace__parse_events_option),
And trace__parse_sevents_option() calls:
struct option o = OPT_CALLBACK('e', "event", &trace->evlist, "event",
"event selector. use 'perf list' to list available events",
parse_events_option);
err = parse_events_option(&o, lists[0], 0);
parse_events_option() will override the global --call-graph and
--max-stack if the "call-graph" and/or "max-stack" terms are in the
event definition, such as in the probe_libc:inet_pton event in one of the
examples below (-e probe_libc:inet_pton/max-stack=2).
Before:
# perf trace --mmap 1024 --call-graph dwarf -e sendto,probe_libc:inet_pton ping -6 -c 1 ::1
1.525 ( ): probe_libc:inet_pton:(7f77f3ac9350))
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.071 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.071/0.071/0.071/0.000 ms
1.677 ( 0.081 ms): ping/31296 sendto(fd: 3, buff: 0x55681b652720, len: 64, addr: 0x55681b650640, addr_len: 28) = 64
__libc_sendto (/usr/lib64/libc-2.26.so)
[0xffffaa97e4bc9cef] (/usr/bin/ping)
[0xffffaa97e4bc656d] (/usr/bin/ping)
[0xffffaa97e4bc7d0a] (/usr/bin/ping)
[0xffffaa97e4bca447] (/usr/bin/ping)
[0xffffaa97e4bc2f91] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa97e4bc3379] (/usr/bin/ping)
#
After:
# perf trace --mmap 1024 --call-graph dwarf -e sendto,probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.089 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.089/0.089/0.089/0.000 ms
1.955 ( ): probe_libc:inet_pton:(7f383a311350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa5d91444f3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa5d91445379] (/usr/bin/ping)
2.140 ( 0.101 ms): ping/32047 sendto(fd: 3, buff: 0x55a26edd0720, len: 64, addr: 0x55a26edce640, addr_len: 28) = 64
__libc_sendto (/usr/lib64/libc-2.26.so)
[0xffffaa5d9144bcef] (/usr/bin/ping)
[0xffffaa5d9144856d] (/usr/bin/ping)
[0xffffaa5d91449d0a] (/usr/bin/ping)
[0xffffaa5d9144c447] (/usr/bin/ping)
[0xffffaa5d91444f91] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa5d91445379] (/usr/bin/ping)
#
Same thing for --max-stack, the global one:
# perf trace --max-stack 3 -e sendto,probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.097 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.097/0.097/0.097/0.000 ms
1.577 ( ): probe_libc:inet_pton:(7f32f3957350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
1.738 ( 0.108 ms): ping/32103 sendto(fd: 3, buff: 0x55c3132d7720, len: 64, addr: 0x55c3132d5640, addr_len: 28) = 64
__libc_sendto (/usr/lib64/libc-2.26.so)
[0xffffaa3cecf44cef] (/usr/bin/ping)
[0xffffaa3cecf4156d] (/usr/bin/ping)
#
And then setting up a global setting (dwarf, max-stack=4), that will
affect the raw_syscall:sys_enter for the 'sendto' syscall and that will
be overriden in the probe_libc:inet_pton call to just one entry.
# perf trace --max-stack=4 --call-graph dwarf -e sendto -e probe_libc:inet_pton/max-stack=1/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.090 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.090/0.090/0.090/0.000 ms
2.140 ( ): probe_libc:inet_pton:(7f9fe9337350))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
2.283 ( 0.103 ms): ping/31804 sendto(fd: 3, buff: 0x55c7f3e19720, len: 64, addr: 0x55c7f3e17640, addr_len: 28) = 64
__libc_sendto (/usr/lib64/libc-2.26.so)
[0xffffaa380c402cef] (/usr/bin/ping)
[0xffffaa380c3ff56d] (/usr/bin/ping)
[0xffffaa380c400d0a] (/usr/bin/ping)
#
Install iputils-debuginfo to get those /usr/bin/ping addresses resolved,
those routines are not on its .dymsym nor .symtab :-)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-qgl2gse8elhh9zztw4ajopg3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since 75562573ba ("perf tools: Add support for
PERF_SAMPLE_IDENTIFIER") we don't need explicitely set
PERF_SAMPLE_IDENTIFIER, as perf_evlist__config() will do this for us,
i.e. when there are more than one evsel in an evlist, it will check if
some evsel has a sample_type different than the one on the first evsel
in the list, setting PERF_SAMPLE_IDENTIFIER in that case.
So, to simplify 'perf trace' codebase, ditch that check.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-12xq6orhwttee2tdtu96ucrp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now all perf_evlist__mmap's users doesn't set 'overwrite'. Remove it
from arguments list.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/20171203020044.81680-2-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently if trace_event__register_resolver() fails, we return -errno,
but we can't be sure that errno isn't zero in this case.
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vasily Averin <vvs@virtuozzo.com>
Link: http://lkml.kernel.org/r/20171108002246.8924-2-avagin@openvz.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For some unknown reason there is no entry in tracefs's syscalls for
kcmp, i.e. no tracefs/events/syscalls/sys_{enter,exit}_kcmp, so we need
to provide a data dictionary for the fields.
To beautify the 'type' argument we automatically generate a strarray
from tools/include/uapi/kcmp.h, the idx1 and idx2 args, nowadays used
only if type == KCMP_FILE, are masked for all the other types and a
lookup is made for the thread and fd to show the path, if possible,
getting it from the probe:vfs_getname if in place or from procfs, races
allowing.
A system wide strace like tracing session, with callchains shows just
one user so far in this fedora 25 machine:
# perf trace --max-stack 5 -e kcmp
<SNIP>
1502914.400 ( 0.001 ms): systemd/1 kcmp(pid1: 1 (systemd), pid2: 1 (systemd), type: FILE, idx1: 271<socket:[4723475]>, idx2: 25<socket:[4788686]>) = -1 ENOSYS Function not implemented
syscall (/usr/lib64/libc-2.25.so)
same_fd (/usr/lib/systemd/libsystemd-shared-233.so)
service_add_fd_store (/usr/lib/systemd/systemd)
service_notify_message.lto_priv.127 (/usr/lib/systemd/systemd)
1502914.407 ( 0.001 ms): systemd/1 kcmp(pid1: 1 (systemd), pid2: 1 (systemd), type: FILE, idx1: 270<socket:[4726396]>, idx2: 25<socket:[4788686]>) = -1 ENOSYS Function not implemented
syscall (/usr/lib64/libc-2.25.so)
same_fd (/usr/lib/systemd/libsystemd-shared-233.so)
service_add_fd_store (/usr/lib/systemd/systemd)
service_notify_message.lto_priv.127 (/usr/lib/systemd/systemd)
<SNIP>
The backtraces seem to agree this is really kcmp(), but this system
doesn't have the sys_kcmp(), bummer:
# uname -a
Linux jouet 4.14.0-rc3+ #1 SMP Fri Oct 13 12:21:12 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
# grep kcmp /proc/kallsyms
ffffffffb60b8890 W sys_kcmp
$ grep CONFIG_CHECKPOINT_RESTORE ../build/v4.14.0-rc3+/.config
# CONFIG_CHECKPOINT_RESTORE is not set
$
So systemd uses it, good fedora kernel config has it:
$ grep CONFIG_CHECKPOINT_RESTORE /boot/config-4.13.4-200.fc26.x86_64
CONFIG_CHECKPOINT_RESTORE=y
[acme@jouet linux]$
/me goes to rebuild a kernel...
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-gz5fca968viw8m7hryjqvrln@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One that given a pid and a fd, will try to get the path for that fd.
Will be used in the upcoming kcmp's KCMP_FILE beautifier.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-7ketygp2dvs9h13wuakfncws@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add struct perf_data_file to represent a single file within a perf_data
struct.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org
[ Fixup recent changes in 'perf script --per-event-dump' ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename struct perf_data_file to perf_data, because we will add the
possibility to have multiple files under perf.data, so the 'perf_data'
name fits better.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-39wn4d77phel3dgkzo3lyan0@git.kernel.org
[ Fixup recent changes in 'perf script --per-event-dump' ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Out of print_binary() but receiving a fp pointer and expecting that the
printer be a fprintf like function, i.e. receive a FILE pointer and
return the number of characters printed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lkml.kernel.org/n/tip-6oqnxr6lmgqe6q6p3iugnscx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The proc files which is sorted with alphabetical order are evenly
assigned to several synthesize threads to be processed in parallel.
For 'perf top', the threads number hard code to online CPU number. The
following patch will introduce an option to set it.
For other perf tools, the thread number is 1. Because the process
function is not ready for multithreading, e.g.
process_synthesized_event.
This patch series only support event synthesize multithreading for 'perf
top'. For other tools, it can be done separately later.
With multithread applied, the total processing time can get up to 1.56x
speedup on Knights Mill for 'perf top'.
For specific single event processing, the processing time could increase
because of the lock contention. So proc_map_timeout may need to be
increased. Otherwise some proc maps will be truncated.
Based on my test, increasing the proc_map_timeout has small impact
on the total processing time. The total processing time still get 1.49x
speedup on Knights Mill after increasing the proc_map_timeout.
The patch itself doesn't increase the proc_map_timeout.
Doesn't need to implement multithreading for per task monitoring,
perf_event__synthesize_thread_map. It doesn't have performance issue.
Committer testing:
# getconf _NPROCESSORS_ONLN
4
# perf trace --no-inherit -e clone -o /tmp/output perf top
# tail -4 /tmp/bla
0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf)
0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf)
0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf)
246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf)
#
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Let's free the allocated rec_argv in case we return early, in order to
avoid leaking memory.
This adds free() at a few very similar places across the tree where it
was missing.
Signed-off-by: Martin Kepplinger <martink@posteo.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Martin kepplinger <martink@posteo.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170913191419.29806-1-martink@posteo.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To process any events, it needs to find the thread in the machine first.
The machine maintains a rb tree to store all threads. The rb tree is
protected by a rw lock.
It is not a problem for current perf which serially processing events.
However, it will have scalability performance issue to process events in
parallel, especially on a heavy load system which have many threads.
Introduce a hashtable to divide the big rb tree into many samll rb tree
for threads. The index is thread id % hashtable size. It can reduce the
lock contention.
Committer notes:
Renamed some variables and function names to reduce semantic confusion:
'struct threads' pointers: thread -> threads
threads hastable index: tid -> hash_bucket
struct threads *machine__thread() -> machine__threads()
Cast tid to (unsigned int) to handle -1 in machine__threads() (Kan Liang)
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1505096603-215017-2-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>