When injecting events from a guest perf.data file, the events will have
separate sample ID numbers. These ID numbers can then be used to determine
which machine an event belongs to. To facilitate that, add machine_pid and
vcpu to id_index records. For backward compatibility, these are added at
the end of the record, and the length of the record is used to determine
if they are present or not.
Note, this is needed because the events from a guest perf.data file contain
the pid/tid of the process running at that time inside the VM not the
pid/tid of the (QEMU) hypervisor thread. So a way is needed to relate
guest events back to the guest machine and VCPU, and using sample ID
numbers for that is relatively simple and convenient.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220711093218.10967-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This new option displays all of the information needed to do external
BuildID-based symbolization of kernel stack traces, such as those collected
by bpf_get_stackid().
For each kernel module plus the main kernel, it displays the BuildID,
the start and end virtual addresses of that module's text range (rounded
out to page boundaries), and the pathname of the module.
When run as a non-privileged user, the actual addresses of the modules'
text ranges are not available, so the tools displays "0, <text length>" for
kernel modules and "0, 0xffffffffffffffff" for the kernel itself.
Sample output:
root# perf buildid-list -m
cf6df852fd4da122d616153353cc8f560fd12fe0 ffffffffa5400000 ffffffffa6001e27 [kernel.kallsyms]
1aa7209aa2acb067d66ed6cf7676d65066384d61 ffffffffc0087000 ffffffffc008b000 /lib/modules/5.15.15-1rodete2-amd64/kernel/crypto/sha512_generic.ko
3857815b5bf0183697b68f8fe0ea06121644041e ffffffffc008c000 ffffffffc0098000 /lib/modules/5.15.15-1rodete2-amd64/kernel/arch/x86/crypto/sha512-ssse3.ko
4081fde0bca2bc097cb3e9d1efcb836047d485f1 ffffffffc0099000 ffffffffc009f000 /lib/modules/5.15.15-1rodete2-amd64/kernel/drivers/acpi/button.ko
1ef81ba4890552ea6b0314f9635fc43fc8cef568 ffffffffc00a4000 ffffffffc00aa000 /lib/modules/5.15.15-1rodete2-amd64/kernel/crypto/cryptd.ko
cc5c985506cb240d7d082b55ed260cbb851f983e ffffffffc00af000 ffffffffc00b6000 /lib/modules/5.15.15-1rodete2-amd64/kernel/drivers/i2c/busses/i2c-piix4.ko
[...]
Committer notes:
u64 formatter should be PRIx64 for printing as hex numbers, fix this:
28 5.28 debian:experimental-x-mips : FAIL gcc version 11.2.0 (Debian 11.2.0-18)
builtin-buildid-list.c: In function 'buildid__map_cb':
builtin-buildid-list.c:32:24: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
32 | printf("%s %16lx %16lx", bid_buf, map->start, map->end);
| ~~~~^ ~~~~~~~~~~
| | |
| long unsigned int u64 {aka long long unsigned int}
| %16llx
builtin-buildid-list.c:32:30: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
32 | printf("%s %16lx %16lx", bid_buf, map->start, map->end);
| ~~~~^ ~~~~~~~~
| | |
| long unsigned int u64 {aka long long unsigned int}
| %16llx
cc1: all warnings being treated as errors
Signed-off-by: Blake Jones <blakejones@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220629213632.3899212-1-blakejones@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To update the perf/core codebase.
Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end
of evsel__config(), after what was added in:
49c692b7df ("perf offcpu: Accept allowed sample types only")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':
#0 0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
#1 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
#2 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
#3 0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
#4 syscall__scnprintf_args <snip> at builtin-trace.c:2041
#5 0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319
That points to the below code in tools/perf/builtin-trace.c:
/*
* If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
* arguments, even if the syscall being handled, say "openat", uses only 4 arguments
* this breaks syscall__augmented_args() check for augmented args, as we calculate
* syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
* so when handling, say the openat syscall, we end up getting 6 args for the
* raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
* thinking that the extra 2 u64 args are the augmented filename, so just check
* here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
*/
if (evsel != trace->syscalls.events.sys_enter)
augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);
As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.
Reported-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220707090900.572584-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The lock contention tracepoints don't provide lock names. All we can
do is to get stack traces and show the caller instead. To minimize
the overhead it's limited to up to 8 stack traces and display the
first non-lock function symbol name as a caller.
$ perf lock report -F acquired,contended,avg_wait,wait_total
Name acquired contended avg wait total wait
update_blocked_a... 40 40 3.61 us 144.45 us
kernfs_fop_open+... 5 5 3.64 us 18.18 us
_nohz_idle_balance 3 3 2.65 us 7.95 us
tick_do_update_j... 1 1 6.04 us 6.04 us
ep_scan_ready_list 1 1 3.93 us 3.93 us
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220615163222.1275500-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently it only prints the time in nsec but it's a bit hard to read
and takes longer in the screen. We can change it to use different units
and keep the number small to save the space.
Before:
$ perf lock report
Name acquired contended avg wait (ns) total wait (ns) max wait (ns) min wait (ns)
jiffies_lock 433 32 2778 88908 13570 692
&lruvec->lru_lock 747 5 11254 56272 18317 1412
slock-AF_INET6 7 1 23543 23543 23543 23543
&newf->file_lock 706 15 1025 15388 2279 618
slock-AF_INET6 8 1 10379 10379 10379 10379
&rq->__lock 2143 5 2037 10185 3462 939
After:
Name acquired contended avg wait total wait max wait min wait
jiffies_lock 433 32 2.78 us 88.91 us 13.57 us 692 ns
&lruvec->lru_lock 747 5 11.25 us 56.27 us 18.32 us 1.41 us
slock-AF_INET6 7 1 23.54 us 23.54 us 23.54 us 23.54 us
&newf->file_lock 706 15 1.02 us 15.39 us 2.28 us 618 ns
slock-AF_INET6 8 1 10.38 us 10.38 us 10.38 us 10.38 us
&rq->__lock 2143 5 2.04 us 10.19 us 3.46 us 939 ns
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220615163222.1275500-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit dc2cf4ca86 ("perf unwind: Fix segbase for ld.lld linked
objects") uncovered the following issue on aarch64:
util/unwind-libunwind-local.c: In function 'find_proc_info':
util/unwind-libunwind-local.c:386:28: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
386 | if (ofs > 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:371:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
371 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:363:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
363 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
In file included from util/libunwind/arm64.c:37:
Fixes: dc2cf4ca86 ("perf unwind: Fix segbase for ld.lld linked objects")
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: kernel-team@cloudflare.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220701182046.12589-1-ivan@cloudflare.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
jevents.c is large, has a dependency on an old forked version of jsmn,
and is challenging to work upon. A lot of jevents.c's complexity comes
from needing to write json and csv parsing from first principles. In
contrast python has this functionality in standard libraries and is
already a build pre-requisite for tools like asciidoc (that builds all
of the perf man pages).
Introduce jevents.py that produces identical output to jevents.c. Add a
test that runs both converter tools and validates there are no output
differences. The test can be invoked with a phony build target like:
$ make -C tools/perf jevents-py-test
The python code deliberately tries to replicate the behavior of
jevents.c so that the output matches and transitioning tools shouldn't
introduce regressions. In some cases the code isn't as elegant as hoped,
but fixing this can be done as follow up.
Committer testing:
$ make -C tools/perf jevents-py-test
make: Entering directory '/var/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j32' parallel build
HOSTCC fixdep.o
HOSTLD fixdep-in.o
LINK fixdep
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
... libbfd: [ on ]
... libbfd-buildid: [ on ]
... libcap: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libcrypto: [ OFF ]
... libunwind: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ on ]
... disassembler-four-args: [ on ]
HOSTCC pmu-events/json.o
HOSTCC pmu-events/jsmn.o
HOSTCC pmu-events/jevents.o
HOSTLD pmu-events/jevents-in.o
LINK pmu-events/jevents
Checking architecture: arm64
Generating using jevents.c
Generating using jevents.py
Diffing
Checking architecture: nds32
Generating using jevents.c
Generating using jevents.py
Diffing
Checking architecture: powerpc
Generating using jevents.c
Generating using jevents.py
Diffing
Checking architecture: s390
Generating using jevents.c
Generating using jevents.py
Diffing
Checking architecture: x86
Generating using jevents.c
Generating using jevents.py
Diffing
make: Leaving directory '/var/home/acme/git/perf/tools/perf'
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: John Garry <john.garry@huawei.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Like Xu <likexu@tencent.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Forrington <nick.forrington@arm.com>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi115@huawei.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220629182505.406269-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf already support ignore_missing_thread for -p, but not yet
applied to `perf stat -p <pid>`. This patch enables ignore_missing_thread
for `perf stat -p <pid>`.
Committer notes:
And here is a refresher about the 'ignore_missing_thread' knob, from a
previous patch using it:
ca8000684e ("perf evsel: Enable ignore_missing_thread for pid option")
---
While monitoring a multithread process with pid option, perf sometimes
may return sys_perf_event_open failure with 3(No such process) if any of
the process's threads die before we open the event. However, we want
perf continue monitoring the remaining threads and do not exit with
error.
---
Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220622030037.15005-1-ligang.bdlg@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For some reason using:
cat <<EoFuncBegin
static const char *errno_to_name__$arch(int err)
{
switch (err) {
EoFuncBegin
In tools/perf/trace/beauty/arch_errno_names.sh isn't working on ALT
Linux sisyphus (development version), which could be some distro
specific glitch, so just get this done in an alternative way that works
everywhere while giving notice to the people working on that distro to
try and figure our what really took place.
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Build ID events associate a file name with a build ID. However, when
using perf inject, there is no guarantee that the file on the current
machine at the current time has that build ID. Fix by comparing the
build IDs and skip adding to the cache if they are different.
Example:
$ echo "int main() {return 0;}" > prog.c
$ gcc -o prog prog.c
$ perf record --buildid-all ./prog
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data ]
$ file-buildid() { file $1 | awk -F= '{print $2}' | awk -F, '{print $1}' ; }
$ file-buildid prog
444ad9be165d8058a48ce2ffb4e9f55854a3293e
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
444ad9be165d8058a48ce2ffb4e9f55854a3293e
$ echo "int main() {return 1;}" > prog.c
$ gcc -o prog prog.c
$ file-buildid prog
885524d5aaa24008a3e2b06caa3ea95d013c0fc5
Before:
$ perf buildid-cache --purge $(pwd)/prog
$ perf inject -i perf.data -o junk
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
885524d5aaa24008a3e2b06caa3ea95d013c0fc5
$
After:
$ perf buildid-cache --purge $(pwd)/prog
$ perf inject -i perf.data -o junk
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
$
Fixes: 454c407ec1 ("perf: add perf-inject builtin")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: https://lore.kernel.org/r/20220621125144.5623-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the -D option is used, the details of thread-map, cpu-map and
event-update events are not currently dumped. Add prints so that they are.
Example:
# perf record --kcore sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.021 MB perf.data (7 samples) ]
# perf script -D | grep 'THREAD\|CPU'
0 0x4950 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 35116
0 0x4978 [0x20]: PERF_RECORD_CPU_MAP: 0-7
# perf script -D | grep -A4 'UPDATE'
0 0x4920 [0x30]: PERF_RECORD_EVENT_UPDATE
... id: 147
... 0-7
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220610113316.6682-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In preparation for recording sideband events in a virtual machine guest so
that they can be injected into a host perf.data file.
This is needed to enable injecting events after the initial synthesized
user events (that have an all zero id sample) but before regular events.
Committer notes:
Add entry about PERF_RECORD_FINISHED_INIT to
tools/perf/Documentation/perf.data-file-format.txt.
Committer testing:
Before:
# perf report -D | grep FINISHED
0 0x5910 [0x8]: PERF_RECORD_FINISHED_ROUND
FINISHED_ROUND events: 1 ( 0.5%)
#
After:
# perf record -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ]
# perf report -D | grep FINISHED
0 0x5068 [0x8]: PERF_RECORD_FINISHED_INIT: unhandled!
0 0x5390 [0x8]: PERF_RECORD_FINISHED_ROUND
FINISHED_ROUND events: 1 ( 0.5%)
FINISHED_INIT events: 1 ( 0.5%)
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220610113316.6682-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>