Commit Graph

16654 Commits

Author SHA1 Message Date
Howard Chu
7fbff3c0e0 perf build: Change the clang check back to 12.0.1
This serves as a revert for this patch:

  https://lore.kernel.org/linux-perf-users/ZuGL9ROeTV2uXoSp@x1/

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241011021403.4089793-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23 11:34:56 -03:00
Howard Chu
395d38419f perf trace augmented_raw_syscalls: Add more checks to pass the verifier
Add some more checks to pass the verifier in more kernels.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241011021403.4089793-3-howardchu95@gmail.com
[ Reduced the patch removing things that can be done later ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23 11:34:56 -03:00
Arnaldo Carvalho de Melo
ecabac70ff perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers
In a RHEL8 kernel (4.18.0-513.11.1.el8_9.x86_64), that, as enterprise
kernels go, have backports from modern kernels, the verifier complains
about lack of bounds check for the index into the array of syscall
arguments, on a BPF bytecode generated by clang 17, with:

  ; } else if (size < 0 && size >= -6) { /* buffer */
  116: (b7) r1 = -6
  117: (2d) if r1 > r6 goto pc-30
   R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv-6 R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6=inv(id=0,umin_value=18446744073709551610,var_off=(0xffffffff00000000; 0xffffffff)) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value
  ; index = -(size + 1);
  118: (a7) r6 ^= -1
  119: (67) r6 <<= 32
  120: (77) r6 >>= 32
  ; aug_size = args->args[index];
  121: (67) r6 <<= 3
  122: (79) r1 = *(u64 *)(r10 -24)
  123: (0f) r1 += r6
  last_idx 123 first_idx 116
  regs=40 stack=0 before 122: (79) r1 = *(u64 *)(r10 -24)
  regs=40 stack=0 before 121: (67) r6 <<= 3
  regs=40 stack=0 before 120: (77) r6 >>= 32
  regs=40 stack=0 before 119: (67) r6 <<= 32
  regs=40 stack=0 before 118: (a7) r6 ^= -1
  regs=40 stack=0 before 117: (2d) if r1 > r6 goto pc-30
  regs=42 stack=0 before 116: (b7) r1 = -6
   R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv1 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_w=inv(id=0) R5_w=inv40 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=0) R7_w=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8_w=invP6 R9_w=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_w=map_value fp-24_r=map_value fp-32_w=inv40 fp-40=ctx fp-48=map_value fp-56_w=inv1 fp-64_w=map_value fp-72=map_value fp-80=map_value
  parent didn't have regs=40 stack=0 marks
  last_idx 110 first_idx 98
  regs=40 stack=0 before 110: (6d) if r1 s> r6 goto pc+5
  regs=42 stack=0 before 109: (b7) r1 = 1
  regs=40 stack=0 before 108: (65) if r6 s> 0x1000 goto pc+7
  regs=40 stack=0 before 98: (55) if r6 != 0x1 goto pc+9
   R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=invP12 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_rw=inv(id=0) R5_w=inv24 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=2147483647) R7_w=map_value(id=0,off=40,ks=4,vs=8272,imm=0) R8_rw=invP4 R9_w=map_value(id=0,off=12,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_rw=map_value fp-24_r=map_value fp-32_rw=invP24 fp-40_r=ctx fp-48_r=map_value fp-56_w=invP1 fp-64_rw=map_value fp-72_r=map_value fp-80_r=map_value
  parent already had regs=40 stack=0 marks
  124: (79) r6 = *(u64 *)(r1 +16)
   R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=map_value(id=0,off=0,ks=4,vs=8272,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6_w=invP(id=0,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value
  R1 unbounded memory access, make sure to bounds check any such access
  processed 466 insns (limit 1000000) max_states_per_insn 2 total_states 20 peak_states 20 mark_read 3

If we add this line, as used in other BPF programs, to cap that index:

   index &= 7;

The generated BPF program is considered safe by that version of the BPF
verifier, allowing perf to collect the syscall args in one more kernel
using the BPF based pointer contents collector.

With the above one-liner it works with that kernel:

  [root@dell-per740-01 ~]# uname -a
  Linux dell-per740-01.khw.eng.rdu2.dc.redhat.com 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
  [root@dell-per740-01 ~]# ~acme/bin/perf trace -e *sleep* sleep 1.234567890
       0.000 (1234.704 ms): sleep/3863610 nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 })                  = 0
  [root@dell-per740-01 ~]#

As well as with the one in Fedora 40:

  root@number:~# uname -a
  Linux number 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux
  root@number:~# perf trace -e *sleep* sleep 1.234567890
       0.000 (1234.722 ms): sleep/14873 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x7ffe87311a40) = 0
  root@number:~#

Song Liu reported that this one-liner was being optimized out by clang
18, so I suggested and he tested that adding a compiler barrier before
it made clang v18 to keep it and the verifier in the kernel in Song's
case (Meta's 5.12 based kernel) also was happy with the resulting
bytecode.

I'll investigate using virtme-ng[1] to have all the perf BPF based
functionality thoroughly tested over multiple kernels and clang
versions.

[1] https://kernel-recipes.org/en/2024/virtme-ng/

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/lkml/Zw7JgJc0LOwSpuvx@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23 11:34:56 -03:00
Arnaldo Carvalho de Melo
39c6a35620 perf trace: The return from 'write' isn't a pid
When adding a explicit beautifier for the 'write' syscall when the BPF
based buffer collector was introduced there was a cut'n'paste error that
carried the syscall_fmt->errpid setting from a nearby syscall (waitid)
that returns a pid.

So the write return was being suppressed by the return pretty printer,
remove that field, reverting it back to the default return handler, that
prints positive numbers as-is and interpret negative values as errnos.

I actually introduced the problem while making Howard's original patch
work just with the 'write' syscall, as we couldn't just look for any
buffers, the ones that are filled in by the kernel couldn't use the same
sys_enter BPF collector.

Fixes: b257fac12f ("perf trace: Pretty print buffer data")
Reported-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/lkml/bcf50648-3c7e-4513-8717-0d14492c53b9@linaro.org
Link: https://lore.kernel.org/all/Zt8jTfzDYgBPvFCd@x1/#t
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-17 10:34:43 -03:00
Linus Torvalds
b2760b8390 perf tools fixes for v6.12:
- Fix an assert() to handle captured and unprocessed ARM CoreSight CPU traces.
 
 - Fix static build compilation error when libdw isn't installed or is too old.
 
 - Add missing include when building with !HAVE_DWARF_GETLOCATIONS_SUPPORT.
 
 - Add missing refcount put on 32-bit DSOs.
 
 - Fix disassembly of user space binaries by setting the binary_type of DSO when
   loading.
 
 - Update headers with the kernel sources, including asound.h, sched.h, fcntl,
   msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's cputype.h.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZwU2dgAKCRCyPKLppCJ+
 J8uaAQDEbp0lMf1S/Y6vOGbnP6mGQCewQsXtIpSA4gcRMWlCCgD+O6ZxbnBCHOzn
 nQfBmbT62qUGuUA38Mg7pCyRXBd8FgU=
 =s4JZ
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix an assert() to handle captured and unprocessed ARM CoreSight CPU
   traces

 - Fix static build compilation error when libdw isn't installed or is
   too old

 - Add missing include when building with
   !HAVE_DWARF_GETLOCATIONS_SUPPORT

 - Add missing refcount put on 32-bit DSOs

 - Fix disassembly of user space binaries by setting the binary_type of
   DSO when loading

 - Update headers with the kernel sources, including asound.h, sched.h,
   fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's
   cputype.h

* tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace
  perf build: Fix build feature-dwarf_getlocations fail for old libdw
  perf build: Fix static compilation error when libdw is not installed
  perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT
  tools headers arm64: Sync arm64's cputype.h with the kernel sources
  perf tools: Cope with differences for lib/list_sort.c copy from the kernel
  tools check_headers.sh: Add check variant that excludes some hunks
  perf beauty: Update copy of linux/socket.h with the kernel sources
  tools headers UAPI: Sync the linux/in.h with the kernel sources
  perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools include UAPI: Sync linux/fcntl.h copy with the kernel sources
  tools include UAPI: Sync linux/sched.h copy with the kernel sources
  tools include UAPI: Sync sound/asound.h copy with the kernel sources
  perf vdso: Missed put on 32-bit dsos
  perf symbol: Set binary_type of dso when loading
2024-10-08 10:43:22 -07:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Ilkka Koskinen
e934a35e3c perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace
If one builds perf with DEBUG=1, captures data on multiple CPUs and
finally runs 'perf report -C <cpu>' for only one of the cpus, assert()
aborts the program. This happens because there are empty queues with
format set.

This patch changes the condition to abort only if a queue is not empty
and if the format is unset.

  $ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install
  $ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1
  $ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1
  Aborted (core dumped)

Fixes: 57880a7966 ("perf: cs-etm: Allocate queues for all CPUs")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240924233930.5193-1-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02 18:21:49 -03:00
Yang Jihong
a530337ba9 perf build: Fix build feature-dwarf_getlocations fail for old libdw
For libdw versions below 0.177, need to link libdl.a in addition to
libbebl.a during static compilation, otherwise
feature-dwarf_getlocations compilation will fail.

Before:

  $ make LDFLAGS=-static
    BUILD:   Doing 'make -j20' parallel build
  <SNIP>
  Makefile.config:483: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157
  <SNIP>

  $ cat ../build/feature/test-dwarf_getlocations.make.output
  /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libebl.a(eblclosebackend.o): in function `ebl_closebackend':
  (.text+0x20): undefined reference to `dlclose'
  collect2: error: ld returned 1 exit status

After:

  $ make LDFLAGS=-static
  <SNIP>
    Auto-detecting system features:
  ...                                   dwarf: [ on  ]
  <SNIP>

    $ ./perf probe
   Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
      or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
      or: perf probe [<options>] --del '[GROUP:]EVENT' ...
      or: perf probe --list [GROUP:]EVENT ...
  <SNIP>

Fixes: 536661da6e ("perf: build: Only link libebl.a for old libdw")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240919013513.118527-3-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02 18:21:49 -03:00
Yang Jihong
43f6564f18 perf build: Fix static compilation error when libdw is not installed
If libdw is not installed in build environment, the output of
'pkg-config --modversion libdw' is empty, causing LIBDW_VERSION_2 to be
empty and the shell test will have the following error:

  /bin/sh: 1: test: -lt: unexpected operator

Before:

  $ pkg-config --modversion libdw
  Package libdw was not found in the pkg-config search path.
  Perhaps you should add the directory containing `libdw.pc'
  to the PKG_CONFIG_PATH environment variable
  No package 'libdw' found
  $ make LDFLAGS=-static -j16
    BUILD:   Doing 'make -j20' parallel build
  <SNIP>
  Package libdw was not found in the pkg-config search path.
  Perhaps you should add the directory containing `libdw.pc'
  to the PKG_CONFIG_PATH environment variable
  No package 'libdw' found
  /bin/sh: 1: test: -lt: unexpected operator

After:

  1. libdw is not installed:

  $ pkg-config --modversion libdw
  Package libdw was not found in the pkg-config search path.
  Perhaps you should add the directory containing `libdw.pc'
  to the PKG_CONFIG_PATH environment variable
  No package 'libdw' found
  $ make LDFLAGS=-static -j16
    BUILD:   Doing 'make -j20' parallel build
  <SNIP>
  Package libdw was not found in the pkg-config search path.
  Perhaps you should add the directory containing `libdw.pc'
  to the PKG_CONFIG_PATH environment variable
  No package 'libdw' found
  Makefile.config:473: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR

  2. libdw version is lower than 0.177

  $ pkg-config --modversion libdw
  0.176
  $ make LDFLAGS=-static -j16
    BUILD:   Doing 'make -j20' parallel build
  <SNIP>

  Auto-detecting system features:
  ...                                   dwarf: [ on  ]
  <SNIP>
    INSTALL libsubcmd_headers
    INSTALL libapi_headers
    INSTALL libperf_headers
    INSTALL libsymbol_headers
    INSTALL libbpf_headers
    LINK    perf

  3. libdw version is higher than 0.177

  $ pkg-config --modversion libdw
  0.186
  $ make LDFLAGS=-static -j16
    BUILD:   Doing 'make -j20' parallel build
  <SNIP>

  Auto-detecting system features:
  ...                                   dwarf: [ on  ]
  <SNIP>
    CC      util/bpf-utils.o
    CC      util/pfm.o
    LD      util/perf-util-in.o
    LD      perf-util-in.o
    AR      libperf-util.a
    LINK    perf

Fixes: 536661da6e ("perf: build: Only link libebl.a for old libdw")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240919013513.118527-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02 18:21:49 -03:00
James Clark
008979cc69 perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT
The linked fixes commit added an #include "dwarf-aux.h" to disasm.h
which gets picked up in a lot of places. Without
HAVE_DWARF_GETLOCATIONS_SUPPORT the stubs return an errno, so include
errno.h to fix the following build error:

  In file included from util/disasm.h:8,
                 from util/annotate.h:16,
                 from builtin-top.c:23:
  util/dwarf-aux.h: In function 'die_get_var_range':
  util/dwarf-aux.h:183:10: error: 'ENOTSUP' undeclared (first use in this function)
    183 |  return -ENOTSUP;
        |          ^~~~~~~

Fixes: 782959ac24 ("perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking")
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241001123625.1063153-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02 18:21:49 -03:00
Arnaldo Carvalho de Melo
36110669dd perf tools: Cope with differences for lib/list_sort.c copy from the kernel
With 6d74e1e371 ("tools/lib/list_sort: remove redundant code for
cond_resched handling") we need to use the newly added hunk based
exceptions when comparing the copy we carry in tools/lib/ to the
original file, do it by adding the hunks that we know will be the
expected diff.

If at some point the original file is updated in other parts, then we
should flag and check the file for update.

Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/20240930202136.16904-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02 15:07:32 -03:00
Arnaldo Carvalho de Melo
cd46ea5ab4 tools check_headers.sh: Add check variant that excludes some hunks
With 6d74e1e371 ("tools/lib/list_sort: remove redundant code for
cond_resched handling") we end up with a multi-line variation in the
merge_final() implementation, one that the simple line based exceptions
we had so far can't cope.

Thus this check has been failing:

  Warning: Kernel ABI header differences:
    diff -u tools/lib/list_sort.c lib/list_sort.c

So add a new check routine that uses grep -vf to exclude some hunks that
we store in the tools/perf/check-header_ignore_hunks/ directory.

This first patch is just the new check routine, the next one will use it
to check lib/list_sort.c.

Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/20240930202136.16904-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02 14:50:44 -03:00
Arnaldo Carvalho de Melo
d164868879 perf beauty: Update copy of linux/socket.h with the kernel sources
To pick the changes in:

  8f0b3cc9a4 ("tcp: RX path for devmem TCP")

That don't result in any changes in the tables generated from that
header.

But while updating I noticed we need to support the new MSG_SOCK_DEVMEM
flag in the hard coded table for the msg flags table, add it.

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h

Please see tools/include/uapi/README for details.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrO_eT9e_41xrNv@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30 17:23:38 -03:00
Arnaldo Carvalho de Melo
c94cd9508b perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources
To pick up the change in:

  a1fab3e69d ("x86/irq: Fix comment on IRQ vector layout")

That just adds some comments, so no changes in perf tooling, just
silences this build warning:

  diff -u tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h arch/x86/include/asm/irq_vectors.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/ZvrKT7oQc1AOv6Vk@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30 17:23:38 -03:00
Arnaldo Carvalho de Melo
58f969b7a8 tools include UAPI: Sync linux/fcntl.h copy with the kernel sources
Picking the changes from:

  4356d575ef ("fhandle: expose u64 mount id to name_to_handle_at(2)")
  b4fef22c2f ("uapi: explain how per-syscall AT_* flags should be allocated")
  820a185896 ("fcntl: add F_CREATED_QUERY")

It just moves AT_REMOVEDIR around, and adds a bunch more AT_ for
renameat2() and name_to_handle_at(). We need to improve this situation,
as not all AT_ defines are applicable to all fs flags...

This adds support for those new AT_ defines, addressing this build
warning:

  diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h

Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrIKL3cREoRHIQd@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30 17:23:31 -03:00
Arnaldo Carvalho de Melo
7ae76b32f9 tools include UAPI: Sync linux/sched.h copy with the kernel sources
Picking the changes from:

  f0e1a0643a ("sched_ext: Implement BPF extensible scheduler class")

The inclusion of the SCHED_EXT define doesn't cause any change in
behaviour in tools/perf.

This just silences this perf tools build warning:

  diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrDShNVXotZpiwk@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30 12:38:01 -03:00
Arnaldo Carvalho de Melo
c850897b6c tools include UAPI: Sync sound/asound.h copy with the kernel sources
Picking the changes from:

  37745918e0 ("ALSA: timer: Introduce virtual userspace-driven timers")

Which entails no changes in the tooling side as it only introduces new
SNDRV_TIMER_IOCTL_ ioctls, and the ones tracked by scripts in
tools/perf/trace/beauty/ are only SNDRV_PCM_IOCTL_ and SNDRV_CTL_IOCTL_,
we still need to support SNDRV_TIMER_IOCTL_ ones, but that probably will
be one of the first for a BTF enumeration based approach :-)

This silences this perf tools build warning:

  diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ivan Orlov <ivan.orlov0322@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/lkml/ZvrB-g_E7g2ArlYW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30 12:21:54 -03:00
Ian Rogers
424aafb61a perf vdso: Missed put on 32-bit dsos
If the dso type doesn't match then NULL is returned but the dso should
be put first.

Fixes: f649ed80f3 ("perf dsos: Tidy reference counting and locking")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240912182757.762369-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-27 15:38:52 -03:00
Namhyung Kim
5363c30678 perf symbol: Set binary_type of dso when loading
For the kernel dso, it sets the binary type of dso when loading the
symbol table.  But it seems not to do that for user DSOs.  Actually
it sets the symtab type only.  It's not clear why we want to maintain
the two separately but it uses the binary type info before getting
the disassembly.

Let's use the symtab type as binary type too if it's not set.  I think
it's ok to set the binary type when it founds a symsrc whether or not
it has actual symbols.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Alexander Monakov <amonakov@ispras.ru>
Link: https://lore.kernel.org/r/20240426215139.1271039-1-namhyung@kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc:  <linux-perf-users@vger.kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-22 23:46:18 +02:00
Arnaldo Carvalho de Melo
1de5b5dcb8 perf trace: Mark the 'head' arg in the set_robust_list syscall as coming from user space
With that it uses the generic BTF based pretty printer:

This one we need to think about, not being acquainted with this syscall,
should we _traverse_ that list somehow? Would that be useful?

  root@number:~# perf trace -e set_robust_list sleep 1
       0.000 ( 0.004 ms): sleep/1206493 set_robust_list(head: (struct robust_list_head){.list = (struct robust_list){.next = (struct robust_list *)0x7f48a9a02a20,},.futex_offset = (long int)-32,}, len: 24) =
  root@number:~#

strace prints the default integer args:

  root@number:~# strace -e set_robust_list sleep 1
  set_robust_list(0x7efd99559a20, 24)     = 0
  +++ exited with 0 +++
  root@number:~#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org
Link: https://lore.kernel.org/lkml/ZuH6MquMraBvODRp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 17:25:45 -03:00
Arnaldo Carvalho de Melo
0c1019e346 perf trace: Mark the 'rseq' arg in the rseq syscall as coming from user space
With that it uses the generic BTF based pretty printer:

  root@number:~# grep -w rseq /sys/kernel/tracing/events/syscalls/sys_enter_rseq/format
  	field:struct rseq * rseq;	offset:16;	size:8;	signed:0;
  print fmt: "rseq: 0x%08lx, rseq_len: 0x%08lx, flags: 0x%08lx, sig: 0x%08lx", ((unsigned long)(REC->rseq)), ((unsigned long)(REC->rseq_len)), ((unsigned long)(REC->flags)), ((unsigned long)(REC->sig))
  root@number:~#

Before:

  root@number:~# perf trace -e rseq
       0.000 ( 0.017 ms): Isolated Web C/1195452 rseq(rseq: 0x7ff0ecfe6fe0, rseq_len: 32, sig: 1392848979)             = 0
      74.018 ( 0.006 ms): :1195453/1195453 rseq(rseq: 0x7f2af20fffe0, rseq_len: 32, sig: 1392848979)             = 0
    1817.220 ( 0.009 ms): Isolated Web C/1195454 rseq(rseq: 0x7f5c9ec7dfe0, rseq_len: 32, sig: 1392848979)             = 0
    2515.526 ( 0.034 ms): :1195455/1195455 rseq(rseq: 0x7f61503fffe0, rseq_len: 32, sig: 1392848979)             = 0
  ^Croot@number:~#

After:

  root@number:~# perf trace -e rseq
       0.000 ( 0.019 ms): Isolated Web C/1197258 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)4,.cpu_id = (__u32)4,.mm_cid = (__u32)5,}, rseq_len: 32, sig: 1392848979) = 0
    1663.835 ( 0.019 ms): Isolated Web C/1197259 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)24,.cpu_id = (__u32)24,.mm_cid = (__u32)2,}, rseq_len: 32, sig: 1392848979) = 0
    4750.444 ( 0.018 ms): Isolated Web C/1197260 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)8,.cpu_id = (__u32)8,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0
    4994.132 ( 0.018 ms): Isolated Web C/1197261 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)10,.cpu_id = (__u32)10,.mm_cid = (__u32)1,}, rseq_len: 32, sig: 1392848979) = 0
    4997.578 ( 0.011 ms): Isolated Web C/1197263 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)16,.cpu_id = (__u32)16,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0
    4997.462 ( 0.014 ms): Isolated Web C/1197262 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)17,.cpu_id = (__u32)17,.mm_cid = (__u32)3,}, rseq_len: 32, sig: 1392848979) = 0
  ^Croot@number:~#

We'll probably need to come up with some way for using the BTF info to
synthesize a test that then gets used and captures the output of the
'perf trace' output to check if the arguments are the ones synthesized,
randomically, for now, lets make do manually:

  root@number:~# cat ~acme/c/rseq.c
  #include <sys/syscall.h>     /* Definition of SYS_* constants */
  #include <linux/rseq.h>
  #include <errno.h>
  #include <string.h>
  #include <unistd.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Provide own rseq stub because glibc doesn't */
  __attribute__((weak))
  int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig)
  {
  	return syscall(SYS_rseq, rseq, rseq_len, flags, sig);
  }

  int main(int argc, char *argv[])
  {
  	struct rseq rseq = {
  		.cpu_id_start = 12,
  		.cpu_id = 34,
  		.rseq_cs = 56,
  		.flags = 78,
  		.node_id = 90,
  		.mm_cid = 12,
  	};
  	int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf);

  	printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno));
  	return err;
  }
  root@number:~# perf trace -e rseq ~acme/c/rseq
  sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument)
       0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979)            =
       0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument)
  root@number:~#root@number:~# cat ~acme/c/rseq.c
  #include <sys/syscall.h>     /* Definition of SYS_* constants */
  #include <linux/rseq.h>
  #include <errno.h>
  #include <string.h>
  #include <unistd.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Provide own rseq stub because glibc doesn't */
  __attribute__((weak))
  int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig)
  {
  	return syscall(SYS_rseq, rseq, rseq_len, flags, sig);
  }

  int main(int argc, char *argv[])
  {
  	struct rseq rseq = {
  		.cpu_id_start = 12,
  		.cpu_id = 34,
  		.rseq_cs = 56,
  		.flags = 78,
  		.node_id = 90,
  		.mm_cid = 12,
  	};
  	int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf);

  	printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno));
  	return err;
  }
  root@number:~# perf trace -e rseq ~acme/c/rseq
  sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument)
       0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979)            =
       0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument)
  root@number:~#

Interesting, glibc seems to be using rseq here, as in addition to the
totally fake one this test case uses, we have this one, around these
other syscalls:

     0.175 ( 0.001 ms): rseq/1201095 set_tid_address(tidptr: 0x7f6def759a10)                               = 1201095 (rseq)
     0.177 ( 0.001 ms): rseq/1201095 set_robust_list(head: 0x7f6def759a20, len: 24)                        = 0
     0.178 ( 0.001 ms): rseq/1201095 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979)            =
     0.231 ( 0.005 ms): rseq/1201095 mprotect(start: 0x7f6def93f000, len: 16384, prot: READ)               = 0
     0.238 ( 0.003 ms): rseq/1201095 mprotect(start: 0x403000, len: 4096, prot: READ)                      = 0
     0.244 ( 0.004 ms): rseq/1201095 mprotect(start: 0x7f6def99c000, len: 8192, prot: READ)

Matches strace (well, not really as the strace in fedora:40 doesn't know
about rseq, printing just integer values in hex):

  set_robust_list(0x7fbc6acc7a20, 24)     = 0
  rseq(0x7fbc6acc8060, 0x20, 0, 0x53053053) = 0
  mprotect(0x7fbc6aead000, 16384, PROT_READ) = 0
  mprotect(0x403000, 4096, PROT_READ)     = 0
  mprotect(0x7fbc6af0a000, 8192, PROT_READ) = 0
  prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
  munmap(0x7fbc6aebd000, 81563)           = 0
  rseq(0x7fff15bb9920, 0x20, 0x181cd, 0xdeadbeaf) = -1 EINVAL (Invalid argument)
  fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x9), ...}) = 0
  getrandom("\xd0\x34\x97\x17\x61\xc2\x2b\x10", 8, GRND_NONBLOCK) = 8
  brk(NULL)                               = 0x18ff4000
  brk(0x19015000)                         = 0x19015000
  write(1, "sys_rseq({ .cpu_id_start = 12, ."..., 136sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument)
  ) = 136
  exit_group(-1)                          = ?
  +++ exited with 255 +++
  root@number:~#

And also the focus for the v6.13 should be to have a better, strace
like BTF pretty printer as one of the outputs we can get from the libbpf
BTF dumper.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuH2K1LLt1pIDkbd@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 17:05:23 -03:00
Kan Liang
edf3ce0ed3 perf env: Find correct branch counter info on hybrid
No event is printed in the "Branch Counter" column on hybrid machines.

For example,

  $ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter
  $ perf report --total-cycles

  # Branch counter abbr list:
  # cpu_core/branch-instructions/pp = A
  # cpu_core/branches/ = B
  # '-' No event occurs
  # '+' Event occurrences may be lost due to branch counter saturated
  #
  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  Branch Counter
  # ...............  ..............  ...........  ..........  ..............
            44.54%          727.1K        0.00%           1   |+   |+   |
            36.31%          592.7K        0.00%           2   |+   |+   |
            17.83%          291.1K        0.00%           1   |+   |+   |

The branch counter information (br_cntr_width and br_cntr_nr) in the
perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS
is not available on hybrid machines. Without the width information, the
number of occurrences of an event cannot be calculated.

For a hybrid machine, the caps information should be retrieved from the
PMU_CAPS, and stored in the perf_env->pmu_caps.

Add a perf_env__find_br_cntr_info() to return the correct branch counter
information from the corresponding fields.

Committer notes:

While testing I couldn't s ee those "Branch counter" columns enabled by
pressing 'B' on the TUI, after reporting it to the list Kan explained
the situation:

<quote Kan Liang>
For a hybrid client, the "Branch Counter" feature is only supported
starting from the just released Lunar Lake. Perf falls back to only
"ANY" on your Raptor Lake.

The "The branch counter is not available" message is expected.

Here is the 'perf evlist' result from my Lunar Lake machine,

  # perf evlist -v
  cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS
  #
</quote>

Fixes: 6f9d8d1de2 ("perf script: Add branch counters")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240909184201.553519-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 13:08:46 -03:00
Kan Liang
9953807c9e perf evlist: Print hint for group
An event group is a critical relationship. There is a -g option that can
display the relationship. But it's hard for a user to know when should
this option be applied.

If there is an event group in the perf record, print a hint to suggest
the user apply the -g to display the group information.

With the patch,

  $ perf record -e "{cycles,instructions},instructions" sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.024 MB perf.data (4 samples) ]
  $

  $ perf evlist
  cycles
  instructions
  instructions
  # Tip: use 'perf evlist -g' to show group information

  $ perf evlist -g
  {cycles,instructions}
  instructions
  $

Committer testing:

So for a perf.data file _with_ a group:

  root@number:~# perf evlist -g
  {cpu_core/branch-instructions/pp,cpu_core/branches/}
  dummy:u
  root@number:~# perf evlist
  cpu_core/branch-instructions/pp
  cpu_core/branches/
  dummy:u
  # Tip: use 'perf evlist -g' to show group information
  root@number:~#

Then for something _without_ a group, no hint:

  root@number:~# perf record ls
  <SNIP>
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.035 MB perf.data (7 samples) ]
  root@number:~# perf evlist
  cpu_atom/cycles/P
  cpu_core/cycles/P
  dummy:u
  root@number:~#

No suggestion, good.

Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Closes: https://lore.kernel.org/lkml/ZttgvduaKsVn1r4p@x1/
Link: https://lore.kernel.org/r/20240908202847.176280-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 13:08:45 -03:00
Sam James
eb9b9a6f5a tools: Drop nonsensical -O6
-O6 is very much not-a-thing. Really, this should've been dropped
entirely in 49b3cd306e ("tools: Set the maximum optimization level
according to the compiler being used") instead of just passing it for
not-Clang.

Just collapse it down to -O3, instead of "-O6 unless Clang, in which case
-O3".

GCC interprets > -O3 as -O3. It doesn't even interpret > -O3 as -Ofast,
which is a good thing, given -Ofast has specific (non-)requirements for
code built using it. So, this does nothing except look a bit daft.

Remove the silliness and also save a few lines in the Makefiles accordingly.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
Signed-off-by: Sam James <sam@gentoo.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/4f01524fa4ea91c7146a41e26ceaf9dae4c127e4.1725821201.git.sam@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 13:08:36 -03:00
Ian Rogers
89c0a55e55 perf pmu: To info add event_type_desc
All PMU events are assumed to be "Kernel PMU event", however, this
isn't true for fake PMUs and won't be true with the addition of more
software PMUs. Make the PMU's type description name configurable -
largely for printing callbacks.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-5-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:29:20 -03:00
Ian Rogers
f08cc25843 perf evsel: Add accessor for tool_event
Currently tool events use a dedicated variable within the evsel. Later
changes will move this to the unused struct perf_event_attr config for
these events. Add an accessor to allow the later change to be well
typed and avoid changing all uses.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:28:27 -03:00
Ian Rogers
925320737a perf pmus: Fake PMU clean up
Rather than passing a fake PMU around, just pass that the fake PMU
should be used - true when doing testing. Move the fake PMU into
pmus.[ch] and try to abstract the PMU's properties in pmu.c, ie so
there is less "if fake_pmu" in non-PMU code. Give the fake PMU a made
up type number.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240907050830.6752-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:27:42 -03:00
Ian Rogers
d3d5c1a00f perf list: Avoid potential out of bounds memory read
If a desc string is 0 length then -1 will be out of bounds, add a
check.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240907050830.6752-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:26:27 -03:00
Andrew Kreimer
4ae354d73a perf help: Fix a typo ("bellow")
Fix a typo in comments.

Reported-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20240907131006.18510-1-algonell@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:24:12 -03:00
Changbin Du
74298dd8ac perf ftrace: Detect whether ftrace is enabled on system
To make error messages more accurate, this change detects whether ftrace is
enabled on system by checking trace file "set_ftrace_pid".

Before:

  # perf ftrace
  failed to reset ftrace
  #

After:

  # perf ftrace
  ftrace is not supported on this system
  #

Committer testing:

Doing it in an unprivileged toolbox container on Fedora 40:

Before:

  acme@number:~/git/perf-tools-next$ toolbox enter perf
  ⬢[acme@toolbox perf-tools-next]$ sudo su -
  ⬢[root@toolbox ~]# ~acme/bin/perf ftrace
  failed to reset ftrace
  ⬢[root@toolbox ~]#

After this patch:

  ⬢[root@toolbox ~]# ~acme/bin/perf ftrace
  ftrace is not supported on this system
  ⬢[root@toolbox ~]#

Maybe we could check if we are in such as situation, inside an
unprivileged container, and provide a HINT line?

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240911100126.900779-1-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:35 -03:00
Arnaldo Carvalho de Melo
83420d5f58 perf test shell probe_vfs_getname: Remove extraneous '=' from probe line number regex
Thomas reported the vfs_getname perf tests failing on s/390, it seems it
was just to some extraneous '=' somehow getting into the regexp, remove
it, now:

  root@x1:~# perf test getname
   91: Add vfs_getname probe to get syscall args filenames             : Ok
   93: Use vfs_getname probe to get syscall args filenames             : FAILED!
  126: Check open filename arg using perf trace + vfs_getname          : Ok
  root@x1:~#

Second one remains a mistery, have to take some time to nail it down.

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>,
Link: https://lore.kernel.org/lkml/1d7f3b7b-9edc-4d90-955c-9345428563f1@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Arnaldo Carvalho de Melo
9327f0ecad perf build: Require at least clang 16.0.6 to build BPF skeletons
Howard reported problems using perf features that use BPF:

  perf $ clang -v
  Debian clang version 15.0.6
  Target: x86_64-pc-linux-gnu
  Thread model: posix
  InstalledDir: /bin
  Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Candidate multilib: .;@m64
  Selected multilib: .;@m64
  perf $ ./perf trace -e write --max-events=1
  libbpf: prog 'sys_enter_rename': BPF program load failed: Permission denied
  libbpf: prog 'sys_enter_rename': -- BEGIN PROG LOAD LOG --
  0: R1=ctx() R10=fp0

But it works with:

  perf $ clang -v
  Debian clang version 16.0.6 (15~deb12u1)
  Target: x86_64-pc-linux-gnu
  Thread model: posix
  InstalledDir: /bin
  Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Candidate multilib: .;@m64
  Selected multilib: .;@m64
  perf $ ./perf trace -e write --max-events=1
       0.000 ( 0.009 ms): gmain/1448 write(fd: 4, buf: \1\0\0\0\0\0\0\0, count: 8)                         = 8 (kworker/0:0-eve)
  perf $

So lets make that the required version, if you happen to have a slightly
older version where this work, please report so that we can adjust the
minimum required version.

Reported-by: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuGL9ROeTV2uXoSp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Arnaldo Carvalho de Melo
4c1af9bf97 perf trace: If a syscall arg is marked as 'const', assume it is coming _from_ userspace
We need to decide where to copy syscall arg contents, if at the
syscalls:sys_entry hook, meaning is something that is coming from
user to kernel space, or if it is a response, i.e. if it is something
the _kernel_ is filling in and thus going to userspace.

Since we have 'const' used in those syscalls, and unsure about this
being consistent, doing:

  root@number:~# echo $(grep const /sys/kernel/tracing/events/syscalls/sys_enter_*/format  | grep struct | cut -c47- | cut -d'/' -f1)
  clock_nanosleep clock_settime epoll_pwait2 futex io_pgetevents landlock_create_ruleset listmount mq_getsetattr mq_notify mq_timedreceive mq_timedsend preadv2 preadv prlimit64 process_madvise process_vm_readv process_vm_readv process_vm_writev process_vm_writev pwritev2 pwritev readv rt_sigaction rt_sigtimedwait semtimedop statmount timerfd_settime timer_settime vmsplice writev
  root@number:~#

Seems to indicate that we can use that for the ones that have the
'const' to mark it as coming from user space, do it.

Most notable/frequent syscall that now gets BTF pretty printed in a
system wide 'perf trace' session is:

  root@number:~# perf trace
     21.160 (         ): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e1dfe964, op: WAIT_BITSET|PRIVATE_FLAG, utime: (struct __kernel_timespec){.tv_sec = (__kernel_time64_t)50290,.tv_nsec = (long long int)810362837,}, val3: MATCH_ANY) ...
      21.166 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.169 ( 0.001 ms): RemVidChild/6995 sendmsg(fd: 25<socket:[78915]>, msg: 0x7f49e9af9da0, flags: DONTWAIT) = 280
      21.172 ( 0.289 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa58, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) = 0
      21.463 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.467 ( 0.001 ms): RemVidChild/6995 futex(uaddr: 0x7f49e28bb964, op: WAKE|PRIVATE_FLAG, val: 1)           = 1
      21.160 ( 0.314 ms): MediaSu~isor #/1028597  ... [continued]: futex())                                            = 0
      21.469 (         ): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa5c, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) ...
      21.475 ( 0.000 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49d0223040, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.478 ( 0.001 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e26ac964, op: WAKE|PRIVATE_FLAG, val: 1)           = 1
  ^Croot@number:~#
  root@number:~# cat /sys/kernel/tracing/events/syscalls/sys_enter_futex/format
  name: sys_enter_futex
  ID: 454
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:int __syscall_nr;	offset:8;	size:4;	signed:1;
  	field:u32 * uaddr;	offset:16;	size:8;	signed:0;
  	field:int op;	offset:24;	size:8;	signed:0;
  	field:u32 val;	offset:32;	size:8;	signed:0;
  	field:const struct __kernel_timespec * utime;	offset:40;	size:8;	signed:0;
  	field:u32 * uaddr2;	offset:48;	size:8;	signed:0;
  	field:u32 val3;	offset:56;	size:8;	signed:0;

  print fmt: "uaddr: 0x%08lx, op: 0x%08lx, val: 0x%08lx, utime: 0x%08lx, uaddr2: 0x%08lx, val3: 0x%08lx", ((unsigned long)(REC->uaddr)), ((unsigned long)(REC->op)), ((unsigned long)(REC->val)), ((unsigned long)(REC->utime)), ((unsigned long)(REC->uaddr2)), ((unsigned long)(REC->val3))
  root@number:~#

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWnuQrrBoTn6Rrn6vM_xQ2fCoc9i-AitD7abTcNi-4o1Q@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Yang Li
e37b315c17 perf parse-events: Remove duplicated include in parse-events.c
The header files parse-events.h is included twice in parse-events.c,
so one inclusion of each can be removed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10822
Link: https://lore.kernel.org/r/20240910005522.35994-1-yang.lee@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:27 -03:00
Ian Rogers
02b2705017 perf callchain: Allow symbols to be optional when resolving a callchain
In uses like 'perf inject' it is not necessary to gather the symbol for
each call chain location, the map for the sample IP is wanted so that
build IDs and the like can be injected. Make gathering the symbol in the
callchain_cursor optional.

For a 'perf inject -B' command this lowers the peak RSS from 54.1MB to
29.6MB by avoiding loading symbols.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
64eed019f3 perf inject: Lazy build-id mmap2 event insertion
Add -B option that lazily inserts mmap2 events thereby dropping all
mmap events without samples. This is similar to the behavior of -b
where only build_id events are inserted when a dso is accessed in a
sample.

File size savings can be significant in system-wide mode, consider:

  $ perf record -g -a -o perf.data sleep 1
  $ perf inject -B -i perf.data -o perf.new.data
  $ ls -al perf.data perf.new.data
           5147049 perf.data
           2248493 perf.new.data

Give test coverage of the new option in pipe test.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
d762ba020d perf inject: Add new mmap2-buildid-all option
Add an option that allows all mmap or mmap2 events to be rewritten as
mmap2 events with build IDs.

This is similar to the existing -b/--build-ids and --buildid-all options
except instead of adding a build_id event an existing mmap/mmap2 event
is used as a template and a new mmap2 event synthesized from it.

As mmap2 events are typical this avoids the insertion of build_id
events.

Add test coverage to the pipe test.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
ae39ba1655 perf inject: Fix build ID injection
Build ID injection wasn't inserting a sample ID and aligning events to
64 bytes rather than 8. No sample ID means events are unordered and two
different build_id events for the same path, as happens when a file is
replaced, can't be differentiated.

Add in sample ID insertion for the build_id events alongside some
refactoring. The refactoring better aligns the function arguments for
different use cases, such as synthesizing build_id events without
needing to have a dso. The misc bits are explicitly passed as with
callchains the maps/dsos may span user and kernel land, so using
sample->cpumode isn't good enough.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Namhyung Kim
02648783c2 perf annotate-data: Add pr_debug_scope()
The pr_debug_scope() is to print more information about the scope DIE
during the instruction tracking so that it can help finding relevant
debug info and the source code like inlined functions more easily.

  $ perf --debug type-profile annotate --data-type
  ...
  -----------------------------------------------------------
  find data type for 0(reg0, reg12) at set_task_cpu+0xdd
  CU for kernel/sched/core.c (die:0x1268dae)
  frame base: cfa=1 fbreg=7
  scope: [3/3] (die:12b6d28) [inlined] set_task_rq       <<<--- (here)
  bb: [9f - dd]
  var [9f] reg3 type='struct task_struct*' size=0x8 (die:0x126aff0)
  var [9f] reg6 type='unsigned int' size=0x4 (die:0x1268e0d)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Namhyung Kim
c8b9358778 perf annotate: Treat 'call' instruction as stack operation
I found some portion of mem-store events sampled on CALL instruction
which has no memory access.  But it actually saves a return address
into stack.  It should be considered as a stack operation like RET
instruction.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
James Clark
206dcfca1f perf build: Autodetect minimum required llvm-dev version
The new LLVM addr2line feature requires a minimum version of 13 to
compile. Add a feature check for the version so that NO_LLVM=1 doesn't
need to be explicitly added. Leave the existing llvm feature check
intact because it's used by tools other than Perf.

This fixes the following compilation error when the llvm-dev version
doesn't match:

  util/llvm-c-helpers.cpp: In function 'char* llvm_name_for_code(dso*, const char*, u64)':
  util/llvm-c-helpers.cpp:178:21: error: 'std::remove_reference_t<llvm::DILineInfo>' {aka 'struct llvm::DILineInfo'} has no member named 'StartAddress'
    178 |   addr, res_or_err->StartAddress ? *res_or_err->StartAddress : 0);

Fixes: c3f8644c21 ("perf report: Support LLVM for addr2line()")
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Manu Bretelle <chantr4@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20240910140405.568791-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:46 -03:00
Arnaldo Carvalho de Melo
375f9262ac perf trace: Mark the rlim arg in the prlimit64 and setrlimit syscalls as coming from user space
With that it uses the generic BTF based pretty printer:

  root@number:~# perf trace -e prlimit64
       0.000 ( 0.004 ms): :3417020/3417020 prlimit64(resource: NOFILE, old_rlim: 0x7fb8842fe3b0)      = 0
       0.126 ( 0.003 ms): Chroot Helper/3417022 prlimit64(resource: NOFILE, old_rlim: 0x7fb8842fdfd0) = 0
      12.557 ( 0.005 ms): firefox/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1b80)        = 0
      26.640 ( 0.006 ms): MainThread/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1780)     = 0
      27.553 ( 0.002 ms): Web Content/3417020 prlimit64(resource: AS, old_rlim: 0x7ffe9ade1660)       = 0
      29.405 ( 0.003 ms): Web Content/3417020 prlimit64(resource: NOFILE, old_rlim: 0x7ffe9ade0c80)   = 0
      30.471 ( 0.002 ms): Web Content/3417020 prlimit64(resource: RTTIME, old_rlim: 0x7ffe9ade1370)   = 0
      30.485 ( 0.001 ms): Web Content/3417020 prlimit64(resource: RTTIME, new_rlim: (struct rlimit64){.rlim_cur = (__u64)50000,.rlim_max = (__u64)200000,}) = 0
      31.779 ( 0.001 ms): Web Content/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1670)    = 0
  ^Croot@number:~#

Better than before, still needs improvements in the configurability of
the libbpf BTF dumper to get it to the strace output standard.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuBQI-f8CGpuhIdH@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:46 -03:00
Arnaldo Carvalho de Melo
f3f16112c6 perf trace: Support collecting 'union's with the BPF augmenter
And reuse the BTF based struct pretty printer, with that we can offer
initial support for the 'bpf' syscall's second argument, a 'union
bpf_attr' pointer.

But this is not that satisfactory as the libbpf btf dumper will pretty
print _all_ the union, we need to have a way to say that the first arg
selects the type for the union member to be pretty printed, something
like what pahole does translating the PERF_RECORD_ selector into a name,
and using that name to find a matching struct.

In the case of 'union bpf_attr' it would map PROG_LOAD to one of the
union members, but unfortunately there is no such mapping:

  root@number:~# pahole bpf_attr
  union bpf_attr {
  	struct {
  		__u32              map_type;           /*     0     4 */
  		__u32              key_size;           /*     4     4 */
  		__u32              value_size;         /*     8     4 */
  		__u32              max_entries;        /*    12     4 */
  		__u32              map_flags;          /*    16     4 */
  		__u32              inner_map_fd;       /*    20     4 */
  		__u32              numa_node;          /*    24     4 */
  		char               map_name[16];       /*    28    16 */
  		__u32              map_ifindex;        /*    44     4 */
  		__u32              btf_fd;             /*    48     4 */
  		__u32              btf_key_type_id;    /*    52     4 */
  		__u32              btf_value_type_id;  /*    56     4 */
  		__u32              btf_vmlinux_value_type_id; /*    60     4 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u64              map_extra;          /*    64     8 */
  		__s32              value_type_btf_obj_fd; /*    72     4 */
  		__s32              map_token_fd;       /*    76     4 */
  	};                                             /*     0    80 */
  	struct {
  		__u32              map_fd;             /*     0     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              key;                /*     8     8 */
  		union {
  			__u64      value;              /*    16     8 */
  			__u64      next_key;           /*    16     8 */
  		};                                     /*    16     8 */
  		__u64              flags;              /*    24     8 */
  	};                                             /*     0    32 */
  	struct {
  		__u64              in_batch;           /*     0     8 */
  		__u64              out_batch;          /*     8     8 */
  		__u64              keys;               /*    16     8 */
  		__u64              values;             /*    24     8 */
  		__u32              count;              /*    32     4 */
  		__u32              map_fd;             /*    36     4 */
  		__u64              elem_flags;         /*    40     8 */
  		__u64              flags;              /*    48     8 */
  	} batch;                                       /*     0    56 */
  	struct {
  		__u32              prog_type;          /*     0     4 */
  		__u32              insn_cnt;           /*     4     4 */
  		__u64              insns;              /*     8     8 */
  		__u64              license;            /*    16     8 */
  		__u32              log_level;          /*    24     4 */
  		__u32              log_size;           /*    28     4 */
  		__u64              log_buf;            /*    32     8 */
  		__u32              kern_version;       /*    40     4 */
  		__u32              prog_flags;         /*    44     4 */
  		char               prog_name[16];      /*    48    16 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u32              prog_ifindex;       /*    64     4 */
  		__u32              expected_attach_type; /*    68     4 */
  		__u32              prog_btf_fd;        /*    72     4 */
  		__u32              func_info_rec_size; /*    76     4 */
  		__u64              func_info;          /*    80     8 */
  		__u32              func_info_cnt;      /*    88     4 */
  		__u32              line_info_rec_size; /*    92     4 */
  		__u64              line_info;          /*    96     8 */
  		__u32              line_info_cnt;      /*   104     4 */
  		__u32              attach_btf_id;      /*   108     4 */
  		union {
  			__u32      attach_prog_fd;     /*   112     4 */
  			__u32      attach_btf_obj_fd;  /*   112     4 */
  		};                                     /*   112     4 */
  		__u32              core_relo_cnt;      /*   116     4 */
  		__u64              fd_array;           /*   120     8 */
  		/* --- cacheline 2 boundary (128 bytes) --- */
  		__u64              core_relos;         /*   128     8 */
  		__u32              core_relo_rec_size; /*   136     4 */
  		__u32              log_true_size;      /*   140     4 */
  		__s32              prog_token_fd;      /*   144     4 */
  	};                                             /*     0   152 */
  	struct {
  		__u64              pathname;           /*     0     8 */
  		__u32              bpf_fd;             /*     8     4 */
  		__u32              file_flags;         /*    12     4 */
  		__s32              path_fd;            /*    16     4 */
  	};                                             /*     0    24 */
  	struct {
  		union {
  			__u32      target_fd;          /*     0     4 */
  			__u32      target_ifindex;     /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              attach_bpf_fd;      /*     4     4 */
  		__u32              attach_type;        /*     8     4 */
  		__u32              attach_flags;       /*    12     4 */
  		__u32              replace_bpf_fd;     /*    16     4 */
  		union {
  			__u32      relative_fd;        /*    20     4 */
  			__u32      relative_id;        /*    20     4 */
  		};                                     /*    20     4 */
  		__u64              expected_revision;  /*    24     8 */
  	};                                             /*     0    32 */
  	struct {
  		__u32              prog_fd;            /*     0     4 */
  		__u32              retval;             /*     4     4 */
  		__u32              data_size_in;       /*     8     4 */
  		__u32              data_size_out;      /*    12     4 */
  		__u64              data_in;            /*    16     8 */
  		__u64              data_out;           /*    24     8 */
  		__u32              repeat;             /*    32     4 */
  		__u32              duration;           /*    36     4 */
  		__u32              ctx_size_in;        /*    40     4 */
  		__u32              ctx_size_out;       /*    44     4 */
  		__u64              ctx_in;             /*    48     8 */
  		__u64              ctx_out;            /*    56     8 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u32              flags;              /*    64     4 */
  		__u32              cpu;                /*    68     4 */
  		__u32              batch_size;         /*    72     4 */
  	} test;                                        /*     0    80 */
  	struct {
  		union {
  			__u32      start_id;           /*     0     4 */
  			__u32      prog_id;            /*     0     4 */
  			__u32      map_id;             /*     0     4 */
  			__u32      btf_id;             /*     0     4 */
  			__u32      link_id;            /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              next_id;            /*     4     4 */
  		__u32              open_flags;         /*     8     4 */
  	};                                             /*     0    12 */
  	struct {
  		__u32              bpf_fd;             /*     0     4 */
  		__u32              info_len;           /*     4     4 */
  		__u64              info;               /*     8     8 */
  	} info;                                        /*     0    16 */
  	struct {
  		union {
  			__u32      target_fd;          /*     0     4 */
  			__u32      target_ifindex;     /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              attach_type;        /*     4     4 */
  		__u32              query_flags;        /*     8     4 */
  		__u32              attach_flags;       /*    12     4 */
  		__u64              prog_ids;           /*    16     8 */
  		union {
  			__u32      prog_cnt;           /*    24     4 */
  			__u32      count;              /*    24     4 */
  		};                                     /*    24     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              prog_attach_flags;  /*    32     8 */
  		__u64              link_ids;           /*    40     8 */
  		__u64              link_attach_flags;  /*    48     8 */
  		__u64              revision;           /*    56     8 */
  	} query;                                       /*     0    64 */
  	struct {
  		__u64              name;               /*     0     8 */
  		__u32              prog_fd;            /*     8     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              cookie;             /*    16     8 */
  	} raw_tracepoint;                              /*     0    24 */
  	struct {
  		__u64              btf;                /*     0     8 */
  		__u64              btf_log_buf;        /*     8     8 */
  		__u32              btf_size;           /*    16     4 */
  		__u32              btf_log_size;       /*    20     4 */
  		__u32              btf_log_level;      /*    24     4 */
  		__u32              btf_log_true_size;  /*    28     4 */
  		__u32              btf_flags;          /*    32     4 */
  		__s32              btf_token_fd;       /*    36     4 */
  	};                                             /*     0    40 */
  	struct {
  		__u32              pid;                /*     0     4 */
  		__u32              fd;                 /*     4     4 */
  		__u32              flags;              /*     8     4 */
  		__u32              buf_len;            /*    12     4 */
  		__u64              buf;                /*    16     8 */
  		__u32              prog_id;            /*    24     4 */
  		__u32              fd_type;            /*    28     4 */
  		__u64              probe_offset;       /*    32     8 */
  		__u64              probe_addr;         /*    40     8 */
  	} task_fd_query;                               /*     0    48 */
  	struct {
  		union {
  			__u32      prog_fd;            /*     0     4 */
  			__u32      map_fd;             /*     0     4 */
  		};                                     /*     0     4 */
  		union {
  			__u32      target_fd;          /*     4     4 */
  			__u32      target_ifindex;     /*     4     4 */
  		};                                     /*     4     4 */
  		__u32              attach_type;        /*     8     4 */
  		__u32              flags;              /*    12     4 */
  		union {
  			__u32      target_btf_id;      /*    16     4 */
  			struct {
  				__u64 iter_info;       /*    16     8 */
  				__u32 iter_info_len;   /*    24     4 */
  			};                             /*    16    16 */
  			struct {
  				__u64 bpf_cookie;      /*    16     8 */
  			} perf_event;                  /*    16     8 */
  			struct {
  				__u32 flags;           /*    16     4 */
  				__u32 cnt;             /*    20     4 */
  				__u64 syms;            /*    24     8 */
  				__u64 addrs;           /*    32     8 */
  				__u64 cookies;         /*    40     8 */
  			} kprobe_multi;                /*    16    32 */
  			struct {
  				__u32 target_btf_id;   /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 cookie;          /*    24     8 */
  			} tracing;                     /*    16    16 */
  			struct {
  				__u32 pf;              /*    16     4 */
  				__u32 hooknum;         /*    20     4 */
  				__s32 priority;        /*    24     4 */
  				__u32 flags;           /*    28     4 */
  			} netfilter;                   /*    16    16 */
  			struct {
  				union {
  					__u32  relative_fd; /*    16     4 */
  					__u32  relative_id; /*    16     4 */
  				};                     /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 expected_revision; /*    24     8 */
  			} tcx;                         /*    16    16 */
  			struct {
  				__u64 path;            /*    16     8 */
  				__u64 offsets;         /*    24     8 */
  				__u64 ref_ctr_offsets; /*    32     8 */
  				__u64 cookies;         /*    40     8 */
  				__u32 cnt;             /*    48     4 */
  				__u32 flags;           /*    52     4 */
  				__u32 pid;             /*    56     4 */
  			} uprobe_multi;                /*    16    48 */
  			struct {
  				union {
  					__u32  relative_fd; /*    16     4 */
  					__u32  relative_id; /*    16     4 */
  				};                     /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 expected_revision; /*    24     8 */
  			} netkit;                      /*    16    16 */
  		};                                     /*    16    48 */
  	} link_create;                                 /*     0    64 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  		union {
  			__u32      new_prog_fd;        /*     4     4 */
  			__u32      new_map_fd;         /*     4     4 */
  		};                                     /*     4     4 */
  		__u32              flags;              /*     8     4 */
  		union {
  			__u32      old_prog_fd;        /*    12     4 */
  			__u32      old_map_fd;         /*    12     4 */
  		};                                     /*    12     4 */
  	} link_update;                                 /*     0    16 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  	} link_detach;                                 /*     0     4 */
  	struct {
  		__u32              type;               /*     0     4 */
  	} enable_stats;                                /*     0     4 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  		__u32              flags;              /*     4     4 */
  	} iter_create;                                 /*     0     8 */
  	struct {
  		__u32              prog_fd;            /*     0     4 */
  		__u32              map_fd;             /*     4     4 */
  		__u32              flags;              /*     8     4 */
  	} prog_bind_map;                               /*     0    12 */
  	struct {
  		__u32              flags;              /*     0     4 */
  		__u32              bpffs_fd;           /*     4     4 */
  	} token_create;                                /*     0     8 */
  };

  root@number:~#

So this is one case where BTF gets us only that far, not getting all
the way to automate the pretty printing of unions designed like 'union
bpf_attr', we will need a custom pretty printer for this union, as using
the libbpf union BTF dumper is way too verbose:

  root@number:~# perf trace --max-events 1 -e bpf bpftool map
       0.000 ( 0.054 ms): bpftool/3409073 bpf(cmd: PROG_LOAD, uattr: (union bpf_attr){(struct){.map_type = (__u32)1,.key_size = (__u32)2,.value_size = (__u32)2755142048,.max_entries = (__u32)32764,.map_flags = (__u32)150263906,.inner_map_fd = (__u32)21920,},(struct){.map_fd = (__u32)1,.key = (__u64)140723063628192,(union){.value = (__u64)94145833392226,.next_key = (__u64)94145833392226,},},.batch = (struct){.in_batch = (__u64)8589934593,.out_batch = (__u64)140723063628192,.keys = (__u64)94145833392226,},(struct){.prog_type = (__u32)1,.insn_cnt = (__u32)2,.insns = (__u64)140723063628192,.license = (__u64)94145833392226,},(struct){.pathname = (__u64)8589934593,.bpf_fd = (__u32)2755142048,.file_flags = (__u32)32764,.path_fd = (__s32)150263906,},(struct){(union){.target_fd = (__u32)1,.target_ifindex = (__u32)1,},.attach_bpf_fd = (__u32)2,.attach_type = (__u32)2755142048,.attach_flags = (__u32)32764,.replace_bpf_fd = (__u32)150263906,(union){.relative_fd = (__u32)21920,.relative_id = (__u32)21920,},},.test = (struct){.prog_fd = (__u32)1,.retval = (__u32)2,.data_size_in = (__u32)2755142048,.data_size_out = (__u32)32764,.data_in = (__u64)94145833392226,},(struct){(union){.start_id = (__u32)1,.prog_id = (__u32)1,.map_id = (__u32)1,.btf_id = (__u32)1,.link_id = (__u32)1,},.next_id = (__u32)2,.open_flags = (__u32)2755142048,},.info = (struct){.bpf_fd = (__u32)1,.info_len = (__u32)2,.info = (__u64)140723063628192,},.query = (struct){(union){.target_fd = (__u32)1,.target_ifindex = (__u32)1,},.attach_type = (__u32)2,.query_flags = (__u32)2755142048,.attach_flags = (__u32)32764,.prog_ids = (__u64)94145833392226,},.raw_tracepoint = (struct){.name = (__u64)8589934593,.prog_fd = (__u32)2755142048,.cookie = (__u64)94145833392226,},(struct){.btf = (__u64)8589934593,.btf_log_buf = (__u64)140723063628192,.btf_size = (__u32)150263906,.btf_log_size = (__u32)21920,},.task_fd_query = (struct){.pid = (__u32)1,.fd = (__u32)2,.flags = (__u32)2755142048,.buf_len = (__u32)32764,.buf = (__u64)94145833392226,},.link_create = (struct){(union){.prog_fd = (__u32)1,.map_fd = (__u32)1,},(u) = 3
  root@number:~# 2: prog_array  name hid_jmp_table  flags 0x0
  	key 4B  value 4B  max_entries 1024  memlock 8440B
  	owner_prog_type tracing  owner jited
  13: hash_of_maps  name cgroup_hash  flags 0x0
  	key 8B  value 4B  max_entries 2048  memlock 167584B
  	pids systemd(1)
  960: array  name libbpf_global  flags 0x0
  	key 4B  value 32B  max_entries 1  memlock 280B
  961: array  name pid_iter.rodata  flags 0x480
  	key 4B  value 4B  max_entries 1  memlock 8192B
  	btf_id 1846  frozen
  	pids bpftool(3409073)
  962: array  name libbpf_det_bind  flags 0x0
  	key 4B  value 32B  max_entries 1  memlock 280B

  root@number:~#

For simpler unions this may be better than not seeing any payload, so
keep it there.

Acked-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuBLat8cbadILNLA@x1
[ Removed needless parenteses in the if block leading to the trace__btf_scnprintf() call, as per Howard's review comments ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:31:51 -03:00
Howard Chu
3278024540 perf trace: Add --force-btf for debugging
If --force-btf is enabled, prefer btf_dump general pretty printer to
perf trace's customized pretty printers.

Mostly for debug purposes.

Committer testing:

diff before/after shows we need several improvements to be able to
compare the changes, first we need to cut off/disable mutable data such
as pids and timestamps, then what is left are the buffer addresses
passed from userspace, returned from kernel space, maybe we can ask
'perf trace' to go on making those reproducible.

That would entail a Pointer Address Translation (PAT) like for
networking, that would, for simple, reproducible if not for these
details, workloads, that we would then use in our regression tests.

Enough digression, this is one such diff:

   openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) = 3
  -fstat(fd: 3, statbuf: 0x7fff01f212a0)                                 = 0
  -read(fd: 3, buf: 0x5596bab2d630, count: 4096)                         = 2998
  -read(fd: 3, buf: 0x5596bab2d630, count: 4096)                         = 0
  +fstat(fd: 3, statbuf: 0x7ffc163cf0e0)                                 = 0
  +read(fd: 3, buf: 0x55b4e0631630, count: 4096)                         = 2998
  +read(fd: 3, buf: 0x55b4e0631630, count: 4096)                         = 0
   close(fd: 3)                                                          = 0
   openat(dfd: CWD, filename: "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
  @@ -45,7 +45,7 @@
   openat(dfd: CWD, filename: "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
  -{ .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7fff01f21990) = 0
  +(struct __kernel_timespec){.tv_sec = (__kernel_time64_t)1,}, rmtp: 0x7ffc163cf7d0) =

The problem more close to our hands is to make the libbpf BTF pretty
printer to have a mode that closely resembles what we're trying to
resemble: strace output.

Being able to run something with 'perf trace' and with 'strace' and get
the exact same output should be of interest of anybody wanting to have
strace and 'perf trace' regression tested against each other.

That last part is 'perf trace' shot at being something so useful as
strace... ;-)

Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240824163322.60796-8-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:27 -03:00
Howard Chu
a68fd6a6cd perf trace: Collect augmented data using BPF
Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads
TRACE_AUG_MAX_BUF bytes of buffer maximum.

Determine what type of argument and how many bytes to read from user space, us ing the
value in the beauty_map. This is the relation of parameter type and its corres ponding
value in the beauty map, and how many bytes we read eventually:

string: 1                          -> size of string (till null)
struct: size of struct             -> size of struct
buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF)

After reading from user space, we output the augmented data using
bpf_perf_event_output().

If the struct augmenter, augment_sys_enter() failed, we fall back to
using bpf_tail_call().

I have to make the payload 6 times the size of augmented_arg, to pass the
BPF verifier.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:20 -03:00
Howard Chu
b257fac12f perf trace: Pretty print buffer data
Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum
buffer size we can augment. BPF will include this header too.

Print buffer in a way that's different than just printing a string, we
print all the control characters in \digits (such as \0 for null, and
\10 for newline, LF).

For character that has a bigger value than 127, we print the digits
instead of the character itself as well.

Committer notes:

Simplified the buffer scnprintf to avoid using multiple buffers as
discussed in the patch review thread.

We can't really all 'buf' args to SCA_BUF as we're collecting so far
just on the sys_enter path, so we would be printing the previous 'read'
arg buffer contents, not what the kernel puts there.

So instead of:
   static int syscall_fmt__cmp(const void *name, const void *fmtp)
  @@ -1987,8 +1989,6 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
  -               else if (strstr(field->type, "char *") && strstr(field->name, "buf"))
  -                       arg->scnprintf = SCA_BUF;

Do:

static const struct syscall_fmt syscall_fmts[] = {
  +       { .name     = "write",      .errpid = true,
  +         .arg = { [1] = { .scnprintf = SCA_BUF /* buf */, from_user = true, }, }, },

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-8-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-6-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:13 -03:00
Howard Chu
cb32035214 perf trace: Pretty print struct data
Change the arg->augmented.args to arg->augmented.args->value to skip the
header for customized pretty printers, since we collect data in BPF
using the general augment_sys_enter(), which always adds the header.

Use btf_dump API to pretty print augmented struct pointer.

Prefer existed pretty-printer than btf general pretty-printer.

set compact = true and skip_names = true, so that no newline character
and argument name are printed.

Committer notes:

Simplified the btf_dump_snprintf callback to avoid using multiple
buffers, as discussed in the thread accessible via the Link tag below.

Also made it do:

  dump_data_opts.skip_names = !arg->trace->show_arg_names;

I.e. show the type and struct field names according to that tunable, we
probably need another tunable just for this, but for now if the user
wants to see syscall names in addition to its value, it makes sense to
see the struct field names according to that tunable.

Committer testing:

The following have explicitely set beautifiers (SCA_FILENAME,
SCA_SOCKADDR and SCA_PERF_ATTR), SCA_FILENAME is here just because we
have been wiring up the "renameat2" ("renameat" until recently), so it
doesn't use the introduced generic fallback (btf_struct_scnprintf(), see
the definition of SCA_PERF_ATTR, SCA_SOCKADDR to see the more feature
rich beautifiers, that are not using BTF):

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
       0.000 ( 0.039 ms): mv/258478 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
       0.000 ( 0.014 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
       0.040 ( 0.003 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a6980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
      18.742 ( 0.020 ms): ping/258481 sendto(fd: 5, buff: 0x7ffc04768df0, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
  PING www.google.com (142.251.129.68) 56(84) bytes of data.
      18.783 ( 0.012 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0
      18.797 ( 0.001 ms): ping/258481 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0
      18.815 ( 0.002 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.129.68 }, addrlen: 16) = 0
      18.862 ( 0.023 ms): ping/258481 sendto(fd: 3, buff: 0x55bc317a0ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.129.68 }, addr_len: 0x10) = 64
      63.330 ( 0.038 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      63.435 ( 0.010 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a8340, len: 110, flags: DONTWAIT|NOSIGNAL) = 110
  64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.2 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.158/44.158/44.158/0.000 ms
  root@number:~# perf trace -e perf_event_open perf stat -e instructions,cache-misses,syscalls:sys_enter*sleep* sleep 1.23456789
       0.000 ( 0.010 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0xa00000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599052088, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
       0.016 ( 0.002 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0x400000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599044082, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
       1.838 ( 0.006 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
       1.846 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 6
       1.849 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 7
       1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9
       1.853 ( 0.600 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x190 (syscalls:sys_enter_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 10
       2.456 ( 0.016 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x196 (syscalls:sys_enter_clock_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 11

   Performance counter stats for 'sleep 1.23456789':

           1,402,839      cpu_atom/instructions/
       <not counted>      cpu_core/instructions/                                                  (0.00%)
              11,066      cpu_atom/cache-misses/
       <not counted>      cpu_core/cache-misses/                                                  (0.00%)
                   0      syscalls:sys_enter_nanosleep
                   1      syscalls:sys_enter_clock_nanosleep

         1.236246714 seconds time elapsed

         0.000000000 seconds user
         0.001308000 seconds sys

  root@number:~#

Now if we use it even for the ones we have a specific beautifier in
tools/perf/trace/beauty, i.e. use btf_struct_scnprintf() for all
structs, by adding the following patch:

  @@ -2316,7 +2316,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,

   			default_scnprintf = sc->arg_fmt[arg.idx].scnprintf;

  -			if (default_scnprintf == NULL || default_scnprintf == SCA_PTR) {
  +			if (1 || (default_scnprintf == NULL || default_scnprintf == SCA_PTR)) {
   				btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed,
   								   size - printed, val, field->type);
   				if (btf_printed) {

We get:

  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
  PING www.google.com (142.251.129.68) 56(84) bytes of data.
       0.000 ( 0.015 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0
       0.046 ( 0.004 ms): ping/283259 sendto(fd: 5, buff: 0x559b008ae980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
       0.353 ( 0.012 ms): ping/283259 sendto(fd: 5, buff: 0x7ffc01294960, len: 20, addr: (struct sockaddr){.sa_family = (sa_family_t)16,}, addr_len: 0xc) = 20
       0.377 ( 0.006 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addrlen: 16) = 0
       0.388 ( 0.010 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)10,}, addrlen: 28) = 0
       0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0
       0.425 ( 0.045 ms): ping/283259 sendto(fd: 3, buff: 0x559b008a8ac0, len: 64, addr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addr_len: 0x10) = 64
  64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.1 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.113/44.113/44.113/0.000 ms
      44.849 ( 0.038 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0
      44.927 ( 0.006 ms): ping/283259 sendto(fd: 5, buff: 0x559b008b03d0, len: 110, flags: DONTWAIT|NOSIGNAL) = 110
  root@number:~#

Which looks sane, i.e.:

  18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0

Becomes:

   0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0

And.

  #define AF_UNIX         1       /* Unix domain sockets          */
  #define AF_LOCAL        1       /* POSIX name for AF_UNIX       */
  #define AF_INET         2       /* Internet IP Protocol         */
  <SNIP>
  #define AF_INET6        10      /* IP version 6                 */

And 'D' == 68, so the preexisting sockaddr BPF collector is working with
the new generic BTF pretty printer (btf_struct_scnprintf()), its just
that it doesn't know about 'struct sockaddr' besides what is in BTF,
i.e. its an array of bytes, not an IPv4 address that needs extra
massaging.

Ditto for the 'struct perf_event_attr' case:

       1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9

Becomes:

       2.081 ( 0.002 ms): :283304/283304 perf_event_open(attr_uptr: (struct perf_event_attr){.size = (__u32)136,.config = (__u64)17179869187,.sample_type = (__u64)65536,.read_format = (__u64)3,.disabled = (__u64)0x1,.inherit = (__u64)0x1,.enable_on_exec = (__u64)0x1,.exclude_guest = (__u64)0x1,}, pid: 283305 (sleep), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9

hex(17179869187) = 0x400000003, etc.

read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING is

enum perf_event_read_format {
        PERF_FORMAT_TOTAL_TIME_ENABLED          = 1U << 0,
        PERF_FORMAT_TOTAL_TIME_RUNNING          = 1U << 1,

and so on.

We need to work with the libbpf btf dump api to get one output that
matches the 'perf trace'/strace expectations/format, but having this in
this current form is already an improvement to 'perf trace', so lets
improve from what we have.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-7-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-5-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:07 -03:00
Howard Chu
7f40306728 perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
Set up beauty_map, load it to BPF, in such format: if argument No.3 is a
struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32;

if argument No.3 is a string (of syscall number 114) beauty_map[114][2] =
1;

if argument No.3 is a buffer, its size is indicated by argument No.4 (of
syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this
buffer size in BPF  */

Committer notes:

Moved syscall_arg_fmt__cache_btf_struct() from a ifdef
HAVE_LIBBPF_SUPPORT to closer to where it is used, that is ifdef'ed on
HAVE_BPF_SKEL and thus breaks the build when building with
BUILD_BPF_SKEL=0, as detected using 'make -C tools/perf build-test'.

Also add 'struct beauty_map_enter' to tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
as we're using it in this patch, otherwise we get this while trying to
build at this point in the original patch series:

  builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
  builtin-trace.c:3725:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
   3725 |         int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
        |

We also have to take into account syscall_arg_fmt.from_user when telling
the kernel what to copy in the sys_enter generic collector, we don't
want to collect bogus data in buffers that will only be available to us
at sys_exit time, i.e. after the kernel has filled it, so leave this for
when we have such a sys_exit based collector.

Committer testing:

Not wired up yet, so all continues to work, using the existing BPF
collector and userspace beautifiers that are augmentation aware:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
       0.000 ( 0.031 ms): mv/20888 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
       0.000 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
       0.040 ( 0.003 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff17980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
       0.480 ( 0.017 ms): ping/20892 sendto(fd: 5, buff: 0x7ffd82d07150, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
       0.526 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0
       0.542 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
       0.544 ( 0.004 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.135.100 }, addrlen: 16) = 0
       0.559 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.135.100 }, addrlen: 16PING www.google.com (142.251.135.100) 56(84) bytes of data.
  ) = 0
       0.589 ( 0.058 ms): ping/20892 sendto(fd: 3, buff: 0x560b4ff11ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.135.100 }, addr_len: 0x10) = 64
      45.250 ( 0.029 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      45.344 ( 0.012 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff19340, len: 111, flags: DONTWAIT|NOSIGNAL) = 111
  64 bytes from rio09s08-in-f4.1e100.net (142.251.135.100): icmp_seq=1 ttl=49 time=44.4 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.361/44.361/44.361/0.000 ms
  root@number:~#

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:51:59 -03:00
Arnaldo Carvalho de Melo
d92f490cba perf trace: Mark bpf's attr as from_user
This one has no specific pretty printer right now, so will be handled by
the generic BTF based one later in this patch series.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:51:51 -03:00
Arnaldo Carvalho de Melo
c790f2bafb perf trace: Introduce SCA_TIMESPEC_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:04 -03:00