Pull networking changes from Paolo Abeni:
"Core:
- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source
file with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent netdev_*
schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF:
- Improve socket map performance, avoiding skb cloning on read
operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the eBPF
used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols:
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA, to
cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API:
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers:
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers:
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than 10 years"
* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...
When using 'perf mem' and 'perf c2c', an issue is observed that tool
reports the wrong offset for global data symbols. This is a common
issue on both x86 and Arm64 platforms.
Let's see an example, for a test program, below is the disassembly for
its .bss section which is dumped with objdump:
...
Disassembly of section .bss:
0000000000004040 <completed.0>:
...
0000000000004080 <buf1>:
...
00000000000040c0 <buf2>:
...
0000000000004100 <thread>:
...
First we used 'perf mem record' to run the test program and then used
'perf --debug verbose=4 mem report' to observe what's the symbol info
for 'buf1' and 'buf2' structures.
# ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8
# ./perf --debug verbose=4 mem report
...
dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028
symbol__new: buf2 0x30a8-0x30e8
...
dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028
symbol__new: buf1 0x3068-0x30a8
...
The perf tool relies on libelf to parse symbols, in executable and
shared object files, 'st_value' holds a virtual address; 'sh_addr' is
the address at which section's first byte should reside in memory, and
'sh_offset' is the byte offset from the beginning of the file to the
first byte in the section. The perf tool uses below formula to convert
a symbol's memory address to a file address:
file_address = st_value - sh_addr + sh_offset
^
` Memory address
We can see the final adjusted address ranges for buf1 and buf2 are
[0x30a8-0x30e8) and [0x3068-0x30a8) respectively, apparently this is
incorrect, in the code, the structure for 'buf1' and 'buf2' specifies
compiler attribute with 64-byte alignment.
The problem happens for 'sh_offset', libelf returns it as 0x3028 which
is not 64-byte aligned, combining with disassembly, it's likely libelf
doesn't respect the alignment for .bss section, therefore, it doesn't
return the aligned value for 'sh_offset'.
Suggested by Fangrui Song, ELF file contains program header which
contains PT_LOAD segments, the fields p_vaddr and p_offset in PT_LOAD
segments contain the execution info. A better choice for converting
memory address to file address is using the formula:
file_address = st_value - p_vaddr + p_offset
This patch introduces elf_read_program_header() which returns the
program header based on the passed 'st_value', then it uses the formula
above to calculate the symbol file address; and the debugging log is
updated respectively.
After applying the change:
# ./perf --debug verbose=4 mem report
...
dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28
symbol__new: buf2 0x30c0-0x3100
...
dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28
symbol__new: buf1 0x3080-0x30c0
...
Fixes: f17e04afaf ("perf report: Fix ELF symbol parsing")
Reported-by: Chang Rui <changruinj@gmail.com>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220724060013.171050-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':
#0 0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
#1 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
#2 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
#3 0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
#4 syscall__scnprintf_args <snip> at builtin-trace.c:2041
#5 0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319
That points to the below code in tools/perf/builtin-trace.c:
/*
* If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
* arguments, even if the syscall being handled, say "openat", uses only 4 arguments
* this breaks syscall__augmented_args() check for augmented args, as we calculate
* syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
* so when handling, say the openat syscall, we end up getting 6 args for the
* raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
* thinking that the extra 2 u64 args are the augmented filename, so just check
* here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
*/
if (evsel != trace->syscalls.events.sys_enter)
augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);
As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.
Reported-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220707090900.572584-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-07-09
We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).
The main changes are:
1) Add new way for performing BTF type queries to BPF, from Daniel Müller.
2) Add inlining of calls to bpf_loop() helper when its function callback is
statically known, from Eduard Zingerman.
3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.
4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
hooks, from Stanislav Fomichev.
5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.
6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.
7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
selftests, from Magnus Karlsson & Maciej Fijalkowski.
8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.
9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.
10) Sockmap optimizations around throughput of UDP transmissions which have been
improved by 61%, from Cong Wang.
11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.
12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.
13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.
14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
macro, from James Hilliard.
15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.
16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
bpf: Correctly propagate errors up from bpf_core_composites_match
libbpf: Disable SEC pragma macro on GCC
bpf: Check attach_func_proto more carefully in check_return_code
selftests/bpf: Add test involving restrict type qualifier
bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
MAINTAINERS: Add entry for AF_XDP selftests files
selftests, xsk: Rename AF_XDP testing app
bpf, docs: Remove deprecated xsk libbpf APIs description
selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
libbpf, riscv: Use a0 for RC register
libbpf: Remove unnecessary usdt_rel_ip assignments
selftests/bpf: Fix few more compiler warnings
selftests/bpf: Fix bogus uninitialized variable warning
bpftool: Remove zlib feature test from Makefile
libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
selftests/bpf: Add type match test against kernel's task_struct
selftests/bpf: Add nested type to type based tests
...
====================
Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit dc2cf4ca86 ("perf unwind: Fix segbase for ld.lld linked
objects") uncovered the following issue on aarch64:
util/unwind-libunwind-local.c: In function 'find_proc_info':
util/unwind-libunwind-local.c:386:28: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
386 | if (ofs > 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:371:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
371 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:363:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
363 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
In file included from util/libunwind/arm64.c:37:
Fixes: dc2cf4ca86 ("perf unwind: Fix segbase for ld.lld linked objects")
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: kernel-team@cloudflare.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220701182046.12589-1-ivan@cloudflare.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf already support ignore_missing_thread for -p, but not yet
applied to `perf stat -p <pid>`. This patch enables ignore_missing_thread
for `perf stat -p <pid>`.
Committer notes:
And here is a refresher about the 'ignore_missing_thread' knob, from a
previous patch using it:
ca8000684e ("perf evsel: Enable ignore_missing_thread for pid option")
---
While monitoring a multithread process with pid option, perf sometimes
may return sys_perf_event_open failure with 3(No such process) if any of
the process's threads die before we open the event. However, we want
perf continue monitoring the remaining threads and do not exit with
error.
---
Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220622030037.15005-1-ligang.bdlg@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For some reason using:
cat <<EoFuncBegin
static const char *errno_to_name__$arch(int err)
{
switch (err) {
EoFuncBegin
In tools/perf/trace/beauty/arch_errno_names.sh isn't working on ALT
Linux sisyphus (development version), which could be some distro
specific glitch, so just get this done in an alternative way that works
everywhere while giving notice to the people working on that distro to
try and figure our what really took place.
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Build ID events associate a file name with a build ID. However, when
using perf inject, there is no guarantee that the file on the current
machine at the current time has that build ID. Fix by comparing the
build IDs and skip adding to the cache if they are different.
Example:
$ echo "int main() {return 0;}" > prog.c
$ gcc -o prog prog.c
$ perf record --buildid-all ./prog
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data ]
$ file-buildid() { file $1 | awk -F= '{print $2}' | awk -F, '{print $1}' ; }
$ file-buildid prog
444ad9be165d8058a48ce2ffb4e9f55854a3293e
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
444ad9be165d8058a48ce2ffb4e9f55854a3293e
$ echo "int main() {return 1;}" > prog.c
$ gcc -o prog prog.c
$ file-buildid prog
885524d5aaa24008a3e2b06caa3ea95d013c0fc5
Before:
$ perf buildid-cache --purge $(pwd)/prog
$ perf inject -i perf.data -o junk
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
885524d5aaa24008a3e2b06caa3ea95d013c0fc5
$
After:
$ perf buildid-cache --purge $(pwd)/prog
$ perf inject -i perf.data -o junk
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
$
Fixes: 454c407ec1 ("perf: add perf-inject builtin")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: https://lore.kernel.org/r/20220621125144.5623-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some functions we use for bpf prologue generation are going to be
deprecated. This change reworks current code not to use them.
We need to replace following functions/struct:
bpf_program__set_prep
bpf_program__nth_fd
struct bpf_prog_prep_result
Currently we use bpf_program__set_prep to hook perf callback before
program is loaded and provide new instructions with the prologue.
We replace this function/ality by taking instructions for specific
program, attaching prologue to them and load such new ebpf programs
with prologue using separate bpf_prog_load calls (outside libbpf
load machinery).
Before we can take and use program instructions, we need libbpf to
actually load it. This way we get the final shape of its instructions
with all relocations and verifier adjustments).
There's one glitch though.. perf kprobe program already assumes
generated prologue code with proper values in argument registers,
so loading such program directly will fail in the verifier.
That's where the fallback pre-load handler fits in and prepends
the initialization code to the program. Once such program is loaded
we take its instructions, cut off the initialization code and prepend
the prologue.
I know.. sorry ;-)
To have access to the program when loading this patch adds support to
register 'fallback' section handler to take care of perf kprobe programs.
The fallback means that it handles any section definition besides the
ones that libbpf handles.
The handler serves two purposes:
- allows perf programs to have special arguments in section name
- allows perf to use pre-load callback where we can attach init
code (zeroing all argument registers) to each perf program
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/bpf/20220616202214.70359-2-jolsa@kernel.org
The testcase 'Check Arm64 callgraphs are complete in fp mode' wants to
see the following output:
610 leaf
62f parent
648 main
However, without excluding kernel callchains, the output might look like:
ffffc2ff40ef1b5c arch_local_irq_enable
ffffc2ff419d032c __schedule
ffffc2ff419d06c0 schedule
ffffc2ff40e4da30 do_notify_resume
ffffc2ff40e421b0 work_pending
610 leaf
62f parent
648 main
Adding '--user-callchains' leaves only the wanted symbols in the chain.
Fixes: cd6382d827 ("perf test arm64: Test unwinding using fame-pointer (fp) mode")
Suggested-by: German Gomez <german.gomez@arm.com>
Reviewed-by: German Gomez <german.gomez@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220614105207.26223-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
f94fd25cb0 ("tcp: pass back data left in socket after receive")
That don't result in any changes in the tables generated from that
header.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/all/YqORj9d58AiGYl8b@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test -F 83 ("perf stat CSV output linter") fails on s390.
Reason is the wrong number of fields for certain CPU core/die/socket
related output.
On x84_64 the output of command:
# ./perf stat -x, -A -a --no-merge true
CPU0,1.50,msec,cpu-clock,1502781,100.00,1.052,CPUs utilized
CPU1,1.48,msec,cpu-clock,1476113,100.00,1.034,CPUs utilized
...
results in 8 fields with 7 comma separators.
On s390 the output of command:
# ./perf stat -x, -A -a --no-merge -- true
0.95,msec,cpu-clock,949800,100.00,1.060,CPUs utilized
...
results in 7 fields with 6 comma separators. Therefore this tests
fails on s390. Similar issues exist for per-die and per-socket output
which is not supported on s390.
I have rewritten the python program to count commas in each output line
into a bash function to achieve the same result. I hope this makes it a
bit easier.
Output before:
# ./perf test -F 83
83: perf stat CSV output linter :
Checking CSV output: no args [Success]
Checking CSV output: system wide [Success]
Checking CSV output: system wide Checking CSV output: \
system wide no aggregation 6.92,msec,cpu-clock,\
6918131,100.00,6.972,CPUs utilized
...
RuntimeError: wrong number of fields. expected 7 in \
6.92,msec,cpu-clock,6918131,100.00,6.972,CPUs utilized
FAILED!
#
Output after:
# ./perf test -F 83
83: perf stat CSV output linter :
Checking CSV output: no args [Success]
Checking CSV output: system wide [Success]
Checking CSV output: system wide Checking CSV output:\
system wide no aggregation [Success]
Checking CSV output: interval [Success]
Checking CSV output: event [Success]
Checking CSV output: per core [Success]
Checking CSV output: per thread [Success]
Checking CSV output: per die [Success]
Checking CSV output: per node [Success]
Checking CSV output: per socket [Success]
Ok
#
Committer notes:
Continues to work on x86_64
$ perf test lint
89: perf stat CSV output linter : Ok
$ perf test -v lint
Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
89: perf stat CSV output linter :
--- start ---
test child forked, pid 53133
Checking CSV output: no args [Success]
Checking CSV output: system wide [Skip] paranoid and not root
Checking CSV output: system wide [Skip] paranoid and not root
Checking CSV output: interval [Success]
Checking CSV output: event [Success]
Checking CSV output: per core [Skip] paranoid and not root
Checking CSV output: per thread [Skip] paranoid and not root
Checking CSV output: per die [Skip] paranoid and not root
Checking CSV output: per node [Skip] paranoid and not root
Checking CSV output: per socket [Skip] paranoid and not root
test child finished with 0
---- end ----
perf stat CSV output linter: Ok
$
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux390-list@tuxmaker.boeblingen.de.ibm.com
Link: https://lore.kernel.org/r/20220603113034.2009728-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With the hardware TopDown metrics feature, the sample-read feature should
be supported for a TopDown group, e.g., sample a non-topdown event and read
a Topdown metric group. But the current perf record code errors are out.
For a TopDown metric group,the slots event must be the leader of the group,
but the leader slots event doesn't support sampling. To support sample-read
the TopDown metric group, uses the 2nd event of the group as the "leader"
for the purposes of sampling.
Only the platform with the TopDown metric feature supports sample-read the
topdown group. In commit acb65150a4 ("perf record: Support sample-read
topdown metric group"), it adds arch_topdown_sample_read() to indicate
whether the TopDown group supports sample-read, it should only work on the
non-hybrid systems, this patch extends the support for hybrid platforms.
Before:
# ./perf record -e "{cpu_core/slots/,cpu_core/cycles/,cpu_core/topdown-retiring/}:S" -a sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu_core/topdown-retiring/).
/bin/dmesg | grep -i perf may provide additional information.
After:
# ./perf record -e "{cpu_core/slots/,cpu_core/cycles/,cpu_core/topdown-retiring/}:S" -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.238 MB perf.data (369 samples) ]
Fixes: acb65150a4 ("perf record: Support sample-read topdown metric group")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220602153603.1884710-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
segbase is the address of .eh_frame_hdr and table_data is segbase plus
the header size. find_proc_info computes segbase as `map->start +
segbase - map->pgoff` which is wrong when
* .eh_frame_hdr and .text are in different PT_LOAD program headers
* and their p_vaddr difference does not equal their p_offset difference
Since 10.0, ld.lld's default --rosegment -z noseparate-code layout has
such R and RX PT_LOAD program headers.
ld.lld (default) => perf report fails to unwind `perf record
--call-graph dwarf` recorded data
ld.lld --no-rosegment => ok (trivial, no R PT_LOAD)
ld.lld -z separate-code => ok but by luck: there are two PT_LOAD but
their p_vaddr difference equals p_offset difference
ld.bfd -z noseparate-code => ok (trivial, no R PT_LOAD)
ld.bfd -z separate-code (default for Linux/x86) => ok but by luck:
there are two PT_LOAD but their p_vaddr difference equals p_offset
difference
To fix the issue, compute segbase as dso's base address plus
PT_GNU_EH_FRAME's p_vaddr. The base address is computed by iterating
over all dso-associated maps and then subtract the first PT_LOAD p_vaddr
(the minimum guaranteed by generic ABI) from the minimum address.
In libunwind, find_proc_info transitively called by unw_step is cached,
so the iteration overhead is acceptable.
Reported-by: Sebastian Ullrich <sebasti@nullri.ch>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: llvm@lists.linux.dev
Link: https://github.com/ClangBuiltLinux/linux/issues/1646
Link: https://lore.kernel.org/r/20220527182039.673248-1-maskray@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add shell test to check if perf-record hangs when recording an arm_spe
event with forks.
The test FAILS if the Kernel is not patched with Commit 961c391217 ("perf:
Always wake the parent event").
Unpatched Kernel:
$ perf test -v 90
90: Check Arm SPE doesn't hang when there are forks
--- start ---
test child forked, pid 14232
Recording workload with fork
Log lines = 90 /tmp/__perf_test.stderr.0Nu0U
Log lines after 1 second = 90 /tmp/__perf_test.stderr.0Nu0U
SPE hang test: FAIL
test child finished with -1
---- end ----
Check Arm SPE trace data in workload with forks: FAILED!
Patched Kernel:
$ perf test -v 90
90: Check Arm SPE doesn't hang when there are forks
--- start ---
test child forked, pid 2930
Compiling test program...
Recording workload...
Log lines = 478 /tmp/__perf_test.log.026AI
Log lines after 1 second = 557 /tmp/__perf_test.log.026AI
SPE hang test: PASS
Cleaning up files...
test child finished with 0
---- end ----
Check Arm SPE trace data in workload with forks: Ok
Reviewed-by: James Clark <james.clark@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220228165655.3920-1-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>