mirror of
https://github.com/torvalds/linux.git
synced 2024-12-31 23:31:29 +00:00
b611996ef2
33252 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
aa4800e31c |
perf tools changes for v6.2: 1st batch
Libraries: - Drop the old copy of libtraceevent in tools/lib/traceevent/ now that all major distros ship it from its external repository. This is now just another feature detection, emitting a warning when the libtraceevent-dev[el] package isn't installed, disabling the build of perf features and tools that strictly require parsing things from tracefs while keeping the core functionality present and working with a subset of the events, the most used ones like CPU cycles, hardware cache and also vendor events, etc. This was tested with lots of containers for Fedora, Debian, OpenSUSE, Alpine Linux, Ubuntu, with cross builds, etc. Build: - Update to C standard to gnu11, like was done for the kernel. - Install the tools/lib/ libraries locally instead of having headers searched directly from the source code directories, to help the cases where we can build either from in-kernel source libraries or from the same library shipped as a distro package, as is the case with libbpf and was the case with libtraceevent. perf stat: - Do not delay the workload with --delay, the delay is just for starting to count the events, to skip noise at workload startup. - When we have events for each cgroup, the metric should be printed for each cgroup separately. $ perf stat -a --for-each-cgroup system.slice,user.slice --metric-only sleep 1 Performance counter stats for 'system wide': GHz insn per cycle branch-misses of all branches system.slice 3.792 0.61 3.24% user.slice 3.661 2.32 0.37% - Fix printing field separator in CSV metrics output. - Fix --metric-only --json output. - Fix summary output in CSV with --metric-only. - Update event group check for support of uncore event. perf test: - Stop requiring a C toolchain in shell tests, instead add a workload option that has all the previously C snippets built as part of 'perf test -w' that then get used in the 'perf test' shell scripts. - Add event group test for events in multiple PMUs - The "kernel lock contention analysis" test should not print warnings in quiet mode. - Add attr tests for ARM64's new VG register. - Fix record test on KVM guests, as using precise flag with the br_inst_retired.near_call event causes the test fail on KVM guests, even when the guests have PMU forwarding enabled and the event itself is supported, so just remove the precise flag from the event. - Add mechanism for skipping attr tests on specific kernel versions where it is known that these checks will fail. - Skip watchpoint tests if no watchpoints available. - Add more Intel PT 'perf test' entries: hybrid CPUs, split the packet decoder into a suite of subtests. perf script: - Introduce task analyzer python script, where one first records some events: Recording can be done in two ways: $ perf script record tasks-analyzer -- sleep 10 $ perf record -e sched:sched_switch -a -- sleep 10 The script can parse any perf.data files, as long as it has sched:sched_switch events, other events will be ignored. The most simple report use case is to just call the script without arguments. Runtime is the time the task was running on the CPU, Time Out-In is the time between the process being scheduled *out* and scheduled back *in*. So the last time span between two executions: $ perf script report tasks-analyzer Switched-In Switched-Out CPU PID TID Comm Runtime Time Out-In 15576.658891407 15576.659156086 4 2412 2428 gdbus 265 1949 15576.659111320 15576.659455410 0 2412 2412 gnome-shell 344 2267 15576.659491326 15576.659506173 2 74 74 kworker/2:1 15 13145 15576.659506173 15576.659825748 2 2858 2858 gnome-terminal- 320 63263 15576.659871270 15576.659902872 6 20932 20932 kworker/u16:0 32 2314582 15576.659909951 15576.659945501 3 27264 27264 sh 36 -1 15576.659853285 15576.659971052 7 27265 27265 perf 118 5050741 [...] perf lock: - Allow concurrent record and report to support live monitoring of kernel lock contention without BPF: # perf lock record -a -o- sleep 1 | perf lock contention -i- contended total wait max wait avg wait type caller 2 10.27 us 6.17 us 5.13 us spinlock load_balance+0xc03 1 5.29 us 5.29 us 5.29 us rwlock:W ep_scan_ready_list+0x54 1 4.12 us 4.12 us 4.12 us spinlock smpboot_thread_fn+0x116 1 3.28 us 3.28 us 3.28 us mutex pipe_read+0x50 - Implement -t/--threads option when using BPF: $ sudo ./perf lock contention -abt -E 5 sleep 1 contended total wait max wait avg wait pid comm 1 740.66 ms 740.66 ms 740.66 ms 1950 nv_queue 3 305.50 ms 298.19 ms 101.83 ms 1884 nvidia-modeset/ 1 25.14 us 25.14 us 25.14 us 2725038 EventManager_De 12 23.09 us 9.30 us 1.92 us 0 swapper 1 20.18 us 20.18 us 20.18 us 2725033 EventManager_De - Add -l/--lock-addr to aggregate per-lock-instance contention: $ sudo ./perf lock contention -abl sleep 1 contended total wait max wait avg wait address symbol 1 36.28 us 36.28 us 36.28 us ffff92615d6448b8 9 10.91 us 1.84 us 1.21 us ffffffffbaed50c0 rcu_state 1 10.49 us 10.49 us 10.49 us ffff9262ac4f0c80 8 4.68 us 1.67 us 585 ns ffffffffbae07a40 jiffies_lock 3 3.03 us 1.45 us 1.01 us ffff9262277861e0 1 924 ns 924 ns 924 ns ffff926095ba9d20 1 436 ns 436 ns 436 ns ffff9260bfda4f60 perf record: - Add remaining branch filters: "no_cycles", "no_flags" & "hw_index", to be used with hardware such as Intel's LBR that allows things like stitching stacks of two samples to overcome the limits of the number of LBR registers. Symbol resolution: - Handle .debug files created with 'objcopy --only-keep-debug', where program headers are zeroed and thus can't be used for adjustments, use the info in the runtime_ss (runtime ELF) instead. perf trace: - Add BPF based augmenter for the 'perf_event_open's 'struct perf_event_attr' argument. - Add BPF based augmenter for the 'clock_gettime's 'struct timespec' argument. - In both cases the syscall tracepoint has just the pointer value, we need to hook a BPF program to collect the pointer contents, and then, in userspace, pretty print it in 'perf trace'. perf list: - Introduce JSON output of events. - Streamline how the expression specifying what events should be shown is handled, fixing several corner cases, such as the metric filter that is specified as a glob but was using strstr(). perf probe: - Fix to avoid crashing if DW_AT_decl_file is NULL, coping with clang generating DWARF5 like that. - Use dwarf_attr_integrate() as generic DWARF attr accessor as it supersedes dwarf_attr(), supporting abstact origin DIEs. perf inject: - Set PERF_RECORD_MISC_BUILD_ID_SIZE in the PERF_RECORD_HEADER_BUILD_ID so that perf.data readers can get the real build-id size and avoid trailing zeros. perf data: - Add tracepoint fields when converting a perf.data file to JSON. arm64: - Fix mksyscalltbl, don't lose syscalls due to sort -nu. - Add Arm Neoverse V2 PMU events. riscv: - Add riscv sbi firmware std event files. - Add Sifive U74 vendor events (JSON) file. - Add some more events and metrics for Alderlake/Alderlake-N. Documentation: - Add data documentation for the PMU structs in the C source code. Miscellaneous: - Periodic sanitization of headers, adding missing includes, removing needless ones, creating new ones, etc. - Use sig_atomic_t for signal handlers to avoid undefined behaviour in all perf tools. - Fixes for libbpf 1.0+ compatibility (maps, etc) on 'perf trace' BPF examples. - Remove some old perf bpf examples, leave the best ones that demonstrate how to associate BPF functions to points in the kernel. - Make quiet mode consistent between tools. - Use dedicated non-atomic clear/set bit helpers. - Use "grep -E" instead of "egrep" as recommended by warning emitted by GNU grep since at least version 3.8. - Complete list of supported subcommands in the 'perf daemon' help message. - Update John Garry's email address for arm64 perf tooling on the MAINTAINERS file, he moved from Huawei to Oracle. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY5yARAAKCRCyPKLppCJ+ J7bdAQCO4Y4gXKWv+AQc77aptQaCRmWy6T9ynsdv5gOV43NpCwD/TWZz8zcBqLSS fxYSgf2kOQ3Z9soE4/udsL5sDhFbsgA= =hLlg -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.2-1-2022-12-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: "Libraries: - Drop the old copy of libtraceevent in tools/lib/traceevent/ now that all major distros ship it from its external repository. This is now just another feature detection, emitting a warning when the libtraceevent-dev[el] package isn't installed, disabling the build of perf features and tools that strictly require parsing things from tracefs while keeping the core functionality present and working with a subset of the events, the most used ones like CPU cycles, hardware cache and also vendor events, etc. This was tested with lots of containers for Fedora, Debian, OpenSUSE, Alpine Linux, Ubuntu, with cross builds, etc. Build: - Update to C standard to gnu11, like was done for the kernel. - Install the tools/lib/ libraries locally instead of having headers searched directly from the source code directories, to help the cases where we can build either from in-kernel source libraries or from the same library shipped as a distro package, as is the case with libbpf and was the case with libtraceevent. perf stat: - Do not delay the workload with --delay, the delay is just for starting to count the events, to skip noise at workload startup. - When we have events for each cgroup, the metric should be printed for each cgroup separately. $ perf stat -a --for-each-cgroup system.slice,user.slice --metric-only sleep 1 Performance counter stats for 'system wide': GHz insn per cycle branch-misses of all branches system.slice 3.792 0.61 3.24% user.slice 3.661 2.32 0.37% - Fix printing field separator in CSV metrics output. - Fix --metric-only --json output. - Fix summary output in CSV with --metric-only. - Update event group check for support of uncore event. perf test: - Stop requiring a C toolchain in shell tests, instead add a workload option that has all the previously C snippets built as part of 'perf test -w' that then get used in the 'perf test' shell scripts. - Add event group test for events in multiple PMUs - The "kernel lock contention analysis" test should not print warnings in quiet mode. - Add attr tests for ARM64's new VG register. - Fix record test on KVM guests, as using precise flag with the br_inst_retired.near_call event causes the test fail on KVM guests, even when the guests have PMU forwarding enabled and the event itself is supported, so just remove the precise flag from the event. - Add mechanism for skipping attr tests on specific kernel versions where it is known that these checks will fail. - Skip watchpoint tests if no watchpoints available. - Add more Intel PT 'perf test' entries: hybrid CPUs, split the packet decoder into a suite of subtests. perf script: - Introduce task analyzer python script, where one first records some events: Recording can be done in two ways: $ perf script record tasks-analyzer -- sleep 10 $ perf record -e sched:sched_switch -a -- sleep 10 The script can parse any perf.data files, as long as it has sched:sched_switch events, other events will be ignored. The most simple report use case is to just call the script without arguments. Runtime is the time the task was running on the CPU, Time Out-In is the time between the process being scheduled *out* and scheduled back *in*. So the last time span between two executions: $ perf script report tasks-analyzer Switched-In Switched-Out CPU PID TID Comm Runtime Time Out-In 15576.658891407 15576.659156086 4 2412 2428 gdbus 265 1949 15576.659111320 15576.659455410 0 2412 2412 gnome-shell 344 2267 15576.659491326 15576.659506173 2 74 74 kworker/2:1 15 13145 15576.659506173 15576.659825748 2 2858 2858 gnome-terminal- 320 63263 15576.659871270 15576.659902872 6 20932 20932 kworker/u16:0 32 2314582 15576.659909951 15576.659945501 3 27264 27264 sh 36 -1 15576.659853285 15576.659971052 7 27265 27265 perf 118 5050741 [...] perf lock: - Allow concurrent record and report to support live monitoring of kernel lock contention without BPF: # perf lock record -a -o- sleep 1 | perf lock contention -i- contended total wait max wait avg wait type caller 2 10.27 us 6.17 us 5.13 us spinlock load_balance+0xc03 1 5.29 us 5.29 us 5.29 us rwlock:W ep_scan_ready_list+0x54 1 4.12 us 4.12 us 4.12 us spinlock smpboot_thread_fn+0x116 1 3.28 us 3.28 us 3.28 us mutex pipe_read+0x50 - Implement -t/--threads option when using BPF: $ sudo ./perf lock contention -abt -E 5 sleep 1 contended total wait max wait avg wait pid comm 1 740.66 ms 740.66 ms 740.66 ms 1950 nv_queue 3 305.50 ms 298.19 ms 101.83 ms 1884 nvidia-modeset/ 1 25.14 us 25.14 us 25.14 us 2725038 EventManager_De 12 23.09 us 9.30 us 1.92 us 0 swapper 1 20.18 us 20.18 us 20.18 us 2725033 EventManager_De - Add -l/--lock-addr to aggregate per-lock-instance contention: $ sudo ./perf lock contention -abl sleep 1 contended total wait max wait avg wait address symbol 1 36.28 us 36.28 us 36.28 us ffff92615d6448b8 9 10.91 us 1.84 us 1.21 us ffffffffbaed50c0 rcu_state 1 10.49 us 10.49 us 10.49 us ffff9262ac4f0c80 8 4.68 us 1.67 us 585 ns ffffffffbae07a40 jiffies_lock 3 3.03 us 1.45 us 1.01 us ffff9262277861e0 1 924 ns 924 ns 924 ns ffff926095ba9d20 1 436 ns 436 ns 436 ns ffff9260bfda4f60 perf record: - Add remaining branch filters: "no_cycles", "no_flags" & "hw_index", to be used with hardware such as Intel's LBR that allows things like stitching stacks of two samples to overcome the limits of the number of LBR registers. Symbol resolution: - Handle .debug files created with 'objcopy --only-keep-debug', where program headers are zeroed and thus can't be used for adjustments, use the info in the runtime_ss (runtime ELF) instead. perf trace: - Add BPF based augmenter for the 'perf_event_open's 'struct perf_event_attr' argument. - Add BPF based augmenter for the 'clock_gettime's 'struct timespec' argument. - In both cases the syscall tracepoint has just the pointer value, we need to hook a BPF program to collect the pointer contents, and then, in userspace, pretty print it in 'perf trace'. perf list: - Introduce JSON output of events. - Streamline how the expression specifying what events should be shown is handled, fixing several corner cases, such as the metric filter that is specified as a glob but was using strstr(). perf probe: - Fix to avoid crashing if DW_AT_decl_file is NULL, coping with clang generating DWARF5 like that. - Use dwarf_attr_integrate() as generic DWARF attr accessor as it supersedes dwarf_attr(), supporting abstact origin DIEs. perf inject: - Set PERF_RECORD_MISC_BUILD_ID_SIZE in the PERF_RECORD_HEADER_BUILD_ID so that perf.data readers can get the real build-id size and avoid trailing zeroes. perf data: - Add tracepoint fields when converting a perf.data file to JSON. arm64: - Fix mksyscalltbl, don't lose syscalls due to sort -nu. - Add Arm Neoverse V2 PMU events. riscv: - Add riscv sbi firmware std event files. - Add Sifive U74 vendor events (JSON) file. - Add some more events and metrics for Alderlake/Alderlake-N. Documentation: - Add data documentation for the PMU structs in the C source code. Miscellaneous: - Periodic sanitization of headers, adding missing includes, removing needless ones, creating new ones, etc. - Use sig_atomic_t for signal handlers to avoid undefined behaviour in all perf tools. - Fixes for libbpf 1.0+ compatibility (maps, etc) on 'perf trace' BPF examples. - Remove some old perf bpf examples, leave the best ones that demonstrate how to associate BPF functions to points in the kernel. - Make quiet mode consistent between tools. - Use dedicated non-atomic clear/set bit helpers. - Use "grep -E" instead of "egrep" as recommended by warning emitted by GNU grep since at least version 3.8. - Complete list of supported subcommands in the 'perf daemon' help message. - Update John Garry's email address for arm64 perf tooling on the MAINTAINERS file, he moved from Huawei to Oracle" * tag 'perf-tools-for-v6.2-1-2022-12-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (239 commits) libperf: Fix install_pkgconfig target perf tools: Use "grep -E" instead of "egrep" perf stat: Do not delay the workload with --delay perf evlist: Remove group option. perf build: Fix python/perf.so library's name perf test arm64: Add attr tests for new VG register perf test: Add mechanism for skipping attr tests on kernel versions perf test: Add mechanism for skipping attr tests on auxiliary vector values perf test: Add ability to test exit code for attr tests perf test: add new task-analyzer tests perf script: task-analyzer add csv support perf script: Introduce task analyzer python script perf cs-etm: Print auxtrace info even if OpenCSD isn't linked perf cs-etm: Cleanup cs_etm__process_auxtrace_info() perf cs-etm: Tidy up auxtrace info header printing perf cs-etm: Remove unused stub methods perf cs-etm: Print unknown header version as an error perf test: Update perf lock contention test perf lock contention: Add -l/--lock-addr option perf lock contention: Implement -t/--threads option for BPF ... |
||
Alexander Gordeev
|
4ff17c448a |
libperf: Fix install_pkgconfig target
Commit
|
||
Arnaldo Carvalho de Melo
|
1a931707ad |
Merge remote-tracking branch 'torvalds/master' into perf/core
To resolve a trivial merge conflict with
|
||
Linus Torvalds
|
58bcac11fd |
USB/Thunderbolt driver changes for 6.2-rc1
Here is the large set of USB and Thunderbolt driver changes for 6.2-rc1. Overall, thanks to the removal of a driver, more lines were removed than added, a nice change. Highlights include: - removal of the sisusbvga driver that was not used by anyone anymore - minor thunderbolt driver changes and tweaks - chipidea driver updates - usual set of typec driver features and hardware support added - musb minor driver fixes - fotg210 driver fixes, bringing that hardware back from the "dead" - minor dwc3 driver updates - addition, and then removal, of a list.h helper function for many USB and other subsystem drivers, that ended up breaking the build. That will come back for 6.3-rc1, it missed this merge window. - usual xhci updates and enhancements - usb-serial driver updates and support for new devices - other minor USB driver updates All of these have been in linux-next for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY5wvYg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yl5DACgssl/ag4zDePHpfoiG5zEGEzH8XsAoMFrzvzu d43hsH3qsfDGSZRkJJMu =ORDd -----END PGP SIGNATURE----- Merge tag 'usb-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB and Thunderbolt driver updates from Greg KH: "Here is the large set of USB and Thunderbolt driver changes for 6.2-rc1. Overall, thanks to the removal of a driver, more lines were removed than added, a nice change. Highlights include: - removal of the sisusbvga driver that was not used by anyone anymore - minor thunderbolt driver changes and tweaks - chipidea driver updates - usual set of typec driver features and hardware support added - musb minor driver fixes - fotg210 driver fixes, bringing that hardware back from the "dead" - minor dwc3 driver updates - addition, and then removal, of a list.h helper function for many USB and other subsystem drivers, that ended up breaking the build. That will come back for 6.3-rc1, it missed this merge window. - usual xhci updates and enhancements - usb-serial driver updates and support for new devices - other minor USB driver updates All of these have been in linux-next for a while with no reported problems" * tag 'usb-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (153 commits) usb: gadget: uvc: Rename bmInterfaceFlags -> bmInterlaceFlags usb: dwc2: power on/off phy for peripheral mode in dual-role mode usb: dwc2: disable lpm feature on Rockchip SoCs dt-bindings: usb: mtk-xhci: add support for mt7986 usb: dwc3: core: defer probe on ulpi_read_id timeout usb: ulpi: defer ulpi_register on ulpi_read_id timeout usb: misc: onboard_usb_hub: add Genesys Logic GL850G hub support dt-bindings: usb: Add binding for Genesys Logic GL850G hub controller dt-bindings: vendor-prefixes: add Genesys Logic usb: fotg210-udc: fix potential memory leak in fotg210_udc_probe() usb: typec: tipd: Set mode of operation for USB Type-C connector usb: gadget: udc: drop obsolete dependencies on COMPILE_TEST usb: musb: remove extra check in musb_gadget_vbus_draw usb: gadget: uvc: Prevent buffer overflow in setup handler usb: dwc3: qcom: Fix memory leak in dwc3_qcom_interconnect_init usb: typec: wusb3801: fix fwnode refcount leak in wusb3801_probe() usb: storage: Add check for kcalloc USB: sisusbvga: use module_usb_driver() USB: sisusbvga: rename sisusb.c to sisusbvga.c USB: sisusbvga: remove console support ... |
||
Linus Torvalds
|
8fa590bf34 |
ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. * Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit |
||
Linus Torvalds
|
94a855111e |
- Add the call depth tracking mitigation for Retbleed which has
been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd 73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/ zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs= =cRy1 -----END PGP SIGNATURE----- Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ... |
||
Linus Torvalds
|
ad76bf1ff1 |
memblock: extend test coverage
* add tests that trigger reallocation of memblock structures from memblock itself via memblock_double_array() * add tests for memblock_alloc_exact_nid_raw() that verify that requested node and memory range constraints are respected. -----BEGIN PGP SIGNATURE----- iQFMBAABCAA2FiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmOYL14YHG1pa2UucmFw b3BvcnRAZ21haWwuY29tAAoJEDkDhibLDv2RZdcH/2AE447oXzVO2lzOgkqQH1EX xJdaa7hu00h2Euzv2lgcOHroHGXDP8wYjUV2cEyNZMP0WOMiO8i6rwIKmrzWufcm R+ZoKPQV/Nc+7rIycpW455yLxcgsVIpUILK2BQEkDCGYugSHKb7IYdcA9KDJwtmR xIG9j8nsuwWJtmtAuQqNOBmsc5FzKNYFa/RtDiJoMFmQNK3UqB8G8VCASdP0DYvH 7MXPcyRmlwpmOsKoNKi2/wQBsiag8/PLgcZv5vYg+E6no1tMG6u7pgDS12Sn6ZvA I8gThJ8HNAo0d1O2SnbkicMx2CqrPFSub3QXaEFjCZF5mdBcirxHc/VBKj50TXU= =iXEA -----END PGP SIGNATURE----- Merge tag 'memblock-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: "Extend test coverage: - add tests that trigger reallocation of memblock structures from memblock itself via memblock_double_array() - add tests for memblock_alloc_exact_nid_raw() that verify that requested node and memory range constraints are respected" * tag 'memblock-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock tests: remove completed TODO item memblock tests: add generic NUMA tests for memblock_alloc_exact_nid_raw memblock tests: add bottom-up NUMA tests for memblock_alloc_exact_nid_raw memblock tests: add top-down NUMA tests for memblock_alloc_exact_nid_raw memblock tests: introduce range tests for memblock_alloc_exact_nid_raw memblock test: Update TODO list memblock test: Add test to memblock_reserve() 129th region memblock test: Add test to memblock_add() 129th region |
||
Tiezhu Yang
|
818448e9cf |
perf tools: Use "grep -E" instead of "egrep"
The latest version of grep claims the egrep is now obsolete so the build now contains warnings that look like: egrep: warning: egrep is obsolescent; using grep -E fix this up by moving the related file to use "grep -E" instead. sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/perf` Here are the steps to install the latest grep: wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz tar xf grep-3.8.tar.gz cd grep-3.8 && ./configure && make sudo make install export PATH=/usr/local/bin:$PATH Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1668762999-9297-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
c587e77e10 |
perf stat: Do not delay the workload with --delay
The -D/--delay option is to delay the measure after the program starts. But the current code goes to sleep before starting the program so the program is delayed too. This is not the intention, let's fix it. Before: $ time sudo ./perf stat -a -e cycles -D 3000 sleep 4 Events disabled Events enabled Performance counter stats for 'system wide': 4,326,949,337 cycles 4.007494118 seconds time elapsed real 0m7.474s user 0m0.356s sys 0m0.120s It ran the workload for 4 seconds and gave the 3 second delay. So it should skip the first 3 second and measure the last 1 second only. But as you can see, it delays 3 seconds and ran the workload after that for 4 seconds. So the total time (real) was 7 seconds. After: $ time sudo ./perf stat -a -e cycles -D 3000 sleep 4 Events disabled Events enabled Performance counter stats for 'system wide': 1,063,551,013 cycles 1.002769510 seconds time elapsed real 0m4.484s user 0m0.385s sys 0m0.086s The bug was introduced when it changed enablement of system-wide events with a command line workload. But it should've considered the initial delay case. The code was reworked since then (in |
||
Ian Rogers
|
5f8f95673f |
perf evlist: Remove group option.
The group option predates grouping events using curly braces added in
commit
|
||
Linus Torvalds
|
08cdc21579 |
iommufd for 6.2
iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl s7fAu+CQ1pr9+9NKGevD+frw8Solsw4= =jJkd -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd implementation from Jason Gunthorpe: "iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA" For more background, see the extended explanations in Jason's pull request: https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits) iommufd: Change the order of MSI setup iommufd: Improve a few unclear bits of code iommufd: Fix comment typos vfio: Move vfio group specific code into group.c vfio: Refactor dma APIs for emulated devices vfio: Wrap vfio group module init/clean code into helpers vfio: Refactor vfio_device open and close vfio: Make vfio_device_open() truly device specific vfio: Swap order of vfio_device_container_register() and open_device() vfio: Set device->group in helper function vfio: Create wrappers for group register/unregister vfio: Move the sanity check of the group to vfio_create_group() vfio: Simplify vfio_create_group() iommufd: Allow iommufd to supply /dev/vfio/vfio vfio: Make vfio_container optionally compiled vfio: Move container related MODULE_ALIAS statements into container.c vfio-iommufd: Support iommufd for emulated VFIO devices vfio-iommufd: Support iommufd for physical VFIO devices vfio-iommufd: Allow iommufd to be used in place of a container fd vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() ... |
||
Ian Rogers
|
caec54705a |
perf build: Fix python/perf.so library's name
Since Python 3.3 extensions have a suffix encoding platform and version information. For example, the perf extension was previously perf.so but now maybe perf.cpython-310-x86_64-linux-gnu.so. Compute the extension using Python and then use this in the target name. Doing this avoids the "perf.so" target always being rebuilt. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Eelco Chaudron <echaudro@redhat.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Shaomin Deng <dengshaomin@cdjrlc.com> Cc: Stephane Eranian <eranian@google.com> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221213232651.1269909-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
9440ebdc33 |
perf test arm64: Add attr tests for new VG register
Ensure that the availability of the VG register behaves as expected depending on the kernel version and SVE support. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221213114739.2312862-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
ee26adf627 |
perf test: Add mechanism for skipping attr tests on kernel versions
The first two version numbers are used since that is where the ABI changes happen, so seems to be the most useful for now. 'Until' is exclusive and 'since' is inclusive so that the same version number can be used to mark a point where the change comes into effect. This allows keeping the tests in a state where new tests will also pass on older kernels if the existence of a new feature isn't explicitly broadcast by the kernel. For example extended user regs are currently discovered by trial and error calls to perf_event_open. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221213114739.2312862-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
c3a8f85351 |
perf test: Add mechanism for skipping attr tests on auxiliary vector values
This can be used to skip tests or provide different test values on different platforms. For example to run a test only where Arm SVE is present add this to the config section: auxv = auxv["AT_HWCAP"] & 0x200000 == 0x200000 The value is a freeform Python expression that is evaled in the context of a map called "auxv" that contains the decoded auxiliary vector. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221213114739.2312862-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
a8f26192ca |
perf test: Add ability to test exit code for attr tests
Currently the return value is used to skip the test, but sometimes it can be useful to test if a certain command should return a certain exit code. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221213114739.2312862-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Petar Gligoric
|
e8478b84d6 |
perf test: add new task-analyzer tests
Provide task-analyzer test cases for all possible arguments and a subset of possible combinations. 12 Tests in total. test_basic: - cmd:"perf script report task-analyzer" - Fundamental test of script without arguments. - Check for standard output. test_ns_rename: - cmd:"perf script report task-analyzer --ns --rename-comms-by-tids 0:random" - Standard task with timestamps in nanoseconds and comm renamed. - Check for standard output. test_ms_filtertasks_highlight: - cmd:"perf script report task-analyzer --ms --filter-tasks perf --highlight-tasks perf" - Standard task with timestamps in milliseconds, task filtered out and highlighted. - Check for standard output. test_extended_times_timelimit_limittasks: - cmd "perf script report task-analyzer --extended-times --time-limit :99999" - Standard task with additional schedule out/in info and timlimit active at 99999. - Check for extended table output. test_summary: - cmd:"perf script report task-analyzer --summary" - Standard task with additional summary output. - Check for summary print. test_summary_extended: - cmd:"perf script report task-analyzer --summary-extended" - Standard task with summary and additional schedule in/out info. - Chceck for extended table print. test_summaryonly: - cmd:"perf script report task-analyzer --summary-only" - Only summary should be printed. - Check for summary print. test_extended_times_summary_ns: - cmd:"perf script report task-analyzer --extended-times --summary --ns" - Standard task with extended schedule in/out information and summary in ns. - Check for extended table and summary. test_csv: - cmd:"perf script report task-analyzer --csv csv" - Print standard task to csv file in csv format. - Check for csv format. test_csv_extended_times: - cmd:"perf script report task-analyzer --csv csv --extended-times" - Print standard task to csv file in csv format with additional schedule in/out information. - Check for additional information and csv format. test_csvsummary: - cmd:"perf script report task-analyzer --csv-summary csvsummary" - Print summary to csvsummary file in csv format. - Check for csv format. test_csvsummary_extended: - cmd:"perf script report task-analyzer --csv-summary csvsummary --summary-extended" - Print summary to csvsummary file in csv format with additional schedule in/out information. - Check for additional information and csv format. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Petar Gligoric <petar.gligoric@rohde-schwarz.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221206154406.41941-4-petar.gligor@gmail.com Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Petar Gligoric
|
fdd0f81f05 |
perf script: task-analyzer add csv support
This patch adds the possibility to write the trace and the summary as csv files to a user specified file. A format as such simplifies further data processing. This is achieved by having ";" as separators instead of spaces and solely one header per file. Additional parameters are being considered, like in the normal usage of the script. Colors are turned off in the case of a csv output, thus the highlight option is also being ignored. Usage: Write standard task to csv file: $ perf script report tasks-analyzer --csv <file> write limited output to csv file in nanoseconds: $ perf script report tasks-analyzer --csv <file> --ns --limit-to-tasks 1337 Write summary to a csv file: $ perf script report tasks-analyzer --csv-summary <file> Write summary to csv file with additional schedule information: $ perf script report tasks-analyzer --csv-summary <file> --summary-extended Write both summary and standard task to a csv file: $ perf script report tasks-analyzer --csv --csv-summary The following examples illustrate what is possible with the CSV output. The first command sequence will record all scheduler switch events for 10 seconds, the task-analyzer calculates task information like runtimes as CSV. A small python snippet using pandas and matplotlib will visualize the most frequent task (e.g. kworker/1:1) runtimes - each runtime as a bar in a bar chart: $ perf record -e sched:sched_switch -a -- sleep 10 $ perf script report tasks-analyzer --ns --csv tasks.csv $ cat << EOF > /tmp/freq-comm-runtimes-bar.py import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv("tasks.csv", sep=';') most_freq_comm = df["COMM"].value_counts().idxmax() most_freq_runtimes = df[df["COMM"]==most_freq_comm]["Runtime"] plt.title(f"Runtimes for Task {most_freq_comm} in Nanoseconds") plt.bar(range(len(most_freq_runtimes)), most_freq_runtimes) plt.show() $ python3 /tmp/freq-comm-runtimes-bar.py As a seconds example, the subsequent script generates a pie chart of all accumulated tasks runtimes for 10 seconds of system recordings: $ perf record -e sched:sched_switch -a -- sleep 10 $ perf script report tasks-analyzer --csv-summary task-summary.csv $ cat << EOF > /tmp/accumulated-task-pie.py import pandas as pd from matplotlib.pyplot import pie, axis, show df = pd.read_csv("task-summary.csv", sep=';') sums = df.groupby(df["Comm"])["Accumulated"].sum() axis("equal") pie(sums, labels=sums.index); show() EOF $ python3 /tmp/accumulated-task-pie.py A variety of other visualizations are possible in matplotlib and other environments. Of course, pandas, numpy and co. also allow easy statistical analysis of the data! Signed-off-by: Petar Gligoric <petar.gligoric@rohde-schwarz.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221206154406.41941-3-petar.gligor@gmail.com Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Hagen Paul Pfeifer
|
e76aff0523 |
perf script: Introduce task analyzer python script
Introduce a new 'perf script' to analyze task scheduling behavior. During the task analysis, some data is always needed - which goes beyond the simple time of switching on and off a task (process/thread). This concerns for example the runtime of a process or the frequency with which the process was called. This script serves to simplify this recurring analyze process. It immediately provides the user with helpful task characteristic information about the tasks runtimes. Usage: Recorded can be in two ways: $ perf script record tasks-analyzer -- sleep 10 $ perf record -e sched:sched_switch -a -- sleep 10 The script can parse all perf.data files, most important: sched:sched_switch events are mandatory, other events will be ignored. Most simple report use case is to just call the script without arguments: $ perf script report tasks-analyzer Switched-In Switched-Out CPU PID TID Comm Runtime Time Out-In 15576.658891407 15576.659156086 4 2412 2428 gdbus 265 1949 15576.659111320 15576.659455410 0 2412 2412 gnome-shell 344 2267 15576.659491326 15576.659506173 2 74 74 kworker/2:1 15 13145 15576.659506173 15576.659825748 2 2858 2858 gnome-terminal- 320 63263 15576.659871270 15576.659902872 6 20932 20932 kworker/u16:0 32 2314582 15576.659909951 15576.659945501 3 27264 27264 sh 36 -1 15576.659853285 15576.659971052 7 27265 27265 perf 118 5050741 [...] What is not shown here are the ASCII color sequences. For example, if the task consists of only one thread, the TID is grayed out. Runtime is the time the task was running on the CPU, Time Out-In is the time between the process being scheduled *out* and scheduled back *in*. So the last time span between two executions. If -1 is printed, then the task simply ran the first time in the measurements - a Out-In delta could not be calculated. In addition to the chronological representation, there is a summary on task level. This output can be additionally switched on via the --summary option and provides information such as max, min & average runtime per process. The maximum runtime is often important for debugging. The call looks like this: $ perf script report tasks-analyzer --summary Summary Task Information Runtime Information PID TID Comm Runs Accumulated Mean Median Min Max Max At 14 14 ksoftirqd/0 13 334 26 15 9 127 15571.621211956 15 15 rcu_preempt 133 1778 13 13 2 33 15572.581176024 16 16 migration/0 3 49 16 13 12 24 15571.608915425 20 20 migration/1 3 34 11 13 8 13 15571.639101555 25 25 migration/2 3 32 11 12 9 12 15575.639239896 [...] Besides these two options, there are a number of other options that change the output and behavior. This can be queried via --help. Options worth mentioning include: - filter-tasks - filter out unneeded tasks, --filter-task 1337,/sbin/init - highlight-tasks - more pleasant focusing, --highlight-tasks 1:red,mutt:yellow - extended-times - show combinations of elapsed times between schedule in/schedule out - summary-extended - summary with additional information, like maximum delta time statistics - rename-comms-by-tids - handy for inexpressive processnames like python, --rename 1337:my-python-app - ms - show timestamps in milliseconds, nanoseconds is also possible (--ns) - time-limit - limit the analyzer to a time range, --time-limit 15576.0:15576.1 Script is tested and prime time ready for python2 & python3: - make PYTHON=python3 prefix=/usr/local install - make PYTHON=python2 prefix=/usr/local install Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221206154406.41941-2-petar.gligor@gmail.com Signed-off-by: Petar Gligoric <petar.gligoric@rohde-schwarz.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
55c1de9973 |
perf cs-etm: Print auxtrace info even if OpenCSD isn't linked
Printing the info doesn't have any dependency on OpenCSD, and neither does recording Coresight data. Because it's sometimes useful to look at the info for debugging, it makes sense to be able to see it on the same platform that the recording was made on. So pull the auxtrace info printing parts into a new file that is always compiled into Perf. Signed-off-by: James Clark <james.clark@arm.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20221212155513.2259623-6-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
fd63091f2a |
perf cs-etm: Cleanup cs_etm__process_auxtrace_info()
hdr is a copy of 3 values of ptr and doesn't need to be long lived. So just use ptr instead which means the malloc and the extra error path can be removed to simplify things. Signed-off-by: James Clark <james.clark@arm.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20221212155513.2259623-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
b00204f5c2 |
perf cs-etm: Tidy up auxtrace info header printing
cs_etm__print_auxtrace_info() is called twice in case there is an error somewhere in cs_etm__process_auxtrace_info(), but all the info is already available at the beginning so just print it there instead. Also use u64 and the already cast ptr variable to make it more consistent with the rest of the etm code. Signed-off-by: James Clark <james.clark@arm.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20221212155513.2259623-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
fe55ba1832 |
perf cs-etm: Remove unused stub methods
These aren't used outside of cs-etm so don't need stubs. Leave cs_etm__process_auxtrace_info() which is used externally, and add an error message so that it's obvious to users why it causes errors. Signed-off-by: James Clark <james.clark@arm.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20221212155513.2259623-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
ab6bd55e99 |
perf cs-etm: Print unknown header version as an error
This is an error rather than just for the raw trace dump so always print it as an error. Also remove the duplicate header version check. Signed-off-by: James Clark <james.clark@arm.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20221212155513.2259623-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
22ddcb6b4a |
perf test: Update perf lock contention test
Add test cases for the task and addr aggregation modes. $ sudo ./perf test -v contention 86: kernel lock contention analysis test : --- start --- test child forked, pid 680006 Testing perf lock record and perf lock contention Testing perf lock contention --use-bpf Testing perf lock record and perf lock contention at the same time Testing perf lock contention --threads Testing perf lock contention --lock-addr test child finished with 0 ---- end ---- kernel lock contention analysis test: Ok Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221209190727.759804-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
688d2e8de2 |
perf lock contention: Add -l/--lock-addr option
The -l/--lock-addr option is to implement per-lock-instance contention stat using LOCK_AGGR_ADDR. It displays lock address and optionally symbol name if exists. $ sudo ./perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 1 36.28 us 36.28 us 36.28 us ffff92615d6448b8 9 10.91 us 1.84 us 1.21 us ffffffffbaed50c0 rcu_state 1 10.49 us 10.49 us 10.49 us ffff9262ac4f0c80 8 4.68 us 1.67 us 585 ns ffffffffbae07a40 jiffies_lock 3 3.03 us 1.45 us 1.01 us ffff9262277861e0 1 924 ns 924 ns 924 ns ffff926095ba9d20 1 436 ns 436 ns 436 ns ffff9260bfda4f60 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221209190727.759804-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
eca949b2b4 |
perf lock contention: Implement -t/--threads option for BPF
The BPF didn't show the per-thread stat properly. Use task's thread id (PID) as a key instead of stack_id and add a task_data map to save task comm names. $ sudo ./perf lock con -abt -E 5 sleep 1 contended total wait max wait avg wait pid comm 1 740.66 ms 740.66 ms 740.66 ms 1950 nv_queue 3 305.50 ms 298.19 ms 101.83 ms 1884 nvidia-modeset/ 1 25.14 us 25.14 us 25.14 us 2725038 EventManager_De 12 23.09 us 9.30 us 1.92 us 0 swapper 1 20.18 us 20.18 us 20.18 us 2725033 EventManager_De Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221209190727.759804-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
fd507d3e35 |
perf lock contention: Add lock_data.h for common data
Accessing BPF maps should use the same data types. Add bpf_skel/lock_data.h to define the common data structures. No functional changes. Committer notes: Fixed contention_key.stack_id missing rename to contention_key.stack_or_task_id. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221209190727.759804-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Khem Raj
|
3cad53a6f9 |
perf python: Account for multiple words in CC
Sometimes build systems may append options e.g. --sysroot etc. to CC variable especially in cross-compile environments like yocto project where CC varable is composed of cross-compiler name and some needed options for it to work in a relocatable environment. Therefore separate out the compiler name from rest of the options in CC, then add the options via second argument to Popen() API Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Khem Raj <raj.khem@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Fangrui Song <maskray@google.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Keeping <john@metanate.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Link: https://lore.kernel.org/r/20221205025534.150006-1-raj.khem@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
167b266bf6 |
perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
In BTF, tracepoint definitions have the "btf_trace_" prefix. The
off-cpu profiler needs to check the signature of the sched_switch event
using that definition. But there's a typo (s/bpf/btf/) so it failed
always.
Fixes:
|
||
Athira Rajeev
|
232b82d201 |
perf test: Update event group check for support of uncore event
The event group test checks group creation for combinations of hw, sw
and uncore PMU events. Some of the uncore pmus may require additional
permission to access the counters.
For example, in case of hv_24x7, partition need to have permissions to
access hv_24x7 pmu counters. If not, event_open will fail. Hence add a
sanity check to see if event_open succeeds before proceeding with the
test.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
b9a49f8cb0 |
perf tools: Check if libtracevent has TEP_FIELD_IS_RELATIVE
Some distros have older versions of libtraceevent where TEP_FIELD_IS_RELATIVE and its associated semantics are not present, so we need to check if the version has it, it was introduced in libtraceevent 1.5.0. Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org>, Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
4171925aa9 |
tools lib traceevent: Remove libtraceevent
libtraceevent is now out-of-date and it is better to depend on the system version. Remove this code that is no longer depended upon by any builds. Committer notes: Removed the removed tools/lib/traceevent/ from tools/perf/MANIFEST, so that 'make perf-tar-src-pkg' works. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20221130062935.2219247-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
378ef0f5d9 |
perf build: Use libtraceevent from the system
Remove the LIBTRACEEVENT_DYNAMIC and LIBTRACEFS_DYNAMIC make command line variables. If libtraceevent isn't installed or NO_LIBTRACEEVENT=1 is passed to the build, don't compile in libtraceevent and libtracefs support. This also disables CONFIG_TRACE that controls "perf trace". CONFIG_LIBTRACEEVENT is used to control enablement in Build/Makefiles, HAVE_LIBTRACEEVENT is used in C code. Without HAVE_LIBTRACEEVENT tracepoints are disabled and as such the commands kmem, kwork, lock, sched and timechart are removed. The majority of commands continue to work including "perf test". Committer notes: Fixed up a tools/perf/util/Build reject and added: #include <traceevent/event-parse.h> to tools/perf/util/scripting-engines/trace-event-perl.c. Committer testing: $ rpm -qi libtraceevent-devel Name : libtraceevent-devel Version : 1.5.3 Release : 2.fc36 Architecture: x86_64 Install Date: Mon 25 Jul 2022 03:20:19 PM -03 Group : Unspecified Size : 27728 License : LGPLv2+ and GPLv2+ Signature : RSA/SHA256, Fri 15 Apr 2022 02:11:58 PM -03, Key ID 999f7cbf38ab71f4 Source RPM : libtraceevent-1.5.3-2.fc36.src.rpm Build Date : Fri 15 Apr 2022 10:57:01 AM -03 Build Host : buildvm-x86-05.iad2.fedoraproject.org Packager : Fedora Project Vendor : Fedora Project URL : https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/ Bug URL : https://bugz.fedoraproject.org/libtraceevent Summary : Development headers of libtraceevent Description : Development headers of libtraceevent-libs $ Default build: $ ldd ~/bin/perf | grep tracee libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007f1dcaf8f000) $ # perf trace -e sched:* --max-events 10 0.000 migration/0/17 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, dest_cpu: 1) 0.005 migration/0/17 sched:sched_wake_idle_without_ipi(cpu: 1) 0.011 migration/0/17 sched:sched_switch(prev_comm: "", prev_pid: 17 (migration/0), prev_state: 1, next_comm: "", next_prio: 120) 1.173 :0/0 sched:sched_wakeup(comm: "", pid: 3138 (gnome-terminal-), prio: 120) 1.180 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 3138 (gnome-terminal-), next_prio: 120) 0.156 migration/1/21 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, orig_cpu: 1, dest_cpu: 2) 0.160 migration/1/21 sched:sched_wake_idle_without_ipi(cpu: 2) 0.166 migration/1/21 sched:sched_switch(prev_comm: "", prev_pid: 21 (migration/1), prev_state: 1, next_comm: "", next_prio: 120) 1.183 :0/0 sched:sched_wakeup(comm: "", pid: 1602985 (kworker/u16:0-f), prio: 120, target_cpu: 1) 1.186 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 1602985 (kworker/u16:0-f), next_prio: 120) # Had to tweak tools/perf/util/setup.py to make sure the python binding shared object links with libtraceevent if -DHAVE_LIBTRACEEVENT is present in CFLAGS. Building with NO_LIBTRACEEVENT=1 uncovered some more build failures: - Make building of data-convert-bt.c to CONFIG_LIBTRACEEVENT=y - perf-$(CONFIG_LIBTRACEEVENT) += scripts/ - bpf_kwork.o needs also to be dependent on CONFIG_LIBTRACEEVENT=y - The python binding needed some fixups and util/trace-event.c can't be built and linked with the python binding shared object, so remove it in tools/perf/util/setup.py and exclude it from the list of dependencies in the python/perf.so Makefile.perf target. Building without libtraceevent-devel installed uncovered more build failures: - The python binding tools/perf/util/python.c was assuming that traceevent/parse-events.h was always available, which was the case when we defaulted to using the in-kernel tools/lib/traceevent/ files, now we need to enclose it under ifdef HAVE_LIBTRACEEVENT, just like the other parts of it that deal with tracepoints. - We have to ifdef the rules in the Build files with CONFIG_LIBTRACEEVENT=y to build builtin-trace.c and tools/perf/trace/beauty/ as we only ifdef setting CONFIG_TRACE=y when setting NO_LIBTRACEEVENT=1 in the make command line, not when we don't detect libtraceevent-devel installed in the system. Simplification here to avoid these two ways of disabling builtin-trace.c and not having CONFIG_TRACE=y when libtraceevent-devel isn't installed is the clean way. From Athira: <quote> tools/perf/arch/powerpc/util/Build -perf-y += kvm-stat.o +perf-$(CONFIG_LIBTRACEEVENT) += kvm-stat.o </quote> Then, ditto for arm64 and s390, detected by container cross build tests. - s/390 uses test__checkevent_tracepoint() that is now only available if HAVE_LIBTRACEEVENT is defined, enclose the callsite with ifder HAVE_LIBTRACEEVENT. Also from Athira: <quote> With this change, I could successfully compile in these environment: - Without libtraceevent-devel installed - With libtraceevent-devel installed - With “make NO_LIBTRACEEVENT=1” </quote> Then, finally rename CONFIG_TRACEEVENT to CONFIG_LIBTRACEEVENT for consistency with other libraries detected in tools/perf/. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: http://lore.kernel.org/lkml/20221205225940.3079667-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
40769665b6 |
perf jevents: Parse metrics during conversion
Currently the 'MetricExpr' json value is passed from the json file to the pmu-events.c. This change introduces an expression tree that is parsed into. The parsing is done largely by using operator overloading and python's 'eval' function. Two advantages in doing this are: 1) Broken metrics fail at compile time rather than relying on `perf test` to detect. `perf test` remains relevant for checking event encoding and actual metric use. 2) The conversion to a string from the tree can minimize the metric's string size, for example, preferring 1e6 over 1000000, avoiding multiplication by 1 and removing unnecessary whitespace. On x86 this reduces the string size by 2,930bytes (0.07%). In future changes it would be possible to programmatically generate the json expressions (a single line of text and so a pain to write manually) for an architecture using the expression tree. This could avoid copy-pasting metrics for all architecture variants. v4. Doesn't simplify "0*SLOTS" to 0, as the pattern is used to fix Intel metrics with topdown events. v3. Avoids generic types on standard types like set that aren't supported until Python 3.9, fixing an issue with Python 3.6 reported-by John Garry. v3 also fixes minor pylint issues and adds a call to Simplify on the read expression tree. v2. Improvements to type information. Committer notes: Added one-line fixer from Ian, see first Link: tag below. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/CAP-5=fWa=zNK_ecpWGoGggHCQx7z-oW0eGMQf19Maywg0QK=4g@mail.gmail.com Link: https://lore.kernel.org/r/20221207055908.1385448-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
b897613510 |
perf stat: Update event skip condition for system-wide per-thread mode and merged uncore and hybrid events
In print_counter_aggrdata(), it skips some events that has no aggregate
count. It's actually for system-wide per-thread mode and merged uncore
and hybrid events.
Let's update the condition to check them explicitly.
Fixes:
|
||
Ian Rogers
|
616aa32d6f |
perf build: Fixes for LIBTRACEEVENT_DYNAMIC
If LIBTRACEEVENT_DYNAMIC is enabled then avoid the install step for
the plugins. If disabled correct DESTDIR so that the plugins are
installed under <lib>/traceevent/plugins.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
cc2367eebb |
machine: Adopt is_lock_function() from builtin-lock.c
It is used in bpf_lock_contention.c and builtin-lock.c will be made CONFIG_LIBTRACEEVENT=y conditional, so move it to machine.c, that is always available. This makes those 4 global variables for sched and lock text start and end to move to 'struct machine' too, as conceivably we can have that info for several machine instances, say some 'perf diff' like tool. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: http://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ravi Bangoria
|
9d9b22beda |
perf test: Add event group test for events in multiple PMUs
Multiple events in a group can belong to one or more PMUs, however there are some limitations. One of the limitations is that perf doesn't allow creating a group of events from different hw PMUs. Write a simple test to create various combinations of hw, sw and uncore PMU events and verify group creation succeeds or fails as expected. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20221206043237.12159-3-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ravi Bangoria
|
336b92da1a |
perf tool: Move pmus list variable to a new file
The 'pmus' list variable is defined as static variable under pmu.c file. Introduce a new pmus.c file and migrate this variable to it. Also make it non static so that it can be accessed from outside. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: carsten.haitzler@arm.com Link: https://lore.kernel.org/r/20221206043237.12159-2-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
5b7a29fb0b |
perf util: Add host_is_bigendian to util.h
Avoid libtraceevent dependency for tep_is_bigendian or trace-event.h dependency for bigendian. Add a new host_is_bigendian to util.h, using the compiler defined __BYTE_ORDER__ when available. Committer notes: Added: #else /* !__BYTE_ORDER__ */ On that nested #ifdef block, as per Namhyung's suggestion. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221130062935.2219247-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
fce9a61914 |
perf util: Make header guard consistent with tool
Remove git reference by changing GIT_COMPAT_UTIL_H to __PERF_UTIL_H. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221130062935.2219247-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
3f81f72d30 |
perf stat: Fix invalid output handle
In this context, 'os' is already a pointer so the extra dereference
isn't required. This fixes the following test failure on aarch64:
$ ./perf test "json output" -vvv
92: perf stat JSON output linter :
--- start ---
Checking json output: no args Test failed for input:
...
Fatal error: glibc detected an invalid stdio handle
---- end ----
perf stat JSON output linter: FAILED!
Fixes:
|
||
Namhyung Kim
|
117195d9f8 |
perf stat: Fix multi-line metric output in JSON
When a metric produces more than one values, it missed to print the opening
bracket.
Fixes:
|
||
Ian Rogers
|
113bb39642 |
tools lib symbol: Add dependency test to install_headers
Compute the headers to be installed from their source headers and make each have its own build target to install it. Using dependencies avoids headers being reinstalled and getting a new timestamp which then causes files that depend on the header to be rebuilt. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20221202045743.2639466-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
5d890591db |
tools lib subcmd: Add dependency test to install_headers
Compute the headers to be installed from their source headers and make each have its own build target to install it. Using dependencies avoids headers being reinstalled and getting a new timestamp which then causes files that depend on the header to be rebuilt. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20221202045743.2639466-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
47e02b94a4 |
tools lib perf: Add dependency test to install_headers
Compute the headers to be installed from their source headers and make each have its own build target to install it. Using dependencies avoids headers being reinstalled and getting a new timestamp which then causes files that depend on the header to be rebuilt. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20221202045743.2639466-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
1849f9f009 |
tools lib api: Add dependency test to install_headers
Compute the headers to be installed from their source headers and make each have its own build target to install it. Using dependencies avoids headers being reinstalled and getting a new timestamp which then causes files that depend on the header to be rebuilt. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20221202045743.2639466-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
8f4b1e3ceb |
perf stat: Fix printing field separator in CSV metrics output
In 'perf stat' with CSV output option, number of fields in metrics
output is not matching with number of fields in other event output
lines.
Sample output below after applying patch to fix printing os->prefix.
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,82.11,msec,cpu-clock,82111626,100.00,1.000,CPUs utilized
S0,1,2,,context-switches,82109314,100.00,24.358,/sec
------
====> S0,1,,,,,,,1.71,stalled cycles per insn
The above command line uses field separator as "," via "-x," option and
per-socket option displays socket value as first field. But here the
last line for "stalled cycles per insn" has more separators. Each csv
output line is expected to have 8 field separators (for the 9 fields),
where as last line has 9 "," in the result. Patch fixes this issue.
The counter stats are displayed by function
"perf_stat__print_shadow_stats" in code "util/stat-shadow.c". While
printing the stats info for "stalled cycles per insn", function
"new_line_csv" is used as new_line callback.
The fields printed in each line contains: "Socket_id,aggr
nr,Avg,unit,event_name,run,enable_percent,ratio,unit"
The metric output prints Socket_id, aggr nr, ratio and unit. It has to
skip through remaining five fields ie,
Avg,unit,event_name,run,enable_percent. The csv line callback uses
"os->nfields" to know the number of fields to skip to match with other
lines.
Currently it is set as:
os.nfields = 3 + aggr_fields[config->aggr_mode] + (counter->cgrp ? 1 : 0);
But in case of aggregation modes, csv_sep already gets printed along
with each field (Function "aggr_printout" in util/stat-display.c). So
aggr_fields can be removed from nfields. And fixed number of fields to
skip has to be "4". This is to skip fields for: "avg, unit, event name,
run, enable_percent"
This needs 4 csv separators. Patch removes aggr_fields
and uses 4 as fixed number of os->nfields to skip.
After the patch:
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,79.08,msec,cpu-clock,79085956,100.00,1.000,CPUs utilized
S0,1,7,,context-switches,79084176,100.00,88.514,/sec
------
====> S0,1,,,,,,0.81,stalled cycles per insn
Fixes:
|
||
Anshuman Khandual
|
955f6def55 |
perf record: Add remaining branch filters: "no_cycles", "no_flags" & "hw_index"
This adds all remaining branch filters i.e "no_cycles", "no_flags" and "hw_index". While here, also updates the documentation. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20221205064443.533587-1-anshuman.khandual@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |