Commit Graph

5373 Commits

Author SHA1 Message Date
Jiri Olsa
d31e0386a2 bpf: Fix sparse warnings in kprobe_multi_resolve_syms
Adding missing __user tags to fix sparse warnings:

kernel/trace/bpf_trace.c:2370:34: warning: incorrect type in argument 2 (different address spaces)
kernel/trace/bpf_trace.c:2370:34:    expected void const [noderef] __user *from
kernel/trace/bpf_trace.c:2370:34:    got void const *usyms
kernel/trace/bpf_trace.c:2376:51: warning: incorrect type in argument 2 (different address spaces)
kernel/trace/bpf_trace.c:2376:51:    expected char const [noderef] __user *src
kernel/trace/bpf_trace.c:2376:51:    got char const *
kernel/trace/bpf_trace.c:2443:49: warning: incorrect type in argument 1 (different address spaces)
kernel/trace/bpf_trace.c:2443:49:    expected void const *usyms
kernel/trace/bpf_trace.c:2443:49:    got void [noderef] __user *[assigned] usyms

Fixes: 0dcac27254 ("bpf: Add multi kprobe link")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220330110510.398558-1-jolsa@kernel.org
2022-03-30 14:10:06 +02:00
Masami Hiramatsu
73f9b911fa kprobes: Use rethook for kretprobe if possible
Use rethook for kretprobe function return hooking if the arch sets
CONFIG_HAVE_RETHOOK=y. In this case, CONFIG_KRETPROBE_ON_RETHOOK is
set to 'y' automatically, and the kretprobe internal data fields
switches to use rethook. If not, it continues to use kretprobe
specific function return hooks.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164826162556.2455864.12255833167233452047.stgit@devnote2
2022-03-28 19:38:09 -07:00
Masami Hiramatsu
261608f310 fprobe: Fix sparse warning for acccessing __rcu ftrace_hash
Since ftrace_ops::local_hash::filter_hash field is an __rcu pointer,
we have to use rcu_access_pointer() to access it.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164802093635.1732982.4938094876018890866.stgit@devnote2
2022-03-28 19:05:40 -07:00
Masami Hiramatsu
9052e4e837 fprobe: Fix smatch type mismatch warning
Fix the type mismatching warning of 'rethook_node vs fprobe_rethook_node'
found by Smatch.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164802092611.1732982.12268174743437084619.stgit@devnote2
2022-03-28 19:05:40 -07:00
Linus Torvalds
7001052160 Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
 where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
 
 Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
 limited to 2 instructions (and typically fewer) on branch targets not starting
 with ENDBR. CET-IBT also limits speculation of the next sequential instruction
 after the indirect CALL/JMP [1].
 
 CET-IBT is fundamentally incompatible with retpolines, but provides, as
 described above, speculation limits itself.
 
 [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
 bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
 1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
 urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
 BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
 y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
 go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
 9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
 ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
 x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
 dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
 EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
 beB/7u2KIUbKEkSN
 =jZfK
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
 "Add support for Intel CET-IBT, available since Tigerlake (11th gen),
  which is a coarse grained, hardware based, forward edge
  Control-Flow-Integrity mechanism where any indirect CALL/JMP must
  target an ENDBR instruction or suffer #CP.

  Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
  is limited to 2 instructions (and typically fewer) on branch targets
  not starting with ENDBR. CET-IBT also limits speculation of the next
  sequential instruction after the indirect CALL/JMP [1].

  CET-IBT is fundamentally incompatible with retpolines, but provides,
  as described above, speculation limits itself"

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html

* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  kvm/emulate: Fix SETcc emulation for ENDBR
  x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
  x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
  kbuild: Fixup the IBT kbuild changes
  x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
  x86: Remove toolchain check for X32 ABI capability
  x86/alternative: Use .ibt_endbr_seal to seal indirect calls
  objtool: Find unused ENDBR instructions
  objtool: Validate IBT assumptions
  objtool: Add IBT/ENDBR decoding
  objtool: Read the NOENDBR annotation
  x86: Annotate idtentry_df()
  x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
  x86: Annotate call_on_stack()
  objtool: Rework ASM_REACHABLE
  x86: Mark __invalid_creds() __noreturn
  exit: Mark do_group_exit() __noreturn
  x86: Mark stop_this_cpu() __noreturn
  objtool: Ignore extra-symbol code
  objtool: Rename --duplicate to --lto
  ...
2022-03-27 10:17:23 -07:00
Linus Torvalds
f022814633 Trace event fix of string verifier
- The run time string verifier that checks all trace event formats
   as they are read from the tracing file to make sure that the %s
   pointers are not reading something that no longer exists, failed
   to account for %*.s where the length given is zero, and the string
   is NULL. It incorrectly flagged it as a null pointer dereference and
   gave a WARN_ON().
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYjzyxBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qk+WAP4+ICutAiIGyPmHEqtjLGyDPC25nbB3
 vg+qWWkWEOIi5gD+PpGsGSE7HYFdWJi1BCfshNOm8I92TyoE2nJkXh3LeA8=
 =mZr5
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull trace event string verifier fix from Steven Rostedt:
 "The run-time string verifier checks all trace event formats as
  they are read from the tracing file to make sure that the %s pointers
  are not reading something that no longer exists.

  However, it failed to account for the valid case of '%*.s' where the
  length given is zero, and the string is NULL. It incorrectly flagged
  it as a null pointer dereference and gave a WARN_ON()"

* tag 'trace-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have trace event string test handle zero length strings
2022-03-26 14:54:41 -07:00
Linus Torvalds
6f2689a766 SCSI misc on 20220324
This series consists of the usual driver updates (qla2xxx, pm8001,
 libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
 and bug fixes.  The high blast radius core update is the removal of
 write same, which affects block and several non-SCSI devices.  The
 other big change, which is more local, is the removal of the SCSI
 pointer.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYjzDQyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQMYAQDEWUGV
 6U0+736AHVtOfiMNfiRN79B1HfXVoHvemnPcTwD/UlndwFfy/3GGOtoZmqEpc73J
 Ec1HDuUCE18H1H2QAh0=
 =/Ty9
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (qla2xxx, pm8001,
  libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
  and bug fixes.

  The high blast radius core update is the removal of write same, which
  affects block and several non-SCSI devices. The other big change,
  which is more local, is the removal of the SCSI pointer"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
  scsi: scsi_ioctl: Drop needless assignment in sg_io()
  scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
  scsi: lpfc: Copyright updates for 14.2.0.0 patches
  scsi: lpfc: Update lpfc version to 14.2.0.0
  scsi: lpfc: SLI path split: Refactor BSG paths
  scsi: lpfc: SLI path split: Refactor Abort paths
  scsi: lpfc: SLI path split: Refactor SCSI paths
  scsi: lpfc: SLI path split: Refactor CT paths
  scsi: lpfc: SLI path split: Refactor misc ELS paths
  scsi: lpfc: SLI path split: Refactor VMID paths
  scsi: lpfc: SLI path split: Refactor FDISC paths
  scsi: lpfc: SLI path split: Refactor LS_RJT paths
  scsi: lpfc: SLI path split: Refactor LS_ACC paths
  scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
  scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
  scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
  scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
  scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
  scsi: lpfc: SLI path split: Refactor lpfc_iocbq
  scsi: lpfc: Use kcalloc()
  ...
2022-03-24 19:37:53 -07:00
Linus Torvalds
169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Linus Torvalds
194dfe88d6 asm-generic updates for 5.18
There are three sets of updates for 5.18 in the asm-generic tree:
 
  - The set_fs()/get_fs() infrastructure gets removed for good. This
    was already gone from all major architectures, but now we can
    finally remove it everywhere, which loses some particularly
    tricky and error-prone code.
    There is a small merge conflict against a parisc cleanup, the
    solution is to use their new version.
 
  - The nds32 architecture ends its tenure in the Linux kernel. The
    hardware is still used and the code is in reasonable shape, but
    the mainline port is not actively maintained any more, as all
    remaining users are thought to run vendor kernels that would never
    be updated to a future release.
    There are some obvious conflicts against changes to the removed
    files.
 
  - A series from Masahiro Yamada cleans up some of the uapi header
    files to pass the compile-time checks.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
 GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
 IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
 cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
 sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
 DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
 G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
 a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
 ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
 QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
 CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
 h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
 =vtCN
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "There are three sets of updates for 5.18 in the asm-generic tree:

   - The set_fs()/get_fs() infrastructure gets removed for good.

     This was already gone from all major architectures, but now we can
     finally remove it everywhere, which loses some particularly tricky
     and error-prone code. There is a small merge conflict against a
     parisc cleanup, the solution is to use their new version.

   - The nds32 architecture ends its tenure in the Linux kernel.

     The hardware is still used and the code is in reasonable shape, but
     the mainline port is not actively maintained any more, as all
     remaining users are thought to run vendor kernels that would never
     be updated to a future release.

   - A series from Masahiro Yamada cleans up some of the uapi header
     files to pass the compile-time checks"

* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
  nds32: Remove the architecture
  uaccess: remove CONFIG_SET_FS
  ia64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  sparc64: remove CONFIG_SET_FS support
  lib/test_lockup: fix kernel pointer check for separate address spaces
  uaccess: generalize access_ok()
  uaccess: fix type mismatch warnings from access_ok()
  arm64: simplify access_ok()
  m68k: fix access_ok for coldfire
  MIPS: use simpler access_ok()
  MIPS: Handle address errors for accesses above CPU max virtual user address
  uaccess: add generic __{get,put}_kernel_nofault
  nios2: drop access_ok() check from __put_user()
  x86: use more conventional access_ok() definition
  x86: remove __range_not_ok()
  sparc64: add __{get,put}_kernel_nofault()
  nds32: fix access_ok() checks in get/put_user
  uaccess: fix nios2 and microblaze get_user_8()
  sparc64: fix building assembly files
  ...
2022-03-23 18:03:08 -07:00
Linus Torvalds
1bc191051d Tracing updates for 5.18:
- New user_events interface. User space can register an event with the kernel
   describing the format of the event. Then it will receive a byte in a page
   mapping that it can check against. A privileged task can then enable that
   event like any other event, which will change the mapped byte to true,
   telling the user space application to start writing the event to the
   tracing buffer.
 
 - Add new "ftrace_boot_snapshot" kernel command line parameter. When set,
   the tracing buffer will be saved in the snapshot buffer at boot up when
   the kernel hands things over to user space. This will keep the traces that
   happened at boot up available even if user space boot up has tracing as
   well.
 
 - Have TRACE_EVENT_ENUM() also update trace event field type descriptions.
   Thus if a static array defines its size with an enum, the user space trace
   event parsers can still know how to parse that array.
 
 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This will
   make one tracepoint be able to trace different content and not be stuck at
   only what the original TRACE_EVENT() macro exports.
 
 - Fixes to tracing error logging.
 
 - Better saving of cmdlines to PIDs when tracing (use the wakeup events for
   mapping).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYjiO3RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhQzAQDtek5p80p/zkMGymm14wSH6qq0NdgN
 Kv7fTBwEewUa0gD/UCOVLw4Oj+JtHQhCa3sCGZopmRv0BT1+4UQANqosKQY=
 =Au08
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - New user_events interface. User space can register an event with the
   kernel describing the format of the event. Then it will receive a
   byte in a page mapping that it can check against. A privileged task
   can then enable that event like any other event, which will change
   the mapped byte to true, telling the user space application to start
   writing the event to the tracing buffer.

 - Add new "ftrace_boot_snapshot" kernel command line parameter. When
   set, the tracing buffer will be saved in the snapshot buffer at boot
   up when the kernel hands things over to user space. This will keep
   the traces that happened at boot up available even if user space boot
   up has tracing as well.

 - Have TRACE_EVENT_ENUM() also update trace event field type
   descriptions. Thus if a static array defines its size with an enum,
   the user space trace event parsers can still know how to parse that
   array.

 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This
   will make one tracepoint be able to trace different content and not
   be stuck at only what the original TRACE_EVENT() macro exports.

 - Fixes to tracing error logging.

 - Better saving of cmdlines to PIDs when tracing (use the wakeup events
   for mapping).

* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
  tracing: Have type enum modifications copy the strings
  user_events: Add trace event call as root for low permission cases
  tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
  tracing: Add snapshot at end of kernel boot up
  tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
  tracing: Fix strncpy warning in trace_events_synth.c
  user_events: Prevent dyn_event delete racing with ioctl add/delete
  tracing: Add TRACE_CUSTOM_EVENT() macro
  tracing: Move the defines to create TRACE_EVENTS into their own files
  tracing: Add sample code for custom trace events
  tracing: Allow custom events to be added to the tracefs directory
  tracing: Fix last_cmd_set() string management in histogram code
  user_events: Fix potential uninitialized pointer while parsing field
  tracing: Fix allocation of last_cmd in last_cmd_set()
  user_events: Add documentation file
  user_events: Add sample code for typical usage
  user_events: Add self-test for validator boundaries
  user_events: Add self-test for perf_event integration
  user_events: Add self-test for dynamic_events integration
  user_events: Add self-test for ftrace integration
  ...
2022-03-23 11:40:25 -07:00
Steven Rostedt (Google)
eca344a736 tracing: Have trace event string test handle zero length strings
If a trace event has in its TP_printk():

 "%*.s", len, len ? __get_str(string) : NULL

It is perfectly valid if len is zero and passing in the NULL.
Unfortunately, the runtime string check at time of reading the trace sees
the NULL and flags it as a bad string and produces a WARN_ON().

Handle this case by passing into the test function if the format has an
asterisk (star) and if so, if the length is zero, then mark it as safe.

Link: https://lore.kernel.org/all/YjsWzuw5FbWPrdqq@bfoster/

Cc: stable@vger.kernel.org
Reported-by: Brian Foster <bfoster@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Fixes: 9a6944fee6 ("tracing: Add a verifier to check string pointers for trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-23 10:51:58 -04:00
Linus Torvalds
3fe2f7446f Changes in this cycle were:
- Cleanups for SCHED_DEADLINE
  - Tracing updates/fixes
  - CPU Accounting fixes
  - First wave of changes to optimize the overhead of the scheduler build,
    from the fast-headers tree - including placeholder *_api.h headers for
    later header split-ups.
  - Preempt-dynamic using static_branch() for ARM64
  - Isolation housekeeping mask rework; preperatory for further changes
  - NUMA-balancing: deal with CPU-less nodes
  - NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD)
  - Updates to RSEQ UAPI in preparation for glibc usage
  - Lots of RSEQ/selftests, for same
  - Add Suren as PSI co-maintainer
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI5rg8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hGrw/+M3QOk6fH7G48wjlNnBvcOife6ls+Ni4k
 ixOAcF4JKoixO8HieU5vv0A7yf/83tAa6fpeXeMf1hkCGc0NSlmLtuIux+WOmoAL
 LzCyDEYfiP8KnVh0A1Tui/lK0+AkGo21O6ADhQE2gh8o2LpslOHQMzvtyekSzeeb
 mVxMYQN+QH0m518xdO2D8IQv9ctOYK0eGjmkqdNfntOlytypPZHeNel/tCzwklP/
 dElJUjNiSKDlUgTBPtL3DfpoLOI/0mHF2p6NEXvNyULxSOqJTu8pv9Z2ADb2kKo1
 0D56iXBDngMi9MHIJLgvzsA8gKzHLFSuPbpODDqkTZCa28vaMB9NYGhJ643NtEie
 IXTJEvF1rmNkcLcZlZxo0yjL0fjvPkczjw4Vj27gbrUQeEBfb4mfuI4BRmij63Ep
 qEkgQTJhduCqqrQP1rVyhwWZRk1JNcVug+F6N42qWW3fg1xhj0YSrLai2c9nPez6
 3Zt98H8YGS1Z/JQomSw48iGXVqfTp/ETI7uU7jqHK8QcjzQ4lFK5H4GZpwuqGBZi
 NJJ1l97XMEas+rPHiwMEN7Z1DVhzJLCp8omEj12QU+tGLofxxwAuuOVat3CQWLRk
 f80Oya3TLEgd22hGIKDRmHa22vdWnNQyS0S15wJotawBzQf+n3auS9Q3/rh979+t
 ES/qvlGxTIs=
 =Z8uT
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Cleanups for SCHED_DEADLINE

 - Tracing updates/fixes

 - CPU Accounting fixes

 - First wave of changes to optimize the overhead of the scheduler
   build, from the fast-headers tree - including placeholder *_api.h
   headers for later header split-ups.

 - Preempt-dynamic using static_branch() for ARM64

 - Isolation housekeeping mask rework; preperatory for further changes

 - NUMA-balancing: deal with CPU-less nodes

 - NUMA-balancing: tune systems that have multiple LLC cache domains per
   node (eg. AMD)

 - Updates to RSEQ UAPI in preparation for glibc usage

 - Lots of RSEQ/selftests, for same

 - Add Suren as PSI co-maintainer

* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
  sched/headers: ARM needs asm/paravirt_api_clock.h too
  sched/numa: Fix boot crash on arm64 systems
  headers/prep: Fix header to build standalone: <linux/psi.h>
  sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
  cgroup: Fix suspicious rcu_dereference_check() usage warning
  sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
  sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
  sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
  sched/deadline,rt: Remove unused functions for !CONFIG_SMP
  sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
  sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
  sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
  sched/deadline: Remove unused def_dl_bandwidth
  sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
  sched/tracing: Don't re-read p->state when emitting sched_switch event
  sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
  sched/cpuacct: Remove redundant RCU read lock
  sched/cpuacct: Optimize away RCU read lock
  sched/cpuacct: Fix charge percpu cpuusage
  sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
  ...
2022-03-22 14:39:12 -07:00
Jakub Kicinski
0db8640df5 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2022-03-21 v2

We've added 137 non-merge commits during the last 17 day(s) which contain
a total of 143 files changed, 7123 insertions(+), 1092 deletions(-).

The main changes are:

1) Custom SEC() handling in libbpf, from Andrii.

2) subskeleton support, from Delyan.

3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao.

4) Fix net.core.bpf_jit_harden race, from Hou.

5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub.

6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami.
The arch specific bits will come later.

7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri.

8) Enable non-atomic allocations in local storage, from Joanne.

9) Various var_off ptr_to_btf_id fixed, from Kumar.

10) bpf_ima_file_hash helper, from Roberto.

11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits)
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  bpftool: Fix a bug in subskeleton code generation
  bpf: Fix bpf_prog_pack when PMU_SIZE is not defined
  bpf: Fix bpf_prog_pack for multi-node setup
  bpf: Fix warning for cast from restricted gfp_t in verifier
  bpf, arm: Fix various typos in comments
  libbpf: Close fd in bpf_object__reuse_map
  bpftool: Fix print error when show bpf map
  bpf: Fix kprobe_multi return probe backtrace
  Revert "bpf: Add support to inline bpf_get_func_ip helper on x86"
  bpf: Simplify check in btf_parse_hdr()
  selftests/bpf/test_lirc_mode2.sh: Exit with proper code
  bpf: Check for NULL return from bpf_get_btf_vmlinux
  selftests/bpf: Test skipping stacktrace
  bpf: Adjust BPF stack helper functions to accommodate skip > 0
  bpf: Select proper size for bpf_prog_pack
  ...
====================

Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22 11:18:49 -07:00
Linus Torvalds
14705fda8f New features:
- NFSv3 support in NFSD is now always built
 - Added NFSD support for the NFSv4 birth-time file attribute
 - Added support for storing and displaying sockaddrs in trace points
 - NFSD now recognizes RPC_AUTH_TLS probes
 
 Performance improvements:
 - Optimized the svc transport enqueuing mechanism
 - Added micro-optimizations for the duplicate reply cache
 
 Notable bug fixes:
 - Allocation of the NFSD file cache hash table is more reliable
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmI3XNkACgkQM2qzM29m
 f5dNERAAqJ/nzfVp2H5BKLszJ7p/s13wFqbW719Rzymzl6/tUUHOqIsVdBsJsa/b
 BdZQLLDwa6ZB5zOAnC6FosRKYu4lwixOOf94pC6a9ZDD/glYVKF8mZG+RZXPAy16
 g3JUOi/bcyHXv0ZUhbv7DqW+HHM+owPP4vlNJ9ChiiLr/Xdp8NBKj+4Qtn/wcAo+
 Xuvx7fU/5Mbemh+dd5mWker4afHvpt9V6U6s04m5LiTPPnHVnxmeyekJGUCOY0QO
 cm/6SPNDqyn/VEfM/SRxEnLE9QcHRhZo/4PKRGF4wYolcviIogbILE1M7Ig/r/Gv
 6Du2kcRAhyZ3zgWnu799Ivn3Q6IrVjxZwqmsi7YHURTwYKyZtxYsUk0MCBcpnxrE
 WyTS2onpElbMop3viKCnQdpIetbsHnUNg3udUV6ugbdCbnZuhUw5B/d6x0o8ZWDE
 C0f+jnX+GnBstn0vkcj2H0+VQTd5hUJtXMrooI42ODJoboQRZcmePwoXjqCmw3sy
 PXTxLZC/5+4zNHGUuz4Pq4V7FKr4nHhDzaW4ZDO3mILx4ahceotulY1B/yoBUu8o
 /LAhu2kJ6nFQkmpzdrGzPeOstgJYHm9CaitRvMzg+NJxEAJdebypdQDbX5iNpgfW
 MDXH4n8eIqroTlQ/mQYEV0EbC7BaTqSCL6rQdcrcFUPu9n4Fcno=
 =5nac
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "New features:

   - NFSv3 support in NFSD is now always built

   - Added NFSD support for the NFSv4 birth-time file attribute

   - Added support for storing and displaying sockaddrs in trace points

   - NFSD now recognizes RPC_AUTH_TLS probes

  Performance improvements:

   - Optimized the svc transport enqueuing mechanism

   - Added micro-optimizations for the duplicate reply cache

  Notable bug fixes:

   - Allocation of the NFSD file cache hash table is more reliable"

* tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (30 commits)
  nfsd: fix using the correct variable for sizeof()
  nfsd: use correct format characters
  NFSD: prevent integer overflow on 32 bit systems
  NFSD: prevent underflow in nfssvc_decode_writeargs()
  fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
  NFSD: Fix nfsd_breaker_owns_lease() return values
  NFSD: Clean up _lm_ operation names
  arch: Remove references to CONFIG_NFSD_V3 in the default configs
  NFSD: Remove CONFIG_NFSD_V3
  nfsd: more robust allocation failure handling in nfsd_file_cache_init
  SUNRPC: Teach server to recognize RPC_AUTH_TLS
  NFSD: Move svc_serv_ops::svo_function into struct svc_serv
  NFSD: Remove svc_serv_ops::svo_module
  SUNRPC: Remove svc_shutdown_net()
  SUNRPC: Rename svc_close_xprt()
  SUNRPC: Rename svc_create_xprt()
  SUNRPC: Remove svo_shutdown method
  SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
  SUNRPC: Remove the .svo_enqueue_xprt method
  SUNRPC: Record endpoint information in trace log
  ...
2022-03-22 10:29:51 -07:00
Jiri Olsa
f70986902c bpf: Fix kprobe_multi return probe backtrace
Andrii reported that backtraces from kprobe_multi program attached
as return probes are not complete and showing just initial entry [1].

It's caused by changing registers to have original function ip address
as instruction pointer even for return probe, which will screw backtrace
from return probe.

This change keeps registers intact and store original entry ip and
link address on the stack in bpf_kprobe_multi_run_ctx struct, where
bpf_get_func_ip and bpf_get_attach_cookie helpers for kprobe_multi
programs can find it.

[1] https://lore.kernel.org/bpf/CAEf4BzZDDqK24rSKwXNp7XL3ErGD4bZa1M6c_c4EvDSt3jrZcg@mail.gmail.com/T/#m8d1301c0ea0892ddf9dc6fba57a57b8cf11b8c51

Fixes: ca74823c6e ("bpf: Add cookie support to programs attached with kprobe multi link")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220321070113.1449167-3-jolsa@kernel.org
2022-03-21 14:54:15 +01:00
Jiri Olsa
f705ec764b Revert "bpf: Add support to inline bpf_get_func_ip helper on x86"
This reverts commit 97ee4d20ee.

Following change is adding more complexity to bpf_get_func_ip
helper for kprobe_multi programs, which can't be inlined easily.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321070113.1449167-2-jolsa@kernel.org
2022-03-21 14:53:42 +01:00
Steven Rostedt (Google)
795301d3c2 tracing: Have type enum modifications copy the strings
When an enum is used in the visible parts of a trace event that is
exported to user space, the user space applications like perf and
trace-cmd do not have a way to know what the value of the enum is. To
solve this, at boot up (or module load) the printk formats are modified to
replace the enum with their numeric value in the string output.

Array fields of the event are defined by [<nr-elements>] in the type
portion of the format file so that the user space parsers can correctly
parse the array into the appropriate size chunks. But in some trace
events, an enum is used in defining the size of the array, which once
again breaks the parsing of user space tooling.

This was solved the same way as the print formats were, but it modified
the type strings of the trace event. This caused crashes in some
architectures because, as supposed to the print string, is a const string
value. This was not detected on x86, as it appears that const strings are
still writable (at least in boot up), but other architectures this is not
the case, and writing to a const string will cause a kernel fault.

To fix this, use kstrdup() to copy the type before modifying it. If the
trace event is for the core kernel there's no need to free it because the
string will be in use for the life of the machine being on line. For
modules, create a link list to store all the strings being allocated for
modules and when the module is removed, free them.

Link: https://lore.kernel.org/all/yt9dr1706b4i.fsf@linux.ibm.com/
Link: https://lkml.kernel.org/r/20220318153432.3984b871@gandalf.local.home

Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Fixes: b3bc8547d3 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-20 10:25:13 -04:00
Jiri Olsa
ca74823c6e bpf: Add cookie support to programs attached with kprobe multi link
Adding support to call bpf_get_attach_cookie helper from
kprobe programs attached with kprobe multi link.

The cookie is provided by array of u64 values, where each
value is paired with provided function address or symbol
with the same array index.

When cookie array is provided it's sorted together with
addresses (check bpf_kprobe_multi_cookie_swap). This way
we can find cookie based on the address in
bpf_get_attach_cookie helper.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-7-jolsa@kernel.org
2022-03-17 20:17:19 -07:00
Jiri Olsa
97ee4d20ee bpf: Add support to inline bpf_get_func_ip helper on x86
Adding support to inline it on x86, because it's single
load instruction.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-6-jolsa@kernel.org
2022-03-17 20:17:19 -07:00
Jiri Olsa
42a5712094 bpf: Add bpf_get_func_ip kprobe helper for multi kprobe link
Adding support to call bpf_get_func_ip helper from kprobe
programs attached by multi kprobe link.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-5-jolsa@kernel.org
2022-03-17 20:17:19 -07:00
Jiri Olsa
0dcac27254 bpf: Add multi kprobe link
Adding new link type BPF_LINK_TYPE_KPROBE_MULTI that attaches kprobe
program through fprobe API.

The fprobe API allows to attach probe on multiple functions at once
very fast, because it works on top of ftrace. On the other hand this
limits the probe point to the function entry or return.

The kprobe program gets the same pt_regs input ctx as when it's attached
through the perf API.

Adding new attach type BPF_TRACE_KPROBE_MULTI that allows attachment
kprobe to multiple function with new link.

User provides array of addresses or symbols with count to attach the
kprobe program to. The new link_create uapi interface looks like:

  struct {
          __u32           flags;
          __u32           cnt;
          __aligned_u64   syms;
          __aligned_u64   addrs;
  } kprobe_multi;

The flags field allows single BPF_TRACE_KPROBE_MULTI bit to create
return multi kprobe.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-4-jolsa@kernel.org
2022-03-17 20:17:18 -07:00
Masami Hiramatsu
ab51e15d53 fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag for fprobe
Introduce FPROBE_FL_KPROBE_SHARED flag for sharing fprobe callback with
kprobes safely from the viewpoint of recursion.

Since the recursion safety of the fprobe (and ftrace) is a bit different
from the kprobes, this may cause an issue if user wants to run the same
code from the fprobe and the kprobes.

The kprobes has per-cpu 'current_kprobe' variable which protects the
kprobe handler from recursion in any case. On the other hand, the fprobe
uses only ftrace_test_recursion_trylock(), which will allow interrupt
context calls another (or same) fprobe during the fprobe user handler is
running.

This is not a matter in cases if the common callback shared among the
kprobes and the fprobe has its own recursion detection, or it can handle
the recursion in the different contexts (normal/interrupt/NMI.)
But if it relies on the 'current_kprobe' recursion lock, it has to check
kprobe_running() and use kprobe_busy_*() APIs.

Fprobe has FPROBE_FL_KPROBE_SHARED flag to do this. If your common callback
code will be shared with kprobes, please set FPROBE_FL_KPROBE_SHARED
*before* registering the fprobe, like;

 fprobe.flags = FPROBE_FL_KPROBE_SHARED;

 register_fprobe(&fprobe, "func*", NULL);

This will protect your common callback from the nested call.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735293127.1084943.15687374237275817599.stgit@devnote2
2022-03-17 20:17:02 -07:00
Masami Hiramatsu
5b0ab78998 fprobe: Add exit_handler support
Add exit_handler to fprobe. fprobe + rethook allows us to hook the kernel
function return. The rethook will be enabled only if the
fprobe::exit_handler is set.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735290790.1084943.10601965782208052202.stgit@devnote2
2022-03-17 20:16:52 -07:00
Masami Hiramatsu
54ecbe6f1e rethook: Add a generic return hook
Add a return hook framework which hooks the function return. Most of the
logic came from the kretprobe, but this is independent from kretprobe.

Note that this is expected to be used with other function entry hooking
feature, like ftrace, fprobe, adn kprobes. Eventually this will replace
the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment,
this is just an additional hook.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
2022-03-17 20:16:29 -07:00
Masami Hiramatsu
cad9931f64 fprobe: Add ftrace based probe APIs
The fprobe is a wrapper API for ftrace function tracer.
Unlike kprobes, this probes only supports the function entry, but this
can probe multiple functions by one fprobe. The usage is similar, user
will set their callback to fprobe::entry_handler and call
register_fprobe*() with probed functions.
There are 3 registration interfaces,

 - register_fprobe() takes filtering patterns of the functin names.
 - register_fprobe_ips() takes an array of ftrace-location addresses.
 - register_fprobe_syms() takes an array of function names.

The registered fprobes can be unregistered with unregister_fprobe().
e.g.

struct fprobe fp = { .entry_handler = user_handler };
const char *targets[] = { "func1", "func2", "func3"};
...

ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets));

...

unregister_fprobe(&fp);

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735283857.1084943.1154436951479395551.stgit@devnote2
2022-03-17 20:16:15 -07:00
Jiri Olsa
4f554e9556 ftrace: Add ftrace_set_filter_ips function
Adding ftrace_set_filter_ips function to be able to set filter on
multiple ip addresses at once.

With the kprobe multi attach interface we have cases where we need to
initialize ftrace_ops object with thousands of functions, so having
single function diving into ftrace_hash_move_and_update_ops with
ftrace_lock is faster.

The functions ips are passed as unsigned long array with count.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735282673.1084943.18310504594134769804.stgit@devnote2
2022-03-17 20:15:17 -07:00
Jakub Kicinski
e243f39685 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17 13:56:58 -07:00
Beau Belgrave
089331d473 user_events: Add trace event call as root for low permission cases
Tracefs by default is locked down heavily. System operators can open up
some files, such as user_events to a broader set of users. These users
do not have access within tracefs beyond just the user_event files. Due
to this restriction the trace_add_event_call/remove calls will silently
fail since the caller does not have permissions to create directories.

To fix this trace_add_event_call/remove calls will be issued with
override creds of the global root UID. Creds are reverted immediately
afterward.

Link: https://lkml.kernel.org/r/20220308222807.2040-1-beaub@linux.microsoft.com

Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-15 14:33:20 -04:00
Peter Zijlstra
aebfd12521 x86/ibt,ftrace: Search for __fentry__ location
Currently a lot of ftrace code assumes __fentry__ is at sym+0. However
with Intel IBT enabled the first instruction of a function will most
likely be ENDBR.

Change ftrace_location() to not only return the __fentry__ location
when called for the __fentry__ location, but also when called for the
sym+0 location.

Then audit/update all callsites of this function to consistently use
these new semantics.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.227581603@infradead.org
2022-03-15 10:32:37 +01:00
Ingo Molnar
ccdbf33c23 Linux 5.17-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmIuUskeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGCFkH/2n3mpGXuITp0ZXE
 TNrpbdZOof5SgLw+w7THswXuo6m5yRGNKQs9fvIvDD8Vf7/OdQQfPOmF1cIE5+nk
 wcz6aHKbdrok8Jql2qjJqWXZ5xbGj6qywg3zZrwOUsCKFP5p+AjBJcmZOsvQHjSp
 ASODy1moOlK+nO52TrMaJw74a8xQPmQiNa+T2P+FedEYjlcRH/c7hLJ7GEnL6+cC
 /R4bATZq3tiInbTBlkC0hR0iVNgRXwXNyv9PEXrYYYHnekh8G1mgSNf06iejLcsG
 aAYsW9NyPxu8zPhhHNx79K9o8BMtxGD4YQpsfdfIEnf9Q3euqAKe2evRWqHHlDms
 RuSCtsc=
 =M9Nc
 -----END PGP SIGNATURE-----

Merge tag 'v5.17-rc8' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-03-15 10:28:12 +01:00
Steven Rostedt (Google)
bc47ee4844 tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
kzalloc virtual addresses do not work with SetPageReserved, use the actual
page virtual addresses instead via alloc_pages.

The issue is reported when booting with user_events and
DEBUG_VM_PGFLAGS=y.

Also make the number of events based on the ORDER.

Link: https://lore.kernel.org/all/CADYN=9+xY5Vku3Ws5E9S60SM5dCFfeGeRBkmDFbcxX0ZMoFing@mail.gmail.com/
Link: https://lore.kernel.org/all/20220311223028.1865-1-beaub@linux.microsoft.com/

Cc: Beau Belgrave <beaub@linux.microsoft.com>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-11 18:59:57 -05:00
Steven Rostedt (Google)
380af29b8d tracing: Add snapshot at end of kernel boot up
Add ftrace_boot_snapshot kernel parameter that will take a snapshot at the
end of boot up just before switching over to user space (it happens during
the kernel freeing of init memory).

This is useful when there's interesting data that can be collected from
kernel start up, but gets overridden by user space start up code. With
this option, the ring buffer content from the boot up traces gets saved in
the snapshot at the end of boot up. This trace can be read from:

 /sys/kernel/tracing/snapshot

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-11 11:49:24 -05:00
Steven Rostedt (Google)
b3bc8547d3 tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
The macro TRACE_DEFINE_ENUM is used to convert enums in the kernel to
their actual value when they are exported to user space via the trace
event format file.

Currently only the enums in the "print fmt" (TP_printk in the TRACE_EVENT
macro) have the enums converted. But the enums can be used to denote array
size:

        field:unsigned int fc_ineligible_rc[EXT4_FC_REASON_MAX]; offset:12;      size:36;        signed:0;

The EXT4_FC_REASON_MAX has no meaning to userspace but it needs to know
that information to know how to parse the array.

Have the array indexes also be parsed as well.

Link: https://lore.kernel.org/all/cover.1646922487.git.riteshh@linux.ibm.com/

Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-11 11:49:24 -05:00
Tom Zanussi
9f438d4d7f tracing: Fix strncpy warning in trace_events_synth.c
0-day reported the strncpy error below:

../kernel/trace/trace_events_synth.c: In function 'last_cmd_set':
../kernel/trace/trace_events_synth.c:65:9: warning: 'strncpy' specified bound depends on the length o\
f the source argument [-Wstringop-truncation]
   65 |         strncpy(last_cmd, str, strlen(str) + 1);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kernel/trace/trace_events_synth.c:65:32: note: length computed here
   65 |         strncpy(last_cmd, str, strlen(str) + 1);
      |                                ^~~~~~~~~~~

There's no reason to use strncpy here, in fact there's no reason to do
anything but a simple kstrdup() (note we don't even need to check for
failure since last_cmod is expected to be either the last cmd string
or NULL, and the containing function is a void return).

Link: https://lkml.kernel.org/r/77deca8cbfd226981b3f1eab203967381e9b5bd9.camel@kernel.org

Fixes: 27c888da98 ("tracing: Remove size restriction on synthetic event cmd error logging")

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-11 11:49:24 -05:00
Beau Belgrave
7e348b325b user_events: Prevent dyn_event delete racing with ioctl add/delete
Find user_events always while under the event_mutex and before leaving
the lock, add a ref count to the user_event. This ensures that all paths
under the event_mutex that check the ref counts will be synchronized.

The ioctl add/delete paths are protected by the reg_mutex. However,
dyn_event is only protected by the event_mutex. The dyn_event delete
path cannot acquire reg_mutex, since that could cause a deadlock between
the ioctl delete case acquiring event_mutex after acquiring the reg_mutex.

Link: https://lkml.kernel.org/r/20220310001141.1660-1-beaub@linux.microsoft.com

Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-11 11:49:24 -05:00
Linus Torvalds
dda64ead7e Minor tracing fixes:
- Fix unregistering the same event twice a the user could disable
    the event osnoise will disable on unregistering.
 
  - Inform RCU of a quiescent state in the osnoise testing thread.
 
  - Fix some kerneldoc comments.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYioe0BQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qohMAQCcnN4Iy9LcKX02A5HcuUxIdH3ktOqt
 lbc24+nb6AO+/wD+Ilnlx/Ir09HgaL7NreSAc0tvLMwRgaM+grLA82IRTwc=
 =quYh
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Minor tracing fixes:

   - Fix unregistering the same event twice. A user could disable the
     same event that osnoise will disable on unregistering.

   - Inform RCU of a quiescent state in the osnoise testing thread.

   - Fix some kerneldoc comments"

* tag 'trace-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix some W=1 warnings in kernel doc comments
  tracing/osnoise: Force quiescent states while tracing
  tracing/osnoise: Do not unregister events twice
2022-03-10 17:23:08 -08:00
Jakub Kicinski
1e8a3f0d2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/dsa/dsa2.c
  commit afb3cc1a39 ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
  commit e83d565378 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/

drivers/net/ethernet/intel/ice/ice.h
  commit 97b0129146 ("ice: Fix error with handling of bonding MTU")
  commit 43113ff734 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/

drivers/staging/gdm724x/gdm_lte.c
  commit fc7f750dc9 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
  commit 4bcc4249b4 ("staging: Use netif_rx().")
https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 17:16:56 -08:00
Steven Rostedt (Google)
8bcd06632b tracing: Allow custom events to be added to the tracefs directory
Allow custom events to be added to the events directory in the tracefs
file system. For example, a module could be installed that attaches to an
event and wants to be enabled and disabled via the tracefs file system. It
would use trace_add_event_call() to add the event to the tracefs
directory, and trace_remove_event_call() to remove it.

Make both those functions EXPORT_SYMBOL_GPL().

Link: https://lkml.kernel.org/r/20220303220625.186988045@goodmis.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-10 17:41:18 -05:00
Steven Rostedt (Google)
5677a3d713 tracing: Fix last_cmd_set() string management in histogram code
Using strnlen(dest, str, n) is confusing, as the size of dest must be
strlen(dest) + n + 1. Even more confusing, using sizeof(string constant)
gives you strlen(string constant) + 1 and not just strlen().

These two together made using strncat() with a constant string a bit off
in the calculations as we have:

	len = sizeof(HIST_PREFIX) + strlen(str) + 1;
	kfree(last_cmd);
	last_cmd = kzalloc(len, GFP_KERNEL);
	strcpy(last_cmd, HIST_PREFIX);
	len -= sizeof(HIST_PREFIX) + 1;
	strncat(last_cmd, str, len);

The above works if we s/sizeof/strlen/ with HIST_PREFIX (which is defined
as "hist:", but because sizeof(HIST_PREFIX) is equal to
strlen(HIST_PREFIX) + 1, we can drop the +1 in the code. But at least
comment that we are doing so.

Link: https://lore.kernel.org/all/202203082112.Iu7tvFl4-lkp@intel.com/

Fixes: 9f8e5aee93 ("tracing: Fix allocation of last_cmd in last_cmd_set()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-10 17:41:18 -05:00
Beau Belgrave
173c2049d1 user_events: Fix potential uninitialized pointer while parsing field
Ensure name is initialized by default to NULL to prevent possible edge
cases that could lead to it being left uninitialized. Add an explicit
check for NULL name to ensure edge boundaries.

Link: https://lore.kernel.org/bpf/20220224105334.GA2248@kili/
Link: https://lore.kernel.org/linux-trace-devel/20220224181637.2129-1-beaub@linux.microsoft.com

Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-10 17:41:18 -05:00
Jiapeng Chong
78cbc65132 ftrace: Fix some W=1 warnings in kernel doc comments
Clean up the following clang-w1 warning:

kernel/trace/ftrace.c:7827: warning: Function parameter or member 'ops'
not described in 'unregister_ftrace_function'.

kernel/trace/ftrace.c:7805: warning: Function parameter or member 'ops'
not described in 'register_ftrace_function'.

Link: https://lkml.kernel.org/r/20220307004303.26399-1-jiapeng.chong@linux.alibaba.com

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-09 11:52:21 -05:00
Nicolas Saenz Julienne
caf4c86bf1 tracing/osnoise: Force quiescent states while tracing
At the moment running osnoise on a nohz_full CPU or uncontested FIFO
priority and a PREEMPT_RCU kernel might have the side effect of
extending grace periods too much. This will entice RCU to force a
context switch on the wayward CPU to end the grace period, all while
introducing unwarranted noise into the tracer. This behaviour is
unavoidable as overly extending grace periods might exhaust the system's
memory.

This same exact problem is what extended quiescent states (EQS) were
created for, conversely, rcu_momentary_dyntick_idle() emulates them by
performing a zero duration EQS. So let's make use of it.

In the common case rcu_momentary_dyntick_idle() is fairly inexpensive:
atomically incrementing a local per-CPU counter and doing a store. So it
shouldn't affect osnoise's measurements (which has a 1us granularity),
so we'll call it unanimously.

The uncommon case involve calling rcu_momentary_dyntick_idle() after
having the osnoise process:

 - Receive an expedited quiescent state IPI with preemption disabled or
   during an RCU critical section. (activates rdp->cpu_no_qs.b.exp
   code-path).

 - Being preempted within in an RCU critical section and having the
   subsequent outermost rcu_read_unlock() called with interrupts
   disabled. (t->rcu_read_unlock_special.b.blocked code-path).

Neither of those are possible at the moment, and are unlikely to be in
the future given the osnoise's loop design. On top of this, the noise
generated by the situations described above is unavoidable, and if not
exposed by rcu_momentary_dyntick_idle() will be eventually seen in
subsequent rcu_read_unlock() calls or schedule operations.

Link: https://lkml.kernel.org/r/20220307180740.577607-1-nsaenzju@redhat.com

Cc: stable@vger.kernel.org
Fixes: bce29ac9ce ("trace: Add osnoise tracer")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-09 11:51:42 -05:00
Daniel Bristot de Oliveira
f0cfe17bcc tracing/osnoise: Do not unregister events twice
Nicolas reported that using:

 # trace-cmd record -e all -M 10 -p osnoise --poll

Resulted in the following kernel warning:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 1217 at kernel/tracepoint.c:404 tracepoint_probe_unregister+0x280/0x370
 [...]
 CPU: 0 PID: 1217 Comm: trace-cmd Not tainted 5.17.0-rc6-next-20220307-nico+ #19
 RIP: 0010:tracepoint_probe_unregister+0x280/0x370
 [...]
 CR2: 00007ff919b29497 CR3: 0000000109da4005 CR4: 0000000000170ef0
 Call Trace:
  <TASK>
  osnoise_workload_stop+0x36/0x90
  tracing_set_tracer+0x108/0x260
  tracing_set_trace_write+0x94/0xd0
  ? __check_object_size.part.0+0x10a/0x150
  ? selinux_file_permission+0x104/0x150
  vfs_write+0xb5/0x290
  ksys_write+0x5f/0xe0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7ff919a18127
 [...]
 ---[ end trace 0000000000000000 ]---

The warning complains about an attempt to unregister an
unregistered tracepoint.

This happens on trace-cmd because it first stops tracing, and
then switches the tracer to nop. Which is equivalent to:

  # cd /sys/kernel/tracing/
  # echo osnoise > current_tracer
  # echo 0 > tracing_on
  # echo nop > current_tracer

The osnoise tracer stops the workload when no trace instance
is actually collecting data. This can be caused both by
disabling tracing or disabling the tracer itself.

To avoid unregistering events twice, use the existing
trace_osnoise_callback_enabled variable to check if the events
(and the workload) are actually active before trying to
deactivate them.

Link: https://lore.kernel.org/all/c898d1911f7f9303b7e14726e7cc9678fbfb4a0e.camel@redhat.com/
Link: https://lkml.kernel.org/r/938765e17d5a781c2df429a98f0b2e7cc317b022.1646823913.git.bristot@kernel.org

Cc: stable@vger.kernel.org
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Fixes: 2fac8d6486 ("tracing/osnoise: Allow multiple instances of the same tracer")
Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-09 11:15:24 -05:00
Linus Torvalds
f40a33f5ea Two tracing fixes:
- Fix sorting on old "cpu" value in histograms
 
  - Fix return value of __setup() boot parameter handlers.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYiQOghQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quMXAP0TVq+FvVroN42ZS/UpiynnJ0uW1ibV
 93i3M12QQL2zSQEA6a+aWHywTl1tU2F/I4frH5RkIwTulfP/RwBVJG0MFQc=
 =ccPg
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix sorting on old "cpu" value in histograms

 - Fix return value of __setup() boot parameter handlers

* tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix return value of __setup handlers
  tracing/histogram: Fix sorting on old "cpu" value
2022-03-06 11:47:59 -08:00
Linus Torvalds
ac84e82f78 block-5.17-2022-03-04
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmIihP0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvWwD/4/Rwu4a7plr7HHYKfS5MaTS62edwWIf2Li
 zMZaGS0kuS4DSV3Lk5Y4AlGyz7FrWjbV+hWlotQNZuvmGntlLeBmscuYlSdN55NL
 afRjwhFRmLOfhOJXCsAE2dSqDvReuRdSn9XkDTL/ViByb35UZUaxGR+nTrGQ8B6J
 DyoA2JVpTVs9B7jtnWoCXKz6TgjFIqT7v29Zd2xE5BrJ/vKpvq0z/4BdJlMBSSKT
 FJ5IQjuE1dyudxJAVYc7X4+t7HRw0afRItZIxrn294COoMmdazhBelnES65CMLfN
 u309J2/HGL0hIRI7tb1Gljp2U8oxYgKeg66VPx1LYFoQ0sUqC9rW+sqU8zZky7SG
 oTzG6ZppSrhTSFhgMYIobChIOKmBRW+tj2BvO6ipKwNJVZbMMFmZogf9K75MJ5U7
 L52RdFxf8D5t7lYzl22puBRgzq5G4m2yi6gbV2EMUfWb2SkbbngdVzuG/uJRQv+D
 7zE8XqqevOgLsUgS71+1oAgc1h07j4b2ihe1UIY2Zo0rZ27y9MV66cbllG8s0R3y
 la5xSSi+HuMNcUpmCeERWLf8uXB3Jzwrwo5l7UvpJuPEGSes4jmE+dHsN3r79bV4
 I5Td7wjBASFu7LKEJlP1OinKdQWJvbJhahNN+pqQtNMxyK6IvNlQRgqh0EwGJhH+
 dqwVNNgkIQ==
 =drk+
 -----END PGP SIGNATURE-----

Merge tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
 "Just a small UAF fix for blktrace"

* tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block:
  blktrace: fix use after free for struct blk_trace
2022-03-04 16:03:46 -08:00
Randy Dunlap
1d02b444b8 tracing: Fix return value of __setup handlers
__setup() handlers should generally return 1 to indicate that the
boot options have been handled.

Using invalid option values causes the entire kernel boot option
string to be reported as Unknown and added to init's environment
strings, polluting it.

  Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
    kprobe_event=p,syscall_any,$arg1 trace_options=quiet
    trace_clock=jiffies", will be passed to user space.

 Run /sbin/init as init process
   with arguments:
     /sbin/init
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc6
     kprobe_event=p,syscall_any,$arg1
     trace_options=quiet
     trace_clock=jiffies

Return 1 from the __setup() handlers so that init's environment is not
polluted with kernel boot options.

Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org

Cc: stable@vger.kernel.org
Fixes: 7bcfaf54f5 ("tracing: Add trace_options kernel command line parameter")
Fixes: e1e232ca6b ("tracing: Add trace_clock=<clock> kernel parameter")
Fixes: 970988e19e ("tracing/kprobe: Add kprobe_event= boot parameter")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-04 13:46:13 -05:00
Jakub Kicinski
80901bff81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/batman-adv/hard-interface.c
  commit 690bb6fb64 ("batman-adv: Request iflink once in batadv-on-batadv check")
  commit 6ee3c393ee ("batman-adv: Demote batadv-on-batadv skip error message")
https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/

net/smc/af_smc.c
  commit 4d08b7b57e ("net/smc: Fix cleanup when register ULP fails")
  commit 462791bbfa ("net/smc: add sysctl interface for SMC")
https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 11:55:12 -08:00
Steven Rostedt (Google)
1d1898f656 tracing/histogram: Fix sorting on old "cpu" value
When trying to add a histogram against an event with the "cpu" field, it
was impossible due to "cpu" being a keyword to key off of the running CPU.
So to fix this, it was changed to "common_cpu" to match the other generic
fields (like "common_pid"). But since some scripts used "cpu" for keying
off of the CPU (for events that did not have "cpu" as a field, which is
most of them), a backward compatibility trick was added such that if "cpu"
was used as a key, and the event did not have "cpu" as a field name, then
it would fallback and switch over to "common_cpu".

This fix has a couple of subtle bugs. One was that when switching over to
"common_cpu", it did not change the field name, it just set a flag. But
the code still found a "cpu" field. The "cpu" field is used for filtering
and is returned when the event does not have a "cpu" field.

This was found by:

  # cd /sys/kernel/tracing
  # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
  # cat events/sched/sched_wakeup/hist

Which showed the histogram unsorted:

{ cpu:         19, pid:       1175 } hitcount:          1
{ cpu:          6, pid:        239 } hitcount:          2
{ cpu:         23, pid:       1186 } hitcount:         14
{ cpu:         12, pid:        249 } hitcount:          2
{ cpu:          3, pid:        994 } hitcount:          5

Instead of hard coding the "cpu" checks, take advantage of the fact that
trace_event_field_field() returns a special field for "cpu" and "CPU" if
the event does not have "cpu" as a field. This special field has the
"filter_type" of "FILTER_CPU". Check that to test if the returned field is
of the CPU type instead of doing the string compare.

Also, fix the sorting bug by testing for the hist_field flag of
HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
the special CPU field to know what compare routine to use, and since that
special field does not have a size, it returns tracing_map_cmp_none.

Cc: stable@vger.kernel.org
Fixes: 1e3bac71c5 ("tracing/histogram: Rename "cpu" to "common_cpu"")
Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-01 22:48:30 -05:00
Valentin Schneider
fa2c3254d7 sched/tracing: Don't re-read p->state when emitting sched_switch event
As of commit

  c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")

the following sequence becomes possible:

		      p->__state = TASK_INTERRUPTIBLE;
		      __schedule()
			deactivate_task(p);
  ttwu()
    READ !p->on_rq
    p->__state=TASK_WAKING
			trace_sched_switch()
			  __trace_sched_switch_state()
			    task_state_index()
			      return 0;

TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
the trace event.

Prevent this by pushing the value read from __schedule() down the trace
event.

Reported-by: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220120162520.570782-2-valentin.schneider@arm.com
2022-03-01 16:18:39 +01:00
Steven Rostedt (Google)
c6ced22997 tracing: Update print fmt check to handle new __get_sockaddr() macro
A helper macro was added to make reading socket addresses easier in trace
events. It pairs %pISpc with __get_sockaddr() that reads the socket
address from the ring buffer into a human readable format.

The boot up check that makes sure that trace events do not reference
pointers to memory that can later be freed when the trace event is read,
incorrectly flagged this as a delayed reference. Update the check to
handle "__get_sockaddr" and not report an error on it.

Link: https://lore.kernel.org/all/20220125160505.068dbb52@canb.auug.org.au/

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:38 -05:00
Yu Kuai
3093929326 blktrace: fix use after free for struct blk_trace
When tracing the whole disk, 'dropped' and 'msg' will be created
under 'q->debugfs_dir' and 'bt->dir' is NULL, thus blk_trace_free()
won't remove those files. What's worse, the following UAF can be
triggered because of accessing stale 'dropped' and 'msg':

==================================================================
BUG: KASAN: use-after-free in blk_dropped_read+0x89/0x100
Read of size 4 at addr ffff88816912f3d8 by task blktrace/1188

CPU: 27 PID: 1188 Comm: blktrace Not tainted 5.17.0-rc4-next-20220217+ #469
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 print_address_description.constprop.0.cold+0xab/0x381
 ? blk_dropped_read+0x89/0x100
 ? blk_dropped_read+0x89/0x100
 kasan_report.cold+0x83/0xdf
 ? blk_dropped_read+0x89/0x100
 kasan_check_range+0x140/0x1b0
 blk_dropped_read+0x89/0x100
 ? blk_create_buf_file_callback+0x20/0x20
 ? kmem_cache_free+0xa1/0x500
 ? do_sys_openat2+0x258/0x460
 full_proxy_read+0x8f/0xc0
 vfs_read+0xc6/0x260
 ksys_read+0xb9/0x150
 ? vfs_write+0x3d0/0x3d0
 ? fpregs_assert_state_consistent+0x55/0x60
 ? exit_to_user_mode_prepare+0x39/0x1e0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fbc080d92fd
Code: ce 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 1
RSP: 002b:00007fbb95ff9cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007fbb95ff9dc0 RCX: 00007fbc080d92fd
RDX: 0000000000000100 RSI: 00007fbb95ff9cc0 RDI: 0000000000000045
RBP: 0000000000000045 R08: 0000000000406299 R09: 00000000fffffffd
R10: 000000000153afa0 R11: 0000000000000293 R12: 00007fbb780008c0
R13: 00007fbb78000938 R14: 0000000000608b30 R15: 00007fbb780029c8
 </TASK>

Allocated by task 1050:
 kasan_save_stack+0x1e/0x40
 __kasan_kmalloc+0x81/0xa0
 do_blk_trace_setup+0xcb/0x410
 __blk_trace_setup+0xac/0x130
 blk_trace_ioctl+0xe9/0x1c0
 blkdev_ioctl+0xf1/0x390
 __x64_sys_ioctl+0xa5/0xe0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 1050:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_set_free_info+0x20/0x30
 __kasan_slab_free+0x103/0x180
 kfree+0x9a/0x4c0
 __blk_trace_remove+0x53/0x70
 blk_trace_ioctl+0x199/0x1c0
 blkdev_common_ioctl+0x5e9/0xb30
 blkdev_ioctl+0x1a5/0x390
 __x64_sys_ioctl+0xa5/0xe0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff88816912f380
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 88 bytes inside of
 96-byte region [ffff88816912f380, ffff88816912f3e0)
The buggy address belongs to the page:
page:000000009a1b4e7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0f
flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0000200 ffffea00044f1100 dead000000000002 ffff88810004c780
raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88816912f280: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff88816912f300: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
>ffff88816912f380: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                                    ^
 ffff88816912f400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff88816912f480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
==================================================================

Fixes: c0ea57608b ("blktrace: remove debugfs file dentries from struct blk_trace")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220228034354.4047385-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28 06:36:33 -07:00
Christophe Leroy
c5229a0bd4 tracing: Fix selftest config check for function graph start up test
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is required to test
direct tramp.

Link: https://lkml.kernel.org/r/bdc7e594e13b0891c1d61bc8d56c94b1890eaed7.1640017960.git.christophe.leroy@csgroup.eu

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25 21:05:29 -05:00
Daniel Bristot de Oliveira
dd990352f0 tracing/osnoise: Make osnoise_main to sleep for microseconds
osnoise's runtime and period are in the microseconds scale, but it is
currently sleeping in the millisecond's scale. This behavior roots in the
usage of hwlat as the skeleton for osnoise.

Make osnoise to sleep in the microseconds scale. Also, move the sleep to
a specialized function.

Link: https://lkml.kernel.org/r/302aa6c7bdf2d131719b22901905e9da122a11b2.1645197336.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25 12:07:01 -05:00
Nathan Chancellor
ab2f993c01 ftrace: Remove unused ftrace_startup_enable() stub
When building with clang + CONFIG_DYNAMIC_FTRACE=n + W=1, there is a
warning:

  kernel/trace/ftrace.c:7194:20: error: unused function 'ftrace_startup_enable' [-Werror,-Wunused-function]
  static inline void ftrace_startup_enable(int command) { }
                     ^
  1 error generated.

Clang warns on instances of static inline functions in .c files with W=1
after commit 6863f5643d ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

The ftrace_startup_enable() stub has been unused since
commit e1effa0144 ("ftrace: Annotate the ops operation on update"),
where its use outside of the CONFIG_DYNAMIC_TRACE section was replaced
by ftrace_startup_all().  Remove it to resolve the warning.

Link: https://lkml.kernel.org/r/20220214192847.488166-1-nathan@kernel.org

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25 12:07:01 -05:00
Sven Schnelle
7acf3a127b tracing: Ensure trace buffer is at least 4096 bytes large
Booting the kernel with 'trace_buf_size=1' give a warning at
boot during the ftrace selftests:

[    0.892809] Running postponed tracer tests:
[    0.892893] Testing tracer function:
[    0.901899] Callback from call_rcu_tasks_trace() invoked.
[    0.983829] Callback from call_rcu_tasks_rude() invoked.
[    1.072003] .. bad ring buffer .. corrupted trace buffer ..
[    1.091944] Callback from call_rcu_tasks() invoked.
[    1.097695] PASSED
[    1.097701] Testing dynamic ftrace: .. filter failed count=0 ..FAILED!
[    1.353474] ------------[ cut here ]------------
[    1.353478] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:1951 run_tracer_selftest+0x13c/0x1b0

Therefore enforce a minimum of 4096 bytes to make the selftest pass.

Link: https://lkml.kernel.org/r/20220214134456.1751749-1-svens@linux.ibm.com

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25 12:07:01 -05:00
Christophe Leroy
bc82c38a69 tracing: Uninline trace_trigger_soft_disabled() partly
On a powerpc32 build with CONFIG_CC_OPTIMISE_FOR_SIZE, the inline
keyword is not honored and trace_trigger_soft_disabled() appears
approx 50 times in vmlinux.

Adding -Winline to the build, the following message appears:

	./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline]

That function is rather big for an inlined function:

	c003df60 <trace_trigger_soft_disabled>:
	c003df60:	94 21 ff f0 	stwu    r1,-16(r1)
	c003df64:	7c 08 02 a6 	mflr    r0
	c003df68:	90 01 00 14 	stw     r0,20(r1)
	c003df6c:	bf c1 00 08 	stmw    r30,8(r1)
	c003df70:	83 e3 00 24 	lwz     r31,36(r3)
	c003df74:	73 e9 01 00 	andi.   r9,r31,256
	c003df78:	41 82 00 10 	beq     c003df88 <trace_trigger_soft_disabled+0x28>
	c003df7c:	38 60 00 00 	li      r3,0
	c003df80:	39 61 00 10 	addi    r11,r1,16
	c003df84:	4b fd 60 ac 	b       c0014030 <_rest32gpr_30_x>
	c003df88:	73 e9 00 80 	andi.   r9,r31,128
	c003df8c:	7c 7e 1b 78 	mr      r30,r3
	c003df90:	41 a2 00 14 	beq     c003dfa4 <trace_trigger_soft_disabled+0x44>
	c003df94:	38 c0 00 00 	li      r6,0
	c003df98:	38 a0 00 00 	li      r5,0
	c003df9c:	38 80 00 00 	li      r4,0
	c003dfa0:	48 05 c5 f1 	bl      c009a590 <event_triggers_call>
	c003dfa4:	73 e9 00 40 	andi.   r9,r31,64
	c003dfa8:	40 82 00 28 	bne     c003dfd0 <trace_trigger_soft_disabled+0x70>
	c003dfac:	73 ff 02 00 	andi.   r31,r31,512
	c003dfb0:	41 82 ff cc 	beq     c003df7c <trace_trigger_soft_disabled+0x1c>
	c003dfb4:	80 01 00 14 	lwz     r0,20(r1)
	c003dfb8:	83 e1 00 0c 	lwz     r31,12(r1)
	c003dfbc:	7f c3 f3 78 	mr      r3,r30
	c003dfc0:	83 c1 00 08 	lwz     r30,8(r1)
	c003dfc4:	7c 08 03 a6 	mtlr    r0
	c003dfc8:	38 21 00 10 	addi    r1,r1,16
	c003dfcc:	48 05 6f 6c 	b       c0094f38 <trace_event_ignore_this_pid>
	c003dfd0:	38 60 00 01 	li      r3,1
	c003dfd4:	4b ff ff ac 	b       c003df80 <trace_trigger_soft_disabled+0x20>

However it is located in a hot path so inlining it is important.
But forcing inlining of the entire function by using __always_inline
leads to increasing the text size by approx 20 kbytes.

Instead, split the fonction in two parts, one part with the likely
fast path, flagged __always_inline, and a second part out of line.

With this change, on a powerpc32 with CONFIG_CC_OPTIMISE_FOR_SIZE
vmlinux text increases by only 1,4 kbytes, which is partly
compensated by a decrease of vmlinux data by 7 kbytes.

On ppc64_defconfig which has CONFIG_CC_OPTIMISE_FOR_SPEED, this
change reduces vmlinux text by more than 30 kbytes.

Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25 12:07:01 -05:00
Steven Rostedt (Google)
b61edd5774 eprobes: Remove redundant event type information
Currently, the event probes save the type of the event they are attached
to when recording the event. For example:

  # echo 'e:switch sched/sched_switch prev_state=$prev_state prev_prio=$prev_prio next_pid=$next_pid next_prio=$next_prio' > dynamic_events
  # cat events/eprobes/switch/format

 name: switch
 ID: 1717
 format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned int __probe_type;        offset:8;       size:4; signed:0;
        field:u64 prev_state;   offset:12;      size:8; signed:0;
        field:u64 prev_prio;    offset:20;      size:8; signed:0;
        field:u64 next_pid;     offset:28;      size:8; signed:0;
        field:u64 next_prio;    offset:36;      size:8; signed:0;

 print fmt: "(%u) prev_state=0x%Lx prev_prio=0x%Lx next_pid=0x%Lx next_prio=0x%Lx", REC->__probe_type, REC->prev_state, REC->prev_prio, REC->next_pid, REC->next_prio

The __probe_type adds 4 bytes to every event.

One of the reasons for creating eprobes is to limit what is traced in an
event to be able to limit what is written into the ring buffer. Having
this redundant 4 bytes to every event takes away from this.

The event that is recorded can be retrieved from the event probe itself,
that is available when the trace is happening. For user space tools, it
could simply read the dynamic_event file to find the event they are for.
So there is really no reason to write this information into the ring
buffer for every event.

Link: https://lkml.kernel.org/r/20220218190057.2f5a19a8@gandalf.local.home

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25 12:07:01 -05:00
Steven Rostedt (Google)
302e9edd54 tracing: Have traceon and traceoff trigger honor the instance
If a trigger is set on an event to disable or enable tracing within an
instance, then tracing should be disabled or enabled in the instance and
not at the top level, which is confusing to users.

Link: https://lkml.kernel.org/r/20220223223837.14f94ec3@rorschach.local.home

Cc: stable@vger.kernel.org
Fixes: ae63b31e4d ("tracing: Separate out trace events from global variables")
Tested-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25 12:06:45 -05:00
Arnd Bergmann
967747bbc0 uaccess: remove CONFIG_SET_FS
There are no remaining callers of set_fs(), so CONFIG_SET_FS
can be removed globally, along with the thread_info field and
any references to it.

This turns access_ok() into a cheaper check against TASK_SIZE_MAX.

As CONFIG_SET_FS is now gone, drop all remaining references to
set_fs()/get_fs(), mm_segment_t, user_addr_max() and uaccess_kernel().

Acked-by: Sam Ravnborg <sam@ravnborg.org> # for sparc32 changes
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich@synopsys.com> # for arc changes
Acked-by: Stafford Horne <shorne@gmail.com> # [openrisc, asm-generic]
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:06 +01:00
Daniel Bristot de Oliveira
ce33c845b0 tracing: Dump stacktrace trigger to the corresponding instance
The stacktrace event trigger is not dumping the stacktrace to the instance
where it was enabled, but to the global "instance."

Use the private_data, pointing to the trigger file, to figure out the
corresponding trace instance, and use it in the trigger action, like
snapshot_trigger does.

Link: https://lkml.kernel.org/r/afbb0b4f18ba92c276865bc97204d438473f4ebc.1645396236.git.bristot@kernel.org

Cc: stable@vger.kernel.org
Fixes: ae63b31e4d ("tracing: Separate out trace events from global variables")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-24 22:07:03 -05:00
Steven Rostedt (Google)
9f8e5aee93 tracing: Fix allocation of last_cmd in last_cmd_set()
The strncat() used in last_cmd_set() includes the nul byte of length of
the string being copied in, when it should only hold the size of the
string being copied (not the nul byte). Change it to subtract the length
of the allocated space and the nul byte to pass that into the strncat().

Also, assign "len" instead of initializing it to zero and its first update
is to do a "+=".

Link: https://lore.kernel.org/all/202202140628.fj6e4w4v-lkp@intel.com/

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-23 23:18:54 -05:00
Christoph Hellwig
73bd66d9c8 scsi: block: Remove REQ_OP_WRITE_SAME support
No more users of REQ_OP_WRITE_SAME or drivers implementing it are left,
so remove the infrastructure.

[mkp: fold in and tweak sysfs reporting fix]

Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:08 -05:00
Jakub Kicinski
6b5567b1b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 11:44:20 -08:00
Beau Belgrave
2467cda1b5 user_events: Validate user payloads for size and null termination
Add validation to ensure data is at or greater than the min size for the
fields of the event. If a dynamic array is used and is a type of char,
ensure null termination of the array exists.

Link: https://lkml.kernel.org/r/20220118204326.2169-7-beaub@linux.microsoft.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:37:38 -05:00
Beau Belgrave
0279400ad3 user_events: Optimize writing events by only copying data once
Pass iterator through to probes to allow copying data directly to the
probe buffers instead of taking multiple copies. Enables eBPF user and
raw iterator types out to programs for no-copy scenarios.

Link: https://lkml.kernel.org/r/20220118204326.2169-6-beaub@linux.microsoft.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:37:31 -05:00
Beau Belgrave
3207d0459e user_events: Add basic perf and eBPF support
Adds support to write out user_event data to perf_probe/perf files as
well as to any attached eBPF program.

Link: https://lkml.kernel.org/r/20220118204326.2169-5-beaub@linux.microsoft.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:37:22 -05:00
Beau Belgrave
9aed4e157d user_events: Handle matching arguments from dyn_events
Ensures that when dynamic events requests a match with arguments that
they match what is in the user_event.

Link: https://lkml.kernel.org/r/20220118204326.2169-4-beaub@linux.microsoft.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:37:16 -05:00
Beau Belgrave
aa3b2b4c66 user_events: Add print_fmt generation support for basic types
Addes print_fmt format generation for basic types that are supported for
user processes. Only supports sizes that are the same on 32 and 64 bit.

Link: https://lkml.kernel.org/r/20220118204326.2169-3-beaub@linux.microsoft.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:37:08 -05:00
Beau Belgrave
7f5a08c79d user_events: Add minimal support for trace_event into ftrace
Minimal support for interacting with dynamic events, trace_event and
ftrace. Core outline of flow between user process, ioctl and trace_event
APIs.

User mode processes that wish to use trace events to get data into
ftrace, perf, eBPF, etc are limited to uprobes today. The user events
features enables an ABI for user mode processes to create and write to
trace events that are isolated from kernel level trace events. This
enables a faster path for tracing from user mode data as well as opens
managed code to participate in trace events, where stub locations are
dynamic.

User processes often want to trace only when it's useful. To enable this
a set of pages are mapped into the user process space that indicate the
current state of the user events that have been registered. User
processes can check if their event is hooked to a trace/probe, and if it
is, emit the event data out via the write() syscall.

Two new files are introduced into tracefs to accomplish this:
user_events_status - This file is mmap'd into participating user mode
processes to indicate event status.

user_events_data - This file is opened and register/delete ioctl's are
issued to create/open/delete trace events that can be used for tracing.

The typical scenario is on process start to mmap user_events_status. Processes
then register the events they plan to use via the REG ioctl. The ioctl reads
and updates the passed in user_reg struct. The status_index of the struct is
used to know the byte in the status page to check for that event. The
write_index of the struct is used to describe that event when writing out to
the fd that was used for the ioctl call. The data must always include this
index first when writing out data for an event. Data can be written either by
write() or by writev().

For example, in memory:
int index;
char data[];

Psuedo code example of typical usage:
struct user_reg reg;

int page_fd = open("user_events_status", O_RDWR);
char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0);
close(page_fd);

int data_fd = open("user_events_data", O_RDWR);

reg.size = sizeof(reg);
reg.name_args = (__u64)"test";

ioctl(data_fd, DIAG_IOCSREG, &reg);
int status_id = reg.status_index;
int write_id = reg.write_index;

struct iovec io[2];
io[0].iov_base = &write_id;
io[0].iov_len = sizeof(write_id);
io[1].iov_base = payload;
io[1].iov_len = sizeof(payload);

if (page_data[status_id])
	writev(data_fd, io, 2);

User events are also exposed via the dynamic_events tracefs file for
both create and delete. Current status is exposed via the user_events_status
tracefs file.

Simple example to register a user event via dynamic_events:
	echo u:test >> dynamic_events
	cat dynamic_events
	u:test

If an event is hooked to a probe, the probe hooked shows up:
	echo 1 > events/user_events/test/enable
	cat user_events_status
	1:test # Used by ftrace

	Active: 1
	Busy: 1
	Max: 4096

If an event is not hooked to a probe, no probe status shows up:
	echo 0 > events/user_events/test/enable
	cat user_events_status
	1:test

	Active: 1
	Busy: 0
	Max: 4096

Users can describe the trace event format via the following format:
	name[:FLAG1[,FLAG2...] [field1[;field2...]]

Each field has the following format:
	type name

Example for char array with a size of 20 named msg:
	echo 'u:detailed char[20] msg' >> dynamic_events
	cat dynamic_events
	u:detailed char[20] msg

Data offsets are based on the data written out via write() and will be
updated to reflect the correct offset in the trace_event fields. For dynamic
data it is recommended to use the new __rel_loc data type. This type will be
the same as __data_loc, but the offset is relative to this entry. This allows
user_events to not worry about what common fields are being inserted before
the data.

The above format is valid for both the ioctl and the dynamic_events file.

Link: https://lkml.kernel.org/r/20220118204326.2169-2-beaub@linux.microsoft.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:36:46 -05:00
Steven Rostedt (Google)
55bc8384d3 tracing: Save both wakee and current on wakeup events
Use the sched_switch function to save both the wakee and the waker comms
in the saved cmdlines list when sched_wakeup is done.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:27:41 -05:00
Tom Zanussi
27c888da98 tracing: Remove size restriction on synthetic event cmd error logging
Currently, synthetic event command error strings are restricted to a
length of MAX_FILTER_STR_VAL (256), which is too short for some
commands already seen in the wild (with cmd strings longer than that
showing up truncated in err_log).

Remove the restriction so that no synthetic event command error string
is ever truncated.

Link: https://lkml.kernel.org/r/0376692396a81d0b795127c66ea92ca5bf60f481.1643399022.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:27:40 -05:00
Tom Zanussi
edfeed318d tracing: Remove size restriction on hist trigger cmd error logging
Currently, hist trigger command error strings are restricted to a
length of MAX_FILTER_STR_VAL (256), which is too short for some
commands already seen in the wild (with cmd strings longer than that
showing up truncated in err_log).

Remove the restriction so that no hist trigger command error string is
ever truncated.

Link: https://lkml.kernel.org/r/0f9d46407222eaf6632cd3b417bc50a11f401b71.1643399022.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:27:33 -05:00
Tom Zanussi
1581a884b7 tracing: Remove size restriction on tracing_log_err cmd strings
Currently, tracing_log_err.cmd strings are restricted to a length of
MAX_FILTER_STR_VAL (256), which is too short for some commands already
seen in the wild (with cmd strings longer than that showing up
truncated).

Remove the restriction so that no command string is ever truncated.

Link: https://lkml.kernel.org/r/ca965f23256b350ebd94b3dc1a319f28e8267f5f.1643319703.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:27:17 -05:00
Jakub Kicinski
1127170d45 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-02-09

We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).

The main changes are:

1) Add custom BPF allocator for JITs that pack multiple programs into a huge
   page to reduce iTLB pressure, from Song Liu.

2) Add __user tagging support in vmlinux BTF and utilize it from BPF
   verifier when generating loads, from Yonghong Song.

3) Add per-socket fast path check guarding from cgroup/BPF overhead when
   used by only some sockets, from Pavel Begunkov.

4) Continued libbpf deprecation work of APIs/features and removal of their
   usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
   and various others.

5) Improve BPF instruction set documentation by adding byte swap
   instructions and cleaning up load/store section, from Christoph Hellwig.

6) Switch BPF preload infra to light skeleton and remove libbpf dependency
   from it, from Alexei Starovoitov.

7) Fix architecture-agnostic macros in libbpf for accessing syscall
   arguments from BPF progs for non-x86 architectures,
   from Ilya Leoshkevich.

8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
   of 16-bit field with anonymous zero padding, from Jakub Sitnicki.

9) Add new bpf_copy_from_user_task() helper to read memory from a different
   task than current. Add ability to create sleepable BPF iterator progs,
   from Kenny Yu.

10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
    utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.

11) Generate temporary netns names for BPF selftests to avoid naming
    collisions, from Hangbin Liu.

12) Implement bpf_core_types_are_compat() with limited recursion for
    in-kernel usage, from Matteo Croce.

13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
    to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.

14) Misc minor fixes to libbpf and selftests from various folks.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
  bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
  libbpf: Fix compilation warning due to mismatched printf format
  selftests/bpf: Test BPF_KPROBE_SYSCALL macro
  libbpf: Add BPF_KPROBE_SYSCALL macro
  libbpf: Fix accessing the first syscall argument on s390
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on powerpc
  selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
  bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
  bpf: Fix leftover header->pages in sparc and powerpc code.
  libbpf: Fix signedness bug in btf_dump_array_data()
  selftests/bpf: Do not export subtest as standalone test
  bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
  ...
====================

Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09 18:40:56 -08:00
JaeSang Yoo
3203ce39ac tracing: Fix tp_printk option related with tp_printk_stop_on_boot
The kernel parameter "tp_printk_stop_on_boot" starts with "tp_printk" which is
the same as another kernel parameter "tp_printk". If "tp_printk" setup is
called before the "tp_printk_stop_on_boot", it will override the latter
and keep it from being set.

This is similar to other kernel parameter issues, such as:
  Commit 745a600cf1 ("um: console: Ignore console= option")
or init/do_mounts.c:45 (setup function of "ro" kernel param)

Fix it by checking for a "_" right after the "tp_printk" and if that
exists do not process the parameter.

Link: https://lkml.kernel.org/r/20220208195421.969326-1-jsyoo5b@gmail.com

Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com>
[ Fixed up change log and added space after if condition ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-08 15:47:00 -05:00
Jakub Kicinski
c59400a68c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 17:36:16 -08:00
Tom Zanussi
67ab5eb71b tracing: Don't inc err_log entry count if entry allocation fails
tr->n_err_log_entries should only be increased if entry allocation
succeeds.

Doing it when it fails won't cause any problems other than wasting an
entry, but should be fixed anyway.

Link: https://lkml.kernel.org/r/cad1ab28f75968db0f466925e7cba5970cec6c29.1643319703.git.zanussi@kernel.org

Cc: stable@vger.kernel.org
Fixes: 2f754e771b ("tracing: Don't inc err_log entry count if entry allocation fails")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:49 -05:00
Tom Zanussi
097f1eefed tracing: Propagate is_signed to expression
During expression parsing, a new expression field is created which
should inherit the properties of the operands, such as size and
is_signed.

is_signed propagation was missing, causing spurious errors with signed
operands.  Add it in parse_expr() and parse_unary() to fix the problem.

Link: https://lkml.kernel.org/r/f4dac08742fd7a0920bf80a73c6c44042f5eaa40.1643319703.git.zanussi@kernel.org

Cc: stable@vger.kernel.org
Fixes: 100719dcef ("tracing: Add simple expression support to hist triggers")
Reported-by: Yordan Karadzhov <ykaradzhov@vmware.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215513
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:49 -05:00
Tom Zanussi
b59f2f2b86 tracing: Fix smatch warning for do while check in event_hist_trigger_parse()
The patch ec5ce0987541: "tracing: Allow whitespace to surround hist
trigger filter" from Jan 15, 2018, leads to the following Smatch
static checker warning:

    kernel/trace/trace_events_hist.c:6199 event_hist_trigger_parse()
    warn: 'p' can't be NULL.

Since p is always checked for a NULL value at the top of loop and
nothing in the rest of the loop will set it to NULL, the warning
is correct and might as well be 1 to silence the warning.

Link: https://lkml.kernel.org/r/a1d4c79766c0cf61e20438dc35244d216633fef6.1643319703.git.zanussi@kernel.org

Fixes: ec5ce09875 ("tracing: Allow whitespace to surround hist trigger filter")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:48 -05:00
Tom Zanussi
798a5b6c19 tracing: Fix smatch warning for null glob in event_hist_trigger_parse()
The recent rename of event_hist_trigger_parse() caused smatch
re-evaluation of trace_events_hist.c and as a result an old warning
was found:

    kernel/trace/trace_events_hist.c:6174 event_hist_trigger_parse()
    error: we previously assumed 'glob' could be null (see line 6166)

glob should never be null (and apparently smatch can also figure that
out and skip the warning when using the cross-function DB (but which
can't be used with a 0day build as it takes too much time to
generate)).

Nonetheless for clarity, remove the test but add a WARN_ON() in case
the code ever changes.

Link: https://lkml.kernel.org/r/96925e5c1f116654ada7ea0613d930b1266b5e1c.1643319703.git.zanussi@kernel.org

Fixes: f404da6e1d ("tracing: Add 'last error' error facility for hist triggers")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:47 -05:00
Xiaoke Wang
e629e7b525 tracing/histogram: Fix a potential memory leak for kstrdup()
kfree() is missing on an error path to free the memory allocated by
kstrdup():

  p = param = kstrdup(data->params[i], GFP_KERNEL);

So it is better to free it via kfree(p).

Link: https://lkml.kernel.org/r/tencent_C52895FD37802832A3E5B272D05008866F0A@qq.com

Cc: stable@vger.kernel.org
Fixes: d380dcde9a ("tracing: Fix now invalid var_ref_vals assumption in trace action")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:44 -05:00
Steven Rostedt (Google)
4ed308c445 ftrace: Have architectures opt-in for mcount build time sorting
First S390 complained that the sorting of the mcount sections at build
time caused the kernel to crash on their architecture. Now PowerPC is
complaining about it too. And also ARM64 appears to be having issues.

It may be necessary to also update the relocation table for the values
in the mcount table. Not only do we have to sort the table, but also
update the relocations that may be applied to the items in the table.

If the system is not relocatable, then it is fine to sort, but if it is,
some architectures may have issues (although x86 does not as it shifts all
addresses the same).

Add a HAVE_BUILDTIME_MCOUNT_SORT that an architecture can set to say it is
safe to do the sorting at build time.

Also update the config to compile in build time sorting in the sorttable
code in scripts/ to depend on CONFIG_BUILDTIME_MCOUNT_SORT.

Link: https://lore.kernel.org/all/944D10DA-8200-4BA9-8D0A-3BED9AA99F82@linux.ibm.com/
Link: https://lkml.kernel.org/r/20220127153821.3bc1ac6e@gandalf.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Yinan Liu <yinan@linux.alibaba.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: 72b3942a17 ("scripts: ftrace - move the sort-processing in ftrace_init")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:44 -05:00
Jakub Kicinski
72d044e4bf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-27 12:54:16 -08:00
Kenny Yu
376040e473 bpf: Add bpf_copy_from_user_task() helper
This adds a helper for bpf programs to read the memory of other
tasks.

As an example use case at Meta, we are using a bpf task iterator program
and this new helper to print C++ async stack traces for all threads of
a given process.

Signed-off-by: Kenny Yu <kennyyu@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220124185403.468466-3-kennyyu@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-24 19:59:27 -08:00
Jakub Kicinski
caaba96131 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-01-24

We've added 80 non-merge commits during the last 14 day(s) which contain
a total of 128 files changed, 4990 insertions(+), 895 deletions(-).

The main changes are:

1) Add XDP multi-buffer support and implement it for the mvneta driver,
   from Lorenzo Bianconi, Eelco Chaudron and Toke Høiland-Jørgensen.

2) Add unstable conntrack lookup helpers for BPF by using the BPF kfunc
   infra, from Kumar Kartikeya Dwivedi.

3) Extend BPF cgroup programs to export custom ret value to userspace via
   two helpers bpf_get_retval() and bpf_set_retval(), from YiFei Zhu.

4) Add support for AF_UNIX iterator batching, from Kuniyuki Iwashima.

5) Complete missing UAPI BPF helper description and change bpf_doc.py script
   to enforce consistent & complete helper documentation, from Usama Arif.

6) Deprecate libbpf's legacy BPF map definitions and streamline XDP APIs to
   follow tc-based APIs, from Andrii Nakryiko.

7) Support BPF_PROG_QUERY for BPF programs attached to sockmap, from Di Zhu.

8) Deprecate libbpf's bpf_map__def() API and replace users with proper getters
   and setters, from Christy Lee.

9) Extend libbpf's btf__add_btf() with an additional hashmap for strings to
   reduce overhead, from Kui-Feng Lee.

10) Fix bpftool and libbpf error handling related to libbpf's hashmap__new()
    utility function, from Mauricio Vásquez.

11) Add support to BTF program names in bpftool's program dump, from Raman Shukhau.

12) Fix resolve_btfids build to pick up host flags, from Connor O'Brien.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (80 commits)
  selftests, bpf: Do not yet switch to new libbpf XDP APIs
  selftests, xsk: Fix rx_full stats test
  bpf: Fix flexible_array.cocci warnings
  xdp: disable XDP_REDIRECT for xdp frags
  bpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags
  bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
  net: xdp: introduce bpf_xdp_pointer utility routine
  bpf: generalise tail call map compatibility check
  libbpf: Add SEC name for xdp frags programs
  bpf: selftests: update xdp_adjust_tail selftest to include xdp frags
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
  bpf: introduce frags support to bpf_prog_test_run_xdp()
  bpf: move user_size out of bpf_test_init
  bpf: add frags support to xdp copy helpers
  bpf: add frags support to the bpf_xdp_adjust_tail() API
  bpf: introduce bpf_xdp_get_buff_len helper
  net: mvneta: enable jumbo frames if the loaded XDP program support frags
  bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program
  net: mvneta: add frags support to XDP_TX
  xdp: add frags support to xdp_return_{buff/frame}
  ...
====================

Link: https://lore.kernel.org/r/20220124221235.18993-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-24 15:42:29 -08:00
Linus Torvalds
67bfce0e01 ftrace: Fix s390 breakage from sorting mcount tables
The latest merge of the tracing tree sorts the mcount table at build time.
 But s390 appears to do things differently (like always) and replaces the
 sorted table back to the original unsorted one. As the ftrace algorithm
 depends on it being sorted, bad things happen when it is not, and s390
 experienced those bad things.
 
 Add a new config to tell the boot if the mcount table is sorted or not,
 and allow s390 to opt out of it.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYezl+RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qs5wAQCO2K1uB+XI4rown8zKLrGxOiFIXChi
 Ds5GKJwYlcOnwQEAqKK3mAodmMy6ydbeY1OY1IDdTkOCAd752FDIe1Qozws=
 =aF8W
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
 "Fix s390 breakage from sorting mcount tables.

  The latest merge of the tracing tree sorts the mcount table at build
  time. But s390 appears to do things differently (like always) and
  replaces the sorted table back to the original unsorted one. As the
  ftrace algorithm depends on it being sorted, bad things happen when it
  is not, and s390 experienced those bad things.

  Add a new config to tell the boot if the mcount table is sorted or
  not, and allow s390 to opt out of it"

* tag 'trace-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix assuming build time sort works for s390
2022-01-23 08:07:02 +02:00
Steven Rostedt (Google)
6b9b641370 ftrace: Fix assuming build time sort works for s390
To speed up the boot process, as mcount_loc needs to be sorted for ftrace
to work properly, sorting it at build time is more efficient than boot up
and can save milliseconds of time. Unfortunately, this change broke s390
as it will modify the mcount_loc location after the sorting takes place
and will put back the unsorted locations. Since the sorting is skipped at
boot up if it is believed that it was sorted at run time, ftrace can crash
as its algorithms are dependent on the list being sorted.

Add a new config BUILDTIME_MCOUNT_SORT that is set when
BUILDTIME_TABLE_SORT but not if S390 is set. Use this config to determine
if sorting should take place at boot up.

Link: https://lore.kernel.org/all/yt9dee51ctfn.fsf@linux.ibm.com/

Fixes: 72b3942a17 ("scripts: ftrace - move the sort-processing in ftrace_init")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-23 00:10:09 -05:00
Eelco Chaudron
d99173027d bpf: add frags support to xdp copy helpers
This patch adds support for frags for the following helpers:
  - bpf_xdp_output()
  - bpf_perf_event_output()

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/340b4a99cdc24337b40eaf8bb597f9f9e7b0373e.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-21 14:14:02 -08:00
Linus Torvalds
4d66020dce Tracing updates for 5.17:
New:
 
 - The Real Time Linux Analysis (RTLA) tool is added to the tools directory.
 
 - Can safely filter on user space pointers with: field.ustring ~ "match-string"
 
 - eprobes can now be filtered like any other event.
 
 - trace_marker(_raw) now uses stream_open() to allow multiple threads to safely
   write to it. Note, this could possibly break existing user space, but we will
   not know until we hear about it, and then can revert the change if need be.
 
 - New field in events to display when bottom halfs are disabled.
 
 - Sorting of the ftrace functions are now done at compile time instead of
   at bootup.
 
 Infrastructure changes to support future efforts:
 
 - Added __rel_loc type for trace events. Similar to __data_loc but the offset
   to the dynamic data is based off of the location of the descriptor and not
   the beginning of the event. Needed for user defined events.
 
 - Some simplification of event trigger code.
 
 - Make synthetic events process its callback better to not hinder other
   event callbacks that are registered. Needed for user defined events.
 
 And other small fixes and clean ups.
 -
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYeGvcxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrZtAP9ICjJxX54MTErElhhUL/NFLV7wqhJi
 OIAgmp6jGVRqPAD+JxQtBnGH+3XMd71ioQkTfQ1rp+jBz2ERBj2DmELUAg0=
 =zmda
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New:

   - The Real Time Linux Analysis (RTLA) tool is added to the tools
     directory.

   - Can safely filter on user space pointers with: field.ustring ~
     "match-string"

   - eprobes can now be filtered like any other event.

   - trace_marker(_raw) now uses stream_open() to allow multiple threads
     to safely write to it. Note, this could possibly break existing
     user space, but we will not know until we hear about it, and then
     can revert the change if need be.

   - New field in events to display when bottom halfs are disabled.

   - Sorting of the ftrace functions are now done at compile time
     instead of at bootup.

  Infrastructure changes to support future efforts:

   - Added __rel_loc type for trace events. Similar to __data_loc but
     the offset to the dynamic data is based off of the location of the
     descriptor and not the beginning of the event. Needed for user
     defined events.

   - Some simplification of event trigger code.

   - Make synthetic events process its callback better to not hinder
     other event callbacks that are registered. Needed for user defined
     events.

  And other small fixes and cleanups"

* tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
  tracing: Add ustring operation to filtering string pointers
  rtla: Add rtla timerlat hist documentation
  rtla: Add rtla timerlat top documentation
  rtla: Add rtla timerlat documentation
  rtla: Add rtla osnoise hist documentation
  rtla: Add rtla osnoise top documentation
  rtla: Add rtla osnoise man page
  rtla: Add Documentation
  rtla/timerlat: Add timerlat hist mode
  rtla: Add timerlat tool and timelart top mode
  rtla/osnoise: Add the hist mode
  rtla/osnoise: Add osnoise top mode
  rtla: Add osnoise tool
  rtla: Helper functions for rtla
  rtla: Real-Time Linux Analysis tool
  tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails
  tracing: Remove duplicate warnings when calling trace_create_file()
  tracing/kprobes: 'nmissed' not showed correctly for kretprobe
  tracing: Add test for user space strings when filtering on string pointers
  tracing: Have syscall trace events use trace_event_buffer_lock_reserve()
  ...
2022-01-16 10:15:32 +02:00
Linus Torvalds
f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Cai Huoqing
ff78f6679d trace/hwlat: make use of the helper function kthread_run_on_cpu()
Replace kthread_create_on_cpu/wake_up_process() with kthread_run_on_cpu()
to simplify the code.

Link: https://lkml.kernel.org/r/20211022025711.3673-7-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Cai Huoqing
11e4e3523d trace/osnoise: make use of the helper function kthread_run_on_cpu()
Replace kthread_create_on_cpu/wake_up_process() with kthread_run_on_cpu()
to simplify the code.

Link: https://lkml.kernel.org/r/20211022025711.3673-6-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Cai Huoqing
64ed3a049e ring-buffer: make use of the helper function kthread_run_on_cpu()
Replace kthread_create/kthread_bind/wake_up_process() with
kthread_run_on_cpu() to simplify the code.

Link: https://lkml.kernel.org/r/20211022025711.3673-4-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:23 +02:00
Steven Rostedt
f37c3bbc63 tracing: Add ustring operation to filtering string pointers
Since referencing user space pointers is special, if the user wants to
filter on a field that is a pointer to user space, then they need to
specify it.

Add a ".ustring" attribute to the field name for filters to state that the
field is pointing to user space such that the kernel can take the
appropriate action to read that pointer.

Link: https://lore.kernel.org/all/yt9d8rvmt2jq.fsf@linux.ibm.com/

Fixes: 77360f9bbc ("tracing: Add test for user space strings when filtering on string pointers")
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-14 03:42:24 -05:00
Nikita Yushchenko
0878355b51 tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails
If start_per_cpu_kthreads() called from osnoise_workload_start() returns
error, event hooks are left in broken state: unhook_irq_events() called
but unhook_thread_events() and unhook_softirq_events() not called, and
trace_osnoise_callback_enabled flag not cleared.

On the next tracer enable, hooks get not installed due to
trace_osnoise_callback_enabled flag.

And on the further tracer disable an attempt to remove non-installed
hooks happened, hitting a WARN_ON_ONCE() in tracepoint_remove_func().

Fix the error path by adding the missing part of cleanup.
While at this, introduce osnoise_unhook_events() to avoid code
duplication between this error path and normal tracer disable.

Link: https://lkml.kernel.org/r/20220109153459.3701773-1-nikita.yushchenko@virtuozzo.com

Cc: stable@vger.kernel.org
Fixes: bce29ac9ce ("trace: Add osnoise tracer")
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 17:02:42 -05:00
Yuntao Wang
6e1b4bd191 tracing: Remove duplicate warnings when calling trace_create_file()
Since the same warning message is already printed in the
trace_create_file() function, there is no need to print it again.

Link: https://lkml.kernel.org/r/20220109162232.361747-1-ytcoode@gmail.com

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 17:02:42 -05:00
Xiangyang Zhang
dfea08a211 tracing/kprobes: 'nmissed' not showed correctly for kretprobe
The 'nmissed' column of the 'kprobe_profile' file for kretprobe is
not showed correctly, kretprobe can be skipped by two reasons,
shortage of kretprobe_instance which is counted by tk->rp.nmissed,
and kprobe itself is missed by some reason, so to show the sum.

Link: https://lkml.kernel.org/r/20220107150242.5019-1-xyz.sun.ok@gmail.com

Cc: stable@vger.kernel.org
Fixes: 4a846b443b ("tracing/kprobes: Cleanup kprobe tracer code")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Xiangyang Zhang <xyz.sun.ok@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 17:02:42 -05:00
Steven Rostedt
77360f9bbc tracing: Add test for user space strings when filtering on string pointers
Pingfan reported that the following causes a fault:

  echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter
  echo 1 > events/syscalls/sys_enter_at/enable

The reason is that trace event filter treats the user space pointer
defined by "filename" as a normal pointer to compare against the "cpu"
string. The following bug happened:

 kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0001) - permissions violation
 PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867
 Oops: 0001 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:strlen+0x0/0x20
 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11
       48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8
       48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
 RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246
 RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000
 RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60
 RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000
 R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380
 R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0
 FS:  00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  filter_pred_pchar+0x18/0x40
  filter_match_preds+0x31/0x70
  ftrace_syscall_enter+0x27a/0x2c0
  syscall_trace_enter.constprop.0+0x1aa/0x1d0
  do_syscall_64+0x16/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fa861d88664

The above happened because the kernel tried to access user space directly
and triggered a "supervisor read access in kernel mode" fault. Worse yet,
the memory could not even be loaded yet, and a SEGFAULT could happen as
well. This could be true for kernel space accessing as well.

To be even more robust, test both kernel and user space strings. If the
string fails to read, then simply have the filter fail.

Note, TASK_SIZE is used to determine if the pointer is user or kernel space
and the appropriate strncpy_from_kernel/user_nofault() function is used to
copy the memory. For some architectures, the compare to TASK_SIZE may always
pick user space or kernel space. If it gets it wrong, the only thing is that
the filter will fail to match. In the future, this needs to be fixed to have
the event denote which should be used. But failing a filter is much better
than panicing the machine, and that can be solved later.

Link: https://lore.kernel.org/all/20220107044951.22080-1-kernelfans@gmail.com/
Link: https://lkml.kernel.org/r/20220110115532.536088fd@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Fixes: 87a342f5db ("tracing/filters: Support filtering for char * strings")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 17:00:45 -05:00
Steven Rostedt
3e2a56e6f6 tracing: Have syscall trace events use trace_event_buffer_lock_reserve()
Currently, the syscall trace events call trace_buffer_lock_reserve()
directly, which means that it misses out on some of the filtering
optimizations provided by the helper function
trace_event_buffer_lock_reserve(). Have the syscall trace events call that
instead, as it was missed when adding the update to use the temp buffer
when filtering.

Link: https://lkml.kernel.org/r/20220107225839.823118570@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 0fc1b09ff1 ("tracing: Use temp buffer when filtering events")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 16:23:05 -05:00
Steven Rostedt (VMware)
8147dc78e6 ftrace: Add test to make sure compiled time sorts work
Now that ftrace function pointers are sorted at compile time, add a test
that makes sure they are sorted at run time. This test is only run if it is
configured in.

Link: https://lkml.kernel.org/r/20211206151858.4d21a24d@gandalf.local.home

Cc: Yinan Liu <yinan@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2022-01-13 16:23:05 -05:00
Yinan Liu
72b3942a17 scripts: ftrace - move the sort-processing in ftrace_init
When the kernel starts, the initialization of ftrace takes
up a portion of the time (approximately 6~8ms) to sort mcount
addresses. We can save this time by moving mcount-sorting to
compile time.

Link: https://lkml.kernel.org/r/20211212113358.34208-2-yinan@linux.alibaba.com

Signed-off-by: Yinan Liu <yinan@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 16:23:04 -05:00
Xiaoke Wang
1c1857d400 tracing/probes: check the return value of kstrndup() for pbuf
kstrndup() is a memory allocation-related function, it returns NULL when
some internal memory errors happen. It is better to check the return
value of it so to catch the memory error in time.

Link: https://lkml.kernel.org/r/tencent_4D6E270731456EB88712ED7F13883C334906@qq.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: a42e3c4de9 ("tracing/probe: Add immediate string parameter support")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 16:23:04 -05:00
Xiaoke Wang
8c72242455 tracing/uprobes: Check the return value of kstrdup() for tu->filename
kstrdup() returns NULL when some internal memory errors happen, it is
better to check the return value of it so to catch the memory error in
time.

Link: https://lkml.kernel.org/r/tencent_3C2E330722056D7891D2C83F29C802734B06@qq.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 33ea4b2427 ("perf/core: Implement the 'perf_uprobe' PMU")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 16:23:04 -05:00
Sebastian Andrzej Siewior
289e7b0f7e tracing: Account bottom half disabled sections.
Disabling only bottom halves via local_bh_disable() disables also
preemption but this remains invisible to tracing. On a CONFIG_PREEMPT
kernel one might wonder why there is no scheduling happening despite the
N flag in the trace. The reason might be the a rcu_read_lock_bh()
section.

Add a 'b' to the tracing output if in task context with disabled bottom
halves.

Link: https://lkml.kernel.org/r/YbcbtdtC/bjCKo57@linutronix.de

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-13 16:23:04 -05:00
Linus Torvalds
d3c8108035 for-5.17/block-2022-01-11
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8DAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnhRD/wMAjsNO65PCA+o/bPpVi4ulx9EejAzrJnB
 5vHFvREAoOOGKvRpYGe4w3TcKyW+zPb+GtlXFjPfK+wuVzWhrQtW/+vkjKlBt8wK
 o7rzeMwTKJ9ZGvYaaQpp1yC0WURBB3qnCRQhb8dOQzhJgEXinhIOznZsut4mniLv
 fTqcDmKAb/+G6K6CQCCqnH0I/+OJZyUeSFo1kk2i4ZqCBepQpBkOL6H2rBOtGxUg
 bt1jiGHbbhCRYEE3u2kV0HP10qAChNaMQC705jV4Qpf4+3EntSxs+6nSb74dvMkX
 3+Wmp8Ctq6lpPnDL1nrAFGz3jZnB0Y+GdgOclQn3ViQd1FCXZzuYWQ3fTaBfURCZ
 /RE5nc047SqpwCFLOynM++OkaeQZ1zSxeyoFTtzDaPF4tLuaX3JHswvTzNGPw8SN
 BnexseNnNBCjJliZSEE7fOkjJDcev2dvRxPtI8/wkF4lHUgETc5IW563C53xo/Tx
 32yFjZwCVIpNWk21su/0H3iEq80wZ7PnriiN/E3JA6XbnevlRPu0NPMb0D258GCm
 yCcdPVDNZsQCB8hluqZcu0g6LSgZRo90Yg1oqKqEpAllJJMBaEAPPPuUIJh998mo
 iKGxZzgr7d9jrbGJTInp0F8b3B3/oV/hxgzy0Hu/mHP3AsnaAk9o/oEQZ7rX4Khr
 6biloqkIMA==
 =RWnJ
 -----END PGP SIGNATURE-----

Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Unify where the struct request handling code is located in the blk-mq
   code (Christoph)

 - Header cleanups (Christoph)

 - Clean up the io_context handling code (Christoph, me)

 - Get rid of ->rq_disk in struct request (Christoph)

 - Error handling fix for add_disk() (Christoph)

 - request allocation cleanusp (Christoph)

 - Documentation updates (Eric, Matthew)

 - Remove trivial crypto unregister helper (Eric)

 - Reduce shared tag overhead (John)

 - Reduce poll_stats memory overhead (me)

 - Known indirect function call for dio (me)

 - Use atomic references for struct request (me)

 - Support request list issue for block and NVMe (me)

 - Improve queue dispatch pinning (Ming)

 - Improve the direct list issue code (Keith)

 - BFQ improvements (Jan)

 - Direct completion helper and use it in mmc block (Sebastian)

 - Use raw spinlock for the blktrace code (Wander)

 - fsync error handling fix (Ye)

 - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me)

* tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits)
  MAINTAINERS: add entries for block layer documentation
  docs: block: remove queue-sysfs.rst
  docs: sysfs-block: document virt_boundary_mask
  docs: sysfs-block: document stable_writes
  docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
  docs: sysfs-block: add contact for nomerges
  docs: sysfs-block: sort alphabetically
  docs: sysfs-block: move to stable directory
  block: don't protect submit_bio_checks by q_usage_counter
  block: fix old-style declaration
  nvme-pci: fix queue_rqs list splitting
  block: introduce rq_list_move
  block: introduce rq_list_for_each_safe macro
  block: move rq_list macros to blk-mq.h
  block: drop needless assignment in set_task_ioprio()
  block: remove unnecessary trailing '\'
  bio.h: fix kernel-doc warnings
  block: check minor range in device_add_disk()
  block: use "unsigned long" for blk_validate_block_size().
  block: fix error unwinding in device_add_disk
  ...
2022-01-12 10:26:52 -08:00
Linus Torvalds
8efd0d9c31 Networking changes for 5.17.
Core
 ----
 
  - Defer freeing TCP skbs to the BH handler, whenever possible,
    or at least perform the freeing outside of the socket lock section
    to decrease cross-CPU allocator work and improve latency.
 
  - Add netdevice refcount tracking to locate sources of netdevice
    and net namespace refcount leaks.
 
  - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
    all queues from a single CPU removing latency spikes.
 
  - Various small optimizations throughout the stack from Eric Dumazet.
 
  - Make netdev->dev_addr[] constant, force modifications to go via
    appropriate helpers to allow us to keep addresses in ordered data
    structures.
 
  - Replace unix_table_lock with per-hash locks, improving performance
    of bind() calls.
 
  - Extend skb drop tracepoint with a drop reason.
 
  - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
 
 BPF
 ---
 
  - New helpers:
    - bpf_find_vma(), find and inspect VMAs for profiling use cases
    - bpf_loop(), runtime-bounded loop helper trading some execution
      time for much faster (if at all converging) verification
    - bpf_strncmp(), improve performance, avoid compiler flakiness
    - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
      for tracing programs, all inlined by the verifier
 
  - Support BPF relocations (CO-RE) in the kernel loader.
 
  - Further the support for BTF_TYPE_TAG annotations.
 
  - Allow access to local storage in sleepable helpers.
 
  - Convert verifier argument types to a composable form with different
    attributes which can be shared across types (ro, maybe-null).
 
  - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
    creating new, extensible ones where missing and deprecating those
    to be removed.
 
 Protocols
 ---------
 
  - WiFi (mac80211/cfg80211):
    - notify user space about long "come back in N" AP responses,
      allow it to react to such temporary rejections
    - allow non-standard VHT MCS 10/11 rates
    - use coarse time in airtime fairness code to save CPU cycles
 
  - Bluetooth:
    - rework of HCI command execution serialization to use a common
      queue and work struct, and improve handling errors reported
      in the middle of a batch of commands
    - rework HCI event handling to use skb_pull_data, avoiding packet
      parsing pitfalls
    - support AOSP Bluetooth Quality Report
 
  - SMC:
    - support net namespaces, following the RDMA model
    - improve connection establishment latency by pre-clearing buffers
    - introduce TCP ULP for automatic redirection to SMC
 
  - Multi-Path TCP:
    - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
    - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
      IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
    - support cmsgs: TCP_INQ
    - improvements in the data scheduler (assigning data to subflows)
    - support fastclose option (quick shutdown of the full MPTCP
      connection, similar to TCP RST in regular TCP)
 
  - MCTP (Management Component Transport) over serial, as defined by
    DMTF spec DSP0253 - "MCTP Serial Transport Binding".
 
 Driver API
 ----------
 
  - Support timestamping on bond interfaces in active/passive mode.
 
  - Introduce generic phylink link mode validation for drivers which
    don't have any quirks and where MAC capability bits fully express
    what's supported. Allow PCS layer to participate in the validation.
    Convert a number of drivers.
 
  - Add support to set/get size of buffers on the Rx rings and size of
    the tx copybreak buffer via ethtool.
 
  - Support offloading TC actions as first-class citizens rather than
    only as attributes of filters, improve sharing and device resource
    utilization.
 
  - WiFi (mac80211/cfg80211):
    - support forwarding offload (ndo_fill_forward_path)
    - support for background radar detection hardware
    - SA Query Procedures offload on the AP side
 
 New hardware / drivers
 ----------------------
 
  - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
    real-time requirements for isochronous communication with protocols
    like OPC UA Pub/Sub.
 
  - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
    integrated into many older Qualcomm SoCs, e.g. MSM8916 or
    MSM8974 (qcom_bam_dmux).
 
  - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch
    driver with support for bridging, VLANs and multicast forwarding
    (lan966x).
 
  - iwlmei driver for co-operating between Intel's WiFi driver and
    Intel's Active Management Technology (AMT) devices.
 
  - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
 
  - Bluetooth:
    - MediaTek MT7921 SDIO devices
    - Foxconn MT7922A
    - Realtek RTL8852AE
 
 Drivers
 -------
 
  - Significantly improve performance in the datapaths of:
    lan78xx, ax88179_178a, lantiq_xrx200, bnxt.
 
  - Intel Ethernet NICs:
    - igb: support PTP/time PEROUT and EXTTS SDP functions on
      82580/i354/i350 adapters
    - ixgbevf: new PF -> VF mailbox API which avoids the risk of
      mailbox corruption with ESXi
    - iavf: support configuration of VLAN features of finer granularity,
      stacked tags and filtering
    - ice: PTP support for new E822 devices with sub-ns precision
    - ice: support firmware activation without reboot
 
  - Mellanox Ethernet NICs (mlx5):
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
    - support TC forwarding when tunnel encap and decap happen between
      two ports of the same NIC
    - dynamically size and allow disabling various features to save
      resources for running in embedded / SmartNIC scenarios
 
  - Broadcom Ethernet NICs (bnxt):
    - use page frag allocator to improve Rx performance
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
 
  - Other Ethernet NICs:
    - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
 
  - Microsoft cloud/virtual NIC (mana):
    - add XDP support (PASS, DROP, TX)
 
  - Mellanox Ethernet switches (mlxsw):
    - initial support for Spectrum-4 ASICs
    - VxLAN with IPv6 underlay
 
  - Marvell Ethernet switches (prestera):
    - support flower flow templates
    - add basic IP forwarding support
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - support Per-Stream Filtering and Policing (PSFP)
    - enable cut-through forwarding between ports by default
    - support FDMA to improve packet Rx/Tx to CPU
 
  - Other embedded switches:
    - hellcreek: improve trapping management (STP and PTP) packets
    - qca8k: support link aggregation and port mirroring
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - qca6390, wcn6855: enable 802.11 power save mode in station mode
    - BSS color change support
    - WCN6855 hw2.1 support
    - 11d scan offload support
    - scan MAC address randomization support
    - full monitor mode, only supported on QCN9074
    - qca6390/wcn6855: report signal and tx bitrate
    - qca6390: rfkill support
    - qca6390/wcn6855: regdb.bin support
 
  - Intel WiFi (iwlwifi):
    - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
      in cooperation with the BIOS
    - support for Optimized Connectivity Experience (OCE) scan
    - support firmware API version 68
    - lots of preparatory work for the upcoming Bz device family
 
  - MediaTek WiFi (mt76):
    - Specific Absorption Rate (SAR) support
    - mt7921: 160 MHz channel support
 
  - RealTek WiFi (rtw88):
    - Specific Absorption Rate (SAR) support
    - scan offload
 
  - Other WiFi NICs
    - ath10k: support fetching (pre-)calibration data from nvmem
    - brcmfmac: configure keep-alive packet on suspend
    - wcn36xx: beacon filter support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHbkZAACgkQMUZtbf5S
 IruYkQ//XX7BggcwBfukPK83j0dONolClijqKcKR08g4vB5L8GXvv6OErKIWrh4k
 h8JanCH352ZkbCSw3MvFdm825UYQv8vPMd6Qks/LJ4aSKqCuy4MIlAo+yOw4Km3O
 i7++lRfma6DqHHI59wvLjWoxZSPu8lL+rI8UsZ5qMOlnNlGAOXsNrzRjaqQ3FddY
 AMxZeBUtrPqUCCQZFq3U8apkYzUp7CA/3XR9zRcja3uPbrtOV2G+4whRF90qGNWz
 Tm/QvJ9F/Ab292cbhxR4KuaQ3hUhaCQyDjbZk3+FZzZpAVhYTVqcNjny6+yXmbiP
 NXRtwemnl1NlWKMnJM8lEeY48u626tRIkxA/Wtd61uoO5uKUSxfGP+UpUi+DfXbF
 yIw50VQ7L2bpxXP/HjtmhVgZDaWKYyh22Zw4Hp/muMJz0hgUB0KODY3tf2jUWbjJ
 0oEgocWyzhhwMQKqupTDCIaRgIs2ewYr4ZrFDhI3HnHC/vv1VjoPRUPIyxwppD2N
 cXvZb3B1sWK8iX5gCbISGzyU4bB7I0rvJSTU42ueti7n6NqRFZ79qHQpYnnY+JdO
 z1qOwY/d/yWfBoXVKRtRg2qz6CdEt5BQklwAgVEBgrFpf58gp694EwGMb1htY14J
 r/k9bVpmyIFpUnBH2CPMRfBVA3tUTqzyzzFV4AMw40NYLKmhLdo=
 =KLm3
 -----END PGP SIGNATURE-----

Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Defer freeing TCP skbs to the BH handler, whenever possible, or at
     least perform the freeing outside of the socket lock section to
     decrease cross-CPU allocator work and improve latency.

   - Add netdevice refcount tracking to locate sources of netdevice and
     net namespace refcount leaks.

   - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
     all queues from a single CPU removing latency spikes.

   - Various small optimizations throughout the stack from Eric Dumazet.

   - Make netdev->dev_addr[] constant, force modifications to go via
     appropriate helpers to allow us to keep addresses in ordered data
     structures.

   - Replace unix_table_lock with per-hash locks, improving performance
     of bind() calls.

   - Extend skb drop tracepoint with a drop reason.

   - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.

  BPF
  ---

   - New helpers:
      - bpf_find_vma(), find and inspect VMAs for profiling use cases
      - bpf_loop(), runtime-bounded loop helper trading some execution
        time for much faster (if at all converging) verification
      - bpf_strncmp(), improve performance, avoid compiler flakiness
      - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
        for tracing programs, all inlined by the verifier

   - Support BPF relocations (CO-RE) in the kernel loader.

   - Further the support for BTF_TYPE_TAG annotations.

   - Allow access to local storage in sleepable helpers.

   - Convert verifier argument types to a composable form with different
     attributes which can be shared across types (ro, maybe-null).

   - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
     creating new, extensible ones where missing and deprecating those
     to be removed.

  Protocols
  ---------

   - WiFi (mac80211/cfg80211):
      - notify user space about long "come back in N" AP responses,
        allow it to react to such temporary rejections
      - allow non-standard VHT MCS 10/11 rates
      - use coarse time in airtime fairness code to save CPU cycles

   - Bluetooth:
      - rework of HCI command execution serialization to use a common
        queue and work struct, and improve handling errors reported in
        the middle of a batch of commands
      - rework HCI event handling to use skb_pull_data, avoiding packet
        parsing pitfalls
      - support AOSP Bluetooth Quality Report

   - SMC:
      - support net namespaces, following the RDMA model
      - improve connection establishment latency by pre-clearing buffers
      - introduce TCP ULP for automatic redirection to SMC

   - Multi-Path TCP:
      - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
      - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
        IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
      - support cmsgs: TCP_INQ
      - improvements in the data scheduler (assigning data to subflows)
      - support fastclose option (quick shutdown of the full MPTCP
        connection, similar to TCP RST in regular TCP)

   - MCTP (Management Component Transport) over serial, as defined by
     DMTF spec DSP0253 - "MCTP Serial Transport Binding".

  Driver API
  ----------

   - Support timestamping on bond interfaces in active/passive mode.

   - Introduce generic phylink link mode validation for drivers which
     don't have any quirks and where MAC capability bits fully express
     what's supported. Allow PCS layer to participate in the validation.
     Convert a number of drivers.

   - Add support to set/get size of buffers on the Rx rings and size of
     the tx copybreak buffer via ethtool.

   - Support offloading TC actions as first-class citizens rather than
     only as attributes of filters, improve sharing and device resource
     utilization.

   - WiFi (mac80211/cfg80211):
      - support forwarding offload (ndo_fill_forward_path)
      - support for background radar detection hardware
      - SA Query Procedures offload on the AP side

  New hardware / drivers
  ----------------------

   - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
     real-time requirements for isochronous communication with protocols
     like OPC UA Pub/Sub.

   - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
     integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
     (qcom_bam_dmux).

   - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
     with support for bridging, VLANs and multicast forwarding
     (lan966x).

   - iwlmei driver for co-operating between Intel's WiFi driver and
     Intel's Active Management Technology (AMT) devices.

   - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips

   - Bluetooth:
      - MediaTek MT7921 SDIO devices
      - Foxconn MT7922A
      - Realtek RTL8852AE

  Drivers
  -------

   - Significantly improve performance in the datapaths of: lan78xx,
     ax88179_178a, lantiq_xrx200, bnxt.

   - Intel Ethernet NICs:
      - igb: support PTP/time PEROUT and EXTTS SDP functions on
        82580/i354/i350 adapters
      - ixgbevf: new PF -> VF mailbox API which avoids the risk of
        mailbox corruption with ESXi
      - iavf: support configuration of VLAN features of finer
        granularity, stacked tags and filtering
      - ice: PTP support for new E822 devices with sub-ns precision
      - ice: support firmware activation without reboot

   - Mellanox Ethernet NICs (mlx5):
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
      - support TC forwarding when tunnel encap and decap happen between
        two ports of the same NIC
      - dynamically size and allow disabling various features to save
        resources for running in embedded / SmartNIC scenarios

   - Broadcom Ethernet NICs (bnxt):
      - use page frag allocator to improve Rx performance
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool

   - Other Ethernet NICs:
      - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support

   - Microsoft cloud/virtual NIC (mana):
      - add XDP support (PASS, DROP, TX)

   - Mellanox Ethernet switches (mlxsw):
      - initial support for Spectrum-4 ASICs
      - VxLAN with IPv6 underlay

   - Marvell Ethernet switches (prestera):
      - support flower flow templates
      - add basic IP forwarding support

   - NXP embedded Ethernet switches (ocelot & felix):
      - support Per-Stream Filtering and Policing (PSFP)
      - enable cut-through forwarding between ports by default
      - support FDMA to improve packet Rx/Tx to CPU

   - Other embedded switches:
      - hellcreek: improve trapping management (STP and PTP) packets
      - qca8k: support link aggregation and port mirroring

   - Qualcomm 802.11ax WiFi (ath11k):
      - qca6390, wcn6855: enable 802.11 power save mode in station mode
      - BSS color change support
      - WCN6855 hw2.1 support
      - 11d scan offload support
      - scan MAC address randomization support
      - full monitor mode, only supported on QCN9074
      - qca6390/wcn6855: report signal and tx bitrate
      - qca6390: rfkill support
      - qca6390/wcn6855: regdb.bin support

   - Intel WiFi (iwlwifi):
      - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
        in cooperation with the BIOS
      - support for Optimized Connectivity Experience (OCE) scan
      - support firmware API version 68
      - lots of preparatory work for the upcoming Bz device family

   - MediaTek WiFi (mt76):
      - Specific Absorption Rate (SAR) support
      - mt7921: 160 MHz channel support

   - RealTek WiFi (rtw88):
      - Specific Absorption Rate (SAR) support
      - scan offload

   - Other WiFi NICs
      - ath10k: support fetching (pre-)calibration data from nvmem
      - brcmfmac: configure keep-alive packet on suspend
      - wcn36xx: beacon filter support"

* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
  tcp: tcp_send_challenge_ack delete useless param `skb`
  net/qla3xxx: Remove useless DMA-32 fallback configuration
  rocker: Remove useless DMA-32 fallback configuration
  hinic: Remove useless DMA-32 fallback configuration
  lan743x: Remove useless DMA-32 fallback configuration
  net: enetc: Remove useless DMA-32 fallback configuration
  cxgb4vf: Remove useless DMA-32 fallback configuration
  cxgb4: Remove useless DMA-32 fallback configuration
  cxgb3: Remove useless DMA-32 fallback configuration
  bnx2x: Remove useless DMA-32 fallback configuration
  et131x: Remove useless DMA-32 fallback configuration
  be2net: Remove useless DMA-32 fallback configuration
  vmxnet3: Remove useless DMA-32 fallback configuration
  bna: Simplify DMA setting
  net: alteon: Simplify DMA setting
  myri10ge: Simplify DMA setting
  qlcnic: Simplify DMA setting
  net: allwinner: Fix print format
  page_pool: remove spinlock in page_pool_refill_alloc_cache()
  amt: fix wrong return type of amt_send_membership_update()
  ...
2022-01-10 19:06:09 -08:00
Tom Zanussi
86599dbe2c tracing: Add helper functions to simplify event_command.parse() callback handling
The event_command.parse() callback is responsible for parsing and
registering triggers.  The existing command implementions for this
callback duplicate a lot of the same code, so to clean up and
consolidate those implementations, introduce a handful of helper
functions for implementors to use.

This also makes it easier for new commands to be implemented and
allows them to focus more on the customizations they provide rather
than obscuring and complicating it with boilerplate code.

Link: https://lkml.kernel.org/r/c1ff71f594d45177706571132bd3119491097221.1641823001.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-10 11:09:11 -05:00
Tom Zanussi
2378a2d6b6 tracing: Remove ops param from event_command reg()/unreg() callbacks
The event_trigger_ops for an event_command are already accessible via
event_trigger_data.ops so remove the redundant ops from the callback.

Link: https://lkml.kernel.org/r/4c6f2a41820452f9cacddc7634ad442928aa2aa6.1641823001.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-10 11:09:11 -05:00
Tom Zanussi
fb339e531b tracing: Change event_trigger_ops func() to trigger()
The name of the func() callback on event_trigger_ops is too generic
and is easily confused with other callbacks with that name, so change
it to something that reflects its actual purpose.

In this case, the main purpose of the callback is to implement an
event trigger, so call it trigger() instead.

Also add some more documentation to event_trigger_ops describing the
callbacks a bit better.

Link: https://lkml.kernel.org/r/36ab812e3ee74ee03ae0043fda41a858ee728c00.1641823001.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-10 11:09:10 -05:00
Tom Zanussi
9ec5a7d168 tracing: Change event_command func() to parse()
The name of the func() callback on event_command is too generic and is
easily confused with other callbacks with that name, so change it to
something that reflects its actual purpose.

In this case, the main purpose of the callback is to parse an event
command, so call it parse() instead.

Link: https://lkml.kernel.org/r/7784e321840752ed88aac0b349c0c685fc9247b1.1641823001.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-10 11:09:10 -05:00
Naveen N. Rao
f28439db47 tracing: Tag trace_percpu_buffer as a percpu pointer
Tag trace_percpu_buffer as a percpu pointer to resolve warnings
reported by sparse:
  /linux/kernel/trace/trace.c:3218:46: warning: incorrect type in initializer (different address spaces)
  /linux/kernel/trace/trace.c:3218:46:    expected void const [noderef] __percpu *__vpp_verify
  /linux/kernel/trace/trace.c:3218:46:    got struct trace_buffer_struct *
  /linux/kernel/trace/trace.c:3234:9: warning: incorrect type in initializer (different address spaces)
  /linux/kernel/trace/trace.c:3234:9:    expected void const [noderef] __percpu *__vpp_verify
  /linux/kernel/trace/trace.c:3234:9:    got int *

Link: https://lkml.kernel.org/r/ebabd3f23101d89cb75671b68b6f819f5edc830b.1640255304.git.naveen.n.rao@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 07d777fe8c ("tracing: Add percpu buffers for trace_printk()")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-05 18:53:49 -05:00
Naveen N. Rao
823e670f7e tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()
With the new osnoise tracer, we are seeing the below splat:
    Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0)
    BUG: Unable to handle kernel data access on read at 0xc7d880000
    Faulting instruction address: 0xc0000000002ffa10
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    ...
    NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0
    LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0
    Call Trace:
    [c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable)
    [c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90
    [c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290
    [c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710
    [c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130
    [c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270
    [c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180
    [c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278

osnoise tracer on ppc64le is triggering osnoise_taint() for negative
duration in get_int_safe_duration() called from
trace_sched_switch_callback()->thread_exit().

The problem though is that the check for a valid trace_percpu_buffer is
incorrect in get_trace_buf(). The check is being done after calculating
the pointer for the current cpu, rather than on the main percpu pointer.
Fix the check to be against trace_percpu_buffer.

Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: e2ace00117 ("tracing: Choose static tp_printk buffer by explicit nesting count")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-05 18:51:25 -05:00
David S. Miller
e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
Wander Lairson Costa
361c81dbc5 blktrace: switch trace spinlock to a raw spinlock
The running_trace_lock protects running_trace_list and is acquired
within the tracepoint which implies disabled preemption. The spinlock_t
typed lock can not be acquired with disabled preemption on PREEMPT_RT
because it becomes a sleeping lock.
The runtime of the tracepoint depends on the number of entries in
running_trace_list and has no limit. The blk-tracer is considered debug
code and higher latencies here are okay.

Make running_trace_lock a raw_spinlock_t.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20211220192827.38297-1-wander@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-20 13:50:14 -07:00
Hao Luo
216e3cd2f2 bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
Some helper functions may modify its arguments, for example,
bpf_d_path, bpf_get_stack etc. Previously, their argument types
were marked as ARG_PTR_TO_MEM, which is compatible with read-only
mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate,
but technically incorrect, to modify a read-only memory by passing
it into one of such helper functions.

This patch tags the bpf_args compatible with immutable memory with
MEM_RDONLY flag. The arguments that don't have this flag will be
only compatible with mutable memory types, preventing the helper
from modifying a read-only memory. The bpf_args that have
MEM_RDONLY are compatible with both mutable memory and immutable
memory.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
2021-12-18 13:27:41 -08:00
Jakub Kicinski
7cd2802d74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 16:13:19 -08:00
Jakub Kicinski
aef2feda97 add missing bpf-cgroup.h includes
We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency,
make sure those who actually need more than the definition of
struct cgroup_bpf include bpf-cgroup.h explicitly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org
2021-12-16 14:57:09 -08:00
Jiri Olsa
f92c1e1836 bpf: Add get_func_[arg|ret|arg_cnt] helpers
Adding following helpers for tracing programs:

Get n-th argument of the traced function:
  long bpf_get_func_arg(void *ctx, u32 n, u64 *value)

Get return value of the traced function:
  long bpf_get_func_ret(void *ctx, u64 *value)

Get arguments count of the traced function:
  long bpf_get_func_arg_cnt(void *ctx)

The trampoline now stores number of arguments on ctx-8
address, so it's easy to verify argument index and find
return value argument's position.

Moving function ip address on the trampoline stack behind
the number of functions arguments, so it's now stored on
ctx-16 address if it's needed.

All helpers above are inlined by verifier.

Also bit unrelated small change - using newly added function
bpf_prog_has_trampoline in check_get_func_ip.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211208193245.172141-5-jolsa@kernel.org
2021-12-13 09:25:59 -08:00
Steven Rostedt (VMware)
2768c1e7f9 tracing: Use trace_iterator_reset() in tracing_read_pipe()
Currently tracing_read_pipe() open codes trace_iterator_reset(). Just have
it use trace_iterator_reset() instead.

Link: https://lkml.kernel.org/r/20211210202616.64d432d2@gandalf.local.home

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-11 09:34:32 -05:00
Xiu Jianfeng
dba8796722 tracing: Use memset_startat helper in trace_iterator_reset()
Make use of memset_startat helper to simplify the code, there should be
no functional change as a result of this patch.

Link: https://lkml.kernel.org/r/20211210012245.207489-1-xiujianfeng@huawei.com

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-11 09:34:32 -05:00
Beau Belgrave
4f67cca70c tracing: Do not let synth_events block other dyn_event systems during create
synth_events is returning -EINVAL if the dyn_event create command does
not contain ' \t'. This prevents other systems from getting called back.
synth_events needs to return -ECANCELED in these cases when the command
is not targeting the synth_event system.

Link: https://lore.kernel.org/linux-trace-devel/20210930223821.11025-1-beaub@linux.microsoft.com

Fixes: c9e759b1e8 ("tracing: Rework synthetic event command parsing")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-11 09:34:32 -05:00
Jiri Olsa
e161c6bf39 tracing: Iterate trace_[ku]probe objects directly
As suggested by Linus [1] using list_for_each_entry to iterate
directly trace_[ku]probe objects so we can skip another call to
container_of in these loops.

[1] https://lore.kernel.org/r/CAHk-=wjakjw6-rDzDDBsuMoDCqd+9ogifR_EE1F0K-jYek1CdA@mail.gmail.com

Link: https://lkml.kernel.org/r/20211125202852.406405-1-jolsa@kernel.org

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-11 09:34:32 -05:00
Jakub Kicinski
be3158290d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
  libbpf: Add "bool skipped" to struct bpf_map
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
  bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
  selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
  selftests/bpf: Add test for libbpf's custom log_buf behavior
  selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
  libbpf: Deprecate bpf_object__load_xattr()
  libbpf: Add per-program log buffer setter and getter
  libbpf: Preserve kernel error code and remove kprobe prog type guessing
  libbpf: Improve logging around BPF program loading
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Allow passing preallocated log_buf when loading BTF into kernel
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  samples/bpf: Remove unneeded variable
  bpf: Remove redundant assignment to pointer t
  selftests/bpf: Fix a compilation warning
  perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
  samples: bpf: Fix 'unknown warning group' build warning on Clang
  samples: bpf: Fix xdp_sample_user.o linking with Clang
  ...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 15:56:13 -08:00
Miaoqian Lin
c24be24aed tracing: Fix possible memory leak in __create_synth_event() error path
There's error paths in __create_synth_event() after the argv is allocated
that fail to free it. Add a jump to free it when necessary.

Link: https://lkml.kernel.org/r/20211209024317.11783-1-linmq006@gmail.com

Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
[ Fixed up the patch and change log ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-09 13:03:05 -05:00
Jiri Olsa
fea3ffa48c ftrace: Add cleanup to unregister_ftrace_direct_multi
Adding ops cleanup to unregister_ftrace_direct_multi,
so it can be reused in another register call.

Link: https://lkml.kernel.org/r/20211206182032.87248-3-jolsa@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Fixes: f64dd4627e ("ftrace: Add multi direct register/unregister interface")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-08 12:12:02 -05:00
Jiri Olsa
7d5b7cad79 ftrace: Use direct_ops hash in unregister_ftrace_direct
Now when we have *direct_multi interface the direct_functions
hash is no longer owned just by direct_ops. It's also used by
any other ftrace_ops passed to *direct_multi interface.

Thus to find out that we are unregistering the last function
from direct_ops, we need to check directly direct_ops's hash.

Link: https://lkml.kernel.org/r/20211206182032.87248-2-jolsa@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Fixes: f64dd4627e ("ftrace: Add multi direct register/unregister interface")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-08 12:12:02 -05:00
John Keeping
2972e3050e tracing: Make trace_marker{,_raw} stream-like
The tracing marker files are write-only streams with no meaningful
concept of file position.  Using stream_open() to mark them as
stream-link indicates this and has the added advantage that a single
file descriptor can now be used from multiple threads without contention
thanks to clearing FMODE_ATOMIC_POS.

Note that this has the potential to break existing userspace by since
both lseek(2) and pwrite(2) will now return ESPIPE when previously lseek
would have updated the stored offset and pwrite would have appended to
the trace.  A survey of libtracefs and several other projects found to
use trace_marker(_raw) [1][2][3] suggests that everyone limits
themselves to calling write(2) and close(2) on these file descriptors so
there is a good chance this will go unnoticed and the benefits of
reduced overhead and lock contention seem worth the risk.

[1] https://github.com/google/perfetto
[2] https://github.com/intel/media-driver/
[3] https://w1.fi/cgit/hostap/

Link: https://lkml.kernel.org/r/20211207142558.347029-1-john@metanate.com

Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-07 22:05:49 -05:00
Uladzislau Rezki (Sony)
a6ed2aee54 tracing: Switch to kvfree_rcu() API
Instead of invoking a synchronize_rcu() to free a pointer
after a grace period we can directly make use of new API
that does the same but in more efficient way.

Link: https://lkml.kernel.org/r/20211124110308.2053-10-urezki@gmail.com

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 17:53:50 -05:00
Qiujun Huang
1d83c3a20b tracing: Fix synth_event_add_val() kernel-doc comment
It's named field here.

Link: https://lkml.kernel.org/r/20210516022410.64271-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 17:53:50 -05:00
Steven Rostedt (VMware)
b7d5eb267f tracing/uprobes: Use trace_event_buffer_reserve() helper
To be consistent with kprobes and eprobes, use
trace_event_buffer_reserver() and trace_event_buffer_commit(). This will
ensure that any updates to trace events will also be implemented on uprobe
events.

Link: https://lkml.kernel.org/r/20211206162440.69fbf96c@gandalf.local.home

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 17:53:23 -05:00
Steven Rostedt (VMware)
5e6cd84e2f tracing/kprobes: Do not open code event reserve logic
As kprobe events use trace_event_buffer_commit() to commit the event to
the ftrace ring buffer, for consistency, it should use
trace_event_buffer_reserve() to allocate it, as the two functions are
related.

Link: https://lkml.kernel.org/r/20211130024319.257430762@goodmis.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 15:37:22 -05:00
Steven Rostedt (VMware)
3e8b1a29a0 tracing: Have eprobes use filtering logic of trace events
The eprobes open code the reserving of the event on the ring buffer for
ftrace instead of using the ftrace event wrappers, which means that it
doesn't get affected by the filters, breaking the filtering logic on user
space.

Link: https://lkml.kernel.org/r/20211130024319.068451680@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 15:37:22 -05:00
Steven Rostedt (VMware)
6c536d76cf tracing: Disable preemption when using the filter buffer
In case trace_event_buffer_lock_reserve() is called with preemption
enabled, the algorithm that defines the usage of the per cpu filter buffer
may fail if the task schedules to another CPU after determining which
buffer it will use.

Disable preemption when using the filter buffer. And because that same
buffer must be used throughout the call, keep preemption disabled until
the filter buffer is released.

This will also keep the semantics between the use case of when the filter
buffer is used, and when the ring buffer itself is used, as that case also
disables preemption until the ring buffer is released.

Link: https://lkml.kernel.org/r/20211130024318.880190623@goodmis.org

[ Fixed warning of assignment in if statement
  Reported-by: kernel test robot <lkp@intel.com> ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 15:37:22 -05:00
Steven Rostedt (VMware)
e07a1d5762 tracing: Use __this_cpu_read() in trace_event_buffer_lock_reserver()
The value read by this_cpu_read() is used later and its use is expected to
stay on the same CPU as being read. But this_cpu_read() does not warn if
it is called without preemption disabled, where as __this_cpu_read() will
check if preemption is disabled on CONFIG_DEBUG_PREEMPT

Currently all callers have preemption disabled, but there may be new
callers in the future that may not.

Link: https://lkml.kernel.org/r/20211130024318.698165354@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 15:37:22 -05:00
Masami Hiramatsu
55de2c0b56 tracing: Add '__rel_loc' using trace event macros
Add '__rel_loc' using trace event macros. These macros are usually
not used in the kernel, except for testing purpose.
This also add "rel_" variant of macros for dynamic_array string,
and bitmask.

Link: https://lkml.kernel.org/r/163757342119.510314.816029622439099016.stgit@devnote2

Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 15:37:21 -05:00
Masami Hiramatsu
05770dd0ad tracing: Support __rel_loc relative dynamic data location attribute
Add '__rel_loc' new dynamic data location attribute which encodes
the data location from the next to the field itself.

The '__data_loc' is used for encoding the dynamic data location on
the trace event record. But '__data_loc' is not useful if the writer
doesn't know the event header (e.g. user event), because it records
the dynamic data offset from the entry of the record, not the field
itself.

This new '__rel_loc' attribute encodes the data location relatively
from the next of the field. For example, when there is a record like
below (the number in the parentheses is the size of fields)

 |header(N)|common(M)|fields(K)|__data_loc(4)|fields(L)|data(G)|

In this case, '__data_loc' field will be

 __data_loc = (G << 16) | (N+M+K+4+L)

If '__rel_loc' is used, this will be

 |header(N)|common(M)|fields(K)|__rel_loc(4)|fields(L)|data(G)|

where

 __rel_loc = (G << 16) | (L)

This case shows L bytes after the '__rel_loc' attribute  field,
if there is no fields after the __rel_loc field, L must be 0.

This is relatively easy (and no need to consider the kernel header
change) when the event data fields are composed by user who doesn't
know header and common fields.

Link: https://lkml.kernel.org/r/163757341258.510314.4214431827833229956.stgit@devnote2

Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 15:37:21 -05:00
Colin Ian King
f2b20c6627 tracing: Fix spelling mistake "aritmethic" -> "arithmetic"
There is a spelling mistake in the tracing mini-HOWTO text. Fix it.

Link: https://lkml.kernel.org/r/20211108201513.42876-1-colin.i.king@gmail.com

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 15:37:21 -05:00
Kajol Jain
db52f57211 bpf: Remove config check to enable bpf support for branch records
Branch data available to BPF programs can be very useful to get stack traces
out of userspace application.

Commit fff7b64355 ("bpf: Add bpf_read_branch_records() helper") added BPF
support to capture branch records in x86. Enable this feature also for other
architectures as well by removing checks specific to x86.

If an architecture doesn't support branch records, bpf_read_branch_records()
still has appropriate checks and it will return an -EINVAL in that scenario.
Based on UAPI helper doc in include/uapi/linux/bpf.h, unsupported architectures
should return -ENOENT in such case. Hence, update the appropriate check to
return -ENOENT instead.

Selftest 'perf_branches' result on power9 machine which has the branch stacks
support:

 - Before this patch:

  [command]# ./test_progs -t perf_branches
   #88/1 perf_branches/perf_branches_hw:FAIL
   #88/2 perf_branches/perf_branches_no_hw:OK
   #88 perf_branches:FAIL
  Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED

 - After this patch:

  [command]# ./test_progs -t perf_branches
   #88/1 perf_branches/perf_branches_hw:OK
   #88/2 perf_branches/perf_branches_no_hw:OK
   #88 perf_branches:OK
  Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

Selftest 'perf_branches' result on power9 machine which doesn't have branch
stack report:

 - After this patch:

  [command]# ./test_progs -t perf_branches
   #88/1 perf_branches/perf_branches_hw:SKIP
   #88/2 perf_branches/perf_branches_no_hw:OK
   #88 perf_branches:OK
  Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED

Fixes: fff7b64355 ("bpf: Add bpf_read_branch_records() helper")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211206073315.77432-1-kjain@linux.ibm.com
2021-12-06 15:21:15 +01:00
Jakub Kicinski
fc993be36f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-02 11:44:56 -08:00
Chen Jun
f25667e598 tracing: Fix a kmemleak false positive in tracing_map
Doing the command:
  echo 'hist:key=common_pid.execname,common_timestamp' > /sys/kernel/debug/tracing/events/xxx/trigger

Triggers many kmemleak reports:

unreferenced object 0xffff0000c7ea4980 (size 128):
  comm "bash", pid 338, jiffies 4294912626 (age 9339.324s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000f3469921>] kmem_cache_alloc_trace+0x4c0/0x6f0
    [<0000000054ca40c3>] hist_trigger_elt_data_alloc+0x140/0x178
    [<00000000633bd154>] tracing_map_init+0x1f8/0x268
    [<000000007e814ab9>] event_hist_trigger_func+0xca0/0x1ad0
    [<00000000bf8520ed>] trigger_process_regex+0xd4/0x128
    [<00000000f549355a>] event_trigger_write+0x7c/0x120
    [<00000000b80f898d>] vfs_write+0xc4/0x380
    [<00000000823e1055>] ksys_write+0x74/0xf8
    [<000000008a9374aa>] __arm64_sys_write+0x24/0x30
    [<0000000087124017>] do_el0_svc+0x88/0x1c0
    [<00000000efd0dcd1>] el0_svc+0x1c/0x28
    [<00000000dbfba9b3>] el0_sync_handler+0x88/0xc0
    [<00000000e7399680>] el0_sync+0x148/0x180
unreferenced object 0xffff0000c7ea4980 (size 128):
  comm "bash", pid 338, jiffies 4294912626 (age 9339.324s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000f3469921>] kmem_cache_alloc_trace+0x4c0/0x6f0
    [<0000000054ca40c3>] hist_trigger_elt_data_alloc+0x140/0x178
    [<00000000633bd154>] tracing_map_init+0x1f8/0x268
    [<000000007e814ab9>] event_hist_trigger_func+0xca0/0x1ad0
    [<00000000bf8520ed>] trigger_process_regex+0xd4/0x128
    [<00000000f549355a>] event_trigger_write+0x7c/0x120
    [<00000000b80f898d>] vfs_write+0xc4/0x380
    [<00000000823e1055>] ksys_write+0x74/0xf8
    [<000000008a9374aa>] __arm64_sys_write+0x24/0x30
    [<0000000087124017>] do_el0_svc+0x88/0x1c0
    [<00000000efd0dcd1>] el0_svc+0x1c/0x28
    [<00000000dbfba9b3>] el0_sync_handler+0x88/0xc0
    [<00000000e7399680>] el0_sync+0x148/0x180

The reason is elts->pages[i] is alloced by get_zeroed_page.
and kmemleak will not scan the area alloced by get_zeroed_page.
The address stored in elts->pages will be regarded as leaked.

That is, the elts->pages[i] will have pointers loaded onto it as well, and
without telling kmemleak about it, those pointers will look like memory
without a reference.

To fix this, call kmemleak_alloc to tell kmemleak to scan elts->pages[i]

Link: https://lkml.kernel.org/r/20211124140801.87121-1-chenjun102@huawei.com

Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-01 21:04:34 -05:00
Steven Rostedt (VMware)
450fec13d9 tracing/histograms: String compares should not care about signed values
When comparing two strings for the "onmatch" histogram trigger, fields
that are strings use string comparisons, which do not care about being
signed or not.

Do not fail to match two string fields if one is unsigned char array and
the other is a signed char array.

Link: https://lore.kernel.org/all/20211129123043.5cfd687a@gandalf.local.home/

Cc: stable@vgerk.kernel.org
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Yafang Shao <laoar.shao@gmail.com>
Fixes: b05e89ae7c ("tracing: Accept different type for synthetic event fields")
Reviewed-by: Masami Hiramatsu <mhiramatsu@kernel.org>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-01 21:04:22 -05:00
Christoph Hellwig
f3fa33acca block: remove the ->rq_disk field in struct request
Just use the disk attached to the request_queue instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29 06:41:29 -07:00
Steven Rostedt (VMware)
27ff768fa2 tracing: Test the 'Do not trace this pid' case in create event
When creating a new event (via a module, kprobe, eprobe, etc), the
descriptors that are created must add flags for pid filtering if an
instance has pid filtering enabled, as the flags are used at the time the
event is executed to know if pid filtering should be done or not.

The "Only trace this pid" case was added, but a cut and paste error made
that case checked twice, instead of checking the "Trace all but this pid"
case.

Link: https://lore.kernel.org/all/202111280401.qC0z99JB-lkp@intel.com/

Fixes: 6cb206508b ("tracing: Check pid filtering when creating events")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-27 16:50:43 -05:00
Steven Rostedt (VMware)
a55f224ff5 tracing: Fix pid filtering when triggers are attached
If a event is filtered by pid and a trigger that requires processing of
the event to happen is a attached to the event, the discard portion does
not take the pid filtering into account, and the event will then be
recorded when it should not have been.

Cc: stable@vger.kernel.org
Fixes: 3fdaf80f4a ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-26 17:37:06 -05:00
Jakub Kicinski
93d5404e89 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ipa/ipa_main.c
  8afc7e471a ("net: ipa: separate disabling setup from modem stop")
  76b5fbcd6b ("net: ipa: kill ipa_modem_init()")

Duplicated include, drop one.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26 13:45:19 -08:00
Steven Rostedt (VMware)
6cb206508b tracing: Check pid filtering when creating events
When pid filtering is activated in an instance, all of the events trace
files for that instance has the PID_FILTER flag set. This determines
whether or not pid filtering needs to be done on the event, otherwise the
event is executed as normal.

If pid filtering is enabled when an event is created (via a dynamic event
or modules), its flag is not updated to reflect the current state, and the
events are not filtered properly.

Cc: stable@vger.kernel.org
Fixes: 3fdaf80f4a ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-26 14:31:23 -05:00
Jiri Olsa
1880ed71ce tracing/uprobe: Fix uprobe_perf_open probes iteration
Add missing 'tu' variable initialization in the probes loop,
otherwise the head 'tu' is used instead of added probes.

Link: https://lkml.kernel.org/r/20211123142801.182530-1-jolsa@kernel.org

Cc: stable@vger.kernel.org
Fixes: 99c9a923e9 ("tracing/uprobe: Fix double perf_event linking on multiprobe uprobe")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-23 20:52:01 -05:00
Linus Torvalds
e4365e369f Tracing fixes:
- Fix double free in destroy_hist_field
 
  - Harden memset() of trace_iterator structure
 
  - Do not warn in trace printk check when test buffer fills up
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYZgSTRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qqsJAQDg6Oe0XMclYPLMyRlEJEMEV2bFh8ZQ
 G1jqvMLcMnuFZAEA2onhzHzjR1amXuSw9YwNHcDB7eHiaIg9pgdOFFDUpwI=
 =KTcf
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix double free in destroy_hist_field

 - Harden memset() of trace_iterator structure

 - Do not warn in trace printk check when test buffer fills up

* tag 'trace-v5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Don't use out-of-sync va_list in event printing
  tracing: Use memset_startat() to zero struct trace_iterator
  tracing/histogram: Fix UAF in destroy_hist_field()
2021-11-19 13:50:48 -08:00
Nikita Yushchenko
2ef75e9bd2 tracing: Don't use out-of-sync va_list in event printing
If trace_seq becomes full, trace_seq_vprintf() no longer consumes
arguments from va_list, making va_list out of sync with format
processing by trace_check_vprintf().

This causes va_arg() in trace_check_vprintf() to return wrong
positional argument, which results into a WARN_ON_ONCE() hit.

ftrace_stress_test from LTP triggers this situation.

Fix it by explicitly avoiding further use if va_list at the point
when it's consistency can no longer be guaranteed.

Link: https://lkml.kernel.org/r/20211118145516.13219-1-nikita.yushchenko@virtuozzo.com

Signed-off-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-18 21:10:18 -05:00
Kees Cook
c4c1dbcc09 tracing: Use memset_startat() to zero struct trace_iterator
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Use memset_startat() to avoid confusing memset() about writing beyond
the target struct member.

Link: https://lkml.kernel.org/r/20211118202217.1285588-1-keescook@chromium.org

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-18 20:54:51 -05:00
Jakub Kicinski
50fc24944a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18 13:13:16 -08:00
Linus Torvalds
8d0112ac6f Networking fixes for 5.16-rc2, including fixes from bpf, mac80211.
Current release - regressions:
 
  - devlink: don't throw an error if flash notification sent before
    devlink visible
 
  - page_pool: Revert "page_pool: disable dma mapping support...",
    turns out there are active arches who need it
 
 Current release - new code bugs:
 
  - amt: cancel delayed_work synchronously in amt_fini()
 
 Previous releases - regressions:
 
  - xsk: fix crash on double free in buffer pool
 
  - bpf: fix inner map state pruning regression causing program
    rejections
 
  - mac80211: drop check for DONT_REORDER in __ieee80211_select_queue,
    preventing mis-selecting the best effort queue
 
  - mac80211: do not access the IV when it was stripped
 
  - mac80211: fix radiotap header generation, off-by-one
 
  - nl80211: fix getting radio statistics in survey dump
 
  - e100: fix device suspend/resume
 
 Previous releases - always broken:
 
  - tcp: fix uninitialized access in skb frags array for Rx 0cp
 
  - bpf: fix toctou on read-only map's constant scalar tracking
 
  - bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
 
  - tipc: only accept encrypted MSG_CRYPTO msgs
 
  - smc: transfer remaining wait queue entries during fallback,
    fix missing wake ups
 
  - udp: validate checksum in udp_read_sock() (when sockmap is used)
 
  - sched: act_mirred: drop dst for the direction from egress to ingress
 
  - virtio_net_hdr_to_skb: count transport header in UFO, prevent
    allowing bad skbs into the stack
 
  - nfc: reorder the logic in nfc_{un,}register_device, fix unregister
 
  - ipsec: check return value of ipv6_skip_exthdr
 
  - usb: r8152: add MAC passthrough support for more Lenovo Docks
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGWf08ACgkQMUZtbf5S
 Irt+lxAAj8FAoLoSmQKUK3LttLLh0ZQQXu8Riey+wrP8Z9Yp8xWXIaVRF1c0vCE6
 clbrF+mLfk6Wvv/RzOgwyBMHvK+djr/oVDNSmjlRvss4MLDfOQZhUV8V4XpvF4Up
 hI7wyKfHtd7niosNqil6wklJFpLU8WyIAWrPSIPE6JlPkJmcm3GUGsliwEPwdLY1
 yl7z4zsxigjA+hKxYqNQX6tixF3xnbDUbAnWshrSPL89melwz4GMao45qmcxJEVr
 EipPhKifk0hT067jG08FMXcKBFKt6rKk7SVNo4mtq8Tl6HleJBj8fdaJAjSdFahB
 +rYJ0sDZwGoDL5CxZ5mD3fM1cDgh4WFEM0z//0b/bZhoPDRKEpLr9LPuv+N6+/rA
 8D98EHsvyNjlFgdyd8celMstiGtBn4YLEoLNYYh9Qibgm0XsCuv0yox7g0AOLMmQ
 QiBmh2EnaXNPQ8PRZNMK3VH5ol2KoYWL6yrpJYV+wOWVLfezwlSsjkPSfW5pF9FG
 hU0iQBp/YTCdCadR9YLj8qfDWDUAkCN7WpqIu9EA9FXJcYjJVaix0MA/tAVlzXyR
 xlB7cU6O5NABcs/+04zPkKLwKbVYNMqgvKE+FVDVm+BKxo0UMxcmz/Np/ZYxfhkF
 bwKplaiPb2H4D6t0sdxqaeYirPwt1BcleLilae6vHG1jO90H9Vw=
 =tlqV
 -----END PGP SIGNATURE-----

Merge tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, mac80211.

  Current release - regressions:

   - devlink: don't throw an error if flash notification sent before
     devlink visible

   - page_pool: Revert "page_pool: disable dma mapping support...",
     turns out there are active arches who need it

  Current release - new code bugs:

   - amt: cancel delayed_work synchronously in amt_fini()

  Previous releases - regressions:

   - xsk: fix crash on double free in buffer pool

   - bpf: fix inner map state pruning regression causing program
     rejections

   - mac80211: drop check for DONT_REORDER in __ieee80211_select_queue,
     preventing mis-selecting the best effort queue

   - mac80211: do not access the IV when it was stripped

   - mac80211: fix radiotap header generation, off-by-one

   - nl80211: fix getting radio statistics in survey dump

   - e100: fix device suspend/resume

  Previous releases - always broken:

   - tcp: fix uninitialized access in skb frags array for Rx 0cp

   - bpf: fix toctou on read-only map's constant scalar tracking

   - bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing
     progs

   - tipc: only accept encrypted MSG_CRYPTO msgs

   - smc: transfer remaining wait queue entries during fallback, fix
     missing wake ups

   - udp: validate checksum in udp_read_sock() (when sockmap is used)

   - sched: act_mirred: drop dst for the direction from egress to
     ingress

   - virtio_net_hdr_to_skb: count transport header in UFO, prevent
     allowing bad skbs into the stack

   - nfc: reorder the logic in nfc_{un,}register_device, fix unregister

   - ipsec: check return value of ipv6_skip_exthdr

   - usb: r8152: add MAC passthrough support for more Lenovo Docks"

* tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
  ptp: ocp: Fix a couple NULL vs IS_ERR() checks
  net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
  net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
  ipv6: check return value of ipv6_skip_exthdr
  e100: fix device suspend/resume
  devlink: Don't throw an error if flash notification sent before devlink visible
  page_pool: Revert "page_pool: disable dma mapping support..."
  ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
  octeontx2-af: debugfs: don't corrupt user memory
  NFC: add NCI_UNREG flag to eliminate the race
  NFC: reorder the logic in nfc_{un,}register_device
  NFC: reorganize the functions in nci_request
  tipc: check for null after calling kmemdup
  i40e: Fix display error code in dmesg
  i40e: Fix creation of first queue by omitting it if is not power of two
  i40e: Fix warning message and call stack during rmmod i40e driver
  i40e: Fix ping is lost after configuring ADq on VF
  i40e: Fix changing previously set num_queue_pairs for PFs
  i40e: Fix NULL ptr dereference on VSI filter sync
  i40e: Fix correct max_pkt_size on VF RX queue
  ...
2021-11-18 12:54:24 -08:00
Kalesh Singh
f86b0aaad7 tracing/histogram: Fix UAF in destroy_hist_field()
Calling destroy_hist_field() on an expression will recursively free
any operands associated with the expression. If during expression
parsing the operands of the expression are already set when an error
is encountered, there is no need to explicity free the operands. Doing
so will result in destroy_hist_field() being called twice for the
operands and lead to a use-after-free (UAF) error.

If the operands are associated with the expression, only call
destroy_hist_field() on the expression since the operands will be
recursively freed.

Link: https://lore.kernel.org/all/CAHk-=wgcrEbFgkw9720H3tW-AhHOoEKhYwZinYJw4FpzSaJ6_Q@mail.gmail.com/
Link: https://lkml.kernel.org/r/20211118011542.1420131-1-kaleshsingh@google.com

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Fixes: 8b5d46fd7a ("tracing/histogram: Optimize division by constants")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-18 13:53:43 -05:00
Dmitrii Banshchikov
5e0bc3082e bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
Use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in tracing
progs may result in locking issues.

bpf_ktime_get_coarse_ns() uses ktime_get_coarse_ns() time accessor that
isn't safe for any context:
======================================================
WARNING: possible circular locking dependency detected
5.15.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.4/14877 is trying to acquire lock:
ffffffff8cb30008 (tk_core.seq.seqcount){----}-{0:0}, at: ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255

but task is already holding lock:
ffffffff90dbf200 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_deactivate+0x61/0x400 lib/debugobjects.c:735

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&obj_hash[i].lock){-.-.}-{2:2}:
       lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0xd1/0x120 kernel/locking/spinlock.c:162
       __debug_object_init+0xd9/0x1860 lib/debugobjects.c:569
       debug_hrtimer_init kernel/time/hrtimer.c:414 [inline]
       debug_init kernel/time/hrtimer.c:468 [inline]
       hrtimer_init+0x20/0x40 kernel/time/hrtimer.c:1592
       ntp_init_cmos_sync kernel/time/ntp.c:676 [inline]
       ntp_init+0xa1/0xad kernel/time/ntp.c:1095
       timekeeping_init+0x512/0x6bf kernel/time/timekeeping.c:1639
       start_kernel+0x267/0x56e init/main.c:1030
       secondary_startup_64_no_verify+0xb1/0xbb

-> #0 (tk_core.seq.seqcount){----}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3051 [inline]
       check_prevs_add kernel/locking/lockdep.c:3174 [inline]
       validate_chain+0x1dfb/0x8240 kernel/locking/lockdep.c:3789
       __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5015
       lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625
       seqcount_lockdep_reader_access+0xfe/0x230 include/linux/seqlock.h:103
       ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255
       ktime_get_coarse include/linux/timekeeping.h:120 [inline]
       ktime_get_coarse_ns include/linux/timekeeping.h:126 [inline]
       ____bpf_ktime_get_coarse_ns kernel/bpf/helpers.c:173 [inline]
       bpf_ktime_get_coarse_ns+0x7e/0x130 kernel/bpf/helpers.c:171
       bpf_prog_a99735ebafdda2f1+0x10/0xb50
       bpf_dispatcher_nop_func include/linux/bpf.h:721 [inline]
       __bpf_prog_run include/linux/filter.h:626 [inline]
       bpf_prog_run include/linux/filter.h:633 [inline]
       BPF_PROG_RUN_ARRAY include/linux/bpf.h:1294 [inline]
       trace_call_bpf+0x2cf/0x5d0 kernel/trace/bpf_trace.c:127
       perf_trace_run_bpf_submit+0x7b/0x1d0 kernel/events/core.c:9708
       perf_trace_lock+0x37c/0x440 include/trace/events/lock.h:39
       trace_lock_release+0x128/0x150 include/trace/events/lock.h:58
       lock_release+0x82/0x810 kernel/locking/lockdep.c:5636
       __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:149 [inline]
       _raw_spin_unlock_irqrestore+0x75/0x130 kernel/locking/spinlock.c:194
       debug_hrtimer_deactivate kernel/time/hrtimer.c:425 [inline]
       debug_deactivate kernel/time/hrtimer.c:481 [inline]
       __run_hrtimer kernel/time/hrtimer.c:1653 [inline]
       __hrtimer_run_queues+0x2f9/0xa60 kernel/time/hrtimer.c:1749
       hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1811
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
       __sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1103
       sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
       __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
       _raw_spin_unlock_irqrestore+0xd4/0x130 kernel/locking/spinlock.c:194
       try_to_wake_up+0x702/0xd20 kernel/sched/core.c:4118
       wake_up_process kernel/sched/core.c:4200 [inline]
       wake_up_q+0x9a/0xf0 kernel/sched/core.c:953
       futex_wake+0x50f/0x5b0 kernel/futex/waitwake.c:184
       do_futex+0x367/0x560 kernel/futex/syscalls.c:127
       __do_sys_futex kernel/futex/syscalls.c:199 [inline]
       __se_sys_futex+0x401/0x4b0 kernel/futex/syscalls.c:180
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

There is a possible deadlock with bpf_timer_* set of helpers:
hrtimer_start()
  lock_base();
  trace_hrtimer...()
    perf_event()
      bpf_run()
        bpf_timer_start()
          hrtimer_start()
            lock_base()         <- DEADLOCK

Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in
BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT
and BPF_PROG_TYPE_RAW_TRACEPOINT prog types.

Fixes: d055126180 ("bpf: Add bpf_ktime_get_coarse_ns helper")
Fixes: b00628b1c7 ("bpf: Introduce bpf timers.")
Reported-by: syzbot+43fd005b5a1b4d10781e@syzkaller.appspotmail.com
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211113142227.566439-2-me@ubique.spb.ru
2021-11-15 20:35:58 -08:00
Jakub Kicinski
a5bdc36354 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-11-15

We've added 72 non-merge commits during the last 13 day(s) which contain
a total of 171 files changed, 2728 insertions(+), 1143 deletions(-).

The main changes are:

1) Add btf_type_tag attributes to bring kernel annotations like __user/__rcu to
   BTF such that BPF verifier will be able to detect misuse, from Yonghong Song.

2) Big batch of libbpf improvements including various fixes, future proofing APIs,
   and adding a unified, OPTS-based bpf_prog_load() low-level API, from Andrii Nakryiko.

3) Add ingress_ifindex to BPF_SK_LOOKUP program type for selectively applying the
   programmable socket lookup logic to packets from a given netdev, from Mark Pashmfouroush.

4) Remove the 128M upper JIT limit for BPF programs on arm64 and add selftest to
   ensure exception handling still works, from Russell King and Alan Maguire.

5) Add a new bpf_find_vma() helper for tracing to map an address to the backing
   file such as shared library, from Song Liu.

6) Batch of various misc fixes to bpftool, fixing a memory leak in BPF program dump,
   updating documentation and bash-completion among others, from Quentin Monnet.

7) Deprecate libbpf bpf_program__get_prog_info_linear() API and migrate its users as
   the API is heavily tailored around perf and is non-generic, from Dave Marchevsky.

8) Enable libbpf's strict mode by default in bpftool and add a --legacy option as an
   opt-out for more relaxed BPF program requirements, from Stanislav Fomichev.

9) Fix bpftool to use libbpf_get_error() to check for errors, from Hengqi Chen.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
  bpftool: Use libbpf_get_error() to check error
  bpftool: Fix mixed indentation in documentation
  bpftool: Update the lists of names for maps and prog-attach types
  bpftool: Fix indent in option lists in the documentation
  bpftool: Remove inclusion of utilities.mak from Makefiles
  bpftool: Fix memory leak in prog_dump()
  selftests/bpf: Fix a tautological-constant-out-of-range-compare compiler warning
  selftests/bpf: Fix an unused-but-set-variable compiler warning
  bpf: Introduce btf_tracing_ids
  bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs
  bpftool: Enable libbpf's strict mode by default
  docs/bpf: Update documentation for BTF_KIND_TYPE_TAG support
  selftests/bpf: Clarify llvm dependency with btf_tag selftest
  selftests/bpf: Add a C test for btf_type_tag
  selftests/bpf: Rename progs/tag.c to progs/btf_decl_tag.c
  selftests/bpf: Test BTF_KIND_DECL_TAG for deduplication
  selftests/bpf: Add BTF_KIND_TYPE_TAG unit tests
  selftests/bpf: Test libbpf API function btf__add_type_tag()
  bpftool: Support BTF_KIND_TYPE_TAG
  libbpf: Support BTF_KIND_TYPE_TAG
  ...
====================

Link: https://lore.kernel.org/r/20211115162008.25916-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-15 08:49:23 -08:00
Linus Torvalds
8ab7745879 Update to tracing histogram variable string copy
A fix to only copy the size of the field to the histogram string
 did not take into account that the size can be larger than the
 storage.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYZHGYBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qi4RAP9Lr7RqTRQQ3C9BHZfCmIgwZtAqT+Z4
 U+nHva6FcI9ufQEAtWAAHleVHUcfVB90mahMFSEnJ7yESKC3k1ZKXsTsYwo=
 =X961
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Update to tracing histogram variable string copy

  A fix to only copy the size of the field to the histogram string did
  not take into account that the size can be larger than the storage"

* tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add length protection to histogram string copies
2021-11-14 19:07:19 -08:00
Steven Rostedt (VMware)
938aa33f14 tracing: Add length protection to histogram string copies
The string copies to the histogram storage has a max size of 256 bytes
(defined by MAX_FILTER_STR_VAL). Only the string size of the event field
needs to be copied to the event storage, but no more than what is in the
event storage. Although nothing should be bigger than 256 bytes, there's
no protection against overwriting of the storage if one day there is.

Copy no more than the destination size, and enforce it.

Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the
min() comparison of the string sizes of comparable types.

Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/
Link: https://lkml.kernel.org/r/20211114132834.183429a4@rorschach.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 63f84ae6b8 ("tracing/histogram: Do not copy the fixed-size char array field over the field size")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-14 21:21:08 -05:00
Linus Torvalds
7c3737c706 Three tracing fixes:
- Make local osnoise_instances static
 
  - Copy just actual size of histogram strings
 
  - Properly check missing operands in histogram expressions
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYY++DxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qn93AQD9sBFtm7D/90P8KMp/yl75OTd1InGm
 uZPOioR/itFXBwD6A4Y4xbpN0aWByM4P31pqFjZRxY0wmInHw3fkd8EjmQM=
 =LgAs
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Three tracing fixes:

   - Make local osnoise_instances static

   - Copy just actual size of histogram strings

   - Properly check missing operands in histogram expressions"

* tag 'trace-v5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/histogram: Fix check for missing operands in an expression
  tracing/histogram: Do not copy the fixed-size char array field over the field size
  tracing/osnoise: Make osnoise_instances static
2021-11-13 10:11:51 -08:00
Kalesh Singh
1cab6bce42 tracing/histogram: Fix check for missing operands in an expression
If a binary operation is detected while parsing an expression string,
the operand strings are deduced by splitting the experssion string at
the position of the detected binary operator. Both operand strings are
sub-strings (can be empty string) of the expression string but will
never be NULL.

Currently a NULL check is used for missing operands, fix this by
checking for empty strings instead.

Link: https://lkml.kernel.org/r/20211112191324.1302505-1-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Fixes: 9710b2f341 ("tracing: Fix operator precedence for hist triggers expression")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-12 15:55:59 -05:00
Masami Hiramatsu
63f84ae6b8 tracing/histogram: Do not copy the fixed-size char array field over the field size
Do not copy the fixed-size char array field of the events over
the field size. The histogram treats char array as a string and
there are 2 types of char array in the event, fixed-size and
dynamic string. The dynamic string (__data_loc) field must be
null terminated, but the fixed-size char array field may not
be null terminated (not a string, but just a data).
In that case, histogram can copy the data after the field.
This uses the original field size for fixed-size char array
field to restrict the histogram not to access over the original
field size.

Link: https://lkml.kernel.org/r/163673292822.195747.3696966210526410250.stgit@devnote2

Fixes: 02205a6752 (tracing: Add support for 'field variables')
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-12 15:48:46 -05:00
Song Liu
d19ddb476a bpf: Introduce btf_tracing_ids
Similar to btf_sock_ids, btf_tracing_ids provides btf ID for task_struct,
file, and vm_area_struct via easy to understand format like
btf_tracing_ids[BTF_TRACING_TYPE_[TASK|file|VMA]].

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211112150243.1270987-3-songliubraving@fb.com
2021-11-12 10:19:09 -08:00
Daniel Bristot de Oliveira
d7458bc0d8 tracing/osnoise: Make osnoise_instances static
Make the struct list_head osnoise_instances definition static.

Link: https://lore.kernel.org/all/202111120052.ZuikQSJi-lkp@intel.com/
Link: https://lkml.kernel.org/r/d001f0eeac66e2b2eeec7d2a15e9e7abede0453a.1636667971.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Fixes: dae181349f ("tracing/osnoise: Support a list of trace_array *tr")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-12 09:25:59 -05:00
Linus Torvalds
600b18f88f Two tracing fixes:
- Add mutex protection to ring_buffer_reset()
 
 - Fix deadlock in modify_ftrace_direct_multi()
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYY0ivBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhAvAPsHNmAXJ32HuOgVrTCm4WxOSDdukri+
 E5KirCzp0jtQQwEAxwz8neUalfZ8RQyHdpDe9vP9Ay0lCjbfrPxD0DUtiQE=
 =VwcI
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Two locking fixes:

   - Add mutex protection to ring_buffer_reset()

   - Fix deadlock in modify_ftrace_direct_multi()"

* tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/direct: Fix lockup in modify_ftrace_direct_multi
  ring-buffer: Protect ring_buffer_reset() from reentrancy
2021-11-11 10:16:33 -08:00
Jiri Olsa
2e6e9058d1 ftrace/direct: Fix lockup in modify_ftrace_direct_multi
We can't call unregister_ftrace_function under ftrace_lock.

Link: https://lkml.kernel.org/r/20211109114217.1645296-1-jolsa@kernel.org

Fixes: ed29271894 ("ftrace/direct: Do not disable when switching direct callers")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-10 11:56:43 -05:00
Steven Rostedt (VMware)
51d1579466 ring-buffer: Protect ring_buffer_reset() from reentrancy
The resetting of the entire ring buffer use to simply go through and reset
each individual CPU buffer that had its own protection and synchronization.
But this was very slow, due to performing a synchronization for each CPU.
The code was reshuffled to do one disabling of all CPU buffers, followed
by a single RCU synchronization, and then the resetting of each of the CPU
buffers. But unfortunately, the mutex that prevented multiple occurrences
of resetting the buffer was not moved to the upper function, and there is
nothing to protect from it.

Take the ring buffer mutex around the global reset.

Cc: stable@vger.kernel.org
Fixes: b23d7a5f4a ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
Reported-by: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-10 11:56:29 -05:00
Linus Torvalds
59a2ceeef6 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "87 patches.

  Subsystems affected by this patch series: mm (pagecache and hugetlb),
  procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs,
  init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork,
  sysvfs, kcov, gdb, resource, selftests, and ipc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits)
  ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL
  ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
  selftests/kselftest/runner/run_one(): allow running non-executable files
  virtio-mem: disallow mapping virtio-mem memory via /dev/mem
  kernel/resource: disallow access to exclusive system RAM regions
  kernel/resource: clean up and optimize iomem_is_exclusive()
  scripts/gdb: handle split debug for vmlinux
  kcov: replace local_irq_save() with a local_lock_t
  kcov: avoid enable+disable interrupts if !in_task()
  kcov: allocate per-CPU memory on the relevant node
  Documentation/kcov: define `ip' in the example
  Documentation/kcov: include types.h in the example
  sysv: use BUILD_BUG_ON instead of runtime check
  kernel/fork.c: unshare(): use swap() to make code cleaner
  seq_file: fix passing wrong private data
  seq_file: move seq_escape() to a header
  signal: remove duplicate include in signal.h
  crash_dump: remove duplicate include in crash_dump.h
  crash_dump: fix boolreturn.cocci warning
  hfs/hfsplus: use WARN_ON for sanity check
  ...
2021-11-09 10:11:53 -08:00
Kefeng Wang
a20deb3a34 sections: move and rename core_kernel_data() to is_kernel_core_data()
Move core_kernel_data() into sections.h and rename it to
is_kernel_core_data(), also make it return bool value, then update all the
callers.

Link: https://lkml.kernel.org/r/20210930071143.63410-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:50 -08:00
Song Liu
7c7e3d31e7 bpf: Introduce helper bpf_find_vma
In some profiler use cases, it is necessary to map an address to the
backing file, e.g., a shared library. bpf_find_vma helper provides a
flexible way to achieve this. bpf_find_vma maps an address of a task to
the vma (vm_area_struct) for this address, and feed the vma to an callback
BPF function. The callback function is necessary here, as we need to
ensure mmap_sem is unlocked.

It is necessary to lock mmap_sem for find_vma. To lock and unlock mmap_sem
safely when irqs are disable, we use the same mechanism as stackmap with
build_id. Specifically, when irqs are disabled, the unlocked is postponed
in an irq_work. Refactor stackmap.c so that the irq_work is shared among
bpf_find_vma and stackmap helpers.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211105232330.1936330-2-songliubraving@fb.com
2021-11-07 11:54:51 -08:00
Linus Torvalds
e1fd0b2acd Second set of tracing updates for 5.16:
- osnoise and timerlat updates that will work with the RTLA tool (Real-Time
   Linux Analysis). Specifically it disconnects the work load (threads
   that look for latency) from the tracing instances attached to them,
   allowing for more than one instance to retrieve data from the work load.
 
 - Optimization on division in the trace histogram trigger code to use shift
   and multiply when possible. Also added documentation.
 
 - Fix prototype to my_direct_func in direct ftrace trampoline sample code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYKWXxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qqJEAP9czpSZ/nFvDjxdGHZAcKKXCFWbGcK5
 IF2cHDDwxXjZ/gD+NnpRhR1JPfA55fO52DUJPn2cOU5xOsP6DmJxu6mwDg0=
 =AKVv
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:

 - osnoise and timerlat updates that will work with the RTLA tool
   (Real-Time Linux Analysis).

   Specifically it disconnects the work load (threads that look for
   latency) from the tracing instances attached to them, allowing for
   more than one instance to retrieve data from the work load.

 - Optimization on division in the trace histogram trigger code to use
   shift and multiply when possible. Also added documentation.

 - Fix prototype to my_direct_func in direct ftrace trampoline sample
   code.

* tag 'trace-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/samples: Add missing prototype for my_direct_func
  tracing/selftests: Add tests for hist trigger expression parsing
  tracing/histogram: Document hist trigger variables
  tracing/histogram: Update division by 0 documentation
  tracing/histogram: Optimize division by constants
  tracing/osnoise: Remove PREEMPT_RT ifdefs from inside functions
  tracing/osnoise: Remove STACKTRACE ifdefs from inside functions
  tracing/osnoise: Allow multiple instances of the same tracer
  tracing/osnoise: Remove TIMERLAT ifdefs from inside functions
  tracing/osnoise: Support a list of trace_array *tr
  tracing/osnoise: Use start/stop_per_cpu_kthreads() on osnoise_cpus_write()
  tracing/osnoise: Split workload start from the tracer start
  tracing/osnoise: Improve comments about barrier need for NMI callbacks
  tracing/osnoise: Do not follow tracing_cpumask
2021-11-03 09:08:47 -07:00
Linus Torvalds
fc02cb2b37 Core:
- Remove socket skb caches
 
  - Add a SO_RESERVE_MEM socket op to forward allocate buffer space
    and avoid memory accounting overhead on each message sent
 
  - Introduce managed neighbor entries - added by control plane and
    resolved by the kernel for use in acceleration paths (BPF / XDP
    right now, HW offload users will benefit as well)
 
  - Make neighbor eviction on link down controllable by userspace
    to work around WiFi networks with bad roaming implementations
 
  - vrf: Rework interaction with netfilter/conntrack
 
  - fq_codel: implement L4S style ce_threshold_ect1 marking
 
  - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
 
 BPF:
 
  - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging
    as implemented in LLVM14
 
  - Introduce bpf_get_branch_snapshot() to capture Last Branch Records
 
  - Implement variadic trace_printk helper
 
  - Add a new Bloomfilter map type
 
  - Track <8-byte scalar spill and refill
 
  - Access hw timestamp through BPF's __sk_buff
 
  - Disallow unprivileged BPF by default
 
  - Document BPF licensing
 
 Netfilter:
 
  - Introduce egress hook for looking at raw outgoing packets
 
  - Allow matching on and modifying inner headers / payload data
 
  - Add NFT_META_IFTYPE to match on the interface type either from
    ingress or egress
 
 Protocols:
 
  - Multi-Path TCP:
    - increase default max additional subflows to 2
    - rework forward memory allocation
    - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS
 
  - MCTP flow support allowing lower layer drivers to configure msg
    muxing as needed
 
  - Automatic Multicast Tunneling (AMT) driver based on RFC7450
 
  - HSR support the redbox supervision frames (IEC-62439-3:2018)
 
  - Support for the ip6ip6 encapsulation of IOAM
 
  - Netlink interface for CAN-FD's Transmitter Delay Compensation
 
  - Support SMC-Rv2 eliminating the current same-subnet restriction,
    by exploiting the UDP encapsulation feature of RoCE adapters
 
  - TLS: add SM4 GCM/CCM crypto support
 
  - Bluetooth: initial support for link quality and audio/codec
    offload
 
 Driver APIs:
 
  - Add a batched interface for RX buffer allocation in AF_XDP
    buffer pool
 
  - ethtool: Add ability to control transceiver modules' power mode
 
  - phy: Introduce supported interfaces bitmap to express MAC
    capabilities and simplify PHY code
 
  - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks
 
 New drivers:
 
  - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89)
 
  - Ethernet driver for ASIX AX88796C SPI device (x88796c)
 
 Drivers:
 
  - Broadcom PHYs
    - support 72165, 7712 16nm PHYs
    - support IDDQ-SR for additional power savings
 
  - PHY support for QCA8081, QCA9561 PHYs
 
  - NXP DPAA2: support for IRQ coalescing
 
  - NXP Ethernet (enetc): support for software TCP segmentation
 
  - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of
    Gigabit-capable IP found on RZ/G2L SoC
 
  - Intel 100G Ethernet
    - support for eswitch offload of TC/OvS flow API, including
      offload of GRE, VxLAN, Geneve tunneling
    - support application device queues - ability to assign Rx and Tx
      queues to application threads
    - PTP and PPS (pulse-per-second) extensions
 
  - Broadcom Ethernet (bnxt)
    - devlink health reporting and device reload extensions
 
  - Mellanox Ethernet (mlx5)
    - offload macvlan interfaces
    - support HW offload of TC rules involving OVS internal ports
    - support HW-GRO and header/data split
    - support application device queues
 
  - Marvell OcteonTx2:
    - add XDP support for PF
    - add PTP support for VF
 
  - Qualcomm Ethernet switch (qca8k): support for QCA8328
 
  - Realtek Ethernet DSA switch (rtl8366rb)
    - support bridge offload
    - support STP, fast aging, disabling address learning
    - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch
 
  - Mellanox Ethernet/IB switch (mlxsw)
    - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping)
    - offload root TBF qdisc as port shaper
    - support multiple routing interface MAC address prefixes
    - support for IP-in-IP with IPv6 underlay
 
  - MediaTek WiFi (mt76)
    - mt7921 - ASPM, 6GHz, SDIO and testmode support
    - mt7915 - LED and TWT support
 
  - Qualcomm WiFi (ath11k)
    - include channel rx and tx time in survey dump statistics
    - support for 80P80 and 160 MHz bandwidths
    - support channel 2 in 6 GHz band
    - spectral scan support for QCN9074
    - support for rx decapsulation offload (data frames in 802.3
      format)
 
  - Qualcomm phone SoC WiFi (wcn36xx)
    - enable Idle Mode Power Save (IMPS) to reduce power consumption
      during idle
 
  - Bluetooth driver support for MediaTek MT7922 and MT7921
 
  - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x
    and Realtek 8822C/8852A
 
  - Microsoft vNIC driver (mana)
    - support hibernation and kexec
 
  - Google vNIC driver (gve)
    - support for jumbo frames
    - implement Rx page reuse
 
 Refactor:
 
  - Make all writes to netdev->dev_addr go thru helpers, so that we
    can add this address to the address rbtree and handle the updates
 
  - Various TCP cleanups and optimizations including improvements
    to CPU cache use
 
  - Simplify the gnet_stats, Qdisc stats' handling and remove
    qdisc->running sequence counter
 
  - Driver changes and API updates to address devlink locking
    deficiencies
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGAzX4ACgkQMUZtbf5S
 IrvW3g//Q0ZLrOuHK9pZ8sCXMMhDj8qL6ajm0otMddHWA/+1UglwVBKFhsajfxOf
 wJ/5LZis+XKLpLqKTU5chKVfn39HuDGe/D3l+egi01Gv5BW0+XzEhagfyR5tJX5z
 wsGG5CXO/we/laVSzRiFtwwVEKHKN20YC+tIQwYOYP5Wy3q4G7qDsFhT7GqgsGCS
 n74QUEAIB5Tz0ODWFqLtbsySzIurXrskibwt5T9bvAAlPw/lCU68mmG+NVJ7VddO
 lBbNkLMOo8yW9Ci20H09SrYd4jZTmMARo9tsFO1tAvAMk7qpn0Wd8pnOYTjFFoMD
 +qjiFSVMh7E0JGb8Y7NCvwaB99suAK5rfGP68Xwe62DfP7vYWEx4pZGxBP19F4ld
 6Kn1ME33BX9rUF9tBecf0bdKfJUwB2Q2Xou/b9laG04bwiqsc9iG5FQq1C46lnLZ
 QdzNiS1My4dJMczkWt66HF3Kx30ibwHfvKMIHjf4PqkzEatkv6Y6SBZ57KXL+Lde
 0BQSFhbf0tm2Gf55etzrczLElI3uqHSFWUNZZ2Bt6WmzO1e6tpV9nAtRWF4C/dFg
 QDpLJtOOOY65uq+qz09zoPfv2lem868SrCAuFrVn99bEpYjx/CGNFDeEI02l6jyr
 84eUxd364UcbIk3fc+eTGdXHLQNVk30G0AHVBBxaWNIidwfqXeE=
 =srde
 -----END PGP SIGNATURE-----

Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Remove socket skb caches

   - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and
     avoid memory accounting overhead on each message sent

   - Introduce managed neighbor entries - added by control plane and
     resolved by the kernel for use in acceleration paths (BPF / XDP
     right now, HW offload users will benefit as well)

   - Make neighbor eviction on link down controllable by userspace to
     work around WiFi networks with bad roaming implementations

   - vrf: Rework interaction with netfilter/conntrack

   - fq_codel: implement L4S style ce_threshold_ect1 marking

   - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap()

  BPF:

   - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging
     as implemented in LLVM14

   - Introduce bpf_get_branch_snapshot() to capture Last Branch Records

   - Implement variadic trace_printk helper

   - Add a new Bloomfilter map type

   - Track <8-byte scalar spill and refill

   - Access hw timestamp through BPF's __sk_buff

   - Disallow unprivileged BPF by default

   - Document BPF licensing

  Netfilter:

   - Introduce egress hook for looking at raw outgoing packets

   - Allow matching on and modifying inner headers / payload data

   - Add NFT_META_IFTYPE to match on the interface type either from
     ingress or egress

  Protocols:

   - Multi-Path TCP:
      - increase default max additional subflows to 2
      - rework forward memory allocation
      - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS

   - MCTP flow support allowing lower layer drivers to configure msg
     muxing as needed

   - Automatic Multicast Tunneling (AMT) driver based on RFC7450

   - HSR support the redbox supervision frames (IEC-62439-3:2018)

   - Support for the ip6ip6 encapsulation of IOAM

   - Netlink interface for CAN-FD's Transmitter Delay Compensation

   - Support SMC-Rv2 eliminating the current same-subnet restriction, by
     exploiting the UDP encapsulation feature of RoCE adapters

   - TLS: add SM4 GCM/CCM crypto support

   - Bluetooth: initial support for link quality and audio/codec offload

  Driver APIs:

   - Add a batched interface for RX buffer allocation in AF_XDP buffer
     pool

   - ethtool: Add ability to control transceiver modules' power mode

   - phy: Introduce supported interfaces bitmap to express MAC
     capabilities and simplify PHY code

   - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks

  New drivers:

   - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89)

   - Ethernet driver for ASIX AX88796C SPI device (x88796c)

  Drivers:

   - Broadcom PHYs
      - support 72165, 7712 16nm PHYs
      - support IDDQ-SR for additional power savings

   - PHY support for QCA8081, QCA9561 PHYs

   - NXP DPAA2: support for IRQ coalescing

   - NXP Ethernet (enetc): support for software TCP segmentation

   - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of
     Gigabit-capable IP found on RZ/G2L SoC

   - Intel 100G Ethernet
      - support for eswitch offload of TC/OvS flow API, including
        offload of GRE, VxLAN, Geneve tunneling
      - support application device queues - ability to assign Rx and Tx
        queues to application threads
      - PTP and PPS (pulse-per-second) extensions

   - Broadcom Ethernet (bnxt)
      - devlink health reporting and device reload extensions

   - Mellanox Ethernet (mlx5)
      - offload macvlan interfaces
      - support HW offload of TC rules involving OVS internal ports
      - support HW-GRO and header/data split
      - support application device queues

   - Marvell OcteonTx2:
      - add XDP support for PF
      - add PTP support for VF

   - Qualcomm Ethernet switch (qca8k): support for QCA8328

   - Realtek Ethernet DSA switch (rtl8366rb)
      - support bridge offload
      - support STP, fast aging, disabling address learning
      - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch

   - Mellanox Ethernet/IB switch (mlxsw)
      - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping)
      - offload root TBF qdisc as port shaper
      - support multiple routing interface MAC address prefixes
      - support for IP-in-IP with IPv6 underlay

   - MediaTek WiFi (mt76)
      - mt7921 - ASPM, 6GHz, SDIO and testmode support
      - mt7915 - LED and TWT support

   - Qualcomm WiFi (ath11k)
      - include channel rx and tx time in survey dump statistics
      - support for 80P80 and 160 MHz bandwidths
      - support channel 2 in 6 GHz band
      - spectral scan support for QCN9074
      - support for rx decapsulation offload (data frames in 802.3
        format)

   - Qualcomm phone SoC WiFi (wcn36xx)
      - enable Idle Mode Power Save (IMPS) to reduce power consumption
        during idle

   - Bluetooth driver support for MediaTek MT7922 and MT7921

   - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and
     Realtek 8822C/8852A

   - Microsoft vNIC driver (mana)
      - support hibernation and kexec

   - Google vNIC driver (gve)
      - support for jumbo frames
      - implement Rx page reuse

  Refactor:

   - Make all writes to netdev->dev_addr go thru helpers, so that we can
     add this address to the address rbtree and handle the updates

   - Various TCP cleanups and optimizations including improvements to
     CPU cache use

   - Simplify the gnet_stats, Qdisc stats' handling and remove
     qdisc->running sequence counter

   - Driver changes and API updates to address devlink locking
     deficiencies"

* tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits)
  Revert "net: avoid double accounting for pure zerocopy skbs"
  selftests: net: add arp_ndisc_evict_nocarrier
  net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
  net: arp: introduce arp_evict_nocarrier sysctl parameter
  libbpf: Deprecate AF_XDP support
  kbuild: Unify options for BTF generation for vmlinux and modules
  selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
  bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
  bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
  net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c
  net: avoid double accounting for pure zerocopy skbs
  tcp: rename sk_wmem_free_skb
  netdevsim: fix uninit value in nsim_drv_configure_vfs()
  selftests/bpf: Fix also no-alu32 strobemeta selftest
  bpf: Add missing map_delete_elem method to bloom filter map
  selftests/bpf: Add bloom map success test for userspace calls
  bpf: Add alignment padding for "map_extra" + consolidate holes
  bpf: Bloom filter map naming fixups
  selftests/bpf: Add test cases for struct_ops prog
  bpf: Add dummy BPF STRUCT_OPS for test purpose
  ...
2021-11-02 06:20:58 -07:00
Linus Torvalds
79ef0c0014 Tracing updates for 5.16:
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
   dump happens from a kretprobe callback.
 
 - Fix to bootconfig parsing
 
 - Have tracefs allow owner and group permissions by default (only denying
   others). There's been pressure to allow non root to tracefs in a
   controlled fashion, and using groups is probably the safest.
 
 - Bootconfig memory managament updates.
 
 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.
 
 - Allow perf to be traced by function tracer.
 
 - Rewrite of function graph tracer to be a callback from the function tracer
   instead of having its own trampoline (this change will happen on an arch
   by arch basis, and currently only x86_64 implements it).
 
 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.
 
 - Allow histogram triggers to add variables that can perform calculations
   against the event's fields.
 
 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent warnings
   from the compiler.
 
 - Extend histogram triggers to key off of variables.
 
 - Have trace recursion use bit magic to determine preempt context over if
   branches.
 
 - Have trace recursion disable preemption as all use cases do anyway.
 
 - Added testing for verification of tracing utilities.
 
 - Various small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
 Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
 =Loz8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - kprobes: Restructured stack unwinder to show properly on x86 when a
   stack dump happens from a kretprobe callback.

 - Fix to bootconfig parsing

 - Have tracefs allow owner and group permissions by default (only
   denying others). There's been pressure to allow non root to tracefs
   in a controlled fashion, and using groups is probably the safest.

 - Bootconfig memory managament updates.

 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.

 - Allow perf to be traced by function tracer.

 - Rewrite of function graph tracer to be a callback from the function
   tracer instead of having its own trampoline (this change will happen
   on an arch by arch basis, and currently only x86_64 implements it).

 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.

 - Allow histogram triggers to add variables that can perform
   calculations against the event's fields.

 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent
   warnings from the compiler.

 - Extend histogram triggers to key off of variables.

 - Have trace recursion use bit magic to determine preempt context over
   if branches.

 - Have trace recursion disable preemption as all use cases do anyway.

 - Added testing for verification of tracing utilities.

 - Various small clean ups and fixes.

* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
  tracing/histogram: Fix semicolon.cocci warnings
  tracing/histogram: Fix documentation inline emphasis warning
  tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
  tracing: Show size of requested perf buffer
  bootconfig: Initialize ret in xbc_parse_tree()
  ftrace: do CPU checking after preemption disabled
  ftrace: disable preemption when recursion locked
  tracing/histogram: Document expression arithmetic and constants
  tracing/histogram: Optimize division by a power of 2
  tracing/histogram: Covert expr to const if both operands are constants
  tracing/histogram: Simplify handling of .sym-offset in expressions
  tracing: Fix operator precedence for hist triggers expression
  tracing: Add division and multiplication support for hist triggers
  tracing: Add support for creating hist trigger variables from literal
  selftests/ftrace: Stop tracing while reading the trace file by default
  MAINTAINERS: Update KPROBES and TRACING entries
  test_kprobes: Move it from kernel/ to lib/
  docs, kprobes: Remove invalid URL and add new reference
  samples/kretprobes: Fix return value if register_kretprobe() failed
  lib/bootconfig: Fix the xbc_get_info kerneldoc
  ...
2021-11-01 20:05:19 -07:00
Jakub Kicinski
b7b98f8689 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-11-01

We've added 181 non-merge commits during the last 28 day(s) which contain
a total of 280 files changed, 11791 insertions(+), 5879 deletions(-).

The main changes are:

1) Fix bpf verifier propagation of 64-bit bounds, from Alexei.

2) Parallelize bpf test_progs, from Yucong and Andrii.

3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus.

4) Improve bpf selftests on s390, from Ilya.

5) bloomfilter bpf map type, from Joanne.

6) Big improvements to JIT tests especially on Mips, from Johan.

7) Support kernel module function calls from bpf, from Kumar.

8) Support typeless and weak ksym in light skeleton, from Kumar.

9) Disallow unprivileged bpf by default, from Pawan.

10) BTF_KIND_DECL_TAG support, from Yonghong.

11) Various bpftool cleanups, from Quentin.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits)
  libbpf: Deprecate AF_XDP support
  kbuild: Unify options for BTF generation for vmlinux and modules
  selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
  bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
  bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
  selftests/bpf: Fix also no-alu32 strobemeta selftest
  bpf: Add missing map_delete_elem method to bloom filter map
  selftests/bpf: Add bloom map success test for userspace calls
  bpf: Add alignment padding for "map_extra" + consolidate holes
  bpf: Bloom filter map naming fixups
  selftests/bpf: Add test cases for struct_ops prog
  bpf: Add dummy BPF STRUCT_OPS for test purpose
  bpf: Factor out helpers for ctx access checking
  bpf: Factor out a helper to prepare trampoline for struct_ops prog
  selftests, bpf: Fix broken riscv build
  riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h
  tools, build: Add RISC-V to HOSTARCH parsing
  riscv, bpf: Increase the maximum number of iterations
  selftests, bpf: Add one test for sockmap with strparser
  selftests, bpf: Fix test_txmsg_ingress_parser error
  ...
====================

Link: https://lore.kernel.org/r/20211102013123.9005-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01 19:59:46 -07:00
Kalesh Singh
6a6e5ef2b2 tracing/histogram: Document hist trigger variables
Update the tracefs README to describe how hist trigger variables
can be created.

Link: https://lkml.kernel.org/r/20211029183339.3216491-4-kaleshsingh@google.com

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 20:46:52 -04:00
Kalesh Singh
8b5d46fd7a tracing/histogram: Optimize division by constants
If the divisor is a constant use specific division functions to
avoid extra branches when the trigger is hit.

If the divisor constant but not a power of 2, the division can be
replaced with a multiplication and shift in the following case:

Let X = dividend and Y = divisor.

Choose Z = some power of 2. If Y <= Z, then:
    X / Y = (X * (Z / Y)) / Z

(Z / Y) is a constant (mult) which is calculated at parse time, so:
    X / Y = (X * mult) / Z

The division by Z can be replaced by a shift since Z is a power of 2:
    X / Y = (X * mult) >> shift

As long, as X < Z the results will not be off by more than 1.

Link: https://lkml.kernel.org/r/20211029232410.3494196-1-kaleshsingh@google.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 20:45:11 -04:00
Daniel Bristot de Oliveira
01e181c776 tracing/osnoise: Remove PREEMPT_RT ifdefs from inside functions
Remove CONFIG_PREEMPT_RT from inside functions, avoiding
compilation problems in the future.

Link: https://lkml.kernel.org/r/37ee0881b033cdc513efc84ebea26cf77880c8c2.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:37 -04:00
Daniel Bristot de Oliveira
b14f4568d3 tracing/osnoise: Remove STACKTRACE ifdefs from inside functions
Remove CONFIG_STACKTRACE from inside functions, avoiding
compilation problems in the future.

Link: https://lkml.kernel.org/r/3465cca2f28e1ba602a1fc8bdb28d12950b5226e.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:26 -04:00
Daniel Bristot de Oliveira
2fac8d6486 tracing/osnoise: Allow multiple instances of the same tracer
Currently, the user can start only one instance of timerlat/osnoise
tracers and the tracers cannot run in parallel.

As starting point to add more flexibility, let's allow the same tracer to
run on different trace instances. The workload will start when the first
trace_array (instance) is registered and stop when the last instance
is unregistered.

So, while this patch allows the same tracer to run in multiple
instances (e.g., two instances running osnoise), it still does not allow
instances of timerlat and osnoise in parallel (e.g., one timerlat and
osnoise). That is because the osnoise: events have different behavior
depending on which tracer is enabled (osnoise or timerlat). Enabling
the parallel usage of these two tracers is in my TODO list.

Link: https://lkml.kernel.org/r/38c8f14b613492a4f3f938d9d3bf0b063b72f0f0.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:19 -04:00
Daniel Bristot de Oliveira
ccb6754495 tracing/osnoise: Remove TIMERLAT ifdefs from inside functions
Remove CONFIG_TIMERLAT_TRACER from inside functions, avoiding
compilation problems in the future.

Link: https://lkml.kernel.org/r/8245abb5a112d249f5da6c1df499244ad9e647bc.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:15 -04:00
Daniel Bristot de Oliveira
dae181349f tracing/osnoise: Support a list of trace_array *tr
osnoise/timerlat were built to run a single instance, and for this,
a single variable is enough to store the current struct trace_array
*tr with information about the tracing instance. This is done via
the *osnoise_trace variable. A trace_array represents a trace instance.

In preparation to support multiple instances, replace the
*osnoise_trace variable with an RCU protected list of instances.

The operations that refer to an instance now propagate to all
elements of the list (all instances).

Also, replace the osnoise_busy variable with a check if the list
has elements (busy).

No functional change is expected with this patch, i.e., only one
instance is allowed yet.

Link: https://lkml.kernel.org/r/91d006e889b9a5d1ff258fe6077f021ae3f26372.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:09 -04:00
Daniel Bristot de Oliveira
2bd1bdf01f tracing/osnoise: Use start/stop_per_cpu_kthreads() on osnoise_cpus_write()
When writing a new CPU mask via osnoise/cpus, if the tracer is running,
the workload is restarted to follow the new cpumask. The restart is
currently done using osnoise_workload_start/stop(), which disables the
workload *and* the instrumentation. However, disabling the
instrumentation is not necessary.

Calling start/stop_per_cpu_kthreads() is enough to apply the new
osnoise/cpus config.

Link: https://lkml.kernel.org/r/ee633e82867c5b88851aa6040522a799c0034486.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:05 -04:00
Daniel Bristot de Oliveira
15ca4bdb03 tracing/osnoise: Split workload start from the tracer start
In preparation from supporting multiple trace instances, create
workload start/stop specific functions.

No functional change.

Link: https://lkml.kernel.org/r/74b090971e9acdd13625be1c28ef3270d2275e77.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:00 -04:00
Daniel Bristot de Oliveira
c3b6343c0d tracing/osnoise: Improve comments about barrier need for NMI callbacks
trace_osnoise_callback_enabled is used by ftrace_nmi_enter/exit()
to know when to call the NMI callback. The barrier is used to
avoid having callbacks enabled before the resetting date during
the start or to touch the values after stopping the tracer.

Link: https://lkml.kernel.org/r/a413b8f14aa9312fbd1ba99f96225a8aed831053.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:09:52 -04:00
Daniel Bristot de Oliveira
66df27f19f tracing/osnoise: Do not follow tracing_cpumask
In preparation to support multiple instances, decouple the
osnoise/timelat workload from instance-specific tracing_cpumask.

Different instances can have conflicting cpumasks, making osnoise
workload management needlessly complex. Osnoise already has its
global cpumask.

I also thought about using the first instance mask, but the
"first" instance could be removed before the others.

This also fixes the problem that changing the tracing_mask was not
re-starting the trace.

Link: https://lkml.kernel.org/r/169a71bcc919ce3ab53ae6f9ca5cde57fffaf9c6.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:09:31 -04:00
Hou Tao
35346ab641 bpf: Factor out helpers for ctx access checking
Factor out two helpers to check the read access of ctx for raw tp
and BTF function. bpf_tracing_ctx_access() is used to check
the read access to argument is valid, and bpf_tracing_btf_ctx_access()
checks whether the btf type of argument is valid besides the checking
of argument read. bpf_tracing_btf_ctx_access() will be used by the
following patch.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211025064025.2567443-3-houtao1@huawei.com
2021-11-01 14:10:00 -07:00
Linus Torvalds
33c8846c81 for-5.16/block-2021-10-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8KDgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmQ2D/wO0nH3U+3+OZChi3XUwYck9Dev3o6BANCF
 ClATiK/kivZY0xY1r8J4ixirZo2gcjIMpWSC3JGYZ5LdspfmYGLUbMjfZsaeU23i
 lAKaX1IqfArmHN76k3IU1bKCg7B0/LFwC0q9QTFWTSwNSs8RK/EZLJ61U1hEXUb3
 OfIpaMmvPiMaU7yuPqhcZK14m1cg1srrLM4rFB/PqsWWStF07pHq32WeArGDAU0e
 Fe0YSnYD7qqA5Qc37KwqjCTmmxKX5YZf7etIcA6p3DNmwcuQrVNzKoCH/ZEDijaD
 E2bS/BWbN1x96+rtoEZfBYEaNIrkmJzmW6+fJ53OITbJF3KqP6V66erhqNcFYCzC
 mhFlRe7voXb/8AP7zQqSIhK529BUBM36sQ6nF7EiQcDrfLc1z39mq6eblUxbknIA
 DDPISD5Tseik9N9x0bc7vINseKyHI1E90VAU/XKADcuGbzLvehPx+2p+Iq5ch5Ah
 oa1G3RdlWWQOZxphJHWJhu1qMfo5+FP9dFZj1aoo7b8Kbc/CedyoQe71cpIE5wNh
 Jj/EpWJnuyKXwuTic2VYGC+6ezM9O5DSdqCfP3YuZky95VESyvRCKJYMMgBYRVdC
 /LuxhnBXIY2G8An7ZTnX0kLCCvLbapIwa0NyA98/xeOngO843coJ6wn8ZmE9LJNH
 kMmpCygUrA==
 =QWC+
 -----END PGP SIGNATURE-----

Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - mq-deadline accounting improvements (Bart)

 - blk-wbt timer fix (Andrea)

 - Untangle the block layer includes (Christoph)

 - Rework the poll support to be bio based, which will enable adding
   support for polling for bio based drivers (Christoph)

 - Block layer core support for multi-actuator drives (Damien)

 - blk-crypto improvements (Eric)

 - Batched tag allocation support (me)

 - Request completion batching support (me)

 - Plugging improvements (me)

 - Shared tag set improvements (John)

 - Concurrent queue quiesce support (Ming)

 - Cache bdev in ->private_data for block devices (Pavel)

 - bdev dio improvements (Pavel)

 - Block device invalidation and block size improvements (Xie)

 - Various cleanups, fixes, and improvements (Christoph, Jackie,
   Masahira, Tejun, Yu, Pavel, Zheng, me)

* tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits)
  blk-mq-debugfs: Show active requests per queue for shared tags
  block: improve readability of blk_mq_end_request_batch()
  virtio-blk: Use blk_validate_block_size() to validate block size
  loop: Use blk_validate_block_size() to validate block size
  nbd: Use blk_validate_block_size() to validate block size
  block: Add a helper to validate the block size
  block: re-flow blk_mq_rq_ctx_init()
  block: prefetch request to be initialized
  block: pass in blk_mq_tags to blk_mq_rq_ctx_init()
  block: add rq_flags to struct blk_mq_alloc_data
  block: add async version of bio_set_polled
  block: kill DIO_MULTI_BIO
  block: kill unused polling bits in __blkdev_direct_IO()
  block: avoid extra iter advance with async iocb
  block: Add independent access ranges support
  blk-mq: don't issue request directly in case that current is to be blocked
  sbitmap: silence data race warning
  blk-cgroup: synchronize blkg creation against policy deactivation
  block: refactor bio_iov_bvec_set()
  block: add single bio async direct IO helper
  ...
2021-11-01 09:19:50 -07:00
kernel test robot
feea69ec12 tracing/histogram: Fix semicolon.cocci warnings
kernel/trace/trace_events_hist.c:6039:2-3: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Link: https://lkml.kernel.org/r/20211030005615.GA41257@3074f0d39c61

Fixes: c5eac6ee8b ("tracing/histogram: Simplify handling of .sym-offset in expressions")
CC: Kalesh Singh <kaleshsingh@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29 22:12:38 -04:00
Steven Rostedt (VMware)
ddcf906fe5 tracing: Fix misspelling of "missing"
My snake instinct was on and I wrote "misssing" instead of "missing".

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29 09:54:14 -04:00
Steven Rostedt (VMware)
6130722f11 ftrace: Fix kernel-doc formatting issues
Some functions had kernel-doc that used a comma instead of a hash to
separate the function name from the one line description.

Also, the "ftrace_is_dead()" had an incomplete description.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29 09:52:23 -04:00
Jakub Kicinski
7df621a3ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  7b50ecfcc6 ("net: Rename ->stream_memory_read to ->sock_is_readable")
  4c1e34c0db ("vsock: Enable y2038 safe timeval for timeout")

drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
  0daa55d033 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
  e77bcdd1f6 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.")

Adjacent code addition in both cases, keep both.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 10:43:58 -07:00
Steven Rostedt (VMware)
7fa598f970 tracing: Do not warn when connecting eprobe to non existing event
When the syscall trace points are not configured in, the kselftests for
ftrace will try to attach an event probe (eprobe) to one of the system
call trace points. This triggered a WARNING, because the failure only
expects to see memory issues. But this is not the only failure. The user
may attempt to attach to a non existent event, and the kernel must not
warn about it.

Link: https://lkml.kernel.org/r/20211027120854.0680aa0f@gandalf.local.home

Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 21:47:55 -04:00
Robin H. Johnson
a90afe8d02 tracing: Show size of requested perf buffer
If the perf buffer isn't large enough, provide a hint about how large it
needs to be for whatever is running.

Link: https://lkml.kernel.org/r/20210831043723.13481-1-robbat2@gentoo.org

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 12:25:09 -04:00
王贇
d33cc65737 ftrace: do CPU checking after preemption disabled
With CONFIG_DEBUG_PREEMPT we observed reports like:

  BUG: using smp_processor_id() in preemptible
  caller is perf_ftrace_function_call+0x6f/0x2e0
  CPU: 1 PID: 680 Comm: a.out Not tainted
  Call Trace:
   <TASK>
   dump_stack_lvl+0x8d/0xcf
   check_preemption_disabled+0x104/0x110
   ? optimize_nops.isra.7+0x230/0x230
   ? text_poke_bp_batch+0x9f/0x310
   perf_ftrace_function_call+0x6f/0x2e0
   ...
   __text_poke+0x5/0x620
   text_poke_bp_batch+0x9f/0x310

This telling us the CPU could be changed after task is preempted, and
the checking on CPU before preemption will be invalid.

Since now ftrace_test_recursion_trylock() will help to disable the
preemption, this patch just do the checking after trylock() to address
the issue.

Link: https://lkml.kernel.org/r/54880691-5fe2-33e7-d12f-1fa6136f5183@linux.alibaba.com

CC: Steven Rostedt <rostedt@goodmis.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 11:22:09 -04:00
王贇
ce5e48036c ftrace: disable preemption when recursion locked
As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.

And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption disabled, we
can just merge the logical.

This patch will make sure the preemption has been disabled when
trace_test_and_set_recursion() return bit >= 0, and
trace_clear_recursion() will enable the preemption if previously
enabled.

Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com

CC: Petr Mladek <pmladek@suse.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
[ Removed extra line in comment - SDR ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 11:21:49 -04:00
Kalesh Singh
722eddaa40 tracing/histogram: Optimize division by a power of 2
The division is a slow operation. If the divisor is a power of 2, use a
shift instead.

Results were obtained using Android's version of perf (simpleperf[1]) as
described below:

1. hist_field_div() is modified to call 2 test functions:
   test_hist_field_div_[not]_optimized(); passing them the
   same args. Use noinline and volatile to ensure these are
   not optimized out by the compiler.
2. Create a hist event trigger that uses division:
      events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>'
         >> trigger
      events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x'
         >> trigger
3. Run Android's lmkd_test[2] to generate rss_stat events, and
   record CPU samples with Android's simpleperf:
      simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g
         -f 2000 -o perf.data

== Results ==

Divisor is a power of 2 (divisor == 32):

   test_hist_field_div_not_optimized  | 8,717,091 cpu-cycles
   test_hist_field_div_optimized      | 1,643,137 cpu-cycles

If the divisor is a power of 2, the optimized version is ~5.3x faster.

Divisor is not a power of 2 (divisor == 33):

   test_hist_field_div_not_optimized  | 4,444,324 cpu-cycles
   test_hist_field_div_optimized      | 5,497,958 cpu-cycles

If the divisor is not a power of 2, as expected, the optimized version is
slightly slower (~24% slower).

[1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md
[2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp

Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:20 -04:00
Kalesh Singh
f47716b7a9 tracing/histogram: Covert expr to const if both operands are constants
If both operands of a hist trigger expression are constants, convert the
expression to a constant. This optimization avoids having to perform the
same calculation multiple times and also saves on memory since the
merged constants are represented by a single struct hist_field instead
or multiple.

Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Kalesh Singh
c5eac6ee8b tracing/histogram: Simplify handling of .sym-offset in expressions
The '-' in .sym-offset can confuse the hist trigger arithmetic
expression parsing. Simplify the handling of this by replacing the
'sym-offset' with 'symXoffset'. This allows us to correctly evaluate
expressions where the user may have inadvertently added a .sym-offset
modifier to one of the operands in an expression, instead of bailing
out. In this case the .sym-offset has no effect on the evaluation of the
expression. The only valid use of the .sym-offset is as a hist key
modifier.

Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Kalesh Singh
9710b2f341 tracing: Fix operator precedence for hist triggers expression
The current histogram expression evaluation logic evaluates the
expression from right to left. This can lead to incorrect results
if the operations are not associative (as is the case for subtraction
and, the now added, division operators).
	e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2
	     64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2

Division and multiplication are currently limited to single operation
expression due to operator precedence support not yet implemented.

Rework the expression parsing to support the correct evaluation of
expressions containing operators of different precedences; and fix
the associativity error by evaluating expressions with operators of
the same precedence from left to right.

Examples:
        (1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \
                  >> event/trigger
        (2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger
        (3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger
        (4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger

Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Kalesh Singh
bcef044150 tracing: Add division and multiplication support for hist triggers
Adds basic support for division and multiplication operations for
hist trigger variable expressions.

For simplicity this patch only supports, division and multiplication
for a single operation expression (e.g. x=$a/$b), as currently
expressions are always evaluated right to left. This can lead to some
incorrect results:

	e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger

	     8-4-2 should evaluate to 2 i.e. (8-4)-2
	     but currently x evaluate to  6 i.e. 8-(4-2)

Multiplication and division in sub-expressions will work correctly, once
correct operator precedence support is added (See next patch in this
series).

For the undefined case of division by 0, the histogram expression
evaluates to (u64)(-1). Since this cannot be detected when the
expression is created, it is the responsibility of the user to be
aware and account for this possibility.

Examples:
	echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \
                   >> event/trigger

	echo 'hist:keys=common_pid:y=5*$b' \
                   >> event/trigger

Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Kalesh Singh
52cfb37353 tracing: Add support for creating hist trigger variables from literal
Currently hist trigger expressions don't support the use of numeric
literals:
	e.g. echo 'hist:keys=common_pid:x=$y-1234'
		--> is not valid expression syntax

Having the ability to use numeric constants in hist triggers supports
a wider range of expressions for creating variables.

Add support for creating trace event histogram variables from numeric
literals.

	e.g. echo 'hist:keys=common_pid:x=1234,y=size-1024' >> event/trigger

A negative numeric constant is created, using unary minus operator
(parentheses are required).

	e.g. echo 'hist:keys=common_pid:z=-(2)' >> event/trigger

Constants can be used with division/multiplication (added in the
next patch in this series) to implement granularity filters for frequent
trace events. For instance we can limit emitting the rss_stat
trace event to when there is a 512KB cross over in the rss size:

  # Create a synthetic event to monitor instead of the high frequency
  # rss_stat event
  echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr;
	int member; long size' >> tracing/synthetic_events

  # Create a hist trigger that emits the synthetic rss_stat_throttled
  # event only when the rss size crosses a 512KB boundary.
  echo 'hist:keys=keys=mm_id,member:bucket=size/0x80000:onchange($bucket)
      .rss_stat_throttled(mm_id,curr,member,size)'
        >> events/kmem/rss_stat/trigger

A use case for using constants with addition/subtraction is not yet
known, but for completeness the use of constants are supported for all
operators.

Link: https://lkml.kernel.org/r/20211025200852.3002369-2-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Wang ShaoBo
1d62889142 tracing/hwlat: Make some internal symbols static
The sparse tool complains as follows:

kernel/trace/trace_hwlat.c:82:27: warning: symbol 'hwlat_single_cpu_data' was not declared. Should it be static?
kernel/trace/trace_hwlat.c:83:1: warning: symbol '__pcpu_scope_hwlat_per_cpu_data' was not declared. Should it be static?

This symbol is not used outside of trace_hwlat.c, so this commit
marks it static.

Link: https://lkml.kernel.org/r/20211021035225.1050685-1-bobo.shaobowang@huawei.com

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 09:18:28 -04:00