Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - BPF:
      - add syscall program type and libbpf support for generating
        instructions and bindings for in-kernel BPF loaders (BPF loaders
        for BPF), this is a stepping stone for signed BPF programs
      - infrastructure to migrate TCP child sockets from one listener to
        another in the same reuseport group/map to improve flexibility
        of service hand-off/restart
      - add broadcast support to XDP redirect

   - allow bypass of the lockless qdisc to improving performance (for
     pktgen: +23% with one thread, +44% with 2 threads)

   - add a simpler version of "DO_ONCE()" which does not require jump
     labels, intended for slow-path usage

   - virtio/vsock: introduce SOCK_SEQPACKET support

   - add getsocketopt to retrieve netns cookie

   - ip: treat lowest address of a IPv4 subnet as ordinary unicast
     address allowing reclaiming of precious IPv4 addresses

   - ipv6: use prandom_u32() for ID generation

   - ip: add support for more flexible field selection for hashing
     across multi-path routes (w/ offload to mlxsw)

   - icmp: add support for extended RFC 8335 PROBE (ping)

   - seg6: add support for SRv6 End.DT46 behavior

   - mptcp:
      - DSS checksum support (RFC 8684) to detect middlebox meddling
      - support Connection-time 'C' flag
      - time stamping support

   - sctp: packetization Layer Path MTU Discovery (RFC 8899)

   - xfrm: speed up state addition with seq set

   - WiFi:
      - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
      - aggregation handling improvements for some drivers
      - minstrel improvements for no-ack frames
      - deferred rate control for TXQs to improve reaction times
      - switch from round robin to virtual time-based airtime scheduler

   - add trace points:
      - tcp checksum errors
      - openvswitch - action execution, upcalls
      - socket errors via sk_error_report

  Device APIs:

   - devlink: add rate API for hierarchical control of max egress rate
     of virtual devices (VFs, SFs etc.)

   - don't require RCU read lock to be held around BPF hooks in NAPI
     context

   - page_pool: generic buffer recycling

  New hardware/drivers:

   - mobile:
      - iosm: PCIe Driver for Intel M.2 Modem
      - support for Qualcomm MSM8998 (ipa)

   - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices

   - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches

   - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)

   - NXP SJA1110 Automotive Ethernet 10-port switch

   - Qualcomm QCA8327 switch support (qca8k)

   - Mikrotik 10/25G NIC (atl1c)

  Driver changes:

   - ACPI support for some MDIO, MAC and PHY devices from Marvell and
     NXP (our first foray into MAC/PHY description via ACPI)

   - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx

   - Mellanox/Nvidia NIC (mlx5)
      - NIC VF offload of L2 bridging
      - support IRQ distribution to Sub-functions

   - Marvell (prestera):
      - add flower and match all
      - devlink trap
      - link aggregation

   - Netronome (nfp): connection tracking offload

   - Intel 1GE (igc): add AF_XDP support

   - Marvell DPU (octeontx2): ingress ratelimit offload

   - Google vNIC (gve): new ring/descriptor format support

   - Qualcomm mobile (rmnet & ipa): inline checksum offload support

   - MediaTek WiFi (mt76)
      - mt7915 MSI support
      - mt7915 Tx status reporting
      - mt7915 thermal sensors support
      - mt7921 decapsulation offload
      - mt7921 enable runtime pm and deep sleep

   - Realtek WiFi (rtw88)
      - beacon filter support
      - Tx antenna path diversity support
      - firmware crash information via devcoredump

   - Qualcomm WiFi (wcn36xx)
      - Wake-on-WLAN support with magic packets and GTK rekeying

   - Micrel PHY (ksz886x/ksz8081): add cable test support"

* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
  tcp: change ICSK_CA_PRIV_SIZE definition
  tcp_yeah: check struct yeah size at compile time
  gve: DQO: Fix off by one in gve_rx_dqo()
  stmmac: intel: set PCI_D3hot in suspend
  stmmac: intel: Enable PHY WOL option in EHL
  net: stmmac: option to enable PHY WOL with PMT enabled
  net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
  net: use netdev_info in ndo_dflt_fdb_{add,del}
  ptp: Set lookup cookie when creating a PTP PPS source.
  net: sock: add trace for socket errors
  net: sock: introduce sk_error_report
  net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
  net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
  net: dsa: include fdb entries pointing to bridge in the host fdb list
  net: dsa: include bridge addresses which are local in the host fdb list
  net: dsa: sync static FDB entries on foreign interfaces to hardware
  net: dsa: install the host MDB and FDB entries in the master's RX filter
  net: dsa: reference count the FDB addresses at the cross-chip notifier level
  net: dsa: introduce a separate cross-chip notifier type for host FDBs
  net: dsa: reference count the MDB entries at the cross-chip notifier level
  ...
This commit is contained in:
Linus Torvalds
2021-06-30 15:51:09 -07:00
1908 changed files with 108602 additions and 27721 deletions

View File

@@ -527,6 +527,15 @@ union bpf_iter_link_info {
* Look up an element with the given *key* in the map referred to
* by the file descriptor *fd*, and if found, delete the element.
*
* For **BPF_MAP_TYPE_QUEUE** and **BPF_MAP_TYPE_STACK** map
* types, the *flags* argument needs to be set to 0, but for other
* map types, it may be specified as:
*
* **BPF_F_LOCK**
* Look up and delete the value of a spin-locked map
* without returning the lock. This must be specified if
* the elements contain a spinlock.
*
* The **BPF_MAP_TYPE_QUEUE** and **BPF_MAP_TYPE_STACK** map types
* implement this command as a "pop" operation, deleting the top
* element rather than one corresponding to *key*.
@@ -536,6 +545,10 @@ union bpf_iter_link_info {
* This command is only valid for the following map types:
* * **BPF_MAP_TYPE_QUEUE**
* * **BPF_MAP_TYPE_STACK**
* * **BPF_MAP_TYPE_HASH**
* * **BPF_MAP_TYPE_PERCPU_HASH**
* * **BPF_MAP_TYPE_LRU_HASH**
* * **BPF_MAP_TYPE_LRU_PERCPU_HASH**
*
* Return
* Returns zero on success. On error, -1 is returned and *errno*
@@ -837,6 +850,7 @@ enum bpf_cmd {
BPF_PROG_ATTACH,
BPF_PROG_DETACH,
BPF_PROG_TEST_RUN,
BPF_PROG_RUN = BPF_PROG_TEST_RUN,
BPF_PROG_GET_NEXT_ID,
BPF_MAP_GET_NEXT_ID,
BPF_PROG_GET_FD_BY_ID,
@@ -937,6 +951,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_EXT,
BPF_PROG_TYPE_LSM,
BPF_PROG_TYPE_SK_LOOKUP,
BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
};
enum bpf_attach_type {
@@ -979,6 +994,8 @@ enum bpf_attach_type {
BPF_SK_LOOKUP,
BPF_XDP,
BPF_SK_SKB_VERDICT,
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
__MAX_BPF_ATTACH_TYPE
};
@@ -1097,8 +1114,8 @@ enum bpf_link_type {
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
* insn[0].src_reg: BPF_PSEUDO_MAP_FD
* insn[0].imm: map fd
* insn[0].src_reg: BPF_PSEUDO_MAP_[FD|IDX]
* insn[0].imm: map fd or fd_idx
* insn[1].imm: 0
* insn[0].off: 0
* insn[1].off: 0
@@ -1106,15 +1123,19 @@ enum bpf_link_type {
* verifier type: CONST_PTR_TO_MAP
*/
#define BPF_PSEUDO_MAP_FD 1
/* insn[0].src_reg: BPF_PSEUDO_MAP_VALUE
* insn[0].imm: map fd
#define BPF_PSEUDO_MAP_IDX 5
/* insn[0].src_reg: BPF_PSEUDO_MAP_[IDX_]VALUE
* insn[0].imm: map fd or fd_idx
* insn[1].imm: offset into value
* insn[0].off: 0
* insn[1].off: 0
* ldimm64 rewrite: address of map[0]+offset
* verifier type: PTR_TO_MAP_VALUE
*/
#define BPF_PSEUDO_MAP_VALUE 2
#define BPF_PSEUDO_MAP_VALUE 2
#define BPF_PSEUDO_MAP_IDX_VALUE 6
/* insn[0].src_reg: BPF_PSEUDO_BTF_ID
* insn[0].imm: kernel btd id of VAR
* insn[1].imm: 0
@@ -1314,6 +1335,8 @@ union bpf_attr {
/* or valid module BTF object fd or 0 to attach to vmlinux */
__u32 attach_btf_obj_fd;
};
__u32 :32; /* pad */
__aligned_u64 fd_array; /* array of FDs */
};
struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -2534,8 +2557,12 @@ union bpf_attr {
* The lower two bits of *flags* are used as the return code if
* the map lookup fails. This is so that the return value can be
* one of the XDP program return codes up to **XDP_TX**, as chosen
* by the caller. Any higher bits in the *flags* argument must be
* unset.
* by the caller. The higher bits of *flags* can be set to
* BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS as defined below.
*
* With BPF_F_BROADCAST the packet will be broadcasted to all the
* interfaces in the map, with BPF_F_EXCLUDE_INGRESS the ingress
* interface will be excluded when do broadcasting.
*
* See also **bpf_redirect**\ (), which only supports redirecting
* to an ifindex, but doesn't require a map to do so.
@@ -4735,6 +4762,24 @@ union bpf_attr {
* be zero-terminated except when **str_size** is 0.
*
* Or **-EBUSY** if the per-CPU memory copy buffer is busy.
*
* long bpf_sys_bpf(u32 cmd, void *attr, u32 attr_size)
* Description
* Execute bpf syscall with given arguments.
* Return
* A syscall result.
*
* long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags)
* Description
* Find BTF type with given name and kind in vmlinux BTF or in module's BTFs.
* Return
* Returns btf_id and btf_obj_fd in lower and upper 32 bits.
*
* long bpf_sys_close(u32 fd)
* Description
* Execute close syscall for given FD.
* Return
* A syscall result.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -4903,6 +4948,9 @@ union bpf_attr {
FN(check_mtu), \
FN(for_each_map_elem), \
FN(snprintf), \
FN(sys_bpf), \
FN(btf_find_by_name_kind), \
FN(sys_close), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -5080,6 +5128,12 @@ enum {
BPF_F_BPRM_SECUREEXEC = (1ULL << 0),
};
/* Flags for bpf_redirect_map helper */
enum {
BPF_F_BROADCAST = (1ULL << 3),
BPF_F_EXCLUDE_INGRESS = (1ULL << 4),
};
#define __bpf_md_ptr(type, name) \
union { \
type name; \
@@ -5364,6 +5418,20 @@ struct sk_reuseport_md {
__u32 ip_protocol; /* IP protocol. e.g. IPPROTO_TCP, IPPROTO_UDP */
__u32 bind_inany; /* Is sock bound to an INANY address? */
__u32 hash; /* A hash of the packet 4 tuples */
/* When reuse->migrating_sk is NULL, it is selecting a sk for the
* new incoming connection request (e.g. selecting a listen sk for
* the received SYN in the TCP case). reuse->sk is one of the sk
* in the reuseport group. The bpf prog can use reuse->sk to learn
* the local listening ip/port without looking into the skb.
*
* When reuse->migrating_sk is not NULL, reuse->sk is closed and
* reuse->migrating_sk is the socket that needs to be migrated
* to another listening socket. migrating_sk could be a fullsock
* sk that is fully established or a reqsk that is in-the-middle
* of 3-way handshake.
*/
__bpf_md_ptr(struct bpf_sock *, sk);
__bpf_md_ptr(struct bpf_sock *, migrating_sk);
};
#define BPF_TAG_SIZE 8