Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI
context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
...
This commit is contained in:
@@ -527,6 +527,15 @@ union bpf_iter_link_info {
|
||||
* Look up an element with the given *key* in the map referred to
|
||||
* by the file descriptor *fd*, and if found, delete the element.
|
||||
*
|
||||
* For **BPF_MAP_TYPE_QUEUE** and **BPF_MAP_TYPE_STACK** map
|
||||
* types, the *flags* argument needs to be set to 0, but for other
|
||||
* map types, it may be specified as:
|
||||
*
|
||||
* **BPF_F_LOCK**
|
||||
* Look up and delete the value of a spin-locked map
|
||||
* without returning the lock. This must be specified if
|
||||
* the elements contain a spinlock.
|
||||
*
|
||||
* The **BPF_MAP_TYPE_QUEUE** and **BPF_MAP_TYPE_STACK** map types
|
||||
* implement this command as a "pop" operation, deleting the top
|
||||
* element rather than one corresponding to *key*.
|
||||
@@ -536,6 +545,10 @@ union bpf_iter_link_info {
|
||||
* This command is only valid for the following map types:
|
||||
* * **BPF_MAP_TYPE_QUEUE**
|
||||
* * **BPF_MAP_TYPE_STACK**
|
||||
* * **BPF_MAP_TYPE_HASH**
|
||||
* * **BPF_MAP_TYPE_PERCPU_HASH**
|
||||
* * **BPF_MAP_TYPE_LRU_HASH**
|
||||
* * **BPF_MAP_TYPE_LRU_PERCPU_HASH**
|
||||
*
|
||||
* Return
|
||||
* Returns zero on success. On error, -1 is returned and *errno*
|
||||
@@ -837,6 +850,7 @@ enum bpf_cmd {
|
||||
BPF_PROG_ATTACH,
|
||||
BPF_PROG_DETACH,
|
||||
BPF_PROG_TEST_RUN,
|
||||
BPF_PROG_RUN = BPF_PROG_TEST_RUN,
|
||||
BPF_PROG_GET_NEXT_ID,
|
||||
BPF_MAP_GET_NEXT_ID,
|
||||
BPF_PROG_GET_FD_BY_ID,
|
||||
@@ -937,6 +951,7 @@ enum bpf_prog_type {
|
||||
BPF_PROG_TYPE_EXT,
|
||||
BPF_PROG_TYPE_LSM,
|
||||
BPF_PROG_TYPE_SK_LOOKUP,
|
||||
BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
|
||||
};
|
||||
|
||||
enum bpf_attach_type {
|
||||
@@ -979,6 +994,8 @@ enum bpf_attach_type {
|
||||
BPF_SK_LOOKUP,
|
||||
BPF_XDP,
|
||||
BPF_SK_SKB_VERDICT,
|
||||
BPF_SK_REUSEPORT_SELECT,
|
||||
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
|
||||
__MAX_BPF_ATTACH_TYPE
|
||||
};
|
||||
|
||||
@@ -1097,8 +1114,8 @@ enum bpf_link_type {
|
||||
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
|
||||
* the following extensions:
|
||||
*
|
||||
* insn[0].src_reg: BPF_PSEUDO_MAP_FD
|
||||
* insn[0].imm: map fd
|
||||
* insn[0].src_reg: BPF_PSEUDO_MAP_[FD|IDX]
|
||||
* insn[0].imm: map fd or fd_idx
|
||||
* insn[1].imm: 0
|
||||
* insn[0].off: 0
|
||||
* insn[1].off: 0
|
||||
@@ -1106,15 +1123,19 @@ enum bpf_link_type {
|
||||
* verifier type: CONST_PTR_TO_MAP
|
||||
*/
|
||||
#define BPF_PSEUDO_MAP_FD 1
|
||||
/* insn[0].src_reg: BPF_PSEUDO_MAP_VALUE
|
||||
* insn[0].imm: map fd
|
||||
#define BPF_PSEUDO_MAP_IDX 5
|
||||
|
||||
/* insn[0].src_reg: BPF_PSEUDO_MAP_[IDX_]VALUE
|
||||
* insn[0].imm: map fd or fd_idx
|
||||
* insn[1].imm: offset into value
|
||||
* insn[0].off: 0
|
||||
* insn[1].off: 0
|
||||
* ldimm64 rewrite: address of map[0]+offset
|
||||
* verifier type: PTR_TO_MAP_VALUE
|
||||
*/
|
||||
#define BPF_PSEUDO_MAP_VALUE 2
|
||||
#define BPF_PSEUDO_MAP_VALUE 2
|
||||
#define BPF_PSEUDO_MAP_IDX_VALUE 6
|
||||
|
||||
/* insn[0].src_reg: BPF_PSEUDO_BTF_ID
|
||||
* insn[0].imm: kernel btd id of VAR
|
||||
* insn[1].imm: 0
|
||||
@@ -1314,6 +1335,8 @@ union bpf_attr {
|
||||
/* or valid module BTF object fd or 0 to attach to vmlinux */
|
||||
__u32 attach_btf_obj_fd;
|
||||
};
|
||||
__u32 :32; /* pad */
|
||||
__aligned_u64 fd_array; /* array of FDs */
|
||||
};
|
||||
|
||||
struct { /* anonymous struct used by BPF_OBJ_* commands */
|
||||
@@ -2534,8 +2557,12 @@ union bpf_attr {
|
||||
* The lower two bits of *flags* are used as the return code if
|
||||
* the map lookup fails. This is so that the return value can be
|
||||
* one of the XDP program return codes up to **XDP_TX**, as chosen
|
||||
* by the caller. Any higher bits in the *flags* argument must be
|
||||
* unset.
|
||||
* by the caller. The higher bits of *flags* can be set to
|
||||
* BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS as defined below.
|
||||
*
|
||||
* With BPF_F_BROADCAST the packet will be broadcasted to all the
|
||||
* interfaces in the map, with BPF_F_EXCLUDE_INGRESS the ingress
|
||||
* interface will be excluded when do broadcasting.
|
||||
*
|
||||
* See also **bpf_redirect**\ (), which only supports redirecting
|
||||
* to an ifindex, but doesn't require a map to do so.
|
||||
@@ -4735,6 +4762,24 @@ union bpf_attr {
|
||||
* be zero-terminated except when **str_size** is 0.
|
||||
*
|
||||
* Or **-EBUSY** if the per-CPU memory copy buffer is busy.
|
||||
*
|
||||
* long bpf_sys_bpf(u32 cmd, void *attr, u32 attr_size)
|
||||
* Description
|
||||
* Execute bpf syscall with given arguments.
|
||||
* Return
|
||||
* A syscall result.
|
||||
*
|
||||
* long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags)
|
||||
* Description
|
||||
* Find BTF type with given name and kind in vmlinux BTF or in module's BTFs.
|
||||
* Return
|
||||
* Returns btf_id and btf_obj_fd in lower and upper 32 bits.
|
||||
*
|
||||
* long bpf_sys_close(u32 fd)
|
||||
* Description
|
||||
* Execute close syscall for given FD.
|
||||
* Return
|
||||
* A syscall result.
|
||||
*/
|
||||
#define __BPF_FUNC_MAPPER(FN) \
|
||||
FN(unspec), \
|
||||
@@ -4903,6 +4948,9 @@ union bpf_attr {
|
||||
FN(check_mtu), \
|
||||
FN(for_each_map_elem), \
|
||||
FN(snprintf), \
|
||||
FN(sys_bpf), \
|
||||
FN(btf_find_by_name_kind), \
|
||||
FN(sys_close), \
|
||||
/* */
|
||||
|
||||
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
|
||||
@@ -5080,6 +5128,12 @@ enum {
|
||||
BPF_F_BPRM_SECUREEXEC = (1ULL << 0),
|
||||
};
|
||||
|
||||
/* Flags for bpf_redirect_map helper */
|
||||
enum {
|
||||
BPF_F_BROADCAST = (1ULL << 3),
|
||||
BPF_F_EXCLUDE_INGRESS = (1ULL << 4),
|
||||
};
|
||||
|
||||
#define __bpf_md_ptr(type, name) \
|
||||
union { \
|
||||
type name; \
|
||||
@@ -5364,6 +5418,20 @@ struct sk_reuseport_md {
|
||||
__u32 ip_protocol; /* IP protocol. e.g. IPPROTO_TCP, IPPROTO_UDP */
|
||||
__u32 bind_inany; /* Is sock bound to an INANY address? */
|
||||
__u32 hash; /* A hash of the packet 4 tuples */
|
||||
/* When reuse->migrating_sk is NULL, it is selecting a sk for the
|
||||
* new incoming connection request (e.g. selecting a listen sk for
|
||||
* the received SYN in the TCP case). reuse->sk is one of the sk
|
||||
* in the reuseport group. The bpf prog can use reuse->sk to learn
|
||||
* the local listening ip/port without looking into the skb.
|
||||
*
|
||||
* When reuse->migrating_sk is not NULL, reuse->sk is closed and
|
||||
* reuse->migrating_sk is the socket that needs to be migrated
|
||||
* to another listening socket. migrating_sk could be a fullsock
|
||||
* sk that is fully established or a reqsk that is in-the-middle
|
||||
* of 3-way handshake.
|
||||
*/
|
||||
__bpf_md_ptr(struct bpf_sock *, sk);
|
||||
__bpf_md_ptr(struct bpf_sock *, migrating_sk);
|
||||
};
|
||||
|
||||
#define BPF_TAG_SIZE 8
|
||||
|
||||
Reference in New Issue
Block a user