linux/net
Kuniyuki Iwashima d5e4ddaeb6 bpf: Support socket migration by eBPF.
This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT
to check if the attached eBPF program is capable of migrating sockets. When
the eBPF program is attached, we run it for socket migration if the
expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or
net.ipv4.tcp_migrate_req is enabled.

Currently, the expected_attach_type is not enforced for the
BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the
earlier idea in the commit aac3fc320d ("bpf: Post-hooks for sys_bind") to
fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type().

Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to
select a new listener based on the child socket. migrating_sk varies
depending on if it is migrating a request in the accept queue or during
3WHS.

  - accept_queue : sock (ESTABLISHED/SYN_RECV)
  - 3WHS         : request_sock (NEW_SYN_RECV)

In the eBPF program, we can select a new listener by
BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning
SK_DROP. This feature is useful when listeners have different settings at
the socket API level or when we want to free resources as soon as possible.

  - SK_PASS with selected_sk, select it as a new listener
  - SK_PASS with selected_sk NULL, fallbacks to the random selection
  - SK_DROP, cancel the migration.

There is a noteworthy point. We select a listening socket in three places,
but we do not have struct skb at closing a listener or retransmitting a
SYN+ACK. On the other hand, some helper functions do not expect skb is NULL
(e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer()
in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb
temporarily before running the eBPF program.

Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp
2021-06-15 18:01:06 +02:00
..
6lowpan 6lowpan: Fix some typos in nhc_udp.c 2021-03-24 17:52:11 -07:00
9p net: 9p: Correct function names in the kerneldoc comments 2021-03-28 17:56:56 -07:00
802
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-26 12:00:00 -07:00
appletalk appletalk: Fix skb allocation size in loopback case 2021-02-12 16:40:28 -08:00
atm net: atm: use DEVICE_ATTR_RO macro 2021-05-20 15:50:54 -07:00
ax25 net/ax25: Delete obsolete TODO file 2021-03-30 16:54:50 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-09 20:48:35 -07:00
bluetooth Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
bpf bpf: Prepare bpf syscall to be used from kernel and user space. 2021-05-19 00:33:40 +02:00
bpfilter net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
bridge net: bridge: fix build when IPv6 is disabled 2021-05-14 10:38:59 -07:00
caif net: caif: Drop unnecessary NULL check after container_of 2021-05-13 15:55:50 -07:00
can can: proc: fix rcvlist_* header alignment on 64-bit system 2021-04-25 19:43:00 +02:00
ceph Notable items here are a series to take advantage of David Howells' 2021-05-06 10:27:02 -07:00
core bpf: Support socket migration by eBPF. 2021-06-15 18:01:06 +02:00
dcb net: dcb: Remove unnecessary INIT_LIST_HEAD() 2021-05-18 13:44:24 -07:00
dccp net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
decnet net/decnet: Delete obsolete TODO file 2021-03-30 16:54:50 -07:00
dns_resolver net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
dsa net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
ethernet of: net: pass the dst buffer to of_get_mac_address() 2021-04-13 14:35:02 -07:00
ethtool ethtool: fix missing NLM_F_MULTI flag when dumping 2021-05-05 12:41:10 -07:00
hsr net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info 2021-05-03 13:33:54 -07:00
ieee802154 net: remove the new_ifindex argument from dev_change_net_namespace 2021-04-07 14:43:28 -07:00
ife net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
ipv4 tcp: Migrate TCP_NEW_SYN_RECV requests at receiving the final ACK. 2021-06-15 18:01:06 +02:00
ipv6 tcp: Migrate TCP_NEW_SYN_RECV requests at receiving the final ACK. 2021-06-15 18:01:06 +02:00
iucv iucv: af_iucv.c: Couple of typo fixes 2021-03-28 17:31:13 -07:00
kcm kcm: kcmsock.c: Couple of typo fixes 2021-03-28 17:31:13 -07:00
key
l2tp net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
l3mdev l3mdev: Correct function names in the kerneldoc comments 2021-03-28 17:56:55 -07:00
lapb net: lapb: Make "lapb_t1timer_running" able to detect an already running timer 2021-03-23 14:14:50 -07:00
llc llc2: Remove redundant assignment to rc 2021-04-27 14:16:14 -07:00
mac80211 Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
mac802154 net: mac802154: Fix general protection fault 2021-04-06 22:42:16 +02:00
mpls mpls: Remove redundant assignment to err 2021-04-27 14:17:00 -07:00
mptcp mptcp: fix splat when closing unaccepted socket 2021-05-07 15:53:40 -07:00
ncsi Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-09 20:48:35 -07:00
netfilter netfilter: nftables: avoid potential overflows on 32bit arches 2021-05-07 10:01:39 +02:00
netlabel netlabel: remove unused parameter in netlbl_netlink_auditinfo() 2021-05-19 12:27:13 -07:00
netlink netlink: don't call ->netlink_bind with table lock held 2021-04-16 17:01:04 -07:00
netrom net: netrom: nr_in: Remove redundant assignment to ns 2021-04-28 13:59:08 -07:00
nfc net/nfc: fix use-after-free llcp_sock_bind/connect 2021-05-04 11:59:48 -07:00
nsh
openvswitch net: openvswitch: Remove unnecessary skb_nfct() 2021-05-10 14:18:19 -07:00
packet net/packet: Remove redundant assignment to ret 2021-05-17 16:03:56 -07:00
phonet
psample psample: Add additional metadata attributes 2021-03-14 15:00:43 -07:00
qrtr net: qrtr: ns: Fix error return code in qrtr_ns_init() 2021-05-19 13:14:35 -07:00
rds RDMA merge window pull request 2021-05-01 09:15:05 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose net: rose: Fix fall-through warnings for Clang 2021-03-10 12:45:15 -08:00
rxrpc Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
sched net/sched: cls_api: increase max_reclassify_loop 2021-05-19 13:10:24 -07:00
sctp net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
smc smc: disallow TCP_ULP in smc_setsockopt() 2021-05-05 12:52:45 -07:00
strparser
sunrpc NFS client updates for Linux 5.13 2021-05-07 11:23:41 -07:00
switchdev net: bridge: propagate extack through switchdev_port_attr_set 2021-02-14 17:38:11 -08:00
tipc Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
tls tls splice: remove inappropriate flags checking for MSG_PEEK 2021-05-12 14:31:30 -07:00
unix af_unix: handle idmapped mounts 2021-01-24 14:27:18 +01:00
vmw_vsock vsock/vmci: Remove redundant assignment to err 2021-04-30 15:00:59 -07:00
wireless Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
x25 af_x25.c: Fix a spello 2021-03-28 17:31:13 -07:00
xdp xdp: Extend xdp_redirect_map with broadcast support 2021-05-26 09:46:16 +02:00
xfrm xfrm: ipcomp: remove unnecessary get_cpu() 2021-04-19 12:49:29 +02:00
compat.c
devres.c
Kconfig net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
Makefile net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile 2021-01-27 17:03:52 -08:00
socket.c net: Fix a misspell in socket.c 2021-03-25 16:56:27 -07:00
sysctl_net.c net: Ensure net namespace isolation of sysctls 2021-04-12 13:27:11 -07:00