linux/net/core
Kuniyuki Iwashima d5e4ddaeb6 bpf: Support socket migration by eBPF.
This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT
to check if the attached eBPF program is capable of migrating sockets. When
the eBPF program is attached, we run it for socket migration if the
expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or
net.ipv4.tcp_migrate_req is enabled.

Currently, the expected_attach_type is not enforced for the
BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the
earlier idea in the commit aac3fc320d ("bpf: Post-hooks for sys_bind") to
fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type().
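
The fix-up described above can be modelled in plain userspace C. This is only a sketch, not the kernel code: the enum names and values are hypothetical stand-ins, and the choice of the plain "select" type as the default for a zero value is an assumption, since the message only says that zero is fixed up.

```c
/* Userspace model only; names and numeric values are hypothetical. */
enum model_attach_type {
	MODEL_ATTACH_UNSET = 0,                /* loader passed no attach type */
	MODEL_SK_REUSEPORT_SELECT,             /* select a listener only */
	MODEL_SK_REUSEPORT_SELECT_OR_MIGRATE,  /* also capable of migration */
};

/*
 * Replace an unset (zero) attach type with the select-only type, so that
 * programs loaded without an explicit type are never treated as
 * migration-capable. Any explicit type is returned unchanged.
 * The default chosen here is an assumption for illustration.
 */
static enum model_attach_type
model_fixup_attach_type(enum model_attach_type requested)
{
	if (requested == MODEL_ATTACH_UNSET)
		return MODEL_SK_REUSEPORT_SELECT;
	return requested;
}
```

Under this model, only a program that explicitly asks for the migrate-capable type ends up with it.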

Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to
select a new listener based on the child socket. The type of migrating_sk
depends on whether the request being migrated is in the accept queue or
still in the 3WHS.

  - accept_queue : sock (ESTABLISHED/SYN_RECV)
  - 3WHS         : request_sock (NEW_SYN_RECV)

In the eBPF program, we can select a new listener by
BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning
SK_DROP. This feature is useful when listeners have different settings at
the socket API level or when we want to free resources as soon as possible.

  - SK_PASS with selected_sk, select it as a new listener
  - SK_PASS with selected_sk NULL, fall back to random selection
  - SK_DROP, cancel the migration
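
The three outcomes listed above can be sketched as a small userspace model. It is not the kernel implementation: struct model_listener, struct model_result, model_pick_fallback() and model_resolve_migration() are hypothetical names introduced for illustration, and the hash-modulo fallback merely stands in for whatever selection the kernel applies when the program chooses nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; every name here is hypothetical. */
enum model_verdict { MODEL_SK_DROP = 0, MODEL_SK_PASS = 1 };

struct model_listener {
	int id;
};

/* What the eBPF program is modelled as handing back. */
struct model_result {
	enum model_verdict verdict;
	struct model_listener *selected_sk; /* NULL if nothing was selected */
};

/* Stand-in for the fallback selection among the remaining listeners. */
static struct model_listener *
model_pick_fallback(struct model_listener *group, size_t n, unsigned int hash)
{
	assert(n > 0);
	return &group[hash % n];
}

/*
 * Returns the listener that receives the migrated request,
 * or NULL when the migration is cancelled.
 */
static struct model_listener *
model_resolve_migration(struct model_result res,
			struct model_listener *group, size_t n,
			unsigned int hash)
{
	if (res.verdict == MODEL_SK_DROP)
		return NULL;                 /* cancel the migration */
	if (res.selected_sk)
		return res.selected_sk;      /* the program chose a listener */
	return model_pick_fallback(group, n, hash); /* PASS without a choice */
}
```

Note that in this model a drop verdict takes precedence even if a listener was also selected.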

One point is noteworthy. We select a listening socket in three places,
but no struct skb is available when closing a listener or retransmitting a
SYN+ACK. However, some helper functions do not expect skb to be NULL
(e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer()
in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb
temporarily before running the eBPF program.

Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp
2021-06-15 18:01:06 +02:00
bpf_sk_storage.c bpf: Use struct_size() in kzalloc() 2021-05-13 15:58:00 -07:00
datagram.c udp: fix skb_copy_and_csum_datagram with odd segment sizes 2021-02-04 18:56:56 -08:00
datagram.h
dev_addr_lists.c net: core: Correct function name dev_uc_flush() in the kerneldoc 2021-03-28 17:56:56 -07:00
dev_ioctl.c net: fix dev_ifsioc_locked() race condition 2021-02-11 18:14:19 -08:00
dev.c net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT 2021-05-13 13:11:19 -07:00
devlink.c devlink: Extend SF port attributes to have external attribute 2021-04-24 00:58:53 -07:00
drop_monitor.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-03-25 15:31:22 -07:00
dst_cache.c
dst.c net, bpf: Fix ip6ip6 crash with collect_md populated skbs 2021-03-10 12:24:18 -08:00
failover.c
fib_notifier.c
fib_rules.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
filter.c bpf: Support socket migration by eBPF. 2021-06-15 18:01:06 +02:00
flow_dissector.c flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target() 2021-04-16 17:02:27 -07:00
flow_offload.c net: flow_offload: Fix memory leak for indirect flow block 2020-12-09 16:08:33 -08:00
gen_estimator.c net_sched: gen_estimator: support large ewma log 2021-01-15 18:11:06 -08:00
gen_stats.c docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
gro_cells.c gro_cells: reduce number of synchronize_net() calls 2020-11-25 11:28:12 -08:00
hwbm.c
link_watch.c net: Add IF_OPER_TESTING 2020-04-20 12:43:24 -07:00
lwt_bpf.c lwt_bpf: Replace preempt_disable() with migrate_disable() 2020-12-07 11:53:40 -08:00
lwtunnel.c net: ipv6: add rpl sr tunnel 2020-03-29 22:30:57 -07:00
Makefile net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
neighbour.c neighbour: Remove redundant initialization of 'bucket' 2021-05-10 14:25:13 -07:00
net_namespace.c net: initialize net->net_cookie at netns setup 2021-02-11 14:10:07 -08:00
net-procfs.c net: move the ptype_all and ptype_base declarations to include/linux/netdevice.h 2021-03-22 13:14:45 -07:00
net-sysfs.c net-sysfs: remove possible sleep from an RCU read-side critical section 2021-03-22 13:28:13 -07:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c tcp: add tracepoint for checksum errors 2021-05-14 15:26:03 -07:00
netclassid_cgroup.c net: Remove the err argument from sock_from_file 2020-12-04 22:32:40 +01:00
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c Revert "net: Have netpoll bring-up DSA management interface" 2021-02-06 14:42:57 -08:00
netprio_cgroup.c net: Remove the err argument from sock_from_file 2020-12-04 22:32:40 +01:00
page_pool.c net: page_pool: use alloc_pages_bulk in refill code path 2021-04-30 11:20:43 -07:00
pktgen.c pktgen: fix misuse of BUG_ON() in pktgen_thread_worker() 2021-01-27 16:46:37 -08:00
ptp_classifier.c ptp: Add generic ptp v2 header parsing function 2020-08-19 16:07:49 -07:00
request_sock.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
rtnetlink.c rtnetlink: avoid RCU read lock when holding RTNL 2021-05-10 14:33:10 -07:00
scm.c scm: fix a typo in put_cmsg() 2021-04-16 11:41:07 -07:00
secure_seq.c crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h 2020-05-08 15:32:17 +10:00
selftests.c net: add generic selftest support 2021-04-20 16:08:02 -07:00
skbuff.c Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
skmsg.c skmsg: Remove unused parameters of sk_msg_wait_data() 2021-05-18 16:44:19 +02:00
sock_diag.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
sock_map.c sock_map: Fix a potential use-after-free in sock_map_close() 2021-04-12 17:35:26 +02:00
sock_reuseport.c bpf: Support socket migration by eBPF. 2021-06-15 18:01:06 +02:00
sock.c net: sock: remove the unnecessary check in proto_register 2021-04-23 13:10:03 -07:00
stream.c
sysctl_net_core.c net: change netdev_unregister_timeout_secs min value to 1 2021-03-25 17:24:06 -07:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c xdp: Extend xdp_redirect_map with broadcast support 2021-05-26 09:46:16 +02:00