linux/net/core
Sebastian Andrzej Siewior 401cb7dae8 net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
The XDP redirect process has two stages:
- bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
  packet and makes decisions. While doing that, the per-CPU variable
  bpf_redirect_info is used.

- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
  and it may also access other per-CPU variables like xskmap_flush_list.

At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.

The per-CPU variables are only used in the NAPI callback, hence disabling
bottom halves is the only protection mechanism. Users from preemptible
context (like cpu_map_kthread_run()) explicitly disable bottom halves
for protection reasons.
Because local_bh_disable() does not act as a lock on PREEMPT_RT, this
data structure requires explicit locking there.

PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI-callback runs in a thread. If each thread has its own data
structure then locking can be avoided.

Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it, and bpf_net_ctx_clear() to remove it again.
Calls to bpf_net_ctx_set() may nest. For instance, a function can be used
from within NET_RX_SOFTIRQ/net_rx_action, which uses bpf_net_ctx_set(), and
from NET_TX_SOFTIRQ, which does not. Therefore only the first invocation
updates the pointer.
Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info. The returned data structure is zero initialized to
ensure nothing is leaked from stack. This is done on first usage of the
struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
note that initialisation is required. First invocation of
bpf_net_ctx_get_ri() will memset() the data structure and update
bpf_redirect_info::kern_flags.
bpf_redirect_info::nh is excluded from memset because it is only used
once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is
moved past nh to exclude it from memset.
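
The set/clear nesting and the lazy initialisation described above can be
modelled in userspace. The following is only a sketch under stated
assumptions, not the kernel code: a thread-local pointer stands in for the
pointer kept in task_struct, and the struct layout, the RI_F_INIT flag and
all names (net_ctx_set(), net_ctx_clear(), net_ctx_get_ri(),
demo_nested_use()) are hypothetical simplifications chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct bpf_redirect_info; the fields are illustrative. */
struct redirect_info {
	unsigned int tgt_index;
	unsigned int flags;
	unsigned int nh;         /* not covered by the lazy memset() */
	unsigned int kern_flags; /* placed past nh, so also not covered */
};

#define RI_F_INIT 0x1u /* hypothetical "already initialised" flag */

/* Simplified stand-in for struct bpf_net_context. */
struct net_context {
	struct redirect_info ri;
};

/* Thread-local pointer; models the per-task pointer in task_struct. */
static _Thread_local struct net_context *current_ctx;

/* Install ctx unless a context is already set; return NULL when nested. */
static struct net_context *net_ctx_set(struct net_context *ctx)
{
	if (current_ctx != NULL)
		return NULL;
	ctx->ri.kern_flags = 0; /* mark: initialisation still required */
	current_ctx = ctx;
	return ctx;
}

/* Only the caller that actually installed the context removes it. */
static void net_ctx_clear(struct net_context *ctx)
{
	if (ctx != NULL)
		current_ctx = NULL;
}

/* Return the redirect info, zeroing everything before nh on first use. */
static struct redirect_info *net_ctx_get_ri(void)
{
	struct net_context *ctx = current_ctx;

	if (!(ctx->ri.kern_flags & RI_F_INIT)) {
		memset(&ctx->ri, 0, offsetof(struct redirect_info, nh));
		ctx->ri.kern_flags |= RI_F_INIT;
	}
	return &ctx->ri;
}

/* Walk through one outer and one nested set/clear pair; returns 1 on success. */
static int demo_nested_use(void)
{
	struct net_context outer_storage, inner_storage;
	struct net_context *outer, *inner;
	struct redirect_info *ri;
	int ok;

	memset(&outer_storage, 0xff, sizeof(outer_storage)); /* fake stack garbage */

	outer = net_ctx_set(&outer_storage);
	inner = net_ctx_set(&inner_storage); /* nested call: pointer unchanged */
	ri = net_ctx_get_ri();               /* first use zeroes the leading fields */

	ok = outer == &outer_storage && inner == NULL &&
	     ri == &outer_storage.ri && ri->tgt_index == 0 && ri->flags == 0;

	net_ctx_clear(inner);                /* no-op, the outer context stays */
	ok = ok && current_ctx == &outer_storage;
	net_ctx_clear(outer);
	ok = ok && current_ctx == NULL;
	return ok;
}
```

In this model the nested caller receives NULL from net_ctx_set() and passes
that NULL to net_ctx_clear(), so only the outermost caller installs and
removes the pointer, and the cost of zeroing is paid only if the redirect
info is actually requested.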

The pointer to bpf_net_context is saved in the task's task_struct. Always
using the bpf_net_context approach has the advantage that there are
almost no differences between PREEMPT_RT and non-PREEMPT_RT builds.

Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 16:41:24 -07:00
..
bpf_sk_storage.c netlink: introduce type-checking attribute iteration 2024-03-29 15:06:02 -07:00
datagram.c net: micro-optimize skb_datagram_iter 2024-06-14 19:32:48 -07:00
dev_addr_lists_test.c net: dev_addr_lists: move locking out of init/exit in kunit 2024-04-15 10:26:35 +01:00
dev_addr_lists.c
dev_ioctl.c net: Move dev_set_hwtstamp_phylib to net/core/dev.h 2024-06-17 18:25:53 -07:00
dev.c net: Reference bpf_redirect_info via task_struct on PREEMPT_RT. 2024-06-24 16:41:24 -07:00
dev.h net: softnet_data: Make xmit per task. 2024-06-24 16:41:23 -07:00
drop_monitor.c net: add rx_sk to trace_kfree_skb 2024-06-19 12:44:22 +01:00
dst_cache.c net: dst_cache: add two DEBUG_NET warnings 2024-06-03 18:50:09 -07:00
dst.c net: dst: Make dst_destroy() static and return void. 2024-02-06 11:45:53 +01:00
failover.c net: failover: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf 2022-12-12 15:18:25 -08:00
fib_notifier.c
fib_rules.c fib: rules: no longer hold RTNL in fib_nl_dumprule() 2024-04-12 19:09:31 -07:00
filter.c net: Reference bpf_redirect_info via task_struct on PREEMPT_RT. 2024-06-24 16:41:24 -07:00
flow_dissector.c net: ipv4: Add a sysctl to set multipath hash seed 2024-06-12 16:42:11 -07:00
flow_offload.c tc: flower: Enable offload support IPSEC SPI field. 2023-08-02 10:09:32 +01:00
gen_estimator.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
gen_stats.c net: Remove the obsolte u64_stats_fetch_*_irq() users (net). 2022-10-28 20:13:54 -07:00
gro_cells.c net: move netdev_max_backlog to net_hotdata 2024-03-07 21:12:42 -08:00
gro.c net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment 2024-05-13 14:44:06 -07:00
gso.c net: introduce struct net_hotdata 2024-03-07 21:12:41 -08:00
hotdata.c net: move sysctl_mem_pcpu_rsv to net_hotdata 2024-04-30 18:46:52 -07:00
hwbm.c
ieee8021q_helpers.c net: add IEEE 802.1q specific helpers 2024-05-08 10:35:09 +01:00
link_watch.c net: add netdev_set_operstate() helper 2024-02-14 11:20:13 +00:00
lwt_bpf.c net: Reference bpf_redirect_info via task_struct on PREEMPT_RT. 2024-06-24 16:41:24 -07:00
lwtunnel.c xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available 2022-10-12 10:45:51 +02:00
Makefile net: add IEEE 802.1q specific helpers 2024-05-08 10:35:09 +01:00
neighbour.c net/neighbour: constify ctl_table arguments of utility function 2024-05-28 19:49:47 -07:00
net_namespace.c netns: Make get_net_ns() handle zero refcount net 2024-06-18 10:59:52 +02:00
net_test.c pfcp: always set pfcp metadata 2024-04-01 10:49:28 +01:00
net-procfs.c net: make softnet_data.dropped an atomic_t 2024-04-01 11:28:32 +01:00
net-sysfs.c net: no longer acquire RTNL in threaded_show() 2024-05-03 15:14:01 -07:00
net-sysfs.h
net-traces.c udp6: add a missing call into udp_fail_queue_rcv_skb tracepoint 2023-07-07 09:16:52 +01:00
netclassid_cgroup.c cgroup, netclassid: on modifying netclassid in cgroup, only consider the main process. 2023-10-16 16:36:53 -07:00
netdev-genl-gen.c netdev: support dumping a single netdev in qstats 2024-04-23 10:09:49 -07:00
netdev-genl-gen.h netdev: add per-queue statistics 2024-03-07 21:13:25 -08:00
netdev-genl.c netdev-genl: fix error codes when outputting XDP features 2024-06-14 18:04:29 -07:00
netevent.c
netpoll.c netpoll: Fix race condition in netpoll_owner_active 2024-04-30 19:03:47 -07:00
netprio_cgroup.c
of_net.c net: Explicitly include correct DT includes 2023-07-27 20:33:16 -07:00
page_pool_priv.h net: page_pool: report when page pool was destroyed 2023-11-28 15:48:39 +01:00
page_pool_user.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-03-07 10:29:36 -08:00
page_pool.c page_pool: remove WARN_ON() with OR 2024-06-09 15:50:43 +01:00
pktgen.c net: pktgen: Use wait_event_freezable_timeout() for freezable kthread 2023-12-27 14:34:52 +00:00
ptp_classifier.c
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-01-19 21:13:25 -08:00
rtnetlink.c rtnetlink: move rtnl_lock handling out of af_netlink 2024-06-10 13:15:40 +01:00
scm.c af_unix: Add dead flag to struct scm_fp_list. 2024-05-10 18:52:45 -07:00
secure_seq.c
selftests.c net: fill in MODULE_DESCRIPTION()s under net/core 2023-10-28 11:29:27 +01:00
skbuff.c net: Use nested-BH locking for napi_alloc_cache. 2024-06-24 16:41:22 -07:00
skmsg.c bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue 2024-04-08 09:18:22 +02:00
sock_destructor.h
sock_diag.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
sock_map.c sock_map: avoid race between sock_map_close and sk_psock_put 2024-05-28 12:05:19 +02:00
sock_reuseport.c soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-10-25 11:35:16 +02:00
sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
stream.c net: Return error from sk_stream_wait_connect() if sk_wait_event() fails 2023-12-15 10:48:51 +00:00
sysctl_net_core.c net: make net.core.{r,w}mem_{default,max} namespaced 2024-06-01 16:03:21 -07:00
timestamping.c net: partial revert of the "Make timestamping selectable: series 2023-11-18 18:42:37 -08:00
tso.c net: tso: inline tso_count_descs() 2022-12-12 15:04:39 -08:00
utils.c net: core: inet[46]_pton strlen len types 2022-11-01 21:14:39 -07:00
xdp.c net: move skbuff_cache(s) to net_hotdata 2024-03-07 21:12:42 -08:00