linux/net/core
John Fastabend a454d84ee2 bpf, sockmap: Fix skb refcnt race after locking changes
There is a race where skb's from the sk_psock_backlog can be referenced
after userspace side has already skb_consumed() the sk_buff and its refcnt
dropped to zer0 causing use after free.

The flow is the following:

  while ((skb = skb_peek(&psock->ingress_skb))
    sk_psock_handle_Skb(psock, skb, ..., ingress)
    if (!ingress) ...
    sk_psock_skb_ingress
       sk_psock_skb_ingress_enqueue(skb)
          msg->skb = skb
          sk_psock_queue_msg(psock, msg)
    skb_dequeue(&psock->ingress_skb)

The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
what the application reads when recvmsg() is called. An application can
read this anytime after the msg is placed on the queue. The recvmsg hook
will also read msg->skb and then after user space reads the msg will call
consume_skb(skb) on it effectively free'ing it.

But, the race is in above where backlog queue still has a reference to
the skb and calls skb_dequeue(). If the skb_dequeue happens after the
user reads and free's the skb we have a use after free.

The !ingress case does not suffer from this problem because it uses
sendmsg_*(sk, msg) which does not pass the sk_buff further down the
stack.

The following splat was observed with 'test_progs -t sockmap_listen':

  [ 1022.710250][ T2556] general protection fault, ...
  [...]
  [ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
  [ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
  [ 1022.713653][ T2556] Code: ...
  [...]
  [ 1022.720699][ T2556] Call Trace:
  [ 1022.720984][ T2556]  <TASK>
  [ 1022.721254][ T2556]  ? die_addr+0x32/0x80^M
  [ 1022.721589][ T2556]  ? exc_general_protection+0x25a/0x4b0
  [ 1022.722026][ T2556]  ? asm_exc_general_protection+0x22/0x30
  [ 1022.722489][ T2556]  ? skb_dequeue+0x4c/0x80
  [ 1022.722854][ T2556]  sk_psock_backlog+0x27a/0x300
  [ 1022.723243][ T2556]  process_one_work+0x2a7/0x5b0
  [ 1022.723633][ T2556]  worker_thread+0x4f/0x3a0
  [ 1022.723998][ T2556]  ? __pfx_worker_thread+0x10/0x10
  [ 1022.724386][ T2556]  kthread+0xfd/0x130
  [ 1022.724709][ T2556]  ? __pfx_kthread+0x10/0x10
  [ 1022.725066][ T2556]  ret_from_fork+0x2d/0x50
  [ 1022.725409][ T2556]  ? __pfx_kthread+0x10/0x10
  [ 1022.725799][ T2556]  ret_from_fork_asm+0x1b/0x30
  [ 1022.726201][ T2556]  </TASK>

To fix we add an skb_get() before passing the skb to be enqueued in the
engress queue. This bumps the skb->users refcnt so that consume_skb()
and kfree_skb will not immediately free the sk_buff. With this we can
be sure the skb is still around when we do the dequeue. Then we just
need to decrement the refcnt or free the skb in the backlog case which
we do by calling kfree_skb() on the ingress case as well as the sendmsg
case.

Before locking change from fixes tag we had the sock locked so we
couldn't race with user and there was no issue here.

Fixes: 799aa7f98d ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Jiri Olsa  <jolsa@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Xu Kuohai <xukuohai@huawei.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com
2023-09-04 09:53:35 +02:00
..
bpf_sk_storage.c bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing 2023-07-27 10:07:56 -07:00
datagram.c net: datagram: fix data-races in datagram_poll() 2023-05-10 19:06:49 -07:00
dev_addr_lists_test.c kunit: Use KUNIT_EXPECT_MEMEQ macro 2022-10-27 02:40:14 -06:00
dev_addr_lists.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
dev_ioctl.c net: omit ndo_hwtstamp_get() call when possible in dev_set_hwtstamp_phylib() 2023-08-06 13:25:10 +01:00
dev.c net: Make consumed action consistent in sch_handle_egress 2023-08-28 10:18:03 +01:00
dev.h net-sysctl: factor-out rpm mask manipulation helpers 2023-02-09 17:45:55 -08:00
drop_monitor.c net: extend drop reasons for multiple subsystems 2023-04-20 20:20:49 -07:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-11-29 19:50:45 -08:00
dst.c net: remove unnecessary input parameter 'how' in ifdown function 2023-08-22 13:19:02 +02:00
failover.c net: failover: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf 2022-12-12 15:18:25 -08:00
fib_notifier.c
fib_rules.c fib: expand fib_rule_policy 2021-12-16 07:18:35 -08:00
filter.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-10 14:10:53 -07:00
flow_dissector.c net: flow_dissector: Add IPSEC dissector 2023-08-02 10:09:31 +01:00
flow_offload.c tc: flower: Enable offload support IPSEC SPI field. 2023-08-02 10:09:32 +01:00
gen_estimator.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
gen_stats.c net: Remove the obsolte u64_stats_fetch_*_irq() users (net). 2022-10-28 20:13:54 -07:00
gro_cells.c net: drop the weight argument from netif_napi_add 2022-09-28 18:57:14 -07:00
gro.c gro: move the tc_ext comparison to a helper 2023-06-18 18:08:35 +01:00
gso.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
hwbm.c
link_watch.c net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down 2022-11-16 09:45:00 +00:00
lwt_bpf.c lwt: Fix return values of BPF xmit ops 2023-08-18 16:05:26 +02:00
lwtunnel.c xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available 2022-10-12 10:45:51 +02:00
Makefile net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
neighbour.c neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
net_namespace.c lib/ref_tracker: improve printing stats 2023-06-05 15:28:42 -07:00
net-procfs.c net-sysfs: display two backlog queue len separately 2023-03-22 12:03:52 +01:00
net-sysfs.c net: move struct netdev_rx_queue out of netdevice.h 2023-08-03 08:38:07 -07:00
net-sysfs.h
net-traces.c udp6: add a missing call into udp_fail_queue_rcv_skb tracepoint 2023-07-07 09:16:52 +01:00
netclassid_cgroup.c core: Variable type completion 2022-08-31 09:40:34 +01:00
netdev-genl-gen.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
netdev-genl-gen.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
netdev-genl.c netdev-genl: use struct genl_info for reply construction 2023-08-15 15:01:03 -07:00
netevent.c
netpoll.c netpoll: allocate netdev tracker right away 2023-06-15 08:21:11 +01:00
netprio_cgroup.c
of_net.c net: Explicitly include correct DT includes 2023-07-27 20:33:16 -07:00
page_pool.c page_pool: add a lockdep check for recycling in hardirq 2023-08-07 13:05:53 -07:00
pktgen.c net: introduce and use skb_frag_fill_page_desc() 2023-05-13 19:47:56 +01:00
ptp_classifier.c ptp: Add generic PTP is_sync() function 2022-03-07 11:31:34 +00:00
request_sock.c
rtnetlink.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
scm.c net: annotate data-races around sock->ops 2023-08-09 15:32:43 -07:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-20 10:14:49 +01:00
selftests.c
skbuff.c skbuff: skb_segment, Call zero copy functions before using skbuff frags 2023-09-01 08:09:07 +01:00
skmsg.c bpf, sockmap: Fix skb refcnt race after locking changes 2023-09-04 09:53:35 +02:00
sock_destructor.h
sock_diag.c net: fix __sock_gen_cookie() 2022-11-21 20:36:30 -08:00
sock_map.c bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t 2023-08-30 09:58:42 +02:00
sock_reuseport.c soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-10-25 11:35:16 +02:00
sock.c net: annotate data-races around sk->sk_bind_phc 2023-09-01 07:27:33 +01:00
stream.c net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
sysctl_net_core.c net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() 2023-04-05 13:48:04 +00:00
timestamping.c
tso.c net: tso: inline tso_count_descs() 2022-12-12 15:04:39 -08:00
utils.c net: core: inet[46]_pton strlen len types 2022-11-01 21:14:39 -07:00
xdp.c page_pool: split types and declarations from page_pool.h 2023-08-07 13:05:19 -07:00