linux/net
Ido Schimmel 8743aeff5b nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.

The nexthop bucket dump callback always returns a positive number if
nexthop buckets were filled in the provided skb, even if the dump is
complete. This means that a dump will span at least two recvmsg() calls
as long as nexthop buckets are present. In the last recvmsg() call the
dump callback will not fill in any nexthop buckets because the previous
call indicated that the dump should restart from the last dumped nexthop
ID plus one.

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id 10 group 1 type resilient buckets 2
 # strace -e sendto,recvmsg -s 5 ip nexthop bucket
 sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
 id 10 index 0 idle_time 6.66 nhid 1
 id 10 index 1 idle_time 6.66 nhid 1
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
 +++ exited with 0 +++

This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
 # ip nexthop bucket
 id 4294967295 index 0 idle_time 5.55 nhid 1
 id 4294967295 index 1 idle_time 5.55 nhid 1
 id 4294967295 index 0 idle_time 5.55 nhid 1
 id 4294967295 index 1 idle_time 5.55 nhid 1
 [...]
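
The termination problem can be reproduced outside the kernel with a toy
model. The Python sketch below is not the kernel code: the names
buggy_dump_callback, run_dump and the state dictionary are hypothetical,
and the table is assumed to hold a single resilient group. It only
illustrates how resuming from the last dumped ID plus one, truncated to
32 bits, interacts with a callback that returns a positive number
whenever it filled something in:

```python
U32_MAX = 0xFFFFFFFF  # maximum nexthop ID


def buggy_dump_callback(group_id, buckets, state):
    """Toy model of the pre-fix callback; returns (reply, ret)."""
    reply = []
    # Only IDs at or after the resume point are dumped.
    if group_id >= state["next_id"]:
        reply = [(group_id, index) for index in range(buckets)]
    # Resume from the last dumped ID plus one, truncated to 32 bits.
    state["next_id"] = (group_id + 1) & U32_MAX
    # Positive whenever something was filled in, even if the walk finished.
    return reply, len(reply)


def run_dump(callback, group_id, buckets, max_calls=10):
    """Model of the core loop: call again while the callback returns > 0."""
    state = {"next_id": 0}
    for calls in range(1, max_calls + 1):
        _reply, ret = callback(group_id, buckets, state)
        if ret == 0:
            return calls  # the core would send NLMSG_DONE at this point
    return None  # gave up: the dump never completed
```

In this model run_dump(buggy_dump_callback, 10, 2) needs two callback
invocations, while run_dump(buggy_dump_callback, U32_MAX, 2) gives up
after max_calls because the resume point wraps to 0 every time.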

Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
 # strace -e sendto,recvmsg -s 5 ip nexthop bucket
 sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
 id 4294967295 index 0 idle_time 6.61 nhid 1
 id 4294967295 index 1 idle_time 6.61 nhid 1
 +++ exited with 0 +++

Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and will simply reply with an
NLMSG_DONE message since it knows that the callback previously returned
zero.
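
The post-fix behavior, including the case where NLMSG_DONE does not fit,
can be sketched with a similar toy model. Again this is only an
illustration under stated assumptions: fixed_dump_callback, dump and the
room_for_done flag are hypothetical names, a single group is assumed,
and all of its buckets are assumed to fit in one reply:

```python
U32_MAX = 0xFFFFFFFF  # maximum nexthop ID


def fixed_dump_callback(group_id, buckets):
    """Toy model of the post-fix callback: zero means the walk finished."""
    reply = [(group_id, index) for index in range(buckets)]
    return reply, 0


def dump(group_id, buckets, room_for_done=True):
    """Return (messages, callback_calls); one message list per recvmsg()."""
    messages = []
    calls = 0
    finished = False
    while True:
        if finished:
            # The callback already returned zero, so it is not invoked
            # again; the reply carries only the NLMSG_DONE marker.
            messages.append(["NLMSG_DONE"])
            break
        reply, ret = fixed_dump_callback(group_id, buckets)
        calls += 1
        finished = (ret == 0)
        if finished and room_for_done:
            # Common case: NLMSG_DONE is appended to the same reply.
            messages.append(reply + ["NLMSG_DONE"])
            break
        messages.append(reply)
    return messages, calls
```

With room_for_done=True the model produces a single message ending in
NLMSG_DONE. With room_for_done=False it produces two messages, but the
callback is still invoked only once.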

Add a test that fails before the fix:

 # ./fib_nexthops.sh -t basic_res
 [...]
 TEST: Maximum nexthop ID dump                                       [FAIL]
 [...]

And passes after it:

 # ./fib_nexthops.sh -t basic_res
 [...]
 TEST: Maximum nexthop ID dump                                       [ OK ]
 [...]

Fixes: 8a1bbabb03 ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09 13:45:12 -07:00
6lowpan
9p 9p: remove dead stores (variable set again without being read) 2023-07-20 19:14:50 +00:00
802
8021q vlan: Fix VLAN 0 memory leak 2023-08-09 13:44:27 -07:00
appletalk sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
atm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ax25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
batman-adv batman-adv: Broken sync while rescheduling delayed work 2023-05-26 23:14:49 +02:00
bluetooth Bluetooth: MGMT: Use correct address for memcpy() 2023-07-20 11:27:22 -07:00
bpf bpf: Move kernel test kfuncs to bpf_testmod 2023-05-16 22:09:24 -07:00
bpfilter net: Use umd_cleanup_helper() 2023-05-31 13:06:57 +02:00
bridge Revert "bridge: Add extack warning when enabling STP in netns." 2023-07-20 10:46:28 +02:00
caif sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
can net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
ceph rbd: harden get_lock_owner_info() a bit 2023-07-26 15:08:09 +02:00
core net/unix: use consistent error code in SO_PEERPIDFD 2023-08-08 15:56:48 -07:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-01 21:07:46 -07:00
dccp dccp: fix data-race around dp->dccps_mss_cache 2023-08-04 18:27:58 -07:00
devlink devlink: report devlink_port_type_warn source device 2023-06-17 00:31:14 -07:00
dns_resolver
dsa net: dsa: fix older DSA drivers using phylink 2023-07-27 17:19:46 -07:00
ethernet
ethtool net: create device lookup API with reference tracking 2023-06-15 08:21:11 +01:00
handshake Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-15 22:19:41 -07:00
hsr net: hsr: Disable promiscuous mode in offload mode 2023-06-21 16:47:05 -07:00
ieee802154 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ife
ipv4 nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID 2023-08-09 13:45:12 -07:00
ipv6 ip6mr: Fix skb_under_panic in ip6mr_cache_report() 2023-08-02 10:35:21 +01:00
iucv
kcm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
key sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
l2tp net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
l3mdev
lapb
llc llc: Don't drop packet from non-root netns. 2023-07-20 10:46:28 +02:00
mac80211 - New Drivers 2023-07-03 11:26:05 -07:00
mac802154 Core WPAN changes: 2023-06-24 15:41:46 -07:00
mctp sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
mpls net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
mptcp mptcp: fix disconnect vs accept race 2023-08-04 18:26:27 -07:00
ncsi net/ncsi: change from ndo_set_mac_address to dev_set_mac_address 2023-06-09 10:32:51 +01:00
netfilter net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
netlabel netlabel: Reorder fields in 'struct netlbl_domaddr6_map' 2023-06-20 20:06:56 -07:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-27 09:45:22 -07:00
netrom sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-27 09:45:22 -07:00
nsh net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
openvswitch net: openvswitch: add support for l4 symmetric hashing 2023-06-12 09:46:30 +01:00
packet net/packet: annotate data-races around tp->status 2023-08-04 18:03:16 -07:00
phonet sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
psample
qrtr Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
rds sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rfkill
rose sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rxrpc Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sched net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free 2023-07-31 20:10:37 -07:00
sctp sctp: fix potential deadlock on &net->sctp.addr_wq_lock 2023-06-29 11:49:42 +02:00
smc net/smc: Use correct buffer sizes when switching between TCP and SMC 2023-08-09 11:20:29 +01:00
strparser
sunrpc NFS client updates for Linux 6.5 2023-07-01 14:38:25 -07:00
switchdev
tipc tipc: stop tipc crypto on failure in tipc_node_create 2023-07-27 11:45:05 +02:00
tls net: tls: avoid discarding data on record close 2023-08-06 08:32:18 +01:00
unix net: add missing data-race annotations around sk->sk_peek_off 2023-07-29 18:13:41 +01:00
vmw_vsock sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
wireless wifi: cfg80211: Fix return value in scan logic 2023-07-26 10:08:07 +03:00
x25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
xdp net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
xfrm net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
compat.c
devres.c
Kconfig net/core: Enable socket busy polling on -RT 2023-05-26 08:51:26 +01:00
Kconfig.debug
Makefile
socket.c Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sysctl_net.c