linux

Author	SHA1	Message	Date
Neal Cardwell	e7b10a4dd1	tcp: Simplify EBPF TCP_CONGESTION to always init CC Now that the previous patch ensures we don't initialize the congestion control twice, when EBPF sets the congestion control algorithm at connection establishment we can simplify the code by simply initializing the congestion control module at that time. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Kevin Yang <yyd@google.com> Cc: Lawrence Brakmo <brakmo@fb.com>	2020-09-10 20:53:01 -07:00
Neal Cardwell	8919a9b31e	tcp: Only init congestion control if not initialized already Change tcp_init_transfer() to only initialize congestion control if it has not been initialized already. With this new approach, we can arrange things so that if the EBPF code sets the congestion control by calling setsockopt(TCP_CONGESTION) then tcp_init_transfer() will not re-initialize the CC module. This is an approach that has the following beneficial properties: (1) This allows CC module customizations made by the EBPF called in tcp_init_transfer() to persist, and not be wiped out by a later call to tcp_init_congestion_control() in tcp_init_transfer(). (2) Does not flip the order of EBPF and CC init, to avoid causing bugs for existing code upstream that depends on the current order. (3) Does not cause 2 initializations for for CC in the case where the EBPF called in tcp_init_transfer() wants to set the CC to a new CC algorithm. (4) Allows follow-on simplifications to the code in net/core/filter.c and net/ipv4/tcp_cong.c, which currently both have some complexity to special-case CC initialization to avoid double CC initialization if EBPF sets the CC. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Kevin Yang <yyd@google.com> Cc: Lawrence Brakmo <brakmo@fb.com>	2020-09-10 20:53:01 -07:00
Karsten Graul	22ef473dbd	net/smc: use separate work queues for different worker types There are 6 types of workers which exist per smc connection. 3 of them are used for listen and handshake processing, another 2 are used for close and abort processing and 1 is the tx worker that moves calls to sleeping functions into a worker. To prevent flooding of the system work queue when many connections are opened or closed at the same time (some pattern uperf implements), move those workers to one of 3 smc-specific work queues. Two work queues are module-global and used for handshake and close workers. The third work queue is defined per link group and used by the tx workers that may sleep waiting for resources of this link group. And in smc_llc_enqueue() queue the llc_event_work work to the system prio work queue because its critical that this work is started fast. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:27 -07:00
Guvenc Gulce	8418cb4065	net/smc: use the retry mechanism for netlink messages When the netlink messages to be sent to the userspace are too big for a single netlink message, send them in chunks using the netlink_dump infrastructure. Modify the smc diag dump code so that it can signal to the netlink_dump infrastructure that it needs to send more data. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:27 -07:00
Ursula Braun	f9aab6f2ce	net/smc: immediate freeing in smc_lgr_cleanup_early() smc_lgr_cleanup_early() schedules the free worker with delay. DMB unregistering occurs in this delayed worker increasing the risk to reach the SMCD SBA limit without need. Terminate the linkgroup immediately, since termination means early DMB unregistering. For SMCD the global smc_server_lgr_pending lock is given up early. A linkgroup to be given up with smc_lgr_cleanup_early() may already contain more than one connection. Using __smc_lgr_terminate() in smc_lgr_cleanup_early() covers this. And consolidate smc_ism_put_vlan() and smc_put_device() into smc_lgr_free() only. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:26 -07:00
Ursula Braun	0c881ada3d	net/smc: reduce smc_listen_decline() calls smc_listen_work() contains already an smc_listen_decline() exit. Use this exit for smc_listen_rdma_finish() problems as well. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:26 -07:00
Ursula Braun	7b2977d083	net/smc: improve server ISM device determination Move check whether peer can be reached into smc_pnet_find_ism_by_pnetid(). Thus searching continues for another ism device, if check fails. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:26 -07:00
Ursula Braun	3d9725a6a1	net/smc: common routine for CLC accept and confirm smc_clc_send_accept() and smc_clc_send_confirm() are quite similar. Move common code into a separate function smc_clc_send_confirm_accept(). And introduce separate SMCD and SMCR struct definitions for CLC accept resp. confirm. No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:26 -07:00
Ursula Braun	6bb14e48ee	net/smc: dynamic allocation of CLC proposal buffer Reduce stack size for smc_listen_work() and smc_clc_send_proposal() by dynamic allocation of the CLC buffer to be received or sent. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:26 -07:00
Ursula Braun	5ac54d8768	net/smc: introduce better field names Field names "srv_first_contact" and "cln_first_contact" are misleading, since they apply to both, server and client. Rename them to "first_contact_peer" and "first_contact_local". Rename "ism_gid" by the more precise name "ism_peer_gid". Rename version constant "SMC_CLC_V1" into "SMC_V1". No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:26 -07:00
Ursula Braun	a60a2b1e0a	net/smc: reduce active tcp_listen workers SMC starts a separate tcp_listen worker for every SMC socket in state SMC_LISTEN, and can accept an incoming connection request only, if this worker is really running and waiting in kernel_accept(). But the number of running workers is limited. This patch reworks the listening SMC code and starts a tcp_listen worker after the SYN-ACK handshake on the internal clc-socket only. Suggested-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reviewed-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:24:26 -07:00
Petr Machata	297e77e53e	net: DCB: Validate DCB_ATTR_DCB_BUFFER argument The parameter passed via DCB_ATTR_DCB_BUFFER is a struct dcbnl_buffer. The field prio2buffer is an array of IEEE_8021Q_MAX_PRIORITIES bytes, where each value is a number of a buffer to direct that priority's traffic to. That value is however never validated to lie within the bounds set by DCBX_MAX_BUFFERS. The only driver that currently implements the callback is mlx5 (maintainers CCd), and that does not do any validation either, in particual allowing incorrect configuration if the prio2buffer value does not fit into 4 bits. Instead of offloading the need to validate the buffer index to drivers, do it right there in core, and bounce the request if the value is too large. CC: Parav Pandit <parav@nvidia.com> CC: Saeed Mahameed <saeedm@nvidia.com> Fixes: `e549f6f9c0` ("net/dcb: Add dcbnl buffer attribute") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:09:08 -07:00
Ido Schimmel	e1b9efe6ba	net: Fix bridge enslavement failure When a netdev is enslaved to a bridge, its parent identifier is queried. This is done so that packets that were already forwarded in hardware will not be forwarded again by the bridge device between netdevs belonging to the same hardware instance. The operation fails when the netdev is an upper of netdevs with different parent identifiers. Instead of failing the enslavement, have dev_get_port_parent_id() return '-EOPNOTSUPP' which will signal the bridge to skip the query operation. Other callers of the function are not affected by this change. Fixes: `7e1146e8c1` ("net: devlink: introduce devlink_compat_switch_id_get() helper") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 15:06:48 -07:00
Miaohe Lin	1be107de2e	net: Correct the comment of dst_dev_put() Since commit `8d7017fd62` ("blackhole_netdev: use blackhole_netdev to invalidate dst entries"), we use blackhole_netdev to invalidate dst entries instead of loopback device anymore. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 13:28:57 -07:00
Wei Wang	ac8f1710c1	tcp: reflect tos value received in SYN to the socket This commit adds a new TCP feature to reflect the tos value received in SYN, and send it out on the SYN-ACK, and eventually set the tos value of the established socket with this reflected tos value. This provides a way to set the traffic class/QoS level for all traffic in the same connection to be the same as the incoming SYN request. It could be useful in data centers to provide equivalent QoS according to the incoming request. This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by default turned off. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 13:15:40 -07:00
Wei Wang	de033b7d15	ip: pass tos into ip_build_and_send_pkt() This commit adds tos as a new passed in parameter to ip_build_and_send_pkt() which will be used in the later commit. This is a pure restructure and does not have any functional change. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 13:15:40 -07:00
Wei Wang	e9b12edc13	tcp: record received TOS value in the request socket A new field is added to the request sock to record the TOS value received on the listening socket during 3WHS: When not under syn flood, it is recording the TOS value sent in SYN. When under syn flood, it is recording the TOS value sent in the ACK. This is a preparation patch in order to do TOS reflection in the later commit. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 13:15:40 -07:00
Jakub Kicinski	5251ef8299	net: make sure napi_list is safe for RCU traversal netpoll needs to traverse dev->napi_list under RCU, make sure it uses the right iterator and that removal from this list is handled safely. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 13:08:46 -07:00
Jakub Kicinski	4d092dd204	net: manage napi add/del idempotence explicitly To RCUify napi->dev_list we need to replace list_del_init() with list_del_rcu(). There is no _init() version for RCU for obvious reasons. Up until now netif_napi_del() was idempotent so to make sure it remains such add a bit which is set when NAPI is listed, and cleared when it removed. Since we don't expect multiple calls to netif_napi_add() to be correct, add a warning on that side. Now that napi_hash_add / napi_hash_del are only called by napi_add / del we can actually steal its bit. We just need to make sure hash node is initialized correctly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 13:08:46 -07:00
Jakub Kicinski	5198d545db	net: remove napi_hash_del() from driver-facing API We allow drivers to call napi_hash_del() before calling netif_napi_del() to batch RCU grace periods. This makes the API asymmetric and leaks internal implementation details. Soon we will want the grace period to protect more than just the NAPI hash table. Restructure the API and have drivers call a new function - __netif_napi_del() if they want to take care of RCU waits. Note that only core was checking the return status from napi_hash_del() so the new helper does not report if the NAPI was actually deleted. Some notes on driver oddness: - veth observed the grace period before calling netif_napi_del() but that should not matter - myri10ge observed normal RCU flavor - bnx2x and enic did not actually observe the grace period (unless they did so implicitly) - virtio_net and enic only unhashed Rx NAPIs The last two points seem to indicate that the calls to napi_hash_del() were a left over rather than an optimization. Regardless, it's easy enough to correct them. This patch may introduce extra synchronize_net() calls for interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on free_netdev() to call netif_napi_del(). This seems inevitable since we want to use RCU for netpoll dev->napi_list traversal, and almost no drivers set IFF_DISABLE_NETPOLL. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 13:08:46 -07:00
Jakub Kicinski	3ea87ca772	devlink: don't crash if netdev is NULL Following change will add support for a corner case where we may not have a netdev to pass to devlink_port_type_eth_set() but we still want to set port type. This is definitely a corner case, and drivers should not normally pass NULL netdev - print a warning message when this happens. Sadly for other port types (ib) switches don't have a device reference, the way we always do for Ethernet, so we can't put the warning in __devlink_port_type_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:49:00 -07:00
Yunsheng Lin	2fb541c862	net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc Currently there is concurrent reset and enqueue operation for the same lockless qdisc when there is no lock to synchronize the q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in qdisc_deactivate() called by dev_deactivate_queue(), which may cause out-of-bounds access for priv->ring[] in hns3 driver if user has requested a smaller queue num when __dev_xmit_skb() still enqueue a skb with a larger queue_mapping after the corresponding qdisc is reset, and call hns3_nic_net_xmit() with that skb later. Reused the existing synchronize_net() in dev_deactivate_many() to make sure skb with larger queue_mapping enqueued to old qdisc(which is saved in dev_queue->qdisc_sleeping) will always be reset when dev_reset_queue() is called. Fixes: `6b3ba9146f` ("net: sched: allow qdiscs to handle locking") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:38:26 -07:00
Lorenz Bauer	0365351524	net: Allow iterating sockmap and sockhash Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and hashtable implementation. sockmap and sockhash share the same iteration context: a pointer to an arbitrary key and a pointer to a socket. Both pointers may be NULL, and so BPF has to perform a NULL check before accessing them. Technically it's not possible for sockhash iteration to yield a NULL socket, but we ignore this to be able to use a single iteration point. Iteration will visit all keys that remain unmodified during the lifetime of the iterator. It may or may not visit newly added ones. Switch from using rcu_dereference_raw to plain rcu_dereference, so we gain another guard rail if CONFIG_PROVE_RCU is enabled. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200909162712.221874-3-lmb@cloudflare.com	2020-09-10 12:31:55 -07:00
Lorenz Bauer	654785a1af	net: sockmap: Remove unnecessary sk_fullsock checks The lookup paths for sockmap and sockhash currently include a check that returns NULL if the socket we just found is not a full socket. However, this check is not necessary. On insertion we ensure that we have a full socket (caveat around sock_ops), so request sockets are not a problem. Time-wait sockets are allocated separate from the original socket and then fed into the hashdance. They don't affect the sockets already stored in the sockmap. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200909162712.221874-2-lmb@cloudflare.com	2020-09-10 12:31:55 -07:00
Geliang Tang	f612eb76f3	mptcp: fix kmalloc flag in mptcp_pm_nl_get_local_id mptcp_pm_nl_get_local_id may be called in interrupt context, so we need to use GFP_ATOMIC flag to allocate memory to avoid sleeping in atomic context. [ 280.209809] BUG: sleeping function called from invalid context at mm/slab.h:498 [ 280.209812] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1680, name: kworker/1:3 [ 280.209814] INFO: lockdep is turned off. [ 280.209816] CPU: 1 PID: 1680 Comm: kworker/1:3 Tainted: G W 5.9.0-rc3-mptcp+ #146 [ 280.209818] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 280.209820] Workqueue: events mptcp_worker [ 280.209822] Call Trace: [ 280.209824] <IRQ> [ 280.209826] dump_stack+0x77/0xa0 [ 280.209829] ___might_sleep.cold+0xa6/0xb6 [ 280.209832] kmem_cache_alloc_trace+0x1d1/0x290 [ 280.209835] mptcp_pm_nl_get_local_id+0x23c/0x410 [ 280.209840] subflow_init_req+0x1e9/0x2ea [ 280.209843] ? inet_reqsk_alloc+0x1c/0x120 [ 280.209845] ? kmem_cache_alloc+0x264/0x290 [ 280.209849] tcp_conn_request+0x303/0xae0 [ 280.209854] ? printk+0x53/0x6a [ 280.209857] ? tcp_rcv_state_process+0x28f/0x1374 [ 280.209859] tcp_rcv_state_process+0x28f/0x1374 [ 280.209864] ? tcp_v4_do_rcv+0xb3/0x1f0 [ 280.209866] tcp_v4_do_rcv+0xb3/0x1f0 [ 280.209869] tcp_v4_rcv+0xed6/0xfa0 [ 280.209873] ip_protocol_deliver_rcu+0x28/0x270 [ 280.209875] ip_local_deliver_finish+0x89/0x120 [ 280.209877] ip_local_deliver+0x180/0x220 [ 280.209881] ip_rcv+0x166/0x210 [ 280.209885] __netif_receive_skb_one_core+0x82/0x90 [ 280.209888] process_backlog+0xd6/0x230 [ 280.209891] net_rx_action+0x13a/0x410 [ 280.209895] __do_softirq+0xcf/0x468 [ 280.209899] asm_call_on_stack+0x12/0x20 [ 280.209901] </IRQ> [ 280.209903] ? ip_finish_output2+0x240/0x9a0 [ 280.209906] do_softirq_own_stack+0x4d/0x60 [ 280.209908] do_softirq.part.0+0x2b/0x60 [ 280.209911] __local_bh_enable_ip+0x9a/0xa0 [ 280.209913] ip_finish_output2+0x264/0x9a0 [ 280.209916] ? rcu_read_lock_held+0x4d/0x60 [ 280.209920] ? ip_output+0x7a/0x250 [ 280.209922] ip_output+0x7a/0x250 [ 280.209925] ? __ip_finish_output+0x330/0x330 [ 280.209928] __ip_queue_xmit+0x1dc/0x5a0 [ 280.209931] __tcp_transmit_skb+0xa0f/0xc70 [ 280.209937] tcp_connect+0xb03/0xff0 [ 280.209939] ? lockdep_hardirqs_on_prepare+0xe7/0x190 [ 280.209942] ? ktime_get_with_offset+0x125/0x150 [ 280.209944] ? trace_hardirqs_on+0x1c/0xe0 [ 280.209948] tcp_v4_connect+0x449/0x550 [ 280.209953] __inet_stream_connect+0xbb/0x320 [ 280.209955] ? mark_held_locks+0x49/0x70 [ 280.209958] ? lockdep_hardirqs_on_prepare+0xe7/0x190 [ 280.209960] ? __local_bh_enable_ip+0x6b/0xa0 [ 280.209963] inet_stream_connect+0x32/0x50 [ 280.209966] __mptcp_subflow_connect+0x1fd/0x242 [ 280.209972] mptcp_pm_create_subflow_or_signal_addr+0x2db/0x600 [ 280.209975] mptcp_worker+0x543/0x7a0 [ 280.209980] process_one_work+0x26d/0x5b0 [ 280.209984] ? process_one_work+0x5b0/0x5b0 [ 280.209987] worker_thread+0x48/0x3d0 [ 280.209990] ? process_one_work+0x5b0/0x5b0 [ 280.209993] kthread+0x117/0x150 [ 280.209996] ? kthread_park+0x80/0x80 [ 280.209998] ret_from_fork+0x22/0x30 Fixes: `01cacb00b3` ("mptcp: add netlink-based PM") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:30:03 -07:00
Geliang Tang	2ff0e566fa	mptcp: fix subflow's remote_id issues This patch set the init remote_id to zero, otherwise it will be a random number. Then it added the missing subflow's remote_id setting code both in __mptcp_subflow_connect and in subflow_ulp_clone. Fixes: `01cacb00b3` ("mptcp: add netlink-based PM") Fixes: `ec3edaa7ca` ("mptcp: Add handling of outgoing MP_JOIN requests") Fixes: `f296234c98` ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:29:15 -07:00
Geliang Tang	57025817ea	mptcp: fix subflow's local_id issues In mptcp_pm_nl_get_local_id, skc_local is the same as msk_local, so it always return 0. Thus every subflow's local_id is 0. It's incorrect. This patch fixed this issue. Also, we need to ignore the zero address here, like 0.0.0.0 in IPv4. When we use the zero address as a local address, it means that we can use any one of the local addresses. The zero address is not a new address, we don't need to add it to PM, so this patch added a new function address_zero to check whether an address is the zero address, if it is, we ignore this address. Fixes: `01cacb00b3` ("mptcp: add netlink-based PM") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:29:15 -07:00
Paul Davey	bb82067c57	ipmr: Use full VIF ID in netlink cache reports Insert the full 16 bit VIF ID into ipmr Netlink cache reports. The VIF_ID attribute has 32 bits of space so can store the full VIF ID extracted from the high and low byte fields in the igmpmsg. Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:25:51 -07:00
Paul Davey	c8715a8e9f	ipmr: Add high byte of VIF ID to igmpmsg Use the unused3 byte in struct igmpmsg to hold the high 8 bits of the VIF ID. If using more than 255 IPv4 multicast interfaces it is necessary to have access to a VIF ID for cache reports that is wider than 8 bits, the VIF ID present in the igmpmsg reports sent to mroute_sk was only 8 bits wide in the igmpmsg header. Adding the high 8 bits of the 16 bit VIF ID in the unused byte allows use of more than 255 IPv4 multicast interfaces. Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:25:51 -07:00
Paul Davey	501cb00890	ipmr: Add route table ID to netlink cache reports Insert the multicast route table ID as a Netlink attribute to Netlink cache report notifications. When multiple route tables are in use it is necessary to have a way to determine which route table a given cache report belongs to when receiving the cache report. Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:25:51 -07:00
Tetsuo Handa	a4b5cc9e10	tipc: fix shutdown() of connection oriented socket I confirmed that the problem fixed by commit `2a63866c8b` ("tipc: fix shutdown() of connectionless socket") also applies to stream socket. ---------- #include <sys/socket.h> #include <unistd.h> #include <sys/wait.h> int main(int argc, char argv[]) { int fds[2] = { -1, -1 }; socketpair(PF_TIPC, SOCK_STREAM / or SOCK_DGRAM /, 0, fds); if (fork() == 0) _exit(read(fds[0], NULL, 1)); shutdown(fds[0], SHUT_RDWR); / This must make read() return. / wait(NULL); / To be woken up by _exit(). */ return 0; } ---------- Since shutdown(SHUT_RDWR) should affect all processes sharing that socket, unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK will be the right behavior. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 12:21:39 -07:00
Chen Zhou	e9091bb77f	bpf: Remove duplicate headers Remove duplicate headers which are included twice. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200908132201.184005-1-chenzhou10@huawei.com	2020-09-10 10:53:14 -07:00
Parav Pandit	66b17082d1	devlink: Use controller while building phys_port_name Now that controller number attribute is available, use it when building phsy_port_name for external controller ports. An example devlink port and representor netdev name consist of controller annotation for external controller with controller number = 1, for a VF 1 of PF 0: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port show pci/0000:06:00.0/2 -jp { "port": { "pci/0000:06:00.0/2": { "type": "eth", "netdev": "ens2f0c1pf0vf1", "flavour": "pcivf", "controller": 1, "pfnum": 0, "vfnum": 1, "external": true, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00" } } } } Controller number annotation is skipped for non external controllers to maintain backward compatibility. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-09 14:19:55 -07:00
Parav Pandit	3a2d9588c4	devlink: Introduce controller number A devlink port may be for a controller consist of PCI device. A devlink instance holds ports of two types of controllers. (1) controller discovered on same system where eswitch resides This is the case where PCI PF/VF of a controller and devlink eswitch instance both are located on a single system. (2) controller located on external host system. This is the case where a controller is located in one system and its devlink eswitch ports are located in a different system. When a devlink eswitch instance serves the devlink ports of both controllers together, PCI PF/VF numbers may overlap. Due to this a unique phys_port_name cannot be constructed. For example in below such system controller-0 and controller-1, each has PCI PF pf0 whose eswitch ports can be present in controller-0. These results in phys_port_name as "pf0" for both. Similar problem exists for VFs and upcoming Sub functions. An example view of two controller systems: --------------------------------------------------------- \| \| \| --------- --------- ------- ------- \| ----------- \| \| vf(s) \| \| sf(s) \| \|vf(s)\| \|sf(s)\| \| \| server \| \| ------- ----/---- ---/----- ------- ---/--- ---/--- \| \| pci rc \|=== \| pf0 \|______/________/ \| pf1 \|___/_______/ \| \| connect \| \| ------- ------- \| ----------- \| \| controller_num=1 (no eswitch) \| ------\|-------------------------------------------------- (internal wire) \| --------------------------------------------------------- \| devlink eswitch ports and reps \| \| ----------------------------------------------------- \| \| \|ctrl-0 \| ctrl-0 \| ctrl-0 \| ctrl-0 \| ctrl-0 \|ctrl-0 \| \| \| \|pf0 \| pf0vfN \| pf0sfN \| pf1 \| pf1vfN \|pf1sfN \| \| \| ----------------------------------------------------- \| \| \|ctrl-1 \| ctrl-1 \| ctrl-1 \| ctrl-1 \| ctrl-1 \|ctrl-1 \| \| \| \|pf1 \| pf1vfN \| pf1sfN \| pf1 \| pf1vfN \|pf0sfN \| \| \| ----------------------------------------------------- \| \| \| \| \| \| --------- --------- ------- ------- \| \| \| vf(s) \| \| sf(s) \| \|vf(s)\| \|sf(s)\| \| \| ------- ----/---- ---/----- ------- ---/--- ---/--- \| \| \| pf0 \|______/________/ \| pf1 \|___/_______/ \| \| ------- ------- \| \| \| \| local controller_num=0 (eswitch) \| --------------------------------------------------------- An example devlink port for external controller with controller number = 1 for a VF 1 of PF 0: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port show pci/0000:06:00.0/2 -jp { "port": { "pci/0000:06:00.0/2": { "type": "eth", "netdev": "ens2f0pf0vf1", "flavour": "pcivf", "controller": 1, "pfnum": 0, "vfnum": 1, "external": true, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00" } } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-09 14:19:55 -07:00
Parav Pandit	05b595e9c4	devlink: Introduce external controller flag A devlink eswitch port may represent PCI PF/VF ports of a controller. A controller either located on same system or it can be an external controller located in host where such NIC is plugged in. Add the ability for driver to specify if a port is for external controller. Use such flag in the mlx5_core driver. An example of an external controller having VF1 of PF0 belong to controller 1. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port show pci/0000:06:00.0/2 -jp { "port": { "pci/0000:06:00.0/2": { "type": "eth", "netdev": "ens2f0pf0vf1", "flavour": "pcivf", "pfnum": 0, "vfnum": 1, "external": true, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00" } } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-09 14:19:55 -07:00
David S. Miller	d85427e3c8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT, from Michael Zhou. 2) do_ip_vs_set_ctl() dereferences uninitialized value, from Peilin Ye. 3) Support for userdata in tables, from Jose M. Guisado. 4) Do not increment ct error and invalid stats at the same time, from Florian Westphal. 5) Remove ct ignore stats, also from Florian. 6) Add ct stats for clash resolution, from Florian Westphal. 7) Bump reference counter bump on ct clash resolution only, this is safe because bucket lock is held, again from Florian. 8) Use ip_is_fragment() in xt_HMARK, from YueHaibing. 9) Add wildcard support for nft_socket, from Balazs Scheidler. 10) Remove superfluous IPVS dependency on iptables, from Yaroslav Bolyukin. 11) Remove unused definition in ebt_stp, from Wang Hai. 12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT in selftests/net, from Fabian Frederick. 13) Add userdata support for nft_object, from Jose M. Guisado. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-09 11:21:19 -07:00
Ye Bin	b87f9fe1ac	hsr: avoid newline at end of message in NL_SET_ERR_MSG_MOD clean follow coccicheck warning: net//hsr/hsr_netlink.c:94:8-42: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD net//hsr/hsr_netlink.c:87:30-57: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD net//hsr/hsr_netlink.c:79:29-53: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-09 11:15:26 -07:00
Linus Torvalds	ab29a807a7	NFS client bugfixes for Linux 5.9 Highlights include: Bugfixes: - Fix an NFS/RDMA resource leak - Fix the error handling during delegation recall - NFSv4.0 needs to return the delegation on a zero-stateid SETATTR - Stop printk reading past end of string -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl9ZFYAACgkQZwvnipYK APLg+RAArQ0J54M4vTg7avKhUEwIrAlPCFjHvZ5jtlXiY8JDT7Cy2lEo9W/pC9x2 BiV02H6seKXq6vKUHIBgzVq0BdZBKeWQcOpoO/dfvWSPs9u+lxKlOEwcdsaXwdXz 31u5HS4xHYg2SlYj+BcKGfVexcWVEVyPqqPvflGBZIlKfzQLHo9YY390deUHMC6o HrRXWADvpYXC1sJb3mtNtCojqr9a5A8Ty4clT19YvdwQL7cUt3HjjsOvJfbmB9S+ fW5/u3sdWJ1nYoz8AxC+utIMNmtXFBUhW0Sg+TPWMJj8yG9rclAgTxbobhXyzGph j2ZamPhUtpcSYXBlwiQCm7GbUIItnzHgU6MSCs/nq8AeDc3WEx4qVONVqNvNr/sY 1T3znylZpXCHvxLmDWzDGsW8XvZT1r86Lm6zrJCmjWm+eoSKBzeoENcXGsGGYuJu 6NGz7pgQbYMb9t7VfOEFSxxt5w0wt7nRyhV1R7taBhm5B9XjF+BOmJBI0epQ1S7i XRIr7WqxT00wijWyunNCQZxi1aDMHVYZXPwaqkEHTwJqeDzCtmir+ajAnZQUgUId 1MNiv8BDoN5YlPmj/gt+E3kbyj0Pu7M+09NvVEKqG7j8W80ltf6eb85XGrq+vp1E Y0lmDXElBdNo3AA+dBOmk+peoVv4bfoog5PymElaRiwRM25VCOM= =3fw2 -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: - Fix an NFS/RDMA resource leak - Fix the error handling during delegation recall - NFSv4.0 needs to return the delegation on a zero-stateid SETATTR - Stop printk reading past end of string * tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: stop printk reading past end of string NFS: Zero-stateid SETATTR should first return delegation NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall xprtrdma: Release in-flight MRs on disconnect	2020-09-09 11:14:20 -07:00
Eric Dumazet	3ca1a42a52	net: qrtr: check skb_put_padto() return value If skb_put_padto() returns an error, skb has been freed. Better not touch it anymore, as reported by syzbot [1] Note to qrtr maintainers : this suggests qrtr_sendmsg() should adjust sock_alloc_send_skb() second parameter to account for the potential added alignment to avoid reallocation. [1] BUG: KASAN: use-after-free in __skb_insert include/linux/skbuff.h:1907 [inline] BUG: KASAN: use-after-free in __skb_queue_before include/linux/skbuff.h:2016 [inline] BUG: KASAN: use-after-free in __skb_queue_tail include/linux/skbuff.h:2049 [inline] BUG: KASAN: use-after-free in skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146 Write of size 8 at addr ffff88804d8ab3c0 by task syz-executor.4/4316 CPU: 1 PID: 4316 Comm: syz-executor.4 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1d6/0x29e lib/dump_stack.c:118 print_address_description+0x66/0x620 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report+0x132/0x1d0 mm/kasan/report.c:530 __skb_insert include/linux/skbuff.h:1907 [inline] __skb_queue_before include/linux/skbuff.h:2016 [inline] __skb_queue_tail include/linux/skbuff.h:2049 [inline] skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146 qrtr_tun_send+0x1a/0x40 net/qrtr/tun.c:23 qrtr_node_enqueue+0x44f/0xc00 net/qrtr/qrtr.c:364 qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861 qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] sock_write_iter+0x317/0x470 net/socket.c:998 call_write_iter include/linux/fs.h:1882 [inline] new_sync_write fs/read_write.c:503 [inline] vfs_write+0xa96/0xd10 fs/read_write.c:578 ksys_write+0x11b/0x220 fs/read_write.c:631 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45d5b9 Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f84b5b81c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000038b40 RCX: 000000000045d5b9 RDX: 0000000000000055 RSI: 0000000020001240 RDI: 0000000000000003 RBP: 00007f84b5b81ca0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000f R13: 00007ffcbbf86daf R14: 00007f84b5b829c0 R15: 000000000118cf4c Allocated by task 4316: kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461 slab_post_alloc_hook+0x3e/0x290 mm/slab.h:518 slab_alloc mm/slab.c:3312 [inline] kmem_cache_alloc+0x1c1/0x2d0 mm/slab.c:3482 skb_clone+0x1b2/0x370 net/core/skbuff.c:1449 qrtr_bcast_enqueue+0x6d/0x140 net/qrtr/qrtr.c:857 qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] sock_write_iter+0x317/0x470 net/socket.c:998 call_write_iter include/linux/fs.h:1882 [inline] new_sync_write fs/read_write.c:503 [inline] vfs_write+0xa96/0xd10 fs/read_write.c:578 ksys_write+0x11b/0x220 fs/read_write.c:631 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 4316: kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x3d/0x70 mm/kasan/common.c:56 kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422 __cache_free mm/slab.c:3418 [inline] kmem_cache_free+0x82/0xf0 mm/slab.c:3693 __skb_pad+0x3f5/0x5a0 net/core/skbuff.c:1823 __skb_put_padto include/linux/skbuff.h:3233 [inline] skb_put_padto include/linux/skbuff.h:3252 [inline] qrtr_node_enqueue+0x62f/0xc00 net/qrtr/qrtr.c:360 qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861 qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] sock_write_iter+0x317/0x470 net/socket.c:998 call_write_iter include/linux/fs.h:1882 [inline] new_sync_write fs/read_write.c:503 [inline] vfs_write+0xa96/0xd10 fs/read_write.c:578 ksys_write+0x11b/0x220 fs/read_write.c:631 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88804d8ab3c0 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 0 bytes inside of 224-byte region [ffff88804d8ab3c0, ffff88804d8ab4a0) The buggy address belongs to the page: page:00000000ea8cccfb refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804d8abb40 pfn:0x4d8ab flags: 0xfffe0000000200(slab) raw: 00fffe0000000200 ffffea0002237ec8 ffffea00029b3388 ffff88821bb66800 raw: ffff88804d8abb40 ffff88804d8ab000 000000010000000b 0000000000000000 page dumped because: kasan: bad access detected Fixes: `ce57785bf9` ("net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Carl Huang <cjhuang@codeaurora.org> Cc: Wen Gong <wgong@codeaurora.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-09 11:04:39 -07:00
Wei Wang	e92dd77e6f	ipv6: add tos reflection in TCP reset and ack Currently, ipv6 stack does not do any TOS reflection. To make the behavior consistent with v4 stack, this commit adds TOS reflection in tcp_v6_reqsk_send_ack() and tcp_v6_send_reset(). We clear the lower 2-bit ECN value of the received TOS in compliance with RFC 3168 6.1.5 robustness principles. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:20:55 -07:00
Wei Wang	ba9e04a7dd	ip: fix tos reflection in ack and reset packets Currently, in tcp_v4_reqsk_send_ack() and tcp_v4_send_reset(), we echo the TOS value of the received packets in the response. However, we do not want to echo the lower 2 ECN bits in accordance with RFC 3168 6.1.5 robustness principles. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:19:08 -07:00
David S. Miller	56bbc22d83	RxRPC development -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl9X6OcACgkQ+7dXa6fL C2s3Dg/9FyGc4Na08Rc5q8K2TSVhsx1dqBscf/t3gIulCaShbbRRFAb4UecrxOTn GV/eekGVejh/a+cCCVjMGFNP2FxyW9WTFoxINaTFp+bdGURdo8MOJTCTc/1Sgflb M7s+2mubYHfHLw4/VUR8tCAhODPC0JUTiDnkZqXZxA8SizUPAkNZySH7OfgW9PjA PFy/lB5IbR9mU64FWMALU2Rs09Gav/KRA+/X8fyE80i1MgYAz82U7x/EQa0XCmdR B61VObK8z9/o7Juu8GzM01k8b5zoWraHph0e/BdFaEN1KY5phPdPyerGa67GcgwX imnvcW4N75QBlHhE8C34eKnbIaNIPUR9ZNJdEgnA29CPOtP87rSPaeNNn4ISyF8r TnGGpAeBHMahrY2dhKK0lIv9Gbwd1j+OAlrL9O+j4KTaKZc9CqnXLzKp3ugXDeex RzHptlsc0I0bQ8ZeCA1jS9OFmypaRtYabE9DSrPS2Epbb8SdcCE4ZTXabB3A/ytk HBle/MfGcQN9yFAJpDJ/0Bj6PmUZgohVS7qMrZi+JV99vkNPebxxxEvoiM5il2km 3DXPrD3rv9/qg8F6xVHYkWXVxZk6FCMe8/sg9VKUfOhmwEiFBkXv2IwTP/s3nN0g 0h46rXAzjWXY8YMZzsgBHVe/Vp2MfJGsx9hyO1JppRbxGb1PVeU= =f+/X -----END PGP SIGNATURE----- Merge tag 'rxrpc-next-20200908' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Allow more calls to same peer Here are some development patches for AF_RXRPC that allow more simultaneous calls to be made to the same peer with the same security parameters. The current code allows a maximum of 4 simultaneous calls, which limits the afs filesystem to that many simultaneous threads. This increases the limit to 16. To make this work, the way client connections are limited has to be changed (incoming call/connection limits are unaffected) as the current code depends on queuing calls on a connection and then pushing the connection through a queue. The limit is on the number of available connections. This is changed such that there's a limit[] on the total number of calls systemwide across all namespaces, but the limit on the number of client connections is removed. Once a call is allowed to proceed, it finds a bundle of connections and tries to grab a call slot. If there's a spare call slot, fine, otherwise it will wait. If there's already a waiter, it will try to create another connection in the bundle, unless the limit of 4 is reached (4 calls per connection, giving 16). A number of things throttle someone trying to set up endless connections: - Calls that fail immediately have their conns deleted immediately, - Calls that don't fail immediately have to wait for a timeout, - Connections normally get automatically reaped if they haven't been used for 2m, but this is sped up to 2s if the number of connections rises over 900. This number is tunable by sysctl. [] Technically two limits - kernel sockets and userspace rxrpc sockets are accounted separately. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:18:17 -07:00
David S. Miller	6fd40d32ef	Merge tag 'ieee802154-for-davem-2020-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2020-09-08 An update from ieee802154 for your net tree. A potential memory leak fix for ca8210 from Liu Jian, a check on the return for a register read in adf7242 and finally a user after free fix in the softmac tx function from Eric found by syzkaller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:12:58 -07:00
Nikolay Aleksandrov	071445c605	net: bridge: mcast: fix unused br var when lockdep isn't defined Stephen reported the following warning: net/bridge/br_multicast.c: In function 'br_multicast_find_port': net/bridge/br_multicast.c:1818:21: warning: unused variable 'br' [-Wunused-variable] 1818 \| struct net_bridge *br = mp->br; \| ^~ It happens due to bridge's mlock_dereference() when lockdep isn't defined. Silence the warning by annotating the variable as __maybe_unused. Fixes: `0436862e41` ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:11:57 -07:00
Brian Vazquez	923f614cdb	fib: fix fib_rule_ops indirect call wrappers when CONFIG_IPV6=m If CONFIG_IPV6=m, the IPV6 functions won't be found by the linker: ld: net/core/fib_rules.o: in function `fib_rules_lookup': fib_rules.c:(.text+0x606): undefined reference to `fib6_rule_match' ld: fib_rules.c:(.text+0x611): undefined reference to `fib6_rule_match' ld: fib_rules.c:(.text+0x68c): undefined reference to `fib6_rule_action' ld: fib_rules.c:(.text+0x693): undefined reference to `fib6_rule_action' ld: fib_rules.c:(.text+0x6aa): undefined reference to `fib6_rule_suppress' ld: fib_rules.c:(.text+0x6bc): undefined reference to `fib6_rule_suppress' make: *** [Makefile:1166: vmlinux] Error 1 Reported-by: Sven Joachim <svenjoac@gmx.de> Fixes: `b9aaec8f0b` ("fib: use indirect call wrappers in the most common fib_rules_ops") Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:09:08 -07:00
David S. Miller	2650be2c2d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: =================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Allow conntrack entries with l3num == NFPROTO_IPV4 or == NFPROTO_IPV6 only via ctnetlink, from Will McVicker. 2) Batch notifications to userspace to improve netlink socket receive utilization. 3) Restore mark based dump filtering via ctnetlink, from Martin Willi. 4) nf_conncount_init() fails with -EPROTO with CONFIG_IPV6, from Eelco Chaudron. 5) Containers fail to match on meta skuid and skgid, use socket user_ns to retrieve meta skuid and skgid. =================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:07:58 -07:00
Wang Hai	8c70b26817	netlabel: Fix some kernel-doc warnings Fixes the following W=1 kernel build warning(s): net/netlabel/netlabel_calipso.c:438: warning: Excess function parameter 'audit_secid' description in 'calipso_doi_remove' net/netlabel/netlabel_calipso.c:605: warning: Excess function parameter 'reg' description in 'calipso_req_delattr' Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:04:27 -07:00
Wang Hai	7edce63666	cipso: fix 'audit_secid' kernel-doc warning in cipso_ipv4.c Fixes the following W=1 kernel build warning(s): net/ipv4/cipso_ipv4.c:510: warning: Excess function parameter 'audit_secid' description in 'cipso_v4_doi_remove' Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:03:36 -07:00
Eric Dumazet	843d926b00	ipv6: avoid lockdep issue in fib6_del() syzbot reported twice a lockdep issue in fib6_del() [1] which I think is caused by net->ipv6.fib6_null_entry having a NULL fib6_table pointer. fib6_del() already checks for fib6_null_entry special case, we only need to return earlier. Bug seems to occur very rarely, I have thus chosen a 'bug origin' that makes backports not too complex. [1] WARNING: suspicious RCU usage 5.9.0-rc4-syzkaller #0 Not tainted ----------------------------- net/ipv6/ip6_fib.c:1996 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by syz-executor.5/8095: #0: ffffffff8a7ea708 (rtnl_mutex){+.+.}-{3:3}, at: ppp_release+0x178/0x240 drivers/net/ppp/ppp_generic.c:401 #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: spin_trylock_bh include/linux/spinlock.h:414 [inline] #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: fib6_run_gc+0x21b/0x2d0 net/ipv6/ip6_fib.c:2312 #2: ffffffff89bd6a40 (rcu_read_lock){....}-{1:2}, at: __fib6_clean_all+0x0/0x290 net/ipv6/ip6_fib.c:2613 #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline] #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: __fib6_clean_all+0x107/0x290 net/ipv6/ip6_fib.c:2245 stack backtrace: CPU: 1 PID: 8095 Comm: syz-executor.5 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 fib6_del+0x12b4/0x1630 net/ipv6/ip6_fib.c:1996 fib6_clean_node+0x39b/0x570 net/ipv6/ip6_fib.c:2180 fib6_walk_continue+0x4aa/0x8e0 net/ipv6/ip6_fib.c:2102 fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2150 fib6_clean_tree+0xdb/0x120 net/ipv6/ip6_fib.c:2230 __fib6_clean_all+0x120/0x290 net/ipv6/ip6_fib.c:2246 fib6_clean_all net/ipv6/ip6_fib.c:2257 [inline] fib6_run_gc+0x113/0x2d0 net/ipv6/ip6_fib.c:2320 ndisc_netdev_event+0x217/0x350 net/ipv6/ndisc.c:1805 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] dev_close_many+0x30b/0x650 net/core/dev.c:1634 rollback_registered_many+0x3a8/0x1210 net/core/dev.c:9261 rollback_registered net/core/dev.c:9329 [inline] unregister_netdevice_queue+0x2dd/0x570 net/core/dev.c:10410 unregister_netdevice include/linux/netdevice.h:2774 [inline] ppp_release+0x216/0x240 drivers/net/ppp/ppp_generic.c:403 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `421842edea` ("net/ipv6: Add fib6_null_entry") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 19:56:25 -07:00
Vladimir Oltean	2f1e8ea726	net: dsa: link interfaces with the DSA master to get rid of lockdep warnings Since commit `845e0ebb44` ("net: change addr_list_lock back to static key"), cascaded DSA setups (DSA switch port as DSA master for another DSA switch port) are emitting this lockdep warning: ============================================ WARNING: possible recursive locking detected 5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted -------------------------------------------- dhcpcd/323 is trying to acquire lock: ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90 but task is already holding lock: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&dsa_master_addr_list_lock_key/1); lock(&dsa_master_addr_list_lock_key/1); * DEADLOCK * May be due to missing lock nesting notation 3 locks held by dhcpcd/323: #0: ffffdbd1381dda18 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30 #1: ffff00006614b268 (_xmit_ETHER){+...}-{2:2}, at: dev_set_rx_mode+0x28/0x48 #2: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90 stack backtrace: Call trace: dump_backtrace+0x0/0x1e0 show_stack+0x20/0x30 dump_stack+0xec/0x158 __lock_acquire+0xca0/0x2398 lock_acquire+0xe8/0x440 _raw_spin_lock_nested+0x64/0x90 dev_mc_sync+0x44/0x90 dsa_slave_set_rx_mode+0x34/0x50 __dev_set_rx_mode+0x60/0xa0 dev_mc_sync+0x84/0x90 dsa_slave_set_rx_mode+0x34/0x50 __dev_set_rx_mode+0x60/0xa0 dev_set_rx_mode+0x30/0x48 __dev_open+0x10c/0x180 __dev_change_flags+0x170/0x1c8 dev_change_flags+0x2c/0x70 devinet_ioctl+0x774/0x878 inet_ioctl+0x348/0x3b0 sock_do_ioctl+0x50/0x310 sock_ioctl+0x1f8/0x580 ksys_ioctl+0xb0/0xf0 __arm64_sys_ioctl+0x28/0x38 el0_svc_common.constprop.0+0x7c/0x180 do_el0_svc+0x2c/0x98 el0_sync_handler+0x9c/0x1b8 el0_sync+0x158/0x180 Since DSA never made use of the netdev API for describing links between upper devices and lower devices, the dev->lower_level value of a DSA switch interface would be 1, which would warn when it is a DSA master. We can use netdev_upper_dev_link() to describe the relationship between a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port) is an "upper" to a DSA "master" (host port). The relationship is "many uppers to one lower", like in the case of VLAN. So, for that reason, we use the same function as VLAN uses. There might be a chance that somebody will try to take hold of this interface and use it immediately after register_netdev() and before netdev_upper_dev_link(). To avoid that, we do the registration and linkage while holding the RTNL, and we use the RTNL-locked cousin of register_netdev(), which is register_netdevice(). Since this warning was not there when lockdep was using dynamic keys for addr_list_lock, we are blaming the lockdep patch itself. The network stack _has_ been using static lockdep keys before, and it _is_ likely that stacked DSA setups have been triggering these lockdep warnings since forever, however I can't test very old kernels on this particular stacked DSA setup, to ensure I'm not in fact introducing regressions. Fixes: `845e0ebb44` ("net: change addr_list_lock back to static key") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 19:40:09 -07:00
Tom Rix	c1f1f16c4d	net: sched: skip an unnecessay check Reviewing the error handling in tcf_action_init_1() most of the early handling uses err_out: if (cookie) { kfree(cookie->data); kfree(cookie); } before cookie could ever be set. So skip the unnecessay check. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 19:34:36 -07:00
David Howells	288827d53e	rxrpc: Allow multiple client connections to the same peer Allow the number of parallel connections to a machine to be expanded from a single connection to a maximum of four. This allows up to 16 calls to be in progress at the same time to any particular peer instead of 4. Signed-off-by: David Howells <dhowells@redhat.com>	2020-09-08 21:11:47 +01:00
David Howells	245500d853	rxrpc: Rewrite the client connection manager Rewrite the rxrpc client connection manager so that it can support multiple connections for a given security key to a peer. The following changes are made: (1) For each open socket, the code currently maintains an rbtree with the connections placed into it, keyed by communications parameters. This is tricky to maintain as connections can be culled from the tree or replaced within it. Connections can require replacement for a number of reasons, e.g. their IDs span too great a range for the IDR data type to represent efficiently, the call ID numbers on that conn would overflow or the conn got aborted. This is changed so that there's now a connection bundle object placed in the tree, keyed on the same parameters. The bundle, however, does not need to be replaced. (2) An rxrpc_bundle object can now manage the available channels for a set of parallel connections. The lock that manages this is moved there from the rxrpc_connection struct (channel_lock). (3) There'a a dummy bundle for all incoming connections to share so that they have a channel_lock too. It might be better to give each incoming connection its own bundle. This bundle is not needed to manage which channels incoming calls are made on because that's the solely at whim of the client. (4) The restrictions on how many client connections are around are removed. Instead, a previous patch limits the number of client calls that can be allocated. Ordinarily, client connections are reaped after 2 minutes on the idle queue, but when more than a certain number of connections are in existence, the reaper starts reaping them after 2s of idleness instead to get the numbers back down. It could also be made such that new call allocations are forced to wait until the number of outstanding connections subsides. Signed-off-by: David Howells <dhowells@redhat.com>	2020-09-08 21:11:43 +01:00
David Howells	b7a7d67408	rxrpc: Impose a maximum number of client calls Impose a maximum on the number of client rxrpc calls that are allowed simultaneously. This will be in lieu of a maximum number of client connections as this is easier to administed as, unlike connections, calls aren't reusable (to be changed in a subsequent patch).. This doesn't affect the limits on service calls and connections. Signed-off-by: David Howells <dhowells@redhat.com>	2020-09-08 21:10:45 +01:00
Daniel Borkmann	e6a18d3611	bpf: Fix clobbering of r2 in bpf_gen_ld_abs Bryce reported that he saw the following with: 0: r6 = r1 1: r1 = 12 2: r0 = (u16 )skb[r1] The xlated sequence was incorrectly clobbering r2 with pointer value of r6 ... 0: (bf) r6 = r1 1: (b7) r1 = 12 2: (bf) r1 = r6 3: (bf) r2 = r1 4: (85) call bpf_skb_load_helper_16_no_cache#7692160 ... and hence call to the load helper never succeeded given the offset was too high. Fix it by reordering the load of r6 to r1. Other than that the insn has similar calling convention than BPF helpers, that is, r0 - r5 are scratch regs, so nothing else affected after the insn. Fixes: `e0cea7ce98` ("bpf: implement ld_abs/ld_ind in native bpf") Reported-by: Bryce Kahle <bryce.kahle@datadoghq.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/cace836e4d07bb63b1a53e49c5dfb238a040c298.1599512096.git.daniel@iogearbox.net	2020-09-08 09:16:12 -07:00
Jose M. Guisado Gomez	b131c96496	netfilter: nf_tables: add userdata support for nft_object Enables storing userdata for nft_object. Initially this will store an optional comment but can be extended in the future as needed. Adds new attribute NFTA_OBJ_USERDATA to nft_object. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 16:35:38 +02:00
Eric Dumazet	0ff4628f4c	mac802154: tx: fix use-after-free syzbot reported a bug in ieee802154_tx() [1] A similar issue in ieee802154_xmit_worker() is also fixed in this patch. [1] BUG: KASAN: use-after-free in ieee802154_tx+0x3d2/0x480 net/mac802154/tx.c:88 Read of size 4 at addr ffff8880251a8c70 by task syz-executor.3/928 CPU: 0 PID: 928 Comm: syz-executor.3 Not tainted 5.9.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 ieee802154_tx+0x3d2/0x480 net/mac802154/tx.c:88 ieee802154_subif_start_xmit+0xbe/0xe4 net/mac802154/tx.c:130 __netdev_start_xmit include/linux/netdevice.h:4634 [inline] netdev_start_xmit include/linux/netdevice.h:4648 [inline] dev_direct_xmit+0x4e9/0x6e0 net/core/dev.c:4203 packet_snd net/packet/af_packet.c:2989 [inline] packet_sendmsg+0x2413/0x5290 net/packet/af_packet.c:3014 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:671 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353 ___sys_sendmsg+0xf3/0x170 net/socket.c:2407 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45d5b9 Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fc98e749c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000002ccc0 RCX: 000000000045d5b9 RDX: 0000000000000000 RSI: 0000000020007780 RDI: 000000000000000b RBP: 000000000118d020 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cfec R13: 00007fff690c720f R14: 00007fc98e74a9c0 R15: 000000000118cfec Allocated by task 928: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 slab_post_alloc_hook mm/slab.h:518 [inline] slab_alloc_node mm/slab.c:3254 [inline] kmem_cache_alloc_node+0x136/0x3e0 mm/slab.c:3574 __alloc_skb+0x71/0x550 net/core/skbuff.c:198 alloc_skb include/linux/skbuff.h:1094 [inline] alloc_skb_with_frags+0x92/0x570 net/core/skbuff.c:5771 sock_alloc_send_pskb+0x72a/0x880 net/core/sock.c:2348 packet_alloc_skb net/packet/af_packet.c:2837 [inline] packet_snd net/packet/af_packet.c:2932 [inline] packet_sendmsg+0x19fb/0x5290 net/packet/af_packet.c:3014 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:671 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353 ___sys_sendmsg+0xf3/0x170 net/socket.c:2407 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 928: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3418 [inline] kmem_cache_free.part.0+0x74/0x1e0 mm/slab.c:3693 kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:622 __kfree_skb net/core/skbuff.c:679 [inline] consume_skb net/core/skbuff.c:838 [inline] consume_skb+0xcf/0x160 net/core/skbuff.c:832 __dev_kfree_skb_any+0x9c/0xc0 net/core/dev.c:3107 fakelb_hw_xmit+0x20e/0x2a0 drivers/net/ieee802154/fakelb.c:81 drv_xmit_async net/mac802154/driver-ops.h:16 [inline] ieee802154_tx+0x282/0x480 net/mac802154/tx.c:81 ieee802154_subif_start_xmit+0xbe/0xe4 net/mac802154/tx.c:130 __netdev_start_xmit include/linux/netdevice.h:4634 [inline] netdev_start_xmit include/linux/netdevice.h:4648 [inline] dev_direct_xmit+0x4e9/0x6e0 net/core/dev.c:4203 packet_snd net/packet/af_packet.c:2989 [inline] packet_sendmsg+0x2413/0x5290 net/packet/af_packet.c:3014 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:671 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353 ___sys_sendmsg+0xf3/0x170 net/socket.c:2407 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff8880251a8c00 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 112 bytes inside of 224-byte region [ffff8880251a8c00, ffff8880251a8ce0) The buggy address belongs to the page: page:0000000062b6a4f1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x251a8 flags: 0xfffe0000000200(slab) raw: 00fffe0000000200 ffffea0000435c88 ffffea00028b6c08 ffff8880a9055d00 raw: 0000000000000000 ffff8880251a80c0 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880251a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880251a8b80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc >ffff8880251a8c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880251a8c80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff8880251a8d00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb Fixes: `409c3b0c5f` ("mac802154: tx: move stats tx increment") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@datenfreihafen.org> Cc: linux-wpan@vger.kernel.org Link: https://lore.kernel.org/r/20200908104025.4009085-1-edumazet@google.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2020-09-08 16:35:32 +02:00
Pablo Neira Ayuso	0c92411bb8	netfilter: nft_meta: use socket user_ns to retrieve skuid and skgid ... instead of using init_user_ns. Fixes: `96518518cc` ("netfilter: add nftables") Tested-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 13:04:56 +02:00
Eelco Chaudron	526e81b990	netfilter: conntrack: nf_conncount_init is failing with IPv6 disabled The openvswitch module fails initialization when used in a kernel without IPv6 enabled. nf_conncount_init() fails because the ct code unconditionally tries to initialize the netns IPv6 related bit, regardless of the build option. The change below ignores the IPv6 part if not enabled. Note that the corresponding _put() function already has this IPv6 configuration check. Fixes: `11efd5cb04` ("openvswitch: Support conntrack zone limit") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 13:04:54 +02:00
Martin Willi	6c0d95d123	netfilter: ctnetlink: fix mark based dump filtering regression conntrack mark based dump filtering may falsely skip entries if a mask is given: If the mask-based check does not filter out the entry, the else-if check is always true and compares the mark without considering the mask. The if/else-if logic seems wrong. Given that the mask during filter setup is implicitly set to 0xffffffff if not specified explicitly, the mark filtering flags seem to just complicate things. Restore the previously used approach by always matching against a zero mask is no filter mark is given. Fixes: `cb8aa9a3af` ("netfilter: ctnetlink: add kernel side filtering for dump") Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 13:04:51 +02:00
Pablo Neira Ayuso	67cc570eda	netfilter: nf_tables: coalesce multiple notifications into one skbuff On x86_64, each notification results in one skbuff allocation which consumes at least 768 bytes due to the skbuff overhead. This patch coalesces several notifications into one single skbuff, so each notification consumes at least ~211 bytes, that ~3.5 times less memory consumption. As a result, this is reducing the chances to exhaust the netlink socket receive buffer. Rule of thumb is that each notification batch only contains netlink messages whose report flag is the same, nfnetlink_send() requires this to do appropriate delivery to userspace, either via unicast (echo mode) or multicast (monitor mode). The skbuff control buffer is used to annotate the report flag for later handling at the new coalescing routine. The batch skbuff notification size is NLMSG_GOODSIZE, using a larger skbuff would allow for more socket receiver buffer savings (to amortize the cost of the skbuff even more), however, going over that size might break userspace applications, so let's be conservative and stick to NLMSG_GOODSIZE. Reported-by: Phil Sutter <phil@nwl.cc> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 13:02:44 +02:00
Wang Hai	36c3be8a2c	netfilter: ebt_stp: Remove unused macro BPDU_TYPE_TCN BPDU_TYPE_TCN is never used after it was introduced. So better to remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 12:56:38 +02:00
Will McVicker	1cc5ef91d2	netfilter: ctnetlink: add a range check for l3/l4 protonum The indexes to the nf_nat_l[34]protos arrays come from userspace. So check the tuple's family, e.g. l3num, when creating the conntrack in order to prevent an OOB memory access during setup. Here is an example kernel panic on 4.14.180 when userspace passes in an index greater than NFPROTO_NUMPROTO. Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in:... Process poc (pid: 5614, stack limit = 0x00000000a3933121) CPU: 4 PID: 5614 Comm: poc Tainted: G S W O 4.14.180-g051355490483 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM task: 000000002a3dfffe task.stack: 00000000a3933121 pc : __cfi_check_fail+0x1c/0x24 lr : __cfi_check_fail+0x1c/0x24 ... Call trace: __cfi_check_fail+0x1c/0x24 name_to_dev_t+0x0/0x468 nfnetlink_parse_nat_setup+0x234/0x258 ctnetlink_parse_nat_setup+0x4c/0x228 ctnetlink_new_conntrack+0x590/0xc40 nfnetlink_rcv_msg+0x31c/0x4d4 netlink_rcv_skb+0x100/0x184 nfnetlink_rcv+0xf4/0x180 netlink_unicast+0x360/0x770 netlink_sendmsg+0x5a0/0x6a4 ___sys_sendmsg+0x314/0x46c SyS_sendmsg+0xb4/0x108 el0_svc_naked+0x34/0x38 This crash is not happening since 5.4+, however, ctnetlink still allows for creating entries with unsupported layer 3 protocol number. Fixes: `c1d10adb4a` ("[NETFILTER]: Add ctnetlink port for nf_conntrack") Signed-off-by: Will McVicker <willmcvicker@google.com> [pablo@netfilter.org: rebased original patch on top of nf.git] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 11:41:43 +02:00
Vladimir Oltean	4349abdb40	net: dsa: don't print non-fatal MTU error if not supported Commit `72579e14a1` ("net: dsa: don't fail to probe if we couldn't set the MTU") changed, for some reason, the "err && err != -EOPNOTSUPP" check into a simple "err". This causes the MTU warning to be printed even for drivers that don't have the MTU operations implemented. Fix that. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 21:01:50 -07:00
Vladimir Oltean	c9ebf126f1	net: dsa: change PHY error message again slave_dev->name is only populated at this stage if it was specified through a label in the device tree. However that is not mandatory. When it isn't, the error message looks like this: [ 5.037057] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d [ 5.044672] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d [ 5.052275] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d [ 5.059877] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d which is especially confusing since the error gets printed on behalf of the DSA master (fsl_enetc in this case). Printing an error message that contains a valid reference to the DSA port's name is difficult at this point in the initialization stage, so at least we should print some info that is more reliable, even if less user-friendly. That may be the driver name and the hardware port index. After this change, the error is printed as: [ 6.051587] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 0 [ 6.061192] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 1 [ 6.070765] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 2 [ 6.080324] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 3 Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 21:00:53 -07:00
Wang Hai	81365af13a	rxrpc: Remove unused macro rxrpc_min_rtt_wlen rxrpc_min_rtt_wlen is never used after it was introduced. So better to remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 15:04:41 -07:00
Taehee Yoo	e1f469cd58	Revert "netns: don't disable BHs when locking "nsid_lock"" This reverts commit `8d7e5dee97`. To protect netns id, the nsid_lock is used when netns id is being allocated and removed by peernet2id_alloc() and unhash_nsid(). The nsid_lock can be used in BH context but only spin_lock() is used in this code. Using spin_lock() instead of spin_lock_bh() can result in a deadlock in the following scenario reported by the lockdep. In order to avoid a deadlock, the spin_lock_bh() should be used instead of spin_lock() to acquire nsid_lock. Test commands: ip netns del nst ip netns add nst ip link add veth1 type veth peer name veth2 ip link set veth1 netns nst ip netns exec nst ip link add name br1 type bridge vlan_filtering 1 ip netns exec nst ip link set dev br1 up ip netns exec nst ip link set dev veth1 master br1 ip netns exec nst ip link set dev veth1 up ip netns exec nst ip link add macvlan0 link br1 up type macvlan Splat looks like: [ 33.615860][ T607] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 33.617194][ T607] 5.9.0-rc1+ #665 Not tainted [ ... ] [ 33.670615][ T607] Chain exists of: [ 33.670615][ T607] &mc->mca_lock --> &bridge_netdev_addr_lock_key --> &net->nsid_lock [ 33.670615][ T607] [ 33.673118][ T607] Possible interrupt unsafe locking scenario: [ 33.673118][ T607] [ 33.674599][ T607] CPU0 CPU1 [ 33.675557][ T607] ---- ---- [ 33.676516][ T607] lock(&net->nsid_lock); [ 33.677306][ T607] local_irq_disable(); [ 33.678517][ T607] lock(&mc->mca_lock); [ 33.679725][ T607] lock(&bridge_netdev_addr_lock_key); [ 33.681166][ T607] <Interrupt> [ 33.681791][ T607] lock(&mc->mca_lock); [ 33.682579][ T607] [ 33.682579][ T607] * DEADLOCK * [ ... ] [ 33.922046][ T607] stack backtrace: [ 33.922999][ T607] CPU: 3 PID: 607 Comm: ip Not tainted 5.9.0-rc1+ #665 [ 33.924099][ T607] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 33.925714][ T607] Call Trace: [ 33.926238][ T607] dump_stack+0x78/0xab [ 33.926905][ T607] check_irq_usage+0x70b/0x720 [ 33.927708][ T607] ? iterate_chain_key+0x60/0x60 [ 33.928507][ T607] ? check_path+0x22/0x40 [ 33.929201][ T607] ? check_noncircular+0xcf/0x180 [ 33.930024][ T607] ? __lock_acquire+0x1952/0x1f20 [ 33.930860][ T607] __lock_acquire+0x1952/0x1f20 [ 33.931667][ T607] lock_acquire+0xaf/0x3a0 [ 33.932366][ T607] ? peernet2id_alloc+0x3a/0x170 [ 33.933147][ T607] ? br_port_fill_attrs+0x54c/0x6b0 [bridge] [ 33.934140][ T607] ? br_port_fill_attrs+0x5de/0x6b0 [bridge] [ 33.935113][ T607] ? kvm_sched_clock_read+0x14/0x30 [ 33.935974][ T607] _raw_spin_lock+0x30/0x70 [ 33.936728][ T607] ? peernet2id_alloc+0x3a/0x170 [ 33.937523][ T607] peernet2id_alloc+0x3a/0x170 [ 33.938313][ T607] rtnl_fill_ifinfo+0xb5e/0x1400 [ 33.939091][ T607] rtmsg_ifinfo_build_skb+0x8a/0xf0 [ 33.939953][ T607] rtmsg_ifinfo_event.part.39+0x17/0x50 [ 33.940863][ T607] rtmsg_ifinfo+0x1f/0x30 [ 33.941571][ T607] __dev_notify_flags+0xa5/0xf0 [ 33.942376][ T607] ? __irq_work_queue_local+0x49/0x50 [ 33.943249][ T607] ? irq_work_queue+0x1d/0x30 [ 33.943993][ T607] ? __dev_set_promiscuity+0x7b/0x1a0 [ 33.944878][ T607] __dev_set_promiscuity+0x7b/0x1a0 [ 33.945758][ T607] dev_set_promiscuity+0x1e/0x50 [ 33.946582][ T607] br_port_set_promisc+0x1f/0x40 [bridge] [ 33.947487][ T607] br_manage_promisc+0x8b/0xe0 [bridge] [ 33.948388][ T607] __dev_set_promiscuity+0x123/0x1a0 [ 33.949244][ T607] __dev_set_rx_mode+0x68/0x90 [ 33.950021][ T607] dev_uc_add+0x50/0x60 [ 33.950720][ T607] macvlan_open+0x18e/0x1f0 [macvlan] [ 33.951601][ T607] __dev_open+0xd6/0x170 [ 33.952269][ T607] __dev_change_flags+0x181/0x1d0 [ 33.953056][ T607] rtnl_configure_link+0x2f/0xa0 [ 33.953884][ T607] __rtnl_newlink+0x6b9/0x8e0 [ 33.954665][ T607] ? __lock_acquire+0x95d/0x1f20 [ 33.955450][ T607] ? lock_acquire+0xaf/0x3a0 [ 33.956193][ T607] ? is_bpf_text_address+0x5/0xe0 [ 33.956999][ T607] rtnl_newlink+0x47/0x70 Acked-by: Guillaume Nault <gnault@redhat.com> Fixes: `8d7e5dee97` ("netns: don't disable BHs when locking "nsid_lock"") Reported-by: syzbot+3f960c64a104eaa2c813@syzkaller.appspotmail.com Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 14:32:39 -07:00
Nikolay Aleksandrov	e12cec65b5	net: bridge: mcast: destroy all entries via gc Since each entry type has timers that can be running simultaneously we need to make sure that entries are not freed before their timers have finished. In order to do that generalize the src gc work to mcast gc work and use a callback to free the entries (mdb, port group or src). v3: add IPv6 support v2: force mcast gc on port del to make sure all port group timers have finished before freeing the bridge port Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	23550b8313	net: bridge: mcast: improve IGMPv3/MLDv2 query processing When an IGMPv3/MLDv2 query is received and we're operating in such mode then we need to avoid updating group timers if the suppress flag is set. Also we should update only timers for groups in exclude mode. v3: add IPv6/MLDv2 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	109865fe12	net: bridge: mcast: support for IGMPV3/MLDv2 BLOCK_OLD_SOURCES report We already have all necessary helpers, so process IGMPV3/MLDv2 BLOCK_OLD_SOURCES as per the RFCs. v3: add IPv6/MLDv2 support v2: directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	5bf1e00b68	net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report In order to process IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report types we need new helpers which allow us to mark entries based on their timer state and to query only marked entries. v3: add IPv6/MLDv2 support, fix other_query checks v2: directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	e6231bca6a	net: bridge: mcast: support for IGMPV3/MLDv2 MODE_IS_INCLUDE/EXCLUDE report In order to process IGMPV3/MLDv2_MODE_IS_INCLUDE/EXCLUDE report types we need some new helpers which allow us to set/clear flags for all current entries and later delete marked entries after the report sources have been processed. v3: add IPv6/MLDv2 support v2: drop flag helpers and directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	0436862e41	net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report This patch adds handling for the ALLOW_NEW_SOURCES IGMPv3/MLDv2 report types and limits them only when multicast_igmp_version == 3 or multicast_mld_version == 2 respectively. Now that IGMPv3/MLDv2 handling functions will be managing timers we need to delay their activation, thus a new argument is added which controls if the timer should be updated. We also disable host IGMPv3/MLDv2 handling as it's not yet implemented and could cause inconsistent group state, the host can only join a group as EXCLUDE {} or leave it. v4: rename update_timer to igmpv2_mldv1 and use the passed value from br_multicast_add_group's callers v3: Add IPv6/MLDv2 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	d6c33d67a8	net: bridge: mcast: delete expired port groups without srcs If an expired port group is in EXCLUDE mode, then we have to turn it into INCLUDE mode, remove all srcs with zero timer and finally remove the group itself if there are no more srcs with an active timer. For IGMPv2 use there would be no sources, so this will reduce to just removing the group as before. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	81f1983852	net: bridge: mdb: use mdb and port entries in notifications We have to use mdb and port entries when sending mdb notifications in order to fill in all group attributes properly. Before this change we would've used a fake br_mdb_entry struct to fill in only partial information about the mdb. Now we can also reuse the mdb dump fill function and thus have only a single central place which fills the mdb attributes. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	79abc87505	net: bridge: mdb: push notifications in __br_mdb_add/del This change is in preparation for using the mdb port group entries when sending a notification, so their full state and additional attributes can be filled in. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	42c11ccfe8	net: bridge: mcast: add support for group query retransmit We need to be able to retransmit group-specific and group-and-source specific queries. The new timer takes care of those. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	438ef2d027	net: bridge: mcast: add support for group-and-source specific queries Allows br_multicast_alloc_query to build queries with the port group's source lists and sends a query for sources over and under lmqt when necessary as per RFCs 3376 and 3810 with the suppress flag set appropriately. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov	5205e919c9	net: bridge: mcast: add support for src list and filter mode dumping Support per port group src list (address and timer) and filter mode dumping. Protected by either multicast_lock or rcu. v3: add IPv6 support v2: require RCU or multicast_lock to traverse src groups Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov	8b671779b7	net: bridge: mcast: add support for group source list Initial functions for group source lists which are needed for IGMPv3 and MLDv2 include/exclude lists. Both IPv4 and IPv6 sources are supported. User-added mdb entries are created with exclude filter mode, we can extend that later to allow user-supplied mode. When group src entries are deleted, they're freed from a workqueue to make sure their timers are not still running. Source entries are protected by the multicast_lock and rcu. The number of src groups per port group is limited to 32. v4: use the new port group del function directly add igmpv2/mldv1 bool to denote if the entry was added in those modes, it will later replace the old update_timer bool v3: add IPv6 support v2: allow src groups to be traversed under rcu Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov	681590bd4c	net: bridge: mcast: factor out port group del In order to avoid future errors and reduce code duplication we should factor out the port group del sequence. This allows us to have one function which takes care of all details when removing a port group. v4: set pg's fast leave flag when deleting due to fast leave move the patch before adding source lists Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov	6ec0d0ee66	net: bridge: mdb: arrange internal structs so fast-path fields are close Before this patch we'd need 2 cache lines for fast-path, now all used fields are in the first cache line. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Hoang Huu Le	d966ddcc38	tipc: fix a deadlock when flushing scheduled work In the commit `fdeba99b1e` ("tipc: fix use-after-free in tipc_bcast_get_mode"), we're trying to make sure the tipc_net_finalize_work work item finished if it enqueued. But calling flush_scheduled_work() is not just affecting above work item but either any scheduled work. This has turned out to be overkill and caused to deadlock as syzbot reported: ====================================================== WARNING: possible circular locking dependency detected 5.9.0-rc2-next-20200828-syzkaller #0 Not tainted ------------------------------------------------------ kworker/u4:6/349 is trying to acquire lock: ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: flush_workqueue+0xe1/0x13e0 kernel/workqueue.c:2777 but task is already holding lock: ffffffff8a879430 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x9b/0xb10 net/core/net_namespace.c:565 [...] Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pernet_ops_rwsem); lock(&sb->s_type->i_mutex_key#13); lock(pernet_ops_rwsem); lock((wq_completion)events); * DEADLOCK * [...] v1: To fix the original issue, we replace above calling by introducing a bit flag. When a namespace cleaned-up, bit flag is set to zero and: - tipc_net_finalize functionial just does return immediately. - tipc_net_finalize_work does not enqueue into the scheduled work queue. v2: Use cancel_work_sync() helper to make sure ONLY the tipc_net_finalize_work() stopped before releasing bcbase object. Reported-by: syzbot+d5aa7e0385f6a5d0f4fd@syzkaller.appspotmail.com Fixes: `fdeba99b1e` ("tipc: fix use-after-free in tipc_bcast_get_mode") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 12:08:53 -07:00
Antony Antony	8366685b28	xfrm: clone whole liftime_cur structure in xfrm_do_migrate When we clone state only add_time was cloned. It missed values like bytes, packets. Now clone the all members of the structure. v1->v3: - use memcpy to copy the entire structure Fixes: `80c9abaabf` ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-07 12:45:22 +02:00
Antony Antony	7aa05d3047	xfrm: clone XFRMA_SEC_CTX in xfrm_do_migrate XFRMA_SEC_CTX was not cloned from the old to the new. Migrate this attribute during XFRMA_MSG_MIGRATE v1->v2: - return -ENOMEM on error v2->v3: - fix return type to int Fixes: `80c9abaabf` ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-07 12:45:22 +02:00
Antony Antony	545e5c5716	xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate XFRMA_SET_MARK and XFRMA_SET_MARK_MASK was not cloned from the old to the new. Migrate these two attributes during XFRMA_MSG_MIGRATE Fixes: `9b42c1f179` ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-07 12:45:22 +02:00
Jonathan Neuschäfer	ee1a4c84a7	net: Add a missing word Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-06 12:13:11 -07:00
Wang Hai	383e3f3ee8	net/packet: Remove unused macro BLOCK_PRIV BLOCK_PRIV is never used after it was introduced. So better to remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-06 10:17:34 -07:00
Wang Hai	be239c4d5e	NFC: digital: Remove two unused macroes DIGITAL_NFC_DEP_REQ_RES_TAILROOM is never used after it was introduced. DIGITAL_NFC_DEP_REQ_RES_HEADROOM is no more used after below commit `e8e7f42175` ("NFC: digital: Remove useless call to skb_reserve()") Remove them. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-05 16:01:52 -07:00
Wang Hai	877c347402	caif: Remove duplicate macro SRVL_CTRL_PKT_SIZE Remove SRVL_CTRL_PKT_SIZE which is defined more than once. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-05 15:57:05 -07:00
J. Bruce Fields	8c6b6c793e	SUNRPC: stop printk reading past end of string Since p points at raw xdr data, there's no guarantee that it's NULL terminated, so we should give a length. And probably escape any special characters too. Reported-by: Zhi Li <yieli@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-09-05 10:39:41 -04:00
Linus Lüssing	7dda5b3384	batman-adv: mcast/TT: fix wrongly dropped or rerouted packets The unicast packet rerouting code makes several assumptions. For instance it assumes that there is always exactly one destination in the TT. This breaks for multicast frames in a unicast packets in several ways: For one thing if there is actually no TT entry and the destination node was selected due to the multicast tvlv flags it announced. Then an intermediate node will wrongly drop the packet. For another thing if there is a TT entry but the TTVN of this entry is newer than the originally addressed destination node: Then the intermediate node will wrongly redirect the packet, leading to duplicated multicast packets at a multicast listener and missing packets at other multicast listeners or multicast routers. Fixing this by not applying the unicast packet rerouting to batman-adv unicast packets with a multicast payload. We are not able to detect a roaming multicast listener at the moment and will just continue to send the multicast frame to both the new and old destination for a while in case of such a roaming multicast listener. Fixes: `a73105b8d4` ("batman-adv: improved client announcement mechanism") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-09-05 08:45:46 +02:00
Cong Wang	cc8e58f832	act_ife: load meta modules before tcf_idr_check_alloc() The following deadlock scenario is triggered by syzbot: Thread A: Thread B: tcf_idr_check_alloc() ... populate_metalist() rtnl_unlock() rtnl_lock() ... request_module() tcf_idr_check_alloc() rtnl_lock() At this point, thread A is waiting for thread B to release RTNL lock, while thread B is waiting for thread A to commit the IDR change with tcf_idr_insert() later. Break this deadlock situation by preloading ife modules earlier, before tcf_idr_check_alloc(), this is fine because we only need to load modules we need potentially. Reported-and-tested-by: syzbot+80e32b5d1f9923f8ace6@syzkaller.appspotmail.com Fixes: `0190c1d452` ("net: sched: atomically check-allocate action") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-04 22:08:59 -07:00
Jakub Kicinski	44a8c4f33c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net We got slightly different patches removing a double word in a comment in net/ipv4/raw.c - picked the version from net. Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached values instead of VNIC login response buffer (following what commit `507ebe6444` ("ibmvnic: Fix use-after-free of VNIC login response buffer") did). Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-04 21:28:59 -07:00
Or Cohen	acf69c9462	net/packet: fix overflow in tpacket_rcv Using tp_reserve to calculate netoff can overflow as tp_reserve is unsigned int and netoff is unsigned short. This may lead to macoff receving a smaller value then sizeof(struct virtio_net_hdr), and if po->has_vnet_hdr is set, an out-of-bounds write will occur when calling virtio_net_hdr_from_skb. The bug is fixed by converting netoff to unsigned int and checking if it exceeds USHRT_MAX. This addresses CVE-2020-14386 Fixes: `8913336a7e` ("packet: add PACKET_RESERVE sockopt") Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-04 11:56:02 -07:00
Linus Torvalds	3e8d3bdc2a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi Kivilinna. 2) Fix loss of RTT samples in rxrpc, from David Howells. 3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu. 4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka. 5) We disable BH for too lokng in sctp_get_port_local(), add a cond_resched() here as well, from Xin Long. 6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu. 7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from Yonghong Song. 8) Missing of_node_put() in mt7530 DSA driver, from Sumera Priyadarsini. 9) Fix crash in bnxt_fw_reset_task(), from Michael Chan. 10) Fix geneve tunnel checksumming bug in hns3, from Yi Li. 11) Memory leak in rxkad_verify_response, from Dinghao Liu. 12) In tipc, don't use smp_processor_id() in preemptible context. From Tuong Lien. 13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu. 14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter. 15) Fix ABI mismatch between driver and firmware in nfp, from Louis Peens. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits) net/smc: fix sock refcounting in case of termination net/smc: reset sndbuf_desc if freed net/smc: set rx_off for SMCR explicitly net/smc: fix toleration of fake add_link messages tg3: Fix soft lockup when tg3_reset_task() fails. doc: net: dsa: Fix typo in config code sample net: dp83867: Fix WoL SecureOn password nfp: flower: fix ABI mismatch between driver and firmware tipc: fix shutdown() of connectionless socket ipv6: Fix sysctl max for fib_multipath_hash_policy drivers/net/wan/hdlc: Change the default of hard_header_len to 0 net: gemini: Fix another missing clk_disable_unprepare() in probe net: bcmgenet: fix mask check in bcmgenet_validate_flow() amd-xgbe: Add support for new port mode net: usb: dm9601: Add USB ID of Keenetic Plus DSL vhost: fix typo in error message net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init() pktgen: fix error message with wrong function name net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode cxgb4: fix thermal zone device registration ...	2020-09-03 18:50:48 -07:00
Ursula Braun	5fb8642a17	net/smc: fix sock refcounting in case of termination When an ISM device is removed, all its linkgroups are terminated, i.e. all the corresponding connections are killed. Connection killing invokes smc_close_active_abort(), which decreases the sock refcount for certain states to simulate passive closing. And it cancels the close worker and has to give up the sock lock for this timeframe. This opens the door for a passive close worker or a socket close to run in between. In this case smc_close_active_abort() and passive close worker resp. smc_release() might do a sock_put for passive closing. This causes: [ 1323.315943] refcount_t: underflow; use-after-free. [ 1323.316055] WARNING: CPU: 3 PID: 54469 at lib/refcount.c:28 refcount_warn_saturate+0xe8/0x130 [ 1323.316069] Kernel panic - not syncing: panic_on_warn set ... [ 1323.316084] CPU: 3 PID: 54469 Comm: uperf Not tainted 5.9.0-20200826.rc2.git0.46328853ed20.300.fc32.s390x+debug #1 [ 1323.316096] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) [ 1323.316108] Call Trace: [ 1323.316125] [<00000000c0d4aae8>] show_stack+0x90/0xf8 [ 1323.316143] [<00000000c15989b0>] dump_stack+0xa8/0xe8 [ 1323.316158] [<00000000c0d8344e>] panic+0x11e/0x288 [ 1323.316173] [<00000000c0d83144>] __warn+0xac/0x158 [ 1323.316187] [<00000000c1597a7a>] report_bug+0xb2/0x130 [ 1323.316201] [<00000000c0d36424>] monitor_event_exception+0x44/0xc0 [ 1323.316219] [<00000000c195c716>] pgm_check_handler+0x1da/0x238 [ 1323.316234] [<00000000c151844c>] refcount_warn_saturate+0xec/0x130 [ 1323.316280] ([<00000000c1518448>] refcount_warn_saturate+0xe8/0x130) [ 1323.316310] [<000003ff801f2e2a>] smc_release+0x192/0x1c8 [smc] [ 1323.316323] [<00000000c169f1fa>] __sock_release+0x5a/0xe0 [ 1323.316334] [<00000000c169f2ac>] sock_close+0x2c/0x40 [ 1323.316350] [<00000000c1086de0>] __fput+0xb8/0x278 [ 1323.316362] [<00000000c0db1e0e>] task_work_run+0x76/0xb8 [ 1323.316393] [<00000000c0d8ab84>] do_exit+0x26c/0x520 [ 1323.316408] [<00000000c0d8af08>] do_group_exit+0x48/0xc0 [ 1323.316421] [<00000000c0d8afa8>] __s390x_sys_exit_group+0x28/0x38 [ 1323.316433] [<00000000c195c32c>] system_call+0xe0/0x2b4 [ 1323.316446] 1 lock held by uperf/54469: [ 1323.316456] #0: 0000000044125e60 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x44/0xe0 The patch rechecks sock state in smc_close_active_abort() after smc_close_cancel_work() to avoid duplicate decrease of sock refcount for the same purpose. Fixes: `611b63a127` ("net/smc: cancel tx worker in case of socket aborts") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 16:52:33 -07:00
Ursula Braun	1d8df41d89	net/smc: reset sndbuf_desc if freed When an SMC connection is created, and there is a problem to create an RMB or DMB, the previously created send buffer is thrown away as well including buffer descriptor freeing. Make sure the connection no longer references the freed buffer descriptor, otherwise bugs like this are possible: [71556.835148] ============================================================================= [71556.835168] BUG kmalloc-128 (Tainted: G B OE ): Poison overwritten [71556.835172] ----------------------------------------------------------------------------- [71556.835179] INFO: 0x00000000d20894be-0x00000000aaef63e9 @offset=2724. First byte 0x0 instead of 0x6b [71556.835215] INFO: Allocated in __smc_buf_create+0x184/0x578 [smc] age=0 cpu=5 pid=46726 [71556.835234] ___slab_alloc+0x5a4/0x690 [71556.835239] __slab_alloc.constprop.0+0x70/0xb0 [71556.835243] kmem_cache_alloc_trace+0x38e/0x3f8 [71556.835250] __smc_buf_create+0x184/0x578 [smc] [71556.835257] smc_buf_create+0x2e/0xe8 [smc] [71556.835264] smc_listen_work+0x516/0x6a0 [smc] [71556.835275] process_one_work+0x280/0x478 [71556.835280] worker_thread+0x66/0x368 [71556.835287] kthread+0x17a/0x1a0 [71556.835294] ret_from_fork+0x28/0x2c [71556.835301] INFO: Freed in smc_buf_create+0xd8/0xe8 [smc] age=0 cpu=5 pid=46726 [71556.835307] __slab_free+0x246/0x560 [71556.835311] kfree+0x398/0x3f8 [71556.835318] smc_buf_create+0xd8/0xe8 [smc] [71556.835324] smc_listen_work+0x516/0x6a0 [smc] [71556.835328] process_one_work+0x280/0x478 [71556.835332] worker_thread+0x66/0x368 [71556.835337] kthread+0x17a/0x1a0 [71556.835344] ret_from_fork+0x28/0x2c [71556.835348] INFO: Slab 0x00000000a0744551 objects=51 used=51 fp=0x0000000000000000 flags=0x1ffff00000010200 [71556.835352] INFO: Object 0x00000000563480a1 @offset=2688 fp=0x00000000289567b2 [71556.835359] Redzone 000000006783cde2: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835363] Redzone 00000000e35b876e: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835367] Redzone 0000000023074562: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835372] Redzone 00000000b9564b8c: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835376] Redzone 00000000810c6362: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835380] Redzone 0000000065ef52c3: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835384] Redzone 00000000c5dd6984: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835388] Redzone 000000004c480f8f: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835392] Object 00000000563480a1: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835397] Object 000000009c479d06: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835401] Object 000000006e1dce92: 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b kkkk....kkkkkkkk [71556.835405] Object 00000000227f7cf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835410] Object 000000009a701215: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835414] Object 000000003731ce76: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835418] Object 00000000f7085967: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835422] Object 0000000007f99927: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. [71556.835427] Redzone 00000000579c4913: bb bb bb bb bb bb bb bb ........ [71556.835431] Padding 00000000305aef82: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [71556.835435] Padding 00000000b1cdd722: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [71556.835438] Padding 00000000c7568199: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [71556.835442] Padding 00000000fad4c4d4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [71556.835451] CPU: 0 PID: 47939 Comm: kworker/0:15 Tainted: G B OE 5.9.0-rc1uschi+ #54 [71556.835456] Hardware name: IBM 3906 M03 703 (LPAR) [71556.835464] Workqueue: events smc_listen_work [smc] [71556.835470] Call Trace: [71556.835478] [<00000000d5eaeb10>] show_stack+0x90/0xf8 [71556.835493] [<00000000d66fc0f8>] dump_stack+0xa8/0xe8 [71556.835499] [<00000000d61a511c>] check_bytes_and_report+0x104/0x130 [71556.835504] [<00000000d61a57b2>] check_object+0x26a/0x2e0 [71556.835509] [<00000000d61a59bc>] alloc_debug_processing+0x194/0x238 [71556.835514] [<00000000d61a8c14>] ___slab_alloc+0x5a4/0x690 [71556.835519] [<00000000d61a9170>] __slab_alloc.constprop.0+0x70/0xb0 [71556.835524] [<00000000d61aaf66>] kmem_cache_alloc_trace+0x38e/0x3f8 [71556.835530] [<000003ff80549bbc>] __smc_buf_create+0x184/0x578 [smc] [71556.835538] [<000003ff8054a396>] smc_buf_create+0x2e/0xe8 [smc] [71556.835545] [<000003ff80540c16>] smc_listen_work+0x516/0x6a0 [smc] [71556.835549] [<00000000d5f0f448>] process_one_work+0x280/0x478 [71556.835554] [<00000000d5f0f6a6>] worker_thread+0x66/0x368 [71556.835559] [<00000000d5f18692>] kthread+0x17a/0x1a0 [71556.835563] [<00000000d6abf3b8>] ret_from_fork+0x28/0x2c [71556.835569] INFO: lockdep is turned off. [71556.835573] FIX kmalloc-128: Restoring 0x00000000d20894be-0x00000000aaef63e9=0x6b [71556.835577] FIX kmalloc-128: Marking all objects used Fixes: `fd7f3a7465` ("net/smc: remove freed buffer from list") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 16:52:33 -07:00
Ursula Braun	2d2bfeb8c5	net/smc: set rx_off for SMCR explicitly SMC tries to make use of SMCD first. If a problem shows up, it tries to switch to SMCR. If the SMCD initializing problem shows up after the SMCD connection has already been initialized, field rx_off keeps the wrong SMCD value for SMCR, which results in corrupted data at the receiver. This patch adds an explicit (re-)setting of field rx_off to zero if the connection uses SMCR. Fixes: `be244f28d2` ("net/smc: add SMC-D support in data transfer") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 16:52:33 -07:00
Karsten Graul	fffe83c8c4	net/smc: fix toleration of fake add_link messages Older SMCR implementations had no link failover support and used one link only. Because the handshake protocol requires to try the establishment of a second link the old code sent a fake add_link message and declined any server response afterwards. The current code supports multiple links and inspects the received fake add_link message more closely. To tolerate the fake add_link messages smc_llc_is_local_add_link() needs an improved check of the message to be able to separate between locally enqueued and fake add_link messages. And smc_llc_cli_add_link() needs to check if the provided qp_mtu size is invalid and reject the add_link request in that case. Fixes: `c48254fa48` ("net/smc: move add link processing for new device into llc layer") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 16:52:33 -07:00
Wei Wang	c107761614	ip: expose inet sockopts through inet_diag Expose all exisiting inet sockopt bits through inet_diag for debug purpose. Corresponding changes in iproute2 ss will be submitted to output all these values. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 15:17:28 -07:00
Edward Cree	2adc6edcae	ethtool: fix error handling in ethtool_phys_id If ops->set_phys_id() returned an error, previously we would only break out of the inner loop, which neither stopped the outer loop nor returned the error to the user (since 'rc' would be overwritten on the next pass through the loop). Thus, rewrite it to use a single loop, so that the break does the right thing. Use u64 for 'count' and 'i' to prevent overflow in case of (unreasonably) large values of id.data and n. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 14:59:51 -07:00
Tom Parkin	9d319a8e93	l2tp: avoid duplicated code in l2tp_tunnel_closeall l2tp_tunnel_closeall is called as a part of tunnel shutdown in order to close all the sessions held by the tunnel. The code it uses to close a session duplicates what l2tp_session_delete does. Rather than duplicating the code, have l2tp_tunnel_closeall call l2tp_session_delete instead. This involves a very minor change to locking in l2tp_tunnel_closeall. Previously, l2tp_tunnel_closeall checked the session "dead" flag while holding tunnel->hlist_lock. This allowed for the code to step to the next session in the list without releasing the lock if the current session happened to be in the process of closing already. By calling l2tp_session_delete instead, l2tp_tunnel_closeall must now drop and regain the hlist lock for each session in the tunnel list. Given that the likelihood of a session being in the process of closing when the tunnel is closed, it seems worth this very minor potential loss of efficiency to avoid duplication of the session delete code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 12:19:03 -07:00
Tom Parkin	45faeff11b	l2tp: make magic feather checks more useful The l2tp tunnel and session structures contain a "magic feather" field which was originally intended to help trace lifetime bugs in the code. Since the introduction of the shared kernel refcount code in refcount.h, and l2tp's porting to those APIs, we are covered by the refcount code's checks and warnings. Duplicating those checks in the l2tp code isn't useful. However, magic feather checks are still useful to help to detect bugs stemming from misuse/trampling of the sk_user_data pointer in struct sock. The l2tp code makes extensive use of sk_user_data to stash pointers to the tunnel and session structures, and if another subsystem overwrites sk_user_data it's important to detect this. As such, rework l2tp's magic feather checks to focus on validating the tunnel and session data structures when they're extracted from sk_user_data. * Add a new accessor function l2tp_sk_to_tunnel which contains a magic feather check, and is used by l2tp_core and l2tp_ip[6] * Comment l2tp_udp_encap_recv which doesn't use this new accessor function because of the specific nature of the codepath it is called in * Drop l2tp_session_queue_purge's check on the session magic feather: it is called from code which is walking the tunnel session list, and hence doesn't need validation * Drop l2tp_session_free's check on the tunnel magic feather: the intention of this check is covered by refcount.h's reference count sanity checking * Add session magic validation in pppol2tp_ioctl. On failure return -EBADF, which mirrors the approach in pppol2tp_[sg]etsockopt. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 12:19:03 -07:00
Tom Parkin	de68b039e9	l2tp: capture more tx errors in data plane stats l2tp_xmit_skb has a number of failure paths which are not reflected in the tunnel and session statistics because the stats are updated by l2tp_xmit_core. Hence any errors occurring before l2tp_xmit_core is called are missed from the statistics. Refactor the transmit path slightly to capture all error paths. l2tp_xmit_skb now leaves all the actual work of transmission to l2tp_xmit_core, and updates the statistics based on l2tp_xmit_core's return code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 12:19:03 -07:00
Tom Parkin	c9ccd4c63c	l2tp: drop net argument from l2tp_tunnel_create The argument is unused, so remove it. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 12:19:03 -07:00
Tom Parkin	039bca78cb	l2tp: drop data_len argument from l2tp_xmit_core The data_len argument passed to l2tp_xmit_core is no longer used, so remove it. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 12:19:03 -07:00
Tom Parkin	efe0527882	l2tp: remove header length param from l2tp_xmit_skb All callers pass the session structure's hdr_len field as the header length parameter to l2tp_xmit_skb. Since we're passing a pointer to the session structure to l2tp_xmit_skb anyway, there's not much point breaking the header length out as a separate argument. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-03 12:19:03 -07:00
Tetsuo Handa	2a63866c8b	tipc: fix shutdown() of connectionless socket syzbot is reporting hung task at nbd_ioctl() [1], for there are two problems regarding TIPC's connectionless socket's shutdown() operation. ---------- #include <fcntl.h> #include <sys/socket.h> #include <sys/ioctl.h> #include <linux/nbd.h> #include <unistd.h> int main(int argc, char argv[]) { const int fd = open("/dev/nbd0", 3); alarm(5); ioctl(fd, NBD_SET_SOCK, socket(PF_TIPC, SOCK_DGRAM, 0)); ioctl(fd, NBD_DO_IT, 0); / To be interrupted by SIGALRM. */ return 0; } ---------- One problem is that wait_for_completion() from flush_workqueue() from nbd_start_device_ioctl() from nbd_ioctl() cannot be completed when nbd_start_device_ioctl() received a signal at wait_event_interruptible(), for tipc_shutdown() from kernel_sock_shutdown(SHUT_RDWR) from nbd_mark_nsock_dead() from sock_shutdown() from nbd_start_device_ioctl() is failing to wake up a WQ thread sleeping at wait_woken() from tipc_wait_for_rcvmsg() from sock_recvmsg() from sock_xmit() from nbd_read_stat() from recv_work() scheduled by nbd_start_device() from nbd_start_device_ioctl(). Fix this problem by always invoking sk->sk_state_change() (like inet_shutdown() does) when tipc_shutdown() is called. The other problem is that tipc_wait_for_rcvmsg() cannot return when tipc_shutdown() is called, for tipc_shutdown() sets sk->sk_shutdown to SEND_SHUTDOWN (despite "how" is SHUT_RDWR) while tipc_wait_for_rcvmsg() needs sk->sk_shutdown set to RCV_SHUTDOWN or SHUTDOWN_MASK. Fix this problem by setting sk->sk_shutdown to SHUTDOWN_MASK (like inet_shutdown() does) when the socket is connectionless. [1] https://syzkaller.appspot.com/bug?id=3fe51d307c1f0a845485cf1798aa059d12bf18b2 Reported-by: syzbot <syzbot+e36f41d207137b5d12f7@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-02 15:49:30 -07:00
Ido Schimmel	05d4487197	ipv6: Fix sysctl max for fib_multipath_hash_policy Cited commit added the possible value of '2', but it cannot be set. Fix it by adjusting the maximum value to '2'. This is consistent with the corresponding IPv4 sysctl. Before: # sysctl -w net.ipv6.fib_multipath_hash_policy=2 sysctl: setting key "net.ipv6.fib_multipath_hash_policy": Invalid argument net.ipv6.fib_multipath_hash_policy = 2 # sysctl net.ipv6.fib_multipath_hash_policy net.ipv6.fib_multipath_hash_policy = 0 After: # sysctl -w net.ipv6.fib_multipath_hash_policy=2 net.ipv6.fib_multipath_hash_policy = 2 # sysctl net.ipv6.fib_multipath_hash_policy net.ipv6.fib_multipath_hash_policy = 2 Fixes: `d8f74f0975` ("ipv6: Support multipath hashing on inner IP pkts") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-02 15:44:53 -07:00
Magnus Karlsson	83cf5c68d6	xsk: Fix use-after-free in failed shared_umem bind Fix use-after-free when a shared umem bind fails. The code incorrectly tried to free the allocated buffer pool both in the bind code and then later also when the socket was released. Fix this by setting the buffer pool pointer to NULL after the bind code has freed the pool, so that the socket release code will not try to free the pool. This is the same solution as the regular, non-shared umem code path has. This was missing from the shared umem path. Fixes: `b5aea28dca` ("xsk: Add shared umem support between queue ids") Reported-by: syzbot+5334f62e4d22804e646a@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599032164-25684-1-git-send-email-magnus.karlsson@intel.com	2020-09-02 23:37:19 +02:00
Gustavo A. R. Silva	1d6fd78a21	xsk: Fix null check on error return path Currently, dma_map is being checked, when the right object identifier to be null-checked is dma_map->dma_pages, instead. Fix this by null-checking dma_map->dma_pages. Fixes: `921b68692a` ("xsk: Enable sharing of dma mappings") Addresses-Coverity-ID: 1496811 ("Logically dead code") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20200902150750.GA7257@embeddedor	2020-09-02 20:31:50 +02:00
Magnus Karlsson	968be23cea	xsk: Fix possible segfault at xskmap entry insertion Fix possible segfault when entry is inserted into xskmap. This can happen if the socket is in a state where the umem has been set up, the Rx ring created but it has yet to be bound to a device. In this case the pool has not yet been created and we cannot reference it for the existence of the fill ring. Fix this by removing the whole xsk_is_setup_for_bpf_map function. Once upon a time, it was used to make sure that the Rx and fill rings where set up before the driver could call xsk_rcv, since there are no tests for the existence of these rings in the data path. But these days, we have a state variable that we test instead. When it is XSK_BOUND, everything has been set up correctly and the socket has been bound. So no reason to have the xsk_is_setup_for_bpf_map function anymore. Fixes: `7361f9c3d7` ("xsk: Move fill and completion rings to buffer pool") Reported-by: syzbot+febe51d44243fbc564ee@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599037569-26690-1-git-send-email-magnus.karlsson@intel.com	2020-09-02 16:52:59 +02:00
Magnus Karlsson	53ea2076d8	xsk: Fix possible segfault in xsk umem diagnostics Fix possible segfault in the xsk diagnostics code when dumping information about the umem. This can happen when a umem has been created, but the socket has not been bound yet. In this case, the xsk buffer pool does not exist yet and we cannot dump the information that was moved from the umem to the buffer pool. Fix this by testing for the existence of the buffer pool and if not there, do not dump any of that information. Fixes: `c2d3d6a474` ("xsk: Move queue_id, dev and need_wakeup to buffer pool") Fixes: `7361f9c3d7` ("xsk: Move fill and completion rings to buffer pool") Reported-by: syzbot+3f04d36b7336f7868066@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599036743-26454-1-git-send-email-magnus.karlsson@intel.com	2020-09-02 16:49:40 +02:00
Eelco Chaudron	e0afe91443	net: openvswitch: fixes crash if nf_conncount_init() fails If nf_conncount_init fails currently the dispatched work is not canceled, causing problems when the timer fires. This change fixes this by not scheduling the work until all initialization is successful. Fixes: `a65878d6f0` ("net: openvswitch: fixes potential deadlock in dp cleanup code") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-01 13:23:23 -07:00
David S. Miller	150f29f5e6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-09-01 The following pull-request contains BPF updates for your net-next tree. There are two small conflicts when pulling, resolve as follows: 1) Merge conflict in tools/lib/bpf/libbpf.c between `88a8212028` ("libbpf: Factor out common ELF operations and improve logging") in bpf-next and `1e891e513e` ("libbpf: Fix map index used in error message") in net-next. Resolve by taking the hunk in bpf-next: [...] scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx); data = elf_sec_data(obj, scn); if (!scn \|\| !data) { pr_warn("elf: failed to get %s map definitions for %s\n", MAPS_ELF_SEC, obj->path); return -EINVAL; } [...] 2) Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c between `9647c57b11` ("xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for better performance") in bpf-next and `e20f0dbf20` ("net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES") in net-next. Resolve the two locations by retaining net_prefetch() and taking xsk_buff_dma_sync_for_cpu() from bpf-next. Should look like: [...] xdp_set_data_meta_invalid(xdp); xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool); net_prefetch(xdp->data); [...] We've added 133 non-merge commits during the last 14 day(s) which contain a total of 246 files changed, 13832 insertions(+), 3105 deletions(-). The main changes are: 1) Initial support for sleepable BPF programs along with bpf_copy_from_user() helper for tracing to reliably access user memory, from Alexei Starovoitov. 2) Add BPF infra for writing and parsing TCP header options, from Martin KaFai Lau. 3) bpf_d_path() helper for returning full path for given 'struct path', from Jiri Olsa. 4) AF_XDP support for shared umems between devices and queues, from Magnus Karlsson. 5) Initial prep work for full BPF-to-BPF call support in libbpf, from Andrii Nakryiko. 6) Generalize bpf_sk_storage map & add local storage for inodes, from KP Singh. 7) Implement sockmap/hash updates from BPF context, from Lorenz Bauer. 8) BPF xor verification for scalar types & add BPF link iterator, from Yonghong Song. 9) Use target's prog type for BPF_PROG_TYPE_EXT prog verification, from Udip Pant. 10) Rework BPF tracing samples to use libbpf loader, from Daniel T. Lee. 11) Fix xdpsock sample to really cycle through all buffers, from Weqaar Janjua. 12) Improve type safety for tun/veth XDP frame handling, from Maciej Żenczykowski. 13) Various smaller cleanups and improvements all over the place. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-01 13:22:59 -07:00
Yutaro Hayakawa	ffa81fa46e	net/tls: Implement getsockopt SOL_TLS TLS_RX Implement the getsockopt SOL_TLS TLS_RX which is currently missing. The primary usecase is to use it in conjunction with TCP_REPAIR to checkpoint/restore the TLS record layer state. TLS connection state usually exists on the user space library. So basically we can easily extract it from there, but when the TLS connections are delegated to the kTLS, it is not the case. We need to have a way to extract the TLS state from the kernel for both of TX and RX side. The new TLS_RX getsockopt copies the crypto_info to user in the same way as TLS_TX does. We have described use cases in our research work in Netdev 0x14 Transport Workshop [1]. Also, there is an TLS implementation called tlse [2] which supports TLS connection migration. They have support of kTLS and their code shows that they are expecting the future support of this option. [1] https://speakerdeck.com/yutarohayakawa/prism-proxies-without-the-pain [2] https://github.com/eduardsui/tlse Signed-off-by: Yutaro Hayakawa <yhayakawa3720@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-01 11:47:12 -07:00
Leesoo Ahn	355db39110	pktgen: fix error message with wrong function name Error on calling kthread_create_on_node prints wrong function name, kernel_thread. Fixes: `94dcf29a11` ("kthread: use kthread_create_on_node()") Signed-off-by: Leesoo Ahn <dev@ooseel.net> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-01 11:44:53 -07:00
Tonghao Zhang	e6896163b2	net: openvswitch: remove unused keep_flows keep_flows was introduced by [1], which used as flag to delete flows or not. When rehashing or expanding the table instance, we will not flush the flows. Now don't use it anymore, remove it. [1] - `acd051f176` Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-01 11:42:15 -07:00
Tonghao Zhang	df68d64ee3	net: openvswitch: refactor flow free function Decrease table->count and ufid_count unconditionally, because we only don't use count or ufid_count to count when flushing the flows. To simplify the codes, we remove the "count" argument of table_instance_flow_free. To avoid a bug when deleting flows in the future, add WARN_ON in flush flows function. Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-01 11:42:15 -07:00
Tonghao Zhang	cf3266ad48	net: openvswitch: improve the coding style Not change the logic, just improve the coding style. Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-01 11:42:15 -07:00
Abhishek Pandit-Subedi	0e9952804e	Bluetooth: Clear suspend tasks on unregister While unregistering, make sure to clear the suspend tasks before cancelling the work. If the unregister is called during resume from suspend, this will unnecessarily add 2s to the resume time otherwise. Fixes: `4e8c36c3b0` (Bluetooth: Fix suspend notifier race) Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-09-01 13:22:41 +02:00
Yaroslav Bolyukin	144b0a0e60	ipvs: remove dependency on ip6_tables This dependency was added because ipv6_find_hdr was in iptables specific code but is no longer required Fixes: `f8f626754e` ("ipv6: Move ipv6_find_hdr() out of Netfilter code.") Fixes: `63dca2c0b0` ("ipvs: Fix faulty IPv6 extension header handling in IPVS") Signed-off-by: Yaroslav Bolyukin <iam@lach.pw> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-31 23:06:51 +02:00
Miaohe Lin	34e1ec319e	net: ipv4: remove unused arg exact_dif in compute_score The arg exact_dif is not used anymore, remove it. inet_exact_dif_match() is no longer needed after the above is removed, so remove it too. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 13:08:29 -07:00
Miaohe Lin	3f7d820bad	net: ipv6: remove unused arg exact_dif in compute_score The arg exact_dif is not used anymore, remove it. inet6_exact_dif_match() is no longer needed after the above is removed, remove it too. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 13:08:10 -07:00
YueHaibing	622a63f6f3	tipc: Remove unused macro TIPC_NACK_INTV There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 12:39:07 -07:00
YueHaibing	ff007a9ba2	tipc: Remove unused macro TIPC_FWD_MSG There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 12:38:48 -07:00
YueHaibing	b1fd4470cd	mptcp: Remove unused macro MPTCP_SAME_STATE There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 12:37:40 -07:00
Miaohe Lin	5af68891dc	net: clean up codestyle This is a pure codestyle cleanup patch. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 12:33:34 -07:00
Miaohe Lin	cbc08a3312	net: Use helper macro IP_MAX_MTU in __ip_append_data() What 0xFFFF means here is actually the max mtu of a ip packet. Use help macro IP_MAX_MTU here. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 12:33:16 -07:00
wenxu	a7c978c6c9	openvswitch: using ip6_fragment in ipv6_stub Using ipv6_stub->ipv6_fragment to avoid the netfilter dependency Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 12:26:39 -07:00
wenxu	1d97898b36	ipv6: add ipv6_fragment hook in ipv6_stub Add ipv6_fragment to ipv6_stub to avoid calling netfilter when access ip6_fragment. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 12:26:39 -07:00
Magnus Karlsson	a1132430c2	xsk: Add shared umem support between devices Add support to share a umem between different devices. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same device. Note that when sharing a umem between devices, just as in the case of sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket (with two setsockopts, one for each ring) before you do the bind with the XDP_SHARED_UMEM flag. This so that the single-producer single-consumer semantics of the rings can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-13-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	b5aea28dca	xsk: Add shared umem support between queue ids Add support to share a umem between queue ids on the same device. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same queue id and device, and you shared one set of fill and completion rings. However, note that when sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket before you do the bind with the XDP_SHARED_UMEM flag. This so that the single-producer single-consumer semantics can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-12-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	921b68692a	xsk: Enable sharing of dma mappings Enable the sharing of dma mappings by moving them out from the buffer pool. Instead we put each dma mapped umem region in a list in the umem structure. If dma has already been mapped for this umem and device, it is not mapped again and the existing dma mappings are reused. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-9-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	7f7ffa4e9c	xsk: Move addrs from buffer pool to umem Replicate the addrs pointer in the buffer pool to the umem. This mapping will be the same for all buffer pools sharing the same umem. In the buffer pool we leave the addrs pointer for performance reasons. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-8-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	a5aa8e529e	xsk: Move xsk_tx_list and its lock to buffer pool Move the xsk_tx_list and the xsk_tx_list_lock from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queues. There is one xsk_tx_list per device and queue id, so it should be located in the buffer pool. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-7-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	c2d3d6a474	xsk: Move queue_id, dev and need_wakeup to buffer pool Move queue_id, dev, and need_wakeup from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queues. There is one buffer pool per dev and queue id, so these variables should belong to the buffer pool, not the umem. Need_wakeup is also something that is set on a per napi level, so there is usually one per device and queue id. So move this to the buffer pool too. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-6-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	7361f9c3d7	xsk: Move fill and completion rings to buffer pool Move the fill and completion rings from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queue ids. In this case, we need one fill and completion ring per queue id. As the buffer pool is per queue id and napi id this is a natural place for it and one umem struture can be shared between these buffer pools. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-5-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	1c1efc2af1	xsk: Create and free buffer pool independently from umem Create and free the buffer pool independently from the umem. Move these operations that are performed on the buffer pool from the umem create and destroy functions to new create and destroy functions just for the buffer pool. This so that in later commits we can instantiate multiple buffer pools per umem when sharing a umem between HW queues and/or devices. We also erradicate the back pointer from the umem to the buffer pool as this will not work when we introduce the possibility to have multiple buffer pools per umem. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-4-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	c4655761d3	xsk: i40e: ice: ixgbe: mlx5: Rename xsk zero-copy driver interfaces Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly it is about replacing the umem name from the function names with xsk_buff and also have them take the a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they have the same naming convention as the internal functions in xsk_queue.h. This so that it will be clearer what they do and also for consistency. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:04 +02:00
Magnus Karlsson	1742b3d528	xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. But later commits will add other entities to it. These are going to be entities that are different between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com	2020-08-31 21:15:03 +02:00
Johannes Berg	c30a3c957c	netlink: policy: correct validation type check In the policy export for binary attributes I erroneously used a != NLA_VALIDATE_NONE comparison instead of checking for the two possible values, which meant that if a validation function pointer ended up aliasing the min/max as negatives, we'd hit a warning in nla_get_range_unsigned(). Fix this to correctly check for only the two types that should be handled here, i.e. range with or without warn-too-long. Reported-by: syzbot+353df1490da781637624@syzkaller.appspotmail.com Fixes: `8aa26c575f` ("netlink: make NLA_BINARY validation more flexible") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 12:01:15 -07:00
David S. Miller	e9d572d94e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Do not delete clash entries on reply, let them expire instead, from Florian Westphal. 2) Do not report EAGAIN to nfnetlink, otherwise this enters a busy loop. Update nfnetlink_unicast() to translate EAGAIN to ENOBUFS. 3) Remove repeated words in code comments, from Randy Dunlap. 4) Several patches for the flowtable selftests, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-31 11:22:30 -07:00
Tuong Lien	bb8872a1e6	tipc: fix using smp_processor_id() in preemptible The 'this_cpu_ptr()' is used to obtain the AEAD key' TFM on the current CPU for encryption, however the execution can be preemptible since it's actually user-space context, so the 'using smp_processor_id() in preemptible' has been observed. We fix the issue by using the 'get/put_cpu_ptr()' API which consists of a 'preempt_disable()' instead. Fixes: `fc1b6d6de2` ("tipc: introduce TIPC encryption & authentication") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-30 19:12:17 -07:00
Balazs Scheidler	67407a406d	netfilter: nft_socket: add wildcard support Add NFT_SOCKET_WILDCARD to match to wildcard socket listener. Signed-off-by: Balazs Scheidler <bazsi77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-29 13:04:44 +02:00
Florian Westphal	c46172147e	netfilter: conntrack: do not auto-delete clash entries on reply Its possible that we have more than one packet with the same ct tuple simultaneously, e.g. when an application emits n packets on same UDP socket from multiple threads. NAT rules might be applied to those packets. With the right set of rules, n packets will be mapped to m destinations, where at least two packets end up with the same destination. When this happens, the existing clash resolution may merge the skb that is processed after the first has been received with the identical tuple already in hash table. However, its possible that this identical tuple is a NAT_CLASH tuple. In that case the second skb will be sent, but no reply can be received since the reply that is processed first removes the NAT_CLASH tuple. Do not auto-delete, this gives a 1 second window for replies to be passed back to originator. Packets that are coming later (udp stream case) will not be affected: they match the original ct entry, not a NAT_CLASH one. Also prevent NAT_CLASH entries from getting offloaded. Fixes: `6a757c07e5` ("netfilter: conntrack: allow insertion of clashing entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-29 13:03:06 +02:00
Pablo Neira Ayuso	ee92118355	netfilter: nfnetlink: nfnetlink_unicast() reports EAGAIN instead of ENOBUFS Frontend callback reports EAGAIN to nfnetlink to retry a command, this is used to signal that module autoloading is required. Unfortunately, nlmsg_unicast() reports EAGAIN in case the receiver socket buffer gets full, so it enters a busy-loop. This patch updates nfnetlink_unicast() to turn EAGAIN into ENOBUFS and to use nlmsg_unicast(). Remove the flags field in nfnetlink_unicast() since this is always MSG_DONTWAIT in the existing code which is exactly what nlmsg_unicast() passes to netlink_unicast() as parameter. Fixes: `96518518cc` ("netfilter: add nftables") Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 20:11:58 +02:00
Randy Dunlap	4b7ddc58e6	netfilter: delete repeated words Drop duplicated words in net/netfilter/ and net/ipv4/netfilter/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 20:11:38 +02:00
YueHaibing	f5143e10a2	netfilter: xt_HMARK: Use ip_is_fragment() helper Use ip_is_fragment() to simpify code. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 19:55:51 +02:00
Florian Westphal	ff73e7479b	netfilter: conntrack: remove unneeded nf_ct_put We can delay refcount increment until we reassign the existing entry to the current skb. A 0 refcount can't happen while the nf_conn object is still in the hash table and parallel mutations are impossible because we hold the bucket lock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 19:51:27 +02:00
Florian Westphal	bc92470413	netfilter: conntrack: add clash resolution stat counter There is a misconception about what "insert_failed" means. We increment this even when a clash got resolved, so it might not indicate a problem. Add a dedicated counter for clash resolution and only increment insert_failed if a clash cannot be resolved. For the old /proc interface, export this in place of an older stat that got removed a while back. For ctnetlink, export this with a new attribute. Also correct an outdated comment that implies we add a duplicate tuple -- we only add the (unique) reply direction. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 19:51:26 +02:00
Florian Westphal	4afc41dfa5	netfilter: conntrack: remove ignore stats This counter increments when nf_conntrack_in sees a packet that already has a conntrack attached or when the packet is marked as UNTRACKED. Neither is an error. The former is normal for loopback traffic. The second happens for certain ICMPv6 packets or when nftables/ip(6)tables rules are in place. In case someone needs to count UNTRACKED packets, or packets that are marked as untracked before conntrack_in this can be done with both nftables and ip(6)tables rules. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 19:51:26 +02:00
Florian Westphal	b1328e54ac	netfilter: conntrack: do not increment two error counters at same time The /proc interface for nf_conntrack displays the "error" counter as "icmp_error". It makes sense to not increment "invalid" when failing to handle an icmp packet since those are special. For example, its possible for conntrack to see partial and/or fragmented packets inside icmp errors. This should be a separate event and not get mixed with the "invalid" counter. Likewise, remove the "error" increment for errors from get_l4proto(). After this, the error counter will only increment for errors coming from icmp(v6) packet handling. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 19:51:25 +02:00
Jose M. Guisado Gomez	7a81575b80	netfilter: nf_tables: add userdata attributes to nft_table Enables storing userdata for nft_table. Field udata points to user data and udlen store its length. Adds new attribute flag NFTA_TABLE_USERDATA Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 19:18:48 +02:00
Peilin Ye	c5a8a8498e	ipvs: Fix uninit-value in do_ip_vs_set_ctl() do_ip_vs_set_ctl() is referencing uninitialized stack value when `len` is zero. Fix it. Reported-by: syzbot+23b5f9e7caf61d9a3898@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=46ebfb92a8a812621a001ef04d90dfa459520fe2 Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 19:18:48 +02:00
Michael Zhou	d5608a0578	netfilter: ip6t_NPT: rewrite addresses in ICMPv6 original packet Detect and rewrite a prefix embedded in an ICMPv6 original packet that was rewritten by a corresponding DNPT/SNPT rule so it will be recognised by the host that sent the original packet. Example Rules in effect on the 1:2:3:4::/64 + 5:6:7:8::/64 side router: * SNPT src-pfx 1:2:3:4::/64 dst-pfx 5:6:7:8::/64 * DNPT src-pfx 5:6:7:8::/64 dst-pfx 1:2:3:4::/64 No rules on the 9🅰️b:c::/64 side. 1. 1:2:3:4::1 sends UDP packet to 9🅰️b:c::1 2. Router applies SNPT changing src to 5:6:7:8::ffef::1 3. 9🅰️b:c::1 receives packet with (src 5:6:7:8::ffef::1 dst 9🅰️b:c::1) and replies with ICMPv6 port unreachable to 5:6:7:8::ffef::1, including original packet (src 5:6:7:8::ffef::1 dst 9🅰️b:c::1) 4. Router forwards ICMPv6 packet with (src 9🅰️b:c::1 dst 5:6:7:8::ffef::1) including original packet (src 5:6:7:8::ffef::1 dst 9🅰️b:c::1) and applies DNPT changing dst to 1:2:3:4::1 5. 1:2:3:4::1 receives ICMPv6 packet with (src 9🅰️b:c::1 dst 1:2:3:4::1) including original packet (src 5:6:7:8::ffef::1 dst 9🅰️b:c::1). It doesn't recognise the original packet as the src doesn't match anything it originally sent With this change, at step 4, DNPT will also rewrite the original packet src to 1:2:3:4::1, so at step 5, 1:2:3:4::1 will recognise the ICMPv6 error and provide feedback to the application properly. Conversely, SNPT will help when ICMPv6 errors are sent from the translated network. 1. 9🅰️b:c::1 sends UDP packet to 5:6:7:8::ffef::1 2. Router applies DNPT changing dst to 1:2:3:4::1 3. 1:2:3:4::1 receives packet with (src 9🅰️b:c::1 dst 1:2:3:4::1) and replies with ICMPv6 port unreachable to 9🅰️b:c::1 including original packet (src 9🅰️b:c::1 dst 1:2:3:4::1) 4. Router forwards ICMPv6 packet with (src 1:2:3:4::1 dst 9🅰️b:c::1) including original packet (src 9🅰️b:c::1 dst 1:2:3:4::1) and applies SNPT changing src to 5:6:7:8::ffef::1 5. 9🅰️b:c::1 receives ICMPv6 packet with (src 5:6:7:8::ffef::1 dst 9🅰️b:c::1) including original packet (src 9🅰️b:c::1 dst 1:2:3:4::1). It doesn't recognise the original packet as the dst doesn't match anything it already sent The change to SNPT means the ICMPv6 original packet dst will be rewritten to 5:6:7:8::ffef::1 in step 4, allowing the error to be properly recognised in step 5. Signed-off-by: Michael Zhou <mzhou@cse.unsw.edu.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-28 19:18:48 +02:00
Alex Dewar	0f091e4331	netlabel: remove unused param from audit_log_format() Commit `d3b990b7f3` ("netlabel: fix problems with mapping removal") added a check to return an error if ret_val != 0, before ret_val is later used in a log message. Now it will unconditionally print "... res=1". So just drop the check. Addresses-Coverity: ("Dead code") Fixes: `d3b990b7f3` ("netlabel: fix problems with mapping removal") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-28 09:08:51 -07:00
Cong Wang	5438dd4583	net_sched: fix error path in red_init() When ->init() fails, ->destroy() is called to clean up. So it is unnecessary to clean up in red_init(), and it would cause some refcount underflow. Fixes: `aee9caa03f` ("net: sched: sch_red: Add qevents "early_drop" and "mark"") Reported-and-tested-by: syzbot+b33c1cb0a30ebdc8a5f9@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+e5ea5f8a3ecfd4427a1c@syzkaller.appspotmail.com Cc: Petr Machata <petrm@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-28 07:16:46 -07:00
Mahesh Bandewar	316cdaa115	net: add option to not create fall-back tunnels in root-ns as well The sysctl that was added earlier by commit `79134e6ce2` ("net: do not create fallback tunnels for non-default namespaces") to create fall-back only in root-ns. This patch enhances that behavior to provide option not to create fallback tunnels in root-ns as well. Since modules that create fallback tunnels could be built-in and setting the sysctl value after booting is pointless, so added a kernel cmdline options to change this default. The default setting is preseved for backward compatibility. The kernel command line option of fb_tunnels=initns will set the sysctl value to 1 and will create fallback tunnels only in initns while kernel cmdline fb_tunnels=none will set the sysctl value to 2 and fallback tunnels are skipped in every netns. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Maciej Zenczykowski <maze@google.com> Cc: Jian Yang <jianyang@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-28 06:52:44 -07:00
zhudi	174bce38ca	netlink: fix a data race in netlink_rcv_wake() The data races were reported by KCSAN: BUG: KCSAN: data-race in netlink_recvmsg / skb_queue_tail write (marked) to 0xffff8c0986e5a8c8 of 8 bytes by interrupt on cpu 3: skb_queue_tail+0xcc/0x120 __netlink_sendskb+0x55/0x80 netlink_broadcast_filtered+0x465/0x7e0 nlmsg_notify+0x8f/0x120 rtnl_notify+0x8e/0xb0 __neigh_notify+0xf2/0x120 neigh_update+0x927/0xde0 arp_process+0x8a3/0xf50 arp_rcv+0x27c/0x3b0 __netif_receive_skb_core+0x181c/0x1840 __netif_receive_skb+0x38/0xf0 netif_receive_skb_internal+0x77/0x1c0 napi_gro_receive+0x1bd/0x1f0 e1000_clean_rx_irq+0x538/0xb20 [e1000] e1000_clean+0x5e4/0x1340 [e1000] net_rx_action+0x310/0x9d0 __do_softirq+0xe8/0x308 irq_exit+0x109/0x110 do_IRQ+0x7f/0xe0 ret_from_intr+0x0/0x1d 0xffffffffffffffff read to 0xffff8c0986e5a8c8 of 8 bytes by task 1463 on cpu 0: netlink_recvmsg+0x40b/0x820 sock_recvmsg+0xc9/0xd0 ___sys_recvmsg+0x1a4/0x3b0 __sys_recvmsg+0x86/0x120 __x64_sys_recvmsg+0x52/0x70 do_syscall_64+0xb5/0x360 entry_SYSCALL_64_after_hwframe+0x65/0xca 0xffffffffffffffff Since the write is under sk_receive_queue->lock but the read is done as lockless. so fix it by using skb_queue_empty_lockless() instead of skb_queue_empty() for the read in netlink_rcv_wake() Signed-off-by: zhudi <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-28 06:51:11 -07:00
Martin KaFai Lau	134fede4ee	bpf: Relax max_entries check for most of the inner map types Most of the maps do not use max_entries during verification time. Thus, those map_meta_equal() do not need to enforce max_entries when it is inserted as an inner map during runtime. The max_entries check is removed from the default implementation bpf_map_meta_equal(). The prog_array_map and xsk_map are exception. Its map_gen_lookup uses max_entries to generate inline lookup code. Thus, they will implement its own map_meta_equal() to enforce max_entries. Since there are only two cases now, the max_entries check is not refactored and stays in its own .c file. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011813.1970516-1-kafai@fb.com	2020-08-28 15:41:30 +02:00
Martin KaFai Lau	f4d0525921	bpf: Add map_meta_equal map ops Some properties of the inner map is used in the verification time. When an inner map is inserted to an outer map at runtime, bpf_map_meta_equal() is currently used to ensure those properties of the inserting inner map stays the same as the verification time. In particular, the current bpf_map_meta_equal() checks max_entries which turns out to be too restrictive for most of the maps which do not use max_entries during the verification time. It limits the use case that wants to replace a smaller inner map with a larger inner map. There are some maps do use max_entries during verification though. For example, the map_gen_lookup in array_map_ops uses the max_entries to generate the inline lookup code. To accommodate differences between maps, the map_meta_equal is added to bpf_map_ops. Each map-type can decide what to check when its map is used as an inner map during runtime. Also, some map types cannot be used as an inner map and they are currently black listed in bpf_map_meta_alloc() in map_in_map.c. It is not unusual that the new map types may not aware that such blacklist exists. This patch enforces an explicit opt-in and only allows a map to be used as an inner map if it has implemented the map_meta_equal ops. It is based on the discussion in [1]. All maps that support inner map has its map_meta_equal points to bpf_map_meta_equal in this patch. A later patch will relax the max_entries check for most maps. bpf_types.h counts 28 map types. This patch adds 23 ".map_meta_equal" by using coccinelle. -5 for BPF_MAP_TYPE_PROG_ARRAY BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE BPF_MAP_TYPE_STRUCT_OPS BPF_MAP_TYPE_ARRAY_OF_MAPS BPF_MAP_TYPE_HASH_OF_MAPS The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc() is moved such that the same error is returned. [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com	2020-08-28 15:41:30 +02:00
David S. Miller	ae9a138f06	This time we have: * some code to support SAE (WPA3) offload in AP mode * many documentation (wording) fixes/updates * netlink policy updates, including the use of NLA_RANGE with binary attributes * regulatory improvements for adjacent frequency bands * and a few other small additions/refactorings/cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl9I2GQACgkQB8qZga/f l8R3QhAAkVkBKrjRG26/LILGfeznpqIXgZW9jaUA6KhXHrd5hXEu2aXb+iBIETK6 CUIKcEIsuO8Uvonu0q0BbAxinVdXIZdxdmY283z4zOS3ngh5ia9GPeNoh50qbEf3 v6h2qitqRLzIuTDnFneK8UQe5WtFWcuyYKaJjr1YjYZFBkTkoUVTQ2AA8BHKA/Qa Ng6PgSx5fo5uOztFGJOvF0y2LMvoSgGGggn6ODKYIrwNI+95iE2W9nG1nUVSzy7e N6Codaqnh4/TnyQABA1REl7OPHe5ZJrApytS3H/pqqsLfMcYByGC3ebThesNDwzM ptLGhAbDZq7f1vmYMwkUowha8iOD0LzibDBIl8235Z3oIDrgvwGRonERZnmo+aCa vEQS8MFmseCrsxheicrzo5FWLc4rt9fU5nC4IgpZLZozU8ZFP+rKo7haHpO7a69k xED3SYoB461y+5TQgNPWSeSjvv3P0LjoS29xHAVsu0pp2Gf/cKg75bPJLbsrTYlX oKUMF3djN0e9+LOvtuuQYvaZfilpsd/6nBbz5r3sEeRXkJkUcOsGkgVTwX4VtsNT oPGjF2yOmj/r6J1blOZYxnSWiyWLYk3r8Hkym8bp3esJhJbJKgTDziJm+g0kGiJt FGTlQVGNKRdHRRzgMTOSGonI2nmaE5CWSJCHcSSMX8rfZFgT1ng= =Ys45 -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2020-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time we have: * some code to support SAE (WPA3) offload in AP mode * many documentation (wording) fixes/updates * netlink policy updates, including the use of NLA_RANGE with binary attributes * regulatory improvements for adjacent frequency bands * and a few other small additions/refactorings/cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-28 06:17:40 -07:00
David S. Miller	51458c9705	We have: * fixes for AQL (airtime queue limits) * reduce packet loss detection false positives * a small channel number fix for the 6 GHz band * a fix for 80+80/160 MHz negotiation * an nl80211 attribute (NL80211_ATTR_HE_6GHZ_CAPABILITY) fix * add a missing sanity check for the regulatory code -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl9I10UACgkQB8qZga/f l8TVrQ/8CNCAhEwaGVj4SxjJAlma7vnMffxR5BXboPsWm3fVzknJnLvlLyTpXJbH lXadT3fv4M9dxKFIqfJiak7LK4c8rv64xeYXelZWVq43S1DjVV3mZOaTehOPbxtM Wo+d9mUkSH7PgbgipHsR01/fdFCEwu0uCqqG6syOj1+f3Pcvaofwv8vZyhtWwkTk ZchvdtU2Q/Nh0SEaIKg6esNogfPBtgSrCX7CV/uNbfeyISL5iSeU6DZeZZLDdcD9 b66PNs+DFAI2XIr/2hmHWtkVOUIhAY9EQfICaBvUZryyvFDR9ycSO5khginZP47/ 88gjJ2loYt9M+p00/FcYXkNzxwdo9tppHQ7XmZi4Ciwt7ADuvDlTdehS/ix93HkS 5hC5UOz8GNy1XZjayqzSVn3AwWuEncRM67NQo3J6qwS1vnF4NKi7rdwSDO49OX3V aDAXGiALo2V9ZgbtOOlSbUl1y107mPtemkmdAOcMvXCLXFYixUeUgULTvl4S72fk RVxb0xp7lJ0T5hFXGGcTO/MV9X5Rg/d/Y1AKhCy3Wt2bdNk0K5JOcblnrXp8dz83 qveiscDy4B6NEnaPOffS5z/7E5wXIN77L9TYKOnQ/v8Gu+r15imyzcgz/s8EvRmJ rWhB45SUhc5CBIjTFa4iR8M8u1Wc/a+I5Zht39VRTCLtDlOADZw= =zMP7 -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2020-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have: * fixes for AQL (airtime queue limits) * reduce packet loss detection false positives * a small channel number fix for the 6 GHz band * a fix for 80+80/160 MHz negotiation * an nl80211 attribute (NL80211_ATTR_HE_6GHZ_CAPABILITY) fix * add a missing sanity check for the regulatory code ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-28 06:16:48 -07:00
Dinghao Liu	b43c75abfd	rxrpc: Fix memory leak in rxkad_verify_response() Fix a memory leak in rxkad_verify_response() whereby the response buffer doesn't get freed if we fail to allocate a ticket buffer. Fixes: `ef68622da9` ("rxrpc: Handle temporary errors better in rxkad security") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-27 12:59:45 -07:00
David S. Miller	8d73a73a7f	RxRPC fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl9HwPIACgkQ+7dXa6fL C2vHdw//S5s+nPNuJUpz3aeYDC1Yl1YP+r2+XBfcCNem504GKfsvTBbLo7FYP6Oc TCLJouPxoNSWAtlrJ86YJu8EcFdYNI4w6mCHncIrLXiiIZhsAeUMAw1GiASL5VUr QhLjJJU4zUBg3/RWLRC1pUXBXAa3sYx5r1p8j3KdBAkAmAzXdFWSRFeffVeXIk46 FZ52YoQkplJYqDL+1oQCjJLetVJGGvc68AIOjJOLh6CAjrZbx1aiY79XA+flsoE9 B+KVjhEn0f9dVybgtF4rEqS88y1FJtSQgul6FOsys+Rx2I0G7ei8PB5TQ2wNCQhy 37gGfg1AV6apcqXKcHJdovVnApoQzzdCJESCgbMvsczqM88dP7pLWrPz4wpxs655 3t6qUyXI6yMOJhUWPOoFi30+4NM+MmsqrbYLZt2f/aXRhKnyaeaLd+VAIlkgz3Lj sZuuHQsTH0xiR2uCsINWEc7d7UV6WjeVUJ77LzYiiRzEC2pX80tGu5EKHN8W9oBk xRAuExXEGtyOR0p3/S3StkT490Tt4bIxUwnKYAaQZEydMCnTlWVbtIXwzmlCbCSB p+P/7twR7LiQlHCTJU94jH3Jfpm3jpFpgjQZoZcx6ZLvtSH4+QP11nMY9+sRxEz1 hpB12AY7Wp2N7P+GhBPuXUCQpzW751aNZz4X9Etu6kRgwjuyaJM= =FWHJ -----END PGP SIGNATURE----- Merge tag 'rxrpc-fixes-20200820' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc, afs: Fix probing issues Here are some fixes for rxrpc and afs to fix issues in the RTT measuring in rxrpc and thence the Volume Location server probing in afs: (1) Move the serial number of a received ACK into a local variable to simplify the next patch. (2) Fix the loss of RTT samples due to extra interposed ACKs causing baseline information to be discarded too early. This is a particular problem for afs when it sends a single very short call to probe a server it hasn't talked to recently. (3) Fix rxrpc_kernel_get_srtt() to indicate whether it actually has seen any valid samples or not. (4) Remove a field that's set/woken, but never read/waited on. (5) Expose the RTT and other probe information through procfs to make debugging of this stuff easier. (6) Fix VL rotation in afs to only use summary information from VL probing and not the probe running state (which gets clobbered when next a probe is issued). (7) Fix VL rotation to actually return the error aggregated from the probe errors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-27 12:55:46 -07:00
Linus Lüssing	097930e85f	batman-adv: bla: fix type misuse for backbone_gw hash indexing It seems that due to a copy & paste error the void pointer in batadv_choose_backbone_gw() is cast to the wrong type. Fixing this by using "struct batadv_bla_backbone_gw" instead of "struct batadv_bla_claim" which better matches the caller's side. For now it seems that we were lucky because the two structs both have their orig/vid and addr/vid in the beginning. However I stumbled over this issue when I was trying to add some debug variables in front of "orig" in batadv_backbone_gw, which caused hash lookups to fail. Fixes: `07568d0369` ("batman-adv: don't rely on positions in struct for hashing") Signed-off-by: Linus Lüssing <ll@simonwunderlich.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2020-08-27 17:41:45 +02:00
Miaohe Lin	645f08975f	net: Fix some comments Fix some comments, including wrong function name, duplicated word and so on. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-27 07:55:59 -07:00
Hoang Huu Le	fdeba99b1e	tipc: fix use-after-free in tipc_bcast_get_mode Syzbot has reported those issues as: ================================================================== BUG: KASAN: use-after-free in tipc_bcast_get_mode+0x3ab/0x400 net/tipc/bcast.c:759 Read of size 1 at addr ffff88805e6b3571 by task kworker/0:6/3850 CPU: 0 PID: 3850 Comm: kworker/0:6 Not tainted 5.8.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events tipc_net_finalize_work Thread 1's call trace: [...] kfree+0x103/0x2c0 mm/slab.c:3757 <- bcbase releasing tipc_bcast_stop+0x1b0/0x2f0 net/tipc/bcast.c:721 tipc_exit_net+0x24/0x270 net/tipc/core.c:112 [...] Thread 2's call trace: [...] tipc_bcast_get_mode+0x3ab/0x400 net/tipc/bcast.c:759 <- bcbase has already been freed by Thread 1 tipc_node_broadcast+0x9e/0xcc0 net/tipc/node.c:1744 tipc_nametbl_publish+0x60b/0x970 net/tipc/name_table.c:752 tipc_net_finalize net/tipc/net.c:141 [inline] tipc_net_finalize+0x1fa/0x310 net/tipc/net.c:131 tipc_net_finalize_work+0x55/0x80 net/tipc/net.c:150 [...] ================================================================== BUG: KASAN: use-after-free in tipc_named_reinit+0xef/0x290 net/tipc/name_distr.c:344 Read of size 8 at addr ffff888052ab2000 by task kworker/0:13/30628 CPU: 0 PID: 30628 Comm: kworker/0:13 Not tainted 5.8.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events tipc_net_finalize_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1f0/0x31e lib/dump_stack.c:118 print_address_description+0x66/0x5a0 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report+0x132/0x1d0 mm/kasan/report.c:530 tipc_named_reinit+0xef/0x290 net/tipc/name_distr.c:344 tipc_net_finalize+0x85/0xe0 net/tipc/net.c:138 tipc_net_finalize_work+0x50/0x70 net/tipc/net.c:150 process_one_work+0x789/0xfc0 kernel/workqueue.c:2269 worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415 kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 [...] Freed by task 14058: save_stack mm/kasan/common.c:48 [inline] set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x114/0x170 mm/kasan/common.c:455 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x220 mm/slab.c:3757 tipc_exit_net+0x29/0x50 net/tipc/core.c:113 ops_exit_list net/core/net_namespace.c:186 [inline] cleanup_net+0x708/0xba0 net/core/net_namespace.c:603 process_one_work+0x789/0xfc0 kernel/workqueue.c:2269 worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415 kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 Fix it by calling flush_scheduled_work() to make sure the tipc_net_finalize_work() stopped before releasing bcbase object. Reported-by: syzbot+6ea1f7a8df64596ef4d7@syzkaller.appspotmail.com Reported-by: syzbot+e9cc557752ab126c1b99@syzkaller.appspotmail.com Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-27 06:59:42 -07:00
Chung-Hsien Hsu	2831a63102	nl80211: support SAE authentication offload in AP mode Let drivers advertise support for AP-mode SAE authentication offload with a new NL80211_EXT_FEATURE_SAE_OFFLOAD_AP flag. Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com> Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com> Link: https://lore.kernel.org/r/20200817073316.33402-4-stanley.hsu@cypress.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 15:19:44 +02:00
John Crispin	8552a434b6	mac80211: rename csa counters to countdown counters We want to reuse the functions and structs for other counters such as BSS color change. Rename them to more generic names. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200811080107.3615705-2-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 14:12:15 +02:00
John Crispin	00c207edfb	nl80211: rename csa counter attributes countdown counters We want to reuse the attributes for other counters such as BSS color change. Rename them to more generic names. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200811080107.3615705-1-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 14:12:15 +02:00
Miles Hu	eb89a6a6b7	nl80211: add support for setting fixed HE rate/gi/ltf This patch adds the nl80211 structs, definitions, policies and parsing code required to pass fixed HE rate, GI and LTF settings. Signed-off-by: Miles Hu <milehu@codeaurora.org> Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200804081630.2013619-1-john@phrozen.org [fix comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 14:12:14 +02:00
Markus Theil	12adee3c46	cfg80211: add helper fn for adjacent rule channels Some usable channels are located in the union of adjacent regulatory rules, for example channel 144 in Germany. Enable them, by also checking if a channel spans two adjacent regulatory rules/frequency ranges. All flags involved are disabling things, therefore we can build the maximum by or-ing them together. Furthermore, take the maximum of DFS CAC time values and the minimum of allowed power of both adjacent channels in order to comply with both regulatory rules at the same time. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200803144353.305538-2-markus.theil@tu-ilmenau.de [remove unrelated comment changes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:27:17 +02:00
Markus Theil	7c9ff7e232	cfg80211: add helper fn for single rule channels As a preparation to handle adjacent rule channels, factor out handling channels located in a single regulatory rule. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200803144353.305538-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:25:45 +02:00
Johannes Berg	c8b8280230	nl80211: use NLA_POLICY_RANGE(NLA_BINARY, ...) for a few attributes We have a few attributes with minimum and maximum lengths that are not the same, use the new feature of being able to specify both in the policy to validate them, removing code and allowing this to be advertised to userspace in the policy export. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200819085642.8f12ffa14f33.I9d948d59870e521febcd79bb4a986b1de1dca47b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:25:06 +02:00
Johannes Berg	cb9abd48d9	nl80211: clean up code/policy a bit Use the policy to validate minimum and exact lengths in some attributes that weren't previously covered in the right ways, and remove associated validation code. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200805154714.3ba1e233cfa0.I5dc8109b7ab5c3f4ae925f903a30cc9b35753262@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:24:56 +02:00
Miaohe Lin	7b506ff6f6	net: wireless: Convert to use the preferred fallthrough macro Convert the uses of fallthrough comments to fallthrough macro. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20200822082323.45495-1-linmiaohe@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:24:28 +02:00
Randy Dunlap	eee79f8094	net: wireless: wext_compat.c: delete duplicated word Drop the repeated word "be". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Link: https://lore.kernel.org/r/20200822231953.465-8-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:23:38 +02:00
Randy Dunlap	54f65de004	net: wireless: sme.c: delete duplicated word Drop the repeated word "is". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Link: https://lore.kernel.org/r/20200822231953.465-7-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:23:31 +02:00
Randy Dunlap	8cf5c86d55	net: wireless: scan.c: delete or fix duplicated words Drop repeated word "stored". Change "is is" to "it is". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Link: https://lore.kernel.org/r/20200822231953.465-6-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:23:26 +02:00
Randy Dunlap	cc5a639b03	net: wireless: reg.c: delete duplicated words + fix punctuation Drop duplicated words "was" and "does". Fix "let's" apostrophe. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Link: https://lore.kernel.org/r/20200822231953.465-5-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:23:21 +02:00
Randy Dunlap	b42c8edfdb	net: wireless: delete duplicated word + fix grammar Drop the repeated word "Return" + fix verb. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Link: https://lore.kernel.org/r/20200822231953.465-4-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:23:16 +02:00
Randy Dunlap	13880a3b55	net: mac80211: mesh.h: delete duplicated word Drop the repeated word "address". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Link: https://lore.kernel.org/r/20200822231953.465-3-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:23:11 +02:00
Randy Dunlap	39f774e78d	net: mac80211: agg-rx.c: fix duplicated words Change "If if" to "If it". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Link: https://lore.kernel.org/r/20200822231953.465-2-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 11:23:08 +02:00
Amar Singhal	2d9b555085	cfg80211: Adjust 6 GHz frequency to channel conversion Adjust the 6 GHz frequency to channel conversion function, the other way around was previously handled. Signed-off-by: Amar Singhal <asinghal@codeaurora.org> Link: https://lore.kernel.org/r/1592599921-10607-1-git-send-email-asinghal@codeaurora.org [rewrite commit message, hard-code channel 2] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 10:53:21 +02:00
Felix Fietkau	47df8e059b	mac80211: reduce packet loss event false positives When running a large number of packets per second with a high data rate and long A-MPDUs, the packet loss threshold can be reached very quickly when the link conditions change. This frequently shows up as spurious disconnects. Mitigate false positives by using a similar logic for regular stations as the one being used for TDLS, though with a more aggressive timeout. Packet loss events are only reported if no ACK was received for a second. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200808172542.41628-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 10:53:20 +02:00
Johannes Berg	47caf685a6	cfg80211: regulatory: reject invalid hints Reject invalid hints early in order to not cause a kernel WARN later if they're restored to or similar. Reported-by: syzbot+d451401ffd00a60677ee@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=d451401ffd00a60677ee Link: https://lore.kernel.org/r/20200819084648.13956-1-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 10:53:14 +02:00
Shay Bar	3579994476	wireless: fix wrong 160/80+80 MHz setting Fix cfg80211_chandef_usable(): consider IEEE80211_VHT_CAP_EXT_NSS_BW when verifying 160/80+80 MHz. Based on: "Table 9-272 — Setting of the Supported Channel Width Set subfield and Extended NSS BW Support subfield at a STA transmitting the VHT Capabilities Information field" From "Draft P802.11REVmd_D3.0.pdf" Signed-off-by: Aviad Brikman <aviad.brikman@celeno.com> Signed-off-by: Shay Bar <shay.bar@celeno.com> Link: https://lore.kernel.org/r/20200826143139.25976-1-shay.bar@celeno.com [reformat the code a bit and use u32_get_bits()] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 10:46:53 +02:00
Felix Fietkau	f01cfbaf9b	mac80211: improve AQL aggregation estimation for low data rates Links with low data rates use much smaller aggregates and are much more sensitive to latency added by bufferbloat. Tune the assumed aggregation length based on the tx rate duration. Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20200821163045.62140-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 10:28:41 +02:00
Felix Fietkau	43cd72c589	mac80211: factor out code to look up the average packet length duration for a rate This will be used to enhance AQL estimated aggregation length Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20200821163045.62140-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 10:28:34 +02:00
Felix Fietkau	8ed37e7919	mac80211: use rate provided via status->rate on ieee80211_tx_status_ext for AQL Since ieee80211_tx_info does not have enough room to encode HE rates, HE drivers use status->rate to provide rate info. Store it in struct sta_info and use it for AQL. Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20200821163045.62140-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-27 10:28:21 +02:00
Sabrina Dubroca	45a36a18d0	xfrmi: drop ignore_df check before updating pmtu xfrm interfaces currently test for !skb->ignore_df when deciding whether to update the pmtu on the skb's dst. Because of this, no pmtu exception is created when we do something like: ping -s 1438 <dest> By dropping this check, the pmtu exception will be created and the next ping attempt will work. Fixes: `f203b76d78` ("xfrm: Add virtual xfrm interfaces") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-08-27 08:37:50 +02:00
Jakub Kicinski	96e97bc07e	net: disable netpoll on fresh napis napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent netpoll from accessing rings before init is complete. However, the same is not done for fresh napi instances in netif_napi_add(), even though we expect NAPI instances to be added as disabled. This causes crashes during driver reconfiguration (enabling XDP, changing the channel count) - if there is any printk() after netif_napi_add() but before napi_enable(). To ensure memory ordering is correct we need to use RCU accessors. Reported-by: Rob Sherwood <rsher@fb.com> Fixes: `2d8bff1269` ("netpoll: Close race condition between poll_one_napi and napi_disable") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 16:16:39 -07:00
Ido Schimmel	885a3b1579	ipv4: nexthop: Correctly update nexthop group when replacing a nexthop Each nexthop group contains an indication if it has IPv4 nexthops ('has_v4'). Its purpose is to prevent IPv6 routes from using groups with IPv4 nexthops. However, the indication is not updated when a nexthop is replaced. This results in the kernel wrongly rejecting IPv6 routes from pointing to groups that only contain IPv6 nexthops. Example: # ip nexthop replace id 1 via 192.0.2.2 dev dummy10 # ip nexthop replace id 10 group 1 # ip nexthop replace id 1 via 2001:db8:1::2 dev dummy10 # ip route replace 2001:db8:10::/64 nhid 10 Error: IPv6 routes can not use an IPv4 nexthop. Solve this by iterating over all the nexthop groups that the replaced nexthop is a member of and potentially update their IPv4 indication according to the new set of member nexthops. Avoid wasting cycles by only performing the update in case an IPv4 nexthop is replaced by an IPv6 nexthop. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 16:00:51 -07:00
Ido Schimmel	863b25581c	ipv4: nexthop: Correctly update nexthop group when removing a nexthop Each nexthop group contains an indication if it has IPv4 nexthops ('has_v4'). Its purpose is to prevent IPv6 routes from using groups with IPv4 nexthops. However, the indication is not updated when a nexthop is removed. This results in the kernel wrongly rejecting IPv6 routes from pointing to groups that only contain IPv6 nexthops. Example: # ip nexthop replace id 1 via 192.0.2.2 dev dummy10 # ip nexthop replace id 2 via 2001:db8:1::2 dev dummy10 # ip nexthop replace id 10 group 1/2 # ip nexthop del id 1 # ip route replace 2001:db8:10::/64 nhid 10 Error: IPv6 routes can not use an IPv4 nexthop. Solve this by updating the indication according to the new set of member nexthops. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 16:00:51 -07:00
Ido Schimmel	233c63785c	ipv4: nexthop: Remove unnecessary rtnl_dereference() The pointer is not RCU protected, so remove the unnecessary rtnl_dereference(). This suppresses the following warning: net/ipv4/nexthop.c:1101:24: error: incompatible types in comparison expression (different address spaces): net/ipv4/nexthop.c:1101:24: struct rb_node [noderef] __rcu * net/ipv4/nexthop.c:1101:24: struct rb_node * Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 16:00:51 -07:00
Ido Schimmel	33d80996b8	ipv4: nexthop: Use nla_put_be32() for NHA_GATEWAY The code correctly uses nla_get_be32() to get the payload of the attribute, but incorrectly uses nla_put_u32() to add the attribute to the payload. This results in the following warning: net/ipv4/nexthop.c:279:59: warning: incorrect type in argument 3 (different base types) net/ipv4/nexthop.c:279:59: expected unsigned int [usertype] value net/ipv4/nexthop.c:279:59: got restricted __be32 [usertype] ipv4 Suppress the warning by using nla_put_be32(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 16:00:51 -07:00
Ido Schimmel	d7d49dc77c	ipv4: nexthop: Reduce allocation size of 'struct nh_group' The struct looks as follows: struct nh_group { struct nh_group spare; / spare group for removals */ u16 num_nh; bool mpath; bool fdb_nh; bool has_v4; struct nh_grp_entry nh_entries[]; }; But its offset within 'struct nexthop' is also taken into account to determine the allocation size. Instead, use struct_size() to allocate only the required number of bytes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 16:00:51 -07:00
Ido Schimmel	7f6f32bb7d	ipv4: Silence suspicious RCU usage warning fib_info_notify_update() is always called with RTNL held, but not from an RCU read-side critical section. This leads to the following warning [1] when the FIB table list is traversed with hlist_for_each_entry_rcu(), but without a proper lockdep expression. Since modification of the list is protected by RTNL, silence the warning by adding a lockdep expression which verifies RTNL is held. [1] ============================= WARNING: suspicious RCU usage 5.9.0-rc1-custom-14233-g2f26e122d62f #129 Not tainted ----------------------------- net/ipv4/fib_trie.c:2124 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/834: #0: ffffffff85a3b6b0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0 stack backtrace: CPU: 0 PID: 834 Comm: ip Not tainted 5.9.0-rc1-custom-14233-g2f26e122d62f #129 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x143/0x14d fib_info_notify_update+0x8d1/0xa60 __nexthop_replace_notify+0xd2/0x290 rtm_new_nexthop+0x35e2/0x5946 rtnetlink_rcv_msg+0x4f7/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 ____sys_sendmsg+0x879/0xa00 ___sys_sendmsg+0x122/0x190 __sys_sendmsg+0x103/0x1d0 __x64_sys_sendmsg+0x7d/0xb0 do_syscall_64+0x32/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fde28c3be57 Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffc09330028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fde28c3be57 RDX: 0000000000000000 RSI: 00007ffc09330090 RDI: 0000000000000003 RBP: 000000005f45f911 R08: 0000000000000001 R09: 00007ffc0933012c R10: 0000000000000076 R11: 0000000000000246 R12: 0000000000000001 R13: 00007ffc09330290 R14: 00007ffc09330eee R15: 00005610e48ed020 Fixes: `1bff1a0c9b` ("ipv4: Add function to send route updates") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 15:58:48 -07:00
Florian Westphal	1cec170d45	mptcp: free acked data before waiting for more memory After subflow lock is dropped, more wmem might have been made available. This fixes a deadlock in mptcp_connect.sh 'mmap' mode: wmem is exhausted. But as the mptcp socket holds on to already-acked data (for retransmit) no wakeup will occur. Using 'goto restart' calls mptcp_clean_una(sk) which will free pages that have been acked completely in the mean time. Fixes: `fb529e62d3` ("mptcp: break and restart in case mptcp sndbuf is full") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 15:48:44 -07:00
Vinicius Costa Gomes	09e31cf0c5	taprio: Fix using wrong queues in gate mask Since commit `9c66d15646` ("taprio: Add support for hardware offloading") there's a bit of inconsistency when offloading schedules to the hardware: In software mode, the gate masks are specified in terms of traffic classes, so if say "sched-entry S 03 20000", it means that the traffic classes 0 and 1 are open for 20us; when taprio is offloaded to hardware, the gate masks are specified in terms of hardware queues. The idea here is to fix hardware offloading, so schedules in hardware and software mode have the same behavior. What's needed to do is to map traffic classes to queues when applying the offload to the driver. Fixes: `9c66d15646` ("taprio: Add support for hardware offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 15:45:11 -07:00
Chuck Lever	5de55ce951	xprtrdma: Release in-flight MRs on disconnect Dan Aloni reports that when a server disconnects abruptly, a few memory regions are left DMA mapped. Over time this leak could pin enough I/O resources to slow or even deadlock an NFS/RDMA client. I found that if a transport disconnects before pending Send and FastReg WRs can be posted, the to-be-registered MRs are stranded on the req's rl_registered list and never released -- since they weren't posted, there's no Send completion to DMA unmap them. Reported-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-08-26 15:29:21 -04:00
Linus Torvalds	2ac69819ba	Fixes: - Eliminate an oops introduced in v5.8 - Remove a duplicate #include added by nfsd-5.9 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl8/4gUACgkQM2qzM29m f5cxKBAAp7UjD3YNlLhSviowuOfYpWNjyk1cEQ6hWFA9oVeSfZfU3/axW8uYTHPm QZ6ams6gjorP4CXwVkFGFHpTRg4CfVN9g5lKxrcjvELqNllWBhE9UupRgbX3+XBE qselRI22M64o2tfDE+tPrDB8w8PwHmqrHwRydXfgiFlHk7nt6xD7NitaJBnPlYPM 21OBl6mrjLwtRwvX9n5wpy/+bfOTHbGV5VNez0fAfKXggNmRdt/UNROC4doLg4M0 2khAV3vgx49FRpCPL6SZPcBYd6zfrYOcj3iSf6wpxS5nTb2MifXFqz1MvKRTj863 gzvSmh7vuf0+EaOAXuLjCD9dURZpuG/k0vJGijOgaSt0+vNQHjIgZ1XRFHQtQCp4 zPJ/Qyk5k7uajHzcBFuNPUFAkOovH6LRoOzpqGvXhwaxrMPWti0LyyVKidVJrt/d EtOKQR+HCN0zAwjadXSPK8Nw1PjMzplkF7TaxXvF2LdO/4vpEZZNoz+if59gRcFY 65h2++7y+0MCX8l83uUZfs+jQU2aR1w5a0DjVzi86xzJtyhr6gEyTj3Z6L9HIHwW dnSpUmoiaCoN0eqxvEBjw0VEPqB806CuiUER0Jdd8k7mPk04fsQ/9+UsYyliSLEG N56LFSWLXLHsySa2WkuB/ghzT2/Q0vFoZKXW0KNSD7W4C5XMxi4= =czB3 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.9-1' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull nfs server fixes from Chuck Lever: - Eliminate an oops introduced in v5.8 - Remove a duplicate #include added by nfsd-5.9 * tag 'nfsd-5.9-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: SUNRPC: remove duplicate include nfsd: fix oops on mixed NFSv4/NFSv3 client access	2020-08-25 18:01:36 -07:00
KP Singh	30897832d8	bpf: Allow local storage to be used from LSM programs Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used in LSM programs. These helpers are not used for tracing programs (currently) as their usage is tied to the life-cycle of the object and should only be used where the owning object won't be freed (when the owning object is passed as an argument to the LSM hook). Thus, they are safer to use in LSM hooks than tracing. Usage of local storage in tracing programs will probably follow a per function based whitelist approach. Since the UAPI helper signature for bpf_sk_storage expect a bpf_sock, it, leads to a compilation warning for LSM programs, it's also updated to accept a void * pointer instead. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-7-kpsingh@chromium.org	2020-08-25 15:00:04 -07:00
KP Singh	450af8d0f6	bpf: Split bpf_local_storage to bpf_sk_storage A purely mechanical change: bpf_sk_storage.c = bpf_sk_storage.c + bpf_local_storage.c bpf_sk_storage.h = bpf_sk_storage.h + bpf_local_storage.h Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-5-kpsingh@chromium.org	2020-08-25 15:00:04 -07:00
KP Singh	f836a56e84	bpf: Generalize bpf_sk_storage Refactor the functionality in bpf_sk_storage.c so that concept of storage linked to kernel objects can be extended to other objects like inode, task_struct etc. Each new local storage will still be a separate map and provide its own set of helpers. This allows for future object specific extensions and still share a lot of the underlying implementation. This includes the changes suggested by Martin in: https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/ adding new map operations to support bpf_local_storage maps: * storages for different kernel objects to optionally have different memory charging strategy (map_local_storage_charge, map_local_storage_uncharge) * Functionality to extract the storage pointer from a pointer to the owning object (map_owner_storage_ptr) Co-developed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200825182919.1118197-4-kpsingh@chromium.org	2020-08-25 15:00:04 -07:00
KP Singh	4cc9ce4e73	bpf: Generalize caching for sk_storage. Provide the a ability to define local storage caches on a per-object type basis. The caches and caching indices for different objects should not be inter-mixed as suggested in: https://lore.kernel.org/bpf/20200630193441.kdwnkestulg5erii@kafai-mbp.dhcp.thefacebook.com/ "Caching a sk-storage at idx=0 of a sk should not stop an inode-storage to be cached at the same idx of a inode." Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-3-kpsingh@chromium.org	2020-08-25 15:00:03 -07:00
KP Singh	1f00d375af	bpf: Renames in preparation for bpf_local_storage A purely mechanical change to split the renaming from the actual generalization. Flags/consts: SK_STORAGE_CREATE_FLAG_MASK BPF_LOCAL_STORAGE_CREATE_FLAG_MASK BPF_SK_STORAGE_CACHE_SIZE BPF_LOCAL_STORAGE_CACHE_SIZE MAX_VALUE_SIZE BPF_LOCAL_STORAGE_MAX_VALUE_SIZE Structs: bucket bpf_local_storage_map_bucket bpf_sk_storage_map bpf_local_storage_map bpf_sk_storage_data bpf_local_storage_data bpf_sk_storage_elem bpf_local_storage_elem bpf_sk_storage bpf_local_storage The "sk" member in bpf_local_storage is also updated to "owner" in preparation for changing the type to void * in a subsequent patch. Functions: selem_linked_to_sk selem_linked_to_storage selem_alloc bpf_selem_alloc __selem_unlink_sk bpf_selem_unlink_storage_nolock __selem_link_sk bpf_selem_link_storage_nolock selem_unlink_sk __bpf_selem_unlink_storage sk_storage_update bpf_local_storage_update __sk_storage_lookup bpf_local_storage_lookup bpf_sk_storage_map_free bpf_local_storage_map_free bpf_sk_storage_map_alloc bpf_local_storage_map_alloc bpf_sk_storage_map_alloc_check bpf_local_storage_map_alloc_check bpf_sk_storage_map_check_btf bpf_local_storage_map_check_btf Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-2-kpsingh@chromium.org	2020-08-25 14:59:58 -07:00
Joe Perches	ca65a280fb	sunrpc: Avoid comma separated statements Use semicolons and braces. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-25 07:54:19 -07:00
Joe Perches	dee847793f	ipv6: fib6: Avoid comma separated statements Use semicolons and braces. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-25 07:54:19 -07:00
Tong Zhang	e104684108	net: caif: fix error code handling cfpkt_peek_head return 0 and 1, caller is checking error using <0 Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-25 07:50:25 -07:00
Miaohe Lin	343d8c6014	net: clean up codestyle for net/ipv4 This is a pure codestyle cleanup patch. Also add a blank line after declarations as warned by checkpatch.pl. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-25 06:28:02 -07:00
Miaohe Lin	fdf1923bf9	net: Remove duplicated midx check against 0 Check midx against 0 is always equal to check midx against sk_bound_dev_if when sk_bound_dev_if is known not equal to 0 in these case. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-25 06:23:59 -07:00
Miaohe Lin	0ce779a9f5	net: Avoid unnecessary inet_addr_type() call when addr is INADDR_ANY We can avoid unnecessary inet_addr_type() call by check addr against INADDR_ANY first. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-25 06:20:10 -07:00
Miaohe Lin	0316a21116	net: Set ping saddr after we successfully get the ping port We can defer set ping saddr until we successfully get the ping port. So we can avoid clear saddr when failed. Since ping_clear_saddr() is not used anymore now, remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-25 06:18:13 -07:00
Miaohe Lin	8b4510d76c	net: gain ipv4 mtu when mtu is not locked When mtu is locked, we should not obtain ipv4 mtu as we return immediately in this case and leave acquired ipv4 mtu unused. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-25 06:04:39 -07:00
David S. Miller	079f921e9f	This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Drop unused function batadv_hardif_remove_interfaces(), by Sven Eckelmann - delete duplicated words, by Randy Dunlap - Drop (even more) repeated words in comments, by Sven Eckelmann - Migrate to linux/prandom.h, by Sven Eckelmann -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl9D6i0WHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoUU7D/sEZhE7ht0F7tf5hyl8AlCaPpS4 Yd2IwrI1m4Vc8Z0gd4d3MtdCAilGsLVY32PY/+MQBRktvEZ+Ms9CD+MaVJJiCWnU OxhBZxNYeE/omP/nPZgwub/kHcIojGd/1XTy6Gyp2IGroOvAB3BA0acLsZsAoFrR /AmRLFFajnYgodn0yuWeomDjOaUoe5n4GCpWutlb6bLRA9t1B2YMWtRj5vXJO+Ht EIxBw3Gvz/b7eVsbo0VQMpsOju8OM9f6XZw66NnS2ekEepIBrkJzIi8fa97H7hy6 Ef6XkwmZhLgQQbsylVZSsynkNoB6cVOPDg/k4/BrErYTMnqAoh2U6v8IFOssmTKX Od5FBRqcXnRB2+/sPq7+VodlFXTqSriJGdlwCyGyg7dU4FPPdbdwubdJXWMjwe9A 7yMrty3VRLmVAwd5JuefNekyzEXXf2Pd/HHhDdgnLJ/LkJAgkswpmbjkz8kNZAwp 69MbFiC027O4s1l8PdiREjfAcMG1SicQ08LhBA8OV2ZUQ7IcYlg0Hq/UQNTEuJni yPFnzA5n/KtbY5sJGA+oYHGfQiNLx3xmM/GNl5YpfYrDSg8srcd+GTvwy2Vzzzp1 2Q7KSxPHVJDdcLPgyUlfTXp27870/aPY0Q+8r5fKjPgq9Fj4ZAuop7FtY8XSzg6R 3N4lPNKQS38UUPfjOg== =1GXQ -----END PGP SIGNATURE----- Merge tag 'batadv-next-for-davem-20200824' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Drop unused function batadv_hardif_remove_interfaces(), by Sven Eckelmann - delete duplicated words, by Randy Dunlap - Drop (even more) repeated words in comments, by Sven Eckelmann - Migrate to linux/prandom.h, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 18:18:22 -07:00
David S. Miller	99408c422d	Here are some batman-adv bugfixes: - Avoid uninitialized memory access when handling DHCP, by Sven Eckelmann - Fix check for own OGM in OGM receive handler, by Linus Luessing - Fix netif_rx access for non-interrupt context in BLA, by Jussi Kivilinna -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl9D6JUWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoez3EADD/QxTk5CWF9jU7RnyLEmoRjGe GAoJnPbZV7GvyQADOFmrqYMvUv/4Vly8Tj8DBI3Sc8gNFNb3/jrBue/AaqXFKFh0 GFmh5MoRjzpz3xJRzOn+OxoDaC+ooeXNO3VIHNB76uTrgo1HToWrg5qtoT9KjGVT kDr0+os+9MuPh6qQKfnW79wVedYGwuDI1qb4ieHTWEyIU+A82TDYc5Zf3CqcPrlX WagIL1pNsMb7/BU9I1br9Q1ccG1cX/Icqaq/XQEiNRWsa0vjb084fKrq9jaQYalB AASDVh4aRbfJ1EatZAydX7AbmfDYPPg6THQsJAJphIJ1LBSFcv2+CMyT9Xkj1tGx 5rvMOAcO37VAv8+qwwH7DXmh3nOm2N9oglIYAGyBU3nf6rDeDntVOpraRMWr8eHt h2B+jJ9XSvowPJ4fTM86oqgxTUExYoyWHA1WkkX0VkcGXEKxpaSklSd1BcoSB707 3ItpxMYgFhLx12uZQ1bsnMuazOSer7rbpiZqmKx40uiVgyBB97d/udbNE0yh8LVM fCVwnnjNFsuKu2Qh4asvpKUhryKD/+1NOdWsrKZEJs6WAVmWJh1S3/MmZPpek0lQ pVEmf0JBp6yjfJdIgTb90UfoCRAEZ49OWmXelCjdSLx8k14j17geQxQvwcg3VGYN EsCTj4Ls2DsHpabhBQ== =ikYK -----END PGP SIGNATURE----- Merge tag 'batadv-net-for-davem-20200824' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Avoid uninitialized memory access when handling DHCP, by Sven Eckelmann - Fix check for own OGM in OGM receive handler, by Linus Luessing - Fix netif_rx access for non-interrupt context in BLA, by Jussi Kivilinna ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 18:16:56 -07:00
Miaohe Lin	373c15c2e9	net: Use helper macro RT_TOS() in __icmp_send() Use helper macro RT_TOS() to get tos in __icmp_send(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 18:12:36 -07:00
Miaohe Lin	7551144978	net: Avoid access icmp_err_convert when icmp code is ICMP_FRAG_NEEDED There is no need to fetch errno and fatal info from icmp_err_convert when icmp code is ICMP_FRAG_NEEDED. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 18:11:43 -07:00
Randy Dunlap	5463352776	net: dccp: delete repeated words Drop duplicated words in /net/dccp/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk> Cc: dccp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 17:31:20 -07:00
Randy Dunlap	8540591885	net: netlink: delete repeated words Drop duplicated words in net/netlink/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 17:31:20 -07:00
Randy Dunlap	2bdcc73c88	net: ipv4: delete repeated words Drop duplicate words in comments in net/ipv4/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 17:31:20 -07:00
Randy Dunlap	b8d7a7c62c	net: sctp: ulpqueue.c: delete duplicated word Drop the repeated word "an". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:21:43 -07:00
Randy Dunlap	14f45bb7b1	net: sctp: sm_make_chunk.c: delete duplicated words + fix typo Drop the repeated words "for", "that", and "a". Change "his" to "this". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:21:43 -07:00
Randy Dunlap	93c3216a71	net: sctp: protocol.c: delete duplicated words + punctuation Drop the repeated words "of" and "that". Add some punctuation for readability. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:21:43 -07:00
Randy Dunlap	9932564f12	net: sctp: chunk.c: delete duplicated word Drop the repeated word "the". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:21:43 -07:00
Randy Dunlap	440d399033	net: sctp: bind_addr.c: delete duplicated word Drop the repeated word "of". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:21:43 -07:00
Randy Dunlap	861e7021ae	net: sctp: auth.c: delete duplicated words Drop the repeated word "the" and "now". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:21:43 -07:00
Randy Dunlap	5e80a0ccbc	net: sctp: associola.c: delete duplicated words Drop the repeated word "the" in two places. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:21:43 -07:00
Luke Hsiao	583bbf0624	io_uring: allow tcp ancillary data for __sys_recvmsg_sock() For TCP tx zero-copy, the kernel notifies the process of completions by queuing completion notifications on the socket error queue. This patch allows reading these notifications via recvmsg to support TCP tx zero-copy. Ancillary data was originally disallowed due to privilege escalation via io_uring's offloading of sendmsg() onto a kernel thread with kernel credentials (https://crbug.com/project-zero/1975). So, we must ensure that the socket type is one where the ancillary data types that are delivered on recvmsg are plain data (no file descriptors or values that are translated based on the identity of the calling process). This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE with tx zero-copy enabled. Before this patch, we received -EINVALID from this specific code path. After this patch, we could read tcp tx zero-copy completion notifications from the MSG_ERRQUEUE. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Arjun Roy <arjunroy@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:16:06 -07:00
Herbert Xu	be769db2f9	net: Get rid of consume_skb when tracing is off The function consume_skb is only meaningful when tracing is enabled. This patch makes it conditional on CONFIG_TRACEPOINTS. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:12:26 -07:00
Paul Moore	d3b990b7f3	netlabel: fix problems with mapping removal This patch fixes two main problems seen when removing NetLabel mappings: memory leaks and potentially extra audit noise. The memory leaks are caused by not properly free'ing the mapping's address selector struct when free'ing the entire entry as well as not properly cleaning up a temporary mapping entry when adding new address selectors to an existing entry. This patch fixes both these problems such that kmemleak reports no NetLabel associated leaks after running the SELinux test suite. The potentially extra audit noise was caused by the auditing code in netlbl_domhsh_remove_entry() being called regardless of the entry's validity. If another thread had already marked the entry as invalid, but not removed/free'd it from the list of mappings, then it was possible that an additional mapping removal audit record would be generated. This patch fixes this by returning early from the removal function when the entry was previously marked invalid. This change also had the side benefit of improving the code by decreasing the indentation level of large chunk of code by one (accounting for most of the diffstat). Fixes: `63c4168874` ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping") Reported-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:08:00 -07:00
Parav Pandit	5d080b5064	devlink: Protect devlink port list traversal Cited patch in fixes tag misses to protect port list traversal while traversing per port reporter list. Protect it using devlink instance lock. Fixes: `f4f5416601` ("devlink: Implement devlink health reporters on per-port basis") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:02:47 -07:00
Parav Pandit	79604c5de2	devlink: Fix per port reporter fields initialization Cited patch in fixes tag initializes reporters_list and reporters_lock of a devlink port after devlink port is added to the list. Once port is added to the list, devlink_nl_cmd_health_reporter_get_dumpit() can access the uninitialized mutex and reporters list head. Fix it by initializing port reporters field before adding port to the list. Fixes: `f4f5416601` ("devlink: Implement devlink health reporters on per-port basis") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 16:02:46 -07:00
Xin Long	3106ecb43a	sctp: not disable bh in the whole sctp_get_port_local() With disabling bh in the whole sctp_get_port_local(), when snum == 0 and too many ports have been used, the do-while loop will take the cpu for a long time and cause cpu stuck: [ ] watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [ ] RIP: 0010:native_queued_spin_lock_slowpath+0x4de/0x940 [ ] Call Trace: [ ] _raw_spin_lock+0xc1/0xd0 [ ] sctp_get_port_local+0x527/0x650 [sctp] [ ] sctp_do_bind+0x208/0x5e0 [sctp] [ ] sctp_autobind+0x165/0x1e0 [sctp] [ ] sctp_connect_new_asoc+0x355/0x480 [sctp] [ ] __sctp_connect+0x360/0xb10 [sctp] There's no need to disable bh in the whole function of sctp_get_port_local. So fix this cpu stuck by removing local_bh_disable() called at the beginning, and using spin_lock_bh() instead. The same thing was actually done for inet_csk_get_port() in Commit `ea8add2b19` ("tcp/dccp: better use of ephemeral ports in bind()"). Thanks to Marcelo for pointing the buggy code out. v1->v2: - use cond_resched() to yield cpu to other tasks if needed, as Eric noticed. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 15:50:17 -07:00
Martin KaFai Lau	267cf9fa43	tcp: bpf: Optionally store mac header in TCP_SAVE_SYN This patch is adapted from Eric's patch in an earlier discussion [1]. The TCP_SAVE_SYN currently only stores the network header and tcp header. This patch allows it to optionally store the mac header also if the setsockopt's optval is 2. It requires one more bit for the "save_syn" bit field in tcp_sock. This patch achieves this by moving the syn_smc bit next to the is_mptcp. The syn_smc is currently used with the TCP experimental option. Since syn_smc is only used when CONFIG_SMC is enabled, this patch also puts the "IS_ENABLED(CONFIG_SMC)" around it like the is_mptcp did with "IS_ENABLED(CONFIG_MPTCP)". The mac_hdrlen is also stored in the "struct saved_syn" to allow a quick offset from the bpf prog if it chooses to start getting from the network header or the tcp header. [1]: https://lore.kernel.org/netdev/CANn89iLJNWh6bkH7DNhy_kmcAexuUCccqERqe7z2QsvPhGrYPQ@mail.gmail.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190123.2886935-1-kafai@fb.com	2020-08-24 14:35:00 -07:00
Martin KaFai Lau	0813a84156	bpf: tcp: Allow bpf prog to write and parse TCP header option [ Note: The TCP changes here is mainly to implement the bpf pieces into the bpf_skops_() functions introduced in the earlier patches. ] The earlier effort in BPF-TCP-CC allows the TCP Congestion Control algorithm to be written in BPF. It opens up opportunities to allow a faster turnaround time in testing/releasing new congestion control ideas to production environment. The same flexibility can be extended to writing TCP header option. It is not uncommon that people want to test new TCP header option to improve the TCP performance. Another use case is for data-center that has a more controlled environment and has more flexibility in putting header options for internal only use. For example, we want to test the idea in putting maximum delay ACK in TCP header option which is similar to a draft RFC proposal [1]. This patch introduces the necessary BPF API and use them in the TCP stack to allow BPF_PROG_TYPE_SOCK_OPS program to parse and write TCP header options. It currently supports most of the TCP packet except RST. Supported TCP header option: ─────────────────────────── This patch allows the bpf-prog to write any option kind. Different bpf-progs can write its own option by calling the new helper bpf_store_hdr_opt(). The helper will ensure there is no duplicated option in the header. By allowing bpf-prog to write any option kind, this gives a lot of flexibility to the bpf-prog. Different bpf-prog can write its own option kind. It could also allow the bpf-prog to support a recently standardized option on an older kernel. Sockops Callback Flags: ────────────────────── The bpf program will only be called to parse/write tcp header option if the following newly added callback flags are enabled in tp->bpf_sock_ops_cb_flags: BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG A few words on the PARSE CB flags. When the above PARSE CB flags are turned on, the bpf-prog will be called on packets received at a sk that has at least reached the ESTABLISHED state. The parsing of the SYN-SYNACK-ACK will be discussed in the "3 Way HandShake" section. The default is off for all of the above new CB flags, i.e. the bpf prog will not be called to parse or write bpf hdr option. There are details comment on these new cb flags in the UAPI bpf.h. sock_ops->skb_data and bpf_load_hdr_opt() ───────────────────────────────────────── sock_ops->skb_data and sock_ops->skb_data_end covers the whole TCP header and its options. They are read only. The new bpf_load_hdr_opt() helps to read a particular option "kind" from the skb_data. Please refer to the comment in UAPI bpf.h. It has details on what skb_data contains under different sock_ops->op. 3 Way HandShake ─────────────── The bpf-prog can learn if it is sending SYN or SYNACK by reading the sock_ops->skb_tcp_flags. Passive side When writing SYNACK (i.e. sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB), the received SYN skb will be available to the bpf prog. The bpf prog can use the SYN skb (which may carry the header option sent from the remote bpf prog) to decide what bpf header option should be written to the outgoing SYNACK skb. The SYN packet can be obtained by getsockopt(TCP_BPF_SYN). More on this later. Also, the bpf prog can learn if it is in syncookie mode (by checking sock_ops->args[0] == BPF_WRITE_HDR_TCP_SYNACK_COOKIE). The bpf prog can store the received SYN pkt by using the existing bpf_setsockopt(TCP_SAVE_SYN). The example in a later patch does it. [ Note that the fullsock here is a listen sk, bpf_sk_storage is not very useful here since the listen sk will be shared by many concurrent connection requests. Extending bpf_sk_storage support to request_sock will add weight to the minisock and it is not necessary better than storing the whole ~100 bytes SYN pkt. ] When the connection is established, the bpf prog will be called in the existing PASSIVE_ESTABLISHED_CB callback. At that time, the bpf prog can get the header option from the saved syn and then apply the needed operation to the newly established socket. The later patch will use the max delay ack specified in the SYN header and set the RTO of this newly established connection as an example. The received ACK (that concludes the 3WHS) will also be available to the bpf prog during PASSIVE_ESTABLISHED_CB through the sock_ops->skb_data. It could be useful in syncookie scenario. More on this later. There is an existing getsockopt "TCP_SAVED_SYN" to return the whole saved syn pkt which includes the IP[46] header and the TCP header. A few "TCP_BPF_SYN" getsockopt has been added to allow specifying where to start getting from, e.g. starting from TCP header, or from IP[46] header. The new getsockopt(TCP_BPF_SYN) will also know where it can get the SYN's packet from: - (a) the just received syn (available when the bpf prog is writing SYNACK) and it is the only way to get SYN during syncookie mode. or - (b) the saved syn (available in PASSIVE_ESTABLISHED_CB and also other existing CB). The bpf prog does not need to know where the SYN pkt is coming from. The getsockopt(TCP_BPF_SYN) will hide this details. Similarly, a flags "BPF_LOAD_HDR_OPT_TCP_SYN" is also added to bpf_load_hdr_opt() to read a particular header option from the SYN packet. * Fastopen Fastopen should work the same as the regular non fastopen case. This is a test in a later patch. * Syncookie For syncookie, the later example patch asks the active side's bpf prog to resend the header options in ACK. The server can use bpf_load_hdr_opt() to look at the options in this received ACK during PASSIVE_ESTABLISHED_CB. * Active side The bpf prog will get a chance to write the bpf header option in the SYN packet during WRITE_HDR_OPT_CB. The received SYNACK pkt will also be available to the bpf prog during the existing ACTIVE_ESTABLISHED_CB callback through the sock_ops->skb_data and bpf_load_hdr_opt(). * Turn off header CB flags after 3WHS If the bpf prog does not need to write/parse header options beyond the 3WHS, the bpf prog can clear the bpf_sock_ops_cb_flags to avoid being called for header options. Or the bpf-prog can select to leave the UNKNOWN_HDR_OPT_CB_FLAG on so that the kernel will only call it when there is option that the kernel cannot handle. [1]: draft-wang-tcpm-low-latency-opt-00 https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820190104.2885895-1-kafai@fb.com	2020-08-24 14:35:00 -07:00
Martin KaFai Lau	c9985d09e1	bpf: sock_ops: Change some members of sock_ops_kern from u32 to u8 A later patch needs to add a few pointers and a few u8 to sock_ops_kern. Hence, this patch saves some spaces by moving some of the existing members from u32 to u8 so that the later patch can still fit everything in a cacheline. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820190058.2885640-1-kafai@fb.com	2020-08-24 14:35:00 -07:00
Martin KaFai Lau	331fca4315	bpf: tcp: Add bpf_skops_hdr_opt_len() and bpf_skops_write_hdr_opt() The bpf prog needs to parse the SYN header to learn what options have been sent by the peer's bpf-prog before writing its options into SYNACK. This patch adds a "syn_skb" arg to tcp_make_synack() and send_synack(). This syn_skb will eventually be made available (as read-only) to the bpf prog. This will be the only SYN packet available to the bpf prog during syncookie. For other regular cases, the bpf prog can also use the saved_syn. When writing options, the bpf prog will first be called to tell the kernel its required number of bytes. It is done by the new bpf_skops_hdr_opt_len(). The bpf prog will only be called when the new BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG is set in tp->bpf_sock_ops_cb_flags. When the bpf prog returns, the kernel will know how many bytes are needed and then update the "remaining" arg accordingly. 4 byte alignment will be included in the "remaining" before this function returns. The 4 byte aligned number of bytes will also be stored into the opts->bpf_opt_len. "bpf_opt_len" is a newly added member to the struct tcp_out_options. Then the new bpf_skops_write_hdr_opt() will call the bpf prog to write the header options. The bpf prog is only called if it has reserved spaces before (opts->bpf_opt_len > 0). The bpf prog is the last one getting a chance to reserve header space and writing the header option. These two functions are half implemented to highlight the changes in TCP stack. The actual codes preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190052.2885316-1-kafai@fb.com	2020-08-24 14:35:00 -07:00
Martin KaFai Lau	00d211a4ea	bpf: tcp: Add bpf_skops_parse_hdr() The patch adds a function bpf_skops_parse_hdr(). It will call the bpf prog to parse the TCP header received at a tcp_sock that has at least reached the ESTABLISHED state. For the packets received during the 3WHS (SYN, SYNACK and ACK), the received skb will be available to the bpf prog during the callback in bpf_skops_established() introduced in the previous patch and in the bpf_skops_write_hdr_opt() that will be added in the next patch. Calling bpf prog to parse header is controlled by two new flags in tp->bpf_sock_ops_cb_flags: BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG and BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG. When BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG is set, the bpf prog will only be called when there is unknown option in the TCP header. When BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG is set, the bpf prog will be called on all received TCP header. This function is half implemented to highlight the changes in TCP stack. The actual codes preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190046.2885054-1-kafai@fb.com	2020-08-24 14:35:00 -07:00
Martin KaFai Lau	72be0fe6ba	bpf: tcp: Add bpf_skops_established() In tcp_init_transfer(), it currently calls the bpf prog to give it a chance to handle the just "ESTABLISHED" event (e.g. do setsockopt on the newly established sk). Right now, it is done by calling the general purpose tcp_call_bpf(). In the later patch, it also needs to pass the just-received skb which concludes the 3 way handshake. E.g. the SYNACK received at the active side. The bpf prog can then learn some specific header options written by the peer's bpf-prog and potentially do setsockopt on the newly established sk. Thus, instead of reusing the general purpose tcp_call_bpf(), a new function bpf_skops_established() is added to allow passing the "skb" to the bpf prog. The actual skb passing from bpf_skops_established() to the bpf prog will happen together in a later patch which has the necessary bpf pieces. A "skb" arg is also added to tcp_init_transfer() such that it can then be passed to bpf_skops_established(). Calling the new bpf_skops_established() instead of tcp_call_bpf() should be a noop in this patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190039.2884750-1-kafai@fb.com	2020-08-24 14:35:00 -07:00
Martin KaFai Lau	7656d68455	tcp: Add saw_unknown to struct tcp_options_received In a later patch, the bpf prog only wants to be called to handle a header option if that particular header option cannot be handled by the kernel. This unknown option could be written by the peer's bpf-prog. It could also be a new standard option that the running kernel does not support it while a bpf-prog can handle it. This patch adds a "saw_unknown" bit to "struct tcp_options_received" and it uses an existing one byte hole to do that. "saw_unknown" will be set in tcp_parse_options() if it sees an option that the kernel cannot handle. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190033.2884430-1-kafai@fb.com	2020-08-24 14:35:00 -07:00
Martin KaFai Lau	ca584ba070	tcp: bpf: Add TCP_BPF_RTO_MIN for bpf_setsockopt This patch adds bpf_setsockopt(TCP_BPF_RTO_MIN) to allow bpf prog to set the min rto of a connection. It could be used together with the earlier patch which has added bpf_setsockopt(TCP_BPF_DELACK_MAX). A later selftest patch will communicate the max delay ack in a bpf tcp header option and then the receiving side can use bpf_setsockopt(TCP_BPF_RTO_MIN) to set a shorter rto. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190027.2884170-1-kafai@fb.com	2020-08-24 14:35:00 -07:00
Martin KaFai Lau	2b8ee4f05d	tcp: bpf: Add TCP_BPF_DELACK_MAX setsockopt This change is mostly from an internal patch and adapts it from sysctl config to the bpf_setsockopt setup. The bpf_prog can set the max delay ack by using bpf_setsockopt(TCP_BPF_DELACK_MAX). This max delay ack can be communicated to its peer through bpf header option. The receiving peer can then use this max delay ack and set a potentially lower rto by using bpf_setsockopt(TCP_BPF_RTO_MIN) which will be introduced in the next patch. Another later selftest patch will also use it like the above to show how to write and parse bpf tcp header option. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190021.2884000-1-kafai@fb.com	2020-08-24 14:34:59 -07:00
Martin KaFai Lau	70a217f197	tcp: Use a struct to represent a saved_syn The TCP_SAVE_SYN has both the network header and tcp header. The total length of the saved syn packet is currently stored in the first 4 bytes (u32) of an array and the actual packet data is stored after that. A later patch will add a bpf helper that allows to get the tcp header alone from the saved syn without the network header. It will be more convenient to have a direct offset to a specific header instead of re-parsing it. This requires to separately store the network hdrlen. The total header length (i.e. network + tcp) is still needed for the current usage in getsockopt. Although this total length can be obtained by looking into the tcphdr and then get the (th->doff << 2), this patch chooses to directly store the tcp hdrlen in the second four bytes of this newly created "struct saved_syn". By using a new struct, it can give a readable name to each individual header length. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190014.2883694-1-kafai@fb.com	2020-08-24 14:34:59 -07:00
David S. Miller	a26aea2010	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Don't flag SCTP heartbeat as invalid for re-used connections, from Florian Westphal. 2) Bogus overlap report due to rbtree tree rotations, from Stefano Brivio. 3) Detect partial overlap with start end point match, also from Stefano. 4) Skip netlink dump of NFTA_SET_USERDATA is unset. 5) Incorrect nft_list_attributes enumeration definition. 6) Missing zeroing before memcpy to destination register, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 06:37:05 -07:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
David S. Miller	7611cbb900	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2020-08-23 11:48:27 -07:00
Tom Parkin	eee049c0ef	l2tp: remove tunnel and session debug flags field The l2tp subsystem now uses standard kernel logging APIs for informational and warning messages, and tracepoints for debug information. Now that the tunnel and session debug flags are unused, remove the field from the core structures. Various system calls (in the case of l2tp_ppp) and netlink messages handle the getting and setting of debug flags. To avoid userspace breakage don't modify the API of these calls; simply ignore set requests, and send dummy data for get requests. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:44:37 -07:00
Tom Parkin	ac6ebaf06e	l2tp: remove custom logging macros All l2tp's informational and warning logging is now carried out using standard kernel APIs. Debugging information is now handled using tracepoints. Now that no code is using the custom logging macros, remove them from l2tp_core.h. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:44:37 -07:00
Tom Parkin	6b7bdcd7ca	l2tp: add tracepoints to l2tp_core.c Add lifetime event tracing for tunnel and session instances, tracking tunnel and session registration, deletion, and eventual freeing. Port the data path sequence number debug logging to use trace points rather than custom debug macros. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:44:37 -07:00
Tom Parkin	2a03dd8e11	l2tp: add tracepoint definitions in trace.h l2tp can provide a better debug experience using tracepoints rather than printk-style logging. Add tracepoint definitions in trace.h for use in the l2tp subsystem code. Add preprocessor definitions for the length of session and tunnel names in l2tp_core.h so we can reuse these in trace.h. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:44:37 -07:00
Tom Parkin	3f117d6f4b	l2tp: add tracepoint infrastructure to core The l2tp subsystem doesn't currently make use of tracepoints. As a starting point for adding tracepoints, add skeleton infrastructure for defining tracepoints for the subsystem, and for having them build appropriately whether compiled into the kernel or built as a module. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:44:37 -07:00
Tom Parkin	5ee759cda5	l2tp: use standard API for warning log messages The l2tp_* log wrappers only emit messages of a given category if the tunnel or session structure has the appropriate flag set in its debug field. Flags default to being unset. For warning messages, this doesn't make a lot of sense since an administrator is likely to want to know about datapath warnings without needing to tweak the debug flags setting for a given tunnel or session instance. Modify l2tp_warn callsites to use pr_warn_ratelimited instead for unconditional output of warning messages. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:44:37 -07:00
Tom Parkin	ab141e3733	l2tp: remove noisy logging, use appropriate log levels l2tp_ppp in particular had a lot of log messages for tracing [get\|set]sockopt calls. These aren't especially useful, so remove these messages. Several log messages flagging error conditions were logged using l2tp_info: they're better off as l2tp_warn. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:44:37 -07:00
Tom Parkin	12923365eb	l2tp: don't log data frames l2tp had logging to trace data frame receipt and transmission, including code to dump packet contents. This was originally intended to aid debugging of core l2tp packet handling, but is of limited use now that code is stable. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:44:37 -07:00
Nikolay Aleksandrov	eeaac3634e	net: nexthop: don't allow empty NHA_GROUP Currently the nexthop code will use an empty NHA_GROUP attribute, but it requires at least 1 entry in order to function properly. Otherwise we end up derefencing null or random pointers all over the place due to not having any nh_grp_entry members allocated, nexthop code relies on having at least the first member present. Empty NHA_GROUP doesn't make any sense so just disallow it. Also add a WARN_ON for any future users of nexthop_create_group(). BUG: kernel NULL pointer dereference, address: 0000000000000080 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:fib_check_nexthop+0x4a/0xaa Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85 RSP: 0018:ffff88807983ba00 EFLAGS: 00010213 RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000 RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80 RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001 FS: 00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0 Call Trace: fib_create_info+0x64d/0xaf7 fib_table_insert+0xf6/0x581 ? __vma_adjust+0x3b6/0x4d4 inet_rtm_newroute+0x56/0x70 rtnetlink_rcv_msg+0x1e3/0x20d ? rtnl_calcit.isra.0+0xb8/0xb8 netlink_rcv_skb+0x5b/0xac netlink_unicast+0xfa/0x17b netlink_sendmsg+0x334/0x353 sock_sendmsg_nosec+0xf/0x3f ____sys_sendmsg+0x1a0/0x1fc ? copy_msghdr_from_user+0x4c/0x61 ___sys_sendmsg+0x63/0x84 ? handle_mm_fault+0xa39/0x11b5 ? sockfd_lookup_light+0x72/0x9a __sys_sendmsg+0x50/0x6e do_syscall_64+0x54/0xbe entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f10dacc0bb7 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48 RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7 RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003 RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008 R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440 Modules linked in: CR2: 0000000000000080 CC: David Ahern <dsahern@gmail.com> Fixes: `430a049190` ("nexthop: Add support for nexthop groups") Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:39:55 -07:00
Miaohe Lin	1aecbf1861	net: dccp: Convert to use the preferred fallthrough macro Convert the uses of fallthrough comments to fallthrough macro. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-22 12:38:34 -07:00
Lorenz Bauer	0126240f44	bpf: sockmap: Allow update from BPF Allow calling bpf_map_update_elem on sockmap and sockhash from a BPF context. The synchronization required for this is a bit fiddly: we need to prevent the socket from changing its state while we add it to the sockmap, since we rely on getting a callback via sk_prot->unhash. However, we can't just lock_sock like in sock_map_sk_acquire because that might sleep. So instead we disable softirq processing and use bh_lock_sock to prevent further modification. Yet, this is still not enough. BPF can be called in contexts where the current CPU might have locked a socket. If the BPF can get a hold of such a socket, inserting it into a sockmap would lead to a deadlock. One straight forward example are sock_ops programs that have ctx->sk, but the same problem exists for kprobes, etc. We deal with this by allowing sockmap updates only from known safe contexts. Improper usage is rejected by the verifier. I've audited the enabled contexts to make sure they can't run in a locked context. It's possible that CGROUP_SKB and others are safe as well, but the auditing here is much more difficult. In any case, we can extend the safe contexts when the need arises. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200821102948.21918-6-lmb@cloudflare.com	2020-08-21 15:16:12 -07:00
Lorenz Bauer	13b79d3ffb	bpf: sockmap: Call sock_map_update_elem directly Don't go via map->ops to call sock_map_update_elem, since we know what function to call in bpf_map_update_value. Since we currently don't allow calling map_update_elem from BPF context, we can remove ops->map_update_elem and rename the function to sock_map_update_elem_sys. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200821102948.21918-4-lmb@cloudflare.com	2020-08-21 15:16:11 -07:00
Lorenz Bauer	38e12f908a	bpf: sockmap: Merge sockmap and sockhash update functions Merge the two very similar functions sock_map_update_elem and sock_hash_update_elem into one. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200821102948.21918-3-lmb@cloudflare.com	2020-08-21 15:16:11 -07:00
Lorenz Bauer	7b219da43f	net: sk_msg: Simplify sk_psock initialization Initializing psock->sk_proto and other saved callbacks is only done in sk_psock_update_proto, after sk_psock_init has returned. The logic for this is difficult to follow, and needlessly complex. Instead, initialize psock->sk_proto whenever we allocate a new psock. Additionally, assert the following invariants: * The SK has no ULP: ULP does it's own finagling of sk->sk_prot * sk_user_data is unused: we need it to store sk_psock Protect our access to sk_user_data with sk_callback_lock, which is what other users like reuseport arrays, etc. do. The result is that an sk_psock is always fully initialized, and that psock->sk_proto is always the "original" struct proto. The latter allows us to use psock->sk_proto when initializing IPv6 TCP / UDP callbacks for sockmap. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200821102948.21918-2-lmb@cloudflare.com	2020-08-21 15:16:11 -07:00
Yonghong Song	b76f222690	bpf: Implement link_query callbacks in map element iterators For bpf_map_elem and bpf_sk_local_storage bpf iterators, additional map_id should be shown for fdinfo and userspace query. For example, the following is for a bpf_map_elem iterator. $ cat /proc/1753/fdinfo/9 pos: 0 flags: 02000000 mnt_id: 14 link_type: iter link_id: 34 prog_tag: 104be6d3fe45e6aa prog_id: 173 target_name: bpf_map_elem map_id: 127 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200821184419.574240-1-yhs@fb.com	2020-08-21 14:01:39 -07:00
David S. Miller	4af7b32f84	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2020-08-21 The following pull-request contains BPF updates for your net tree. We've added 11 non-merge commits during the last 5 day(s) which contain a total of 12 files changed, 78 insertions(+), 24 deletions(-). The main changes are: 1) three fixes in BPF task iterator logic, from Yonghong. 2) fix for compressed dwarf sections in vmlinux, from Jiri. 3) fix xdp attach regression, from Andrii. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-21 12:54:50 -07:00
Florian Westphal	1e105e6afa	netfilter: nf_tables: fix destination register zeroing Following bug was reported via irc: nft list ruleset set knock_candidates_ipv4 { type ipv4_addr . inet_service size 65535 elements = { 127.0.0.1 . 123, 127.0.0.1 . 123 } } .. udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 } udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport } It should not have been possible to add a duplicate set entry. After some debugging it turned out that the problem is the immediate value (123) in the second-to-last rule. Concatenations use 32bit registers, i.e. the elements are 8 bytes each, not 6 and it turns out the kernel inserted inet firewall @knock_candidates_ipv4 element 0100007f ffff7b00 : 0 [end] element 0100007f 00007b00 : 0 [end] Note the non-zero upper bits of the first element. It turns out that nft_immediate doesn't zero the destination register, but this is needed when the length isn't a multiple of 4. Furthermore, the zeroing in nft_payload is broken. We can't use [len / 4] = 0 -- if len is a multiple of 4, index is off by one. Skip zeroing in this case and use a conditional instead of (len -1) / 4. Fixes: `49499c3e6e` ("netfilter: nf_tables: switch registers to 32 bit addressing") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-21 19:00:33 +02:00
Pablo Neira Ayuso	6f03bf43ee	netfilter: nf_tables: add NFTA_SET_USERDATA if not null Kernel sends an empty NFTA_SET_USERDATA attribute with no value if userspace adds a set with no NFTA_SET_USERDATA attribute. Fixes: `e6d8ecac9e` ("netfilter: nf_tables: Add new attributes into nft_set to store user data.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-21 19:00:14 +02:00
Stefano Brivio	0726763043	netfilter: nft_set_rbtree: Detect partial overlap with start endpoint match Getting creative with nft and omitting the interval_overlap() check from the set_overlap() function, without omitting set_overlap() altogether, led to the observation of a partial overlap that wasn't detected, and would actually result in replacement of the end element of an existing interval. This is due to the fact that we'll return -EEXIST on a matching, pre-existing start element, instead of -ENOTEMPTY, and the error is cleared by API if NLM_F_EXCL is not given. At this point, we can insert a matching start, and duplicate the end element as long as we don't end up into other intervals. For instance, inserting interval 0 - 2 with an existing 0 - 3 interval would result in a single 0 - 2 interval, and a dangling '3' end element. This is because nft will proceed after inserting the '0' start element as no error is reported, and no further conflicting intervals are detected on insertion of the end element. This needs a different approach as it's a local condition that can be detected by looking for duplicate ends coming from left and right, separately. Track those and directly report -ENOTEMPTY on duplicated end elements for a matching start. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-21 17:37:36 +02:00
Stefano Brivio	226a88de47	netfilter: nft_set_rbtree: Handle outcomes of tree rotations in overlap detection Checks for partial overlaps on insertion assume that end elements are always descendant nodes of their corresponding start, because they are inserted later. However, this is not the case if a previous delete operation caused a tree rotation as part of rebalancing. Taking the issue reported by Andreas Fischer as an example, if we omit delete operations, the existing procedure works because, equivalently, we are inserting a start item with value 40 in the this region of the red-black tree with single-sized intervals: overlap flag 10 (start) / \ false 20 (start) / \ false 30 (start) / \ false 60 (start) / \ false 50 (end) / \ false 20 (end) / \ false 40 (start) if we now delete interval 30 - 30, the tree can be rearranged in a way similar to this (note the rotation involving 50 - 50): overlap flag 10 (start) / \ false 20 (start) / \ false 25 (start) / \ false 70 (start) / \ false 50 (end) / \ true (from rule a1.) 50 (start) / \ true 40 (start) and we traverse interval 50 - 50 from the opposite direction compared to what was expected. To deal with those cases, add a start-before-start rule, b4., that covers traversal of existing intervals from the right. We now need to restrict start-after-end rule b3. to cases where there are no occurring nodes between existing start and end elements, because addition of rule b4. isn't sufficient to ensure that the pre-existing end element we encounter while descending the tree corresponds to a start element of an interval that we already traversed entirely. Different types of overlap detection on trees with rotations resulting from re-balancing will be covered by nft test case sets/0044interval_overlap_1. Reported-by: Andreas Fischer <netfilter@d9c.eu> Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1449 Cc: <stable@vger.kernel.org> # 5.6.x Fixes: `7c84d41416` ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-21 17:36:52 +02:00
Xin Long	f6db909641	tipc: call rcu_read_lock() in tipc_aead_encrypt_done() b->media->send_msg() requires rcu_read_lock(), as we can see elsewhere in tipc, tipc_bearer_xmit, tipc_bearer_xmit_skb and tipc_bearer_bc_xmit(). Syzbot has reported this issue as: net/tipc/bearer.c:466 suspicious rcu_dereference_check() usage! Workqueue: cryptd cryptd_queue_worker Call Trace: tipc_l2_send_msg+0x354/0x420 net/tipc/bearer.c:466 tipc_aead_encrypt_done+0x204/0x3a0 net/tipc/crypto.c:761 cryptd_aead_crypt+0xe8/0x1d0 crypto/cryptd.c:739 cryptd_queue_worker+0x118/0x1b0 crypto/cryptd.c:181 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:291 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 So fix it by calling rcu_read_lock() in tipc_aead_encrypt_done() for b->media->send_msg(). Fixes: `fc1b6d6de2` ("tipc: introduce TIPC encryption & authentication") Reported-by: syzbot+47bbc6b678d317cccbe0@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-20 16:42:08 -07:00
Alaa Hleihel	eda814b97d	net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flow tcf_ct_handle_fragments() shouldn't free the skb when ip_defrag() call fails. Otherwise, we will cause a double-free bug. In such cases, just return the error to the caller. Fixes: `b57dc7c13e` ("net/sched: Introduce action ct") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-20 16:39:31 -07:00
David Laight	ab921f3cdb	net: sctp: Fix negotiation of the number of data streams. The number of output and input streams was never being reduced, eg when processing received INIT or INIT_ACK chunks. The effect is that DATA chunks can be sent with invalid stream ids and then discarded by the remote system. Fixes: `2075e50caf` ("sctp: convert to genradix") Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-20 16:37:37 -07:00
Mark Tomlinson	272502fcb7	gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY When receiving an IPv4 packet inside an IPv6 GRE packet, and the IP6_TNL_F_RCV_DSCP_COPY flag is set on the tunnel, the IPv4 header would get corrupted. This is due to the common ip6_tnl_rcv() function assuming that the inner header is always IPv6. This patch checks the tunnel protocol for IPv4 inner packets, but still defaults to IPv6. Fixes: `308edfdf15` ("gre6: Cleanup GREv6 receive path, call common GRE functions") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-20 16:28:04 -07:00
Vishal Kulkarni	d0a84e1f38	ethtool: allow flow-type ether without IP protocol field Set IP protocol mask only when IP protocol field is set. This will allow flow-type ether with vlan rule which don't have protocol field to apply. ethtool -N ens5f4 flow-type ether proto 0x8100 vlan 0x600\ m 0x1FFF action 3 loc 16 Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-20 16:26:03 -07:00
Eric Dumazet	394fcd8a81	net: zerocopy: combine pages in zerocopy_sg_from_iter() Currently, tcp sendmsg(MSG_ZEROCOPY) is building skbs with order-0 fragments. Compared to standard sendmsg(), these skbs usually contain up to 16 fragments on arches with 4KB page sizes, instead of two. This adds considerable costs on various ndo_start_xmit() handlers, especially when IOMMU is in the picture. As high performance applications are often using huge pages, we can try to combine adjacent pages belonging to same compound page. Tested on AMD Rome platform, with IOMMU, nominal single TCP flow speed is roughly doubled (~55Gbit -> ~100Gbit), when user application is using hugepages. For reference, nominal single TCP flow speed on this platform without MSG_ZEROCOPY is ~65Gbit. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-20 16:12:50 -07:00
Andrii Nakryiko	c8a36f1945	bpf: xdp: Fix XDP mode when no mode flags specified `7f0a838254` ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device") inadvertently changed which XDP mode is assumed when no mode flags are specified explicitly. Previously, driver mode was preferred, if driver supported it. If not, generic SKB mode was chosen. That commit changed default to SKB mode always. This patch fixes the issue and restores the original logic. Fixes: `7f0a838254` ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device") Reported-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/bpf/20200820052841.1559757-1-andriin@fb.com	2020-08-20 14:27:12 -07:00
Al Viro	cc44c17baf	csum_partial_copy_nocheck(): drop the last argument It's always 0. Note that we theoretically could use ~0U as well - result will be the same modulo 0xffff, _if_ the damn thing did the right thing for any value of initial sum; later we'll make use of that when convenient. However, unlike csum_and_copy_..._user(), there are instances that did not work for arbitrary initial sums; c6x is one such. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-08-20 15:45:14 -04:00
Al Viro	3ea7ca80d9	icmp_push_reply(): reorder adding the checksum up do csum_partial_copy_nocheck() on the first fragment, then add the rest to it. Equivalent transformation. That was the only caller of csum_partial_copy_nocheck() that might pass it non-zero as the last argument. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-08-20 15:45:13 -04:00
Al Viro	8d5930dfb7	skb_copy_and_csum_bits(): don't bother with the last argument it's always 0 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-08-20 15:45:13 -04:00
Peilin Ye	ce51f63e63	net/smc: Prevent kernel-infoleak in __smc_diag_dump() __smc_diag_dump() is potentially copying uninitialized kernel stack memory into socket buffers, since the compiler may leave a 4-byte hole near the beginning of `struct smcd_diag_dmbinfo`. Fix it by initializing `dinfo` with memset(). Fixes: `4b1b7d3b30` ("net/smc: add SMC-D diag support") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-20 12:07:31 -07:00
David Howells	1d4adfaf65	rxrpc: Make rxrpc_kernel_get_srtt() indicate validity Fix rxrpc_kernel_get_srtt() to indicate the validity of the returned smoothed RTT. If we haven't had any valid samples yet, the SRTT isn't useful. Fixes: `c410bf0193` ("rxrpc: Fix the excessive initial retransmission timeout") Signed-off-by: David Howells <dhowells@redhat.com>	2020-08-20 18:21:28 +01:00
David Howells	4700c4d80b	rxrpc: Fix loss of RTT samples due to interposed ACK The Rx protocol has a mechanism to help generate RTT samples that works by a client transmitting a REQUESTED-type ACK when it receives a DATA packet that has the REQUEST_ACK flag set. The peer, however, may interpose other ACKs before transmitting the REQUESTED-ACK, as can be seen in the following trace excerpt: rxrpc_tx_data: c=00000044 DATA d0b5ece8:00000001 00000001 q=00000001 fl=07 rxrpc_rx_ack: c=00000044 00000001 PNG r=00000000 f=00000002 p=00000000 n=0 rxrpc_rx_ack: c=00000044 00000002 REQ r=00000001 f=00000002 p=00000001 n=0 ... DATA packet 1 (q=xx) has REQUEST_ACK set (bit 1 of fl=xx). The incoming ping (labelled PNG) hard-acks the request DATA packet (f=xx exceeds the sequence number of the DATA packet), causing it to be discarded from the Tx ring. The ACK that was requested (labelled REQ, r=xx references the serial of the DATA packet) comes after the ping, but the sk_buff holding the timestamp has gone and the RTT sample is lost. This is particularly noticeable on RPC calls used to probe the service offered by the peer. A lot of peers end up with an unknown RTT because we only ever sent a single RPC. This confuses the server rotation algorithm. Fix this by caching the information about the outgoing packet in RTT calculations in the rxrpc_call struct rather than looking in the Tx ring. A four-deep buffer is maintained and both REQUEST_ACK-flagged DATA and PING-ACK transmissions are recorded in there. When the appropriate response ACK is received, the buffer is checked for a match and, if found, an RTT sample is recorded. If a received ACK refers to a packet with a later serial number than an entry in the cache, that entry is presumed lost and the entry is made available to record a new transmission. ACKs types other than REQUESTED-type and PING-type cause any matching sample to be cancelled as they don't necessarily represent a useful measurement. If there's no space in the buffer on ping/data transmission, the sample base is discarded. Fixes: `50235c4b5a` ("rxrpc: Obtain RTT data by requesting ACKs on DATA packets") Signed-off-by: David Howells <dhowells@redhat.com>	2020-08-20 17:59:27 +01:00
David Howells	68528d937d	rxrpc: Keep the ACK serial in a var in rxrpc_input_ack() Keep the ACK serial number in a variable in rxrpc_input_ack() as it's used frequently. Signed-off-by: David Howells <dhowells@redhat.com>	2020-08-20 16:52:23 +01:00
Johannes Berg	fce2ff728f	nl80211: fix NL80211_ATTR_HE_6GHZ_CAPABILITY usage In nl80211_set_station(), we check NL80211_ATTR_HE_6GHZ_CAPABILITY and then use NL80211_ATTR_HE_CAPABILITY, which is clearly wrong. Fix this to use NL80211_ATTR_HE_6GHZ_CAPABILITY as well. Cc: stable@vger.kernel.org Fixes: `43e64bf301` ("cfg80211: handle 6 GHz capability of new station") Link: https://lore.kernel.org/r/20200805153516.310cef625955.I0abc04dc8abb2c7c005c88ef8fa2d0e3c9fb95c4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-20 17:23:03 +02:00
Alexei Starovoitov	d71fa5c976	bpf: Add kernel module with user mode driver that populates bpffs. Add kernel module with user mode driver that populates bpffs with BPF iterators. $ mount bpffs /my/bpffs/ -t bpf $ ls -la /my/bpffs/ total 4 drwxrwxrwt 2 root root 0 Jul 2 00:27 . drwxr-xr-x 19 root root 4096 Jul 2 00:09 .. -rw------- 1 root root 0 Jul 2 00:27 maps.debug -rw------- 1 root root 0 Jul 2 00:27 progs.debug The user mode driver will load BPF Type Formats, create BPF maps, populate BPF maps, load two BPF programs, attach them to BPF iterators, and finally send two bpf_link IDs back to the kernel. The kernel will pin two bpf_links into newly mounted bpffs instance under names "progs.debug" and "maps.debug". These two files become human readable. $ cat /my/bpffs/progs.debug id name attached 11 dump_bpf_map bpf_iter_bpf_map 12 dump_bpf_prog bpf_iter_bpf_prog 27 test_pkt_access 32 test_main test_pkt_access test_pkt_access 33 test_subprog1 test_pkt_access_subprog1 test_pkt_access 34 test_subprog2 test_pkt_access_subprog2 test_pkt_access 35 test_subprog3 test_pkt_access_subprog3 test_pkt_access 36 new_get_skb_len get_skb_len test_pkt_access 37 new_get_skb_ifindex get_skb_ifindex test_pkt_access 38 new_get_constant get_constant test_pkt_access The BPF program dump_bpf_prog() in iterators.bpf.c is printing this data about all BPF programs currently loaded in the system. This information is unstable and will change from kernel to kernel as ".debug" suffix conveys. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200819042759.51280-4-alexei.starovoitov@gmail.com	2020-08-20 16:02:36 +02:00
Florian Westphal	cc5453a5b7	netfilter: conntrack: allow sctp hearbeat after connection re-use If an sctp connection gets re-used, heartbeats are flagged as invalid because their vtag doesn't match. Handle this in a similar way as TCP conntrack when it suspects that the endpoints and conntrack are out-of-sync. When a HEARTBEAT request fails its vtag validation, flag this in the conntrack state and accept the packet. When a HEARTBEAT_ACK is received with an invalid vtag in the reverse direction after we allowed such a HEARTBEAT through, assume we are out-of-sync and re-set the vtag info. v2: remove left-over snippet from an older incarnation that moved new_state/old_state assignments, thats not needed so keep that as-is. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-20 14:13:49 +02:00
Kurt Kanzenbach	bdfbb63c31	ptp: Add generic ptp v2 header parsing function Reason: A lot of the ptp drivers - which implement hardware time stamping - need specific fields such as the sequence id from the ptp v2 header. Currently all drivers implement that themselves. Introduce a generic function to retrieve a pointer to the start of the ptp v2 header. Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-19 16:07:49 -07:00
Johannes Berg	d1fb555929	netlink: fix state reallocation in policy export Evidently, when I did this previously, we didn't have more than 10 policies and didn't run into the reallocation path, because it's missing a memset() for the unused policies. Fix that. Fixes: `d07dcf9aad` ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-19 15:39:36 -07:00
Miaohe Lin	f4ecc74853	net: Stop warning about SO_BSDCOMPAT usage We've been warning about SO_BSDCOMPAT usage for many years. We may remove this code completely now. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-19 12:53:48 -07:00
Wang Hai	ad112aa8b1	SUNRPC: remove duplicate include Remove linux/sunrpc/auth_gss.h which is included more than once Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-08-19 13:19:42 -04:00
Colin Ian King	ad6641189c	net: ipv4: remove duplicate "the the" phrase in Kconfig text The Kconfig help text contains the phrase "the the" in the help text. Fix this and reformat the block of help text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 16:02:16 -07:00
Maxim Mikityanskiy	f01204ec8b	ethtool: Don't omit the netlink reply if no features were changed The legacy ethtool userspace tool shows an error when no features could be changed. It's useful to have a netlink reply to be able to show this error when __netdev_update_features wasn't called, for example: 1. ethtool -k eth0 large-receive-offload: off 2. ethtool -K eth0 rx-fcs on 3. ethtool -K eth0 lro on Could not change any device features rx-lro: off [requested on] 4. ethtool -K eth0 lro on # The output should be the same, but without this patch the kernel # doesn't send the reply, and ethtool is unable to detect the error. This commit makes ethtool-netlink always return a reply when requested, and it still avoids unnecessary calls to __netdev_update_features if the wanted features haven't changed. Fixes: `0980bfcd69` ("ethtool: set netdev features with FEATURES_SET request") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 16:00:24 -07:00
Maxim Mikityanskiy	2847bfed88	ethtool: Account for hw_features in netlink interface ethtool-netlink ignores dev->hw_features and may confuse the drivers by asking them to enable features not in the hw_features bitmask. For example: 1. ethtool -k eth0 tls-hw-tx-offload: off [fixed] 2. ethtool -K eth0 tls-hw-tx-offload on tls-hw-tx-offload: on 3. ethtool -k eth0 tls-hw-tx-offload: on [fixed] Fitler out dev->hw_features from req_wanted to fix it and to resemble the legacy ethtool behavior. Fixes: `0980bfcd69` ("ethtool: set netdev features with FEATURES_SET request") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 16:00:24 -07:00
Maxim Mikityanskiy	840110a4ea	ethtool: Fix preserving of wanted feature bits in netlink interface Currently, ethtool-netlink calculates new wanted bits as: (req_wanted & req_mask) \| (old_active & ~req_mask) It completely discards the old wanted bits, so they are forgotten with the next ethtool command. Sample steps to reproduce: 1. ethtool -k eth0 tx-tcp-segmentation: on # TSO is on from the beginning 2. ethtool -K eth0 tx off tx-tcp-segmentation: off [not requested] 3. ethtool -k eth0 tx-tcp-segmentation: off [requested on] 4. ethtool -K eth0 rx off # Some change unrelated to TSO 5. ethtool -k eth0 tx-tcp-segmentation: off # "Wanted on" is forgotten This commit fixes it by changing the formula to: (req_wanted & req_mask) \| (old_wanted & ~req_mask), where old_active was replaced by old_wanted to account for the wanted bits. The shortcut condition for the case where nothing was changed now compares wanted bitmasks, instead of wanted to active. Fixes: `0980bfcd69` ("ethtool: set netdev features with FEATURES_SET request") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 16:00:24 -07:00
Xin Long	4ef1a7cb08	ipv6: some fixes for ipv6_dev_find() This patch is to do 3 things for ipv6_dev_find(): As David A. noticed, - rt6_lookup() is not really needed. Different from __ip_dev_find(), ipv6_dev_find() doesn't have a compatibility problem, so remove it. As Hideaki suggested, - "valid" (non-tentative) check for the address is also needed. ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will traverse the address hash list, but it's heavy to be called inside ipv6_dev_find(). This patch is to reuse the code of ipv6_chk_addr_and_flags() for ipv6_dev_find(). - dev parameter is passed into ipv6_dev_find(), as link-local addresses from user space has sin6_scope_id set and the dev lookup needs it. Fixes: `81f6cb3122` ("ipv6: add ipv6_dev_find()") Suggested-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 15:58:53 -07:00
Miaohe Lin	eabe861881	net: handle the return value of pskb_carve_frag_list() correctly pskb_carve_frag_list() may return -ENOMEM in pskb_carve_inside_nonlinear(). we should handle this correctly or we would get wrong sk_buff. Fixes: `6fa01ccd88` ("skbuff: Add pskb_extract() helper function") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 15:56:56 -07:00
Miaohe Lin	e3ec1e8ca0	net: eliminate meaningless memcpy to data in pskb_carve_inside_nonlinear() The frags of skb_shared_info of the data is assigned in following loop. It is meaningless to do a memcpy of frags here. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 15:55:24 -07:00
Miaohe Lin	7f8901b74b	net: tipc: Convert to use the preferred fallthrough macro Convert the uses of fallthrough comments to fallthrough macro. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 12:46:48 -07:00
Johannes Berg	8aa26c575f	netlink: make NLA_BINARY validation more flexible Add range validation for NLA_BINARY, allowing validation of any combination of combination minimum or maximum lengths, using the existing NLA_POLICY_RANGE()/NLA_POLICY_FULL_RANGE() macros, just like for integers where the value is checked. Also make NLA_POLICY_EXACT_LEN(), NLA_POLICY_EXACT_LEN_WARN() and NLA_POLICY_MIN_LEN() special cases of this, removing the old types NLA_EXACT_LEN and NLA_MIN_LEN. This allows us to save some code where both minimum and maximum lengths are requires, currently the policy only allows maximum (NLA_BINARY), minimum (NLA_MIN_LEN) or exact (NLA_EXACT_LEN), so a range of lengths cannot be accepted and must be checked by the code that consumes the attributes later. Also, this allows advertising the correct ranges in the policy export to userspace. Here, NLA_MIN_LEN and NLA_EXACT_LEN already were special cases of NLA_BINARY with min and min/max length respectively. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 12:28:45 -07:00
Johannes Berg	bc04358550	netlink: consistently use NLA_POLICY_MIN_LEN() Change places that open-code NLA_POLICY_MIN_LEN() to use the macro instead, giving us flexibility in how we handle the details of the macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 12:28:45 -07:00
Johannes Berg	8140860c81	netlink: consistently use NLA_POLICY_EXACT_LEN() Change places that open-code NLA_POLICY_EXACT_LEN() to use the macro instead, giving us flexibility in how we handle the details of the macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 12:28:45 -07:00
Jussi Kivilinna	279e89b228	batman-adv: bla: use netif_rx_ni when not in interrupt context batadv_bla_send_claim() gets called from worker thread context through batadv_bla_periodic_work(), thus netif_rx_ni needs to be used in that case. This fixes "NOHZ: local_softirq_pending 08" log messages seen when batman-adv is enabled. Fixes: `23721387c4` ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Jussi Kivilinna <jussi.kivilinna@haltian.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-08-18 19:40:03 +02:00
Linus Lüssing	d8bf0c0164	batman-adv: Fix own OGM check in aggregated OGMs The own OGM check is currently misplaced and can lead to the following issues: For one thing we might receive an aggregated OGM from a neighbor node which has our own OGM in the first place. We would then not only skip our own OGM but erroneously also any other, following OGM in the aggregate. For another, we might receive an OGM aggregate which has our own OGM in a place other then the first one. Then we would wrongly not skip this OGM, leading to populating the orginator and gateway table with ourself. Fixes: `9323158ef9` ("batman-adv: OGMv2 - implement originators logic") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-08-18 19:40:03 +02:00
Sven Eckelmann	303216e76d	batman-adv: Avoid uninitialized chaddr when handling DHCP The gateway client code can try to optimize the delivery of DHCP packets to avoid broadcasting them through the whole mesh. But also transmissions to the client can be optimized by looking up the destination via the chaddr of the DHCP packet. But the chaddr is currently only done when chaddr is fully inside the non-paged area of the skbuff. Otherwise it will not be initialized and the unoptimized path should have been taken. But the implementation didn't handle this correctly. It didn't retrieve the correct chaddr but still tried to perform the TT lookup with this uninitialized memory. Reported-by: syzbot+ab16e463b903f5a37036@syzkaller.appspotmail.com Fixes: `6c413b1c22` ("batman-adv: send every DHCP packet as bat-unicast") Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-08-18 19:40:03 +02:00
Sven Eckelmann	0093870aa8	batman-adv: Migrate to linux/prandom.h The commit `c0842fbc1b` ("random32: move the pseudo-random 32-bit definitions to prandom.h") introduced a new header for the pseudo random functions from (previously) linux/random.h. One future goal of the prandom.h change is to make code to switch just the new header file and to avoid the implicit include. This would allow the removal of the implicit include from random.h Signed-off-by: Sven Eckelmann <sven@narfation.org>	2020-08-18 19:39:54 +02:00
Sven Eckelmann	21ba5ab2aa	batman-adv: Drop repeated words in comments checkpatch found various instances of "Possible repeated word" in various comments. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-08-18 19:39:54 +02:00
Randy Dunlap	6f5b92a79c	batman-adv: types.h: delete duplicated words Delete the doubled word "time" in a comment. Delete the doubled word "address" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-08-18 19:39:53 +02:00
Sven Eckelmann	c3b92dd490	batman-adv: Drop unused function batadv_hardif_remove_interfaces() The function batadv_hardif_remove_interfaces was meant to remove all interfaces which are currently in the list of known (compatible) hardifs during module unload. But the function unregister_netdevice_notifier is called in batadv_exit before batadv_hardif_remove_interfaces. This will trigger NETDEV_UNREGISTER events for all available interfaces and in this process remove all interfaces from batadv_hardif_list. And batadv_hardif_remove_interfaces only operated on this (empty) list. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-08-18 19:39:53 +02:00
Simon Wunderlich	426988ee84	batman-adv: Start new development cycle Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-08-18 19:39:53 +02:00
Linus Torvalds	4cf7562190	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: "Another batch of fixes: 1) Remove nft_compat counter flush optimization, it generates warnings from the refcount infrastructure. From Florian Westphal. 2) Fix BPF to search for build id more robustly, from Jiri Olsa. 3) Handle bogus getopt lengths in ebtables, from Florian Westphal. 4) Infoleak and other fixes to j1939 CAN driver, from Eric Dumazet and Oleksij Rempel. 5) Reset iter properly on mptcp sendmsg() error, from Florian Westphal. 6) Show a saner speed in bonding broadcast mode, from Jarod Wilson. 7) Various kerneldoc fixes in bonding and elsewhere, from Lee Jones. 8) Fix double unregister in bonding during namespace tear down, from Cong Wang. 9) Disable RP filter during icmp_redirect selftest, from David Ahern" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits) otx2_common: Use devm_kcalloc() in otx2_config_npa() net: qrtr: fix usage of idr in port assignment to socket selftests: disable rp_filter for icmp_redirect.sh Revert "net: xdp: pull ethernet header off packet after computing skb->protocol" phylink: <linux/phylink.h>: fix function prototype kernel-doc warning mptcp: sendmsg: reset iter on error redux net: devlink: Remove overzealous WARN_ON with snapshots tipc: not enable tipc when ipv6 works as a module tipc: fix uninit skb->data in tipc_nl_compat_dumpit() net: Fix potential wrong skb->protocol in skb_vlan_untag() net: xdp: pull ethernet header off packet after computing skb->protocol ipvlan: fix device features bonding: fix a potential double-unregister can: j1939: add rxtimer for multipacket broadcast session can: j1939: abort multipacket broadcast session when timeout occurs can: j1939: cancel rxtimer on multipacket broadcast session complete can: j1939: fix support for multipacket broadcast message net: fddi: skfp: cfm: Remove seemingly unused variable 'ID_sccs' net: fddi: skfp: cfm: Remove set but unused variable 'oldstate' net: fddi: skfp: smt: Remove seemingly unused variable 'ID_sccs' ...	2020-08-17 17:09:50 -07:00
Necip Fazil Yildiran	8dfddfb796	net: qrtr: fix usage of idr in port assignment to socket Passing large uint32 sockaddr_qrtr.port numbers for port allocation triggers a warning within idr_alloc() since the port number is cast to int, and thus interpreted as a negative number. This leads to the rejection of such valid port numbers in qrtr_port_assign() as idr_alloc() fails. To avoid the problem, switch to idr_alloc_u32() instead. Fixes: `bdabad3e36` ("net: Add Qualcomm IPC router") Reported-by: syzbot+f31428628ef672716ea8@syzkaller.appspotmail.com Signed-off-by: Necip Fazil Yildiran <necip@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-17 15:00:41 -07:00
David S. Miller	7f9bf6e824	Revert "net: xdp: pull ethernet header off packet after computing skb->protocol" This reverts commit `f8414a8d88`. eth_type_trans() does the necessary pull on the skb. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-17 11:48:05 -07:00
Sabrina Dubroca	4eb2e13415	espintcp: restore IP CB before handing the packet to xfrm Xiumei reported a bug with espintcp over IPv6 in transport mode, because xfrm6_transport_finish expects to find IP6CB data (struct inet6_skb_cb). Currently, espintcp zeroes the CB, but the relevant part is actually preserved by previous layers (first set up by tcp, then strparser only zeroes a small part of tcp_skb_tb), so we can just relocate it to the start of skb->cb. Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-08-17 15:58:04 +02:00
Florian Westphal	b3b2854dcf	mptcp: sendmsg: reset iter on error redux This fix wasn't correct: When this function is invoked from the retransmission worker, the iterator contains garbage and resetting it causes a crash. As the work queue should not be performance critical also zero the msghdr struct. Fixes: `3575938313` "(mptcp: sendmsg: reset iter on error)" Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-16 21:11:37 -07:00
Andrew Lunn	bd71ea6067	net: devlink: Remove overzealous WARN_ON with snapshots It is possible to trigger this WARN_ON from user space by triggering a devlink snapshot with an ID which already exists. We don't need both -EEXISTS being reported and spamming the kernel log. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-16 21:07:06 -07:00
Xin Long	c530189905	tipc: not enable tipc when ipv6 works as a module When using ipv6_dev_find() in one module, it requires ipv6 not to work as a module. Otherwise, this error occurs in build: undefined reference to `ipv6_dev_find'. So fix it by adding "depends on IPV6 \|\| IPV6=n" to tipc/Kconfig, as it does in sctp/Kconfig. Fixes: `5a6f6f5791` ("tipc: set ub->ifindex for local ipv6 address") Reported-by: kernel test robot <lkp@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-16 21:04:55 -07:00
Cong Wang	47733f9daf	tipc: fix uninit skb->data in tipc_nl_compat_dumpit() __tipc_nl_compat_dumpit() has two callers, and it expects them to pass a valid nlmsghdr via arg->data. This header is artificial and crafted just for __tipc_nl_compat_dumpit(). tipc_nl_compat_publ_dump() does so by putting a genlmsghdr as well as some nested attribute, TIPC_NLA_SOCK. But the other caller tipc_nl_compat_dumpit() does not, this leaves arg->data uninitialized on this call path. Fix this by just adding a similar nlmsghdr without any payload in tipc_nl_compat_dumpit(). This bug exists since day 1, but the recent commit `6ea67769ff` ("net: tipc: prepare attrs in __tipc_nl_compat_dumpit()") makes it easier to appear. Reported-and-tested-by: syzbot+0e7181deafa7e0b79923@syzkaller.appspotmail.com Fixes: `d0796d1ef6` ("tipc: convert legacy nl bearer dump to nl compat") Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-16 21:03:19 -07:00
David S. Miller	8c26544f5a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Endianness issue in IPv4 option support in nft_exthdr, from Stephen Suryaputra. 2) Removes the waitcount optimization in nft_compat, from Florian Westphal. 3) Remove ipv6 -> nf_defrag_ipv6 module dependency, from Florian Westphal. 4) Memleak in chain binding support, also from Florian. 5) Simplify nft_flowtable.sh selftest, from Fabian Frederick. 6) Optional MTU arguments for selftest nft_flowtable.sh, also from Fabian. 7) Remove noise error report when killing process in selftest nft_flowtable.sh, from Fabian Frederick. 8) Reject bogus getsockopt option length in ebtables, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-16 16:05:36 -07:00
David S. Miller	71a50419c7	linux-can-fixes-for-5.9-20200815 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl83p3MTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqcsSB/9YMSKfcLUNv8eB/J3kPb4n6wlrBO8d BLHGddA5+qVbki5Tx3rQ7qA+fj60v2bVWQYiwC4okBIE3h+uIi/vq+S+oYjA7ncA E6bd0804AgKYbsGWaYMQOrOfVlLyJOk886XOsd2szkpopdPCdgHz45RVgx+wXkk/ 4fUgeiGuBSNI7Kgei6w+OrMqB+dXyUZR5D0MePobVnN4vMjEZd4UsD6TAB3y4C3L Ar9lAj/uPiiUEq/ow7J4kYtfw3DibMFM6JRAJ5+B7dXmBZ4TeAPxlG9XxmAKKHMX R/QxkyZnkE3AIba6YHpIeYLQZyFa41363W5Z70rsA+fQ5Mig5jWXoNnY =hWSB -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.9-20200815' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2020-08-15 this is a pull request of 4 patches for net/master. All patches are by Zhang Changzhong and fix broadcast related problems in the j1939 CAN networking stack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-16 16:04:14 -07:00
Miaohe Lin	55eff0eb74	net: Fix potential wrong skb->protocol in skb_vlan_untag() We may access the two bytes after vlan_hdr in vlan_set_encap_proto(). So we should pull VLAN_HLEN + sizeof(unsigned short) in skb_vlan_untag() or we may access the wrong data. Fixes: `0d5501c1c8` ("net: Always untag vlan-tagged traffic on input.") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-16 15:33:49 -07:00
Jason A. Donenfeld	f8414a8d88	net: xdp: pull ethernet header off packet after computing skb->protocol When an XDP program changes the ethernet header protocol field, eth_type_trans is used to recalculate skb->protocol. In order for eth_type_trans to work correctly, the ethernet header must actually be part of the skb data segment, so the code first pushes that onto the head of the skb. However, it subsequently forgets to pull it back off, making the behavior of the passed-on packet inconsistent between the protocol modifying case and the static protocol case. This patch fixes the issue by simply pulling the ethernet header back off of the skb head. Fixes: `2972495699` ("net: fix generic XDP to handle if eth header was mangled") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-16 15:28:18 -07:00
Linus Torvalds	410520d07f	9p pull request for inclusion in 5.9 - some code cleanup - a couple of static analysis fixes - setattr: try to pick a fid associated with the file rather than the dentry, which might sometimes matter -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAl83dvkACgkQq06b7GqY 5nDk4g/+KNpbIFHhDyhCzmZ1WbaUYlZseFm8VBR9dzwOkXCuVl+WQd9ducYlEifT cZ29xmEuom01LhFOs6VzC+IODQq3h1WvEbVRfCSlLEWvfHyPODSDY8mrn+1TdOZL UOOP1wUjaDqmAcBQ4PSDNmj2cONNWPqkgnuyswjsBpabOY/U8FxG9J/QiPfC12zs QoF5800OiuLYS5SefI9kKJ47DkTLpm/aq7/Z1liAxJBzHrguORjg16cssXhWeysD jEMD3pIrsqMDX/vNkAO0DaR8FWvxTFZR1Bza1ylJiUy3K3rLXKfo+42jiCnuo5e9 uhA012RfiCZB3kIsLe5hCPXUWz0MhizaJAPkceHsgUtkuiU2qXyY05HEvS2HHl0s inqR8gRkZ2NhROx1R/0q+Wu2posCBderDwg7Mkzt/7j5Ue3cTIKvbeBxH9iLLaK+ rHp0NbfP6wYN5qKjH1rru1IOmaUCV2HHV6x/alnwaVBfZqg5HYKkzlS7M+5hAbtb niLFXXCtPBGqS+u9w7sruj6utuiJjXBfB1ejaawY+frS41+v3NkHjjUWCH67u8I9 u3pmPpvm5o7gRoRmvmxEdSb+jAYZwLTcbkUhRMHLQOHCBZXtcul41p2BUpoBm0zw 42mlMMMBAjaxtzLlryvACg5lx6x7ykE4pExKAc0dzrzWfgzwEZQ= =hwGq -----END PGP SIGNATURE----- Merge tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - some code cleanup - a couple of static analysis fixes - setattr: try to pick a fid associated with the file rather than the dentry, which might sometimes matter * tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux: 9p: Remove unneeded cast from memory allocation 9p: remove unused code in 9p net/9p: Fix sparse endian warning in trans_fd.c 9p: Fix memory leak in v9fs_mount 9p: retrieve fid from file when file instance exist.	2020-08-15 08:34:36 -07:00
Linus Torvalds	37711e5e23	NFS client updates for Linux 5.9 Highlights include: Stable fixes: - pNFS: Don't return layout segments that are being used for I/O - pNFS: Don't move layout segments off the active list when being used for I/O Features: - NFS: Add support for user xattrs through the NFSv4.2 protocol - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC - NFSv4.0 allow nconnect for v4.0 Bugfixes and cleanups: - nfs: ensure correct writeback errors are returned on close() - nfs: nfs_file_write() should check for writeback errors - nfs: Fix getxattr kernel panic and memory overflow - NFS: Fix the pNFS/flexfiles mirrored read failover code - SUNRPC: dont update timeout value on connection reset - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS - sunrpc: destroy rpc_inode_cachep after unregister_filesystem -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl83CYEACgkQZwvnipYK APLx4g/9EQNTG5VkUToElRHs4b3vFxbmh9Odnk/JwPHxY5GQ8/AyGqwWHBgMZc7/ 2AV/C83pk7pJsDNsKbVAaFCT1cwjmItHM63vKJYGBYbE8LhkZ/1sYkdtPBYwHoVl 7CWfpVY/4NjYw5GJfrVA5Y0m7lrQInRtIMzfaENw2tpw+/cKUpadxgEJltzFNvpa Ploinr1ZRBl1tvfeHNRP5ZMPk2AfgGWtQKQ/b2UWUk5tXALoQm2Eu+/oku39uqhy hW5tCbU2BzR91gg5JwF9n7VowkCHXfe7lMzDBTVfwZOELOmoyys/1wKv550FWcWi yymljWiPGZOnXGT1vfKptPESQjdtElMfanvEZ0BzS+yNR0HZGnIupaxGlYlG9ZGU 2sXHQPp3mk2Q+L1IgbTSCnSju0YlZo32JQpYCZiROjIXnPWPQ50YNhr8GL18M1FW hTeShg2avWH+59GB6moEBmsuvui7Dy1jkimblToLEoGJ4kbvEl72FYSqTCkAXXbB rVUzhmJFgfk/EOS4d+QKJoBqNzw3aw79wyT7PLkoCYBqPZBHQexlmI+ktMbgUEdw c/fM7l5/Vcb9weIHKzul2Jbk5q6bFME/xPnIkr3v/oKIFlLFwQ04BX6R/42AMJHw V5Q9Wp81Vy6RXQMn8P0ZMeY0WQC/rhpijOEVMUC+Ni+spz44AdM= =k4pE -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Stable fixes: - pNFS: Don't return layout segments that are being used for I/O - pNFS: Don't move layout segments off the active list when being used for I/O Features: - NFS: Add support for user xattrs through the NFSv4.2 protocol - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC - NFSv4.0 allow nconnect for v4.0 Bugfixes and cleanups: - nfs: ensure correct writeback errors are returned on close() - nfs: nfs_file_write() should check for writeback errors - nfs: Fix getxattr kernel panic and memory overflow - NFS: Fix the pNFS/flexfiles mirrored read failover code - SUNRPC: dont update timeout value on connection reset - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS - sunrpc: destroy rpc_inode_cachep after unregister_filesystem" * tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits) NFS: Fix flexfiles read failover fs: nfs: delete repeated words in comments rpc_pipefs: convert comma to semicolon nfs: Fix getxattr kernel panic and memory overflow NFS: Don't return layout segments that are in use NFS: Don't move layouts to plh_return_segs list while in use NFS: Add layout segment info to pnfs read/write/commit tracepoints NFS: Add tracepoints for layouterror and layoutstats. NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close() SUNRPC dont update timeout value on connection reset nfs: nfs_file_write() should check for writeback errors nfs: ensure correct writeback errors are returned on close() NFSv4.2: xattr cache: get rid of cache discard work queue NFS: remove redundant initialization of variable result NFSv4.0 allow nconnect for v4.0 freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS sunrpc: destroy rpc_inode_cachep after unregister_filesystem NFSv4.2: add client side xattr caching. NFSv4.2: hook in the user extended attribute handlers NFSv4.2: add the extended attribute proc functions. ...	2020-08-15 08:26:55 -07:00
Zhang Changzhong	0ae18a8268	can: j1939: add rxtimer for multipacket broadcast session According to SAE J1939/21 (Chapter 5.12.3 and APPENDIX C), for transmit side the required time interval between packets of a multipacket broadcast message is 50 to 200 ms, the responder shall use a timeout of 250ms (provides margin allowing for the maximumm spacing of 200ms). For receive side a timeout will occur when a time of greater than 750 ms elapsed between two message packets when more packets were expected. So this patch fix and add rxtimer for multipacket broadcast session. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1596599425-5534-5-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-15 11:12:58 +02:00
Zhang Changzhong	2b8b2e3155	can: j1939: abort multipacket broadcast session when timeout occurs If timeout occurs, j1939_tp_rxtimer() first calls hrtimer_start() to restart rxtimer, and then calls __j1939_session_cancel() to set session->state = J1939_SESSION_WAITING_ABORT. At next timeout expiration, because of the J1939_SESSION_WAITING_ABORT session state j1939_tp_rxtimer() will call j1939_session_deactivate_activate_next() to deactivate current session, and rxtimer won't be set. But for multipacket broadcast session, __j1939_session_cancel() don't set session->state = J1939_SESSION_WAITING_ABORT, thus current session won't be deactivate and hrtimer_start() is called to start new rxtimer again and again. So fix it by moving session->state = J1939_SESSION_WAITING_ABORT out of if (!j1939_cb_is_broadcast(&session->skcb)) statement. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1596599425-5534-4-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-15 11:12:58 +02:00
Zhang Changzhong	e8b1765308	can: j1939: cancel rxtimer on multipacket broadcast session complete If j1939_xtp_rx_dat_one() receive last frame of multipacket broadcast message, j1939_session_timers_cancel() should be called to cancel rxtimer. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1596599425-5534-3-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-15 11:12:57 +02:00
Zhang Changzhong	f4fd77fd87	can: j1939: fix support for multipacket broadcast message Currently j1939_tp_im_involved_anydir() in j1939_tp_recv() check the previously set flags J1939_ECU_LOCAL_DST and J1939_ECU_LOCAL_SRC of incoming skb, thus multipacket broadcast message was aborted by receive side because it may come from remote ECUs and have no exact dst address. Similarly, j1939_tp_cmd_recv() and j1939_xtp_rx_dat() didn't process broadcast message. So fix it by checking and process broadcast message in j1939_tp_recv(), j1939_tp_cmd_recv() and j1939_xtp_rx_dat(). Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1596599425-5534-2-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-15 11:12:57 +02:00
David S. Miller	10a3b7c1c3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2020-08-15 The following pull-request contains BPF updates for your net tree. We've added 23 non-merge commits during the last 4 day(s) which contain a total of 32 files changed, 421 insertions(+), 141 deletions(-). The main changes are: 1) Fix sock_ops ctx access splat due to register override, from John Fastabend. 2) Batch of various fixes to libbpf, bpftool, and selftests when testing build in 32-bit mode, from Andrii Nakryiko. 3) Fix vmlinux.h generation on ARM by mapping GCC built-in types (__Poly*_t) to equivalent ones clang can work with, from Jean-Philippe Brucker. 4) Fix build_id lookup in bpf_get_stackid() helper by walking all NOTE ELF sections instead of just first, from Jiri Olsa. 5) Avoid use of __builtin_offsetof() in libbpf for CO-RE, from Yonghong Song. 6) Fix segfault in test_mmap due to inconsistent length params, from Jianlin Lv. 7) Don't override errno in libbpf when logging errors, from Toke Høiland-Jørgensen. 8) Fix v4_to_v6 sockaddr conversion in sk_lookup test, from Stanislav Fomichev. 9) Add link to bpf-helpers(7) man page to BPF doc, from Joe Stringer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-14 17:12:23 -07:00
Florian Westphal	3575938313	mptcp: sendmsg: reset iter on error Once we've copied data from the iterator we need to revert in case we end up not sending any data. This bug doesn't trigger with normal 'poll' based tests, because we only feed a small chunk of data to kernel after poll indicated POLLOUT. With blocking IO and large writes this triggers. Receiver ends up with less data than it should get. Fixes: `72511aab95` ("mptcp: avoid blocking in tcp_sendpages") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-14 14:11:37 -07:00
David S. Miller	e591d298cc	linux-can-fixes-for-5.9-20200814 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl82a18THG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqVbYB/9wF0ekQk66ES0PlIKkTDqDlt8hH8gN a86yOFow3jjOWOlol1wDyCqh4i+JD0PgRPTCmXJoFYfXUsNwjRQt2ZqVgOlS0dR3 wmxxNw0i7SQ3+LDEf3Dofgk39Y5+VSQGfT3fvM8FfHWi9Mkxmswi2EwCBMx7Cqyp lghxsJyZoeSnwfx7Wcel75NnYID1WscKUyXR6Qd240V1h7s3oAt6ZdDnYLPJnd7M zeRSY+HgDa98HBUMmQi627douQcTtXyE5Zpx/PkkRGEOYIS/SEF8XmA2cQ0Vu8R/ R/u/htBqoV7OlL5XdyH1l2+HDOu0M1EKZiNWYMV34yobUqkrGg0oiJut =NuYv -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.9-20200814' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2020-08-14 this is a pull request of 6 patches for net/master. All patches fix problems in the j1939 CAN networking stack. The first patch is by Eric Dumazet fixes a kernel-infoleak in j1939_sk_sock2sockaddr_can(). The remaining 5 patches are by Oleksij Rempel and fix recption of j1939 messages not orginated by the stack, a use-after-free in j1939_tp_txtimer(), ensure that the CAN driver has a ml_priv allocated. These problem were found by google's syzbot. Further ETP sessions with block size of less than 255 are fixed and a sanity check was added to j1939_xtp_rx_dat_one() to detect packet corruption. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-14 13:57:56 -07:00
Oleksij Rempel	e052d05402	can: j1939: transport: j1939_xtp_rx_dat_one(): compare own packets to detect corruptions Since the stack relays on receiving own packets, it was overwriting own transmit buffer from received packets. At least theoretically, the received echo buffer can be corrupt or changed and the session partner can request to resend previous data. In this case we will re-send bad data. With this patch we will stop to overwrite own TX buffer and use it for sanity checking. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20200807105200.26441-6-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-14 12:38:47 +02:00
Oleksij Rempel	840835c928	can: j1939: transport: add j1939_session_skb_find_by_offset() function Sometimes it makes no sense to search the skb by pkt.dpo, since we need next the skb within the transaction block. This may happen if we have an ETP session with CTS set to less than 255 packets. After this patch, we will be able to work with ETP sessions where the block size (ETP.CM_CTS byte 2) is less than 255 packets. Reported-by: Henrique Figueira <henrislip@gmail.com> Reported-by: https://github.com/linux-can/can-utils/issues/228 Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20200807105200.26441-5-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-14 12:38:47 +02:00
Oleksij Rempel	af804b7826	can: j1939: socket: j1939_sk_bind(): make sure ml_priv is allocated This patch adds check to ensure that the struct net_device::ml_priv is allocated, as it is used later by the j1939 stack. The allocation is done by all mainline CAN network drivers, but when using bond or team devices this is not the case. Bail out if no ml_priv is allocated. Reported-by: syzbot+f03d384f3455d28833eb@syzkaller.appspotmail.com Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Cc: linux-stable <stable@vger.kernel.org> # >= v5.4 Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20200807105200.26441-4-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-14 12:38:47 +02:00
Oleksij Rempel	cd3b3636c9	can: j1939: transport: j1939_session_tx_dat(): fix use-after-free read in j1939_tp_txtimer() The current stack implementation do not support ECTS requests of not aligned TP sized blocks. If ECTS will request a block with size and offset spanning two TP blocks, this will cause memcpy() to read beyond the queued skb (which does only contain one TP sized block). Sometimes KASAN will detect this read if the memory region beyond the skb was previously allocated and freed. In other situations it will stay undetected. The ETP transfer in any case will be corrupted. This patch adds a sanity check to avoid this kind of read and abort the session with error J1939_XTP_ABORT_ECTS_TOO_BIG. Reported-by: syzbot+5322482fe520b02aea30@syzkaller.appspotmail.com Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Cc: linux-stable <stable@vger.kernel.org> # >= v5.4 Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20200807105200.26441-3-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-14 12:38:47 +02:00
Oleksij Rempel	b43e3a82bc	can: j1939: transport: j1939_simple_recv(): ignore local J1939 messages send not by J1939 stack In current J1939 stack implementation, we process all locally send messages as own messages. Even if it was send by CAN_RAW socket. To reproduce it use following commands: testj1939 -P -r can0:0x80 & cansend can0 18238040#0123 This step will trigger false positive not critical warning: j1939_simple_recv: Received already invalidated message With this patch we add additional check to make sure, related skb is own echo message. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20200807105200.26441-2-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-14 12:38:47 +02:00
Eric Dumazet	38ba8b9241	can: j1939: fix kernel-infoleak in j1939_sk_sock2sockaddr_can() syzbot found that at least 2 bytes of kernel information were leaked during getsockname() on AF_CAN CAN_J1939 socket. Since struct sockaddr_can has in fact two holes, simply clear the whole area before filling it with useful data. BUG: KMSAN: kernel-infoleak in kmsan_copy_to_user+0x81/0x90 mm/kmsan/kmsan_hooks.c:253 CPU: 0 PID: 8466 Comm: syz-executor511 Not tainted 5.8.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 kmsan_internal_check_memory+0x238/0x3d0 mm/kmsan/kmsan.c:423 kmsan_copy_to_user+0x81/0x90 mm/kmsan/kmsan_hooks.c:253 instrument_copy_to_user include/linux/instrumented.h:91 [inline] _copy_to_user+0x18e/0x260 lib/usercopy.c:39 copy_to_user include/linux/uaccess.h:186 [inline] move_addr_to_user+0x3de/0x670 net/socket.c:237 __sys_getsockname+0x407/0x5e0 net/socket.c:1909 __do_sys_getsockname net/socket.c:1920 [inline] __se_sys_getsockname+0x91/0xb0 net/socket.c:1917 __x64_sys_getsockname+0x4a/0x70 net/socket.c:1917 do_syscall_64+0xad/0x160 arch/x86/entry/common.c:386 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x440219 Code: Bad RIP value. RSP: 002b:00007ffe5ee150c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000033 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440219 RDX: 0000000020000240 RSI: 0000000020000100 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401a20 R13: 0000000000401ab0 R14: 0000000000000000 R15: 0000000000000000 Local variable ----address@__sys_getsockname created at: __sys_getsockname+0x91/0x5e0 net/socket.c:1894 __sys_getsockname+0x91/0x5e0 net/socket.c:1894 Bytes 2-3 of 24 are uninitialized Memory access of size 24 starts at ffff8880ba2c7de8 Data copied to user address 0000000020000100 Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Robin van der Gracht <robin@protonic.nl> Cc: Oleksij Rempel <o.rempel@pengutronix.de> Cc: Pengutronix Kernel Team <kernel@pengutronix.de> Cc: linux-can@vger.kernel.org Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20200813161834.4021638-1-edumazet@google.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-08-14 12:31:10 +02:00
Florian Westphal	5c04da55c7	netfilter: ebtables: reject bogus getopt len value syzkaller reports splat: ------------[ cut here ]------------ Buffer overflow detected (80 < 137)! Call Trace: do_ebt_get_ctl+0x2b4/0x790 net/bridge/netfilter/ebtables.c:2317 nf_getsockopt+0x72/0xd0 net/netfilter/nf_sockopt.c:116 ip_getsockopt net/ipv4/ip_sockglue.c:1778 [inline] caused by a copy-to-user with a too-large "len" value. This adds a argument check on len just like in the non-compat version of the handler. Before the "Fixes" commit, the reproducer fails with -EINVAL as expected: 1. core calls the "compat" getsockopt version 2. compat getsockopt version detects the len value is possibly in 64-bit layout (len != compat_len) 3. compat getsockopt version delegates everything to native getsockopt version 4. native getsockopt rejects invalid len -> compat handler only sees len == sizeof(compat_struct) for GET_ENTRIES. After the refactor, event sequence is: 1. getsockopt calls "compat" version (len != native_len) 2. compat version attempts to copy len bytes, where *len is random value from userspace Fixes: `fc66de8e16` ("netfilter/ebtables: clean up compat {get, set}sockopt handling") Reported-by: syzbot+5accb5c62faa1d346480@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-14 11:59:08 +02:00
Linus Torvalds	a1d21081a6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: "Some merge window fallout, some longer term fixes: 1) Handle headroom properly in lapbether and x25_asy drivers, from Xie He. 2) Fetch MAC address from correct r8152 device node, from Thierry Reding. 3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg, from Rouven Czerwinski. 4) Correct fdputs in socket layer, from Miaohe Lin. 5) Revert troublesome sockptr_t optimization, from Christoph Hellwig. 6) Fix TCP TFO key reading on big endian, from Jason Baron. 7) Missing CAP_NET_RAW check in nfc, from Qingyu Li. 8) Fix inet fastreuse optimization with tproxy sockets, from Tim Froidcoeur. 9) Fix 64-bit divide in new SFC driver, from Edward Cree. 10) Add a tracepoint for prandom_u32 so that we can more easily perform usage analysis. From Eric Dumazet. 11) Fix rwlock imbalance in AF_PACKET, from John Ogness" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits) net: openvswitch: introduce common code for flushing flows af_packet: TPACKET_V3: fix fill status rwlock imbalance random32: add a tracepoint for prandom_u32() Revert "ipv4: tunnel: fix compilation on ARCH=um" net: accept an empty mask in /sys/class/net//queues/rx-/rps_cpus net: ethernet: stmmac: Disable hardware multicast filter net: stmmac: dwmac1000: provide multicast filter fallback ipv4: tunnel: fix compilation on ARCH=um vsock: fix potential null pointer dereference in vsock_poll() sfc: fix ef100 design-param checking net: initialize fastreuse on inet_inherit_port net: refactor bind_bucket fastreuse into helper net: phy: marvell10g: fix null pointer dereference net: Fix potential memory leak in proto_register() net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc() net/nfc/rawsock.c: add CAP_NET_RAW check. hinic: fix strncpy output truncated compile warnings drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check net/tls: Fix kmap usage ...	2020-08-13 20:03:11 -07:00
Tonghao Zhang	1f3a090b90	net: openvswitch: introduce common code for flushing flows To avoid some issues, for example RCU usage warning and double free, we should flush the flows under ovs_lock. This patch refactors table_instance_destroy and introduces table_instance_flow_flush which can be invoked by __dp_destroy or ovs_flow_tbl_flush. Fixes: `50b0e61b32` ("net: openvswitch: fix possible memleak on destroy flow-table") Reported-by: Johan Knöös <jknoos@google.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-13 15:53:30 -07:00
John Ogness	88fd1cb80d	af_packet: TPACKET_V3: fix fill status rwlock imbalance After @blk_fill_in_prog_lock is acquired there is an early out vnet situation that can occur. In that case, the rwlock needs to be released. Also, since @blk_fill_in_prog_lock is only acquired when @tp_version is exactly TPACKET_V3, only release it on that exact condition as well. And finally, add sparse annotation so that it is clearer that prb_fill_curr_block() and prb_clear_blk_fill_status() are acquiring and releasing @blk_fill_in_prog_lock, respectively. sparse is still unable to understand the balance, but the warnings are now on a higher level that make more sense. Fixes: `632ca50f2c` ("af_packet: TPACKET_V3: replace busy-wait loop") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-13 15:37:30 -07:00
John Fastabend	84f44df664	bpf: sock_ops sk access may stomp registers when dst_reg = src_reg Similar to patch ("bpf: sock_ops ctx access may stomp registers") if the src_reg = dst_reg when reading the sk field of a sock_ops struct we generate xlated code, 53: (61) r9 = (u32 )(r9 +28) 54: (15) if r9 == 0x0 goto pc+3 56: (79) r9 = (u64 )(r9 +0) This stomps on the r9 reg to do the sk_fullsock check and then when reading the skops->sk field instead of the sk pointer we get the sk_fullsock. To fix use similar pattern noted in the previous fix and use the temp field to save/restore a register used to do sk_fullsock check. After the fix the generated xlated code reads, 52: (7b) (u64 )(r9 +32) = r8 53: (61) r8 = (u32 )(r9 +28) 54: (15) if r9 == 0x0 goto pc+3 55: (79) r8 = (u64 )(r9 +32) 56: (79) r9 = (u64 )(r9 +0) 57: (05) goto pc+1 58: (79) r8 = (u64 )(r9 +32) Here r9 register was in-use so r8 is chosen as the temporary register. In line 52 r8 is saved in temp variable and at line 54 restored in case fullsock != 0. Finally we handle fullsock == 0 case by restoring at line 58. This adds a new macro SOCK_OPS_GET_SK it is almost possible to merge this with SOCK_OPS_GET_FIELD, but I found the extra branch logic a bit more confusing than just adding a new macro despite a bit of duplicating code. Fixes: `1314ef5611` ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718349653.4728.6559437186853473612.stgit@john-Precision-5820-Tower	2020-08-13 22:40:40 +02:00
John Fastabend	fd09af0107	bpf: sock_ops ctx access may stomp registers in corner case I had a sockmap program that after doing some refactoring started spewing this splat at me: [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 [...] [18610.807359] Call Trace: [18610.807370] ? 0xffffffffc114d0d5 [18610.807382] __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0 [18610.807391] tcp_connect+0x895/0xd50 [18610.807400] tcp_v4_connect+0x465/0x4e0 [18610.807407] __inet_stream_connect+0xd6/0x3a0 [18610.807412] ? __inet_stream_connect+0x5/0x3a0 [18610.807417] inet_stream_connect+0x3b/0x60 [18610.807425] __sys_connect+0xed/0x120 After some debugging I was able to build this simple reproducer, __section("sockops/reproducer_bad") int bpf_reproducer_bad(struct bpf_sock_ops skops) { volatile __maybe_unused __u32 i = skops->snd_ssthresh; return 0; } And along the way noticed that below program ran without splat, __section("sockops/reproducer_good") int bpf_reproducer_good(struct bpf_sock_ops skops) { volatile __maybe_unused __u32 i = skops->snd_ssthresh; volatile __maybe_unused __u32 family; compiler_barrier(); family = skops->family; return 0; } So I decided to check out the code we generate for the above two programs and noticed each generates the BPF code you would expect, 0000000000000000 <bpf_reproducer_bad>: ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 0: r1 = (u32 )(r1 + 96) 1: (u32 )(r10 - 4) = r1 ; return 0; 2: r0 = 0 3: exit 0000000000000000 <bpf_reproducer_good>: ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 0: r2 = (u32 )(r1 + 96) 1: (u32 )(r10 - 4) = r2 ; family = skops->family; 2: r1 = (u32 )(r1 + 20) 3: (u32 )(r10 - 8) = r1 ; return 0; 4: r0 = 0 5: exit So we get reasonable assembly, but still something was causing the null pointer dereference. So, we load the programs and dump the xlated version observing that line 0 above 'r* = (u32 )(r1 +96)' is going to be translated by the skops access helpers. int bpf_reproducer_bad(struct bpf_sock_ops * skops): ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 0: (61) r1 = (u32 )(r1 +28) 1: (15) if r1 == 0x0 goto pc+2 2: (79) r1 = (u64 )(r1 +0) 3: (61) r1 = (u32 )(r1 +2340) ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 4: (63) (u32 )(r10 -4) = r1 ; return 0; 5: (b7) r0 = 0 6: (95) exit int bpf_reproducer_good(struct bpf_sock_ops * skops): ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 0: (61) r2 = (u32 )(r1 +28) 1: (15) if r2 == 0x0 goto pc+2 2: (79) r2 = (u64 )(r1 +0) 3: (61) r2 = (u32 )(r2 +2340) ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 4: (63) (u32 )(r10 -4) = r2 ; family = skops->family; 5: (79) r1 = (u64 )(r1 +0) 6: (69) r1 = (u16 )(r1 +16) ; family = skops->family; 7: (63) (u32 )(r10 -8) = r1 ; return 0; 8: (b7) r0 = 0 9: (95) exit Then we look at lines 0 and 2 above. In the good case we do the zero check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check into the bpf_sock_ops check and we can confirm that is the 'struct sock sk' pointer field. But, in the bad case, 0: (61) r1 = (u32 )(r1 +28) 1: (15) if r1 == 0x0 goto pc+2 2: (79) r1 = (u64 )(r1 +0) Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat, [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 The 0x01 makes sense because that is exactly the fullsock value. And its not a valid dereference so we splat. To fix we need to guard the case when a program is doing a sock_ops field access with src_reg == dst_reg. This is already handled in the load case where the ctx_access handler uses a tmp register being careful to store the old value and restore it. To fix the get case test if src_reg == dst_reg and in this case do the is_fullsock test in the temporary register. Remembering to restore the temporary register before writing to either dst_reg or src_reg to avoid smashing the pointer into the struct holding the tmp variable. Adding this inline code to test_tcpbpf_kern will now be generated correctly from, 9: r2 = (u32 )(r2 + 96) to xlated code, 12: (7b) (u64 )(r2 +32) = r9 13: (61) r9 = (u32 )(r2 +28) 14: (15) if r9 == 0x0 goto pc+4 15: (79) r9 = (u64 )(r2 +32) 16: (79) r2 = (u64 )(r2 +0) 17: (61) r2 = (u32 )(r2 +2348) 18: (05) goto pc+1 19: (79) r9 = (u64 )(r2 +32) And in the normal case we keep the original code, because really this is an edge case. From this, 9: r2 = (u32 )(r6 + 96) to xlated code, 22: (61) r2 = (u32 )(r6 +28) 23: (15) if r2 == 0x0 goto pc+2 24: (79) r2 = (u64 )(r6 +0) 25: (61) r2 = (u32 *)(r2 +2348) So three additional instructions if dst == src register, but I scanned my current code base and did not see this pattern anywhere so should not be a big deal. Further, it seems no one else has hit this or at least reported it so it must a fairly rare pattern. Fixes: `9b1f3d6e5a` ("bpf: Refactor sock_ops_convert_ctx_access") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718347772.4728.2781381670567919577.stgit@john-Precision-5820-Tower	2020-08-13 22:40:36 +02:00
Florian Westphal	59136aa3b2	netfilter: nf_tables: free chain context when BINDING flag is missing syzbot found a memory leak in nf_tables_addchain() because the chain object is not free'd correctly on error. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: syzbot+c99868fde67014f7e9f5@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-13 04:17:46 +02:00
Florian Westphal	2404b73c3f	netfilter: avoid ipv6 -> nf_defrag_ipv6 module dependency nf_ct_frag6_gather is part of nf_defrag_ipv6.ko, not ipv6 core. The current use of the netfilter ipv6 stub indirections causes a module dependency between ipv6 and nf_defrag_ipv6. This prevents nf_defrag_ipv6 module from being removed because ipv6 can't be unloaded. Remove the indirection and always use a direct call. This creates a depency from nf_conntrack_bridge to nf_defrag_ipv6 instead: modinfo nf_conntrack depends: nf_conntrack,nf_defrag_ipv6,bridge .. and nf_conntrack already depends on nf_defrag_ipv6 anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-13 04:16:15 +02:00
Andrii Nakryiko	068d9d1eba	bpf: Fix XDP FD-based attach/detach logic around XDP_FLAGS_UPDATE_IF_NOEXIST Enforce XDP_FLAGS_UPDATE_IF_NOEXIST only if new BPF program to be attached is non-NULL (i.e., we are not detaching a BPF program). Fixes: `d4baa9368a` ("bpf, xdp: Extract common XDP program attachment logic") Reported-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Stanislav Fomichev <sdf@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200812022923.1217922-1-andriin@fb.com	2020-08-12 18:00:49 -07:00
David S. Miller	9643609423	Revert "ipv4: tunnel: fix compilation on ARCH=um" This reverts commit `06a7a37be5`. The bug was already fixed, this added a dup include. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-12 13:26:37 -07:00
Eric Dumazet	2e0d8fef5f	net: accept an empty mask in /sys/class/net//queues/rx-/rps_cpus We must accept an empty mask in store_rps_map(), or we are not able to disable RPS on a queue. Fixes: `07bbecb341` ("net: Restrict receive packets queuing to housekeeping CPUs") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Cc: Alex Belits <abelits@marvell.com> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maciej Żenczykowski <maze@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-12 13:18:16 -07:00
Johannes Berg	06a7a37be5	ipv4: tunnel: fix compilation on ARCH=um With certain configurations, a 64-bit ARCH=um errors out here with an unknown csum_ipv6_magic() function. Include the right header file to always have it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-12 12:57:51 -07:00
Stefano Garzarella	1980c05844	vsock: fix potential null pointer dereference in vsock_poll() syzbot reported this issue where in the vsock_poll() we find the socket state at TCP_ESTABLISHED, but 'transport' is null: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097] CPU: 0 PID: 8227 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:vsock_poll+0x75a/0x8e0 net/vmw_vsock/af_vsock.c:1038 Call Trace: sock_poll+0x159/0x460 net/socket.c:1266 vfs_poll include/linux/poll.h:90 [inline] do_pollfd fs/select.c:869 [inline] do_poll fs/select.c:917 [inline] do_sys_poll+0x607/0xd40 fs/select.c:1011 __do_sys_poll fs/select.c:1069 [inline] __se_sys_poll fs/select.c:1057 [inline] __x64_sys_poll+0x18c/0x440 fs/select.c:1057 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This issue can happen if the TCP_ESTABLISHED state is set after we read the vsk->transport in the vsock_poll(). We could put barriers to synchronize, but this can only happen during connection setup, so we can simply check that 'transport' is valid. Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Reported-and-tested-by: syzbot+a61bac2fcc1a7c6623fe@syzkaller.appspotmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-12 12:56:06 -07:00
Linus Torvalds	7c2a69f610	Xiubo has completed his work on filesystem client metrics, they are sent to all available MDSes once per second now. Other than that, we have a lot of fixes and cleanups all around the filesystem, including a tweak to cut down on MDS request resends in multi-MDS setups from Yanhu and fixups for SELinux symlink labeling and MClientSession message decoding from Jeff. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl80I3MTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi7rlB/4/tegnfsB+O6MQPt+DkSxjQ6+90J/O aDj3jnwHoUBUitTP92wHdy/TanlZMX0UQqq4caWt1Ot0Rveb6XTfTIzEDtG2ejYQ dcH6LB3dS9Qvjrwh7oRawx8Xux/RdIIA99OvAX7K+4IeLKP5PjmwfdM1Mm0agfoS V8qXm8cGyqF2lWown092Znr8s6OSsDEaUbjB2tlz5msWeK9rSFD5bDYOBdOTkBCU 5v5BIUwWWoU9YFMWDJgVNS5l2Scq5yTDEleZcCCJMovKNrhBi9DcwUDKrDQJ3+5v UmfI5M/L99Ng7yAcPqbeOZ1HibuOrdmqAoR6lCiD+PPEPrf99rVLRinQ =2NtI -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "Xiubo has completed his work on filesystem client metrics, they are sent to all available MDSes once per second now. Other than that, we have a lot of fixes and cleanups all around the filesystem, including a tweak to cut down on MDS request resends in multi-MDS setups from Yanhu and fixups for SELinux symlink labeling and MClientSession message decoding from Jeff" * tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client: (22 commits) ceph: handle zero-length feature mask in session messages ceph: use frag's MDS in either mode ceph: move sb->wb_pagevec_pool to be a global mempool ceph: set sec_context xattr on symlink creation ceph: remove redundant initialization of variable mds ceph: fix use-after-free for fsc->mdsc ceph: remove unused variables in ceph_mdsmap_decode() ceph: delete repeated words in fs/ceph/ ceph: send client provided metric flags in client metadata ceph: periodically send perf metrics to MDSes ceph: check the sesion state and return false in case it is closed libceph: replace HTTP links with HTTPS ones ceph: remove unnecessary cast in kfree() libceph: just have osd_req_op_init() return a pointer ceph: do not access the kiocb after aio requests ceph: clean up and optimize ceph_check_delayed_caps() ceph: fix potential mdsc use-after-free crash ceph: switch to WARN_ON_ONCE in encode_supported_features() ceph: add global total_caps to count the mdsc's total caps number ceph: add check_session_state() helper and make it global ...	2020-08-12 12:51:31 -07:00
Tim Froidcoeur	d76f3351ce	net: initialize fastreuse on inet_inherit_port In the case of TPROXY, bind_conflict optimizations for SO_REUSEADDR or SO_REUSEPORT are broken, possibly resulting in O(n) instead of O(1) bind behaviour or in the incorrect reuse of a bind. the kernel keeps track for each bind_bucket if all sockets in the bind_bucket support SO_REUSEADDR or SO_REUSEPORT in two fastreuse flags. These flags allow skipping the costly bind_conflict check when possible (meaning when all sockets have the proper SO_REUSE option). For every socket added to a bind_bucket, these flags need to be updated. As soon as a socket that does not support reuse is added, the flag is set to false and will never go back to true, unless the bind_bucket is deleted. Note that there is no mechanism to re-evaluate these flags when a socket is removed (this might make sense when removing a socket that would not allow reuse; this leaves room for a future patch). For this optimization to work, it is mandatory that these flags are properly initialized and updated. When a child socket is created from a listen socket in __inet_inherit_port, the TPROXY case could create a new bind bucket without properly initializing these flags, thus preventing the optimization to work. Alternatively, a socket not allowing reuse could be added to an existing bind bucket without updating the flags, causing bind_conflict to never be called as it should. Call inet_csk_update_fastreuse when __inet_inherit_port decides to create a new bind_bucket or use a different bind_bucket than the one of the listen socket. Fixes: `093d282321` ("tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()") Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-11 15:49:08 -07:00
Tim Froidcoeur	62ffc589ab	net: refactor bind_bucket fastreuse into helper Refactor the fastreuse update code in inet_csk_get_port into a small helper function that can be called from other places. Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-11 15:49:08 -07:00
Miaohe Lin	0f5907af39	net: Fix potential memory leak in proto_register() If we failed to assign proto idx, we free the twsk_slab_name but forget to free the twsk_slab. Add a helper function tw_prot_cleanup() to free these together and also use this helper function in proto_unregister(). Fixes: `b45ce32135` ("sock: fix potential memory leak in proto_register()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-11 15:36:14 -07:00
Qingyu Li	26896f0146	net/nfc/rawsock.c: add CAP_NET_RAW check. When creating a raw AF_NFC socket, CAP_NET_RAW needs to be checked first. Signed-off-by: Qingyu Li <ieatmuttonchuan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-11 10:34:30 -07:00
Ira Weiny	b06c19d9f8	net/tls: Fix kmap usage When MSG_OOB is specified to tls_device_sendpage() the mapped page is never unmapped. Hold off mapping the page until after the flags are checked and the page is actually needed. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-11 10:20:34 -07:00
Linus Torvalds	97d052ea3f	A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generaly useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0 eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2 VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59 f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul 81ppn2KlvzUK8g== =7Gj+ -----END PGP SIGNATURE----- Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers" * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/seqlock, headers: Untangle the spaghetti monster locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header x86/headers: Remove APIC headers from <asm/smp.h> seqcount: More consistent seqprop names seqcount: Compress SEQCNT_LOCKNAME_ZERO() seqlock: Fold seqcount_LOCKNAME_init() definition seqlock: Fold seqcount_LOCKNAME_t definition seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g hrtimer: Use sequence counter with associated raw spinlock kvm/eventfd: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock vfs: Use sequence counter with associated spinlock timekeeping: Use sequence counter with associated raw spinlock xfrm: policy: Use sequence counters with associated lock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock netfilter: conntrack: Use sequence counter with associated spinlock sched: tasks: Use sequence counter with associated spinlock ...	2020-08-10 19:07:44 -07:00
Jason Baron	f19008e676	tcp: correct read of TFO keys on big endian systems When TFO keys are read back on big endian systems either via the global sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values don't match what was written. For example, on s390x: # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key 02000000-01000000-04000000-03000000 Instead of: # cat /proc/sys/net/ipv4/tcp_fastopen_key 00000001-00000002-00000003-00000004 Fix this by converting to the correct endianness on read. This was reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net selftest on s390x, which depends on the read value matching what was written. I've confirmed that the test now passes on big and little endian systems. Signed-off-by: Jason Baron <jbaron@akamai.com> Fixes: `438ac88009` ("net: fastopen: robustness and endianness fixes for SipHash") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Eric Dumazet <edumazet@google.com> Reported-and-tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-10 12:12:35 -07:00
Christoph Hellwig	519a8a6cf9	net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces" This reverts commits `6d04fe15f7` and `a31edb2059`. It turns out the idea to share a single pointer for both kernel and user space address causes various kinds of problems. So use the slightly less optimal version that uses an extra bit, but which is guaranteed to be safe everywhere. Fixes: `6d04fe15f7` ("net: optimize the sockptr_t for unified kernel/user address spaces") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-10 12:06:44 -07:00
Florian Westphal	2f941622fd	netfilter: nft_compat: remove flush counter optimization WARNING: CPU: 1 PID: 16059 at lib/refcount.c:31 refcount_warn_saturate+0xdf/0xf [..] __nft_mt_tg_destroy+0x42/0x50 [nft_compat] nft_target_destroy+0x63/0x80 [nft_compat] nf_tables_expr_destroy+0x1b/0x30 [nf_tables] nf_tables_rule_destroy+0x3a/0x70 [nf_tables] nf_tables_exit_net+0x186/0x3d0 [nf_tables] Happens when a compat expr is destoyed from abort path. There is no functional impact; after this work queue is flushed unconditionally if its pending. This removes the waitcount optimization. Test of repeated iptables-restore of a ~60k kubernetes ruleset doesn't indicate a slowdown. In case the counter is needed after all for some workloads we can revert this and increment the refcount for the != NFT_PREPARE_TRANS case to avoid the increment/decrement imbalance. While at it, also flush for match case, this was an oversight in the original patch. Fixes: `ffe8923f10` ("netfilter: nft_compat: make sure xtables destructors have run") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-10 13:03:36 +02:00
Stephen Suryaputra	b428336676	netfilter: nf_tables: nft_exthdr: the presence return value should be little-endian On big-endian machine, the returned register data when the exthdr is present is not being compared correctly because little-endian is assumed. The function nft_cmp_fast_mask(), called by nft_cmp_fast_eval() and nft_cmp_fast_init(), calls cpu_to_le32(). The following dump also shows that little endian is assumed: $ nft --debug=netlink add rule ip recordroute forward ip option rr exists counter ip [ exthdr load ipv4 1b @ 7 + 0 present => reg 1 ] [ cmp eq reg 1 0x01000000 ] [ counter pkts 0 bytes 0 ] Lastly, debug print in nft_cmp_fast_init() and nft_cmp_fast_eval() when RR option exists in the packet shows that the comparison fails because the assumption: nft_cmp_fast_init:189 priv->sreg=4 desc.len=8 mask=0xff000000 data.data[0]=0x10003e0 nft_cmp_fast_eval:57 regs->data[priv->sreg=4]=0x1 mask=0xff000000 priv->data=0x1000000 v2: use nft_reg_store8() instead (Florian Westphal). Also to avoid the warnings reported by kernel test robot. Fixes: `dbb5281a1f` ("netfilter: nf_tables: add support for matching IPv4 options") Fixes: `c078ca3b0c` ("netfilter: nft_exthdr: Add support for existence check") Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-10 13:02:43 +02:00
Linus Torvalds	7a6b60441f	Highlights: - Support for user extended attributes on NFS (RFC 8276) - Further reduce unnecessary NFSv4 delegation recalls Notable fixes: - Fix recent krb5p regression - Address a few resource leaks and a rare NULL dereference Other: - De-duplicate RPC/RDMA error handling and other utility functions - Replace storage and display of kernel memory addresses by tracepoints -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl8oBt0ACgkQM2qzM29m f5dTFQ/9H72E6gr1onsia0/Py0CO8F9qzLgmUBl1vVYAh2/vPqUL1ypxrC5OYrAy TOqESTsJvmGluCFc/77XUTD7NvJY3znIWim49okwDiyee4Y14ZfRhhCxyyA6Z94E FjJQb5TbF1Mti4X3dN8Gn7O1Y/BfTjDAAXnXGlTA1xoLcxM5idWIj+G8x0bPmeDb 2fTbgsoETu6MpS2/L6mraXVh3d5ESOJH+73YvpBl0AhYPzlNASJZMLtHtd+A/JbO IPkMP/7UA5DuJtWGeuQ4I4D5bQNpNWMfN6zhwtih4IV5bkRC7vyAOLG1R7w9+Ufq 58cxPiorMcsg1cHnXG0Z6WVtbMEdWTP/FzmJdE5RC7DEJhmmSUG/R0OmgDcsDZET GovPARho01yp80GwTjCIctDHRRFRL4pdPfr8PjVHetSnx9+zoRUT+D70Zeg/KSy2 99gmCxqSY9BZeHoiVPEX/HbhXrkuDjUSshwl98OAzOFmv6kbwtLntgFbWlBdE6dB mqOxBb73zEoZ5P9GA2l2ShU3GbzMzDebHBb9EyomXHZrLejoXeUNA28VJ+8vPP5S IVHnEwOkdJrNe/7cH4jd/B0NR6f8Da/F9kmkLiG2GNPMqQ8bnVhxTUtZkcAE+fd4 f34qLxsoht70wSSfISjBs7hP5KxEM1lOAf0w0RpycPUKJNV1FB0= =OEpF -----END PGP SIGNATURE----- Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull NFS server updates from Chuck Lever: "Highlights: - Support for user extended attributes on NFS (RFC 8276) - Further reduce unnecessary NFSv4 delegation recalls Notable fixes: - Fix recent krb5p regression - Address a few resource leaks and a rare NULL dereference Other: - De-duplicate RPC/RDMA error handling and other utility functions - Replace storage and display of kernel memory addresses by tracepoints" * tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits) svcrdma: CM event handler clean up svcrdma: Remove transport reference counting svcrdma: Fix another Receive buffer leak SUNRPC: Refresh the show_rqstp_flags() macro nfsd: netns.h: delete a duplicated word SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()") nfsd: avoid a NULL dereference in __cld_pipe_upcall() nfsd4: a client's own opens needn't prevent delegations nfsd: Use seq_putc() in two functions svcrdma: Display chunk completion ID when posting a rw_ctxt svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send() svcrdma: Introduce Send completion IDs svcrdma: Record Receive completion ID in svc_rdma_decode_rqst svcrdma: Introduce Receive completion IDs svcrdma: Introduce infrastructure to support completion IDs svcrdma: Add common XDR encoders for RDMA and Read segments svcrdma: Add common XDR decoders for RDMA and Read segments SUNRPC: Add helpers for decoding list discriminators symbolically svcrdma: Remove declarations for functions long removed svcrdma: Clean up trace_svcrdma_send_failed() tracepoint ...	2020-08-09 13:58:04 -07:00
Miaohe Lin	7c7ab580db	net: Convert to use the fallthrough macro Convert the uses of fallthrough comments to fallthrough macro. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-08 14:29:09 -07:00
Miaohe Lin	11f920d2aa	net: Use helper function ip_is_fragment() Use helper function ip_is_fragment() to check ip fragment. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-08 14:24:53 -07:00
Miaohe Lin	47260ba937	net: Remove meaningless jump label out_fs The out_fs jump label has nothing to do but goto out. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-08 14:23:21 -07:00
Miaohe Lin	ce787a5a07	net: Set fput_needed iff FDPUT_FPUT is set We should fput() file iff FDPUT_FPUT is set. So we should set fput_needed accordingly. Fixes: `00e188ef6a` ("sockfd_lookup_light(): switch to fdget^W^Waway from fget_light") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-08 14:21:42 -07:00
Miaohe Lin	6b07edebe6	net: Use helper function fdput() Use helper function fdput() to fput() the file iff FDPUT_FPUT is set. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-08 14:19:16 -07:00
YueHaibing	61ee4137b5	ip_vti: Fix unused variable warning If CONFIG_INET_XFRM_TUNNEL is set but CONFIG_IPV6 is n, net/ipv4/ip_vti.c:493:27: warning: 'vti_ipip6_handler' defined but not used [-Wunused-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-08-08 08:17:22 +02:00
Rouven Czerwinski	1c3b63f155	net/tls: allow MSG_CMSG_COMPAT in sendmsg Trying to use ktls on a system with 32-bit userspace and 64-bit kernel results in a EOPNOTSUPP message during sendmsg: setsockopt(3, SOL_TLS, TLS_TX, …, 40) = 0 sendmsg(3, …, msg_flags=0}, 0) = -1 EOPNOTSUPP (Operation not supported) The tls_sw implementation does strict flag checking and does not allow the MSG_CMSG_COMPAT flag, which is set if the message comes in through the compat syscall. This patch adds MSG_CMSG_COMPAT to the flag check to allow the usage of the TLS SW implementation on systems using the compat syscall path. Note that the same check is present in the sendmsg path for the TLS device implementation, however the flag hasn't been added there for lack of testing hardware. Signed-off-by: Rouven Czerwinski <r.czerwinski@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-07 17:40:45 -07:00
David S. Miller	64cae2fb48	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2020-08-08 The following pull-request contains BPF updates for your net tree. We've added 11 non-merge commits during the last 2 day(s) which contain a total of 24 files changed, 216 insertions(+), 135 deletions(-). The main changes are: 1) Fix UAPI for BPF map iterator before it gets frozen to allow for more extensions/customization in future, from Yonghong Song. 2) Fix selftests build to undo verbose build output, from Andrii Nakryiko. 3) Fix inlining compilation error on bpf_do_trace_printk() due to variable argument lists, from Stanislav Fomichev. 4) Fix an uninitialized pointer warning at btf__parse_raw() in libbpf, from Daniel T. Lee. 5) Fix several compilation warnings in selftests with regards to ignoring return value, from Jianlin Lv. 6) Fix interruptions by switching off timeout for BPF tests, from Jiri Benc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-07 17:33:08 -07:00
Paolo Abeni	7ee2492635	mptcp: fix warn at shutdown time for unaccepted msk sockets With commit `b93df08ccd` ("mptcp: explicitly track the fully established status"), the status of unaccepted mptcp closed in mptcp_sock_destruct() changes from TCP_SYN_RECV to TCP_ESTABLISHED. As a result mptcp_sock_destruct() does not perform the proper cleanup and inet_sock_destruct() will later emit a warn. Address the issue updating the condition tested in mptcp_sock_destruct(). Also update the related comment. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/66 Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com> Fixes: `b93df08ccd` ("mptcp: explicitly track the fully established status") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-07 17:26:16 -07:00
Linus Torvalds	1fa2c0a0c8	Fix SCM_RIGHTS compat mode -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8trzgWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqrDEACRlxrvLRsMX5MCdZ3SQnnHkGbp 7C/OP331cRe0HnAp1B3P8VhP/Y81fDiCnrP9cb7yoeMASCvCUhVQlMZpLVpppOUE NK0XQdjMlHJGh9qTcgWGd49KpFOFFYIeiv4NaddlmScb8Q+ZkMKiQWURNR65iM8c QLcpfryZEZd7eEZdH8Xtn6ItrYEYTldXSC8oc+e3ev0HCg0D7E4nBaqp1JzHT9I2 Zku98lyap2sxSBZlTKVppowZ7RPqViwRkvGMofWE7brbJ2+H89YMXhj0ZMFQ8+8z htbG6KCWUP7kdUwcyuBgUX8/+cGMsb9R8U34ktxO7GWq1stHAph4e59kJ0cRKjNp v7vFL5nA6kvgYjZ9Y7dQoeKQWFAcN7qK2/DR/11ECx21pHbR/6LgyPDTYYgN88Bp wcbNgPr1AI7bxu/FKe+6a+qgIjV5Ukb4SxNKU7pvkKKSDCzhMxGGCRY9Pmip2gmv EHKOox7QgONGhvuTrE2CYr5hAO5alKbciTXnofAmCpi1ddTrpv5wVDDSSJXK9xNe uZjMvYGGnJWw8KLZiqbC6P1k92Lm8hfFzPuLBuuUILrG5MZsAiLGIOleXTaaehPS ZRPE1h2A95lFulOPXdX0fdQkWeVSf/IkBNxA3Uz/xM7QaF75n83VypFxVfEFzmos 7R14u32ySUeuZX0/wg== =5TFR -----END PGP SIGNATURE----- Merge tag 'seccomp-v5.9-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fix from Kees Cook: "This fixes my typo in the SCM_RIGHTS refactoring that broke compat handling. Thanks to Thadeu Lima de Souza Cascardo for tracking it down, and to Christian Zigotzky and Alex Xu for their reports" * tag 'seccomp-v5.9-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: net/scm: Fix typo in SCM_RIGHTS compat refactoring	2020-08-07 13:16:22 -07:00
Kees Cook	16b89f6953	net/scm: Fix typo in SCM_RIGHTS compat refactoring When refactoring the SCM_RIGHTS code, I accidentally mis-merged my native/compat diffs, which entirely broke using SCM_RIGHTS in compat mode. Use the correct helper. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216156.html Reported-by: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca> Link: https://lore.kernel.org/lkml/1596812929.lz7fuo8r2w.none@localhost/ Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Fixes: `c0029de509` ("net/scm: Regularize compat handling of scm_detach_fds()") Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2020-08-07 12:43:25 -07:00
Linus Torvalds	81e11336d9	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: - a few MM hotfixes - kthread, tools, scripts, ntfs and ocfs2 - some of MM Subsystems affected by this patch series: kthread, tools, scripts, ntfs, ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore, sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits) mm: vmscan: consistent update to pgrefill mm/vmscan.c: fix typo khugepaged: khugepaged_test_exit() check mmget_still_valid() khugepaged: retract_page_tables() remember to test exit khugepaged: collapse_pte_mapped_thp() protect the pmd lock khugepaged: collapse_pte_mapped_thp() flush the right range mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible mm: thp: replace HTTP links with HTTPS ones mm/page_alloc: fix memalloc_nocma_{save/restore} APIs mm/page_alloc.c: skip setting nodemask when we are in interrupt mm/page_alloc: fallbacks at most has 3 elements mm/page_alloc: silence a KASAN false positive mm/page_alloc.c: remove unnecessary end_bitidx for [set\|get]_pfnblock_flags_mask() mm/page_alloc.c: simplify pageblock bitmap access mm/page_alloc.c: extract the common part in pfn_to_bitidx() mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits mm/shuffle: remove dynamic reconfiguration mm/memory_hotplug: document why shuffle_zone() is relevant mm/page_alloc: remove nr_free_pagecache_pages() mm: remove vm_total_pages ...	2020-08-07 11:39:33 -07:00
Waiman Long	453431a549	mm, treewide: rename kzfree() to kfree_sensitive() As said by Linus: A symmetric naming is only helpful if it implies symmetries in use. Otherwise it's actively misleading. In "kzalloc()", the z is meaningful and an important part of what the caller wants. In "kzfree()", the z is actively detrimental, because maybe in the future we really _might_ want to use that "memfill(0xdeadbeef)" or something. The "zero" part of the interface isn't even _relevant_. The main reason that kzfree() exists is to clear sensitive information that should not be leaked to other future users of the same memory objects. Rename kzfree() to kfree_sensitive() to follow the example of the recently added kvfree_sensitive() and make the intention of the API more explicit. In addition, memzero_explicit() is used to clear the memory to make sure that it won't get optimized away by the compiler. The renaming is done by using the command sequence: git grep -w --name-only kzfree \|\ xargs sed -i 's/kzfree/kfree_sensitive/' followed by some editing of the kfree_sensitive() kerneldoc and adding a kzfree backward compatibility macro in slab.h. [akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h] [akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more] Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Joe Perches <joe@perches.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Rientjes <rientjes@google.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Jason A . Donenfeld" <Jason@zx2c4.com> Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-07 11:33:22 -07:00
Linus Torvalds	86cfccb669	dlm for 5.9 This set includes a some improvements to the dlm networking layer: improving the ability to trace dlm messages for debugging, and improved handling of bad messages or disrupted connections. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJfLCPxAAoJEDgbc8f8gGmqz04P/2hvv/4rXo9AOgnnstvZV1Qy Yo01Cy807vB1c3jhIJryM2gG61GNH22RAHc2NcfjJwy04HH/1IEr6P48Po3qYEnS 8fZ8B9msxpsujVOrRoeBuLN8elI1HftyNVWaVjH7xtD+fLCDLu9i10kv3aeS+DiB T6f7yQQv7hgXS3xGvlMr2//aLwGD2ZdcRbkOEGo+k7yUjQbIDH/wdZWcPLh6y4yT p20i2ulYKjEZFmXDMa17diONISeGO6iaDhee24XPDwNDp8qI1iPGJsmxltMmn8Qf d2HPF1IDh4eM8lCwmqBtjYTnJd6rAW0v3+Ek1+wzQKVeXLFiz/MEyuOldtpsqmMO 8Og0vr6zfTCjFo8uvyj+cF7Fcj0yIPWg1yb7EauqqxreK8V9GBA1V2ZXYVd8xwea thrAUaq8f+PYQ9uy1FsN3xaO3BFN1VpcvHu4/3gU3OudnZZt2Ae670RYHKC0bq8D 2tSsqaiDnlvniHgh4xvtNIvRANkDS1ZSbkUPZhMHL7DnRJn66oDIfCr7NMbZwvCa AS0q6suUFyXFbAEJcY6XWxe3aQ3WuxIClT84MgzX/dAK2Qcl8ryWGGSVc0dp4Vl1 cd8MtmpnIWsnxqNRl4jn6cfolDheaxL8nouLtJ+3/dC9VkyDyfmrtnM+8aTZKHoa 3/xrBuVkEJAwkAAr8Pb8 =qgti -----END PGP SIGNATURE----- Merge tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes a some improvements to the dlm networking layer: improving the ability to trace dlm messages for debugging, and improved handling of bad messages or disrupted connections" * tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: implement tcp graceful shutdown fs: dlm: change handling of reconnects fs: dlm: don't close socket on invalid message fs: dlm: set skb mark per peer socket fs: dlm: set skb mark for listen socket net: sock: add sock_set_mark dlm: Fix kobject memleak	2020-08-06 19:44:25 -07:00
Linus Torvalds	96e3f3c16b	- Add support to enable/disable the thermal zones resulting on core code and drivers cleanup (Andrzej Pietrasiewicz) - Add generic netlink support for userspace notifications: events, temperature and discovery commands (Daniel Lezcano) - Fix redundant initialization for a ret variable (Colin Ian King) - Remove the clock cooling code as it is used nowhere (Amit Kucheria) - Add the rcar_gen3_thermal's r8a774e1 support (Marian-Cristian Rotariu) - Replace all references to thermal.txt in the documentation to the corresponding yaml files (Amit Kucheria) - Add maintainer entry for the IPA (Lukasz Luba) - Add support for MSM8939 for the tsens (Shawn Guo) - Update power allocator and devfreq cooling to SPDX licensing (Lukasz Luba) - Add Cannon Lake Low Power PCH support (Sumeet Pawnikar) - Add tsensor support for V2 mediatek thermal system (Henry Yen) - Fix thermal zone lookup by ID for the core code (Thierry Reding) -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAl8q7tsACgkQqDIjiipP 6E+5Rwf7BFEn5YXPvng8cmnAlgvEBc9DdT6mGSo0NpFm9MdUxXlaqvw3WWSGyqWQ +z0Ka7lmn5XyiMsVN11++Snp+79X17HzZf9SXO3glyIpAn+5prTDRhzzj0/jPrtS sEeI++DrILsKKMGVljzftLmwNJN9DkUDNcnmWmZdCDbYVEKtP9Pjf2wBjAnXj7sX JA3CkHRMwYLEQbfaKz37M11cYM+LqbDOlb6U11YWgAGGJ7d7zNYRf2/YSYPM4AN6 iE6j0E+3jIlXesULsap1AzeJaBq+wFxj1FL2TUZ8KscvRrm3AucqzNAT2M/Bc5Az XLKKzc6Gp9JfqB5KXhX2EDu7VRnDBg== =cSMN -----END PGP SIGNATURE----- Merge tag 'thermal-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Add support to enable/disable the thermal zones resulting on core code and drivers cleanup (Andrzej Pietrasiewicz) - Add generic netlink support for userspace notifications: events, temperature and discovery commands (Daniel Lezcano) - Fix redundant initialization for a ret variable (Colin Ian King) - Remove the clock cooling code as it is used nowhere (Amit Kucheria) - Add the rcar_gen3_thermal's r8a774e1 support (Marian-Cristian Rotariu) - Replace all references to thermal.txt in the documentation to the corresponding yaml files (Amit Kucheria) - Add maintainer entry for the IPA (Lukasz Luba) - Add support for MSM8939 for the tsens (Shawn Guo) - Update power allocator and devfreq cooling to SPDX licensing (Lukasz Luba) - Add Cannon Lake Low Power PCH support (Sumeet Pawnikar) - Add tsensor support for V2 mediatek thermal system (Henry Yen) - Fix thermal zone lookup by ID for the core code (Thierry Reding) * tag 'thermal-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (40 commits) thermal: intel: intel_pch_thermal: Add Cannon Lake Low Power PCH support thermal: mediatek: Add tsensor support for V2 thermal system thermal: mediatek: Prepare to add support for other platforms thermal: Update power allocator and devfreq cooling to SPDX licensing MAINTAINERS: update entry to thermal governors file name prefixing thermal: core: Add thermal zone enable/disable notification thermal: qcom: tsens-v0_1: Add support for MSM8939 dt-bindings: tsens: qcom: Document MSM8939 compatible thermal: core: Fix thermal zone lookup by ID thermal: int340x: processor_thermal: fix: update Jasper Lake PCI id thermal: imx8mm: Support module autoloading thermal: ti-soc-thermal: Fix reversed condition in ti_thermal_expose_sensor() MAINTAINERS: Add maintenance information for IPA thermal: rcar_gen3_thermal: Do not shadow thcode variable dt-bindings: thermal: Get rid of thermal.txt and replace references thermal: core: Move initialization after core initcall thermal: netlink: Improve the initcall ordering net: genetlink: Move initialization to core_initcall thermal: rcar_gen3_thermal: Add r8a774e1 support thermal/drivers/clock_cooling: Remove clock_cooling code ...	2020-08-06 18:10:55 -07:00
Yonghong Song	5e7b30205c	bpf: Change uapi for bpf iterator map elements Commit `a5cbe05a66` ("bpf: Implement bpf iterator for map elements") added bpf iterator support for map elements. The map element bpf iterator requires info to identify a particular map. In the above commit, the attr->link_create.target_fd is used to carry map_fd and an enum bpf_iter_link_info is added to uapi to specify the target_fd actually representing a map_fd: enum bpf_iter_link_info { BPF_ITER_LINK_UNSPEC = 0, BPF_ITER_LINK_MAP_FD = 1, MAX_BPF_ITER_LINK_INFO, }; This is an extensible approach as we can grow enumerator for pid, cgroup_id, etc. and we can unionize target_fd for pid, cgroup_id, etc. But in the future, there are chances that more complex customization may happen, e.g., for tasks, it could be filtered based on both cgroup_id and user_id. This patch changed the uapi to have fields __aligned_u64 iter_info; __u32 iter_info_len; for additional iter_info for link_create. The iter_info is defined as union bpf_iter_link_info { struct { __u32 map_fd; } map; }; So future extension for additional customization will be easier. The bpf_iter_link_info will be passed to target callback to validate and generic bpf_iter framework does not need to deal it any more. Note that map_fd = 0 will be considered invalid and -EBADF will be returned to user space. Fixes: `a5cbe05a66` ("bpf: Implement bpf iterator for map elements") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com	2020-08-06 16:39:14 -07:00
Alexander Aring	84d1c61740	net: sock: add sock_set_mark This patch adds a new socket helper function to set the mark value for a kernel socket. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2020-08-06 10:30:50 -05:00
Linus Torvalds	47ec5303d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Flowian Westphal. 35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...	2020-08-05 20:13:21 -07:00
Stefano Brivio	8ed54f167a	ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM On architectures defining _HAVE_ARCH_IPV6_CSUM, we get csum_ipv6_magic() defined by means of arch checksum.h headers. On other architectures, we actually need to include net/ip6_checksum.h to be able to use it. Without this include, building with defconfig breaks at least for s390. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `4cb47a8644` ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-05 12:30:36 -07:00
Paolo Abeni	adf7341064	mptcp: be careful on subflow creation Nicolas reported the following oops: [ 1521.392541] BUG: kernel NULL pointer dereference, address: 00000000000000c0 [ 1521.394189] #PF: supervisor read access in kernel mode [ 1521.395376] #PF: error_code(0x0000) - not-present page [ 1521.396607] PGD 0 P4D 0 [ 1521.397156] Oops: 0000 [#1] SMP PTI [ 1521.398020] CPU: 0 PID: 22986 Comm: kworker/0:2 Not tainted 5.8.0-rc4+ #109 [ 1521.399618] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 1521.401728] Workqueue: events mptcp_worker [ 1521.402651] RIP: 0010:mptcp_subflow_create_socket+0xf1/0x1c0 [ 1521.403954] Code: 24 08 89 44 24 04 48 8b 7a 18 e8 2a 48 d4 ff 8b 44 24 04 85 c0 75 7a 48 8b 8b 78 02 00 00 48 8b 54 24 08 48 8d bb 80 00 00 00 <48> 8b 89 c0 00 00 00 48 89 8a c0 00 00 00 48 8b 8b 78 02 00 00 8b [ 1521.408201] RSP: 0000:ffffabc4002d3c60 EFLAGS: 00010246 [ 1521.409433] RAX: 0000000000000000 RBX: ffffa0b9ad8c9a00 RCX: 0000000000000000 [ 1521.411096] RDX: ffffa0b9ae78a300 RSI: 00000000fffffe01 RDI: ffffa0b9ad8c9a80 [ 1521.412734] RBP: ffffa0b9adff2e80 R08: ffffa0b9af02d640 R09: ffffa0b9ad923a00 [ 1521.414333] R10: ffffabc4007139f8 R11: fefefefefefefeff R12: ffffabc4002d3cb0 [ 1521.415918] R13: ffffa0b9ad91fa58 R14: ffffa0b9ad8c9f9c R15: 0000000000000000 [ 1521.417592] FS: 0000000000000000(0000) GS:ffffa0b9af000000(0000) knlGS:0000000000000000 [ 1521.419490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1521.420839] CR2: 00000000000000c0 CR3: 000000002951e006 CR4: 0000000000160ef0 [ 1521.422511] Call Trace: [ 1521.423103] __mptcp_subflow_connect+0x94/0x1f0 [ 1521.425376] mptcp_pm_create_subflow_or_signal_addr+0x200/0x2a0 [ 1521.426736] mptcp_worker+0x31b/0x390 [ 1521.431324] process_one_work+0x1fc/0x3f0 [ 1521.432268] worker_thread+0x2d/0x3b0 [ 1521.434197] kthread+0x117/0x130 [ 1521.435783] ret_from_fork+0x22/0x30 on some unconventional configuration. The MPTCP protocol is trying to create a subflow for an unaccepted server socket. That is allowed by the RFC, even if subflow creation will likely fail. Unaccepted sockets have still a NULL sk_socket field, avoid the issue by failing earlier. Reported-and-tested-by: Nicolas Rybowski <nicolas.rybowski@tessares.net> Fixes: `7d14b0d2b9` ("mptcp: set correct vfs info for subflows") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-05 12:24:20 -07:00
Xin Long	5a6f6f5791	tipc: set ub->ifindex for local ipv6 address Without ub->ifindex set for ipv6 address in tipc_udp_enable(), ipv6_sock_mc_join() may make the wrong dev join the multicast address in enable_mcast(). This causes that tipc links would never be created. So fix it by getting the right netdev and setting ub->ifindex, as it does for ipv4 address. Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-05 12:19:52 -07:00
Xin Long	81f6cb3122	ipv6: add ipv6_dev_find() This is to add an ip_dev_find like function for ipv6, used to find the dev by saddr. It will be used by TIPC protocol. So also export it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-05 12:19:52 -07:00
Tonghao Zhang	5845589ed6	net: openvswitch: silence suspicious RCU usage warning ovs_flow_tbl_destroy always is called from RCU callback or error path. It is no need to check if rcu_read_lock or lockdep_ovsl_is_held was held. ovs_dp_cmd_fill_info always is called with ovs_mutex, So use the rcu_dereference_ovsl instead of rcu_dereference in ovs_flow_tbl_masks_cache_size. Fixes: `9bf24f594c` ("net: openvswitch: make masks cache size configurable") Cc: Eelco Chaudron <echaudro@redhat.com> Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-05 12:11:46 -07:00
Linus Torvalds	2324d50d05	It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU= =JQLZ -----END PGP SIGNATURE----- Merge tag 'docs-5.9' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more" * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits) scripts/kernel-doc: optionally treat warnings as errors docs: ia64: correct typo mailmap: add entry for <alobakin@marvell.com> doc/zh_CN: add cpu-load Chinese version Documentation/admin-guide: tainted-kernels: fix spelling mistake MAINTAINERS: adjust kprobes.rst entry to new location devices.txt: document rfkill allocation PCI: correct flag name docs: filesystems: vfs: correct flag name docs: filesystems: vfs: correct sync_mode flag names docs: path-lookup: markup fixes for emphasis docs: path-lookup: more markup fixes docs: path-lookup: fix HTML entity mojibake CREDITS: Replace HTTP links with HTTPS ones docs: process: Add an example for creating a fixes tag doc/zh_CN: add Chinese translation prefer section doc/zh_CN: add clearing-warn-once Chinese version doc/zh_CN: add admin-guide index doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label futex: MAINTAINERS: Re-add selftests directory ...	2020-08-04 22:47:54 -07:00
Olga Kornievskaia	7de62bc09f	SUNRPC dont update timeout value on connection reset Current behaviour: every time a v3 operation is re-sent to the server we update (double) the timeout. There is no distinction between whether or not the previous timer had expired before the re-sent happened. Here's the scenario: 1. Client sends a v3 operation 2. Server RST-s the connection (prior to the timeout) (eg., connection is immediately reset) 3. Client re-sends a v3 operation but the timeout is now 120sec. As a result, an application sees 2mins pause before a retry in case server again does not reply. Instead, this patch proposes to keep track off when the minor timeout should happen and if it didn't, then don't update the new timeout. Value is updated based on the previous value to make timeouts predictable. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-08-04 23:17:11 -04:00
Linus Torvalds	3950e97543	Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve updates from Eric Biederman: "During the development of v5.7 I ran into bugs and quality of implementation issues related to exec that could not be easily fixed because of the way exec is implemented. So I have been diggin into exec and cleaning up what I can. This cycle I have been looking at different ideas and different implementations to see what is possible to improve exec, and cleaning the way exec interfaces with in kernel users. Only cleaning up the interfaces of exec with rest of the kernel has managed to stabalize and make it through review in time for v5.9-rc1 resulting in 2 sets of changes this cycle. - Implement kernel_execve - Make the user mode driver code a better citizen With kernel_execve the code size got a little larger as the copying of parameters from userspace and copying of parameters from userspace is now separate. The good news is kernel threads no longer need to play games with set_fs to use exec. Which when combined with the rest of Christophs set_fs changes should security bugs with set_fs much more difficult" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits) exec: Implement kernel_execve exec: Factor bprm_stack_limits out of prepare_arg_pages exec: Factor bprm_execve out of do_execve_common exec: Move bprm_mm_init into alloc_bprm exec: Move initialization of bprm->filename into alloc_bprm exec: Factor out alloc_bprm exec: Remove unnecessary spaces from binfmts.h umd: Stop using split_argv umd: Remove exit_umh bpfilter: Take advantage of the facilities of struct pid exit: Factor thread_group_exited out of pidfd_poll umd: Track user space drivers with struct pid bpfilter: Move bpfilter_umh back into init data exec: Remove do_execve_file umh: Stop calling do_execve_file umd: Transform fork_usermode_blob into fork_usermode_driver umd: Rename umd_info.cmdline umd_info.driver_name umd: For clarity rename umh_info umd_info umh: Separate the user mode driver and the user mode helper support umh: Remove call_usermodehelper_setup_file. ...	2020-08-04 14:27:25 -07:00
Linus Torvalds	fd76a74d94	audit/stable-5.9 PR 20200803 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl8okpIUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNqOQ/8D+m9Ykcby3csEKsp8YtsaukEu62U lRVaxzRNO9wwB24aFwDFuJnIkmsSi/s/O4nBsy2mw+Apn+uDCvHQ9tBU07vlNn2f lu27YaTya7YGlqoe315xijd8tyoX99k8cpQeixvAVr9/jdR09yka7SJ8O7X9mjV7 +SUVDiKCplPKpiwCCRS9cqD7F64T6y35XKzbrzYqdP0UOF2XelZo/Evt5rDRvWUf 5qDN2tP+iM/Fvu5lCfczFwAeivfAdxjQ11n783hx8Ms2qyiaKQCzbEwjqAslmkbs 1k/+ED0NjzXX1ne0JZaz/bk0wsMnmOoa8o+NDcyd7Za/cj5prUZi7kBy+xry4YV8 qKJ40Lk0flCWgUpm6bkYVOByIYHk0gmfBNvjilqf25NR/eOC/9e9ir8PywvYUW/7 kvVK37+N/a3LnFj80sZpIeqqnNU8z9PV1i7//5/kDuKvz94Bq83TJDO6pPKvqDtC njQfCFoHwdEeF8OalK793lIiYaoODqvbkWKChKMqziODJ4ZP8AW06gXpEbEWn7G3 TTnJx7hqzR9t90vBQJeO3Fromfn+9TDlZVdX+EGO8gIqUiLGr0r7LPPep4VkDbNw LxMYKeC2cgRp8Z+XXPDxfXSDL2psTwg6CXcDrXcYnUyBo/yerpBvbJkeaR0h+UR0 j6cvMX+T39X2JXM= =Xs3M -----END PGP SIGNATURE----- Merge tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Aside from some smaller bug fixes, here are the highlights: - add a new backlog wait metric to the audit status message, this is intended to help admins determine how long processes have been waiting for the audit backlog queue to clear - generate audit records for nftables configuration changes - generate CWD audit records for for the relevant LSM audit records" * tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: report audit wait metric in audit status reply audit: purge audit_log_string from the intra-kernel audit API audit: issue CWD record to accompany LSM_AUDIT_DATA_* records audit: use the proper gfp flags in the audit_log_nfcfg() calls audit: remove unused !CONFIG_AUDITSYSCALL __audit_inode* stubs audit: add gfp parameter to audit_log_nfcfg audit: log nftables configuration change events audit: Use struct_size() helper in alloc_chunk	2020-08-04 14:20:26 -07:00
Linus Torvalds	9ecc6ea491	seccomp updates for v5.9-rc1 - Improved selftest coverage, timeouts, and reporting - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner) - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers - Introduce "addfd" command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oZcQWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJomDD/4x3j7eXREcXDsHOmlgEaHWGx4l JldHFQhV5GjmD7gOkPcoZSG7NfG7F6VpwAJg7ZoR3qUkem7K8DFucxqgo1RldCot nigleeLX6JeMS0Z+iwjAVZd+5t4xG4J/7GGDHIIMiG5qvwJ0Yf64o1bkjaB2Q/Bv tluBg0WF32kFMG/ZwyY/V2QDbbue97CFPflybOh1o2nWbVzmUlFEEum3UUvZsxc8 smMsattJyuAV7kcEKzKrs8b010NdFZqwdbub5Np9W3XEXGBYMdIPoNsOQGmB9wby j2ui0lzboXRG997jM7TCd1l/XZAv8aAwvPplw3FJRybzkOGs9NDyLMoz87yJpR1T xp511vnMyMbyKIGdungkt7cIyzaictHwaYzznsmuNdCPEjTaIQJr1ctsa4GEgtqf pnkktZ9YbMCcHU0CtZ8GlOVqA9wE+FUm0/u0zgikzJQsB+HcNItiARTTTHRyco7p VJCqK8o4Zx4ELV7QNkSH4nhFkVgRopvrvBiPAGro/qwGOofBg8W8wM8O1+V/MDmp zSU22v4SncT1Xb7dtmdJqDEeHfDikhaCAb4Je2hsGQWzbdAqwHGlpa7vpk9x3Q5r L+XyP+Z+rPHlXYyypJwUvvOQhXOmP0zYxcEHxByqIBfXiwy+3dN4tDDfatWbccwl uTlTDM8kmQn6QzSztA== =yb55 -----END PGP SIGNATURE----- Merge tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "There are a bunch of clean ups and selftest improvements along with two major updates to the SECCOMP_RET_USER_NOTIF filter return: EPOLLHUP support to more easily detect the death of a monitored process, and being able to inject fds when intercepting syscalls that expect an fd-opening side-effect (needed by both container folks and Chrome). The latter continued the refactoring of __scm_install_fd() started by Christoph, and in the process found and fixed a handful of bugs in various callers. - Improved selftest coverage, timeouts, and reporting - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner) - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)" * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD seccomp: Introduce addfd ioctl to seccomp user notifier fs: Expand __receive_fd() to accept existing fd pidfd: Replace open-coded receive_fd() fs: Add receive_fd() wrapper for __receive_fd() fs: Move __scm_install_fd() to __receive_fd() net/scm: Regularize compat handling of scm_detach_fds() pidfd: Add missing sock updates for pidfd_getfd() net/compat: Add missing sock updates for SCM_RIGHTS selftests/seccomp: Check ENOSYS under tracing selftests/seccomp: Refactor to use fixture variants selftests/harness: Clean up kern-doc for fixtures seccomp: Use -1 marker for end of mode 1 syscall list seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall() selftests/seccomp: Make kcmp() less required seccomp: Use pr_fmt selftests/seccomp: Improve calibration loop selftests/seccomp: use 90s as timeout selftests/seccomp: Expand benchmark to per-filter measurements ...	2020-08-04 14:11:08 -07:00
Linus Torvalds	99ea1521a0	Remove uninitialized_var() macro for v5.9-rc1 - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var() -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oYLQWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsfjEACvf0D3WL3H7sLHtZ2HeMwOgAzq il08t6vUscINQwiIIK3Be43ok3uQ1Q+bj8sr2gSYTwunV2IYHFferzgzhyMMno3o XBIGd1E+v1E4DGBOiRXJvacBivKrfvrdZ7AWiGlVBKfg2E0fL1aQbe9AYJ6eJSbp UGqkBkE207dugS5SQcwrlk1tWKUL089lhDAPd7iy/5RK76OsLRCJFzIerLHF2ZK2 BwvA+NWXVQI6pNZ0aRtEtbbxwEU4X+2J/uaXH5kJDszMwRrgBT2qoedVu5LXFPi8 +B84IzM2lii1HAFbrFlRyL/EMueVFzieN40EOB6O8wt60Y4iCy5wOUzAdZwFuSTI h0xT3JI8BWtpB3W+ryas9cl9GoOHHtPA8dShuV+Y+Q2bWe1Fs6kTl2Z4m4zKq56z 63wQCdveFOkqiCLZb8s6FhnS11wKtAX4czvXRXaUPgdVQS1Ibyba851CRHIEY+9I AbtogoPN8FXzLsJn7pIxHR4ADz+eZ0dQ18f2hhQpP6/co65bYizNP5H3h+t9hGHG k3r2k8T+jpFPaddpZMvRvIVD8O2HvJZQTyY6Vvneuv6pnQWtr2DqPFn2YooRnzoa dbBMtpon+vYz6OWokC5QNWLqHWqvY9TmMfcVFUXE4AFse8vh4wJ8jJCNOFVp8On+ drhmmImUr1YylrtVOw== =xHmk -----END PGP SIGNATURE----- Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()	2020-08-04 13:49:43 -07:00
Linus Torvalds	427714f258	tasklets API update for v5.9-rc1 - Prepare for tasklet API modernization (Romain Perier, Allen Pais, Kees Cook) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oXpMWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJtJgEACVb88nzYwu5mC5ZcfvwSyXeQsR eDpCkX5HT6CsxlOn0/YJvxUtkkerQftbRuAXrzoUpQkpyBh82PviVZFKDS7NE9Lc 6xPqloi2gbZ8EfgMraVynL+9lpLh0+qNCM7LPg4xT+JxMDLut/nWRdrp8d7uBYfQ AXV6CV4Tc4ijOMROV6AEVVdSTzkRCbiqUnRDBLETBfiJOdDn5MgJgxicWvN5FTpu PiUVF3CtWaKCRfQO/GEAXTG65hOtmql5IbX9n7uooNu/wCCnEFfVUus1uTcsrqxN ByrZ56NVPoO7z2jYLt8Lft3myo2e/mn88PKqrzS2p9GPn0VBv7rcO3ePmbbHL/RA mp+pg8wdpmKrHv4YGfsF+obT1v8f6VJoTLUt5S/WqZAzl1sVJgEJdAkjmDKythGG yYKKCemMceMMzLXxnFAYMzdXzdXZ3YEpiW4UkBb77EhUisDrLxCHSL5t4UzyWnuO Gtzw7N69iHPHLsxAk1hESAD8sdlk2EdN6vzJVelOsiW955x1hpR+msvNpwZwBqdq A2h8VnnrxLK2APl93T5VW9T6kvhzaTwLhoCH+oKklE+U0XJTAYZ4D/AcRVghBvMg bC1+1vDx+t/S+8P308evPQnEygLtL2I+zpPnBA1DZzHRAoY8inCLc5HQOfr6pi/f koNTtKkmSSKaFSYITw== =hb+e -----END PGP SIGNATURE----- Merge tag 'tasklets-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull tasklets API update from Kees Cook: "These are the infrastructure updates needed to support converting the tasklet API to something more modern (and hopefully for removal further down the road). There is a 300-patch series waiting in the wings to get set out to subsystem maintainers, but these changes need to be present in the kernel first. Since this has some treewide changes, I carried this series for -next instead of paining Thomas with it in -tip, but it's got his Ack. This is similar to the timer_struct modernization from a while back, but not nearly as messy (I hope). :) - Prepare for tasklet API modernization (Romain Perier, Allen Pais, Kees Cook)" * tag 'tasklets-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: tasklet: Introduce new initialization API treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() usb: gadget: udc: Avoid tasklet passing a global	2020-08-04 13:40:35 -07:00
David S. Miller	ee895a30ef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Flush the cleanup xtables worker to make sure destructors have completed, from Florian Westphal. 2) iifgroup is matching erroneously, also from Florian. 3) Add selftest for meta interface matching, from Florian Westphal. 4) Move nf_ct_offload_timeout() to header, from Roi Dayan. 5) Call nf_ct_offload_timeout() from flow_offload_add() to make sure garbage collection does not evict offloaded flow, from Roi Dayan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-04 13:32:39 -07:00
Stefano Brivio	4cb47a8644	tunnels: PMTU discovery support for directly bridged IP packets It's currently possible to bridge Ethernet tunnels carrying IP packets directly to external interfaces without assigning them addresses and routes on the bridged network itself: this is the case for UDP tunnels bridged with a standard bridge or by Open vSwitch. PMTU discovery is currently broken with those configurations, because the encapsulation effectively decreases the MTU of the link, and while we are able to account for this using PMTU discovery on the lower layer, we don't have a way to relay ICMP or ICMPv6 messages needed by the sender, because we don't have valid routes to it. On the other hand, as a tunnel endpoint, we can't fragment packets as a general approach: this is for instance clearly forbidden for VXLAN by RFC 7348, section 4.3: VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may fragment encapsulated VXLAN packets due to the larger frame size. The destination VTEP MAY silently discard such VXLAN fragments. The same paragraph recommends that the MTU over the physical network accomodates for encapsulations, but this isn't a practical option for complex topologies, especially for typical Open vSwitch use cases. Further, it states that: Other techniques like Path MTU discovery (see [RFC1191] and [RFC1981]) MAY be used to address this requirement as well. Now, PMTU discovery already works for routed interfaces, we get route exceptions created by the encapsulation device as they receive ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and we already rebuild those messages with the appropriate MTU and route them back to the sender. Add the missing bits for bridged cases: - checks in skb_tunnel_check_pmtu() to understand if it's appropriate to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and RFC 4443 section 2.4 for ICMPv6. This function is already called by UDP tunnels - a new function generating those ICMP or ICMPv6 replies. We can't reuse icmp_send() and icmp6_send() as we don't see the sender as a valid destination. This doesn't need to be generic, as we don't cover any other type of ICMP errors given that we only provide an encapsulation function to the sender While at it, make the MTU check in skb_tunnel_check_pmtu() accurate: we might receive GSO buffers here, and the passed headroom already includes the inner MAC length, so we don't have to account for it a second time (that would imply three MAC headers on the wire, but there are just two). This issue became visible while bridging IPv6 packets with 4500 bytes of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50 bytes of encapsulation headroom, we would advertise MTU as 3950, and we would reject fragmented IPv6 datagrams of 3958 bytes size on the wire. We're exclusively dealing with network MTU here, though, so we could get Ethernet frames up to 3964 octets in that case. v2: - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern) - split IPv4/IPv6 functions (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-04 13:01:45 -07:00
Stefano Brivio	df23bb18b4	ipv4: route: Ignore output interface in FIB lookup for PMTU route Currently, processes sending traffic to a local bridge with an encapsulation device as a port don't get ICMP errors if they exceed the PMTU of the encapsulated link. David Ahern suggested this as a hack, but it actually looks like the correct solution: when we update the PMTU for a given destination by means of updating or creating a route exception, the encapsulation might trigger this because of PMTU discovery happening either on the encapsulation device itself, or its lower layer. This happens on bridged encapsulations only. The output interface shouldn't matter, because we already have a valid destination. Drop the output interface restriction from the associated route lookup. For UDP tunnels, we will now have a route exception created for the encapsulation itself, with a MTU value reflecting its headroom, which allows a bridge forwarding IP packets originated locally to deliver errors back to the sending socket. The behaviour is now consistent with IPv6 and verified with selftests pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in this series. v2: - reset output interface only for bridge ports (David Ahern) - add and use netif_is_any_bridge_port() helper (David Ahern) Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-04 13:01:45 -07:00
David S. Miller	2e7199bd77	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-08-04 The following pull-request contains BPF updates for your net-next tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 135 files changed, 4603 insertions(+), 1013 deletions(-). The main changes are: 1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko. 2) Add BPF iterator for map elements and to iterate all BPF programs for efficient in-kernel inspection, from Yonghong Song and Alexei Starovoitov. 3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid unwinder errors, from Song Liu. 4) Allow cgroup local storage map to be shared between programs on the same cgroup. Also extend BPF selftests with coverage, from YiFei Zhu. 5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM load instructions, from Jean-Philippe Brucker. 6) Follow-up fixes on BPF socket lookup in combination with reuseport group handling. Also add related BPF selftests, from Jakub Sitnicki. 7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for socket create/release as well as bind functions, from Stanislav Fomichev. 8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct xdp_statistics, from Peilin Ye. 9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime. 10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6} fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin. 11) Fix a bpftool segfault due to missing program type name and make it more robust to prevent them in future gaps, from Quentin Monnet. 12) Consolidate cgroup helper functions across selftests and fix a v6 localhost resolver issue, from John Fastabend. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 18:27:40 -07:00
David S. Miller	76769c38b4	mlx5-updates-2020-08-03 This patchset introduces some updates to mlx5 driver. 1) Jakub converts mlx5 to use the new udp tunnel infrastructure. Starting with a hack to allow drivers to request a static configuration of the default vxlan port, and then a patch that converts mlx5. 2) Parav implements change_carrier ndo for VF eswitch representors, to speedup link state control of representors netdevices. 3) Alex Vesker, makes a simple update to software steering to fix an issue with push vlan action sequence 4) Leon removes a redundant dump stack on error flow. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl8oRdgACgkQSD+KveBX +j4/LQgAkSjNzOaS7bVDzhoYL3aBQOMIzgocJUeVi7xXH8IO1uy55mNDrKBqjxbW dy9U9VsvV5i2V2qkkQLvHVkoDSg8Buo2Uxu4OrZHOLN0KfbFrra4VvmB1CzEBix8 FICnQaZZcE7529P04TgZ8Mo9vRb5VdJFhqED5Nvegy+y8FolEsQYbjIoDBE6wa0j Meqa/29+XCE5FzTOjbbQWizAnRZMbkxtSSreDNgeHxke9eMSO+fmwKScng63QUfl 7nfU6dW6A0d1kHhpL5RqAFOcmkpSdqYaA3SA+/8pPT9X3yOAkxE6KTKGIixpB9JX zQt+Wkna49jJ/JfDQB5vgww5c0HjAQ== =j0fG -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-08-03 This patchset introduces some updates to mlx5 driver. 1) Jakub converts mlx5 to use the new udp tunnel infrastructure. Starting with a hack to allow drivers to request a static configuration of the default vxlan port, and then a patch that converts mlx5. 2) Parav implements change_carrier ndo for VF eswitch representors, to speedup link state control of representors netdevices. 3) Alex Vesker, makes a simple update to software steering to fix an issue with push vlan action sequence 4) Leon removes a redundant dump stack on error flow. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 18:24:30 -07:00
Paolo Abeni	8555c6bfd5	mptcp: fix bogus sendmsg() return code under pressure In case of memory pressure, mptcp_sendmsg() may call sk_stream_wait_memory() after succesfully xmitting some bytes. If the latter fails we currently return to the user-space the error code, ignoring the succeful xmit. Address the issue always checking for the xmitted bytes before mptcp_sendmsg() completes. Fixes: `f296234c98` ("mptcp: Add handling of incoming MP_JOIN requests") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 18:14:20 -07:00
Ido Schimmel	c88e11e047	devlink: Pass extack when setting trap's action and group's parameters A later patch will refuse to set the action of certain traps in mlxsw and also to change the policer binding of certain groups. Pass extack so that failure could be communicated clearly to user space. Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 18:06:46 -07:00
Amit Cohen	08e335f6ad	devlink: Add early_drop trap Add the packet trap that can report packets that were ECN marked due to RED AQM. Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 18:06:46 -07:00
YueHaibing	80fbbb1672	fib: Fix undef compile warning net/core/fib_rules.c:26:7: warning: "CONFIG_IP_MULTIPLE_TABLES" is not defined, evaluates to 0 [-Wundef] #elif CONFIG_IP_MULTIPLE_TABLES ^~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `8b66a6fd34` ("fib: fix another fib_rules_ops indirect call wrapper problem") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-By: Brian Vazquez <brianvv@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 18:01:49 -07:00
Geliang Tang	190f8b060e	mptcp: use mptcp_for_each_subflow in mptcp_stream_accept Use mptcp_for_each_subflow in mptcp_stream_accept instead of open-coding. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 18:01:14 -07:00
David S. Miller	ee494f42a3	A few more changes, notably: * handle new SAE (WPA3 authentication) status codes in the correct way * fix a while that should be an if instead, avoiding infinite loops * handle beacon filtering changing better -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl8n/fgACgkQB8qZga/f l8TiOA/+PwGoXiP7WPulI+V8Bzv/Tmg6rqKEUFX1hxaGQOAlr9hqV2jvdxkOZmIM sQc19vT4TUMiwrdYHOKcKlY5jhi5r8aIzxSA98mGqqKsvERhJWuzoaPR5aHuqOUD VxUidOieV+H4Ol1YxV1Tx4eCmLR8ZPjg9EebMGNPOrASg7JXCjVXb5Pp8XXmsuDn uXdPc9QQ+DMLqCEF+1C+jhH2lDUfyk78AgPuXw7E3CP/copiKljKL7AApoFebpkM Cu0g6hYLBDAnkuZgL7PBw/MQThN5oMLDeX8RNK0uXDy/kdnXd7RL0IMcQpmwTn81 ZeewZjYq/Ha64sscR87kbzzFM15Vo/UhyhDfqQki/2zrQ26GoQdtgbj/OB3CMMd/ 8kBtZPoItlIkpO18kN9gBO/nB0cGghtsLn8u8pKdQ5RiIv1Q4JqerN4rHCyMgLrk FeOH25feAtuJ5MlOkIuHDYTfxNZnw1DE6DQv0hZuui1dbMPHUZuMFrCgZscuVBfj HOzhcyvs2bdCU4ZKLGiiq1zTxPply9rESmk4tvlDo5strqCKUEez8VgtZIPw1LZp pD43UHAfRiBEAF1pO/l1E6LyTvHMHYFKMiqzwCEXZXf2L276hlQ1rYoHeWEP0hhO N03gwHiMs1xug6MeYhsdeyR0tqXb1N7XwnkpSwgtVOH2z+pCBx4= =9iOZ -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== A few more changes, notably: * handle new SAE (WPA3 authentication) status codes in the correct way * fix a while that should be an if instead, avoiding infinite loops * handle beacon filtering changing better ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 18:00:22 -07:00
Ioana-Ruxandra Stăncioi	88fab21c69	seg6_iptunnel: Refactor seg6_lwt_headroom out of uapi header Refactor the function seg6_lwt_headroom out of the seg6_iptunnel.h uapi header, because it is only used in seg6_iptunnel.c. Moreover, it is only used in the kernel code, as indicated by the "#ifdef __KERNEL__". Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Ioana-Ruxandra Stăncioi <stancioi@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 17:57:40 -07:00
Jianfeng Wang	730e700e2c	tcp: apply a floor of 1 for RTT samples from TCP timestamps For retransmitted packets, TCP needs to resort to using TCP timestamps for computing RTT samples. In the common case where the data and ACK fall in the same 1-millisecond interval, TCP senders with millisecond- granularity TCP timestamps compute a ca_rtt_us of 0. This ca_rtt_us of 0 propagates to rs->rtt_us. This value of 0 can cause performance problems for congestion control modules. For example, in BBR, the zero min_rtt sample can bring the min_rtt and BDP estimate down to 0, reduce snd_cwnd and result in a low throughput. It would be hard to mitigate this with filtering in the congestion control module, because the proper floor to apply would depend on the method of RTT sampling (using timestamp options or internally-saved transmission timestamps). This fix applies a floor of 1 for the RTT sample delta from TCP timestamps, so that seq_rtt_us, ca_rtt_us, and rs->rtt_us will be at least 1 * (USEC_PER_SEC / TCP_TS_HZ). Note that the receiver RTT computation in tcp_rcv_rtt_measure() and min_rtt computation in tcp_update_rtt_min() both already apply a floor of 1 timestamp tick, so this commit makes the code more consistent in avoiding this edge case of a value of 0. Signed-off-by: Jianfeng Wang <jfwang@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Kevin Yang <yyd@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 17:54:03 -07:00
Huang Guobin	fbc97de84e	tipc: Use is_broadcast_ether_addr() instead of memcmp() Using is_broadcast_ether_addr() instead of directly use memcmp() to determine if the ethernet address is broadcast address. spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Huang Guobin <huangguobin4@huawei.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 16:21:46 -07:00
David S. Miller	f2e0b29a9a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) UAF in chain binding support from previous batch, from Dan Carpenter. 2) Queue up delayed work to expire connections with no destination, from Andrew Sy Kim. 3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva. 4) Replace HTTP links with HTTPS, from Alexander A. Klimov. 5) Remove superfluous null header checks in ip6tables, from Gaurav Singh. 6) Add extended netlink error reporting for expression. 7) Report EEXIST on overlapping chain, set elements and flowtable devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 16:03:18 -07:00
Vincent Duvert	d0f6ba2ef2	appletalk: Fix atalk_proc_init() return path Add a missing return statement to atalk_proc_init so it doesn't return -ENOMEM when successful. This allows the appletalk module to load properly. Fixes: `e2bcd8b0ce` ("appletalk: use remove_proc_subtree to simplify procfs code") Link: https://www.downtowndougbrown.com/2020/08/hacking-up-a-fix-for-the-broken-appletalk-kernel-module-in-linux-5-1-and-newer/ Reported-by: Christopher KOBAYASHI <chris@disavowed.jp> Reported-by: Doug Brown <doug@downtowndougbrown.com> Signed-off-by: Vincent Duvert <vincent.ldev@duvert.net> [lukas: add missing tags] Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v5.1+ Cc: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:48:32 -07:00
Miaohe Lin	2f631133c4	net: Pass NULL to skb_network_protocol() when we don't care about vlan depth When we don't care about vlan depth, we could pass NULL instead of the address of a unused local variable to skb_network_protocol() as a param. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:38:31 -07:00
Miaohe Lin	c15fc199b3	net: Use __skb_pagelen() directly in skb_cow_data() In fact, skb_pagelen() - skb_headlen() is equal to __skb_pagelen(), use it directly to avoid unnecessary skb_headlen() call. Also fix the CHECK note of checkpatch.pl: Comparison to NULL could be written "!__pskb_pull_tail" Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:38:31 -07:00
Lorenzo Bianconi	622e32b7d4	net: gre: recompute gre csum for sctp over gre tunnels The GRE tunnel can be used to transport traffic that does not rely on a Internet checksum (e.g. SCTP). The issue can be triggered creating a GRE or GRETAP tunnel and transmitting SCTP traffic ontop of it where CRC offload has been disabled. In order to fix the issue we need to recompute the GRE csum in gre_gso_segment() not relying on the inner checksum. The issue is still present when we have the CRC offload enabled. In this case we need to disable the CRC offload if we require GRE checksum since otherwise skb_checksum() will report a wrong value. Fixes: `90017accff` ("sctp: Add GSO support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:29:44 -07:00
Nikolay Aleksandrov	fd65e5a95d	net: bridge: clear bridge's private skb space on xmit We need to clear all of the bridge private skb variables as they can be stale due to the packet being recirculated through the stack and then transmitted through the bridge device. Similar memset is already done on bridge's input. We've seen cases where proxyarp_replied was 1 on routed multicast packets transmitted through the bridge to ports with neigh suppress which were getting dropped. Same thing can in theory happen with the port isolation bit as well. Fixes: `821f1b21ca` ("bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:26:46 -07:00
Florent Fourcot	ae79dbf609	ipv6/addrconf: use a boolean to choose between UNREGISTER/DOWN "how" was used as a boolean. Change the type to bool, and improve variable name Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:24:14 -07:00
Florent Fourcot	d208a42a62	ipv6/addrconf: call addrconf_ifdown with consistent values Second parameter of addrconf_ifdown "how" is used as a boolean internally. It does not make sense to call it with something different of 0 or 1. This value is set to 2 in all git history. Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:24:14 -07:00
Eelco Chaudron	9bf24f594c	net: openvswitch: make masks cache size configurable This patch makes the masks cache size configurable, or with a size of 0, disable it. Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:17:48 -07:00
Eelco Chaudron	9d2f627b7e	net: openvswitch: add masks cache hit counter Add a counter that counts the number of masks cache hits, and export it through the megaflow netlink statistics. Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:17:48 -07:00
Gaurav Singh	71fed0bc86	ethtool: ethnl_set_linkmodes: remove redundant null check info cannot be NULL here since its being accessed earlier in the function: nlmsg_parse(info->nlhdr...). Remove this redundant NULL check. Signed-off-by: Gaurav Singh <gaurav1086@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:11:49 -07:00
Peilin Ye	9aba6c5b49	openvswitch: Prevent kernel-infoleak in ovs_ct_put_key() ovs_ct_put_key() is potentially copying uninitialized kernel stack memory into socket buffers, since the compiler may leave a 3-byte hole at the end of `struct ovs_key_ct_tuple_ipv4` and `struct ovs_key_ct_tuple_ipv6`. Fix it by initializing `orig` with memset(). Fixes: `9dd7f8907c` ("openvswitch: Add original direction conntrack tuple to sw_flow_key.") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:09:44 -07:00
wenxu	038ebb1a71	net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct When openvswitch conntrack offload with act_ct action. Fragment packets defrag in the ingress tc act_ct action and miss the next chain. Then the packet pass to the openvswitch datapath without the mru. The over mtu packet will be dropped in output action in openvswitch for over mtu. "kernel: net2: dropped over-mtu packet: 1528 > 1500" This patch add mru in the tc_skb_ext for adefrag and miss next chain situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru to the qdisc_skb_cb when the packet defrag. And When the chain miss, The mru is set to tc_skb_ext which can be got by ovs datapath. Fixes: `b57dc7c13e` ("net/sched: Introduce action ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:04:48 -07:00
Linus Torvalds	e4cbce4d13	The main changes in this cycle were: - Improve uclamp performance by using a static key for the fast path - Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for better power efficiency of RT tasks on battery powered devices. (The default is to maximize performance & reduce RT latencies.) - Improve utime and stime tracking accuracy, which had a fixed boundary of error, which created larger and larger relative errors as the values become larger. This is now replaced with more precise arithmetics, using the new mul_u64_u64_div_u64() helper in math64.h. - Improve the deadline scheduler, such as making it capacity aware - Improve frequency-invariant scheduling - Misc cleanups in energy/power aware scheduling - Add sched_update_nr_running tracepoint to track changes to nr_running - Documentation additions and updates - Misc cleanups and smaller fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oJDURHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ixLg//bqWzFlfWirvngTgDxDnplwUTyKXmMCcq R1IYhlyK2O5FxvhbRmdmW11W3yzyTPvgCs6Q/70negGaPNe2w1OxfxiK9NMKz5eu M1LoXas7pL5g7Pr/ZxxHk/8VqJLV4t9MkodiiInmV6lTaznT3sU6a/kpYQjJyFnG Tuu9jd6JhdRKmePDJnNmUBoGQ7JiOQDcX4HtkcQ3OA+An3624tmJzbW1yts+uj7J ZWo2EY60RfbA9MxQXGPOaR/nAjngWs4Q6tddAh10mftsPq1gR2iFUKju1d31MQt/ RHLdiqJf+AyUC4popKG7a+7ilCKMBwPociSreTJNPyEUQ1X4AM3vUVk4yjUoiDph k2WdsCF8/JRdhXg0NnrpPUqOaAbQj53EeXnitEb92E7WyTZgLOvAtpV//xZo6utp 2QHerfrQ9SoGQjz/ho78za5vQtV1x25yDhd+X4XV4QEhIy85G9/2JCpC/Kc/TXLf OO7A4X69XztKTEJhP60g8ldCPUe4N2vbh1vKY6oAD8AFQVVNZ6n7375/Qa//b0/k ++hcYkPc2EK97/aBFdvzDgqb7aUo7Mtn2ibke16sQU4szulaoRuAHQG4jdGKMwbD dk2VBoxyxeYFXWHsNneSe87+ha3sd0dSN0ul1EB/SlFrVELMvy634YXnMYGW8ima PzyPB0ezpuA= =PbO7 -----END PGP SIGNATURE----- Merge tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Improve uclamp performance by using a static key for the fast path - Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for better power efficiency of RT tasks on battery powered devices. (The default is to maximize performance & reduce RT latencies.) - Improve utime and stime tracking accuracy, which had a fixed boundary of error, which created larger and larger relative errors as the values become larger. This is now replaced with more precise arithmetics, using the new mul_u64_u64_div_u64() helper in math64.h. - Improve the deadline scheduler, such as making it capacity aware - Improve frequency-invariant scheduling - Misc cleanups in energy/power aware scheduling - Add sched_update_nr_running tracepoint to track changes to nr_running - Documentation additions and updates - Misc cleanups and smaller fixes * tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/doc: Factorize bits between sched-energy.rst & sched-capacity.rst sched/doc: Document capacity aware scheduling sched: Document arch_scale_*_capacity() arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSURE Documentation/sysctl: Document uclamp sysctl knobs sched/uclamp: Add a new sysctl to control RT default boost value sched/uclamp: Fix a deadlock when enabling uclamp static key sched: Remove duplicated tick_nohz_full_enabled() check sched: Fix a typo in a comment sched/uclamp: Remove unnecessary mutex_init() arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entry arch_topology, sched/core: Cleanup thermal pressure definition trace/events/sched.h: fix duplicated word linux/sched/mm.h: drop duplicated words in comments smp: Fix a potential usage of stale nr_cpus sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal sched: nohz: stop passing around unused "ticks" parameter. sched: Better document ttwu() sched: Add a tracepoint to track rq->nr_running ...	2020-08-03 14:58:38 -07:00
Dmitry Yakunin	21594c4408	bpf: Allow to specify ifindex for skb in bpf_prog_test_run_skb Now skb->dev is unconditionally set to the loopback device in current net namespace. But if we want to test bpf program which contains code branch based on ifindex condition (eg filters out localhost packets) it is useful to allow specifying of ifindex from userspace. This patch adds such option through ctx_in (__sk_buff) parameter. Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200803090545.82046-3-zeil@yandex-team.ru	2020-08-03 23:32:23 +02:00
Dmitry Yakunin	fa5cb548ce	bpf: Setup socket family and addresses in bpf_prog_test_run_skb Now it's impossible to test all branches of cgroup_skb bpf program which accesses skb->family and skb->{local,remote}_ip{4,6} fields because they are zeroed during socket allocation. This commit fills socket family and addresses from related fields in constructed skb. Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200803090545.82046-2-zeil@yandex-team.ru	2020-08-03 23:31:59 +02:00
Linus Torvalds	8f0cb6660a	These are the latest RCU bits for v5.9: - kfree_rcu updates - RCU tasks updates - Read-side scalability tests - SRCU updates - Torture-test updates - Documentation updates - Miscellaneous fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8n80ERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gauA/+NtuExW9V9cPDZ8AAp6x6QfoEIgqN4VEk pYuyP0+ZbmwH+h8z7qPqMrwxUHQnhef7gqtlWa7wj9MawbEbmqnA/3uivjX/3Aao bGMMXkqXppc6hgwktgLNk8vfq3LRVEH2P0i0I+Tymgxu3DCHSGRep4LWfdAS/q3z 4pe5JXqdMx+Qnfy/bsVxJTaJAncMq1LQNAtWY1TIwK8L8RmpXrj5dvuLKUr7q+zl P+BfXyrdX+x05TpmHHnI/bR3w9yASL32E0S3IaQYRRqH8TsUIGHWe13Ib6hKXXG5 j7W5KrsOgr0fQBxi+JW2fgGQkrua4o7yk4H2Ygj+Fi5RvP2uqNZdvXFAlP2cUMu/ 7Pg8+7kC6jKIrwpD03s9ZZzm0QN3jsCxFs2PEkkHMzjXbe1CI4tIkTH6ex1uvjR2 v3OhCIp6ypxpEIJbFQucia0iQ4NF+evKjqCvRkbepqQ096jg+CNFh0VG0Tp8XR+y Gk9B9oXvLLPMd6ah5CI9nLJKiMWVRV8mvvqspoblGo//+39ksh4mzxm865tFXYg4 C+DPJvKlY15Ib5eJ/xr8EZ/oS0K2sUF9sMYnK4P8QMhyTBMbpAZiljHYK+Wujt8I g/JCWxrEMv3LHPY9/guB5Nod/Qb4Jqqm9iE9qEX3MQxtt2O2nmmWd91pzFcUXlFU RDBWYJ63Okg= =rNhf -----END PGP SIGNATURE----- Merge tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: - kfree_rcu updates - RCU tasks updates - Read-side scalability tests - SRCU updates - Torture-test updates - Documentation updates - Miscellaneous fixes * tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits) torture: Remove obsolete "cd $KVM" torture: Avoid duplicate specification of qemu command torture: Dump ftrace at shutdown only if requested torture: Add kvm-tranform.sh script for qemu-cmd files torture: Add more tracing crib notes to kvm.sh torture: Improve diagnostic for KCSAN-incapable compilers torture: Correctly summarize build-only runs torture: Pass --kmake-arg to all make invocations rcutorture: Check for unwatched readers torture: Abstract out console-log error detection torture: Add a stop-run capability torture: Create qemu-cmd in --buildonly runs rcu/rcutorture: Replace 0 with false torture: Add --allcpus argument to the kvm.sh script torture: Remove whitespace from identify_qemu_vcpus output rcutorture: NULL rcu_torture_current earlier in cleanup code rcutorture: Handle non-statistic bang-string error messages torture: Set configfile variable to current scenario rcutorture: Add races with task-exit processing locktorture: Use true and false to assign to bool variables ...	2020-08-03 14:31:33 -07:00
Linus Torvalds	ab5c60b79a	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add support for allocating transforms on a specific NUMA Node - Introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY for storage users Algorithms: - Drop PMULL based ghash on arm64 - Fixes for building with clang on x86 - Add sha256 helper that does the digest in one go - Add SP800-56A rev 3 validation checks to dh Drivers: - Permit users to specify NUMA node in hisilicon/zip - Add support for i.MX6 in imx-rngc - Add sa2ul crypto driver - Add BA431 hwrng driver - Add Ingenic JZ4780 and X1000 hwrng driver - Spread IRQ affinity in inside-secure and marvell/cesa" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (157 commits) crypto: sa2ul - Fix inconsistent IS_ERR and PTR_ERR hwrng: core - remove redundant initialization of variable ret crypto: x86/curve25519 - Remove unused carry variables crypto: ingenic - Add hardware RNG for Ingenic JZ4780 and X1000 dt-bindings: RNG: Add Ingenic RNG bindings. crypto: caam/qi2 - add module alias crypto: caam - add more RNG hw error codes crypto: caam/jr - remove incorrect reference to caam_jr_register() crypto: caam - silence .setkey in case of bad key length crypto: caam/qi2 - create ahash shared descriptors only once crypto: caam/qi2 - fix error reporting for caam_hash_alloc crypto: caam - remove deadcode on 32-bit platforms crypto: ccp - use generic power management crypto: xts - Replace memcpy() invocation with simple assignment crypto: marvell/cesa - irq balance crypto: inside-secure - irq balance crypto: ecc - SP800-56A rev 3 local public key validation crypto: dh - SP800-56A rev 3 local public key validation crypto: dh - check validity of Z before export lib/mpi: Add mpi_sub_ui() ...	2020-08-03 10:40:14 -07:00
Jakub Kicinski	966e505976	udp_tunnel: add the ability to hard-code IANA VXLAN mlx5 has the IANA VXLAN port (4789) hard coded by the device, instead of being added dynamically when tunnels are created. To support this add a workaround flag to struct udp_tunnel_nic_info. Skipping updates for the port is fairly trivial, dumping the hard coded port via ethtool requires some code duplication. The port is not a part of any real table, we dump it in a special table which has no tunnel types supported and only one entry. This is the last known workaround / hack needed to convert all drivers to the new infra. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-08-03 10:13:54 -07:00
Loic Poulain	0b91111fb1	mac80211: Do not report beacon loss if beacon filtering enabled mac80211.h says: Beacon filter support is advertised with the IEEE80211_VIF_BEACON_FILTER interface capability. The driver needs to enable beacon filter support whenever power save is enabled, that is IEEE80211_CONF_PS is set. When power save is enabled, the stack will not check for beacon loss and the driver needs to notify about loss of beacons with ieee80211_beacon_loss(). Some controllers may want to dynamically enable the beacon filter capabilities on power save entry (CONF_PS) and disable it on exit. This is the case for the wcn36xx driver which only supports beacon filtering in PS mode (no CONNECTION_MONITOR support). When the mac80211 beacon monitor timer expires, the beacon filter flag must be checked again in case it as been changed in between (e.g. vif moved to PS mode). Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/1592471863-31402-1-git-send-email-loic.poulain@linaro.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-03 13:02:06 +02:00
Roi Dayan	4203b19c27	netfilter: flowtable: Set offload timeout when adding flow On heavily loaded systems the GC can take time to go over all existing conns and reset their timeout. At that time other calls like from nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as expired. To fix this when we set the offload bit we should also reset the timeout instead of counting on GC to finish first iteration over all conns before the initial timeout. Fixes: `90964016e5` ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-03 12:37:24 +02:00
Roi Dayan	73f9407b3e	netfilter: conntrack: Move nf_ct_offload_timeout to header file To be used by callers from other modules. [ Rename DAY to NF_CT_DAY to avoid possible symbol name pollution issue --Pablo ] Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-03 12:36:47 +02:00
Alexander A. Klimov	94f17c00d6	libceph: replace HTTP links with HTTPS ones Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w\|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. [ idryomov: Do the same for the CRUSH paper and replace ceph.newdream.net with ceph.io. ] Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-08-03 11:05:26 +02:00
Jeff Layton	042f649810	libceph: just have osd_req_op_init() return a pointer The caller can just ignore the return. No need for this wrapper that just casts the other function to void. [ idryomov: argument alignment ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-08-03 11:05:25 +02:00
Johannes Berg	5981fe5b05	mac80211: fix misplaced while instead of if This never was intended to be a 'while' loop, it should've just been an 'if' instead of 'while'. Fix this. I noticed this while applying another patch from Ben that intended to fix a busy loop at this spot. Cc: stable@vger.kernel.org Fixes: `b16798f5b9` ("mac80211: mark station unauthorized before key removal") Reported-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20200803110209.253009ae41ff.I3522aad099392b31d5cf2dcca34cbac7e5832dde@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-03 11:03:14 +02:00
Ilya Dryomov	6e6f0f0116	libceph: dump class and method names on method calls Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-08-03 11:03:01 +02:00
Ilya Dryomov	5133ba8f15	libceph: use target_copy() in send_linger() Instead of copying just oloc, oid and flags, copy the entire linger target. This is more for consistency than anything else, as send_linger() -> submit_request() -> __submit_request() sends the request regardless of what calc_target() says (i.e. both on CALC_TARGET_NO_ACTION and CALC_TARGET_NEED_RESEND). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2020-08-03 11:03:01 +02:00
Miaohe Lin	3b1648f109	nl80211: use eth_zero_addr() to clear mac address Use eth_zero_addr() to clear mac address instead of memset(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/1596273349-24333-1-git-send-email-linmiaohe@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-03 10:56:22 +02:00
Miaohe Lin	47d76e3190	mac80211: use eth_zero_addr() to clear mac address Use eth_zero_addr() to clear mac address instead of memset(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/1596273158-24183-1-git-send-email-linmiaohe@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-03 10:56:04 +02:00
John Crispin	6628d00116	mac8211: fix struct initialisation Sparse showed up with the following error. net/mac80211/agg-rx.c:480:43: warning: Using plain integer as NULL pointer Fixes: `2ab4587675` (mac80211: add support for the ADDBA extension element) Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200803084540.179908-1-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-03 10:55:17 +02:00
Jouni Malinen	4e56cde15f	mac80211: Handle special status codes in SAE commit SAE authentication has been extended with H2E (IEEE 802.11 REVmd) and PK (WFA) options. Those extensions use special status code values in the SAE commit messages (Authentication frame with transaction sequence number 1) to identify which extension is in use. mac80211 was interpreting those new values as the AP denying authentication and that resulted in failure to complete SAE authentication in some cases. Fix this by adding exceptions for the new status code values 126 and 127. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Link: https://lore.kernel.org/r/20200731183830.18735-1-jouni@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-08-03 10:54:54 +02:00
Pablo Neira Ayuso	77a92189ec	netfilter: nf_tables: report EEXIST on overlaps Replace EBUSY by EEXIST in the following cases: - If the user adds a chain with a different configuration such as different type, hook and priority. - If the user adds a non-base chain that clashes with an existing basechain. - If the user adds a { key : value } mapping element and the key exists but the value differs. - If the device already belongs to an existing flowtable. User describe that this error reporting is confusing: - https://bugzilla.netfilter.org/show_bug.cgi?id=1176 - https://bugzilla.netfilter.org/show_bug.cgi?id=1413 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-02 19:53:45 +02:00
David S. Miller	bd0b33b248	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Resolved kernel/bpf/btf.c using instructions from merge commit `69138b34a7` Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-02 01:02:12 -07:00
Andrii Nakryiko	73b11c2ab0	bpf: Add support for forced LINK_DETACH command Add LINK_DETACH command to force-detach bpf_link without destroying it. It has the same behavior as auto-detaching of bpf_link due to cgroup dying for bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case, bpf_link is still a valid kernel object, but is defuncts and doesn't hold BPF program attached to corresponding BPF hook. This functionality allows users with enough access rights to manually force-detach attached bpf_link without killing respective owner process. This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly re-using existing link release handling code. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200731182830.286260-2-andriin@fb.com	2020-08-01 20:38:28 -07:00
Florian Westphal	78470d9d0d	netfilter: nft_meta: fix iifgroup matching iifgroup matching erroneously checks the output interface. Fixes: `8724e819cc` ("netfilter: nft_meta: move all interface related keys to helper") Reported-by: Demi M. Obenour <demiobenour@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-02 04:10:20 +02:00
Pablo Neira Ayuso	83d9dcba06	netfilter: nf_tables: extended netlink error reporting for expressions This patch extends `36dd1bcc07` ("netfilter: nf_tables: initial support for extended ACK reporting") to include netlink extended error reporting for expressions. This allows userspace to identify what rule expression is triggering the error. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-02 03:23:20 +02:00
Linus Torvalds	ac3a0c8472	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Encap offset calculation is incorrect in esp6, from Sabrina Dubroca. 2) Better parameter validation in pfkey_dump(), from Mark Salyzyn. 3) Fix several clang issues on powerpc in selftests, from Tanner Love. 4) cmsghdr_from_user_compat_to_kern() uses the wrong length, from Al Viro. 5) Out of bounds access in mlx5e driver, from Raed Salem. 6) Fix transfer buffer memleak in lan78xx, from Johan Havold. 7) RCU fixups in rhashtable, from Herbert Xu. 8) Fix ipv6 nexthop refcnt leak, from Xiyu Yang. 9) vxlan FDB dump must be done under RCU, from Ido Schimmel. 10) Fix use after free in mlxsw, from Ido Schimmel. 11) Fix map leak in HASH_OF_MAPS bpf code, from Andrii Nakryiko. 12) Fix bug in mac80211 Tx ack status reporting, from Vasanthakumar Thiagarajan. 13) Fix memory leaks in IPV6_ADDRFORM code, from Cong Wang. 14) Fix bpf program reference count leaks in mlx5 during mlx5e_alloc_rq(), from Xin Xiong. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits) vxlan: fix memleak of fdb rds: Prevent kernel-infoleak in rds_notify_queue_get() net/sched: The error lable position is corrected in ct_init_module net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq net/mlx5e: E-Switch, Specify flow_source for rule with no in_port net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring net/mlx5e: CT: Support restore ipv6 tunnel net: gemini: Fix missing clk_disable_unprepare() in error path of gemini_ethernet_port_probe() ionic: unlock queue mutex in error path atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent net: ethernet: mtk_eth_soc: fix MTU warnings net: nixge: fix potential memory leak in nixge_probe() devlink: ignore -EOPNOTSUPP errors on dumpit rxrpc: Fix race between recvmsg and sendmsg on immediate call failure MAINTAINERS: Replace Thor Thayer as Altera Triple Speed Ethernet maintainer selftests/bpf: fix netdevsim trap_flow_action_cookie read ipv6: fix memory leaks on IPV6_ADDRFORM path net/bpfilter: Initialize pos in __bpfilter_process_sockopt igb: reinit_locked() should be called with rtnl_lock e1000e: continue to init PHY even when failed to disable ULP ...	2020-08-01 16:47:24 -07:00
Florian Westphal	7126bd5c8b	mptcp: fix syncookie build error on UP kernel test robot says: net/mptcp/syncookies.c: In function 'mptcp_join_cookie_init': include/linux/kernel.h:47:38: warning: division by zero [-Wdiv-by-zero] #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr)) I forgot that spinock_t size is 0 on UP, so ARRAY_SIZE cannot be used. Fixes: `9466a1cceb` ("mptcp: enable JOIN requests even if cookies are in use") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-01 11:52:55 -07:00
Brian Vazquez	8b66a6fd34	fib: fix another fib_rules_ops indirect call wrapper problem It turns out that on commit `41d707b733` ("fib: fix fib_rules_ops indirect calls wrappers") I forgot to include the case when CONFIG_IP_MULTIPLE_TABLES is not set. Fixes: `41d707b733` ("fib: fix fib_rules_ops indirect calls wrappers") Reported-by: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Brian Vazquez <brianvv@google.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-01 11:47:39 -07:00
Eric Dumazet	0e8642cf36	tcp: fix build fong CONFIG_MPTCP=n Fixes these errors: net/ipv4/syncookies.c: In function 'tcp_get_cookie_sock': net/ipv4/syncookies.c:216:19: error: 'struct tcp_request_sock' has no member named 'drop_req' 216 \| if (tcp_rsk(req)->drop_req) { \| ^~ net/ipv4/syncookies.c: In function 'cookie_tcp_reqsk_alloc': net/ipv4/syncookies.c:289:27: warning: unused variable 'treq' [-Wunused-variable] 289 \| struct tcp_request_sock treq; \| ^~~~ make[3]: [scripts/Makefile.build:280: net/ipv4/syncookies.o] Error 1 make[3]: * Waiting for unfinished jobs.... Fixes: `9466a1cceb` ("mptcp: enable JOIN requests even if cookies are in use") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-01 11:46:29 -07:00
David S. Miller	6f3de75cdf	We have a number of changes * code cleanups and fixups as usual * AQL & internal TXQ improvements from Felix * some mesh 802.1X support bits * some injection improvements from Mathy of KRACK fame, so we'll see what this results in ;-) * some more initial S1G supports bits, this time (some of?) the userspace APIs -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl8kTFUACgkQB8qZga/f l8SQdQ/+MfiHePQzCFcnlXCzBa6wwaPQz5XbbWMS+1yVN5WECHomtF8Hb9YA5ITd Iw6HXcDzYk1bt1EqIPAQNQ0t2VfEvC+HtQKxL1LAXKYrlpD4JoL7hU2FK6CPA6/e h8qwrrhOeexeen+0r6LEslxFuVKSbXuBEIszsWkQNmUOoCFWdvbyD7kPNfoQ/wcu 1TYR51Z7TiiyadptP5R0PSxlid1N1qSbnQ9Rq4ow4lfEwCfNb0ksv3eb6paQOeGX b5MIDpZ9FHVhWAMjzt7b7KgbmPreoHbZ+bSv0YgR+g32yLIpcH3wRuPH/eTIV1uG xTQtvHjnOtF7HTJzAzaCY4RH0ozf0rwTVwtcUxl/qVcV/JYXLKXnTOrAidj06gTr Ic5eaUaFALRX82wnmQbjkIezolmdQXom2VEg1yPT1azXnhi/bWQlDW3XcSMAfd8o E+HiTaGo8jD+tll1kvh5sLokecnmnKFgo5dPqlVOQ5qbJKOiLC/3g7/yjSdCKauG COQN0b1t+lHuvu8I2tdp+S1YI1sCES6OdHGKdQQiem5D66mUBaiYN+YuqGnvRF75 8ZU9H5Mn9YRT2n6KEGsVfSnrOeqN15ZTRKepkwYgRzPtAGmVY+iNWwhE6I/clQgQ pt9jzq4jZsFMSO1flBYrFUy+fB4IYYki+VBSk5RFq3Yr8ElN4J0= =FR6T -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2020-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a number of changes * code cleanups and fixups as usual * AQL & internal TXQ improvements from Felix * some mesh 802.1X support bits * some injection improvements from Mathy of KRACK fame, so we'll see what this results in ;-) * some more initial S1G supports bits, this time (some of?) the userspace APIs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 18:51:40 -07:00
Roopa Prabhu	829eb208e8	rtnetlink: add support for protodown reason netdev protodown is a mechanism that allows protocols to hold an interface down. It was initially introduced in the kernel to hold links down by a multihoming protocol. There was also an attempt to introduce protodown reason at the time but was rejected. protodown and protodown reason is supported by almost every switching and routing platform. It was ok for a while to live without a protodown reason. But, its become more critical now given more than one protocol may need to keep a link down on a system at the same time. eg: vrrp peer node, port security, multihoming protocol. Its common for Network operators and protocol developers to look for such a reason on a networking box (Its also known as errDisable by most networking operators) This patch adds support for link protodown reason attribute. There are two ways to maintain protodown reasons. (a) enumerate every possible reason code in kernel - A protocol developer has to make a request and have that appear in a certain kernel version (b) provide the bits in the kernel, and allow user-space (sysadmin or NOS distributions) to manage the bit-to-reasonname map. - This makes extending reason codes easier (kind of like the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/) This patch takes approach (b). a few things about the patch: - It treats the protodown reason bits as counter to indicate active protodown users - Since protodown attribute is already an exposed UAPI, the reason is not enforced on a protodown set. Its a no-op if not used. the patch follows the below algorithm: - presence of reason bits set indicates protodown is in use - user can set protodown and protodown reason in a single or multiple setlink operations - setlink operation to clear protodown, will return -EBUSY if there are active protodown reason bits - reason is not included in link dumps if not used example with patched iproute2: $cat /etc/iproute2/protodown_reasons.d/r.conf 0 mlag 1 evpn 2 vrrp 3 psecurity $ip link set dev vxlan0 protodown on protodown_reason vrrp on $ip link set dev vxlan0 protodown_reason mlag on $ip link show 14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp> $ip link set dev vxlan0 protodown_reason mlag off $ip link set dev vxlan0 protodown off protodown_reason vrrp off Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 18:49:16 -07:00
David S. Miller	69138b34a7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2020-07-31 The following pull-request contains BPF updates for your net tree. We've added 5 non-merge commits during the last 21 day(s) which contain a total of 5 files changed, 126 insertions(+), 18 deletions(-). The main changes are: 1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko. 2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no btf_vmlinux is available, from Peilin Ye. 3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig. 4) Fix a cgroup sockopt verifier test by specifying expected attach type, from Jean-Philippe Brucker. Note that when net gets merged into net-next later on, there is a small merge conflict in kernel/bpf/btf.c between commit `5b801dfb7f` ("bpf: Fix NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree and commit `138b9a0511` ("bpf: Remove btf_id helpers resolving") from the net-next tree. Resolve as follows: remove the old hunk with the __btf_resolve_helper_id() function. Change the btf_resolve_helper_id() so it actually tests for a NULL btf_vmlinux and bails out: int btf_resolve_helper_id(struct bpf_verifier_log log, const struct bpf_func_proto fn, int arg) { int id; if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID \|\| !btf_vmlinux) return -EINVAL; id = fn->btf_id[arg]; if (!id \|\| id > btf_vmlinux->nr_types) return -EINVAL; return id; } Let me know if you run into any others issues (CC'ing Jiri Olsa so he's in the loop with regards to merge conflict resolution). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 17:19:47 -07:00
David S. Miller	8d46215a1f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-07-31 1) Fix policy matching with mark and mask on userspace interfaces. From Xin Long. 2) Several fixes for the new ESP in TCP encapsulation. From Sabrina Dubroca. 3) Fix crash when the hold queue is used. The assumption that xdst->path and dst->child are not a NULL pointer only if dst->xfrm is not a NULL pointer is true with the exception of using the hold queue. Fix this by checking for hold queue usage before dereferencing xdst->path or dst->child. 4) Validate pfkey_dump parameter before sending them. From Mark Salyzyn. 5) Fix the location of the transport header with ESP in UDPv6 encapsulation. From Sabrina Dubroca. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 17:10:53 -07:00
Yousuk Seung	48040793fa	tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports the earliest departure time(EDT) of the timestamped skb. By tracking EDT values of the skb from different timestamps, we can observe when and how much the value changed. This allows to measure the precise delay injected on the sender host e.g. by a bpf-base throttler. Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 17:00:44 -07:00
Florian Westphal	9466a1cceb	mptcp: enable JOIN requests even if cookies are in use JOIN requests do not work in syncookie mode -- for HMAC validation, the peers nonce and the mptcp token (to obtain the desired connection socket the join is for) are required, but this information is only present in the initial syn. So either we need to drop all JOIN requests once a listening socket enters syncookie mode, or we need to store enough state to reconstruct the request socket later. This adds a state table (1024 entries) to store the data present in the MP_JOIN syn request and the random nonce used for the cookie syn/ack. When a MP_JOIN ACK passed cookie validation, the table is consulted to rebuild the request socket from it. An alternate approach would be to "cancel" syn-cookie mode and force MP_JOIN to always use a syn queue entry. However, doing so brings the backlog over the configured queue limit. v2: use req->syncookie, not (removed) want_cookie arg Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:55:32 -07:00
Florian Westphal	6fc8c827dd	tcp: syncookies: create mptcp request socket for ACK cookies with MPTCP option If SYN packet contains MP_CAPABLE option, keep it enabled. Syncokie validation and cookie-based socket creation is changed to instantiate an mptcp request sockets if the ACK contains an MPTCP connection request. Rather than extend both cookie_v4/6_check, add a common helper to create the (mp)tcp request socket. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:55:32 -07:00
Florian Westphal	c83a47e50d	mptcp: subflow: add mptcp_subflow_init_cookie_req helper Will be used to initialize the mptcp request socket when a MP_CAPABLE request was handled in syncookie mode, i.e. when a TCP ACK containing a MP_CAPABLE option is a valid syncookie value. Normally (non-cookie case), MPTCP will generate a unique 32 bit connection ID and stores it in the MPTCP token storage to be able to retrieve the mptcp socket for subflow joining. In syncookie case, we do not want to store any state, so just generate the unique ID and use it in the reply. This means there is a small window where another connection could generate the same token. When Cookie ACK comes back, we check that the token has not been registered in the mean time. If it was, the connection needs to fall back to TCP. Changes in v2: - use req->syncookie instead of passing 'want_cookie' arg to ->init_req() (Eric Dumazet) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:55:32 -07:00
Florian Westphal	08b8d08098	mptcp: rename and export mptcp_subflow_request_sock_ops syncookie code path needs to create an mptcp request sock. Prepare for this and add mptcp prefix plus needed export of ops struct. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:55:32 -07:00
Florian Westphal	78d8b7bc4b	mptcp: subflow: split subflow_init_req When syncookie support is added, we will need to add a variant of subflow_init_req() helper. It will do almost same thing except that it will not compute/add a token to the mptcp token tree. To avoid excess copy&paste, this commit splits away part of the code into a new helper, __subflow_init_req, that can then be re-used from the 'no insert' function added in a followup change. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:55:32 -07:00
Florian Westphal	535fb8152f	mptcp: token: move retry to caller Once syncookie support is added, no state will be stored anymore when the syn/ack is generated in syncookie mode. When the ACK comes back, the generated key will be taken from the TCP ACK, the token is re-generated and inserted into the token tree. This means we can't retry with a new key when the token is already taken in the syncookie case. Therefore, move the retry logic to the caller to prepare for syncookie support in mptcp. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:55:32 -07:00
Florian Westphal	f8ace8d915	tcp: rename request_sock cookie_ts bit to syncookie Nowadays output function has a 'synack_type' argument that tells us when the syn/ack is emitted via syncookies. The request already tells us when timestamps are supported, so check both to detect special timestamp for tcp option encoding is needed. We could remove cookie_ts altogether, but a followup patch would otherwise need to adjust function signatures to pass 'want_cookie' to mptcp core. This way, the 'existing' bit can be used. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:55:32 -07:00
Peilin Ye	bbc8a99e95	rds: Prevent kernel-infoleak in rds_notify_queue_get() rds_notify_queue_get() is potentially copying uninitialized kernel stack memory to userspace since the compiler may leave a 4-byte hole at the end of `cmsg`. In 2016 we tried to fix this issue by doing `= { 0 };` on `cmsg`, which unfortunately does not always initialize that 4-byte hole. Fix it by using memset() instead. Cc: stable@vger.kernel.org Fixes: `f037590fff` ("rds: fix a leak of kernel memory") Fixes: `bdbe6fbc6a` ("RDS: recv.c") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:52:48 -07:00
Gustavo A. R. Silva	77aec5e1c4	net/sched: cls_u32: Use struct_size() helper Make use of the struct_size() helper, in multiple places, instead of an open-coded version in order to avoid any potential type mistakes and protect against potential integer overflows. Also, remove unnecessary object identifier size. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:50:39 -07:00
Andy Shevchenko	cc0b065fd5	hsr: Use %pM format specifier for MAC addresses Convert to %pM instead of using custom code. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:46:26 -07:00
Miaohe Lin	9fc95f50ee	net: Pass NULL to skb_network_protocol() when we don't care about vlan depth When we don't care about vlan depth, we could pass NULL instead of the address of a unused local variable to skb_network_protocol() as a param. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:42:30 -07:00
liujian	8c5c51f5ca	net/sched: The error lable position is corrected in ct_init_module Exchange the positions of the err_tbl_init and err_register labels in ct_init_module function. Fixes: `c34b961a24` ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: liujian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:35:39 -07:00
David S. Miller	83a33b2487	bluetooth: sco: Fix sockptr reference. net/bluetooth/sco.c: In function ‘sco_sock_setsockopt’: net/bluetooth/sco.c:862:3: error: cannot convert to a pointer type 862 \| if (get_user(opt, (u32 __user *)optval)) { \| ^~ Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 16:15:03 -07:00
David S. Miller	4bb540dbe4	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-07-31 Here's the main bluetooth-next pull request for 5.9: - Fix firmware filenames for Marvell chipsets - Several suspend-related fixes - Addedd mgmt commands for runtime configuration - Multiple fixes for Qualcomm-based controllers - Add new monitoring feature for mgmt - Fix handling of legacy cipher (E4) together with security level 4 - Add support for Realtek 8822CE controller - Fix issues with Chinese controllers using fake VID/PID values - Multiple other smaller fixes & improvements ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-31 15:11:52 -07:00
Florian Westphal	ffe8923f10	netfilter: nft_compat: make sure xtables destructors have run Pablo Neira found that after recent update of xt_IDLETIMER the iptables-nft tests sometimes show an error. He tracked this down to the delayed cleanup used by nf_tables core: del rule (transaction A) add rule (transaction B) Its possible that by time transaction B (both in same netns) runs, the xt target destructor has not been invoked yet. For native nft expressions this is no problem because all expressions that have such side effects make sure these are handled from the commit phase, rather than async cleanup. For nft_compat however this isn't true. Instead of forcing synchronous behaviour for nft_compat, keep track of the number of outstanding destructor calls. When we attempt to create a new expression, flush the cleanup worker to make sure destructors have completed. With lots of help from Pablo Neira. Reported-by: Pablo Neira Ayso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-31 19:28:00 +02:00
Herbert Xu	075f77324f	Bluetooth: Remove CRYPTO_ALG_INTERNAL flag The flag CRYPTO_ALG_INTERNAL is not meant to be used outside of the Crypto API. It isn't needed here anyway. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-07-31 16:42:04 +03:00
Marcel Holtmann	79bf118957	Bluetooth: Increment management interface revision Increment the mgmt revision due to the recently added new commands. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-07-31 16:41:09 +03:00
Johannes Berg	c8ad010665	mac80211: warn only once in check_sdata_in_driver() at each caller Ben Greear has repeatedly reported in the past (for a few years probably) that this triggers repeatedly in certain scenarios. Make this a macro so that each callsite can trigger the warning only once - that will still give us an idea of what's going on and what paths can reach it, but avoids being too noisy. Link: https://lore.kernel.org/r/20200730155212.06fd3a95dbfb.I0b16829aabfaf5f642bce401502a29d16e2dd444@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:02 +02:00
Chung-Hsien Hsu	f96622749a	nl80211: support 4-way handshake offloading for WPA/WPA2-PSK in AP mode Let drivers advertise support for AP-mode WPA/WPA2-PSK 4-way handshake offloading with a new NL80211_EXT_FEATURE_4WAY_HANDSHAKE_AP_PSK flag. Extend use of NL80211_ATTR_PMK attribute indicating it might be passed as part of NL80211_CMD_START_AP command, and contain the PSK (which is the PMK, hence the name). The driver is assumed to handle the 4-way handshake by itself in this case, instead of relying on userspace. Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com> Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com> Link: https://lore.kernel.org/r/20200623134938.39997-2-chi-hsien.lin@cypress.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:02 +02:00
Johannes Berg	75e6b594bb	cfg80211: invert HE BSS color 'disabled' to 'enabled' This is in fact 'disabled' in the spec, but there it's in a place where that actually makes sense. In our internal data structures, it doesn't really make sense, and in fact the previous commit just fixed a bug in that area. Make this safer by inverting the polarity from 'disabled' to 'enabled'. Link: https://lore.kernel.org/r/20200730130051.5d8399545bd9.Ie62fdcd1a6cd9c969315bc124084a494ca6c8df3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:02 +02:00
Felix Fietkau	c5d1686b31	mac80211: add a function for running rx without passing skbs to the stack This can be used to run mac80211 rx processing on a batch of frames in NAPI poll before passing them to the network stack in a large batch. This can improve icache footprint, or it can be used to pass frames via netif_receive_skb_list. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200726110611.46886-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:01 +02:00
Mathy Vanhoef	cb17ed29a7	mac80211: parse radiotap header when selecting Tx queue Already parse the radiotap header in ieee80211_monitor_select_queue. In a subsequent commit this will allow us to add a radiotap flag that influences the queue on which injected packets will be sent. This also fixes the incomplete validation of the injected frame in ieee80211_monitor_select_queue: currently an out of bounds memory access may occur in in the called function ieee80211_select_queue_80211 if the 802.11 header is too small. Note that in ieee80211_monitor_start_xmit the radiotap header is parsed again, which is necessairy because ieee80211_monitor_select_queue is not always called beforehand. Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20200723100153.31631-6-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:01 +02:00
Mathy Vanhoef	08aca29aa8	mac80211: remove unused flags argument in transmit functions The flags argument in transmit functions is no longer being used and can be removed. Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20200723100153.31631-5-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:01 +02:00
Mathy Vanhoef	2b3dab1353	mac80211: use same flag everywhere to avoid sequence number overwrite Use the IEEE80211_TX_CTRL_NO_SEQNO flag in ieee80211_tx_info to mark probe requests whose sequence number must not be overwritten. This provides consistency with the radiotap flag that can be set to indicate that the sequence number of an injected frame should not be overwritten. Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20200723100153.31631-4-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:01 +02:00
Mathy Vanhoef	29c3e95f79	mac80211: do not overwrite the sequence number if requested Check if the Tx control flag is set to prevent sequence number overwrites, and if so, do not assign a new sequence number to the transmitted frame. Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20200723100153.31631-3-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:01 +02:00
Mathy Vanhoef	e02281e7a5	mac80211: add radiotap flag to prevent sequence number overwrite The radiotap specification contains a flag to indicate that the sequence number of an injected frame should not be overwritten. Parse this flag and define and set a corresponding Tx control flag. Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20200723100153.31631-2-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:00 +02:00
Mathy Vanhoef	1df2bdba52	mac80211: never drop injected frames even if normally not allowed In ieee80211_tx_dequeue there is a check to see if the dequeued frame is allowed in the current state. Injected frames that are normally not allowed are being be dropped here. Fix this by checking if a frame was injected and if so always allowing it. Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20200723100153.31631-1-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:27:00 +02:00
P Praneesh	322cd27c06	cfg80211/mac80211: avoid bss color setting in non-HE modes Adding bss-color configuration is only valid in HE mode. Earlier we have enabled it by default, irrespective of capabilities/mode. Fix that. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Signed-off-by: P Praneesh <ppranees@codeaurora.org> Link: https://lore.kernel.org/r/1594262781-21444-1-git-send-email-ppranees@codeaurora.org [fix up commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:25:12 +02:00
Felix Fietkau	180ac48ee6	mac80211: calculate skb hash early when using itxq This avoids flow separation issues when using software encryption. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200726130947.88145-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:24 +02:00
Felix Fietkau	3ff901cb5d	mac80211: improve AQL tx airtime estimation AQL does not take into account that most HT/VHT/HE traffic is A-MPDU aggregated. Because of that, the per-packet airtime overhead is vastly overestimated. Improve it by assuming an average aggregation length of 16 for non-legacy traffic if not using the VO AC queue. This should improve performance with high data rates, especially with multiple stations Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200724182816.18678-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:24 +02:00
Markus Theil	1303a51c24	cfg80211/mac80211: add connected to auth server to station info This patch adds the necessary bits to later query the auth server flag for every peer from iw. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200611140238.427461-2-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:24 +02:00
Markus Theil	184eebe664	cfg80211/mac80211: add connected to auth server to meshconf Besides information about num of peerings and gate connectivity, the mesh formation byte also contains a flag for authentication server connectivity, that currently cannot be set in the mesh conf. This patch adds this capability, which is necessary to implement 802.1X authentication in mesh mode. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200611140238.427461-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:24 +02:00
Linus Lüssing	e3718a6114	cfg80211/mac80211: add mesh_param "mesh_nolearn" to skip path discovery Currently, before being able to forward a packet between two 802.11s nodes, both a PLINK handshake is performed upon receiving a beacon and then later a PREQ/PREP exchange for path discovery is performed on demand upon receiving a data frame to forward. When running a mesh protocol on top of an 802.11s interface, like batman-adv, we do not need the multi-hop mesh routing capabilities of 802.11s and usually set mesh_fwding=0. However, even with mesh_fwding=0 the PREQ/PREP path discovery is still performed on demand. Even though in this scenario the next hop PREQ/PREP will determine is always the direct 11s neighbor node. The new mesh_nolearn parameter allows to skip the PREQ/PREP exchange in this scenario, leading to a reduced delay, reduced packet buffering and simplifies HWMP in general. mesh_nolearn is still rather conservative in that if the packet destination is not a direct 11s neighbor, it will fall back to PREQ/PREP path discovery. For normal, multi-hop 802.11s mesh routing it is usually not advisable to enable mesh_nolearn as a transmission to a direct but distant neighbor might be worse than reaching that same node via a more robust / higher throughput etc. multi-hop path. Cc: Sven Eckelmann <sven@narfation.org> Cc: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <ll@simonwunderlich.de> Link: https://lore.kernel.org/r/20200617073034.26149-1-linus.luessing@c0d3.blue [fix nl80211 policy to range 0/1 only] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:23 +02:00
Emmanuel Grumbach	2f1805ea20	cfg80211: allow the low level driver to flush the BSS table The low level driver adds its own opaque information in the BSS table in the cfg80211_bss structure. The low level driver may need to signal that this information is no longer relevant and needs to be recreated. Add an API to allow the low level driver to do that. iwlwifi needs this because it keeps there an information about the firmware's internal clock. This is kept in mac80211's struct ieee80211_bss::sync_device_ts. This information is populated while we scan, we add the internal firmware's clock to each beacon which allows us to program the firmware correctly after association so that it'll know when (in terms of its internal clock) the DTIM and TBTT will happen. When the firmware is reset this internal clock is reset as well and ieee80211_bss::sync_device_ts is no longer accurate. iwlwifi will call this new API any time the firmware is started. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20200625111524.3992-1-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:23 +02:00
Gustavo A. R. Silva	fc0561dc6a	mac80211: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20200707204548.GA9320@embeddedor Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:23 +02:00
Christophe JAILLET	504776be46	nl80211: Simplify error handling path in 'nl80211_trigger_scan()' Re-write the end of 'nl80211_trigger_scan()' with a more standard, easy to understand and future proof version. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20200712173551.274448-1-christophe.jaillet@wanadoo.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:23 +02:00
Christophe JAILLET	8328685682	nl80211: Remove a misleading label in 'nl80211_trigger_scan()' Since commit `5fe231e873` ("cfg80211: vastly simplify locking"), the 'unlock' label at the end of 'nl80211_trigger_scan()' is useless and misleading, because nothing is unlocked there. Direct return can be used instead of 'err = -<error code>; goto unlock;' construction. Remove this label and simplify code accordingly. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20200712173539.274395-1-christophe.jaillet@wanadoo.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:23 +02:00
Julian Squires	9c167b2ddc	cfg80211: allow vendor dumpit to terminate by returning 0 nl80211 vendor netlink dumpit, like netlink_callback->dump, should signal successful completion by returning 0. Currently, that will just cause dumpit to be called again, possibly many times until an error occurs. Since skb->len is never going to be 0 by the time dumpit is called, the only way for dumpit to signal completion is by returning an error. If it returns a positive value, the current message is cancelled, but that positive value is returned and nl80211_vendor_cmd_dump gets called again. Fix that by passing a return value of 0 through. Signed-off-by: Julian Squires <julian@cipht.net> Link: https://lore.kernel.org/r/20200720145033.401307-1-julian@cipht.net [reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:22 +02:00
Colin Ian King	f2a0c18759	mac80211: remove the need for variable rates_idx Currently rates_idx is being initialized with the value -1 and this value is never read so the initialization is redundant and can be removed. The next time the variable is used it is assigned a value that is returned a few statements later. Just return i - 1 and remove the need for rates_idx. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200722153830.959010-1-colin.king@canonical.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:22 +02:00
Thomas Pedersen	df78a0c0b6	nl80211: S1G band and channel definitions Gives drivers the definitions needed to advertise support for S1G bands. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200602062247.23212-1-thomas@adapt-ip.com Link: https://lore.kernel.org/r/20200731055636.795173-1-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-31 09:24:13 +02:00
Alain Michaud	9a9373ffc7	Bluetooth: use the proper scan params when conn is pending When an LE connection is requested and an RPA update is needed via hci_connect_le_scan, the default scanning parameters are used rather than the connect parameters. This leads to significant delays in the connection establishment process when using lower duty cycle scanning parameters. The patch simply looks at the pended connection list when trying to determine which scanning parameters should be used. Before: < HCI Command: LE Set Extended Scan Parameters (0x08\|0x0041) plen 8 #378 [hci0] 1659.247156 Own address type: Public (0x00) Filter policy: Ignore not in white list (0x01) PHYs: 0x01 Entry 0: LE 1M Type: Passive (0x00) Interval: 367.500 msec (0x024c) Window: 37.500 msec (0x003c) After: < HCI Command: LE Set Extended Scan Parameters (0x08\|0x0041) plen 8 #39 [hci0] 7.422109 Own address type: Public (0x00) Filter policy: Ignore not in white list (0x01) PHYs: 0x01 Entry 0: LE 1M Type: Passive (0x00) Interval: 60.000 msec (0x0060) Window: 60.000 msec (0x0060) Signed-off-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Yu Liu <yudiliu@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-31 08:28:23 +02:00
David S. Miller	d0c3c75d5d	A couple of more changes: * remove a warning that can trigger in certain races * check a function pointer before using it * check before adding 6 GHz to avoid a warning in mesh * fix two memory leaks in mesh * fix a TX status bug leading to a memory leak -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl8i4fUACgkQB8qZga/f l8TryA/+NVsEyCUgXRH4ZanXAGpC3QB7925f2THbgkOgdlKwR1iKGtmwZVIQKwl4 DdxUql/6C02l4fW+OBz23ohKkb9MixpSwiYjHKeqlC5EeNvmzLujG0A/X+Qvj/j0 iuttFNJxrwQ7ZCzqKbsufRL9BfWwaIVYB62HFd4gR/FBnPwNhZPlHLJMyzT54eIC ZdDQ8ynqij3uB2BS31J2l3clwO7iOg1ek81DsTgLp3nCGqIDwJzYMRqrVh5+9I1j yAwE9TCBxv8eEGJIFwpF+Yl0ffPvYXT0DjjeRQMOG7lIDa25orbIT8zHSt83y7My 6Rq+8Paw8Z2bQO0Z7EN3ys4fgqMqY1TTVzrTpF0XQaW+CnEdc3mvRN9UGDiViNcP JO9QH7RlCV+CQld/I5F/b3kbpt9DP+TFQyVaE62+OhgG5TCf7upJ7XmHK9IpoX7A cSjJxV8isWLzILTWLiHFXEkyCbdIYeq8PSvuIvJcxOH8hIQ+Si8Rv01K2Wvp0QpM mReW4MYbpHPju88aBl82Tl1fUwfQNfaMp41iN+b2zBS4xB4+zmTFnpq0lIS7azT/ 8Zx+9mlJKUd+YTTKcMvUrwJ2dH8b7w5bEllf5WAEGbLubo+dhgsZ+6PuLRrw/8J7 ujINEQTT+UarZWgevyCG6j/c229UiSfaQNCeoADUAhW4wMCPm3E= =xoMY -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2020-07-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of more changes: * remove a warning that can trigger in certain races * check a function pointer before using it * check before adding 6 GHz to avoid a warning in mesh * fix two memory leaks in mesh * fix a TX status bug leading to a memory leak ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 17:47:34 -07:00
Gustavo A. R. Silva	a0d716d8e4	net/sched: act_pedit: Use flex_array_size() helper in memcpy() Make use of the flex_array_size() helper to calculate the size of a flexible array member within an enclosing structure. This helper offers defense-in-depth against potential integer overflows, while at the same time makes it explicitly clear that we are dealing with a flexible array member. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 17:38:47 -07:00
Jakub Sitnicki	c64c9c282a	udp, bpf: Ignore connections in reuseport group after BPF sk lookup When BPF sk lookup invokes reuseport handling for the selected socket, it should ignore the fact that reuseport group can contain connected UDP sockets. With BPF sk lookup this is not relevant as we are not scoring sockets to find the best match, which might be a connected UDP socket. Fix it by unconditionally accepting the socket selected by reuseport. This fixes the following two failures reported by test_progs. # ./test_progs -t sk_lookup ... #73/14 UDP IPv4 redir and reuseport with conns:FAIL ... #73/20 UDP IPv6 redir and reuseport with conns:FAIL ... Fixes: `a57066b1a0` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200726120228.1414348-1-jakub@cloudflare.com	2020-07-31 02:00:48 +02:00
Jakub Kicinski	82274d0755	devlink: ignore -EOPNOTSUPP errors on dumpit Number of .dumpit functions try to ignore -EOPNOTSUPP errors. Recent change missed that, and started reporting all errors but -EMSGSIZE back from dumps. This leads to situation like this: $ devlink dev info devlink answers: Operation not supported Dump should not report an error just because the last device to be queried could not provide an answer. To fix this and avoid similar confusion make sure we clear err properly, and not leave it set to an error if we don't terminate the iteration. Fixes: `c62c2cfb80` ("net: devlink: don't ignore errors during dumpit") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:51:53 -07:00
David Howells	65550098c1	rxrpc: Fix race between recvmsg and sendmsg on immediate call failure There's a race between rxrpc_sendmsg setting up a call, but then failing to send anything on it due to an error, and recvmsg() seeing the call completion occur and trying to return the state to the user. An assertion fails in rxrpc_recvmsg() because the call has already been released from the socket and is about to be released again as recvmsg deals with it. (The recvmsg_q queue on the socket holds a ref, so there's no problem with use-after-free.) We also have to be careful not to end up reporting an error twice, in such a way that both returns indicate to userspace that the user ID supplied with the call is no longer in use - which could cause the client to malfunction if it recycles the user ID fast enough. Fix this by the following means: (1) When sendmsg() creates a call after the point that the call has been successfully added to the socket, don't return any errors through sendmsg(), but rather complete the call and let recvmsg() retrieve them. Make sendmsg() return 0 at this point. Further calls to sendmsg() for that call will fail with ESHUTDOWN. Note that at this point, we haven't send any packets yet, so the server doesn't yet know about the call. (2) If sendmsg() returns an error when it was expected to create a new call, it means that the user ID wasn't used. (3) Mark the call disconnected before marking it completed to prevent an oops in rxrpc_release_call(). (4) recvmsg() will then retrieve the error and set MSG_EOR to indicate that the user ID is no longer known by the kernel. An oops like the following is produced: kernel BUG at net/rxrpc/recvmsg.c:605! ... RIP: 0010:rxrpc_recvmsg+0x256/0x5ae ... Call Trace: ? __init_waitqueue_head+0x2f/0x2f ____sys_recvmsg+0x8a/0x148 ? import_iovec+0x69/0x9c ? copy_msghdr_from_user+0x5c/0x86 ___sys_recvmsg+0x72/0xaa ? __fget_files+0x22/0x57 ? __fget_light+0x46/0x51 ? fdget+0x9/0x1b do_recvmmsg+0x15e/0x232 ? _raw_spin_unlock+0xa/0xb ? vtime_delta+0xf/0x25 __x64_sys_recvmmsg+0x2c/0x2f do_syscall_64+0x4c/0x78 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `357f5ef646` ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()") Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:50:20 -07:00
Tom Parkin	340bb1ac45	l2tp: improve API documentation in l2tp_core.h * Improve the description of the key l2tp subsystem data structures. * Add high-level description of the main APIs for interacting with l2tp core. * Add documentation for the l2tp netlink session command callbacks. * Document the session pseudowire callbacks. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:45:31 -07:00
Tom Parkin	ca7885dbcd	l2tp: tweak exports for l2tp_recv_common and l2tp_ioctl All of the l2tp subsystem's exported symbols are exported using EXPORT_SYMBOL_GPL, except for l2tp_recv_common and l2tp_ioctl. These functions alone are not useful without the rest of the l2tp infrastructure, so there's no practical benefit to these symbols using a different export policy. Change these exports to use EXPORT_SYMBOL_GPL for consistency with the rest of l2tp. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:45:31 -07:00
Tom Parkin	2dedab6ff5	l2tp: remove build_header callback in struct l2tp_session The structure of an L2TP data packet header varies depending on the version of the L2TP protocol being used. struct l2tp_session used to have a build_header callback to abstract this difference away. It's clearer to simply choose the correct function to use when building the data packet (and we save on the function pointer in the session structure). This approach does mean dereferencing the parent tunnel structure in order to determine the tunnel version, but we're doing that in the transmit path in any case. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:45:31 -07:00
Tom Parkin	628703f59d	l2tp: return void from l2tp_session_delete l2tp_session_delete is used to schedule a session instance for deletion. The function itself always returns zero, and none of its direct callers check its return value, so have the function return void. This change de-facto changes the l2tp netlink session_delete callback prototype since all pseudowires currently use l2tp_session_delete for their implementation of that operation. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:45:31 -07:00
Tom Parkin	52016e259b	l2tp: don't export tunnel and session free functions Tunnel and session instances are reference counted, and shouldn't be directly freed by pseudowire code. Rather than exporting l2tp_tunnel_free and l2tp_session_free, make them private to l2tp_core.c, and export the refcount functions instead. In order to do this, the refcount functions cannot be declared as inline. Since the codepaths which take and drop tunnel and session references are not directly in the datapath this shouldn't cause performance issues. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:45:31 -07:00
Tom Parkin	b2aecfe8e4	l2tp: don't export __l2tp_session_unhash When __l2tp_session_unhash was first added it was used outside of l2tp_core.c, but that's no longer the case. As such, there's no longer a need to export the function. Make it private inside l2tp_core.c, and relocate it to avoid having to declare the function prototype in l2tp_core.h. Since the function is no longer used outside l2tp_core.c, remove the "__" prefix since we don't need to indicate anything special about its expected use to callers. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:45:31 -07:00
Cong Wang	8c0de6e96c	ipv6: fix memory leaks on IPV6_ADDRFORM path IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket to IPv4, particularly struct ipv6_ac_socklist. Similar to struct ipv6_mc_socklist, we should just close it on this path. This bug can be easily reproduced with the following C program: #include <stdio.h> #include <string.h> #include <sys/types.h> #include <sys/socket.h> #include <arpa/inet.h> int main() { int s, value; struct sockaddr_in6 addr; struct ipv6_mreq m6; s = socket(AF_INET6, SOCK_DGRAM, 0); addr.sin6_family = AF_INET6; addr.sin6_port = htons(5000); inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr); connect(s, (struct sockaddr *)&addr, sizeof(addr)); inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr); m6.ipv6mr_interface = 5; setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6)); value = AF_INET; setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value)); close(s); return 0; } Reported-by: ch3332xr@gmail.com Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 16:30:55 -07:00
Christoph Hellwig	4f010246b4	net/bpfilter: Initialize pos in __bpfilter_process_sockopt __bpfilter_process_sockopt never initialized the pos variable passed to the pipe write. This has been mostly harmless in the past as pipes ignore the offset, but the switch to kernel_write now verified the position, which can lead to a failure depending on the exact stack initialization pattern. Initialize the variable to zero to make rw_verify_area happy. Fixes: `6955a76fbc` ("bpfilter: switch to kernel_write") Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Reported-by: Rodrigo Madera <rodrigo.madera@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Rodrigo Madera <rodrigo.madera@gmail.com> Tested-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/bpf/20200730160900.187157-1-hch@lst.de	2020-07-31 01:07:32 +02:00
Stanislav Fomichev	f7c6cb1d97	bpf: Expose socket storage to BPF_PROG_TYPE_CGROUP_SOCK This lets us use socket storage from the following hooks: * BPF_CGROUP_INET_SOCK_CREATE * BPF_CGROUP_INET_SOCK_RELEASE * BPF_CGROUP_INET4_POST_BIND * BPF_CGROUP_INET6_POST_BIND Using existing 'bpf_sk_storage_get_proto' doesn't work because second argument is ARG_PTR_TO_SOCKET. Even though BPF_PROG_TYPE_CGROUP_SOCK hooks operate on 'struct bpf_sock', the verifier still considers it as a PTR_TO_CTX. That's why I'm adding another 'bpf_sk_storage_get_cg_sock_proto' definition strictly for BPF_PROG_TYPE_CGROUP_SOCK which accepts ARG_PTR_TO_CTX which is really 'struct sock' for this program type. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200729003104.1280813-1-sdf@google.com	2020-07-31 00:43:49 +02:00
Ingo Molnar	c1cc4784ce	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the v5.9 RCU bits from Paul E. McKenney: - Documentation updates - Miscellaneous fixes - kfree_rcu updates - RCU tasks updates - Read-side scalability tests - SRCU updates - Torture-test updates Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-07-31 00:15:53 +02:00
David S. Miller	3c2d19cb8d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-07-30 Please note that I did the first time now --no-ff merges of my testing branch into the master branch to include the [PATCH 0/n] message of a patchset. Please let me know if this is desirable, or if I should do it any different. 1) Introduce a oseq-may-wrap flag to disable anti-replay protection for manually distributed ICVs as suggested in RFC 4303. From Petr Vaněk. 2) Patchset to fully support IPCOMP for vti4, vti6 and xfrm interfaces. From Xin Long. 3) Switch from a linear list to a hash list for xfrm interface lookups. From Eyal Birger. 4) Fixes to not register one xfrm(6)_tunnel object twice. From Xin Long. 5) Fix two compile errors that were introduced with the IPCOMP support for vti and xfrm interfaces. Also from Xin Long. 6) Make the policy hold queue work with VTI. This was forgotten when VTI was implemented. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 14:39:31 -07:00
Alain Michaud	a2ec905d1e	Bluetooth: fix kernel oops in store_pending_adv_report Fix kernel oops observed when an ext adv data is larger than 31 bytes. This can be reproduced by setting up an advertiser with advertisement larger than 31 bytes. The issue is not sensitive to the advertisement content. In particular, this was reproduced with an advertisement of 229 bytes filled with 'A'. See stack trace below. This is fixed by not catching ext_adv as legacy adv are only cached to be able to concatenate a scanable adv with its scan response before sending it up through mgmt. With ext_adv, this is no longer necessary. general protection fault: 0000 [#1] SMP PTI CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic #41-Ubuntu Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020 Workqueue: hci0 hci_rx_work [bluetooth] RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth] Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75 RSP: 0018:ffffbc6a40493c70 EFLAGS: 00010286 RAX: 4141414141414141 RBX: 000000000000001b RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9903e76c100f RDI: ffff9904289d4b28 RBP: ffffbc6a40493c70 R08: 0000000093570362 R09: 0000000000000000 R10: 0000000000000000 R11: ffff9904344eae38 R12: ffff9904289d4000 R13: 0000000000000000 R14: 00000000ffffffa3 R15: ffff9903e76c100f FS: 0000000000000000(0000) GS:ffff990434580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007feed125a000 CR3: 00000001b860a003 CR4: 00000000003606e0 Call Trace: process_adv_report+0x12e/0x560 [bluetooth] hci_le_meta_evt+0x7b2/0xba0 [bluetooth] hci_event_packet+0x1c29/0x2a90 [bluetooth] hci_rx_work+0x19b/0x360 [bluetooth] process_one_work+0x1eb/0x3b0 worker_thread+0x4d/0x400 kthread+0x104/0x140 Fixes: `c215e9397b` ("Bluetooth: Process extended ADV report event") Reported-by: Andy Nguyen <theflow@google.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Balakrishna Godavarthi <bgodavar@codeaurora.org> Signed-off-by: Alain Michaud <alainm@chromium.org> Tested-by: Sonny Sasaka <sonnysasaka@chromium.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-07-30 13:54:04 -07:00
Kees Cook	b13fecb1c3	treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() This converts all the existing DECLARE_TASKLET() (and ...DISABLED) macros with DECLARE_TASKLET_OLD() in preparation for refactoring the tasklet callback type. All existing DECLARE_TASKLET() users had a "0" data argument, it has been removed here as well. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org>	2020-07-30 11:15:58 -07:00
Sathish Narasimman	cbbdfa6f33	Bluetooth: Enable controller RPA resolution using Experimental feature This patch adds support to enable the use of RPA Address resolution using expermental feature mgmt command. Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-30 11:14:05 +02:00
Felix Fietkau	04e35caa32	mac80211: remove STA txq pending airtime underflow warning This warning can trigger if there is a mismatch between frames that were sent with the sta pointer set vs tx status frames reported for the sta address. This can happen due to race conditions on re-creating stations, or even in the case of .sta_add/remove being used instead of .sta_state, which can cause frames to be sent to a station that has not been uploaded yet. If there is an actual underflow issue, it should show up in the device airtime warning below, so it is better to remove this one. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200725084533.13829-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-30 10:26:04 +02:00
Vasanthakumar Thiagarajan	e61fbfca80	mac80211: Fix bug in Tx ack status reporting in 802.3 xmit path Allocated ack_frame id from local->ack_status_frames is not really stored in the tx_info for 802.3 Tx path. Due to this, tx ack status is not reported and ack_frame id is not freed for the buffers requiring tx ack status. Also move the memset to 0 of tx_info before IEEE80211_TX_CTL_REQ_TX_STATUS flag assignment. Fixes: `50ff477a86` ("mac80211: add 802.11 encapsulation offloading support") Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@codeaurora.org> Link: https://lore.kernel.org/r/1595427617-1713-1-git-send-email-vthiagar@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-30 10:25:17 +02:00
Julian Squires	4052d3d2e8	cfg80211: check vendor command doit pointer before use In the case where a vendor command does not implement doit, and has no flags set, doit would not be validated and a NULL pointer dereference would occur, for example when invoking the vendor command via iw. I encountered this while developing new vendor commands. Perhaps in practice it is advisable to always implement doit along with dumpit, but it seems reasonable to me to always check doit anyway, not just when NEED_WDEV. Signed-off-by: Julian Squires <julian@cipht.net> Link: https://lore.kernel.org/r/20200706211353.2366470-1-julian@cipht.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-30 10:24:01 +02:00
Remi Pommarel	5e43540c2a	mac80211: mesh: Free pending skb when destroying a mpath A mpath object can hold reference on a list of skb that are waiting for mpath resolution to be sent. When destroying a mpath this skb list should be cleaned up in order to not leak memory. Fixing that kind of leak: unreferenced object 0xffff0000181c9300 (size 1088): comm "openvpn", pid 1782, jiffies 4295071698 (age 80.416s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 f9 80 36 00 00 00 00 00 ..........6..... 02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<000000004bc6a443>] kmem_cache_alloc+0x1a4/0x2f0 [<000000002caaef13>] sk_prot_alloc.isra.39+0x34/0x178 [<00000000ceeaa916>] sk_alloc+0x34/0x228 [<00000000ca1f1d04>] inet_create+0x198/0x518 [<0000000035626b1c>] __sock_create+0x134/0x328 [<00000000a12b3a87>] __sys_socket+0xb0/0x158 [<00000000ff859f23>] __arm64_sys_socket+0x40/0x58 [<00000000263486ec>] el0_svc_handler+0xd0/0x1a0 [<0000000005b5157d>] el0_svc+0x8/0xc unreferenced object 0xffff000012973a40 (size 216): comm "openvpn", pid 1782, jiffies 4295082137 (age 38.660s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 c0 06 16 00 00 ff ff 00 93 1c 18 00 00 ff ff ................ backtrace: [<000000004bc6a443>] kmem_cache_alloc+0x1a4/0x2f0 [<0000000023c8c8f9>] __alloc_skb+0xc0/0x2b8 [<000000007ad950bb>] alloc_skb_with_frags+0x60/0x320 [<00000000ef90023a>] sock_alloc_send_pskb+0x388/0x3c0 [<00000000104fb1a3>] sock_alloc_send_skb+0x1c/0x28 [<000000006919d2dd>] __ip_append_data+0xba4/0x11f0 [<0000000083477587>] ip_make_skb+0x14c/0x1a8 [<0000000024f3d592>] udp_sendmsg+0xaf0/0xcf0 [<000000005aabe255>] inet_sendmsg+0x5c/0x80 [<000000008651ea08>] __sys_sendto+0x15c/0x218 [<000000003505c99b>] __arm64_sys_sendto+0x74/0x90 [<00000000263486ec>] el0_svc_handler+0xd0/0x1a0 [<0000000005b5157d>] el0_svc+0x8/0xc Fixes: `2bdaf386f9` (mac80211: mesh: move path tables into if_mesh) Signed-off-by: Remi Pommarel <repk@triplefau.lt> Link: https://lore.kernel.org/r/20200704135419.27703-1-repk@triplefau.lt Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-30 10:23:48 +02:00
Remi Pommarel	6a01afcf84	mac80211: mesh: Free ie data when leaving mesh At ieee80211_join_mesh() some ie data could have been allocated (see copy_mesh_setup()) and need to be cleaned up when leaving the mesh. This fixes the following kmemleak report: unreferenced object 0xffff0000116bc600 (size 128): comm "wpa_supplicant", pid 608, jiffies 4294898983 (age 293.484s) hex dump (first 32 bytes): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 0............... 00 0f ac 08 00 00 00 00 c4 65 40 00 00 00 00 00 .........e@..... backtrace: [<00000000bebe439d>] __kmalloc_track_caller+0x1c0/0x330 [<00000000a349dbe1>] kmemdup+0x28/0x50 [<0000000075d69baa>] ieee80211_join_mesh+0x6c/0x3b8 [mac80211] [<00000000683bb98b>] __cfg80211_join_mesh+0x1e8/0x4f0 [cfg80211] [<0000000072cb507f>] nl80211_join_mesh+0x520/0x6b8 [cfg80211] [<0000000077e9bcf9>] genl_family_rcv_msg+0x374/0x680 [<00000000b1bd936d>] genl_rcv_msg+0x78/0x108 [<0000000022c53788>] netlink_rcv_skb+0xb0/0x1c0 [<0000000011af8ec9>] genl_rcv+0x34/0x48 [<0000000069e41f53>] netlink_unicast+0x268/0x2e8 [<00000000a7517316>] netlink_sendmsg+0x320/0x4c0 [<0000000069cba205>] ____sys_sendmsg+0x354/0x3a0 [<00000000e06bab0f>] ___sys_sendmsg+0xd8/0x120 [<0000000037340728>] __sys_sendmsg+0xa4/0xf8 [<000000004fed9776>] __arm64_sys_sendmsg+0x44/0x58 [<000000001c1e5647>] el0_svc_handler+0xd0/0x1a0 Fixes: `c80d545da3` (mac80211: Let userspace enable and configure vendor specific path selection.) Signed-off-by: Remi Pommarel <repk@triplefau.lt> Link: https://lore.kernel.org/r/20200704135007.27292-1-repk@triplefau.lt Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-30 10:23:35 +02:00
Rajkumar Manoharan	65ad3ef9fc	mac80211: fix warning in 6 GHz IE addition in mesh mode The commit `24a2042cb2` ("mac80211: add HE 6 GHz Band Capability element") failed to check device capability before adding HE 6 GHz capability element. Below warning is reported in 11ac device in mesh. Fix that by checking device capability at HE 6 GHz cap IE addition in mesh beacon and association request. WARNING: CPU: 1 PID: 1897 at net/mac80211/util.c:2878 ieee80211_ie_build_he_6ghz_cap+0x149/0x150 [mac80211] [ 3138.720358] Call Trace: [ 3138.720361] ieee80211_mesh_build_beacon+0x462/0x530 [mac80211] [ 3138.720363] ieee80211_start_mesh+0xa8/0xf0 [mac80211] [ 3138.720365] __cfg80211_join_mesh+0x122/0x3e0 [cfg80211] [ 3138.720368] nl80211_join_mesh+0x3d3/0x510 [cfg80211] Fixes: `24a2042cb2` ("mac80211: add HE 6 GHz Band Capability element") Reported-by: Markus Theil <markus.theil@tu-ilmenau.de> Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Link: https://lore.kernel.org/r/1593656424-18240-1-git-send-email-rmanohar@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-07-30 10:20:37 +02:00
Sathish Narasimman	b2cc23398e	Bluetooth: Enable RPA Timeout Enable RPA timeout during bluetooth initialization. The RPA timeout value is used from hdev, which initialized from debug_fs Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-30 09:34:43 +02:00
Sathish Narasimman	5c49bcce5c	Bluetooth: Enable/Disable address resolution during le create conn In this patch if le_create_conn process is started restrict to disable address resolution and same is disabled during le_enh_connection_complete Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-30 09:34:43 +02:00
Sathish Narasimman	d03c759e39	Bluetooth: Let controller creates RPA during le create conn When address resolution is enabled and set_privacy is enabled let's use own address type as 0x03 Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-30 09:34:43 +02:00
Sathish Narasimman	b31bc00bfe	Bluetooth: Translate additional address type during le_conn When using controller based address resolution, then the new address types 0x02 and 0x03 are used. These types need to be converted back into either public address or random address types. This patch is specially during LE_CREATE_CONN if using own_add_type as 0x02 or 0x03. Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-30 09:34:43 +02:00
Marcel Holtmann	0eee35bdfa	Bluetooth: Update resolving list when updating whitelist When the whitelist is updated, then also update the entries of the resolving list for devices where IRKs are available. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sathish Narsimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-30 09:34:42 +02:00
Marcel Holtmann	e1d5723575	Bluetooth: Configure controller address resolution if available When the LL Privacy support is available, then as part of enabling or disabling passive background scanning, it is required to set up the controller based address resolution as well. Since only passive background scanning is utilizing the whitelist, the address resolution is now bound to the whitelist and passive background scanning. All other resolution can be easily done by the host stack. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sathish Narsimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-30 09:34:42 +02:00
Marcel Holtmann	6540351e6f	Bluetooth: Translate additional address type correctly When using controller based address resolution, then the new address types 0x02 and 0x03 are used. These types need to be converted back into either public address or random address types. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sathish Narsimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-30 09:34:42 +02:00
Sabrina Dubroca	71b59bf482	espintcp: count packets dropped in espintcp_rcv Currently, espintcp_rcv drops packets silently, which makes debugging issues difficult. Count packets as either XfrmInHdrError (when the packet was too short or contained invalid data) or XfrmInError (for other issues). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-30 06:51:36 +02:00
Sabrina Dubroca	fadd1a63a7	espintcp: handle short messages instead of breaking the encap socket Currently, short messages (less than 4 bytes after the length header) will break the stream of messages. This is unnecessary, since we can still parse messages even if they're too short to contain any usable data. This is also bogus, as keepalive messages (a single 0xff byte), though not needed with TCP encapsulation, should be allowed. This patch changes the stream parser so that short messages are accepted and dropped in the kernel. Messages that contain a valid SPI or non-ESP header are processed as before. Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Reported-by: Andrew Cagney <cagney@libreswan.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-30 06:51:35 +02:00
Brian Vazquez	41d707b733	fib: fix fib_rules_ops indirect calls wrappers This patch fixes: commit `b9aaec8f0b` ("fib: use indirect call wrappers in the most common fib_rules_ops") which didn't consider the case when CONFIG_IPV6_MULTIPLE_TABLES is not set. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `b9aaec8f0b` ("fib: use indirect call wrappers in the most common fib_rules_ops") Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-29 13:26:42 -07:00
Linus Torvalds	21391520cb	A couple of syzcaller fixes for 5.8 the first one in particular has been quite noisy ("broke" in -rc5) so this would be worth landing even this late even if users likely won't see a difference -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAl8hFUAACgkQq06b7GqY 5nAlAA/+NawXB/rLi4v4HYFEKM4qkGZGgj/Ugw6sqRc/z43UMAVzhD+D9+gLBG5k Yka4CgOf+tLedIvW8DhHb22i8e3erQr4FCgfeHzZ3kuuZUYn4hXguBQitv6ra1SH VkKpfOFIviL2h602NF1qVfVGCeM5TRd3TGKMSn/86eeGHP/kNhHAxh0IOXUlsENN 3ZUR55tPO38dlXzQuT68d2mAa2TNnf7NBUI7h8mQLRHAYl3yyZ93Xa+6gfdJO+k3 EBKdKa/k2Whb//EoMui+j/1DXKBFUztS21GMthGpaGM0ZmNUGPmVoXc3ZzUSEpzd uCFeXjWLt6lbSMrbuWbZNyPodFZANQJ5s9iOhmZvXs6uNyM9j+Pi3ICEQrcf7wIw 9cnHG/jZoMgCyelh7Rg1ILG8pEDox3sjfhjun/xce1xOGU4/N0VXyimVqx2BKAC3 4LwveluSCmYuaAewhe3w3EH4i2eJQDfravBc8+6C8RLGSGcdI6tRKG4xGhitbjzk fhZcacUyeC9RMe6S0dh3I0d+EUhcnY2re53kP5ihkUaYOeqfHG7XUNO3C4BAruoR hXDo8cNK6kaCs08+Dp88cGcFsYoNHjLUeDzWKTQVY3YdoDBREmUNbtH2aH9vF6sX GUxUaoG+ZAsaY+MoinuGV6tcGyRYFbQjfgLdKTP1Kp/g4gM8qeo= =kJqr -----END PGP SIGNATURE----- Merge tag '9p-for-5.8-2' of git://github.com/martinetd/linux into master Pull 9p fixes from Dominique Martinet: "A couple of syzcaller fixes for 5.8 The first one in particular has been quite noisy ("broke" in -rc5) so this would be worth landing even this late even if users likely won't see a difference" * tag '9p-for-5.8-2' of git://github.com/martinetd/linux: 9p/trans_fd: Fix concurrency del of req_list in p9_fd_cancelled/p9_read_work net/9p: validate fds in p9_fd_open	2020-07-29 12:29:24 -07:00
Ido Schimmel	ec4f5b3617	mlxsw: spectrum: Use different trap group for externally routed packets Cited commit mistakenly removed the trap group for externally routed packets (e.g., via the management interface) and grouped locally routed and externally routed packet traps under the same group, thereby subjecting them to the same policer. This can result in problems, for example, when FRR is restarted and suddenly all transient traffic is trapped to the CPU because of a default route through the management interface. Locally routed packets required to re-establish a BGP connection will never reach the CPU and the routing tables will not be re-populated. Fix this by using a different trap group for externally routed packets. Fixes: `8110668ecd` ("mlxsw: spectrum_trap: Register layer 3 control traps") Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-29 12:16:21 -07:00
Ido Schimmel	83f3522860	ipv4: Silence suspicious RCU usage warning fib_trie_unmerge() is called with RTNL held, but not from an RCU read-side critical section. This leads to the following warning [1] when the FIB alias list in a leaf is traversed with hlist_for_each_entry_rcu(). Since the function is always called with RTNL held and since modification of the list is protected by RTNL, simply use hlist_for_each_entry() and silence the warning. [1] WARNING: suspicious RCU usage 5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Not tainted ----------------------------- net/ipv4/fib_trie.c:1867 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/164: #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0 stack backtrace: CPU: 0 PID: 164 Comm: ip Not tainted 5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x153/0x15d fib_trie_unmerge+0x608/0xdb0 fib_unmerge+0x44/0x360 fib4_rule_configure+0xc8/0xad0 fib_nl_newrule+0x37a/0x1dd0 rtnetlink_rcv_msg+0x4f7/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 ____sys_sendmsg+0x879/0xa00 ___sys_sendmsg+0x122/0x190 __sys_sendmsg+0x103/0x1d0 __x64_sys_sendmsg+0x7d/0xb0 do_syscall_64+0x54/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc80a234e97 Code: Bad RIP value. RSP: 002b:00007ffef8b66798 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc80a234e97 RDX: 0000000000000000 RSI: 00007ffef8b66800 RDI: 0000000000000003 RBP: 000000005f141b1c R08: 0000000000000001 R09: 0000000000000000 R10: 00007fc80a2a8ac0 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 00007ffef8b67008 R15: 0000556fccb10020 Fixes: `0ddcf43d5d` ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-29 12:12:50 -07:00
Gaurav Singh	42f36eba71	netfilter: ip6tables: Remove redundant null checks Remove superfluous check for NULL pointer to header. Signed-off-by: Gaurav Singh <gaurav1086@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-29 20:39:43 +02:00
Alexander A. Klimov	50935339c3	netfilter: Replace HTTP links with HTTPS ones Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w\|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-29 20:09:18 +02:00
Ahmed S. Darwish	77cc278f7b	xfrm: policy: Use sequence counters with associated lock A sequence counter write side critical section must be protected by some form of locking to serialize writers. If the serialization primitive is not disabling preemption implicitly, preemption has to be explicitly disabled before entering the sequence counter write side critical section. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead, which allow to associate a lock with the sequence counter. This enables lockdep to verify that the lock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-17-a.darwish@linutronix.de	2020-07-29 16:14:27 +02:00
Ahmed S. Darwish	b901892b51	netfilter: nft_set_rbtree: Use sequence counter with associated rwlock A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_rwlock_t data type, which allows to associate a rwlock with the sequence counter. This enables lockdep to verify that the rwlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-16-a.darwish@linutronix.de	2020-07-29 16:14:26 +02:00
Ahmed S. Darwish	8201d923f4	netfilter: conntrack: Use sequence counter with associated spinlock A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t data type, which allows to associate a spinlock with the sequence counter. This enables lockdep to verify that the spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-15-a.darwish@linutronix.de	2020-07-29 16:14:26 +02:00
Brian Vazquez	b9aaec8f0b	fib: use indirect call wrappers in the most common fib_rules_ops This avoids another inderect call per RX packet which save us around 20-40 ns. Changelog: v1 -> v2: - Move declaraions to fib_rules.h to remove warnings Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:42:31 -07:00
Cong Wang	608b4adab1	net_sched: initialize timer earlier in red_init() When red_init() fails, red_destroy() is called to clean up. If the timer is not initialized yet, del_timer_sync() will complain. So we have to move timer_setup() before any failure. Reported-and-tested-by: syzbot+6e95a4fabf88dc217145@syzkaller.appspotmail.com Fixes: `aee9caa03f` ("net: sched: sch_red: Add qevents "early_drop" and "mark"") Cc: Petr Machata <petrm@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:41:23 -07:00
Xiyu Yang	706ec91916	ipv6: Fix nexthop refcnt leak when creating ipv6 route info ip6_route_info_create() invokes nexthop_get(), which increases the refcount of the "nh". When ip6_route_info_create() returns, local variable "nh" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of ip6_route_info_create(). When nexthops can not be used with source routing, the function forgets to decrease the refcnt increased by nexthop_get(), causing a refcnt leak. Fix this issue by pulling up the error source routing handling when nexthops can not be used with source routing. Fixes: `f88d8ea67f` ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:24:08 -07:00
Mat Martineau	721e908990	mptcp: Safely store sequence number when sending data The MPTCP socket's write_seq member can be read without the msk lock held, so use WRITE_ONCE() to store it. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:42 -07:00
Mat Martineau	c75293925f	mptcp: Safely read sequence number when lock isn't held The MPTCP socket's write_seq member should be read with READ_ONCE() when the msk lock is not held. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:42 -07:00
Mat Martineau	06827b348b	mptcp: Skip unnecessary skb extension allocation for bare acks Bare TCP ack skbs are freed right after MPTCP sees them, so the work to allocate, zero, and populate the MPTCP skb extension is wasted. Detect these skbs and do not add skb extensions to them. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:42 -07:00
Mat Martineau	067a0b3dc5	mptcp: Only use subflow EOF signaling on fallback connections The MPTCP state machine handles disconnections on non-fallback connections, but the mptcp_sock still needs to get notified when fallback subflows disconnect. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:42 -07:00
Mat Martineau	43b54c6ee3	mptcp: Use full MPTCP-level disconnect state machine RFC 8684 appendix D describes the connection state machine for MPTCP. This patch implements the DATA_FIN / DATA_ACK exchanges and MPTCP-level socket state changes described in that appendix, rather than simply sending DATA_FIN along with TCP FIN when disconnecting subflows. DATA_FIN is now sent and acknowledged before shutting down the subflows. Received DATA_FIN information (if not part of a data packet) is written to the MPTCP socket when the incoming DSS option is parsed by the subflow, and the MPTCP worker is scheduled to process the flag. DATA_FIN received as part of a full DSS mapping will be handled when the mapping is processed. The DATA_FIN is acknowledged by the worker if the reader is caught up. If there is still data to be moved to the MPTCP-level queue, ack_seq will be incremented to account for the DATA_FIN when it reaches the end of the stream and a DATA_ACK will be sent to the peer. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:42 -07:00
Mat Martineau	16a9a9da17	mptcp: Add helper to process acks of DATA_FIN After DATA_FIN has been sent, the peer will acknowledge it. An ack of the relevant MPTCP-level sequence number will update the MPTCP connection state appropriately. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:42 -07:00
Mat Martineau	6920b85158	mptcp: Add mptcp_close_state() helper This will be used to transition to the appropriate state on close and determine if a DATA_FIN needs to be sent for that state transition. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:42 -07:00
Mat Martineau	3721b9b646	mptcp: Track received DATA_FIN sequence number and add related helpers Incoming DATA_FIN headers need to propagate the presence of the DATA_FIN bit and the associated sequence number to the MPTCP layer, even when arriving on a bare ACK that does not get added to the receive queue. Add structure members to store the DATA_FIN information and helpers to set and check those values. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:42 -07:00
Mat Martineau	7279da6145	mptcp: Use MPTCP-level flag for sending DATA_FIN Since DATA_FIN information is the same for every subflow, store it only in the mptcp_sock. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:41 -07:00
Mat Martineau	242e63f651	mptcp: Remove outdated and incorrect comment mptcp_close() acquires the msk lock, so it clearly should not be held before the function is called. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:41 -07:00
Mat Martineau	57baaf2875	mptcp: Return EPIPE if sending is shut down during a sendmsg A MPTCP socket where sending has been shut down should not attempt to send additional data, since DATA_FIN has already been sent. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:41 -07:00
Mat Martineau	0bac966a1f	mptcp: Allow DATA_FIN in headers without TCP FIN RFC 8684-compliant DATA_FIN needs to be sent and ack'd before subflows are closed with TCP FIN, so write DATA_FIN DSS headers whenever their transmission has been enabled by the MPTCP connection-level socket. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 17:02:41 -07:00
Christoph Hellwig	a31edb2059	net: improve the user pointer check in init_user_sockptr Make sure not just the pointer itself but the whole range lies in the user address space. For that pass the length and then use the access_ok helper to do the check. Fixes: `6d04fe15f7` ("net: optimize the sockptr_t for unified kernel/user address spaces") Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 13:43:40 -07:00
Christoph Hellwig	d3c4815151	net: remove sockptr_advance sockptr_advance never properly worked. Replace it with _offset variants of copy_from_sockptr and copy_to_sockptr. Fixes: `ba423fdaa5` ("net: add a new sockptr_t type") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 13:43:40 -07:00
Christoph Hellwig	a3ad434ad7	netfilter: arp_tables: restore a SPDX identifier This was accidentally removed in an unrelated commit. Fixes: `c2f12630c6` ("netfilter: switch nf_setsockopt to sockptr_t") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-28 13:43:40 -07:00
Abhishek Pandit-Subedi	4e8c36c3b0	Bluetooth: Fix suspend notifier race Unregister from suspend notifications and cancel suspend preparations before running hci_dev_do_close. Otherwise, the suspend notifier may race with unregister and cause cmd_timeout even after hdev has been freed. Below is the trace from when this panic was seen: [ 832.578518] Bluetooth: hci_core.c:hci_cmd_timeout() hci0: command 0x0c05 tx timeout [ 832.586200] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 832.586203] #PF: supervisor read access in kernel mode [ 832.586205] #PF: error_code(0x0000) - not-present page [ 832.586206] PGD 0 P4D 0 [ 832.586210] PM: suspend exit [ 832.608870] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 832.613232] CPU: 3 PID: 10755 Comm: kworker/3:7 Not tainted 5.4.44-04894-g1e9dbb96a161 #1 [ 832.630036] Workqueue: events hci_cmd_timeout [bluetooth] [ 832.630046] RIP: 0010:__queue_work+0xf0/0x374 [ 832.630051] RSP: 0018:ffff9b5285f1fdf8 EFLAGS: 00010046 [ 832.674033] RAX: ffff8a97681bac00 RBX: 0000000000000000 RCX: ffff8a976a000600 [ 832.681162] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff8a976a000748 [ 832.688289] RBP: ffff9b5285f1fe38 R08: 0000000000000000 R09: ffff8a97681bac00 [ 832.695418] R10: 0000000000000002 R11: ffff8a976a0006d8 R12: ffff8a9745107600 [ 832.698045] usb 1-6: new full-speed USB device number 119 using xhci_hcd [ 832.702547] R13: ffff8a9673658850 R14: 0000000000000040 R15: 000000000000001e [ 832.702549] FS: 0000000000000000(0000) GS:ffff8a976af80000(0000) knlGS:0000000000000000 [ 832.702550] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 832.702550] CR2: 0000000000000000 CR3: 000000010415a000 CR4: 00000000003406e0 [ 832.702551] Call Trace: [ 832.702558] queue_work_on+0x3f/0x68 [ 832.702562] process_one_work+0x1db/0x396 [ 832.747397] worker_thread+0x216/0x375 [ 832.751147] kthread+0x138/0x140 [ 832.754377] ? pr_cont_work+0x58/0x58 [ 832.758037] ? kthread_blkcg+0x2e/0x2e [ 832.761787] ret_from_fork+0x22/0x40 [ 832.846191] ---[ end trace fa93f466da517212 ]--- Fixes: `9952d90ea2` ("Bluetooth: Handle PM_SUSPEND_PREPARE and PM_POST_SUSPEND") Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-28 20:27:14 +02:00
Chuck Lever	b297fed699	svcrdma: CM event handler clean up Now that there's a core tracepoint that reports these events, there's no need to maintain dprintk() call sites in each arm of the switch statements. We also refresh the documenting comments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-07-28 10:18:15 -04:00
Chuck Lever	365e9992b9	svcrdma: Remove transport reference counting Jason tells me that a ULP cannot rely on getting an ESTABLISHED and DISCONNECTED event pair for each connection, so transport reference counting in the CM event handler will never be reliable. Now that we have ib_drain_qp(), svcrdma should no longer need to hold transport references while Sends and Receives are posted. So remove the get/put call sites in the CM event handlers. This eliminates a significant source of locked memory bus traffic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-07-28 10:18:14 -04:00
Chuck Lever	64d2642251	svcrdma: Fix another Receive buffer leak During a connection tear down, the Receive queue is flushed before the device resources are freed. Typically, all the Receives flush with IB_WR_FLUSH_ERR. However, any pending successful Receives flush with IB_WR_SUCCESS, and the server automatically posts a fresh Receive to replace the completing one. This happens even after the connection has closed and the RQ is drained. Receives that are posted after the RQ is drained appear never to complete, causing a Receive resource leak. The leaked Receive buffer is left DMA-mapped. To prevent these late-posted recv_ctxt's from leaking, block new Receive posting after XPT_CLOSE is set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-07-28 10:18:13 -04:00
Peilin Ye	3c4f850e84	xdp: Prevent kernel-infoleak in xsk_getsockopt() xsk_getsockopt() is copying uninitialized stack memory to userspace when 'extra_stats' is 'false'. Fix it. Doing '= {};' is sufficient since currently 'struct xdp_statistics' is defined as follows: struct xdp_statistics { __u64 rx_dropped; __u64 rx_invalid_descs; __u64 tx_invalid_descs; __u64 rx_ring_full; __u64 rx_fill_ring_empty_descs; __u64 tx_ring_empty_descs; }; When being copied to the userspace, 'stats' will not contain any uninitialized 'holes' between struct fields. Fixes: `8aa5a33578` ("xsk: Add new statistics") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/bpf/20200728053604.404631-1-yepeilin.cs@gmail.com	2020-07-28 12:50:15 +02:00
Max Chou	24b065727c	Bluetooth: Return NOTIFY_DONE for hci_suspend_notifier The original return is NOTIFY_STOP, but notifier_call_chain would stop the future call for register_pm_notifier even registered on other Kernel modules with the same priority which value is zero. Signed-off-by: Max Chou <max.chou@realtek.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-28 09:12:31 +02:00
Ismael Ferreras Morezuelas	cde1a8a992	Bluetooth: btusb: Fix and detect most of the Chinese Bluetooth controllers For some reason they tend to squat on the very first CSR/ Cambridge Silicon Radio VID/PID instead of paying fees. This is an extremely common problem; the issue goes as back as 2013 and these devices are only getting more popular, even rebranded by reputable vendors and sold by retailers everywhere. So, at this point in time there are hundreds of modern dongles reusing the ID of what originally was an early Bluetooth 1.1 controller. Linux is the only place where they don't work due to spotty checks in our detection code. It only covered a minimum subset. So what's the big idea? Take advantage of the fact that all CSR chips report the same internal version as both the LMP sub-version and HCI revision number. It always matches, couple that with the manufacturer code, that rarely lies, and we now have a good idea of who is who. Additionally, by compiling a list of user-reported HCI/lsusb dumps, and searching around for legit CSR dongles in similar product ranges we can find what CSR BlueCore firmware supported which Bluetooth versions. That way we can narrow down ranges of fakes for each of them. e.g. Real CSR dongles with LMP subversion 0x73 are old enough that support BT 1.1 only; so it's a dead giveaway when some third-party BT 4.0 dongle reuses it. So, to sum things up; there are multiple classes of fake controllers reusing the same 0A12:0001 VID/PID. This has been broken for a while. Known 'fake' bcdDevices: 0x0100, 0x0134, 0x1915, 0x2520, 0x7558, 0x8891 IC markings on 0x7558: FR3191AHAL 749H15143 (???) https://bugzilla.kernel.org/show_bug.cgi?id=60824 Fixes: `81cac64ba2` (Deal with USB devices that are faking CSR vendor) Reported-by: Michał Wiśniewski <brylozketrzyn@gmail.com> Tested-by: Mike Johnson <yuyuyak@gmail.com> Tested-by: Ricardo Rodrigues <ekatonb@gmail.com> Tested-by: M.Hanny Sabbagh <mhsabbagh@outlook.com> Tested-by: Oussama BEN BRAHIM <b.brahim.oussama@gmail.com> Tested-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com> Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-07-28 09:09:00 +02:00
Sabrina Dubroca	d5dba1376e	xfrm: esp6: fix the location of the transport header with encapsulation commit `17175d1a27` ("xfrm: esp6: fix encapsulation header offset computation") changed esp6_input_done2 to correctly find the size of the IPv6 header that precedes the TCP/UDP encapsulation header, but didn't adjust the final call to skb_set_transport_header, which I assumed was correct in using skb_network_header_len. Xiumei Mu reported that when we create xfrm states that include port numbers in the selector, traffic from the user sockets is dropped. It turns out that we get a state mismatch in __xfrm_policy_check, because we end up trying to compare the encapsulation header's ports with the selector that's based on user traffic ports. Fixes: `0146dca70b` ("xfrm: add support for UDPv6 encapsulation of ESP") Fixes: `26333c37fc` ("xfrm: add IPv6 support for espintcp") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-28 07:57:23 +02:00
Al Viro	181964e619	fix a braino in cmsghdr_from_user_compat_to_kern() commit `547ce4cfb3` ("switch cmsghdr_from_user_compat_to_kern() to copy_from_user()") missed one of the places where ucmlen should've been replaced with cmsg.cmsg_len, now that we are fetching the entire struct rather than doing it field-by-field. As the result, compat sendmsg() with several different-sized cmsg attached started to fail with EINVAL. Trivial to fix, fortunately. Fixes: `547ce4cfb3` ("switch cmsghdr_from_user_compat_to_kern() to copy_from_user()") Reported-by: Nick Bowler <nbowler@draconx.ca> Tested-by: Nick Bowler <nbowler@draconx.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 13:25:39 -07:00
Murali Karicheri	795ec4f572	net: prp: enhance debugfs to display PRP info Print PRP specific information from node table as part of debugfs node table display. Also display the node as DAN-H or DAN-P depending on the info from node table. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 12:20:40 -07:00
Murali Karicheri	451d8123f8	net: prp: add packet handling support DAN-P (Dual Attached Nodes PRP) nodes are expected to receive traditional IP packets as well as PRP (Parallel Redundancy Protocol) tagged (trailer) packets. PRP trailer is 6 bytes of PRP protocol unit called RCT, Redundancy Control Trailer (RCT) similar to HSR tag. PRP network can have traditional devices such as bridges/switches or PC attached to it and should be able to communicate. Regular Ethernet devices treat the RCT as pads. This patch adds logic to format L2 frames from network stack to add a trailer (RCT) and send it as duplicates over the slave interfaces when the protocol is PRP as per IEC 62439-3. At the ingress, it strips the trailer, do duplicate detection and rejection and forward a stripped frame up the network stack. PRP device should accept frames from Singly Attached Nodes (SAN) and thus the driver mark the link where the frame came from in the node table. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 12:20:40 -07:00
Murali Karicheri	fa4dc89531	net: hsr: define and use proto_ops ptrs to handle hsr specific frames As a preparatory patch to introduce PRP, refactor the code specific to handling HSR frames into separate functions and call them through proto_ops function pointers. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 12:20:40 -07:00
Murali Karicheri	c643ff0383	net: prp: add supervision frame generation utility function Add support for generation of PRP supervision frames. For PRP, supervision frame format is similar to HSR version 0, but have a PRP Redundancy Control Trailer (RCT) added and uses a different message type, PRP_TLV_LIFE_CHECK_DD. Also update is_supervision_frame() to include the new message type used for PRP supervision frame. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 12:20:40 -07:00
Murali Karicheri	28e458e097	net: hsr: introduce protocol specific function pointers As a preparatory patch to introduce support for PRP protocol, add a protocol ops ptr in the private hsr structure to hold function pointers as some of the functions at protocol level packet handling is different for HSR vs PRP. It is expected that PRP will add its of set of functions for protocol handling. Modify existing hsr_announce() function to call proto_ops->send_sv_frame() to send supervision frame for HSR. This is expected to be different for PRP. So introduce a ops function ptr, send_sv_frame() for the same and initialize it to send_hsr_supervsion_frame(). Modify hsr_announce() to call proto_ops->send_sv_frame(). Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 12:20:40 -07:00
Murali Karicheri	121c33b07b	net: hsr: introduce common code for skb initialization As a preparatory patch to introduce PRP protocol support in the driver, refactor the skb init code to a separate function. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 12:20:40 -07:00
Murali Karicheri	8f4c0e0178	hsr: enhance netlink socket interface to support PRP Parallel Redundancy Protocol (PRP) is another redundancy protocol introduced by IEC 63439 standard. It is similar to HSR in many aspects:- - Use a pair of Ethernet interfaces to created the PRP device - Use a 6 byte redundancy protocol part (RCT, Redundancy Check Trailer) similar to HSR Tag. - Has Link Redundancy Entity (LRE) that works with RCT to implement redundancy. Key difference is that the protocol unit is a trailer instead of a prefix as in HSR. That makes it inter-operable with tradition network components such as bridges/switches which treat it as pad bytes, whereas HSR nodes requires some kind of translators (Called redbox) to talk to regular network devices. This features allows regular linux box to be converted to a DAN-P box. DAN-P stands for Dual Attached Node - PRP similar to DAN-H (Dual Attached Node - HSR). Add a comment at the header/source code to explicitly state that the driver files also handles PRP protocol as well. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 12:20:40 -07:00
Matthieu Baerts	367fe04eb6	mptcp: fix joined subflows with unblocking sk Unblocking sockets used for outgoing connections were not containing inet info about the initial connection due to a typo there: the value of "err" variable is negative in the kernelspace. This fixes the creation of additional subflows where the remote port has to be reused if the other host didn't announce another one. This also fixes inet_diag showing blank info about MPTCP sockets from unblocking sockets doing a connect(). Fixes: `41be81a8d3` ("mptcp: fix unblocking connect()") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 11:50:57 -07:00
Martin Varghese	350e7ab92d	net: Removed the device type check to add mpls support for devices MPLS has no dependency with the device type of underlying devices. Hence the device type check to add mpls support for devices can be avoided. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 11:40:47 -07:00
Ido Schimmel	73cb11933c	ipmr: Copy option to correct variable Cited commit mistakenly copied provided option to 'val' instead of to 'mfc': ``` - if (copy_from_user(&mfc, optval, sizeof(mfc))) { + if (copy_from_sockptr(&val, optval, sizeof(val))) { ``` Fix this by copying the option to 'mfc'. selftest router_multicast.sh before: $ ./router_multicast.sh smcroutectl: Unknown or malformed IPC message 'a' from client. smcroutectl: failed removing multicast route, does not exist. TEST: mcast IPv4 [FAIL] Multicast not received on first host TEST: mcast IPv6 [ OK ] smcroutectl: Unknown or malformed IPC message 'a' from client. smcroutectl: failed removing multicast route, does not exist. TEST: RPF IPv4 [FAIL] Multicast not received on first host TEST: RPF IPv6 [ OK ] selftest router_multicast.sh after: $ ./router_multicast.sh TEST: mcast IPv4 [ OK ] TEST: mcast IPv6 [ OK ] TEST: RPF IPv4 [ OK ] TEST: RPF IPv6 [ OK ] Fixes: `01ccb5b48f` ("net/ipv4: switch ip_mroute_setsockopt to sockptr_t") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 11:39:55 -07:00
Karsten Graul	72b7f6c487	net/smc: unique reason code for exceeded max dmb count When the maximum dmb buffer limit for an ism device is reached no more dmb buffers can be registered. When this happens the reason code is set to SMC_CLC_DECL_MEM indicating out-of-memory. This is the same reason code that is used when no memory could be allocated for the new dmb buffer. This is confusing for users when they see this error but there is more memory available. To solve this set a separate new reason code when the maximum dmb limit exceeded. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-27 10:30:01 -07:00
Andrii Nakryiko	e8407fdeb9	bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands Now that BPF program/link management is centralized in generic net_device code, kernel code never queries program id from drivers, so XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary. This patch removes all the implementations of those commands in kernel, along the xdp_attachment_query(). This patch was compile-tested on allyesconfig. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	c1931c9784	bpf: Implement BPF XDP link-specific introspection APIs Implement XDP link-specific show_fdinfo and link_info to emit ifindex. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-7-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	026a4c28e1	bpf, xdp: Implement LINK_UPDATE for BPF XDP link Add support for LINK_UPDATE command for BPF XDP link to enable reliable replacement of underlying BPF program. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-6-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	aa8d3a716b	bpf, xdp: Add bpf_link-based XDP attachment API Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through BPF_LINK_CREATE command. bpf_xdp_link is mutually exclusive with direct BPF program attachment, previous BPF program should be detached prior to attempting to create a new bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it can't be replaced by other BPF program attachment or link attachment. It will be detached only when the last BPF link FD is closed. bpf_xdp_link will be auto-detached when net_device is shutdown, similarly to how other BPF links behave (cgroup, flow_dissector). At that point bpf_link will become defunct, but won't be destroyed until last FD is closed. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	d4baa9368a	bpf, xdp: Extract common XDP program attachment logic Further refactor XDP attachment code. dev_change_xdp_fd() is split into two parts: getting bpf_progs from FDs and attachment logic, working with bpf_progs. This makes attachment logic a bit more straightforward and prepares code for bpf_xdp_link inclusion, which will share the common logic. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-4-andriin@fb.com	2020-07-25 20:37:02 -07:00
Andrii Nakryiko	7f0a838254	bpf, xdp: Maintain info on attached XDP BPF programs in net_device Instead of delegating to drivers, maintain information about which BPF programs are attached in which XDP modes (generic/skb, driver, or hardware) locally in net_device. This effectively obsoletes XDP_QUERY_PROG command. Such re-organization simplifies existing code already. But it also allows to further add bpf_link-based XDP attachments without drivers having to know about any of this at all, which seems like a good setup. XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to install/uninstall active BPF program. All the higher-level concerns about prog/link interaction will be contained within generic driver-agnostic logic. All the XDP_QUERY_PROG calls to driver in dev_xdp_uninstall() were removed. It's not clear for me why dev_xdp_uninstall() were passing previous prog_flags when resetting installed programs. That seems unnecessary, plus most drivers don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW should be enough of an indicator of what is required of driver to correctly reset active BPF program. dev_xdp_uninstall() is also generalized as an iteration over all three supported mode. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com	2020-07-25 20:37:02 -07:00
Yonghong Song	5ce6e77c7e	bpf: Implement bpf iterator for sock local storage map The bpf iterator for bpf sock local storage map is implemented. User space interacts with sock local storage map with fd as a key and storage value. In kernel, passing fd to the bpf program does not really make sense. In this case, the sock itself is passed to bpf program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184116.590602-1-yhs@fb.com	2020-07-25 20:16:33 -07:00
Yonghong Song	f9c7927295	bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t This patch refactored target bpf_iter_init_seq_priv_t callback function to accept additional information. This will be needed in later patches for map element targets since a particular map should be passed to traverse elements for that particular map. In the future, other information may be passed to target as well, e.g., pid, cgroup id, etc. to customize the iterator. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com	2020-07-25 20:16:32 -07:00
Yonghong Song	14fc6bd6b7	bpf: Refactor bpf_iter_reg to have separate seq_info member There is no functionality change for this patch. Struct bpf_iter_reg is used to register a bpf_iter target, which includes information for both prog_load, link_create and seq_file creation. This patch puts fields related seq_file creation into a different structure. This will be useful for map elements iterator where one iterator covers different map types and different map types may have different seq_ops, init/fini private_data function and private_data size. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com	2020-07-25 20:16:32 -07:00
Jakub Sitnicki	c8a2983c4d	udp: Don't discard reuseport selection when group has connections When BPF socket lookup prog selects a socket that belongs to a reuseport group, and the reuseport group has connected sockets in it, the socket selected by reuseport will be discarded, and socket returned by BPF socket lookup will be used instead. Modify this behavior so that the socket selected by reuseport running after BPF socket lookup always gets used. Ignore the fact that the reuseport group might have connections because it is only relevant when scoring sockets during regular hashtable-based lookup. Fixes: `72f7e9440e` ("udp: Run SK_LOOKUP BPF program on socket lookup") Fixes: `6d4201b138` ("udp6: Run SK_LOOKUP BPF program on socket lookup") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20200722161720.940831-2-jakub@cloudflare.com	2020-07-25 20:16:01 -07:00
David S. Miller	a57066b1a0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit `efc6b6f6c3` which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-25 17:49:04 -07:00
Linus Torvalds	1b64b2e244	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net into master Pull networking fixes from David Miller: 1) Fix RCU locaking in iwlwifi, from Johannes Berg. 2) mt76 can access uninitialized NAPI struct, from Felix Fietkau. 3) Fix race in updating pause settings in bnxt_en, from Vasundhara Volam. 4) Propagate error return properly during unbind failures in ax88172a, from George Kennedy. 5) Fix memleak in adf7242_probe, from Liu Jian. 6) smc_drv_probe() can leak, from Wang Hai. 7) Don't muck with the carrier state if register_netdevice() fails in the bonding driver, from Taehee Yoo. 8) Fix memleak in dpaa_eth_probe, from Liu Jian. 9) Need to check skb_put_padto() return value in hsr_fill_tag(), from Murali Karicheri. 10) Don't lose ionic RSS hash settings across FW update, from Shannon Nelson. 11) Fix clobbered SKB control block in act_ct, from Wen Xu. 12) Missing newlink in "tx_timeout" sysfs output, from Xiongfeng Wang. 13) IS_UDPLITE cleanup a long time ago, incorrectly handled transformations involving UDPLITE_RECV_CC. From Miaohe Lin. 14) Unbalanced locking in netdevsim, from Taehee Yoo. 15) Suppress false-positive error messages in qed driver, from Alexander Lobakin. 16) Out of bounds read in ax25_connect and ax25_sendmsg, from Peilin Ye. 17) Missing SKB release in cxgb4's uld_send(), from Navid Emamdoost. 18) Uninitialized value in geneve_changelink(), from Cong Wang. 19) Fix deadlock in xen-netfront, from Andera Righi. 19) flush_backlog() frees skbs with IRQs disabled, so should use dev_kfree_skb_irq() instead of kfree_skb(). From Subash Abhinov Kasiviswanathan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) drivers/net/wan: lapb: Corrected the usage of skb_cow dev: Defer free of skbs in flush_backlog qrtr: orphan socket in qrtr_release() xen-netfront: fix potential deadlock in xennet_remove() flow_offload: Move rhashtable inclusion to the source file geneve: fix an uninitialized value in geneve_changelink() bonding: check return value of register_netdevice() in bond_newlink() tcp: allow at most one TLP probe per flight AX.25: Prevent integer overflows in connect and sendmsg cxgb4: add missing release on skb in uld_send() net: atlantic: fix PTP on AQC10X AX.25: Prevent out-of-bounds read in ax25_sendmsg() sctp: shrink stream outq when fails to do addstream reconf sctp: shrink stream outq only when new outcnt < old outcnt AX.25: Fix out-of-bounds read in ax25_connect() enetc: Remove the mdio bus on PF probe bailout net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload net: ethernet: ave: Fix error returns in ave_init drivers/net/wan/x25_asy: Fix to make it work ipvs: fix the connection sync failed in some cases ...	2020-07-25 11:50:59 -07:00
Subash Abhinov Kasiviswanathan	7df5cb75cf	dev: Defer free of skbs in flush_backlog IRQs are disabled when freeing skbs in input queue. Use the IRQ safe variant to free skbs here. Fixes: `145dd5f9c8` ("net: flush the softnet backlog in process context") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 19:59:22 -07:00
Cong Wang	af9f691f0f	qrtr: orphan socket in qrtr_release() We have to detach sock from socket in qrtr_release(), otherwise skb->sk may still reference to this socket when the skb is released in tun->queue, particularly sk->sk_wq still points to &sock->wq, which leads to a UAF. Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com Fixes: `28fb4e59a4` ("net: qrtr: Expose tunneling endpoint to user space") Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:29:52 -07:00
Tom Parkin	ab6934e084	l2tp: WARN_ON rather than BUG_ON in l2tp_session_free l2tp_session_free called BUG_ON if the tunnel magic feather value wasn't correct. The intent of this was to catch lifetime bugs; for example early tunnel free due to incorrect use of reference counts. Since the tunnel magic feather being wrong indicates either early free or structure corruption, we can avoid doing more damage by simply leaving the tunnel structure alone. If the tunnel refcount isn't dropped when it should be, the tunnel instance will remain in the kernel, resulting in the tunnel structure and socket leaking. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	0dd62f69d8	l2tp: remove BUG_ON refcount value in l2tp_session_free l2tp_session_free is only called by l2tp_session_dec_refcount when the reference count reaches zero, so it's of limited value to validate the reference count value in l2tp_session_free itself. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	493048f5df	l2tp: WARN_ON rather than BUG_ON in l2tp_session_queue_purge l2tp_session_queue_purge is used during session shutdown to drop any skbs queued for reordering purposes according to L2TP dataplane rules. The BUG_ON in this function checks the session magic feather in an attempt to catch lifetime bugs. Rather than crashing the kernel with a BUG_ON, we can simply WARN_ON and refuse to do anything more -- in the worst case this could result in a leak. However this is highly unlikely given that the session purge only occurs from codepaths which have obtained the session by means of a lookup via. the parent tunnel and which check the session "dead" flag to protect against shutdown races. While we're here, have l2tp_session_queue_purge return void rather than an integer, since neither of the callsites checked the return value. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	ebb4f5e6e4	l2tp: don't BUG_ON seqfile checks in l2tp_ppp checkpatch advises that WARN_ON and recovery code are preferred over BUG_ON which crashes the kernel. l2tp_ppp has a BUG_ON check of struct seq_file's private pointer in pppol2tp_seq_start prior to accessing data through that pointer. Rather than crashing, we can simply bail out early and return NULL in order to terminate the seq file processing in much the same way as we do when reaching the end of tunnel/session instances to render. Retain a WARN_ON to help trace possible bugs in this area. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	1aa646ac71	l2tp: don't BUG_ON session magic checks in l2tp_ppp checkpatch advises that WARN_ON and recovery code are preferred over BUG_ON which crashes the kernel. l2tp_ppp.c's BUG_ON checks of the l2tp session structure's "magic" field occur in code paths where it's reasonably easy to recover: * In the case of pppol2tp_sock_to_session, we can return NULL and the caller will bail out appropriately. There is no change required to any of the callsites of this function since they already handle pppol2tp_sock_to_session returning NULL. * In the case of pppol2tp_session_destruct we can just avoid decrementing the reference count on the suspect session structure. In the worst case scenario this results in a memory leak, which is preferable to a crash. Convert these uses of BUG_ON to WARN_ON accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	cd3e29b333	l2tp: remove BUG_ON in l2tp_tunnel_closeall l2tp_tunnel_closeall is only called from l2tp_core.c, and it's easy to statically analyse the code path calling it to validate that it should never be passed a NULL tunnel pointer. Having a BUG_ON checking the tunnel pointer triggers a checkpatch warning. Since the BUG_ON is of no value, remove it to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	ce2f86ae25	l2tp: remove BUG_ON in l2tp_session_queue_purge l2tp_session_queue_purge is only called from l2tp_core.c, and it's easy to statically analyse the code paths calling it to validate that it should never be passed a NULL session pointer. Having a BUG_ON checking the session pointer triggers a checkpatch warning. Since the BUG_ON is of no value, remove it to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	7a379558c2	l2tp: WARN_ON rather than BUG_ON in l2tp_dfs_seq_start l2tp_dfs_seq_start had a BUG_ON to catch a possible programming error in l2tp_dfs_seq_open. Since we can easily bail out of l2tp_dfs_seq_start, prefer to do that and flag the error with a WARN_ON rather than crashing the kernel. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Tom Parkin	95075150d0	l2tp: avoid multiple assignments checkpatch warns about multiple assignments. Update l2tp accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:19:14 -07:00
Willem de Bruijn	01370434df	icmp6: support rfc 4884 Extend the rfc 4884 read interface introduced for ipv4 in commit `eba75c587e` ("icmp: support rfc 4884") to ipv6. Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884. Changes v1->v2: - make ipv6_icmp_error_rfc4884 static (file scope) Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:12:41 -07:00
Willem de Bruijn	178c49d9f9	icmp: prepare rfc 4884 for ipv6 The RFC 4884 spec is largely the same between IPv4 and IPv6. Factor out the IPv4 specific parts in preparation for IPv6 support: - icmp types supported - icmp header size, and thus offset to original datagram start - datagram length field offset in icmp(6)hdr. - datagram length field word size: 4B for IPv4, 8B for IPv6. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:12:41 -07:00
Willem de Bruijn	c4e9e09f55	icmp: revise rfc4884 tests 1) Only accept packets with original datagram len field >= header len. The extension header must start after the original datagram headers. The embedded datagram len field is compared against the 128B minimum stipulated by RFC 4884. It is unlikely that headers extend beyond this. But as we know the exact header length, check explicitly. 2) Remove the check that datagram length must be <= 576B. This is a send constraint. There is no value in testing this on rx. Within private networks it may be known safe to send larger packets. Process these packets. This test was also too lax. It compared original datagram length rather than entire icmp packet length. The stand-alone fix would be: - if (hlen + skb->len > 576) + if (-skb_network_offset(skb) + skb->len > 576) Fixes: `eba75c587e` ("icmp: support rfc 4884") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:12:41 -07:00
Colin Ian King	623b57bec7	sctp: remove redundant initialization of variable status The variable status is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Also put the variable declarations into reverse christmas tree order. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 17:10:13 -07:00
Eelco Chaudron	a65878d6f0	net: openvswitch: fixes potential deadlock in dp cleanup code The previous patch introduced a deadlock, this patch fixes it by making sure the work is canceled without holding the global ovs lock. This is done by moving the reorder processing one layer up to the netns level. Fixes: `eac87c413b` ("net: openvswitch: reorder masks array based on usage") Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com Reviewed-by: Paolo <pabeni@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 16:58:38 -07:00
Christoph Hellwig	dfd3d5266d	sctp: fix slab-out-of-bounds in SCTP_DELAYED_SACK processing This sockopt accepts two kinds of parameters, using struct sctp_sack_info and struct sctp_assoc_value. The mentioned commit didn't notice an implicit cast from the smaller (latter) struct to the bigger one (former) when copying the data from the user space, which now leads to an attempt to write beyond the buffer (because it assumes the storing buffer is bigger than the parameter itself). Fix it by allocating a sctp_sack_info on stack and filling it out based on the small struct for the compat case. Changelog stole from an earlier patch from Marcelo Ricardo Leitner. Fixes: `ebb25defdc` ("sctp: pass a kernel pointer to sctp_setsockopt_delayed_ack") Reported-by: syzbot+0e4699d000d8b874d8dc@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 16:40:36 -07:00
Christoph Hellwig	6d04fe15f7	net: optimize the sockptr_t for unified kernel/user address spaces For architectures like x86 and arm64 we don't need the separate bit to indicate that a pointer is a kernel pointer as the address spaces are unified. That way the sockptr_t can be reduced to a union of two pointers, which leads to nicer calling conventions. The only caveat is that we need to check that users don't pass in kernel address and thus gain access to kernel memory. Thus the USER_SOCKPTR helper is replaced with a init_user_sockptr function that does this check and returns an error if it fails. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	a7b75c5a8c	net: pass a sockptr_t into ->setsockopt Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	d38d2b00ba	net/tcp: switch do_tcp_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	d4c19c4914	net/tcp: switch ->md5_parse to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	91ac1ccaff	net/udp: switch udp_lib_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	894cfbc0cf	net/ipv6: switch do_ipv6_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	b84d2b73af	net/ipv6: factor out a ipv6_set_opt_hdr helper Factour out a helper to set the IPv6 option headers from do_ipv6_setsockopt. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	86298285c9	net/ipv6: switch ipv6_flowlabel_opt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Note that the get case is pretty weird in that it actually copies data back to userspace from setsockopt. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	ff6a4cf214	net/ipv6: split up ipv6_flowlabel_opt Split ipv6_flowlabel_opt into a subfunction for each action and a small wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	b43c615313	net/ipv6: switch ip6_mroute_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	89654c5fcd	net/ipv4: switch do_ip_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	de40a3e883	net/ipv4: merge ip_options_get and ip_options_get_from_user Use the sockptr_t type to merge the versions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	01ccb5b48f	net/ipv4: switch ip_mroute_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	b03afaa82e	bpfilter: switch bpfilter_ip_set_sockopt to sockptr_t This is mostly to prepare for cleaning up the callers, as bpfilter by design can't handle kernel pointers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	c2f12630c6	netfilter: switch nf_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	ab214d1bf8	netfilter: switch xt_copy_counters to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	7e4b9dbabb	netfilter: remove the unused user argument to do_update_counters Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	c6d1b26a8f	net/xfrm: switch xfrm_user_policy to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	c8c1bbb6eb	net: switch sock_set_timeout to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	c34645ac25	net: switch sock_set_timeout to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	5790642b47	net: switch sock_setbindtodevice to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	b1ea9ff6af	net: switch copy_bpf_fprog_from_user to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	d200cf624c	bpfilter: reject kernel addresses The bpfilter user mode helper processes the optval address using process_vm_readv. Don't send it kernel addresses fed under set_fs(KERNEL_DS) as that won't work. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	c9ffebdde8	net/bpfilter: split __bpfilter_process_sockopt Split __bpfilter_process_sockopt into a low-level send request routine and the actual setsockopt hook to split the init time ping from the actual setsockopt processing. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Christoph Hellwig	e024e00818	bpfilter: fix up a sparse annotation The __user doesn't make sense when casting to an integer type, just switch to a uintptr_t cast which also removes the need for the __force. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Ariel Levkovich	5923b8f7fa	net/sched: cls_flower: Add hash info to flow classification Adding new cls flower keys for hash value and hash mask and dissect the hash info from the skb into the flow key towards flow classication. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:23:31 -07:00
Ariel Levkovich	0cb09aff9d	net/flow_dissector: add packet hash dissection Retreive a hash value from the SKB and store it in the dissector key for future matching. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:23:31 -07:00
Herbert Xu	c2b69f24eb	flow_offload: Move rhashtable inclusion to the source file I noticed that touching linux/rhashtable.h causes lib/vsprintf.c to be rebuilt. This dependency came through a bogus inclusion in the file net/flow_offload.h. This patch moves it to the right place. This patch also removes a lingering rhashtable inclusion in cls_api created by the same commit. Fixes: `4e481908c5` ("flow_offload: move tc indirect block to...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:17:22 -07:00
Chuck Lever	986a4b63d3	SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()") Braino when converting "buf->len -=" to "buf->len = len -". The result is under-estimation of the ralign and rslack values. On krb5p mounts, this has caused READDIR to fail with EIO, and KASAN splats when decoding READLINK replies. As a result of fixing this oversight, the gss_unwrap method now returns a buf->len that can be shorter than priv_len for small RPC messages. The additional adjustment done in unwrap_priv_data() can underflow buf->len. This causes the nfsd_request_too_large check to fail during some NFSv3 operations. Reported-by: Marian Rainer-Harbach Reported-by: Pierre Sauter <pierre.sauter@stwm.de> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1886277 Fixes: `31c9590ae4` ("SUNRPC: Add "@len" parameter to gss_unwrap()") Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-07-24 17:10:23 -04:00
David S. Miller	8e8135862c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Fix NAT hook deletion when table is dormant, from Florian Westphal. 2) Fix IPVS sync stalls, from guodeqing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 17:22:09 -07:00
Vladimir Oltean	5df5661a13	net: dsa: stop overriding master's ndo_get_phys_port_name The purpose of this override is to give the user an indication of what the number of the CPU port is (in DSA, the CPU port is a hardware implementation detail and not a network interface capable of traffic). However, it has always failed (by design) at providing this information to the user in a reliable fashion. Prior to commit `3369afba1e` ("net: Call into DSA netdevice_ops wrappers"), the behavior was to only override this callback if it was not provided by the DSA master. That was its first failure: if the DSA master itself was a DSA port or a switchdev, then the user would not see the number of the CPU port in /sys/class/net/eth0/phys_port_name, but the number of the DSA master port within its respective physical switch. But that was actually ok in a way. The commit mentioned above changed that behavior, and now overrides the master's ndo_get_phys_port_name unconditionally. That comes with problems of its own, which are worse in a way. The idea is that it's typical for switchdev users to have udev rules for consistent interface naming. These are based, among other things, on the phys_port_name attribute. If we let the DSA switch at the bottom to start randomly overriding ndo_get_phys_port_name with its own CPU port, we basically lose any predictability in interface naming, or even uniqueness, for that matter. So, there are reasons to let DSA override the master's callback (to provide a consistent interface, a number which has a clear meaning and must not be interpreted according to context), and there are reasons to not let DSA override it (it breaks udev matching for the DSA master). But, there is an alternative method for users to retrieve the number of the CPU port of each DSA switch in the system: $ devlink port pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0 pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2 pci/0000:00:00.5/4: type notset flavour cpu port 4 spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0 spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1 spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2 spi/spi2.0/4: type notset flavour cpu port 4 spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0 spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1 spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2 spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3 spi/spi2.1/4: type notset flavour cpu port 4 So remove this duplicated, unreliable and troublesome method. From this patch on, the phys_port_name attribute of the DSA master will only contain information about itself (if at all). If the users need reliable information about the CPU port they're probably using devlink anyway. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 15:14:58 -07:00
Yuchung Cheng	76be93fc07	tcp: allow at most one TLP probe per flight Previously TLP may send multiple probes of new data in one flight. This happens when the sender is cwnd limited. After the initial TLP containing new data is sent, the sender receives another ACK that acks partial inflight. It may re-arm another TLP timer to send more, if no further ACK returns before the next TLP timeout (PTO) expires. The sender may send in theory a large amount of TLP until send queue is depleted. This only happens if the sender sees such irregular uncommon ACK pattern. But it is generally undesirable behavior during congestion especially. The original TLP design restrict only one TLP probe per inflight as published in "Reducing Web Latency: the Virtue of Gentle Aggression", SIGCOMM 2013. This patch changes TLP to send at most one probe per inflight. Note that if the sender is app-limited, TLP retransmits old data and did not have this issue. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 12:23:32 -07:00
Dan Carpenter	17ad73e941	AX.25: Prevent integer overflows in connect and sendmsg We recently added some bounds checking in ax25_connect() and ax25_sendmsg() and we so we removed the AX25_MAX_DIGIS checks because they were no longer required. Unfortunately, I believe they are required to prevent integer overflows so I have added them back. Fixes: `8885bb0621` ("AX.25: Prevent out-of-bounds read in ax25_sendmsg()") Fixes: `2f2a7ffad5` ("AX.25: Fix out-of-bounds read in ax25_connect()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 12:09:57 -07:00
Tom Parkin	70c05bfa4a	l2tp: cleanup kzalloc calls Passing "sizeof(struct blah)" in kzalloc calls is less readable, potentially prone to future bugs if the type of the pointer is changed, and triggers checkpatch warnings. Tweak the kzalloc calls in l2tp which use this form to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	0787840dad	l2tp: cleanup netlink tunnel create address handling When creating an L2TP tunnel using the netlink API, userspace must either pass a socket FD for the tunnel to use (for managed tunnels), or specify the tunnel source/destination address (for unmanaged tunnels). Since source/destination addresses may be AF_INET or AF_INET6, the l2tp netlink code has conditionally compiled blocks to support IPv6. Rather than embedding these directly into l2tp_nl_cmd_tunnel_create (where it makes the code difficult to read and confuses checkpatch to boot) split the handling of address-related attributes into a separate function. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	584ca31f46	l2tp: cleanup netlink send of tunnel address information l2tp_nl_tunnel_send has conditionally compiled code to support AF_INET6, which makes the code difficult to follow and triggers checkpatch warnings. Split the code out into functions to handle the AF_INET v.s. AF_INET6 cases, which both improves readability and resolves the checkpatch warnings. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	26d9a27106	l2tp: check socket address type in l2tp_dfs_seq_tunnel_show checkpatch warns about indentation and brace balancing around the conditionally compiled code for AF_INET6 support in l2tp_dfs_seq_tunnel_show. By adding another check on the socket address type we can make the code more readable while removing the checkpatch warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	6c0ec37b82	l2tp: cleanup unnecessary braces in if statements These checks are all simple and don't benefit from extra braces to clarify intent. Remove them for easier-reading code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Tom Parkin	0febc7b3cd	l2tp: cleanup comparisons to NULL checkpatch warns about comparisons to NULL, e.g. CHECK: Comparison to NULL could be written "!rt" #474: FILE: net/l2tp/l2tp_ip.c:474: + if (rt == NULL) { These sort of comparisons are generally clearer and more readable the way checkpatch suggests, so update l2tp accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:54:40 -07:00
Miaohe Lin	49b0aa1b65	net/ncsi: use eth_zero_addr() to clear mac address Use eth_zero_addr() to clear mac address insetad of memset(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:49:41 -07:00
Paolo Abeni	4cf8b7e48a	subflow: introduce and use mptcp_can_accept_new_subflow() So that we can easily perform some basic PM-related adimission checks before creating the child socket. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:25 -07:00
Paolo Abeni	97e617518c	subflow: use rsk_ops->send_reset() tcp_send_active_reset() is more prone to transient errors (memory allocation or xmit queue full): in stress conditions the kernel may drop the egress packet, and the client will be stuck. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:25 -07:00
Paolo Abeni	b7514694ed	subflow: explicitly check for plain tcp rsk When syncookie are in use, the TCP stack may feed into subflow_syn_recv_sock() plain TCP request sockets. We can't access mptcp_subflow_request_sock-specific fields on such sockets. Explicitly check the rsk ops to do safe accesses. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:25 -07:00
Paolo Abeni	fa25e815d9	mptcp: cleanup subflow_finish_connect() The mentioned function has several unneeded branches, handle each case - MP_CAPABLE, MP_JOIN, fallback - under a single conditional and drop quite a bit of duplicate code. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Paolo Abeni	b93df08ccd	mptcp: explicitly track the fully established status Currently accepted msk sockets become established only after accept() returns the new sk to user-space. As MP_JOIN request are refused as per RFC spec on non fully established socket, the above causes mp_join self-tests instabilities. This change lets the msk entering the established status as soon as it receives the 3rd ack and propagates the first subflow fully established status on the msk socket. Finally we can change the subflow acceptance condition to take in account both the sock state and the msk fully established flag. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Paolo Abeni	0235d075a5	mptcp: mark as fallback even early ones In the unlikely event of a failure at connect time, we currently clear the request_mptcp flag - so that the MPC handshake is not started at all, but the msk is not explicitly marked as fallback. This would lead to later insertion of wrong DSS options in the xmitted packets, in violation of RFC specs and possibly fooling the peer. Fixes: `e1ff9e82e2` ("net: mptcp: improve fallback to TCP") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Paolo Abeni	53eb4c383d	mptcp: avoid data corruption on reinsert When updating a partially acked data fragment, we actually corrupt it. This is irrelevant till we send data on a single subflow, as retransmitted data, if any are discarded by the peer as duplicate, but it will cause data corruption as soon as we will start creating non backup subflows. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Paolo Abeni	b0977bb268	subflow: always init 'rel_write_seq' Currently we do not init the subflow write sequence for MP_JOIN subflows. This will cause bad mapping being generated as soon as we will use non backup subflow. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 11:47:24 -07:00
Tom Parkin	efcd8c8540	l2tp: avoid precidence issues in L2TP_SKB_CB macro checkpatch warned about the L2TP_SKB_CB macro's use of its argument: add braces to avoid the problem. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	c0235fb39b	l2tp: line-break long function prototypes In l2tp_core.c both l2tp_tunnel_create and l2tp_session_create take quite a number of arguments and have a correspondingly long prototype. This is both quite difficult to scan visually, and triggers checkpatch warnings. Add a line break to make these function prototypes more readable. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	bdf9866e4b	l2tp: prefer seq_puts for unformatted output checkpatch warns about use of seq_printf where seq_puts would do. Modify l2tp_debugfs accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	dbf82f3fac	l2tp: prefer using BIT macro Use BIT(x) rather than (1<<x), reported by checkpatch.pl. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	bef04d162c	l2tp: add identifier name in function pointer prototype Reported by checkpatch: "WARNING: function definition argument 'struct sock *' should also have an identifier name" Add an identifier name to help document the prototype. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	0864e331fd	l2tp: cleanup suspect code indent l2tp_core has conditionally compiled code in l2tp_xmit_skb for IPv6 support. The structure of this code triggered a checkpatch warning due to incorrect indentation. Fix up the indentation to address the checkpatch warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	8ce9825a59	l2tp: cleanup wonky alignment of line-broken function calls Arguments should be aligned with the function call open parenthesis as per checkpatch. Tweak some function calls which were not aligned correctly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	9f7da9a0e3	l2tp: cleanup difficult-to-read line breaks Some l2tp code had line breaks which made the code more difficult to read. These were originally motivated by the 80-character line width coding guidelines, but were actually a negative from the perspective of trying to follow the code. Remove these linebreaks for clearer code, even if we do exceed 80 characters in width in some places. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	20dcb1107a	l2tp: cleanup comments Modify some l2tp comments to better adhere to kernel coding style, as reported by checkpatch.pl. Add descriptive comments for the l2tp per-net spinlocks to document their use. Fix an incorrect comment in l2tp_recv_common: RFC2661 section 5.4 states that: "The LNS controls enabling and disabling of sequence numbers by sending a data message with or without sequence numbers present at any time during the life of a session." l2tp handles this correctly in l2tp_recv_common, but the comment around the code was incorrect and confusing. Fix up the comment accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Tom Parkin	b71a61ccfe	l2tp: cleanup whitespace use Fix up various whitespace issues as reported by checkpatch.pl: * remove spaces around operators where appropriate, * add missing blank lines following declarations, * remove multiple blank lines, or trailing blank lines at the end of functions. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:08:39 -07:00
Peilin Ye	8885bb0621	AX.25: Prevent out-of-bounds read in ax25_sendmsg() Checks on `addr_len` and `usax->sax25_ndigis` are insufficient. ax25_sendmsg() can go out of bounds when `usax->sax25_ndigis` equals to 7 or 8. Fix it. It is safe to remove `usax->sax25_ndigis > AX25_MAX_DIGIS`, since `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)` Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:06:49 -07:00
Parav Pandit	637989b5d7	devlink: Always use user_ptr[0] for devlink and simplify post_doit Currently devlink instance is searched on all doit() operations. But it is optionally stored into user_ptr[0]. This requires rediscovering devlink again doing post_doit(). Few devlink commands related to port shared buffers needs 3 pointers (devlink, devlink_port, and devlink_sb) while executing doit commands. Though devlink pointer can be derived from the devlink_port during post_doit() operation when doit() callback has acquired devlink instance lock, relying on such scheme to access devlik pointer makes code very fragile. Hence, to avoid ambiguity in post_doit() and to avoid searching devlink instance again, simplify code by always storing devlink instance in user_ptr[0] and derive devlink_sb pointer in their respective callback routines. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:06:08 -07:00
Xin Long	3ecdda3e9a	sctp: shrink stream outq when fails to do addstream reconf When adding a stream with stream reconf, the new stream firstly is in CLOSED state but new out chunks can still be enqueued. Then once gets the confirmation from the peer, the state will change to OPEN. However, if the peer denies, it needs to roll back the stream. But when doing that, it only sets the stream outcnt back, and the chunks already in the new stream don't get purged. It caused these chunks can still be dequeued in sctp_outq_dequeue_data(). As its stream is still in CLOSE, the chunk will be enqueued to the head again by sctp_outq_head_data(). This chunk will never be sent out, and the chunks after it can never be dequeued. The assoc will be 'hung' in a dead loop of sending this chunk. To fix it, this patch is to purge these chunks already in the new stream by calling sctp_stream_shrink_out() when failing to do the addstream reconf. Fixes: `11ae76e67a` ("sctp: implement receiver-side procedures for the Reconf Response Parameter") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:00:12 -07:00
Xin Long	8f13399db2	sctp: shrink stream outq only when new outcnt < old outcnt It's not necessary to go list_for_each for outq->out_chunk_list when new outcnt >= old outcnt, as no chunk with higher sid than new (outcnt - 1) exists in the outqueue. While at it, also move the list_for_each code in a new function sctp_stream_shrink_out(), which will be used in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 18:00:12 -07:00
Paolo Abeni	6ab301c98f	mptcp: zero token hash at creation time. Otherwise the 'chain_len' filed will carry random values, some token creation calls will fail due to excessive chain length, causing unexpected fallback to TCP. Fixes: `2c5ebd001d` ("mptcp: refactor token container") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 17:57:37 -07:00
Peilin Ye	2f2a7ffad5	AX.25: Fix out-of-bounds read in ax25_connect() Checks on `addr_len` and `fsa->fsa_ax25.sax25_ndigis` are insufficient. ax25_connect() can go out of bounds when `fsa->fsa_ax25.sax25_ndigis` equals to 7 or 8. Fix it. This issue has been reported as a KMSAN uninit-value bug, because in such a case, ax25_connect() reaches into the uninitialized portion of the `struct sockaddr_storage` statically allocated in __sys_connect(). It is safe to remove `fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS` because `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)`. Reported-by: syzbot+c82752228ed975b0a623@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=55ef9d629f3b3d7d70b69558015b63b48d01af66 Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 17:56:40 -07:00
Richard Sailer	749c08f820	net: dccp: Add SIOCOUTQ IOCTL support (send buffer fill) This adds support for the SIOCOUTQ IOCTL to get the send buffer fill of a DCCP socket, like UDP and TCP sockets already have. Regarding the used data field: DCCP uses per packet sequence numbers, not per byte, so sequence numbers can't be used like in TCP. sk_wmem_queued is not used by DCCP and always 0, even in test on highly congested paths. Therefore this uses sk_wmem_alloc like in UDP. Signed-off-by: Richard Sailer <richard_siegfried@systemli.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 17:00:37 -07:00
Kurt Kanzenbach	85e05d263e	net: dsa: of: Allow ethernet-ports as encapsulating node Due to unified Ethernet Switch Device Tree Bindings allow for ethernet-ports as encapsulating node as well. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 16:56:43 -07:00
Christoph Hellwig	a6c0d0934f	net: explicitly include <linux/compat.h> in net/core/sock.c The buildbot found a config where the header isn't already implicitly pulled in, so add an explicit include as well. Fixes: `8c918ffbba` ("net: remove compat_sock_common_{get,set}sockopt") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 13:01:10 -07:00
David S. Miller	dee72f8a0c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-07-21 The following pull-request contains BPF updates for your net-next tree. We've added 46 non-merge commits during the last 6 day(s) which contain a total of 68 files changed, 4929 insertions(+), 526 deletions(-). The main changes are: 1) Run BPF program on socket lookup, from Jakub. 2) Introduce cpumap, from Lorenzo. 3) s390 JIT fixes, from Ilya. 4) teach riscv JIT to emit compressed insns, from Luke. 5) use build time computed BTF ids in bpf iter, from Yonghong. ==================== Purely independent overlapping changes in both filter.h and xdp.h Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 12:35:33 -07:00
Mark Salyzyn	37bd22420f	af_key: pfkey_dump needs parameter validation In pfkey_dump() dplen and splen can both be specified to access the xfrm_address_t structure out of bounds in__xfrm_state_filter_match() when it calls addr_match() with the indexes. Return EINVAL if either are out of range. Signed-off-by: Mark Salyzyn <salyzyn@android.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kernel-team@android.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-22 13:33:22 +02:00
Peter Zijlstra	015dc08918	Merge branch 'sched/urgent'	2020-07-22 10:22:02 +02:00
Florian Westphal	c1d069e3bf	mptcp: move helper to where its used Only used in token.c. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 16:22:18 -07:00
guodeqing	8210e344cc	ipvs: fix the connection sync failed in some cases The sync_thread_backup only checks sk_receive_queue is empty or not, there is a situation which cannot sync the connection entries when sk_receive_queue is empty and sk_rmem_alloc is larger than sk_rcvbuf, the sync packets are dropped in __udp_enqueue_schedule_skb, this is because the packets in reader_queue is not read, so the rmem is not reclaimed. Here I add the check of whether the reader_queue of the udp sock is empty or not to solve this problem. Fixes: `2276f58ac5` ("udp: use a separate rx queue for packet reception") Reported-by: zhouxudong <zhouxudong8@huawei.com> Signed-off-by: guodeqing <geffrey.guo@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-22 01:21:34 +02:00
Gustavo A. R. Silva	954d82979b	netfilter: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-22 01:18:05 +02:00
Andrew Sy Kim	35dfb01314	ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1 When expire_nodest_conn=1 and a destination is deleted, IPVS does not expire the existing connections until the next matching incoming packet. If there are many connection entries from a single client to a single destination, many packets may get dropped before all the connections are expired (more likely with lots of UDP traffic). An optimization can be made where upon deletion of a destination, IPVS queues up delayed work to immediately expire any connections with a deleted destination. This ensures any reused source ports from a client (within the IPVS timeouts) are scheduled to new real servers instead of silently dropped. Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-22 01:17:59 +02:00
Parav Pandit	eac5f8a95a	devlink: Constify devlink instance pointer Constify devlink instance pointer while checking if reload operation is supported or not. This helps to review the scope of checks done in reload. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 16:14:58 -07:00
Parav Pandit	9232a3e67b	devlink: Avoid duplicate check for reload enabled flag Reload operation is enabled or not is already checked by devlink_reload(). Hence, remove the duplicate check from devlink_nl_cmd_reload(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 16:14:58 -07:00
Parav Pandit	6553e561ca	devlink: Do not hold devlink mutex when initializing devlink fields There is no need to hold a device global lock when initializing devlink device fields of a devlink instance which is not yet part of the devices list. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 16:14:58 -07:00
Miaohe Lin	b0a422772f	net: udp: Fix wrong clean up for IS_UDPLITE macro We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is checked. Fixes: `b2bf1e2659` ("[UDP]: Clean up for IS_UDPLITE macro") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 15:41:49 -07:00
Xiongfeng Wang	9bb5fbea59	net-sysfs: add a newline when printing 'tx_timeout' by sysfs When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to add a newline for easy reading. root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout 0root@syzkaller:~# Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 15:35:58 -07:00
Kuniyuki Iwashima	efc6b6f6c3	udp: Improve load balancing for SO_REUSEPORT. Currently, SO_REUSEPORT does not work well if connected sockets are in a UDP reuseport group. Then reuseport_has_conns() returns true and the result of reuseport_select_sock() is discarded. Also, unconnected sockets have the same score, hence only does the first unconnected socket in udp_hslot always receive all packets sent to unconnected sockets. So, the result of reuseport_select_sock() should be used for load balancing. The noteworthy point is that the unconnected sockets placed after connected sockets in sock_reuseport.socks will receive more packets than others because of the algorithm in reuseport_select_sock(). index \| connected \| reciprocal_scale \| result --------------------------------------------- 0 \| no \| 20% \| 40% 1 \| no \| 20% \| 20% 2 \| yes \| 20% \| 0% 3 \| no \| 20% \| 40% 4 \| yes \| 20% \| 0% If most of the sockets are connected, this can be a problem, but it still works better than now. Fixes: `acdcecc612` ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 15:31:02 -07:00
Kuniyuki Iwashima	f2b2c55e51	udp: Copy has_conns in reuseport_grow(). If an unconnected socket in a UDP reuseport group connect()s, has_conns is set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all sockets in udp_hslot looking for the connected socket with the highest score. However, when the number of sockets bound to the port exceeds max_socks, reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2() to return without scanning all sockets, resulting in that packets sent to connected sockets may be distributed to unconnected sockets. Therefore, reuseport_grow() should copy has_conns. Fixes: `acdcecc612` ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-21 15:31:02 -07:00
Yonghong Song	951cf368bc	bpf: net: Use precomputed btf_id for bpf iterators One additional field btf_id is added to struct bpf_ctx_arg_aux to store the precomputed btf_ids. The btf_id is computed at build time with BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions. All existing bpf iterators are changed to used pre-compute btf_ids. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Yonghong Song	fce557bcef	bpf: Make btf_sock_ids global tcp and udp bpf_iter can reuse some socket ids in btf_sock_ids, so make it global. I put the extern definition in btf_ids.h as a central place so it can be easily discovered by developers. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163402.1393427-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Yonghong Song	bc4f0548f6	bpf: Compute bpf_skc_to_() helper socket btf ids at build time Currently, socket types (struct tcp_sock, udp_sock, etc.) used by bpf_skc_to_() helpers are computed when vmlinux_btf is first built in the kernel. Commit `5a2798ab32` ("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros") implemented a mechanism to compute btf_ids at kernel build time which can simplify kernel implementation and reduce runtime overhead by removing in-kernel btf_id calculation. This patch did exactly this, removing in-kernel btf_id computation and utilizing build-time btf_id computation. If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will define an array with size of 5, which is not enough for btf_sock_ids. So define its own static array if CONFIG_DEBUG_INFO_BTF is not defined. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Daniel Lezcano	c62e7ac395	net: genetlink: Move initialization to core_initcall The generic netlink is initialized far after the netlink protocol itself at subsys_initcall. The devlink is initialized at the same level, but after, as shown by a disassembly of the vmlinux: [ ... ] 374 ffff8000115f22c0 <__initcall_devlink_init4>: 375 ffff8000115f22c4 <__initcall_genl_init4>: [ ... ] The function devlink_init() calls genl_register_family() before the generic netlink subsystem is initialized. As the generic netlink initcall level is set since 2005, it seems that was not a problem, but now we have the thermal framework initialized at the core_initcall level which creates the generic netlink family and sends a notification which leads to a subtle memory corruption only detectable when the CONFIG_INIT_ON_ALLOC_DEFAULT_ON option is set with the earlycon at init time. The thermal framework needs to be initialized early in order to begin the mitigation as soon as possible. Moving it to postcore_initcall is acceptable. This patch changes the initialization level for the generic netlink family to the core_initcall and comes after the netlink protocol initialization. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Link: https://lore.kernel.org/r/20200715074120.8768-1-daniel.lezcano@linaro.org	2020-07-21 10:40:08 +02:00
Steffen Klassert	b328ecc468	xfrm: Make the policy hold queue work with VTI. We forgot to support the xfrm policy hold queue when VTI was implemented. This patch adds everything we need so that we can use the policy hold queue together with VTI interfaces. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-07-21 08:34:44 +02:00
Tung Nguyen	6ef9dcb780	tipc: allow to build NACK message in link timeout function Commit `02288248b0` ("tipc: eliminate gap indicator from ACK messages") eliminated sending of the 'gap' indicator in regular ACK messages and only allowed to build NACK message with enabled probe/probe_reply. However, necessary correction for building NACK message was missed in tipc_link_timeout() function. This leads to significant delay and link reset (due to retransmission failure) in lossy environment. This commit fixes it by setting the 'probe' flag to 'true' when the receive deferred queue is not empty. As a result, NACK message will be built to send back to another peer. Fixes: `02288248b0` ("tipc: eliminate gap indicator from ACK messages") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 20:11:22 -07:00
wenxu	ae372cb175	net/sched: act_ct: fix restore the qdisc_skb_cb after defrag The fragment packets do defrag in tcf_ct_handle_fragments will clear the skb->cb which make the qdisc_skb_cb clear too. So the qdsic_skb_cb should be store before defrag and restore after that. It also update the pkt_len after all the fragments finish the defrag to one packet and make the following actions counter correct. Fixes: `b57dc7c13e` ("net/sched: Introduce action ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:36:37 -07:00
Vladimir Oltean	71d4364abd	net: dsa: use the ETH_MIN_MTU and ETH_DATA_LEN default values Now that DSA supports MTU configuration, undo the effects of commit `8b1efc0f83` ("net: remove MTU limits on a few ether_setup callers") and let DSA interfaces use the default min_mtu and max_mtu specified by ether_setup(). This is more important for min_mtu: since DSA is Ethernet, the minimum MTU is the same as of any other Ethernet interface, and definitely not zero. For the max_mtu, we have a callback through which drivers can override that, if they want to. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:35:04 -07:00
Wang Hai	2b96692bcf	net: hsr: remove redundant null check Because kfree_skb already checked NULL skb parameter, so the additional checks are unnecessary, just remove them. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:33:32 -07:00
Murali Karicheri	5d93518e06	net: hsr: check for return value of skb_put_padto() skb_put_padto() can fail. So check for return type and return NULL for skb. Caller checks for skb and acts correctly if it is NULL. Fixes: `6d6148bc78` ("net: hsr: fix incorrect lsdu size in the tag of HSR frames for small frames") Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:02:28 -07:00
Karsten Graul	a9e4450295	net/smc: fix dmb buffer shortage There is a current limit of 1920 registered dmb buffers per ISM device for smc-d. One link group can contain 255 connections, each connection is using one dmb buffer. When the connection is closed then the registered buffer is held in a queue and is reused by the next connection. When a link group is 'full' then another link group is created and uses an own buffer pool. The link groups are added to a list using list_add() which puts a new link group to the first position in the list. In the situation that many connections are opened (>1920) and a few of them stay open while others are closed quickly we end up with at least 8 link groups. For a new connection a matching link group is looked up, iterating over the list of link groups. The trailing 7 link groups all have registered dmb buffers which could be reused, while the first link group has only a few dmb buffers and then hit the 1920 limit. Because the first link group is not full (255 connection limit not reached) it is chosen and finally the connection falls back to TCP because there is no dmb buffer available in this link group. There are multiple ways to fix that: using list_add_tail() allows to scan older link groups first for free buffers which ensures that buffers are reused first. This fixes the problem for smc-r link groups as well. For smc-d there is an even better way to address this problem because smc-d does not have the 255 connections per link group limit. So fix the problem for smc-d by allowing large link groups. Fixes: `c6ba7c9ba4` ("net/smc: add base infrastructure for SMC-D and ISM") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 17:52:25 -07:00
Karsten Graul	2bced6aefa	net/smc: put slot when connection is killed To get a send slot smc_wr_tx_get_free_slot() is called, which might wait for a free slot. When smc_wr_tx_get_free_slot() returns there is a check if the connection was killed in the meantime. In that case don't only return an error, but also put back the free slot. Fixes: `b290098092` ("net/smc: cancel send and receive for terminated socket") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 17:52:25 -07:00
David Howells	639f181f0e	rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as if rxrpc_recvmsg() indicating ENODATA if there's nothing for it to read. Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read as this particular error doesn't get stored in ->sk_err by the networking core. Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive errors (there's no way for it to report which call, if any, the error was caused by). Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 17:47:10 -07:00
Jiri Pirko	a8b7b2d0b3	sched: sch_api: add missing rcu read lock to silence the warning In case the qdisc_match_from_root function() is called from non-rcu path with rtnl mutex held, a suspiciout rcu usage warning appears: [ 241.504354] ============================= [ 241.504358] WARNING: suspicious RCU usage [ 241.504366] 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32 Not tainted [ 241.504370] ----------------------------- [ 241.504378] net/sched/sch_api.c:270 RCU-list traversed in non-reader section!! [ 241.504382] other info that might help us debug this: [ 241.504388] rcu_scheduler_active = 2, debug_locks = 1 [ 241.504394] 1 lock held by tc/1391: [ 241.504398] #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0 [ 241.504431] stack backtrace: [ 241.504440] CPU: 0 PID: 1391 Comm: tc Not tainted 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32 [ 241.504446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [ 241.504453] Call Trace: [ 241.504465] dump_stack+0x100/0x184 [ 241.504482] lockdep_rcu_suspicious+0x153/0x15d [ 241.504499] qdisc_match_from_root+0x293/0x350 Fix this by passing the rtnl held lockdep condition down to hlist_for_each_entry_rcu() Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 17:00:02 -07:00
Florian Fainelli	9c0c7014f3	net: dsa: Setup dsa_netdev_ops Now that we have all the infrastructure in place for calling into the dsa_ptr->netdev_ops function pointers, install them when we configure the DSA CPU/management interface and tear them down. The flow is unchanged from before, but now we preserve equality of tests when network device drivers do tests like dev->netdev_ops == &foo_ops which was not the case before since we were allocating an entirely new structure. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 16:48:22 -07:00
Florian Fainelli	3369afba1e	net: Call into DSA netdevice_ops wrappers Make the core net_device code call into our ndo_do_ioctl() and ndo_get_phys_port_name() functions via the wrappers defined previously Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 16:48:22 -07:00

... 12 13 14 15 16 ...

62755 Commits