====================
pull-request: ieee802154-next 2022-05-01
Miquel Raynal landed two patch series bundled in this pull request.
The first series re-works the symbol duration handling to better
accommodate the needs of the various phy layers in ieee802154.
In the second series Miquel improves th errors handling from drivers
up mac802154. THis streamlines the error handling throughout the
ieee/mac802154 stack in preparation for sync TX to be introduced for
MLME frames.
====================
Link: https://lore.kernel.org/r/20220501194614.1198325-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
__rtnl_newlink() is 250LoC, but has a few clear sections.
Move the part which creates a new netdev to a separate
function.
For ease of review code will be moved in the next change.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit a293974590 ("rtnetlink: avoid frame size warning in rtnl_newlink()")
moved to allocating the largest attribute array of rtnl_newlink()
on the heap. Kalle reports the stack has grown above 1k again:
net/core/rtnetlink.c:3557:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Move more attrs to the heap, wrap them in a struct.
Don't bother with linkinfo, it's referenced a lot and we take
its size so it's awkward to move, plus it's small (6 elements).
Reported-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Recently we made o_seqno atomic_t. Stop special-casing TUNNEL_SEQ, and
always mark IP6GRE[TAP] devices as NETIF_F_LLTX, since we no longer need
the TX lock (&txq->_xmit_lock).
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Recently we made o_seqno atomic_t. Stop special-casing TUNNEL_SEQ, and
always mark GRE[TAP] devices as NETIF_F_LLTX, since we no longer need
the TX lock (&txq->_xmit_lock).
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The device_is_registered() in nfc core is used to check whether
nfc device is registered in netlink related functions such as
nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
is protected by device_lock, there is still a race condition between
device_del() and device_is_registered(). The root cause is that
kobject_del() in device_del() is not protected by device_lock.
(cleanup task) | (netlink task)
|
nfc_unregister_device | nfc_fw_download
device_del | device_lock
... | if (!device_is_registered)//(1)
kobject_del//(2) | ...
... | device_unlock
The device_is_registered() returns the value of state_in_sysfs and
the state_in_sysfs is set to zero in kobject_del(). If we pass check in
position (1), then set zero in position (2). As a result, the check
in position (1) is useless.
This patch uses bool variable instead of device_is_registered() to judge
whether the nfc device is registered, which is well synchronized.
Fixes: 3e256b8f8d ("NFC: add nfc subsystem core")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we have a separate path for sock_def_write_space() and can go one
step further. When it's called from sock_wfree() we know that there is a
preceding atomic for putting down ->sk_wmem_alloc. We can use it to
replace to replace smb_mb() with a less expensive
smp_mb__after_atomic(). It also removes an extra RCU read lock/unlock as
a small bonus.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For non SOCK_USE_WRITE_QUEUE sockets, sock_wfree() (atomically) puts
->sk_wmem_alloc twice. It's needed to keep the socket alive while
calling ->sk_write_space() after the first put.
However, some sockets, such as UDP, are freed by RCU
(i.e. SOCK_RCU_FREE) and use already RCU-safe sock_def_write_space().
Carve a fast path for such sockets, put down all refs in one go before
calling sock_def_write_space() but guard the socket from being freed
by an RCU read section.
note: because TCP sockets are marked with SOCK_USE_WRITE_QUEUE it
doesn't add extra checks in its path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Except for minor rounding differences the first ->sk_wmem_alloc test in
sock_def_write_space() is a hand coded version of sock_writeable().
Replace it with the helper, and also kill the following if duplicating
the check.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever RCU protected list replaces an object,
the pointer to the new object needs to be updated
_before_ the call to kfree_rcu() or call_rcu()
Also ip6_mc_msfilter() needs to update the pointer
before releasing the mc_lock mutex.
Note that linux-5.13 was supporting kfree_rcu(NULL, rcu),
so this fix does not need the conditional test I was
forced to use in the equivalent patch for IPv4.
Fixes: 882ba1f73c ("mld: convert ipv6_mc_socklist->sflist to RCU")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_RXRPC doesn't currently enable IPv6 UDP Tx checksums on the transport
socket it opens and the checksums in the packets it generates end up 0.
It probably should also enable IPv6 UDP Rx checksums and IPv4 UDP
checksums. The latter only seem to be applied if the socket family is
AF_INET and don't seem to apply if it's AF_INET6. IPv4 packets from an
IPv6 socket seem to have checksums anyway.
What seems to have happened is that the inet_inv_convert_csum() call didn't
get converted to the appropriate udp_port_cfg parameters - and
udp_sock_create() disables checksums unless explicitly told not too.
Fix this by enabling the three udp_port_cfg checksum options.
Fixes: 1a9b86c9fd ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: Vadim Fedorenko <vfedorenko@novek.ru>
cc: David S. Miller <davem@davemloft.net>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit f84af32cbc ("net: ip_queue_rcv_skb() helper")
I dropped the skb dst in tcp_data_queue().
This only dealt with so-called TCP input slow path.
When fast path is taken, tcp_rcv_established() calls
tcp_queue_rcv() while skb still has a dst.
This was mostly fine, because most dsts at this point
are not refcounted (thanks to early demux)
However, TCP packets sent over loopback have refcounted dst.
Then commit 68822bdf76 ("net: generalize skb freeing
deferral to per-cpu lists") came and had the effect
of delaying skb freeing for an arbitrary time.
If during this time the involved netns is dismantled, cleanup_net()
frees the struct net with embedded net->ipv6.ip6_dst_ops.
Then when eventually dst_destroy_rcu() is called,
if (dst->ops->destroy) ... triggers an use-after-free.
It is not clear if ip6_route_net_exit() lacks a rcu_barrier()
as syzbot reported similar issues before the blamed commit.
( https://groups.google.com/g/syzkaller-bugs/c/CofzW4eeA9A/m/009WjumTAAAJ )
Fixes: 68822bdf76 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Throw neigh checks in ip6_finish_output2() under a single slow path if,
so we don't have the overhead in the hot path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two callers of __ip6_finish_output(), both are in
ip6_finish_output(). We can combine the call sites into one and handle
return code after, that will inline __ip6_finish_output().
Note, error handling under NET_XMIT_CN will only return 0 if
__ip6_finish_output() succeded, and in this case it return 0.
Considering that NET_XMIT_SUCCESS is 0, it'll be returning exactly the
same result for it as before.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inline dev_queue_xmit() and dev_queue_xmit_accel(), they both are small
proxy functions doing nothing but redirecting the control flow to
__dev_queue_xmit().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_zerocopy_iter_dgram() is a small proxy function, inline it. For
that, move __zerocopy_sg_from_iter into linux/skbuff.h
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_alloc_send_skb() is simple and just proxying to another function,
so we can inline it and cut associated overhead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge net branch with the required patch for supporting the io_uring
feature that passes back whether we had more data in the socket or not.
* 'tcp-pass-back-data-left-in-socket-after-receive' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux:
tcp: pass back data left in socket after receive
* for-5.19/io_uring-socket: (73 commits)
io_uring: use the text representation of ops in trace
io_uring: rename op -> opcode
io_uring: add io_uring_get_opcode
io_uring: add type to op enum
io_uring: add socket(2) support
net: add __sys_socket_file()
io_uring: fix trace for reduced sqe padding
io_uring: add fgetxattr and getxattr support
io_uring: add fsetxattr and setxattr support
fs: split off do_getxattr from getxattr
fs: split off setxattr_copy and do_setxattr function from setxattr
io_uring: return an error when cqe is dropped
io_uring: use constants for cq_overflow bitfield
io_uring: rework io_uring_enter to simplify return value
io_uring: trace cqe overflows
io_uring: add trace support for CQE overflow
io_uring: allow re-poll if we made progress
io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG)
io_uring: add support for IORING_ASYNC_CANCEL_ANY
io_uring: allow IORING_OP_ASYNC_CANCEL with 'fd' key
...
This is currently done for CMSG_INQ, add an ability to do so via struct
msghdr as well and have CMSG_INQ use that too. If the caller sets
msghdr->msg_get_inq, then we'll pass back the hint in msghdr->msg_inq.
Rearrange struct msghdr a bit so we can add this member while shrinking
it at the same time. On a 64-bit build, it was 96 bytes before this
change and 88 bytes afterwards.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/650c22ca-cffc-0255-9a05-2413a1e20826@kernel.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 7073ea8799.
We must not try to connect the socket while the transport is under
construction, because the mechanisms to safely tear it down are not in
place. As the code stands, we end up leaking the sockets on a connection
error.
Reported-by: wanghai (M) <wanghai38@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The new net.mptcp.pm_type sysctl determines which path manager will be
used by each newly-created MPTCP socket.
v2: Handle builds without CONFIG_SYSCTL
v3: Clarify logic for type-specific PM init (Geliang Tang and Paolo Abeni)
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Userspace-managed sockets should not have their subflows or
advertisements changed by the kernel path manager.
v3: Use helper function for PM mode (Paolo Abeni)
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a MPTCP connection is managed by a userspace PM, bypass the kernel
PM for incoming advertisements and subflow events. Netlink events are
still sent to userspace.
v2: Remove unneeded check in mptcp_pm_rm_addr_received() (Kishen Maloor)
v3: Add and use helper function for PM mode (Paolo Abeni)
Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When adding support for netlink path management commands, the kernel
needs to know whether paths are being controlled by the in-kernel path
manager or a userspace PM.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A few members of the mptcp_pm_data struct were assigned to hard-coded
values in mptcp_pm_data_reset(), and then immediately changed in
mptcp_pm_nl_data_init().
Instead, flatten all the assignments in to mptcp_pm_data_reset().
v2: Resolve conflicts due to rename of mptcp_pm_data_reset()
v4: Resolve conflict in mptcp_pm_data_reset()
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ceph client fixes from Ilya Dryomov:
"A fix for a NULL dereference that turns out to be easily triggerable
by fsync (marked for stable) and a false positive WARN and snap_rwsem
locking fixups"
* tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client:
ceph: fix possible NULL pointer dereference for req->r_session
ceph: remove incorrect session state check
ceph: get snap_rwsem read lock in handle_cap_export for ceph_add_cap
libceph: disambiguate cluster/pool full log message
For reasons best known to the author, gss-proxy does not implement a
NULL procedure, and returns RPC_PROC_UNAVAIL. However we still want to
ensure that we connect to the service at setup time.
So add a quirk-flag specially for this case.
Fixes: 1d658336b0 ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When the rpcbind server closes the socket, we need to ensure that the
socket is closed by the kernel as soon as feasible, so add a
sk_state_change callback to trigger this close.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
As a carry over from the CAN_RAW socket (which allows to change the CAN
interface while mantaining the filter setup) the re-binding of the
CAN_ISOTP socket needs to take care about CAN ID address information and
subscriptions. It turned out that this feature is so limited (e.g. the
sockopts remain fix) that it finally has never been needed/used.
In opposite to the stateless CAN_RAW socket the switching of the CAN ID
subscriptions might additionally lead to an interrupted ongoing PDU
reception. So better remove this unneeded complexity.
Fixes: e057dd3fc2 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220422082337.1676-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix incorrect TCP connection tracking window reset for non-syn
packets, from Florian Westphal.
2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk.
3) Fix nft_socket from the output path, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_socket: only do sk lookups when indev is available
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
====================
Link: https://lore.kernel.org/r/20220428142109.38726-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If there is still a closed socket associated with the transport, then we
need to trigger an autoclose before we can set up a new connection.
Reported-by: wanghai (M) <wanghai38@huawei.com>
Fixes: f00432063d ("SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest in the host-to-guest ring buffer. Ensure that
invalid values cannot cause data being copied out of the bounds of the
source buffer in hvs_stream_dequeue().
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220428145107.7878-4-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Pointers to VMbus packets sent by Hyper-V are used by the hv_sock driver
within the guest VM. Hyper-V can send packets with erroneous values or
modify packet fields after they are processed by the guest. To defend
against these scenarios, copy the incoming packet after validating its
length and offset fields using hv_pkt_iter_{first,next}(). Use
HVS_PKT_LEN(HVS_MTU_SIZE) to initialize the buffer which holds the
copies of the incoming packets. In this way, the packet can no longer
be modified by the host.
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220428145107.7878-3-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Check if the incoming interface is available and NFT_BREAK
in case neither skb->sk nor input device are set.
Because nf_sk_lookup_slow*() assume packet headers are in the
'in' direction, use in postrouting is not going to yield a meaningful
result. Same is true for the forward chain, so restrict the use
to prerouting, input and output.
Use in output work if a socket is already attached to the skb.
Fixes: 554ced0a6e ("netfilter: nf_tables: add support for native socket matching")
Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression causing some HCI events to be discarded when they
shouldn't.
* tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
====================
Link: https://lore.kernel.org/r/20220427234031.1257281-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-04-27
We've added 85 non-merge commits during the last 18 day(s) which contain
a total of 163 files changed, 4499 insertions(+), 1521 deletions(-).
The main changes are:
1) Teach libbpf to enhance BPF verifier log with human-readable and relevant
information about failed CO-RE relocations, from Andrii Nakryiko.
2) Add typed pointer support in BPF maps and enable it for unreferenced pointers
(via probe read) and referenced ones that can be passed to in-kernel helpers,
from Kumar Kartikeya Dwivedi.
3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward
progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel.
4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced
the effective prog array before the rcu_read_lock, from Stanislav Fomichev.
5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic
for USDT arguments under riscv{32,64}, from Pu Lehui.
6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire.
7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage
so it can be shipped under Alpine which is musl-based, from Dominique Martinet.
8) Clean up {sk,task,inode} local storage trace RCU handling as they do not
need to use call_rcu_tasks_trace() barrier, from KP Singh.
9) Improve libbpf API documentation and fix error return handling of various
API functions, from Grant Seltzer.
10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data
length of frags + frag_list may surpass old offset limit, from Liu Jian.
11) Various improvements to prog_tests in area of logging, test execution
and by-name subtest selection, from Mykola Lysenko.
12) Simplify map_btf_id generation for all map types by moving this process
to build time with help of resolve_btfids infra, from Menglong Dong.
13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read*()
helpers; the probing caused always to use old helpers, from Runqing Yang.
14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS
tracing macros, from Vladimir Isaev.
15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel
switched to memcg-based memory accouting a while ago, from Yafang Shao.
16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu.
17) Fix BPF selftests in two occasions to work around regressions caused by latest
LLVM to unblock CI until their fixes are worked out, from Yonghong Song.
18) Misc cleanups all over the place, from various others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add libbpf's log fixup logic selftests
libbpf: Fix up verifier log for unguarded failed CO-RE relos
libbpf: Simplify bpf_core_parse_spec() signature
libbpf: Refactor CO-RE relo human description formatting routine
libbpf: Record subprog-resolved CO-RE relocations unconditionally
selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests
libbpf: Avoid joining .BTF.ext data with BPF programs by section name
libbpf: Fix logic for finding matching program for CO-RE relocation
libbpf: Drop unhelpful "program too large" guess
libbpf: Fix anonymous type check in CO-RE logic
bpf: Compute map_btf_id during build time
selftests/bpf: Add test for strict BTF type check
selftests/bpf: Add verifier tests for kptr
selftests/bpf: Add C tests for kptr
libbpf: Add kptr type tag macros to bpf_helpers.h
bpf: Make BTF type match stricter for release arguments
bpf: Teach verifier about kptr_get kfunc helpers
bpf: Wire up freeing of referenced kptr
bpf: Populate pairs of btf_id and destructor kfunc in btf
bpf: Adapt copy_map_value for multiple offset case
...
====================
Link: https://lore.kernel.org/r/20220427224758.20976-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>