This introduces mgmt_alloc_skb and mgmt_send_event_skb which are
convenient when building MGMT events that have variable length as the
likes of skb_put_data can be used to insert portion directly on the skb
instead of having to first build an intermediate buffer just to be
copied over the skb.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This adds support for Set Privacy Mode when updating the resolving list
when HCI_CONN_FLAG_DEVICE_PRIVACY so the controller shall use Device
Mode for devices programmed in the resolving list, Device Mode is
actually required when the remote device are not able to use RPA as
otherwise the default mode is Network Privacy Mode in which only
allows RPAs thus the controller would filter out advertisement using
identity addresses for which there is an IRK.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This introduces HCI_CONN_FLAG_DEVICE_PRIVACY which can be used by
userspace to indicate to the controller to use Device Privacy Mode to a
specific device.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This reworks hci_conn_params flags to use bitmap_* helpers and add
support for setting the supported flags in hdev->conn_flags so it can
easily be accessed.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This uses skb_pull_data to check the LE Direct Advertising Report
events received have the minimum required length.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This uses skb_pull_data to check the LE Extended Advertising Report
events received have the minimum required length.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This uses skb_pull_data to check the LE Advertising Report events
received have the minimum required length.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This uses skb_pull_data to check the Extended Inquiry Result events
received have the minimum required length.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This uses skb_pull_data to check the Inquiry Result with RSSI events
received have the minimum required length.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This uses skb_pull_data to check the Inquiry Result events received
have the minimum required length.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This uses skb_pull_data to check the Number of Complete Packets events
received have the minimum required length.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This uses skb_pull_data to check the BR/EDR events received have the
minimum required length.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Vladimir reported csum issues after my recent change in skb_postpull_rcsum()
Issue here is the following:
initial skb->csum is the csum of
[part to be pulled][rest of packet]
Old code:
skb->csum = csum_sub(skb->csum, csum_partial(pull, pull_length, 0));
New code:
skb->csum = ~csum_partial(pull, pull_length, ~skb->csum);
This is broken if the csum of [pulled part]
happens to be equal to skb->csum, because end
result of skb->csum is 0 in new code, instead
of being 0xffffffff
David Laight suggested to use
skb->csum = -csum_partial(pull, pull_length, -skb->csum);
I based my patches on existing code present in include/net/seg6.h,
update_csum_diff4() and update_csum_diff16() which might need
a similar fix.
I guess that my tests, mostly pulling 40 bytes of IPv6 header
were not providing enough entropy to hit this bug.
v2: added wsum_negate() to make sparse happy.
Fixes: 29c3002644 ("net: optimize skb_postpull_rcsum()")
Fixes: 0bd28476f6 ("gro: optimize skb_gro_postpull_rcsum()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Cc: David Lebrun <dlebrun@google.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211204045356.3659278-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Note that other ip_tunnel users do not seem to hold a reference
on tunnel->dev. Probably needs some investigations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We want to track all dev_hold()/dev_put() to ease leak hunting.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We want to track all dev_hold()/dev_put() to ease leak hunting.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After transfer errors it makes sense to cancel an ongoing synchronous
command that cannot complete anymore. To permit this, export the old
hci_req_sync_cancel function as hci_cmd_sync_cancel in the API.
Signed-off-by: Benjamin Berg <bberg@redhat.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
KCSAN reported a data-race [1] around tx_rebalance_counter
which can be accessed from different contexts, without
the protection of a lock/mutex.
[1]
BUG: KCSAN: data-race in bond_alb_init_slave / bond_alb_monitor
write to 0xffff888157e8ca24 of 4 bytes by task 7075 on cpu 0:
bond_alb_init_slave+0x713/0x860 drivers/net/bonding/bond_alb.c:1613
bond_enslave+0xd94/0x3010 drivers/net/bonding/bond_main.c:1949
do_set_master net/core/rtnetlink.c:2521 [inline]
__rtnl_newlink net/core/rtnetlink.c:3475 [inline]
rtnl_newlink+0x1298/0x13b0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2491
rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x6e1/0x7d0 net/netlink/af_netlink.c:1916
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
__sys_sendmsg+0x195/0x230 net/socket.c:2492
__do_sys_sendmsg net/socket.c:2501 [inline]
__se_sys_sendmsg net/socket.c:2499 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888157e8ca24 of 4 bytes by task 1082 on cpu 1:
bond_alb_monitor+0x8f/0xc00 drivers/net/bonding/bond_alb.c:1511
process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
worker_thread+0x616/0xa70 kernel/workqueue.c:2445
kthread+0x2c7/0x2e0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30
value changed: 0x00000001 -> 0x00000064
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1082 Comm: kworker/u4:3 Not tainted 5.16.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bond1 bond_alb_monitor
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before commit faa041a40b ("ipv4: Create cleanup helper for fib_nh")
changes to net->ipv4.fib_num_tclassid_users were protected by RTNL.
After the change, this is no longer the case, as free_fib_info_rcu()
runs after rcu grace period, without rtnl being held.
Fixes: faa041a40b ("ipv4: Create cleanup helper for fib_nh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phylink needs slightly more information than phylink_get_interfaces()
allows us to get from the DSA drivers - we need the MAC capabilities.
Replace the phylink_get_interfaces() method with phylink_get_caps() to
allow DSA drivers to fill in the phylink_config MAC capabilities field
as well.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The devlink_resources_unregister() used second parameter as an
entry point for the recursive removal of devlink resources. None
of the callers outside of devlink core needed to use this field,
so let's remove it.
As part of this removal, the "struct devlink_resource" was moved
from .h to .c file as it is not possible to use in any place in
the code except devlink.c.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we use hard code number to verify if we are in the
arp_interval timeslice. But some user may want to reduce/extend
the verify timeslice. With the similar team option 'missed_max'
the uers could change that number based on their own environment.
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.
However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.
This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: Xiumei Mu <xmu@redhat.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Fixes: 900575aa33 ("wireguard: device: avoid circular netns references")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The kernel leaks memory when a `fib` rule is present in IPv6 nftables
firewall rules and a suppress_prefix rule is present in the IPv6 routing
rules (used by certain tools such as wg-quick). In such scenarios, every
incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
After some hours of `bpftrace`-ing and source code reading, I tracked
down the issue to ca7a03c417 ("ipv6: do not free rt if
FIB_LOOKUP_NOREF is set on suppress rule").
The problem with that change is that the generic `args->flags` always have
`FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag
`RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
decreasing the refcount when needed.
How to reproduce:
- Add the following nftables rule to a prerouting chain:
meta nfproto ipv6 fib saddr . mark . iif oif missing drop
This can be done with:
sudo nft create table inet test
sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
- Run:
sudo ip -6 rule add table main suppress_prefixlength 0
- Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
with every incoming ipv6 packet.
This patch exposes the protocol-specific flags to the protocol
specific `suppress` function, and check the protocol-specific `flags`
argument for RT6_LOOKUP_F_DST_NOREF instead of the generic
FIB_LOOKUP_NOREF when decreasing the refcount, like this.
[1]: ca7a03c417/net/ipv6/fib6_rules.c (L71)
[2]: ca7a03c417/net/ipv6/fib6_rules.c (L99)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
Fixes: ca7a03c417 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen reported a TCP stream corruption for HTTP requests
served by the apache web-server using a cifs mount-point
and memory mapping the relevant file.
The root cause is quite similar to the one addressed by
commit 20eb4f29b6 ("net: fix sk_page_frag() recursion from
memory reclaim"). Here the nested access to the task page frag
is caused by a page fault on the (mmapped) user-space memory
buffer coming from the cifs file.
The page fault handler performs an smb transaction on a different
socket, inside the same process context. Since sk->sk_allaction
for such socket does not prevent the usage for the task_frag,
the nested allocation modify "under the hood" the page frag
in use by the outer sendmsg call, corrupting the stream.
The overall relevant stack trace looks like the following:
httpd 78268 [001] 3461630.850950: probe:tcp_sendmsg_locked:
ffffffff91461d91 tcp_sendmsg_locked+0x1
ffffffff91462b57 tcp_sendmsg+0x27
ffffffff9139814e sock_sendmsg+0x3e
ffffffffc06dfe1d smb_send_kvec+0x28
[...]
ffffffffc06cfaf8 cifs_readpages+0x213
ffffffff90e83c4b read_pages+0x6b
ffffffff90e83f31 __do_page_cache_readahead+0x1c1
ffffffff90e79e98 filemap_fault+0x788
ffffffff90eb0458 __do_fault+0x38
ffffffff90eb5280 do_fault+0x1a0
ffffffff90eb7c84 __handle_mm_fault+0x4d4
ffffffff90eb8093 handle_mm_fault+0xc3
ffffffff90c74f6d __do_page_fault+0x1ed
ffffffff90c75277 do_page_fault+0x37
ffffffff9160111e page_fault+0x1e
ffffffff9109e7b5 copyin+0x25
ffffffff9109eb40 _copy_from_iter_full+0xe0
ffffffff91462370 tcp_sendmsg_locked+0x5e0
ffffffff91462370 tcp_sendmsg_locked+0x5e0
ffffffff91462b57 tcp_sendmsg+0x27
ffffffff9139815c sock_sendmsg+0x4c
ffffffff913981f7 sock_write_iter+0x97
ffffffff90f2cc56 do_iter_readv_writev+0x156
ffffffff90f2dff0 do_iter_write+0x80
ffffffff90f2e1c3 vfs_writev+0xa3
ffffffff90f2e27c do_writev+0x5c
ffffffff90c042bb do_syscall_64+0x5b
ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65
The cifs filesystem rightfully sets sk_allocations to GFP_NOFS,
we can avoid the nesting using the sk page frag for allocation
lacking the __GFP_FS flag. Do not define an additional mm-helper
for that, as this is strictly tied to the sk page frag usage.
v1 -> v2:
- use a stricted sk_page_frag() check instead of reordering the
code (Eric)
Reported-by: Steffen Froemer <sfroemer@redhat.com>
Fixes: 5640f76858 ("net: use a per task frag allocator")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To replace unix_table_lock with per-hash locks in the next patch, we need
to save a hash in each socket because /proc/net/unix or BPF prog iterate
sockets while holding a hash table lock and release it later in a different
function.
Currently, we store a real/pseudo hash in struct unix_address. However, we
do not allocate it to unbound sockets, nor should we do just for that. For
this purpose, we can use sk_hash. Then, we no longer use the hash field in
struct unix_address and can remove it.
Also, this patch does
- rename unix_insert_socket() to unix_insert_unbound_socket()
- remove the redundant list argument from __unix_insert_socket() and
unix_insert_unbound_socket()
- use 'unsigned int' instead of 'unsigned' in __unix_set_addr_hash()
- remove 'inline' from unix_remove_socket() and
unix_insert_unbound_socket().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.
Use memset_after() so memset() doesn't get confused about writing
beyond the destination member that is intended to be the starting point
of zeroing through the end of the struct.
Additionally fix the common helper, ieee80211_tx_info_clear_status(),
which was not clearing ack_signal, but the open-coded versions
did. Johannes Berg points out this bug was introduced by commit
e3e1a0bcb3 ("mac80211: reduce IEEE80211_TX_MAX_RATES") but was harmless.
Also drops the associated unneeded BUILD_BUG_ON()s, and adds a note to
carl9170 about usage.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Christian Lamparter <chunkeey@gmail.com> [both CARL9170+P54USB on real HW]
Link: https://lore.kernel.org/r/20211118203839.1289276-1-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If necessary schedule offchan_cac_abort_wk work in cfg80211_radar_event
routine adding offchan parameter to cfg80211_radar_event signature.
Rename cfg80211_radar_event in __cfg80211_radar_event and introduce
the two following inline helpers:
- cfg80211_radar_event
- cfg80211_offchan_radar_event
Doing so the drv will not need to run cfg80211_offchan_cac_abort() after
radar detection on the offchannel chain.
Tested-by: Owen Peng <owen.peng@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/3ff583e021e3343a3ced54a7b09b5e184d1880dc.1637062727.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We can leverage third argument to csum_partial():
X = csum_sub(X, csum_partial(start, len, 0));
-->
X = csum_add(X, ~csum_partial(start, len, 0));
-->
X = ~csum_partial(start, len, ~X);
This removes one add/adc pair and its dependency against the carry flag.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the probe timer is reused as the raise timer when PLPMTUD is in
the Search Complete state. raise_count was introduced to count how many
times the probe timer has timed out. When raise_count reaches to 30, the
raise timer handler will be triggered.
During the whole processing above, the timer keeps timing out every probe_
interval. It is a waste for the Search Complete state, as the raise timer
only needs to time out after 30 * probe_interval.
Since the raise timer and probe timer are never used at the same time, it
is no need to keep probe timer 'alive' in the Search Complete state. This
patch to introduce sctp_transport_reset_raise_timer() to start the timer
as the raise timer when entering the Search Complete state. When entering
the other states, sctp_transport_reset_probe_timer() will still be called
to reset the timer to the probe timer.
raise_count can be removed from sctp_transport as no need to count probe
timer timeout for raise timer timeout. last_rtx_chunks can be removed as
sctp_transport_reset_probe_timer() can be called in the place where asoc
rtx_data_chunks is changed.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/edb0e48988ea85997488478b705b11ddc1ba724a.1637781974.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When doing remote name request, we cannot scan. In the normal case it's
OK since we can expect it to finish within a short amount of time.
However, there is a possibility to scan lots of devices that
(1) requires Remote Name Resolve
(2) is unresponsive to Remote Name Resolve
When this happens, we are stuck to do Remote Name Resolve until all is
done before continue scanning.
This patch adds a time limit to stop us spending too long on remote
name request.
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>