mirror of
https://github.com/torvalds/linux.git
synced 2024-11-10 14:11:52 +00:00
eac27a41ab
73751 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Remi Pommarel
|
eac27a41ab |
batman-adv: Do not get eth header before batadv_check_management_packet
If received skb in batadv_v_elp_packet_recv or batadv_v_ogm_packet_recv
is either cloned or non linearized then its data buffer will be
reallocated by batadv_check_management_packet when skb_cow or
skb_linearize get called. Thus geting ethernet header address inside
skb data buffer before batadv_check_management_packet had any chance to
reallocate it could lead to the following kernel panic:
Unable to handle kernel paging request at virtual address ffffff8020ab069a
Mem abort info:
ESR = 0x96000007
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x07: level 3 translation fault
Data abort info:
ISV = 0, ISS = 0x00000007
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000040f45000
[ffffff8020ab069a] pgd=180000007fffa003, p4d=180000007fffa003, pud=180000007fffa003, pmd=180000007fefe003, pte=0068000020ab0706
Internal error: Oops: 96000007 [#1] SMP
Modules linked in: ahci_mvebu libahci_platform libahci dvb_usb_af9035 dvb_usb_dib0700 dib0070 dib7000m dibx000_common ath11k_pci ath10k_pci ath10k_core mwl8k_new nf_nat_sip nf_conntrack_sip xhci_plat_hcd xhci_hcd nf_nat_pptp nf_conntrack_pptp at24 sbsa_gwdt
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.15.42-00066-g3242268d425c-dirty #550
Hardware name: A8k (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : batadv_is_my_mac+0x60/0xc0
lr : batadv_v_ogm_packet_recv+0x98/0x5d0
sp : ffffff8000183820
x29: ffffff8000183820 x28: 0000000000000001 x27: ffffff8014f9af00
x26: 0000000000000000 x25: 0000000000000543 x24: 0000000000000003
x23: ffffff8020ab0580 x22: 0000000000000110 x21: ffffff80168ae880
x20: 0000000000000000 x19: ffffff800b561000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 00dc098924ae0032
x14: 0f0405433e0054b0 x13: ffffffff00000080 x12: 0000004000000001
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000000000000 x7 : ffffffc076dae000 x6 : ffffff8000183700
x5 : ffffffc00955e698 x4 : ffffff80168ae000 x3 : ffffff80059cf000
x2 : ffffff800b561000 x1 : ffffff8020ab0696 x0 : ffffff80168ae880
Call trace:
batadv_is_my_mac+0x60/0xc0
batadv_v_ogm_packet_recv+0x98/0x5d0
batadv_batman_skb_recv+0x1b8/0x244
__netif_receive_skb_core.isra.0+0x440/0xc74
__netif_receive_skb_one_core+0x14/0x20
netif_receive_skb+0x68/0x140
br_pass_frame_up+0x70/0x80
br_handle_frame_finish+0x108/0x284
br_handle_frame+0x190/0x250
__netif_receive_skb_core.isra.0+0x240/0xc74
__netif_receive_skb_list_core+0x6c/0x90
netif_receive_skb_list_internal+0x1f4/0x310
napi_complete_done+0x64/0x1d0
gro_cell_poll+0x7c/0xa0
__napi_poll+0x34/0x174
net_rx_action+0xf8/0x2a0
_stext+0x12c/0x2ac
run_ksoftirqd+0x4c/0x7c
smpboot_thread_fn+0x120/0x210
kthread+0x140/0x150
ret_from_fork+0x10/0x20
Code: f9403844 eb03009f 54fffee1 f94
Thus ethernet header address should only be fetched after
batadv_check_management_packet has been called.
Fixes:
|
||
Sven Eckelmann
|
d8e42a2b0a |
batman-adv: Don't increase MTU when set by user
If the user set an MTU value, it usually means that there are special
requirements for the MTU. But if an interface gots activated, the MTU was
always recalculated and then the user set value was overwritten.
The only reason why this user set value has to be overwritten, is when the
MTU has to be decreased because batman-adv is not able to transfer packets
with the user specified size.
Fixes:
|
||
Sven Eckelmann
|
c6a953cce8 |
batman-adv: Trigger events for auto adjusted MTU
If an interface changes the MTU, it is expected that an NETDEV_PRECHANGEMTU
and NETDEV_CHANGEMTU notification events is triggered. This worked fine for
.ndo_change_mtu based changes because core networking code took care of it.
But for auto-adjustments after hard-interfaces changes, these events were
simply missing.
Due to this problem, non-batman-adv components weren't aware of MTU changes
and thus couldn't perform their own tasks correctly.
Fixes:
|
||
Linus Torvalds
|
6843306689 |
Including fixes from bluetooth, bpf and wireguard.
Current release - regressions: - nvme-tcp: fix comma-related oops after sendpage changes Current release - new code bugs: - ptp: make max_phase_adjustment sysfs device attribute invisible when not supported Previous releases - regressions: - sctp: fix potential deadlock on &net->sctp.addr_wq_lock - mptcp: - ensure subflow is unhashed before cleaning the backlog - do not rely on implicit state check in mptcp_listen() Previous releases - always broken: - net: fix net_dev_start_xmit trace event vs skb_transport_offset() - Bluetooth: - fix use-bdaddr-property quirk - L2CAP: fix multiple UaFs - ISO: use hci_sync for setting CIG parameters - hci_event: fix Set CIG Parameters error status handling - hci_event: fix parsing of CIS Established Event - MGMT: fix marking SCAN_RSP as not connectable - wireguard: queuing: use saner cpu selection wrapping - sched: act_ipt: various bug fixes for iptables <> TC interactions - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging - eth: sfc: fix null-deref in devlink port without MAE access - eth: ibmvnic: do not reset dql stats on NON_FATAL err Misc: - xsk: honor SO_BINDTODEVICE on bind Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSlu+MACgkQMUZtbf5S Irslgw//S7jf/GL8V6y8VL3te+/OPOZnLDTzFFOdy64/y97FE6XIacJUpyWRhtmz oSzcSNHETPW9U+xSGa2ZQlKhAXt6n9iRNvUegql+VBb13Iz+l7AdTeoxRv/YuwDo 5lTOIB6cBw+ATd0oxS6wr8SyUlcvktUKBfTAItjbVM55aXfIUpXIa84+F7avJgIA XP1u/3PHhwItmwo/hXhHH0+P0QA8ix1q2SvRB7DAlQLBsTuQhaKjXWQkYYTKw/Nt dtvh8iQSs/YXaHMjTa5CK28HOD8+ywIizr+uJ9VaNqIzV0W5JE9IE8P4NFpBcY7t kGjTYODOph7dkNmZ5RLj3N+B6CyC57OXDzoo/tr8940UytCLVj9EVyduarLGLx57 edqK9cUz5kWejyGoyZ4Pvlo/SKvCQ2HKMeiAJ0/nNpTJMFuygMoqGsaD6ttzkXMj fZLPjRUK3axd+15ZzhLEf8HyL5Qh+qPqqX9p7NljfMKwhxMWJ5fuICJfdGOSdMJR ndL+wPfRPFQwszZ4pbTY2Ivn29mo8ScBOSOEgQs2mOny+zFzTzmqNWz/jcFfQnjS cylxBEHrgudT2FuCImZ/v66TM5yakHXqIdpTGG+zsvJWQqjM96Z3I7WRvi0g9d75 n84il+j34mnzl90j2xEutqUiK7BQ9ZpZBsutPVTKBIHKWWiortI= =9yzk -----END PGP SIGNATURE----- Merge tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf and wireguard. Current release - regressions: - nvme-tcp: fix comma-related oops after sendpage changes Current release - new code bugs: - ptp: make max_phase_adjustment sysfs device attribute invisible when not supported Previous releases - regressions: - sctp: fix potential deadlock on &net->sctp.addr_wq_lock - mptcp: - ensure subflow is unhashed before cleaning the backlog - do not rely on implicit state check in mptcp_listen() Previous releases - always broken: - net: fix net_dev_start_xmit trace event vs skb_transport_offset() - Bluetooth: - fix use-bdaddr-property quirk - L2CAP: fix multiple UaFs - ISO: use hci_sync for setting CIG parameters - hci_event: fix Set CIG Parameters error status handling - hci_event: fix parsing of CIS Established Event - MGMT: fix marking SCAN_RSP as not connectable - wireguard: queuing: use saner cpu selection wrapping - sched: act_ipt: various bug fixes for iptables <> TC interactions - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging - eth: sfc: fix null-deref in devlink port without MAE access - eth: ibmvnic: do not reset dql stats on NON_FATAL err Misc: - xsk: honor SO_BINDTODEVICE on bind" * tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits) nfp: clean mc addresses in application firmware when closing port selftests: mptcp: pm_nl_ctl: fix 32-bit support selftests: mptcp: depend on SYN_COOKIES selftests: mptcp: userspace_pm: report errors with 'remove' tests selftests: mptcp: userspace_pm: use correct server port selftests: mptcp: sockopt: return error if wrong mark selftests: mptcp: sockopt: use 'iptables-legacy' if available selftests: mptcp: connect: fail if nft supposed to work mptcp: do not rely on implicit state check in mptcp_listen() mptcp: ensure subflow is unhashed before cleaning the backlog s390/qeth: Fix vipa deletion octeontx-af: fix hardware timestamp configuration net: dsa: sja1105: always enable the send_meta options net: dsa: tag_sja1105: fix MAC DA patching from meta frames net: Replace strlcpy with strscpy pptp: Fix fib lookup calls. mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX xsk: Honor SO_BINDTODEVICE on bind ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported ... |
||
Jakub Kicinski
|
fdaff05b4a |
bpf-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZKWhbwAKCRDbK58LschI g4XPAPsE0BO25GLziHlTxhsXceSewwNzPTPbK/77Kl7jNVVF+wEAlh2og3OlUdiS 4RtwqyIoP7P4XZ5ii3XGylYN7wBHHgE= =AIOh -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-07-05 We've added 2 non-merge commits during the last 1 day(s) which contain a total of 3 files changed, 16 insertions(+), 4 deletions(-). The main changes are: 1) Fix BTF to warn but not returning an error for a NULL BTF to still be able to load modules under CONFIG_DEBUG_INFO_BTF, from SeongJae Park. 2) Fix xsk sockets to honor SO_BINDTODEVICE in bind(), from Ilya Maximets. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xsk: Honor SO_BINDTODEVICE on bind bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set() ==================== Link: https://lore.kernel.org/r/20230705171716.6494-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Paolo Abeni
|
0226436acf |
mptcp: do not rely on implicit state check in mptcp_listen()
Since the blamed commit, closing the first subflow resets the first
subflow socket state to SS_UNCONNECTED.
The current mptcp listen implementation relies only on such
state to prevent touching not-fully-disconnected sockets.
Incoming mptcp fastclose (or paired endpoint removal) unconditionally
closes the first subflow.
All the above allows an incoming fastclose followed by a listen() call
to successfully race with a blocking recvmsg(), potentially causing the
latter to hit a divide by zero bug in cleanup_rbuf/__tcp_select_window().
Address the issue explicitly checking the msk socket state in
mptcp_listen(). An alternative solution would be moving the first
subflow socket state update into mptcp_disconnect(), but in the long
term the first subflow socket should be removed: better avoid relaying
on it for internal consistency check.
Fixes:
|
||
Paolo Abeni
|
3fffa15bfe |
mptcp: ensure subflow is unhashed before cleaning the backlog
While tacking care of the mptcp-level listener I unintentionally
moved the subflow level unhash after the subflow listener backlog
cleanup.
That could cause some nasty race and makes the code harder to read.
Address the issue restoring the proper order of operations.
Fixes:
|
||
Vladimir Oltean
|
a372d66af4 |
net: dsa: sja1105: always enable the send_meta options
incl_srcpt has the limitation, mentioned in commit |
||
Vladimir Oltean
|
1dcf6efd5f |
net: dsa: tag_sja1105: fix MAC DA patching from meta frames
The SJA1105 manual says that at offset 4 into the meta frame payload we
have "MAC destination byte 2" and at offset 5 we have "MAC destination
byte 1". These are counted from the LSB, so byte 1 is h_dest[ETH_HLEN-2]
aka h_dest[4] and byte 2 is h_dest[ETH_HLEN-3] aka h_dest[3].
The sja1105_meta_unpack() function decodes these the other way around,
so a frame with MAC DA 01:80:c2:11:22:33 is received by the network
stack as having 01:80:c2:22:11:33.
Fixes:
|
||
Lin Ma
|
30c45b5361 |
net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
The attribute TCA_PEDIT_PARMS_EX is not be included in pedit_policy and
one malicious user could fake a TCA_PEDIT_PARMS_EX whose length is
smaller than the intended sizeof(struct tc_pedit). Hence, the
dereference in tcf_pedit_init() could access dirty heap data.
static int tcf_pedit_init(...)
{
// ...
pattr = tb[TCA_PEDIT_PARMS]; // TCA_PEDIT_PARMS is included
if (!pattr)
pattr = tb[TCA_PEDIT_PARMS_EX]; // but this is not
// ...
parm = nla_data(pattr);
index = parm->index; // parm is able to be smaller than 4 bytes
// and this dereference gets dirty skb_buff
// data created in netlink_sendmsg
}
This commit adds TCA_PEDIT_PARMS_EX length in pedit_policy which avoid
the above case, just like the TCA_PEDIT_PARMS.
Fixes:
|
||
Ilya Maximets
|
f7306acec9 |
xsk: Honor SO_BINDTODEVICE on bind
Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. A
privileged process might create the socket and pass it to a non-privileged
process for later use. However, that process will be able to bind the socket
to any network interface. Even though it will not be able to receive any
traffic without modification of the BPF map, the situation is not ideal.
Sockets already have a mechanism that can be used to restrict what interface
they can be attached to. That is SO_BINDTODEVICE.
To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW.
Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer workflow
when non-privileged process is using AF_XDP.
The intended workflow is following:
1. First process creates a bare socket with socket(AF_XDP, ...).
2. First process loads the XSK program to the interface.
3. First process adds the socket fd to a BPF map.
4. First process ties socket fd to a particular interface using
SO_BINDTODEVICE.
5. First process sends socket fd to a second process.
6. Second process allocates UMEM.
7. Second process binds socket to the interface with bind(...).
8. Second process sends/receives the traffic.
All the steps above are possible today if the first process is privileged
and the second one has sufficient RLIMIT_MEMLOCK and no capabilities.
However, the second process will be able to bind the socket to any interface
it wants on step 7 and send traffic from it. With the proposed change, the
second process will be able to bind the socket only to a specific interface
chosen by the first process at step 4.
Fixes:
|
||
Linus Torvalds
|
c156d4af43 |
- New Drivers
- Add support for Intel Cherry Trail Whiskey Cove PMIC LEDs - Add support for Awinic AW20036/AW20054/AW20072 LEDs - New Device Support - Add support for PMI632 LPG to QCom LPG - Add support for PMI8998 to QCom Flash - Add support for MT6331, WLEDs and MT6332 to Mediatek MT6323 PMIC - New Functionality - Implement the LP55xx Charge Pump - Add support for suspend / resume to Intel Cherry Trail Whiskey Cove PMIC - Add support for breathing mode to Intel Cherry Trail Whiskey Cove PMIC - Enable per-pin resolution Pinctrl in LEDs GPIO - Fix-ups - Allow thread to sleep by switching from spinlock to mutex - Add lots of Device Tree bindings / support - Adapt relationships / dependencies driven by Kconfig - Switch I2C drivers from .probe_new() to .probe() - Remove superfluous / duplicate code - Replace strlcpy() with strscpy() for efficiency and overflow prevention - Staticify various functions - Trivial: Fixing coding style - Simplify / reduce code - Bug Fixes - Prevent NETDEV_LED_MODE_LINKUP from being cleared on rename - Repair race between led_set_brightness(LED_{OFF,FULL}) - Fix Oops relating to sleeping in critical sections - Clear LED_INIT_DEFAULT_TRIGGER flag when clearing the current trigger - Do not leak resources in error handling paths - Fix unsigned comparison which can never be negative - Provide missing NULL terminating entries in tables - Fix misnaming issues -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmSinb0ACgkQUa+KL4f8 d2FYfg//WWLVfXRuRpY9ueOxvWj65WVPQSQ+wzF/vRTweogR+lN0qxNPH6yT943z ap2EBxpWf84zCifYG4yhTEYDHQT+nH1fIz6xaK29DK8sCQi4WdRpHuE2pE30R/tf Q7SyZi9DlWyoqNiqgNNCl7vkTaHpO3trxoxfEfN2YIB0npLf8yyWRz4feVXXsYtg 41S4Mo7oTxphd7OLvw9PKogdTbT29vBMXen8jzv5g8FObj1Gheg0frq2t2W+bfAl 27cJJJS7he4/WLCDzXVQfB46Nva5NpqHiANbgOAApDGx3hFCzZFTCg6K7+VucpjY bNz3pqmslT5uJxMjqNz8fCSzwWTjyKLHBeGsIT/4HBXD4DnfFbWz9HYkorfNgsu2 lKEp0SYhSmmuS8IVzJvqDqXg6k21hGpuR9P+dI7teoClh0qLTMCz2L2c9p2zNfth 0W+WeLYQ67QTRH9EcHo3dlZH/mP/J1jGmUDbF+DFI6bHsg2iahZUA6ixD18E7sWE RwtCnb3xyn7eoDe3LwJdKtJMyrX59MbFWqozij2NNhvduXc+m1kH/DX5CSaBUVwf RtfDZwWHf4qK4CipuuqOLd5fiUArJ3TSHBxXkoo0Wz7NYXK9k86eIZgWrgdEbvuA oHmSousS19Eiscjtzxl7OjvDJMRc0rTJfD7LzYoHQBL4Vpnd8VI= =9kd5 -----END PGP SIGNATURE----- Merge tag 'leds-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds Pull LED updates from Lee Jones: "New Drivers: - Add support for Intel Cherry Trail Whiskey Cove PMIC LEDs - Add support for Awinic AW20036/AW20054/AW20072 LEDs New Device Support: - Add support for PMI632 LPG to QCom LPG - Add support for PMI8998 to QCom Flash - Add support for MT6331, WLEDs and MT6332 to Mediatek MT6323 PMIC New Functionality: - Implement the LP55xx Charge Pump - Add support for suspend / resume to Intel Cherry Trail Whiskey Cove PMIC - Add support for breathing mode to Intel Cherry Trail Whiskey Cove PMIC - Enable per-pin resolution Pinctrl in LEDs GPIO Fix-ups: - Allow thread to sleep by switching from spinlock to mutex - Add lots of Device Tree bindings / support - Adapt relationships / dependencies driven by Kconfig - Switch I2C drivers from .probe_new() to .probe() - Remove superfluous / duplicate code - Replace strlcpy() with strscpy() for efficiency and overflow prevention - Staticify various functions - Trivial: Fixing coding style - Simplify / reduce code Bug Fixes: - Prevent NETDEV_LED_MODE_LINKUP from being cleared on rename - Repair race between led_set_brightness(LED_{OFF,FULL}) - Fix Oops relating to sleeping in critical sections - Clear LED_INIT_DEFAULT_TRIGGER flag when clearing the current trigger - Do not leak resources in error handling paths - Fix unsigned comparison which can never be negative - Provide missing NULL terminating entries in tables - Fix misnaming issues" * tag 'leds-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (53 commits) leds: leds-mt6323: Adjust return/parameter types in wled get/set callbacks leds: sgm3140: Add richtek,rt5033-led compatible dt-bindings: leds: sgm3140: Document richtek,rt5033 compatible dt-bindings: backlight: kinetic,ktz8866: Add missing type for "current-num-sinks" dt-bindings: leds: Drop unneeded quotes leds: Fix config reference for AW200xx driver leds: leds-mt6323: Add support for WLEDs and MT6332 leds: leds-mt6323: Add support for MT6331 leds leds: leds-mt6323: Open code and drop MT6323_CAL_HW_DUTY macro leds: leds-mt6323: Drop MT6323_ prefix from macros and defines leds: leds-mt6323: Specify registers and specs in platform data dt-bindings: leds: leds-mt6323: Document mt6332 compatible dt-bindings: leds: leds-mt6323: Document mt6331 compatible leds: simatic-ipc-leds-gpio: Introduce more Kconfig switches leds: simatic-ipc-leds-gpio: Split up into multiple drivers leds: simatic-ipc-leds-gpio: Move two extra gpio pins into another table leds: simatic-ipc-leds-gpio: Add terminating entries to gpio tables leds: flash: leds-qcom-flash: Fix an unsigned comparison which can never be negative leds: cht-wcove: Remove unneeded semicolon leds: cht-wcove: Fix an unsigned comparison which can never be negative ... |
||
Eric Dumazet
|
998127cdb4 |
tcp: annotate data races in __tcp_oow_rate_limited()
request sockets are lockless, __tcp_oow_rate_limited() could be called
on the same object from different cpus. This is harmless.
Add READ_ONCE()/WRITE_ONCE() annotations to avoid a KCSAN report.
Fixes:
|
||
Vladimir Oltean
|
a398b9ea0c |
net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge mode
There was a regression introduced by the blamed commit, where pinging to
a VLAN-unaware bridge would fail with the repeated message "Couldn't
decode source port" coming from the tagging protocol driver.
When receiving packets with a bridge_vid as determined by
dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode:
- source_port = 0 (which isn't really valid, more like "don't know")
- switch_id = 0 (which isn't really valid, more like "don't know")
- vbid = value in range 1-7
Since the blamed patch has reversed the order of the checks, we are now
going to believe that source_port != -1 and switch_id != -1, so they're
valid, but they aren't.
The minimal solution to the problem is to only populate source_port and
switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero,
i.e. the source port information is trustworthy.
Fixes:
|
||
Vladimir Oltean
|
6ca3c005d0 |
net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
According to the synchronization rules for .ndo_get_stats() as seen in
Documentation/networking/netdevices.rst, acquiring a plain spin_lock()
should not be illegal, but the bridge driver implementation makes it so.
After running these commands, I am being faced with the following
lockdep splat:
$ ip link add link swp0 name macsec0 type macsec encrypt on && ip link set swp0 up
$ ip link add dev br0 type bridge vlan_filtering 1 && ip link set br0 up
$ ip link set macsec0 master br0 && ip link set macsec0 up
========================================================
WARNING: possible irq lock inversion dependency detected
6.4.0-04295-g31b577b4bd4a #603 Not tainted
--------------------------------------------------------
swapper/1/0 just changed the state of lock:
ffff6bd348724cd8 (&br->lock){+.-.}-{3:3}, at: br_forward_delay_timer_expired+0x34/0x198
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&ocelot->stats_lock){+.+.}-{3:3}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:
&br->lock --> &br->hash_lock --> &ocelot->stats_lock
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ocelot->stats_lock);
local_irq_disable();
lock(&br->lock);
lock(&br->hash_lock);
<Interrupt>
lock(&br->lock);
*** DEADLOCK ***
(details about the 3 locks skipped)
swp0 is instantiated by drivers/net/dsa/ocelot/felix.c, and this
only matters to the extent that its .ndo_get_stats64() method calls
spin_lock(&ocelot->stats_lock).
Documentation/locking/lockdep-design.rst says:
| A lock is irq-safe means it was ever used in an irq context, while a lock
| is irq-unsafe means it was ever acquired with irq enabled.
(...)
| Furthermore, the following usage based lock dependencies are not allowed
| between any two lock-classes::
|
| <hardirq-safe> -> <hardirq-unsafe>
| <softirq-safe> -> <softirq-unsafe>
Lockdep marks br->hash_lock as softirq-safe, because it is sometimes
taken in softirq context (for example br_fdb_update() which runs in
NET_RX softirq), and when it's not in softirq context it blocks softirqs
by using spin_lock_bh().
Lockdep marks ocelot->stats_lock as softirq-unsafe, because it never
blocks softirqs from running, and it is never taken from softirq
context. So it can always be interrupted by softirqs.
There is a call path through which a function that holds br->hash_lock:
fdb_add_hw_addr() will call a function that acquires ocelot->stats_lock:
ocelot_port_get_stats64(). This can be seen below:
ocelot_port_get_stats64+0x3c/0x1e0
felix_get_stats64+0x20/0x38
dsa_slave_get_stats64+0x3c/0x60
dev_get_stats+0x74/0x2c8
rtnl_fill_stats+0x4c/0x150
rtnl_fill_ifinfo+0x5cc/0x7b8
rtmsg_ifinfo_build_skb+0xe4/0x150
rtmsg_ifinfo+0x5c/0xb0
__dev_notify_flags+0x58/0x200
__dev_set_promiscuity+0xa0/0x1f8
dev_set_promiscuity+0x30/0x70
macsec_dev_change_rx_flags+0x68/0x88
__dev_set_promiscuity+0x1a8/0x1f8
__dev_set_rx_mode+0x74/0xa8
dev_uc_add+0x74/0xa0
fdb_add_hw_addr+0x68/0xd8
fdb_add_local+0xc4/0x110
br_fdb_add_local+0x54/0x88
br_add_if+0x338/0x4a0
br_add_slave+0x20/0x38
do_setlink+0x3a4/0xcb8
rtnl_newlink+0x758/0x9d0
rtnetlink_rcv_msg+0x2f0/0x550
netlink_rcv_skb+0x128/0x148
rtnetlink_rcv+0x24/0x38
the plain English explanation for it is:
The macsec0 bridge port is created without p->flags & BR_PROMISC,
because it is what br_manage_promisc() decides for a VLAN filtering
bridge with a single auto port.
As part of the br_add_if() procedure, br_fdb_add_local() is called for
the MAC address of the device, and this results in a call to
dev_uc_add() for macsec0 while the softirq-safe br->hash_lock is taken.
Because macsec0 does not have IFF_UNICAST_FLT, dev_uc_add() ends up
calling __dev_set_promiscuity() for macsec0, which is propagated by its
implementation, macsec_dev_change_rx_flags(), to the lower device: swp0.
This triggers the call path:
dev_set_promiscuity(swp0)
-> rtmsg_ifinfo()
-> dev_get_stats()
-> ocelot_port_get_stats64()
with a calling context that lockdep doesn't like (br->hash_lock held).
Normally we don't see this, because even though many drivers that can be
bridge ports don't support IFF_UNICAST_FLT, we need a driver that
(a) doesn't support IFF_UNICAST_FLT, *and*
(b) it forwards the IFF_PROMISC flag to another driver, and
(c) *that* driver implements ndo_get_stats64() using a softirq-unsafe
spinlock.
Condition (b) is necessary because the first __dev_set_rx_mode() calls
__dev_set_promiscuity() with "bool notify=false", and thus, the
rtmsg_ifinfo() code path won't be entered.
The same criteria also hold true for DSA switches which don't report
IFF_UNICAST_FLT. When the DSA master uses a spin_lock() in its
ndo_get_stats64() method, the same lockdep splat can be seen.
I think the deadlock possibility is real, even though I didn't reproduce
it, and I'm thinking of the following situation to support that claim:
fdb_add_hw_addr() runs on a CPU A, in a context with softirqs locally
disabled and br->hash_lock held, and may end up attempting to acquire
ocelot->stats_lock.
In parallel, ocelot->stats_lock is currently held by a thread B (say,
ocelot_check_stats_work()), which is interrupted while holding it by a
softirq which attempts to lock br->hash_lock.
Thread B cannot make progress because br->hash_lock is held by A. Whereas
thread A cannot make progress because ocelot->stats_lock is held by B.
When taking the issue at face value, the bridge can avoid that problem
by simply making the ports promiscuous from a code path with a saner
calling context (br->hash_lock not held). A bridge port without
IFF_UNICAST_FLT is going to become promiscuous as soon as we call
dev_uc_add() on it (which we do unconditionally), so why not be
preemptive and make it promiscuous right from the beginning, so as to
not be taken by surprise.
With this, we've broken the links between code that holds br->hash_lock
or br->lock and code that calls into the ndo_change_rx_flags() or
ndo_get_stats64() ops of the bridge port.
Fixes:
|
||
Linus Torvalds
|
dfab92f27c |
NFS client updates for Linux 6.5
Highlights include: Stable fixes and other bugfixes: - nfs: don't report STATX_BTIME in ->getattr - Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" since it breaks NFSv4 state recovery. - NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION - Fix the NFSv4.2 xattr cache shrinker_id - Force a ctime update after a NFSv4.2 SETXATTR call Features and cleanups: - NFS and RPC over TLS client code from Chuck Lever. - Support for use of abstract unix socket addresses with the rpcbind daemon. - Sysfs API to allow shutdown of the kernel RPC client and prevent umount() hangs if the server is known to be permanently down. - XDR cleanups from Anna. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmSgmmUACgkQZwvnipYK APJwUA/+J6uEjJFoigSDU5dpCwQr4pHZPgUn3T2heplcyalGMxLo1VjDTVuFXb+a NZqdUZF2ePmYqss/UYzJC7R6/z9OanVBcpiGqp66foJt9ncs9BSm5AzdV5Gvi4VX 6SrBM98nSqvD47l45LQ90bqIdR6WgMP9OiDC257PzYnaMZJcB0xObD4HWXh1zbIz 3xynJTSQnRGbv9I5EjJJGVIHDWLfSKY61NUXjrUcmMZ2L39ITNy0CRi8sIdj3oY/ A2Iz52IHtAhE77+EetThPskbTLa07raQSWRo3X6XJqCKiJIXa5giNDoG/zLq6sOT hi1AV7Tdxaed2EYibeRWzsSVQIClBb7T/hdro5dWs5u/bxM6Bt+yY90ZWUMZVOAQ /kGTYQXhI31vUgRaEN+2xci0wKDy9wqyAWcD8u8Gz01KaK09sfJSIvvYn+srSeaz wEUQHZCdBGtNFVP2q18q4x8BN27uObh1DdMvNhrxrA7YraXSQvL/rIIsD0jmDInb 6olMm9g9nZSHgq62+CYs2v7J/AJKQzE7PsWrTMJDX1rso+/Lyc6x7oUGxv2IFt5H VZVZNdstKeNzfcnNKsGG2ZbufhasKHqiHJxJTdNOuOi0YBi+ixtJVRpupId3+6aZ ysng0IfzqiWSuiq5Axjreva+480IDSMW+7cqcw5urKEfYY5uVcc= =leGh -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Stable fixes and other bugfixes: - nfs: don't report STATX_BTIME in ->getattr - Revert 'NFSv4: Retry LOCK on OLD_STATEID during delegation return' since it breaks NFSv4 state recovery. - NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION - Fix the NFSv4.2 xattr cache shrinker_id - Force a ctime update after a NFSv4.2 SETXATTR call Features and cleanups: - NFS and RPC over TLS client code from Chuck Lever - Support for use of abstract unix socket addresses with the rpcbind daemon - Sysfs API to allow shutdown of the kernel RPC client and prevent umount() hangs if the server is known to be permanently down - XDR cleanups from Anna" * tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (33 commits) Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" NFS: Don't cleanup sysfs superblock entry if uninitialized nfs: don't report STATX_BTIME in ->getattr NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION NFSv4.2: fix wrong shrinker_id NFSv4: Clean up some shutdown loops NFS: Cancel all existing RPC tasks when shutdown NFS: add sysfs shutdown knob NFS: add a sysfs link to the acl rpc_client NFS: add a sysfs link to the lockd rpc_client NFS: Add sysfs links to sunrpc clients for nfs_clients NFS: add superblock sysfs entries NFS: Make all of /sys/fs/nfs network-namespace unique NFS: Open-code the nfs_kset kset_create_and_add() NFS: rename nfs_client_kobj to nfs_net_kobj NFS: rename nfs_client_kset to nfs_kset NFS: Add an "xprtsec=" NFS mount option NFS: Have struct nfs_client carry a TLS policy field SUNRPC: Add a TCP-with-TLS RPC transport class SUNRPC: Capture CMSG metadata on client-side receive ... |
||
Luiz Augusto von Dentz
|
2be22f1941 |
Bluetooth: hci_event: Fix parsing of CIS Established Event
The ISO Interval on CIS Established Event uses 1.25 ms slots: BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2304: Time = N * 1.25 ms In addition to that this always update the QoS settings based on CIS Established Event. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jiapeng Chong
|
5b6d345d1b |
Bluetooth: hci_conn: Use kmemdup() to replace kzalloc + memcpy
Use kmemdup rather than duplicating its implementation. ./net/bluetooth/hci_conn.c:1880:7-14: WARNING opportunity for kmemdup. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5597 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Ivan Orlov
|
d40d6f52d5 |
Bluetooth: hci_sysfs: make bt_class a static const structure
Now that the driver core allows for struct class to be in read-only memory, move the bt_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at load time. Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: linux-bluetooth@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Luiz Augusto von Dentz
|
14f0dceca6 |
Bluetooth: ISO: Rework sync_interval to be sync_factor
This rework sync_interval to be sync_factor as having sync_interval in the order of seconds is sometimes not disarable. Wit sync_factor the application can tell how many SDU intervals it wants to send an announcement with PA, the EA interval is set to 2 times that so a factor of 24 of BIG SDU interval of 10ms would look like the following: < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 Handle: 0x01 Properties: 0x0000 Min advertising interval: 480.000 msec (0x0300) Max advertising interval: 480.000 msec (0x0300) Channel map: 37, 38, 39 (0x07) Own address type: Random (0x01) Peer address type: Public (0x00) Peer address: 00:00:00:00:00:00 (OUI 00-00-00) Filter policy: Allow Scan Request from Any, Allow Connect Request from Any (0x00) TX power: Host has no preference (0x7f) Primary PHY: LE 1M (0x01) Secondary max skip: 0x00 Secondary PHY: LE 2M (0x02) SID: 0x00 Scan request notifications: Disabled (0x00) < HCI Command: LE Set Periodic Advertising Parameters (0x08|0x003e) plen 7 Handle: 1 Min interval: 240.00 msec (0x00c0) Max interval: 240.00 msec (0x00c0) Properties: 0x0000 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Luiz Augusto von Dentz
|
73f55453ea |
Bluetooth: MGMT: Fix marking SCAN_RSP as not connectable
When receiving a scan response there is no way to know if the remote
device is connectable or not, so when it cannot be merged don't
make any assumption and instead just mark it with a new flag defined as
MGMT_DEV_FOUND_SCAN_RSP so userspace can tell it is a standalone
SCAN_RSP.
Link: https://lore.kernel.org/linux-bluetooth/CABBYNZ+CYMsDSPTxBn09Js3BcdC-x7vZFfyLJ3ppZGGwJKmUTw@mail.gmail.com/
Fixes:
|
||
Pauli Virtanen
|
db9cbcadc1 |
Bluetooth: hci_event: fix Set CIG Parameters error status handling
If the event has error status, return right error code and don't show incorrect "response malformed" messages. Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Pauli Virtanen
|
6b9545dc9f |
Bluetooth: ISO: use hci_sync for setting CIG parameters
When reconfiguring CIG after disconnection of the last CIS, LE Remove
CIG shall be sent before LE Set CIG Parameters. Otherwise, it fails
because CIG is in the inactive state and not configurable (Core v5.3
Vol 6 Part B Sec. 4.5.14.3). This ordering is currently wrong under
suitable timing conditions, because LE Remove CIG is sent via the
hci_sync queue and may be delayed, but Set CIG Parameters is via
hci_send_cmd.
Make the ordering well-defined by sending also Set CIG Parameters via
hci_sync.
Fixes:
|
||
Sungwoo Kim
|
1728137b33 |
Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
l2cap_sock_release(sk) frees sk. However, sk's children are still alive and point to the already free'd sk's address. To fix this, l2cap_sock_release(sk) also cleans sk's children. ================================================================== BUG: KASAN: use-after-free in l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650 Read of size 8 at addr ffff888104617aa8 by task kworker/u3:0/276 CPU: 0 PID: 276 Comm: kworker/u3:0 Not tainted 6.2.0-00001-gef397bd4d5fb-dirty #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: hci2 hci_rx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x72/0x95 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x175/0x478 mm/kasan/report.c:417 kasan_report+0xb1/0x130 mm/kasan/report.c:517 l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650 l2cap_chan_ready+0x10e/0x1e0 net/bluetooth/l2cap_core.c:1386 l2cap_config_req+0x753/0x9f0 net/bluetooth/l2cap_core.c:4480 l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:5739 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:6509 [inline] l2cap_recv_frame+0xe2e/0x43c0 net/bluetooth/l2cap_core.c:7788 l2cap_recv_acldata+0x6ed/0x7e0 net/bluetooth/l2cap_core.c:8506 hci_acldata_packet net/bluetooth/hci_core.c:3813 [inline] hci_rx_work+0x66e/0xbc0 net/bluetooth/hci_core.c:4048 process_one_work+0x4ea/0x8e0 kernel/workqueue.c:2289 worker_thread+0x364/0x8e0 kernel/workqueue.c:2436 kthread+0x1b9/0x200 kernel/kthread.c:376 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308 </TASK> Allocated by task 288: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:374 [inline] __kasan_kmalloc+0x82/0x90 mm/kasan/common.c:383 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0x140 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x113/0x1f0 net/core/sock.c:2040 sk_alloc+0x36/0x3c0 net/core/sock.c:2093 l2cap_sock_alloc.constprop.0+0x39/0x1c0 net/bluetooth/l2cap_sock.c:1852 l2cap_sock_create+0x10d/0x220 net/bluetooth/l2cap_sock.c:1898 bt_sock_create+0x183/0x290 net/bluetooth/af_bluetooth.c:132 __sock_create+0x226/0x380 net/socket.c:1518 sock_create net/socket.c:1569 [inline] __sys_socket_create net/socket.c:1606 [inline] __sys_socket_create net/socket.c:1591 [inline] __sys_socket+0x112/0x200 net/socket.c:1639 __do_sys_socket net/socket.c:1652 [inline] __se_sys_socket net/socket.c:1650 [inline] __x64_sys_socket+0x40/0x50 net/socket.c:1650 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 288: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:523 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free mm/kasan/common.c:200 [inline] __kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook mm/slub.c:1807 [inline] slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x88/0x1f0 mm/slub.c:3800 sk_prot_free net/core/sock.c:2076 [inline] __sk_destruct+0x347/0x430 net/core/sock.c:2168 sk_destruct+0x9c/0xb0 net/core/sock.c:2183 __sk_free+0x82/0x220 net/core/sock.c:2194 sk_free+0x7c/0xa0 net/core/sock.c:2205 sock_put include/net/sock.h:1991 [inline] l2cap_sock_kill+0x256/0x2b0 net/bluetooth/l2cap_sock.c:1257 l2cap_sock_release+0x1a7/0x220 net/bluetooth/l2cap_sock.c:1428 __sock_release+0x80/0x150 net/socket.c:650 sock_close+0x19/0x30 net/socket.c:1368 __fput+0x17a/0x5c0 fs/file_table.c:320 task_work_run+0x132/0x1c0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x113/0x120 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:296 do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc The buggy address belongs to the object at ffff888104617800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 680 bytes inside of 1024-byte region [ffff888104617800, ffff888104617c00) The buggy address belongs to the physical page: page:00000000dbca6a80 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888104614000 pfn:0x104614 head:00000000dbca6a80 order:2 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0 flags: 0x200000000010200(slab|head|node=0|zone=2) raw: 0200000000010200 ffff888100041dc0 ffffea0004212c10 ffffea0004234b10 raw: ffff888104614000 0000000000080002 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888104617980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888104617a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888104617a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888104617b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888104617b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Ack: This bug is found by FuzzBT with a modified Syzkaller. Other contributors are Ruoyu Wu and Hui Peng. Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Johan Hovold
|
6945795bc8 |
Bluetooth: fix use-bdaddr-property quirk
Devices that lack persistent storage for the device address can indicate this by setting the HCI_QUIRK_INVALID_BDADDR which causes the controller to be marked as unconfigured until user space has set a valid address. The related HCI_QUIRK_USE_BDADDR_PROPERTY was later added to similarly indicate that the device lacks a valid address but that one may be specified in the devicetree. As is clear from commit |
||
Johan Hovold
|
0cb7365850 |
Bluetooth: fix invalid-bdaddr quirk for non-persistent setup
Devices that lack persistent storage for the device address can indicate
this by setting the HCI_QUIRK_INVALID_BDADDR which causes the controller
to be marked as unconfigured until user space has set a valid address.
Once configured, the device address must be set on every setup for
controllers with HCI_QUIRK_NON_PERSISTENT_SETUP to avoid marking the
controller as unconfigured and requiring the address to be set again.
Fixes:
|
||
Zhengping Jiang
|
f752a0b334 |
Bluetooth: L2CAP: Fix use-after-free
Fix potential use-after-free in l2cap_le_command_rej. Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Vladimir Oltean
|
c1ae02d876 |
net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT
Currently the sja1105 tagging protocol prefers using the source port information from the VLAN header if that is available, falling back to the INCL_SRCPT option if it isn't. The VLAN header is available for all frames except for META frames initiated by the switch (containing RX timestamps), and thus, the "if (is_link_local)" branch is practically dead. The tag_8021q source port identification has become more loose ("imprecise") and will report a plausible rather than exact bridge port, when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local traffic always needs to know the precise source port. With incorrect source port reporting, for example PTP traffic over 2 bridged ports will all be seen on sockets opened on the first such port, which is incorrect. Now that the tagging protocol has been changed to make link-local frames always contain source port information, we can reverse the order of the checks so that we always give precedence to that information (which is always precise) in lieu of the tag_8021q VID which is only precise for a standalone port. Fixes: |
||
Florian Westphal
|
93d75d475c |
net/sched: act_ipt: zero skb->cb before calling target
xtables relies on skb being owned by ip stack, i.e. with ipv4
check in place skb->cb is supposed to be IPCB.
I don't see an immediate problem (REJECT target cannot be used anymore
now that PRE/POSTROUTING hook validation has been fixed), but better be
safe than sorry.
A much better patch would be to either mark act_ipt as
"depends on BROKEN" or remove it altogether. I plan to do this
for -next in the near future.
This tc extension is broken in the sense that tc lacks an
equivalent of NF_STOLEN verdict.
With NF_STOLEN, target function takes complete ownership of skb, caller
cannot dereference it anymore.
ACT_STOLEN cannot be used for this: it has a different meaning, caller
is allowed to dereference the skb.
At this time NF_STOLEN won't be returned by any targets as far as I can
see, but this may change in the future.
It might be possible to work around this via list of allowed
target extensions known to only return DROP or ACCEPT verdicts, but this
is error prone/fragile.
Existing selftest only validates xt_LOG and act_ipt is restricted
to ipv4 so I don't think this action is used widely.
Fixes:
|
||
Florian Westphal
|
b2dc32dcba |
net/sched: act_ipt: add sanity checks on skb before calling target
Netfilter targets make assumptions on the skb state, for example
iphdr is supposed to be in the linear area.
This is normally done by IP stack, but in act_ipt case no
such checks are made.
Some targets can even assume that skb_dst will be valid.
Make a minimum effort to check for this:
- Don't call the targets eval function for non-ipv4 skbs.
- Don't call the targets eval function for POSTROUTING
emulation when the skb has no dst set.
v3: use skb_protocol helper (Davide Caratti)
Fixes:
|
||
Florian Westphal
|
b4ee93380b |
net/sched: act_ipt: add sanity checks on table name and hook locations
Looks like "tc" hard-codes "mangle" as the only supported table
name, but on kernel side there are no checks.
This is wrong. Not all xtables targets are safe to call from tc.
E.g. "nat" targets assume skb has a conntrack object assigned to it.
Normally those get called from netfilter nat core which consults the
nat table to obtain the address mapping.
"tc" userspace either sets PRE or POSTROUTING as hook number, but there
is no validation of this on kernel side, so update netlink policy to
reject bogus numbers. Some targets may assume skb_dst is set for
input/forward hooks, so prevent those from being used.
act_ipt uses the hook number in two places:
1. the state hook number, this is fine as-is
2. to set par.hook_mask
The latter is a bit mask, so update the assignment to make
xt_check_target() to the right thing.
Followup patch adds required checks for the skb/packet headers before
calling the targets evaluation function.
Fixes:
|
||
Chengfeng Ye
|
6feb37b3b0 |
sctp: fix potential deadlock on &net->sctp.addr_wq_lock
As &net->sctp.addr_wq_lock is also acquired by the timer
sctp_addr_wq_timeout_handler() in protocal.c, the same lock acquisition
at sctp_auto_asconf_init() seems should disable irq since it is called
from sctp_accept() under process context.
Possible deadlock scenario:
sctp_accept()
-> sctp_sock_migrate()
-> sctp_auto_asconf_init()
-> spin_lock(&net->sctp.addr_wq_lock)
<timer interrupt>
-> sctp_addr_wq_timeout_handler()
-> spin_lock_bh(&net->sctp.addr_wq_lock); (deadlock here)
This flaw was found using an experimental static analysis tool we are
developing for irq-related deadlock.
The tentative patch fix the potential deadlock by spin_lock_bh().
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Fixes:
|
||
Linus Torvalds
|
3a8a670eee |
Networking changes for 6.5.
Core ---- - Rework the sendpage & splice implementations. Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is. Remove the MSG_SENDPAGE_NOTLAST flag completely. - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid. - Enable socket busy polling with CONFIG_RT. - Improve reliability and efficiency of reporting for ref_tracker. - Auto-generate a user space C library for various Netlink families. Protocols --------- - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2]. - Use per-VMA locking for "page-flipping" TCP receive zerocopy. - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags. - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative. - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO). - Avoid waking up applications using TLS sockets until we have a full record. - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring. - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address. - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch. - PPPoE: make number of hash bits configurable. - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig). - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge). - Support matching on Connectivity Fault Management (CFM) packets. - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug. - HSR: don't enable promiscuous mode if device offloads the proto. - Support active scanning in IEEE 802.15.4. - Continue work on Multi-Link Operation for WiFi 7. BPF --- - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators. - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything. - Accept dynptr memory as memory arguments passed to helpers. - Add routing table ID to bpf_fib_lookup BPF helper. - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands. - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only). - Show target_{obj,btf}_id in tracing link fdinfo. - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter --------- - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value. - Increase ip_vs_conn_tab_bits range for 64BIT builds. - Allow updating size of a set. - Improve NAT tuple selection when connection is closing. Driver API ---------- - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out). - Support configuring rate selection pins of SFP modules. - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines. - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer. - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio). - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message. - Split devlink instance and devlink port operations. New hardware / drivers ---------------------- - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers ------- - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3 MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07 IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/ fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7 yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/ JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4= =9i4I -----END PGP SIGNATURE----- Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" * tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ... |
||
Linus Torvalds
|
6e17c6de3d |
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing. - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability. - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning. - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface. - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree. - Johannes Weiner has done some cleanup work on the compaction code. - David Hildenbrand has contributed additional selftests for get_user_pages(). - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code. - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code. - Huang Ying has done some maintenance on the swap code's usage of device refcounting. - Christoph Hellwig has some cleanups for the filemap/directio code. - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses. - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings. - John Hubbard has a series of fixes to the MM selftesting code. - ZhangPeng continues the folio conversion campaign. - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock. - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8. - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management. - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code. - Vishal Moola also has done some folio conversion work. - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh J0qXCzqaaN8+BuEyLGDVPaXur9KirwY= =B7yQ -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ... |
||
Linus Torvalds
|
582c161cf3 |
hardening updates for v6.5-rc1
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko) - Convert strreplace() to return string start (Andy Shevchenko) - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook) - Add missing function prototypes seen with W=1 (Arnd Bergmann) - Fix strscpy() kerndoc typo (Arne Welzel) - Replace strlcpy() with strscpy() across many subsystems which were either Acked by respective maintainers or were trivial changes that went ignored for multiple weeks (Azeem Shaikh) - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers) - Add KUnit tests for strcat()-family - Enable KUnit tests of FORTIFY wrappers under UML - Add more complete FORTIFY protections for strlcat() - Add missed disabling of FORTIFY for all arch purgatories. - Enable -fstrict-flex-arrays=3 globally - Tightening UBSAN_BOUNDS when using GCC - Improve checkpatch to check for strcpy, strncpy, and fake flex arrays - Improve use of const variables in FORTIFY - Add requested struct_size_t() helper for types not pointers - Add __counted_by macro for annotating flexible array size members -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSbftQWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj0MD/9X9jzJzCmsAU+yNldeoAzC84Sk GVU3RBxGcTNysL1gZXynkIgigw7DWc4htMGeSABHHwQRVP65JCH1Kw/VqIkyumbx 9LdX6IklMJb4pRT4PVU3azebV4eNmSjlur2UxMeW54Czm91/6I8RHbJOyAPnOUmo 2oomGdP/hpEHtKR7hgy8Axc6w5ySwQixh2V5sVZG3VbvCS5WKTmTXbs6puuRT5hz iHt7v+7VtEg/Qf1W7J2oxfoghvVBsaRrSLrExWT/oZYh1ZxM7DsCAAoG/IsDgHGA 9LBXiRECgAFThbHVxLvvKZQMXdVk0i8iXLX43XMKC0wTA+NTyH7wlcQQ4RWNMuo8 sfA9Qm9gMArXaf64aymr3Uwn20Zan0391HdlbhOJZAE6v3PPJbleUnM58AzD2d3r 5Lz6AIFBxDImy+3f9iDWgacCT5/PkeiXTHzk9QnKhJyKKtRA58XJxj4q2+rPnGJP n4haXqoxD5FJbxdXiGKk31RS0U5HBug7wkOcUrTqDHUbc/QNU2b7dxTKUx+zYtCU uV5emPzpF4H4z+91WpO47n9gkMAfwV0lt9S2dwS8pxsgqctbmIan+Jgip7rsqZ2G OgLXBsb43eEs+6WgO8tVt/ZHYj9ivGMdrcNcsIfikzNs/xweUJ53k2xSEn2xEa5J cwANDmkL6QQK7yfeeg== =s0j1 -----END PGP SIGNATURE----- Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "There are three areas of note: A bunch of strlcpy()->strscpy() conversions ended up living in my tree since they were either Acked by maintainers for me to carry, or got ignored for multiple weeks (and were trivial changes). The compiler option '-fstrict-flex-arrays=3' has been enabled globally, and has been in -next for the entire devel cycle. This changes compiler diagnostics (though mainly just -Warray-bounds which is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_ coverage. In other words, there are no new restrictions, just potentially new warnings. Any new FORTIFY warnings we've seen have been fixed (usually in their respective subsystem trees). For more details, see commit |
||
Linus Torvalds
|
729b39ec1b |
selinux/stable-6.5 PR 20230626
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmSZucUUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMoew/+IpRuIKwouAvTINC2IEuacNlCghSs berPaYSLF89WTgbJN6hPm9NtaPU+epm5hikYp9/Ebm1Hi/91zgZZfUAN64c4e9Mx 0GgO4VwuEbx6pOK0CF9EEQTlOWnOOiP24pQlYtQGUcYOTY3OaxFkLjYx9BMw05Rd Km93eVRgJolap62ChCxdULPQQIEW0DDNGAI9TPRrPbtYRT0oSmfsMGL8Ndkui8K8 LlUVpOO5MM5/gCJjP+5PSVoyui6++ao2AwjsFk7I3hJqm3NN5fWFzWH9axLqZEqd ZfGdiah48ga+eNqi6pi79pBetlvpfHshELVwKxN9ck2UjzWQe8dqfy1p/0ikHO29 OuD+urnGTPF668GszGZgC59LoaKrHFUBjfxj3g56/BOk2aqxXKY7qeZClJ/AUEZv +VEa/foB0OCVxCBOcTvXB7Zgiz5isoR3hAQu2MmWzny9tCgHFYXJ1u0UhQaFjx57 ScPxlnjvzD5pA4ts+P2ggRojQ3Xo35dUoC353kuaaCrSg9v8yfz0ex3KeS/m9uJG MbeOtl44Xmqzzy0EB7ycNeF96kdbvKSc5XLBZyuT5CmAMUXlL3s6OOa26aevVifj LwNHAc1D7oe773Ty2WpW2s82Nh4hUyYVdIKg+9RDm74mS2ftZFeeGgFVumNQ80ZH DGhjW2iZY+0a0EU= =xzMY -----END PGP SIGNATURE----- Merge tag 'selinux-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Thanks to help from the MPTCP folks, it looks like we have finally sorted out a proper solution to the MPTCP socket labeling issue, see the new security_mptcp_add_subflow() LSM hook. - Fix the labeled NFS handling such that a labeled NFS share mounted prior to the initial SELinux policy load is properly labeled once a policy is loaded; more information in the commit description. - Two patches to security/selinux/Makefile, the first took the cleanups in v6.4 a bit further and the second removed the grouped targets support as that functionality doesn't appear to be properly supported prior to make v4.3. - Deprecate the "fs" object context type in SELinux policies. The fs object context type was an old vestige that was introduced back in v2.6.12-rc2 but never really used. - A number of small changes that remove dead code, clean up some awkward bits, and generally improve the quality of the code. See the individual commit descriptions for more information. * tag 'selinux-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: avoid bool as identifier name selinux: fix Makefile for versions of make < v4.3 selinux: make labeled NFS work when mounted before policy load selinux: cleanup exit_sel_fs() declaration selinux: deprecated fs ocon selinux: make header files self-including selinux: keep context struct members in sync selinux: Implement mptcp_add_subflow hook security, lsm: Introduce security_mptcp_add_subflow() selinux: small cleanups in selinux_audit_rule_init() selinux: declare read-only data arrays const selinux: retain const qualifier on string literal in avtab_hash_eval() selinux: drop return at end of void function avc_insert() selinux: avc: drop unused function avc_disable() selinux: adjust typos in comments selinux: do not leave dangling pointer behind selinux: more Makefile tweaks |
||
Linus Torvalds
|
72dc6db7e3 |
workqueue: Ordered workqueue creation cleanups
For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This pull request clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoKnA4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGc5SAQDOtjML7Cx9AYzbY5+nYc0wTebRRTXGeOu7A3Xy j50rVgEAjHgvHLIdmeYmVhCeHOSN4q7Wn5AOwaIqZalOhfLyKQk= =hs79 -----END PGP SIGNATURE----- Merge tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull ordered workqueue creation updates from Tejun Heo: "For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior" * tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues scsi: NCR5380: Use default @max_active for hostdata->work_q media: coda: Use alloc_ordered_workqueue() to create ordered workqueues crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues wifi: mwifiex: Use default @max_active for workqueues wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues greybus: Use alloc_ordered_workqueue() to create ordered workqueues powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues |
||
Alexander Mikhalitsyn
|
a9c49cc2f5 |
net: scm: introduce and use scm_recv_unix helper
Recently, our friends from bluetooth subsystem reported [1] that after commit |
||
Jakub Kicinski
|
3674fbf045 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.5 net-next PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Kuniyuki Iwashima
|
25a9c8a443 |
netlink: Add __sock_i_ino() for __netlink_diag_dump().
syzbot reported a warning in __local_bh_enable_ip(). [0] Commit |
||
Vladimir Oltean
|
d06f925f13 |
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
When using the felix driver (the only one which supports UC filtering and MC filtering) as a DSA master for a random other DSA switch, one can see the following stack trace when the downstream switch ports join a VLAN-aware bridge: ============================= WARNING: suspicious RCU usage ----------------------------- net/8021q/vlan_core.c:238 suspicious rcu_dereference_protected() usage! stack backtrace: Workqueue: dsa_ordered dsa_slave_switchdev_event_work Call trace: lockdep_rcu_suspicious+0x170/0x210 vlan_for_each+0x8c/0x188 dsa_slave_sync_uc+0x128/0x178 __hw_addr_sync_dev+0x138/0x158 dsa_slave_set_rx_mode+0x58/0x70 __dev_set_rx_mode+0x88/0xa8 dev_uc_add+0x74/0xa0 dsa_port_bridge_host_fdb_add+0xec/0x180 dsa_slave_switchdev_event_work+0x7c/0x1c8 process_one_work+0x290/0x568 What it's saying is that vlan_for_each() expects rtnl_lock() context and it's not getting it, when it's called from the DSA master's ndo_set_rx_mode(). The caller of that - dsa_slave_set_rx_mode() - is the slave DSA interface's dsa_port_bridge_host_fdb_add() which comes from the deferred dsa_slave_switchdev_event_work(). We went to great lengths to avoid the rtnl_lock() context in that call path in commit |
||
Kuniyuki Iwashima
|
9d797ee2dc |
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
This reverts commit |
||
David Howells
|
5da4d7b8e6 |
libceph: Partially revert changes to support MSG_SPLICE_PAGES
Fix the mishandling of MSG_DONTWAIT and also reinstates the per-page checking of the source pages (which might have come from a DIO write by userspace) by partially reverting the changes to support MSG_SPLICE_PAGES and doing things a little differently. In messenger_v1: (1) The ceph_tcp_sendpage() is resurrected and the callers reverted to use that. (2) The callers now pass MSG_MORE unconditionally. Previously, they were passing in MSG_MORE|MSG_SENDPAGE_NOTLAST and then degrading that to just MSG_MORE on the last call to ->sendpage(). (3) Make ceph_tcp_sendpage() a wrapper around sendmsg() rather than sendpage(), setting MSG_SPLICE_PAGES if sendpage_ok() returns true on the page. In messenger_v2: (4) Bring back do_try_sendpage() and make the callers use that. (5) Make do_try_sendpage() use sendmsg() for both cases and set MSG_SPLICE_PAGES if sendpage_ok() is set. Fixes: |
||
Jakub Kicinski
|
61dc651cdf |
netfilter pull request 23-06-26
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmSZMg8ACgkQ1V2XiooU IOSPnQ//VBUgCxUtgCuQX4PwY+Dqr//BLD8DcA8SNMKqCpYPmXyamPS+ZhtRI1c5 NplWcFuaER4hqiVHkbKyOzOitDXQb4s4Mn09YyLAIb2/uhxg1f79SYSGSi7H5Fay LElP7l84ars0GxlyUNmiwyzA4ySFyuWekT35o2A3eX5gawpf9mPpD21uOqAOKcad V2Z7Rz0mFcz3e400DNEx2DNehXSWZQT2O+05zIWFfpBZ7UB42GJaC+Id1RqtIX2m w5a2DtvWGKUcgWkA5KHqSQn0Ft21MePqL4QsS/s3z0jffPJUkoQX9pqnccFqr9LL 0aWKOSJFZoYtnbUGkRaPY5Kdob7Wgk5px4FUUBHORb39I98w0zP5h1hFY8jgMJxn J4+8Ys4C7Kv3Z+vq6sEo07WnbaIhj4LNO9GRwjaO2NP/UPUqrGIuhB1elBoVL8uX YvoVF6oRaB4ccaH7gR/4R6liF9flsH16OYJTbHp632Ali1nVZDP1vAvNviI5V12G WhrPVi50Utxn9KrV6ez6JJY2ysts7tip/TVAxQN0hDIS22IOxJcuiYoCxaXOEjPJ 8hd6jkF0NApwnlSkPmoqo+ohQ42Az2PtDvEsENw+U7XHur99Ed1ywxRG/K/Pm6gX QjUhoAfr9hXObwjoHKNID3VZnZpEjEsVr7CBFvj6S6FyTx3Ag5k= =wxop -----END PGP SIGNATURE----- Merge tag 'nf-next-23-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Allow slightly larger IPVS connection table size from Kconfig for 64-bit arch, from Abhijeet Rastogi. 2) Since IPVS connection table might be larger than 2^20 after previous patch, allow to limit it depending on the available memory. Moreover, use kvmalloc. From Julian Anastasov. 3) Do not rebuild VLAN header in nft_payload when matching source and destination MAC address. 4) Remove nested rcu read lock side in ip_set_test(), from Florian Westphal. 5) Allow to update set size, also from Florian. 6) Improve NAT tuple selection when connection is closing, from Florian Westphal. 7) Support for resetting set element stateful expression, from Phil Sutter. 8) Use NLA_POLICY_MAX to narrow down maximum attribute value in nf_tables, from Florian Westphal. * tag 'nf-next-23-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: limit allowed range via nla_policy netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET netfilter: snat: evict closing tcp entries on reply tuple collision netfilter: nf_tables: permit update of set size netfilter: ipset: remove rcu_read_lock_bh pair from ip_set_test netfilter: nft_payload: rebuild vlan header when needed ipvs: dynamically limit the connection hash table ipvs: increase ip_vs_conn_tab_bits range for 64BIT ==================== Link: https://lore.kernel.org/r/20230626064749.75525-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
0aa69d53ac |
for-6.5/io_uring-2023-06-23
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8cEQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnvZD/0QWstFCe1CSLWaycdC9fhWftFt3hyEIST5 CYEL56UZrDWNkv9xTLe855xvMavjd4sdHlUa8NUPghRQeJyKYgRxHBLXRWmy0uNN l47Zjiwsolmbr3Nt6qViLdCDYmG39ZGNwWo8b6p3ybWYLtzxeOblocOBTPzoCtkS hjo7Z0eMONvsvLX+l0o9IDdWtZIQ2fGo4VYkMIVb6CyxRPpuUuPKbE25qaTx+uBg Fy6Qa3SlaTwzqcg3dttggjIP792L/eETUCWGndg5pJNbrkj/fI4Vm4bEljID76eS HODl+pWHmyM6avVkypr7N3Tp5HKF0OTUa4vJTLIZo1QiRu6zlXphtuvGn+McEmgV hbYmQMYWzqJ22k2iEpCR58pdhmZJC9uB8r4Rwgr/t9GKqt4E+15EzmqkG9cUVMGV rfbBwVLwBUd5+0WHwQ8RzdtaUPt17vSIW/8WhU5zoMGVotqVBHO/H+5BtmKPWWpq fx1etQ8XJVPIxziJvgsEitb1s6KZzJspcONDlLEitmZkflv3gGdVm99KNbXwJpcp m6+FcYQ5d5FivfLPGgpx8go+4M2QuoW2yRGwZHu54buCnpxgNjIk898OjrUrdXCg 3/0m99GXmOWQQl0VrrTr+Fv99nVsQ2hMQzOFJGMYRtHEEc5xiTcJiZmoxmF7T7/C TipyW3czsw== =5Me8 -----END PGP SIGNATURE----- Merge tag 'for-6.5/io_uring-2023-06-23' of git://git.kernel.dk/linux Pull io_uring updates from Jens Axboe: "Nothing major in this release, just a bunch of cleanups and some optimizations around networking mostly. - clean up file request flags handling (Christoph) - clean up request freeing and CQ locking (Pavel) - support for using pre-registering the io_uring fd at setup time (Josh) - Add support for user allocated ring memory, rather than having the kernel allocate it. Mostly for packing rings into a huge page (me) - avoid an unnecessary double retry on receive (me) - maintain ordering for task_work, which also improves performance (me) - misc cleanups/fixes (Pavel, me)" * tag 'for-6.5/io_uring-2023-06-23' of git://git.kernel.dk/linux: (39 commits) io_uring: merge conditional unlock flush helpers io_uring: make io_cq_unlock_post static io_uring: inline __io_cq_unlock io_uring: fix acquire/release annotations io_uring: kill io_cq_unlock() io_uring: remove IOU_F_TWQ_FORCE_NORMAL io_uring: don't batch task put on reqs free io_uring: move io_clean_op() io_uring: inline io_dismantle_req() io_uring: remove io_free_req_tw io_uring: open code io_put_req_find_next io_uring: add helpers to decode the fixed file file_ptr io_uring: use io_file_from_index in io_msg_grab_file io_uring: use io_file_from_index in __io_sync_cancel io_uring: return REQ_F_ flags from io_file_get_flags io_uring: remove io_req_ffs_set io_uring: remove a confusing comment above io_file_get_flags io_uring: remove the mode variable in io_file_get_flags io_uring: remove __io_file_supports_nowait io_uring: wait interruptibly for request completions on exit ... |
||
Linus Torvalds
|
3eccc0c886 |
for-6.5/splice-2023-06-23
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8QgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpupIEADKEZvpxDyaxHjYZFFeoSJRkh+AEJHe0Xtr J5vUL8t8zmAV3F7i8XaoAEcR0dC0VQcoTc8fAOty71+5hsc7gvtyyNjqU/YWRVqK Xr+VJuSJ+OGx3MzpRWEkepagfPyqP5cyyCOK6gqIgqzc3IwqkR/3QHVRc6oR8YbY AQd7tqm2fQXK9WDHEy5hcaQeqb9uKZjQQoZejpPPerpJM+9RMgKxpCGtnLLIUhr/ sgl7KyLIQPBmveO2vfOR+dmsJBqsLqneqkXDKMAIfpeVEEkHHAlCH4E5Ne1XUS+s ie4If+reuyn1Ktt5Ry1t7w2wr8cX1fcay3K28tgwjE2Bvremc5YnYgb3pyUDW38f tXXkpg/eTXd/Pn0Crpagoa9zJ927tt5JXIO1/PagPEP1XOqUuthshDFsrVqfqbs+ 36gqX2JWB4NJTg9B9KBHA3+iVCJyZLjUqOqws7hOJOvhQytZVm/IwkGBg1Slhe1a J5WemBlqX8lTgXz0nM7cOhPYTZeKe6hazCcb5VwxTUTj9SGyYtsMfqqTwRJO9kiF j1VzbOAgExDYe+GvfqOFPh9VqZho66+DyOD/Xtca4eH7oYyHSmP66o8nhRyPBPZA maBxQhUkPQn4/V/0fL2TwIdWYKsbj8bUyINKPZ2L35YfeICiaYIctTwNJxtRmItB M3VxWD3GZQ== =KhW4 -----END PGP SIGNATURE----- Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux Pull splice updates from Jens Axboe: "This kills off ITER_PIPE to avoid a race between truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory corruption. Instead, we either use (a) filemap_splice_read(), which invokes the buffered file reading code and splices from the pagecache into the pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads into it and then pushes the filled pages into the pipe; or (c) handle it in filesystem-specific code. Summary: - Rename direct_splice_read() to copy_splice_read() - Simplify the calculations for the number of pages to be reclaimed in copy_splice_read() - Turn do_splice_to() into a helper, vfs_splice_read(), so that it can be used by overlayfs and coda to perform the checks on the lower fs - Make vfs_splice_read() jump to copy_splice_read() to handle direct-I/O and DAX - Provide shmem with its own splice_read to handle non-existent pages in the pagecache. We don't want a ->read_folio() as we don't want to populate holes, but filemap_get_pages() requires it - Provide overlayfs with its own splice_read to call down to a lower layer as overlayfs doesn't provide ->read_folio() - Provide coda with its own splice_read to call down to a lower layer as coda doesn't provide ->read_folio() - Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs and random files as they just copy to the output buffer and don't splice pages - Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3, ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation - Make cifs use filemap_splice_read() - Replace pointers to generic_file_splice_read() with pointers to filemap_splice_read() as DIO and DAX are handled in the caller; filesystems can still provide their own alternate ->splice_read() op - Remove generic_file_splice_read() - Remove ITER_PIPE and its paraphernalia as generic_file_splice_read was the only user" * tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ... |
||
Linus Torvalds
|
f7976a6493 |
NFSD 6.5 Release Notes
Fixes and clean-ups include: - Clean-ups in the READ path in anticipation of MSG_SPLICE_PAGES - Better NUMA awareness when allocating pages and other objects - A number of minor clean-ups to XDR encoding - Elimination of a race when accepting a TCP socket - Numerous observability enhancements -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmSVpEQACgkQM2qzM29m f5flNQ/+N+GKwJWKkDDK9l0/OJWLSOV1DaUKs04FW79Xa05bqB9W3mnkQ2QzXmBq 9BUUeAnn0bD+yDgsBd4+l5HMjK91Qm+e+ljE4Gn+hA2Kg5VsBPbNImXQTuAdC/mw 2YWNqaQ6EjHWmzauAOiKqKwDsYefZaiS3+1CEuOlmXeDK979yU7zXHUbFvLjmNiP 5ATDrxfIsQza+Je0sh3JGeJtFZCt127zOd6vEQCiedbC8yy7n1ldWi/OGWPsg/2H z/QWbs9Iw8ExESosBKfX6M13izJV9eZ69ZfvxmLFGcfZmHoOCOtIXzDDqmD6JsPY l4SBtWkP+OB4jePM1nEpFU65EQrGtRK/roGGXNtCTaZAcDBk2/jJ9b5gejkSDsaU B3alN0UwECeEzQ3whYYAGy4m1FPaQcFfxl1RydcvYEfRiuiYxZb/3EO62dDavv14 WTYUGHNxso48DGngyO2xRBIda0Kwqc4vgkzOww6SP+Eok/22q/CsKHiuzRgnrLq6 GYdRX0Zmvl/0lzuBsDOzpOIQg2DuGdc84fjUCxqQu0DBSkehWo4i2eGNHQuHbMUb Tl58/tYiFazVOX1aAKabzKm2iaCjKdLySDhTj+dSgTXKx6SqZl1cXvhFl9xImgYj uW9i+ALZO9PNvtpMIvGaHqQK6xyI9F6isgKyNpXwy9J5D6ck7GA= =1EQc -----END PGP SIGNATURE----- Merge tag 'nfsd-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: - Clean-ups in the READ path in anticipation of MSG_SPLICE_PAGES - Better NUMA awareness when allocating pages and other objects - A number of minor clean-ups to XDR encoding - Elimination of a race when accepting a TCP socket - Numerous observability enhancements * tag 'nfsd-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (46 commits) nfsd: remove redundant assignments to variable len svcrdma: Fix stale comment NFSD: Distinguish per-net namespace initialization nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net SUNRPC: Address RCU warning in net/sunrpc/svc.c SUNRPC: Use sysfs_emit in place of strlcpy/sprintf SUNRPC: Remove transport class dprintk call sites SUNRPC: Fix comments for transport class registration svcrdma: Remove an unused argument from __svc_rdma_put_rw_ctxt() svcrdma: trace cc_release calls svcrdma: Convert "might sleep" comment into a code annotation NFSD: Add an nfsd4_encode_nfstime4() helper SUNRPC: Move initialization of rq_stime SUNRPC: Optimize page release in svc_rdma_sendto() svcrdma: Prevent page release when nothing was received svcrdma: Revert |
||
Pablo Neira Ayuso
|
b389139f12 |
netfilter: nf_tables: fix underflow in chain reference counter
Set element addition error path decrements reference counter on chains twice: once on element release and again via nft_data_release(). Then, |
||
Pablo Neira Ayuso
|
3e70489721 |
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
Otherwise a dangling reference to a rule object that is gone remains
in the set binding list.
Fixes:
|
||
Ilia.Gavrilov
|
f188d30087 |
netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value.
ct_sip_parse_numerical_param() returns only 0 or 1 now.
But process_register_request() and process_register_response() imply
checking for a negative value if parsing of a numerical header parameter
failed.
The invocation in nf_nat_sip() looks correct:
if (ct_sip_parse_numerical_param(...) > 0 &&
...) { ... }
Make the return value of the function ct_sip_parse_numerical_param()
a tristate to fix all the cases
a) return 1 if value is found; *val is set
b) return 0 if value is not found; *val is unchanged
c) return -1 on error; *val is undefined
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes:
|