linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-15 00:21:59 +00:00

Author	SHA1	Message	Date
Danielle Ratson	c4f78134d4	ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB According to the CMIS standard, the firmware update process is done using a CDB commands sequence. Implement a work that will be triggered from the module layer in the next patch the will initiate and execute all the CDB commands in order, to eventually complete the firmware update process. This flashing process includes, writing the firmware image, running the new firmware image and committing it after testing, so that it will run upon reset. This work will also notify user space about the progress of the firmware update process. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:23 +01:00
Danielle Ratson	a39c84d796	ethtool: cmis_cdb: Add a layer for supporting CDB commands CDB (Command Data Block Message Communication) reads and writes are performed on memory map pages 9Fh-AFh according to the CMIS standard, section 8.20 of revision 5.2. Page 9Fh is used to specify the CDB command to be executed and also provides an area for a local payload (LPL). According to the CMIS standard, the firmware update process is done using a CDB commands sequence that will be implemented in the next patch. The kernel interface that will implement the firmware update using CDB command will include 2 layers that will be added under ethtool: * The upper layer that will be triggered from the module layer, is cmis_fw_update. * The lower one is cmis_cdb. In the future there might be more operations to implement using CDB commands. Therefore, the idea is to keep the CDB interface clean and the cmis_fw_update specific to the CDB commands handling it. These two layers will communicate using the API the consists of three functions: - struct ethtool_cmis_cdb * ethtool_cmis_cdb_init(struct net_device dev, struct ethtool_module_fw_flash_params params); - void ethtool_cmis_cdb_fini(struct ethtool_cmis_cdb cdb); - int ethtool_cmis_cdb_execute_cmd(struct net_device dev, struct ethtool_cmis_cdb_cmd_args args); Add the CDB layer to support initializing, finishing and executing CDB commands: The initialization process will include creating of an ethtool_cmis_cdb instance, querying the module CDB support, entering and validating the password from user space (CMD 0x0000) and querying the module features (CMD 0x0040). * The finishing API will simply free the ethtool_cmis_cdb instance. * The executing process will write the CDB command to EEPROM using set_module_eeprom_by_page() that was presented earlier, and will process the reply from EEPROM. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:23 +01:00
Danielle Ratson	31e0aa99dc	ethtool: Veto some operations during firmware flashing process Some operations cannot be performed during the firmware flashing process. For example: - Port must be down during the whole flashing process to avoid packet loss while committing reset for example. - Writing to EEPROM interrupts the flashing process, so operations like ethtool dump, module reset, get and set power mode should be vetoed. - Split port firmware flashing should be vetoed. In order to veto those scenarios, add a flag in 'struct net_device' that indicates when a firmware flash is taking place on the module and use it to prevent interruptions during the process. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:22 +01:00
Danielle Ratson	d7d4cfc4c9	ethtool: Add flashing transceiver modules' firmware notifications ability Add progress notifications ability to user space while flashing modules' firmware by implementing the interface between the user space and the kernel. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:22 +01:00
Sagi Grimberg	2d5f6801db	Revert "net: micro-optimize skb_datagram_iter" This reverts commit `934c29999b`. This triggered a usercopy BUG() in systems with HIGHMEM, reported by the test robot in: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Link: https://patch.msgid.link/20240626070153.759257-1-sagi@grimberg.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-27 16:44:32 -07:00
Jakub Kicinski	56bf02c26a	Highlights this time are: - cfg80211/nl80211: * improvements for 6 GHz regulatory flexibility - mac80211: * use generic netdev stats * multi-link improvements/fixes - brcmfmac: * MFP support (to enable WPA3) - wilc1000: * suspend/resume improvements - iwlwifi: * remove support for older FW for new devices * fast resume (keeping the device configured) - wl18xx: * support newer firmware versions -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmZ9T40ACgkQ10qiO8sP aACifQ/+LPYi4q/WWME+ceNrUebkRS9d0QuT5kA3EdtoxstR5582L32+X9G3RZ23 IAA5Mo7JfPTVqHNcS34Uh0qJge+hNVAJfksenyaCUfLpeNX+c78xlvXIWXpilD/U 7KK82wpovQ82cFAk4oymTYY/9Fzab9V0WswndzEOEaD7QfR0MHtyC6sDONMbt2Qe RSBeZF/rkTjyL2dymVWHUYMMx84sB11Tiwkd7vsk/PhLepOS9PvW2jFGKc0hePeu Q59WdM87rG5zlkBwrEy44mrPTR3GmGpQsDvdajH8xxkO48ry2ATe7qi9PrfSjon5 jaM7oEoHi+XIKfB20Ulpi0hdE67MQhwydfdrtulGe6IZOVpsUbnRiduKDFmkGcFT mjj0L01kp/KQtMsZF35WDCeYhaHLpidh2f18e60XBDPt22goDoWD3PyM7Mhy0flY bA/sh8hQrWw5+jxTfc5UmZHYlWh4TYOyVs6Ub0qMQtFaCdLDFQG/abkdwHZO4e9G 3tstlSSa41vziX1rwMTUkYbNzCdjEVnqnvWAICXXgH38ubdAxId/1xkMSHpEEwGL X9CVPmu2lPKJ4kwhcUnEE1QH5q9kRwaZ5gIq777PfRx9UzT4ViGiRVWx0qC54vLB 34fSEstrXKx9crpfFtOFPQxUHsXzod/kWEDSvkzpZAHeWtpDVu0= =MR6f -----END PGP SIGNATURE----- Merge tag 'wireless-next-2024-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Highlights this time are: - cfg80211/nl80211: * improvements for 6 GHz regulatory flexibility - mac80211: * use generic netdev stats * multi-link improvements/fixes - brcmfmac: * MFP support (to enable WPA3) - wilc1000: * suspend/resume improvements - iwlwifi: * remove support for older FW for new devices * fast resume (keeping the device configured) - wl18xx: * support newer firmware versions * tag 'wireless-next-2024-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (100 commits) wifi: brcmfmac: of: Support interrupts-extended wifi: brcmsmac: advertise MFP_CAPABLE to enable WPA3 net: rfkill: Correct return value in invalid parameter case wifi: mac80211: fix NULL dereference at band check in starting tx ba session wifi: iwlwifi: mvm: fix rs.h kernel-doc wifi: iwlwifi: fw: api: datapath: fix kernel-doc wifi: iwlwifi: fix remaining mistagged kernel-doc comments wifi: iwlwifi: fix prototype mismatch kernel-doc warnings wifi: iwlwifi: fix kernel-doc in iwl-fh.h wifi: iwlwifi: fix kernel-doc in iwl-trans.h wifi: iwlwifi: pcie: fix kernel-doc wifi: iwlwifi: dvm: fix kernel-doc warnings wifi: iwlwifi: mvm: don't log error for failed UATS table read wifi: iwlwifi: trans: make bad state warnings wifi: iwlwifi: fw: api: fix some kernel-doc wifi: iwlwifi: mvm: remove init_dbg module parameter wifi: iwlwifi: update the BA notification API wifi: iwlwifi: mvm: always unblock EMLSR on ROC end wifi: iwlwifi: mvm: use IWL_FW_CHECK for link ID check wifi: iwlwifi: mvm: don't flush BSSes on restart with MLD API ... ==================== Link: https://patch.msgid.link/20240627114135.28507-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-27 13:53:43 -07:00
Jakub Kicinski	193b9b2002	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: `e3f02f32a0` ("ionic: fix kernel panic due to multi-buffer handling") `d9c0420999` ("ionic: Mark error paths in the data path as unlikely") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-27 12:14:11 -07:00
Jakub Kicinski	ffb7aa9fed	Just a few changes: - maintainers: Larry Finger sadly passed away - maintainers: ath trees are in their group now - TXQ FQ quantum configuration fix - TI wl driver: work around stuck FW in AP mode - mac80211: disable softirqs in some new code needing that -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmZ9Iz4ACgkQ10qiO8sP aADG+hAAl4kOGUfU2URi7zd4iod0NCXo9Hqrn7tGkkj1szcO2GvRc69Mja4AKo3L eeUJe/YuGGAGS7myH6+ZF6yymMdLyq8qrcPKN7YeZATcaO/n6K90Go3Wsaz7jXWC xZcnbGV3qvYrAmraA6fxOJ42Z7bt3+h+VVqUU7h/a1iWvuKB+u5f0zn78LLNwwYq HOw/kbe8oAadUOb1Sj7QHdG0ZBvXNwr3q0oTFMW06Z/TxAITovsHQtQcjmi9uJQV ILXitLCo/Xe+skSnAeyC4wNSJYFGWO5TurgW1d7y9ZR5/xQtM95VIciQSLWoD/tR cbQLbNjML17yxvU7G6U7WYxgMFghhUA476GFIsis3s3/KxvQ5nIzQc403yltVpl3 I4E6mF63M37jvY+O3UyDWhWeYbR1/BBh6MQ2fKz57UZqvgLkzVohRwsK4G0jhF+S HmBALZmVOZFJGYDn/mxZIYwdG1iZoKSE/KMNrjNE/mX7IuE8ax5vvFqJ5yqYF6tq IpBoh/NOh9sa0Jq8gfb6VikM424hZ9tkzf5/+C7mhyW3Dl/pzcEojAjeoG5jMnrN 6r+nwypEt2byOE2GrIAKTcH4XhjQSXeedbC7vZ31Cm2ynT1K1H+cysMKl2wa18xz luweG0L4m9FQDBkT8UUPMlcZKNszye1cp+2F4tbZOKRKbMillXU= =TLwj -----END PGP SIGNATURE----- Merge tag 'wireless-2024-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few changes: - maintainers: Larry Finger sadly passed away - maintainers: ath trees are in their group now - TXQ FQ quantum configuration fix - TI wl driver: work around stuck FW in AP mode - mac80211: disable softirqs in some new code needing that * tag 'wireless-2024-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: MAINTAINERS: wifi: update ath.git location MAINTAINERS: Remembering Larry Finger wifi: mac80211: disable softirqs for queued frame handling wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values wifi: wlcore: fix wlcore AP mode ==================== Link: https://patch.msgid.link/20240627083627.15312-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-27 11:57:40 -07:00
Linus Torvalds	fd19d4a492	Including fixes from can, bpf and netfilter. Current release - regressions: - core: add softirq safety to netdev_rename_lock - tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO - batman-adv: fix RCU race at module unload time Current release - new code bugs: Previous releases - regressions: - openvswitch: get related ct labels from its master if it is not confirmed - eth: bonding: fix incorrect software timestamping report - eth: mlxsw: fix memory corruptions on spectrum-4 systems - eth: ionic: use dev_consume_skb_any outside of napi Previous releases - always broken: - netfilter: fully validate NFT_DATA_VALUE on store to data registers - unix: several fixes for OoB data - tcp: fix race for duplicate reqsk on identical SYN - bpf: - fix may_goto with negative offset. - fix the corner case with may_goto and jump to the 1st insn. - fix overrunning reservations in ringbuf - can: - j1939: recover socket queue on CAN bus error during BAM transmission - mcp251xfd: fix infinite loop when xmit fails - dsa: microchip: monitor potential faults in half-duplex mode - eth: vxlan: pull inner IP header in vxlan_xmit_one() - eth: ionic: fix kernel panic due to multi-buffer handling Misc: - selftest: unix tests refactor and a lot of new cases added Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmZ9ZlQSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkawoQAKLTWHswqM790uaAAgqP6jGuC4/waRS8 MowEt5rHlwdMXcHhLrDSrLQoDJAZRsWmjniIgbsaeX+HtY4HXfF0tfDMPKiws3vx Z51qVj7zYjdT7IoZ7Yc8Zlwmt2kVgO4ba6gSigQSORQO9Qq/WNSb0q8BM6cDaYXT cXC7ikPeMlLnxKxsFRpZ3CUD06dI/aJFp/pefPEm7/X/EbROlSs5y+2GshPdp5t7 tzOUsLHs6ORVq/6jg2nRHH+0D+LMuQG0Z0yCMmYerJMJNtRIxyW6tTYeAsWXeyn3 UN3gaoQ/SIURDrNRZvHsaVDNO/u4rbYtFLoK7S5uPffPWqsGJY59FcH+xYFukFCD P5Lca4kKBr8xOahsRfSiO0uFbwQfQAauzNiz9Ue39n1hj+ZhZ/CliBLhUeoBl6Y6 jSsxq+/8CZCQ7beek96cyLx83skAcWAU5BEC9xOVlOTuTL91Gxr9UzSx/FqLI34h Smgw9ZUPzJgvFLgB/OBQ/WYne9LfJ5RYQHZoAXObiozO3TX7NgBUfa0e1T9dLE3F TalysSO3/goiZNK5a/UNJcj3fAcSEs4M2z9UIK790i3P3GuRigs1sJEtTUqyowWk aaTFmWCXE0wdoshJjux3syh3Vk6phJWpOlMLYjy0v5s0BF/ZOfDaKQT/dGsvV1HE AFGpKpybizNV =BYgZ -----END PGP SIGNATURE----- Merge tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, bpf and netfilter. There are a bunch of regressions addressed here, but hopefully nothing spectacular. We are still waiting the driver fix from Intel, mentioned by Jakub in the previous networking pull. Current release - regressions: - core: add softirq safety to netdev_rename_lock - tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO - batman-adv: fix RCU race at module unload time Previous releases - regressions: - openvswitch: get related ct labels from its master if it is not confirmed - eth: bonding: fix incorrect software timestamping report - eth: mlxsw: fix memory corruptions on spectrum-4 systems - eth: ionic: use dev_consume_skb_any outside of napi Previous releases - always broken: - netfilter: fully validate NFT_DATA_VALUE on store to data registers - unix: several fixes for OoB data - tcp: fix race for duplicate reqsk on identical SYN - bpf: - fix may_goto with negative offset - fix the corner case with may_goto and jump to the 1st insn - fix overrunning reservations in ringbuf - can: - j1939: recover socket queue on CAN bus error during BAM transmission - mcp251xfd: fix infinite loop when xmit fails - dsa: microchip: monitor potential faults in half-duplex mode - eth: vxlan: pull inner IP header in vxlan_xmit_one() - eth: ionic: fix kernel panic due to multi-buffer handling Misc: - selftest: unix tests refactor and a lot of new cases added" * tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) net: mana: Fix possible double free in error handling path selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c. af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head. selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c selftest: af_unix: Check SIGURG after every send() in msg_oob.c selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c af_unix: Don't stop recv() at consumed ex-OOB skb. selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c. af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head. af_unix: Stop recv(MSG_PEEK) at consumed OOB skb. selftest: af_unix: Add msg_oob.c. selftest: af_unix: Remove test_unix_oob.c. tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers net: usb: qmi_wwan: add Telit FN912 compositions tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO ionic: use dev_consume_skb_any outside of napi net: dsa: microchip: fix wrong register write when masking interrupt Fix race for duplicate reqsk on identical SYN ibmvnic: Add tx check to prevent skb leak ...	2024-06-27 10:05:35 -07:00
Paolo Abeni	b62cb6a7e8	netfilter pull request 24-06-27 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmZ8paQACgkQ1V2XiooU IOTF+Q//Wx505P6J3v2iNfh7kDzHFtOZNZsBz0hlO4XVP7hoobsRiGJsmy+q1s10 pgoBw2nlY7kMAzCTZAInad9+gU3Iv67xMTB6j+qCB0Pnj77HFcRA8U2d6TYg+iDQ QXxeL7gzpBdH81G0PslHH6KeOwpxF5QQkIYH7OlLBGVNJCXH/SiR/gLkwjPojZFL hPMPgNmP78LZp0qLRzWgfjrwtE6oy9kyZB90dJi62SfC0sOGy4aHpFKn4zyzH9UI jB0uBaRXJuecBcS6EnA1lhkUTcIEUWcECa0CQf3OlL0+VFBjNk74R0aQhICPEZKe nFIVEE07N/95jJLSiJOmXZrhw93l2Wtc7efspJwB8bf3EP9eo9PCIjR7us6GIqRm hth0jYzjgGZgLsa74gt8i8js4F9ppgZlWGCs7QkGkGJ+KetCRLEty0DxPlIo0qb0 /l7F9Opu5lYdDYs7uEvBeHZT0vaRwDW6DnpGwIJyh1LO6WA0qnCIOWeBWZCDwRjW Wuck3vR27dEltwqXnfKETtlO22+Lzwv4HUnJ3HXOZdetv691jCezhswyO8CMZ8py i65LL4Ex4duMOSJh0UC3SXIrpnAkOFEG+hnYIu+pEZQgFsqHu+WQrMI+jUigLTnK SDtazKzH6tDkguiQaT35zorF+ZU3rfr+Lbh8Y4NxJEf1SP/g/S4= =eoyB -----END PGP SIGNATURE----- Merge tag 'nf-24-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains two Netfilter fixes for net: Patch #1 fixes CONFIG_SYSCTL=n for a patch coming in the previous PR to move the sysctl toggle to enable SRv6 netfilter hooks from nf_conntrack to the core, from Jianguo Wu. Patch #2 fixes a possible pointer leak to userspace due to insufficient validation of NFT_DATA_VALUE. Linus found this pointer leak to userspace via zdi-disclosures@ and forwarded the notice to Netfilter maintainers, he appears as reporter because whoever found this issue never approached Netfilter maintainers neither via security@ nor in private. netfilter pull request 24-06-27 * tag 'nf-24-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers netfilter: fix undefined reference to 'netfilter_lwtunnel_*' when CONFIG_SYSCTL=n ==================== Link: https://patch.msgid.link/20240626233845.151197-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-27 13:00:50 +02:00
Kuniyuki Iwashima	e400cfa38b	af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head. Even if OOB data is recv()ed, ioctl(SIOCATMARK) must return 1 when the OOB skb is at the head of the receive queue and no new OOB data is queued. Without fix: # RUN msg_oob.no_peek.oob ... # msg_oob.c:305:oob:Expected answ[0] (0) == oob_head (1) # oob: Test terminated by assertion # FAIL msg_oob.no_peek.oob not ok 2 msg_oob.no_peek.oob With fix: # RUN msg_oob.no_peek.oob ... # OK msg_oob.no_peek.oob ok 2 msg_oob.no_peek.oob Fixes: `314001f0bf` ("af_unix: Add OOB support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-27 12:05:01 +02:00
Kuniyuki Iwashima	36893ef0b6	af_unix: Don't stop recv() at consumed ex-OOB skb. Currently, recv() is stopped at a consumed OOB skb even if a new OOB skb is queued and we can ignore the old OOB skb. >>> from socket import * >>> c1, c2 = socket(AF_UNIX, SOCK_STREAM) >>> c1.send(b'hellowor', MSG_OOB) 8 >>> c2.recv(1, MSG_OOB) # consume OOB data stays at middle of recvq. b'r' >>> c1.send(b'ld', MSG_OOB) 2 >>> c2.recv(10) # recv() stops at the old consumed OOB b'hellowo' # should be 'hellowol' manage_oob() should not stop recv() at the old consumed OOB skb if there is a new OOB data queued. Note that TCP behaviour is apparently wrong in this test case because we can recv() the same OOB data twice. Without fix: # RUN msg_oob.no_peek.ex_oob_ahead_break ... # msg_oob.c:138:ex_oob_ahead_break:AF_UNIX :hellowo # msg_oob.c:139:ex_oob_ahead_break:Expected:hellowol # msg_oob.c:141:ex_oob_ahead_break:Expected ret[0] (7) == expected_len (8) # ex_oob_ahead_break: Test terminated by assertion # FAIL msg_oob.no_peek.ex_oob_ahead_break not ok 11 msg_oob.no_peek.ex_oob_ahead_break With fix: # RUN msg_oob.no_peek.ex_oob_ahead_break ... # msg_oob.c:146:ex_oob_ahead_break:AF_UNIX :hellowol # msg_oob.c:147:ex_oob_ahead_break:TCP :helloworl # OK msg_oob.no_peek.ex_oob_ahead_break ok 11 msg_oob.no_peek.ex_oob_ahead_break Fixes: `314001f0bf` ("af_unix: Add OOB support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-27 12:05:01 +02:00
Kuniyuki Iwashima	93c99f21db	af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head. Let's say a socket send()s "hello" with MSG_OOB and "world" without flags, >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX) >>> c1.send(b'hello', MSG_OOB) 5 >>> c1.send(b'world') 5 and its peer recv()s "hell" and "o". >>> c2.recv(10) b'hell' >>> c2.recv(1, MSG_OOB) b'o' Now the consumed OOB skb stays at the head of recvq to return a correct value for ioctl(SIOCATMARK), which is broken now and fixed by a later patch. Then, if peer issues recv() with MSG_DONTWAIT, manage_oob() returns NULL, so recv() ends up with -EAGAIN. >>> c2.setblocking(False) # This causes -EAGAIN even with available data >>> c2.recv(5) Traceback (most recent call last): File "<stdin>", line 1, in <module> BlockingIOError: [Errno 11] Resource temporarily unavailable However, next recv() will return the following available data, "world". >>> c2.recv(5) b'world' When the consumed OOB skb is at the head of the queue, we need to fetch the next skb to fix the weird behaviour. Note that the issue does not happen without MSG_DONTWAIT because we can retry after manage_oob(). This patch also adds a test case that covers the issue. Without fix: # RUN msg_oob.no_peek.ex_oob_break ... # msg_oob.c:134:ex_oob_break:AF_UNIX :Resource temporarily unavailable # msg_oob.c:135:ex_oob_break:Expected:ld # msg_oob.c:137:ex_oob_break:Expected ret[0] (-1) == expected_len (2) # ex_oob_break: Test terminated by assertion # FAIL msg_oob.no_peek.ex_oob_break not ok 8 msg_oob.no_peek.ex_oob_break With fix: # RUN msg_oob.no_peek.ex_oob_break ... # OK msg_oob.no_peek.ex_oob_break ok 8 msg_oob.no_peek.ex_oob_break Fixes: `314001f0bf` ("af_unix: Add OOB support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-27 12:05:01 +02:00
Kuniyuki Iwashima	b94038d841	af_unix: Stop recv(MSG_PEEK) at consumed OOB skb. After consuming OOB data, recv() reading the preceding data must break at the OOB skb regardless of MSG_PEEK. Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is not compliant with TCP. >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX) >>> c1.send(b'hello', MSG_OOB) 5 >>> c1.send(b'world') 5 >>> c2.recv(1, MSG_OOB) b'o' >>> c2.recv(9, MSG_PEEK) # This should return b'hell' b'hellworld' # even with enough buffer. Let's fix it by returning NULL for consumed skb and unlinking it only if MSG_PEEK is not specified. This patch also adds test cases that add recv(MSG_PEEK) before each recv(). Without fix: # RUN msg_oob.peek.oob_ahead_break ... # msg_oob.c:134:oob_ahead_break:AF_UNIX :hellworld # msg_oob.c:135:oob_ahead_break:Expected:hell # msg_oob.c:137:oob_ahead_break:Expected ret[0] (9) == expected_len (4) # oob_ahead_break: Test terminated by assertion # FAIL msg_oob.peek.oob_ahead_break not ok 13 msg_oob.peek.oob_ahead_break With fix: # RUN msg_oob.peek.oob_ahead_break ... # OK msg_oob.peek.oob_ahead_break ok 13 msg_oob.peek.oob_ahead_break Fixes: `314001f0bf` ("af_unix: Add OOB support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-27 12:05:00 +02:00
Pablo Neira Ayuso	7931d32955	netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers register store validation for NFT_DATA_VALUE is conditional, however, the datatype is always either NFT_DATA_VALUE or NFT_DATA_VERDICT. This only requires a new helper function to infer the register type from the set datatype so this conditional check can be removed. Otherwise, pointer to chain object can be leaked through the registers. Fixes: `96518518cc` ("netfilter: add nftables") Reported-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-06-27 01:09:51 +02:00
Zijun Hu	1bbdb7f7a4	net: rfkill: Correct return value in invalid parameter case rfkill_set_hw_state_reason() does not return current combined block state when its parameter @reason is invalid, that is wrong according to its comments, fix it by correcting the value returned. Also reformat the WARN while at it. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://patch.msgid.link/1718287476-28227-1-git-send-email-quic_zijuhu@quicinc.com [edit/reformat commit message, remove unneeded variable] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:49:01 +02:00
Zong-Zhe Yang	021d53a3d8	wifi: mac80211: fix NULL dereference at band check in starting tx ba session In MLD connection, link_data/link_conf are dynamically allocated. They don't point to vif->bss_conf. So, there will be no chanreq assigned to vif->bss_conf and then the chan will be NULL. Tweak the code to check ht_supported/vht_supported/has_he/has_eht on sta deflink. Crash log (with rtw89 version under MLO development): [ 9890.526087] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 9890.526102] #PF: supervisor read access in kernel mode [ 9890.526105] #PF: error_code(0x0000) - not-present page [ 9890.526109] PGD 0 P4D 0 [ 9890.526114] Oops: 0000 [#1] PREEMPT SMP PTI [ 9890.526119] CPU: 2 PID: 6367 Comm: kworker/u16:2 Kdump: loaded Tainted: G OE 6.9.0 #1 [ 9890.526123] Hardware name: LENOVO 2356AD1/2356AD1, BIOS G7ETB3WW (2.73 ) 11/28/2018 [ 9890.526126] Workqueue: phy2 rtw89_core_ba_work [rtw89_core] [ 9890.526203] RIP: 0010:ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211 [ 9890.526279] Code: f7 e8 d5 93 3e ea 48 83 c4 28 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 49 8b 84 24 e0 f1 ff ff 48 8b 80 90 1b 00 00 <83> 38 03 0f 84 37 fe ff ff bb ea ff ff ff eb cc 49 8b 84 24 10 f3 All code ======== 0: f7 e8 imul %eax 2: d5 (bad) 3: 93 xchg %eax,%ebx 4: 3e ea ds (bad) 6: 48 83 c4 28 add $0x28,%rsp a: 89 d8 mov %ebx,%eax c: 5b pop %rbx d: 41 5c pop %r12 f: 41 5d pop %r13 11: 41 5e pop %r14 13: 41 5f pop %r15 15: 5d pop %rbp 16: c3 retq 17: cc int3 18: cc int3 19: cc int3 1a: cc int3 1b: 49 8b 84 24 e0 f1 ff mov -0xe20(%r12),%rax 22: ff 23: 48 8b 80 90 1b 00 00 mov 0x1b90(%rax),%rax 2a:* 83 38 03 cmpl $0x3,(%rax) <-- trapping instruction 2d: 0f 84 37 fe ff ff je 0xfffffffffffffe6a 33: bb ea ff ff ff mov $0xffffffea,%ebx 38: eb cc jmp 0x6 3a: 49 rex.WB 3b: 8b .byte 0x8b 3c: 84 24 10 test %ah,(%rax,%rdx,1) 3f: f3 repz Code starting with the faulting instruction =========================================== 0: 83 38 03 cmpl $0x3,(%rax) 3: 0f 84 37 fe ff ff je 0xfffffffffffffe40 9: bb ea ff ff ff mov $0xffffffea,%ebx e: eb cc jmp 0xffffffffffffffdc 10: 49 rex.WB 11: 8b .byte 0x8b 12: 84 24 10 test %ah,(%rax,%rdx,1) 15: f3 repz [ 9890.526285] RSP: 0018:ffffb8db09013d68 EFLAGS: 00010246 [ 9890.526291] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9308e0d656c8 [ 9890.526295] RDX: 0000000000000000 RSI: ffffffffab99460b RDI: ffffffffab9a7685 [ 9890.526300] RBP: ffffb8db09013db8 R08: 0000000000000000 R09: 0000000000000873 [ 9890.526304] R10: ffff9308e0d64800 R11: 0000000000000002 R12: ffff9308e5ff6e70 [ 9890.526308] R13: ffff930952500e20 R14: ffff9309192a8c00 R15: 0000000000000000 [ 9890.526313] FS: 0000000000000000(0000) GS:ffff930b4e700000(0000) knlGS:0000000000000000 [ 9890.526316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9890.526318] CR2: 0000000000000000 CR3: 0000000391c58005 CR4: 00000000001706f0 [ 9890.526321] Call Trace: [ 9890.526324] <TASK> [ 9890.526327] ? show_regs (arch/x86/kernel/dumpstack.c:479) [ 9890.526335] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) [ 9890.526340] ? page_fault_oops (arch/x86/mm/fault.c:713) [ 9890.526347] ? search_module_extables (kernel/module/main.c:3256 (discriminator 3)) [ 9890.526353] ? ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211 Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Link: https://patch.msgid.link/20240617115217.22344-1-kevin_yang@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:44:00 +02:00
Emmanuel Grumbach	1decf05d0f	wifi: mac80211: inform the low level if drv_stop() is a suspend This will allow the low level driver to take different actions for different flows. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240618192529.739036208b6e.Ie18a2fe8e02bf2717549d39420b350cfdaf3d317@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:25:46 +02:00
Johannes Berg	321028bc45	wifi: mac80211: disable softirqs for queued frame handling As noticed by syzbot, calling ieee80211_handle_queued_frames() (and actually handling frames there) requires softirqs to be disabled, since we call into the RX code. Fix that in the case of cleaning up frames left over during shutdown. Fixes: `177c6ae972` ("wifi: mac80211: handle tasklet frames before stopping") Reported-by: syzbot+1d516edf1e74469ba5d3@syzkaller.appspotmail.com Link: https://patch.msgid.link/20240626091559.cd6f08105a6e.I74778610a5ff2cf8680964698131099d2960352a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:23:50 +02:00
Eric Dumazet	d1cba2ea81	wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values syzbot is able to trigger softlockups, setting NL80211_ATTR_TXQ_QUANTUM to 2^31. We had a similar issue in sch_fq, fixed with commit `d9e15a2733` ("pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM") watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:0:24] Modules linked in: irq event stamp: 131135 hardirqs last enabled at (131134): [<ffff80008ae8778c>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline] hardirqs last enabled at (131134): [<ffff80008ae8778c>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95 hardirqs last disabled at (131135): [<ffff80008ae85378>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline] hardirqs last disabled at (131135): [<ffff80008ae85378>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551 softirqs last enabled at (125892): [<ffff80008907e82c>] neigh_hh_init net/core/neighbour.c:1538 [inline] softirqs last enabled at (125892): [<ffff80008907e82c>] neigh_resolve_output+0x268/0x658 net/core/neighbour.c:1553 softirqs last disabled at (125896): [<ffff80008904166c>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19 CPU: 1 PID: 24 Comm: kworker/1:0 Not tainted 6.9.0-rc7-syzkaller-gfda5695d692c #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: mld mld_ifc_work pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __list_del include/linux/list.h:195 [inline] pc : __list_del_entry include/linux/list.h:218 [inline] pc : list_move_tail include/linux/list.h:310 [inline] pc : fq_tin_dequeue include/net/fq_impl.h:112 [inline] pc : ieee80211_tx_dequeue+0x6b8/0x3b4c net/mac80211/tx.c:3854 lr : __list_del_entry include/linux/list.h:218 [inline] lr : list_move_tail include/linux/list.h:310 [inline] lr : fq_tin_dequeue include/net/fq_impl.h:112 [inline] lr : ieee80211_tx_dequeue+0x67c/0x3b4c net/mac80211/tx.c:3854 sp : ffff800093d36700 x29: ffff800093d36a60 x28: ffff800093d36960 x27: dfff800000000000 x26: ffff0000d800ad50 x25: ffff0000d800abe0 x24: ffff0000d800abf0 x23: ffff0000e0032468 x22: ffff0000e00324d4 x21: ffff0000d800abf0 x20: ffff0000d800abf8 x19: ffff0000d800abf0 x18: ffff800093d363c0 x17: 000000000000d476 x16: ffff8000805519dc x15: ffff7000127a6cc8 x14: 1ffff000127a6cc8 x13: 0000000000000004 x12: ffffffffffffffff x11: ffff7000127a6cc8 x10: 0000000000ff0100 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffff80009287aa08 x4 : 0000000000000008 x3 : ffff80008034c7fc x2 : ffff0000e0032468 x1 : 00000000da0e46b8 x0 : ffff0000e0032470 Call trace: __list_del include/linux/list.h:195 [inline] __list_del_entry include/linux/list.h:218 [inline] list_move_tail include/linux/list.h:310 [inline] fq_tin_dequeue include/net/fq_impl.h:112 [inline] ieee80211_tx_dequeue+0x6b8/0x3b4c net/mac80211/tx.c:3854 wake_tx_push_queue net/mac80211/util.c:294 [inline] ieee80211_handle_wake_tx_queue+0x118/0x274 net/mac80211/util.c:315 drv_wake_tx_queue net/mac80211/driver-ops.h:1350 [inline] schedule_and_wake_txq net/mac80211/driver-ops.h:1357 [inline] ieee80211_queue_skb+0x18e8/0x2244 net/mac80211/tx.c:1664 ieee80211_tx+0x260/0x400 net/mac80211/tx.c:1966 ieee80211_xmit+0x278/0x354 net/mac80211/tx.c:2062 __ieee80211_subif_start_xmit+0xab8/0x122c net/mac80211/tx.c:4338 ieee80211_subif_start_xmit+0xe0/0x438 net/mac80211/tx.c:4532 __netdev_start_xmit include/linux/netdevice.h:4903 [inline] netdev_start_xmit include/linux/netdevice.h:4917 [inline] xmit_one net/core/dev.c:3531 [inline] dev_hard_start_xmit+0x27c/0x938 net/core/dev.c:3547 __dev_queue_xmit+0x1678/0x33fc net/core/dev.c:4341 dev_queue_xmit include/linux/netdevice.h:3091 [inline] neigh_resolve_output+0x558/0x658 net/core/neighbour.c:1563 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0x104c/0x1ee8 net/ipv6/ip6_output.c:137 ip6_finish_output+0x428/0x7a0 net/ipv6/ip6_output.c:222 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x270/0x594 net/ipv6/ip6_output.c:243 dst_output include/net/dst.h:450 [inline] NF_HOOK+0x160/0x4f0 include/linux/netfilter.h:314 mld_sendpack+0x7b4/0x10f4 net/ipv6/mcast.c:1818 mld_send_cr net/ipv6/mcast.c:2119 [inline] mld_ifc_work+0x840/0xd0c net/ipv6/mcast.c:2650 process_one_work+0x7b8/0x15d4 kernel/workqueue.c:3267 process_scheduled_works kernel/workqueue.c:3348 [inline] worker_thread+0x938/0xef4 kernel/workqueue.c:3429 kthread+0x288/0x310 kernel/kthread.c:388 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860 Fixes: `52539ca89f` ("cfg80211: Expose TXQ stats and parameters to userspace") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240615160800.250667-1-edumazet@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:23:30 +02:00
Ilan Peer	5036eaffed	wifi: cfg80211: Always call tracing Call the tracing function even if the cfg80211 callbacks are not set. This would allow better understanding of user space actions. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Ilan Peer <ilan.peer@intel.com> Link: https://patch.msgid.link/20240614093541.018cb816e176.I28f68740a6b42144346f5c175c7874b0a669a364@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:22:12 +02:00
Johannes Berg	9cc88678db	wifi: mac80211: check SSID in beacon Check that the SSID in beacons is correct, if it's not hidden and beacon protection is enabled (otherwise there's no value). If it doesn't match, disconnect. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143809.8b24a3d26a3d.I3e3ef31dbd2ec606be74d502a9d00dd9514c6885@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:22:10 +02:00
Johannes Berg	0b2d9d9aec	wifi: mac80211: correcty limit wider BW TDLS STAs When updating a channel context, the code can apply wider bandwidth TDLS STA channel definitions to each and every channel context used by the device, an approach that will surely lead to problems if there is ever more than one. Restrict the wider BW TDLS STA consideration to only TDLS STAs that are actually related to links using the channel context being updated. Fixes: `0fabfaafec` ("mac80211: upgrade BW of TDLS peers when possible") Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143707.1ad989acecde.I5c75c94d95c3f4ea84f8ff4253189f4b13bad5c3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:22:06 +02:00
Johannes Berg	d42fcaece0	wifi: mac80211: add ieee80211_tdls_sta_link_id() We've open-coded this twice and will need it again, add ieee80211_tdls_sta_link_id() to get the one link ID for a TDLS STA. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143707.9f8141ae1725.I343822bbba0ae08dedb2f54a0ce87f2ae5ebeb2b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:22:04 +02:00
Johannes Berg	dd7b1bdb56	wifi: mac80211: update STA/chandef width during switch In channel switch without an additional channel context, where the reassign logic kicks in, we also need to update the station bandwidth and chandef minimum width correctly to avoid having station rate control configured to wider bandwidth than the channel context. Do that now. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143418.0bc3d28231b3.I51e76df86212057ca0469e235ba9bf4461cbee75@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:21:59 +02:00
Johannes Berg	b275123685	wifi: mac80211: make ieee80211_chan_bw_change() able to use reserved Make ieee80211_chan_bw_change() able to use the reserved chanreq (really the chandef part of it) for the calculations, so it can be used _without_ applying the changes first. Remove the comment that indicates this is required, since it no longer is. However, this capability only gets used later. Also, this is not ideal, we really should not different so much between reserved and non-reserved usage, to simplify. That's a further cleanup later though. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143418.1a08cf83b8cb.Ie567bb272eb25ce487651088f13ad041f549651c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:21:56 +02:00
Johannes Berg	7d2bad829c	wifi: mac80211: optionally pass chandef to ieee80211_sta_cur_vht_bw() We'll need this as well for channel switching cases, so add the ability now to pass the chandef to calculate for. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143418.f70e05d9f306.Ifa0ce267de4f0ef3c21d063fb0cbf50e84d7d6ff@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:21:54 +02:00
Johannes Berg	25af8ff51d	wifi: mac80211: optionally pass chandef to ieee80211_sta_cap_rx_bw() We'll need this function to take a new chandef in (some) channel switching cases, so prepare for that by allowing that to be passed and using it if so. Clean up the code a little bit while at it. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143418.772313f08b6a.If9708249e5870671e745d4c2b02e03b25092bea3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:21:52 +02:00
Johannes Berg	b777bdfc9b	wifi: mac80211: handle protected dual of public action The code currently handles ECSA (extended channel switch announcement) public action frames. Handle also their protected dual, which actually is protected. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143037.db642feb8b2e.I184fa5c9bffb68099171701e403c2aa733f60fde@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:21:48 +02:00
Johannes Berg	414e090bc4	wifi: mac80211: restrict public action ECSA frame handling Public action extended channel switch announcement (ECSA) frames cannot be protected well, the spec is unclear about what should happen in the presence of stations that can receive protected dual and stations that cannot. Mitigate these issues by not treating public action frames as the absolute truth, only treat them as a hint to stop transmitting (quiet mode), and do the remainder of the CSA handling only when receiving the next beacon (or protected action frame) that contains the CSA; or, if it doesn't, simply stop being quiet and continue operating normally. This limits the exposure to malicious ECSA public action frames, since they cannot cause a disconnect now, only a short interruption in traffic. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143037.ec7ccc45903e.Ife17d55c7ecbf98060f9c52889f3c8ba48798970@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:21:44 +02:00
Johannes Berg	dc494fdc1f	wifi: mac80211: refactor CSA queue block/unblock This code is duplicated many times, refactor it into new separate functions. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20240612143037.1ad22f10392d.If21490c2c67aae28f3c54038363181ee920ce3d1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-06-26 10:21:40 +02:00
Neal Cardwell	5dfe9d2739	tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO Testing determined that the recent commit `9e046bb111` ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()") has a race, and does not always ensure retrans_stamp is 0 after a TFO payload retransmit. If transmit completion for the SYN+data skb happens after the client TCP stack receives the SYNACK (which sometimes happens), then retrans_stamp can erroneously remain non-zero for the lifetime of the connection, causing a premature ETIMEDOUT later. Testing and tracing showed that the buggy scenario is the following somewhat tricky sequence: + Client attempts a TFO handshake. tcp_send_syn_data() sends SYN + TFO cookie + data in a single packet in the syn_data skb. It hands the syn_data skb to tcp_transmit_skb(), which makes a clone. Crucially, it then reuses the same original (non-clone) syn_data skb, transforming it by advancing the seq by one byte and removing the FIN bit, and enques the resulting payload-only skb in the sk->tcp_rtx_queue. + Client sets retrans_stamp to the start time of the three-way handshake. + Cookie mismatches or server has TFO disabled, and server only ACKs SYN. + tcp_ack() sees SYN is acked, tcp_clean_rtx_queue() clears retrans_stamp. + Since the client SYN was acked but not the payload, the TFO failure code path in tcp_rcv_fastopen_synack() tries to retransmit the payload skb. However, in some cases the transmit completion for the clone of the syn_data (which had SYN + TFO cookie + data) hasn't happened. In those cases, skb_still_in_host_queue() returns true for the retransmitted TFO payload, because the clone of the syn_data skb has not had its tx completetion. + Because skb_still_in_host_queue() finds skb_fclone_busy() is true, it sets the TSQ_THROTTLED bit and the retransmit does not happen in the tcp_rcv_fastopen_synack() call chain. + The tcp_rcv_fastopen_synack() code next implicitly assumes the retransmit process is finished, and sets retrans_stamp to 0 to clear it, but this is later overwritten (see below). + Later, upon tx completion, tcp_tsq_write() calls tcp_xmit_retransmit_queue(), which puts the retransmit in flight and sets retrans_stamp to a non-zero value. + The client receives an ACK for the retransmitted TFO payload data. + Since we're in CA_Open and there are no dupacks/SACKs/DSACKs/ECN to make tcp_ack_is_dubious() true and make us call tcp_fastretrans_alert() and reach a code path that clears retrans_stamp, retrans_stamp stays nonzero. + Later, if there is a TLP, RTO, RTO sequence, then the connection will suffer an early ETIMEDOUT due to the erroneously ancient retrans_stamp. The fix: this commit refactors the code to have tcp_rcv_fastopen_synack() retransmit by reusing the relevant parts of tcp_simple_retransmit() that enter CA_Loss (without changing cwnd) and call tcp_xmit_retransmit_queue(). We have tcp_simple_retransmit() and tcp_rcv_fastopen_synack() share code in this way because in both cases we get a packet indicating non-congestion loss (MTU reduction or TFO failure) and thus in both cases we want to retransmit as many packets as cwnd allows, without reducing cwnd. And given that retransmits will set retrans_stamp to a non-zero value (and may do so in a later calling context due to TSQ), we also want to enter CA_Loss so that we track when all retransmitted packets are ACked and clear retrans_stamp when that happens (to ensure later recurring RTOs are using the correct retrans_stamp and don't declare ETIMEDOUT prematurely). Fixes: `9e046bb111` ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()") Fixes: `a7abf3cd76` ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Link: https://patch.msgid.link/20240624144323.2371403-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:22:49 -07:00
Heng Qi	f750dfe825	ethtool: provide customized dim profile management The NetDIM library, currently leveraged by an array of NICs, delivers excellent acceleration benefits. Nevertheless, NICs vary significantly in their dim profile list prerequisites. Specifically, virtio-net backends may present diverse sw or hw device implementation, making a one-size-fits-all parameter list impractical. On Alibaba Cloud, the virtio DPU's performance under the default DIM profile falls short of expectations, partly due to a mismatch in parameter configuration. I also noticed that ice/idpf/ena and other NICs have customized profilelist or placed some restrictions on dim capabilities. Motivated by this, I tried adding new params for "ethtool -C" that provides a per-device control to modify and access a device's interrupt parameters. Usage ======== The target NIC is named ethx. Assume that ethx only declares support for rx profile setting (with DIM_PROFILE_RX flag set in profile_flags) and supports modification of usec and pkt fields. 1. Query the currently customized list of the device $ ethtool -c ethx ... rx-profile: {.usec = 1, .pkts = 256, .comps = n/a,}, {.usec = 8, .pkts = 256, .comps = n/a,}, {.usec = 64, .pkts = 256, .comps = n/a,}, {.usec = 128, .pkts = 256, .comps = n/a,}, {.usec = 256, .pkts = 256, .comps = n/a,} tx-profile: n/a 2. Tune $ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n "n" means do not modify this field. $ ethtool -c ethx ... rx-profile: {.usec = 1, .pkts = 1, .comps = n/a,}, {.usec = 2, .pkts = 256, .comps = n/a,}, {.usec = 3, .pkts = 3, .comps = n/a,}, {.usec = 4, .pkts = 4, .comps = n/a,}, {.usec = 256, .pkts = 5, .comps = n/a,} tx-profile: n/a 3. Hint If the device does not support some type of customized dim profiles, the corresponding "n/a" will display. If the "n/a" field is being modified, -EOPNOTSUPP will be reported. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-4-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:15:06 -07:00
James Chapman	a8a8d89dbd	l2tp: remove incorrect __rcu attribute This fixes a sparse warning. Fixes: `d18d3f0a24` ("l2tp: replace hlist with simple list for per-tunnel session list") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406220754.evK8Hrjw-lkp@intel.com/ Signed-off-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20240624082945.1925009-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 08:29:42 -07:00
luoxuanqiang	ff46e3b442	Fix race for duplicate reqsk on identical SYN When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets are received at the same time and processed on different CPUs, it can potentially create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request(). These two different reqsk will respond with two SYNACK packets, and since the generation of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have different seq values. The consequence is that when the Client receives and replies with an ACK to the earlier SYNACK packet, we will reset(RST) it. ======================================================================== This behavior is consistently reproducible in my local setup, which comprises: \| NETA1 ------ NETB1 \| PC_A --- bond --- \| \| --- bond --- PC_B \| NETA2 ------ NETB2 \| - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU. - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are also bonded and configured in BOND_MODE_BROADCAST mode. If the client attempts a TCP connection to the server, it might encounter a failure. Capturing packets from the server side reveals: 10.10.10.10.45182 > localhost: Flags [S], seq 320236027, 10.10.10.10.45182 > localhost: Flags [S], seq 320236027, localhost > 10.10.10.10.45182: Flags [S.], seq 2967855116, localhost > 10.10.10.10.45182: Flags [S.], seq 2967855123, <== 10.10.10.10.45182 > localhost: Flags [.], ack 4294967290, 10.10.10.10.45182 > localhost: Flags [.], ack 4294967290, localhost > 10.10.10.10.45182: Flags [R], seq 2967855117, <== localhost > 10.10.10.10.45182: Flags [R], seq 2967855117, Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly. ======================================================================== The attempted solution is as follows: Add a return value to inet_csk_reqsk_queue_hash_add() to confirm if the ehash insertion is successful (Up to now, the reason for unsuccessful insertion is that a reqsk for the same connection has already been inserted). If the insertion fails, release the reqsk. Due to the refcnt, Kuniyuki suggests also adding a return value check for the DCCP module; if ehash insertion fails, indicating a successful insertion of the same connection, simply release the reqsk as well. Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is adjusted to be after successful insertion. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: luoxuanqiang <luoxuanqiang@kylinos.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240621013929.1386815-1-luoxuanqiang@kylinos.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:37:45 +02:00
Kuniyuki Iwashima	22e5751b05	af_unix: Don't use spin_lock_nested() in copy_peercred(). When (AF_UNIX, SOCK_STREAM) socket connect()s to a listening socket, the listener's sk_peer_pid/sk_peer_cred are copied to the client in copy_peercred(). Then, two sk_peer_locks are held there; one is client's and another is listener's. However, the latter is not needed because we hold the listner's unix_state_lock() there and unix_listen() cannot update the cred concurrently. Let's drop the unnecessary spin_lock() and use the bare spin_lock() for the client to protect concurrent read by getsockopt(SO_PEERCRED). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:19 +02:00
Kuniyuki Iwashima	e4bd881d98	af_unix: Remove put_pid()/put_cred() in copy_peercred(). When (AF_UNIX, SOCK_STREAM) socket connect()s to a listening socket, the listener's sk_peer_pid/sk_peer_cred are copied to the client in copy_peercred(). Then, the client's sk_peer_pid and sk_peer_cred are always NULL, so we need not call put_pid() and put_cred() there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	faf489e689	af_unix: Set sk_peer_pid/sk_peer_cred locklessly for new socket. init_peercred() is called in 3 places: 1. socketpair() : both sockets 2. connect() : child socket 3. listen() : listening socket The first two need not hold sk_peer_lock because no one can touch the socket. Let's set cred/pid without holding lock for the two cases and rename the old init_peercred() to update_peercred() to properly reflect the use case. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	8647ece481	af_unix: Define locking order for U_RECVQ_LOCK_EMBRYO in unix_collect_skb(). While GC is cleaning up cyclic references by SCM_RIGHTS, unix_collect_skb() collects skb in the socket's recvq. If the socket is TCP_LISTEN, we need to collect skb in the embryo's queue. Then, both the listener's recvq lock and the embroy's one are held. The locking is always done in the listener -> embryo order. Let's define it as unix_recvq_lock_cmp_fn() instead of using spin_lock_nested(). Note that the reverse order is defined for consistency. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	c4da4661d9	af_unix: Remove U_LOCK_DIAG. sk_diag_dump_icons() acquires embryo's lock by unix_state_lock_nested() to fetch its peer. The embryo's ->peer is set to NULL only when its parent listener is close()d. Then, unix_release_sock() is called for each embryo after unlinking skb by skb_dequeue(). In sk_diag_dump_icons(), we hold the parent's recvq lock, so we need not acquire unix_state_lock_nested(), and peer is always non-NULL. Let's remove unnecessary unix_state_lock_nested() and non-NULL test for peer. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	b380b18102	af_unix: Don't acquire unix_state_lock() for sock_i_ino(). sk_diag_dump_peer() and sk_diag_dump() call unix_state_lock() for sock_i_ino() which reads SOCK_INODE(sk->sk_socket)->i_ino, but it's protected by sk->sk_callback_lock. Let's remove unnecessary unix_state_lock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	98f706de44	af_unix: Define locking order for U_LOCK_SECOND in unix_stream_connect(). While a SOCK_(STREAM\|SEQPACKET) socket connect()s to another, we hold two locks of them by unix_state_lock() and unix_state_lock_nested() in unix_stream_connect(). Before unix_state_lock_nested(), the following is guaranteed by checking sk->sk_state: 1. The first socket is TCP_LISTEN 2. The second socket is not the first one 3. Simultaneous connect() must fail So, the client state can be TCP_CLOSE or TCP_LISTEN or TCP_ESTABLISHED. Let's define the expected states as unix_state_lock_cmp_fn() instead of using unix_state_lock_nested(). Note that 2. is detected by debug_spin_lock_before() and 3. cannot be expressed as lock_cmp_fn. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	1ca27e0c8c	af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect(). When a SOCK_(STREAM\|SEQPACKET) socket connect()s to another one, we need to lock the two sockets to check their states in unix_stream_connect(). We use unix_state_lock() for the server and unix_state_lock_nested() for client with tricky sk->sk_state check to avoid deadlock. The possible deadlock scenario are the following: 1) Self connect() 2) Simultaneous connect() The former is simple, attempt to grab the same lock, and the latter is AB-BA deadlock. After the server's unix_state_lock(), we check the server socket's state, and if it's not TCP_LISTEN, connect() fails with -EINVAL. Then, we avoid the former deadlock by checking the client's state before unix_state_lock_nested(). If its state is not TCP_LISTEN, we can make sure that the client and the server are not identical based on the state. Also, the latter deadlock can be avoided in the same way. Due to the server sk->sk_state requirement, AB-BA deadlock could happen only with TCP_LISTEN sockets. So, if the client's state is TCP_LISTEN, we can give up the second lock to avoid the deadlock. CPU 1 CPU 2 CPU 3 connect(A -> B) connect(B -> A) listen(A) --- --- --- unix_state_lock(B) B->sk_state == TCP_LISTEN READ_ONCE(A->sk_state) == TCP_CLOSE ^^^^^^^^^ ok, will lock A unix_state_lock(A) .--------------' WRITE_ONCE(A->sk_state, TCP_LISTEN) \| unix_state_unlock(A) \| \| unix_state_lock(A) \| A->sk_sk_state == TCP_LISTEN \| READ_ONCE(B->sk_state) == TCP_LISTEN v ^^^^^^^^^^ unix_state_lock_nested(A) Don't lock B !! Currently, while checking the client's state, we also check if it's TCP_ESTABLISHED, but this is unlikely and can be checked after we know the state is not TCP_CLOSE. Moreover, if it happens after the second lock, we now jump to the restart label, but it's unlikely that the server is not found during the retry, so the jump is mostly to revist the client state check. Let's remove the retry logic and check the state against TCP_CLOSE first. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	ed99822817	af_unix: Define locking order for U_LOCK_SECOND in unix_state_double_lock(). unix_dgram_connect() and unix_dgram_{send,recv}msg() lock the socket and peer in ascending order of the socket address. Let's define the order as unix_state_lock_cmp_fn() instead of using unix_state_lock_nested(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	3955802f16	af_unix: Define locking order for unix_table_double_lock(). When created, AF_UNIX socket is put into net->unx.table.buckets[], and the hash is stored in sk->sk_hash. * unbound socket : 0 <= sk_hash <= UNIX_HASH_MOD When bind() is called, the socket could be moved to another bucket. * pathname socket : 0 <= sk_hash <= UNIX_HASH_MOD * abstract socket : UNIX_HASH_MOD + 1 <= sk_hash <= UNIX_HASH_MOD * 2 + 1 Then, we call unix_table_double_lock() which locks a single bucket or two. Let's define the order as unix_table_lock_cmp_fn() instead of using spin_lock_nested(). The locking is always done in ascending order of sk->sk_hash, which is the index of buckets/locks array allocated by kvmalloc_array(). sk_hash_A < sk_hash_B <=> &locks[sk_hash_A].dep_map < &locks[sk_hash_B].dep_map So, the relation of two sk->sk_hash can be derived from the addresses of dep_map in the array of locks. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Jakub Kicinski	482000cf7f	bpf-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZnlmXgAKCRDbK58LschI g2ovAP9iynwwFEjMSxHjQVXSq1J1PMqF4966vmy30RCKJMMN/QD/SRsRRKcfsPis BzKOdsOVbWlDl2CUqvBrPZGT6laKoQc= =6/0V -----END PGP SIGNATURE----- Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-24 We've added 12 non-merge commits during the last 10 day(s) which contain a total of 10 files changed, 412 insertions(+), 16 deletions(-). The main changes are: 1) Fix a BPF verifier issue validating may_goto with a negative offset, from Alexei Starovoitov. 2) Fix a BPF verifier validation bug with may_goto combined with jump to the first instruction, also from Alexei Starovoitov. 3) Fix a bug with overrunning reservations in BPF ring buffer, from Daniel Borkmann. 4) Fix a bug in BPF verifier due to missing proper var_off setting related to movsx instruction, from Yonghong Song. 5) Silence unnecessary syzkaller-triggered warning in __xdp_reg_mem_model(), from Daniil Dulov. * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xdp: Remove WARN() from __xdp_reg_mem_model() selftests/bpf: Add tests for may_goto with negative offset. bpf: Fix may_goto with negative offset. selftests/bpf: Add more ring buffer test coverage bpf: Fix overrunning reservations in ringbuf selftests/bpf: Tests with may_goto and jumps to the 1st insn bpf: Fix the corner case with may_goto and jump to the 1st insn. bpf: Update BPF LSM maintainer list bpf: Fix remap of arena. selftests/bpf: Add a few tests to cover bpf: Add missed var_off setting in coerce_subreg_to_size_sx() bpf: Add missed var_off setting in set_sext32_default_val() ==================== Link: https://patch.msgid.link/20240624124330.8401-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-24 18:15:22 -07:00
Sebastian Andrzej Siewior	3f9fe37d9e	net: Move per-CPU flush-lists to bpf_net_context on PREEMPT_RT. The per-CPU flush lists, which are accessed from within the NAPI callback (xdp_do_flush() for instance), are per-CPU. There are subject to the same problem as struct bpf_redirect_info. Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and xskmap_map_flush_list to struct bpf_net_context. Add wrappers for the access. The lists initialized on first usage (similar to bpf_net_ctx_get_ri()). Cc: "Björn Töpel" <bjorn@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-16-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-24 16:41:24 -07:00
Sebastian Andrzej Siewior	401cb7dae8	net: Reference bpf_redirect_info via task_struct on PREEMPT_RT. The XDP redirect process is two staged: - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used. - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list. At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists. The per-CPU variables are only used in the NAPI callback hence disabling bottom halves is the only protection mechanism. Users from preemptible context (like cpu_map_kthread_run()) explicitly disable bottom halves for protections reasons. Without locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. PREEMPT_RT has forced-threaded interrupts enabled and every NAPI-callback runs in a thread. If each thread has its own data structure then locking can be avoided. Create a struct bpf_net_context which contains struct bpf_redirect_info. Define the variable on stack, use bpf_net_ctx_set() to save a pointer to it, bpf_net_ctx_clear() removes it again. The bpf_net_ctx_set() may nest. For instance a function can be used from within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and NET_TX_SOFTIRQ which does not. Therefore only the first invocations updates the pointer. Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct bpf_redirect_info. The returned data structure is zero initialized to ensure nothing is leaked from stack. This is done on first usage of the struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to note that initialisation is required. First invocation of bpf_net_ctx_get_ri() will memset() the data structure and update bpf_redirect_info::kern_flags. bpf_redirect_info::nh is excluded from memset because it is only used once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is moved past nh to exclude it from memset. The pointer to bpf_net_context is saved task's task_struct. Using always the bpf_net_context approach has the advantage that there is almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds. Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-24 16:41:24 -07:00
Sebastian Andrzej Siewior	78f520b7bb	net: Use nested-BH locking for bpf_scratchpad. bpf_scratchpad is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-14-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-24 16:41:23 -07:00
Sebastian Andrzej Siewior	d1542d4ae4	seg6: Use nested-BH locking for seg6_bpf_srh_states. The access to seg6_bpf_srh_states is protected by disabling preemption. Based on the code, the entry point is input_action_end_bpf() and every other function (the bpf helper functions bpf_lwt_seg6_*()), that is accessing seg6_bpf_srh_states, should be called from within input_action_end_bpf(). input_action_end_bpf() accesses seg6_bpf_srh_states first at the top of the function and then disables preemption. This looks wrong because if preemption needs to be disabled as part of the locking mechanism then the variable shouldn't be accessed beforehand. Looking at how it is used via test_lwt_seg6local.sh then input_action_end_bpf() is always invoked from softirq context. If this is always the case then the preempt_disable() statement is superfluous. If this is not always invoked from softirq then disabling only preemption is not sufficient. Replace the preempt_disable() statement with nested-BH locking. This is not an equivalent replacement as it assumes that the invocation of input_action_end_bpf() always occurs in softirq context and thus the preempt_disable() is superfluous. Add a local_lock_t the data structure and use local_lock_nested_bh() for locking. Add lockdep_assert_held() to ensure the lock is held while the per-CPU variable is referenced in the helper functions. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-13-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-24 16:41:23 -07:00

1 2 3 4 5 ...

77633 Commits