Kalle Valo says:
====================
wireless-drivers fixes for v5.11
Second set of fixes for v5.11. Like in last time we again have more
fixes than usual Actually a bit too much for my liking in this state
of the cycle, but due to unrelated challenges I was only able to
submit them now.
We have few important crash fixes, iwlwifi modifying read-only data
being the most reported issue, and also smaller fixes to iwlwifi.
mt76
* fix a clang warning about enum usage
* fix rx buffer refcounting crash
mt7601u
* fix rx buffer refcounting crash
* fix crash when unbplugging the device
iwlwifi
* fix a crash where we were modifying read-only firmware data
* lots of smaller fixes all over the driver
* tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: (24 commits)
mt7601u: fix kernel crash unplugging the device
iwlwifi: queue: bail out on invalid freeing
iwlwifi: mvm: guard against device removal in reprobe
iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
iwlwifi: mvm: clear IN_D3 after wowlan status cmd
iwlwifi: pcie: add rules to match Qu with Hr2
iwlwifi: mvm: invalidate IDs of internal stations at mvm start
iwlwifi: mvm: fix the return type for DSM functions 1 and 2
iwlwifi: pcie: reschedule in long-running memory reads
iwlwifi: pcie: use jiffies for memory read spin time limit
iwlwifi: pcie: fix context info memory leak
iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
iwlwifi: pcie: set LTR on more devices
iwlwifi: queue: don't crash if txq->entries is NULL
iwlwifi: fix the NMI flow for old devices
iwlwifi: pnvm: don't try to load after failures
iwlwifi: pnvm: don't skip everything when not reloading
iwlwifi: pcie: avoid potential PNVM leaks
iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time()
iwlwifi: mvm: skip power command when unbinding vif during CSA
...
====================
Link: https://lore.kernel.org/r/20210126092202.6A367C433CA@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In D3 resume flow, avoid the following race where sending
packets before updating the sequence number (sequence
number received from the wowlan status command response):
Thread 1:
__iwl_mvm_resume clears IWL_MVM_STATUS_IN_D3 and is cut
by thread 2 before reaching iwl_mvm_query_wakeup_reasons.
Thread 2:
iwl_mvm_mac_itxq_xmit calls iwl_mvm_tx_skb since
IWL_MVM_STATUS_IN_D3 is not set using a wrong sequence number.
Thread 1:
__iwl_mvm_resume continues and calls iwl_mvm_query_wakeup_reasons
updating the sequence number received from the firmware.
The next packet that will be sent now will cause sysassert 0x1096.
Fix the bug by moving 'clear IWL_MVM_STATUS_IN_D3' to after
sending the wowlan status command and updating the sequence
number.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.fe927ec939c6.I103d3321fb55da7e6c6c51582cfadf94eb8b6c58@changeid
If we spin for a long time in memory reads that (for some reason in
hardware) take a long time, then we'll eventually get messages such
as
watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272]
This is because the reading really does take a very long time, and
we don't schedule, so we're hogging the CPU with this task, at least
if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y.
Previously I misinterpreted the situation and thought that this was
only going to happen if we had interrupts disabled, and then fixed
this (which is good anyway, however), but that didn't always help;
looking at it again now I realized that the spin unlock will only
reschedule if CONFIG_PREEMPT is used.
In order to avoid this issue, change the code to cond_resched() if
we've been spinning for too long here.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 04516706bb ("iwlwifi: pcie: limit memory read spin time")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid
I noticed that the flow that triggers an NMI on the firmware
for old devices (tested on 7265) doesn't work.
Apparently, the firmware / device is still in low power when
we write the register that triggers the NMI. We call the
"grab_nic_access" function to make sure the device is awake
but that wasn't enough. I played with this and noticed that
if we wait 1 ms after the device reports it is awake before
we write to the NMI register, the device always sees our
write and the firmware gets properly asserted.
Triggering an NMI to the firmware can be done with the
debugfs hook:
echo 1 > /sys/kernel/debug/iwlwifi/0000\:00\:03.0/iwlmvm/fw_nmi
What happened before is that the firmware would just stall
without running its NMI routine. Because of that the driver
wouldn't get the "firmware crashed" interrupt. After a while
the driver would notice that the firmware is not responding
to some command and it would read the error data from the
firmware, but this data is populated in the NMI service
routine in the firmware which was not called. So in the logs
it looked like:
iwlwifi 0000:00:03.0: Error sending REPLY_ERROR: time out after 2000ms.
iwlwifi 0000:00:03.0: Current CMD queue read_ptr 33 write_ptr 34
iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
iwlwifi 0000:00:03.0: 0x00000000 | ADVANCED_SYSASSERT
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status0
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:00:03.0: 0x00000000 | branchlink2
iwlwifi 0000:00:03.0: 0x00000000 | interruptlink1
iwlwifi 0000:00:03.0: 0x00000000 | interruptlink2
iwlwifi 0000:00:03.0: 0x00000000 | data1
iwlwifi 0000:00:03.0: 0x00000000 | data2
iwlwifi 0000:00:03.0: 0x00000000 | data3
iwlwifi 0000:00:03.0: 0x00000000 | beacon time
iwlwifi 0000:00:03.0: 0x00000000 | tsf low
...
With this fix, immediately after we trigger the NMI to the
firmware, we get the expected:
iwlwifi 0000:00:03.0: Microcode SW error detected. Restarting 0x2000000.
iwlwifi 0000:00:03.0: Start IWL Error Log Dump:
iwlwifi 0000:00:03.0: Status: 0x00000040, count: 6
iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
iwlwifi 0000:00:03.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
iwlwifi 0000:00:03.0: 0x000002F1 | trm_hw_status0
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:00:03.0: 0x00043D6C | branchlink2
iwlwifi 0000:00:03.0: 0x0004AFD6 | interruptlink1
iwlwifi 0000:00:03.0: 0x000008C4 | interruptlink2
iwlwifi 0000:00:03.0: 0x00000000 | data1
iwlwifi 0000:00:03.0: 0x00000080 | data2
iwlwifi 0000:00:03.0: 0x07030000 | data3
iwlwifi 0000:00:03.0: 0x003FD4C3 | beacon time
iwlwifi 0000:00:03.0: 0x00C22AC3 | tsf low
iwlwifi 0000:00:03.0: 0x00000000 | tsf hi
iwlwifi 0000:00:03.0: 0x00000000 | time gp1
iwlwifi 0000:00:03.0: 0x00C22AC3 | time gp2
iwlwifi 0000:00:03.0: 0x00000001 | uCode revision type
iwlwifi 0000:00:03.0: 0x0000001D | uCode version major
Notice the first line: "Microcode SW error detected:" which
is printed in the driver's ISR, which means that the driver
actually got an interrupt from the firmware saying that it
crashed. And then we have the properly populated error data.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.70e67cc75d88.I6615cad4361862e7f3c9f2d3cafb6a8c61e16781@changeid
Clang warns in both mt7615 and mt7915:
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:271:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
txq = MT_MCUQ_FWDL;
~ ^~~~~~~~~~~~
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:278:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
txq = MT_MCUQ_WA;
~ ^~~~~~~~~~
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:282:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
txq = MT_MCUQ_WM;
~ ^~~~~~~~~~
3 warnings generated.
drivers/net/wireless/mediatek/mt76/mt7615/mcu.c:238:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
qid = MT_MCUQ_WM;
~ ^~~~~~~~~~
drivers/net/wireless/mediatek/mt76/mt7615/mcu.c:240:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
qid = MT_MCUQ_FWDL;
~ ^~~~~~~~~~~~
2 warnings generated.
Use the proper type for the queue ID variables to fix these warnings.
Additionally, rename the txq variable in mt7915_mcu_send_message to be
more neutral like mt7615_mcu_send_message.
Fixes: e637763b60 ("mt76: move mcu queues to mt76_dev q_mcu array")
Link: https://github.com/ClangBuiltLinux/linux/issues/1229
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201229211548.1348077-1-natechancellor@gmail.com
Without crc32, the driver fails to link:
arm-linux-gnueabi-ld: drivers/net/wireless/ath/wil6210/fw.o: in function `wil_fw_verify':
fw.c:(.text+0x74c): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/wireless/ath/wil6210/fw.o:fw.c:(.text+0x758): more undefined references to `crc32_le' follow
Fixes: 151a970650 ("wil6210: firmware download")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-drivers fixes for v5.11
First set of fixes for v5.11, more fixes than usual this time. For
ath11k we have several fixes for QCA6390 PCI support and mt76 has
several. Also one build fix for mt76.
mt76
* fix two NULL pointer dereference
* fix build error when CONFIG_MAC80211_MESH is disabled
rtlwifi
* fix use-after-free in firmware handling code
ath11k
* error handling fixes
* fix crash found during connect and disconnect test
* handle HT disable better
* avoid printing qmi memory failure during firmware bootup
* disable ASPM during firmware bootup
* tag 'wireless-drivers-2020-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
MAINTAINERS: switch to different email address
mt76: mt7915: fix MESH ifdef block
mt76: mt76s: fix NULL pointer dereference in mt76s_process_tx_queue
mt76: sdio: remove wake logic in mt76s_process_tx_queue
mt76: usb: remove wake logic in mt76u_status_worker
ath11k: pci: disable ASPM L0sLs before downloading firmware
ath11k: qmi: try to allocate a big block of DMA memory first
rtlwifi: rise completion at the last step of firmware callback
mt76: mt76u: fix NULL pointer dereference in mt76u_status_worker
ath11k: Fix ath11k_pci_fix_l1ss()
ath11k: Fix error code in ath11k_core_suspend()
ath11k: start vdev if a bss peer is already created
ath11k: fix crash caused by NULL rx_channel
ath11k: add missing null check on allocated skb
====================
Link: https://lore.kernel.org/r/20201222163727.D4336C433C6@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ath.git fixes for 5.11. Major changes:
ath11k
* add null check for skb allocation
* fix crash found during connect/disconnect stress testing
* fix for HT disabled case
* brown paperbag fixes for my bugs in suspend code
* fix an unnecessary qmi allocation during firmware bootup
* disable ASPM during firmware bootup to avoid issues
Not all firmware versions support allocating DMA memory in smaller blocks so
first try to allocate big block of DMA memory for QMI. If the allocation fails,
let firmware request multiple blocks of DMA memory with smaller size.
This also fixes an unnecessary error message seen during ath11k probe on
QCA6390:
ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1608127593-15192-1-git-send-email-kvalo@codeaurora.org
The "if (!ret)" condition is inverted and it should be "if (ret)". It means
that we return success when we had intended to return an error code. This also
caused a spurious warning even when the suspend was successful:
[ 297.186612] ath11k_pci 0000:06:00.0: failed to suspend hif: 0
Fixes: d1b0c33850 ("ath11k: implement suspend for QCA6390 PCI devices")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/X9nF17L2/EKOSbn/@mwanda
For QCA6390, bss peer must be created before vdev is to start. This
change is to start vdev if a bss peer is created. Otherwise, ath11k
delays to start vdev.
This fixes an issue in a case where HT/VHT/HE settings change between
authentication and association, e.g., due to the user space request
to disable HT.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201211051358.9191-1-cjhuang@codeaurora.org
ath.git patches for v5.11. Major changes:
ath11k
* suspend support for QCA6390 PCI devices
* support TXOP duration based RTS threshold
* mesh: add support for 256 bitmap in blockack frames in 11ax
Now that all the needed pieces are in place implement suspend support QCA6390
PCI devices. All other devices will return -EOPNOTSUPP during suspend. The
suspend is implemented by switching the firmware to WoW mode during suspend, so
the firmware will be running on low power mode while host is in suspend.
At the moment we are not able to shutdown and fully power off the device due to
bugs in MHI subsystem, so WoW mode is a workaround for the time being.
During suspend we enable WoW mode, disable CE irq and DP irq, then put MHI to
suspend state. During resume, driver resumes MHI firstly, then enables CE irq
and dp IRQ, and sends WoW wakeup command to firmware.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607708150-21066-11-git-send-email-kvalo@codeaurora.org
Firmware will check all the pipes before entering WoW mode during suspend. If
ATH11K_HTC_FLAG_NEED_CREDIT_UPDATE is set, firmware treats this pipe needed to
return credit even though it's actually not required. If any pipe needs to
return credit, the suspend_complete message doesn't send to host but is
dropped. So host gets time out and WoW suspend failed.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1607708150-21066-8-git-send-email-kvalo@codeaurora.org