linux/drivers/pci
Smita Koralahalli 2ae8fbbe1c PCI/DPC: Ignore Surprise Down error on hot removal
According to PCIe r6.0 sec 6.7.6 [1], async removal with DPC may result in
surprise down error. This error is expected and is just a side-effect of
async remove.

Ignore surprise down error generated as a side-effect of async remove.
Typically, this error is benign as the pciehp handler invoked by PDC
or/and DLLSC alongside DPC, de-enumerates and brings down the device
appropriately, but the error messages might confuse users. Get rid of
these irritating log messages with a 1s delay while pciehp waits for
DPC recovery.

The implementation is as follows: On an async remove a DPC is triggered
along with a Presence Detect State change and/or DLL State Change.
Determine it's an async remove by checking for DPC Trigger Status in DPC
Status Register and Surprise Down Error Status in AER Uncorrected Error
Status to be non-zero. If true, treat the DPC event as a side-effect of
async remove, clear the error status registers and continue with hot-plug
tear down routines. If not, follow the existing routine to handle AER and
DPC errors.

Masking Surprise Down Errors was explored as an alternative approach, but
discarded due to the odd behavior that masking only avoids the interrupt,
but still records an error per PCIe r6.0, sec 6.2.3.2.2. That stale error
would be reported the next time some error other than Surprise Down is
handled.

Dmesg before:

  pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
  pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
  pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
  pcieport 0000:00:01.4:   device [1022:14ab] error status/mask=00000020/04004000
  pcieport 0000:00:01.4:    [ 5] SDES (First)
  nvme nvme2: frozen state error detected, reset controller
  pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
  pcieport 0000:00:01.4: AER: subordinate device reset failed
  pcieport 0000:00:01.4: AER: device recovery failed
  pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
  nvme2n1: detected capacity change from 1953525168 to 0
  pci 0000:04:00.0: Removing from iommu group 49

Dmesg after:

 pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
 nvme1n1: detected capacity change from 1953525168 to 0
 pci 0000:04:00.0: Removing from iommu group 37

[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
    https://members.pcisig.com/wg/PCI-SIG/document/16609

Link: https://lore.kernel.org/r/20240207181854.121335-1-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-02-28 10:24:36 -06:00
..
controller pci-v6.8-changes 2024-01-17 16:23:17 -08:00
endpoint Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
hotplug Revert "PCI: acpiphp: Reassign resources on bridge if necessary" 2023-12-15 14:55:10 -06:00
msi PCI/MSI: Use FIELD_GET/PREP() 2023-10-24 10:54:04 -05:00
pcie PCI/DPC: Ignore Surprise Down error on hot removal 2024-02-28 10:24:36 -06:00
switch PCI: switchtec: Fix stdev_release() crash after surprise hot remove 2023-11-22 09:44:06 -06:00
access.c PCI: Move pci_clear_and_set_dword() helper to PCI header 2023-12-13 13:35:41 +00:00
ats.c PCI/ATS: Use FIELD_GET() 2023-10-24 16:55:45 -05:00
bus.c Devicetree updates for v6.6: 2023-08-30 16:59:03 -07:00
doe.c PCI/DOE: Fix destroy_work_on_stack() race 2023-07-27 15:20:47 -05:00
ecam.c
host-bridge.c
iov.c PCI: Use resource names in PCI log messages 2023-12-15 17:28:42 -06:00
irq.c PCI: Check for alloc failure in pci_request_irq() 2022-11-21 16:55:18 -06:00
Kconfig PCI: Replace unnecessary UTF-8 in Kconfig 2023-10-06 14:30:48 -05:00
Makefile PCI: Create device tree node for bridge 2023-08-22 14:56:09 -05:00
mmap.c
of_property.c PCI: of_property: Handle interrupt parsing failures 2023-09-29 17:33:46 -05:00
of.c PCI: of: Destroy changeset when adding PCI device node fails 2023-09-29 17:33:51 -05:00
p2pdma.c PCI/P2PDMA: Remove redundant goto 2023-10-23 12:17:52 -05:00
pci-acpi.c Merge branch 'pci/pm' 2023-10-28 13:30:59 -05:00
pci-bridge-emul.c PCI: pci-bridge-emul: Set position of PCI capabilities to real HW value 2022-08-25 12:07:56 +02:00
pci-bridge-emul.h PCI: pci-bridge-emul: Set position of PCI capabilities to real HW value 2022-08-25 12:07:56 +02:00
pci-driver.c PCI/PM: Mark devices disconnected if upstream PCIe link is down on resume 2023-09-29 17:42:00 -05:00
pci-label.c
pci-mid.c
pci-pf-stub.c
pci-stub.c
pci-sysfs.c Driver core changes for 6.7-rc1 2023-11-03 15:15:47 -10:00
pci.c cxl for v6.8 2024-01-18 16:22:43 -08:00
pci.h pci-v6.8-changes 2024-01-17 16:23:17 -08:00
probe.c PCI: Log bridge info when first enumerating bridge 2023-12-15 17:28:43 -06:00
proc.c
quirks.c pci-v6.8-changes 2024-01-17 16:23:17 -08:00
remove.c PCI: Create device tree node for bridge 2023-08-22 14:56:09 -05:00
rom.c
search.c PCI: Add pci_get_base_class() helper 2023-09-28 16:49:44 -05:00
setup-bus.c PCI: Use resource names in PCI log messages 2023-12-15 17:28:42 -06:00
setup-irq.c
setup-res.c PCI: Use resource names in PCI log messages 2023-12-15 17:28:42 -06:00
slot.c PCI/sysfs: Constify struct kobj_type pci_slot_ktype 2023-02-16 12:00:25 -06:00
syscall.c PCI: Use consistent put_user() pointer types 2023-08-25 08:15:13 -05:00
vc.c PCI/VC: Use FIELD_GET() 2023-10-24 16:55:45 -05:00
vgaarb.c pci-v6.7-changes 2023-11-02 14:05:18 -10:00
vpd.c PCI/VPD: Add runtime power management to sysfs interface 2023-08-11 14:19:16 -05:00
xen-pcifront.c x86: always initialize xen-swiotlb when xen-pcifront is enabling 2023-07-31 17:54:27 +02:00