linux/drivers/cxl
Dan Williams 8d2ad999ca cxl/port: Fix delete_endpoint() vs parent unregistration race
The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of
ports (struct cxl_port objects) between an endpoint and the root of a
CXL topology. Each port including the endpoint port is attached to the
cxl_port driver.

Given that setup, it follows that when either any port in that lineage
goes through a cxl_port ->remove() event, or the memdev goes through a
cxl_mem ->remove() event. The hierarchy below the removed port, or the
entire hierarchy if the memdev is removed needs to come down.

The delete_endpoint() callback is careful to check whether it is being
called to tear down the hierarchy, or if it is only being called to
teardown the memdev because an ancestor port is going through
->remove().

That care needs to take the device_lock() of the endpoint's parent.
Which requires 2 bugs to be fixed:

1/ A reference on the parent is needed to prevent use-after-free
   scenarios like this signature:

    BUG: spinlock bad magic on CPU#0, kworker/u56:0/11
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
    Workqueue: cxl_port detach_memdev [cxl_core]
    RIP: 0010:spin_bug+0x65/0xa0
    Call Trace:
      do_raw_spin_lock+0x69/0xa0
     __mutex_lock+0x695/0xb80
     delete_endpoint+0xad/0x150 [cxl_core]
     devres_release_all+0xb8/0x110
     device_unbind_cleanup+0xe/0x70
     device_release_driver_internal+0x1d2/0x210
     detach_memdev+0x15/0x20 [cxl_core]
     process_one_work+0x1e3/0x4c0
     worker_thread+0x1dd/0x3d0

2/ In the case of RCH topologies, the parent device that needs to be
   locked is not always @port->dev as returned by cxl_mem_find_port(), use
   endpoint->dev.parent instead.

Fixes: 8dd2bc0f8e ("cxl/mem: Add the cxl_mem driver")
Cc: <stable@vger.kernel.org>
Reported-by: Robert Richter <rrichter@amd.com>
Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:23 -07:00
..
core cxl/port: Fix delete_endpoint() vs parent unregistration race 2023-10-27 20:13:23 -07:00
acpi.c cxl/acpi: Annotate struct cxl_cxims_data with __counted_by 2023-09-22 14:31:04 -07:00
cxl.h Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxl 2023-06-25 18:56:13 -07:00
cxlmem.h cxl/memdev: Only show sanitize sysfs files when supported 2023-07-28 13:16:54 -06:00
cxlpci.h cxl/pci: Find and register CXL PMU devices 2023-05-30 11:20:35 -07:00
Kconfig cxl: fix CONFIG_FW_LOADER dependency 2023-07-14 14:32:22 -06:00
Makefile cxl/pmem: Introduce nvdimm_security_ops with ->get_flags() operation 2022-11-30 16:30:47 -08:00
mem.c Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxl 2023-06-25 18:56:13 -07:00
pci.c cxl/pci: Replace host_bridge->native_aer with pcie_aer_is_native() 2023-09-11 15:24:30 -07:00
pmem.c cxl/mbox: Move mailbox related driver state to its own data structure 2023-06-25 14:31:08 -07:00
pmu.h cxl/pci: Find and register CXL PMU devices 2023-05-30 11:20:35 -07:00
port.c Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxl 2023-06-25 18:56:13 -07:00
security.c Merge branch 'for-6.5/cxl-type-2' into for-6.5/cxl 2023-06-25 17:16:51 -07:00