linux

Author	SHA1	Message	Date
James Zhu	aef06d2b1b	drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:10 -04:00
James Zhu	386061cd99	drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:10 -04:00
James Zhu	f55c0d6527	drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:09 -04:00
Peng Ju Zhou	9f04eb7acf	drm/amdgpu: Skip the program of MMMC_VM_AGP_* in SRIOV KMD should not program these registers, the value were defined in the host, so skip them in the SRIOV environment. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:09 -04:00
pengzhou	f5e25a83c1	drm/amdgpu: Modify MMHUB register access from MMIO to RLCG in file mmhub_v2* In SRIOV environment, KMD should access GC registers with RLCG if GC indirect access flag enabled. Signed-off-by: pengzhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:09 -04:00
Peng Ju Zhou	6ba3f59eb4	drm/amdgpu: Modify GC register access from MMIO to RLCG in file amdgpu_gmc.c In SRIOV environment, KMD should access GC registers with RLCG if GC indirect access flag enabled. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:08 -04:00
Peng Ju Zhou	f2958a8b87	drm/amdgpu: Modify GC register access from MMIO to RLCG in file nv.c In SRIOV environment, KMD should access GC registers with RLCG if GC indirect access flag enabled. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:08 -04:00
Peng Ju Zhou	7373fc5e2e	drm/amdgpu: Modify GC register access from MMIO to RLCG in file sdma_v5* In SRIOV environment, KMD should access GC registers with RLCG if GC indirect access flag enabled. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:08 -04:00
Peng Ju Zhou	a9dc23bee2	drm/amdgpu: Modify GC register access from MMIO to RLCG in file soc15.c In SRIOV environment, KMD should access GC registers with RLCG if GC indirect access flag enabled. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:07 -04:00
Peng Ju Zhou	d697f3d8b9	drm/amdgpu: Modify GC register access from MMIO to RLCG in file kfd_v10* In SRIOV environment, KMD should access GC registers with RLCG if GC indirect access flag enabled. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:07 -04:00
Peng Ju Zhou	cda722d2a8	drm/amdgpu: Modify GC register access from MMIO to RLCG in file gfx_v10* In SRIOV environment, KMD should access GC registers with RLCG if GC indirect access flag enabled. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:07 -04:00
Peng Ju Zhou	a5504e9ad4	drm/amdgpu: Indirect register access for Navi12 sriov This patch series are used for GC/MMHUB(part)/IH_RB_CNTL indirect access in the SRIOV environment. There are 4 bits, controlled by host, to control if GC/MMHUB(part)/IH_RB_CNTL indirect access enabled. (one bit is master bit controls other 3 bits) For GC registers, changing all the register access from MMIO to RLC and use RLC as the default access method in the full access time. For partial MMHUB registers, changing their access from MMIO to RLC in the full access time, the remaining registers keep the original access method. For IH_RB_CNTL register, changing it's access from MMIO to PSP. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:32:06 -04:00
Kevin Wang	8200b1cd85	drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error 1.correct KFD SDMA RLC queue register offset error. (all sdma rlc register offset is base on SDMA0.RLC0_RLC0_RB_CNTL) 2.HQD_N_REGS (19+6+7+12) 12: the 2 more resgisters than navi1x (SDMAx_RLCy_MIDCMD_DATA{9,10}) the patch also can be fixed NULL pointer issue when read /sys/kernel/debug/kfd/hqds on sienna_cichlid chip. Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 10:31:58 -04:00
Dave Airlie	9a91e5e0af	Merge tag 'amd-drm-next-5.14-2021-05-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-05-21: amdgpu: - RAS fixes - SR-IOV fixes - More BO management cleanups - Aldebaran fixes - Display fixes - Support for new GPU, Beige Goby - Backlight fixes amdkfd: - RAS fixes - DMA mapping fixes - HMM SVM fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210521045743.4047-1-alexander.deucher@amd.com	2021-05-21 15:59:05 +10:00
Dave Airlie	c99c4d0ca5	Merge tag 'amd-drm-next-5.14-2021-05-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-05-19: amdgpu: - Aldebaran updates - More LTTPR display work - Vangogh updates - SDMA 5.x GCR fixes - RAS fixes - PCIe ASPM support - Modifier fixes - Enable TMZ on Renoir - Buffer object code cleanup - Display overlay fixes - Initial support for multiple eDP panels - Initial SR-IOV support for Aldebaran - DP link training refactor - Misc code cleanups and bug fixes - SMU regression fixes for variable sized arrays - MAINTAINERS fixes for amdgpu amdkfd: - Initial SR-IOV support for Aldebaran - Topology fixes - Initial HMM SVM support - Misc code cleanups and bug fixes radeon: - Misc code cleanups and bug fixes - SMU regression fixes for variable sized arrays - Flickering fix for Oland with multiple 4K displays UAPI: - amdgpu: Drop AMDGPU_GEM_CREATE_SHADOW flag. This was always a kernel internal flag and userspace use of it has always been blocked. It's no longer needed so remove it. - amdkgd: HMM SVM support Overview: https://patchwork.freedesktop.org/series/85562/ Porposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/tree/fxkamd/hmm-wip Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210520031258.231896-1-alexander.deucher@amd.com	2021-05-21 15:29:40 +10:00
James Zhu	20ebbfd22f	drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-05-20 17:04:58 -04:00
James Zhu	23f10a571d	drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-05-20 17:04:38 -04:00
James Zhu	ff48f6dbf0	drm/amdgpu/jpeg2.0: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-05-20 17:04:17 -04:00
James Zhu	4a62542ae0	drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-05-20 17:03:55 -04:00
James Zhu	2fb536ea42	drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-05-20 17:03:30 -04:00
James Zhu	0c6013377b	drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-05-20 17:03:10 -04:00
James Zhu	b95f045ea3	drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-05-20 17:02:52 -04:00
Kevin Wang	ba515a5821	drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error 1.correct KFD SDMA RLC queue register offset error. (all sdma rlc register offset is base on SDMA0.RLC0_RLC0_RB_CNTL) 2.HQD_N_REGS (19+6+7+12) 12: the 2 more resgisters than navi1x (SDMAx_RLCy_MIDCMD_DATA{9,10}) the patch also can be fixed NULL pointer issue when read /sys/kernel/debug/kfd/hqds on sienna_cichlid chip. Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-05-20 17:01:36 -04:00
Andrey Grodzovsky	07775fc138	drm/amdgpu: Unmap all MMIO mappings Access to those must be prevented post pci_remove v6: Drop BOs list, unampping VRAM BAR is enough. v8: Add condition of xgmi.connected_to_cpu to MTTR handling and remove MTTR handling from the old place. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210517193105.491461-1-andrey.grodzovsky@amd.com	2021-05-19 23:50:28 -04:00
Andrey Grodzovsky	98c6e6a7e2	drm/amdgpu: Verify DMA opearations from device are done In case device remove is just simualted by sysfs then verify device doesn't keep doing DMA to the released memory after pci_remove is done. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-16-andrey.grodzovsky@amd.com	2021-05-19 23:50:28 -04:00
Andrey Grodzovsky	54a85db8de	drm/amdgpu: Fix hang on device removal. If removing while commands in flight you cannot wait to flush the HW fences on a ring since the device is gone. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-13-andrey.grodzovsky@amd.com	2021-05-19 23:50:28 -04:00
Andrey Grodzovsky	ca4e17244b	drm/amdgpu: Prevent any job recoveries after device is unplugged. Return DRM_TASK_STATUS_ENODEV back to the scheduler when device is not present so they timeout timer will not be rearmed. v5: Update to match updated return values in enum drm_gpu_sched_stat Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-12-andrey.grodzovsky@amd.com	2021-05-19 23:50:28 -04:00
Andrey Grodzovsky	f89f8c6baf	drm/amdgpu: Guard against write accesses after device removal This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed. v5: Protect more places wher memcopy_to/form_io takes place Protect IB submissions v6: Switch to !drm_dev_enter instead of scoping entire code with brackets. v7: Drop guard of HW ring commands emission protection since they are in GART and not in MMIO. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-10-andrey.grodzovsky@amd.com	2021-05-19 23:50:28 -04:00
Andrey Grodzovsky	35bba8313b	drm/amdgpu: Convert driver sysfs attributes to static attributes This allows to remove explicit creation and destruction of those attrs and by this avoids warnings on device finalizing post physical device extraction. v5: Use newly added pci_driver.dev_groups directly Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-9-andrey.grodzovsky@amd.com	2021-05-19 23:50:27 -04:00
Andrey Grodzovsky	03f9016ed8	drm/amdgpu: Remap all page faults to per process dummy page. On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object. v4: Update for modified ttm_bo_vm_dummy_page Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-7-andrey.grodzovsky@amd.com	2021-05-19 23:50:27 -04:00
Andrey Grodzovsky	d10d0daa20	drm/amdgpu: Handle IOMMU enabled case. Problem: Handle all DMA IOMMU group related dependencies before the group is removed. Those manifest themself in that when IOMMU enabled DMA map/unmap is dependent on the presence of IOMMU group the device belongs to but, this group is released once the device is removed from PCI topology. Fix: Expedite all such unmap operations to pci remove driver callback. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unamp list v7: Drop amdgpu_gart_fini In amdgpu_ih_ring_fini do uncinditional check (!ih->ring) to avoid freeing uniniitalized rings. Call amdgpu_ih_ring_fini unconditionally. v8: Add deatiled explanation Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210517143851.475058-1-andrey.grodzovsky@amd.com	2021-05-19 23:50:27 -04:00
Andrey Grodzovsky	e9669fb782	drm/amdgpu: Add early fini callback Use it to call disply code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finilization into this callback to prevent MMIO accesses post cpi remove. v7: Split kfd suspend from device exit to expdite HW related stuff to amdgpu_pci_remove v8: Squash previous KFD commit into this commit to avoid compile break. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com	2021-05-19 23:48:50 -04:00
Andrey Grodzovsky	72c8c97b15	drm/amdgpu: Split amdgpu_device_fini into early and late Some of the stuff in amdgpu_device_fini such as HW interrupts disable and pending fences finilization must be done right away on pci_remove while most of the stuff which relates to finilizing and releasing driver data structures can be kept until drm_driver.release hook is called, i.e. when the last device reference is dropped. v4: Change functions prefix early->hw and late->sw Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-3-andrey.grodzovsky@amd.com	2021-05-19 23:45:49 -04:00
Dave Airlie	ae25ec2fc6	Merge tag 'drm-misc-next-2021-05-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.14: UAPI Changes: Cross-subsystem Changes: Core Changes: * aperture: Fix unlocking on errors * legacy: Fix some doc comments Driver Changes: * drm/amdgpu: Free resource on fence usage query; Fix fence calculation; * drm/bridge: Lt9611: Add missing MODULE_DEVICE_TABLE * drm/i915: Print formats with %p4cc * drm/ingenic: IPU planes are now always of type OVERLAY * drm/nouveau: Remove left-over reference to struct drm_device.pdev * drm/panfrost: Disable devfreq if num_supplies > 1; Add Mediatek MT8183 + DT bindings; Cleanups * drm/simpledrm: Print resources with %pr; Fix use-after-free errors; Fix NULL deref; Fix MAINTAINERS entry * drm/vmwgfx: Fix memory allocation and leak in FIFO allocation; Fix return value in PCI resource setup Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YKJs2IfwSYvuGPU7@linux-uq9g.fritz.box	2021-05-20 13:31:12 +10:00
Christian König	81db370c88	drm/amdgpu: stop touching sched.ready in the backend This unfortunately comes up in regular intervals and breaks GPU reset for the engine in question. The sched.ready flag controls if an engine can't get working during hw_init, but should never be set to false during hw_fini. v2: squash in unused variable fix (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:45:00 -04:00
Lang Yu	6e8bcdd63a	drm/amd/amdgpu: fix a potential deadlock in gpu reset When amdgpu_ib_ring_tests failed, the reset logic called amdgpu_device_ip_suspend twice, then deadlock occurred. Deadlock log: [ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110). [ 806.290952] [drm] free PSP TMR buffer [ 806.319406] ============================================ [ 806.320315] WARNING: possible recursive locking detected [ 806.321225] 5.11.0-custom #1 Tainted: G W OEL [ 806.322135] -------------------------------------------- [ 806.323043] cat/2593 is trying to acquire lock: [ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.325668] but task is already holding lock: [ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.328430] other info that might help us debug this: [ 806.329539] Possible unsafe locking scenario: [ 806.330549] CPU0 [ 806.330983] ---- [ 806.331416] lock(&adev->dm.dc_lock); [ 806.332086] lock(&adev->dm.dc_lock); [ 806.332738] * DEADLOCK * [ 806.333747] May be due to missing lock nesting notation [ 806.334899] 3 locks held by cat/2593: [ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110 [ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu] [ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.340869] stack backtrace: [ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1 [ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020 [ 806.344413] Call Trace: [ 806.344849] dump_stack+0x93/0xbd [ 806.345435] __lock_acquire.cold+0x18a/0x2cf [ 806.346179] lock_acquire+0xca/0x390 [ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.347813] __mutex_lock+0x9b/0x930 [ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu] [ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50 [ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80 [ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80 [ 806.354064] mutex_lock_nested+0x1b/0x20 [ 806.354747] ? mutex_lock_nested+0x1b/0x20 [ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu] [ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu] [ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu] [ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu] [ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu] Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Christian KÃnig <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:50 -04:00
Aaron Liu	9a530062d5	drm/amdgpu: modify system reference clock source for navi+ (V2) Starting from Navi+, the rlc reference clock is used for system clock from vbios gfx_info table. It is incorrect to use core_refclk_10khz of vbios smu_info table as system clock. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:48 -04:00
Guchun Chen	87476d12c5	drm/amdgpu: update sdma golden setting for Navi12 Current golden setting is out of date. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:45 -04:00
Guchun Chen	6c65d8678c	drm/amdgpu: update gc golden setting for Navi12 Current golden setting is out of date. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:43 -04:00
xinhui pan	a8e56b80df	drm/amdgpu: Fix a use-after-free looks like we forget to set ttm->sg to NULL. Hit panic below [ 1235.844104] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b7b4b: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI [ 1235.989074] Call Trace: [ 1235.991751] sg_free_table+0x17/0x20 [ 1235.995667] amdgpu_ttm_backend_unbind.cold+0x4d/0xf7 [amdgpu] [ 1236.002288] amdgpu_ttm_backend_destroy+0x29/0x130 [amdgpu] [ 1236.008464] ttm_tt_destroy+0x1e/0x30 [ttm] [ 1236.013066] ttm_bo_cleanup_memtype_use+0x51/0xa0 [ttm] [ 1236.018783] ttm_bo_release+0x262/0xa50 [ttm] [ 1236.023547] ttm_bo_put+0x82/0xd0 [ttm] [ 1236.027766] amdgpu_bo_unref+0x26/0x50 [amdgpu] [ 1236.032809] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7aa/0xd90 [amdgpu] [ 1236.040400] kfd_ioctl_alloc_memory_of_gpu+0xe2/0x330 [amdgpu] [ 1236.046912] kfd_ioctl+0x463/0x690 [amdgpu] Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:39 -04:00
Mukul Joshi	1f6256590c	drm/amdgpu: Query correct register for DF hashing on Aldebaran For Aldebaran, driver needs to query DramMegaBaseAddress to check if DF hashing is enabled. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:19 -04:00
James Zhu	295c4f513f	drm/amdgpu: add video_codecs query support for aldebaran Add video_codecs query support for aldebaran. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:16 -04:00
Felix Kuehling	e552ee40b0	drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind The dmabuf attachment should be updated by moving the SG BO to DOMAIN_CPU and back to DOMAIN_GTT. This does not necessarily invoke the populate/unpopulate callbacks. Do this in backend_bind/unbind instead. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:10 -04:00
Felix Kuehling	5ac3c3e45f	drm/amdgpu: Add DMA mapping of GTT BOs Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:06 -04:00
Felix Kuehling	9e5d275319	drm/amdgpu: Move kfd_mem_attach outside reservation This is needed to avoid deadlocks with DMA buf import in the next patch. Also move PT/PD validation out of kfd_mem_attach, that way the caller can bo this unconditionally. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:03 -04:00
Felix Kuehling	b72ed8a2de	drm/amdgpu: DMA map/unmap when updating GPU mappings DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called with the BO and page tables reserved, so we can safely update the DMA mapping. DMA unmap when a BO is unmapped from a GPU and before updating mappings in restore workers. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:43:59 -04:00
Felix Kuehling	264fb4d332	drm/amdgpu: Add multi-GPU DMA mapping helpers Add BO-type specific helpers functions to DMA-map and unmap kfd_mem_attachments. Implement this functionality for userptrs by creating one SG BO per GPU and filling it with a DMA mapping of the pages from the original mem->bo. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:43:56 -04:00
Felix Kuehling	7141394edc	drm/amdgpu: Simplify AQL queue mapping Do AQL queue double-mapping with a single attach call. That will make it easier to create per-GPU BOs later, to be shared between the two BO VA mappings on the same GPU. Freeing the attachments is not necessary if map_to_gpu fails. These will be cleaned up when the kdg_mem object is destroyed in amdgpu_amdkfd_gpuvm_free_memory_of_gpu. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:43:52 -04:00
Felix Kuehling	4e94272f8a	drm/amdgpu: Keep a bo-reference per-attachment For now they all reference the same BO. For correct DMA mappings they will refer to different BOs per-GPU. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:43:49 -04:00
Felix Kuehling	c780b2eedb	drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment This name is more fitting, especially for the changes coming next to support multi-GPU systems with proper DMA mappings. Cleaned up the code and renamed some related functions and variables to improve readability. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:43:46 -04:00

... 37 38 39 40 41 ...

11170 Commits