linux

Author	SHA1	Message	Date
Matthew Auld	21856e1e34	drm/ttm: move ttm_tt_{add, clear}_mapping into amdgpu Now that setting page->index shouldn't be needed anymore, we are just left with setting page->mapping, and here it looks like amdgpu is the only user, where pointing the page->mapping at the dev_mapping is used to verify that the pages do indeed belong to the device, if userspace later tries to touch them. v2(Christian): - Drop the functions altogether and just inline modifying the page->mapping Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210927114114.152310-3-matthew.auld@intel.com Signed-off-by: Christian König <christian.koenig@amd.com>	2021-09-29 13:55:09 +02:00
Prike Liang	26db706a6d	drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix In the s2idle stress test sdma resume fail occasionally,in the failed case GPU is in the gfxoff state.This issue may introduce by firmware miss handle doorbell S/R and now temporary fix the issue by forcing exit gfxoff for sdma resume. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-09-28 14:40:27 -04:00
Simon Ser	98122e63a7	drm/amdgpu: check tiling flags when creating FB on GFX8- On GFX9+, format modifiers are always enabled and ensure the frame-buffers can be scanned out at ADDFB2 time. On GFX8-, format modifiers are not supported and no other check is performed. This means ADDFB2 IOCTLs will succeed even if the tiling isn't supported for scan-out, and will result in garbage displayed on screen [1]. Fix this by adding a check for tiling flags for GFX8 and older. The check is taken from radeonsi in Mesa (see how is_displayable is populated in gfx6_compute_surface). Changes in v2: use drm_WARN_ONCE instead of drm_WARN (Michel) [1]: https://github.com/swaywm/wlroots/issues/3185 Signed-off-by: Simon Ser <contact@emersion.fr> Acked-by: Michel Dänzer <mdaenzer@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Harry Wentland <hwentlan@amd.com> Cc: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-09-28 14:40:19 -04:00
Hawking Zhang	9f52c25f59	drm/amdgpu: correct initial cp_hqd_quantum for gfx9 didn't read the value of mmCP_HQD_QUANTUM from correct register offset Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-09-28 14:39:29 -04:00
Leslie Shi	66805763a9	drm/amdgpu: fix gart.bo pin_count leak gmc_v{9,10}_0_gart_disable() isn't called matched with correspoding gart_enbale function in SRIOV case. This will lead to gart.bo pin_count leak on driver unload. Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-28 14:38:16 -04:00
Hawking Zhang	e794747622	drm/amdgpu: correct initial cp_hqd_quantum for gfx9 didn't read the value of mmCP_HQD_QUANTUM from correct register offset Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-28 09:30:08 -04:00
Tao Zhou	f524dd54a7	drm/amdgpu: skip umc ras irq handling in poison mode (v2) In ras poison mode, umc uncorrectable error will be ignored until the corrupted data consumed by another ras module (such as gfx, sdma). v2: update the debug message and replace dev_warn with dev_info. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-28 09:30:07 -04:00
Tao Zhou	e43488493c	drm/amdgpu: set poison supported flag for RAS (v2) Add RAS poison supported flag and tell PSP RAS TA about the info. v2: rename poison mode to poison supported, we can also disable poison mode even we support it. print value of poison supported if ras feature enablement fails. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-28 09:30:07 -04:00
Tao Zhou	aaca8c3861	drm/amdgpu: add poison mode query for UMC Add ras poison mode query interface for UMC. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-28 09:30:06 -04:00
Tao Zhou	ca5c636dc6	drm/amdgpu: add poison mode query for DF (v2) Add ras poison mode query interface for DF. v2: replace RREG32_PCIE with RREG32_SOC15. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-28 09:30:06 -04:00
Candice Li	77ec28eac2	drm/amdgpu: Update PSP TA Invoke to use common TA context as input Updated invoke to use new common TA structure similarily to load/unload. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-28 09:30:06 -04:00
Leslie Shi	71cf9e72b3	drm/amdgpu: fix gart.bo pin_count leak gmc_v{9,10}_0_gart_disable() isn't called matched with correspoding gart_enbale function in SRIOV case. This will lead to gart.bo pin_count leak on driver unload. Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-28 09:30:05 -04:00
Dave Airlie	1e3944578b	Merge tag 'amd-drm-next-5.16-2021-09-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.16-2021-09-27: amdgpu: - RAS improvements - BACO fixes - Yellow Carp updates - Misc code cleanups - Initial DP 2.0 support - VCN priority handling - Cyan Skillfish updates - Rework IB handling for multimedia engine tests - Backlight fixes - DCN 3.1 power saving improvements - Runtime PM fixes - Modifier support for DCC image stores for gfx 10.3 - Hotplug fixes - Clean up stack related warnings in display code - DP alt mode fixes - Display rework for better handling FP code - Debugfs fixes amdkfd: - SVM fixes - DMA map fixes radeon: - AGP fix From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210927212653.4575-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2021-09-28 17:08:26 +10:00
Alex Deucher	2485e2753e	drm/amdgpu: make soc15_common_ip_funcs static It's not used outside of soc15.c Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 16:35:27 -04:00
Candice Li	9080a18fc5	drm/amdgpu: Remove all code paths under the EAGAIN path in RAS late init All code paths under the EAGAIN path in RAS late init are unused. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 16:35:13 -04:00
John Clements	73490d2658	drm/amdgpu: Consolidate RAS cmd warning messages Explicity post warning if cmd is issued against unsupported IP Update to latest RAS TA interface Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 16:35:04 -04:00
John Clements	640ae42efb	drm/amdgpu: Updated RAS infrastructure Update RAS infrastructure to support RAS query for MCA subblocks Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 16:34:43 -04:00
Guchun Chen	6effad8abe	drm/amdgpu: move amdgpu_virt_release_full_gpu to fini_early stage adev->rmmio is set to be NULL in amdgpu_device_unmap_mmio to prevent access after pci_remove, however, in SRIOV case, amdgpu_virt_release_full_gpu will still use adev->rmmio for access after amdgpu_device_unmap_mmio. The patch is to move such SRIOV calling earlier to fini_early stage. Fixes: `07775fc138` ("drm/amdgpu: Unmap all MMIO mappings") Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 16:34:28 -04:00
Andrey Grodzovsky	ebe86a57c8	drm/amdgpu: Fix resume failures when device is gone Problem: When device goes into suspend and unplugged during it then all HW programming during resume fails leading to a bad SW during pci remove handling which follows. Because device is first resumed and only later removed we cannot rely on drm_dev_enter/exit here. Fix: Use a flag we use for PCIe error recovery to avoid accessing registres. This allows to successfully complete pm resume sequence and finish pci remove. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:29 -04:00
Andrey Grodzovsky	c03509cbc0	drm/amdgpu: Fix MMIO access page fault Add more guards to MMIO access post device unbind/unplug Bug: https://bugs.archlinux.org/task/72092?project=1&order=dateopened&sort=desc&pagenum=1 Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:29 -04:00
Andrey Grodzovsky	d82e2c249c	drm/amdgpu: Fix crash on device remove/driver unload Crash: BUG: unable to handle page fault for address: 00000000000010e1 RIP: 0010:vega10_power_gate_vce+0x26/0x50 [amdgpu] Call Trace: pp_set_powergating_by_smu+0x16a/0x2b0 [amdgpu] amdgpu_dpm_set_powergating_by_smu+0x92/0xf0 [amdgpu] amdgpu_dpm_enable_vce+0x2e/0xc0 [amdgpu] vce_v4_0_hw_fini+0x95/0xa0 [amdgpu] amdgpu_device_fini_hw+0x232/0x30d [amdgpu] amdgpu_driver_unload_kms+0x5c/0x80 [amdgpu] amdgpu_pci_remove+0x27/0x40 [amdgpu] pci_device_remove+0x3e/0xb0 device_release_driver_internal+0x103/0x1d0 device_release_driver+0x12/0x20 pci_stop_bus_device+0x79/0xa0 pci_stop_and_remove_bus_device_locked+0x1b/0x30 remove_store+0x7b/0x90 dev_attr_store+0x17/0x30 sysfs_kf_write+0x4b/0x60 kernfs_fop_write_iter+0x151/0x1e0 Why: VCE/UVD had dependency on SMC block for their suspend but SMC block is the first to do HW fini due to some constraints How: Since the original patch was dealing with suspend issues move the SMC block dependency back into suspend hooks as was done in V1 of the original patches. Keep flushing idle work both in suspend and HW fini seuqnces since it's essential in both cases. Fixes: `859e465927` ("drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend") Fixes: `bf756fb833` ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend") Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:29 -04:00
xinhui pan	0a2267809f	drm/amdgpu: Fix uvd ib test timeout when use pre-allocated BO Now we use same BO for create/destroy msg. So destroy will wait for the fence returned from create to be signaled. The default timeout value in destroy is 10ms which is too short. Lets wait both fences with the specific timeout. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:29 -04:00
xinhui pan	b2fe31cf64	drm/amdgpu: Put drm_dev_enter/exit outside hot codepath We hit soft hang while doing memory pressure test on one numa system. After a qucik look, this is because kfd invalid/valid userptr memory frequently with process_info lock hold. Looks like update page table mapping use too much cpu time. perf top says below, 75.81% [kernel] [k] __srcu_read_unlock 6.19% [amdgpu] [k] amdgpu_gmc_set_pte_pde 3.56% [kernel] [k] __srcu_read_lock 2.20% [amdgpu] [k] amdgpu_vm_cpu_update 2.20% [kernel] [k] __sg_page_iter_dma_next 2.15% [drm] [k] drm_dev_enter 1.70% [drm] [k] drm_prime_sg_to_dma_addr_array 1.18% [kernel] [k] __sg_alloc_table_from_pages 1.09% [drm] [k] drm_dev_exit So move drm_dev_enter/exit outside gmc code, instead let caller do it. They are gart_unbind, gart_map, vm_clear_bo, vm_update_pdes and gmc_init_pdb0. vm_bo_update_mapping already calls it. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:29 -04:00
John Clements	226f4f5a6b	drm/amdgpu: Resolve nBIF RAS error harvesting bug Set correct RAS nBIF error query register offsets on aldebaran Signed-off-by: John Clements <john.clements@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:28 -04:00
Candice Li	17c6805a00	drm/amdgpu: Update PSP TA unload function Update PSP TA unload function to use PSP TA context as input argument. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:28 -04:00
Candice Li	3f83f17b73	drm/amdgpu: Conform ASD header/loading to generic TA systems Update asd_context structure and add asd_initialize function to conform ASD header/loading to generic TA systems. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:28 -04:00
Paul Menzel	31ea43442d	drm/amdgpu: Demote TMZ unsupported log message from warning to info As the user cannot do anything about the unsupported Trusted Memory Zone (TMZ) feature, do not warn about it, but make it informational, so demote the log level from warning to info. Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:28 -04:00
Michel Dänzer	6cd1f9b40a	drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count This was unusual; normally, inline functions are declared static as well, and defined in a header file if used by multiple compilation units. The latter would be more involved in this case, so just drop the inline declaration for now. Fixes compile failure building for ppc64le on RHEL 8: In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32, from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33: ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’: ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available 90 \| inline uint32_t amdgpu_ras_eeprom_max_record_count(void); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here 1985 \| max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count(); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `c84d46707e` "drm/amdgpu: validate bad page threshold in ras(v3)" Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-23 15:17:28 -04:00
Dave Airlie	0dfc70818a	Merge tag 'drm-misc-next-2021-09-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for $kernel-version: UAPI Changes: Cross-subsystem Changes: - dma-buf: Avoid a warning with some allocations, Remove DMA_FENCE_TRACE macros Core Changes: - bridge: New helper to git rid of panels in drivers - fence: Improve dma_fence_add_callback documentation, Improve dma_fence_ops->wait documentation - ioctl: Unexport drm_ioctl_permit - lease: Documentation improvements - fourcc: Add new macro to determine the modifier vendor - quirks: Add the Steam Deck, Chuwi HiBook, Chuwi Hi10 Pro, Samsung Galaxy Book 10.6, KD Kurio Smart C15200 2-in-1, Lenovo Ideapad D330 - resv: Improve the documentation - shmem-helpers: Allocate WC pages on x86, Switch to vmf_insert_pfn - sched: Fix for a timer being canceled too soon, Avoid null pointer derefence if the fence is null in drm_sched_fence_free, Convert drivers to rely on its dependency tracking - ttm: Switch to kerneldoc, new helper to clear all DMA mappings, pool shrinker optitimization, Remove ttm_tt_destroy_common, Fix for unbinding on multiple drivers Driver Changes: - bochs: New PCI IDs - msm: Fence ordering impromevemnts - stm: Add layer alpha support, zpos - v3d: Fix for a Vulkan CTS failure - vc4: Conversion to the new bridge helpers - vgem: Use shmem helpers - virtio: Support mapping exported vram - zte: Remove obsolete driver - bridge: Probe improvements for it66121, enable DSI EOTP for anx7625, errors propagation improvements for anx7625 - panels: 60fps mode for otm8009a, New driver for Samsung S6D27A1 Signed-off-by: Dave Airlie <airlied@redhat.com> # gpg: Signature made Thu 16 Sep 2021 17:30:50 AEST # gpg: using EDDSA key 5C1337A45ECA9AEB89060E9EE3EF0D6F671851C5 # gpg: Can't check signature: No public key From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210916073132.ptbbmjetm7v3ufq3@gilmour	2021-09-22 15:30:40 +10:00
Paul Menzel	b287e49468	drm/amdgpu: Demote TMZ unsupported log message from warning to info As the user cannot do anything about the unsupported Trusted Memory Zone (TMZ) feature, do not warn about it, but make it informational, so demote the log level from warning to info. Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-16 09:56:24 -04:00
Michel Dänzer	114518ff3b	drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count This was unusual; normally, inline functions are declared static as well, and defined in a header file if used by multiple compilation units. The latter would be more involved in this case, so just drop the inline declaration for now. Fixes compile failure building for ppc64le on RHEL 8: In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32, from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33: ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’: ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available 90 \| inline uint32_t amdgpu_ras_eeprom_max_record_count(void); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here 1985 \| max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count(); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `c84d46707e` "drm/amdgpu: validate bad page threshold in ras(v3)" Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-16 09:56:24 -04:00
James Zhu	f02abeb077	drm/amdgpu: move iommu_resume before ip init/resume Separate iommu_resume from kfd_resume, and move it before other amdgpu ip init/resume. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-09-16 09:56:24 -04:00
James Zhu	8066008482	drm/amdgpu: add amdgpu_amdkfd_resume_iommu Add amdgpu_amdkfd_resume_iommu for amdgpu. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-09-16 09:56:24 -04:00
James Zhu	fefc01f042	drm/amdkfd: separate kfd_iommu_resume from kfd_resume Separate kfd_iommu_resume from kfd_resume for fine-tuning of amdgpu device init/resume/reset/recovery sequence. v2: squash in fix for !CONFIG_HSA_AMD Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-09-16 09:56:24 -04:00
Nirmoy Das	b04ce53eac	drm/amdgpu: use IS_ERR for debugfs APIs debugfs APIs returns encoded error so use IS_ERR for checking return value. v2: return PTR_ERR(ent) References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686 Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-By: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-09-14 16:21:15 -04:00
Christian König	c92db8d64f	drm/amdgpu: fix use after free during BO move The memory backing old_mem is already freed at that point, move the check a bit more up. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `bfa3357ef9` ("drm/ttm: allocate resource object instead of embedding it v2") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1699 Acked-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-09-14 16:17:39 -04:00
Ernst Sjöstrand	67a44e6598	drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10 Seems like newer cards can have even more instances now. Found by UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29 index 8 is out of range for type 'uint32_t *[8]' Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697 Cc: stable@vger.kernel.org Signed-off-by: Ernst Sjöstrand <ernstp@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 16:14:41 -04:00
xinhui pan	0fcfb30019	drm/amdgpu: Fix a race of IB test Direct IB submission should be exclusive. So use write lock. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:59:58 -04:00
xinhui pan	405a81ae3f	drm/amdgpu: VCN avoid memory allocation during IB test alloc extra msg from direct IB pool. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:59:58 -04:00
xinhui pan	cb9038aa8a	drm/amdgpu: VCE avoid memory allocation during IB test alloc extra msg from direct IB pool. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:59:58 -04:00
xinhui pan	68331d7cf3	drm/amdgpu: UVD avoid memory allocation during IB test move BO allocation in sw_init. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:59:58 -04:00
Candice Li	de3a1e3360	drm/amdgpu: Unify PSP TA context Remove all TA binary structures and add the specific binary structure in struct ta_context. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:59:58 -04:00
James Zhu	9cec53c18a	drm/amdgpu: move iommu_resume before ip init/resume Separate iommu_resume from kfd_resume, and move it before other amdgpu ip init/resume. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:59:58 -04:00
James Zhu	ea20e246f3	drm/amdgpu: add amdgpu_amdkfd_resume_iommu Add amdgpu_amdkfd_resume_iommu for amdgpu. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:59:58 -04:00
James Zhu	f8846323d5	drm/amdkfd: separate kfd_iommu_resume from kfd_resume Separate kfd_iommu_resume from kfd_resume for fine-tuning of amdgpu device init/resume/reset/recovery sequence. v2: squash in fix for !CONFIG_HSA_AMD Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:59:46 -04:00
shaoyunl	8e6d0b6996	drm/amdgpu: Get atomicOps info from Host for sriov setup The AtomicOp Requester Enable bit is reserved in VFs and the PF value applies to all associated VFs. so guest driver can not directly enable the atomicOps for VF, it depends on PF to enable it. In current design, amdgpu driver will get the enabled atomicOps bits through private pf2vf data Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:57:11 -04:00
xinhui pan	a7496559e4	drm/amdgpu: Increase direct IB pool size Direct IB pool is used for vce/vcn IB extra msg too. Increase its size to AMDGPU_IB_POOL_SIZE. v2: Squash in unused variable removal Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:56:49 -04:00
John Clements	3771449bc8	drm/amdgpu: Update RAS trigger error block support Added trigger error support for MP0/MP1/MPIO blocks Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:37:48 -04:00
John Clements	334f81d164	drm/amdgpu: Update RAS status print Remove uncessary RAS status prints Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:37:41 -04:00
Likun Gao	02f958a20c	drm/amdgpu: refactor function to init no-psp fw Refactor the code of amdgpu_ucode_init_single_fw to make it more readable as too many ucode need to handle on this function currently. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-09-14 15:37:34 -04:00

... 26 27 28 29 30 ...

11170 Commits