Commit Graph

11358 Commits

Author SHA1 Message Date
Hamza Mahfooz
17d819e282 Revert "drm/amdgpu: use dirty framebuffer helper"
This reverts commit 66f99628eb.

Unfortunately, that commit causes performance regressions on non-PSR
setups. So, just revert it until FB_DAMAGE_CLIPS support can be added.

Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2189
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216554
Fixes: 66f99628eb ("drm/amdgpu: use dirty framebuffer helper")
Fixes: abbc7a3daf ("drm/amdgpu: don't register a dirty callback for non-atomic")
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:08:27 -04:00
Philip Yang
2302d50714 drm/amdgpu: Correct amdgpu_amdkfd_total_mem_size calculation
amdkfd_total_mem_size is the size of total GPUs vram plus system memory
to estimate page tables memory usage and leave enough VRAM room for page
tables allocation.

Calculating amdkfd_total_mem_size in amdgpu_amdkfd_device_probe is
incorrect because adev->gmc.real_vram_size is still 0 when it is called
from amdgpu_device_ip_early_init. Move the calculation
to amdgpu_amdkfd_device_init to get the correct VRAM size.

Do reverse calculation in amdgpu_amdkfd_device_fini_sw to support
hot-unplugging GPUs.
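
The add-on-init, subtract-on-fini accounting described above can be sketched as follows. This is a minimal illustration only: `fake_gpu`, `total_mem_size`, `gpu_init_account` and `gpu_fini_account` are hypothetical names, not the driver's real types or functions.

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the driver's real types or names. */
struct fake_gpu {
	uint64_t vram_size;	/* only valid once memory init has run */
};

/* Running total across all GPUs currently present. */
static uint64_t total_mem_size;

/* Called after the VRAM size is known: add this GPU's share. */
static void gpu_init_account(const struct fake_gpu *gpu)
{
	total_mem_size += gpu->vram_size;
}

/* Called on teardown or hot-unplug: undo exactly what init added. */
static void gpu_fini_account(const struct fake_gpu *gpu)
{
	total_mem_size -= gpu->vram_size;
}
```

Doing the addition too early, while the size still reads 0, would leave the total short by that GPU's VRAM, which is the problem the message describes.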

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:08:18 -04:00
Philip Yang
9a3c6067bd drm/amdgpu: Set vmbo destroy after pt bo is created
Under VRAM usage pressure, mapping to the GPU may fail to create the pt bo,
leaving vmbo->shadow_list uninitialized. ttm_bo_release then calls
amdgpu_bo_vm_destroy, which accesses vmbo->shadow_list and generates the
dmesg and NULL pointer access backtrace below:

Set vmbo destroy callback to amdgpu_bo_vm_destroy only after creating pt
bo successfully, otherwise use default callback amdgpu_bo_destroy.

amdgpu: amdgpu_vm_bo_update failed
amdgpu: update_gpuvm_pte() failed
amdgpu: Failed to map bo to gpuvm
amdgpu 0000:43:00.0: amdgpu: Failed to map peer:0000:43:00.0 mem_domain:2
BUG: kernel NULL pointer dereference, address:
 RIP: 0010:amdgpu_bo_vm_destroy+0x4d/0x80 [amdgpu]
 Call Trace:
  <TASK>
  ttm_bo_release+0x207/0x320 [amdttm]
  amdttm_bo_init_reserved+0x1d6/0x210 [amdttm]
  amdgpu_bo_create+0x1ba/0x520 [amdgpu]
  amdgpu_bo_create_vm+0x3a/0x80 [amdgpu]
  amdgpu_vm_pt_create+0xde/0x270 [amdgpu]
  amdgpu_vm_ptes_update+0x63b/0x710 [amdgpu]
  amdgpu_vm_update_range+0x2e7/0x6e0 [amdgpu]
  amdgpu_vm_bo_update+0x2bd/0x600 [amdgpu]
  update_gpuvm_pte+0x160/0x420 [amdgpu]
  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x313/0x1130 [amdgpu]
  kfd_ioctl_map_memory_to_gpu+0x115/0x390 [amdgpu]
  kfd_ioctl+0x24a/0x5b0 [amdgpu]
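
The ordering that the fix relies on (install the specialised destroy callback only after setup has succeeded) can be sketched as below. All names here (`fake_bo`, `fake_bo_destroy`, `fake_bo_vm_destroy`, `fake_bo_create_vm`, `last_destroy`) are hypothetical and only illustrate the idea; this is not the driver's code.

```c
#include <stddef.h>

struct fake_bo;
typedef void (*fake_destroy_fn)(struct fake_bo *bo);

/* Hypothetical buffer object with a pluggable destroy callback. */
struct fake_bo {
	fake_destroy_fn destroy;
	int *shadow_list;	/* only valid after full setup */
};

/* Records which callback ran: 0 = none, 1 = default, 2 = vm-specific. */
static int last_destroy;

/* Default callback: touches nothing that needs full setup. */
static void fake_bo_destroy(struct fake_bo *bo)
{
	(void)bo;
	last_destroy = 1;
}

/* Specialised callback: dereferences shadow_list, so it must not run early. */
static void fake_bo_vm_destroy(struct fake_bo *bo)
{
	*bo->shadow_list = 0;
	last_destroy = 2;
}

static int fake_bo_create_vm(struct fake_bo *bo, int *shadow_list, int create_fails)
{
	bo->destroy = fake_bo_destroy;	/* safe default while creating */
	bo->shadow_list = NULL;

	if (create_fails) {
		bo->destroy(bo);	/* release path runs the default callback */
		return -1;
	}

	bo->shadow_list = shadow_list;
	bo->destroy = fake_bo_vm_destroy;	/* switch only after success */
	return 0;
}
```

If the specialised callback were installed before the failure point, the release path would dereference the NULL `shadow_list`, which corresponds to the crash in the trace.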

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:08:09 -04:00
Arunpravin Paneer Selvam
312b4dc11d drm/amdgpu: Fix VRAM BO swap issue
The DRM buddy manager allocates contiguous memory requests in
a single block or in multiple blocks. So for the ttm move operation
(in case of low VRAM) we should consider all the blocks when
computing the total memory size, which is compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.
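
One way to express such a check is sketched below, assuming each block is described by a start page and a page count. `fake_block` and `blocks_are_contiguous` are hypothetical names, and the adjacency test is an assumption added for illustration; the patch itself may structure the check differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one allocated block. */
struct fake_block {
	uint64_t start_page;
	uint64_t num_pages;
};

/*
 * Walk every block, not just the first one: each block must begin where
 * the previous one ended, and the summed size must match the expected
 * page count of the resource.
 */
static bool blocks_are_contiguous(const struct fake_block *blocks,
				  size_t count, uint64_t expected_pages)
{
	uint64_t total = 0;
	uint64_t end = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (i > 0 && blocks[i].start_page != end)
			return false;
		end = blocks[i].start_page + blocks[i].num_pages;
		total += blocks[i].num_pages;
	}

	return total == expected_pages;
}
```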

v2: Added a Fixes tag
v3: Rewrite the code to save a bit of calculations and
    variables (Christian)

Fixes: c9cad937c0 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:07:37 -04:00
Ruili Ji
21a550de5f drm/amdgpu: Enable F32_WPTR_POLL_ENABLE in mqd
This patch is to fix the SDMA user queue doorbell missing issue on
SDMA 6.0. F32_WPTR_POLL_ENABLE has to be set if doorbell mode is
used. Otherwise ringing SDMA user queue doorbell can't wake up
system from gfxoff.

Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-10-06 12:05:44 -04:00
Yang Yingliang
525530ad9a drm/amdgpu/sdma: add missing release_firmware() in amdgpu_sdma_init_microcode()
In some error paths in amdgpu_sdma_init_microcode(), release_firmware() is
not called, so the memory allocated in request_firmware() is leaked.
Call amdgpu_sdma_destroy_inst_ctx(), which calls release_firmware(), to
avoid the memory leak.
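
The general pattern (release what was acquired on every error path) can be sketched in plain C as below. The `fake_*` functions and `live_fw_count` are hypothetical stand-ins that use malloc/free; they are not the kernel firmware API.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical firmware blob; "valid" models a header check. */
struct fake_fw {
	int valid;
};

/* Number of blobs currently allocated, to make leaks observable. */
static int live_fw_count;

static struct fake_fw *fake_request_firmware(int valid)
{
	struct fake_fw *fw = malloc(sizeof(*fw));

	if (!fw)
		return NULL;
	fw->valid = valid;
	live_fw_count++;
	return fw;
}

static void fake_release_firmware(struct fake_fw *fw)
{
	if (!fw)
		return;
	free(fw);
	live_fw_count--;
}

/* On success the caller owns *out; on failure nothing stays allocated. */
static int fake_init_microcode(int fw_is_valid, struct fake_fw **out)
{
	struct fake_fw *fw = fake_request_firmware(fw_is_valid);

	if (!fw)
		return -ENOMEM;

	if (!fw->valid) {
		fake_release_firmware(fw);	/* the step a leaky path forgets */
		return -EINVAL;
	}

	*out = fw;
	return 0;
}
```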

Fixes: 15aa13056d ("drm/amdgpu: add function to init SDMA microcode")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:05:01 -04:00
Sonny Jiang
e626d9b9c6 drm/amdgpu: Enable VCN PG on GC11_0_1
Enable VCN PG on GC11_0_1

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-10-06 12:02:49 -04:00
Dave Airlie
65898687cf Merge tag 'amd-drm-next-6.1-2022-09-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.1-2022-09-30:

amdgpu:
- RLC FW code cleanup
- RLC fixes for GC 11.x
- SMU 13.x fixes
- CP FW code cleanup
- SDMA FW code cleanup
- GC 11.x fixes
- DCN 3.2.x fixes
- DCN 3.1.4 fixes
- Misc fixes
- RAS fixes
- SR-IOV fixes
- VCN 4.x fixes

amdkfd:
- GC 11.x fixes
- Xnack fixes
- UBSAN warning fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220930162012.5823-1-alexander.deucher@amd.com
2022-10-04 09:42:24 +10:00
Sonny Jiang
730548ba02 drm/amdgpu: Enable sram on vcn_4_0_2
Enable sram on vcn_4_0_2

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 11:21:02 -04:00
Sonny Jiang
0b37f47494 drm/amdgpu: Enable VCN DPG for GC11_0_1
Enable VCN DPG on GC11_0_1

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 11:20:43 -04:00
Le Ma
a79852a393 drm/amdgpu: correct the memcpy size for ip discovery firmware
Use fw->size instead of discovery_tmr_size for fallback path.

Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:44:02 -04:00
Vignesh Chander
f61a825aa8 drm/amdgpu: Skip put_reset_domain if it doesn't exist
For xgmi sriov, the reset is handled by host driver and hive->reset_domain
is not initialized so need to check if it exists before doing a put.

Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:43:52 -04:00
Graham Sider
e67135571e drm/amdgpu: remove switch from amdgpu_gmc_noretry_set
Simplify the logic in amdgpu_gmc_noretry_set by getting rid of the
switch. Also set noretry=1 as default for GFX 10.3.0 and greater since
retry faults are not supported.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:43:41 -04:00
Leo Li
3ff4ccc3e9 drm/amdgpu: Fix mc_umc_status used uninitialized warning
On ChromeOS clang build, the following warning is seen:

/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:6: error: variable 'mc_umc_status' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
        if (mca_addr == UMC_INVALID_ADDR) {
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:485:21: note: uninitialized use occurs here
        if ((REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 &&
                           ^~~~~~~~~~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:1208:5: note: expanded from macro 'REG_GET_FIELD'
        (((value) & REG_FIELD_MASK(reg, field)) >> REG_FIELD_SHIFT(reg, field))
           ^~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:2: note: remove the 'if' if its condition is always true
        if (mca_addr == UMC_INVALID_ADDR) {
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:460:24: note: initialize the variable 'mc_umc_status' to silence this warning
        uint64_t mc_umc_status, mc_umc_addrt0;
                              ^
                               = 0
1 error generated.
make[5]: *** [/mnt/host/source/src/third_party/kernel/v5.15/scripts/Makefile.build:289: drivers/gpu/drm/amd/amdgpu/umc_v6_7.o] Error 1

Fix by initializing mc_umc_status = 0.
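
The shape of the warning can be reproduced in a small standalone sketch. The names below (`FAKE_INVALID_ADDR`, `fake_read_status_reg`, `fake_status_valid`) are hypothetical and the bit test is simplified; only the control flow mirrors the diagnostic quoted above.

```c
#include <stdint.h>

#define FAKE_INVALID_ADDR UINT64_MAX

/* Hypothetical register read; always reports bit 0 set. */
static uint64_t fake_read_status_reg(void)
{
	return 1;
}

static int fake_status_valid(uint64_t mca_addr)
{
	/*
	 * Without "= 0", status would be read uninitialized whenever the
	 * "if" below is false, which is what clang reports as
	 * -Wsometimes-uninitialized.
	 */
	uint64_t status = 0;

	if (mca_addr == FAKE_INVALID_ADDR)
		status = fake_read_status_reg();

	return (status & 1) == 1;
}
```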

Fixes: 1014bd1cb3 ("drm/amdgpu: support to convert dedicated umc mca address")
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:50 -04:00
Tao Zhou
5e1fdf76cf drm/amdgpu: add page retirement handling for CPU RAS
Do RAS page retirement in poison consumption handler unconditionally.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:36 -04:00
Tao Zhou
cd4c99f103 drm/amdgpu: use RAS error address convert api in mca notifier
Use the convert interface to simplify code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:31 -04:00
Tao Zhou
1014bd1cb3 drm/amdgpu: support to convert dedicated umc mca address
Update umc error address query interface, the mca address can be read
from register or input from parameter.

TODO: define a common address conversion function to simplify the code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:21 -04:00
Tao Zhou
c19a5f325a drm/amdgpu: export umc error address convert interface
Make it global so we can convert specific mca address.

v2: rename query_error_address_per_channel to
convert_ras_error_address

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:15 -04:00
Likun Gao
baf28cc10a drm/amdgpu: fix sdma v4 init microcode error
Fix the SDMA microcode init error for sdma v4, which was caused by a mistake
made when rearchitecting the sdma init microcode function (coding 4.2.2 to 4.2.0).

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:08 -04:00
Bokun Zhang
d7274ec723 drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV
- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor
  during the suspend time. Furthermore, we cannot request a
  mode 1 reset under SRIOV as VF. Therefore, we will skip it
  as it is called in suspend_noirq() function.

- In the resume code path, we need to send REQ_GPU_INIT to the
  hypervisor and also resume PSP IP block under SRIOV.

Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Likun Gao
2d89e2ddfd drm/amdgpu: fix compiler warning for amdgpu_gfx_cp_init_microcode
Change the type of parameter on amdgpu_gfx_cp_init_microcode to fix
compiler warning.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Hawking Zhang
940d4dd402 drm/amdgpu: add rlc_sr_cntl_list to firmware array
To allow uploading the list via psp

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Jiadong.Zhu
e7b8e90add drm/amdgpu: Remove fence_process in count_emitted
The function amdgpu_fence_count_emitted, used in the work handler, should not
call amdgpu_fence_process, which must be used in the irq handler.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Jiadong.Zhu
415be17fb2 drm/amdgpu: Correct the position in patch_cond_exec
The current position calculated in gfx_v9_0_ring_emit_patch_cond_exec
underflows when the wptr is divisible by ring->buf_mask + 1.
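
The arithmetic behind this underflow can be shown in isolation. The sketch below assumes the position is meant to be the slot just before wptr inside the ring, which the message does not spell out; `pos_mask_then_sub` and `pos_sub_then_mask` are hypothetical helper names.

```c
#include <stdint.h>

/* Mask first, then subtract: wraps to 0xFFFFFFFF when the masked wptr is 0. */
static uint32_t pos_mask_then_sub(uint64_t wptr, uint32_t buf_mask)
{
	return (uint32_t)(wptr & buf_mask) - 1;
}

/* Subtract first, then mask: the result always stays inside the ring. */
static uint32_t pos_sub_then_mask(uint64_t wptr, uint32_t buf_mask)
{
	return (uint32_t)((wptr - 1) & buf_mask);
}
```

With `buf_mask = 0xFF`, a wptr of 256 is divisible by `buf_mask + 1`: masking first gives 0 and the subtraction wraps, while subtracting first gives 255, the last slot of the ring. For a wptr of 257 both orderings agree on 0.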

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Graham Sider
3e9cf23428 drm/amdgpu: pass queue size and is_aql_queue to MES
Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB since it increases monotonically for AQL queues.
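
Why the queue size is needed can be seen from a small sketch: a write pointer that only ever increases has to be reduced modulo the queue size to find where it lands in the ring buffer. `fake_wptr_offset` is a hypothetical helper, and the units (entries, dwords or bytes) are left unspecified here because the message does not state them.

```c
#include <stdint.h>

/*
 * Map a monotonically increasing write pointer to an offset inside a
 * ring buffer of queue_size units. Returns 0 for an empty queue size
 * to avoid dividing by zero.
 */
static uint64_t fake_wptr_offset(uint64_t wptr, uint64_t queue_size)
{
	if (queue_size == 0)
		return 0;
	return wptr % queue_size;
}
```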

v2: Make is_aql_queue assign clearer

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:44 -04:00
David Belanger
585a82618b drm/amdgpu: Enable SA software trap.
Enables support for software trap for MES >= 4.
Adapted from implementation from Jay Cornwall.

v2: Add IP version check in conditions.
v3: Remove debugger code changes.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Ruijing Dong
167be85228 drm/amdgpu/vcn: update vcn4 fw shared data structure
update VF_RB_SETUP_FLAG, add SMU_DPM_INTERFACE_FLAG,
and corresponding change in VCN4.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
b077656b8c drm/amdgpu/sdma6: use common function to init sdma fw
Use common function to init sdma v6 firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
52642d13d6 drm/amdgpu: support sdma struct v2 fw init
Support SDMA firmware init on common function for sdma v2 struct.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
108db8decf drm/amdgpu/sdma5: use common function to init sdma fw
Use common function to init sdma v5 firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
a2d3b4b81f drm/amdgpu/sdma4: use common function to init sdma fw
Use common function to init sdma v4 firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
15aa13056d drm/amdgpu: add function to init SDMA microcode
Add a common function to init SDMA related microcode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
e268df1d20 drm/amdgpu/gfx11: use common function to init cp fw
Use common function to init gfx v11 CP firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
5993e4c68a drm/amdgpu/gfx10: use common function to init CP fw
Use common function to init gfx v10 CP firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
93cad722d3 drm/amdgpu/gfx9: use common function to init cp fw
Use common function to init gfx v9 CP firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao
ec71b25017 drm/amdgpu: add function to init CP microcode
Add a common function to init CP related microcode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Evan Quan
7436538899 drm/amdgpu: avoid gfx register accessing during gfxoff
Make sure gfxoff is disabled before gfx register accessing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Lijo Lazar
bb66ecbf12 drm/amdgpu: Use simplified API for p2p dist calc
Use the simplified API that calculates distance between two devices.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Lijo Lazar
d0fa84f174 drm/amdgpu: Disable verbose for p2p dist calc
Disable verbose while getting p2p distance. With verbose, it shows
warning if ACS redirect is set between the devices. Adds noise
to dmesg logs when a few GPU devices are on the same platform.

Example log:

amdgpu 0000:34:00.0: ACS redirect is set between the client and provider (0000:31:00.0)
amdgpu 0000:34:00.0: to disable ACS redirect for this path, add the kernel parameter:
	pci=disable_acs_redir=0000:30:00.0;0000:2e:00.0;0000:33:00.0;0000:2e:10.0

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
YiPeng Chai
642c040113 drm/amdgpu: Fixed ras warning when uninstalling amdgpu
For the asic using smu v13_0_2, there is the following
warning when uninstalling amdgpu:
  amdgpu: ras disable gfx failed poison:1 ret:-22.

[Why]:
  For the asic using smu v13_0_2, the psp .suspend and
  mode1reset is called before amdgpu_ras_pre_fini during
  amdgpu uninstall, it has disabled all ras features and
  reset the psp. Since the psp is reset, calling
  amdgpu_ras_disable_all_features in amdgpu_ras_pre_fini
  to disable ras features will fail.

[How]:
  If all ras features are disabled, amdgpu_ras_disable_all_features
  will not be called to disable all ras features again.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Hawking Zhang
7c32d4e37f drm/amdgpu/gfx11: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx11

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Hawking Zhang
39a35d52d4 drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx10

v2: squash in size validation fix (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Dave Airlie
e8573000f4 Merge tag 'amd-drm-next-6.1-2022-09-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.1-2022-09-23:

amdgpu:
- SDMA fix
- Add new firmware types to debugfs/IOCTL version queries
- Misc spelling and grammar fixes
- Misc code cleanups
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- CS cleanup
- Gang submit support
- Clang fixes
- Non-DC audio fix
- GPUVM locking fixes
- Vega10 PWM fan speed fix

amdkfd:
- MQD manager cleanup
- Misc spelling and grammar fixes

UAPI:
- Add new firmware types to the FW version query IOCTL

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220923215729.6061-1-alexander.deucher@amd.com
2022-09-28 14:56:09 +10:00
Dave Airlie
907cc346ff Merge tag 'drm-misc-next-2022-09-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 6.1:

UAPI Changes:

Cross-subsystem Changes:
  - dma-buf: Improve signaling when debugging

Core Changes:
  - Backlight handling improvements
  - format-helper: Add drm_fb_build_fourcc_list()
  - fourcc: Kunit tests improvements
  - modes: Add DRM_MODE_INIT() macro
  - plane: Remove drm_plane_init(), Allocate planes with drm_universal_plane_alloc()
  - plane-helper: Add drm_plane_helper_atomic_check()
  - probe-helper: Add drm_connector_helper_get_modes_fixed() and
    drm_crtc_helper_mode_valid_fixed()
  - tests: Conversion to parametrized tests, test name consistency

Driver Changes:
  - amdgpu: Fix for a VRAM eviction issue
  - ast: Resolution handling improvements
  - mediatek: small code improvements for DP
  - omap: Refcounting fix, small improvements
  - rockchip: RK3568 support, Gamma support for RK3399
  - sun4i: Build failure fix when !OF
  - udl: Multiple fixes here and there
  - vc4: HDMI hotplug handling improvements
  - vkms: Warning fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220923073943.d43tne5hni3iknlv@houat
2022-09-28 13:50:46 +10:00
Hawking Zhang
f6f8bb5989 drm/amdgpu/gfx9: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx9

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:02:39 -04:00
Hawking Zhang
5b41521268 drm/amdgpu: add helper to init rlc firmware
To initialize rlc firmware according to rlc
firmware header version

v2: squash in backwards compat fix

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:02:38 -04:00
Arunpravin Paneer Selvam
39dd0cc2e5 drm/amdgpu: Fix VRAM eviction issue
A user reported that starting a game (MTGA) with wine produced the
error message "failed to pin framebuffer with error -12". The cause is
an error in the if condition that makes the VRAM mem type eviction
decision. This patch fixes that condition.

Gitlab bug link:
https://gitlab.freedesktop.org/drm/amd/-/issues/2159

Fixes: ded910f368 ("drm/amdgpu: Implement intersect/compatible functions")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220922151447.265696-1-Arunpravin.PaneerSelvam@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2022-09-22 19:53:06 +02:00
Hawking Zhang
435d6e6f02 drm/amdgpu: add helper to init rlc fw in header v2_4
To initialize rlc firmware in header v2_4

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:27 -04:00
Hawking Zhang
a0d9084d7f drm/amdgpu: add helper to init rlc fw in header v2_3
To initialize rlc firmware in header v2_3

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:21 -04:00
Hawking Zhang
a97d0ec8bb drm/amdgpu: add helper to init rlc fw in header v2_2
To initialize rlc firmware in header v2_2

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:15 -04:00