The GPU reset function of raven2 is not maintained or tested, so it should be
very unstable.
Now the amdgpu_asic_reset function is added to amdgpu_pmops_suspend, which
causes the S3 test of raven2 to fail, so the asic_reset of raven2 is ignored
here.
Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Chen Gong <curry.gong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clang warns:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_packet_manager_v9.c:267:3:
error: implicit conversion from enumeration type 'enum
mes_map_queues_extended_engine_sel_enum' to different enumeration type
'enum mes_unmap_queues_extended_engine_sel_enum'
[-Werror,-Wenum-conversion]
extended_engine_sel__mes_map_queues__sdma0_to_7_sel :
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Use 'extended_engine_sel__mes_unmap_queues__sdma0_to_7_sel' to eliminate
the warning, which is the same numeric value of the proper type.
Fixes: 009e9a1585 ("drm/amdkfd: navi2x requires extended engines to map and unmap sdma queues")
Link: https://github.com/ClangBuiltLinux/linux/issues/1596
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clang build fails with
amdgpu_ras.c:2416:7: error: variable 'ras_obj' is used uninitialized
whenever 'if' condition is true
if (adev->in_suspend || amdgpu_in_reset(adev)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
amdgpu_ras.c:2453:6: note: uninitialized use occurs here
if (ras_obj->ras_cb)
^~~~~~~
There is a logic error in the error handler's labels.
ex/ The sysfs: is the last goto label in the normal code but
is the middle of error handler. Rework the error handler.
cleanup: is the first error, so it's handler should be last.
interrupt: is the second error, it's handler is next. interrupt:
handles the failure of amdgpu_ras_interrupt_add_hander() by
calling amdgpu_ras_interrupt_remove_handler(). This is wrong,
remove() assumes the interrupt has been setup, not torn down by
add(). Change the goto label to cleanup.
sysfs is the last error, it's handler should be first. sysfs:
handles the failure of amdgpu_ras_sysfs_create() by calling
amdgpu_ras_sysfs_remove(). But when the create() fails there
is nothing added so there is nothing to remove. This error
handler is not needed. Remove the error handler and change
goto label to interrupt.
Fixes: b293e891b0 ("drm/amdgpu: add helper function to do common ras_late_init/fini (v3)")
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Needed by umr to detect if ip discovered ASIC is an APU or not.
(v2): Remove asic type from packet it's not strictly needed
(v3): Correct comment
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The `program_aspm` callback is already guarded for aspm, but the
`enable_aspm` callback doesn't follow the module parameter.
Update it to use the helper `amdgpu_device_should_use_aspm`.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evaluating `pcie_aspm_enabled` as part of driver probe has the implication
that if one PCIe bridge with an AMD GPU connected doesn't support ASPM
then none of them do. This is an invalid assumption as the PCIe core will
configure ASPM for individual PCIe bridges.
Create a new helper function that can be called by individual dGPUs to
react to the `amdgpu_aspm` module parameter without having negative results
for other dGPUs on the PCIe bus.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. Define amdgpu_ras_block_late_init_default in amdgpu_ras.c as
.ras_late_init common function, which is called when
.ras_late_init of ras block isn't initialized.
2. Remove the code of using amdgpu_ras_block_late_init to
initialize .ras_late_init in ras blocks.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. Move calling ras block instance members from module internal
function to the top calling xxx_ras_late_init.
2. Module internal function calls can only use parameter variables
of xxx_ras_late_init instead of ras block instance members.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Modify .ras_late_init function pointer parameter so that
it can remove redundant intermediate calls in some ras blocks.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- set DC version
- add construct/destroy dc clock management function
- register dcn interrupt handler
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>