1. For Navi12, CHIP_SIENNA_CICHLID, skip tmr load operation;
2. Check pointer before release firmware.
v2: use CHIP_SIENNA_CICHLID instead
v3: remove local "bool ret"; fix grammer issue
v4: use my name instead of "root"
v5: fix grammer issue and indent issue
Signed-off-by: Liu ChengZhe <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.
During GPU reset and resume, it is unsafe that other threads access GPU,
which maybe cause GPU reset failed. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protect GPU from being accessed by
external threads during recovery.
v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.
v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;
[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249] dump_stack+0x98/0xd5
[ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833] free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831] ksys_ioctl+0x98/0xb0
[ 1230.204004] __x64_sys_ioctl+0x1a/0x20
[ 1230.205174] do_syscall_64+0x5f/0x250
[ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe
2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.
v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset
v5:
1. Fix some style issues.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tukov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
TMR is required to be destoried with GFX_CMD_ID_DESTROY_TMR while the
system goes to suspend. Otherwise, PSP may return the failure state
(0xFFFF007) on Gfx-2-PSP command GFX_CMD_ID_SETUP_TMR after do multiple
times suspend/resume.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The release_firmware() function is NULL tolerant so we do not need
to check for NULL param before calling it.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guest VM issue the PSP clear_vf_fw command at 2 points:
1.On VF driver loading, after VF message PSP to setup rings,
the next command is “clear_vf_fw”
2.On VF driver unload before VF message to
destroy rings
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Ack-by: Monk.liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change memory training init and finit a common function, as it only have
software behavior do not relay on the IP version of PSP.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
get_hive_id/get_node_id/get_topology_info/set_topology_info
are common xgmi command supported by TA for all the ASICs
that support xgmi link. They should be implemented as common
helper functions to avoid duplicated code per IP generation
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
driver already had psp_firmware_header struture to
deal with different layout of sos ucode. the sos
micorcode initialization could be common one.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
vmr ring is dedicated for sriov vf (i.e.guest driver
in sriov), which is general communication interface
between driver and psp fw accross all ip version.
it is not correct to make it as ip specific callback.
it is even worse to check specific tOS version per IP
version (like psp_v11/v12).
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Added dedicated function to check if particular fw should be skipped from loading.
Added dedicated function for SMU FW loading via PSP
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace dev_warn() with dev_info() and note that they are
optional to avoid confusing users.
The RAS TAs only exist on server boards and the HDCP and DTM
TAs only exist on client boards. They are optional either way.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The buffer used when calling psp is a shared buffer. If we have multiple calls
at the same time we can overwrite the buffer.
[How]
Add mutex to guard the shared buffer.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As the VCN firmware will not use
vf vmr now. And new psp policy won't support set tmr
now.
For driver compatible issue, ignore the not support error.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
invoking an error injection successfully will cause an at_event intterrupt that
will occur before the invoke sequence can complete causing an invalid error
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
During GU reset PSP's sysfs was being wrongly reinitilized
during call to amdgpu_device_ip_late_init which was failing
with duplicate error.
Fix:
Move psp_sysfs_init to psp_sw_init to avoid this. Add guards
in sysfs file's read and write hook agains premature call
if PSP is not finished initialization.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Starts USBC PD FW download and reads back the latest FW version.
v2:
Move sysfs file creation to late init
Add locking around PSP calls to avoid concurrent access to PSP's C2P registers
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For those ASICs with DF Cstate management centralized to PMFW,
TMR setup should be performed between pmfw loading and other
non-psp firmwares loading.
V2: skip possible SMU firmware reloading
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1)
we shouldn't load PSP kdb and sys/sos for VF, they are
supposed to be handled by hypervisor
2)
ih reroute doesn't work on VF thus we should avoid calling
it, besides VF should not use those PSP register sets for PF
3)
shouldn't load SMU ucode under SRIOV, otherwise PSP would report
error
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
there was a type in the terminate command.
We should be calling psp_dtm_unload() instead of psp_hdcp_unload()
Fixes: 143f230533 ("drm/amdgpu: psp DTM init")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For sriov, psp ip block has to be initialized before
ih block for the dynamic register programming interface
that needed for vf ih ring buffer. On the other hand,
current psp ip block hw_init function will initialize
xgmi session which actaully depends on interrupt to
return session context. This results an empty xgmi ta
session id and later failures on all the xgmi ta cmd
invoked from vf. xgmi ta session initialization has to
be done after ih ip block hw_init call.
to unify xgmi session init/fini for both bare-metal
sriov virtualization use scenario, move xgmi ta init
to xgmi_add_device call, and accordingly terminate xgmi
ta session in xgmi_remove_device call.
The existing suspend/resume sequence will not be changed.
v2: squash in return fix from Nirmoy
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Before, initialization of smu ip block would be skipped
for sriov ASICs. But if there's only one VF being used,
guest driver should be able to dump some HW info such as
clks, temperature,etc.
To solve this, now after onevf mode is enabled, host
driver will notify guest. If it's onevf mode, guest will
do smu hw_init and skip some steps in normal smu hw_init
flow because host driver has already done it for smu.
With this fix, guest app can talk with smu and dump hw
information from smu.
v2: refine the logic for pm_enabled.Skip hw_init by not
changing pm_enabled.
v3: refine is_support_sw_smu and fix some indentation
issue.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>