The rlc version of raven_kicer_rlc is different from the legacy rlc
version of raven_rlc. So it needs to add a judgement function for
raven_kicer_rlc and avoid disable GFXOFF when loading raven_kicer_rlc.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When NBIO's RAS error happens, before trigging GPU reset, it's needed
to record error counter information, which can correct the error counter
value missed issue when reading from debugfs.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Once sync flood interrupt is triggered by RAS error, before
actual GPU recovery job, it's necessary to log on and print
non-zero error counter, this will help user knows where the
RAS error source is from quickly.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The rlc version of raven_kicer_rlc is different from the legacy rlc
version of raven_rlc. So it needs to add a judgement function for
raven_kicer_rlc and avoid disable GFXOFF when loading raven_kicer_rlc.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lyude needs some patches in 5.6-rc2 and we didn't bring drm-misc-next
forward yet, so it looks like a good occasion.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
there was a type in the terminate command.
We should be calling psp_dtm_unload() instead of psp_hdcp_unload()
Fixes: 143f230533 ("drm/amdgpu: psp DTM init")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The copy_to_user() function returns the number of bytes remaining to be
copied, but we want to return a negative error code to the user.
Fixes: 030d5b97a5 ("drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_vram_read")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we readback all ones. Fixes rlc counter
readback while gfxoff is active.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we readback all ones. Fixes rlc counter
readback while gfxoff is active.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's 25 Mhz (refclk / 4). This fixes the interpretation
of the rlc clock counter.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
there was a type in the terminate command.
We should be calling psp_dtm_unload() instead of psp_hdcp_unload()
Fixes: 143f230533 ("drm/amdgpu: psp DTM init")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Seems to work reliably on VI+ except for a few so enable runpm barring
those where baco for runtime power management is not supported.
[rajneesh] Picked https://patchwork.freedesktop.org/patch/335402/ to
enable runtime pm with baco for kfd. Also fixed a checkpatch warning and
added extra checks for VEGA20 and ARCTURUS.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
So far the kfd driver implemented same routines for runtime and system
wide suspend and resume (s2idle or mem). During system wide suspend the
kfd aquires an atomic lock that prevents any more user processes to
create queues and interact with kfd driver and amd gpu. This mechanism
created problem when amdgpu device is runtime suspended with BACO
enabled. Any application that relies on kfd driver fails to load because
the driver reports a locked kfd device since gpu is runtime suspended.
However, in an ideal case, when gpu is runtime suspended the kfd driver
should be able to:
- auto resume amdgpu driver whenever a client requests compute service
- prevent runtime suspend for amdgpu while kfd is in use
This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.
Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix warning during switching to dpg pause mode for
VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 16
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GDS clear workaround will cause gfx failure in suspend/resume case.
[ 98.679559] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <gfx_v9_0> failed -110
[ 98.679561] PM: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[ 98.679562] PM: Device 0000:03:00.0 failed to resume async: error -110
As this workaround is specific to the HW bug of GDS's ECC error
existing in cold boot up, so bypass this workaround in suspend/
resume case after booting up.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
hwc->conf was designated specifically for AMD APU IOMMU purposes. This
could cause problems in performance and/or function since APU IOMMU
implementation is elsewhere. Also hwc->conf and hwc->config are
different members of an anonymous union so hwc->conf aliases as
hw->last_tag.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support pause_state for multiple instance, and it will fix vcn2.5 DPG mode
power off issue on instance 1.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix warning during switching to dpg pause mode for
VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 16
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GDS clear workaround will cause gfx failure in suspend/resume case.
[ 98.679559] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <gfx_v9_0> failed -110
[ 98.679561] PM: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[ 98.679562] PM: Device 0000:03:00.0 failed to resume async: error -110
As this workaround is specific to the HW bug of GDS's ECC error
existing in cold boot up, so bypass this workaround in suspend/
resume case after booting up.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull drm fixes from Dave Airlie:
"Just some fixes for this merge window: the tegra changes fix some
regressions in the merge, nouveau has a few modesetting fixes.
The amdgpu fixes are bit bigger, but they contain a couple of weeks of
fixes, and don't seem to contain anything that isn't really a fix.
Summary:
tegra:
- merge window regression fixes
nouveau:
- couple of volta/turing modesetting fixes
amdgpu:
- EDC fixes for Arcturus
- GDDR6 memory training fixe
- Fix for reading gfx clockgating registers while in GFXOFF state
- i2c freq fixes
- Misc display fixes
- TLB invalidation fix when using semaphores
- VCN 2.5 instancing fixes
- Switch raven1 gfxoff to a blacklist
- Coreboot workaround for KV/KB
- Root cause dongle fixes for display and revert workaround
- Enable GPU reset for renoir and navi
- Navi overclocking fixes
- Fix up confusing warnings in display clock validation on raven
amdkfd:
- SDMA fix
radeon:
- Misc LUT fixes"
* tag 'drm-next-2020-02-07' of git://anongit.freedesktop.org/drm/drm: (90 commits)
gpu: host1x: Set DMA direction only for DMA-mapped buffer objects
drm/tegra: Reuse IOVA mapping where possible
drm/tegra: Relax IOMMU usage criteria on old Tegra
drm/amd/dm/mst: Ignore payload update failures
drm/amdgpu: update default voltage for boot od table for navi1x
drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage
drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency
drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)
drm/amdgpu: fetch default VDDC curve voltages (v2)
drm/amdgpu/smu_v11_0: Correct behavior of restoring default tables (v2)
drm/amdgpu/navi10: add OD_RANGE for navi overclocking
drm/amdgpu/navi: fix index for OD MCLK
drm/amd/display: Fix HW/SW state mismatch
drm/amd/display: Fix a typo when computing dsc configuration
drm/amd/powerplay: fix navi10 system intermittent reboot issue V2
drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
drm/amd/display: Only enable cursor on pipes that need it
drm/nouveau/kms/gv100-: avoid sending a core update until the first modeset
drm/nouveau/kms/gv100-: move window ownership setup into modesetting path
drm/nouveau/disp/gv100-: halt NV_PDISP_FE_RM_INTR_STAT_CTRL_DISP_ERROR storms
...
hwc->conf was designated specifically for AMD APU IOMMU purposes. This
could cause problems in performance and/or function since APU IOMMU
implementation is elsewhere. Also hwc->conf and hwc->config are
different members of an anonymous union so hwc->conf aliases as
hw->last_tag.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support pause_state for multiple instance, and it will fix vcn2.5 DPG mode
power off issue on instance 1.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For sriov and pp_onevf_mode, do not send message to set smu
status, because smu doesn't support these messages under VF.
Besides, it should skip smu_suspend when pp_onevf_mode is disabled.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For sriov, psp ip block has to be initialized before
ih block for the dynamic register programming interface
that needed for vf ih ring buffer. On the other hand,
current psp ip block hw_init function will initialize
xgmi session which actaully depends on interrupt to
return session context. This results an empty xgmi ta
session id and later failures on all the xgmi ta cmd
invoked from vf. xgmi ta session initialization has to
be done after ih ip block hw_init call.
to unify xgmi session init/fini for both bare-metal
sriov virtualization use scenario, move xgmi ta init
to xgmi_add_device call, and accordingly terminate xgmi
ta session in xgmi_remove_device call.
The existing suspend/resume sequence will not be changed.
v2: squash in return fix from Nirmoy
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If provided we only sync to the BOs reservation
object and no longer to the root PD.
v2: update comment, cleanup amdgpu_bo_sync_wait_resv
v3: use correct reservation object while clearing
v4: fix typo in amdgpu_bo_sync_wait_resv
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For unlocked page table updates we need to be able
to sync to fences of a specific VM.
v2: use SYNC_ALWAYS in the UVD code
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clang warns:
../drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c:967:35: warning: implicit
conversion from enumeration type 'enum amdgpu_ras_block' to different
enumeration type 'enum ta_ras_block' [-Wenum-conversion]
block_info.block_id = info->head.block;
~ ~~~~~~~~~~~^~~~~
1 warning generated.
Use the function added in commit 828cfa2909 ("drm/amdgpu: Fix amdgpu
ras to ta enums conversion") that handles this conversion explicitly.
Fixes: 4c461d89db ("drm/amdgpu: add RAS support for the gfx block of Arcturus")
Link: https://github.com/ClangBuiltLinux/linux/issues/849
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>