Commit Graph

11170 Commits

Author SHA1 Message Date
Claudio Suarez
3c02193102 drm/amdgpu: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi
Once EDID is parsed, the monitor HDMI support information is available
through drm_display_info.is_hdmi. The amdgpu driver still calls
drm_detect_hdmi_monitor() to retrieve the same information, which
is less efficient. Change to drm_display_info.is_hdmi

This is a TODO task in Documentation/gpu/todo.rst

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Claudio Suarez <cssk@net-c.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07 13:13:07 -05:00
Claudio Suarez
20543be93c drm/amdgpu: update drm_display_info correctly when the edid is read
drm_display_info is updated by drm_get_edid() or
drm_connector_update_edid_property(). In the amdgpu driver it is almost
always updated when the edid is read in amdgpu_connector_get_edid(),
but not always.  Change amdgpu_connector_get_edid() and
amdgpu_connector_free_edid() to keep drm_display_info updated.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Claudio Suarez <cssk@net-c.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07 13:12:58 -05:00
Stanley.Yang
cf63b70272 drm/amdgpu: skip umc ras error count harvest
remove in recovery stat check, skip umc ras err cnt
harvest in amdgpu_ras_log_on_err_counter

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07 13:12:19 -05:00
Flora Cui
30c1e39197 drm/amdgpu: free vkms_output after use
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07 13:12:13 -05:00
Flora Cui
f7ed3f90b2 drm/amdgpu: drop the critial WARN_ON in amdgpu_vkms
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07 13:12:05 -05:00
Stanley.Yang
aed1faab9d drm/amdgpu: only skip get ecc info for aldebaran
skip get ecc info for aldebarn through check ip version
do not affect other asic type

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07 13:07:55 -05:00
Zhou Qingyang
b220110e4c drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()
In amdgpu_connector_lcd_native_mode(), the return value of
drm_mode_duplicate() is assigned to mode, and there is a dereference
of it in amdgpu_connector_lcd_native_mode(), which will lead to a NULL
pointer dereference on failure of drm_mode_duplicate().

Fix this bug add a check of mode.

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_DRM_AMDGPU=m show no new warnings, and
our static analyzer no longer warns about this code.

Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-02 12:43:31 -05:00
Alex Deucher
baf3f8f374 drm/amdgpu: handle SRIOV VCN revision parsing
For SR-IOV, the IP discovery revision number encodes
additional information.  Handle that case here.

v2: drop additional IP versions

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-02 12:43:25 -05:00
Stanley.Yang
bab73f092d drm/amdgpu: skip query ecc info in gpu recovery
this is a workaround due to get ecc info failed during gpu recovery

[  700.236122] amdgpu 0000:09:00.0: amdgpu: Failed to export SMU ecc table!
[  700.236128] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[  704.331171] amdgpu: qcm fence wait loop timeout expired
[  704.331194] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
[  704.332445] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[  704.332448] amdgpu 0000:09:00.0: amdgpu: Bailing on TDR for s_job:ffffffffffffffff, as another already in progress
[  704.332456] amdgpu: Pasid 0x8000 destroy queue 0 failed, ret -62
[  710.360924] amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000013 SMN_C2PMSG_82:0x00000007
[  710.360964] amdgpu 0000:09:00.0: amdgpu: Failed to disable smu features.
[  710.361002] amdgpu 0000:09:00.0: amdgpu: Fail to disable dpm features!
[  710.361014] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-02 12:43:06 -05:00
shaoyunl
428890a3fe drm/amdgpu: adjust the kfd reset sequence in reset sriov function
This change revert previous commits:
9f4f2c1a35 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
271fd38ce5 ("drm/amdgpu: move kfd post_reset out of reset_sriov function")

This change moves the amdgpu_amdkfd_pre_reset to an earlier place
in amdgpu_device_reset_sriov, presumably to address the sequence issue
that the first patch was originally meant to fix.

Some register access(GRBM_GFX_CNTL) only be allowed on full access
mode. Move kfd_pre_reset and  kfd_post_reset back inside reset_sriov
function.

Fixes: 9f4f2c1a35 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
Fixes: 271fd38ce5 ("drm/amdgpu: move kfd post_reset out of reset_sriov function")
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 17:09:30 -05:00
Philip Yang
494f2e42ce drm/amdkfd: fix double free mem structure
drm_gem_object_put calls release_notify callback to free the mem
structure and unreserve_mem_limit, move it down after the last access
of mem and make it conditional call.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 17:08:00 -05:00
Lijo Lazar
e0570f0b6e drm/amdgpu: Don't halt RLC on GFX suspend
On aldebaran, RLC also controls GFXCLK. Skip halting RLC during GFX IP suspend
and keep it running till PMFW disables all DPMs.

    [  578.019986] amdgpu 0000:23:00.0: amdgpu: GPU reset begin!
    [  583.245566] amdgpu 0000:23:00.0: amdgpu: Failed to disable smu features.
    [  583.245621] amdgpu 0000:23:00.0: amdgpu: Fail to disable dpm features!
    [  583.245639] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
    [  583.248504] [drm] free PSP TMR buffer

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 17:02:40 -05:00
Guchun Chen
7551f70ab9 drm/amdgpu: fix the missed handling for SDMA2 and SDMA3
There is no base reg offset or ip_version set for SDMA2
and SDMA3 on SIENNA_CICHLID, so add them.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 17:00:55 -05:00
Flora Cui
1053b9c948 drm/amdgpu: check atomic flag to differeniate with legacy path
since vkms support atomic KMS interface

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <aleander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:59:38 -05:00
Flora Cui
3e467e478e drm/amdgpu: cancel the correct hrtimer on exit
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:58:52 -05:00
Jane Jian
da3b36a23b drm/amdgpu/sriov/vcn: add new vcn ip revision check case for SIENNA_CICHLID
[WHY]
for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia bandwidth feature),
which will be mismatched with original vcn0 revision

[HOW]
add new version check for vcn0 disabled revision(3, 0, 192), typically modified under
sriov mode

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:58:11 -05:00
Yann Dirson
ddb267b66a drm/amdgpu: update fw_load_type module parameter doc to match code
amdgpu_ucode_get_load_type() does not interpret this parameter as
documented.  It is ignored for many ASIC types (which presumably
only support one load_type), and when not ignored it is only used
to force direct loading instead of PSP loading.  SMU loading is
only available for ASICs for which the parameter is ignored.

Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:16:06 -05:00
Philip Yang
a899fe8b43 drm/amdkfd: err_pin_bo path leaks kfd_bo_list
Refactor userptr and pin_bo path to make it less confusing, move
err_pin_bo label up to remove mem from process_info kfd_bo_list.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:16:00 -05:00
shaoyunl
992110d747 drm/amdgpu: adjust the kfd reset sequence in reset sriov function
This change revert previous commits:
9f4f2c1a35 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
271fd38ce5 ("drm/amdgpu: move kfd post_reset out of reset_sriov function")

This change moves the amdgpu_amdkfd_pre_reset to an earlier place
in amdgpu_device_reset_sriov, presumably to address the sequence issue
that the first patch was originally meant to fix.

Some register access(GRBM_GFX_CNTL) only be allowed on full access
mode. Move kfd_pre_reset and  kfd_post_reset back inside reset_sriov
function.

Fixes: 9f4f2c1a35 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
Fixes: 271fd38ce5 ("drm/amdgpu: move kfd post_reset out of reset_sriov function")
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:07:33 -05:00
Philip Yang
a872c152fd drm/amdkfd: fix double free mem structure
drm_gem_object_put calls release_notify callback to free the mem
structure and unreserve_mem_limit, move it down after the last access
of mem and make it conditional call.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:07:13 -05:00
Lijo Lazar
81d104f4af drm/amdgpu: Don't halt RLC on GFX suspend
On aldebaran, RLC also controls GFXCLK. Skip halting RLC during GFX IP suspend
and keep it running till PMFW disables all DPMs.

    [  578.019986] amdgpu 0000:23:00.0: amdgpu: GPU reset begin!
    [  583.245566] amdgpu 0000:23:00.0: amdgpu: Failed to disable smu features.
    [  583.245621] amdgpu 0000:23:00.0: amdgpu: Fail to disable dpm features!
    [  583.245639] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
    [  583.248504] [drm] free PSP TMR buffer

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:04:23 -05:00
Lijo Lazar
fe9c5c9aff drm/amdgpu: Use MAX_HWIP instead of HW_ID_MAX
HW_ID_MAX considers HWID of all IPs, far more than what amdgpu uses.
amdgpu tracks only the IPs defined by amd_hw_ip_block_type whose max
is MAX_HWIP.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:04:10 -05:00
Guchun Chen
3700169886 drm/amdgpu: fix the missed handling for SDMA2 and SDMA3
There is no base reg offset or ip_version set for SDMA2
and SDMA3 on SIENNA_CICHLID, so add them.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:04:02 -05:00
Guchun Chen
6c18ecefab drm/amdgpu: declare static function to fix compiler warning
>> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:503:6: warning: no previous prototype for function 'release_psp_cmd_buf' [-Wmissing-prototypes]
   void release_psp_cmd_buf(struct psp_context *psp)
        ^
   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:503:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void release_psp_cmd_buf(struct psp_context *psp)
   ^
   static
   1 warning generated.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:03:56 -05:00
Philip Yang
3c2d6ea279 drm/amdgpu: handle IH ring1 overflow
IH ring1 is used to process GPU retry fault, overflow is enabled to
drain retry fault because we want receive other interrupts while
handling retry fault to recover range. There is no overflow flag set
when wptr pass rptr. Use timestamp of rptr and wptr to handle overflow
and drain retry fault.

If fault timestamp goes backward, the fault is filtered and should not
be processed. Drain fault is finished if processed_timestamp is equal to
or larger than checkpoint timestamp.

Add amdgpu_ih_functions interface decode_iv_ts for different chips to
get timestamp from IV entry with different iv size and timestamp offset.
amdgpu_ih_decode_iv_ts_helper is used for vega10, vega20, navi10.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:03:34 -05:00
Stanley.Yang
232d1d43b5 drm/amdgpu: fix disable ras feature failed when unload drvier v2
v2:
    still need call ras_disable_all_featrures to handle
    ras initilization failure case.

Function amdgpu_device_fini_hw is called before amdgpu_device_fini_sw,
so ras ta will unload before send ras disable command, ras dsiable operation
must before hw fini.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:03:22 -05:00
Flora Cui
700de2c8aa drm/amdgpu: check atomic flag to differeniate with legacy path
since vkms support atomic KMS interface

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <aleander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:03:03 -05:00
Flora Cui
deefd07eed drm/amdgpu: fix vkms crtc settings
otherwise adev->mode_info.crtcs[] is NULL

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:02:57 -05:00
Flora Cui
4f7ee199d9 drm/amdgpu: cancel the correct hrtimer on exit
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:02:46 -05:00
Jane Jian
981b304546 drm/amdgpu/sriov/vcn: add new vcn ip revision check case for SIENNA_CICHLID
[WHY]
for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia bandwidth feature),
which will be mismatched with original vcn0 revision

[HOW]
add new version check for vcn0 disabled revision(3, 0, 192), typically modified under
sriov mode

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:02:04 -05:00
Javier Martinez Canillas
6a2d2ddf2c drm: Move nomodeset kernel parameter to the DRM subsystem
The "nomodeset" kernel cmdline parameter is handled by the vgacon driver
but the exported vgacon_text_force() symbol is only used by DRM drivers.

It makes much more sense for the parameter logic to be in the subsystem
of the drivers that are making use of it.

Let's move the vgacon_text_force() function and related logic to the DRM
subsystem. While doing that, rename it to drm_firmware_drivers_only() and
make it return true if "nomodeset" was used and false otherwise. This is
a better description of the condition that the drivers are testing for.

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211112133230.1595307-4-javierm@redhat.com
2021-11-27 13:52:22 +01:00
Javier Martinez Canillas
35f7775f81 drm: Don't print messages if drivers are disabled due nomodeset
The nomodeset kernel parameter handler already prints a message that the
DRM drivers will be disabled, so there's no need for drivers to do that.

Suggested-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211112133230.1595307-2-javierm@redhat.com
2021-11-27 13:52:16 +01:00
Alex Deucher
692cd92e66 drm/amd/display: update bios scratch when setting backlight
Update the bios scratch register when updating the backlight
level.  Some platforms apparently read this scratch register
and do additional operations in their hotkey handlers.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1518
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:14:36 -05:00
Lijo Lazar
57961c4c18 drm/amdgpu: Skip ASPM programming on aldebaran
There is no need for additional programming, keep the default settings.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:14:03 -05:00
Yang Wang
fd08953b2d drm/amdgpu: fix byteorder error in amdgpu discovery
fix some byteorder issues about amdgpu discovery.
This will result in running errors on the big end system. (e.g:MIPS)

Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:13:54 -05:00
Philip Yang
c4ef8a73bf drm/amdgpu: enable Navi retry fault wptr overflow
If xnack is on, VM retry fault interrupt send to IH ring1, and ring1
will be full quickly. IH cannot receive other interrupts, this causes
deadlock if migrating buffer using sdma and waiting for sdma done
while handling retry fault.

Remove VMC from IH storm client, enable ring1 write pointer
overflow, then IH will drop retry fault interrupts and be able to receive
other interrupts while driver is handling retry fault.

IH ring1 write pointer doesn't writeback to memory by IH, and ring1
write pointer recorded by self-irq is not updated, so always read
the latest ring1 write pointer from register.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:13:10 -05:00
Philip Yang
8888e2fe9c drm/amdgpu: enable Navi 48-bit IH timestamp counter
By default this timestamp is 32 bit counter. It gets overflowed in
around 10 minutes.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:12:55 -05:00
Philip Yang
4d62555f62 drm/amdgpu: IH process reset count when restart
Otherwise when IH process restart, count is zero, the loop will
not exit to wake_up_all after processing AMDGPU_IH_MAX_NUM_IVS
interrupts.

Cc: stable@vger.kernel.org
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:09:46 -05:00
Alex Deucher
53af98c091 drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
Renoir and newer gfx9 APUs have new TSC register that is
not part of the gfxoff tile, so it can be read without
needing to disable gfx off.

Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:09:17 -05:00
Alex Deucher
244ee39885 drm/amdgpu/gfx10: add wraparound gpu counter check for APUs as well
Apply the same check we do for dGPUs for APUs as well.

Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:09:09 -05:00
shaoyunl
271fd38ce5 drm/amdgpu: move kfd post_reset out of reset_sriov function
Fixes: 9f4f2c1a35 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")

For sriov XGMI  configuration, the host driver will handle the hive reset,
so in guest side, the reset_sriov only be called once on one device. This will
make kfd post_reset unblanced with kfd pre_reset since kfd pre_reset already
been moved out of reset_sriov function. Move kfd post_reset out of reset_sriov
function to make them balance .

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:08:58 -05:00
xinhui pan
4eb6bb649f drm/amdgpu: Fix double free of dmabuf
amdgpu_amdkfd_gpuvm_free_memory_of_gpu drop dmabuf reference increased in
amdgpu_gem_prime_export.
amdgpu_bo_destroy drop dmabuf reference increased in
amdgpu_gem_prime_import.

So remove this extra dma_buf_put to avoid double free.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Tested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:04:05 -05:00
Felix Kuehling
d3a21f7e35 drm/amdgpu: Fix MMIO HDP flush on SRIOV
Disable HDP register remapping on SRIOV and set rmmio_remap.reg_offset
to the fixed address of the VF register for hdp_v*_flush_hdp.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 15:02:25 -05:00
Alex Deucher
1f57925493 drm/amd/display: update bios scratch when setting backlight
Update the bios scratch register when updating the backlight
level.  Some platforms apparently read this scratch register
and do additional operations in their hotkey handlers.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1518
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:54 -05:00
Lijo Lazar
6ff53495ce drm/amdgpu: Skip ASPM programming on aldebaran
There is no need for additional programming, keep the default settings.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:53 -05:00
Yang Wang
cc7818d709 drm/amdgpu: fix byteorder error in amdgpu discovery
fix some byteorder issues about amdgpu discovery.
This will result in running errors on the big end system. (e.g:MIPS)

Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:53 -05:00
Philip Yang
23eb49251b drm/amdgpu: enable Navi retry fault wptr overflow
If xnack is on, VM retry fault interrupt send to IH ring1, and ring1
will be full quickly. IH cannot receive other interrupts, this causes
deadlock if migrating buffer using sdma and waiting for sdma done
while handling retry fault.

Remove VMC from IH storm client, enable ring1 write pointer
overflow, then IH will drop retry fault interrupts and be able to receive
other interrupts while driver is handling retry fault.

IH ring1 write pointer doesn't writeback to memory by IH, and ring1
write pointer recorded by self-irq is not updated, so always read
the latest ring1 write pointer from register.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:53 -05:00
Philip Yang
71ee9236ab drm/amdgpu: enable Navi 48-bit IH timestamp counter
By default this timestamp is 32 bit counter. It gets overflowed in
around 10 minutes.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:53 -05:00
Philip Yang
514f4a99c7 drm/amdgpu: IH process reset count when restart
Otherwise when IH process restart, count is zero, the loop will
not exit to wake_up_all after processing AMDGPU_IH_MAX_NUM_IVS
interrupts.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:53 -05:00
Evan Quan
1223c15c78 drm/amdgpu: update the domain flags for dumb buffer creation
After switching to generic framebuffer framework, we rely on the
->dumb_create routine for frame buffer creation. However, the
different domain flags used are not optimal. Add the contiguous
flag to directly allocate the scanout BO as one linear buffer.

Fixes: 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:53 -05:00