Luben Tuikov
4a580877bd
drm/amdgpu: Get DRM dev from adev by inline-f
...
Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 13:06:06 -04:00
Luben Tuikov
1348969ab6
drm/amdgpu: drm_device to amdgpu_device by inline-f (v2)
...
Get the amdgpu_device from the DRM device by use
of an inline function, drm_to_adev(). The inline
function resolves a pointer to struct drm_device
to a pointer to struct amdgpu_device.
v2: Use a typed visible static inline function
instead of an invisible macro.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 13:06:06 -04:00
Prike.Liang
50166d1ce5
drm/amdgpu: enable HDP clock gatting
...
Enabe HDP SD/DS clock gatting in Renoir series.
Signed-off-by: Prike.Liang <Prike.Liang@amd.com >
Reviewed-by: Evan Quan <evan.quan@amd.com >
Reviewed-by: Huang Rui <ray.huang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 13:06:05 -04:00
Prike.Liang
d844812b28
drm/amdgpu: enable ATHUB clock gatting
...
Enable ATHUB clock gatting set in Renoir series.
Signed-off-by: Prike.Liang <Prike.Liang@amd.com >
Reviewed-by: Evan Quan <evan.quan@amd.com >
Reviewed-by: Huang Rui <ray.huang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 13:06:05 -04:00
Dennis Li
08ebb485f0
drm/amdgpu: annotate a false positive recursive locking
...
Re-apply commit 72e14ebf9f
[ 584.110304] ============================================
[ 584.110590] WARNING: possible recursive locking detected
[ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE
[ 584.111164] --------------------------------------------
[ 584.111456] kworker/38:1/553 is trying to acquire lock:
[ 584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.112112]
but task is already holding lock:
[ 584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.113068]
other info that might help us debug this:
[ 584.113689] Possible unsafe locking scenario:
[ 584.114350] CPU0
[ 584.114685] ----
[ 584.115014] lock(&adev->reset_sem);
[ 584.115349] lock(&adev->reset_sem);
[ 584.115678]
*** DEADLOCK ***
[ 584.116624] May be due to missing lock nesting notation
[ 584.117284] 4 locks held by kworker/38:1/553:
[ 584.117616] #0 : ffff9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630
[ 584.117967] #1 : ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630
[ 584.118358] #2 : ffffffffc1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu]
[ 584.118786] #3 : ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.119222]
stack backtrace:
[ 584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G OE 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
[ 584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[ 584.121638] Call Trace:
[ 584.122050] dump_stack+0x98/0xd5
[ 584.122499] __lock_acquire+0x1139/0x16e0
[ 584.122931] ? trace_hardirqs_on+0x3b/0xf0
[ 584.123358] ? cancel_delayed_work+0xa6/0xc0
[ 584.123771] lock_acquire+0xb8/0x1c0
[ 584.124197] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.124599] down_write+0x49/0x120
[ 584.125032] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.125472] amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.125910] ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
[ 584.126367] amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
[ 584.126789] process_one_work+0x29e/0x630
[ 584.127208] worker_thread+0x3c/0x3f0
[ 584.127621] ? __kthread_parkme+0x61/0x90
[ 584.128014] kthread+0x12f/0x150
[ 584.128402] ? process_one_work+0x630/0x630
[ 584.128790] ? kthread_park+0x90/0x90
[ 584.129174] ret_from_fork+0x3a/0x50
Each adev has owned lock_class_key to avoid false positive
recursive locking.
v2:
1. register adev->lock_key into lockdep, otherwise lockdep will
report the below warning
[ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
[ 1216.705924] ------------[ cut here ]------------
[ 1216.705972] DEBUG_LOCKS_WARN_ON(1)
[ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210
v3:
change to use down_write_nest_lock to annotate the false dead-lock
warning.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch >
Signed-off-by: Dennis Li <Dennis.Li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 13:05:52 -04:00
Dennis Li
d95e8e97e2
drm/amdgpu: refine create and release logic of hive info
...
Change to dynamically create and release hive info object,
which help driver support more hives in the future.
v2:
Change to save hive object pointer in adev, to avoid locking
xgmi_mutex every time when calling amdgpu_get_xgmi_hive.
v3:
1. Change type of hive object pointer in adev from void* to
amdgpu_hive_info*.
2. remove unnecessary variable initialization.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Dennis Li <Dennis.Li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 12:24:14 -04:00
Dennis Li
aac891685d
drm/amdgpu: refine message print for devices of hive
...
Using dev_xxx instead of DRM_xxx/pr_xxx to indicate which device
of a hive is the message for.
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Dennis Li <Dennis.Li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 12:24:06 -04:00
Dennis Li
cbfd17f7ba
drm/amdgpu: fix the nullptr issue when reenter GPU recovery
...
in single gpu system, if driver reenter gpu recovery,
amdgpu_device_lock_adev will return false, but hive is
nullptr now.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Dennis Li <Dennis.Li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 12:23:54 -04:00
Dennis Li
6049db43d6
drm/amdgpu: change reset lock from mutex to rw_semaphore
...
clients don't need reset-lock for synchronization when no
GPU recovery.
v2:
change to return the return value of down_read_killable.
v3:
if GPU recovery begin, VF ignore FLR notification.
Reviewed-by: Monk Liu <monk.liu@amd.com >
Acked-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Dennis Li <Dennis.Li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 12:23:48 -04:00
Lukas Bulwahn
5049a05269
drm/amd/display: remove unintended executable mode
...
Besides the intended change, commit 4cc1178e16 ("drm/amdgpu: replace DRM
prefix with PCI device info for gfx/mmhub") also set the source files
mmhub_v1_0.c and gfx_v9_4.c to be executable, i.e., changed fromold mode
644 to new mode 755.
Commit 241b2ec931 ("drm/amd/display: Add dcn30 Headers (v2)") added the
four header files {dpcs,dcn}_3_0_0_{offset,sh_mask}.h as executable, i.e.,
mode 755.
Set to the usual modes for source and headers files and clean up those
mistakes. No functional change.
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 12:23:02 -04:00
Dennis Li
53b3f8f40e
drm/amdgpu: refine codes to avoid reentering GPU recovery
...
if other threads have holden the reset lock, recovery will
fail to try_lock. Therefore we introduce atomic hive->in_reset
and adev->in_gpu_reset, to avoid reentering GPU recovery.
v2:
drop "? true : false" in the definition of amdgpu_in_reset
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Dennis Li <Dennis.Li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-24 12:22:56 -04:00
Dave Airlie
ebb21aa188
drm/ttm: drop bus.size from bus placement.
...
This is always calculated the same, and only used in a couple of places.
Signed-off-by: Dave Airlie <airlied@redhat.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20200811074658.58309-2-airlied@gmail.com
2020-08-24 17:06:08 +10:00
Dave Airlie
098754fe3c
drm/ttm: init mem->bus in common code.
...
The drivers all do the same thing here.
Reviewed-by: Christian König <christian.koenig@amd.com > for both.
Signed-off-by: Dave Airlie <airlied@redhat.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20200811074658.58309-1-airlied@gmail.com
2020-08-24 17:00:48 +10:00
Gustavo A. R. Silva
df561f6688
treewide: Use fallthrough pseudo-keyword
...
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org >
2020-08-23 17:36:59 -05:00
Dave Airlie
ba9086a6df
Merge tag 'amd-drm-fixes-5.9-2020-08-20' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
...
amd-drm-fixes-5.9-2020-08-20:
amdgpu:
- Fixes for Navy Flounder
- Misc display fixes
- RAS fix
amdkfd:
- SDMA fix for renoir
Signed-off-by: Dave Airlie <airlied@redhat.com >
From: Alex Deucher <alexdeucher@gmail.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20200820041938.3928-1-alexander.deucher@amd.com
2020-08-21 10:17:52 +10:00
Jiansong Chen
da2446b66b
Revert "drm/amdgpu: disable gfxoff for navy_flounder"
...
This reverts commit 9c9b17a7d1 .
Newly released sdma fw (51.52) provides a fix for the issue.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com >
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-19 23:55:04 -04:00
Dave Airlie
485d41b092
Merge tag 'amd-drm-fixes-5.9-2020-08-12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
...
amd-drm-fixes-5.9-2020-08-12:
amdgpu:
- Fix allocation size
- SR-IOV fixes
- Vega20 SMU feature state caching fix
- Fix custom pptable handling
- Arcturus golden settings update
- Several display fixes
Signed-off-by: Dave Airlie <airlied@redhat.com >
From: Alex Deucher <alexdeucher@gmail.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20200813033610.4008-1-alexander.deucher@amd.com
2020-08-19 13:56:28 +10:00
Leo Liu
cdab4211f6
drm/amdgpu/jpeg: remove redundant check when it returns
...
Fix warning from kernel test robot
v2: remove the local variable as well
Signed-off-by: Leo Liu <leo.liu@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 18:22:16 -04:00
jqdeng
8e1d88f948
drm/amdgpu: Limit the error info print rate
...
Use function printk_ratelimit to limit the print rate.
Signed-off-by: jqdeng <Emily.Deng@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 18:22:10 -04:00
jqdeng
9a1cddd637
drm/amdgpu: Fix repeatly flr issue
...
Only for no job running test case need to do recover in
flr notification.
For having job in mirror list, then let guest driver to
hit job timeout, and then do recover.
Signed-off-by: jqdeng <Emily.Deng@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 18:22:02 -04:00
Jiansong Chen
332d790365
Revert "drm/amdgpu: disable gfxoff for navy_flounder"
...
This reverts commit ba4e049e63 .
Newly released sdma fw (51.52) provides a fix for the issue.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com >
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 18:20:34 -04:00
Luben Tuikov
9af5e21dac
drm/scheduler: Remove priority macro INVALID (v2)
...
Remove DRM_SCHED_PRIORITY_INVALID. We no longer
carry around an invalid priority and cut it off
at the source.
Backwards compatibility behaviour of AMDGPU CTX
IOCTL passing in garbage for context priority
from user space and then mapping that to
DRM_SCHED_PRIORITY_NORMAL is preserved.
v2: Revert "res" --> "r" and
"prio" --> "priority".
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 18:20:26 -04:00
Luben Tuikov
e2d732fdb7
drm/scheduler: Scheduler priority fixes (v2)
...
Remove DRM_SCHED_PRIORITY_LOW, as it was used
in only one place.
Rename and separate by a line
DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
as it represents a (total) count of said
priorities and it is used as such in loops
throughout the code. (0-based indexing is the
the count number.)
Remove redundant word HIGH in priority names,
and rename *KERNEL* to *HIGH*, as it really
means that, high.
v2: Add back KERNEL and remove SW and HW,
in lieu of a single HIGH between NORMAL and KERNEL.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 18:20:17 -04:00
Huang Rui
34174b89bf
drm/amdkfd: fix the wrong sdma instance query for renoir
...
Renoir only has one sdma instance, it will get failed once query the
sdma1 registers. So use switch-case instead of static register array.
Signed-off-by: Huang Rui <ray.huang@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 17:54:12 -04:00
Bhawanpreet Lakha
0a668aee0a
drm/amdgpu: parse ta firmware for navy_flounder
...
Use the same case as sienna_cichlid
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 17:53:50 -04:00
Guchun Chen
1a68d96f81
drm/amdgpu: fix NULL pointer access issue when unloading driver
...
When unloading driver by "modprobe -r amdgpu", one NULL pointer
dereference bug occurs in ras debugfs releasing. The cause is the
duplicated debugfs_remove, as drm debugfs_root dir has been cleaned
up already by drm_minor_unregister.
BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 0 P4D 0
Oops: 0002 [#1 ] SMP PTI
CPU: 11 PID: 1526 Comm: modprobe Tainted: G OE 5.6.0-guchchen #1
Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
RIP: 0010:down_write+0x15/0x40
Code: eb de e8 7e 17 72 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 53 48 89 fb e8 92
d8 ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 43 08 5b c3
RSP: 0018:ffffb1590386fcd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff85b2fcc2 RDI: 00000000000000a0
RBP: ffffb1590386fd30 R08: ffffffff85b2fcc2 R09: 000000000002b3c0
R10: ffff97a330618c40 R11: 00000000000005f6 R12: ffff97a3481beb40
R13: 00000000000000a0 R14: ffff97a3481beb40 R15: 0000000000000000
FS: 00007fb11a717540(0000) GS:ffff97a376cc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000004066d6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
simple_recursive_removal+0x63/0x370
? debugfs_remove+0x60/0x60
debugfs_remove+0x40/0x60
amdgpu_ras_fini+0x82/0x230 [amdgpu]
? __kernfs_remove.part.17+0x101/0x1f0
? kernfs_name_hash+0x12/0x80
amdgpu_device_fini+0x1c0/0x580 [amdgpu]
amdgpu_driver_unload_kms+0x3e/0x70 [amdgpu]
amdgpu_pci_remove+0x36/0x60 [amdgpu]
pci_device_remove+0x3b/0xb0
device_release_driver_internal+0xe5/0x1c0
driver_detach+0x46/0x90
bus_remove_driver+0x58/0xd0
pci_unregister_driver+0x29/0x90
amdgpu_exit+0x11/0x25 [amdgpu]
__x64_sys_delete_module+0x13d/0x210
do_syscall_64+0x5f/0x250
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Guchun Chen <guchun.chen@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 17:00:46 -04:00
Jiansong Chen
9c9b17a7d1
drm/amdgpu: disable gfxoff for navy_flounder
...
gfxoff is temporarily disabled for navy_flounder,
since at present the feature has broken some basic
amdgpu test.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-18 16:58:59 -04:00
Maxime Ripard
d85ddd1318
Merge v5.9-rc1 into drm-misc-next
...
Sam needs 5.9-rc1 to have dev_err_probe in to merge some patches.
Signed-off-by: Maxime Ripard <maxime@cerno.tech >
2020-08-18 14:14:25 +02:00
Kevin Wang
4444457450
drm/amdgpu: add condition check for trace_amdgpu_cs()
...
v1:
add trace event enabled check to avoid nop loop when submit multi ibs
in amdgpu_cs_ioctl() function.
v2:
add a new wrapper function to trace all amdgpu cs ibs.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-17 14:06:37 -04:00
Kevin Wang
736b172978
drm/amdgpu: fix amdgpu_bo_release_notify() comment error
...
fix amdgpu_bo_release_notify() comment error.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Reviewed-by: Dennis Li <Dennis.Li@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-17 14:06:28 -04:00
Huang Rui
d95c42a150
drm/amdkfd: fix the wrong sdma instance query for renoir
...
Renoir only has one sdma instance, it will get failed once query the
sdma1 registers. So use switch-case instead of static register array.
Signed-off-by: Huang Rui <ray.huang@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-17 14:06:13 -04:00
Alex Deucher
11043b7a99
drm/amdgpu: note what type of reset we are using
...
When we reset the GPU, note what type of reset will be
used. This makes debugging different reset scenarios
more clear as the driver may use different reset
methods depending on conditions on the system.
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 17:03:20 -04:00
Alex Deucher
bddbacc9e0
drm/amdgpu: print where we get the vbios image from
...
ACPI, ROM, PCI BAR, etc.
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 17:03:20 -04:00
Bhawanpreet Lakha
31e726ca3d
drm/amdgpu: parse ta firmware for navy_flounder
...
Use the same case as sienna_cichlid
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:41 -04:00
James Zhu
ac1128c996
drm/amdgpu/vcn3.0: only SIENNA_CICHLID need specify instance for dec/enc
...
Only SIENNA_CICHLID(VCN3) has 2 unsymmetrical instances, there're less
codecs on instance 1, we use 0 for decode and 1 for encode.
Signed-off-by: James Zhu <James.Zhu@amd.com >
Reviewed-by: Leo Liu <leo.liu@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:41 -04:00
Evan Quan
e098bc9612
drm/amd/pm: optimize the power related source code layout
...
The target is to provide a clear entry point(for power routines).
Also this can help to maintain a clear view about the frameworks
used on different ASICs. Hopefully all these can make power part
more friendly to play with.
Signed-off-by: Evan Quan <evan.quan@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:41 -04:00
Evan Quan
e9372d2371
drm/amd/powerplay: put those exposed power interfaces in amdgpu_dpm.c
...
As other power interfaces.
Signed-off-by: Evan Quan <evan.quan@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:41 -04:00
Evan Quan
20d3c28ce4
drm/amd/powerplay: optimize i2c bus access implementation
...
The caller needs not care about the internal details how the powerplay
API implemented.
Signed-off-by: Evan Quan <evan.quan@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:41 -04:00
Evan Quan
70bdb6ed22
drm/amd/powerplay: drop unnecessary pp_funcs checker
...
It's redundant. Also, the callers should not care about
the implementation details.
Signed-off-by: Evan Quan <evan.quan@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:41 -04:00
Evan Quan
b89e9eb681
drm/amd/powerplay: optimize amdgpu_dpm_set_clockgating_by_smu() implementation
...
Cover the implementation details from outside(of power part).
Signed-off-by: Evan Quan <evan.quan@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:41 -04:00
Guchun Chen
ae2bf61ff3
drm/amdgpu: guard ras debugfs creation/removal based on CONFIG_DEBUG_FS
...
It can avoid potential build warn/error when
CONFIG_DEBUG_FS is not set.
Signed-off-by: Guchun Chen <guchun.chen@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Dennis Li <Dennis.Li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Guchun Chen
2e2f5dd514
drm/amdgpu: fix NULL pointer access issue when unloading driver
...
When unloading driver by "modprobe -r amdgpu", one NULL pointer
dereference bug occurs in ras debugfs releasing. The cause is the
duplicated debugfs_remove, as drm debugfs_root dir has been cleaned
up already by drm_minor_unregister.
BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 0 P4D 0
Oops: 0002 [#1 ] SMP PTI
CPU: 11 PID: 1526 Comm: modprobe Tainted: G OE 5.6.0-guchchen #1
Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
RIP: 0010:down_write+0x15/0x40
Code: eb de e8 7e 17 72 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 53 48 89 fb e8 92
d8 ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 43 08 5b c3
RSP: 0018:ffffb1590386fcd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff85b2fcc2 RDI: 00000000000000a0
RBP: ffffb1590386fd30 R08: ffffffff85b2fcc2 R09: 000000000002b3c0
R10: ffff97a330618c40 R11: 00000000000005f6 R12: ffff97a3481beb40
R13: 00000000000000a0 R14: ffff97a3481beb40 R15: 0000000000000000
FS: 00007fb11a717540(0000) GS:ffff97a376cc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000004066d6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
simple_recursive_removal+0x63/0x370
? debugfs_remove+0x60/0x60
debugfs_remove+0x40/0x60
amdgpu_ras_fini+0x82/0x230 [amdgpu]
? __kernfs_remove.part.17+0x101/0x1f0
? kernfs_name_hash+0x12/0x80
amdgpu_device_fini+0x1c0/0x580 [amdgpu]
amdgpu_driver_unload_kms+0x3e/0x70 [amdgpu]
amdgpu_pci_remove+0x36/0x60 [amdgpu]
pci_device_remove+0x3b/0xb0
device_release_driver_internal+0xe5/0x1c0
driver_detach+0x46/0x90
bus_remove_driver+0x58/0xd0
pci_unregister_driver+0x29/0x90
amdgpu_exit+0x11/0x25 [amdgpu]
__x64_sys_delete_module+0x13d/0x210
do_syscall_64+0x5f/0x250
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Guchun Chen <guchun.chen@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Christian König
f1403342eb
drm/amdgpu: revert "fix system hang issue during GPU reset"
...
The whole approach wasn't thought through till the end.
We already had a reset lock like this in the past and it caused the same problems like this one.
Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.
This reverts commit df9c8d1aa2 .
Signed-off-by: Christian König <christian.koenig@amd.com >
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Evan Quan
9f979a49e2
drm/amd/powerplay: enable swSMU mgpu fan boost support
...
Enable mgpu fan boost feature on swSMU routines.
Signed-off-by: Evan Quan <evan.quan@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Evan Quan
f10bb940d8
drm/amd/powerplay: optimize the interface for mgpu fan boost enablement
...
Cover the implementation details from outside(of power). Also preparing
for expanding this to swSMU.
Signed-off-by: Evan Quan <evan.quan@amd.com >
Acked-by: Nirmoy Das <nirmoy.das@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Jiansong Chen
ba4e049e63
drm/amdgpu: disable gfxoff for navy_flounder
...
gfxoff is temporarily disabled for navy_flounder,
since at present the feature has broken some basic
amdgpu test.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Oak Zeng
9fb1506eb6
drm/amdgpu: Use function pointer for some mmhub functions
...
Add more function pointers to amdgpu_mmhub_funcs. ASIC specific
implementation of most mmhub functions are called from a general
function pointer, instead of calling different function for
different ASIC. Simplify the code by deleting duplicate functions
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Nirmoy Das
2f53072434
drm/amdgpu: pass NULL pointer instead of 0
...
Fixes: c030f2e416 ("drm/amdgpu: add amdgpu_ras.c to support ras (v2)")
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com >
Reviewed-by: Guchun Chen <guchun.chen@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Dennis Li
72e14ebf9f
drm/amdgpu: annotate a false positive recursive locking
...
[ 584.110304] ============================================
[ 584.110590] WARNING: possible recursive locking detected
[ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE
[ 584.111164] --------------------------------------------
[ 584.111456] kworker/38:1/553 is trying to acquire lock:
[ 584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.112112]
but task is already holding lock:
[ 584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.113068]
other info that might help us debug this:
[ 584.113689] Possible unsafe locking scenario:
[ 584.114350] CPU0
[ 584.114685] ----
[ 584.115014] lock(&adev->reset_sem);
[ 584.115349] lock(&adev->reset_sem);
[ 584.115678]
*** DEADLOCK ***
[ 584.116624] May be due to missing lock nesting notation
[ 584.117284] 4 locks held by kworker/38:1/553:
[ 584.117616] #0 : ffff9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630
[ 584.117967] #1 : ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630
[ 584.118358] #2 : ffffffffc1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu]
[ 584.118786] #3 : ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.119222]
stack backtrace:
[ 584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G OE 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
[ 584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[ 584.121638] Call Trace:
[ 584.122050] dump_stack+0x98/0xd5
[ 584.122499] __lock_acquire+0x1139/0x16e0
[ 584.122931] ? trace_hardirqs_on+0x3b/0xf0
[ 584.123358] ? cancel_delayed_work+0xa6/0xc0
[ 584.123771] lock_acquire+0xb8/0x1c0
[ 584.124197] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.124599] down_write+0x49/0x120
[ 584.125032] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.125472] amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.125910] ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
[ 584.126367] amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
[ 584.126789] process_one_work+0x29e/0x630
[ 584.127208] worker_thread+0x3c/0x3f0
[ 584.127621] ? __kthread_parkme+0x61/0x90
[ 584.128014] kthread+0x12f/0x150
[ 584.128402] ? process_one_work+0x630/0x630
[ 584.128790] ? kthread_park+0x90/0x90
[ 584.129174] ret_from_fork+0x3a/0x50
Each adev has owned lock_class_key to avoid false positive
recursive locking.
v2:
1. register adev->lock_key into lockdep, otherwise lockdep will
report the below warning
[ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
[ 1216.705924] ------------[ cut here ]------------
[ 1216.705972] DEBUG_LOCKS_WARN_ON(1)
[ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210
v3:
change to use down_write_nest_lock to annotate the false dead-lock
warning.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch >
Signed-off-by: Dennis Li <Dennis.Li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00
Wenhui Sheng
a4322e1881
drm/amdgpu: add debugfs interface for RAP test
...
After amdgpu driver loading successfully, we can use
RAP debugfs interface <debugfs_dir>/dri/xxx/rap_test
to trigger RAP test.
Currently only L0 validate test is supported.
v2: refine amdgpu_rap.h
Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com >
Reviewed-by: Guchun Chen <Guchun.Chen@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-08-14 16:22:40 -04:00