Commit Graph

9532 Commits

Author SHA1 Message Date
Andrey Grodzovsky
03f9016ed8 drm/amdgpu: Remap all page faults to per process dummy page.
On device removal reroute all CPU mappings to dummy page
per drm_file instance or imported GEM object.

v4:
Update for modified ttm_bo_vm_dummy_page

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-7-andrey.grodzovsky@amd.com
2021-05-19 23:50:27 -04:00
Andrey Grodzovsky
d10d0daa20 drm/amdgpu: Handle IOMMU enabled case.
Problem:
Handle all DMA IOMMU group related dependencies before the
group is removed. Those manifest themself in that when IOMMU
enabled DMA map/unmap is dependent on the presence of IOMMU
group the device belongs to but, this group is released once
the device is removed from PCI topology.

Fix:
Expedite all such unmap operations to pci remove driver callback.

v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate
v6: Drop the BO unamp list
v7:
Drop amdgpu_gart_fini
In amdgpu_ih_ring_fini do uncinditional  check (!ih->ring)
to avoid freeing uniniitalized rings.
Call amdgpu_ih_ring_fini unconditionally.
v8: Add deatiled explanation

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210517143851.475058-1-andrey.grodzovsky@amd.com
2021-05-19 23:50:27 -04:00
Andrey Grodzovsky
e9669fb782 drm/amdgpu: Add early fini callback
Use it to call disply code dependent on device->drv_data
before it's set to NULL on device unplug

v5:
Move HW finilization into this callback to prevent MMIO accesses
post cpi remove.

v7:
Split kfd suspend from device exit to expdite HW related
stuff to amdgpu_pci_remove

v8:
Squash previous KFD commit into this commit to avoid compile break.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com
2021-05-19 23:48:50 -04:00
Andrey Grodzovsky
72c8c97b15 drm/amdgpu: Split amdgpu_device_fini into early and late
Some of the stuff in amdgpu_device_fini such as HW interrupts
disable and pending fences finilization must be done right away on
pci_remove while most of the stuff which relates to finilizing and
releasing driver data structures can be kept until
drm_driver.release hook is called, i.e. when the last device
reference is dropped.

v4: Change functions prefix early->hw and late->sw

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-3-andrey.grodzovsky@amd.com
2021-05-19 23:45:49 -04:00
Dave Airlie
ae25ec2fc6 drm-misc-next for 5.14:
UAPI Changes:
 
 Cross-subsystem Changes:
 
 Core Changes:
 
  * aperture: Fix unlocking on errors
 
  * legacy: Fix some doc comments
 
 Driver Changes:
 
  * drm/amdgpu: Free resource on fence usage query; Fix fence calculation;
 
  * drm/bridge: Lt9611: Add missing MODULE_DEVICE_TABLE
 
  * drm/i915: Print formats with %p4cc
 
  * drm/ingenic: IPU planes are now always of type OVERLAY
 
  * drm/nouveau: Remove left-over reference to struct drm_device.pdev
 
  * drm/panfrost: Disable devfreq if num_supplies > 1; Add Mediatek MT8183 +
    DT bindings; Cleanups
 
  * drm/simpledrm: Print resources with %pr; Fix use-after-free errors;
    Fix NULL deref; Fix MAINTAINERS entry
 
  * drm/vmwgfx: Fix memory allocation and leak in FIFO allocation; Fix
    return value in PCI resource setup
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmCia/4ACgkQaA3BHVML
 eiM3OAf/RagUoVL1cBja+ytwnRC6C0mEeqD4LXPG/nezEc9Hx726zWguf21n9zSm
 zk4YX7EMhO6ZaAeHDZjk0Y2QbLxKIG+jtqQ7WmzUmo+Skiqaz+Yt4ZAMlnz51qRC
 9D91NiHMYxLhyTGnTJJX3Xxng6Zd9j3tyIez4a6nV0HBszbyZl/536LyYOeUnzzn
 Nav1SVX0H6obznjBmUMc1kIl4OnaEAmaKzv4rwncFQRSnjQ06fwwUhovcxFYgQhB
 IfM4qOqX8fk7h2fE5WL0E5lHb3EX3S/19I5dmOGTuyNkSItJYSyJc1iQ35t7iCZr
 /fQUUrUZJLCYFG7p4KvZ4n3UkHEcYg==
 =uq3p
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2021-05-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.14:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

 * aperture: Fix unlocking on errors

 * legacy: Fix some doc comments

Driver Changes:

 * drm/amdgpu: Free resource on fence usage query; Fix fence calculation;

 * drm/bridge: Lt9611: Add missing MODULE_DEVICE_TABLE

 * drm/i915: Print formats with %p4cc

 * drm/ingenic: IPU planes are now always of type OVERLAY

 * drm/nouveau: Remove left-over reference to struct drm_device.pdev

 * drm/panfrost: Disable devfreq if num_supplies > 1; Add Mediatek MT8183 +
   DT bindings; Cleanups

 * drm/simpledrm: Print resources with %pr; Fix use-after-free errors;
   Fix NULL deref; Fix MAINTAINERS entry

 * drm/vmwgfx: Fix memory allocation and leak in FIFO allocation; Fix
   return value in PCI resource setup

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YKJs2IfwSYvuGPU7@linux-uq9g.fritz.box
2021-05-20 13:31:12 +10:00
Christian König
81db370c88 drm/amdgpu: stop touching sched.ready in the backend
This unfortunately comes up in regular intervals and breaks
GPU reset for the engine in question.

The sched.ready flag controls if an engine can't get working
during hw_init, but should never be set to false during hw_fini.

v2: squash in unused variable fix (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:45:00 -04:00
Lang Yu
6e8bcdd63a drm/amd/amdgpu: fix a potential deadlock in gpu reset
When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, then deadlock occurred.
Deadlock log:

[  805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[  806.290952] [drm] free PSP TMR buffer

[  806.319406] ============================================
[  806.320315] WARNING: possible recursive locking detected
[  806.321225] 5.11.0-custom #1 Tainted: G        W  OEL
[  806.322135] --------------------------------------------
[  806.323043] cat/2593 is trying to acquire lock:
[  806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.325668]
               but task is already holding lock:
[  806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.328430]
               other info that might help us debug this:
[  806.329539]  Possible unsafe locking scenario:

[  806.330549]        CPU0
[  806.330983]        ----
[  806.331416]   lock(&adev->dm.dc_lock);
[  806.332086]   lock(&adev->dm.dc_lock);
[  806.332738]
                *** DEADLOCK ***

[  806.333747]  May be due to missing lock nesting notation

[  806.334899] 3 locks held by cat/2593:
[  806.335537]  #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[  806.337009]  #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[  806.339018]  #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.340869]
               stack backtrace:
[  806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G        W  OEL    5.11.0-custom #1
[  806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[  806.344413] Call Trace:
[  806.344849]  dump_stack+0x93/0xbd
[  806.345435]  __lock_acquire.cold+0x18a/0x2cf
[  806.346179]  lock_acquire+0xca/0x390
[  806.346807]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.347813]  __mutex_lock+0x9b/0x930
[  806.348454]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.349434]  ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[  806.350581]  ? _raw_spin_unlock_irqrestore+0x47/0x50
[  806.351437]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.352437]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.353252]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.354064]  mutex_lock_nested+0x1b/0x20
[  806.354747]  ? mutex_lock_nested+0x1b/0x20
[  806.355457]  dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.356427]  ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[  806.357736]  amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[  806.360394]  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[  806.362926]  amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[  806.365560]  amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian KÃnig <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:50 -04:00
Aaron Liu
9a530062d5 drm/amdgpu: modify system reference clock source for navi+ (V2)
Starting from Navi+, the rlc reference clock is used for system clock
from vbios gfx_info table. It is incorrect to use core_refclk_10khz of
vbios smu_info table as system clock.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:48 -04:00
Guchun Chen
87476d12c5 drm/amdgpu: update sdma golden setting for Navi12
Current golden setting is out of date.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:45 -04:00
Guchun Chen
6c65d8678c drm/amdgpu: update gc golden setting for Navi12
Current golden setting is out of date.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:43 -04:00
xinhui pan
a8e56b80df drm/amdgpu: Fix a use-after-free
looks like we forget to set ttm->sg to NULL.
Hit panic below

[ 1235.844104] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b7b4b: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 1235.989074] Call Trace:
[ 1235.991751]  sg_free_table+0x17/0x20
[ 1235.995667]  amdgpu_ttm_backend_unbind.cold+0x4d/0xf7 [amdgpu]
[ 1236.002288]  amdgpu_ttm_backend_destroy+0x29/0x130 [amdgpu]
[ 1236.008464]  ttm_tt_destroy+0x1e/0x30 [ttm]
[ 1236.013066]  ttm_bo_cleanup_memtype_use+0x51/0xa0 [ttm]
[ 1236.018783]  ttm_bo_release+0x262/0xa50 [ttm]
[ 1236.023547]  ttm_bo_put+0x82/0xd0 [ttm]
[ 1236.027766]  amdgpu_bo_unref+0x26/0x50 [amdgpu]
[ 1236.032809]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7aa/0xd90 [amdgpu]
[ 1236.040400]  kfd_ioctl_alloc_memory_of_gpu+0xe2/0x330 [amdgpu]
[ 1236.046912]  kfd_ioctl+0x463/0x690 [amdgpu]

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:39 -04:00
Mukul Joshi
1f6256590c drm/amdgpu: Query correct register for DF hashing on Aldebaran
For Aldebaran, driver needs to query DramMegaBaseAddress to
check if DF hashing is enabled.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:19 -04:00
James Zhu
295c4f513f drm/amdgpu: add video_codecs query support for aldebaran
Add video_codecs query support for aldebaran.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:16 -04:00
Felix Kuehling
e552ee40b0 drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind
The dmabuf attachment should be updated by moving the SG BO to DOMAIN_CPU
and back to DOMAIN_GTT. This does not necessarily invoke the
populate/unpopulate callbacks. Do this in backend_bind/unbind instead.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:10 -04:00
Felix Kuehling
5ac3c3e45f drm/amdgpu: Add DMA mapping of GTT BOs
Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:06 -04:00
Felix Kuehling
9e5d275319 drm/amdgpu: Move kfd_mem_attach outside reservation
This is needed to avoid deadlocks with DMA buf import in the next patch.
Also move PT/PD validation out of kfd_mem_attach, that way the caller
can bo this unconditionally.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:03 -04:00
Felix Kuehling
b72ed8a2de drm/amdgpu: DMA map/unmap when updating GPU mappings
DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called
with the BO and page tables reserved, so we can safely update the DMA
mapping.

DMA unmap when a BO is unmapped from a GPU and before updating mappings
in restore workers.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:59 -04:00
Felix Kuehling
264fb4d332 drm/amdgpu: Add multi-GPU DMA mapping helpers
Add BO-type specific helpers functions to DMA-map and unmap
kfd_mem_attachments. Implement this functionality for userptrs by creating
one SG BO per GPU and filling it with a DMA mapping of the pages from the
original mem->bo.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:56 -04:00
Felix Kuehling
7141394edc drm/amdgpu: Simplify AQL queue mapping
Do AQL queue double-mapping with a single attach call. That will make it
easier to create per-GPU BOs later, to be shared between the two BO VA
mappings on the same GPU.

Freeing the attachments is not necessary if map_to_gpu fails. These will be
cleaned up when the kdg_mem object is destroyed in
amdgpu_amdkfd_gpuvm_free_memory_of_gpu.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:52 -04:00
Felix Kuehling
4e94272f8a drm/amdgpu: Keep a bo-reference per-attachment
For now they all reference the same BO. For correct DMA mappings they will
refer to different BOs per-GPU.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:49 -04:00
Felix Kuehling
c780b2eedb drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
This name is more fitting, especially for the changes coming next to
support multi-GPU systems with proper DMA mappings. Cleaned up the code
and renamed some related functions and variables to improve readability.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:46 -04:00
Jingwen Chen
0a6fb50286 drm/amd/amdgpu: fix refcount leak
[Why]
the gem object rfb->base.obj[0] is get according to num_planes
in amdgpufb_create, but is not put according to num_planes

[How]
put rfb->base.obj[0] in amdgpu_fbdev_destroy according to num_planes

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:43 -04:00
YuBiao Wang
4aa7e6e07b drm/amd/amdgpu: psp program IH_RB_CTRL on sienna_cichlid
[Why]
IH_RB_CNTL is blocked by PSP so we need to ask psp to help config it.

[How]
Move psp ip block before ih, and use psp to program IH_RB_CNTL under sriov.

Reviewed-by: Chen, Horace <Horace.Chen@amd.com>
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:40 -04:00
Aurabindo Pillai
ac87f94294 drm/amd/display: Enable HDCP for Beige Goby
[Why&How]
Add beige_goby_ta.bin to module firmware table and call psp init for TA

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:32 -04:00
Aurabindo Pillai
ddaed58b57 drm/amd/amdgpu: Enable DCN IP init for Beige Goby
[Why&How]
Adds DCN IP block initialization for Beige Goby

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:42:13 -04:00
Jiansong Chen
2db8378f09 drm/amdgpu: fix GCR_GENERAL_CNTL offset for beige_goby
beige_goby has similar gc_10_3 ip with sienna_cichlid,
so follow its registers offset setting.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:53 -04:00
Chengming Gui
ece3cbadb4 drm/amd/amdgpu: Enable gfxoff for beige_goby
Enable gfxoff in driver side based on SMC#73.3

v2: fix typo 'Eanble' --> 'Enable'

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:50 -04:00
Tao Zhou
d69d278fc7 drm/amdgpu: add cgls for beige_goby
Enable cgls to improve the runtime power efficiency.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:45 -04:00
Veerabadhran Gopalakrishnan
e47e4c0e4f drm/amdgpu: enabled VCN3.0 CG for BEIGE GOBY
Enable VCN CG for BEIGE GOBY

Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:33 -04:00
Tao Zhou
a764bef36d drm/amdgpu: enable ih CG for beige_goby
Enable ih clock gating for beige_goby.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:30 -04:00
Tao Zhou
170c193ffd drm/amdgpu: enable hdp CG and LS for beige_goby
Enable hdp MGCG and LS for beige_goby.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:28 -04:00
Tao Zhou
5d36b865e4 drm/amdgpu: enable mc CG and LS for beige_goby
Enable mc CG and LS for beige_goby.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:25 -04:00
Tao Zhou
147de218c2 drm/amdgpu: enable athub/mmhub PG for beige_goby
Enable athub/mmhub power gating for beige_goby.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:22 -04:00
Tao Zhou
d75caec8a4 drm/amdgpu: support athub cg setting for beige_goby
Enable athub cg for beige_goby.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:19 -04:00
Tao Zhou
bc6bd46bc3 drm/amdgpu: enable GFX clock gating for beige_goby
Enable GFX MGCG, CGCG and 3DCG for beige_goby.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:15 -04:00
Chengming Gui
5ed7715dbb drm/amd/pm: add mode1 support for beige_goby
Add mode1 reset as the default reset method for beige_goby

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:04 -04:00
Chengming Gui
09c31c778d drm/amd/amdgpu: update golden_setting_10_3_5 for beige_goby
add mmCGTT_SPI_{RA0/RA1}_CLK_CTRL setting

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:41:01 -04:00
Veerabadhran Gopalakrishnan
f703d4b6f2 drm/amdgpu: Enable VCN for Beige Goby
Enabled VCN support for Beige Goby chip

Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:58 -04:00
Hawking Zhang
3df8ecc8a1 drm/amdgpu: add gc_10_3_5 golden setting for beige_goby
execute gc_10_3_5 golden registers one-time initialization

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:53 -04:00
Alex Deucher
77a3e25102 drm/amdgpu: add mmhub client support for beige goby
For decoding GPUVM page faults.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:50 -04:00
Chengming Gui
c072981910 drm/amd/amdgpu: add psp support for beige_goby
add general PSP support for beige_goby

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:47 -04:00
Chengming Gui
4d3526690a drm/amd/amdgpu: add smu support for beige_goby
Use soft-pptable for beige_goby

v2: fix format

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:44 -04:00
Chengming Gui
0e5f4b0988 drm/amd/amdgpu: Use IP discovery table for beige goby
Rather than gpu info firmware.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:36 -04:00
Chengming Gui
afee60e4c5 drm/amd/amdgpu: support cp_fw_write_wait for beige_goby
Same as dimgrey_cavefish to support WAIT_REG_MEM packet.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:34 -04:00
Chengming Gui
5663da86c9 drm/amd/amdgpu: add virtual display support for beige_goby
Add virtual ip block for beige_goby

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:30 -04:00
Chengming Gui
67b35b08e7 drm/amd/amdgpu: configure beige_goby gfx according to gfx 10.3's definition
The gfx version of beige_goby is 10.3,
identical to sienna_cichlid,
follow the way of sienna_cichlid

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:28 -04:00
Chengming Gui
8760403e19 drm/amd/amdgpu: add sdma ip block for beige_goby
Enable sdma block for beige_goby, same as sienna_cichlid

v2: share the same setting of sdma instance num with vangogh

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Alexander Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:26 -04:00
Chengming Gui
898319ca1e drm/amd/amdgpu: add gfx ip block for beige_goby
Enable gfx block for beige_goby, same as dimgrey_cavefish

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:23 -04:00
Chengming Gui
a1dede364b drm/amd/amdgpu: add ih ip block for beige_goby
Enable ih block for beige_goby, same as dimgrey_cavefish

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:19 -04:00
Chengming Gui
2d527ea6fd drm/amd/amdgpu: add gmc ip block for beige_goby
Enable gmc block for beige_goby, same as sienna_cichlid

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:17 -04:00
Chengming Gui
aa2caa2ad6 drm/amd/amdgpu: add common ip block for beige_goby
Same as dimgrey_cavefish
v2: fix comments typo

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:14 -04:00
Chengming Gui
ece6fb068d drm/amd/amdgpu: add mmhub support for beige_goby
Same as dimgrey_cavefish

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:11 -04:00
Chengming Gui
fd5b4b44e4 drm/amd/amdgpu: initialize IP offset for beige_goby
Add ip offset definition for beige_goby and initialize it

v2: squash in fixes (Alex)
V3: fix permissions on file (Alex)

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:09 -04:00
Chengming Gui
8573035a95 drm/amd/amdgpu: add common support for beige_goby
Add external id and set clock gating for beige_goby

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:06 -04:00
Chengming Gui
d2bfc50de2 drm/amd/amdgpu: add gmc support for beige_goby
Same as dimgrey_cavefish

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:40:03 -04:00
Chengming Gui
f7b97efef6 drm/amd/amdgpu: add support for beige_goby firmware
Add support for beige_goby cp/rlc firmware

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:39:59 -04:00
Chengming Gui
b41f5b7ab0 drm/amd/amdgpu: set asic family and ip blocks for beige_goby
Same as navi series

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:39:57 -04:00
Chengming Gui
2542e3c654 drm/amd/amdgpu: set fw load type for beige_goby
Use direct load for beige_goby

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:39:49 -04:00
Chengming Gui
6f1695918c drm/amd/amdgpu: add beige_goby asic type
Add chip type for beige_goby

v2: fix enum count (Alex)

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:39:45 -04:00
John Clements
8f6368a9c9 drm/amdgpu: Conditionally reset RAS counters on boot
Only clear RAS error counters if perestent EDC harvesting is not supported

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:38:11 -04:00
Changfeng
8ef4f94add drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang
There is problem with 3DCGCG firmware and it will cause compute test
hang on picasso/raven1. It needs to disable 3DCGCG in driver to avoid
compute hang.

Signed-off-by: Changfeng <Changfeng.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:38:08 -04:00
Peng Ju Zhou
d9c7f753b8 drm/amdgpu: Refine the error report when flush tlb.
there are 2 hubs to flush in the gmc, to make it easier
to debug when hub flush failed, refine the logs.

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:38:03 -04:00
Peng Ju Zhou
e0972f8c21 drm/amdgpu: Skip the program of GRBM_CAM* in SRIOV
KMD should not the program these registers,
so skip them in the SRIOV environment.

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:37:59 -04:00
Yi Li
ea46eaf26c drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE
When PAGE_SIZE is larger than AMDGPU_PAGE_SIZE, the number of GPU TLB
entries which need to update in amdgpu_map_buffer() should be multiplied
by AMDGPU_GPU_PAGES_IN_CPU_PAGE (PAGE_SIZE / AMDGPU_PAGE_SIZE).

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Yi Li <liyi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:37:53 -04:00
Philip Yang
765385ec00 drm/amdkfd: heavy-weight flush TLB after unmap
Need do a heavy-weight TLB flush to make sure we have no more dirty data
in the cache for the unmapped pages.

Define enum TLB_FLUSH_TYPE, add flush_type parameter to
amdgpu_amdkfd_flush_gpu_tlb_pasid.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:37:50 -04:00
Jiawei Gu
5228cd6574 drm/amdgpu: Fill adev->unique_id with data from PF2VF msg
Initialize unique_id from PF2VF under virtualization.

V2: skip smu_get_unique_id() under virtualization

Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:33:58 -04:00
Philip Yang
bf546940d5 drm/amdgpu: flush TLB if valid PDE turns into PTE
Mapping huge page, 2MB aligned address with 2MB size, uses PDE0 as PTE.
If previously valid PDE0, PDE0.V=1 and PDE0.P=0 turns into PTE, this
requires TLB flush, otherwise page table walker will not read updated
PDE0.

Change page table update mapping to return table_freed flag to indicate
the previously valid PDE may have turned into a PTE if page table is
freed.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:33:54 -04:00
Christian König
3b5d86fc23 drm/amdgpu: move struct amdgpu_vram_reservation into vram mgr
Not used outside of that file.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:32:21 -04:00
Christian König
dfffdf5e65 drm/amdgpu: check contiguous flags instead of mm_node
Drop the last user of drm_mm_node.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:32:18 -04:00
Christian König
abf91e0d33 drm/amdgpu: set the contiguous flag if possible
This avoids reallocating scanout BOs on first pin in a lot of cases.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:32:14 -04:00
Christian König
2b77ade8b9 drm/amdgpu: use cursor functions in amdgpu_bo_in_cpu_visible_vram
One of the last remaining uses of drm_mm_node.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:32:09 -04:00
Christian König
0ccc3ccf5b drm/amdgpu: re-apply "use the new cursor in the VM code" v2
Now that we found the underlying problem we can re-apply this patch.

This reverts commit 6b44b667e2.

v2: rebase on KFD changes

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:30:21 -04:00
Hawking Zhang
c6a1113333 drm/amdgpu: query boot config cap before issue psp cmd
Only send boot_config cmd to ASICs that support dynamic
boot config. Otherwise, the boot_config cmd will fail.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:30:17 -04:00
Hawking Zhang
cffd6f9d42 drm/amdgpu: add helper function to query dynamic boot config cap
Check firmware flags to determine whether dynmaic
boot config is supported or not.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:30:14 -04:00
Hawking Zhang
82a5203016 drm/amdgpu: switch to cached fw flags for mem training cap
Check cached firmware_flags to determin whether
two stage mem training is supported or not.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:30:11 -04:00
Hawking Zhang
698b101086 drm/amdgpu: switch to cached fw flags for sram ecc cap
Check cached firmware_flags to determine whether
sram ecc is supported or not.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:30:09 -04:00
Hawking Zhang
58ff791ad3 drm/amdgpu: switch to cached fw flags for gpu virt cap
Check cached firmware_flags to determine if gpu
virtualization is supported in vbios

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:30:06 -04:00
Hawking Zhang
5968c6a2ba drm/amdgpu: add atomfirmware helper function to query fw cap
Fimware capability was changed from 16 bits to 32 bits
for atomfirmware. add helper funciton to query firmware
capability and cache the value at early stage.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:30:03 -04:00
Bokun Zhang
ed9d205363 drm/amdgpu: Complete multimedia bandwidth interface
- Update SRIOV PF2VF header with latest revision

- Extend existing function in amdgpu_virt.c to read MM bandwidth config
  from PF2VF message

- Add SRIOV Sienna Cichlid codec array and update the bandwidth with
  PF2VF message

v2: squash in removal of unused variable (Alex)

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Monk liu <monk.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:58 -04:00
Felix Kuehling
2b2339eeaf drm/amdgpu: Albebaran: MTYPE_NC for coarse-grain remote memory
MTYPE UC was used for a specific use case that ended up not being
implemented. Use NC for better performance for coarse-grained memory where
cache coherence during shader execution is not required.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:56 -04:00
Felix Kuehling
0c6f7777cf drm/amdgpu: Arcturus: MTYPE_NC for coarse-grain remote memory
MTYPE UC was used for a specific use case that ended up not being
implemented. Use NC for better performance for coarse-grained memory where
cache coherence during shader execution is not required.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:53 -04:00
Jinzhou Su
195c41fba4 drm/amdgpu: Add compile flag for securedisplay
Add compile flag CONFIG_DEBUG_FS to clear the warning:
unused variable 'amdgpu_securedisplay_debugfs_ops'

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:49 -04:00
Roy Sun
0aa0725fa7 drm/amd/amdgpu: Cancel the hrtimer in sw_fini
Move the process of cancelling hrtimer to sw_fini

Signed-off-by: Roy Sun <Roy.Sun@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:32 -04:00
Kenneth Feng
0064b0ce85 drm/amd/pm: enable ASPM by default
Since ASPM function has been stable, we don't need to add the modprobe
parameter and we can enable ASPM by default.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:30 -04:00
Likun Gao
32358093b6 drm/amdgpu: update the method for harvest IP for specific SKU
Update the method of disabling VCN IP for specific SKU for navi1x ASIC,
it will judge whether should add the related IP at the function of
amdgpu_device_ip_block_add().

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:27 -04:00
Likun GAO
7bd939d04d drm/amdgpu: add judgement when add ip blocks (v2)
Judgement whether to add an sw ip according to the harvest info.

v2: fix indentation (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:22 -04:00
Dennis Li
1acbb613c4 drm/amdgpu: add synchronization among waves in the same threadgroup
It is possible that the previous waves have exited before others are
created, so the other waves maybe reuse pyhsical resouces left by
previous ones. Therefore add barrier instruction to synchronize waves within
the same threadgroup.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:29:17 -04:00
Dave Airlie
3a3ca72653 drm-misc-next for 5.14:
UAPI Changes:
 
  * drm: Disable connector force-probing for non-master clients
  * drm: Enforce consistency between IN_FORMATS property and cap + related
    driver cleanups
  * drm/amdgpu: Track devices, process info and fence info via
    /proc/<pid>/fdinfo
  * drm/ioctl: Mark AGP-related ioctls as legacy
  * drm/ttm: Provide tt_shrink file to trigger shrinker via debugfs;
 
 Cross-subsystem Changes:
 
  * fbdev/efifb: Special handling of non-PCI devices
  * fbdev/imxfb: Fix error message
 
 Core Changes:
 
  * drm: Add connector helper to attach HDR-metadata property and convert
    drivers
  * drm: Add connector helper to compare HDR-metadata and convert drivers
  * drm: Add conenctor helper to attach colorspace property
  * drm: Signal colorimetry in HDMI infoframe
  * drm: Support pitch for destination buffers; Add blitter function
    with generic format conversion
  * drm: Remove struct drm_device.pdev and update legacy drivers
  * drm: Remove obsolete DRM_KMS_FB_HELPER config option in core and drivers
  * drm: Remove obsolete drm_pci_alloc/drm_pci_free
 
  * drm/aperture: Add helpers for aperture ownership and convert drivers, replaces rsp fbdev helpers
 
  * drm/agp: Mark DRM AGP code as legacy and convert legacy drivers
 
  * drm/atomic-helpers: Cleanups
 
  * drm/dp: Handle downstream port counts of 0 correctly; AUX channel fixes; Use
    drm_err_*/drm_dbg_*(); Cleanups
 
  * drm/dp_dual_mode: Use drm_err_*/drm_dbg_*()
 
  * drm/dp_mst: Use drm_err_*/drm_dbg_*(); Use Extended Base Receiver Capability DPCD space
 
  * drm/gem-ttm-helper: Provide helper for dumb_map_offset and convert drivers
 
  * drm/panel: Use sysfs_emit; panel-simple: Use runtime PM, Power up panel
               when reading EDID, Cache EDID, Cleanups;
               Lms397KF04: DT bindings
 
  * drm/pci: Mark AGP helpers as legacy
 
  * drm/print: Handle NULL for DRM devices gracefully
 
  * drm/scheduler: Change scheduled fence track
 
  * drm/ttm: Don't count SG BOs against pages_limit; Warn about freeing pinned
             BOs; Fix error handling if no BO can be swapped out; Move special
             handling of non-GEM drivers into vmwgfx; Move page_alignment into
             the BO; Set drm-misc as TTM tree in MAINTAINERS; Cleanup
 	    ttm_agp_backend; Add ttm_sys_manager for system domain; Cleanups
 
 Driver Changes:
 
  * drm: Don't set allow_fb_modifiers explictly in drivers
 
  * drm/amdgpu: Pin/unpin fixes wrt to TTM; Use bo->base.size instead of
    mem->num_pages
 
  * drm/ast: Use managed pcim_iomap(); Fix EDID retrieval with DP501
 
  * drm/bridge: MHDP8546: HDCP support + DT bindings, Register DP AUX channel
    with userspace; Sil8620: Fix module dependencies; dw-hdmi: Add option to
    not load CEC driver; Fix stopping in drm_bridge_chain_pre_enable();
    Ti-sn65dsi86: Fix refclk handling, Break GPIO and MIPI-to-eDP into
    subdrivers, Use pm_runtime autosuspend, cleanups; It66121: Add
    driver + DT bindings; Adv7511: Support I2S IEC958 encoding; Anx7625: fix
    power-on delay; Nwi-dsi: Modesetting fixes; Cleanups
 
  * drm/bochs: Support screen blanking
 
  * drm/gma500: Cleanups
 
  * drm/gud: Cleanups
 
  * drm/i915: Use correct max source link rate for MST
 
  * drm/kmb: Cleanups
 
  * drm/meson: Disable dw-hdmi CEC driver
 
  * drm/nouveau: Pin/unpin fixes wrt to TTM; Use bo->base.size instead of
    mem->num_pages; Register AUX adapters after their connectors
 
  * drm/qxl: Fix shadow BO unpin
 
  * drm/radeon: Duplicate some DRM AGP code to uncouple from legacy drivers
 
  * drm/simpledrm: Add a generic DRM driver for simple-framebuffer devices
 
  * drm/tiny: Fix log spam if probe function gets deferred
 
  * drm/vc4: Add support for HDR-metadata property; Cleanups
 
  * drm/virtio: Create dumb BOs as guest blobs;
 
  * drm/vkms: Use managed drmm_universal_plane_alloc(); Add XRGB plane
    composition; Add overlay support
 
  * drm/vmwgfx: Enable console with DRM_FBDEV_EMULATION; Fix CPU updates
    of coherent multisample surfaces; Remove reservation semaphore; Add
    initial SVGA3 support; Support amd64; Use 1-based IDR; Use min_t();
    Cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmCb4QsACgkQaA3BHVML
 eiPwUgf/eTodvGQyB0cjv1vyHlttLo2t9k4QBO0pzVH0DJokl/pMpY0CuS8A/afW
 RmKLYod3TQb2QeEqWjocPxcYrh5WCbjdDZlmSb+pF+qau4b4s09SzIogK3lO1Nve
 9N1WVa7C3JC3k3XYexpeZ78RtoNN0UboMKDfbZODnn1PtjVtOp7Nbb92trRuB7y+
 B72A8RQMYB5IywVln9+lzLYcrmpHZbk/sLmC5pIGBPcTyhn0TFinUYlg9iq1PvNM
 fIqvPvXwxDVRO6hgnxZWKrdvQKCOcl5KFnk4E6H+ZkgWJ+yuAWI9r2N9TeelcW+M
 jlCHreWEHhuTPkr/ypnVmO8kuEgSFA==
 =G9ip
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2021-05-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.14:

UAPI Changes:

 * drm: Disable connector force-probing for non-master clients
 * drm: Enforce consistency between IN_FORMATS property and cap + related
   driver cleanups
 * drm/amdgpu: Track devices, process info and fence info via
   /proc/<pid>/fdinfo
 * drm/ioctl: Mark AGP-related ioctls as legacy
 * drm/ttm: Provide tt_shrink file to trigger shrinker via debugfs;

Cross-subsystem Changes:

 * fbdev/efifb: Special handling of non-PCI devices
 * fbdev/imxfb: Fix error message

Core Changes:

 * drm: Add connector helper to attach HDR-metadata property and convert
   drivers
 * drm: Add connector helper to compare HDR-metadata and convert drivers
 * drm: Add conenctor helper to attach colorspace property
 * drm: Signal colorimetry in HDMI infoframe
 * drm: Support pitch for destination buffers; Add blitter function
   with generic format conversion
 * drm: Remove struct drm_device.pdev and update legacy drivers
 * drm: Remove obsolete DRM_KMS_FB_HELPER config option in core and drivers
 * drm: Remove obsolete drm_pci_alloc/drm_pci_free

 * drm/aperture: Add helpers for aperture ownership and convert drivers, replaces rsp fbdev helpers

 * drm/agp: Mark DRM AGP code as legacy and convert legacy drivers

 * drm/atomic-helpers: Cleanups

 * drm/dp: Handle downstream port counts of 0 correctly; AUX channel fixes; Use
   drm_err_*/drm_dbg_*(); Cleanups

 * drm/dp_dual_mode: Use drm_err_*/drm_dbg_*()

 * drm/dp_mst: Use drm_err_*/drm_dbg_*(); Use Extended Base Receiver Capability DPCD space

 * drm/gem-ttm-helper: Provide helper for dumb_map_offset and convert drivers

 * drm/panel: Use sysfs_emit; panel-simple: Use runtime PM, Power up panel
              when reading EDID, Cache EDID, Cleanups;
              Lms397KF04: DT bindings

 * drm/pci: Mark AGP helpers as legacy

 * drm/print: Handle NULL for DRM devices gracefully

 * drm/scheduler: Change scheduled fence track

 * drm/ttm: Don't count SG BOs against pages_limit; Warn about freeing pinned
            BOs; Fix error handling if no BO can be swapped out; Move special
            handling of non-GEM drivers into vmwgfx; Move page_alignment into
            the BO; Set drm-misc as TTM tree in MAINTAINERS; Cleanup
	    ttm_agp_backend; Add ttm_sys_manager for system domain; Cleanups

Driver Changes:

 * drm: Don't set allow_fb_modifiers explictly in drivers

 * drm/amdgpu: Pin/unpin fixes wrt to TTM; Use bo->base.size instead of
   mem->num_pages

 * drm/ast: Use managed pcim_iomap(); Fix EDID retrieval with DP501

 * drm/bridge: MHDP8546: HDCP support + DT bindings, Register DP AUX channel
   with userspace; Sil8620: Fix module dependencies; dw-hdmi: Add option to
   not load CEC driver; Fix stopping in drm_bridge_chain_pre_enable();
   Ti-sn65dsi86: Fix refclk handling, Break GPIO and MIPI-to-eDP into
   subdrivers, Use pm_runtime autosuspend, cleanups; It66121: Add
   driver + DT bindings; Adv7511: Support I2S IEC958 encoding; Anx7625: fix
   power-on delay; Nwi-dsi: Modesetting fixes; Cleanups

 * drm/bochs: Support screen blanking

 * drm/gma500: Cleanups

 * drm/gud: Cleanups

 * drm/i915: Use correct max source link rate for MST

 * drm/kmb: Cleanups

 * drm/meson: Disable dw-hdmi CEC driver

 * drm/nouveau: Pin/unpin fixes wrt to TTM; Use bo->base.size instead of
   mem->num_pages; Register AUX adapters after their connectors

 * drm/qxl: Fix shadow BO unpin

 * drm/radeon: Duplicate some DRM AGP code to uncouple from legacy drivers

 * drm/simpledrm: Add a generic DRM driver for simple-framebuffer devices

 * drm/tiny: Fix log spam if probe function gets deferred

 * drm/vc4: Add support for HDR-metadata property; Cleanups

 * drm/virtio: Create dumb BOs as guest blobs;

 * drm/vkms: Use managed drmm_universal_plane_alloc(); Add XRGB plane
   composition; Add overlay support

 * drm/vmwgfx: Enable console with DRM_FBDEV_EMULATION; Fix CPU updates
   of coherent multisample surfaces; Remove reservation semaphore; Add
   initial SVGA3 support; Support amd64; Use 1-based IDR; Use min_t();
   Cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YJvkD523evviED01@linux-uq9g.fritz.box
2021-05-19 09:22:56 +10:00
David M Nieto
5c439c38f5 drm/amdgpu: fix fence calculation (v2)
The proper metric for fence utilization over several
contexts is an harmonic mean, but such calculation is
prohibitive in kernel space, so the code approximates it.

Because the approximation diverges when one context has a
very small ratio compared with the other context, this change
filter out ratios smaller that 0.01%

v2: make the fence calculation static and initialize variables
within that function

v3: Fix warnings (Alex)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David M Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210513174539.27409-2-david.nieto@amd.com
2021-05-13 14:09:12 -04:00
David M Nieto
a7f0849682 drm/amdgpu: free resources on fence usage query
Free the resources if the fence needs to be ignored
during the ratio calculation

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David M Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210513174539.27409-1-david.nieto@amd.com
2021-05-13 14:02:24 -04:00
Sathishkumar S
5c1efb5f76 drm/amdgpu: update vcn1.0 Non-DPG suspend sequence
update suspend register settings in Non-DPG mode.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-13 10:49:29 -04:00
Sathishkumar S
3666f83a11 drm/amdgpu: set vcn mgcg flag for picasso
enable vcn mgcg flag for picasso.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-13 10:49:16 -04:00
Likun Gao
5c1a376823 drm/amdgpu: update the method for harvest IP for specific SKU
Update the method of disabling VCN IP for specific SKU for navi1x ASIC,
it will judge whether should add the related IP at the function of
amdgpu_device_ip_block_add().

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-13 10:47:15 -04:00
Likun GAO
83a0b86391 drm/amdgpu: add judgement when add ip blocks (v2)
Judgement whether to add an sw ip according to the harvest info.

v2: fix indentation (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-13 10:46:58 -04:00
Thomas Zimmermann
fd531024ba Merge drm/drm-next into drm-misc-next
Backmerging to get v5.12 fixes. Requested for vmwgfx.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2021-05-11 15:59:18 +02:00
Zhen Lei
0bb6d3db4f drm/amdgpu: Delete two unneeded bool conversions
The result of an expression consisting of a single relational operator is
already of the bool type and does not need to be evaluated explicitly.

No functional change.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11 09:44:35 -04:00
Dwaipayan Ray
c666bbf0e9 drm/amd/amdgpu: Fix errors in function documentation
Fix a couple of syntax errors and removed one excess
parameter in the function documentations which lead
to kernel docs build warning.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:14:49 -04:00
Dennis Li
7780f50358 drm/amdgpu: add function to clear MMEA error status for aldebaran
For aldebaran, hardware will not clear error status automatically when
reading error status register, insteadly driver should set clear bit of
the error status register explicitly to clear error status.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:11:44 -04:00
Dennis Li
4f64f1c8e1 drm/amdgpu: correct the funtion to clear GCEA error status
The bit 11 of GCEA_ERR_STATUS register is used to clear GCEA error
status.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:11:38 -04:00
Dennis Li
011907fda3 drm/amdgpu: covert ras status to kernel errno
The original codes use ras status and kernl errno together in the same
function, which is a wrong code style.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:09:37 -04:00
Oak Zeng
7ddd977085 drm/amdgpu: Quit RAS initialization earlier if RAS is disabled
If RAS is disabled through amdgpu_ras_enable kernel parameter,
we should quit the RAS initialization eariler to avoid initialization
of some RAS data structure such as sysfs etc.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:08:48 -04:00
Luben Tuikov
ef0d7d2001 drm/amdgpu: Export ras_*_enabled to debugfs
Export the runtime-set "ras_hw_enabled" and
"ras_enabled" to debugfs, for debugging.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:08:25 -04:00
Luben Tuikov
8ab0d6f030 drm/amdgpu: Rename to ras_*_enabled
Rename,
  ras_hw_supported --> ras_hw_enabled, and
  ras_features     --> ras_enabled,
to show that ras_enabled is a subset of
ras_hw_enabled, which itself is a subset
of the ASIC capability.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:08:12 -04:00
Luben Tuikov
e509965e58 drm/amdgpu: Move up ras_hw_supported
Move ras_hw_supported into struct amdgpu_dev.
The dependency is:
struct amdgpu_ras <== struct amdgpu_dev <== ASIC,
read as "struct amdgpu_ras depends on struct
amdgpu_dev, which depends on the hardware."

This can be loosely understood as, "if RAS is
supported, which is property of the ASIC (struct
amdgpu_dev), then we can access struct
amdgpu_ras."

v2: Fix a typo: must binary AND in ternary cond
    in amdgpu_ras.c

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:08:06 -04:00
Luben Tuikov
acdae2169b drm/amdgpu: Remove redundant ras->supported
Remove redundant ras->supported, as this value
is also stored in adev->ras_features.

Use adev->ras_features, as that supercedes "ras",
since the latter is its member.

The dependency goes like this:
ras <== adev->ras_features <== hw_supported,
and is read as "ras depends on ras_features, which
depends on hw_supported." The arrows show the flow
of information, i.e. the dependency update.

"hw_supported" should also live in "adev".

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:07:56 -04:00
Sathishkumar S
71efc8701a drm/amdgpu: update vcn1.0 Non-DPG suspend sequence
update suspend register settings in Non-DPG mode.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:07:48 -04:00
Dennis Li
2a1bf57c0f drm/amdgpu: update the shader to clear specific SGPRs
Add shader codes to explicitly clear specific SGPRs, such as
flat_scratch_lo, flat_scratch_hi and so on. And also correct the
allocation size of SGPRs in PGM_RSRC1.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:07:27 -04:00
Mukul Joshi
da6b993717 drm/amdgpu: Enable TCP channel hashing for Aldebaran
Enable TCP channel hashing to match DF hash settings for Aldebaran.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:07:20 -04:00
Bas Nieuwenhuizen
37ac3dc00d drm/amdgpu: Use device specific BO size & stride check.
The builtin size check isn't really the right thing for AMD
modifiers due to a couple of reasons:

1) In the format structs we don't do set any of the tilesize / blocks
etc. to avoid having format arrays per modifier/GPU
2) The pitch on the main plane is pixel_pitch * bytes_per_pixel even
for tiled ...
3) The pitch for the DCC planes is really the pixel pitch of the main
surface that would be covered by it ...

Note that we only handle GFX9+ case but we do this after converting
the implicit modifier to an explicit modifier, so on GFX9+ all
framebuffers should be checked here.

There is a TODO about DCC alignment, but it isn't worse than before
and I'd need to dig a bunch into the specifics. Getting this out in
a reasonable timeframe to make sure it gets the appropriate testing
seemed more important.

Finally as I've found that debugging addfb2 failures is a pita I was
generous adding explicit error messages to every failure case.

Fixes: f258907fdd ("drm/amdgpu: Verify bo size can fit framebuffer size on init.")
Tested-by: Simon Ser <contact@emersion.fr>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:45 -04:00
Bas Nieuwenhuizen
bcfbb6016b drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.
Otherwise tiling modes that require the values form this field
(In particular _*_X) would be corrupted upon video decode.

Copied from the VCN v2 code.

Fixes: 99541f392b ("drm/amdgpu: add mc resume DPG mode for VCN3.0")
Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:45 -04:00
Alex Deucher
67387dfe0f drm/amdgpu: change the default timeout for kernel compute queues
Change to 60s.  This matches what we already do in virtualization.
Infinite timeout can lead to deadlocks in the kernel.

Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:45 -04:00
Sathishkumar S
a8f768874a drm/amdgpu: set vcn mgcg flag for picasso
enable vcn mgcg flag for picasso.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:45 -04:00
Fabio M. De Francesco
fb63726523 drm/amd/amdgpu/amdgpu_drv.c: Replace drm_modeset_lock_all with drm_modeset_lock
drm_modeset_lock_all() is not needed here, so it is replaced with
drm_modeset_lock(). The crtc list around which we are looping never
changes, therefore the only lock we need is to protect access to
crtc->state.

Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:45 -04:00
Alex Deucher
36f77e12a2 drm/amdgpu: drop the GCR packet from the emit_ib frame for sdma5.0
It's not needed here and has been added to the proper place
in the previous patch.  This aligns with what we do for sdma 5.2.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:45 -04:00
Alex Deucher
e8d7aa68c8 drm/amdgpu: Add graphics cache rinse packet for sdma 5.0
Add emit mem sync callback for sdma_v5_0

In amdgpu sync object test, three threads created jobs
to send GFX IB and SDMA IB in sequence. After the first
GFX thread joined, sometimes the third thread will reuse
the same physical page to store the SDMA IB. There will
be a risk that SDMA will read GFX IB in the previous physical
page. So it's better to flush the cache before commit sdma IB.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:45 -04:00
Stanley.Yang
f50160cf0f drm/amdgpu: force enable gfx ras for vega20 ws
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:44 -04:00
Nirmoy Das
b617207e80 drm/amdgpu: remove excess function parameter
Fix below htmldocs build warning:

"warning: Excess function parameter 'vm_context' description in 'amdgpu_vm_init'"

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:44 -04:00
Peng Ju Zhou
589bb0ca47 drm/amdgpu: Rename the flags to eliminate ambiguity v2
The flags vf_reg_access_* may cause confusion,
rename the flags to make it more clear.

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:44 -04:00
Evan Quan
a1b6aa4947 drm/amdgpu: add new MC firmware for Polaris12 32bit ASIC
Polaris12 32bit ASIC needs a special MC firmware.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Zhigang Luo
e7de0d844e drm/amdgpu: Add Aldebaran virtualization support
1. add Aldebaran in virtualization detection list.
2. disable Aldebaran virtual display support as there is no GFX
   engine in Aldebaran.
3. skip TMR loading if Aldebaran is in virtualizatin mode as it
   shares the one host loaded.

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Zhigang Luo
838eb73c8d drm/amdgpu: Add a new device ID for Aldebaran
It is Aldebaran VF device ID, for virtualization support.

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
1f6e8eb153 drm/amdgpu: enable gfx ras in aldebran by default
gfx ras now can be enabled by default in aldebaran

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
9adaac6eb4 drm/amdgpu: switch to mmhub ras callback for ras fini
invoke callback function for mmhub ras fini

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
8e17ddc2e2 drm/amdgpu: retired reset_ras_error_count from hdp callbacks
It was moved to hdp ras callbacks

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
78871b6c8b drm/amdgpu: enable ras error count query and reset for HDP
add hdp block ras error query and reset support in
amdgpu ras error count query and reset interface

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
7c63694eb9 drm/amdgpu: init/fini hdp v4_0 ras
invoke hdp v4_0 ras init in gmc late_init phase
while ras fini in gmc sw_fini phase

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
6f12507fad drm/amdgpu: initialize hdp v4_0 ras functions
hdp v4_0 support ras features

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
ca81b26d21 drm/amdgpu: implement hdp v4_0 ras functions
implement hdp v4_0 ras functions, including
ras init/fini, query/reset_error_counter

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
b11625f56f drm/amdgpu: add helpers for hdp ras init/fini
hdp ras init/fini are common functions that
can be shared among hdp generations

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Hawking Zhang
8f4a92937b drm/amdgpu: add hdp ras structures
centralize all hdp ras operation to ras_funcs

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:43 -04:00
Simon Ser
b44cdca7fd amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create
This error code-path is missing a drm_gem_object_put call. Other
error code-paths are fine.

Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 1769152ac6 ("drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:42 -04:00
Fabio M. De Francesco
1fdbbc123f drm/amd/amdgpu: Fix errors in documentation of function parameters
In the function documentation, I removed the excess parameters,
described the undocumented ones, and fixed the syntax errors.

Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:42 -04:00
Kai-Heng Feng
440d8774ef drm/amdgpu: Register VGA clients after init can no longer fail
When an amdgpu device fails to init, it makes another VGA device cause
kernel splat:
kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed
kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
kernel: amdgpu: probe of 0000:08:00.0 failed with error -110
...
kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] SMP NOPTI
kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G        W         5.12.0-rc8+ #12
kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020
kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu]
kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0
kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000
kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38
kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246
kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000
kernel: FS:  00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  vga_arbiter_notify_clients.part.0+0x4a/0x80
kernel:  vga_get+0x17f/0x1c0
kernel:  vga_arb_write+0x121/0x6a0
kernel:  ? apparmor_file_permission+0x1c/0x20
kernel:  ? security_file_permission+0x30/0x180
kernel:  vfs_write+0xca/0x280
kernel:  ksys_write+0x67/0xe0
kernel:  __x64_sys_write+0x1a/0x20
kernel:  do_syscall_64+0x38/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f93041e02f7
kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7
kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f
kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0
kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808
kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980
kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic
kernel: CR2: 0000000000000018
kernel: ---[ end trace 76d04313d4214c51 ]---

Commit 4192f7b576 ("drm/amdgpu: unmap register bar on device init
failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(),
so the VGA clients remain registered. So when
vga_arbiter_notify_clients() iterates over registered clients, it causes
NULL pointer dereference.

Since there's no reason to register VGA clients that early, so solve
the issue by putting them after all the goto cleanups.

v2:
 - Remove redundant vga_switcheroo cleanup in failed: label.

Fixes: 4192f7b576 ("drm/amdgpu: unmap register bar on device init failure")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:42 -04:00
Pavan Kumar Ramayanam
8e4d5d43cc drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown
The runtime resume PM op disregards the return value from
amdgpu_device_resume(), masking errors for failed resumes at the PM
layer.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pavan Kumar Ramayanam <pavan.ramayanam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:42 -04:00
Victor Zhao
db7f1e0140 drm/amdgpu: fix r initial values
Sriov gets suspend of IP block <dce_virtual> failed as return
value was not initialized.

v2: return 0 directly to align original code semantic before this
was broken out into a separate helper function instead of setting
initial values

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:42 -04:00
Dave Airlie
0844708ac3 Merge tag 'amd-drm-fixes-5.13-2021-05-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-5.13-2021-05-05:

amdgpu:
- MPO hang workaround
- Fix for concurrent VM flushes on vega/navi
- dcefclk is not adjustable on navi1x and newer
- MST HPD debugfs fix
- Suspend/resumes fixes
- Register VGA clients late in case driver fails to load
- Fix GEM leak in user framebuffer create
- Add support for polaris12 with 32 bit memory interface
- Fix duplicate cursor issue when using overlay
- Fix corruption with tiled surfaces on VCN3
- Add BO size and stride check to fix BO size verification

radeon:
- Fix off-by-one in power state parsing
- Fix possible memory leak in power state parsing

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210506033929.3875-1-alexander.deucher@amd.com
2021-05-07 12:44:51 +10:00
Bas Nieuwenhuizen
234055fd97 drm/amdgpu: Use device specific BO size & stride check.
The builtin size check isn't really the right thing for AMD
modifiers due to a couple of reasons:

1) In the format structs we don't do set any of the tilesize / blocks
etc. to avoid having format arrays per modifier/GPU
2) The pitch on the main plane is pixel_pitch * bytes_per_pixel even
for tiled ...
3) The pitch for the DCC planes is really the pixel pitch of the main
surface that would be covered by it ...

Note that we only handle GFX9+ case but we do this after converting
the implicit modifier to an explicit modifier, so on GFX9+ all
framebuffers should be checked here.

There is a TODO about DCC alignment, but it isn't worse than before
and I'd need to dig a bunch into the specifics. Getting this out in
a reasonable timeframe to make sure it gets the appropriate testing
seemed more important.

Finally as I've found that debugging addfb2 failures is a pita I was
generous adding explicit error messages to every failure case.

Fixes: f258907fdd ("drm/amdgpu: Verify bo size can fit framebuffer size on init.")
Tested-by: Simon Ser <contact@emersion.fr>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-05 23:09:54 -04:00
Bas Nieuwenhuizen
8bf073ca92 drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.
Otherwise tiling modes that require the values form this field
(In particular _*_X) would be corrupted upon video decode.

Copied from the VCN v2 code.

Fixes: 99541f392b ("drm/amdgpu: add mc resume DPG mode for VCN3.0")
Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-05-05 23:09:12 -04:00
Roy Sun
8744425411 drm/amdgpu: Add show_fdinfo() interface
Tracking devices, process info and fence info using
/proc/pid/fdinfo

Signed-off-by: David M Nieto <David.Nieto@amd.com>
Signed-off-by: Roy Sun <Roy.Sun@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210426062701.39732-2-Roy.Sun@amd.com
2021-05-05 09:26:53 +02:00
Evan Quan
c83c4e1912 drm/amdgpu: add new MC firmware for Polaris12 32bit ASIC
Polaris12 32bit ASIC needs a special MC firmware.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-05-04 17:35:25 -04:00
Christian König
d79025c7f5 drm/ttm: always initialize the full ttm_resource v2
Init all fields in ttm_resource_alloc() when we create a new resource.

v2: use place->mem_type instead of res->mem_type

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210430092508.60710-2-christian.koenig@amd.com
2021-05-03 12:50:41 +02:00
Simon Ser
e0c16eb4b3 amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create
This error code-path is missing a drm_gem_object_put call. Other
error code-paths are fine.

Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 1769152ac6 ("drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-04-29 00:04:27 -04:00
Kai-Heng Feng
8c3dd61cfa drm/amdgpu: Register VGA clients after init can no longer fail
When an amdgpu device fails to init, it makes another VGA device cause
kernel splat:
kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed
kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
kernel: amdgpu: probe of 0000:08:00.0 failed with error -110
...
kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] SMP NOPTI
kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G        W         5.12.0-rc8+ #12
kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020
kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu]
kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0
kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000
kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38
kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246
kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000
kernel: FS:  00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  vga_arbiter_notify_clients.part.0+0x4a/0x80
kernel:  vga_get+0x17f/0x1c0
kernel:  vga_arb_write+0x121/0x6a0
kernel:  ? apparmor_file_permission+0x1c/0x20
kernel:  ? security_file_permission+0x30/0x180
kernel:  vfs_write+0xca/0x280
kernel:  ksys_write+0x67/0xe0
kernel:  __x64_sys_write+0x1a/0x20
kernel:  do_syscall_64+0x38/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f93041e02f7
kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7
kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f
kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0
kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808
kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980
kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic
kernel: CR2: 0000000000000018
kernel: ---[ end trace 76d04313d4214c51 ]---

Commit 4192f7b576 ("drm/amdgpu: unmap register bar on device init
failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(),
so the VGA clients remain registered. So when
vga_arbiter_notify_clients() iterates over registered clients, it causes
NULL pointer dereference.

Since there's no reason to register VGA clients that early, so solve
the issue by putting them after all the goto cleanups.

v2:
 - Remove redundant vga_switcheroo cleanup in failed: label.

Fixes: 4192f7b576 ("drm/amdgpu: unmap register bar on device init failure")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29 00:03:21 -04:00
Pavan Kumar Ramayanam
b45aeb2dea drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown
The runtime resume PM op disregards the return value from
amdgpu_device_resume(), masking errors for failed resumes at the PM
layer.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pavan Kumar Ramayanam <pavan.ramayanam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29 00:03:11 -04:00
Victor Zhao
4b12ee6f42 drm/amdgpu: fix r initial values
Sriov gets suspend of IP block <dce_virtual> failed as return
value was not initialized.

v2: return 0 directly to align original code semantic before this
was broken out into a separate helper function instead of setting
initial values

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-04-29 00:02:40 -04:00
Christian König
20a5f5a98e drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2
Starting with Vega the hardware supports concurrent flushes
of VMID which can be used to implement per process VMID
allocation.

But concurrent flushes are mutual exclusive with back to
back VMID allocations, fix this to avoid a VMID used in
two ways at the same time.

v2: don't set ring to NULL

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-04-28 23:45:54 -04:00
Dennis Li
0e0036c7d1 drm/amdgpu: fix no full coverage issue for gprs initialization
The wave's number per simd in aldebaran is changed to 8, so it is
impossible to use old algorithm to initiate all sgprs with one
threadgroup. The new algorithm firstly use three threadgroups to
initiate most sgprs simultaneously and then use another threadgroup with
4 waves to cover other uninitiated sgprs.

v2:
Add more description about the new algorithm to clear sgprs and add some
comment for shader binaries

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:36:05 -04:00
Philip Yang
36255b5f61 drm/amdgpu: address remove from fault filter
Add interface to remove address from fault filter ring by resetting
fault ring entry key, then future vm fault on the address will be
processed to recover.

Define fault key as atomic64_t type to use atomic read/set/cmpxchg key
to protect fault ring access by interrupt handler and interrupt deferred
work for vg20. Change fault->timestamp to 48-bit to share same uint64_t
with 8-bit fault->next, it is enough for 48bit IH timestamp.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:36:05 -04:00
Philip Yang
11dd55d174 drm/amdgpu: return IH ring drain finished if ring is empty
Sometimes IH do not setup ring wptr overflow flag after wptr exceed
rptr. As a workaround, if IH rptr equals to wptr, ring is empty,
return true to indicate IH ring checkpoint is processed, IH ring drain
is finished.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:36:05 -04:00
Jack Zhang
95ea3dbc4e drm/amd/amdgpu/sriov disable all ip hw status by default
Disable all ip's hw status to false before any hw_init.
Only set it to true until its hw_init is executed.

The old 5.9 branch has this change but somehow the 5.11 kernrel does
not have this fix.

Without this change, sriov tdr have gfx IB test fail.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Review-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:36:05 -04:00
Christian König
dd03daec0f drm/amdgpu: restructure amdgpu_vram_mgr_new
Merge the two loops, loosen the restriction for big allocations.
This reduces the CPU overhead in the good case, but increases
it a bit under memory pressure.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-and-Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:36:05 -04:00
Feifei Xu
5d11699914 drm/amdgpu: Correct and simplify sdma 4.x irq.num_types
Correct and init the sdma4.x irq.num_types.

v2: squash in fix (Alex)

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:36:04 -04:00
Feifei Xu
3d2bee9188 drm/amdgpu: Change the sdma interrupt print level
Change the print level into debug.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:35:51 -04:00
Victor Zhao
041e69160d drm/amdgpu/sriov: Remove clear vf fw support
PSP clear_vf_fw feature is outdated and has been removed.
Remove the related functions.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:35:51 -04:00
Hawking Zhang
a30f128602 drm/amdgpu: provide socket/die id info in RAS message
Add socket/die information in RAS messages for platforms
that support query those information

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:35:50 -04:00
Hawking Zhang
dfdd4b8a95 drm/amdgpu: implement smuio callback to query socket id
get_socket_id is used to query socket id

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:35:49 -04:00
Jinzhou Su
ec0f72cb95 drm/amdgpu: Enable SDMA LS for Vangogh
Add flags AMD_CG_SUPPORT_SDMA_LS for Vangogh.
Start to open sdma ls from firmware version 70.

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:35:49 -04:00
Souptick Joarder
5f5cb2afd6 drm/amdgpu: Added missing prototype
Kernel test robot throws below warning ->

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:125:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_load'
>> [-Wmissing-prototypes]
     125 | int kgd_arcturus_hqd_sdma_load(struct kgd_dev *kgd, void
*mqd,

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:195:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_dump'
>> [-Wmissing-prototypes]
     195 | int kgd_arcturus_hqd_sdma_dump(struct kgd_dev *kgd,

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:227:6: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_is_occupied'
>> [-Wmissing-prototypes]
     227 | bool kgd_arcturus_hqd_sdma_is_occupied(struct kgd_dev *kgd,
void *mqd)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:246:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_destroy'
>> [-Wmissing-prototypes]
     246 | int kgd_arcturus_hqd_sdma_destroy(struct kgd_dev *kgd, void
*mqd,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Added prototype for these functions.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:35:49 -04:00
Lyude Paul
0c4fada608 drm/dp: Pass drm_dp_aux to drm_dp*_link_train_channel_eq_delay()
So that we can start using drm_dbg_*() for
drm_dp_link_train_channel_eq_delay() and
drm_dp_lttpr_link_train_channel_eq_delay().

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-7-lyude@redhat.com
Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27 18:43:42 -04:00
Lyude Paul
9e98666644 drm/dp: Pass drm_dp_aux to drm_dp_link_train_clock_recovery_delay()
So that we can start using drm_dbg_*() in
drm_dp_link_train_clock_recovery_delay().

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-6-lyude@redhat.com
Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27 18:43:42 -04:00
Lyude Paul
6cba3fe433 drm/dp: Add backpointer to drm_device in drm_dp_aux
This is something that we've wanted for a while now: the ability to
actually look up the respective drm_device for a given drm_dp_aux struct.
This will also allow us to transition over to using the drm_dbg_*() helpers
for debug message printing, as we'll finally have a drm_device to reference
for doing so.

Note that there is one limitation with this - because some DP AUX adapters
exist as platform devices which are initialized independently of their
respective DRM devices, one cannot rely on drm_dp_aux->drm_dev to always be
non-NULL until drm_dp_aux_register() has been called. We make sure to point
this out in the documentation for struct drm_dp_aux.

v3:
* Add WARN_ON_ONCE() to drm_dp_aux_register() if drm_dev isn't filled out

Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210423184309.207645-4-lyude@redhat.com
Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-04-27 18:43:42 -04:00
Maxime Ripard
355b602961
Merge drm/drm-next into drm-misc-next
Christian needs some patches from drm/next

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2021-04-26 14:03:09 +02:00
Christian König
3dc7216c1d drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2
Starting with Vega the hardware supports concurrent flushes
of VMID which can be used to implement per process VMID
allocation.

But concurrent flushes are mutual exclusive with back to
back VMID allocations, fix this to avoid a VMID used in
two ways at the same time.

v2: don't set ring to NULL

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:19:05 -04:00
Nirmoy Das
42daecfc20 drm/amdgpu: remove AMDGPU_GEM_CREATE_SHADOW flag
Remove unused AMDGPU_GEM_CREATE_SHADOW flag.
Userspace is never allowed to use this.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:18:56 -04:00
Nirmoy Das
cd2454d6cd drm/amdgpu: cleanup amdgpu_bo_create()
Remove shadow bo related code as vm code is creating shadow bo
using proper API. Without shadow bo code, amdgpu_bo_create() is basically
a wrapper around amdgpu_bo_do_create(). So rename amdgpu_bo_do_create()
to amdgpu_bo_create().

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:18:47 -04:00
Nirmoy Das
adf6f5c51e drm/amdgpu: create shadow bo using amdgpu_bo_create_shadow()
Shadow BOs are only needed for vm code so call amdgpu_bo_create_shadow()
directly instead of depending on amdgpu_bo_create().

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:18:32 -04:00
Nirmoy Das
77df5c131d drm/amdgpu: remove unused vm context flags
Remove unused AMDGPU_VM_CONTEXT_GFX and AMDGPU_VM_CONTEXT_COMPUTE
flags.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:18:26 -04:00
Nirmoy Das
a35455d065 drm/amdgpu: cleanup amdgpu_vm_init()
Currently only way to create compute vm is through
amdgpu_vm_make_compute(). So vm_context isn't required
anymore for amdgpu_vm_init().

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:18:19 -04:00
Nirmoy Das
25e9146ae6 drm/amdgpu: expose amdgpu_bo_create_shadow()
Exposed amdgpu_bo_create_shadow() will be needed
for amdgpu_vm handling.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:18:10 -04:00
Christian König
589939d401 drm/amdgpu: fix coding style and documentation in amdgpu_vram_mgr.c
No functional changes, just cleaning up some leftovers and improve
documentation.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.aiemd@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:18:02 -04:00
Christian König
a614b336f1 drm/amdgpu: fix coding style and documentation in amdgpu_gtt_mgr.c
Avoid the forward define, fix coding style, add documentation.

No functional change.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.aiemd@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:17:56 -04:00
Alex Sierra
126bbd4ab5 drm/amdgpu: extend xnack limit page fault timeout
Extending this timeout will prevent IH from storm interrupts coming
from SDMA while a page fault is active. Currently, on Aldebaran,
handling that many interrupts can take a lot of CPU time
(up to 4 seconds).
This eventually causes timeouts in other tasks.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:16:20 -04:00
Hawking Zhang
502f0e2804 drm/amdgpu: disable gfx ras by default in aldebaran
aldebaran gfx ras is still under development

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:16:14 -04:00
Kenneth Feng
1d712be90a drm/amd/amdgpu: add cgls
enable cgls to improve the runtime power efficiency.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:15:59 -04:00
Stanley.Yang
19d0dfda4c drm/amdgpu: optimize gfx ras features flag clean
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:15:52 -04:00
Jinzhou Su
ef9bcfde9e drm/amdgpu: Enable SDMA MGCG for Vangogh
Add flags AMD_CG_SUPPORT_SDMA_MGCG for Vangogh.
Start to open sdma mgcg from firmware version 70.

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:15:39 -04:00
John Clements
7e882aee84 drm/amdgpu: add support for ras init flags
conditionally configure ras for dgpu mode or poison propogation mode

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:15:33 -04:00
Dennis Li
6effe77972 drm/amdgpu: refine gprs init shaders to check coverage
Add codes to check whether all SIMDs are covered, make sure that all
GPRs are initialized.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-23 17:15:21 -04:00
Christian König
c777dc9e79 drm/ttm: move the page_alignment into the BO v2
The alignment is a constant property and shouldn't change.

v2: move documentation as well as suggested by Matthew.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210413135248.1266-4-christian.koenig@amd.com
2021-04-23 16:23:02 +02:00
Lee Jones
3bffd71deb drm/amd/amdgpu/amdgpu_cs: Repair some function naming disparity
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:685: warning: expecting prototype for cs_parser_fini(). Prototype was for amdgpu_cs_parser_fini() instead
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1502: warning: expecting prototype for amdgpu_cs_wait_all_fence(). Prototype was for amdgpu_cs_wait_all_fences() instead
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1656: warning: expecting prototype for amdgpu_cs_find_bo_va(). Prototype was for amdgpu_cs_find_mapping() instead

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Jerome Glisse <glisse@freedesktop.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:51:13 -04:00
Lee Jones
03691f5502 drm/amd/amdgpu/amdgpu_ring: Provide description for 'sched_score'
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:169: warning: Function parameter or member 'sched_score' not described in 'amdgpu_ring_init'

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:51:09 -04:00
Lee Jones
27aa4a69b4 drm/amd/amdgpu/amdgpu_ttm: Fix incorrectly documented function 'amdgpu_ttm_copy_mem_to_mem()'
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:311: warning: expecting prototype for amdgpu_copy_ttm_mem_to_mem(). Prototype was for amdgpu_ttm_copy_mem_to_mem() instead

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Jerome Glisse <glisse@freedesktop.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:51:06 -04:00
Lee Jones
777d9000d9 drm/amd/amdgpu/amdgpu_gart: Correct a couple of function names in the docs
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:73: warning: expecting prototype for amdgpu_dummy_page_init(). Prototype was for amdgpu_gart_dummy_page_init() instead
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:96: warning: expecting prototype for amdgpu_dummy_page_fini(). Prototype was for amdgpu_gart_dummy_page_fini() instead

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Nirmoy Das <nirmoy.das@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:51:02 -04:00
Lee Jones
b16cc4bb1a drm/amd/amdgpu/amdgpu_fence: Provide description for 'sched_score'
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:444: warning: Function parameter or member 'sched_score' not described in 'amdgpu_fence_driver_init_ring'

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Jerome Glisse <glisse@freedesktop.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:50:59 -04:00
Lee Jones
2196927bcb drm/amd/amdgpu/amdgpu_device: Remove unused variable 'r'
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_suspend’:
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3733:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:50:48 -04:00
Alex Sierra
485bea1f90 drm/amdgpu: add svm_bo eviction to enable_signal cb
Add to amdgpu_amdkfd_fence.enable_signal callback, support
for svm_bo fence eviction.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:49:56 -04:00
Alex Sierra
5f319c5c21 drm/amdgpu: svm bo enable_signal call condition
[why]
To support svm bo eviction mechanism.

[how]
If the BO crated has AMDGPU_AMDKFD_CREATE_SVM_BO flag set,
enable_signal callback will be called inside amdgpu_evict_flags.
This also causes gutting of the BO by removing all placements,
so that TTM won't actually do an eviction. Instead it will discard
the memory held by the BO. This is needed for HMM migration to user
mode system memory pages.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:49:49 -04:00
Alex Sierra
f04c79cfba drm/amdgpu: add param bit flag to create SVM BOs
Add CREATE_SVM_BO define bit for SVM BOs.
Another define flag was moved to concentrate these
KFD type flags in one include file.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:49:31 -04:00
Alex Sierra
eb2cec5537 drm/amdkfd: add svm_bo reference for eviction fence
[why]
As part of the SVM functionality, the eviction mechanism used for
SVM_BOs is different. This mechanism uses one eviction fence per prange,
instead of one fence per kfd_process.

[how]
A svm_bo reference to amdgpu_amdkfd_fence to allow differentiate between
SVM_BO or regular BO evictions. This also include modifications to set the
reference at the fence creation call.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:49:22 -04:00
Alex Sierra
ea53af8a59 drm/amdkfd: SVM API call to restore page tables
Use SVM API to restore page tables when retry fault and
compute context are enabled.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:49:15 -04:00
Alex Sierra
9dd9cc2f74 drm/amdgpu: enable 48-bit IH timestamp counter
By default this timestamp is 32 bit counter. It gets
overflowed in around 10 minutes.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:48:58 -04:00
Philip Yang
c46ebb6a6d drm/amdkfd: set memory limit to avoid OOM with HMM enabled
HMM migration alloc sizeof(struct page) on system memory for each VRAM
page, it is 1GB system memory reserved for 64GB VRAM. To avoid
application OOM, increase system memory used size based on VRAM size of
all GPUs, then application alloc memory will fail if system memory usage
reach the limit.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:48:00 -04:00
Felix Kuehling
9705c85ff2 drm/amdgpu: Enable retry faults unconditionally on Aldebaran
This is needed to allow per-process XNACK mode selection in the SQ when
booting with XNACK off by default.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:47:34 -04:00
Felix Kuehling
f80fe9d3c1 drm/amdkfd: map svm range to GPUs
Use amdgpu_vm_bo_update_mapping to update GPU page table to map or unmap
svm range system memory pages address to GPUs.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:47:21 -04:00
Philip Yang
d27afacfea drm/amdgpu: export vm update mapping interface
It will be used by kfd to map svm range to GPU, because svm range does
not have amdgpu_bo and bo_va, cannot use amdgpu_bo_update interface, use
amdgpu vm update interface directly.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:47:13 -04:00
Philip Yang
d8a3c1c80c drm/amdkfd: support larger svm range allocation
For larger range allocation, if hmm_range_fault return -EBUSY, set retry
timeout based on 1 second for every 512MB, this is safe timeout value.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:46:50 -04:00
Philip Yang
04d8d73dbc drm/amdgpu: add common HMM get pages function
Move the HMM get pages function from amdgpu_ttm and to amdgpu_mn. This
common function will be used by new svm APIs.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:46:43 -04:00
Felix Kuehling
cccbeb6209 drm/amdgpu: Remove verify_access shortcut for KFD BOs
This shortcut is no longer needed with access managed properly by KFD.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:46:01 -04:00
Felix Kuehling
d4ec4bdc0b drm/amdkfd: Allow access for mmapping KFD BOs
DRM render node file handles are used for CPU mapping of BOs using mmap
by the Thunk. It uses the DRM render node of the GPU where the BO was
allocated.

DRM allows mmap access automatically when it creates a GEM handle for a
BO. KFD BOs don't have GEM handles, so KFD needs to manage access
manually. Use drm_vma_node_allow to allow user mode to mmap BOs allocated
with kfd_ioctl_alloc_memory_of_gpu through the DRM render node that was
used in the kfd_ioctl_acquire_vm call for the same GPU.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:45:54 -04:00
Felix Kuehling
b40a6ab2cf drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap
to access the BO through the corresponding file descriptor. The VM can
also be extracted from drm_priv, so drm_priv can replace the vm parameter
in the kfd2kgd interface.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:45:45 -04:00
Alex Deucher
7845d80dda drm/amdgpu/gmc9: remove dummy read workaround for newer chips
Aldebaran has a hw fix so no longer requires the workaround.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:45:36 -04:00
Jinzhou Su
5c88e3b86a drm/amdgpu: Add mem sync flag for IB allocated by SA
The buffer of SA bo will be used by many cases. So it's better
to invalidate the cache of indirect buffer allocated by SA before
commit the IB.

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:45:24 -04:00
Mukul Joshi
ceb47e0d84 drm/amdgpu: Fix SDMA RAS error reporting on Aldebaran
Fix the following issues with SDMA RAS error reporting:
1. Read the EDC_COUNTER2 register also to fetch error counts
   for all sub-blocks in SDMA.
2. SDMA RAS on Aldebaran suports single-bit uncorrectable errors
   only. So, report error count in UE count instead of CE count.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-By: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:45:17 -04:00
Mukul Joshi
1f0d8e3781 drm/amdgpu: Reset RAS error count and status regs
Reset the RAS error count and error status registers after
reading to prevent over reporting error counts on Aldebaran.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-By: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:45:10 -04:00
Oak Zeng
5f41741a6d Revert "drm/amdgpu: workaround the TMR MC address issue (v2)"
This reverts commit 2f055097da.
2f055097da was a driver workaround
when PSP firmware was not ready. Now the PSP fw is ready so we
revert this driver workaround.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:43:49 -04:00
Jiansong Chen
7c49ee9ec5 drm/amdgpu: fix GCR_GENERAL_CNTL offset for dimgrey_cavefish
dimgrey_cavefish has similar gc_10_3 ip with sienna_cichlid,
so follow its registers offset setting.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:36:28 -04:00
John Clements
f9727922fc drm/amdgpu: resolve erroneous gfx_v9_4_2 prints
resolve bug on aldebaran where gfx error counts will
print on driver load when there are no errors present

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:36:10 -04:00
Dennis Li
6df23f4c5c drm/amdgpu: fix a error injection failed issue
because "sscanf(str, "retire_page")" always return 0, if application use
the raw data for error injection, it always wrongly falls into "op ==
3". Change to use strstr instead.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:36:01 -04:00
Hawking Zhang
1f8d3ad2a0 drm/amdgpu: only harvest gcea/mmea error status in aldebaran
In aldebaran, driver only needs to harvest SDP
RdRspStatus, WrRspStatus and first parity error
on RdRsp data. Check error type before harvest
error information.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:35:55 -04:00
Hawking Zhang
53ee6609b4 drm/amdgpu: only harvest gcea/mmea error status in arcturus
SDP RdRspStatus/WrRspStatus or first parity error on
RdRsp data can cause system fatal error in arcturus.
GPU will be freezed in such case.

Driver needs to harvest these error information before
reset the GPU. Check error type to avoid harvest normal
gcea/mmea information.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:35:45 -04:00
Huang Rui
9406d39bb6 drm/amdgpu: enable tmz on renoir asics
The tmz functions are verified on renoir chips as well. So enable it by
default.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:35:36 -04:00
Hawking Zhang
28a5d7a589 drm/amdgpu: correct default gfx wdt timeout setting
When gfx wdt was configured to fatal_disable, the
timeout period should be configured to 0x0 (timeout
disabled)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:35:29 -04:00
Christian König
ce4528daf5 drm/amdgpu: check base size instead of mem.num_pages
Drop some ussage of mem in the code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210413135248.1266-2-christian.koenig@amd.com
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2021-04-19 15:05:23 +02:00
Christian König
e2ac853156 drm/amdgpu: freeing pinned objects is illegal now
We want to drop support in TTM for this.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210415084730.2057-2-christian.koenig@amd.com
2021-04-19 14:20:36 +02:00
Christian König
2f40801dc5 drm/amdgpu: make sure we unpin the UVD BO
Releasing pinned BOs is illegal now.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210415084730.2057-1-christian.koenig@amd.com
2021-04-19 14:15:50 +02:00
Dan Carpenter
90cb3d8aca drm/amdgpu: fix an error code in init_pmu_entry_by_type_and_add()
If the kmemdup() fails then this should return a negative error code
but it currently returns success

Fixes: b4a7db71ea ("drm/amdgpu: add per device user friendly xgmi events for vega20")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:45 -04:00
Qingqing Zhuo
fe18017839 Revert "Revert "drm/amdgpu: Ensure that the modifier requested is supported by plane.""
This reverts commit 55fa622fe6.

The regression caused by the original patch has been
cleared, thus introduce back the change.

Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:45 -04:00
Joseph Greathouse
47e5d79a45 drm/amdgpu: Copy MEC FW version to MEC2 if we skipped loading MEC2
If we skipped loading MEC2 firmware separately
from MEC, then MEC2 will be running the same
firmware image. Copy the MEC version and feature
numbers into MEC2 version and feature numbers.
This is needed for things like GWS support, where
we rely on knowing what version of firmware is
running on MEC2. Leaving these MEC2 entries blank
breaks our ability to version-check enables and
workarounds.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:45 -04:00
Felix Kuehling
f45e6b9d03 drm/amdkfd: Remove legacy code not acquiring VMs
ROCm user mode has acquired VMs from DRM file descriptors for as long
as it supported the upstream KFD. Legacy code to support older versions
of ROCm is not needed any more.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Ramesh Errabolu
ba5b662c36 drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO
Extend current implementation of SG_TABLE construction method to
allow exportation of sub-buffers of a VRAM BO. This capability will
enable logical partitioning of a VRAM BO into multiple non-overlapping
sub-buffers. One example of this use case is to partition a VRAM BO
into two sub-buffers, one for SRC and another for DST.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Luben Tuikov
546aa546b0 drm/amdgpu: Add double-sscanf but invert
Add back the double-sscanf so that both decimal
and hexadecimal values could be read in, but this
time invert the scan so that hexadecimal format
with a leading 0x is tried first, and if that
fails, then try decimal format.

Also use a logical-AND instead of nesting double
if-conditional.

See commit "drm/amdgpu: Fix a bug for input with double sscanf"

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Kenneth Feng
b960cb25b1 drm/amd/amdgpu: add ASPM support on polaris
add ASPM support on polaris

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Kenneth Feng
9d015c0dae drm/amd/amdgpu: enable ASPM on vega
enable ASPM on vega to save the power
without the performance hurt.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Kenneth Feng
3273f8b9e6 drm/amd/amdgpu: enable ASPM on navi1x
enable ASPM on navi1x for the benifit of system power consumption
without performance hurt.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Jack Zhang
d4abd00663 drm/amd/sriov no need to config GECC for sriov
No need to config GECC feature here for sriov
Leave the host drvier to do the configuration job.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Luben Tuikov
737c375b88 drm/amdgpu: Fix kernel-doc for the RAS sysfs interface
Imporve the kernel-doc for the RAS sysfs
interface. Fix the grammar, fix the context.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Luben Tuikov
7fb6407145 drm/amdgpu: Add bad_page_cnt_threshold to debugfs
Add bad_page_cnt_threshold to debugfs, an optional
file system used for debugging, for reporting
purposes only--it usually matches the size of
EEPROM but may be different depending on the
"bad_page_threshold" kernel module option.

The "bad_page_cnt_threshold" is a dynamically
computed value. It depends on three things: the
VRAM size; the size of the EEPROM (or the size
allocated to the RAS table therein); and the
"bad_page_threshold" module parameter. It is a
dynamically computed value, when the amdgpu module
is run, on which further parameters and logic
depend, and as such it is helpful to see the
dynamically computed value in debugfs.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Luben Tuikov
80b0cd0fb9 drm/amdgpu: Fix a bug in checking the result of reserve page
Fix if (ret) --> if (!ret), a bug, for
"retire_page", which caused the kernel to recall
the method with *pos == end of file, and that
bounced back with error. On the first run, we
advanced *pos, but returned 0 back to fs layer,
also a bug.

Fix the logic of the check of the result of
amdgpu_reserve_page_direct()--it is 0 on success,
and non-zero on error, not the other way
around. This patch fixes this bug.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Luben Tuikov
6cb7a1d40a drm/amdgpu: Fix a bug for input with double sscanf
Remove double-sscanf to scan for %llu and 0x%llx,
as that is not going to work!

The %llu will consume the "0" in "0x" of your
input, and the hex value you think you're entering
will always be 0. That is, a valid hex value can
never be consumed.

On the other hand, just entering a hex number
without leading 0x will either be scanned as a
string and not match, for instance FAB123, or
the leading decimal portion is scanned as the
%llu, for instance 123FAB will be scanned as 123,
which is not correct.

Thus remove the first %llu scan and leave only the
%llx scan, removing the leading 0x since %llx can
scan either.

Addresses are usually always hex values, so this
suffices.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Xinhui Pan <xinhui.pan@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Jinzhou Su
b45fdeab45 drm/amdgpu: Add graphics cache rinse packet for sdma
Add emit mem sync callback for sdma_v5_2

In amdgpu sync object test, three threads created jobs
to send GFX IB and SDMA IB in sequence. After the first
GFX thread joined, sometimes the third thread will reuse
the same physical page to store the SDMA IB. There will
be a risk that SDMA will read GFX IB in the previous physical
page. So it's better to flush the cache before commit sdma IB.

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
Eric Huang
6890f4cb9a drm/amdkfd: change MTYPEs for Aldebaran's HW requirement
Due to changes of HW memory model, we need to change Aldebaran
MTYPEs to meet HW changes.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:03:27 -04:00
Oak Zeng
36c082378c drm/amdgpu: Introduce new SETUP_TMR interface
This new interface passes both virtual and physical address
to PSP. It is backward compatible with old interface.

v2: use a function to simplify tmr physical address calc (Lijo)

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:03:14 -04:00
Oak Zeng
0ca565ab97 drm/amdgpu: Calling address translation functions to simplify codes
Use amdgpu_gmc_vram_pa and amdgpu_gmc_vram_cpu_pa
to simplify codes. No logic change.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:03:01 -04:00
Oak Zeng
dead5e421a drm/amdgpu: Introduce functions for vram physical addr calculation
Add one function to calculate BO's GPU physical address.
And another function to calculate BO's CPU physical address.

v2: Use functions vs macros (Christian)
    Use more proper function names (Christian)

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:02:54 -04:00
John Clements
651a032121 drm/amdgpu: update gfx 9.4.2 ras error reporting
only output ras error status if an error bit is set or error counter is set

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:02:47 -04:00
John Clements
89514083f8 drm/amdgpu: update mmhub 1.7 ras error reporting
only output ras error status if an error bit is set

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:02:39 -04:00
Thomas Zimmermann
6848c291a5 drm/aperture: Convert drivers to aperture interfaces
Mass-convert all drivers from FB helpers to aperture interfaces. No
functional changes besides checking for returned errno codes.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210412131043.5787-3-tzimmermann@suse.de
2021-04-14 09:00:04 +02:00
John Clements
cbb8f989d5 drm/amdgpu: page retire over debugfs mechanism
added support in RAS debugfs to add bad page for isolated page retirement testing

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:58:28 -04:00
John Clements
52a9df8180 drm/amdgpu: enable ras eeprom on aldebaran
enable ras eeprom loading by default on aldebaran

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:54:53 -04:00
John Clements
134d16d50f drm/amdgpu: RAS harvest on driver load
In event of RAS UE + warm reset, error counters shall be harvested and cleared on driver load

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:54:51 -04:00
Jude Shih
f066af882b drm/amdgpu: add DMUB outbox event IRQ source define/complete/debug flag
[Why & How]
We use outbox interrupt that allows us to do the AUX via DMUB
Therefore, we need to add some irq source related definition
in the header files;

Signed-off-by: Jude Shih <shenshih@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:54:42 -04:00
xinhui pan
b16e685725 drm/amdgpu: Fix size overflow
ttm->num_pages is uint32. Hit overflow when << PAGE_SHIFT directly

Fixes: 230c079fdc ("drm/ttm: make num_pages uint32_t")
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:54:13 -04:00
Hawking Zhang
d844c6d747 drm/amdgpu: move mmhub ras_func init to ip specific file
mmhub ras is always owned by gpu driver. ras_funcs
initialization shall be done at ip level, instead of
putting it in common gmc interface file

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:54:10 -04:00
Thomas Zimmermann
e90f8be3b9 drm/amdgpu: Remove unused function amdgpu_bo_fbdev_mmap()
Remove an unused function. Mapping the fbdev framebuffer is apparently
not supported.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:54:05 -04:00
Qingqing Zhuo
55fa622fe6 Revert "drm/amdgpu: Ensure that the modifier requested is supported by plane."
This reverts commit f4a9be998c.

The original commit was found to cause the following two issues
on sienna cichlid:
1. Refresh rate locked during vrrdemo
2. Display sticks on flipped landscape mode after changing
   orientation, and cannot be changed back to regular landscape

Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:53:41 -04:00
Hawking Zhang
719a9b3323 drm/amdgpu: split gfx callbacks into ras and non-ras ones
gfx ras is only available in cerntain ip generations.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:22 -04:00
Hawking Zhang
8bc7b360ad drm/amdgpu: split mmhub callbacks into ras and non-ras ones
mmhub ras is only avaiable in cerntain mmhub ip
generation.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:19 -04:00
Hawking Zhang
68d705dd6a drm/amdgpu: do not register df_mca interrupt in certain config
df/mca ras is not managed by gpu driver when gpu
is connected to cpu through xgmi. gpu driver should
register x86 mca notifier for umc ras error
notification

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:14 -04:00
Hawking Zhang
49070c4ea3 drm/amdgpu: split umc callbacks to ras and non-ras ones
umc ras is not managed by gpu driver when gpu is
connected to cpu through xgmi. split umc callbacks
into ras and non-ras ones so gpu driver only
initializes umc ras callbacks when it manages
umc ras.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:11 -04:00
Hawking Zhang
52137ca852 drm/amdgpu: move xgmi ras functions to xgmi_ras_funcs
xgmi ras is not managed by gpu driver when gpu is
connected to cpu through xgmi. move all xgmi ras
functions to xgmi_ras_funcs so gpu driver only
initializes xgmi ras functions when it manages
xgmi ras.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:07 -04:00
Hawking Zhang
6e36f23193 drm/amdgpu: split nbio callbacks into ras and non-ras ones
nbio ras is not managed by gpu driver when gpu is
connected to cpu through xgmi. split nbio callbacks
into ras and non-ras ones so gpu driver only
initializes nbio ras callbacks when it manages
nbio ras.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:04 -04:00
Hawking Zhang
87da0cc101 drm/amdgpu: implement query_ras_error_address callback
query_ras_error_address will be invoked to query bad
page address when there is poison data in HBM consumed
by GPU engines.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:01 -04:00
Hawking Zhang
878b9e944c drm/amdgpu: implement umc query error count callback
umc query_ras_error_count will be invoked to query
umc correctable and uncorrectable error. It will
reset the umc ras error counter after the query.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:58 -04:00
Hawking Zhang
3f903560d1 drm/amdgpu: add helper funtion to query umc ras error
Add helper functions to query correctable and
uncorrectable umc ras error.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:56 -04:00
Hawking Zhang
1696bf3589 drm/amdgpu: create umc_v6_7_funcs for aldebaran
umc_v6_7_funcs are callbacks to support umc ras
functionalities in aldebaran

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:52 -04:00
Hawking Zhang
75f06251c9 drm/amdgpu: initialze ras caps per paltform config
Driver only manages GFX/SDMA/MMHUB RAS in platforms
that gpu node is connected to cpu through XGMI, other
than that, it queries VBIOS for RAS capabilities.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:48 -04:00
Alex Deucher
c108aef148 drm/amdgpu: drop some unused atombios functions
These were leftover from the old CI dpm code which was
retired a while ago.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:34 -04:00
Bernard Zhao
f08726868c drm/amd: cleanup coding style a bit
Fix patch check warning:
WARNING: suspect code indent for conditional statements (8, 17)
+       if (obj && obj->use < 0) {
+                DRM_ERROR("RAS ERROR: Unbalance obj(%s) use\n", obj->head.name);

WARNING: braces {} are not necessary for single statement blocks
+       if (obj && obj->use < 0) {
+                DRM_ERROR("RAS ERROR: Unbalance obj(%s) use\n", obj->head.name);
+       }

Signed-off-by: Bernard Zhao <bernard@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:31 -04:00
Stanley.Yang
5a43452704 drm/amdgpu: support sdma error injection
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reivewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:23 -04:00
Philip Yang
2b665c3735 drm/amdgpu: reserve fence slot to update page table
Forgot to reserve a fence slot to use sdma to update page table, cause
below kernel BUG backtrace to handle vm retry fault while application is
exiting.

[  133.048143] kernel BUG at /home/yangp/git/compute_staging/kernel/drivers/dma-buf/dma-resv.c:281!
[  133.048487] Workqueue: events amdgpu_irq_handle_ih1 [amdgpu]
[  133.048506] RIP: 0010:dma_resv_add_shared_fence+0x204/0x280
[  133.048672]  amdgpu_vm_sdma_commit+0x134/0x220 [amdgpu]
[  133.048788]  amdgpu_vm_bo_update_range+0x220/0x250 [amdgpu]
[  133.048905]  amdgpu_vm_handle_fault+0x202/0x370 [amdgpu]
[  133.049031]  gmc_v9_0_process_interrupt+0x1ab/0x310 [amdgpu]
[  133.049165]  ? kgd2kfd_interrupt+0x9a/0x180 [amdgpu]
[  133.049289]  ? amdgpu_irq_dispatch+0xb6/0x240 [amdgpu]
[  133.049408]  amdgpu_irq_dispatch+0xb6/0x240 [amdgpu]
[  133.049534]  amdgpu_ih_process+0x9b/0x1c0 [amdgpu]
[  133.049657]  amdgpu_irq_handle_ih1+0x21/0x60 [amdgpu]
[  133.049669]  process_one_work+0x29f/0x640
[  133.049678]  worker_thread+0x39/0x3f0
[  133.049685]  ? process_one_work+0x640/0x640

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:20 -04:00
Peng Ju Zhou
5e025531b7 drm/amdgpu: indirect register access for nv12 sriov
1. expand rlcg interface for gc & mmhub indirect access
2. add rlcg interface for no kiq

v2: squash in fix for gfx9 (Changfeng)

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:17 -04:00
Peng Ju Zhou
5d23851029 drm/amdgpu: indirect register access for nv12 sriov
using the control bits got from host to control registers access.

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:13 -04:00
Peng Ju Zhou
77eabc6f59 drm/amdgpu: indirect register access for nv12 sriov
get pf2vf msg info at it's earliest time so that
guest driver can use these info to decide whether
register indirect access enabled.

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:09 -04:00
Peng Ju Zhou
8b8a162da8 drm/amdgpu: indirect register access for nv12 sriov
unify host driver and guest driver indirect access
control bits names

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:06 -04:00
Xℹ Ruoyao
9a89a721b4 drm/amdgpu: check alignment on CPU page for bo map
The page table of AMDGPU requires an alignment to CPU page so we should
check ioctl parameters for it.  Return -EINVAL if some parameter is
unaligned to CPU page, instead of corrupt the page table sliently.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:00 -04:00
Huacai Chen
f4d3da72a7 drm/amdgpu: Set a suitable dev_info.gart_page_size
In Mesa, dev_info.gart_page_size is used for alignment and it was
set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU
driver requires an alignment on CPU pages.  So, for non-4KB page system,
gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).

Signed-off-by: Rui Wang <wangr@lemote.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1
[Xi: rebased for drm-next, use max_t for checkpatch,
     and reworded commit message.]
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Tested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:49:53 -04:00
Guchun Chen
9973de10b5 drm/amdgpu: fix compiler warning(v2)
warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);

v2: put short variable declaration last

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:49:42 -04:00
Guchun Chen
3c3dc65433 drm/amdgpu: fix NULL pointer dereference
ttm->sg needs to be checked before accessing its child member.

Call Trace:
 amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
 ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm]
 ttm_bo_release+0x17d/0x300 [ttm]
 amdgpu_bo_unref+0x1a/0x30 [amdgpu]
 amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x78b/0x8b0 [amdgpu]
 kfd_ioctl_alloc_memory_of_gpu+0x118/0x220 [amdgpu]
 kfd_ioctl+0x222/0x400 [amdgpu]
 ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
 __x64_sys_ioctl+0x8e/0xd0
 ? __context_tracking_exit+0x52/0x90
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f97f264d317
Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdb402c338 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f97f3cc63a0 RCX: 00007f97f264d317
RDX: 00007ffdb402c380 RSI: 00000000c0284b16 RDI: 0000000000000003
RBP: 00007ffdb402c380 R08: 00007ffdb402c428 R09: 00000000c4000004
R10: 00000000c4000004 R11: 0000000000000246 R12: 00000000c0284b16
R13: 0000000000000003 R14: 00007f97f3cc63a0 R15: 00007f8836200000

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:49:38 -04:00
Rohit Khaire
4d675e1eb8 drm/amdgpu: Add new PF2VF flags for VF register access method
Add 3 sub flags to notify guest for indirect reg access of
gc, mmhub and ih

The host sets these flags depending on L1 RAP version,
asic and other scenarios. These flags ensure that
there is compatibility between different guest/host/vbios versions.

Signed-off-by: Rohit Khaire <rohit.khaire@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:49:22 -04:00
Feifei Xu
0698b13403 drm/amdgpu: skip PP_MP1_STATE_UNLOAD on aldebaran
This message is not needed on Aldebaran.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:50 -04:00
Chengming Gui
4a7ffbdb27 drm/amd/amdgpu: set MP1 state to UNLOAD before reload its FW for vega20/ALDEBARAN
When resume from gpu reset, need set MP1 state to UNLOAD before reload SMU
FW otherwise will cause following errors:
[  121.642772] [drm] reserve 0x400000 from 0x87fec00000 for PSP TMR [  123.801051] [drm] failed to load ucode id (24) [  123.801055] [drm] psp command (0x6) failed and response status is (0x0) [  123.801214] [drm:psp_load_smu_fw [amdgpu]] *ERROR* PSP load smu failed!
[  123.801398] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed [  123.801536] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22 [  123.801632] amdgpu 0000:04:00.0: amdgpu: GPU reset(9) failed [  123.801691] amdgpu 0000:07:00.0: amdgpu: GPU reset(9) failed [  123.802899] amdgpu 0000:04:00.0: amdgpu: GPU reset end with ret = -22

v2: add error info and including ALDEBARAN also

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:43 -04:00
Lijo Lazar
404b277bbe drm/amdgpu: Reset error code for 'no handler' case
If reset handler is not implemented, reset error before proceeding.

Fixes issue with the following trace -
[  106.508592] amdgpu 0000:b1:00.0: amdgpu: ASIC reset failed with error, -38 for drm dev, 0000:b1:00.0
[  106.508972] amdgpu 0000:b1:00.0: amdgpu: GPU reset succeeded, trying to resume
[  106.509116] [drm] PCIE GART of 512M enabled.
[  106.509120] [drm] PTB located at 0x0000008000000000
[  106.509136] [drm] VRAM is lost due to GPU reset!
[  106.509332] [drm] PSP is resuming...

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:37 -04:00
Alex Sierra
03e70a0271 drm/amdgpu: ih reroute for newer asics than vega20
Starting Arcturus, it supports ih reroute through mmio directly
in bare metal environment. This is also valid for newer asics
such as Aldebaran.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:11 -04:00
Nirmoy Das
84e070f58a drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()
Offset calculation wasn't correct as start addresses are in pfn
not in bytes.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:02 -04:00
Evan Quan
2d64d23e95 drm/amd/pm: unify the interface for gfx state setting
No need to have special handling for swSMU supported ASICs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:51 -04:00
Evan Quan
2e4b2f7b57 drm/amd/pm: unify the interface for loading SMU microcode
No need to have special handling for swSMU supported ASICs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:38 -04:00
Lijo Lazar
928a0fe688 drm/amdgpu: Fix build warnings
Fix header guard and make internal functions static. Fixes the below warnings:

drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_reset.h:24:9: warning: '__AMDUGPU_RESET_H__' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
drivers/gpu/drm/amd/amdgpu/aldebaran.c:110:6: warning: no previous prototype for function 'aldebaran_async_reset' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/aldebaran_ppt.c:1435:5: warning: no previous prototype for function 'aldebaran_mode2_reset' [-Wmissing-prototypes]

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:32 -04:00
Lijo Lazar
ea4e96a7b3 drm/amdgpu: Enable recovery on aldebaran
Add aldebaran to devices which support recovery

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:29 -04:00
Lijo Lazar
142600e854 drm/amdgpu: Add mode2 reset support for aldebaran
v1: Aldebaran uses reset control to support mode2 reset. The sequences to
reset and restore hardware context are specific to a particular
configuration.

v2: Clear bus mastering before reset.
Fix coding style issues, drop unwanted variables and info log.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:26 -04:00
Lijo Lazar
5d89bb2d2f drm/amdgpu: Make set PG/CG state functions public
Expose PG/CG set states functions for other clients

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:22 -04:00
Lijo Lazar
a2052839cd drm/amdgpu: Add PSP public function to load a list of FWs
v1: Adds a function to load a list of FWs as passed by the caller. This is
needed as only a select need to loaded for some use cases.

v2: Omit unrelated change, remove info log, fix return value when count is 0

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:18 -04:00
Lijo Lazar
04442bf70d drm/amdgpu: Add reset control handling to reset workflow
This prefers reset control based handling if it's implemented
for a particular ASIC. If not, it takes the legacy path. It uses
the legacy method of preparing environment (job, scheduler tasks)
and restoring environment.

v2: remove unused variable (Alex)

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:14 -04:00
Lijo Lazar
e071dce38f drm/amdgpu: Add reset control to amdgpu_device
v1: Add generic amdgpu_reset_control to handle different types of resets. It
may be added at device, hive or ip level. Each reset control has a list
of handlers associated with it to handle different types of reset. Reset
control is responsible for choosing the right handler given a particular
reset context.

Handler objects may implement a set of functions on how to handle a
particular type of reset.

prepare_env = Prepare environment/software context (not used currently).
prepare_hwcontext = Prepare hardware context for the reset.
perform_reset = Perform the type of reset.
restore_hwcontext = Restore the hw context after reset.
restore_env = Restore the environment after reset (not used currently).

Reset context carries the context of reset, as of now this is based on
the parameters used for current set of resets.

v2: Fix coding style

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:08 -04:00
Jack Zhang
e6c6338f39 drm/amd/amdgpu implement tdr advanced mode
[Why]
Previous tdr design treats the first job in job_timeout as the bad job.
But sometimes a later bad compute job can block a good gfx job and
cause an unexpected gfx job timeout because gfx and compute ring share
internal GC HW mutually.

[How]
This patch implements an advanced tdr mode.It involves an additinal
synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit
step in order to find the real bad job.

1. At Step0 Resubmit stage, it synchronously submits and pends for the
first job being signaled. If it gets timeout, we identify it as guilty
and do hw reset. After that, we would do the normal resubmit step to
resubmit left jobs.

2. For whole gpu reset(vram lost), do resubmit as the old way.

v2: squash in build fix (Alex)

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:45 -04:00
Nirmoy Das
030bb4addb drm/amdgpu: make BO type check less restrictive
BO with ttm_bo_type_sg type can also have tiling_flag and metadata.
So so BO type check for only ttm_bo_type_kernel.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reported-by: Tom StDenis <Tom.StDenis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:37 -04:00
Nirmoy Das
cc1bcf85b0 drm/amdgpu: use amdgpu_bo_user bo for metadata and tiling flag
Tiling flag and metadata are only needed for BOs created by
amdgpu_gem_object_create(), so we can remove those from the
base class.

v2: * squash tiling_flags and metadata relared patches into one
    * use BUG_ON for non ttm_bo_type_device type when accessing
    tiling_flags and metadata._
v3: *include to_amdgpu_bo_user

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:31 -04:00
Nirmoy Das
22b40f7a3a drm/amdgpu: use amdgpu_bo_create_user() for when possible
Use amdgpu_bo_create_user() for all the BO allocations for
ttm_bo_type_device type.

v2: include amdgpu_amdkfd_alloc_gws() as well it calls amdgpu_bo_create()
    for  ttm_bo_type_device

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:28 -04:00
Nirmoy Das
9ad0d033ed drm/amdgpu: introduce struct amdgpu_bo_user
Implement a new struct amdgpu_bo_user as subclass of
struct amdgpu_bo and a function to created amdgpu_bo_user
bo with a flag to identify the owner.

v2: amdgpu_bo_to_amdgpu_bo_user -> to_amdgpu_bo_user()

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:19 -04:00
Nirmoy Das
9fd5543e95 drm/amdgpu: allow variable BO struct creation
Allow allocating BO structures with different structure size
than struct amdgpu_bo.

v2: Check bo_ptr_size in all amdgpu_bo_create() caller.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:12 -04:00
Christian König
87cc7f9ebf drm/amdgpu: load balance VCN3 decode as well v8
Add VCN3 IB parsing to figure out to which instance we can send the
stream for decode.

v2: remove VCN instance limit as well, fix amdgpu_cs_find_mapping,
    check supported formats instead of unsupported.
v3: fix typo and error handling
v4: make sure the message BO is CPU accessible
v5: fix addr calculation once more
v6: only check message buffers
v7: fix constant and use defines
v8: fix create msg calculation

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:07 -04:00
Christian König
c62dfdbbf7 drm/amdgpu: share scheduler score on VCN3 instances
The VCN3 instances can do both decode as well as encode.

Share the scheduler load balancing score and remove fixing encode to
only the second instance.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:03 -04:00
Christian König
c107171b8d drm/amdgpu: add the sched_score to amdgpu_ring_init
Allow separate ring to share the same scheduler score.

No functional change.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:44:56 -04:00
Bhaskar Chowdhury
63a93023ee drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes
s/acccess/access/
s/inferface/interface/
s/sequnce/sequence/  .....two different places.
s/retrive/retrieve/
s/sheduling/scheduling/
s/independant/independent/
s/wether/whether/ ......two different places.
s/emmit/emit/
s/synce/sync/

Reviewed-by: Nirmoy Das<nirmoy.das@amd.com>
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:44:29 -04:00
Arnd Bergmann
9e76e7b206 amdgpu: securedisplay: simplify i2c hexdump output
A previous fix I did left a rather complicated loop in
amdgpu_securedisplay_debugfs_write() for what could be expressed in a
simple sprintf, as Rasmus pointed out.

This drops the leading 0x for each byte, but is otherwise
much nicer.

Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:44:08 -04:00
Mark Yacoub
f4a9be998c drm/amdgpu: Ensure that the modifier requested is supported by plane.
On initializing the framebuffer, call drm_any_plane_has_format to do a
check if the modifier is supported. drm_any_plane_has_format calls
dm_plane_format_mod_supported which is extended to validate that the
modifier is on the list of the plane's supported modifiers.

The bug was caught using igt-gpu-tools test: kms_addfb_basic.addfb25-bad-modifier

Tested on ChromeOS Zork by turning on the display, running an overlay
test, and running a YT video.

=== Changes from v1 ===
Explicitly handle DRM_FORMAT_MOD_INVALID modifier.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:54 -04:00
Tian Tao
36000c7a51 drm/amdgpu: Convert sysfs sprintf/snprintf family to sysfs_emit
Fix the following coccicheck warning:
drivers/gpu//drm/amd/amdgpu/amdgpu_ras.c:434:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:220:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:249:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/df_v3_6.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_psp.c:2973:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:75:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:112:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:58:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:93:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:125:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:52:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:71:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:140:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:164:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:186:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_atombios.c:1916:8-16: WARNING:
use scnprintf or sprintf

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:47 -04:00
Christian König
266b2d25e3 drm/amdgpu: remove irq_src->data handling
That is unused for quite some time now.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:28 -04:00
Luben Tuikov
084e2640e5 drm/amdgpu: Fix check for RAS support
Use positive logic to check for RAS
support. Rename the function to actually indicate
what it is testing for. Essentially, make the
function a predicate with the correct name.

Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:18 -04:00
Philip Cox
9b7f1e0467 drm/amdgpu: Set amdgpu.noretry=1 for Arcturus
Setting amdgpu.noretry=1 as default for Arcturus.

Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:10 -04:00
John Clements
cad7b7510c drm/amdgpu: added support for dynamic GECC
updated host to send boot config to psp to enable GECC for sienna cichlid

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:06 -04:00
John Clements
e40889ecfd drm/amdgpu: update host to psp interface
added interface support for setting boot config

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:03 -04:00
Horace Chen
437f3e0b6e drm/amdgpu: move vram recover into sriov full access
[what]
currently driver recover vram after full access, which may hit
a corner case that meanwhile another whole gpu reset may be
triggered by another VF, which will cause vram recover fail
then fail the whole device reset.

[how]
move the recover vram into full access. So another bad VF will
not disturb the recover sequence for this vf.

Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed by: Monk.Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:42:54 -04:00
Evan Quan
181e772f7d drm/amd/pm: drop redundant and unneeded BACO APIs V2
Use other APIs which are with the same functionality but much
more clean.

V2: drop mediate unneeded interface

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:42:50 -04:00
Evan Quan
c6ce68e676 drm/amd/pm: label these APIs used internally as static
Also drop unnecessary header file and declarations.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:42:43 -04:00
Arnd Bergmann
19c383affd amdgpu: fix gcc -Wrestrict warning
gcc warns about an sprintf() that uses the same buffer as source
and destination, which is undefined behavior in C99:

drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c: In function 'amdgpu_securedisplay_debugfs_write':
drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c:141:6: error: 'sprintf' argument 3 overlaps destination object 'i2c_output' [-Werror=restrict]
  141 |      sprintf(i2c_output, "%s 0x%X", i2c_output,
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  142 |       securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf[i]);
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c:97:7: note: destination object referenced by 'restrict'-qualified argument 1 was declared here
   97 |  char i2c_output[256];
      |       ^~~~~~~~~~

Rewrite it to remember the current offset into the buffer instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:42:25 -04:00
Arnd Bergmann
7d98d416c2 amdgpu: avoid incorrect %hu format string
clang points out that the %hu format string does not match the type
of the variables here:

drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                  version_major, version_minor);
                                  ^~~~~~~~~~~~~
include/drm/drm_print.h:498:19: note: expanded from macro 'DRM_ERROR'
        __drm_err(fmt, ##__VA_ARGS__)
                  ~~~    ^~~~~~~~~~~

Change it to a regular %u, the same way a previous patch did for
another instance of the same warning.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:42:20 -04:00
Wan Jiabing
32c811b097 drivers: gpu: Remove duplicate include of amdgpu_hdp.h
amdgpu_hdp.h has been included at line 91, so remove
the duplicate include.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:42:16 -04:00
xinhui pan
639979887a drm/amdgpu: Use correct size when access vram
To make size is 4 byte aligned. Use &~0x3ULL instead of &3ULL.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:39:49 -04:00
Nirmoy Das
15e16daa35 drm/amdgpu: fix amdgpu_res_first()
Fix size comparison in the resource cursor.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:38:47 -04:00
Evan Quan
1689fca0d6 drm/amd/pm: fix Navi1x runtime resume failure V2
The RLC was put into a wrong state on runtime suspend. Thus the RLC
autoload will fail on the succeeding runtime resume. By adding an
intermediate PPSMC_MSG_PrepareMp1ForUnload(some GC hard reset involved,
designed for PnP), we can bring RLC back into the desired state.

V2: integrate INTERRUPTS_ENABLED flag clearing into current
    mp1 state set routines

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:53 -04:00
Lijo Lazar
50ca25228e drm/amdgpu: Enable VCN/JPEG CG on aldebaran
Enable clockgating for VCN and JPEG blocks on aldebaran

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:47 -04:00
Bhaskar Chowdhury
4a49751041 drm/amdgpu: Fix a typo
s/proces/process/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:45 -04:00
Bhaskar Chowdhury
7c4f2b235d drm/amdgpu: Fix a typo
s/traing/training/

...Plus the entire sentence construction for better readability.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:42 -04:00
Daniel Gomez
0f6f9dd490 drm/amdgpu/ttm: Fix memory leak userptr pages
If userptr pages have been pinned but not bounded,
they remain uncleared.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Gomez <daniel@qtec.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:34 -04:00
Alex Deucher
5d3a2d9522 drm/amdgpu: skip kfd suspend/resume for S0ix
GFX is in gfxoff mode during s0ix so we shouldn't need to
actually tear anything down and restore it.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:29 -04:00
Alex Deucher
50ec83f0d8 drm/amdgpu: drop S0ix checks around CG/PG in suspend
We handle it properly within the CG/PG functions directly
now.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:24 -04:00
Pratik Vishwakarma
5d70a549d0 drm/amdgpu: skip CG/PG for gfx during S0ix
Not needed as the device is in gfxoff state so the CG/PG state
is handled just like it would be for gfxoff during runtime gfxoff.

This should also prevent delays on resume.

Reworked from Pratik's original patch (Alex)

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
2021-04-09 16:37:06 -04:00
Alex Deucher
32ff160da7 drm/amdgpu: update comments about s0ix suspend/resume
Provide and explanation as to why we skip GFX and PSP for
S0ix.  GFX goes into gfxoff, same as runtime, so no need
to tear down and re-init.  PSP is part of the always on
state, so no need to touch it.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:00 -04:00
Alex Deucher
f937008757 drm/amdgpu/swsmu: skip gfx cgpg on s0ix suspend
The SMU expects CGPG to be enabled when entering S0ix.
with this we can re-enable SMU suspend.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:57 -04:00
Alex Deucher
557f42a2b3 drm/amdgpu: re-enable suspend phase 2 for S0ix
This really needs to be done to properly tear down
the device.  SMC, PSP, and GFX are still problematic,
need to dig deeper into what aspect of them that is
problematic.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:53 -04:00
Alex Deucher
3441693157 drm/amdgpu: move s0ix check into amdgpu_device_ip_suspend_phase2 (v3)
No functional change.

v2: use correct dev
v3: rework

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:48 -04:00
Alex Deucher
a2e15b0e6c drm/amdgpu: clean up non-DC suspend/resume handling
Move the non-DC specific code into the DCE IP blocks similar
to how we handle DC.  This cleans up the common suspend
and resume pathes.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:40 -04:00
Alex Deucher
48ccbf730c drm/amdgpu: don't evict vram on APUs for suspend to ram (v4)
Vram is system memory, so no need to evict.

v2: use PM_EVENT messages
v3: use correct dev
v4: use driver flags

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:34 -04:00
Alex Deucher
62498733d4 drm/amdgpu: rework S3/S4/S0ix state handling
Set flags at the top level pmops callbacks to track
state.  This cleans up the current set of flags and
properly handles S4 on S0ix capable systems.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:29 -04:00
Prike Liang
e5192f7b4a drm/amdgpu: fix the hibernation suspend with s0ix
During system hibernation suspend still need un-gate gfx CG/PG firstly to handle HW
status check before HW resource destory.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:17 -04:00
Alex Deucher
b98c6299ef drm/amdgpu: disentangle HG systems from vgaswitcheroo
There's no need to keep vgaswitcheroo around for HG
systems.  They don't use muxes and their power control
is handled via ACPI.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:07 -04:00
Alex Deucher
b2aba43af9 drm/amdgpu: enable DPM_FLAG_MAY_SKIP_RESUME and DPM_FLAG_SMART_SUSPEND flags (v2)
Once the device has runtime suspended, we don't need to power it
back up again for system suspend.  Likewise for resume, we don't
to power up the device again on resume only to power it back off
again via runtime pm because it's still idle.

v2: add DPM_FLAG_SMART_PREPARE as well

Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:36:02 -04:00
Alex Deucher
e25443d276 drm/amdgpu: add a dev_pm_ops prepare callback (v2)
as per:
https://www.kernel.org/doc/html/latest/driver-api/pm/devices.html

The prepare callback is required to support the DPM_FLAG_SMART_SUSPEND
driver flag.  This allows runtime pm to auto complete when the
system goes into suspend avoiding a wake up on suspend and on resume.
Apply this for hybrid gfx and BOCO systems where d3cold is
provided by the ACPI platform.

v2: check if device is runtime suspended in prepare.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:35:55 -04:00
Alex Deucher
ed098aa34c drm/amdgpu: Add additional Sienna Cichlid PCI ID
Add new DID.

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:35:51 -04:00
Nirmoy Das
5a8cd98e6e drm/amdgpu: wrap kiq ring ops with kiq spinlock
KIQ ring is being operated by kfd as well as amdgpu.
KFD is using kiq lock, we should the same from amdgpu side
as well.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:35:31 -04:00
Xiaojian Du
fe68ceef34 Revert "drm/amdgpu: disable gpu reset on Vangogh for now"
This reverts commit 33cf440d59.
And it will enable mode-2 gpu reset for vangogh,
it asks PSP firmware version is 00.1A.00.0F or newer.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:35:05 -04:00
Dennis Li
56b53c0b5a drm/amdgpu: add codes to capture invalid hardware access when recovery
When recovery thread has begun GPU reset, there should be not other
threads to access hardware, otherwise system randomly hang.

v2 (chk): rewritten from scratch, use trylock and lockdep instead of
hand wiring the logic.

v3: add in_irq check

v4: change to check in_task

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:34:53 -04:00
Dave Airlie
1539f71602 drm-misc-next for 5.13:
UAPI Changes:
 
 Cross-subsystem Changes:
 
 Core Changes:
   - mst: Improve topology logging
   - edid: Rework and improvements for displayid
 
 Driver Changes:
   - anx7625: Regulators support
   - bridge: Support for the Chipone ICN6211, Lontium LT8912B
   - lt9611: Fix 4k panels handling
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCYGWaewAKCRDj7w1vZxhR
 xXhNAP94jdJwM8G7U9dNAXbkkYcqo/RusREsTIp3V3p4nJTDLQEA7tvre1tXQKVb
 vNl5yzexsaWe9+LsPR5Dm9ZDPk1VFQc=
 =ZVM6
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2021-04-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.13:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
  - mst: Improve topology logging
  - edid: Rework and improvements for displayid

Driver Changes:
  - anx7625: Regulators support
  - bridge: Support for the Chipone ICN6211, Lontium LT8912B
  - lt9611: Fix 4k panels handling

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210401110552.2b3yetlgsjtlotcn@gilmour
2021-04-07 17:32:12 +10:00
Daniel Vetter
2cbcb78c9e Merge tag 'amd-drm-next-5.13-2021-03-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.13-2021-03-23:

amdgpu:
- Debugfs cleanup
- Various cleanups and spelling fixes
- Flexible array cleanups
- Initial AMD Freesync HDMI
- Display fixes
- 10bpc dithering improvements
- Display ASSR support
- Clean up and unify powerplay and swsmu interfaces
- Vangogh fixes
- Add SMU gfx busy queues for RV/PCO
- PCIE DPM fixes
- S0ix fixes
- GPU metrics data fixes
- DCN secure display support
- Backlight type override
- Add initial support for Aldebaran
- RAS fixes
- Prime fixes for A+A systems
- Reset fixes
- Initial resource cursor support
- Drop legacy IO BAR requirements
- Various power fixes

amdkfd:
- MMU notifier fixes
- APU fixes

radeon:
- Debugfs cleanups
- Flexible array cleanups

UAPI:
- amdgpu: Add a new INFO ioctl interface to query video capabilities
  rather than hardcoding them in userspace.  This allows us to provide
  fine grained asic capabilities (e.g., if a particular part is
  bandwidth limited, we can limit the capabilities).  Proposed userspace:
  https://gitlab.freedesktop.org/leoliu/drm/-/commits/info_video_caps
  https://gitlab.freedesktop.org/leoliu/mesa/-/commits/info_video_caps
- amdkfd: bump the driver version.  There was a problem with reporting
  some RAS features on older versions of the driver. Proposed userspace:
  7cdd63475c

Danvet: A bunch of conflicts all over, but it seems to compile ... I
did put the call to dc_allow_idle_optimizations() on a single line
since it looked a bit too jarring to be left alone.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324040147.1990338-1-alexander.deucher@amd.com
2021-03-26 15:53:21 +01:00
Christian König
a1f091f8ef drm/ttm: switch to per device LRU lock
Instead of having a global lock for potentially less contention.

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/424010/
2021-03-24 17:05:25 +01:00
Felix Kuehling
b16256874a drm/amdgpu: Mark Aldebaran HW support as experimental
The HW is not in production yet. Driver support is still in development.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:40:44 -04:00
Christian König
e5c04edfcd drm/amdgpu: revert "reserve backup pages for bad page retirment"
As noted during the review this approach doesn't make sense at all.

We should not apply any limitation on the VRAM applications can use inside the kernel.

If an application or end user wants to reserve a certain amount of VRAM for bad pages handling we should do this in the upper layer.

This reverts commit f89b881c81.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:40:06 -04:00
Christian König
6b44b667e2 drm/amdgpu: revert "use the new cursor in the VM code"
We are seeing VM page faults with this. Revert the change until the bugs
are fixed.

This reverts commit 94ae8dc557.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:38:42 -04:00
xinhui pan
79fcd446e7 drm/amdgpu: Fix memory leak
drm_gem_object_put() should be paired with drm_gem_object_lookup().

All gem objs are saved in fb->base.obj[]. Need put the old first before
assign a new obj.

Trigger VRAM leak by running command below
$ service gdm restart

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:37:13 -04:00
Alex Deucher
2d28b70ec3 drm/amdgpu: drop extraneous hw_status update
We set the same variable a few lines above.  Drop the duplicate
setting.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:36:36 -04:00
Nikola Cornij
a85ba00538 drm/amdgpu/display: re-enable freesync video patches
Since this is a "revert of a revert", the end effect is that freesync
video is back to its original state, the way it was before the first
revert.

Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:36:10 -04:00
shaoyunl
050743da31 drm/amdgpu: Keep pending_reset valid during smu reset the ASIC
SMU internal might need to check this pending_reset setting to decide the reset method

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:36:04 -04:00
shaoyunl
2d02893ffc drm/amdgpu: Enable light SBR in XGMI+passthrough configuration
This is to fix the case where it only enable the light SMU
on normal device init. This feature actually need to be enabled after ASIC
been reset as well.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:33:44 -04:00
Feifei Xu
ec1e80f0d7 drm/amdgpu: Use dev_info if VFCT table not valid
Some ASICs do not have GOP driver to copy vbios image into
VFCT table. And it will go to next check.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:31:24 -04:00
Alex Deucher
e99d2eaafd drm/amdgpu: drop legacy IO bar support
It was leftover from radeon where it was required for some
specific old hardware.  It hasn't been required for ages
and the driver already falls back to MMIO when legacy IO
is not available.  Legacy IO also seems to be problematic on
on some thunderbolt devices.  Drop it.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
2021-03-23 23:31:17 -04:00
Christian König
d423f5514d drm/amdgpu: nuke the ih reentrant lock
Interrupts on are non-reentrant on linux. This is just an ancient
leftover from radeon where irq processing was kicked of from different
places.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:30:23 -04:00
Felix Kuehling
7816e4a98c drm/amdkfd: Fix recursive lock warnings
memalloc_nofs_save/restore are no longer sufficient to prevent recursive
lock warnings when holding locks that can be taken in MMU notifiers. Use
memalloc_noreclaim_save/restore instead.

Fixes: f920e413ff ("mm: track mmu notifiers in fs_reclaim_acquire/release")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:30:18 -04:00
Stanley.Yang
970fd19764 drm/amdgpu: fix send ras disable cmd when asic not support ras
cause:
	It is necessary to send ras disable command to ras-ta during gfx
	block ras later init, because the ras capability is disable read
	from vbios for vega20 gaming, but the ras context is released
	during ras init process, this will cause send ras disable command
	to ras-to failed.
    how:
	Delay releasing ras context, the ras context
	will be released after gfx block later init done.

Changed from V1:
    move release_ras_context into ras_resume

Changed from V2:
    check BIT(UMC) is more reasonable before access eeprom table

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:30:12 -04:00
Hawking Zhang
97e272928e drm/amdgpu: update ecc query support for arcturus
arcturus and sienna_cichlid share the same version
of umc_info interface (umc_info v33). arcturus uses
umc_config to indicate ECC capability, while
sienna_cichlid uses umc_config1 to indicate ECC
capability. driver needs to check either umc_config
or umc_config1 to decide ECC capability for ASICs
that use umc_info v33 interface.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:30:04 -04:00
Christian König
94ae8dc557 drm/amdgpu: use the new cursor in the VM code
Separate the drm_mm_node walking from the actual handling.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:30:02 -04:00