Commit Graph

10548 Commits

Author SHA1 Message Date
Stanley.Yang
69691c8235 drm/amdgpu: message smu to update bad channel info
It should notice SMU to update bad channel info when detected
uncorrectable error in UMC block

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15 14:25:16 -04:00
Lijo Lazar
bb7c3e9ce2 drm/amdgpu: Disable baco dummy mode
On aldebaran, BACO dummy mode may be enabled during reset. Disable
it during resume.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15 14:25:15 -04:00
Lang Yu
96a2f0f2c8 drm/amdgpu: fix a wrong ib reference
It should be p->job->ibs[j] instead of p->job->ibs[i] here.

Fixes: cdc7893fc9 ("drm/amdgpu: use job and ib structures directly in CS parsers")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-09 17:28:37 -05:00
Christian König
48e9fbd1a2 drm/amdgpu: initialize the vmid_wait with the stub fence
This way we don't need to check for NULL any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Christian König
6103b2f24e drm/amdgpu: properly embed the IBs into the job
We now have standard macros for that.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Christian König
cdc7893fc9 drm/amdgpu: use job and ib structures directly in CS parsers
Instead of providing the ib index provide the job and ib pointers directly to
the patch and parse functions for UVD and VCE.

Also move the set/get functions for IB values to the IB declerations.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Christian König
a190f8dc4a drm/amdgpu: header cleanup
No function change, just move a bunch of definitions from amdgpu.h into
separate header files.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Jingwen Chen
8c7442f026 drm/amd/amdgpu: set disabled vcn to no_schduler
[Why]
after the reset domain introduced, the sched.ready will be init after
hw_init, which will overwrite the setup in vcn hw_init, and lead to
vcn ib test fail.

[How]
set disabled vcn to no_scheduler

Fixes: 5fd8518d18 ("drm/amdgpu: Move scheduler init to after XGMI is ready")
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Christian König
d18b8eadd8 drm/amdgpu: install ctx entities with cmpxchg
Since we removed the context lock we need to make sure that not two threads
are trying to install an entity at the same time.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 461fa7b0ac ("drm/amdgpu: remove ctx->lock")
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Yifan Zhang
b664a56e86 drm/amdkfd: implement get_atc_vmid_pasid_mapping_info for gfx10.3
This patch implements get_atc_vmid_pasid_mapping_info for gfx10.3

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Ruijing Dong
11eb648d01 drm/amdgpu/vcn: Add vcn firmware log
vcn fwlog is for debugging purpose only,
by default, it is disabled.

Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Ruijing Dong
b6065ebf55 drm/amdgpu/vcn: Update fw shared data structure
Add fw log in fw shared data structure.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
David Yu
811c04dbb3 drm/amdgpu: Add DFC CAP support for aldebaran
Add DFC CAP support for aldebaran

Initialize cap microcode in psp_init_sriov_microcode,
the ta microcode will be  initialized in psp_vxx_init_microcode

Signed-off-by: David Yu <David.Yu@amd.com>
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Harish Kasiviswanathan
24bf9fd197 drm/amdgpu: Set correct DMA mask for aldebaran
Aldebaran has 48-bit physical address support

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:29 -05:00
Lijo Lazar
9e08564727 drm/amdgpu: Refactor mode2 reset logic for v13.0.2
Use IP version and refactor reset logic to apply to a list of devices.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:29 -05:00
Weiguo Li
3192f1d9b6 drm/amdgpu: remove redundant null check
Remove the redundant null check since the caller ensures
that 'ctx' is never NULL.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Alex Deucher
825e0af0d4 drm/amdgpu/sdma5: drop unused cyan skillfish firmware
Leftover from bring up.  Not used anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Alex Deucher
31f5f46043 drm/amdgpu/gfx10: drop unused cyan skillfish firmware
Leftover from bring up.  Not used anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Alex Deucher
1b537e6410 drm/amdgpu: remove unused gpu_info firmwares
These were leftover from bring up and are no longer
necessary.  The information is available via
the IP discovery table.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Alex Deucher
45a3e06be4 drm/amdgpu: Use IP versions in convert_tiling_flags_to_modifier()
Rather than checking the asic_type.

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Prike Liang
d7709eb6a1 drm/amdgpu: enable gfxoff routine for GC 10.3.7
Enable gfxoff routine for GC 10.3.7.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Prike Liang
fabe175385 drm/amdgpu: enable gfx power gating for GC 10.3.7
Enable gfx power gating for GC 10.3.7.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Prike Liang
9a1358bb2c drm/amdgpu/nv: enable clock gating for GC 10.3.7 subblock
This will enable the following block clock gating.

 - MC
 - SDMA
 - HDP
 - ATHUB
 - IH
 - VCN/JPEG

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Prike Liang
00bfab4457 drm/amdgpu: enable gfx clock gating control for GC 10.3.7
Enable gfx cg gate/ungate control for GC 10.3.7.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Qiang Yu
b6901d93cc drm/amdgpu: fix suspend/resume hang regression
Regression has been reported that suspend/resume may hang with
the previous vm ready check commit.

So bring back the evicted list check as a temp fix.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922
Fixes: c1a66c3bc4 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Qiang Yu <qiang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Yifan Zha
e6fac6a9c9 drm/amdgpu: Move CAP firmware loading to the beginning of PSP firmware list
[Why]
As PSP needs to verify the signature, CAP firmware must be loaded first when PSP loads firmwares.
Otherwise, when DFC feature is enabled, CP firmwares would be loaded failed.

[ 1149.160480] [drm] MM table gpu addr = 0x800022f000, cpu addr = 00000000a62afcea.
[ 1149.209874] [drm] failed to load ucode CP_CE(0x8)
[ 1149.209878] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.215914] [drm] failed to load ucode CP_PFP(0x9)
[ 1149.215917] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.221941] [drm] failed to load ucode CP_ME(0xA)
[ 1149.221944] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.228082] [drm] failed to load ucode CP_MEC1(0xB)
[ 1149.228085] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.234209] [drm] failed to load ucode CP_MEC2(0xD)
[ 1149.234212] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.242379] [drm] failed to load ucode VCN(0x1C)
[ 1149.242382] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)

[How]
Move CAP UCODE ID to the beginning of AMDGPU_UCODE_ID enum list.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Bokun Zhang <Bokun.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Andrey Grodzovsky
5aa061474b drm/amdgpu: Bump minor version for hot plug tests enabling.
This will allow to enable the tests only after latest fix
after which the tests passed on my system.

I tested on NV21 standalone and Vega 10 and Polaris as
pair with DRI_PRIME.

It's possible there might be still issues on ASICs i don't
have at my posession but that that the point of enbling
the tests finally - if other people during testing will
encounter errors they will report and I will be able to fix.

The releated merge request for enabling libdrm tests suite  is in
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/227

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Andrey Grodzovsky
57230f0ce6 drm/amdgpu: Fix sigsev when accessing MMIO on hot unplug.
Protect with drm_dev_enter/exit

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Yifan Zhang
7d4108e4ce drm/amdgpu: convert code name to ip version for noretry set
Use IP version rather than codename for noretry set.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Yifan Zhang
957b0787ee drm/amdgpu: move amdgpu_gmc_noretry_set after ip_versions populated
otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set
is called.

Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
80e0c2cb37 drm/amdgpu: Remove redundant .ras_fini initialization in some ras blocks
1. Define amdgpu_ras_block_late_fini_default in amdgpu_ras.c as
   .ras_fini common function, which is called when
   .ras_fini of ras block isn't initialized.
2. Remove the code of using amdgpu_ras_block_late_fini to
   initialize .ras_fini in ras blocks.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
30e58102d5 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in mca ras block
Remove redundant calls of amdgpu_ras_block_late_fini in mca ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
149d7ba1f8 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in sdma ras block
Remove redundant calls of amdgpu_ras_block_late_fini in sdma ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
aa8e65dfc7 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in hdp ras block
Remove redundant calls of amdgpu_ras_block_late_fini in hdp ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
f148c143ef drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in xgmi ras block
Remove redundant calls of amdgpu_ras_block_late_fini in xgmi ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
0dca257d6d drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in umc ras block
Remove redundant calls of amdgpu_ras_block_late_fini in umc ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
f578a37d19 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in nbio ras block
Remove redundant calls of amdgpu_ras_block_late_fini in nbio ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
9dad47c50f drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in mmhub ras block
Remove redundant calls of amdgpu_ras_block_late_fini in mmhub ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
yipechai
35366481d0 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in gfx ras block
Remove redundant calls of amdgpu_ras_block_late_fini in gfx ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
yipechai
1f211a827c drm/amdgpu: centrally calls the .ras_fini function of all ras blocks
centrally calls the .ras_fini function of all ras blocks.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
yipechai
667c7091a3 drm/amdgpu: Optimize xxx_ras_fini function of each ras block
1. Move the variables of ras block instance members from
   specific xxx_ras_fini to general ras_fini call.
2. Function calls inside the modules only use parameters
   passed from xxx_ras_fini instead of ras block instance
   members.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
yipechai
01d468d9a4 drm/amdgpu: Modify .ras_fini function pointer parameter
Modify .ras_fini function pointer parameter so that
we can remove redundant intermediate calls in some
ras blocks.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Tom Rix
ca6fcfa8d4 drm/amdgpu: Fix realloc of ptr
Clang static analysis reports this error
amdgpu_debugfs.c:1690:9: warning: 1st function call
  argument is an uninitialized value
  tmp = krealloc_array(tmp, i + 1,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~

realloc uses tmp, so tmp can not be garbage.
And the return needs to be checked.

Fixes: 5ce5a584cb ("drm/amdgpu: add debugfs for reset registers list")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Dave Airlie
38a15ad948 Merge tag 'amd-drm-next-5.18-2022-02-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.18-2022-02-25:

amdgpu:
- Raven2 suspend/resume fix
- SDMA 5.2.6 updates
- VCN 3.1.2 updates
- SMU 13.0.5 updates
- DCN 3.1.5 updates
- Virtual display fixes
- SMU code cleanup
- Harvest fixes
- Expose benchmark tests via debugfs
- Drop no longer relevant gart aperture tests
- More RAS restructuring
- W=1 fixes
- PSR rework
- DP/VGA adapter fixes
- DP MST fixes
- GPUVM eviction fix
- GPU reset debugfs register dumping support
- Misc display fixes
- SR-IOV fix
- Aldebaran mGPU fix
- Add module parameter to disable XGMI for testing

amdkfd:
- IH ring overflow logging fixes
- CRIU fixes
- Misc fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225183535.5907-1-alexander.deucher@amd.com
2022-03-01 16:19:02 +10:00
Dave Airlie
6c64ae228f Backmerge tag 'v5.17-rc6' into drm-next
This backmerges v5.17-rc6 so I can merge some amdgpu and some tegra changes on top.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-02-28 14:57:14 +10:00
Andrey Grodzovsky
2656fd230d drm/amdgpu: Exclude PCI reset method for now.
According to my investigation of the state of PCI
reset recently it's not working. The reason is
due to the fact the kernel PCI code rejects SBR
when there are more then one PF under same bridge
which we always have (at least AUDIO PF but usually
more) and that because SBR will reset all the PFS
and devices under the same bridge as you and you
cannot assume they support SBR.
Once we anble FLR support we can reenable this option as
FLR is doable on single PF and doens't have this
restriction.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-24 17:25:20 -05:00
Alex Sierra
158a05a0b8 drm/amdgpu: Add use_xgmi_p2p module parameter
This parameter controls xGMI p2p communication, which is enabled by
default. However, it can be disabled by setting it to 0. In case xGMI
p2p is disabled in a dGPU, PCIe p2p interface will be used instead.
This parameter is ignored in GPUs that do not support xGMI
p2p configuration.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-24 17:25:12 -05:00
Xiaogang Chen
45f0ff404c drm/amdgpu: config HDP_MISC_CNTL.READ_BUFFER_WATERMARK
To fix applications running across multiple GPU config hang.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-24 17:24:43 -05:00
Dave Airlie
54f43c17d6 Merge tag 'drm-misc-next-2022-02-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.18:

UAPI Changes:

Cross-subsystem Changes:
- Split out panel-lvds and lvds dt bindings .
- Put yes/no on/off disabled/enabled strings in linux/string_helpers.h
  and use it in drivers and tomoyo.
- Clarify dma_fence_chain and dma_fence_array should never include eachother.
- Flatten chains in syncobj's.
- Don't double add in fbdev/defio when page is already enlisted.
- Don't sort deferred-I/O pages by default in fbdev.

Core Changes:
- Fix missing pm_runtime_put_sync in bridge.
- Set modifier support to only linear fb modifier if drivers don't
  advertise support.
- As a result, we remove allow_fb_modifiers.
- Add missing clear for EDID Deep Color Modes in drm_reset_display_info.
- Assorted documentation updates.
- Warn once in drm_clflush if there is no arch support.
- Add missing select for dp helper in drm_panel_edp.
- Assorted small fixes.
- Improve fb-helper's clipping handling.
- Don't dump shmem mmaps in a core dump.
- Add accounting to ttm resource manager, and use it in amdgpu.
- Allow querying the detected eDP panel through debugfs.
- Add helpers for xrgb8888 to 8 and 1 bits gray.
- Improve drm's buddy allocator.
- Add selftests for the buddy allocator.

Driver Changes:
- Add support for nomodeset to a lot of drm drivers.
- Use drm_module_*_driver in a lot of drm drivers.
- Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau,
  bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86.
- Add bridge/it6505.
- Create DP and DVI-I connectors in ast.
- Assorted nouveau backlight fixes.
- Rework amdgpu reset handling.
- Add dt bindings for ingenic,jz4780-dw-hdmi.
- Support reading edid through aux channel in ingenic.
- Add a drm driver for Solomon SSD130x OLED displays.
- Add simple support for sharp LQ140M1JW46.
- Add more panels to nt35560.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/686ec871-e77f-c230-22e5-9e3bb80f064a@linux.intel.com
2022-02-25 05:50:18 +10:00
Qiang Yu
c1a66c3bc4 drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
Workstation application ANSA/META v21.1.4 get this error dmesg when
running CI test suite provided by ANSA/META:
[drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)

This is caused by:
1. create a 256MB buffer in invisible VRAM
2. CPU map the buffer and access it causes vm_fault and try to move
   it to visible VRAM
3. force visible VRAM space and traverse all VRAM bos to check if
   evicting this bo is valuable
4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
   will set amdgpu_vm->evicting, but latter due to not in visible
   VRAM, won't really evict it so not add it to amdgpu_vm->evicted
5. before next CS to clear the amdgpu_vm->evicting, user VM ops
   ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted)
   but fail in amdgpu_vm_bo_update_mapping() (check
   amdgpu_vm->evicting) and get this error log

This error won't affect functionality as next CS will finish the
waiting VM ops. But we'd better clear the error log by checking
the amdgpu_vm->evicting flag in amdgpu_vm_ready() to stop calling
amdgpu_vm_bo_update_mapping() later.

Another reason is amdgpu_vm->evicted list holds all BOs (both
user buffer and page table), but only page table BOs' eviction
prevent VM ops. amdgpu_vm->evicting flag is set only for page
table BOs, so we should use evicting flag instead of evicted list
in amdgpu_vm_ready().

The side effect of this change is: previously blocked VM op (user
buffer in "evicted" list but no page table in it) gets done
immediately.

v2: update commit comments.

Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Qiang Yu <qiang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-02-23 16:31:06 -05:00