Commit Graph

22332 Commits

Author SHA1 Message Date
Dave Airlie
1c46f3c075 Merge tag 'amd-drm-fixes-5.19-2022-07-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.19-2022-07-20:

amdgpu:
- Drop redundant buffer cleanup that can lead to a segfault
- Add a bo_list mutex to avoid possible list corruption in CS

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220720210917.6202-1-alexander.deucher@amd.com
2022-07-21 13:22:40 +10:00
Luben Tuikov
90af0ca047 drm/amdgpu: Protect the amdgpu_bo_list list with a mutex v2
Protect the struct amdgpu_bo_list with a mutex. This is used during command
submission in order to avoid buffer object corruption as recorded in
the link below.

v2 (chk): Keep the mutex looked for the whole CS to avoid using the
	  list from multiple CS threads at the same time.

Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Vitaly Prosyak <Vitaly.Prosyak@amd.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2048
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-07-20 16:23:34 -04:00
xinhui pan
e1aadbab44 drm/amdgpu: Remove one duplicated ef removal
That has been done in BO release notify.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2074
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 11:45:23 -04:00
Linus Torvalds
fcd1b2b9c7 Merge tag 'drm-fixes-2022-07-15' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "This is the regular fixes pull for this week. This has a bunch of
  amdgpu fixes, major one reverts the buddy allocator until it can be
  tested more, otherwise just small ones, then i915 has a bunch of
  fixes.

  The outstanding firmware regressions reported by phoronix will
  hopefully be dealt with ASAP.

  amdgpu:
   - revert buddy allocator support for now
   - DP MST blank screen fix for specific platforms
   - MEC firmware check fix for GC 10.3.7
   - Deep color fix for DCE
   - Fix possible divide by 0
   - Coverage blend mode fix
   - Fix cursor only commit timestamps

  i915:
   - Selftest fix
   - TTM fix sg_table construction
   - Error return fixes
   - Fix a performance regression related to waitboost
   - Fix GT resets"

* tag 'drm-fixes-2022-07-15' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Ensure valid event timestamp for cursor-only commits
  drm/amd/display: correct check of coverage blend mode
  drm/amd/pm: Prevent divide by zero
  drm/amd/display: Only use depth 36 bpp linebuffers on DCN display engines.
  drm/amdkfd: correct the MEC atomic support firmware checking for GC 10.3.7
  drm/amd/display: Ignore First MST Sideband Message Return Error
  drm/i915/selftests: fix subtraction overflow bug
  drm/i915/gem: Look for waitboosting across the whole object prior to individual waits
  drm/i915/gt: Serialize TLB invalidates with GT resets
  drm/i915/gt: Serialize GRDOM access between multiple engine resets
  drm/i915/ttm: fix sg_table construction
  drm/i915/selftests: fix a couple IS_ERR() vs NULL tests
  drm/i915: Fix vm use-after-free in vma destruction
  drm/i915/guc: ADL-N should use the same GuC FW as ADL-S
  drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()
  drm/i915/gvt: IS_ERR() vs NULL bug in intel_gvt_update_reg_whitelist()
  Revert "drm/amdgpu: add drm buddy support to amdgpu"
2022-07-15 09:56:24 -07:00
Stylon Wang
2d4bd81fea drm/amd/display: Fix new dmub notification enabling in DM
[Why]
Changes from "Fix for dmub outbox notification enable" need to land
in DM or DMUB outbox notification would be disabled.

[How]
Enable outbox notification only after interrupt are enabled and IRQ
handlers registered. Any pending notification will be sent by DMUB
once outbox notification is enabled.

Fixes: ed72087064 ("drm/amd/display: Fix for dmub outbox notification enable")
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Stylon Wang <stylon.wang@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-07-15 10:04:59 -04:00
Dave Airlie
093f8d8f10 Merge tag 'amd-drm-fixes-5.19-2022-07-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.19-2022-07-13:

amdgpu:
- DP MST blank screen fix for specific platforms
- MEC firmware check fix for GC 10.3.7
- Deep color fix for DCE
- Fix possible divide by 0
- Coverage blend mode fix
- Fix cursor only commit timestamps

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220713172920.6037-1-alexander.deucher@amd.com
2022-07-15 11:26:20 +10:00
Dave Airlie
b1f4347f73 Merge tag 'drm-misc-fixes-2022-07-14' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Only a revert for amdgpu reverting the switch to the drm buddy
allocator.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220714071821.hsejxpsgkbbzlec2@houat
2022-07-15 09:26:12 +10:00
Linus Torvalds
d11219ad53 amdgpu: disable powerpc support for the newer display engine
The DRM_AMD_DC_DCN display engine support (Raven, Navi, and newer) has
not been building cleanly on powerpc and causes link errors due to
mixing hard- and soft-float object files:

  powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_resource.o uses soft float
  powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_resource.o
  [..]

and while patches are floating around, it's not exactly obvious what is
going on.

The problem bisects to commit 41b7a347bf ("powerpc: Book3S 64-bit
outline-only KASAN support") but that is probably more about changing
config variables than the fundamental cause.

Despite the bisection result, a more directly related commit seems to be
26f4712aed ("drm/amd/display: move FPU related code from dcn31 to
dml/dcn31 folder").  It's probably a combination of the two.

This has been going on since the merge window, without any final word.
So instead of blindly applying patches that may or may not be the right
thing, let's disable this for now.

As Michael Ellerman says:
 "IIUIC this code was never enabled on ppc before, so disabling it seems
  like a reasonable fix to get the build clean"

and once we have more actual feedback (and find any potential users) we
can always re-enable it with the patch that fixes the issues and
back-port as necessary.

Fixes: 41b7a347bf ("powerpc: Book3S 64-bit outline-only KASAN support")
Fixes: 26f4712aed ("drm/amd/display: move FPU related code from dcn31 to dml/dcn31 folder")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/20220606153910.GA1773067@roeck-us.net/
Link: https://lore.kernel.org/all/20220618232737.2036722-1-linux@roeck-us.net/
Link: https://lore.kernel.org/all/20220713050724.GA2471738@roeck-us.net/
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-14 10:05:46 -07:00
Michel Dänzer
3283c83eb6 drm/amd/display: Ensure valid event timestamp for cursor-only commits
Requires enabling the vblank machinery for them.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2030
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-07-13 12:20:37 -04:00
Melissa Wen
47053b1e73 drm/amd/display: correct check of coverage blend mode
Check the value of per_pixel_alpha to decide whether the Coverage pixel
blend mode is applicable or not.

Fixes: 76818cdd11 ("drm/amd/display: add Coverage blend mode for overlay plane")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 12:20:37 -04:00
Yefim Barashkin
0638c98c17 drm/amd/pm: Prevent divide by zero
divide error: 0000 [#1] SMP PTI
CPU: 3 PID: 78925 Comm: tee Not tainted 5.15.50-1-lts #1
Hardware name: MSI MS-7A59/Z270 SLI PLUS (MS-7A59), BIOS 1.90 01/30/2018
RIP: 0010:smu_v11_0_set_fan_speed_rpm+0x11/0x110 [amdgpu]

Speed is user-configurable through a file.
I accidentally set it to zero, and the driver crashed.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Yefim Barashkin <mr.b34r@kolabnow.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-07-13 12:20:37 -04:00
Mario Kleiner
add61d3c31 drm/amd/display: Only use depth 36 bpp linebuffers on DCN display engines.
Various DCE versions had trouble with 36 bpp lb depth, requiring fixes,
last time in commit 353ca0fa56 ("drm/amd/display: Fix 10bit 4K display
on CIK GPUs") for DCE-8. So far >= DCE-11.2 was considered ok, but now I
found out that on DCE-11.2 it causes dithering when there shouldn't be
any, so identity pixel passthrough with identity gamma LUTs doesn't work
when it should. This breaks various important neuroscience applications,
as reported to me by scientific users of Polaris cards under Ubuntu 22.04
with Linux 5.15, and confirmed by testing it myself on DCE-11.2.

Lets only use depth 36 for DCN engines, where my testing showed that it
is both necessary for high color precision output, e.g., RGBA16 fb's,
and not harmful, as far as more than one year in real-world use showed.

DCE engines seem to work fine for high precision output at 30 bpp, so
this ("famous last words") depth 30 should hopefully fix all known problems
without introducing new ones.

Successfully retested on DCE-11.2 Polaris and DCN-1.0 Raven Ridge on
top of Linux 5.19.0-rc2 + drm-next.

Fixes: 353ca0fa56 ("drm/amd/display: Fix 10bit 4K display on CIK GPUs")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org # 5.14.0
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 12:20:37 -04:00
Prike Liang
c004486548 drm/amdkfd: correct the MEC atomic support firmware checking for GC 10.3.7
On the GC 10.3.7 platform the initial MEC release version #3 can support
atomic operation,so need correct and set its MEC atomic support version to #3.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.18.x
2022-07-13 12:20:36 -04:00
Fangzhi Zuo
acea108fa0 drm/amd/display: Ignore First MST Sideband Message Return Error
[why]
First MST sideband message returns AUX_RET_ERROR_HPD_DISCON
on certain intel platform. Aux transaction considered failure
if HPD unexpected pulled low. The actual aux transaction success
in such case, hence do not return error.

[how]
Not returning error when AUX_RET_ERROR_HPD_DISCON detected
on the first sideband message.

v2: squash in additional DMI entries
v3: squash in static fix

Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-07-13 12:20:26 -04:00
Arunpravin Paneer Selvam
925b6e5913 Revert "drm/amdgpu: add drm buddy support to amdgpu"
This reverts commit c9cad937c0.

This is part of a revert of the following commits:
commit 708d19d9f3 ("drm/amdgpu: move internal vram_mgr function into the C file")
commit 5e3f1e7729 ("drm/amdgpu: fix start calculation in amdgpu_vram_mgr_new")
commit c9cad937c0 ("drm/amdgpu: add drm buddy support to amdgpu")

[WHY]
Few users reported garbaged graphics as soon as x starts,
reverting until this can be resolved.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220708093047.492662-3-Arunpravin.PaneerSelvam@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2022-07-08 14:24:30 +02:00
Alex Deucher
3a4b1cc28f drm/amdgpu/display: disable prefer_shadow for generic fb helpers
Seems to break hibernation.  Disable for now until we can root
cause it.

Fixes: 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=216119
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-06 22:00:03 -04:00
Alex Deucher
f9a89117fb drm/amdgpu: keep fbdev buffers pinned during suspend
Was dropped when we converted to the generic helpers.

Fixes: 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-06 21:59:52 -04:00
Alex Deucher
a775e4e494 Revert "drm/amdgpu/display: set vblank_disable_immediate for DC"
This reverts commit 92020e81dd.

This causes stuttering and timeouts with DMCUB for some users
so revert it until we understand why and safely enable it
to save power.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1887
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: stable@vger.kernel.org
2022-06-29 14:50:52 -04:00
Ruili Ji
5cb0e3fb2c drm/amdgpu: To flush tlb for MMHUB of RAVEN series
amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:8 pasid:32769, for process test_basic pid 3305 thread test_basic pid 3305)
amdgpu: in page starting at address 0x00007ff990003000 from IH client 0x12 (VMC)
amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00840051
amdgpu: Faulty UTCL2 client ID: MP1 (0x0)
amdgpu: MORE_FAULTS: 0x1
amdgpu: WALKER_ERROR: 0x0
amdgpu: PERMISSION_FAULTS: 0x5
amdgpu: MAPPING_ERROR: 0x0
amdgpu: RW: 0x1

When memory is allocated by kfd, no one triggers the tlb flush for MMHUB0.
There is page fault from MMHUB0.

v2:fix indentation
v3:change subject and fix indentation

Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-06-29 14:50:51 -04:00
Alex Deucher
bbba251577 drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover()
Use the correct adev variable for the drm_fb_helper in
amdgpu_device_gpu_recover().  Noticed by inspection.

Fixes: 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-06-29 14:50:42 -04:00
Joshua Ashton
e84131a88a amd/display/dc: Fix COLOR_ENCODING and COLOR_RANGE doing nothing for DCN20+
For DCN20 and above, the code that actually hooks up the provided
input_color_space got lost at some point.

Fixes COLOR_ENCODING and COLOR_RANGE doing nothing on DCN20+.
Tested using Steam Remote Play Together + gamescope.

Update other DCNs the same wasy DCN1.x was updates in
commit a1e07ba89d ("drm/amd/display: Use plane->color_space for dpp if specified")

Fixes: a1e07ba89d ("drm/amd/display: Use plane->color_space for dpp if specified")
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-06-22 17:17:16 -04:00
George Shen
98b02e9f00 drm/amd/display: Fix typo in override_lane_settings
[Why]
The function currently skips overriding the drive
settings of the first lane.

[How]
Change for loop to start at 0 instead of 1.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-06-22 17:16:23 -04:00
Qingqing Zhuo
235870f659 drm/amd/display: Fix DC warning at driver load
[Why]
Wrong index was checked for dcfclk_mhz, causing false warning.

[How]
Fix the assertion index.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.18.x
2022-06-22 17:15:41 -04:00
Mario Limonciello
937e24b7f5 drm/amd: Revert "drm/amd/display: keep eDP Vdd on when eDP stream is already enabled"
A variety of Lenovo machines with Rembrandt APUs and OLED panels have
stopped showing the display at login.  This behavior clears up after
leaving it idle and moving the mouse or touching keyboard.

It was bisected to be caused by commit 559e265522 ("drm/amd/display:
keep eDP Vdd on when eDP stream is already enabled").  Revert this commit
to fix the issue.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2047
Reported-by: Aaron Ma <aaron.ma@canonical.com>
Fixes: 559e265522 ("drm/amd/display: keep eDP Vdd on when eDP stream is already enabled")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mark Pearson <markpearson@lenovo.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-22 17:11:28 -04:00
Alex Deucher
f15345a377 drm/amdgpu: Adjust logic around GTT size (v3)
Certain GL unit tests for large textures can cause problems
with the OOM killer since there is no way to link this memory
to a process.  This was originally mitigated (but not necessarily
eliminated) by limiting the GTT size.  The problem is this limit
is often too low for many modern games so just make the limit 1/2
of system memory. The OOM accounting needs to be addressed, but
we shouldn't prevent common 3D applications from being usable
just to potentially mitigate that corner case.

Set default GTT size to max(3G, 1/2 of system ram) by default.

v2: drop previous logic and default to 3/4 of ram
v3: default to half of ram to align with ttm
v4: fix spelling in comment (Kent)

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1942
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-22 17:11:16 -04:00
Roman Li
4fd17f2ac0 drm/amd/display: Cap OLED brightness per max frame-average luminance
[Why]
For OLED eDP the Display Manager uses max_cll value as a limit
for brightness control.
max_cll defines the content light luminance for individual pixel.
Whereas max_fall defines frame-average level luminance.
The user may not observe the difference in brightness in between
max_fall and max_cll.
That negatively impacts the user experience.

[How]
Use max_fall value instead of max_cll as a limit for brightness control.

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-06-14 13:11:11 -04:00
Michel Dänzer
c904e3acba drm/amdgpu: Fix GTT size reporting in amdgpu_ioctl
The commit below changed the TTM manager size unit from pages to
bytes, but failed to adjust the corresponding calculations in
amdgpu_ioctl.

Fixes: dfa714b88e ("drm/amdgpu: remove GTT accounting v2")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1930
Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6642
Tested-by: Martin Roukala <martin.roukala@mupuf.org>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.18.x
2022-06-14 13:10:06 -04:00
Dave Airlie
0a17875064 Merge tag 'amd-drm-fixes-5.19-2022-06-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.19-2022-06-08:

amdgpu:
- DCN 3.1 golden settings fix
- eDP fixes
- DMCUB fixes
- GFX11 fixes and cleanups
- VCN fix for yellow carp
- GMC11 fixes
- RAS fixes
- GPUVM TLB flush fixes
- SMU13 fixes
- VCN3 AV1 regression fix
- VCN2 JPEG fix
- Other misc fixes

amdkfd:
- MMU notifier fix
- Support for more GC 10.3.x families
- Pinned BO handling fix
- Partial migration bug fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220608203008.6187-1-alexander.deucher@amd.com
2022-06-09 17:22:49 +10:00
Yifan Zhang
431d071286 drm/amdgpu/mes: only invalid/prime icache when finish loading both pipe MES FWs.
invalid/prime icahce operation takes effect both pipes cuconrrently,
therefore CP_MES_IC_BASE_LO/HI and CP_MES_MDBASE_LO/HI both have to be
set before prime icache. Otherwise MES hardware gets garbage data in
above regsters and causes page fault

[  470.873200] amdgpu 0000:33:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:217 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[  470.873222] amdgpu 0000:33:00.0: amdgpu:   in page starting at address 0x000092cb89b00000 from client 10
[  470.873234] amdgpu 0000:33:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000BB3
[  470.873242] amdgpu 0000:33:00.0: amdgpu:      Faulty UTCL2 client ID: CPC (0x5)
[  470.873247] amdgpu 0000:33:00.0: amdgpu:      MORE_FAULTS: 0x1
[  470.873251] amdgpu 0000:33:00.0: amdgpu:      WALKER_ERROR: 0x1
[  470.873256] amdgpu 0000:33:00.0: amdgpu:      PERMISSION_FAULTS: 0xb
[  470.873260] amdgpu 0000:33:00.0: amdgpu:      MAPPING_ERROR: 0x1
[  470.873264] amdgpu 0000:33:00.0: amdgpu:      RW: 0x0

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08 15:39:16 -04:00
Mohammad Zafar Ziya
578eb31776 drm/amdgpu/jpeg2: Add jpeg vmid update under IB submit
Add jpeg vmid update under IB submit

Signed-off-by: Mohammad Zafar Ziya <Mohammadzafar.ziya@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-06-08 11:24:50 -04:00
Christian König
84205d0093 drm/amdgpu: always flush the TLB on gfx8
The TLB on GFX8 stores each block of 8 PTEs where any of the valid bits
are set.

Fixes: 5255e146c9 ("drm/amdgpu: rework TLB flushing")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08 11:24:13 -04:00
Christian König
1d2afeb798 drm/amdgpu: fix limiting AV1 to the first instance on VCN3
The job is not yet initialized here.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2037
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: cdc7893fc9 ("drm/amdgpu: use job and ib structures directly in CS parsers")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08 11:24:13 -04:00
Jesse Zhang
a956a11ee6 drm/amdkfd:Fix fw version for 10.3.6
fix fw error when loading fw for 10.3.6

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.18.x
2022-06-08 11:24:00 -04:00
Joseph Greathouse
b3f9234e10 drm/amdgpu: Add MODE register to wave debug info in gfx11
All other chips, from gfx6-gfx10, now include the MODE register at the
end of the wave debug state. This appears to have been missed in gfx11,
so this patch adds in MODE to the debug state for gfx11.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-07 16:18:48 -04:00
Nicholas Kazlauskas
8b8ce2b90a Revert "drm/amd/display: Pass the new context into disable OTG WA"
This reverts commit 8440f57532.

Causes a hang when hotplugging DP, shutting down system, or
enabling dual eDP.

Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-07 16:18:21 -04:00
Guchun Chen
41782d7056 Revert "drm/amdgpu: Ensure the DMA engine is deactivated during set ups"
This reverts commit b992a19085.

This causes regression in GPU reset related test.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: ricetons@gmail.com
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-07 16:18:07 -04:00
Linus Torvalds
d0e60d46bc Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:

 - bitmap: optimize bitmap_weight() usage, from me

 - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
   Carvalho Chehab

 - include/linux/find: Fix documentation, from Anna-Maria Behnsen

 - bitmap: fix conversion from/to fix-sized arrays, from me

 - bitmap: Fix return values to be unsigned, from Kees Cook

It has been in linux-next for at least a week with no problems.

* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
  nodemask: Fix return values to be unsigned
  bitmap: Fix return values to be unsigned
  KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
  KVM: x86: hyper-v: fix type of valid_bank_mask
  ia64: cleanup remove_siblinginfo()
  drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
  KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
  lib/bitmap: add test for bitmap_{from,to}_arr64
  lib: add bitmap_{from,to}_arr64
  lib/bitmap: extend comment for bitmap_(from,to)_arr32()
  include/linux/find: Fix documentation
  lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
  MAINTAINERS: add cpumask and nodemask files to BITMAP_API
  arch/x86: replace nodes_weight with nodes_empty where appropriate
  mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
  clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
  genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
  irq: mips: replace cpumask_weight with cpumask_empty where appropriate
  drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace cpumask_weight with cpumask_empty where appropriate
  ...
2022-06-04 14:04:27 -07:00
Evan Quan
12d6c18cfa drm/amdgpu: suppress the compile warning about 64 bit type
Suppress the compile warning below:
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:1292
gfx_v11_0_rlc_backdoor_autoload_copy_ucode() warn: should '1 << id' be a 64 bit type?

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:43:36 -04:00
Evan Quan
4513edf74c drm/amd/pm: suppress compile warnings about possible unaligned accesses
Suppress the following compile warnings:
>> drivers/gpu/drm/amd/amdgpu/../pm/swsmu/inc/smu_v11_0_pptable.h:163:17:
warning: field smc_pptable within 'struct smu_11_0_powerplay_table' is
less aligned than 'PPTable_t' and is usually due to 'struct smu_11_0_powerplay_table'
being packed, which can lead to unaligned accesses [-Wunaligned-access]
         PPTable_t smc_pptable;                        //PPTable_t in smu11_driver_if.h
                   ^
   1 warning generated.
--
>> drivers/gpu/drm/amd/amdgpu/../pm/swsmu/inc/smu_v11_0_7_pptable.h:193:17:
warning: field smc_pptable within 'struct smu_11_0_7_powerplay_table' is
less aligned than 'PPTable_t' and is usually due to 'struct smu_11_0_7_powerplay_table'
being packed, which can lead to unaligned accesses [-Wunaligned-access]
         PPTable_t smc_pptable;                        //PPTable_t in smu11_driver_if.h
                   ^
   1 warning generated.
--
>> drivers/gpu/drm/amd/amdgpu/../pm/swsmu/inc/smu_v13_0_pptable.h:161:12:
warning: field smc_pptable within 'struct smu_13_0_powerplay_table' is less aligned than
'PPTable_t' and is usually due to 'struct smu_13_0_powerplay_table' being packed, which
can lead to unaligned accesses [-Wunaligned-access]

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:43:36 -04:00
Philip Yang
88467db6e2 drm/amdkfd: Fix partial migration bugs
Migration range from system memory to VRAM, if system page can not be
locked or unmapped, we do partial migration and leave some pages in
system memory. Several bugs found to copy pages and update GPU mapping
for this situation:

1. copy to vram should use migrate->npage which is total pages of range
as npages, not migrate->cpages which is number of pages can be migrated.

2. After partial copy, set VRAM res cursor as j + 1, j is number of
system pages copied plus 1 page to skip copy.

3. copy to ram, should collect all continuous VRAM pages and copy
together.

4. Call amdgpu_vm_update_range, should pass in offset as bytes, not
as number of pages.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-06-03 16:43:22 -04:00
Lang Yu
4fac4fcf45 drm/amdkfd: add pinned BOs to kfd_bo_list
The kfd_bo_list is used to restore process BOs after
evictions. As page tables could be destroyed during
evictions, we should also update pinned BOs' page tables
during restoring to make sure they are valid.

So for pinned BOs,
1, Validate them and update their page tables.
2, Don't add eviction fence for them.

v2:
 - Don't handle pinned ones specially in BO validation.(Felix)

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:27:17 -04:00
Philip Yang
4d1e5f12b7 drm/amdgpu: Update PDEs flush TLB if PTB/PDB moved
Flush TLBs when existing PDEs are updated because a PTB or PDB moved,
but avoids unnecessary TLB flushes when new PDBs or PTBs are added to
the page table, which commonly happens when memory is mapped for the
first time.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:27:17 -04:00
Sunil Khatri
371017309a drm/amdgpu: enable tmz by default for GC 10.3.7
Add IP GC 10.3.7 in the list of target to have
tmz enabled by default.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.18.x
2022-06-03 16:27:00 -04:00
Mario Limonciello
7c4f4f197e drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitions
Loading amdgpu on GC 10.3.7 shows an ERR level message:
`kfd kfd: amdgpu: GC IP 0a0307 not supported in kfd`

Add these targets to match yellow carp structures.

Reported-by: David Chang <david.chang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Jesse(Jie) Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.18.x
2022-06-03 16:26:36 -04:00
Linus Torvalds
ab18b7b36a Merge tag 'drm-next-2022-06-03-1' of git://anongit.freedesktop.org/drm/drm
Pull more drm updates from Dave Airlie:
 "This is mostly regular fixes, msm and amdgpu. There is a tegra patch
  that is bit of prep work for a 5.20 feature to avoid some inter-tree
  syncs, and a couple of late addition amdgpu uAPI changes but best to
  get those in early, and the userspace pieces are ready.

  msm:
   - Limiting WB modes to max sspp linewidth
   - Fixing the supported rotations to add 180 back for IGT
   - Fix to handle pm_runtime_get_sync() errors to avoid unclocked
     access in the bind() path for dpu driver
   - Fix the irq_free() without request issue which was a big-time
     hitter in the CI-runs.

  amdgpu:
   - Update fdinfo to the common drm format
   - uapi:
       - Add VM_NOALLOC GPUVM attribute to prevent buffers for going
         into the MALL
       - Add AMDGPU_GEM_CREATE_DISCARDABLE flag to create buffers that
         can be discarded on eviction
       - Mesa code which uses these:
           https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16466
   - Link training fixes
   - DPIA fixes
   - Misc code cleanups
   - Aux fixes
   - Hotplug fixes
   - More FP clean up
   - Misc GFX9/10 fixes
   - Fix a possible memory leak in SMU shutdown
   - SMU 13 updates
   - RAS fixes
   - TMZ fixes
   - GC 11 updates
   - SMU 11 metrics fixes
   - Fix coverage blend mode for overlay plane
   - Note DDR vs LPDDR memory
   - Fuzz fix for CS IOCTL
   - Add new PCI DID

  amdkfd:
   - Clean up hive setup
   - Misc fixes

  tegra:
   - add some prelim 5.20 work to avoid inter-tree mess"

* tag 'drm-next-2022-06-03-1' of git://anongit.freedesktop.org/drm/drm: (57 commits)
  drm/msm/dpu: Move min BW request and full BW disable back to mdss
  drm/msm/dpu: Fix pointer dereferenced before checking
  drm/msm/dpu: Remove unused code
  drm/msm/disp/dpu1: remove superfluous init
  drm/msm/dp: Always clear mask bits to disable interrupts at dp_ctrl_reset_irq_ctrl()
  gpu: host1x: Add context bus
  drm/amdgpu: add drm-client-id to fdinfo v2
  drm/amdgpu: Convert to common fdinfo format v5
  drm/amdgpu: bump minor version number
  drm/amdgpu: add AMDGPU_VM_NOALLOC v2
  drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  drm/amdgpu: add beige goby PCI ID
  drm/amd/pm: Return auto perf level, if unsupported
  drm/amdkfd: fix typo in comment
  drm/amdgpu/gfx: fix typos in comments
  drm/amdgpu/cs: make commands with 0 chunks illegal behaviour.
  drm/amdgpu: differentiate between LP and non-LP DDR memory
  drm/amdgpu: Resolve pcie_bif RAS recovery bug
  drm/amdgpu: clean up asd on the ta_firmware_header_v2_0
  drm/amdgpu/discovery: validate VCN and SDMA instances
  ...
2022-06-03 09:49:29 -07:00
Yury Norov
525d651560 drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
The smu_v1X_0_set_allowed_mask() uses bitmap_copy() to convert
bitmap to 32-bit array. This may be wrong due to endiannes issues.
Fix it by switching to bitmap_{from,to}_arr32.

CC: Alexander Gordeev <agordeev@linux.ibm.com>
CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
CC: Christian Borntraeger <borntraeger@linux.ibm.com>
CC: Claudio Imbrenda <imbrenda@linux.ibm.com>
CC: David Hildenbrand <david@redhat.com>
CC: Heiko Carstens <hca@linux.ibm.com>
CC: Janosch Frank <frankja@linux.ibm.com>
CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
CC: Sven Schnelle <svens@linux.ibm.com>
CC: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-03 06:52:58 -07:00
Philip Yang
fa582c6f36 drm/amdkfd: Use mmget_not_zero in MMU notifier
MMU notifier callback may pass in mm with mm->mm_users==0 when process
is exiting, use mmget_no_zero to avoid accessing invalid mm in deferred
list work after mm is gone.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-01 16:04:41 -04:00
Candice Li
2a46096335 drm/amdgpu: Resolve RAS GFX error count issue after cold boot on Arcturus
Adjust the sequence for ras late init and separate ras reset error status
from query status.

v2: squash in fix from Candice

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-01 15:58:45 -04:00
Stanley.Yang
28caf8c467 drm/amdgpu: fix ras supported check
Fix aldebaran ras supported check on SRIOV guest side,
the previous check conditicon block all ras feature
on baremetal

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-01 15:58:09 -04:00
Aurabindo Pillai
fd843d0341 drm/amd/display: remove stale config guards
This code should be executed.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-06-01 15:57:18 -04:00