Commit Graph

1218978 Commits

Author SHA1 Message Date
Daniel Vetter
aec3e2e23b Merge tag 'drm-misc-fixes-2023-11-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-fixes for v6.7-rc1:

qxl:
- qxl memory leak fix.
syncobj:
- Fix waiting for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE
vc4:
- Fix UAF in mock helpers

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[sima: Stitch together both changelogs from Maarten. Also because of
branch history this contains a few more bugfixes which are already in
v6.6, but I didn't feel like this justifies some backmerge since there
wasn't any real conflict.]
Link: https://patchwork.freedesktop.org/patch/msgid/bc8598ee-d427-4616-8ebd-64107ab9a2d8@linux.intel.com
2023-11-10 16:57:49 +01:00
Daniel Vetter
0b336ec076 Merge tag 'drm-intel-next-fixes-2023-11-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
drm/i915 fixes for v6.7-rc1:
- Fix null dereference when perf interface is not available
- Fix a -Wstringop-overflow warning
- Fix a -Wformat-truncation warning in intel_tc_port_init
- Flush WC GGTT only on required platforms
- Fix MTL HBR3 rate support on C10 phy and eDP
- Fix MTL notify_guc for multi-GT
- Bump GLK CDCLK frequency when driving multiple pipes
- Fix potential spectre vulnerability

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/878r78xrxd.fsf@intel.com
2023-11-10 16:43:45 +01:00
Nirmoy Das
9506fba463 drm/i915/tc: Fix -Wformat-truncation in intel_tc_port_init
Fix below compiler warning:

intel_tc.c:1879:11: error: ‘%d’ directive output may be truncated
writing between 1 and 11 bytes into a region of size 3
[-Werror=format-truncation=]
"%c/TC#%d", port_name(port), tc_port + 1);
           ^~
intel_tc.c:1878:2: note: ‘snprintf’ output between 7 and 17 bytes
into a destination of size 8
  snprintf(tc->port_name, sizeof(tc->port_name),
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    "%c/TC#%d", port_name(port), tc_port + 1);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

v2: use kasprintf(Imre)
v3: use const for port_name, and fix tc mem leak(Imre)

Fixes: 3eafcddf76 ("drm/i915/tc: Move TC port fields to a new intel_tc_port struct")
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231026125636.5080-1-nirmoy.das@intel.com
(cherry picked from commit 70a3cbbe62)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-11-06 14:42:58 +02:00
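The truncation warning quoted in 9506fba463 comes from formatting "%c/TC#%d" into an 8-byte buffer: the shortest output needs 7 bytes including the NUL, the longest 17. A minimal userspace sketch of how that truncation can be detected at run time follows. The helper name format_port_name and the PORT_NAME_SIZE constant are illustrative assumptions, not code from the patch, which per its changelog switched to kasprintf instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same size as the destination the compiler warning complains about. */
#define PORT_NAME_SIZE 8

/*
 * Format "<port>/TC#<n>" into buf and report whether the whole string fit.
 * snprintf() returns the length the complete output needs (excluding the
 * terminating NUL), so a return value >= size means the output was cut.
 * Hypothetical helper, for illustration only.
 */
static int format_port_name(char *buf, size_t size, char port, int tc_port)
{
	int needed = snprintf(buf, size, "%c/TC#%d", port, tc_port);

	return needed >= 0 && (size_t)needed < size;
}
```

With an 8-byte buffer, a single-digit port number fits, while a three-digit one is reported as truncated; the buffer stays NUL-terminated in both cases.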
Kunwu Chan
1a8e9bad6e drm/i915: Fix potential spectre vulnerability
Fix smatch warning:
drivers/gpu/drm/i915/gem/i915_gem_context.c:847 set_proto_ctx_sseu()
warn: potential spectre issue 'pc->user_engines' [r] (local cap)

Fixes: d4433c7600 ("drm/i915/gem: Use the proto-context to handle create parameters (v5)")
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231103110922.430122-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 27b086382c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-11-06 14:42:54 +02:00
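The message of 1a8e9bad6e does not say how the smatch warning was silenced. Warnings of this kind are commonly addressed by clamping a user-controlled index with a branch-free mask, which is the idea behind the kernel's array_index_nospec() helper. The following userspace sketch shows that masking technique under stated assumptions: the function names are hypothetical, size is nonzero, and the compiler implements right shift of a negative signed value as an arithmetic shift (as GCC and Clang do).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Return all ones when index < size and zero otherwise, without a branch.
 * If index is in range, both index and (size - 1 - index) have a clear top
 * bit, so the complement is negative and the shift smears the sign bit.
 * Assumes size != 0 and that size fits in a signed long.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp an out-of-range index to 0 so a speculative load stays in bounds. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```

A caller would still perform the normal bounds check first; the mask only ensures that a mispredicted branch cannot use an out-of-range index.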
Ville Syrjälä
0cb89cd42f drm/i915: Bump GLK CDCLK frequency when driving multiple pipes
On GLK CDCLK frequency needs to be at least 2*96 MHz when accessing
the audio hardware. Currently we bump the CDCLK frequency up
temporarily (if not high enough already) whenever audio hardware
is being accessed, and drop it back down afterwards.

With a single active pipe this works just fine as we can switch
between all the valid CDCLK frequencies by changing the cd2x
divider, which doesn't require a full modeset. However with
multiple active pipes the cd2x divider trick no longer works,
and thus we end up blinking all displays off and back on.

To avoid this let's just bump the CDCLK frequency to >=2*96MHz
whenever multiple pipes are active. The downside is slightly
higher power consumption, but that seems like an acceptable
tradeoff. With a single active pipe we can stick to the current
more optimal (from power consumption POV) behaviour.

Cc: stable@vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9599
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031160800.18371-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 451eaa1a61)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-11-06 14:42:49 +02:00
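The policy described in 0cb89cd42f can be sketched as a small pure function: with more than one active pipe, raise the minimum CDCLK to at least 2*96 MHz; with a single pipe, leave it alone. The function and macro names below are hypothetical and the pipe mask representation is an assumption; the actual i915 code is structured differently.

```c
#include <assert.h>

/* 2*96 MHz in kHz, the audio requirement stated in the commit message. */
#define AUDIO_MIN_CDCLK_KHZ (2 * 96000)

/* Count set bits in a pipe mask (one bit per active pipe). */
static int count_active_pipes(unsigned int active_pipes)
{
	int n = 0;

	for (; active_pipes; active_pipes &= active_pipes - 1)
		n++;
	return n;
}

/*
 * Return the minimum CDCLK to request. With several active pipes the
 * frequency is kept at or above the audio requirement, so audio access
 * never needs a frequency change that would blank all displays.
 */
static int glk_min_cdclk_khz(unsigned int active_pipes, int min_cdclk_khz)
{
	if (count_active_pipes(active_pipes) > 1 &&
	    min_cdclk_khz < AUDIO_MIN_CDCLK_KHZ)
		return AUDIO_MIN_CDCLK_KHZ;
	return min_cdclk_khz;
}
```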
Nirmoy Das
0ad755fb88 drm/i915/mtl: Apply notify_guc to all GTs
Handle platforms with multiple GTs by iterating over all GTs.
Add a Fixes commit so this gets propagated for MTL support.

Fixes: 213c43676b ("drm/i915/mtl: Remove the 'force_probe' requirement for Meteor Lake")
Suggested-by: John Harrison <john.c.harrison@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231025102826.16955-1-nirmoy.das@intel.com
(cherry picked from commit 949113d34f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-11-06 14:42:44 +02:00
Zongmin Zhou
0e8b9f258b drm/qxl: prevent memory leak
The memory allocated for qdev->dumb_heads should be released
in qxl_destroy_monitors_object before qxl suspend.
Otherwise, qxl_create_monitors_object will be called to
reallocate memory for qdev->dumb_heads after qxl resume,
which will cause a memory leak.

Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn>
Link: https://lore.kernel.org/r/20230801025309.4049813-1-zhouzongmin@kylinos.cn
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2023-11-06 09:37:03 +01:00
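The leak pattern described in 0e8b9f258b is a create-on-resume without a matching destroy-on-suspend. A minimal userspace sketch of the balanced version follows; struct mock_qdev, NUM_HEADS and every function here are hypothetical stand-ins, not the qxl driver's code, and the allocation counter exists only to make the balance observable.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_HEADS 4 /* arbitrary size for the illustration */

/* Hypothetical stand-in for the device state that owns dumb_heads. */
struct mock_qdev {
	int *dumb_heads;
	int live_allocations; /* bookkeeping for the sketch only */
};

static int create_monitors_object(struct mock_qdev *qdev)
{
	qdev->dumb_heads = calloc(NUM_HEADS, sizeof(*qdev->dumb_heads));
	if (!qdev->dumb_heads)
		return -1;
	qdev->live_allocations++;
	return 0;
}

static void destroy_monitors_object(struct mock_qdev *qdev)
{
	if (!qdev->dumb_heads)
		return;
	free(qdev->dumb_heads);
	qdev->dumb_heads = NULL;
	qdev->live_allocations--;
}

/* Release on suspend so that resume does not overwrite a live pointer. */
static void mock_suspend(struct mock_qdev *qdev)
{
	destroy_monitors_object(qdev);
}

static int mock_resume(struct mock_qdev *qdev)
{
	return create_monitors_object(qdev);
}
```

However many suspend/resume cycles run, at most one allocation is live at a time; dropping the destroy call from mock_suspend would add one leaked buffer per cycle.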
Dave Airlie
9ccde17d46 amd-drm-next-6.7-2023-11-03:
Merge tag 'amd-drm-next-6.7-2023-11-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.7-2023-11-03:

amdgpu:
- Fix RAS support check
- RAS fixes
- MES fixes
- SMU13 fixes
- Contiguous memory allocation fix
- BACO fixes
- GPU reset fixes
- Min power limit fixes
- GFX11 fixes
- USB4/TB hotplug fixes
- ARM regression fix
- GFX9.4.3 fixes
- KASAN/KCSAN stack size check fixes
- SR-IOV fixes
- SMU14 fixes
- PSP13 fixes
- Display blend fixes
- Flexible array size fixes

amdkfd:
- GPUVM fix

radeon:
- Flexible array size fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231103173203.4912-1-alexander.deucher@amd.com
2023-11-06 11:25:14 +10:00
Dave Airlie
f056cb9681 Merge tag 'drm-misc-next-fixes-2023-11-02' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next-fixes for v6.7-rc1:

- dt binding fix for ssd132x
- Initialize ssd130x crtc_state to NULL.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/58f40043-bb8a-4716-bf07-89f6a9f56c4c@linux.intel.com
2023-11-06 11:24:30 +10:00
Ilya Bakoulin
6d5e0032a9 drm/amd/display: Enable fast update on blendTF change
[Why]
Full update is not required on surface blend TF change.

[How]
Update full_update_required condition.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Ilya Bakoulin <ilya.bakoulin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Ilya Bakoulin
5d853ad5a8 drm/amd/display: Fix blend LUT programming
[Why]
LUT write index does not get reset to zero when writing the LUT values
for each separate RGB component, which results in wrong data for 2 of
the 3 components.

[How]
Reset LUT write index to zero before writing each component's data.

Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com>
Acked-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Ilya Bakoulin <ilya.bakoulin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
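The bug fixed in 5d853ad5a8 can be illustrated with a mock LUT whose write index auto-increments on every data write. Everything below (struct mock_lut, the helper names, the 4-entry size) is a hypothetical model for illustration and does not reflect the real register interface; it only shows why the index has to be reset before each colour component.

```c
#include <assert.h>
#include <stddef.h>

#define LUT_ENTRIES 4

enum lut_channel { LUT_R, LUT_G, LUT_B, LUT_CHANNELS };

/* Hypothetical model of a LUT with an auto-incrementing write index. */
struct mock_lut {
	unsigned int index;
	enum lut_channel channel;
	unsigned short data[LUT_CHANNELS][LUT_ENTRIES];
};

static void lut_select_channel(struct mock_lut *lut, enum lut_channel ch)
{
	lut->channel = ch;
}

static void lut_set_index(struct mock_lut *lut, unsigned int index)
{
	lut->index = index;
}

/* Writes past the end are dropped, as they would miss the intended slot. */
static void lut_write_data(struct mock_lut *lut, unsigned short value)
{
	if (lut->index < LUT_ENTRIES)
		lut->data[lut->channel][lut->index] = value;
	lut->index++;
}

static void program_lut(struct mock_lut *lut,
			const unsigned short values[LUT_CHANNELS][LUT_ENTRIES])
{
	for (int ch = 0; ch < LUT_CHANNELS; ch++) {
		lut_select_channel(lut, (enum lut_channel)ch);
		/* The reset the commit describes: start each component at 0. */
		lut_set_index(lut, 0);
		for (size_t i = 0; i < LUT_ENTRIES; i++)
			lut_write_data(lut, values[ch][i]);
	}
}
```

Without the lut_set_index() call inside the loop, only the first component would land at the right offsets in this model, matching the "wrong data for 2 of the 3 components" symptom.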
Sung Joon Kim
995dedb7a4 drm/amd/display: Program plane color setting correctly
[why]
There are some registers for plane
color that are skipped programming
on resume. Need to add those as part
of the sequence.

[how]
Add new function hook for programming
plane color control.

Reviewed-by: Duncan Ma <duncan.ma@amd.com>
Acked-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Sung Joon Kim <sungkim@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Hawking Zhang
23618280cc drm/amdgpu: Query and report boot status
Query boot status and report boot errors. A follow
up change is needed to stop GPU initialization if boot
fails.

v2: only invoke the call for dGPU (Le/Lijo)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Hawking Zhang
df57e019d5 drm/amdgpu: Add psp v13 function to query boot status
Add psp v13 function to query boot status.

v2: limit the use case to dGPU only (Lijo)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Li Ma
908cebc9a4 drm/amd/swsmu: remove fw version check in sw_init.
Drop the fw version check and use the max table size to init the table.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Li Ma
34ec3cedca drm/amd/swsmu: update smu v14_0_0 driver if and metrics table
Update driver if headers and metrics table in smu v14_0_0 after smu fw promotion.
Drop the legacy metrics table and add warning of checking pmfw version.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Hawking Zhang
38a64e3a33 drm/amdgpu: Add C2PMSG_109/126 reg field shift/masks
Add MP0_C2PMSG_109/126 register field shift/masks
that are used to identify boot status by driver.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Ma Jun
dbab63561b drm/amdgpu: Optimize the asic type fix code
Use a new struct array to define the asic information which
asic type needs to be fixed.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Tim Huang
36e7ff5c13 drm/amdgpu: fix GRBM read timeout when do mes_self_test
Use a proper MEID to make sure the CP_HQD_* and CP_GFX_HQD_* registers
can be touched when initializing the compute and gfx mqd in mes_self_test.
Otherwise, we expect no response from CP and an eventual GRBM timeout.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 12:18:32 -04:00
Tao Zhou
18eae367cb drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count
Handle xgmi hive case.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Ma Jun
88e5c8f874 drm/amd/pm: only check sriov vf flag once when creating hwmon sysfs
The current code checks the sriov vf flag multiple times when creating
hwmon sysfs. Check it only once instead.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Felix Kuehling
0e2e7c5b3d drm/amdgpu: Attach eviction fence on alloc
Instead of attaching the eviction fence when a KFD BO is first mapped,
attach it when it is allocated or imported. This is in preparation to allow
KFD BOs to be mapped using the render node API.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Felix Kuehling
5a104cb97c drm/amdkfd: Improve amdgpu_vm_handle_moved
Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by
the caller. This will be useful for handling extra BO VA mappings in
KFD VMs that are managed through the render node API.

v2: rebase against drm_exec changes (Alex)

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Nathan Chancellor
6740ec97bc drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2
When building ARCH=x86_64 allmodconfig with clang, which will typically
have sanitizers enabled, there is a warning about a large stack frame.

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:6265:13: error: stack frame size (2520) exceeds limit (2048) in 'dml_prefetch_check' [-Werror,-Wframe-larger-than]
   6265 | static void dml_prefetch_check(struct display_mode_lib_st *mode_lib)
        |             ^
  1 error generated.

Notably, GCC 13.2.0 does not do too much of a better job, as it is right
at the current limit of 2048 (and others have reported being over with
older GCC versions):

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c: In function 'dml_prefetch_check':
  drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:6705:1: error: the frame size of 2048 bytes is larger than 1800 bytes [-Werror=frame-larger-than=]
   6705 | }
        | ^

In the past, these warnings have been avoided by reducing the number of
parameters to various functions so that not as many arguments need to be
passed on the stack. However, these patches take a good amount of effort
to write despite being mechanical due to code structure and complexity
and they are never carried forward to new generations of the code so
that effort has to be expended every new hardware generation, which
becomes harder to justify as time goes on.

To avoid having a noticeable or lengthy breakage in all{mod,yes}config,
which are easy testing targets that have -Werror enabled, increase the
limit for configurations that have KASAN or KCSAN enabled by 50% so that
cases of extremely poor code generation can still be caught while not
breaking the majority of builds. CONFIG_KMSAN also causes high stack
usage but the frame limit is already set to zero when it is enabled,
which is accounted for by the check for CONFIG_FRAME_WARN=0 in the dml2
Makefile.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:59:52 -04:00
Wayne Lin
b1904ed480 drm/amd/display: Avoid NULL dereference of timing generator
[Why & How]
Check whether assigned timing generator is NULL or not before
accessing its funcs to prevent NULL dereference.

Reviewed-by: Jun Lei <jun.lei@amd.com>
Acked-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:59:51 -04:00
Mukul Joshi
be457b2252 drm/amdkfd: Update cache info for GFX 9.4.3
Update cache info reporting based on compute and
memory partitioning modes.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:59:51 -04:00
Mukul Joshi
0ce8edae8b drm/amdkfd: Populate cache info for GFX 9.4.3
GFX 9.4.3 uses a new version of the GC info table which
contains the cache info. This patch adds a new function
to populate the cache info from IP discovery for GFX 9.4.3.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:59:51 -04:00
Alex Deucher
ba0fb4b48c drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64
Issues were reported with commit 1cfb4d6121
("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere
Altra Developer Platform (AVA developer platform).

Various ARM systems seem to have problems related
to PCIe and MMIO access.  In this case, I'm not sure
if this is specific to the ADLINK platform or ARM
in general.  Seems to be some coherency issue with
VRAM.  For now, just don't put MQDs in VRAM on ARM.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html
Fixes: 1cfb4d6121 ("drm/amdgpu: put MQDs in VRAM")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: alexey.klimov@linaro.org
2023-11-03 11:59:51 -04:00
Alex Deucher
23170863ea drm/amdgpu/smu13: drop compute workload workaround
This was fixed in PMFW before launch and is no longer
required.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-11-03 11:59:51 -04:00
Alex Deucher
3938eb956e drm/amdgpu: add a retry for IP discovery init
AMD dGPUs have integrated FW that runs as soon as the
device gets power and initializes the board (determines
the amount of memory, provides configuration details to
the driver, etc.).  For direct PCIe attached cards this
happens as soon as power is applied and normally completes
well before the OS has even started loading.  However, with
hotpluggable ports like USB4, the driver needs to wait for
this to complete before initializing the device.

This normally takes 60-100ms, but could take longer on
some older boards periodically due to memory training.

Retry for up to a second.  In the non-hotplug case, there
should be no change in behavior and this should complete
on the first try.

v2: adjust test criteria
v3: adjust checks for the masks, only enable on removable devices
v4: skip bif_fb_en check

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 11:59:51 -04:00
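The retry described in 3938eb956e is a bounded poll: try once, and if the device firmware is not done yet, sleep and try again until a time budget runs out. A minimal sketch of that loop follows. The callback types, struct mock_dev and the 100 ms step used in the test are assumptions for illustration; the commit message only states "up to a second" and that the first try succeeds in the non-hotplug case.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*ready_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int ms);

/*
 * Poll ready() until it succeeds or timeout_ms has been spent sleeping.
 * Returns the number of polls used on success, or -1 on timeout.
 * The first poll happens immediately, so an already-initialized device
 * sees no added delay.
 */
static int wait_until_ready(ready_fn ready, void *ctx, sleep_fn sleep_ms,
			    unsigned int timeout_ms, unsigned int step_ms)
{
	unsigned int waited = 0;
	int tries = 0;

	assert(step_ms > 0);
	for (;;) {
		tries++;
		if (ready(ctx))
			return tries;
		if (waited >= timeout_ms)
			return -1;
		sleep_ms(step_ms);
		waited += step_ms;
	}
}

/* Mock device for the sketch: reports ready after polls_needed polls. */
struct mock_dev {
	int polls_needed;
	int polls_seen;
};

static bool mock_ready(void *ctx)
{
	struct mock_dev *dev = ctx;

	return ++dev->polls_seen >= dev->polls_needed;
}

/* Sleep stub so the sketch runs instantly; a driver would really sleep. */
static void no_sleep(unsigned int ms)
{
	(void)ms;
}
```

Injecting the sleep function keeps the loop testable without real delays; a device that is ready on the first poll returns 1 and never sleeps.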
Perry Yuan
886b92f635 drm/amdgpu: ungate power gating when system suspend
[Why] During suspend, if GFX DPM is enabled and GFXOFF feature is
enabled the system may get hung. So, it is suggested to disable
GFXOFF feature during suspend and enable it after resume.

[How] Update the code to disable GFXOFF feature during suspend and enable
it after resume.

[  311.396526] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[  311.396530] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[  311.396531] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62

Acked-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Kun Liu <kun.liu2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 11:59:51 -04:00
José Pekkarinen
3a50f41bc2 drm/radeon: replace 1-element arrays with flexible-array members
Reported by coccinelle, the following patch will move the
following 1 element arrays to flexible arrays.

drivers/gpu/drm/radeon/atombios.h:5523:32-48: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:5545:32-48: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:5461:34-44: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4447:30-40: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4236:30-41: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7095:28-45: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:3896:27-37: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:5443:16-25: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:5454:34-43: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4603:21-32: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4628:32-46: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:6285:29-39: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4296:30-36: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4756:28-36: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4064:22-35: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7327:9-24: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7332:32-53: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7362:26-41: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7369:29-44: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7349:24-32: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7355:27-35: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)

Signed-off-by: José Pekkarinen <jose.pekkarinen@foxhound.fi>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:59:51 -04:00
Alex Deucher
49afe91370 drm/amd: Fix UBSAN array-index-out-of-bounds for Powerplay headers
For pptable structs that use flexible array sizes, use flexible arrays.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039926
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:59:51 -04:00
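Both flexible-array commits above replace the old "one-element array as variable-length tail" idiom. A short sketch of the difference follows; the struct and field names are hypothetical and much simpler than the real atombios.h and pptable definitions (which are, among other things, packed), so treat this as an illustration of the language feature only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct clock_entry {
	unsigned short clock;
	unsigned char voltage;
};

/* Old idiom: the declared bound is 1, so entries[1] looks out of bounds
 * to the compiler and to bounds checkers even when the memory is there. */
struct table_old {
	unsigned char num_entries;
	struct clock_entry entries[1];
};

/* Flexible array member: no declared bound, size comes from allocation. */
struct table_flex {
	unsigned char num_entries;
	struct clock_entry entries[];
};

/* Allocate a table with room for n entries after the fixed header. */
static struct table_flex *table_alloc(unsigned char n)
{
	struct table_flex *t = calloc(1, sizeof(*t) + n * sizeof(t->entries[0]));

	if (t)
		t->num_entries = n;
	return t;
}
```

The entries start at the same offset in both layouts, but sizeof() of the flexible version no longer includes one phantom element, so size calculations become header size plus n elements.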
Alex Deucher
7b1c6263ea drm/amdgpu: don't use pci_is_thunderbolt_attached()
It's only valid on Intel systems with the Intel VSEC.
Use dev_is_removable() instead.  This should do the right
thing regardless of the platform.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 11:59:44 -04:00
Alex Deucher
432e664e7c drm/amdgpu: don't use ATRM for external devices
The ATRM ACPI method is for fetching the dGPU vbios rom
image on laptops and all-in-one systems.  It should not be
used for external add in cards.  If the dGPU is thunderbolt
connected, don't try ATRM.

v2: pci_is_thunderbolt_attached only works for Intel.  Use
    pdev->external_facing instead.
v3: dev_is_removable() seems to be what we want

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 11:59:13 -04:00
Alex Deucher
b3c942bb6c drm/amdgpu/gfx10,11: use memcpy_to/fromio for MQDs
Since they were moved to VRAM, we need to use the IO
variants of memcpy.

Fixes: 1cfb4d6121 ("drm/amdgpu: put MQDs in VRAM")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:38:19 -04:00
Tao Zhou
f7aeee7346 drm/amdgpu: use mode-2 reset for RAS poison consumption
Switch from mode-1 reset to mode-2 for poison consumption.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:38:13 -04:00
Lin.Cao
b77cc85bdb drm/amdgpu doorbell range should be set when gpu recovery
GFX doorbell range should be set after FLR, otherwise the gfx doorbell
range will overlap with MEC.

v2: remove "amdgpu_sriov_vf" and "amdgpu_in_reset" check, and add grbm
select for the case of 2 gfx rings.

Signed-off-by: Lin.Cao <lincao12@amd.com>
Acked-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:38:04 -04:00
Ma Jun
42ef313754 drm/amd/pm: Return 0 as default min power limit for legacy asics
Return 0 as the default min power limit for asics that use
powerplay.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:37:38 -04:00
Dave Airlie
2ba446f821 drm: renesas: shmobile: Atomic conversion + DT support
Merge tag 'shmob-drm-atomic-dt-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into drm-next

drm: renesas: shmobile: Atomic conversion + DT support

Currently, there are two drivers for the LCD controller on Renesas
SuperH-based and ARM-based SH-Mobile and R-Mobile SoCs:
  1. sh_mobile_lcdcfb, using the fbdev framework,
  2. shmob_drm, using the DRM framework.
However, only the former driver is used, as all platform support
integrates the former.  None of these drivers support DT-based systems.

Convert the SH-Mobile DRM driver to atomic modesetting, and add DT
support, complemented by the customary set of fixes and improvements.

Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://lore.kernel.org/r/cover.1694767208.git.geert+renesas@glider.be/
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patchwork.freedesktop.org/patch/msgid/CAMuHMdUF61V5qNyKbrTGxZfEJvCVuLO7q2R5MqZYkzRC_cNr0w@mail.gmail.com
2023-11-02 11:52:55 +10:00
Yang Wang
2bfb0ca3dd drm/amdgpu: remove unused macro HW_REV
remove unused macro HW_REV

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 17:13:59 -04:00
Ma Jun
7f3e6b840f drm/amd/pm: Fix error of MACO flag setting code
MACO only works if BACO is supported

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-10-31 17:13:13 -04:00
Arunpravin Paneer Selvam
9ae587f850 drm/amdgpu: Fix the vram base start address
If the size returned by drm buddy allocator is higher than
the required size, we take the higher size to calculate
the buffer start address. This is required if we couldn't
trim the buffer to the requested size. This will fix the
display corruption issue on APUs which have limited VRAM
size.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2859
Fixes: 0a1844bf0b ("drm/buddy: Improve contiguous memory allocation")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 17:10:13 -04:00
Tao Zhou
d539b0ad7c drm/amdgpu: set XGMI IP version manually for v6_4
The version can't be queried from discovery table.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 17:09:53 -04:00
Tong Liu01
853eebe6ec drm/amdgpu: add unmap latency when gfx11 set kiq resources
[why]
If the driver does not set the unmap latency for KIQ, it defaults to
zero. When unmapping a queue, KIQ then returns almost immediately after
receiving the unmap command, so the queue status may be saved to the
MQD incorrectly or lost.

[how]
Set the unmap latency when issuing KIQ set resources. The unmap latency
is set to 1 second, matching the Windows driver.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:16 -04:00
Kenneth Feng
5f38ac54e6 drm/amd/pm: fix the high voltage and temperature issue
Fix the high voltage and temperature issue that occurs after the driver
is unloaded on SMU 13.0.0, SMU 13.0.7 and SMU 13.0.10.
v2: fix the code format and make sure it is used in the unload case only.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:16 -04:00
Yifan Zhang
a17f574ab4 drm/amdgpu: remove amdgpu_mes_self_test in gpu recover
The GPU TLB flush is skipped if the reset semaphore is held. This makes
mes_self_test fail, since it involves add_hw_queue/remove_hw_queue,
which need a functional TLB flush. Remove mes_self_test from the GPU
recovery sequence.

This patch fixes the recovery failure on gfx11.

[ 1831.768292] [drm] ring sdma_32769.3.3 was added
[ 1831.768313] [drm] ring gfx_32769.1.1 ib test pass
[ 1831.768337] [drm] ring compute_32769.2.2 ib test pass
[ 1831.768399] amdgpu 0000:c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process  pid 0 thread  pid 0)
[ 1831.768434] amdgpu 0000:c2:00.0: amdgpu:   in page starting at address 0x0000aec200000000 from client 10
[ 1831.768456] amdgpu 0000:c2:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800A30
[ 1831.768473] amdgpu 0000:c2:00.0: amdgpu:      Faulty UTCL2 client ID: CPC (0x5)
[ 1831.768489] amdgpu 0000:c2:00.0: amdgpu:      MORE_FAULTS: 0x0
[ 1831.768501] amdgpu 0000:c2:00.0: amdgpu:      WALKER_ERROR: 0x0
[ 1831.768513] amdgpu 0000:c2:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[ 1831.768521] amdgpu 0000:c2:00.0: amdgpu:      MAPPING_ERROR: 0x0
[ 1831.768529] amdgpu 0000:c2:00.0: amdgpu:      RW: 0x0
[ 1831.931229] amdgpu 0000:c2:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma_32769.3.3 test failed (-110)
[ 1832.062917] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1832.063107] [drm:amdgpu_mes_remove_hw_queue [amdgpu]] *ERROR* failed to remove hardware queue, queue id = 3

Fixes: e2e3788850 ("drm/amdgpu: rework lock handling for flush_tlb v2")
Reported-by: Li Ma <li.ma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:15 -04:00
Candice Li
e020d01575 drm/amdgpu: Drop deferred error in uncorrectable error check
Drop the check for deferred errors, which can be handled by poison
consumption.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:15 -04:00
Lijo Lazar
5575ce2132 drm/amd/pm: Fix warnings
Fix the following warnings:

drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:286:45:
warning: '%s' directive output may be truncated writing up to 29 bytes
into a region of size 23 [-Wformat-truncation=]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:286:52:
warning: '%s' directive output may be truncated writing up to 29 bytes
into a region of size 23 [-Wformat-truncation=]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu14/smu_v14_0.c:72:45: warning:
'%s' directive output may be truncated writing up to 29 bytes into a
region of size 23 [-Wformat-truncation=]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu14/smu_v14_0.c:72:52: warning:
'%s' directive output may be truncated writing up to 29 bytes into a
region of size 23 [-Wformat-truncation=]
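
For context, -Wformat-truncation fires when the compiler can show that
snprintf() may need more bytes than the destination holds. The following is
a minimal sketch of detecting such truncation at run time by checking the
snprintf() return value. It is not taken from this patch: the helper name
format_name and the "%s_%s" format are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): writes "<prefix>_<name>" into
 * dst and reports whether the complete string fit.
 *
 * snprintf() returns the number of characters the full output would need,
 * excluding the terminating NUL. A return value >= dst_size therefore
 * means the output was truncated; dst is still NUL-terminated as long as
 * dst_size is greater than zero.
 */
static bool format_name(char *dst, size_t dst_size,
                        const char *prefix, const char *name)
{
    int n = snprintf(dst, dst_size, "%s_%s", prefix, name);

    return n >= 0 && (size_t)n < dst_size;
}
```

Called with a 23-byte destination and a 29-character name, as in the sizes
quoted by the warnings, format_name() would return false and leave a
22-character truncated string in the buffer. Whether to enlarge the buffer
or to handle the truncation is a per-call-site decision; this sketch only
shows the check.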

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:15 -04:00
Tao Zhou
d1d4c0b7b6 drm/amdgpu: check RAS supported first in ras_reset_error_count
Not all platforms support RAS.

Fixes: 73582be11a ("drm/amdgpu: bypass RAS error reset in some conditions")
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:15 -04:00