divide error: 0000 [#1] SMP PTI
CPU: 3 PID: 78925 Comm: tee Not tainted 5.15.50-1-lts #1
Hardware name: MSI MS-7A59/Z270 SLI PLUS (MS-7A59), BIOS 1.90 01/30/2018
RIP: 0010:smu_v11_0_set_fan_speed_rpm+0x11/0x110 [amdgpu]
Speed is user-configurable through a file.
I accidentally set it to zero, and the driver crashed.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Yefim Barashkin <mr.b34r@kolabnow.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Various DCE versions had trouble with 36 bpp lb depth, requiring fixes,
last time in commit 353ca0fa56 ("drm/amd/display: Fix 10bit 4K display
on CIK GPUs") for DCE-8. So far >= DCE-11.2 was considered ok, but now I
found out that on DCE-11.2 it causes dithering when there shouldn't be
any, so identity pixel passthrough with identity gamma LUTs doesn't work
when it should. This breaks various important neuroscience applications,
as reported to me by scientific users of Polaris cards under Ubuntu 22.04
with Linux 5.15, and confirmed by testing it myself on DCE-11.2.
Lets only use depth 36 for DCN engines, where my testing showed that it
is both necessary for high color precision output, e.g., RGBA16 fb's,
and not harmful, as far as more than one year in real-world use showed.
DCE engines seem to work fine for high precision output at 30 bpp, so
this ("famous last words") depth 30 should hopefully fix all known problems
without introducing new ones.
Successfully retested on DCE-11.2 Polaris and DCN-1.0 Raven Ridge on
top of Linux 5.19.0-rc2 + drm-next.
Fixes: 353ca0fa56 ("drm/amd/display: Fix 10bit 4K display on CIK GPUs")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org # 5.14.0
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This version brings along following fixes:
- Fixes for MST, MPO, PSRSU, DP 2.0, Freesync and others
- Add register offsets of NBI and DCN.
- Improvement of ALPM
- Removing assert statement for Linux DM
- Re-implementing ARGB16161616 pixel format
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
With single display odm 2:1 policy, when moving windowed MPO across
the display, we experience a momentary lag when we move between the
centre of the display and the right half of the display. This is
caused by the MPO pipe being reallocated when it crosses this
boundary
[How]
Handle two cases:
1. if the head pipe has a MPO pipe already allocated in the old
context, then use that pipe if it is available in the current
context
2. if the head pipe is on the left side, check the right side to
see if it has a MPO pipe already allocated. If so, don't use
that pipe if it is selected as the idle pipe in the current
context
Add new function pointer called .acquire_idle_pipe_for_head_pipe
that will pass in the head pipe and handle case 1
Add find_idle_secondary_pipe_check_mpo() to handle case 2
if we don't hit case 1.
In dc_add_plane_to_context(), start with head pipe and check
case 1 and 2 in call acquire_free_pipe_for_head().
If we are on the right side of the display, check case 1
again by passing in right side pipe as the new head in
call acquire_free_pipe_for_head().
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Ariel Bernstein <Eric.Bernstein@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
Add a field to store the DCN IP offset for use with runtime offset
calculation
This offset is indexed using reg*_BASE_IDX for the corresponding
group of registers. For example, address of DIG_BE_CNTL instance 0 is
calculated like: dcn_reg_offsets[regDIG0_DIG_BE_CNTL_BASE_IDX] +
regDIG0_DIG_BE_CNTL.
{dcn,nbio}_reg_offsets are used only for the ASICs for which runtime
initializaion of offsets are enabled through the modified SR* macros
that contain an additional REG_STRUCT element in the macro definition.
DCN3.5+ will fail dc_create() if {dcn,nbio}_reg_offsets are null. They
are applicable starting with DCN32/321 and are not used for ASICs
upstreamed before them. ASICs before DCN32/321 will not contain any
computation that involves {dcn,nbio}_reg_offsets. For them, the
address/offset computation is done during compile time.
This is evident from the BASE_INNER definition for compile time vs run
time initialization:
Compile time init: #define BASE_INNER(seg) DCN_BASE__INST0_SEG ## seg
Run time init: #define BASE_INNER(seg) ctx->dcn_reg_offsets[seg]
BASE_INNER macro is local to each dcnxx_resource.c and hence different
ASICs can have either runtime or compile time initialization of offsets.
The computation of offset is done for registers all at once during
driver load and hence it does not introduce any performance overhead
during normal operation.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
ABGR16161616 colour format was added to dcn10/20/30, and set
any ARGB16161616 to the same value as it (26). As such, the
HDR10 Green Point y value was too far off of the EDID stated
value for DisplayPort.
[How]
Added back the pixel format as 22 for ARGB16161616 for
dcn10/20/30.
Reviewed-by: Reza Amini <reza.amini@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Ethan Wellenreiter <Ethan.Wellenreiter@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Unbounded request logic in resource/DML has some issues where
unbounded request is being enabled incorrectly. SW today enables
unbounded request unconditionally in hardware, on the assumption
that HW can always support it in single pipe scenarios.
This worked until now because the same assumption is made in DML.
A new DML update is needed to fix a bug, where there are single
pipe scenarios where unbounded cannot be enabled, and this change
in DML needs to be ported in, and dcn32 resource logic fixed.
[how]
First, dcn32_resource should program unbounded req in HW according
to unbounded req enablement output from DML, as opposed to DML input
Second, port in DML1 update which disables unbounded req in some
scenarios to fix an issue with poor stutter performance
Reviewed-by: Nevenko Stupar <Nevenko.Stupar@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Jun Lei <jun.lei@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move reset_context out of gpu recover function to make it configurable
for different reset purpose.
For the reset way of call gpu_recovery sysfs, force to use full reset
method. Otherwise, try soft reset by default if the related ASIC
supportted, if soft reset failed, will use full reset.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support SDMA soft reset for SDMA v6.
V3: use ib test to check soft reset.
V4: squash in unused variable fix (Alex)
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Status flags definition is reduced to read
less bytes in SCDC transaction for status update.
[How]
Reduce definition of reserved bytes from 3 to 1
for status update.
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Chris Park <chris.park@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On the GC 10.3.7 platform the initial MEC release version #3 can support
atomic operation,so need correct and set its MEC atomic support version to #3.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Ideally link capability should be independent from the link
configuration that we decide to use in enable link. Otherwise if link
capability is changed after validation has completed, we could end up
enabling a link configuration with invalid configuration. This would
lead to over link bandwidth subscription or in the extreme case
causes us to enable HPO link to a DIO stream.
[how]
Add a new struct in pipe ctx called link config. This structure will
contain link configuration to enable a link. It will be populated
during map pool resources after we validate link bandwidth. Remove
the reference of verified link cap during enable link process and
use link config in pipe ctx instead.
Reviewed-by: George Shen <George.Shen@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
First MST sideband message returns AUX_RET_ERROR_HPD_DISCON
on certain intel platform. Aux transaction considered failure
if HPD unexpected pulled low. The actual aux transaction success
in such case, hence do not return error.
[how]
Not returning error when AUX_RET_ERROR_HPD_DISCON detected
on the first sideband message.
v2: squash in additional DMI entries
v3: squash in static fix
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Didn't really know what this buffer was when initially implemented,
but these days we do, so move it somewhere more appropriate.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Doesn't fix any known issue, but noticed fifo being initialised in
logs in response to mmu allocation.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes resume from hibernate failing on (at least) TU102, where cursor
channel init failed due to being performed before the core channel.
Not solid idea why suspend-to-ram worked, but, presumably HW being in
an entirely clean state has something to do with it.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Userspace never ended up using this to be clever about dealing with
channel death, and it won't be, not like this anyway.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Next for v5.20
GPU:
- a619 support
- Fix for unclocked GMU register access
- Devcore dump enhancements
Core:
- client utilization via fdinfo support
- fix fence rollover issue
- gem: Lockdep false-positive warning fix
- gem: Switch to pfn mappings
DPU:
- constification of HW catalog
- support for using encoder as CRC source
- WB support on sc7180
- WB resolution fixes
DP:
- dropped custom bulk clock implementation
- made dp_bridge_mode_valid() return MODE_CLOCK_HIGH where applicable
- fix link retraining on resolution change
MDP5:
- MSM8953 perf data
HDMI:
- YAML'ification of schema
- dropped obsolete GPIO support
- misc cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGtuqswBGPw-kCYzJvckK2RR1XTeUEgaXwVG_mvpbv3gPA@mail.gmail.com
One impact of commit 047a1b877e ("dma-buf & drm/amdgpu: remove
dma_resv workaround") is that it stores many, many more fences. Whereas
adding an exclusive fence used to remove the shared fence list, that
list is now preserved and the write fences included into the list. Not
just a single write fence, but now a write/read fence per context. That
causes us to have to track more fences than before (albeit half of those
are redundant), and we trigger more interrupts for multi-engine
workloads.
As part of reducing the impact from handling more signaling, we observe
we only need to kick the signal worker after adding a fence iff we have
good cause to believe that there is work to be done in processing the
fence i.e. we either need to enable the interrupt or the request is
already complete but we don't know if we saw the interrupt and so need
to check signaling.
References: 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d7b953c7a4ba747c8196a164e2f8c5aef468d048.1657289332.git.karolina.drobnik@intel.com
We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.
On applying commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.
Commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.
Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
stated the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.
Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.
Reported-by: Thomas Voegtle <tv@lio96.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com