amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:8 pasid:32769, for process test_basic pid 3305 thread test_basic pid 3305)
amdgpu: in page starting at address 0x00007ff990003000 from IH client 0x12 (VMC)
amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00840051
amdgpu: Faulty UTCL2 client ID: MP1 (0x0)
amdgpu: MORE_FAULTS: 0x1
amdgpu: WALKER_ERROR: 0x0
amdgpu: PERMISSION_FAULTS: 0x5
amdgpu: MAPPING_ERROR: 0x0
amdgpu: RW: 0x1
When memory is allocated by kfd, no one triggers the tlb flush for MMHUB0.
There is page fault from MMHUB0.
v2:fix indentation
v3:change subject and fix indentation
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Use the correct adev variable for the drm_fb_helper in
amdgpu_device_gpu_recover(). Noticed by inspection.
Fixes: 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Fixes for v5.19-rc5
- Fix to increment vsync_cnt before calling drm_crtc_handle_vblank so that
userspace sees the value *after* it is incremented if waiting for vblank
events
- Fix to reset drm_dev to NULL in dp_display_unbind to avoid a crash in
probe/bind error paths
- Fix to resolve the smatch error of de-referencing before NULL check in
dpu_encoder_phys_wb.c
- Fix error return to userspace if fence-id allocation fails in submit
ioctl
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvswNKdd02EYKYv5Zjv7f+mcqeWC7hHQ1SBjqYzN_ZHnA@mail.gmail.com
Pull xen fixes from Juergen Gross:
- A rare deadlock in Qubes-OS between the i915 driver and Xen grant
unmapping, solved by making the unmapping fully asynchronous
- A bug in the Xen blkfront driver caused by incomplete error handling
- A fix for undefined behavior (shifting a signed int by 31 bits)
- A fix in the Xen drmfront driver avoiding a WARN()
* tag 'for-linus-5.19a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/gntdev: Avoid blocking in unmap_grant_pages()
drm/xen: Add missing VM_DONTEXPAND flag in mmap callback
x86/xen: Remove undefined behavior in setup_features()
xen-blkfront: Handle NULL gendisk
Fixes for v5.19-rc4
- Workaround for parade DSI bridge power sequencing
- Fix for multi-planar YUV format offsets
- Limiting WB modes to max sspp linewidth
- Fixing the supported rotations to add 180 back for IGT
- Fix to handle pm_runtime_get_sync() errors to avoid unclocked access
in the bind() path for dpu driver
- Fix the irq_free() without request issue which was a being hit frequently
in CI.
- Fix to add minimum ICC vote in the msm_mdss pm_resume path to address
bootup splats
- Fix to avoid dereferencing without checking in WB encoder
- Fix to avoid crash during suspend in DP driver by ensuring interrupt
mask bits are updated
- Remove unused code from dpu_encoder_virt_atomic_check()
- Fix to remove redundant init of dsc variable
- Fix to ensure mmap offset is initialized to avoid memory corruption
from unpin/evict
- Fix double runpm disable in probe-defer path
- VMA fenced-unpin fixes
- Fix for WB max-width
- Fix for rare dp resolution change issue
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvdsOF1-+WfTWyEyu33XPcvxOCU00G-dz7EF2J+fdyUHg@mail.gmail.com
The 'vsync_cnt' is used to count the number of frames for a crtc.
Unfortunately, we increment the count after waking up userspace via
dpu_crtc_vblank_callback() calling drm_crtc_handle_vblank().
drm_crtc_handle_vblank() wakes up userspace processes that have called
drm_wait_vblank_ioctl(), and if that ioctl is expecting the count to
increase it won't.
Increment the count before calling into the drm APIs so that we don't
have to worry about ordering the increment with anything else in drm.
This fixes a software video decode test that fails to see frame counts
increase on Trogdor boards.
Cc: Mark Yacoub <markyacoub@chromium.org>
Cc: Jessica Zhang <quic_jesszhan@quicinc.com>
Fixes: 885455d6bf ("drm/msm: Change dpu_crtc_get_vblank_counter to use vsync count.")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Tested-by: Jessica Zhang <quic_jesszhan@quicinc.com> # Trogdor (sc7180)
Patchwork: https://patchwork.freedesktop.org/patch/490531/
Link: https://lore.kernel.org/r/20220622023855.2970913-1-swboyd@chromium.org
Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
For DCN20 and above, the code that actually hooks up the provided
input_color_space got lost at some point.
Fixes COLOR_ENCODING and COLOR_RANGE doing nothing on DCN20+.
Tested using Steam Remote Play Together + gamescope.
Update other DCNs the same wasy DCN1.x was updates in
commit a1e07ba89d ("drm/amd/display: Use plane->color_space for dpp if specified")
Fixes: a1e07ba89d ("drm/amd/display: Use plane->color_space for dpp if specified")
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
A variety of Lenovo machines with Rembrandt APUs and OLED panels have
stopped showing the display at login. This behavior clears up after
leaving it idle and moving the mouse or touching keyboard.
It was bisected to be caused by commit 559e265522 ("drm/amd/display:
keep eDP Vdd on when eDP stream is already enabled"). Revert this commit
to fix the issue.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2047
Reported-by: Aaron Ma <aaron.ma@canonical.com>
Fixes: 559e265522 ("drm/amd/display: keep eDP Vdd on when eDP stream is already enabled")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mark Pearson <markpearson@lenovo.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain GL unit tests for large textures can cause problems
with the OOM killer since there is no way to link this memory
to a process. This was originally mitigated (but not necessarily
eliminated) by limiting the GTT size. The problem is this limit
is often too low for many modern games so just make the limit 1/2
of system memory. The OOM accounting needs to be addressed, but
we shouldn't prevent common 3D applications from being usable
just to potentially mitigate that corner case.
Set default GTT size to max(3G, 1/2 of system ram) by default.
v2: drop previous logic and default to 3/4 of ram
v3: default to half of ram to align with ttm
v4: fix spelling in comment (Kent)
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1942
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The device is identified by "NEXT" in board name, however there are
different versions of it, "Next Advance" and "Next Pro", that have
different DMI board names.
Due to a production error a batch or two have their board names prefixed
by "AYANEO", this makes it 6 different DMI board names. To save some
space in final kernel image DMI_MATCH is used instead of
DMI_EXACT_MATCH.
Signed-off-by: Maya Matuszczyk <maccraft123mc@gmail.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220619111952.8487-1-maccraft123mc@gmail.com
Display resolution change is implemented through drm modeset. Older
modeset (resolution) has to be disabled first before newer modeset
(resolution) can be enabled. Display disable will turn off both
pixel clock and main link clock so that main link have to be
re-trained during display enable to have new video stream flow
again. At current implementation, display enable function manually
kicks up irq_hpd_handle which will read panel link status and start
link training if link status is not in sync state.
However, there is rare case that a particular panel links status keep
staying in sync for some period of time after main link had been shut
down previously at display disabled. In this case, main link retraining
will not be executed by irq_hdp_handle(). Hence video stream of newer
display resolution will fail to be transmitted to panel due to main
link is not in sync between host and panel.
This patch will bypass irq_hpd_handle() in favor of directly call
dp_ctrl_on_stream() to always perform link training in regardless of
main link status. So that no unexpected exception resolution change
failure cases will happen. Also this implementation are more efficient
than manual kicking off irq_hpd_handle function.
Changes in v2:
-- set force_link_train flag on DP only (is_edp == false)
Changes in v3:
-- revise commit text
-- add Fixes tag
Changes in v4:
-- revise commit text
Changes in v5:
-- fix spelling at commit text
Changes in v6:
-- split dp_ctrl_on_stream() for phy test case
-- revise commit text for modeset
Changes in v7:
-- drop 0 assignment at local variable (ret = 0)
Changes in v8:
-- add patch to remove pixel_rate from dp_ctrl
Changes in v9:
-- forward declare dp_ctrl_on_stream_phy_test_report()
Fixes: 62671d2ef2 ("drm/msm/dp: fixes wrong connection state caused by failure of link train")
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/489895/
Link: https://lore.kernel.org/r/1655411200-7255-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
As explained in [1], using max_linewidth to limit the modes
does not seem to remove 4K modes on chipsets such as
sm8250 where the max_linewidth actually supports 4k.
This would have been alright if dual SSPP support was
present but otherwise fails the per SSPP bandwidth check.
The ideal way to implement this would be to filter out
the modes which will exceed the bandwidth check by computing
it.
But this would be an exhaustive solution till we have
dual SSPP support.
Let's instead use max_mixer_width to limit the modes.
max_mixer_width still remains 2560 on sm8250 so even if
the max_linewidth is 4096, the only way 4k modes could have
been supported is to have source split enabled on the SSPP.
Since source split support is not enabled yet in DPU driver,
enforce max_mixer_width as the upper limit on the modes.
[1] https://patchwork.freedesktop.org/patch/489662/
Fixes: e67dcecda0 ("drm/msm/dpu: limit writeback modes according to max_linewidth")
Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/489893/
Link: https://lore.kernel.org/r/1655407606-21760-1-git-send-email-quic_abhinavk@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
During msm initialize phase, dp_display_unbind() will be called to undo
initializations had been done by dp_display_bind() previously if there is
error happen at msm_drm_bind. In this case, core_initialized flag had to
be check to make sure clocks is on before update DP controller register
to disable HPD interrupts. Otherwise system will crash due to below NOC
fatal error.
QTISECLIB [01f01a7ad]CNOC2 ERROR: ERRLOG0_LOW = 0x00061007
QTISECLIB [01f01a7ad]GEM_NOC ERROR: ERRLOG0_LOW = 0x00001007
QTISECLIB [01f0371a0]CNOC2 ERROR: ERRLOG0_HIGH = 0x00000003
QTISECLIB [01f055297]GEM_NOC ERROR: ERRLOG0_HIGH = 0x00000003
QTISECLIB [01f072beb]CNOC2 ERROR: ERRLOG1_LOW = 0x00000024
QTISECLIB [01f0914b8]GEM_NOC ERROR: ERRLOG1_LOW = 0x00000042
QTISECLIB [01f0ae639]CNOC2 ERROR: ERRLOG1_HIGH = 0x00004002
QTISECLIB [01f0cc73f]GEM_NOC ERROR: ERRLOG1_HIGH = 0x00004002
QTISECLIB [01f0ea092]CNOC2 ERROR: ERRLOG2_LOW = 0x0009020c
QTISECLIB [01f10895f]GEM_NOC ERROR: ERRLOG2_LOW = 0x0ae9020c
QTISECLIB [01f125ae1]CNOC2 ERROR: ERRLOG2_HIGH = 0x00000000
QTISECLIB [01f143be7]GEM_NOC ERROR: ERRLOG2_HIGH = 0x00000000
QTISECLIB [01f16153a]CNOC2 ERROR: ERRLOG3_LOW = 0x00000000
QTISECLIB [01f17fe07]GEM_NOC ERROR: ERRLOG3_LOW = 0x00000000
QTISECLIB [01f19cf89]CNOC2 ERROR: ERRLOG3_HIGH = 0x00000000
QTISECLIB [01f1bb08e]GEM_NOC ERROR: ERRLOG3_HIGH = 0x00000000
QTISECLIB [01f1d8a31]CNOC2 ERROR: SBM1 FAULTINSTATUS0_LOW = 0x00000002
QTISECLIB [01f1f72a4]GEM_NOC ERROR: SBM0 FAULTINSTATUS0_LOW = 0x00000001
QTISECLIB [01f21a217]CNOC3 ERROR: ERRLOG0_LOW = 0x00000006
QTISECLIB [01f23dfd3]NOC error fatal
changes in v2:
-- drop the first patch (drm/msm: enable msm irq after all initializations are done successfully at msm_drm_init()) since the problem had been fixed by other patch
Fixes: 570d3e5d28 ("drm/msm/dp: stop event kernel thread when DP unbind")
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/488387/
Link: https://lore.kernel.org/r/1654538139-7450-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Prior to the last commit, this could result in setting the GPU
written fence value back to an older value, if we had missed
updating completed_fence prior to suspend. This was mostly
harmless as the GPU would eventually overwrite it again with
the correct value. But we should just not do this. Instead
just leave a sanity check that the fence looks plausible (in
case the GPU scribbled on memory).
Reported-by: Steev Klimaszewski <steev@kali.org>
Fixes: 95d1deb02a ("drm/msm/gem: Add fenced vma unpin")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Patchwork: https://patchwork.freedesktop.org/patch/490138/
Link: https://lore.kernel.org/r/20220618161120.3451993-2-robdclark@gmail.com
I noticed while looking at some traces, that we could miss calls to
msm_update_fence(), as the irq could have raced with retire_submits()
which could have already popped the last submit on a ring out of the
queue of in-flight submits. But walking the list of submits in the
irq handler isn't really needed, as dma_fence_is_signaled() will dtrt.
So lets just drop it entirely.
v2: use spin_lock_irqsave/restore as we are no longer protected by the
spin_lock_irqsave/restore() in update_fences()
Reported-by: Steev Klimaszewski <steev@kali.org>
Fixes: 95d1deb02a ("drm/msm/gem: Add fenced vma unpin")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Patchwork: https://patchwork.freedesktop.org/patch/490136/
Link: https://lore.kernel.org/r/20220618161120.3451993-1-robdclark@gmail.com
When doing an asynchronous page flip (PAGE_FLIP ioctl with the
DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the
possible GPU buffer being rendered through a call to
vc4_queue_seqno_cb().
On the BCM2835-37, the GPU driver is part of the vc4 driver and that
function is defined in vc4_gem.c to wait for the buffer to be rendered,
and once it's done, call a callback.
However, on the BCM2711 used on the RaspberryPi4, the GPU driver is
separate (v3d) and that function won't do anything. This was working
because we were going into a path, due to uninitialized variables, that
was always scheduling the callback.
However, we were never actually waiting for the buffer to be rendered
which was resulting in frames being displayed out of order.
The generic API to signal those kind of completion in the kernel are the
DMA fences, and fortunately the v3d drivers supports them and signal
when its job is done. That API also provides an equivalent function that
allows to have a callback being executed when the fence is signalled as
done.
Let's change our driver a bit to rely on the previous function for the
older SoCs, and on DMA fences for the BCM2711.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Link: https://lore.kernel.org/r/20220610115149.964394-14-maxime@cerno.tech