Amlogic SoC devices report the following errors frequently causing excessive
dmesg log spam and early log rotataion, although the errors appear to be
harmless as everything works fine:
[ 7.202702] panfrost ffe40000.gpu: error powering up gpu L2
[ 7.203760] panfrost ffe40000.gpu: error powering up gpu shader
ARM staff have advised increasing the timeout values to eliminate the errors
in most normal scenarios, and testing with several different G31/G52 devices
shows 20000 to be a reliable value.
Fixes: f3ba91228e ("drm/panfrost: Add initial panfrost driver")
Suggested-by: Steven Price <steven.price@arm.com>
Signed-off-by: Christian Hewitt <christianshewitt@gmail.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201008141738.13560-1-christianshewitt@gmail.com
These settings are used by an ASPEED BMC to determine when the host is
trying to drive the display over PCIe (vga_pw) and to switch the
output between PCIe and the internal graphics device (dac_mux).
The valid values for the dac mux are:
00: VGA mode (default, aka PCIe)
01: Graphics CRT (aka BMC internal graphics, this driver)
10: Pass through mode from video input port A
11: Pass through mode from video input port B
Values for the read-only vga password register are:
1: Host driving the display
0: Host not driving the display
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://patchwork.freedesktop.org/patch/msgid/20200916083413.777307-1-joel@jms.id.au
If more than two jobs end up timeout-ing concurrently, only one of them
(the one attached to the scheduler acquiring the lock) is fully handled.
The other one remains in a dangling state where it's no longer part of
the scheduling queue, but still blocks something in scheduler, leading
to repetitive timeouts when new jobs are queued.
Let's make sure all bad jobs are properly handled by the thread
acquiring the lock.
v3:
- Add Steven's R-b
- Don't take the sched_lock when stopping the schedulers
v2:
- Fix the subject prefix
- Stop the scheduler before returning from panfrost_job_timedout()
- Call cancel_delayed_work_sync() after drm_sched_stop() to make sure
no timeout handlers are in flight when we reset the GPU (Steven Price)
- Make sure we release the reset lock before restarting the
schedulers (Steven Price)
Fixes: f3ba91228e ("drm/panfrost: Add initial panfrost driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002122506.1374183-1-boris.brezillon@collabora.com
When doing an atomic modeset with ALLOW_MODESET drivers are allowed to
pull in arbitrary other resources, including CRTCs (e.g. when
reconfiguring global resources).
But in nonblocking mode userspace has then no idea this happened,
which can lead to spurious EBUSY calls, both:
- when that other CRTC is currently busy doing a page_flip the
ALLOW_MODESET commit can fail with an EBUSY
- on the other CRTC a normal atomic flip can fail with EBUSY because
of the additional commit inserted by the kernel without userspace's
knowledge
For blocking commits this isn't a problem, because everyone else will
just block until all the CRTC are reconfigured. Only thing userspace
can notice is the dropped frames without any reason for why frames got
dropped.
Consensus is that we need new uapi to handle this properly, but no one
has any idea what exactly the new uapi should look like. Since this
has been shipping for years already compositors need to deal no matter
what, so as a first step just try to enforce this across drivers
better with some checks.
v2: Add comments and a WARN_ON to enforce this only when allowed - we
don't want to silently convert page flips into blocking plane updates
just because the driver is buggy.
v3: Fix inverted WARN_ON (Pekka).
v4: Drop the uapi changes, only add a WARN_ON for now to enforce some
rules for drivers.
v5: Make the WARNING more informative (Daniel)
v6: Add unconditional debug output for compositor hackers to figure
out what's going on when they get an EBUSY (Daniel)
v7: Fix up old/new_crtc_state confusion for real (Pekka/Ville)
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
References: https://lists.freedesktop.org/archives/dri-devel/2018-July/182281.html
Bugzilla: https://gitlab.freedesktop.org/wayland/weston/issues/24#note_9568
Cc: Daniel Stone <daniel@fooishbar.org>
Cc: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Cc: Simon Ser <contact@emersion.fr>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200925084651.3250104-1-daniel.vetter@ffwll.ch
We want to use the dev_* functions here rather than the pr_* variants.
Switch to using dev_warn() which mirrors what we do on other asics.
Fixes the following build errors on ARC:
../drivers/gpu/drm/amd/amdgpu/../powerplay/navi10_ppt.c: In function 'navi10_fill_i2c_req':
../arch/arc/include/asm/bug.h:24:2: error: implicit declaration of function 'pr_warn'; did you mean 'drm_warn'? [-Werror=implicit-function-declaration]
../drivers/gpu/drm/amd/amdgpu/../powerplay/sienna_cichlid_ppt.c: In function 'sienna_cichlid_fill_i2c_req':
../arch/arc/include/asm/bug.h:24:2: error: implicit declaration of function 'pr_warn'; did you mean 'drm_warn'? [-Werror=implicit-function-declaration]
Reported-by: kernel test robot <lkp@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Commit c1cf79ca5c ("drm/amdgpu: use IP discovery table for renoir")
introduced a NULL pointer dereference when booting with
amdgpu.discovery=0, because it removed the call of vega10_reg_base_init()
for that case.
Fix this by calling that funcion if amdgpu_discovery == 0 in addition to
the case that amdgpu_discovery_reg_base_init() failed.
Fixes: c1cf79ca5c ("drm/amdgpu: use IP discovery table for renoir")
Signed-off-by: Dirk Gouders <dirk@gouders.net>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable Green_Sardine VCN support and VCN firmware loading
v2: use apu flags
Signed-off-by: Thong Thai <thong.thai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The default auto setting for kcq should not generate
a warning.
Fixes: a300de40f6 ("drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)")
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We want to use the dev_* functions here rather than the pr_* variants.
Switch to using dev_warn() which mirrors what we do on other asics.
Fixes the following build errors on ARC:
../drivers/gpu/drm/amd/amdgpu/../powerplay/navi10_ppt.c: In function 'navi10_fill_i2c_req':
../arch/arc/include/asm/bug.h:24:2: error: implicit declaration of function 'pr_warn'; did you mean 'drm_warn'? [-Werror=implicit-function-declaration]
../drivers/gpu/drm/amd/amdgpu/../powerplay/sienna_cichlid_ppt.c: In function 'sienna_cichlid_fill_i2c_req':
../arch/arc/include/asm/bug.h:24:2: error: implicit declaration of function 'pr_warn'; did you mean 'drm_warn'? [-Werror=implicit-function-declaration]
Reported-by: kernel test robot <lkp@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add xgmi perfmons for Arcturus.
v4: Align with patch 2 with struct improvements
v3: Align with patch 2 streamlining perf types versus event config types.
v2: Resend for re-review with alignment for v3 in patch 2.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Chris Freehill <chris.freehill@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Non-outbound data metrics are non useful so mark them as legacy.
Bucket new perf counters into device and not device ip.
Bind events to chip instead of IP.
Report available event counters and not number of hw counter banks.
Move DF public macros to private since not needed outside of IP version.
v5: cleanup by moving per chip configs into structs
v4: After more discussion, replace *_LEGACY references with IP references
to indicate concept of pmu-typed versus event-config-typed event
registration.
v3: attr groups const array is global but attr groups are allocated per
device which doesn't work and causes problems on memory allocation and
de-allocation for pmu unregister. Switch to building const attr groups
per pmu instead to simplify solution.
v2: add comments on sysfs structure and formatting.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mapping hw counters per event config will cause ABA problems so map per
event instead.
v2: Discontinue starting perf counters if add fails. Make it clear what's
happening with pmc_start.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since we track the idle_pulse for flushing the barriers and avoid
re-emitting the pulse upon idling if no futher action is required, this
also impacts the heartbeat. Before emitting a fresh heartbeat, we look
at the engine idle status and assume that if the pulse was the last
request emitted along the heartbeat, the engine is idling and a
heartbeat pulse not required. This assumption fails, but we can reuse
the idle pulse as the heartbeat if we are yet to emit one, and so track
the status of that pulse for our engine health check.
This impacts tgl/rcs0 as we rely on the heartbeat for our healthcheck for
the normal preemption detection mechanism is disabled by default.
Testcase: igt/gem_exec_schedule/preempt-hang/rcs0 #tgl
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006094653.7558-1-chris@chris-wilson.co.uk
As the previous patch fixed the places where we walk the whole scatterlist
for DMA addresses, this patch fixes the random lookup functionality.
To achieve this we have to add a second lookup iterator and add a
i915_gem_object_get_sg_dma helper, to be used analoguous to existing
i915_gem_object_get_sg_dma. Therefore two lookup caches are maintained per
object and they are flushed at the same point for simplicity. (Strictly
speaking the DMA cache should be flushed from i915_gem_gtt_finish_pages,
but today this conincides with unsetting of the pages in general.)
Partial VMA view is then fixed to use the new DMA lookup and properly
query sg length.
v2:
* Checkpatch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-2-tvrtko.ursulin@linux.intel.com
Currently we do a final scrub of the HW state upon release. However,
when rebinding the device, this is too late as the device may either
have been partially rebound or the device is no longer accessible. If
the device has been removed before release, the reset goes astray
leaving the device in an inconsistent state, unlikely to work without a
full PCI reset. Furthermore, if the device is partially rebound before
the HW scrubbing, there may be leftover HW state that should have been
scrubbed. Either way, we need to push the scrubbing earlier before the
removal, so into unregister. The danger is that on older machines,
resetting the GPU also impact the display engine and so the reset should
be after modesetting is disabled (and before reuse we need to recover
modesetting).
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2508
Testcase: igt/core_hotunplug
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200929112639.24223-1-chris@chris-wilson.co.uk