With asynchronous migrations, the vma state may be several migrations
ahead of the state that matches the request we're capturing.
Address that by introducing an i915_vma_snapshot structure that
can be used to snapshot relevant state at request submission.
In order to make sure we access the correct memory, the snapshots take
references on relevant sg-tables and memory regions.
Also move the capture list allocation out of the fence signaling
critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to
avoid compiling in members and functions used for error capture
when they're not used.
Finally, Introduce lockdep annotation.
v4:
- Break out the capture allocation mode change to a separate patch.
v5:
- Fix compilation error in the !CONFIG_DRM_I915_CAPTURE_ERROR case
(kernel test robot)
v6:
- Use #if IS_ENABLED() instead of #ifdef to match driver style.
- Move yet another change of allocation mode to the separate patch.
- Commit message rework due to patch reordering.
v7:
- Adjust for removal of region refcounting.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211129202245.472043-1-thomas.hellstrom@linux.intel.com
With GuC handling scheduling, i915 is not aware of the time that a
context is scheduled in and out of the engine. Since i915 pmu relies on
this info to provide engine busyness to the user, GuC shares this info
with i915 for all engines using shared memory. For each engine, this
info contains:
- total busyness: total time that the context was running (total)
- id: id of the running context (id)
- start timestamp: timestamp when the context started running (start)
At the time (now) of sampling the engine busyness, if the id is valid
(!= ~0), and start is non-zero, then the context is considered to be
active and the engine busyness is calculated using the below equation
engine busyness = total + (now - start)
All times are obtained from the gt clock base. For inactive contexts,
engine busyness is just equal to the total.
The start and total values provided by GuC are 32 bits and wrap around
in a few minutes. Since perf pmu provides busyness as 64 bit
monotonically increasing values, there is a need for this implementation
to account for overflows and extend the time to 64 bits before returning
busyness to the user. In order to do that, a worker runs periodically at
frequency = 1/8th the time it takes for the timestamp to wrap. As an
example, that would be once in 27 seconds for a gt clock frequency of
19.2 MHz.
Note:
There might be an over-accounting of busyness due to the fact that GuC
may be updating the total and start values while kmd is reading them.
(i.e kmd may read the updated total and the stale start). In such a
case, user may see higher busyness value followed by smaller ones which
would eventually catch up to the higher value.
v2: (Tvrtko)
- Include details in commit message
- Move intel engine busyness function into execlist code
- Use union inside engine->stats
- Use natural type for ping delay jiffies
- Drop active_work condition checks
- Use for_each_engine if iterating all engines
- Drop seq locking, use spinlock at GuC level to update engine stats
- Document worker specific details
v3: (Tvrtko/Umesh)
- Demarcate GuC and execlist stat objects with comments
- Document known over-accounting issue in commit
- Provide a consistent view of GuC state
- Add hooks to gt park/unpark for GuC busyness
- Stop/start worker in gt park/unpark path
- Drop inline
- Move spinlock and worker inits to GuC initialization
- Drop helpers that are called only once
v4: (Tvrtko/Matt/Umesh)
- Drop addressed opens from commit message
- Get runtime pm in ping, remove from the park path
- Use cancel_delayed_work_sync in disable_submission path
- Update stats during reset prepare
- Skip ping if reset in progress
- Explicitly name execlists and GuC stats objects
- Since disable_submission is called from many places, move resetting
stats to intel_guc_submission_reset_prepare
v5: (Tvrtko)
- Add a trylock helper that does not sleep and synchronize PMU event
callbacks and worker with gt reset
v6: (CI BAT failures)
- DUTs using execlist submission failed to boot since __gt_unpark is
called during i915 load. This ends up calling the GuC busyness unpark
hook and results in kick-starting an uninitialized worker. Let
park/unpark hooks check if GuC submission has been initialized.
- drop cant_sleep() from trylock helper since rcu_read_lock takes care
of that.
v7: (CI) Fix igt@i915_selftest@live@gt_engines
- For GuC mode of submission the engine busyness is derived from gt time
domain. Use gt time elapsed as reference in the selftest.
- Increase busyness calculation to 10ms duration to ensure batch runs
longer and falls within the busyness tolerances in selftest.
v8:
- Use ktime_get in selftest as before
- intel_reset_trylock_no_wait results in a lockdep splat that is not
trivial to fix since the PMU callback runs in irq context and the
reset paths are tightly knit into the driver. The test that uncovers
this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait,
instead use the reset_count to synchronize with gt reset during pmu
callback. For the ping, continue to use intel_reset_trylock since ping
is not run in irq context.
- GuC PM timestamp does not tick when GuC is idle. This can potentially
result in wrong busyness values when a context is active on the
engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to
process the GuC busyness stats. This works since both GuC timestamp and
RING timestamp are synced with the same clock.
- The busyness stats may get updated after the batch starts running.
This delay causes the busyness reported for 100us duration to fall
below 95% in the selftest. The only option at this time is to wait for
GuC busyness to change from idle to active before we sample busyness
over a 100us period.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com
Pinned contexts, like the migrate contexts need reset after resume
since their context image may have been lost. Also the GuC needs to
register pinned contexts.
Add a list to struct intel_engine_cs where we add all pinned contexts on
creation, and traverse that list at resume time to reset the pinned
contexts.
This fixes the kms_pipe_crc_basic@suspend-read-crc-pipe-a selftest for now,
but proper LMEM backup / restore is needed for full suspend functionality.
However, note that even with full LMEM backup / restore it may be
desirable to keep the reset since backing up the migrate context images
must happen using memcpy() after the migrate context has become inactive,
and for performance- and other reasons we want to avoid memcpy() from
LMEM.
Also traverse the list at guc_init_lrc_mapping() calling
guc_kernel_context_pin() for the pinned contexts, like is already done
for the kernel context.
v2:
- Don't reset the contexts on each __engine_unpark() but rather at
resume time (Chris Wilson).
v3:
- Reset contexts in the engine sanitize callback. (Chris Wilson)
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Brost Matthew <matthew.brost@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-6-thomas.hellstrom@linux.intel.com
Xe_HP adds some new bits to the FUSE1 register to let us know whether a
given SFC unit is present. We should take this into account while
initializing SFC availability to our VCS and VECS engines.
While we're at it, update the FUSE1 register definition to use
REG_GENMASK / REG_FIELD_GET notation.
Note that, the bspec confusingly names the fuse bits "disable" despite
the register reflecting the *enable* status of the SFC units. The
original architecture documents which the bspec is based on do properly
name this field "SFC_ENABLE."
Bspec: 52543
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210917161203.812251-2-matthew.d.roper@intel.com
We no longer have traditional slices on Xe_HP platforms, but the
INSTDONE registers are replicated according to gslice representation
which is similar. We can mostly re-use the existing instdone code with
just a few modifications:
* Create an alternate instdone loop macro that will iterate over the
flat DSS space, but still provide the gslice/dss steering values for
compatibility with the legacy code.
* We should allocate INSTDONE storage space according to the maximum
number of gslices rather than the maximum number of legacy slices to
ensure we have enough storage space to hold all of the values. XeHP
design has 8 gslices, whereas older platforms never had more than 3
slices.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210805163647.801064-3-matthew.d.roper@intel.com
In the case of a full GPU reset (e.g. because GuC has died or because
GuC's hang detection has been disabled), the driver can't rely on GuC
reporting the guilty context. Instead, the driver needs to scan all
active contexts and find one that is currently executing, as per the
execlist mode behaviour. In GuC mode, this scan is different to
execlist mode as the active request list is handled very differently.
Similarly, the request state dump in debugfs needs to be handled
differently when in GuC submission mode.
Also refactured some of the request scanning code to avoid duplication
across the multiple code paths that are now replicating it.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-20-matthew.brost@intel.com
We receive notification of an engine reset from GuC at its
completion. Meaning GuC has potentially cleared any HW state
we may have been interested in capturing. GuC resumes scheduling
on the engine post-reset, as the resets are meant to be transparent,
further muddling our error state.
There is ongoing work to define an API for a GuC debug state dump. The
suggestion for now is to manually disable FW initiated resets in cases
where debug state is needed.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-19-matthew.brost@intel.com
This is only used by GRAPHICS_VER == 6 and GRAPHICS_VER == 7. All other
recent platforms do not depend on this field, so it doesn't make much
sense to keep it generic like that. Instead, just do a mapping from
engine class to HW ID in the single place that is needed.
v2: use macros with the direct register address instead of calculating
from the legacy HW_ID (Matt Roper)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210723002551.3906535-1-lucas.demarchi@intel.com
The engine hw_id is only used by RING_FAULT_REG(), which is not used
by GRAPHICS_VER >= 8. We did use hw_id on recent platforms to set
the engine's guc_id, but that is not the case anymore since
commit c784e5249e ("drm/i915/guc: Update to use firmware v49.0.1"):
now we only use class and id information to generate guc_id.
We tend to keep adding new defines just to be consistent, but let's try
to remove them and let them defined to 0 for engines that only exist on
gen8+ platforms.
v2: Reword commit message and add information about when we stopped
using hw_id (Matt Roper)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210720232014.3302645-3-lucas.demarchi@intel.com
Previously, we were storing the ring size in the ring pointer before it
was actually allocated. We would then guard setting the ring size on
checking for CONTEXT_ALLOC_BIT. This is error-prone at best and really
only saves us a few bytes on something that already burns at least 4K.
Instead, this patch adds a new ring_size field and makes everything use
that.
v2 (Daniel Vetter):
- Replace 512 * SZ_4K with SZ_2M
v2 (Jason Ekstrand):
- Rebase on top of page migration code
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-3-jason@jlekstrand.net
Introduce i915_sched_engine object which is lower level data structure
that i915_scheduler / generic code can operate on without touching
execlist specific structures. This allows additional submission backends
to be added without breaking the layering. Currently the execlists
backend uses 1 of these object per each engine (physical or virtual) but
future backends like the GuC will point to less instances utilizing the
reference counting.
This is a bit of detour to integrating the i915 with the DRM scheduler
but this object will still exist when the DRM scheduler lands in the
i915. It will however look a bit different. It will encapsulate the
drm_gpu_scheduler object plus and common variables (to the backends)
related to scheduling. Regardless this is a step in the right direction.
This patch starts the aforementioned transition by moving the priolist
into the i915_sched_engine object.
v3:
(Jason Ekstrand)
Update comment next to intel_engine_cs.virtual
Add kernel doc
(Checkpatch)
Fix double the in commit message
v4:
(Daniele)
Update comment message.
Add comment about subclass field
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-2-matthew.brost@intel.com
This was done by the following semantic patch:
@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)
@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E
@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E
@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E
@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)
@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)
@@
identifier def.id;
@@
- id
+ ver
It also takes care of renaming the variable we assign to GRAPHICS_VER()
so to use "ver" rather than "gen".
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210605155356.4183026-2-lucas.demarchi@intel.com
Driver Changes:
- Prepare for local/device memory support on DG1 by starting
to use it for kernel internal allocations: context, ring
and engine scratch (Matt A, CQ, Abdiel, Imre)
- Sandybridge fix to avoid hard hang on ring resume (Chris)
- Limit imported dma-buf size to int32 (Matt A)
- Double check heartbeat timeout before resetting (Chris)
- Use new tasklet API for execution list (Emil)
- Fix SPDX checkpats warnings (Chris)
- Fixes for various checkpatch warnings (Chris)
- Selftest improvements (Chris)
- Move the defer_request waiter active assertion to correct spot (Chris)
- Make local-memory probing a GT operation (Matt, Tvrtko)
- Protect against request freeing during cancellation on wedging (Chris)
- Retire unexpected starting state error dumping (Chris)
- Distinction of memory regions in debugging (Zbigniew)
- Always flush the submission queue on checking for idle (Chris)
- Consolidate 2big error check to helper (Matt)
- Decrease number of subplatform bits (Tvrtko)
- Remove unused internal request priority levels (Chris)
- Document the unused internal header bits in buddy allocator (Matt)
- Cleanup the region class/instance encoding (Matt)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YGxksaZGXHnFxlwg@jlahtine-mobl.ger.corp.intel.com
Instead of sharing pages with breadcrumbs, give each timeline a
single page. This allows unrelated timelines not to share locks
any more during command submission.
As an additional benefit, seqno wraparound no longer requires
i915_vma_pin, which means we no longer need to worry about a
potential -EDEADLK at a point where we are ready to submit.
Changes since v1:
- Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle).
- Extra check for completion in intel_read_hwsp().
Changes since v2:
- Fix inconsistent indent in hwsp_alloc() (kbuild)
- memset entire cacheline to 0.
Changes since v3:
- Do same in intel_timeline_reset_seqno(), and clflush for good measure.
Changes since v4:
- Use refcounting on timeline, instead of relying on i915_active.
- Fix waiting on kernel requests.
Changes since v5:
- Bump amount of slots to maximum (256), for best wraparounds.
- Add hwsp_offset to i915_request to fix potential wraparound hang.
- Ensure timeline wrap test works with the changes.
- Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to
fix a hang.
Changes since v6:
- Rename i915_request_active_offset to i915_request_active_seqno(),
and elaborate the function. (tvrtko)
Changes since v7:
- Move hunk to where it belongs. (jekstrand)
- Replace CACHELINE_BYTES with TIMELINE_SEQNO_BYTES. (jekstrand)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-2-maarten.lankhorst@linux.intel.com
Highlights:
- Alderlake S enabling, via topic branch (Aditya, Anusha, Caz, José, Lucas, Matt, Tejas)
- Refactor display code to shrink intel_display.c etc. (Dave)
- Support more gen 9 and Tigerlake PCH combinations (Lyude, Tejas)
- Add eDP MSO support (Jani)
Display:
- Refactor to support multiple PSR instances (Gwan-gyeong)
- Link training debug logging updates (Sean)
- Updates to eDP fixed mode handling (Jani)
- Disable PSR2 on JSL/EHL (Edmund)
- Support DDR5 and LPDDR5 for bandwidth computation (Clint, José)
- Update VBT DP max link rate table (Shawn)
- Disable the QSES check for HDCP2.2 over MST (Juston)
- PSR updates, refactoring, selective fetch (José, Gwan-gyeong)
- Display init sequence refactoring (Lucas)
- Limit LSPCON to gen 9 and 10 platforms (Ankit)
- Fix DDI lane polarity per VBT info (Uma)
- Fix HDMI vswing programming location in mode set (Ville)
- Various display improvements and refactorings and cleanups (Ville)
- Clean up DDI clock routing and readout (Ville)
- Workaround async flip + VT-d corruption on HSW/BDW (Ville)
- SAGV watermark fixes and cleanups (Ville)
- Silence pipe tracepoint WARNs (Ville)
Other:
- Remove require_force_probe protection from RKL, may need to be revisited (Tejas)
- Detect loss of MMIO access (Matt)
- GVT display improvements
- drm/i915: Disable runtime power management during shutdown (Imre)
- Perf/OA updates (Umesh)
- Remove references to struct drm_device.pdev, via topic branch (Thomas)
- Backmerge (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87v99rnk1g.fsf@intel.com
Commit 311a50e76a ("drm/i915: Add support for mandatory cmdparsing")
introduced mandatory command parsing but setup failures were not
translated into wedging the GPU which was probably the intent.
Possible errors come in two categories. Either the sanity check on
internal tables has failed, which should be caught in CI unless an
affected platform would be missed in testing; or memory allocation failure
happened during driver load, which should be extremely unlikely but for
correctness should still be handled.
v2:
* Tidy coding style. (Chris)
[airlied: cherry-picked to avoid rc1 base]
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 311a50e76a ("drm/i915: Add support for mandatory cmdparsing")
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210302114213.1102223-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 5a1a659762d35a6dc51047c9127c011303c77b7f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>