// SPDX-License-Identifier: MIT
/*
 * Copyright © 2014 Intel Corporation
 */

#include <linux/circ_buf.h>

#include "gem/i915_gem_context.h"

#include "gt/gen8_engine_cs.h"
#include "gt/intel_breadcrumbs.h"
#include "gt/intel_context.h"
#include "gt/intel_engine_heartbeat.h"
#include "gt/intel_engine_pm.h"
#include "gt/intel_engine_regs.h"
#include "gt/intel_gpu_commands.h"
#include "gt/intel_gt.h"
#include "gt/intel_gt_clock_utils.h"
#include "gt/intel_gt_irq.h"
#include "gt/intel_gt_pm.h"
#include "gt/intel_gt_regs.h"
#include "gt/intel_gt_requests.h"
#include "gt/intel_lrc.h"
#include "gt/intel_lrc_reg.h"
#include "gt/intel_mocs.h"
#include "gt/intel_ring.h"

#include "intel_guc_ads.h"
#include "intel_guc_capture.h"
#include "intel_guc_submission.h"

#include "i915_drv.h"
#include "i915_trace.h"

/**
 * DOC: GuC-based command submission
 *
 * The Scratch registers:
 * There are 16 MMIO-based registers starting from 0xC180. The kernel driver
 * writes a value to the action register (SOFT_SCRATCH_0) along with any data.
 * It then triggers an interrupt on the GuC via another register write (0xC4C8).
 * Firmware writes a success/fail code back to the action register after it
 * processes the request. The kernel driver polls waiting for this update and
 * then proceeds.
 *
 * Command Transport buffers (CTBs):
 * Covered in detail in other sections but CTBs (Host to GuC - H2G, GuC to Host
 * - G2H) are a message interface between the i915 and GuC.
 *
 * Context registration:
 * Before a context can be submitted it must be registered with the GuC via a
 * H2G. A unique guc_id is associated with each context. The context is either
 * registered at request creation time (normal operation) or at submission time
 * (abnormal operation, e.g. after a reset).
 *
 * Context submission:
 * The i915 updates the LRC tail value in memory. The i915 must enable the
 * scheduling of the context within the GuC for the GuC to actually consider it.
 * Therefore, the first time a disabled context is submitted we use a schedule
 * enable H2G, while follow up submissions are done via the context submit H2G,
 * which informs the GuC that a previously enabled context has new work
 * available.
 *
 * Context unpin:
 * To unpin a context a H2G is used to disable scheduling. When the
 * corresponding G2H returns indicating the scheduling disable operation has
 * completed it is safe to unpin the context. While a disable is in flight it
 * isn't safe to resubmit the context so a fence is used to stall all future
 * requests of that context until the G2H is returned.
 *
 * Context deregistration:
 * Before a context can be destroyed or if we steal its guc_id we must
 * deregister the context with the GuC via H2G. If stealing the guc_id it isn't
 * safe to submit anything to this guc_id until the deregister completes so a
 * fence is used to stall all requests associated with this guc_id until the
 * corresponding G2H returns indicating the guc_id has been deregistered.
 *
 * submission_state.guc_ids:
 * Unique number associated with private GuC context data passed in during
 * context registration / submission / deregistration. 64k available. Simple ida
 * is used for allocation.
 *
 * Stealing guc_ids:
 * If no guc_ids are available they can be stolen from another context at
 * request creation time if that context is unpinned. If a guc_id can't be found
 * we punt this problem to the user as we believe this is near impossible to hit
 * during normal use cases.
 *
 * Locking:
 * In the GuC submission code we have 3 basic spin locks which protect
 * everything. Details about each below.
 *
 * sched_engine->lock
 * This is the submission lock for all contexts that share an i915 schedule
 * engine (sched_engine), thus only one of the contexts which share a
 * sched_engine can be submitting at a time. Currently only one sched_engine is
 * used for all of GuC submission but that could change in the future.
 *
 * guc->submission_state.lock
 * Global lock for GuC submission state. Protects guc_ids and destroyed contexts
 * list.
 *
 * ce->guc_state.lock
 * Protects everything under ce->guc_state. Ensures that a context is in the
 * correct state before issuing a H2G. e.g. We don't issue a schedule disable
 * on a disabled context (bad idea), we don't issue a schedule enable when a
 * schedule disable is in flight, etc... Also protects list of inflight requests
 * on the context and the priority management state. Lock is individual to each
 * context.
 *
 * Lock ordering rules:
 * sched_engine->lock -> ce->guc_state.lock
 * guc->submission_state.lock -> ce->guc_state.lock
 *
 * Reset races:
 * When a full GT reset is triggered it is assumed that some G2H responses to
 * H2Gs can be lost as the GuC is also reset. Losing these G2H can prove to be
 * fatal as we do certain operations upon receiving a G2H (e.g. destroy
 * contexts, release guc_ids, etc...). When this occurs we can scrub the
 * context state and cleanup appropriately, however this is quite racy.
 * To avoid races, the reset code must disable submission before scrubbing for
 * the missing G2H, while the submission code must check for submission being
 * disabled and skip sending H2Gs and updating context states when it is. Both
 * sides must also make sure to hold the relevant locks.
 */

/* GuC Virtual Engine */
struct guc_virtual_engine {
	struct intel_engine_cs base;
	struct intel_context context;
};

static struct intel_context *
guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
		   unsigned long flags);

static struct intel_context *
guc_create_parallel(struct intel_engine_cs **engines,
		    unsigned int num_siblings,
		    unsigned int width);

#define GUC_REQUEST_SIZE 64 /* bytes */

/*
 * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous
 * per the GuC submission interface. A different allocation algorithm is used
 * (bitmap vs. ida) between multi-lrc and single-lrc hence the reason to
 * partition the guc_id space. We believe the number of multi-lrc contexts in
 * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for
 * multi-lrc.
 */
#define NUMBER_MULTI_LRC_GUC_ID(guc)	\
	((guc)->submission_state.num_guc_ids / 16)
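/*
 * Worked example (assuming the default pool of ~64k guc_ids mentioned in the
 * DOC comment above): NUMBER_MULTI_LRC_GUC_ID() sets aside 64k / 16 = 4k ids
 * for multi-lrc contexts, leaving the remainder for single-lrc allocation.
 */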

/*
 * Below is a set of functions which control the GuC scheduling state of a
 * context; modifications require ce->guc_state.lock to be held.
 */
#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
#define SCHED_STATE_DESTROYED				BIT(1)
#define SCHED_STATE_PENDING_DISABLE			BIT(2)
#define SCHED_STATE_BANNED				BIT(3)
#define SCHED_STATE_ENABLED				BIT(4)
#define SCHED_STATE_PENDING_ENABLE			BIT(5)
#define SCHED_STATE_REGISTERED				BIT(6)
#define SCHED_STATE_POLICY_REQUIRED			BIT(7)
#define SCHED_STATE_BLOCKED_SHIFT			8
#define SCHED_STATE_BLOCKED		BIT(SCHED_STATE_BLOCKED_SHIFT)
#define SCHED_STATE_BLOCKED_MASK	(0xfff << SCHED_STATE_BLOCKED_SHIFT)
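/*
 * Note: bits 8..19 of sched_state (SCHED_STATE_BLOCKED_MASK) are not a single
 * flag but a count of outstanding blocks; incr_context_blocked() and
 * decr_context_blocked() below maintain it by adding or subtracting
 * SCHED_STATE_BLOCKED.
 */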

static inline void init_sched_state(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state &= SCHED_STATE_BLOCKED_MASK;
}

__maybe_unused
static bool sched_state_is_init(struct intel_context *ce)
{
	/* Kernel contexts can have SCHED_STATE_REGISTERED after suspend. */
	return !(ce->guc_state.sched_state &
		 ~(SCHED_STATE_BLOCKED_MASK | SCHED_STATE_REGISTERED));
}

static inline bool
context_wait_for_deregister_to_register(struct intel_context *ce)
{
	return ce->guc_state.sched_state &
		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
}

static inline void
set_context_wait_for_deregister_to_register(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state |=
		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
}

static inline void
clr_context_wait_for_deregister_to_register(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state &=
		~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
}

static inline bool
context_destroyed(struct intel_context *ce)
{
	return ce->guc_state.sched_state & SCHED_STATE_DESTROYED;
}

static inline void
set_context_destroyed(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
}

static inline bool context_pending_disable(struct intel_context *ce)
{
	return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
}

static inline void set_context_pending_disable(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state |= SCHED_STATE_PENDING_DISABLE;
}

static inline void clr_context_pending_disable(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state &= ~SCHED_STATE_PENDING_DISABLE;
}

static inline bool context_banned(struct intel_context *ce)
{
	return ce->guc_state.sched_state & SCHED_STATE_BANNED;
}

static inline void set_context_banned(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state |= SCHED_STATE_BANNED;
}

static inline void clr_context_banned(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state &= ~SCHED_STATE_BANNED;
}

static inline bool context_enabled(struct intel_context *ce)
{
	return ce->guc_state.sched_state & SCHED_STATE_ENABLED;
}

static inline void set_context_enabled(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state |= SCHED_STATE_ENABLED;
}

static inline void clr_context_enabled(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state &= ~SCHED_STATE_ENABLED;
}

static inline bool context_pending_enable(struct intel_context *ce)
{
	return ce->guc_state.sched_state & SCHED_STATE_PENDING_ENABLE;
}

static inline void set_context_pending_enable(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state |= SCHED_STATE_PENDING_ENABLE;
}

static inline void clr_context_pending_enable(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state &= ~SCHED_STATE_PENDING_ENABLE;
}

static inline bool context_registered(struct intel_context *ce)
{
	return ce->guc_state.sched_state & SCHED_STATE_REGISTERED;
}

static inline void set_context_registered(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state |= SCHED_STATE_REGISTERED;
}

static inline void clr_context_registered(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state &= ~SCHED_STATE_REGISTERED;
}

static inline bool context_policy_required(struct intel_context *ce)
{
	return ce->guc_state.sched_state & SCHED_STATE_POLICY_REQUIRED;
}

static inline void set_context_policy_required(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state |= SCHED_STATE_POLICY_REQUIRED;
}

static inline void clr_context_policy_required(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	ce->guc_state.sched_state &= ~SCHED_STATE_POLICY_REQUIRED;
}

static inline u32 context_blocked(struct intel_context *ce)
{
	return (ce->guc_state.sched_state & SCHED_STATE_BLOCKED_MASK) >>
		SCHED_STATE_BLOCKED_SHIFT;
}

static inline void incr_context_blocked(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);

	ce->guc_state.sched_state += SCHED_STATE_BLOCKED;

	GEM_BUG_ON(!context_blocked(ce));	/* Overflow check */
}

static inline void decr_context_blocked(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);

	GEM_BUG_ON(!context_blocked(ce));	/* Underflow check */

	ce->guc_state.sched_state -= SCHED_STATE_BLOCKED;
}

static inline bool context_has_committed_requests(struct intel_context *ce)
{
	return !!ce->guc_state.number_committed_requests;
}

static inline void incr_context_committed_requests(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	++ce->guc_state.number_committed_requests;
	GEM_BUG_ON(ce->guc_state.number_committed_requests < 0);
}

static inline void decr_context_committed_requests(struct intel_context *ce)
{
	lockdep_assert_held(&ce->guc_state.lock);
	--ce->guc_state.number_committed_requests;
	GEM_BUG_ON(ce->guc_state.number_committed_requests < 0);
}

static struct intel_context *
request_to_scheduling_context(struct i915_request *rq)
{
	return intel_context_to_parent(rq->context);
}

static inline bool context_guc_id_invalid(struct intel_context *ce)
{
	return ce->guc_id.id == GUC_INVALID_CONTEXT_ID;
}

static inline void set_context_guc_id_invalid(struct intel_context *ce)
{
	ce->guc_id.id = GUC_INVALID_CONTEXT_ID;
}

static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
{
	return &ce->engine->gt->uc.guc;
}

static inline struct i915_priolist *to_priolist(struct rb_node *rb)
{
	return rb_entry(rb, struct i915_priolist, node);
}

/*
 * When using multi-lrc submission a scratch memory area is reserved in the
 * parent's context state for the process descriptor, work queue, and handshake
 * between the parent + children contexts to insert safe preemption points
 * between each of the BBs. Currently the scratch area is sized to a page.
 *
 * The layout of this scratch area is below:
 * 0						guc_process_desc
 * + sizeof(struct guc_process_desc)		child go
 * + CACHELINE_BYTES				child join[0]
 * ...
 * + CACHELINE_BYTES				child join[n - 1]
 * ...						unused
 * PARENT_SCRATCH_SIZE / 2			work queue start
 * ...						work queue
 * PARENT_SCRATCH_SIZE - 1			work queue end
 */
#define WQ_SIZE			(PARENT_SCRATCH_SIZE / 2)
#define WQ_OFFSET		(PARENT_SCRATCH_SIZE - WQ_SIZE)
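/*
 * Note: since the scratch area is currently one page (see the comment above),
 * a typical 4K page gives a 2K work queue occupying the top half of the page;
 * the descriptors and sync semaphores live in the bottom half.
 */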

struct sync_semaphore {
	u32 semaphore;
	u8 unused[CACHELINE_BYTES - sizeof(u32)];
};

struct parent_scratch {
	union guc_descs {
		struct guc_sched_wq_desc wq_desc;
		struct guc_process_desc_v69 pdesc;
	} descs;

	struct sync_semaphore go;
	struct sync_semaphore join[MAX_ENGINE_INSTANCE + 1];

	u8 unused[WQ_OFFSET - sizeof(union guc_descs) -
		sizeof(struct sync_semaphore) * (MAX_ENGINE_INSTANCE + 2)];

	u32 wq[WQ_SIZE / sizeof(u32)];
};

static u32 __get_parent_scratch_offset(struct intel_context *ce)
{
	GEM_BUG_ON(!ce->parallel.guc.parent_page);

	return ce->parallel.guc.parent_page * PAGE_SIZE;
}

static u32 __get_wq_offset(struct intel_context *ce)
{
	BUILD_BUG_ON(offsetof(struct parent_scratch, wq) != WQ_OFFSET);

	return __get_parent_scratch_offset(ce) + WQ_OFFSET;
}

static struct parent_scratch *
__get_parent_scratch(struct intel_context *ce)
{
	BUILD_BUG_ON(sizeof(struct parent_scratch) != PARENT_SCRATCH_SIZE);
	BUILD_BUG_ON(sizeof(struct sync_semaphore) != CACHELINE_BYTES);

	/*
	 * Need to subtract LRC_STATE_OFFSET here as the
	 * parallel.guc.parent_page is the offset into ce->state while
	 * ce->lrc_reg_state is ce->state + LRC_STATE_OFFSET.
	 */
	return (struct parent_scratch *)
		(ce->lrc_reg_state +
		 ((__get_parent_scratch_offset(ce) -
		   LRC_STATE_OFFSET) / sizeof(u32)));
}

static struct guc_process_desc_v69 *
__get_process_desc_v69(struct intel_context *ce)
{
	struct parent_scratch *ps = __get_parent_scratch(ce);

	return &ps->descs.pdesc;
}

static struct guc_sched_wq_desc *
__get_wq_desc_v70(struct intel_context *ce)
{
	struct parent_scratch *ps = __get_parent_scratch(ce);

	return &ps->descs.wq_desc;
}

static u32 *get_wq_pointer(struct intel_context *ce, u32 wqi_size)
{
	/*
	 * Check for space in work queue. Caching a value of head pointer in
	 * intel_context structure in order to reduce the number of accesses
	 * to shared GPU memory which may be across a PCIe bus.
	 */
#define AVAILABLE_SPACE	\
	CIRC_SPACE(ce->parallel.guc.wqi_tail, ce->parallel.guc.wqi_head, WQ_SIZE)
	if (wqi_size > AVAILABLE_SPACE) {
		ce->parallel.guc.wqi_head = READ_ONCE(*ce->parallel.guc.wq_head);

		if (wqi_size > AVAILABLE_SPACE)
			return NULL;
	}
#undef AVAILABLE_SPACE

	return &__get_parent_scratch(ce)->wq[ce->parallel.guc.wqi_tail / sizeof(u32)];
}

static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
{
	struct intel_context *ce = xa_load(&guc->context_lookup, id);

	GEM_BUG_ON(id >= GUC_MAX_CONTEXT_ID);

	return ce;
}

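/*
 * The _v69 / _v70 suffixes used below indicate which version of the GuC
 * firmware interface a given helper targets; the v69 flow still relies on a
 * global pool of lrc descriptors, which newer interface versions no longer
 * require.
 */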
static struct guc_lrc_desc_v69 *__get_lrc_desc_v69(struct intel_guc *guc, u32 index)
{
	struct guc_lrc_desc_v69 *base = guc->lrc_desc_pool_vaddr_v69;

	if (!base)
		return NULL;

	GEM_BUG_ON(index >= GUC_MAX_CONTEXT_ID);

	return &base[index];
}

static int guc_lrc_desc_pool_create_v69(struct intel_guc *guc)
{
	u32 size;
	int ret;

	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc_v69) *
			  GUC_MAX_CONTEXT_ID);
	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool_v69,
					     (void **)&guc->lrc_desc_pool_vaddr_v69);
	if (ret)
		return ret;

	return 0;
}

static void guc_lrc_desc_pool_destroy_v69(struct intel_guc *guc)
{
	if (!guc->lrc_desc_pool_vaddr_v69)
		return;

	guc->lrc_desc_pool_vaddr_v69 = NULL;
	i915_vma_unpin_and_release(&guc->lrc_desc_pool_v69, I915_VMA_RELEASE_MAP);
}

static inline bool guc_submission_initialized(struct intel_guc *guc)
{
	return guc->submission_initialized;
}

static inline void _reset_lrc_desc_v69(struct intel_guc *guc, u32 id)
{
	struct guc_lrc_desc_v69 *desc = __get_lrc_desc_v69(guc, id);

	if (desc)
		memset(desc, 0, sizeof(*desc));
}

static inline bool ctx_id_mapped(struct intel_guc *guc, u32 id)
{
	return __get_context(guc, id);
}

static inline void set_ctx_id_mapping(struct intel_guc *guc, u32 id,
				      struct intel_context *ce)
{
	unsigned long flags;

	/*
	 * xarray API doesn't have xa_store_irqsave wrapper, so calling the
	 * lower level functions directly.
	 */
	xa_lock_irqsave(&guc->context_lookup, flags);
	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
	xa_unlock_irqrestore(&guc->context_lookup, flags);
}

static inline void clr_ctx_id_mapping(struct intel_guc *guc, u32 id)
{
	unsigned long flags;

	if (unlikely(!guc_submission_initialized(guc)))
		return;

	_reset_lrc_desc_v69(guc, id);

	/*
	 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
	 * the lower level functions directly.
	 */
	xa_lock_irqsave(&guc->context_lookup, flags);
	__xa_erase(&guc->context_lookup, id);
	xa_unlock_irqrestore(&guc->context_lookup, flags);
}

static void decr_outstanding_submission_g2h(struct intel_guc *guc)
{
	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
		wake_up_all(&guc->ct.wq);
}

static int guc_submission_send_busy_loop(struct intel_guc *guc,
					 const u32 *action,
					 u32 len,
					 u32 g2h_len_dw,
					 bool loop)
{
	/*
	 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
	 * so we don't handle the case where we don't get a reply because we
	 * aborted the send due to the channel being busy.
	 */
	GEM_BUG_ON(g2h_len_dw && !loop);

	if (g2h_len_dw)
		atomic_inc(&guc->outstanding_submission_g2h);

	return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
}

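/*
 * Wait for *wait_var to drop to zero, sleeping for up to @timeout jiffies.
 * Returns 0 on success, -ETIME if the timeout expires first and -EINTR if
 * the wait is interrupted by a signal (interruptible waits only).
 */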
int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
				   atomic_t *wait_var,
				   bool interruptible,
				   long timeout)
{
	const int state = interruptible ?
		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
	DEFINE_WAIT(wait);

	might_sleep();
	GEM_BUG_ON(timeout < 0);

	if (!atomic_read(wait_var))
		return 0;

	if (!timeout)
		return -ETIME;

	for (;;) {
		prepare_to_wait(&guc->ct.wq, &wait, state);

		if (!atomic_read(wait_var))
			break;

		if (signal_pending_state(state, current)) {
			timeout = -EINTR;
			break;
		}

		if (!timeout) {
			timeout = -ETIME;
			break;
		}

		timeout = io_schedule_timeout(timeout);
	}
	finish_wait(&guc->ct.wq, &wait);

	return (timeout < 0) ? timeout : 0;
}

int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
{
	if (!intel_uc_uses_guc_submission(&guc_to_gt(guc)->uc))
		return 0;

	return intel_guc_wait_for_pending_msg(guc,
					      &guc->outstanding_submission_g2h,
					      true, timeout);
}

static int guc_context_policy_init_v70(struct intel_context *ce, bool loop);
static int try_context_registration(struct intel_context *ce, bool loop);

static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
{
	int err = 0;
	struct intel_context *ce = request_to_scheduling_context(rq);
	u32 action[3];
	int len = 0;
	u32 g2h_len_dw = 0;
	bool enabled;

	lockdep_assert_held(&rq->engine->sched_engine->lock);

	/*
	 * Corner case where requests were sitting in the priority list or a
	 * request resubmitted after the context was banned.
	 */
	if (unlikely(intel_context_is_banned(ce))) {
		i915_request_put(i915_request_mark_eio(rq));
		intel_engine_signal_breadcrumbs(ce->engine);
		return 0;
	}

	GEM_BUG_ON(!atomic_read(&ce->guc_id.ref));
	GEM_BUG_ON(context_guc_id_invalid(ce));

	if (context_policy_required(ce)) {
		err = guc_context_policy_init_v70(ce, false);
		if (err)
			return err;
	}

	spin_lock(&ce->guc_state.lock);

	/*
	 * The request / context will be run on the hardware when scheduling
	 * gets enabled in the unblock. For multi-lrc we still submit the
	 * context to move the LRC tails.
	 */
	if (unlikely(context_blocked(ce) && !intel_context_is_parent(ce)))
		goto out;

	enabled = context_enabled(ce) || context_blocked(ce);

	if (!enabled) {
		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
		action[len++] = ce->guc_id.id;
		action[len++] = GUC_CONTEXT_ENABLE;
		set_context_pending_enable(ce);
		intel_context_get(ce);
		g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
	} else {
		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
		action[len++] = ce->guc_id.id;
	}

	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
	if (!enabled && !err) {
		trace_intel_context_sched_enable(ce);
		atomic_inc(&guc->outstanding_submission_g2h);
		set_context_enabled(ce);

		/*
		 * Without multi-lrc KMD does the submission step (moving the
		 * lrc tail) so enabling scheduling is sufficient to submit the
		 * context. This isn't the case in multi-lrc submission as the
		 * GuC needs to move the tails, hence the need for another H2G
		 * to submit a multi-lrc context after enabling scheduling.
		 */
		if (intel_context_is_parent(ce)) {
			action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT;
			err = intel_guc_send_nb(guc, action, len - 1, 0);
		}
	} else if (!enabled) {
		clr_context_pending_enable(ce);
		intel_context_put(ce);
	}
	if (likely(!err))
		trace_i915_request_guc_submit(rq);

out:
	spin_unlock(&ce->guc_state.lock);
	return err;
}

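/*
 * Thin wrapper around __guc_add_request(): if the CT channel is backed up
 * (-EBUSY), remember the request so the submission tasklet can resume from
 * the STALL_ADD_REQUEST point once space frees up.
 */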
static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
{
	int ret = __guc_add_request(guc, rq);

	if (unlikely(ret == -EBUSY)) {
		guc->stalled_request = rq;
		guc->submission_stall_reason = STALL_ADD_REQUEST;
	}

	return ret;
}

static inline void guc_set_lrc_tail(struct i915_request *rq)
{
	rq->context->lrc_reg_state[CTX_RING_TAIL] =
		intel_ring_set_tail(rq->ring, rq->tail);
}

static inline int rq_prio(const struct i915_request *rq)
{
	return rq->sched.attr.priority;
}

static bool is_multi_lrc_rq(struct i915_request *rq)
{
	return intel_context_is_parallel(rq->context);
}

static bool can_merge_rq(struct i915_request *rq,
			 struct i915_request *last)
{
	return request_to_scheduling_context(rq) ==
		request_to_scheduling_context(last);
}

static u32 wq_space_until_wrap(struct intel_context *ce)
{
	return (WQ_SIZE - ce->parallel.guc.wqi_tail);
}

static void write_wqi(struct intel_context *ce, u32 wqi_size)
{
	BUILD_BUG_ON(!is_power_of_2(WQ_SIZE));

	/*
	 * Ensure WQI are visible before updating tail
	 */
	intel_guc_write_barrier(ce_to_guc(ce));

	ce->parallel.guc.wqi_tail = (ce->parallel.guc.wqi_tail + wqi_size) &
		(WQ_SIZE - 1);
	WRITE_ONCE(*ce->parallel.guc.wq_tail, ce->parallel.guc.wqi_tail);
}

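/*
 * Pad the remainder of the work queue with a single NOOP WQI so that the next
 * (real) item starts again at offset 0 instead of wrapping mid-item.
 */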
static int guc_wq_noop_append(struct intel_context *ce)
{
	u32 *wqi = get_wq_pointer(ce, wq_space_until_wrap(ce));
	u32 len_dw = wq_space_until_wrap(ce) / sizeof(u32) - 1;

	if (!wqi)
		return -EBUSY;

	GEM_BUG_ON(!FIELD_FIT(WQ_LEN_MASK, len_dw));

	*wqi = FIELD_PREP(WQ_TYPE_MASK, WQ_TYPE_NOOP) |
		FIELD_PREP(WQ_LEN_MASK, len_dw);
	ce->parallel.guc.wqi_tail = 0;

	return 0;
}

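/*
 * A multi-lrc work queue item consists of a header dword, the parent's lrca,
 * a dword holding the guc_id and parent ring tail, a fence_id dword and one
 * ring tail dword per child context, which is why wqi_size below is
 * (number_children + 4) dwords.
 */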
static int __guc_wq_item_append(struct i915_request *rq)
|
|
|
|
|
{
|
|
|
|
|
struct intel_context *ce = request_to_scheduling_context(rq);
|
|
|
|
|
struct intel_context *child;
|
|
|
|
|
unsigned int wqi_size = (ce->parallel.number_children + 4) *
|
|
|
|
|
sizeof(u32);
|
|
|
|
|
u32 *wqi;
|
|
|
|
|
u32 len_dw = (wqi_size / sizeof(u32)) - 1;
|
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
|
|
/* Ensure context is in correct state updating work queue */
|
|
|
|
|
GEM_BUG_ON(!atomic_read(&ce->guc_id.ref));
|
|
|
|
|
GEM_BUG_ON(context_guc_id_invalid(ce));
|
|
|
|
|
GEM_BUG_ON(context_wait_for_deregister_to_register(ce));
|
2022-03-01 16:33:50 -08:00
|
|
|
GEM_BUG_ON(!ctx_id_mapped(ce_to_guc(ce), ce->guc_id.id));
|
2021-10-14 10:19:52 -07:00
|
|
|
|
|
|
|
|
/* Insert NOOP if this work queue item will wrap the tail pointer. */
|
|
|
|
|
if (wqi_size > wq_space_until_wrap(ce)) {
|
|
|
|
|
ret = guc_wq_noop_append(ce);
|
|
|
|
|
if (ret)
|
|
|
|
|
return ret;
|
|
|
|
|
}
|
|
|
|
|
|
2022-07-18 16:07:32 -07:00
|
|
|
wqi = get_wq_pointer(ce, wqi_size);
|
2021-10-14 10:19:52 -07:00
|
|
|
if (!wqi)
|
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!FIELD_FIT(WQ_LEN_MASK, len_dw));
|
|
|
|
|
|
|
|
|
|
*wqi++ = FIELD_PREP(WQ_TYPE_MASK, WQ_TYPE_MULTI_LRC) |
|
|
|
|
|
FIELD_PREP(WQ_LEN_MASK, len_dw);
|
|
|
|
|
*wqi++ = ce->lrc.lrca;
|
|
|
|
|
*wqi++ = FIELD_PREP(WQ_GUC_ID_MASK, ce->guc_id.id) |
|
|
|
|
|
FIELD_PREP(WQ_RING_TAIL_MASK, ce->ring->tail / sizeof(u64));
|
|
|
|
|
*wqi++ = 0; /* fence_id */
|
|
|
|
|
for_each_child(ce, child)
|
|
|
|
|
*wqi++ = child->ring->tail / sizeof(u64);
|
|
|
|
|
|
2022-07-18 16:07:32 -07:00
|
|
|
write_wqi(ce, wqi_size);
|
2021-10-14 10:19:52 -07:00
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static int guc_wq_item_append(struct intel_guc *guc,
			      struct i915_request *rq)
{
	struct intel_context *ce = request_to_scheduling_context(rq);
	int ret = 0;

	if (likely(!intel_context_is_banned(ce))) {
		ret = __guc_wq_item_append(rq);

		if (unlikely(ret == -EBUSY)) {
			guc->stalled_request = rq;
			guc->submission_stall_reason = STALL_MOVE_LRC_TAIL;
		}
	}

	return ret;
}

static bool multi_lrc_submit(struct i915_request *rq)
{
	struct intel_context *ce = request_to_scheduling_context(rq);

	intel_ring_set_tail(rq->ring, rq->tail);

	/*
	 * We expect the front end (execbuf IOCTL) to set this flag on the last
	 * request generated from a multi-BB submission. This indicates to the
	 * backend (GuC interface) that we should submit this context thus
	 * submitting all the requests generated in parallel.
	 */
	return test_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL, &rq->fence.flags) ||
	       intel_context_is_banned(ce);
}

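/*
 * Rough summary of the dequeue flow below: requests are pulled from the
 * priority queue and submitted until a request that cannot be merged with
 * the previous one is found. If any step (context registration, the work
 * queue, or guc_add_request()) returns -EBUSY, the request is parked in
 * guc->stalled_request together with a stall reason, and the next invocation
 * resumes at the matching label (register_context, move_lrc_tail or
 * add_request).
 */
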
static int guc_dequeue_one_context(struct intel_guc *guc)
{
	struct i915_sched_engine * const sched_engine = guc->sched_engine;
	struct i915_request *last = NULL;
	bool submit = false;
	struct rb_node *rb;
	int ret;

	lockdep_assert_held(&sched_engine->lock);

	if (guc->stalled_request) {
		submit = true;
		last = guc->stalled_request;

		switch (guc->submission_stall_reason) {
		case STALL_REGISTER_CONTEXT:
			goto register_context;
		case STALL_MOVE_LRC_TAIL:
			goto move_lrc_tail;
		case STALL_ADD_REQUEST:
			goto add_request;
		default:
			MISSING_CASE(guc->submission_stall_reason);
		}
	}

	while ((rb = rb_first_cached(&sched_engine->queue))) {
		struct i915_priolist *p = to_priolist(rb);
		struct i915_request *rq, *rn;

		priolist_for_each_request_consume(rq, rn, p) {
			if (last && !can_merge_rq(rq, last))
				goto register_context;

			list_del_init(&rq->sched.link);

			__i915_request_submit(rq);

			trace_i915_request_in(rq, 0);
			last = rq;

			if (is_multi_lrc_rq(rq)) {
				/*
				 * We need to coalesce all multi-lrc requests in
				 * a relationship into a single H2G. We are
				 * guaranteed that all of these requests will be
				 * submitted sequentially.
				 */
				if (multi_lrc_submit(rq)) {
					submit = true;
					goto register_context;
				}
			} else {
				submit = true;
			}
		}

		rb_erase_cached(&p->node, &sched_engine->queue);
		i915_priolist_free(p);
	}

register_context:
	if (submit) {
		struct intel_context *ce = request_to_scheduling_context(last);

		if (unlikely(!ctx_id_mapped(guc, ce->guc_id.id) &&
			     !intel_context_is_banned(ce))) {
			ret = try_context_registration(ce, false);
			if (unlikely(ret == -EPIPE)) {
				goto deadlk;
			} else if (ret == -EBUSY) {
				guc->stalled_request = last;
				guc->submission_stall_reason =
					STALL_REGISTER_CONTEXT;
				goto schedule_tasklet;
			} else if (ret != 0) {
				GEM_WARN_ON(ret);	/* Unexpected */
				goto deadlk;
			}
		}

move_lrc_tail:
		if (is_multi_lrc_rq(last)) {
			ret = guc_wq_item_append(guc, last);
			if (ret == -EBUSY) {
				goto schedule_tasklet;
			} else if (ret != 0) {
				GEM_WARN_ON(ret);	/* Unexpected */
				goto deadlk;
			}
		} else {
			guc_set_lrc_tail(last);
		}

add_request:
		ret = guc_add_request(guc, last);
		if (unlikely(ret == -EPIPE)) {
			goto deadlk;
		} else if (ret == -EBUSY) {
			goto schedule_tasklet;
		} else if (ret != 0) {
			GEM_WARN_ON(ret);	/* Unexpected */
			goto deadlk;
		}
	}

	guc->stalled_request = NULL;
	guc->submission_stall_reason = STALL_NONE;
	return submit;

deadlk:
	sched_engine->tasklet.callback = NULL;
	tasklet_disable_nosync(&sched_engine->tasklet);
	return false;

schedule_tasklet:
	tasklet_schedule(&sched_engine->tasklet);
	return false;
}

static void guc_submission_tasklet(struct tasklet_struct *t)
{
	struct i915_sched_engine *sched_engine =
		from_tasklet(sched_engine, t, tasklet);
	unsigned long flags;
	bool loop;

	spin_lock_irqsave(&sched_engine->lock, flags);

	do {
		loop = guc_dequeue_one_context(sched_engine->private_data);
	} while (loop);

	i915_sched_engine_reset_on_empty(sched_engine);

	spin_unlock_irqrestore(&sched_engine->lock, flags);
}

static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
{
	if (iir & GT_RENDER_USER_INTERRUPT)
		intel_engine_signal_breadcrumbs(engine);
}

static void __guc_context_destroy(struct intel_context *ce);
static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
static void guc_signal_context_fence(struct intel_context *ce);
static void guc_cancel_context_requests(struct intel_context *ce);
static void guc_blocked_fence_complete(struct intel_context *ce);

static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
{
	struct intel_context *ce;
	unsigned long index, flags;
	bool pending_disable, pending_enable, deregister, destroyed, banned;

	xa_lock_irqsave(&guc->context_lookup, flags);
	xa_for_each(&guc->context_lookup, index, ce) {
		/*
		 * Corner case where the ref count on the object is zero but the
		 * deregister G2H was lost. In this case we don't touch the ref
		 * count and finish the destroy of the context.
		 */
		bool do_put = kref_get_unless_zero(&ce->ref);

		xa_unlock(&guc->context_lookup);

		spin_lock(&ce->guc_state.lock);

		/*
		 * Once we are at this point submission_disabled() is guaranteed
		 * to be visible to all callers who set the below flags (see above
		 * flush and flushes in reset_prepare). If submission_disabled()
		 * is set, the caller shouldn't set these flags.
		 */

		destroyed = context_destroyed(ce);
		pending_enable = context_pending_enable(ce);
		pending_disable = context_pending_disable(ce);
		deregister = context_wait_for_deregister_to_register(ce);
		banned = context_banned(ce);
		init_sched_state(ce);

		spin_unlock(&ce->guc_state.lock);

		if (pending_enable || destroyed || deregister) {
			decr_outstanding_submission_g2h(guc);
			if (deregister)
				guc_signal_context_fence(ce);
			if (destroyed) {
				intel_gt_pm_put_async(guc_to_gt(guc));
				release_guc_id(guc, ce);
				__guc_context_destroy(ce);
			}
			if (pending_enable || deregister)
				intel_context_put(ce);
		}

		/* Not mutually exclusive with above if statement. */
		if (pending_disable) {
			guc_signal_context_fence(ce);
			if (banned) {
				guc_cancel_context_requests(ce);
				intel_engine_signal_breadcrumbs(ce->engine);
			}
			intel_context_sched_disable_unpin(ce);
			decr_outstanding_submission_g2h(guc);

			spin_lock(&ce->guc_state.lock);
			guc_blocked_fence_complete(ce);
			spin_unlock(&ce->guc_state.lock);

			intel_context_put(ce);
		}

		if (do_put)
			intel_context_put(ce);
		xa_lock(&guc->context_lookup);
	}
	xa_unlock_irqrestore(&guc->context_lookup, flags);
}

/*
 * GuC stores busyness stats for each engine at context in/out boundaries. A
 * context 'in' logs the execution start time, 'out' adds the in -> out delta
 * to the total. i915/kmd accesses 'start', 'total' and 'context id' from
 * memory shared with GuC.
 *
 * __i915_pmu_event_read samples engine busyness. When sampling, if the
 * context id is valid (!= ~0) and start is non-zero, the engine is considered
 * to be active. For an active engine total busyness = total + (now - start),
 * where 'now' is the time at which the busyness is sampled. For an inactive
 * engine, total busyness = total.
 *
 * All times are captured from the GUCPMTIMESTAMP reg and are in the gt clock
 * domain.
 *
 * The start and total values provided by GuC are 32 bits and wrap around in a
 * few minutes. Since perf pmu provides busyness as 64 bit monotonically
 * increasing ns values, there is a need for this implementation to account for
 * overflows and extend the GuC provided values to 64 bits before returning
 * busyness to the user. In order to do that, a worker runs periodically at
 * frequency = 1/8th the time it takes for the timestamp to wrap (i.e. once in
 * 27 seconds for a gt clock frequency of 19.2 MHz).
 */

#define WRAP_TIME_CLKS U32_MAX
#define POLL_TIME_CLKS (WRAP_TIME_CLKS >> 3)

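/*
 * Back-of-the-envelope numbers for the defines above (illustrative only): at
 * a gt clock of 19.2 MHz the 32 bit timestamp wraps roughly every
 * U32_MAX / 19.2e6 ~= 224 seconds, so POLL_TIME_CLKS corresponds to a ping
 * interval of about 224 / 8 ~= 28 seconds, in line with the ~27 second figure
 * quoted in the comment above.
 */
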
static void
__extend_last_switch(struct intel_guc *guc, u64 *prev_start, u32 new_start)
{
	u32 gt_stamp_hi = upper_32_bits(guc->timestamp.gt_stamp);
	u32 gt_stamp_last = lower_32_bits(guc->timestamp.gt_stamp);

	if (new_start == lower_32_bits(*prev_start))
		return;

	/*
	 * When gt is unparked, we update the gt timestamp and start the ping
	 * worker that updates the gt_stamp every POLL_TIME_CLKS. As long as gt
	 * is unparked, all switched in contexts will have a start time that is
	 * within +/- POLL_TIME_CLKS of the most recent gt_stamp.
	 *
	 * If neither gt_stamp nor new_start has rolled over, then the
	 * gt_stamp_hi does not need to be adjusted, however if one of them has
	 * rolled over, we need to adjust gt_stamp_hi accordingly.
	 *
	 * The below conditions address the cases of new_start rollover and
	 * gt_stamp_last rollover respectively.
	 */
	if (new_start < gt_stamp_last &&
	    (new_start - gt_stamp_last) <= POLL_TIME_CLKS)
		gt_stamp_hi++;

	if (new_start > gt_stamp_last &&
	    (gt_stamp_last - new_start) <= POLL_TIME_CLKS && gt_stamp_hi)
		gt_stamp_hi--;

	*prev_start = ((u64)gt_stamp_hi << 32) | new_start;
}

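/*
 * Illustrative rollover case for __extend_last_switch() above: with
 * gt_stamp_last == 0xffffff00 and new_start == 0x00000100, new_start is
 * numerically smaller and the unsigned difference
 * (u32)(new_start - gt_stamp_last) == 0x200 is within POLL_TIME_CLKS, so the
 * lower 32 bits are taken to have wrapped and gt_stamp_hi is incremented
 * before being combined with new_start.
 */
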
#define record_read(map_, field_) \
	iosys_map_rd_field(map_, 0, struct guc_engine_usage_record, field_)

/*
 * GuC updates shared memory and KMD reads it. Since this is not synchronized,
 * we run into a race where the value read is inconsistent. Sometimes the
 * inconsistency is in reading the upper MSB bytes of the last_in value when
 * this race occurs. Two types of cases are seen - upper 8 bits are zero and
 * upper 24 bits are zero. Since the resulting values are still non-zero, it
 * is non-trivial to determine their validity. Instead we read the values
 * multiple times until they are consistent. In test runs, 3 attempts result
 * in consistent values. The upper bound is set to 6 attempts and may need to
 * be tuned as per any new occurrences.
 */
static void __get_engine_usage_record(struct intel_engine_cs *engine,
				      u32 *last_in, u32 *id, u32 *total)
{
	struct iosys_map rec_map = intel_guc_engine_usage_record_map(engine);
	int i = 0;

	do {
		*last_in = record_read(&rec_map, last_switch_in_stamp);
		*id = record_read(&rec_map, current_context_index);
		*total = record_read(&rec_map, total_runtime);

		if (record_read(&rec_map, last_switch_in_stamp) == *last_in &&
		    record_read(&rec_map, current_context_index) == *id &&
		    record_read(&rec_map, total_runtime) == *total)
			break;
	} while (++i < 6);
}

static void guc_update_engine_gt_clks(struct intel_engine_cs *engine)
{
	struct intel_engine_guc_stats *stats = &engine->stats.guc;
	struct intel_guc *guc = &engine->gt->uc.guc;
	u32 last_switch, ctx_id, total;

	lockdep_assert_held(&guc->timestamp.lock);

	__get_engine_usage_record(engine, &last_switch, &ctx_id, &total);


	stats->running = ctx_id != ~0U && last_switch;
	if (stats->running)
		__extend_last_switch(guc, &stats->start_gt_clk, last_switch);

	/*
	 * Instead of adjusting the total for overflow, just add the
	 * difference from the previous sample to stats->total_gt_clks.
	 */
	if (total && total != ~0U) {
		stats->total_gt_clks += (u32)(total - stats->prev_total);
		stats->prev_total = total;
	}
}
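
/*
 * Illustrative sketch, not driver code: why the "(u32)(total - prev_total)"
 * accumulation in guc_update_engine_gt_clks() above stays correct when the
 * 32-bit total reported by GuC wraps between two samples. The function name
 * and parameters are hypothetical.
 */
static inline u64 example_accumulate_busy(u64 total_gt_clks, u32 prev_total,
					  u32 total)
{
	/*
	 * Unsigned 32-bit subtraction wraps modulo 2^32: with prev_total ==
	 * 0xfffffff0 and total == 0x10 the delta is 0x20, which is exactly
	 * the number of gt clocks that elapsed across the wrap.
	 */
	return total_gt_clks + (u32)(total - prev_total);
}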

static u32 gpm_timestamp_shift(struct intel_gt *gt)
{
	intel_wakeref_t wakeref;
	u32 reg, shift;

	with_intel_runtime_pm(gt->uncore->rpm, wakeref)
		reg = intel_uncore_read(gt->uncore, RPM_CONFIG0);

	shift = (reg & GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK) >>
		GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_SHIFT;

	return 3 - shift;
}
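
/*
 * Illustrative sketch, not driver code: the shift returned by
 * gpm_timestamp_shift() above is stored in guc->timestamp.shift and applied
 * to the raw 64-bit GPM timestamp in guc_update_pm_timestamp() below. Purely
 * as a hypothetical example, a CTC shift field of 1 yields "3 - 1 = 2", so
 * the raw timestamp is divided by 4 before being used as a gt clock value.
 */
static inline u64 example_scale_gpm_ts(u64 raw_gpm_ts, u32 shift)
{
	return raw_gpm_ts >> shift;	/* e.g. shift == 2 -> divide by 4 */
}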

static void guc_update_pm_timestamp(struct intel_guc *guc, ktime_t *now)
{
	struct intel_gt *gt = guc_to_gt(guc);
	u32 gt_stamp_lo, gt_stamp_hi;
	u64 gpm_ts;

	lockdep_assert_held(&guc->timestamp.lock);

	gt_stamp_hi = upper_32_bits(guc->timestamp.gt_stamp);
	gpm_ts = intel_uncore_read64_2x32(gt->uncore, MISC_STATUS0,
					  MISC_STATUS1) >> guc->timestamp.shift;
	gt_stamp_lo = lower_32_bits(gpm_ts);
	*now = ktime_get();

	if (gt_stamp_lo < lower_32_bits(guc->timestamp.gt_stamp))
		gt_stamp_hi++;

	guc->timestamp.gt_stamp = ((u64)gt_stamp_hi << 32) | gt_stamp_lo;
}
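
/*
 * Illustrative sketch, not driver code: the 32-bit -> 64-bit extension
 * performed by guc_update_pm_timestamp() above. If the new low 32 bits have
 * gone backwards, the hardware counter wrapped, so carry into the high word.
 * This only works if the counter is sampled at least once per wrap period;
 * per the commit message, at a 19.2 MHz gt clock 2^32 ticks wrap in a few
 * minutes (roughly 224 s) and the guc->timestamp.work worker runs at about
 * 1/8 of that (~27 s), so a wrap is never missed between samples.
 */
static inline u64 example_extend_32_to_64(u64 prev_stamp, u32 hw_lo)
{
	u32 hi = upper_32_bits(prev_stamp);

	if (hw_lo < lower_32_bits(prev_stamp))
		hi++;

	/* e.g. prev_stamp == 0xfffffffe, hw_lo == 5 -> 0x100000005 */
	return ((u64)hi << 32) | hw_lo;
}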

/*
 * Unlike the execlist mode of submission, total and active times are in terms
 * of gt clocks. The *now parameter is retained to return the cpu time at which
 * the busyness was sampled.
 */
static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
{
	struct intel_engine_guc_stats stats_saved, *stats = &engine->stats.guc;
	struct i915_gpu_error *gpu_error = &engine->i915->gpu_error;
	struct intel_gt *gt = engine->gt;
	struct intel_guc *guc = &gt->uc.guc;
	u64 total, gt_stamp_saved;
	unsigned long flags;
	u32 reset_count;
	bool in_reset;

	spin_lock_irqsave(&guc->timestamp.lock, flags);

	/*
	 * If a reset happened, we risk reading partially updated engine
	 * busyness from GuC, so we just use the driver stored copy of
	 * busyness. Synchronize with gt reset using reset_count and the
	 * I915_RESET_BACKOFF flag. Note that the reset flow updates
	 * reset_count after the I915_RESET_BACKOFF flag, so ensure that
	 * reset_count is usable by checking the flag afterwards.
	 */
	reset_count = i915_reset_count(gpu_error);
	in_reset = test_bit(I915_RESET_BACKOFF, &gt->reset.flags);

	*now = ktime_get();

	/*
	 * The active busyness depends on start_gt_clk and gt_stamp.
	 * gt_stamp is updated by i915 only when gt is awake and the
	 * start_gt_clk is derived from GuC state. To get a consistent
	 * view of activity, we query the GuC state only if gt is awake.
	 */
	if (!in_reset && intel_gt_pm_get_if_awake(gt)) {
		stats_saved = *stats;
		gt_stamp_saved = guc->timestamp.gt_stamp;
		/*
		 * Update gt_clks, then gt timestamp to simplify the 'gt_stamp -
		 * start_gt_clk' calculation below for active engines.
		 */
		guc_update_engine_gt_clks(engine);
		guc_update_pm_timestamp(guc, now);
drm/i915/pmu: Connect engine busyness stats from GuC to pmu
With GuC handling scheduling, i915 is not aware of the time that a
context is scheduled in and out of the engine. Since i915 pmu relies on
this info to provide engine busyness to the user, GuC shares this info
with i915 for all engines using shared memory. For each engine, this
info contains:
- total busyness: total time that the context was running (total)
- id: id of the running context (id)
- start timestamp: timestamp when the context started running (start)
At the time (now) of sampling the engine busyness, if the id is valid
(!= ~0), and start is non-zero, then the context is considered to be
active and the engine busyness is calculated using the below equation
engine busyness = total + (now - start)
All times are obtained from the gt clock base. For inactive contexts,
engine busyness is just equal to the total.
The start and total values provided by GuC are 32 bits and wrap around
in a few minutes. Since perf pmu provides busyness as 64 bit
monotonically increasing values, there is a need for this implementation
to account for overflows and extend the time to 64 bits before returning
busyness to the user. In order to do that, a worker runs periodically at
frequency = 1/8th the time it takes for the timestamp to wrap. As an
example, that would be once in 27 seconds for a gt clock frequency of
19.2 MHz.
Note:
There might be an over-accounting of busyness due to the fact that GuC
may be updating the total and start values while kmd is reading them.
(i.e kmd may read the updated total and the stale start). In such a
case, user may see higher busyness value followed by smaller ones which
would eventually catch up to the higher value.
v2: (Tvrtko)
- Include details in commit message
- Move intel engine busyness function into execlist code
- Use union inside engine->stats
- Use natural type for ping delay jiffies
- Drop active_work condition checks
- Use for_each_engine if iterating all engines
- Drop seq locking, use spinlock at GuC level to update engine stats
- Document worker specific details
v3: (Tvrtko/Umesh)
- Demarcate GuC and execlist stat objects with comments
- Document known over-accounting issue in commit
- Provide a consistent view of GuC state
- Add hooks to gt park/unpark for GuC busyness
- Stop/start worker in gt park/unpark path
- Drop inline
- Move spinlock and worker inits to GuC initialization
- Drop helpers that are called only once
v4: (Tvrtko/Matt/Umesh)
- Drop addressed opens from commit message
- Get runtime pm in ping, remove from the park path
- Use cancel_delayed_work_sync in disable_submission path
- Update stats during reset prepare
- Skip ping if reset in progress
- Explicitly name execlists and GuC stats objects
- Since disable_submission is called from many places, move resetting
stats to intel_guc_submission_reset_prepare
v5: (Tvrtko)
- Add a trylock helper that does not sleep and synchronize PMU event
callbacks and worker with gt reset
v6: (CI BAT failures)
- DUTs using execlist submission failed to boot since __gt_unpark is
called during i915 load. This ends up calling the GuC busyness unpark
hook and results in kick-starting an uninitialized worker. Let
park/unpark hooks check if GuC submission has been initialized.
- drop cant_sleep() from trylock helper since rcu_read_lock takes care
of that.
v7: (CI) Fix igt@i915_selftest@live@gt_engines
- For GuC mode of submission the engine busyness is derived from gt time
domain. Use gt time elapsed as reference in the selftest.
- Increase busyness calculation to 10ms duration to ensure batch runs
longer and falls within the busyness tolerances in selftest.
v8:
- Use ktime_get in selftest as before
- intel_reset_trylock_no_wait results in a lockdep splat that is not
trivial to fix since the PMU callback runs in irq context and the
reset paths are tightly knit into the driver. The test that uncovers
this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait,
instead use the reset_count to synchronize with gt reset during pmu
callback. For the ping, continue to use intel_reset_trylock since ping
is not run in irq context.
- GuC PM timestamp does not tick when GuC is idle. This can potentially
result in wrong busyness values when a context is active on the
engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to
process the GuC busyness stats. This works since both GuC timestamp and
RING timestamp are synced with the same clock.
- The busyness stats may get updated after the batch starts running.
This delay causes the busyness reported for 100us duration to fall
below 95% in the selftest. The only option at this time is to wait for
GuC busyness to change from idle to active before we sample busyness
over a 100us period.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com
2021-10-26 17:48:21 -07:00
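
/*
 * Illustrative sketch only (not the driver's implementation): the
 * overflow handling described above amounts to extending a wrapping
 * 32 bit hardware counter into a monotonic 64 bit value in software.
 * The helper name below is made up for the example.
 */
static u64 example_extend_wrapping_u32(u64 prev, u32 sample)
{
	u64 hi = prev & ~0xffffffffull;

	/* A sample smaller than the previous low bits means a wrap. */
	if (sample < lower_32_bits(prev))
		hi += 0x100000000ull;

	return hi | sample;
}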

		intel_gt_pm_put_async(gt);

		if (i915_reset_count(gpu_error) != reset_count) {
			*stats = stats_saved;
			guc->timestamp.gt_stamp = gt_stamp_saved;
		}
	}

	total = intel_gt_clock_interval_to_ns(gt, stats->total_gt_clks);
	if (stats->running) {
		u64 clk = guc->timestamp.gt_stamp - stats->start_gt_clk;

		total += intel_gt_clock_interval_to_ns(gt, clk);
	}

	spin_unlock_irqrestore(&guc->timestamp.lock, flags);

	return ns_to_ktime(total);
}
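
/*
 * __reset_guc_busyness_stats: stop the ping worker and, under the
 * timestamp lock, sample the PM timestamp and each engine's GT clocks
 * one last time, then clear prev_total ahead of the reset.
 */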

static void __reset_guc_busyness_stats(struct intel_guc *guc)
{
	struct intel_gt *gt = guc_to_gt(guc);
	struct intel_engine_cs *engine;
	enum intel_engine_id id;
	unsigned long flags;
	ktime_t unused;

	cancel_delayed_work_sync(&guc->timestamp.work);

	spin_lock_irqsave(&guc->timestamp.lock, flags);

	guc_update_pm_timestamp(guc, &unused);
	for_each_engine(engine, gt, id) {
		guc_update_engine_gt_clks(engine);
		engine->stats.guc.prev_total = 0;
	}

	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
}
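
/*
 * __update_guc_busyness_stats: record when the stats were last sampled
 * (last_stat_jiffies) and, under the timestamp lock, refresh the PM
 * timestamp and each engine's GT clocks.
 */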

static void __update_guc_busyness_stats(struct intel_guc *guc)
{
	struct intel_gt *gt = guc_to_gt(guc);
	struct intel_engine_cs *engine;
	enum intel_engine_id id;
	unsigned long flags;
	ktime_t unused;

drm/i915/guc: Don't update engine busyness stats too frequently
Using two different types of workloads, it was observed that
guc_update_engine_gt_clks was being called too frequently and/or
causing a CPU-to-lmem bandwidth hit over PCIe. Details on
the workloads and numbers are in the notes below.
Background: At the moment, guc_update_engine_gt_clks can be invoked
in one of 3 ways. #1 and #2 are infrequent under normal operating
conditions:
1. When a predefined "ping_delay" timer expires, so that GuC
busyness can sample the GTPM clock counter and not miss a
wrap-around of the 32 bits of the HW counter.
(The ping_delay is calculated as 1/8th of the time taken
for the counter to go from 0x0 to 0xffffffff at the given
GT frequency. This comes to about once every 28 seconds at a
GT frequency of 19.2 MHz).
2. In preparation for a gt reset.
3. In response to __gt_park events (as gt power management
puts the gt into a lower power state when there is no work
being done).
Root cause: For both workloads described further below, it was
observed that when user space calls IOCTLs that unpark the
gt momentarily and repeats such calls many times in quick succession,
guc_update_engine_gt_clks gets called just as many times. However,
the primary purpose of guc_update_engine_gt_clks is to ensure we don't
miss the wraparound while the counter is ticking, so the fix
is to skip that check when gt_park calls this function
earlier than necessary.
Solution: Snapshot jiffies when we actually update the busyness
stats. Then read the current jiffies every time intel_guc_busyness_park
is called and bail if we are being called too soon. Use half of the
ping_delay as a safe threshold.
NOTE1: Workload1: IGT's gem_create was modified to create a file handle,
allocate memory with sizes ranging from a minimum of 4K to the maximum
supported (in power-of-two step sizes). It maps, modifies and reads back
the memory. Allocation and modification are repeated until the total
memory allocation reaches the maximum. Then the file handle is closed.
With this workload, guc_update_engine_gt_clks was called over 188 thousand
times in the span of 15 seconds while this test ran three times. With this
patch, the number of calls dropped to 14.
NOTE2: Workload2: 30 transcode sessions are created in quick succession.
While these sessions are created, the pcm-iio tool was used to measure I/O
read bandwidth consumption, sampled at 100 millisecond intervals
over the course of 20 seconds. The total bandwidth consumed over 20 seconds
without this patch averaged 311 KBps per sample. With this
patch, the number went down to about 175 KBps, which is about a 43% saving.
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220623023157.211650-2-alan.previn.teres.alexis@intel.com
2022-06-22 19:31:57 -07:00
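
/*
 * last_stat_jiffies, set below, is the snapshot that
 * intel_guc_busyness_park() later compares against half the ping_delay
 * to decide whether parking can skip resampling the stats.
 */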
	guc->timestamp.last_stat_jiffies = jiffies;

	spin_lock_irqsave(&guc->timestamp.lock, flags);

	guc_update_pm_timestamp(guc, &unused);
	for_each_engine(engine, gt, id)
		guc_update_engine_gt_clks(engine);

	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
}
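
/*
 * guc_timestamp_ping: worker that re-reads the busyness stats every
 * ping_delay so the 32 bit GuC/GT timestamps never wrap unnoticed. It
 * takes the reset srcu lock to stay out of the way of a GT reset and
 * holds a runtime PM wakeref while sampling.
 */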

static void guc_timestamp_ping(struct work_struct *wrk)
{
	struct intel_guc *guc = container_of(wrk, typeof(*guc),
					     timestamp.work.work);
	struct intel_uc *uc = container_of(guc, typeof(*uc), guc);
	struct intel_gt *gt = guc_to_gt(guc);
	intel_wakeref_t wakeref;
	int srcu, ret;

	/*
	 * Synchronize with gt reset to make sure the worker does not
	 * corrupt the engine/guc stats.
	 */
	ret = intel_gt_reset_trylock(gt, &srcu);
	if (ret)
		return;

	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
		__update_guc_busyness_stats(guc);

	intel_gt_reset_unlock(gt, srcu);

	mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
			 guc->timestamp.ping_delay);
}
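
/*
 * guc_action_enable_usage_stats: H2G action telling GuC where to write
 * the engine usage stats (the offset of the shared usage buffer).
 */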

static int guc_action_enable_usage_stats(struct intel_guc *guc)
{
	u32 offset = intel_guc_engine_usage_offset(guc);
	u32 action[] = {
		INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF,
		offset,
		0,
	};

	return intel_guc_send(guc, action, ARRAY_SIZE(action));
}
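
/*
 * guc_init_engine_stats: start the ping worker and ask GuC (with a
 * runtime PM wakeref held) to begin filling the usage stats buffer;
 * a failure is only logged.
 */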

static void guc_init_engine_stats(struct intel_guc *guc)
{
	struct intel_gt *gt = guc_to_gt(guc);
	intel_wakeref_t wakeref;

	mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
			 guc->timestamp.ping_delay);

	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref) {
		int ret = guc_action_enable_usage_stats(guc);

		if (ret)
			drm_err(&gt->i915->drm,
				"Failed to enable usage stats: %d!\n", ret);
	}
}
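
/*
 * Park hook: cancel the ping worker and, unless we sampled less than
 * half a ping ago, take a final busyness sample before the GT powers
 * down. Bails early if GuC submission was never initialised.
 */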

void intel_guc_busyness_park(struct intel_gt *gt)
{
	struct intel_guc *guc = &gt->uc.guc;

	if (!guc_submission_initialized(guc))
		return;

	cancel_delayed_work(&guc->timestamp.work);

	/*
	 * Before parking, we should sample engine busyness stats if we need to.
	 * We can skip it if we are less than half a ping from the last time we
	 * sampled the busyness stats.
	 */
	if (guc->timestamp.last_stat_jiffies &&
	    !time_after(jiffies, guc->timestamp.last_stat_jiffies +
			(guc->timestamp.ping_delay / 2)))
		return;

	__update_guc_busyness_stats(guc);
}
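
/*
 * Unpark hook: refresh the PM timestamp under the lock and restart the
 * ping worker. Like the park hook, it does nothing if GuC submission
 * was never initialised.
 */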

void intel_guc_busyness_unpark(struct intel_gt *gt)
{
	struct intel_guc *guc = &gt->uc.guc;
	unsigned long flags;
	ktime_t unused;

	if (!guc_submission_initialized(guc))
		return;

	spin_lock_irqsave(&guc->timestamp.lock, flags);
	guc_update_pm_timestamp(guc, &unused);
	spin_unlock_irqrestore(&guc->timestamp.lock, flags);

	mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
			 guc->timestamp.ping_delay);
}
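
/*
 * Submission gating helpers: submission counts as disabled when there
 * is no sched_engine, its tasklet is disabled, or the GT is wedged.
 * disable/enable_submission() flip the tasklet callback accordingly.
 */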

static inline bool
submission_disabled(struct intel_guc *guc)
{
	struct i915_sched_engine * const sched_engine = guc->sched_engine;

	return unlikely(!sched_engine ||
			!__tasklet_is_enabled(&sched_engine->tasklet) ||
			intel_gt_is_wedged(guc_to_gt(guc)));
}

static void disable_submission(struct intel_guc *guc)
{
	struct i915_sched_engine * const sched_engine = guc->sched_engine;

	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
		GEM_BUG_ON(!guc->ct.enabled);
		__tasklet_disable_sync_once(&sched_engine->tasklet);
		sched_engine->tasklet.callback = NULL;
	}
}

static void enable_submission(struct intel_guc *guc)
{
	struct i915_sched_engine * const sched_engine = guc->sched_engine;
	unsigned long flags;

	spin_lock_irqsave(&guc->sched_engine->lock, flags);
	sched_engine->tasklet.callback = guc_submission_tasklet;
	wmb(); /* Make sure callback visible */
	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
	    __tasklet_enable(&sched_engine->tasklet)) {
		GEM_BUG_ON(!guc->ct.enabled);

		/* And kick in case we missed a new request submission. */
		tasklet_hi_schedule(&sched_engine->tasklet);
	}
	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
}
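
/*
 * guc_flush_submissions: taking and immediately releasing the
 * sched_engine lock acts as a barrier against anything currently
 * running under that lock (e.g. the submission tasklet).
 */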

static void guc_flush_submissions(struct intel_guc *guc)
{
	struct i915_sched_engine * const sched_engine = guc->sched_engine;
	unsigned long flags;

	spin_lock_irqsave(&sched_engine->lock, flags);
	spin_unlock_irqrestore(&sched_engine->lock, flags);
}

static void guc_flush_destroyed_contexts(struct intel_guc *guc);
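
/*
 * Reset prepare: park heartbeats, disable the submission tasklet and
 * GuC interrupts, fold the busyness stats, then flush pending
 * submissions, destroyed contexts and CT work before scrubbing
 * outstanding G2H.
 */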
void intel_guc_submission_reset_prepare(struct intel_guc *guc)
{
	if (unlikely(!guc_submission_initialized(guc))) {
		/* Reset called during driver load? GuC not yet initialised! */
		return;
	}

	intel_gt_park_heartbeats(guc_to_gt(guc));
	disable_submission(guc);
	guc->interrupts.disable(guc);
	__reset_guc_busyness_stats(guc);

	/* Flush IRQ handler */
	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);

	guc_flush_submissions(guc);
	guc_flush_destroyed_contexts(guc);
	flush_work(&guc->ct.requests.worker);

	scrub_guc_desc_for_outstanding_g2h(guc);
}
|
|
|
|
|
|
|
|
|
|
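/* Resolve the @sibling'th physical engine backing a virtual engine. */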
static struct intel_engine_cs *
guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
{
	struct intel_engine_cs *engine;
	intel_engine_mask_t tmp, mask = ve->mask;
	unsigned int num_siblings = 0;

	for_each_engine_masked(engine, ve->gt, mask, tmp)
		if (num_siblings++ == sibling)
			return engine;

	return NULL;
}
static inline struct intel_engine_cs *
__context_to_physical_engine(struct intel_context *ce)
{
	struct intel_engine_cs *engine = ce->engine;

	if (intel_engine_is_virtual(engine))
		engine = guc_virtual_get_sibling(engine, 0);

	return engine;
}
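/*
 * Scrub (if requested) and rewind the context image to @head so that only
 * the breadcrumb update is replayed after a reset.
 */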
static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
{
	struct intel_engine_cs *engine = __context_to_physical_engine(ce);

	if (intel_context_is_banned(ce))
		return;

	GEM_BUG_ON(!intel_context_is_pinned(ce));

	/*
	 * We want a simple context + ring to execute the breadcrumb update.
	 * We cannot rely on the context being intact across the GPU hang,
	 * so clear it and rebuild just what we need for the breadcrumb.
	 * All pending requests for this context will be zapped, and any
	 * future request will be after userspace has had the opportunity
	 * to recreate its own state.
	 */
	if (scrub)
		lrc_init_regs(ce, engine, true);

	/* Rerun the request; its payload has been neutered (if guilty). */
	lrc_update_regs(ce, engine, head);
}
static void guc_engine_reset_prepare(struct intel_engine_cs *engine)
{
	if (!IS_GRAPHICS_VER(engine->i915, 11, 12))
		return;

	intel_engine_stop_cs(engine);

	/*
	 * Wa_22011802037:gen11/gen12: In addition to stopping the cs, we need
	 * to wait for any pending mi force wakeups
	 */
	intel_engine_wait_for_pending_mi_fw(engine);
}
static void guc_reset_nop(struct intel_engine_cs *engine)
{
}

static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
{
}
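/*
 * Move all incomplete requests on @ce back onto the scheduler's priority
 * lists so they are resubmitted once submission resumes after a reset.
 */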
static void
__unwind_incomplete_requests(struct intel_context *ce)
{
	struct i915_request *rq, *rn;
	struct list_head *pl;
	int prio = I915_PRIORITY_INVALID;
	struct i915_sched_engine * const sched_engine =
		ce->engine->sched_engine;
	unsigned long flags;

	spin_lock_irqsave(&sched_engine->lock, flags);
	spin_lock(&ce->guc_state.lock);
	list_for_each_entry_safe_reverse(rq, rn,
					 &ce->guc_state.requests,
					 sched.link) {
		if (i915_request_completed(rq))
			continue;

		list_del_init(&rq->sched.link);
		__i915_request_unsubmit(rq);

		/* Push the request back into the queue for later resubmission. */
		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
		if (rq_prio(rq) != prio) {
			prio = rq_prio(rq);
			pl = i915_sched_lookup_priolist(sched_engine, prio);
		}
		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));

		list_add(&rq->sched.link, pl);
		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
	}
	spin_unlock(&ce->guc_state.lock);
	spin_unlock_irqrestore(&sched_engine->lock, flags);
}
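/*
 * Reset a context and, for a parallel submission parent, all of its
 * children: find the hanging request, neuter it if guilty, rewind the ring
 * and unwind any incomplete requests for later resubmission.
 */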
static void __guc_reset_context(struct intel_context *ce, intel_engine_mask_t stalled)
{
	bool guilty;
	struct i915_request *rq;
	unsigned long flags;
	u32 head;
	int i, number_children = ce->parallel.number_children;
	struct intel_context *parent = ce;

	GEM_BUG_ON(intel_context_is_child(ce));

	intel_context_get(ce);

	/*
	 * GuC will implicitly mark the context as non-schedulable when it sends
	 * the reset notification. Make sure our state reflects this change. The
	 * context will be marked enabled on resubmission.
	 */
	spin_lock_irqsave(&ce->guc_state.lock, flags);
	clr_context_enabled(ce);
	spin_unlock_irqrestore(&ce->guc_state.lock, flags);

	/*
	 * For each context in the relationship find the hanging request
	 * resetting each context / request as needed
	 */
	for (i = 0; i < number_children + 1; ++i) {
		if (!intel_context_is_pinned(ce))
			goto next_context;

		guilty = false;
		rq = intel_context_find_active_request(ce);
		if (!rq) {
			head = ce->ring->tail;
			goto out_replay;
		}

		if (i915_request_started(rq))
			guilty = stalled & ce->engine->mask;

		GEM_BUG_ON(i915_active_is_idle(&ce->active));
		head = intel_ring_wrap(ce->ring, rq->head);

		__i915_request_reset(rq, guilty);
out_replay:
		guc_reset_state(ce, head, guilty);
next_context:
		if (i != number_children)
			ce = list_next_entry(ce, parallel.child_link);
	}

	__unwind_incomplete_requests(parent);
	intel_context_put(parent);
}
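/* Walk every registered context and reset those that are pinned parents. */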
void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stalled)
{
	struct intel_context *ce;
	unsigned long index;
	unsigned long flags;

	if (unlikely(!guc_submission_initialized(guc))) {
		/* Reset called during driver load? GuC not yet initialised! */
		return;
	}

	xa_lock_irqsave(&guc->context_lookup, flags);
	xa_for_each(&guc->context_lookup, index, ce) {
		if (!kref_get_unless_zero(&ce->ref))
			continue;

		xa_unlock(&guc->context_lookup);

		if (intel_context_is_pinned(ce) &&
		    !intel_context_is_child(ce))
			__guc_reset_context(ce, stalled);

		intel_context_put(ce);

		xa_lock(&guc->context_lookup);
	}
	xa_unlock_irqrestore(&guc->context_lookup, flags);

	/* GuC is blown away, drop all references to contexts */
	xa_destroy(&guc->context_lookup);
}
static void guc_cancel_context_requests(struct intel_context *ce)
{
	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
	struct i915_request *rq;
	unsigned long flags;

	/* Mark all executing requests as skipped. */
	spin_lock_irqsave(&sched_engine->lock, flags);
	spin_lock(&ce->guc_state.lock);
	list_for_each_entry(rq, &ce->guc_state.requests, sched.link)
		i915_request_put(i915_request_mark_eio(rq));
	spin_unlock(&ce->guc_state.lock);
	spin_unlock_irqrestore(&sched_engine->lock, flags);
}
static void
guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
{
	struct i915_request *rq, *rn;
	struct rb_node *rb;
	unsigned long flags;

	/* Can be called during boot if GuC fails to load */
	if (!sched_engine)
		return;

	/*
	 * Before we call engine->cancel_requests(), we should have exclusive
	 * access to the submission state. This is arranged for us by the
	 * caller disabling the interrupt generation, the tasklet and other
	 * threads that may then access the same state, giving us a free hand
	 * to reset state. However, we still need to let lockdep be aware that
	 * we know this state may be accessed in hardirq context, so we
	 * disable the irq around this manipulation and we want to keep
	 * the spinlock focused on its duties and not accidentally conflate
	 * coverage to the submission's irq state. (Similarly, although we
	 * shouldn't need to disable irq around the manipulation of the
	 * submission's irq state, we also wish to remind ourselves that
	 * it is irq state.)
	 */
	spin_lock_irqsave(&sched_engine->lock, flags);

	/* Flush the queued requests to the timeline list (for retiring). */
	while ((rb = rb_first_cached(&sched_engine->queue))) {
		struct i915_priolist *p = to_priolist(rb);

		priolist_for_each_request_consume(rq, rn, p) {
			list_del_init(&rq->sched.link);

			__i915_request_submit(rq);

			i915_request_put(i915_request_mark_eio(rq));
		}

		rb_erase_cached(&p->node, &sched_engine->queue);
		i915_priolist_free(p);
	}

	/* Remaining _unready_ requests will be nop'ed when submitted */

	sched_engine->queue_priority_hint = INT_MIN;
	sched_engine->queue = RB_ROOT_CACHED;

	spin_unlock_irqrestore(&sched_engine->lock, flags);
}
void intel_guc_submission_cancel_requests(struct intel_guc *guc)
{
	struct intel_context *ce;
	unsigned long index;
	unsigned long flags;

	xa_lock_irqsave(&guc->context_lookup, flags);
	xa_for_each(&guc->context_lookup, index, ce) {
		if (!kref_get_unless_zero(&ce->ref))
			continue;

		xa_unlock(&guc->context_lookup);

		if (intel_context_is_pinned(ce) &&
		    !intel_context_is_child(ce))
			guc_cancel_context_requests(ce);

		intel_context_put(ce);

		xa_lock(&guc->context_lookup);
	}
	xa_unlock_irqrestore(&guc->context_lookup, flags);

	guc_cancel_sched_engine_requests(guc->sched_engine);

	/* GuC is blown away, drop all references to contexts */
	xa_destroy(&guc->context_lookup);
}
void intel_guc_submission_reset_finish(struct intel_guc *guc)
{
	/* Reset called during driver load or during wedge? */
	if (unlikely(!guc_submission_initialized(guc) ||
		     intel_gt_is_wedged(guc_to_gt(guc)))) {
		return;
	}

	/*
	 * Technically possible for either of these values to be non-zero here,
	 * but very unlikely + harmless. Regardless let's add a warn so we can
	 * see in CI if this happens frequently / a precursor to taking down the
	 * machine.
	 */
	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
	atomic_set(&guc->outstanding_submission_g2h, 0);

	intel_guc_global_policies_update(guc);
	enable_submission(guc);
	intel_gt_unpark_heartbeats(guc_to_gt(guc));
}
static void destroyed_worker_func(struct work_struct *w);
static void reset_fail_worker_func(struct work_struct *w);

/*
 * Set up the memory resources to be shared with the GuC (via the GGTT)
 * at firmware loading time.
 */
int intel_guc_submission_init(struct intel_guc *guc)
{
	struct intel_gt *gt = guc_to_gt(guc);
	int ret;

	if (guc->submission_initialized)
		return 0;

	if (guc->fw.major_ver_found < 70) {
		ret = guc_lrc_desc_pool_create_v69(guc);
		if (ret)
			return ret;
	}

	guc->submission_state.guc_ids_bitmap =
		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
	if (!guc->submission_state.guc_ids_bitmap) {
		ret = -ENOMEM;
		goto destroy_pool;
	}

	guc->timestamp.ping_delay = (POLL_TIME_CLKS / gt->clock_frequency + 1) * HZ;
	guc->timestamp.shift = gpm_timestamp_shift(gt);
	guc->submission_initialized = true;

	return 0;

destroy_pool:
	guc_lrc_desc_pool_destroy_v69(guc);

	return ret;
}
void intel_guc_submission_fini(struct intel_guc *guc)
{
	if (!guc->submission_initialized)
		return;

	guc_flush_destroyed_contexts(guc);
	guc_lrc_desc_pool_destroy_v69(guc);
	i915_sched_engine_put(guc->sched_engine);
	bitmap_free(guc->submission_state.guc_ids_bitmap);
	guc->submission_initialized = false;
}
static inline void queue_request(struct i915_sched_engine *sched_engine,
				 struct i915_request *rq,
				 int prio)
{
	GEM_BUG_ON(!list_empty(&rq->sched.link));
	list_add_tail(&rq->sched.link,
		      i915_sched_lookup_priolist(sched_engine, prio));
	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
	tasklet_hi_schedule(&sched_engine->tasklet);
}
static int guc_bypass_tasklet_submit(struct intel_guc *guc,
				     struct i915_request *rq)
{
	int ret = 0;

	__i915_request_submit(rq);

	trace_i915_request_in(rq, 0);

	if (is_multi_lrc_rq(rq)) {
		if (multi_lrc_submit(rq)) {
			ret = guc_wq_item_append(guc, rq);
			if (!ret)
				ret = guc_add_request(guc, rq);
		}
	} else {
		guc_set_lrc_tail(rq);
		ret = guc_add_request(guc, rq);
	}

	if (unlikely(ret == -EPIPE))
		disable_submission(guc);

	return ret;
}
static bool need_tasklet(struct intel_guc *guc, struct i915_request *rq)
{
	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
	struct intel_context *ce = request_to_scheduling_context(rq);

	return submission_disabled(guc) || guc->stalled_request ||
	       !i915_sched_engine_is_empty(sched_engine) ||
	       !ctx_id_mapped(guc, ce->guc_id.id);
}
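/*
 * Submission entry point: take the direct path to the GuC when the tasklet
 * is idle and the context already has a mapped guc_id, otherwise queue the
 * request on the priority lists and let the tasklet submit it.
 */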
static void guc_submit_request(struct i915_request *rq)
{
	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
	struct intel_guc *guc = &rq->engine->gt->uc.guc;
	unsigned long flags;

	/* Will be called from irq-context when using foreign fences. */
	spin_lock_irqsave(&sched_engine->lock, flags);

	if (need_tasklet(guc, rq))
		queue_request(sched_engine, rq, rq_prio(rq));
	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
		tasklet_hi_schedule(&sched_engine->tasklet);

	spin_unlock_irqrestore(&sched_engine->lock, flags);
}
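/*
 * guc_id allocation: parallel (multi-LRC) parents take a contiguous block
 * from the dedicated bitmap so that the children's ids directly follow the
 * parent's, while single-LRC contexts use the ida above that range.
 */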
static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	int ret;

	GEM_BUG_ON(intel_context_is_child(ce));

	if (intel_context_is_parent(ce))
		ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap,
					      NUMBER_MULTI_LRC_GUC_ID(guc),
					      order_base_2(ce->parallel.number_children
							   + 1));
	else
		ret = ida_simple_get(&guc->submission_state.guc_ids,
				     NUMBER_MULTI_LRC_GUC_ID(guc),
				     guc->submission_state.num_guc_ids,
				     GFP_KERNEL | __GFP_RETRY_MAYFAIL |
				     __GFP_NOWARN);
	if (unlikely(ret < 0))
		return ret;

	ce->guc_id.id = ret;
	return 0;
}
static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	GEM_BUG_ON(intel_context_is_child(ce));

	if (!context_guc_id_invalid(ce)) {
		if (intel_context_is_parent(ce))
			bitmap_release_region(guc->submission_state.guc_ids_bitmap,
					      ce->guc_id.id,
					      order_base_2(ce->parallel.number_children
							   + 1));
		else
			ida_simple_remove(&guc->submission_state.guc_ids,
					  ce->guc_id.id);
		clr_ctx_id_mapping(guc, ce->guc_id.id);
		set_context_guc_id_invalid(ce);
	}
	if (!list_empty(&ce->guc_id.link))
		list_del_init(&ce->guc_id.link);
}
static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	unsigned long flags;

	spin_lock_irqsave(&guc->submission_state.lock, flags);
	__release_guc_id(guc, ce);
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
}
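/*
 * No free guc_ids: recycle one from an idle context on the guc_id_list.
 * The victim's id is invalidated and its registration cleared, so it must
 * be re-registered before its next submission.
 */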
static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	struct intel_context *cn;

	lockdep_assert_held(&guc->submission_state.lock);
	GEM_BUG_ON(intel_context_is_child(ce));
	GEM_BUG_ON(intel_context_is_parent(ce));

	if (!list_empty(&guc->submission_state.guc_id_list)) {
		cn = list_first_entry(&guc->submission_state.guc_id_list,
				      struct intel_context,
				      guc_id.link);

		GEM_BUG_ON(atomic_read(&cn->guc_id.ref));
		GEM_BUG_ON(context_guc_id_invalid(cn));
		GEM_BUG_ON(intel_context_is_child(cn));
		GEM_BUG_ON(intel_context_is_parent(cn));

		list_del_init(&cn->guc_id.link);
		ce->guc_id.id = cn->guc_id.id;

		spin_lock(&cn->guc_state.lock);
		clr_context_registered(cn);
		spin_unlock(&cn->guc_state.lock);

		set_context_guc_id_invalid(cn);

#ifdef CONFIG_DRM_I915_SELFTEST
		guc->number_guc_id_stolen++;
#endif

		return 0;
	} else {
		return -EAGAIN;
	}
}
static int assign_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	int ret;

	lockdep_assert_held(&guc->submission_state.lock);
	GEM_BUG_ON(intel_context_is_child(ce));

	ret = new_guc_id(guc, ce);
	if (unlikely(ret < 0)) {
		if (intel_context_is_parent(ce))
			return -ENOSPC;

		ret = steal_guc_id(guc, ce);
		if (ret < 0)
			return ret;
	}

	if (intel_context_is_parent(ce)) {
		struct intel_context *child;
		int i = 1;

		for_each_child(ce, child)
			child->guc_id.id = ce->guc_id.id + i++;
	}

	return 0;
}
#define PIN_GUC_ID_TRIES	4
static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	int ret = 0;
	unsigned long flags, tries = PIN_GUC_ID_TRIES;

	GEM_BUG_ON(atomic_read(&ce->guc_id.ref));

try_again:
	spin_lock_irqsave(&guc->submission_state.lock, flags);

	might_lock(&ce->guc_state.lock);

	if (context_guc_id_invalid(ce)) {
		ret = assign_guc_id(guc, ce);
		if (ret)
			goto out_unlock;
		ret = 1;	/* Indicates newly assigned guc_id */
	}
	if (!list_empty(&ce->guc_id.link))
		list_del_init(&ce->guc_id.link);
	atomic_inc(&ce->guc_id.ref);

out_unlock:
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);

	/*
	 * -EAGAIN indicates no guc_ids are available, let's retire any
	 * outstanding requests to see if that frees up a guc_id. If the first
	 * retire didn't help, insert a sleep with the timeslice duration before
	 * attempting to retire more requests. Double the sleep period each
	 * subsequent pass before finally giving up. The sleep period has max of
	 * 100ms and minimum of 1ms.
	 */
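	/*
	 * For example (illustrative sketch): with a 5 ms timeslice the retry
	 * sequence is retire + retry immediately, then sleep ~5 ms, then
	 * sleep ~10 ms, each sleep clamped to the [1, 100] ms range above.
	 */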
	if (ret == -EAGAIN && --tries) {
		if (PIN_GUC_ID_TRIES - tries > 1) {
			unsigned int timeslice_shifted =
				ce->engine->props.timeslice_duration_ms <<
				(PIN_GUC_ID_TRIES - tries - 2);
			unsigned int max = min_t(unsigned int, 100,
						 timeslice_shifted);

			msleep(max_t(unsigned int, max, 1));
		}
		intel_gt_retire_requests(guc_to_gt(guc));
		goto try_again;
	}

	return ret;
}
static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
{
	unsigned long flags;

	GEM_BUG_ON(atomic_read(&ce->guc_id.ref) < 0);
	GEM_BUG_ON(intel_context_is_child(ce));

	if (unlikely(context_guc_id_invalid(ce) ||
		     intel_context_is_parent(ce)))
		return;

	spin_lock_irqsave(&guc->submission_state.lock, flags);
	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id.link) &&
	    !atomic_read(&ce->guc_id.ref))
		list_add_tail(&ce->guc_id.link,
			      &guc->submission_state.guc_id_list);
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
}
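/*
 * Context registration H2G actions. GuC firmware older than v70 is handed an
 * offset into the lrc descriptor pool, while v70 and newer take an explicit
 * guc_ctxt_registration_info payload; multi-LRC parents additionally describe
 * each child context in the same message.
 */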
static int __guc_action_register_multi_lrc_v69(struct intel_guc *guc,
					       struct intel_context *ce,
					       u32 guc_id,
					       u32 offset,
					       bool loop)
{
	struct intel_context *child;
	u32 action[4 + MAX_ENGINE_INSTANCE];
	int len = 0;

	GEM_BUG_ON(ce->parallel.number_children > MAX_ENGINE_INSTANCE);

	action[len++] = INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC;
	action[len++] = guc_id;
	action[len++] = ce->parallel.number_children + 1;
	action[len++] = offset;
	for_each_child(ce, child) {
		offset += sizeof(struct guc_lrc_desc_v69);
		action[len++] = offset;
	}

	return guc_submission_send_busy_loop(guc, action, len, 0, loop);
}
static int __guc_action_register_multi_lrc_v70(struct intel_guc *guc,
					       struct intel_context *ce,
					       struct guc_ctxt_registration_info *info,
					       bool loop)
{
	struct intel_context *child;
	u32 action[13 + (MAX_ENGINE_INSTANCE * 2)];
	int len = 0;
	u32 next_id;

	GEM_BUG_ON(ce->parallel.number_children > MAX_ENGINE_INSTANCE);

	action[len++] = INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC;
	action[len++] = info->flags;
	action[len++] = info->context_idx;
	action[len++] = info->engine_class;
	action[len++] = info->engine_submit_mask;
	action[len++] = info->wq_desc_lo;
	action[len++] = info->wq_desc_hi;
	action[len++] = info->wq_base_lo;
	action[len++] = info->wq_base_hi;
	action[len++] = info->wq_size;
	action[len++] = ce->parallel.number_children + 1;
	action[len++] = info->hwlrca_lo;
	action[len++] = info->hwlrca_hi;

	next_id = info->context_idx + 1;
	for_each_child(ce, child) {
		GEM_BUG_ON(next_id++ != child->guc_id.id);

		/*
		 * NB: GuC interface supports 64 bit LRCA even though i915/HW
		 * only supports 32 bit currently.
		 */
		action[len++] = lower_32_bits(child->lrc.lrca);
		action[len++] = upper_32_bits(child->lrc.lrca);
	}

	GEM_BUG_ON(len > ARRAY_SIZE(action));

	return guc_submission_send_busy_loop(guc, action, len, 0, loop);
}
static int __guc_action_register_context_v69(struct intel_guc *guc,
					     u32 guc_id,
					     u32 offset,
					     bool loop)
{
	u32 action[] = {
		INTEL_GUC_ACTION_REGISTER_CONTEXT,
		guc_id,
		offset,
	};

	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
					     0, loop);
}
static int __guc_action_register_context_v70(struct intel_guc *guc,
					     struct guc_ctxt_registration_info *info,
					     bool loop)
{
	u32 action[] = {
		INTEL_GUC_ACTION_REGISTER_CONTEXT,
		info->flags,
		info->context_idx,
		info->engine_class,
		info->engine_submit_mask,
		info->wq_desc_lo,
		info->wq_desc_hi,
		info->wq_base_lo,
		info->wq_base_hi,
		info->wq_size,
		info->hwlrca_lo,
		info->hwlrca_hi,
	};

	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
					     0, loop);
}
static void prepare_context_registration_info_v69(struct intel_context *ce);
static void prepare_context_registration_info_v70(struct intel_context *ce,
						   struct guc_ctxt_registration_info *info);
static int
register_context_v69(struct intel_guc *guc, struct intel_context *ce, bool loop)
{
	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool_v69) +
		ce->guc_id.id * sizeof(struct guc_lrc_desc_v69);

	prepare_context_registration_info_v69(ce);

	if (intel_context_is_parent(ce))
		return __guc_action_register_multi_lrc_v69(guc, ce, ce->guc_id.id,
							   offset, loop);
	else
		return __guc_action_register_context_v69(guc, ce->guc_id.id,
							 offset, loop);
}
static int
register_context_v70(struct intel_guc *guc, struct intel_context *ce, bool loop)
{
	struct guc_ctxt_registration_info info;

	prepare_context_registration_info_v70(ce, &info);

	if (intel_context_is_parent(ce))
		return __guc_action_register_multi_lrc_v70(guc, ce, &info, loop);
	else
		return __guc_action_register_context_v70(guc, &info, loop);
}
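/*
 * Common registration entry point; on success the v70+ path also pushes the
 * context's scheduling policy KLVs to the firmware.
 */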
static int register_context(struct intel_context *ce, bool loop)
{
	struct intel_guc *guc = ce_to_guc(ce);
	int ret;

	GEM_BUG_ON(intel_context_is_child(ce));
	trace_intel_context_register(ce);

	if (guc->fw.major_ver_found >= 70)
		ret = register_context_v70(guc, ce, loop);
	else
		ret = register_context_v69(guc, ce, loop);

	if (likely(!ret)) {
		unsigned long flags;

		spin_lock_irqsave(&ce->guc_state.lock, flags);
		set_context_registered(ce);
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);

		if (guc->fw.major_ver_found >= 70)
			guc_context_policy_init_v70(ce, loop);
	}

	return ret;
}
static int __guc_action_deregister_context(struct intel_guc *guc,
					   u32 guc_id)
{
	u32 action[] = {
		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
		guc_id,
	};

	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
					     G2H_LEN_DW_DEREGISTER_CONTEXT,
					     true);
}
static int deregister_context(struct intel_context *ce, u32 guc_id)
{
	struct intel_guc *guc = ce_to_guc(ce);

	GEM_BUG_ON(intel_context_is_child(ce));
	trace_intel_context_deregister(ce);

	return __guc_action_deregister_context(guc, guc_id);
}
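/*
 * Helpers for the parent-scratch memory used by parallel submission: one
 * shared "go" semaphore plus one "join" semaphore per child context.
 */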
static inline void clear_children_join_go_memory(struct intel_context *ce)
{
	struct parent_scratch *ps = __get_parent_scratch(ce);
	int i;

	ps->go.semaphore = 0;
	for (i = 0; i < ce->parallel.number_children + 1; ++i)
		ps->join[i].semaphore = 0;
}

static inline u32 get_children_go_value(struct intel_context *ce)
{
	return __get_parent_scratch(ce)->go.semaphore;
}

static inline u32 get_children_join_value(struct intel_context *ce,
					   u8 child_index)
{
	return __get_parent_scratch(ce)->join[child_index].semaphore;
}
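/*
 * Scheduling policies are sent to v70+ firmware as a batch of key/length/value
 * (KLV) tuples built up in a struct context_policy and flushed with a single
 * H2G action. The MAKE_CONTEXT_POLICY_ADD() macro below generates one append
 * helper per policy, e.g. __guc_context_policy_add_priority(&policy, prio).
 */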
struct context_policy {
	u32 count;
	struct guc_update_context_policy h2g;
};
static u32 __guc_context_policy_action_size(struct context_policy *policy)
{
	size_t bytes = sizeof(policy->h2g.header) +
		       (sizeof(policy->h2g.klv[0]) * policy->count);

	return bytes / sizeof(u32);
}
static void __guc_context_policy_start_klv(struct context_policy *policy, u16 guc_id)
{
	policy->h2g.header.action = INTEL_GUC_ACTION_HOST2GUC_UPDATE_CONTEXT_POLICIES;
	policy->h2g.header.ctx_id = guc_id;
	policy->count = 0;
}
#define MAKE_CONTEXT_POLICY_ADD(func, id) \
static void __guc_context_policy_add_##func(struct context_policy *policy, u32 data) \
{ \
	GEM_BUG_ON(policy->count >= GUC_CONTEXT_POLICIES_KLV_NUM_IDS); \
	policy->h2g.klv[policy->count].kl = \
		FIELD_PREP(GUC_KLV_0_KEY, GUC_CONTEXT_POLICIES_KLV_ID_##id) | \
		FIELD_PREP(GUC_KLV_0_LEN, 1); \
	policy->h2g.klv[policy->count].value = data; \
	policy->count++; \
}

MAKE_CONTEXT_POLICY_ADD(execution_quantum, EXECUTION_QUANTUM)
MAKE_CONTEXT_POLICY_ADD(preemption_timeout, PREEMPTION_TIMEOUT)
MAKE_CONTEXT_POLICY_ADD(priority, SCHEDULING_PRIORITY)
MAKE_CONTEXT_POLICY_ADD(preempt_to_idle, PREEMPT_TO_IDLE_ON_QUANTUM_EXPIRY)

#undef MAKE_CONTEXT_POLICY_ADD
static int __guc_context_set_context_policies(struct intel_guc *guc,
					      struct context_policy *policy,
					      bool loop)
{
	return guc_submission_send_busy_loop(guc, (u32 *)&policy->h2g,
					     __guc_context_policy_action_size(policy),
					     0, loop);
}
static int guc_context_policy_init_v70(struct intel_context *ce, bool loop)
{
	struct intel_engine_cs *engine = ce->engine;
	struct intel_guc *guc = &engine->gt->uc.guc;
	struct context_policy policy;
	u32 execution_quantum;
	u32 preemption_timeout;
	bool missing = false;
	unsigned long flags;
	int ret;

	/* NB: For both of these, zero means disabled. */
	execution_quantum = engine->props.timeslice_duration_ms * 1000;
	preemption_timeout = engine->props.preempt_timeout_ms * 1000;

	__guc_context_policy_start_klv(&policy, ce->guc_id.id);

	__guc_context_policy_add_priority(&policy, ce->guc_state.prio);
	__guc_context_policy_add_execution_quantum(&policy, execution_quantum);
	__guc_context_policy_add_preemption_timeout(&policy, preemption_timeout);

	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
		__guc_context_policy_add_preempt_to_idle(&policy, 1);

	ret = __guc_context_set_context_policies(guc, &policy, loop);
	missing = ret != 0;

	if (!missing && intel_context_is_parent(ce)) {
		struct intel_context *child;

		for_each_child(ce, child) {
			__guc_context_policy_start_klv(&policy, child->guc_id.id);

			if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
				__guc_context_policy_add_preempt_to_idle(&policy, 1);

			child->guc_state.prio = ce->guc_state.prio;
			__guc_context_policy_add_priority(&policy, ce->guc_state.prio);
			__guc_context_policy_add_execution_quantum(&policy, execution_quantum);
			__guc_context_policy_add_preemption_timeout(&policy, preemption_timeout);

			ret = __guc_context_set_context_policies(guc, &policy, loop);
			if (ret) {
				missing = true;
				break;
			}
		}
	}

	spin_lock_irqsave(&ce->guc_state.lock, flags);
	if (missing)
		set_context_policy_required(ce);
	else
		clr_context_policy_required(ce);
	spin_unlock_irqrestore(&ce->guc_state.lock, flags);

	return ret;
}
static void guc_context_policy_init_v69(struct intel_engine_cs *engine,
					struct guc_lrc_desc_v69 *desc)
{
	desc->policy_flags = 0;

	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE_V69;

	/* NB: For both of these, zero means disabled. */
	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
}
static u32 map_guc_prio_to_lrc_desc_prio(u8 prio)
{
	/*
	 * this matches the mapping we do in map_i915_prio_to_guc_prio()
	 * (e.g. prio < I915_PRIORITY_NORMAL maps to GUC_CLIENT_PRIORITY_NORMAL)
	 */
	switch (prio) {
	default:
		MISSING_CASE(prio);
		fallthrough;
	case GUC_CLIENT_PRIORITY_KMD_NORMAL:
		return GEN12_CTX_PRIORITY_NORMAL;
	case GUC_CLIENT_PRIORITY_NORMAL:
		return GEN12_CTX_PRIORITY_LOW;
	case GUC_CLIENT_PRIORITY_HIGH:
	case GUC_CLIENT_PRIORITY_KMD_HIGH:
		return GEN12_CTX_PRIORITY_HIGH;
	}
}
static void prepare_context_registration_info_v69(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
struct intel_engine_cs *engine = ce->engine;
|
|
|
|
|
struct intel_guc *guc = &engine->gt->uc.guc;
|
|
|
|
|
u32 ctx_id = ce->guc_id.id;
|
|
|
|
|
struct guc_lrc_desc_v69 *desc;
|
|
|
|
|
struct intel_context *child;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!engine->mask);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Ensure LRC + CT vmas are in same region as write barrier is done
|
|
|
|
|
* based on CT vma region.
|
|
|
|
|
*/
|
|
|
|
|
GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
|
|
|
|
|
i915_gem_object_is_lmem(ce->ring->vma->obj));
|
|
|
|
|
|
|
|
|
|
desc = __get_lrc_desc_v69(guc, ctx_id);
|
|
|
|
|
desc->engine_class = engine_class_to_guc_class(engine->class);
|
|
|
|
|
desc->engine_submit_mask = engine->logical_mask;
|
|
|
|
|
desc->hw_context_desc = ce->lrc.lrca;
|
|
|
|
|
desc->priority = ce->guc_state.prio;
|
|
|
|
|
desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
|
|
|
|
|
guc_context_policy_init_v69(engine, desc);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If context is a parent, we need to register a process descriptor
|
|
|
|
|
* describing a work queue and register all child contexts.
|
|
|
|
|
*/
|
|
|
|
|
if (intel_context_is_parent(ce)) {
|
|
|
|
|
struct guc_process_desc_v69 *pdesc;
|
|
|
|
|
|
|
|
|
|
ce->parallel.guc.wqi_tail = 0;
|
|
|
|
|
ce->parallel.guc.wqi_head = 0;
|
|
|
|
|
|
|
|
|
|
desc->process_desc = i915_ggtt_offset(ce->state) +
|
|
|
|
|
__get_parent_scratch_offset(ce);
|
|
|
|
|
desc->wq_addr = i915_ggtt_offset(ce->state) +
|
|
|
|
|
__get_wq_offset(ce);
|
|
|
|
|
desc->wq_size = WQ_SIZE;
|
|
|
|
|
|
|
|
|
|
pdesc = __get_process_desc_v69(ce);
|
|
|
|
|
memset(pdesc, 0, sizeof(*(pdesc)));
|
|
|
|
|
pdesc->stage_id = ce->guc_id.id;
|
|
|
|
|
pdesc->wq_base_addr = desc->wq_addr;
|
|
|
|
|
pdesc->wq_size_bytes = desc->wq_size;
|
|
|
|
|
pdesc->wq_status = WQ_STATUS_ACTIVE;
|
|
|
|
|
|
|
|
|
|
ce->parallel.guc.wq_head = &pdesc->head;
|
|
|
|
|
ce->parallel.guc.wq_tail = &pdesc->tail;
|
|
|
|
|
ce->parallel.guc.wq_status = &pdesc->wq_status;
|
|
|
|
|
|
|
|
|
|
for_each_child(ce, child) {
|
|
|
|
|
desc = __get_lrc_desc_v69(guc, child->guc_id.id);
|
|
|
|
|
|
|
|
|
|
desc->engine_class =
|
|
|
|
|
engine_class_to_guc_class(engine->class);
|
|
|
|
|
desc->hw_context_desc = child->lrc.lrca;
|
|
|
|
|
desc->priority = ce->guc_state.prio;
|
|
|
|
|
desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
|
|
|
|
|
guc_context_policy_init_v69(engine, desc);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
clear_children_join_go_memory(ce);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
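/*
 * Fill the registration info handed to the GuC for this context: engine
 * class, submit mask, LRC address and, for a parent context, the work queue
 * used for parallel submission.
 */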
static void prepare_context_registration_info_v70(struct intel_context *ce,
|
|
|
|
|
struct guc_ctxt_registration_info *info)
|
2021-07-21 14:50:49 -07:00
|
|
|
{
|
|
|
|
|
struct intel_engine_cs *engine = ce->engine;
|
|
|
|
|
struct intel_guc *guc = &engine->gt->uc.guc;
|
2022-03-01 16:33:55 -08:00
|
|
|
u32 ctx_id = ce->guc_id.id;
|
2021-07-21 14:50:49 -07:00
|
|
|
|
|
|
|
|
GEM_BUG_ON(!engine->mask);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Ensure LRC + CT vmas are in same region as write barrier is done
|
|
|
|
|
* based on CT vma region.
|
|
|
|
|
*/
|
|
|
|
|
GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
|
|
|
|
|
i915_gem_object_is_lmem(ce->ring->vma->obj));
|
|
|
|
|
|
2022-04-12 15:59:55 -07:00
|
|
|
memset(info, 0, sizeof(*info));
|
|
|
|
|
info->context_idx = ctx_id;
|
|
|
|
|
info->engine_class = engine_class_to_guc_class(engine->class);
|
|
|
|
|
info->engine_submit_mask = engine->logical_mask;
|
|
|
|
|
/*
|
|
|
|
|
* NB: GuC interface supports 64 bit LRCA even though i915/HW
|
|
|
|
|
* only supports 32 bit currently.
|
|
|
|
|
*/
|
|
|
|
|
info->hwlrca_lo = lower_32_bits(ce->lrc.lrca);
|
|
|
|
|
info->hwlrca_hi = upper_32_bits(ce->lrc.lrca);
|
2022-05-04 16:46:36 -07:00
|
|
|
if (engine->flags & I915_ENGINE_HAS_EU_PRIORITY)
|
|
|
|
|
info->hwlrca_lo |= map_guc_prio_to_lrc_desc_prio(ce->guc_state.prio);
|
2022-04-12 15:59:55 -07:00
|
|
|
info->flags = CONTEXT_REGISTRATION_FLAG_KMD;
|
2021-07-21 14:50:49 -07:00
|
|
|
|
2021-10-14 10:19:48 -07:00
|
|
|
/*
|
|
|
|
|
* If context is a parent, we need to register a process descriptor
|
|
|
|
|
* describing a work queue and register all child contexts.
|
|
|
|
|
*/
|
|
|
|
|
if (intel_context_is_parent(ce)) {
|
2022-04-12 15:59:55 -07:00
|
|
|
struct guc_sched_wq_desc *wq_desc;
|
|
|
|
|
u64 wq_desc_offset, wq_base_offset;
|
2021-10-14 10:19:48 -07:00
|
|
|
|
|
|
|
|
ce->parallel.guc.wqi_tail = 0;
|
|
|
|
|
ce->parallel.guc.wqi_head = 0;
|
|
|
|
|
|
2022-04-12 15:59:55 -07:00
|
|
|
wq_desc_offset = i915_ggtt_offset(ce->state) +
|
|
|
|
|
__get_parent_scratch_offset(ce);
|
|
|
|
|
wq_base_offset = i915_ggtt_offset(ce->state) +
|
|
|
|
|
__get_wq_offset(ce);
|
|
|
|
|
info->wq_desc_lo = lower_32_bits(wq_desc_offset);
|
|
|
|
|
info->wq_desc_hi = upper_32_bits(wq_desc_offset);
|
|
|
|
|
info->wq_base_lo = lower_32_bits(wq_base_offset);
|
|
|
|
|
info->wq_base_hi = upper_32_bits(wq_base_offset);
|
|
|
|
|
info->wq_size = WQ_SIZE;
|
2021-10-14 10:19:48 -07:00
|
|
|
|
2022-07-18 16:07:32 -07:00
|
|
|
wq_desc = __get_wq_desc_v70(ce);
|
2022-04-12 15:59:55 -07:00
|
|
|
memset(wq_desc, 0, sizeof(*wq_desc));
|
|
|
|
|
wq_desc->wq_status = WQ_STATUS_ACTIVE;
|
2021-10-14 10:19:59 -07:00
|
|
|
|
2022-07-18 16:07:32 -07:00
|
|
|
ce->parallel.guc.wq_head = &wq_desc->head;
|
|
|
|
|
ce->parallel.guc.wq_tail = &wq_desc->tail;
|
|
|
|
|
ce->parallel.guc.wq_status = &wq_desc->wq_status;
|
|
|
|
|
|
2021-10-14 10:19:59 -07:00
|
|
|
clear_children_join_go_memory(ce);
|
2021-10-14 10:19:48 -07:00
|
|
|
}
|
2022-03-01 16:33:53 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
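/*
 * Register the context with the GuC, first deregistering any stale mapping
 * if the guc_id was stolen or the LRC address changed. Returns 0 with the
 * registration deferred while a reset is in progress.
 */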
static int try_context_registration(struct intel_context *ce, bool loop)
|
|
|
|
|
{
|
|
|
|
|
struct intel_engine_cs *engine = ce->engine;
|
|
|
|
|
struct intel_runtime_pm *runtime_pm = engine->uncore->rpm;
|
|
|
|
|
struct intel_guc *guc = &engine->gt->uc.guc;
|
|
|
|
|
intel_wakeref_t wakeref;
|
2022-03-01 16:33:55 -08:00
|
|
|
u32 ctx_id = ce->guc_id.id;
|
2022-03-01 16:33:53 -08:00
|
|
|
bool context_registered;
|
|
|
|
|
int ret = 0;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!sched_state_is_init(ce));
|
|
|
|
|
|
2022-03-01 16:33:55 -08:00
|
|
|
context_registered = ctx_id_mapped(guc, ctx_id);
|
2022-03-01 16:33:53 -08:00
|
|
|
|
2022-03-01 16:33:55 -08:00
|
|
|
clr_ctx_id_mapping(guc, ctx_id);
|
|
|
|
|
set_ctx_id_mapping(guc, ctx_id, ce);
|
2021-10-14 10:19:48 -07:00
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
/*
|
|
|
|
|
* The context_lookup xarray is used to determine if the hardware
|
|
|
|
|
* context is currently registered. There are two cases in which it
|
|
|
|
|
* could be registered either the guc_id has been stolen from another
|
|
|
|
|
* context or the lrc descriptor address of this context has changed. In
|
|
|
|
|
* either case the context needs to be deregistered with the GuC before
|
|
|
|
|
* registering this context.
|
|
|
|
|
*/
|
|
|
|
|
if (context_registered) {
|
2021-09-09 09:47:23 -07:00
|
|
|
bool disabled;
|
|
|
|
|
unsigned long flags;
|
|
|
|
|
|
2021-07-21 14:51:01 -07:00
|
|
|
trace_intel_context_steal_guc_id(ce);
|
2021-09-09 09:47:23 -07:00
|
|
|
GEM_BUG_ON(!loop);
|
|
|
|
|
|
|
|
|
|
/* Seal race with Reset */
|
|
|
|
|
spin_lock_irqsave(&ce->guc_state.lock, flags);
|
|
|
|
|
disabled = submission_disabled(guc);
|
|
|
|
|
if (likely(!disabled)) {
|
2021-07-26 17:23:23 -07:00
|
|
|
set_context_wait_for_deregister_to_register(ce);
|
|
|
|
|
intel_context_get(ce);
|
2021-09-09 09:47:23 -07:00
|
|
|
}
|
|
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
if (unlikely(disabled)) {
|
2022-03-01 16:33:55 -08:00
|
|
|
clr_ctx_id_mapping(guc, ctx_id);
|
2021-09-09 09:47:23 -07:00
|
|
|
return 0; /* Will get registered later */
|
2021-07-26 17:23:23 -07:00
|
|
|
}
|
2021-07-21 14:50:49 -07:00
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If stealing the guc_id, this ce has the same guc_id as the
|
|
|
|
|
* context whose guc_id was stolen.
|
|
|
|
|
*/
|
|
|
|
|
with_intel_runtime_pm(runtime_pm, wakeref)
|
2021-09-09 09:47:42 -07:00
|
|
|
ret = deregister_context(ce, ce->guc_id.id);
|
2021-09-09 09:47:23 -07:00
|
|
|
if (unlikely(ret == -ENODEV))
|
2021-07-26 17:23:23 -07:00
|
|
|
ret = 0; /* Will get registered later */
|
2021-07-21 14:50:49 -07:00
|
|
|
} else {
|
|
|
|
|
with_intel_runtime_pm(runtime_pm, wakeref)
|
2021-07-26 17:23:23 -07:00
|
|
|
ret = register_context(ce, loop);
|
2021-09-09 09:47:35 -07:00
|
|
|
if (unlikely(ret == -EBUSY)) {
|
2022-03-01 16:33:55 -08:00
|
|
|
clr_ctx_id_mapping(guc, ctx_id);
|
2021-09-09 09:47:35 -07:00
|
|
|
} else if (unlikely(ret == -ENODEV)) {
|
2022-03-01 16:33:55 -08:00
|
|
|
clr_ctx_id_mapping(guc, ctx_id);
|
2021-07-26 17:23:23 -07:00
|
|
|
ret = 0; /* Will get registered later */
|
2021-09-09 09:47:35 -07:00
|
|
|
}
|
2021-07-21 14:50:49 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return ret;
|
2021-01-12 18:12:35 -08:00
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
static int __guc_context_pre_pin(struct intel_context *ce,
|
|
|
|
|
struct intel_engine_cs *engine,
|
|
|
|
|
struct i915_gem_ww_ctx *ww,
|
|
|
|
|
void **vaddr)
|
2021-01-12 18:12:35 -08:00
|
|
|
{
|
2021-07-26 17:23:16 -07:00
|
|
|
return lrc_pre_pin(ce, engine, ww, vaddr);
|
2021-01-12 18:12:35 -08:00
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
static int __guc_context_pin(struct intel_context *ce,
|
|
|
|
|
struct intel_engine_cs *engine,
|
|
|
|
|
void *vaddr)
|
2021-01-12 18:12:35 -08:00
|
|
|
{
|
2021-07-21 14:50:49 -07:00
|
|
|
if (i915_ggtt_offset(ce->state) !=
|
|
|
|
|
(ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
|
|
|
|
|
set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* GuC context gets pinned in guc_request_alloc. See that function for
|
|
|
|
|
* explanation of why.
|
|
|
|
|
*/
|
|
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
return lrc_pin(ce, engine, vaddr);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static int guc_context_pre_pin(struct intel_context *ce,
			       struct i915_gem_ww_ctx *ww,
			       void **vaddr)
{
	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
}
|
|
|
|
|
|
|
|
|
|
static int guc_context_pin(struct intel_context *ce, void *vaddr)
{
	int ret = __guc_context_pin(ce, ce->engine, vaddr);

	if (likely(!ret && !intel_context_is_barrier(ce)))
		intel_engine_pm_get(ce->engine);

	return ret;
}
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
static void guc_context_unpin(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
|
|
|
|
|
unpin_guc_id(guc, ce);
|
|
|
|
|
lrc_unpin(ce);
|
2021-10-14 10:19:43 -07:00
|
|
|
|
|
|
|
|
if (likely(!intel_context_is_barrier(ce)))
|
|
|
|
|
intel_engine_pm_put_async(ce->engine);
|
2021-07-21 14:50:49 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_context_post_unpin(struct intel_context *ce)
{
	lrc_post_unpin(ce);
}
|
|
|
|
|
|
2021-07-26 17:23:40 -07:00
|
|
|
static void __guc_context_sched_enable(struct intel_guc *guc,
|
|
|
|
|
struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
u32 action[] = {
|
|
|
|
|
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
|
2021-09-09 09:47:42 -07:00
|
|
|
ce->guc_id.id,
|
2021-07-26 17:23:40 -07:00
|
|
|
GUC_CONTEXT_ENABLE
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
trace_intel_context_sched_enable(ce);
|
|
|
|
|
|
|
|
|
|
guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
|
|
|
|
|
G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-21 14:50:51 -07:00
|
|
|
static void __guc_context_sched_disable(struct intel_guc *guc,
|
|
|
|
|
struct intel_context *ce,
|
|
|
|
|
u16 guc_id)
|
|
|
|
|
{
|
|
|
|
|
u32 action[] = {
|
|
|
|
|
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
|
2021-09-09 09:47:42 -07:00
|
|
|
guc_id, /* ce->guc_id.id not stable */
|
2021-07-21 14:50:51 -07:00
|
|
|
GUC_CONTEXT_DISABLE
|
|
|
|
|
};
|
|
|
|
|
|
2022-03-01 16:33:52 -08:00
|
|
|
GEM_BUG_ON(guc_id == GUC_INVALID_CONTEXT_ID);
|
2021-07-21 14:50:51 -07:00
|
|
|
|
2021-10-14 10:19:49 -07:00
|
|
|
GEM_BUG_ON(intel_context_is_child(ce));
|
2021-07-21 14:51:01 -07:00
|
|
|
trace_intel_context_sched_disable(ce);
|
2021-07-21 14:50:51 -07:00
|
|
|
|
2021-07-21 14:50:58 -07:00
|
|
|
guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
|
|
|
|
|
G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
|
2021-07-21 14:50:51 -07:00
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:40 -07:00
|
|
|
static void guc_blocked_fence_complete(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
|
|
|
|
|
2021-09-09 09:47:37 -07:00
|
|
|
if (!i915_sw_fence_done(&ce->guc_state.blocked))
|
|
|
|
|
i915_sw_fence_complete(&ce->guc_state.blocked);
|
2021-07-26 17:23:40 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_blocked_fence_reinit(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
2021-09-09 09:47:37 -07:00
|
|
|
GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_state.blocked));
|
2021-07-26 17:23:40 -07:00
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* This fence is always complete unless a pending schedule disable is
|
|
|
|
|
* outstanding. We arm the fence here and complete it when we receive
|
|
|
|
|
* the pending schedule disable complete message.
|
|
|
|
|
*/
|
2021-09-09 09:47:37 -07:00
|
|
|
i915_sw_fence_fini(&ce->guc_state.blocked);
|
|
|
|
|
i915_sw_fence_reinit(&ce->guc_state.blocked);
|
|
|
|
|
i915_sw_fence_await(&ce->guc_state.blocked);
|
|
|
|
|
i915_sw_fence_commit(&ce->guc_state.blocked);
|
2021-07-26 17:23:40 -07:00
|
|
|
}
|
|
|
|
|
|
2021-07-21 14:50:51 -07:00
|
|
|
static u16 prep_context_pending_disable(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
|
|
|
|
|
|
|
|
|
set_context_pending_disable(ce);
|
|
|
|
|
clr_context_enabled(ce);
|
2021-07-26 17:23:40 -07:00
|
|
|
guc_blocked_fence_reinit(ce);
|
2021-07-26 17:23:23 -07:00
|
|
|
intel_context_get(ce);
|
2021-07-21 14:50:51 -07:00
|
|
|
|
2021-09-09 09:47:42 -07:00
|
|
|
return ce->guc_id.id;
|
2021-07-21 14:50:51 -07:00
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:40 -07:00
|
|
|
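/*
 * Block submission on this context by disabling scheduling in the GuC. The
 * returned fence signals once the schedule-disable completes (immediately if
 * scheduling was not enabled).
 */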
static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
unsigned long flags;
|
|
|
|
|
struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm;
|
|
|
|
|
intel_wakeref_t wakeref;
|
|
|
|
|
u16 guc_id;
|
|
|
|
|
bool enabled;
|
|
|
|
|
|
2021-10-14 10:19:54 -07:00
|
|
|
GEM_BUG_ON(intel_context_is_child(ce));
|
|
|
|
|
|
2021-07-26 17:23:40 -07:00
|
|
|
spin_lock_irqsave(&ce->guc_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
incr_context_blocked(ce);
|
|
|
|
|
|
|
|
|
|
enabled = context_enabled(ce);
|
|
|
|
|
if (unlikely(!enabled || submission_disabled(guc))) {
|
|
|
|
|
if (enabled)
|
|
|
|
|
clr_context_enabled(ce);
|
|
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
2021-09-09 09:47:37 -07:00
|
|
|
return &ce->guc_state.blocked;
|
2021-07-26 17:23:40 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* We add +2 here as the schedule disable complete CTB handler calls
|
|
|
|
|
* intel_context_sched_disable_unpin (-2 to pin_count).
|
|
|
|
|
*/
|
|
|
|
|
atomic_add(2, &ce->pin_count);
|
|
|
|
|
|
|
|
|
|
guc_id = prep_context_pending_disable(ce);
|
|
|
|
|
|
|
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
with_intel_runtime_pm(runtime_pm, wakeref)
|
|
|
|
|
__guc_context_sched_disable(guc, ce, guc_id);
|
|
|
|
|
|
2021-09-09 09:47:37 -07:00
|
|
|
return &ce->guc_state.blocked;
|
2021-07-26 17:23:40 -07:00
|
|
|
}
|
|
|
|
|
|
2021-09-09 09:47:30 -07:00
|
|
|
#define SCHED_STATE_MULTI_BLOCKED_MASK \
|
|
|
|
|
(SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)
|
|
|
|
|
#define SCHED_STATE_NO_UNBLOCK \
|
|
|
|
|
(SCHED_STATE_MULTI_BLOCKED_MASK | \
|
|
|
|
|
SCHED_STATE_PENDING_DISABLE | \
|
|
|
|
|
SCHED_STATE_BANNED)
|
|
|
|
|
|
|
|
|
|
static bool context_cant_unblock(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
|
|
|
|
|
|
|
|
|
return (ce->guc_state.sched_state & SCHED_STATE_NO_UNBLOCK) ||
|
|
|
|
|
context_guc_id_invalid(ce) ||
|
2022-03-01 16:33:50 -08:00
|
|
|
!ctx_id_mapped(ce_to_guc(ce), ce->guc_id.id) ||
|
2021-09-09 09:47:30 -07:00
|
|
|
!intel_context_is_pinned(ce);
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:40 -07:00
|
|
|
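/*
 * Counterpart to guc_context_block(): drop the blocked count and re-enable
 * scheduling in the GuC once nothing prevents unblocking.
 */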
static void guc_context_unblock(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
unsigned long flags;
|
|
|
|
|
struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm;
|
|
|
|
|
intel_wakeref_t wakeref;
|
|
|
|
|
bool enable;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(context_enabled(ce));
|
2021-10-14 10:19:54 -07:00
|
|
|
GEM_BUG_ON(intel_context_is_child(ce));
|
2021-07-26 17:23:40 -07:00
|
|
|
|
|
|
|
|
spin_lock_irqsave(&ce->guc_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
if (unlikely(submission_disabled(guc) ||
|
2021-09-09 09:47:30 -07:00
|
|
|
context_cant_unblock(ce))) {
|
2021-07-26 17:23:40 -07:00
|
|
|
enable = false;
|
|
|
|
|
} else {
|
|
|
|
|
enable = true;
|
|
|
|
|
set_context_pending_enable(ce);
|
|
|
|
|
set_context_enabled(ce);
|
|
|
|
|
intel_context_get(ce);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
decr_context_blocked(ce);
|
|
|
|
|
|
|
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
if (enable) {
|
|
|
|
|
with_intel_runtime_pm(runtime_pm, wakeref)
|
|
|
|
|
__guc_context_sched_enable(guc, ce);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_context_cancel_request(struct intel_context *ce,
|
|
|
|
|
struct i915_request *rq)
|
|
|
|
|
{
|
2021-10-14 10:19:54 -07:00
|
|
|
struct intel_context *block_context =
|
|
|
|
|
request_to_scheduling_context(rq);
|
|
|
|
|
|
2021-07-26 17:23:40 -07:00
|
|
|
if (i915_sw_fence_signaled(&rq->submit)) {
|
2021-09-09 09:47:33 -07:00
|
|
|
struct i915_sw_fence *fence;
|
2021-07-26 17:23:40 -07:00
|
|
|
|
2021-09-09 09:47:33 -07:00
|
|
|
intel_context_get(ce);
|
2021-10-14 10:19:54 -07:00
|
|
|
fence = guc_context_block(block_context);
|
2021-07-26 17:23:40 -07:00
|
|
|
i915_sw_fence_wait(fence);
|
|
|
|
|
if (!i915_request_completed(rq)) {
|
|
|
|
|
__i915_request_skip(rq);
|
|
|
|
|
guc_reset_state(ce, intel_ring_wrap(ce->ring, rq->head),
|
|
|
|
|
true);
|
|
|
|
|
}
|
2021-09-09 09:47:27 -07:00
|
|
|
|
2021-10-14 10:19:54 -07:00
|
|
|
guc_context_unblock(block_context);
|
2021-09-09 09:47:33 -07:00
|
|
|
intel_context_put(ce);
|
2021-07-26 17:23:40 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:39 -07:00
|
|
|
static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
|
|
|
|
|
u16 guc_id,
|
|
|
|
|
u32 preemption_timeout)
|
|
|
|
|
{
|
2022-07-18 16:07:32 -07:00
|
|
|
if (guc->fw.major_ver_found >= 70) {
|
|
|
|
|
struct context_policy policy;
|
2021-07-26 17:23:39 -07:00
|
|
|
|
2022-07-18 16:07:32 -07:00
|
|
|
__guc_context_policy_start_klv(&policy, guc_id);
|
|
|
|
|
__guc_context_policy_add_preemption_timeout(&policy, preemption_timeout);
|
|
|
|
|
__guc_context_set_context_policies(guc, &policy, true);
|
|
|
|
|
} else {
|
|
|
|
|
u32 action[] = {
|
|
|
|
|
INTEL_GUC_ACTION_V69_SET_CONTEXT_PREEMPTION_TIMEOUT,
|
|
|
|
|
guc_id,
|
|
|
|
|
preemption_timeout
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
|
|
|
|
|
}
|
2021-07-26 17:23:39 -07:00
|
|
|
}
|
|
|
|
|
|
2022-05-27 08:24:52 +01:00
|
|
|
static void
|
|
|
|
|
guc_context_revoke(struct intel_context *ce, struct i915_request *rq,
|
|
|
|
|
unsigned int preempt_timeout_ms)
|
2021-07-26 17:23:39 -07:00
|
|
|
{
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
struct intel_runtime_pm *runtime_pm =
|
|
|
|
|
&ce->engine->gt->i915->runtime_pm;
|
|
|
|
|
intel_wakeref_t wakeref;
|
|
|
|
|
unsigned long flags;
|
|
|
|
|
|
2021-10-14 10:19:54 -07:00
|
|
|
GEM_BUG_ON(intel_context_is_child(ce));
|
|
|
|
|
|
2021-07-26 17:23:39 -07:00
|
|
|
guc_flush_submissions(guc);
|
|
|
|
|
|
|
|
|
|
spin_lock_irqsave(&ce->guc_state.lock, flags);
|
|
|
|
|
set_context_banned(ce);
|
|
|
|
|
|
|
|
|
|
if (submission_disabled(guc) ||
|
|
|
|
|
(!context_enabled(ce) && !context_pending_disable(ce))) {
|
|
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
guc_cancel_context_requests(ce);
|
|
|
|
|
intel_engine_signal_breadcrumbs(ce->engine);
|
|
|
|
|
} else if (!context_pending_disable(ce)) {
|
|
|
|
|
u16 guc_id;
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* We add +2 here as the schedule disable complete CTB handler
|
|
|
|
|
* calls intel_context_sched_disable_unpin (-2 to pin_count).
|
|
|
|
|
*/
|
|
|
|
|
atomic_add(2, &ce->pin_count);
|
|
|
|
|
|
|
|
|
|
guc_id = prep_context_pending_disable(ce);
|
|
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* In addition to disabling scheduling, set the preemption
|
|
|
|
|
* timeout to the minimum value (1 us) so the banned context
|
|
|
|
|
* gets kicked off the HW ASAP.
|
|
|
|
|
*/
|
|
|
|
|
with_intel_runtime_pm(runtime_pm, wakeref) {
|
2022-05-27 08:24:52 +01:00
|
|
|
__guc_context_set_preemption_timeout(guc, guc_id,
|
|
|
|
|
preempt_timeout_ms);
|
2021-07-26 17:23:39 -07:00
|
|
|
__guc_context_sched_disable(guc, ce, guc_id);
|
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
if (!context_guc_id_invalid(ce))
|
|
|
|
|
with_intel_runtime_pm(runtime_pm, wakeref)
|
|
|
|
|
__guc_context_set_preemption_timeout(guc,
|
2021-09-09 09:47:42 -07:00
|
|
|
ce->guc_id.id,
|
2022-05-27 08:24:52 +01:00
|
|
|
preempt_timeout_ms);
|
2021-07-26 17:23:39 -07:00
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-21 14:50:51 -07:00
|
|
|
static void guc_context_sched_disable(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
unsigned long flags;
|
2021-07-26 17:23:39 -07:00
|
|
|
struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
|
2021-07-21 14:50:51 -07:00
|
|
|
intel_wakeref_t wakeref;
|
2021-07-26 17:23:39 -07:00
|
|
|
u16 guc_id;
|
2021-07-21 14:50:51 -07:00
|
|
|
|
2021-10-14 10:19:49 -07:00
|
|
|
GEM_BUG_ON(intel_context_is_child(ce));
|
|
|
|
|
|
2021-07-21 14:50:51 -07:00
|
|
|
spin_lock_irqsave(&ce->guc_state.lock, flags);
|
2021-07-21 14:50:53 -07:00
|
|
|
|
|
|
|
|
/*
|
2021-09-09 09:47:40 -07:00
|
|
|
* We have to check if the context has been disabled by another thread,
|
|
|
|
|
* check if submission has been disabled to seal a race with reset and
|
|
|
|
|
* finally check if any more requests have been committed to the
|
|
|
|
|
* context ensuring that a request doesn't slip through the
|
|
|
|
|
* 'context_pending_disable' fence.
|
2021-07-21 14:50:53 -07:00
|
|
|
*/
|
2021-09-09 09:47:40 -07:00
|
|
|
if (unlikely(!context_enabled(ce) || submission_disabled(guc) ||
|
|
|
|
|
context_has_committed_requests(ce))) {
|
2021-09-09 09:47:38 -07:00
|
|
|
clr_context_enabled(ce);
|
2021-07-26 17:23:39 -07:00
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
goto unpin;
|
|
|
|
|
}
|
2021-07-21 14:50:51 -07:00
|
|
|
guc_id = prep_context_pending_disable(ce);
|
2021-07-21 14:50:53 -07:00
|
|
|
|
2021-07-21 14:50:51 -07:00
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
with_intel_runtime_pm(runtime_pm, wakeref)
|
|
|
|
|
__guc_context_sched_disable(guc, ce, guc_id);
|
|
|
|
|
|
|
|
|
|
return;
|
|
|
|
|
unpin:
|
|
|
|
|
intel_context_sched_disable_unpin(ce);
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
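/*
 * Tear down the GuC's view of this context: mark it destroyed, take a GT PM
 * reference for the in-flight H2G and send the deregistration request.
 */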
static inline void guc_lrc_desc_unpin(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
2021-10-14 10:19:42 -07:00
|
|
|
struct intel_gt *gt = guc_to_gt(guc);
|
|
|
|
|
unsigned long flags;
|
|
|
|
|
bool disabled;
|
2021-07-21 14:50:49 -07:00
|
|
|
|
2021-10-14 10:19:42 -07:00
|
|
|
GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
|
2022-03-01 16:33:50 -08:00
|
|
|
GEM_BUG_ON(!ctx_id_mapped(guc, ce->guc_id.id));
|
2021-09-09 09:47:42 -07:00
|
|
|
GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
|
2021-07-21 14:50:51 -07:00
|
|
|
GEM_BUG_ON(context_enabled(ce));
|
2021-07-21 14:50:49 -07:00
|
|
|
|
2021-10-14 10:19:42 -07:00
|
|
|
/* Seal race with Reset */
|
|
|
|
|
spin_lock_irqsave(&ce->guc_state.lock, flags);
|
|
|
|
|
disabled = submission_disabled(guc);
|
|
|
|
|
if (likely(!disabled)) {
|
|
|
|
|
__intel_gt_pm_get(gt);
|
|
|
|
|
set_context_destroyed(ce);
|
|
|
|
|
clr_context_registered(ce);
|
|
|
|
|
}
|
|
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
if (unlikely(disabled)) {
|
2021-12-14 09:04:57 -08:00
|
|
|
release_guc_id(guc, ce);
|
2021-10-14 10:19:42 -07:00
|
|
|
__guc_context_destroy(ce);
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
2021-09-09 09:47:42 -07:00
|
|
|
deregister_context(ce, ce->guc_id.id);
|
2021-07-21 14:50:49 -07:00
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
static void __guc_context_destroy(struct intel_context *ce)
|
|
|
|
|
{
|
2021-09-09 09:47:43 -07:00
|
|
|
GEM_BUG_ON(ce->guc_state.prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] ||
|
|
|
|
|
ce->guc_state.prio_count[GUC_CLIENT_PRIORITY_HIGH] ||
|
|
|
|
|
ce->guc_state.prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] ||
|
|
|
|
|
ce->guc_state.prio_count[GUC_CLIENT_PRIORITY_NORMAL]);
|
2021-09-09 09:47:40 -07:00
|
|
|
GEM_BUG_ON(ce->guc_state.number_committed_requests);
|
2021-07-26 17:23:47 -07:00
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
lrc_fini(ce);
|
|
|
|
|
intel_context_fini(ce);
|
|
|
|
|
|
|
|
|
|
if (intel_engine_is_virtual(ce->engine)) {
|
|
|
|
|
struct guc_virtual_engine *ve =
|
|
|
|
|
container_of(ce, typeof(*ve), context);
|
|
|
|
|
|
2021-07-26 17:23:20 -07:00
|
|
|
if (ve->base.breadcrumbs)
|
|
|
|
|
intel_breadcrumbs_put(ve->base.breadcrumbs);
|
|
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
kfree(ve);
|
|
|
|
|
} else {
|
|
|
|
|
intel_context_free(ce);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2021-10-14 10:19:42 -07:00
|
|
|
static void guc_flush_destroyed_contexts(struct intel_guc *guc)
|
|
|
|
|
{
|
2021-12-14 09:04:57 -08:00
|
|
|
struct intel_context *ce;
|
2021-10-14 10:19:42 -07:00
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!submission_disabled(guc) &&
|
|
|
|
|
guc_submission_initialized(guc));
|
|
|
|
|
|
2021-12-14 09:04:57 -08:00
|
|
|
while (!list_empty(&guc->submission_state.destroyed_contexts)) {
|
|
|
|
|
spin_lock_irqsave(&guc->submission_state.lock, flags);
|
|
|
|
|
ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
|
|
|
|
|
struct intel_context,
|
|
|
|
|
destroyed_link);
|
|
|
|
|
if (ce)
|
|
|
|
|
list_del_init(&ce->destroyed_link);
|
|
|
|
|
spin_unlock_irqrestore(&guc->submission_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
if (!ce)
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
release_guc_id(guc, ce);
|
2021-10-14 10:19:42 -07:00
|
|
|
__guc_context_destroy(ce);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void deregister_destroyed_contexts(struct intel_guc *guc)
|
|
|
|
|
{
|
2021-12-14 09:04:57 -08:00
|
|
|
struct intel_context *ce;
|
2021-10-14 10:19:42 -07:00
|
|
|
unsigned long flags;
|
|
|
|
|
|
2021-12-14 09:04:57 -08:00
|
|
|
while (!list_empty(&guc->submission_state.destroyed_contexts)) {
|
|
|
|
|
spin_lock_irqsave(&guc->submission_state.lock, flags);
|
|
|
|
|
ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
|
|
|
|
|
struct intel_context,
|
|
|
|
|
destroyed_link);
|
|
|
|
|
if (ce)
|
|
|
|
|
list_del_init(&ce->destroyed_link);
|
|
|
|
|
spin_unlock_irqrestore(&guc->submission_state.lock, flags);
|
|
|
|
|
|
|
|
|
|
if (!ce)
|
|
|
|
|
break;
|
|
|
|
|
|
2021-10-14 10:19:42 -07:00
|
|
|
guc_lrc_desc_unpin(ce);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void destroyed_worker_func(struct work_struct *w)
{
	struct intel_guc *guc = container_of(w, struct intel_guc,
					     submission_state.destroyed_worker);
	struct intel_gt *gt = guc_to_gt(guc);
	int tmp;

	with_intel_gt_pm(gt, tmp)
		deregister_destroyed_contexts(guc);
}
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
static void guc_context_destroy(struct kref *kref)
|
|
|
|
|
{
|
|
|
|
|
struct intel_context *ce = container_of(kref, typeof(*ce), ref);
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
unsigned long flags;
|
2021-10-14 10:19:42 -07:00
|
|
|
bool destroy;
|
2021-07-21 14:50:49 -07:00
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If the guc_id is invalid this context has been stolen and we can free
|
|
|
|
|
* it immediately. Also can be freed immediately if the context is not
|
2021-07-26 17:23:23 -07:00
|
|
|
* registered with the GuC or the GuC is in the middle of a reset.
|
2021-07-21 14:50:49 -07:00
|
|
|
*/
|
2021-10-14 10:19:41 -07:00
|
|
|
spin_lock_irqsave(&guc->submission_state.lock, flags);
|
2021-10-14 10:19:42 -07:00
|
|
|
destroy = submission_disabled(guc) || context_guc_id_invalid(ce) ||
|
2022-03-01 16:33:50 -08:00
|
|
|
!ctx_id_mapped(guc, ce->guc_id.id);
|
2021-10-14 10:19:42 -07:00
|
|
|
if (likely(!destroy)) {
|
|
|
|
|
if (!list_empty(&ce->guc_id.link))
|
|
|
|
|
list_del_init(&ce->guc_id.link);
|
|
|
|
|
list_add_tail(&ce->destroyed_link,
|
|
|
|
|
&guc->submission_state.destroyed_contexts);
|
|
|
|
|
} else {
|
|
|
|
|
__release_guc_id(guc, ce);
|
2021-07-21 14:50:49 -07:00
|
|
|
}
|
2021-10-14 10:19:41 -07:00
|
|
|
spin_unlock_irqrestore(&guc->submission_state.lock, flags);
|
2021-10-14 10:19:42 -07:00
|
|
|
if (unlikely(destroy)) {
|
2021-07-26 17:23:23 -07:00
|
|
|
__guc_context_destroy(ce);
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
/*
|
2021-10-14 10:19:42 -07:00
|
|
|
* We use a worker to issue the H2G to deregister the context as we can
|
|
|
|
|
* take the GT PM for the first time which isn't allowed from an atomic
|
|
|
|
|
* context.
|
2021-07-21 14:50:49 -07:00
|
|
|
*/
|
2021-10-14 10:19:42 -07:00
|
|
|
queue_work(system_unbound_wq, &guc->submission_state.destroyed_worker);
|
2021-07-21 14:50:49 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static int guc_context_alloc(struct intel_context *ce)
{
	return lrc_alloc(ce, ce->engine);
}
|
|
|
|
|
|
2022-04-12 15:59:55 -07:00
|
|
|
static void __guc_context_set_prio(struct intel_guc *guc,
|
|
|
|
|
struct intel_context *ce)
|
|
|
|
|
{
|
2022-07-18 16:07:32 -07:00
|
|
|
if (guc->fw.major_ver_found >= 70) {
|
|
|
|
|
struct context_policy policy;
|
2022-04-12 15:59:55 -07:00
|
|
|
|
2022-07-18 16:07:32 -07:00
|
|
|
__guc_context_policy_start_klv(&policy, ce->guc_id.id);
|
|
|
|
|
__guc_context_policy_add_priority(&policy, ce->guc_state.prio);
|
|
|
|
|
__guc_context_set_context_policies(guc, &policy, true);
|
|
|
|
|
} else {
|
|
|
|
|
u32 action[] = {
|
|
|
|
|
INTEL_GUC_ACTION_V69_SET_CONTEXT_PRIORITY,
|
|
|
|
|
ce->guc_id.id,
|
|
|
|
|
ce->guc_state.prio,
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
|
|
|
|
|
}
|
2022-04-12 15:59:55 -07:00
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:47 -07:00
|
|
|
static void guc_context_set_prio(struct intel_guc *guc,
|
|
|
|
|
struct intel_context *ce,
|
|
|
|
|
u8 prio)
|
|
|
|
|
{
|
|
|
|
|
GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
|
|
|
|
|
prio > GUC_CLIENT_PRIORITY_NORMAL);
|
2021-09-09 09:47:43 -07:00
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
2021-07-26 17:23:47 -07:00
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
if (ce->guc_state.prio == prio || submission_disabled(guc) ||
|
2021-09-09 09:47:41 -07:00
|
|
|
!context_registered(ce)) {
|
2021-09-09 09:47:43 -07:00
|
|
|
ce->guc_state.prio = prio;
|
2021-07-26 17:23:47 -07:00
|
|
|
return;
|
2021-09-09 09:47:41 -07:00
|
|
|
}
|
2021-07-26 17:23:47 -07:00
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
ce->guc_state.prio = prio;
|
2022-04-12 15:59:55 -07:00
|
|
|
__guc_context_set_prio(guc, ce);
|
|
|
|
|
|
2021-07-26 17:23:47 -07:00
|
|
|
trace_intel_context_set_prio(ce);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static inline u8 map_i915_prio_to_guc_prio(int prio)
{
	if (prio == I915_PRIORITY_NORMAL)
		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
	else if (prio < I915_PRIORITY_NORMAL)
		return GUC_CLIENT_PRIORITY_NORMAL;
	else if (prio < I915_PRIORITY_DISPLAY)
		return GUC_CLIENT_PRIORITY_HIGH;
	else
		return GUC_CLIENT_PRIORITY_KMD_HIGH;
}
|
|
|
|
|
|
|
|
|
|
static inline void add_context_inflight_prio(struct intel_context *ce,
|
|
|
|
|
u8 guc_prio)
|
|
|
|
|
{
|
2021-09-09 09:47:43 -07:00
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
|
|
|
|
GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_state.prio_count));
|
2021-07-26 17:23:47 -07:00
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
++ce->guc_state.prio_count[guc_prio];
|
2021-07-26 17:23:47 -07:00
|
|
|
|
|
|
|
|
/* Overflow protection */
|
2021-09-09 09:47:43 -07:00
|
|
|
GEM_WARN_ON(!ce->guc_state.prio_count[guc_prio]);
|
2021-07-26 17:23:47 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static inline void sub_context_inflight_prio(struct intel_context *ce,
|
|
|
|
|
u8 guc_prio)
|
|
|
|
|
{
|
2021-09-09 09:47:43 -07:00
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
|
|
|
|
GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_state.prio_count));
|
2021-07-26 17:23:47 -07:00
|
|
|
|
|
|
|
|
/* Underflow protection */
|
2021-09-09 09:47:43 -07:00
|
|
|
GEM_WARN_ON(!ce->guc_state.prio_count[guc_prio]);
|
2021-07-26 17:23:47 -07:00
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
--ce->guc_state.prio_count[guc_prio];
|
2021-07-26 17:23:47 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
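/*
 * Propagate the highest pending request priority (lowest numerical GuC
 * value) for this context to the GuC.
 */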
static inline void update_context_prio(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
struct intel_guc *guc = &ce->engine->gt->uc.guc;
|
|
|
|
|
int i;
|
|
|
|
|
|
|
|
|
|
BUILD_BUG_ON(GUC_CLIENT_PRIORITY_KMD_HIGH != 0);
|
|
|
|
|
BUILD_BUG_ON(GUC_CLIENT_PRIORITY_KMD_HIGH > GUC_CLIENT_PRIORITY_NORMAL);
|
|
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
2021-07-26 17:23:47 -07:00
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
for (i = 0; i < ARRAY_SIZE(ce->guc_state.prio_count); ++i) {
|
|
|
|
|
if (ce->guc_state.prio_count[i]) {
|
2021-07-26 17:23:47 -07:00
|
|
|
guc_context_set_prio(guc, ce, i);
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio)
{
	/* Lower value is higher priority */
	return new_guc_prio < old_guc_prio;
}
|
|
|
|
|
|
2021-07-26 17:23:22 -07:00
|
|
|
static void add_to_context(struct i915_request *rq)
|
|
|
|
|
{
|
2021-10-14 10:19:52 -07:00
|
|
|
struct intel_context *ce = request_to_scheduling_context(rq);
|
2021-07-26 17:23:47 -07:00
|
|
|
u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq));
|
|
|
|
|
|
2021-10-14 10:19:52 -07:00
|
|
|
GEM_BUG_ON(intel_context_is_child(ce));
|
2021-07-26 17:23:47 -07:00
|
|
|
GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI);
|
2021-07-26 17:23:22 -07:00
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
spin_lock(&ce->guc_state.lock);
|
|
|
|
|
list_move_tail(&rq->sched.link, &ce->guc_state.requests);
|
2021-07-26 17:23:47 -07:00
|
|
|
|
|
|
|
|
if (rq->guc_prio == GUC_PRIO_INIT) {
|
|
|
|
|
rq->guc_prio = new_guc_prio;
|
|
|
|
|
add_context_inflight_prio(ce, rq->guc_prio);
|
|
|
|
|
} else if (new_guc_prio_higher(rq->guc_prio, new_guc_prio)) {
|
|
|
|
|
sub_context_inflight_prio(ce, rq->guc_prio);
|
|
|
|
|
rq->guc_prio = new_guc_prio;
|
|
|
|
|
add_context_inflight_prio(ce, rq->guc_prio);
|
|
|
|
|
}
|
|
|
|
|
update_context_prio(ce);
|
|
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
spin_unlock(&ce->guc_state.lock);
|
2021-07-26 17:23:22 -07:00
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:47 -07:00
|
|
|
static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
|
|
|
|
|
{
|
2021-09-09 09:47:43 -07:00
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
2021-07-26 17:23:47 -07:00
|
|
|
|
|
|
|
|
if (rq->guc_prio != GUC_PRIO_INIT &&
|
|
|
|
|
rq->guc_prio != GUC_PRIO_FINI) {
|
|
|
|
|
sub_context_inflight_prio(ce, rq->guc_prio);
|
|
|
|
|
update_context_prio(ce);
|
|
|
|
|
}
|
|
|
|
|
rq->guc_prio = GUC_PRIO_FINI;
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:22 -07:00
|
|
|
static void remove_from_context(struct i915_request *rq)
|
|
|
|
|
{
|
2021-10-14 10:19:52 -07:00
|
|
|
struct intel_context *ce = request_to_scheduling_context(rq);
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(intel_context_is_child(ce));
|
2021-07-26 17:23:22 -07:00
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
spin_lock_irq(&ce->guc_state.lock);
|
2021-07-26 17:23:22 -07:00
|
|
|
|
|
|
|
|
list_del_init(&rq->sched.link);
|
|
|
|
|
clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
|
|
|
|
|
|
|
|
|
|
/* Prevent further __await_execution() registering a cb, then flush */
|
|
|
|
|
set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
|
|
|
|
|
|
2021-07-26 17:23:47 -07:00
|
|
|
guc_prio_fini(rq, ce);
|
|
|
|
|
|
2021-09-09 09:47:40 -07:00
|
|
|
decr_context_committed_requests(ce);
|
2021-09-09 09:47:43 -07:00
|
|
|
|
2021-09-09 09:47:40 -07:00
|
|
|
spin_unlock_irq(&ce->guc_state.lock);
|
|
|
|
|
|
2021-09-09 09:47:42 -07:00
|
|
|
atomic_dec(&ce->guc_id.ref);
|
2021-07-26 17:23:22 -07:00
|
|
|
i915_request_notify_execute_cb_imm(rq);
|
|
|
|
|
}
|
|
|
|
|
|
2021-01-12 18:12:35 -08:00
|
|
|
static const struct intel_context_ops guc_context_ops = {
|
|
|
|
|
.alloc = guc_context_alloc,
|
|
|
|
|
|
|
|
|
|
.pre_pin = guc_context_pre_pin,
|
|
|
|
|
.pin = guc_context_pin,
|
2021-07-21 14:50:49 -07:00
|
|
|
.unpin = guc_context_unpin,
|
|
|
|
|
.post_unpin = guc_context_post_unpin,
|
2021-01-12 18:12:35 -08:00
|
|
|
|
2022-05-27 08:24:52 +01:00
|
|
|
.revoke = guc_context_revoke,
|
2021-07-26 17:23:39 -07:00
|
|
|
|
2021-07-26 17:23:40 -07:00
|
|
|
.cancel_request = guc_context_cancel_request,
|
|
|
|
|
|
2021-01-12 18:12:35 -08:00
|
|
|
.enter = intel_context_enter_engine,
|
|
|
|
|
.exit = intel_context_exit_engine,
|
|
|
|
|
|
2021-07-21 14:50:51 -07:00
|
|
|
.sched_disable = guc_context_sched_disable,
|
|
|
|
|
|
2021-01-12 18:12:35 -08:00
|
|
|
.reset = lrc_reset,
|
2021-07-21 14:50:49 -07:00
|
|
|
.destroy = guc_context_destroy,
|
2021-07-26 17:23:16 -07:00
|
|
|
|
|
|
|
|
.create_virtual = guc_create_virtual,
|
2021-10-14 10:19:56 -07:00
|
|
|
.create_parallel = guc_create_parallel,
|
2021-01-12 18:12:35 -08:00
|
|
|
};
|
|
|
|
|
|
2021-09-09 09:47:36 -07:00
|
|
|
static void submit_work_cb(struct irq_work *wrk)
{
	struct i915_request *rq = container_of(wrk, typeof(*rq), submit_work);

	might_lock(&rq->engine->sched_engine->lock);
	i915_sw_fence_complete(&rq->submit);
}
|
|
|
|
|
|
2021-07-21 14:50:50 -07:00
|
|
|
static void __guc_signal_context_fence(struct intel_context *ce)
|
|
|
|
|
{
|
2021-09-09 09:47:36 -07:00
|
|
|
struct i915_request *rq, *rn;
|
2021-07-21 14:50:50 -07:00
|
|
|
|
|
|
|
|
lockdep_assert_held(&ce->guc_state.lock);
|
|
|
|
|
|
2021-07-21 14:51:01 -07:00
|
|
|
if (!list_empty(&ce->guc_state.fences))
|
|
|
|
|
trace_intel_context_fence_release(ce);
|
|
|
|
|
|
2021-09-09 09:47:36 -07:00
|
|
|
/*
|
|
|
|
|
* Use an IRQ to ensure locking order of sched_engine->lock ->
|
|
|
|
|
* ce->guc_state.lock is preserved.
|
|
|
|
|
*/
|
|
|
|
|
list_for_each_entry_safe(rq, rn, &ce->guc_state.fences,
|
|
|
|
|
guc_fence_link) {
|
|
|
|
|
list_del(&rq->guc_fence_link);
|
|
|
|
|
irq_work_queue(&rq->submit_work);
|
|
|
|
|
}
|
2021-07-21 14:50:50 -07:00
|
|
|
|
|
|
|
|
INIT_LIST_HEAD(&ce->guc_state.fences);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_signal_context_fence(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
unsigned long flags;
|
|
|
|
|
|
2021-10-14 10:19:49 -07:00
|
|
|
GEM_BUG_ON(intel_context_is_child(ce));
|
|
|
|
|
|
2021-07-21 14:50:50 -07:00
|
|
|
spin_lock_irqsave(&ce->guc_state.lock, flags);
|
|
|
|
|
clr_context_wait_for_deregister_to_register(ce);
|
|
|
|
|
__guc_signal_context_fence(ce);
|
|
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
{
	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
		!ctx_id_mapped(ce_to_guc(ce), ce->guc_id.id)) &&
	       !submission_disabled(ce_to_guc(ce));
}
|
|
|
|
|
|
2021-09-09 09:47:41 -07:00
|
|
|
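/*
 * One-time GuC-side setup for a context: snapshot the GEM context priority
 * into guc_state and mark the context as initialised.
 */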
static void guc_context_init(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
const struct i915_gem_context *ctx;
|
|
|
|
|
int prio = I915_CONTEXT_DEFAULT_PRIORITY;
|
|
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
|
ctx = rcu_dereference(ce->gem_context);
|
|
|
|
|
if (ctx)
|
|
|
|
|
prio = ctx->sched.priority;
|
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
|
2021-09-09 09:47:43 -07:00
|
|
|
ce->guc_state.prio = map_i915_prio_to_guc_prio(prio);
|
2021-09-09 09:47:41 -07:00
|
|
|
set_bit(CONTEXT_GUC_INIT, &ce->flags);
|
|
|
|
|
}
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
static int guc_request_alloc(struct i915_request *rq)
|
2021-01-12 18:12:35 -08:00
|
|
|
{
|
2021-10-14 10:19:49 -07:00
|
|
|
struct intel_context *ce = request_to_scheduling_context(rq);
|
2021-07-21 14:50:49 -07:00
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
2021-07-21 14:50:50 -07:00
|
|
|
unsigned long flags;
|
2021-01-12 18:12:35 -08:00
|
|
|
int ret;
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
GEM_BUG_ON(!intel_context_is_pinned(rq->context));
|
2021-01-12 18:12:35 -08:00
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Flush enough space to reduce the likelihood of waiting after
|
|
|
|
|
* we start building the request - in which case we will just
|
|
|
|
|
* have to repeat work.
|
|
|
|
|
*/
|
2021-07-21 14:50:49 -07:00
|
|
|
rq->reserved_space += GUC_REQUEST_SIZE;
|
2021-01-12 18:12:35 -08:00
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Note that after this point, we have committed to using
|
|
|
|
|
* this request as it is being used to both track the
|
|
|
|
|
* state of engine initialisation and liveness of the
|
|
|
|
|
* golden renderstate above. Think twice before you try
|
|
|
|
|
* to cancel/unwind this request now.
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
/* Unconditionally invalidate GPU caches and TLBs. */
|
2021-07-21 14:50:49 -07:00
|
|
|
ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
|
2021-01-12 18:12:35 -08:00
|
|
|
if (ret)
|
|
|
|
|
return ret;
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
rq->reserved_space -= GUC_REQUEST_SIZE;
|
2021-01-12 18:12:35 -08:00
|
|
|
|
2021-09-09 09:47:41 -07:00
|
|
|
if (unlikely(!test_bit(CONTEXT_GUC_INIT, &ce->flags)))
|
|
|
|
|
guc_context_init(ce);
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
/*
|
|
|
|
|
* Call pin_guc_id here rather than in the pinning step as with
|
|
|
|
|
* dma_resv, contexts can be repeatedly pinned / unpinned, thrashing the
|
2021-09-09 09:47:42 -07:00
|
|
|
* guc_id and creating horrible race conditions. This is especially bad
|
|
|
|
|
* when guc_id are being stolen due to over subscription. By the time
|
2021-07-21 14:50:49 -07:00
|
|
|
* this function is reached, it is guaranteed that the guc_id will be
|
|
|
|
|
* persistent until the generated request is retired, thus sealing these
|
2021-09-09 09:47:42 -07:00
|
|
|
* race conditions. It is still safe to fail here if guc_id are
|
2021-07-21 14:50:49 -07:00
|
|
|
* exhausted and return -EAGAIN to the user indicating that they can try
|
|
|
|
|
* again in the future.
|
|
|
|
|
*
|
|
|
|
|
* There is no need for a lock here as the timeline mutex ensures at
|
|
|
|
|
* most one context can be executing this code path at once. The
|
|
|
|
|
* guc_id_ref is incremented once for every request in flight and
|
|
|
|
|
* decremented on each retire. When it is zero, a lock around the
|
|
|
|
|
* increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
|
|
|
|
|
*/
|
2021-09-09 09:47:42 -07:00
|
|
|
if (atomic_add_unless(&ce->guc_id.ref, 1, 0))
|
2021-07-21 14:50:50 -07:00
|
|
|
goto out;
|
2021-01-12 18:12:36 -08:00
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
ret = pin_guc_id(guc, ce); /* returns 1 if new guc_id assigned */
|
|
|
|
|
if (unlikely(ret < 0))
|
|
|
|
|
return ret;
|
|
|
|
|
if (context_needs_register(ce, !!ret)) {
|
2022-03-01 16:33:53 -08:00
|
|
|
ret = try_context_registration(ce, true);
|
2021-07-21 14:50:49 -07:00
|
|
|
if (unlikely(ret)) { /* unwind */
|
2021-07-26 17:23:23 -07:00
|
|
|
if (ret == -EPIPE) {
|
|
|
|
|
disable_submission(guc);
|
|
|
|
|
goto out; /* GPU will be reset */
|
|
|
|
|
}
|
2021-09-09 09:47:42 -07:00
|
|
|
atomic_dec(&ce->guc_id.ref);
|
2021-07-21 14:50:49 -07:00
|
|
|
unpin_guc_id(guc, ce);
|
|
|
|
|
return ret;
|
|
|
|
|
}
|
|
|
|
|
}
|
2021-01-12 18:12:36 -08:00
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
|
2021-01-12 18:12:36 -08:00
|
|
|
|
2021-07-21 14:50:50 -07:00
|
|
|
out:
|
|
|
|
|
/*
|
|
|
|
|
* We block all requests on this context if a G2H is pending for a
|
2021-07-21 14:50:53 -07:00
|
|
|
* schedule disable or context deregistration as the GuC will fail a
|
|
|
|
|
* schedule enable or context registration if either G2H is pending
|
|
|
|
|
* respectively. Once a G2H returns, the fence is released that is
|
|
|
|
|
* blocking these requests (see guc_signal_context_fence).
|
2021-07-21 14:50:50 -07:00
|
|
|
*/
|
|
|
|
|
spin_lock_irqsave(&ce->guc_state.lock, flags);
|
2021-07-21 14:50:53 -07:00
|
|
|
if (context_wait_for_deregister_to_register(ce) ||
|
|
|
|
|
context_pending_disable(ce)) {
|
2021-09-09 09:47:36 -07:00
|
|
|
init_irq_work(&rq->submit_work, submit_work_cb);
|
2021-07-21 14:50:50 -07:00
|
|
|
i915_sw_fence_await(&rq->submit);
|
|
|
|
|
|
|
|
|
|
list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
|
|
|
|
|
}
|
2021-09-09 09:47:40 -07:00
|
|
|
incr_context_committed_requests(ce);
|
2021-07-21 14:50:50 -07:00
|
|
|
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
|
|
|
|
|
|
2021-07-21 14:50:49 -07:00
|
|
|
return 0;
|
2021-01-12 18:12:36 -08:00
|
|
|
}
|
|
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
static int guc_virtual_context_pre_pin(struct intel_context *ce,
|
|
|
|
|
struct i915_gem_ww_ctx *ww,
|
|
|
|
|
void **vaddr)
|
|
|
|
|
{
|
|
|
|
|
struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
|
|
|
|
|
|
|
|
|
|
return __guc_context_pre_pin(ce, engine, ww, vaddr);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
|
|
|
|
|
{
|
|
|
|
|
struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
|
2021-10-14 10:19:43 -07:00
|
|
|
int ret = __guc_context_pin(ce, engine, vaddr);
|
|
|
|
|
intel_engine_mask_t tmp, mask = ce->engine->mask;
|
|
|
|
|
|
|
|
|
|
if (likely(!ret))
|
|
|
|
|
for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
|
|
|
|
|
intel_engine_pm_get(engine);
|
2021-07-26 17:23:16 -07:00
|
|
|
|
2021-10-14 10:19:43 -07:00
|
|
|
return ret;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_virtual_context_unpin(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
intel_engine_mask_t tmp, mask = ce->engine->mask;
|
|
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(context_enabled(ce));
|
|
|
|
|
GEM_BUG_ON(intel_context_is_barrier(ce));
|
|
|
|
|
|
|
|
|
|
unpin_guc_id(guc, ce);
|
|
|
|
|
lrc_unpin(ce);
|
|
|
|
|
|
|
|
|
|
for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
|
|
|
|
|
intel_engine_pm_put_async(engine);
|
2021-07-26 17:23:16 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_virtual_context_enter(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
intel_engine_mask_t tmp, mask = ce->engine->mask;
|
|
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
|
|
|
|
|
|
for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
|
|
|
|
|
intel_engine_pm_get(engine);
|
|
|
|
|
|
|
|
|
|
intel_timeline_enter(ce->timeline);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_virtual_context_exit(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
intel_engine_mask_t tmp, mask = ce->engine->mask;
|
|
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
|
|
|
|
|
|
for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
|
|
|
|
|
intel_engine_pm_put(engine);
|
|
|
|
|
|
|
|
|
|
intel_timeline_exit(ce->timeline);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static int guc_virtual_context_alloc(struct intel_context *ce)
{
	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);

	return lrc_alloc(ce, engine);
}
|
|
|
|
|
|
|
|
|
|
static const struct intel_context_ops virtual_guc_context_ops = {
|
|
|
|
|
.alloc = guc_virtual_context_alloc,
|
|
|
|
|
|
|
|
|
|
.pre_pin = guc_virtual_context_pre_pin,
|
|
|
|
|
.pin = guc_virtual_context_pin,
|
2021-10-14 10:19:43 -07:00
|
|
|
.unpin = guc_virtual_context_unpin,
|
2021-07-26 17:23:16 -07:00
|
|
|
.post_unpin = guc_context_post_unpin,
|
|
|
|
|
|
2022-05-27 08:24:52 +01:00
|
|
|
.revoke = guc_context_revoke,
|
2021-07-26 17:23:39 -07:00
|
|
|
|
2021-07-26 17:23:40 -07:00
|
|
|
.cancel_request = guc_context_cancel_request,
|
|
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
.enter = guc_virtual_context_enter,
|
|
|
|
|
.exit = guc_virtual_context_exit,
|
|
|
|
|
|
|
|
|
|
.sched_disable = guc_context_sched_disable,
|
|
|
|
|
|
|
|
|
|
.destroy = guc_context_destroy,
|
|
|
|
|
|
|
|
|
|
.get_sibling = guc_virtual_get_sibling,
|
|
|
|
|
};
|
|
|
|
|
|
2021-10-14 10:19:51 -07:00
|
|
|
static int guc_parent_context_pin(struct intel_context *ce, void *vaddr)
|
|
|
|
|
{
|
|
|
|
|
struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_parent(ce));
|
|
|
|
|
GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
|
|
|
|
|
|
|
|
|
|
ret = pin_guc_id(guc, ce);
|
|
|
|
|
if (unlikely(ret < 0))
|
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
|
|
return __guc_context_pin(ce, engine, vaddr);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static int guc_child_context_pin(struct intel_context *ce, void *vaddr)
|
|
|
|
|
{
|
|
|
|
|
struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_child(ce));
|
|
|
|
|
GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
|
|
|
|
|
|
|
|
|
|
__intel_context_pin(ce->parallel.parent);
|
|
|
|
|
return __guc_context_pin(ce, engine, vaddr);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_parent_context_unpin(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
struct intel_guc *guc = ce_to_guc(ce);
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(context_enabled(ce));
|
|
|
|
|
GEM_BUG_ON(intel_context_is_barrier(ce));
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_parent(ce));
|
|
|
|
|
GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
|
|
|
|
|
|
|
|
|
|
unpin_guc_id(guc, ce);
|
|
|
|
|
lrc_unpin(ce);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_child_context_unpin(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
GEM_BUG_ON(context_enabled(ce));
|
|
|
|
|
GEM_BUG_ON(intel_context_is_barrier(ce));
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_child(ce));
|
|
|
|
|
GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
|
|
|
|
|
|
|
|
|
|
lrc_unpin(ce);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void guc_child_context_post_unpin(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_child(ce));
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_pinned(ce->parallel.parent));
|
|
|
|
|
GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
|
|
|
|
|
|
|
|
|
|
lrc_post_unpin(ce);
|
|
|
|
|
intel_context_unpin(ce->parallel.parent);
|
|
|
|
|
}
|
|
|
|
|
|
2021-10-14 10:19:56 -07:00
|
|
|
static void guc_child_context_destroy(struct kref *kref)
{
	struct intel_context *ce = container_of(kref, typeof(*ce), ref);

	__guc_context_destroy(ce);
}
|
|
|
|
|
|
|
|
|
|
static const struct intel_context_ops virtual_parent_context_ops = {
|
|
|
|
|
.alloc = guc_virtual_context_alloc,
|
|
|
|
|
|
|
|
|
|
.pre_pin = guc_context_pre_pin,
|
|
|
|
|
.pin = guc_parent_context_pin,
|
|
|
|
|
.unpin = guc_parent_context_unpin,
|
|
|
|
|
.post_unpin = guc_context_post_unpin,
|
|
|
|
|
|
2022-05-27 08:24:52 +01:00
|
|
|
.revoke = guc_context_revoke,
|
2021-10-14 10:19:56 -07:00
|
|
|
|
|
|
|
|
.cancel_request = guc_context_cancel_request,
|
|
|
|
|
|
|
|
|
|
.enter = guc_virtual_context_enter,
|
|
|
|
|
.exit = guc_virtual_context_exit,
|
|
|
|
|
|
|
|
|
|
.sched_disable = guc_context_sched_disable,
|
|
|
|
|
|
|
|
|
|
.destroy = guc_context_destroy,
|
|
|
|
|
|
|
|
|
|
.get_sibling = guc_virtual_get_sibling,
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
static const struct intel_context_ops virtual_child_context_ops = {
|
|
|
|
|
.alloc = guc_virtual_context_alloc,
|
|
|
|
|
|
|
|
|
|
.pre_pin = guc_context_pre_pin,
|
|
|
|
|
.pin = guc_child_context_pin,
|
|
|
|
|
.unpin = guc_child_context_unpin,
|
|
|
|
|
.post_unpin = guc_child_context_post_unpin,
|
|
|
|
|
|
|
|
|
|
.cancel_request = guc_context_cancel_request,
|
|
|
|
|
|
|
|
|
|
.enter = guc_virtual_context_enter,
|
|
|
|
|
.exit = guc_virtual_context_exit,
|
|
|
|
|
|
|
|
|
|
.destroy = guc_child_context_destroy,
|
|
|
|
|
|
|
|
|
|
.get_sibling = guc_virtual_get_sibling,
|
|
|
|
|
};
|
|
|
|
|
|
2021-10-14 10:19:59 -07:00
|
|
|
/*
|
|
|
|
|
* The below override of the breadcrumbs is enabled when the user configures a
|
|
|
|
|
* context for parallel submission (multi-lrc, parent-child).
|
|
|
|
|
*
|
|
|
|
|
* The overridden breadcrumbs implements an algorithm which allows the GuC to
|
|
|
|
|
* safely preempt all the hw contexts configured for parallel submission
|
|
|
|
|
* between each BB. The contract between the i915 and GuC is if the parent
|
|
|
|
|
* context can be preempted, all the children can be preempted, and the GuC will
|
|
|
|
|
* always try to preempt the parent before the children. A handshake between the
|
|
|
|
|
* parent / children breadcrumbs ensures the i915 holds up its end of the deal
|
|
|
|
|
* creating a window to preempt between each set of BBs.
|
|
|
|
|
*/
|
|
|
|
|
static int emit_bb_start_parent_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u64 offset, u32 len,
|
|
|
|
|
const unsigned int flags);
|
|
|
|
|
static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u64 offset, u32 len,
|
|
|
|
|
const unsigned int flags);
|
|
|
|
|
static u32 *
|
|
|
|
|
emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u32 *cs);
|
|
|
|
|
static u32 *
|
|
|
|
|
emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u32 *cs);
|
|
|
|
|
|
2021-10-14 10:19:56 -07:00
|
|
|
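/*
 * Create the parent and child virtual contexts for a parallel (multi-lrc)
 * submission and install the no-preempt-mid-batch BB start and fini
 * breadcrumb emitters described above.
 */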
static struct intel_context *
|
|
|
|
|
guc_create_parallel(struct intel_engine_cs **engines,
|
|
|
|
|
unsigned int num_siblings,
|
|
|
|
|
unsigned int width)
|
|
|
|
|
{
|
|
|
|
|
struct intel_engine_cs **siblings = NULL;
|
|
|
|
|
struct intel_context *parent = NULL, *ce, *err;
|
|
|
|
|
int i, j;
|
|
|
|
|
|
|
|
|
|
siblings = kmalloc_array(num_siblings,
|
|
|
|
|
sizeof(*siblings),
|
|
|
|
|
GFP_KERNEL);
|
|
|
|
|
if (!siblings)
|
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
|
|
|
|
|
for (i = 0; i < width; ++i) {
|
|
|
|
|
for (j = 0; j < num_siblings; ++j)
|
|
|
|
|
siblings[j] = engines[i * num_siblings + j];
|
|
|
|
|
|
|
|
|
|
ce = intel_engine_create_virtual(siblings, num_siblings,
|
|
|
|
|
FORCE_VIRTUAL);
|
2021-11-16 14:49:16 +03:00
|
|
|
if (IS_ERR(ce)) {
|
|
|
|
|
err = ERR_CAST(ce);
|
2021-10-14 10:19:56 -07:00
|
|
|
goto unwind;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (i == 0) {
|
|
|
|
|
parent = ce;
|
|
|
|
|
parent->ops = &virtual_parent_context_ops;
|
|
|
|
|
} else {
|
|
|
|
|
ce->ops = &virtual_child_context_ops;
|
|
|
|
|
intel_context_bind_parent_child(parent, ce);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2021-10-14 10:20:00 -07:00
|
|
|
parent->parallel.fence_context = dma_fence_context_alloc(1);
|
|
|
|
|
|
2021-10-14 10:19:59 -07:00
|
|
|
parent->engine->emit_bb_start =
|
|
|
|
|
emit_bb_start_parent_no_preempt_mid_batch;
|
|
|
|
|
parent->engine->emit_fini_breadcrumb =
|
|
|
|
|
emit_fini_breadcrumb_parent_no_preempt_mid_batch;
|
|
|
|
|
parent->engine->emit_fini_breadcrumb_dw =
|
|
|
|
|
12 + 4 * parent->parallel.number_children;
|
|
|
|
|
for_each_child(parent, ce) {
|
|
|
|
|
ce->engine->emit_bb_start =
|
|
|
|
|
emit_bb_start_child_no_preempt_mid_batch;
|
|
|
|
|
ce->engine->emit_fini_breadcrumb =
|
|
|
|
|
emit_fini_breadcrumb_child_no_preempt_mid_batch;
|
|
|
|
|
ce->engine->emit_fini_breadcrumb_dw = 16;
|
|
|
|
|
}
|
|
|
|
|
|
2021-10-14 10:19:56 -07:00
|
|
|
kfree(siblings);
|
|
|
|
|
return parent;
|
|
|
|
|
|
|
|
|
|
unwind:
|
|
|
|
|
if (parent)
|
|
|
|
|
intel_context_put(parent);
|
|
|
|
|
kfree(siblings);
|
|
|
|
|
return err;
|
|
|
|
|
}
|
|
|
|
|
|
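/*
 * With GuC submission the same breadcrumbs object is shared by every engine
 * in a class, so enabling / disabling the user interrupt has to be fanned out
 * to each physical sibling covered by the breadcrumbs' engine mask.
 */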
static bool
guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
{
	struct intel_engine_cs *sibling;
	intel_engine_mask_t tmp, mask = b->engine_mask;
	bool result = false;

	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
		result |= intel_engine_irq_enable(sibling);

	return result;
}

static void
guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
{
	struct intel_engine_cs *sibling;
	intel_engine_mask_t tmp, mask = b->engine_mask;

	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
		intel_engine_irq_disable(sibling);
}

static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
{
	int i;

	/*
	 * In GuC submission mode we do not know which physical engine a request
	 * will be scheduled on, which creates a problem because the breadcrumb
	 * interrupt is per physical engine. To work around this we attach
	 * requests and direct all breadcrumb interrupts to the first instance
	 * of an engine per class. In addition all breadcrumb interrupts are
	 * enabled / disabled across an engine class in unison.
	 */
	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
		struct intel_engine_cs *sibling =
			engine->gt->engine_class[engine->class][i];

		if (sibling) {
			if (engine->breadcrumbs != sibling->breadcrumbs) {
				intel_breadcrumbs_put(engine->breadcrumbs);
				engine->breadcrumbs =
					intel_breadcrumbs_get(sibling->breadcrumbs);
			}
			break;
		}
	}

	if (engine->breadcrumbs) {
		engine->breadcrumbs->engine_mask |= engine->mask;
		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
	}
}

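/*
 * Called while a request is still in flight to raise (never lower) the
 * priority the GuC sees for its context; the per-context inflight priority
 * counts decide the single priority reported for the whole context.
 */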
static void guc_bump_inflight_request_prio(struct i915_request *rq,
					   int prio)
{
	struct intel_context *ce = request_to_scheduling_context(rq);
	u8 new_guc_prio = map_i915_prio_to_guc_prio(prio);

	/* Short circuit function */
	if (prio < I915_PRIORITY_NORMAL ||
	    rq->guc_prio == GUC_PRIO_FINI ||
	    (rq->guc_prio != GUC_PRIO_INIT &&
	     !new_guc_prio_higher(rq->guc_prio, new_guc_prio)))
		return;

	spin_lock(&ce->guc_state.lock);
	if (rq->guc_prio != GUC_PRIO_FINI) {
		if (rq->guc_prio != GUC_PRIO_INIT)
			sub_context_inflight_prio(ce, rq->guc_prio);
		rq->guc_prio = new_guc_prio;
		add_context_inflight_prio(ce, rq->guc_prio);
		update_context_prio(ce);
	}
	spin_unlock(&ce->guc_state.lock);
}

static void guc_retire_inflight_request_prio(struct i915_request *rq)
{
	struct intel_context *ce = request_to_scheduling_context(rq);

	spin_lock(&ce->guc_state.lock);
	guc_prio_fini(rq, ce);
	spin_unlock(&ce->guc_state.lock);
}

static void sanitize_hwsp(struct intel_engine_cs *engine)
{
	struct intel_timeline *tl;

	list_for_each_entry(tl, &engine->status_page.timelines, engine_link)
		intel_timeline_reset_seqno(tl);
}

static void guc_sanitize(struct intel_engine_cs *engine)
{
	/*
	 * Poison residual state on resume, in case the suspend didn't!
	 *
	 * We have to assume that across suspend/resume (or other loss
	 * of control) the contents of our pinned buffers have been
	 * lost, replaced by garbage. Since this doesn't always happen,
	 * let's poison such state so that we more quickly spot when
	 * we falsely assume it has been preserved.
	 */
	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
		memset(engine->status_page.addr, POISON_INUSE, PAGE_SIZE);

	/*
	 * The kernel_context HWSP is stored in the status_page. As above,
	 * that may be lost on resume/initialisation, and so we need to
	 * reset the value in the HWSP.
	 */
	sanitize_hwsp(engine);

	/* And scrub the dirty cachelines for the HWSP */
	drm_clflush_virt_range(engine->status_page.addr, PAGE_SIZE);

	intel_engine_reset_pinned_contexts(engine);
}

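/*
 * setup_hwsp() points RING_HWS_PGA at the engine's status page and
 * start_engine() clears the STOP_RING bit so the ring can execute again;
 * guc_resume() calls both with forcewake already asserted.
 */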
static void setup_hwsp(struct intel_engine_cs *engine)
{
	intel_engine_set_hwsp_writemask(engine, ~0u); /* HWSTAM */

	ENGINE_WRITE_FW(engine,
			RING_HWS_PGA,
			i915_ggtt_offset(engine->status_page.vma));
}

static void start_engine(struct intel_engine_cs *engine)
{
	ENGINE_WRITE_FW(engine,
			RING_MODE_GEN7,
			_MASKED_BIT_ENABLE(GEN11_GFX_DISABLE_LEGACY_MODE));

	ENGINE_WRITE_FW(engine, RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING));
	ENGINE_POSTING_READ(engine, RING_MI_MODE);
}

static int guc_resume(struct intel_engine_cs *engine)
{
	assert_forcewakes_active(engine->uncore, FORCEWAKE_ALL);

	intel_mocs_init_engine(engine);

	intel_breadcrumbs_reset(engine->breadcrumbs);

	setup_hwsp(engine);
	start_engine(engine);

	if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE)
		xehp_enable_ccs_engines(engine);

	return 0;
}

static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine)
{
	return !sched_engine->tasklet.callback;
}

static void guc_set_default_submission(struct intel_engine_cs *engine)
{
	engine->submit_request = guc_submit_request;
}

static inline void guc_kernel_context_pin(struct intel_guc *guc,
					  struct intel_context *ce)
{
	/*
	 * Note: we purposefully do not check the returns below because
	 * the registration can only fail if a reset is just starting.
	 * This is called at the end of reset so presumably another reset
	 * isn't happening and even if it did this code would be run again.
	 */

	if (context_guc_id_invalid(ce))
		pin_guc_id(guc, ce);

	try_context_registration(ce, true);
}

static inline void guc_init_lrc_mapping(struct intel_guc *guc)
{
	struct intel_gt *gt = guc_to_gt(guc);
	struct intel_engine_cs *engine;
	enum intel_engine_id id;

	/* make sure all descriptors are clean... */
	xa_destroy(&guc->context_lookup);

	/*
	 * A reset might have occurred while we had a pending stalled request,
	 * so make sure we clean that up.
	 */
	guc->stalled_request = NULL;
	guc->submission_stall_reason = STALL_NONE;

	/*
	 * Some contexts might have been pinned before we enabled GuC
	 * submission, so we need to add them to the GuC bookkeeping.
	 * Also, after a reset of the GuC we want to make sure that the
	 * information shared with GuC is properly reset. The kernel LRCs are
	 * not attached to the gem_context, so they need to be added separately.
	 */
	for_each_engine(engine, gt, id) {
		struct intel_context *ce;

		list_for_each_entry(ce, &engine->pinned_contexts_list,
				    pinned_contexts_link)
			guc_kernel_context_pin(guc, ce);
	}
}

static void guc_release(struct intel_engine_cs *engine)
{
	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */

	intel_engine_cleanup_common(engine);
	lrc_fini_wa_ctx(engine);
}

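/*
 * A request on a virtual engine may have executed on any physical sibling,
 * so bump the serial on every engine in the mask rather than on a single
 * instance.
 */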
static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
{
	struct intel_engine_cs *e;
	intel_engine_mask_t tmp, mask = engine->mask;

	for_each_engine_masked(e, engine->gt, mask, tmp)
		e->serial++;
}

static void guc_default_vfuncs(struct intel_engine_cs *engine)
{
	/* Default vfuncs which can be overridden by each engine. */

	engine->resume = guc_resume;

	engine->cops = &guc_context_ops;
	engine->request_alloc = guc_request_alloc;
	engine->add_active_request = add_to_context;
	engine->remove_active_request = remove_from_context;

	engine->sched_engine->schedule = i915_schedule;

	engine->reset.prepare = guc_engine_reset_prepare;
	engine->reset.rewind = guc_rewind_nop;
	engine->reset.cancel = guc_reset_nop;
	engine->reset.finish = guc_reset_nop;

	engine->emit_flush = gen8_emit_flush_xcs;
	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
	if (GRAPHICS_VER(engine->i915) >= 12) {
		engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_xcs;
		engine->emit_flush = gen12_emit_flush_xcs;
	}
	engine->set_default_submission = guc_set_default_submission;

	engine->busyness = guc_engine_busyness;

	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
	engine->flags |= I915_ENGINE_HAS_TIMESLICES;

	/* Wa_14014475959:dg2 */
	if (IS_DG2(engine->i915) && engine->class == COMPUTE_CLASS)
		engine->flags |= I915_ENGINE_USES_WA_HOLD_CCS_SWITCHOUT;

	/*
	 * TODO: GuC supports timeslicing and semaphores as well, but they're
	 * handled by the firmware so some minor tweaks are required before
	 * enabling.
	 *
	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
	 */

	engine->emit_bb_start = gen8_emit_bb_start;
	if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
		engine->emit_bb_start = gen125_emit_bb_start;
}

static void rcs_submission_override(struct intel_engine_cs *engine)
{
	switch (GRAPHICS_VER(engine->i915)) {
	case 12:
		engine->emit_flush = gen12_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_rcs;
		break;
	case 11:
		engine->emit_flush = gen11_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen11_emit_fini_breadcrumb_rcs;
		break;
	default:
		engine->emit_flush = gen8_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
		break;
	}
}

static inline void guc_default_irqs(struct intel_engine_cs *engine)
{
	engine->irq_keep_mask = GT_RENDER_USER_INTERRUPT;
	intel_engine_set_irq_handler(engine, cs_irq_handler);
}

static void guc_sched_engine_destroy(struct kref *kref)
{
	struct i915_sched_engine *sched_engine =
		container_of(kref, typeof(*sched_engine), ref);
	struct intel_guc *guc = sched_engine->private_data;

	guc->sched_engine = NULL;
	tasklet_kill(&sched_engine->tasklet); /* flush the callback */
	kfree(sched_engine);
}

int intel_guc_submission_setup(struct intel_engine_cs *engine)
{
	struct drm_i915_private *i915 = engine->i915;
	struct intel_guc *guc = &engine->gt->uc.guc;

	/*
	 * The setup relies on several assumptions (e.g. irqs always enabled)
	 * that are only valid on gen11+
	 */
	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);

	if (!guc->sched_engine) {
		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
		if (!guc->sched_engine)
			return -ENOMEM;

		guc->sched_engine->schedule = i915_schedule;
		guc->sched_engine->disabled = guc_sched_engine_disabled;
		guc->sched_engine->private_data = guc;
		guc->sched_engine->destroy = guc_sched_engine_destroy;
		guc->sched_engine->bump_inflight_request_prio =
			guc_bump_inflight_request_prio;
		guc->sched_engine->retire_inflight_request_prio =
			guc_retire_inflight_request_prio;
		tasklet_setup(&guc->sched_engine->tasklet,
			      guc_submission_tasklet);
	}
	i915_sched_engine_put(engine->sched_engine);
	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);

	guc_default_vfuncs(engine);
	guc_default_irqs(engine);
	guc_init_breadcrumbs(engine);

	if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)
		rcs_submission_override(engine);

	lrc_init_wa_ctx(engine);

	/* Finally, take ownership and responsibility for cleanup! */
	engine->sanitize = guc_sanitize;
	engine->release = guc_release;

	return 0;
}

void intel_guc_submission_enable(struct intel_guc *guc)
{
	guc_init_lrc_mapping(guc);
	guc_init_engine_stats(guc);
}

void intel_guc_submission_disable(struct intel_guc *guc)
{
	/* Note: By the time we're here, GuC may have already been reset */
}

static bool __guc_submission_supported(struct intel_guc *guc)
{
	/* GuC submission is unavailable for pre-Gen11 */
	return intel_guc_is_supported(guc) &&
	       GRAPHICS_VER(guc_to_gt(guc)->i915) >= 11;
}

static bool __guc_submission_selected(struct intel_guc *guc)
{
	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;

	if (!intel_guc_submission_is_supported(guc))
		return false;

	return i915->params.enable_guc & ENABLE_GUC_SUBMISSION;
}

void intel_guc_submission_init_early(struct intel_guc *guc)
{
	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);

	spin_lock_init(&guc->submission_state.lock);
	INIT_LIST_HEAD(&guc->submission_state.guc_id_list);
	ida_init(&guc->submission_state.guc_ids);
	INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
	INIT_WORK(&guc->submission_state.destroyed_worker,
		  destroyed_worker_func);
	INIT_WORK(&guc->submission_state.reset_fail_worker,
		  reset_fail_worker_func);

	spin_lock_init(&guc->timestamp.lock);
	INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping);

	guc->submission_state.num_guc_ids = GUC_MAX_CONTEXT_ID;
	guc->submission_supported = __guc_submission_supported(guc);
	guc->submission_selected = __guc_submission_selected(guc);
}

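/*
 * Look up the intel_context for a ctx_id received in a G2H message, rejecting
 * ids that are out of range, unknown, or that point at a child context (G2H
 * messages are never expected to reference a child).
 */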
static inline struct intel_context *
g2h_context_lookup(struct intel_guc *guc, u32 ctx_id)
{
	struct intel_context *ce;

	if (unlikely(ctx_id >= GUC_MAX_CONTEXT_ID)) {
		drm_err(&guc_to_gt(guc)->i915->drm,
			"Invalid ctx_id %u\n", ctx_id);
		return NULL;
	}

	ce = __get_context(guc, ctx_id);
	if (unlikely(!ce)) {
		drm_err(&guc_to_gt(guc)->i915->drm,
			"Context is NULL, ctx_id %u\n", ctx_id);
		return NULL;
	}

	if (unlikely(intel_context_is_child(ce))) {
		drm_err(&guc_to_gt(guc)->i915->drm,
			"Context is child, ctx_id %u\n", ctx_id);
		return NULL;
	}

	return ce;
}

int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
					  const u32 *msg,
					  u32 len)
{
	struct intel_context *ce;
	u32 ctx_id;

	if (unlikely(len < 1)) {
		drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u\n", len);
		return -EPROTO;
	}
	ctx_id = msg[0];

	ce = g2h_context_lookup(guc, ctx_id);
	if (unlikely(!ce))
		return -EPROTO;

	trace_intel_context_deregister_done(ce);

#ifdef CONFIG_DRM_I915_SELFTEST
	if (unlikely(ce->drop_deregister)) {
		ce->drop_deregister = false;
		return 0;
	}
#endif

	if (context_wait_for_deregister_to_register(ce)) {
		struct intel_runtime_pm *runtime_pm =
			&ce->engine->gt->i915->runtime_pm;
		intel_wakeref_t wakeref;

		/*
		 * Previous owner of this guc_id has been deregistered, now it
		 * is safe to register this context.
		 */
		with_intel_runtime_pm(runtime_pm, wakeref)
			register_context(ce, true);
		guc_signal_context_fence(ce);
		intel_context_put(ce);
	} else if (context_destroyed(ce)) {
		/* Context has been destroyed */
		intel_gt_pm_put_async(guc_to_gt(guc));
		release_guc_id(guc, ce);
		__guc_context_destroy(ce);
	}

	decr_outstanding_submission_g2h(guc);

	return 0;
}

int intel_guc_sched_done_process_msg(struct intel_guc *guc,
				     const u32 *msg,
				     u32 len)
{
	struct intel_context *ce;
	unsigned long flags;
	u32 ctx_id;

	if (unlikely(len < 2)) {
		drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u\n", len);
		return -EPROTO;
	}
	ctx_id = msg[0];

	ce = g2h_context_lookup(guc, ctx_id);
	if (unlikely(!ce))
		return -EPROTO;

	if (unlikely(context_destroyed(ce) ||
		     (!context_pending_enable(ce) &&
		      !context_pending_disable(ce)))) {
		drm_err(&guc_to_gt(guc)->i915->drm,
			"Bad context sched_state 0x%x, ctx_id %u\n",
			ce->guc_state.sched_state, ctx_id);
		return -EPROTO;
	}

	trace_intel_context_sched_done(ce);

	if (context_pending_enable(ce)) {
#ifdef CONFIG_DRM_I915_SELFTEST
		if (unlikely(ce->drop_schedule_enable)) {
			ce->drop_schedule_enable = false;
			return 0;
		}
#endif

		spin_lock_irqsave(&ce->guc_state.lock, flags);
		clr_context_pending_enable(ce);
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
	} else if (context_pending_disable(ce)) {
		bool banned;

#ifdef CONFIG_DRM_I915_SELFTEST
		if (unlikely(ce->drop_schedule_disable)) {
			ce->drop_schedule_disable = false;
			return 0;
		}
#endif

		/*
		 * Unpin must be done before __guc_signal_context_fence,
		 * otherwise a race exists between the requests getting
		 * submitted + retired before this unpin completes resulting in
		 * the pin_count going to zero and the context still being
		 * enabled.
		 */
		intel_context_sched_disable_unpin(ce);

		spin_lock_irqsave(&ce->guc_state.lock, flags);
		banned = context_banned(ce);
		clr_context_banned(ce);
		clr_context_pending_disable(ce);
		__guc_signal_context_fence(ce);
		guc_blocked_fence_complete(ce);
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);

		if (banned) {
			guc_cancel_context_requests(ce);
			intel_engine_signal_breadcrumbs(ce->engine);
		}
	}

	decr_outstanding_submission_g2h(guc);
	intel_context_put(ce);

	return 0;
}

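/*
 * Record an error state capture for the hung context and bump the per-class
 * engine reset count, so a GuC-initiated engine reset is accounted the same
 * way as one handled by i915 itself.
 */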
static void capture_error_state(struct intel_guc *guc,
				struct intel_context *ce)
{
	struct intel_gt *gt = guc_to_gt(guc);
	struct drm_i915_private *i915 = gt->i915;
	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
	intel_wakeref_t wakeref;

	intel_engine_set_hung_context(engine, ce);
	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
		i915_capture_error_state(gt, engine->mask, CORE_DUMP_FLAG_IS_GUC_CAPTURE);
	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
}

static void guc_context_replay(struct intel_context *ce)
{
	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;

	__guc_reset_context(ce, ce->engine->mask);
	tasklet_hi_schedule(&sched_engine->tasklet);
}

static void guc_handle_context_reset(struct intel_guc *guc,
				     struct intel_context *ce)
{
	trace_intel_context_reset(ce);

	if (likely(!intel_context_is_banned(ce))) {
		capture_error_state(guc, ce);
		guc_context_replay(ce);
	} else {
		drm_info(&guc_to_gt(guc)->i915->drm,
			 "Ignoring context reset notification of banned context 0x%04X on %s",
			 ce->guc_id.id, ce->engine->name);
	}
}

int intel_guc_context_reset_process_msg(struct intel_guc *guc,
					const u32 *msg, u32 len)
{
	struct intel_context *ce;
	unsigned long flags;
	int ctx_id;

	if (unlikely(len != 1)) {
		drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
		return -EPROTO;
	}

	ctx_id = msg[0];

	/*
	 * The context lookup uses the xarray but lookups only require an RCU lock
	 * not the full spinlock. So take the lock explicitly and keep it until the
	 * context has been reference count locked to ensure it can't be destroyed
	 * asynchronously until the reset is done.
	 */
	xa_lock_irqsave(&guc->context_lookup, flags);
	ce = g2h_context_lookup(guc, ctx_id);
	if (ce)
		intel_context_get(ce);
	xa_unlock_irqrestore(&guc->context_lookup, flags);

	if (unlikely(!ce))
		return -EPROTO;

	guc_handle_context_reset(guc, ce);
	intel_context_put(ce);

	return 0;
}

int intel_guc_error_capture_process_msg(struct intel_guc *guc,
					const u32 *msg, u32 len)
{
	u32 status;

	if (unlikely(len != 1)) {
		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
		return -EPROTO;
	}

	status = msg[0] & INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_MASK;
	if (status == INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_NOSPACE)
		drm_warn(&guc_to_gt(guc)->i915->drm, "G2H-Error capture no space");

	intel_guc_capture_process(guc);

	return 0;
}

struct intel_engine_cs *
intel_guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance)
{
	struct intel_gt *gt = guc_to_gt(guc);
	u8 engine_class = guc_class_to_engine_class(guc_class);

	/* Class index is checked in class converter */
	GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE);

	return gt->engine_class[engine_class][instance];
}

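/*
 * Engine-reset failures reported by the GuC are collected into a mask under
 * the submission lock and handed to intel_gt_handle_error() from this worker,
 * since the G2H handler itself runs on a queue that a GT reset would flush.
 */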
static void reset_fail_worker_func(struct work_struct *w)
{
	struct intel_guc *guc = container_of(w, struct intel_guc,
					     submission_state.reset_fail_worker);
	struct intel_gt *gt = guc_to_gt(guc);
	intel_engine_mask_t reset_fail_mask;
	unsigned long flags;

	spin_lock_irqsave(&guc->submission_state.lock, flags);
	reset_fail_mask = guc->submission_state.reset_fail_mask;
	guc->submission_state.reset_fail_mask = 0;
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);

	if (likely(reset_fail_mask))
		intel_gt_handle_error(gt, reset_fail_mask,
				      I915_ERROR_CAPTURE,
				      "GuC failed to reset engine mask=0x%x\n",
				      reset_fail_mask);
}

int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
					 const u32 *msg, u32 len)
{
	struct intel_engine_cs *engine;
	struct intel_gt *gt = guc_to_gt(guc);
	u8 guc_class, instance;
	u32 reason;
	unsigned long flags;

	if (unlikely(len != 3)) {
		drm_err(&gt->i915->drm, "Invalid length %u", len);
		return -EPROTO;
	}

	guc_class = msg[0];
	instance = msg[1];
	reason = msg[2];

	engine = intel_guc_lookup_engine(guc, guc_class, instance);
	if (unlikely(!engine)) {
		drm_err(&gt->i915->drm,
			"Invalid engine %d:%d", guc_class, instance);
		return -EPROTO;
	}

	/*
	 * This is an unexpected failure of a hardware feature. So, log a real
	 * error message not just the informational that comes with the reset.
	 */
	drm_err(&gt->i915->drm, "GuC engine reset request failed on %d:%d (%s) because 0x%08X",
		guc_class, instance, engine->name, reason);

	spin_lock_irqsave(&guc->submission_state.lock, flags);
	guc->submission_state.reset_fail_mask |= engine->mask;
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);

	/*
	 * A GT reset flushes this worker queue (G2H handler) so we must use
	 * another worker to trigger a GT reset.
	 */
	queue_work(system_unbound_wq, &guc->submission_state.reset_fail_worker);

	return 0;
}

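/*
 * Walk the context_lookup xarray looking for a context with an active request
 * on the given engine; used on the error capture path when the hung context
 * is not already known.
 */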
void intel_guc_find_hung_context(struct intel_engine_cs *engine)
{
	struct intel_guc *guc = &engine->gt->uc.guc;
	struct intel_context *ce;
	struct i915_request *rq;
	unsigned long index;
	unsigned long flags;

	/* Reset called during driver load? GuC not yet initialised! */
	if (unlikely(!guc_submission_initialized(guc)))
		return;

	xa_lock_irqsave(&guc->context_lookup, flags);
	xa_for_each(&guc->context_lookup, index, ce) {
		if (!kref_get_unless_zero(&ce->ref))
			continue;

		xa_unlock(&guc->context_lookup);

		if (!intel_context_is_pinned(ce))
			goto next;

		if (intel_engine_is_virtual(ce->engine)) {
			if (!(ce->engine->mask & engine->mask))
				goto next;
		} else {
			if (ce->engine != engine)
				goto next;
		}

		list_for_each_entry(rq, &ce->guc_state.requests, sched.link) {
			if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE)
				continue;

			intel_engine_set_hung_context(engine, ce);

			/* Can only cope with one hang at a time... */
			intel_context_put(ce);
			xa_lock(&guc->context_lookup);
			goto done;
		}
next:
		intel_context_put(ce);
		xa_lock(&guc->context_lookup);
	}
done:
	xa_unlock_irqrestore(&guc->context_lookup, flags);
}

void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
				    struct i915_request *hung_rq,
				    struct drm_printer *m)
{
	struct intel_guc *guc = &engine->gt->uc.guc;
	struct intel_context *ce;
	unsigned long index;
	unsigned long flags;

	/* Reset called during driver load? GuC not yet initialised! */
	if (unlikely(!guc_submission_initialized(guc)))
		return;

	xa_lock_irqsave(&guc->context_lookup, flags);
	xa_for_each(&guc->context_lookup, index, ce) {
		if (!kref_get_unless_zero(&ce->ref))
			continue;

		xa_unlock(&guc->context_lookup);

		if (!intel_context_is_pinned(ce))
			goto next;

		if (intel_engine_is_virtual(ce->engine)) {
			if (!(ce->engine->mask & engine->mask))
				goto next;
		} else {
			if (ce->engine != engine)
				goto next;
		}

		spin_lock(&ce->guc_state.lock);
		intel_engine_dump_active_requests(&ce->guc_state.requests,
						  hung_rq, m);
		spin_unlock(&ce->guc_state.lock);

next:
		intel_context_put(ce);
		xa_lock(&guc->context_lookup);
	}
	xa_unlock_irqrestore(&guc->context_lookup, flags);
}

void intel_guc_submission_print_info(struct intel_guc *guc,
				     struct drm_printer *p)
{
	struct i915_sched_engine *sched_engine = guc->sched_engine;
	struct rb_node *rb;
	unsigned long flags;

	if (!sched_engine)
		return;

	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
		   atomic_read(&guc->outstanding_submission_g2h));
	drm_printf(p, "GuC tasklet count: %u\n\n",
		   atomic_read(&sched_engine->tasklet.count));

	spin_lock_irqsave(&sched_engine->lock, flags);
	drm_printf(p, "Requests in GuC submit tasklet:\n");
	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
		struct i915_priolist *pl = to_priolist(rb);
		struct i915_request *rq;

		priolist_for_each_request(rq, pl)
			drm_printf(p, "guc_id=%u, seqno=%llu\n",
				   rq->context->guc_id.id,
				   rq->fence.seqno);
	}
	spin_unlock_irqrestore(&sched_engine->lock, flags);
	drm_printf(p, "\n");
}

static inline void guc_log_context_priority(struct drm_printer *p,
					    struct intel_context *ce)
{
	int i;

	drm_printf(p, "\t\tPriority: %d\n", ce->guc_state.prio);
	drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n");
	for (i = GUC_CLIENT_PRIORITY_KMD_HIGH;
	     i < GUC_CLIENT_PRIORITY_NUM; ++i) {
		drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n",
			   i, ce->guc_state.prio_count[i]);
	}
	drm_printf(p, "\n");
}

static inline void guc_log_context(struct drm_printer *p,
				   struct intel_context *ce)
{
	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id.id);
	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
		   ce->ring->head,
		   ce->lrc_reg_state[CTX_RING_HEAD]);
	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
		   ce->ring->tail,
		   ce->lrc_reg_state[CTX_RING_TAIL]);
	drm_printf(p, "\t\tContext Pin Count: %u\n",
		   atomic_read(&ce->pin_count));
	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
		   atomic_read(&ce->guc_id.ref));
	drm_printf(p, "\t\tSchedule State: 0x%x\n\n",
		   ce->guc_state.sched_state);
}

void intel_guc_submission_print_context_info(struct intel_guc *guc,
					     struct drm_printer *p)
{
	struct intel_context *ce;
	unsigned long index;
	unsigned long flags;

	xa_lock_irqsave(&guc->context_lookup, flags);
	xa_for_each(&guc->context_lookup, index, ce) {
		GEM_BUG_ON(intel_context_is_child(ce));

		guc_log_context(p, ce);
		guc_log_context_priority(p, ce);

		if (intel_context_is_parent(ce)) {
			struct intel_context *child;

			drm_printf(p, "\t\tNumber children: %u\n",
				   ce->parallel.number_children);

			if (ce->parallel.guc.wq_status) {
				drm_printf(p, "\t\tWQI Head: %u\n",
					   READ_ONCE(*ce->parallel.guc.wq_head));
				drm_printf(p, "\t\tWQI Tail: %u\n",
					   READ_ONCE(*ce->parallel.guc.wq_tail));
				drm_printf(p, "\t\tWQI Status: %u\n\n",
					   READ_ONCE(*ce->parallel.guc.wq_status));
			}

			if (ce->engine->emit_bb_start ==
			    emit_bb_start_parent_no_preempt_mid_batch) {
				u8 i;

				drm_printf(p, "\t\tChildren Go: %u\n\n",
					   get_children_go_value(ce));
				for (i = 0; i < ce->parallel.number_children; ++i)
					drm_printf(p, "\t\tChildren Join: %u\n",
						   get_children_join_value(ce, i));
			}

			for_each_child(ce, child)
				guc_log_context(p, child);
		}
	}
	xa_unlock_irqrestore(&guc->context_lookup, flags);
}

2021-10-14 10:19:59 -07:00
|
|
|
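/*
 * GGTT address of the "go" semaphore the parent uses to release its children;
 * it lives in the parent context's scratch page.
 */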
static inline u32 get_children_go_addr(struct intel_context *ce)
|
|
|
|
|
{
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_parent(ce));
|
|
|
|
|
|
|
|
|
|
return i915_ggtt_offset(ce->state) +
|
|
|
|
|
__get_parent_scratch_offset(ce) +
|
|
|
|
|
offsetof(struct parent_scratch, go.semaphore);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
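/* GGTT address of a given child's "join" semaphore in the parent's scratch page. */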
static inline u32 get_children_join_addr(struct intel_context *ce,
|
|
|
|
|
u8 child_index)
|
|
|
|
|
{
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_parent(ce));
|
|
|
|
|
|
|
|
|
|
return i915_ggtt_offset(ce->state) +
|
|
|
|
|
__get_parent_scratch_offset(ce) +
|
|
|
|
|
offsetof(struct parent_scratch, join[child_index].semaphore);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
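/*
 * Semaphore values for the parent/child handshake: children signal PARENT_GO_*
 * on their join semaphores and wait for the parent to write CHILD_GO_* to the
 * shared go semaphore, once at BB start and once in the fini breadcrumb.
 */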
#define PARENT_GO_BB 1
|
|
|
|
|
#define PARENT_GO_FINI_BREADCRUMB 0
|
|
|
|
|
#define CHILD_GO_BB 1
|
|
|
|
|
#define CHILD_GO_FINI_BREADCRUMB 0
|
|
|
|
|
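/*
 * Parent BB start: wait for every child to join, disable preemption across the
 * batches, release the children and then jump to the parent batch.
 */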
static int emit_bb_start_parent_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u64 offset, u32 len,
|
|
|
|
|
const unsigned int flags)
|
|
|
|
|
{
|
|
|
|
|
struct intel_context *ce = rq->context;
|
|
|
|
|
u32 *cs;
|
|
|
|
|
u8 i;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_parent(ce));
|
|
|
|
|
|
|
|
|
|
cs = intel_ring_begin(rq, 10 + 4 * ce->parallel.number_children);
|
|
|
|
|
if (IS_ERR(cs))
|
|
|
|
|
return PTR_ERR(cs);
|
|
|
|
|
|
|
|
|
|
/* Wait on children */
|
|
|
|
|
for (i = 0; i < ce->parallel.number_children; ++i) {
|
|
|
|
|
*cs++ = (MI_SEMAPHORE_WAIT |
|
|
|
|
|
MI_SEMAPHORE_GLOBAL_GTT |
|
|
|
|
|
MI_SEMAPHORE_POLL |
|
|
|
|
|
MI_SEMAPHORE_SAD_EQ_SDD);
|
|
|
|
|
*cs++ = PARENT_GO_BB;
|
|
|
|
|
*cs++ = get_children_join_addr(ce, i);
|
|
|
|
|
*cs++ = 0;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Turn off preemption */
|
|
|
|
|
*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
|
|
|
|
|
*cs++ = MI_NOOP;
|
|
|
|
|
|
|
|
|
|
/* Tell children go */
|
|
|
|
|
cs = gen8_emit_ggtt_write(cs,
|
|
|
|
|
CHILD_GO_BB,
|
|
|
|
|
get_children_go_addr(ce),
|
|
|
|
|
0);
|
|
|
|
|
|
|
|
|
|
/* Jump to batch */
|
|
|
|
|
*cs++ = MI_BATCH_BUFFER_START_GEN8 |
|
|
|
|
|
(flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
|
|
|
|
|
*cs++ = lower_32_bits(offset);
|
|
|
|
|
*cs++ = upper_32_bits(offset);
|
|
|
|
|
*cs++ = MI_NOOP;
|
|
|
|
|
|
|
|
|
|
intel_ring_advance(rq, cs);
|
|
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
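/*
 * Child BB start: signal the parent via this child's join semaphore, wait for
 * the parent's go, disable preemption and jump to the child batch.
 */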
static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u64 offset, u32 len,
|
|
|
|
|
const unsigned int flags)
|
|
|
|
|
{
|
|
|
|
|
struct intel_context *ce = rq->context;
|
|
|
|
|
struct intel_context *parent = intel_context_to_parent(ce);
|
|
|
|
|
u32 *cs;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_child(ce));
|
|
|
|
|
|
|
|
|
|
cs = intel_ring_begin(rq, 12);
|
|
|
|
|
if (IS_ERR(cs))
|
|
|
|
|
return PTR_ERR(cs);
|
|
|
|
|
|
|
|
|
|
/* Signal parent */
|
|
|
|
|
cs = gen8_emit_ggtt_write(cs,
|
|
|
|
|
PARENT_GO_BB,
|
|
|
|
|
get_children_join_addr(parent,
|
|
|
|
|
ce->parallel.child_index),
|
|
|
|
|
0);
|
|
|
|
|
|
|
|
|
|
/* Wait on parent for go */
|
|
|
|
|
*cs++ = (MI_SEMAPHORE_WAIT |
|
|
|
|
|
MI_SEMAPHORE_GLOBAL_GTT |
|
|
|
|
|
MI_SEMAPHORE_POLL |
|
|
|
|
|
MI_SEMAPHORE_SAD_EQ_SDD);
|
|
|
|
|
*cs++ = CHILD_GO_BB;
|
|
|
|
|
*cs++ = get_children_go_addr(parent);
|
|
|
|
|
*cs++ = 0;
|
|
|
|
|
|
|
|
|
|
/* Turn off preemption */
|
|
|
|
|
*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
|
|
|
|
|
|
|
|
|
|
/* Jump to batch */
|
|
|
|
|
*cs++ = MI_BATCH_BUFFER_START_GEN8 |
|
|
|
|
|
(flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
|
|
|
|
|
*cs++ = lower_32_bits(offset);
|
|
|
|
|
*cs++ = upper_32_bits(offset);
|
|
|
|
|
|
|
|
|
|
intel_ring_advance(rq, cs);
|
|
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
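/*
 * Parent fini breadcrumb handshake: wait for every child to reach its fini
 * breadcrumb, re-enable preemption and then release the children.
 */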
static u32 *
|
2021-10-14 10:20:01 -07:00
|
|
|
__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u32 *cs)
|
2021-10-14 10:19:59 -07:00
|
|
|
{
|
|
|
|
|
struct intel_context *ce = rq->context;
|
|
|
|
|
u8 i;
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_parent(ce));
|
|
|
|
|
|
|
|
|
|
/* Wait on children */
|
|
|
|
|
for (i = 0; i < ce->parallel.number_children; ++i) {
|
|
|
|
|
*cs++ = (MI_SEMAPHORE_WAIT |
|
|
|
|
|
MI_SEMAPHORE_GLOBAL_GTT |
|
|
|
|
|
MI_SEMAPHORE_POLL |
|
|
|
|
|
MI_SEMAPHORE_SAD_EQ_SDD);
|
|
|
|
|
*cs++ = PARENT_GO_FINI_BREADCRUMB;
|
|
|
|
|
*cs++ = get_children_join_addr(ce, i);
|
|
|
|
|
*cs++ = 0;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Turn on preemption */
|
|
|
|
|
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
|
|
|
|
|
*cs++ = MI_NOOP;
|
|
|
|
|
|
|
|
|
|
/* Tell children go */
|
|
|
|
|
cs = gen8_emit_ggtt_write(cs,
|
|
|
|
|
CHILD_GO_FINI_BREADCRUMB,
|
|
|
|
|
get_children_go_addr(ce),
|
|
|
|
|
0);
|
|
|
|
|
|
2021-10-14 10:20:01 -07:00
|
|
|
return cs;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If this is true, a submission of multi-lrc requests had an error and the
|
|
|
|
|
* requests need to be skipped. The front end (execbuf IOCTL) should've called
|
|
|
|
|
* i915_request_skip which squashes the BB but we still need to emit the fini
|
|
|
|
|
* breadcrumb seqno write. At this point we don't know how many of the
|
|
|
|
|
* requests in the multi-lrc submission were generated so we can't do the
|
|
|
|
|
* handshake between the parent and children (e.g. if 4 requests should be
|
|
|
|
|
* generated but the 2nd hit an error, only 1 would be seen by the GuC backend).
|
|
|
|
|
* Simply skip the handshake, but still emit the breadcrumb seqno, if an error
|
|
|
|
|
* has occurred on any of the requests in submission / relationship.
|
|
|
|
|
*/
|
|
|
|
|
static inline bool skip_handshake(struct i915_request *rq)
|
|
|
|
|
{
|
|
|
|
|
return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
|
|
|
|
|
}
|
|
|
|
|
|
2022-01-19 13:06:39 -08:00
|
|
|
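/*
 * Number of dwords that must always be emitted after the handshake (the fini
 * breadcrumb seqno write and the user interrupt), even when the handshake
 * itself is skipped.
 */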
#define NON_SKIP_LEN 6
|
2021-10-14 10:20:01 -07:00
|
|
|
static u32 *
|
|
|
|
|
emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u32 *cs)
|
|
|
|
|
{
|
|
|
|
|
struct intel_context *ce = rq->context;
|
2022-01-19 13:06:39 -08:00
|
|
|
__maybe_unused u32 *before_fini_breadcrumb_user_interrupt_cs;
|
|
|
|
|
__maybe_unused u32 *start_fini_breadcrumb_cs = cs;
|
2021-10-14 10:20:01 -07:00
|
|
|
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_parent(ce));
|
|
|
|
|
|
|
|
|
|
if (unlikely(skip_handshake(rq))) {
|
|
|
|
|
/*
|
|
|
|
|
* NOP everything in __emit_fini_breadcrumb_parent_no_preempt_mid_batch;
|
2022-01-19 13:06:39 -08:00
|
|
|
* the NON_SKIP_LEN comes from the length of the emits below.
|
2021-10-14 10:20:01 -07:00
|
|
|
*/
|
|
|
|
|
memset(cs, 0, sizeof(u32) *
|
2022-01-19 13:06:39 -08:00
|
|
|
(ce->engine->emit_fini_breadcrumb_dw - NON_SKIP_LEN));
|
|
|
|
|
cs += ce->engine->emit_fini_breadcrumb_dw - NON_SKIP_LEN;
|
2021-10-14 10:20:01 -07:00
|
|
|
} else {
|
|
|
|
|
cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
|
|
|
|
|
}
|
|
|
|
|
|
2021-10-14 10:19:59 -07:00
|
|
|
/* Emit fini breadcrumb */
|
2022-01-19 13:06:39 -08:00
|
|
|
before_fini_breadcrumb_user_interrupt_cs = cs;
|
2021-10-14 10:19:59 -07:00
|
|
|
cs = gen8_emit_ggtt_write(cs,
|
|
|
|
|
rq->fence.seqno,
|
|
|
|
|
i915_request_active_timeline(rq)->hwsp_offset,
|
|
|
|
|
0);
|
|
|
|
|
|
|
|
|
|
/* User interrupt */
|
|
|
|
|
*cs++ = MI_USER_INTERRUPT;
|
|
|
|
|
*cs++ = MI_NOOP;
|
|
|
|
|
|
2022-01-19 13:06:39 -08:00
|
|
|
/* Ensure our math for skip + emit is correct */
|
|
|
|
|
GEM_BUG_ON(before_fini_breadcrumb_user_interrupt_cs + NON_SKIP_LEN !=
|
|
|
|
|
cs);
|
|
|
|
|
GEM_BUG_ON(start_fini_breadcrumb_cs +
|
|
|
|
|
ce->engine->emit_fini_breadcrumb_dw != cs);
|
|
|
|
|
|
2021-10-14 10:19:59 -07:00
|
|
|
rq->tail = intel_ring_offset(rq, cs);
|
|
|
|
|
|
|
|
|
|
return cs;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
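/*
 * Child fini breadcrumb handshake: re-enable preemption, signal the parent that
 * this child is done and wait for the parent's go.
 */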
static u32 *
|
2021-10-14 10:20:01 -07:00
|
|
|
__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u32 *cs)
|
2021-10-14 10:19:59 -07:00
|
|
|
{
|
|
|
|
|
struct intel_context *ce = rq->context;
|
|
|
|
|
struct intel_context *parent = intel_context_to_parent(ce);
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_child(ce));
|
|
|
|
|
|
|
|
|
|
/* Turn on preemption */
|
|
|
|
|
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
|
|
|
|
|
*cs++ = MI_NOOP;
|
|
|
|
|
|
|
|
|
|
/* Signal parent */
|
|
|
|
|
cs = gen8_emit_ggtt_write(cs,
|
|
|
|
|
PARENT_GO_FINI_BREADCRUMB,
|
|
|
|
|
get_children_join_addr(parent,
|
|
|
|
|
ce->parallel.child_index),
|
|
|
|
|
0);
|
|
|
|
|
|
|
|
|
|
/* Wait on parent for go */
|
|
|
|
|
*cs++ = (MI_SEMAPHORE_WAIT |
|
|
|
|
|
MI_SEMAPHORE_GLOBAL_GTT |
|
|
|
|
|
MI_SEMAPHORE_POLL |
|
|
|
|
|
MI_SEMAPHORE_SAD_EQ_SDD);
|
|
|
|
|
*cs++ = CHILD_GO_FINI_BREADCRUMB;
|
|
|
|
|
*cs++ = get_children_go_addr(parent);
|
|
|
|
|
*cs++ = 0;
|
|
|
|
|
|
2021-10-14 10:20:01 -07:00
|
|
|
return cs;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static u32 *
|
|
|
|
|
emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
|
|
|
|
|
u32 *cs)
|
|
|
|
|
{
|
|
|
|
|
struct intel_context *ce = rq->context;
|
2022-01-19 13:06:39 -08:00
|
|
|
__maybe_unused u32 *before_fini_breadcrumb_user_interrupt_cs;
|
|
|
|
|
__maybe_unused u32 *start_fini_breadcrumb_cs = cs;
|
2021-10-14 10:20:01 -07:00
|
|
|
|
|
|
|
|
GEM_BUG_ON(!intel_context_is_child(ce));
|
|
|
|
|
|
|
|
|
|
if (unlikely(skip_handshake(rq))) {
|
|
|
|
|
/*
|
|
|
|
|
* NOP everything in __emit_fini_breadcrumb_child_no_preempt_mid_batch;
|
2022-01-19 13:06:39 -08:00
|
|
|
* the NON_SKIP_LEN comes from the length of the emits below.
|
2021-10-14 10:20:01 -07:00
|
|
|
*/
|
|
|
|
|
memset(cs, 0, sizeof(u32) *
|
2022-01-19 13:06:39 -08:00
|
|
|
(ce->engine->emit_fini_breadcrumb_dw - NON_SKIP_LEN));
|
|
|
|
|
cs += ce->engine->emit_fini_breadcrumb_dw - NON_SKIP_LEN;
|
2021-10-14 10:20:01 -07:00
|
|
|
} else {
|
|
|
|
|
cs = __emit_fini_breadcrumb_child_no_preempt_mid_batch(rq, cs);
|
|
|
|
|
}
|
|
|
|
|
|
2021-10-14 10:19:59 -07:00
|
|
|
/* Emit fini breadcrumb */
|
2022-01-19 13:06:39 -08:00
|
|
|
before_fini_breadcrumb_user_interrupt_cs = cs;
|
2021-10-14 10:19:59 -07:00
|
|
|
cs = gen8_emit_ggtt_write(cs,
|
|
|
|
|
rq->fence.seqno,
|
|
|
|
|
i915_request_active_timeline(rq)->hwsp_offset,
|
|
|
|
|
0);
|
|
|
|
|
|
|
|
|
|
/* User interrupt */
|
|
|
|
|
*cs++ = MI_USER_INTERRUPT;
|
|
|
|
|
*cs++ = MI_NOOP;
|
|
|
|
|
|
2022-01-19 13:06:39 -08:00
|
|
|
/* Ensure our math for skip + emit is correct */
|
|
|
|
|
GEM_BUG_ON(before_fini_breadcrumb_user_interrupt_cs + NON_SKIP_LEN !=
|
|
|
|
|
cs);
|
|
|
|
|
GEM_BUG_ON(start_fini_breadcrumb_cs +
|
|
|
|
|
ce->engine->emit_fini_breadcrumb_dw != cs);
|
|
|
|
|
|
2021-10-14 10:19:59 -07:00
|
|
|
rq->tail = intel_ring_offset(rq, cs);
|
|
|
|
|
|
|
|
|
|
return cs;
|
|
|
|
|
}
|
|
|
|
|
|
2022-01-19 13:06:39 -08:00
|
|
|
#undef NON_SKIP_LEN
|
|
|
|
|
|
2021-07-26 17:23:16 -07:00
|
|
|
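/*
 * Create a GuC virtual engine load balancing across the given physical
 * siblings; all siblings must share an engine class and the engine properties
 * are taken from the first sibling.
 */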
static struct intel_context *
|
2021-10-14 10:19:56 -07:00
|
|
|
guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
|
|
|
|
|
unsigned long flags)
|
2021-07-26 17:23:16 -07:00
|
|
|
{
|
|
|
|
|
struct guc_virtual_engine *ve;
|
|
|
|
|
struct intel_guc *guc;
|
|
|
|
|
unsigned int n;
|
|
|
|
|
int err;
|
|
|
|
|
|
|
|
|
|
ve = kzalloc(sizeof(*ve), GFP_KERNEL);
|
|
|
|
|
if (!ve)
|
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
|
|
|
|
|
guc = &siblings[0]->gt->uc.guc;
|
|
|
|
|
|
|
|
|
|
ve->base.i915 = siblings[0]->i915;
|
|
|
|
|
ve->base.gt = siblings[0]->gt;
|
|
|
|
|
ve->base.uncore = siblings[0]->uncore;
|
|
|
|
|
ve->base.id = -1;
|
|
|
|
|
|
|
|
|
|
ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
|
|
|
|
|
ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
|
|
|
|
|
ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
|
|
|
|
|
ve->base.saturated = ALL_ENGINES;
|
|
|
|
|
|
|
|
|
|
snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
|
|
|
|
|
|
|
|
|
|
ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
|
|
|
|
|
|
|
|
|
|
ve->base.cops = &virtual_guc_context_ops;
|
|
|
|
|
ve->base.request_alloc = guc_request_alloc;
|
2021-07-26 17:23:17 -07:00
|
|
|
ve->base.bump_serial = virtual_guc_bump_serial;
|
2021-07-26 17:23:16 -07:00
|
|
|
|
|
|
|
|
ve->base.submit_request = guc_submit_request;
|
|
|
|
|
|
|
|
|
|
ve->base.flags = I915_ENGINE_IS_VIRTUAL;
|
|
|
|
|
|
|
|
|
|
intel_context_init(&ve->context, &ve->base);
|
|
|
|
|
|
|
|
|
|
for (n = 0; n < count; n++) {
|
|
|
|
|
struct intel_engine_cs *sibling = siblings[n];
|
|
|
|
|
|
|
|
|
|
GEM_BUG_ON(!is_power_of_2(sibling->mask));
|
|
|
|
|
if (sibling->mask & ve->base.mask) {
|
|
|
|
|
DRM_DEBUG("duplicate %s entry in load balancer\n",
|
|
|
|
|
sibling->name);
|
|
|
|
|
err = -EINVAL;
|
|
|
|
|
goto err_put;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
ve->base.mask |= sibling->mask;
|
2021-10-14 10:19:45 -07:00
|
|
|
ve->base.logical_mask |= sibling->logical_mask;
|
2021-07-26 17:23:16 -07:00
|
|
|
|
|
|
|
|
if (n != 0 && ve->base.class != sibling->class) {
|
|
|
|
|
DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
|
|
|
|
|
sibling->class, ve->base.class);
|
|
|
|
|
err = -EINVAL;
|
|
|
|
|
goto err_put;
|
|
|
|
|
} else if (n == 0) {
|
|
|
|
|
ve->base.class = sibling->class;
|
|
|
|
|
ve->base.uabi_class = sibling->uabi_class;
|
|
|
|
|
snprintf(ve->base.name, sizeof(ve->base.name),
|
|
|
|
|
"v%dx%d", ve->base.class, count);
|
|
|
|
|
ve->base.context_size = sibling->context_size;
|
|
|
|
|
|
2021-07-26 17:23:22 -07:00
|
|
|
ve->base.add_active_request =
|
|
|
|
|
sibling->add_active_request;
|
|
|
|
|
ve->base.remove_active_request =
|
|
|
|
|
sibling->remove_active_request;
|
2021-07-26 17:23:16 -07:00
|
|
|
ve->base.emit_bb_start = sibling->emit_bb_start;
|
|
|
|
|
ve->base.emit_flush = sibling->emit_flush;
|
|
|
|
|
ve->base.emit_init_breadcrumb =
|
|
|
|
|
sibling->emit_init_breadcrumb;
|
|
|
|
|
ve->base.emit_fini_breadcrumb =
|
|
|
|
|
sibling->emit_fini_breadcrumb;
|
|
|
|
|
ve->base.emit_fini_breadcrumb_dw =
|
|
|
|
|
sibling->emit_fini_breadcrumb_dw;
|
2021-07-26 17:23:20 -07:00
|
|
|
ve->base.breadcrumbs =
|
|
|
|
|
intel_breadcrumbs_get(sibling->breadcrumbs);
|
2021-07-26 17:23:16 -07:00
|
|
|
|
|
|
|
|
ve->base.flags |= sibling->flags;
|
|
|
|
|
|
|
|
|
|
ve->base.props.timeslice_duration_ms =
|
|
|
|
|
sibling->props.timeslice_duration_ms;
|
|
|
|
|
ve->base.props.preempt_timeout_ms =
|
|
|
|
|
sibling->props.preempt_timeout_ms;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return &ve->context;
|
|
|
|
|
|
|
|
|
|
err_put:
|
|
|
|
|
intel_context_put(&ve->context);
|
|
|
|
|
return ERR_PTR(err);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
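/*
 * Returns true if any physical engine backing the virtual engine has a
 * non-zero heartbeat interval configured.
 */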
bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
|
|
|
|
|
{
|
|
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
|
intel_engine_mask_t tmp, mask = ve->mask;
|
|
|
|
|
|
|
|
|
|
for_each_engine_masked(engine, ve->gt, mask, tmp)
|
|
|
|
|
if (READ_ONCE(engine->props.heartbeat_interval_ms))
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
2021-09-09 09:47:32 -07:00
|
|
|
|
|
|
|
|
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
|
|
|
|
#include "selftest_guc.c"
|
2021-10-14 10:19:58 -07:00
|
|
|
#include "selftest_guc_multi_lrc.c"
|
2021-09-09 09:47:32 -07:00
|
|
|
#endif
|