If we find ourselves waiting on a MI_SEMAPHORE_WAIT, either within the
user batch or in our own preamble, the engine raises a
GT_WAIT_ON_SEMAPHORE interrupt. We can unmask that interrupt and so
respond to a semaphore wait by yielding the timeslice, if we have
another context to yield to!
The only real complication is that the interrupt is only generated for
the start of the semaphore wait, and is asynchronous to our
process_csb() -- that is, we may not have registered the timeslice before
we see the interrupt. To ensure we don't miss a potential semaphore
blocking forward progress (e.g. selftests/live_timeslice_preempt) we mark
the interrupt and apply it to the next timeslice regardless of whether it
was active at the time.
v2: We use semaphores in preempt-to-busy, within the timeslicing
implementation itself! Ergo, when we do insert a preemption due to an
expired timeslice, the new context may start with the missed semaphore
flagged by the retired context and be yielded, ad infinitum. To avoid
this, read the context id at the time of the semaphore interrupt and
only yield if that context is still active.
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200407130811.17321-1-chris@chris-wilson.co.uk
GPU saves accumulated context runtime (in CS timestamp units) in PPHWSP
which will be useful for us in cases when we are not able to track context
busyness ourselves (like with GuC). Keep a copy of this in struct
intel_context from where it can be easily read even if the context is not
pinned.
v2:
(Chris)
* Do not store pphwsp address in intel_context.
* Log CS wrap-around.
* Simplify calculation by relying on integer wraparound.
v3:
* Include total/avg in traces and error state for debugging
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200216133620.394962-1-chris@chris-wilson.co.uk
Originally, I did not expect having to rewind a context upon
timeslicing: the point was to replace the executing context with a
non-executing one! However, given a second context that depends on
requests from the first, we may have to split the requests along the
first context to execute the second, causing us to partially replay the
first context and so have to rewind its RING_TAIL.
References: 5ba32c7be8 ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200213140150.3639027-1-chris@chris-wilson.co.uk
Currently on execlists, we use a local hwsp for the kernel_context,
rather than the engine's HWSP, as this is the default for execlists.
However, seqno wrap requires allocating a new HWSP cacheline, and may
require pinning a new HWSP page in the GGTT. This operation requiring
pinning in the GGTT is not allowed within the kernel_context timeline,
as doing so may require re-entering the kernel_context in order to evict
from the GGTT. As we want to avoid requiring a new HWSP for the
kernel_context, we can use the permanently pinned engine's HWSP instead.
However to do so we must prevent the use of semaphores reading the
kernel_context's HWSP, as the use of semaphores do not support rollover
onto the same cacheline. Fortunately, the kernel_context is mostly
isolated, so unlikely to give benefit to semaphores.
Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200210205722.794180-5-chris@chris-wilson.co.uk
Now that we have offline error capture and can reset an engine from
inside an atomic context while also preserving the GPU state for
post-mortem analysis, it is time to handle error interrupts thrown by
the command parser.
This provides a much, much faster mechanism for us to detect known
problems than using heartbeats/hangchecks, and also provides a mechanism
for when those are disabled. However, it is limited to problems the HW
can detect in the CS and so not a complete solution for detecting lockups.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200128204318.4182039-2-chris@chris-wilson.co.uk
If we encounter a hang on a virtual engine, as we process the hang the
request may already have been moved back to the virtual engine (we are
processing the hang on the physical engine). We need to reclaim the
request from the virtual engine so that the locking is consistent and
local to the real engine on which we will hold the request for error
state capturing.
v2: Pull the reclamation into execlists_hold() and assert that cannot be
called from outside of the reset (i.e. with the tasklet disabled).
v3: Added selftest
v4: Drop the reference owned by the virtual engine
Fixes: 748317386a ("drm/i915/execlists: Offline error capture")
Testcase: igt/gem_exec_balancer/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk
Thanks to preempt-to-busy, we leave the request on the HW as we submit
the preemption request. This means that the request may complete at any
moment as we process HW events, and in particular the request may be
retired as we are planning to capture it for a preemption timeout.
Be more careful while obtaining the request to capture after a
preemption timeout, and check to see if it completed before we were able
to put it on the on-hold list. If we do see it did complete just before
we capture the request, proclaim the preemption-timeout a false positive
and pardon the reset as we should hit an arbitration point momentarily
and so be able to process the preemption.
Note that even after we move the request to be on hold it may be retired
(as the reset to stop the HW comes after), so we do require to hold our
own reference as we work on the request for capture (and all of the
peeking at state within the request needs to be carefully protected).
Fixes: 32ff621fd7 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk
In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).
Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.
v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!
References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
Allocate only an internal intel_context for the kernel_context, forgoing
a global GEM context for internal use as we only require a separate
address space (for our own protection).
Now having weaned GT from requiring ce->gem_context, we can stop
referencing it entirely. This also means we no longer have to create random
and unnecessary GEM contexts for internal use.
GEM contexts are now entirely for tracking GEM clients, and intel_context
the execution environment on the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk
As the engine->kernel_context is used within the engine-pm barrier, we
have to be careful when emitting requests outside of the barrier, as the
strict timeline locking rules do not apply. Instead, we must ensure the
engine_park() cannot be entered as we build the request, which is
simplest by taking an explicit engine-pm wakeref around the request
construction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk
Bonded request submission is designed to allow requests to execute in
parallel as laid out by the user. If the master request is already
finished before its bonded pair is submitted, the pair were not destined
to run in parallel and we lose the information about the master engine
to dictate selection of the secondary. If the second request was
required to be run on a particular engine in a virtual set, that should
have been specified, rather than left to the whims of a random
unconnected requests!
In the selftest, I made the mistake of not ensuring the master would
overlap with its bonded pairs, meaning that it could indeed complete
before we submitted the bonds. Those bonds were then free to select any
available engine in their virtual set, and not the one expected by the
test.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122112152.660743-1-chris@chris-wilson.co.uk
Execlists uses a scheduling quantum (a timeslice) to alternate execution
between ready-to-run contexts of equal priority. This ensures that all
users (though only if they of equal importance) have the opportunity to
run and prevents livelocks where contexts may have implicit ordering due
to userspace semaphores. However, not all workloads necessarily benefit
from timeslicing and in the extreme some sysadmin may want to disable or
reduce the timeslicing granularity.
The timeslicing mechanism can be compiled out^W^W disabled (but should
DCE!) with
./scripts/config --set-val DRM_I915_TIMESLICE_DURATION 0
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191029091632.26281-1-chris@chris-wilson.co.uk