Commit 0f36a85c3b ("drm/i915: Flush pending interrupt following a GPU
reset") got confused and only applied the flush to the set-wedge path
(which itself is proving troublesome), but we also need the
serialisation on the regular reset path. Oops.
Move the interrupt into reset_irq() and make it common to the reset and
final set-wedge.
v2: reset_irq() after port cancellation, as we assert that
execlists->active is sane for cancellation (and is being reset by
reset_irq).
References: 0f36a85c3b ("drm/i915: Flush pending interrupt following a GPU reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180323101824.14645-1-chris@chris-wilson.co.uk
After resetting the GPU (or subset of engines), call synchronize_irq()
to flush any pending irq before proceeding with the cleanup. For a
device level reset, we disable the interupts around the reset, but when
resetting just one engine, we have to avoid such global disabling. This
leaves us open to an interrupt arriving for the engine as we try to
reset it. We already do try to flush the IIR following the reset, but we
have to ensure that the in-flight interrupt does not land after we start
cleaning up after the reset; enter synchronize_irq().
As it current stands, we very rarely, but fatally, see sequences such as:
2.... 57964564us : execlists_reset_prepare: rcs0
2.... 57964613us : execlists_reset: rcs0 seqno=424
0d.h1 57964615us : gen8_cs_irq_handler: rcs0 CS active=1
2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060
2.... 57964703us : execlists_reset_finish: rcs0
0..s. 57964705us : execlists_submission_tasklet: rcs0 awake?=1, active=0, irq-posted?=1
v2: Move the sync into the execlists reset handler so that we coordinate
the flush with disabling the interrupt handling and canceling the
pending interrupt.
v3: Just use synchronize_hardirq() to avoid the might_sleep(), we do not
yet have threaded-irq to worry about.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180322073533.5313-4-chris@chris-wilson.co.uk
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
We were relying on the uncached reads when processing the CSB to provide
ourselves with the serialisation with the interrupt handler (so we could
detect new interrupts in the middle of processing the old one). However,
in commit 767a983ab2 ("drm/i915/execlists: Read the context-status HEAD
from the HWSP") those uncached reads were eliminated (on one path at
least) and along with them our serialisation. The result is that we
would very rarely miss notification of a new interrupt and leave a
context-switch unprocessed, hanging the GPU.
Fixes: 767a983ab2 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180321091027.21034-1-chris@chris-wilson.co.uk
The only usage outside the intel_lrc.c file is in the ringbuffer
init, but the irq mask calculated there is then overwritten for
all engines that have a non-zero shift, so we can drop it.
This change is not aimed at code saving but at removing from
intel_engines information that does not apply to all gens that have
the engine. When checking without the temporary WARN_ON, code size
is basically unchanged.
v2: make the irq_shifts array static const
v3: rebase, move irq_shifts array to logical_ring_default_irqs
v4: move array inside the if and use u8 for it (Chris)
Suggested-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-4-daniele.ceraolospurio@intel.com
The "reset" value and the "keep" value are the same.
While we are here, add a TODO for gen11 interrupt reset
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-3-daniele.ceraolospurio@intel.com
There is one corner case missing schedule out notification of the preempted
request. The preempted request is just completed when preemption happen,
then it will be canceled and won't be resubmitted later, GVT-g will lost
the schedule out notification.
Here add schedule out notification if found the preempted request has been
completed.
v2:
- refine description, add completed check and notification in
execlists_cancel_port_requests. (Chris)
v3:
- use ternary confitional, remove local variable. (Tvrtko)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1520302557-25079-1-git-send-email-weinan.z.li@intel.com
Up to now, subslice mask was assumed to be uniform across slices. But
starting with Cannonlake, slices can be asymmetric (for example slice0
has different number of subslices as slice1+). This change stores all
subslices masks for all slices rather than having a single mask that
applies to all slices.
v2: Rework how we store total numbers in sseu_dev_info (Tvrtko)
Fix CHV eu masks, was reading disabled as enabled (Tvrtko)
Readability changes (Tvrtko)
Add EU index helper (Tvrtko)
v3: Turn ALIGN(v, 8) / 8 into DIV_ROUND_UP(v, BITS_PER_BYTE) (Tvrtko)
Reuse sseu_eu_idx() for setting eu_mask on CHV (Tvrtko)
Reformat debug prints for subslices (Tvrtko)
v4: Change eu_mask helper into sseu_set_eus() (Tvrtko)
v5: With Haswell reporting masks & counts, bump sseu_*_eus() functions
to use u16 (Lionel)
v6: Fix sseu_get_eus() for > 8 EUs per subslice (Lionel)
v7: Change debugfs enabels for number of subslices per slice, will
need a small igt/pm_sseu change (Lionel)
Drop subslice_total field from sseu_dev_info, rely on
sseu_subslice_total() to recompute the value instead (Lionel)
v8: Remove unused function compute_subslice_total() (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-2-lionel.g.landwerlin@intel.com
Enhanced Execlists is an upgraded version of execlists which supports
up to 8 ports. The lrcs to be submitted are written to a submit queue
(the ExecLists Submission Queue - ELSQ), which is then loaded on the
HW. When writing to the ELSP register, the lrcs are written cyclically
in the queue from position 0 to position 7. Alternatively, it is
possible to write directly in the individual positions of the queue
using the ELSQC registers. To be able to re-use all the existing code
we're using the latter method and we're currently limiting ourself to
only using 2 elements.
v2: Rebase.
v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio).
v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio)
v5: Reword commit, rename regs to be closer to specs, turn off
preemption (Daniele), reuse engine->execlists.elsp (Chris)
v6: use has_logical_ring_elsq to differentiate the new paths
v7: add preemption support, rename els to submit_reg (Chris)
v8: save the ctrl register inside the execlists struct, drop CSB
handling updates (superseded by preempt_complete_status) (Chris)
v9: s/drm_i915_gem_request/i915_request (Mika)
v10: resolved conflict in inject_preempt_context (Mika)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-4-mika.kuoppala@linux.intel.com
Starting from Gen11 the context descriptor format has been updated in
the HW. The hw_id field has been considerably reduced in size and engine
class and instance fields have been added.
There is a slight name clashing issue because the field that we call
hw_id is actually called SW Context ID in the specs for Gen11+.
With the current size of the hw_id field we can have a maximum of 2k
contexts at any time, but we could use the sw_counter field (which is sw
defined) to increase that because the HW requirement is that
engine_id + sw id + sw_counter is a unique number.
GuC uses a similar method to support more contexts but does its tracking
at lrc level. To avoid doing an implementation that will need to be
reworked once GuC support lands, defer it for now and mark it as TODO.
v2: rebased, add documentation, fix GEN11_ENGINE_INSTANCE_SHIFT
v3: rebased, bring back lost code from i915_gem_context.c
v4: make TODO comment more generic
v5: be consistent with bit ordering, add extra checks (Chris)
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-3-mika.kuoppala@linux.intel.com
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
On Gen11 interrupt masks need to be clear to allow C6 entry.
We keep them all enabled knowing that we generate extra
interrupts.
v2: Rebase.
v3: Remove gen 11 extra check in logical_render_ring_init.
v4: Rebase fixes.
v5: Rebase/refactor.
v6: Rebase.
v7: Rebase.
v8: Update comment and commit message (Daniele)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-1-mika.kuoppala@linux.intel.com
During reset/wedging, we have to clean up the requests on the timeline
and flush the pending interrupt state. Currently, we are abusing the irq
disabling of the timeline spinlock to protect the irq state in
conjunction to the engine's timeline requests, but this is accidental
and conflates the spinlock with the irq state. A baffling state of
affairs for the reader.
Instead, explicitly disable irqs over the critical section, and separate
modifying the irq state from the timeline's requests.
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302143246.2579-4-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Although this state (execlists->active and engine->irq_posted) itself is
not protected by the engine->timeline spinlock, it does conveniently
ensure that irqs are disabled. We can use this to protect our
manipulation of the state and so ensure that the next IRQ to arrive sees
consistent state and (hopefully) ignores the reset engine.
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302131246.22036-1-chris@chris-wilson.co.uk
Sometimes we need to boost the priority of an in-flight request, which
may lead to the situation where the second submission port then contains
a higher priority context than the first and so we need to inject a
preemption event. To do so we must always check inside
execlists_dequeue() whether there is a priority inversion between the
ports themselves as well as the head of the priority sorted queue, and we
cannot just skip dequeuing if the queue is empty.
As Michał noted, this doesn't simply extend to handling more than 2-port
submission, as we may need to reorder within the array of executing
requests which themselves are lower priority than the first. A task for
later!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180222142229.14517-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Print out the current request/context before doing the GEM_BUG_ON, so
that we can inspect the values in the ftrace.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221152301.9178-1-chris@chris-wilson.co.uk
Load an empty ringbuffer for preemption, ignoring the lite-restore
workaround as we know the preempt context is always idle before preemption.
Note that after some digging by Michal Winiarski, we found that
RING_HEAD is no longer being updated (due to inhibiting context save
restore) so this patch is already in effect!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221133236.29402-1-chris@chris-wilson.co.uk
We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)
In short, the spatch:
@@
@@
- struct drm_i915_gem_request
+ struct i915_request
A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space if of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We can't assert that the execlists are active before we set the flag. So
perform the assert after we are expected to have marked the execlists
active.
Fixes: 339ccd35b4 ("drm/i915: Assert that we always complete a submission to guc/execlists")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180216153210.30551-1-chris@chris-wilson.co.uk
The continual resubmission model for execlists (and emulated over guc)
requires that we keep feeding requests into the HW in order to generate
more CS interrupts to drain the rest of the queue. Add a couple of
asserts to ensure that we don't skip a cycle and come to a grinding
halt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180215162553.23348-1-chris@chris-wilson.co.uk
If we remove some hardcoded assumptions about the preempt context having
a fixed id, reserved from use by normal user contexts, we may only
allocate the i915_gem_context when required. Then the subsequent
decisions on using preemption reduce to having the preempt context
available.
v2: Include an assert that we don't allocate the preempt context twice.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-3-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Rather than having the high level ioctl interface guess the underlying
implementation details, having the implementation declare what
capabilities it exports. We define an intel_driver_caps, similar to the
intel_device_info, which instead of trying to describe the HW gives
details on what the driver itself supports. This is then populated by
the engine backend for the new scheduler capability field for use
elsewhere.
v2: Use caps.scheduler for validating CONTEXT_PARAM_SET_PRIORITY (Mika)
One less assumption of engine[RCS] \o/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-2-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
This workaround should prevent a bug that can be hit on a context
restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall
(0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of
NOOP(0x0) in the indirect context batch buffer, to ensure the engine is
idle prior to programming 3DSTATE_SAMPLE_PATTERN.
It's also not clear whether we should add those extra dwords because of
the workaround itself, or if that's just padding for the WA BB (and next
commands could come right after the PIPE_CONTROL). We keep them for now.
References: HSD#1939868
v2: More descriptive changelog and comments.
v3: Explain that PIPE_CONTROL is actually 6 dwords, and that we advance
10 more dwords because of that.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205233330.14973-1-rafael.antognolli@intel.com
In preparation for the next patch, we want the engine to appear idle
after a reset (if there are no requests in flight). For execlists, this
entails clearing the active status on reset, it will be regenerated on
restarting the engine after the reset. In the process, note that a
couple of other status flags and checks could be moved into the
describing function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205152431.12163-3-chris@chris-wilson.co.uk
Execlists is now enabled by default and included in the list of
capabilities printed out to dmesg and beyond. We do not need to mention
it again, every time we restart the engine, so kill the spam.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205092201.19476-6-chris@chris-wilson.co.uk
Be paranoid and flush the GTIIR after clearing the CS interrupt to be
sure it has taken before we re-enable the interrupt handler. We still
see early interrupts following reset, the tasklet handling the mmio read
before it has been written by the CS. This hopefully reduces the
frequency to 0...
References: https://bugs.freedesktop.org/show_bug.cgi?id=104262
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180202145455.29876-1-chris@chris-wilson.co.uk
This patch clears a single bit. The bit is 0 by default but expected
not to be set. Explicitly clearing the bit in this patch is intended
to indicate some thinking has occurred, and that we want this bit
cleared and we are not just excepting the default value.
We also stop setting GFX_RUN_LIST_ENABLE, which is correct since that
bit is gone.
v2 (from Paulo): fix indentation.
v3: Changed GEN check to >= 11. Corrected author name.
v4 (from Paulo): improve commit message (Daniele).
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180130134918.32283-9-paulo.r.zanoni@intel.com
Remove the WARN_ON(ce->state) inside the static function only called
when ce->state == NULL and downgrade the w/a batch setup warning into a
developer only mode (GEM_WARN_ON).
v2: Move the deferred alloc guard into the callee, eliminating the need
for the WARN_ON:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-1 (-1)
Function old new delta
execlists_context_pin 1819 1818 -1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180126121846.12007-1-chris@chris-wilson.co.uk
CTX_CONTEXT_CONTROL (CTX_SR_CTL) operates as a masked register and so
will only apply the bits that are selected by the upper half. In the
case of selectively enabling sr inhibit, this may mean the context keeps
the current setting (so forgetting to save the context later, eventually
leading to a very upset GPU!).
Fixes: 517aaffe0c ("drm/i915/execlists: Inhibit context save/restore for the fake preempt context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180125112443.12745-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
We only use the preempt context to inject an idle point into execlists.
We never need to reference its logical state, so tell the GPU never to
load it or save it.
v2: BIT(2) for save-inhibit.
N.B. Daniele mentioned this bit mbz for ICL, and has been moved into the
submission process rather than the context image.
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180123210412.17653-1-chris@chris-wilson.co.uk
Newer platforms may have subtle offset changes, which will increase the
number of defines, so it is probably better to start moving them to its
own header file. Also move the macros used while setting the reg state.
v2: Rename to intel_lrc_reg.h, to be consistent with i915_reg.h and
intel_guc_reg.h (Chris)
v3: License notice shenanigans.
v4: Documentation/process/coding-style.rst is always right (Chris)
v5: Rebase.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180124004349.22126-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The macros we use to init the reg_state had the following issues reported
by checkpatch --strict.
Macro argument reuse 'reg_state' - possible side-effects
Macro argument reuse 'pos' - possible side-effects
Macro argument reuse 'ppgtt' - possible side-effects
spaces preferred around that '+' (ctx:VxV)
So fix these issues before they are moved to a new header file.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180124004349.22126-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Now that we can read the CSB from the HWSP, we may avoid having to
perform mmio reads entirely and so forgo the rigmarole of the forcewake
dance.
v2: Include forcewake hint for GEM_TRACE readback of mmio. If we don't
hold fw ourselves, the reads may return garbage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180122100714.15137-1-chris@chris-wilson.co.uk
We fail engine initialization if the scratch VMA cannot be created so
there is no point in error handle it later. If the initialization ordering
gets messed up, we can explode during development just as well.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180119100005.9072-2-tvrtko.ursulin@linux.intel.com
Render engine constructor helpers must only be called from the render
engine constructors, but there is no need to burden the production
binaries with warnings which can only be triggered during development.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180119100005.9072-1-tvrtko.ursulin@linux.intel.com
After staring at the list_for_each_safe macros for a bit, our current
invocation of list_safe_reset_next in execlists_schedule() simply
reduces to list_for_each.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-11-chris@chris-wilson.co.uk
The dependency chain must be an acyclic graph. This is checked by the
swfence, but for sanity, also do a simple check that we do not corrupt
our list iteration in execlists_schedule() by a shallow dependency
cycle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-10-chris@chris-wilson.co.uk
Back up our comment that all signalers should have been signaled before
we ourselves were retired with an assert to that effect.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-9-chris@chris-wilson.co.uk
Move the clearing of the CS-interrupt into the engine reset phase,
before the current init-hw phase. This helps clarify that we clear the
pending interrupts prior to any restarting of the execlists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-16-chris@chris-wilson.co.uk
Looking at the coordination of resets with the submission of execlists,
it will be useful to have a GEM_TRACE for when we issue the reset.
Whilst there tidy up the other GEM_TRACE to always include the engine
name, and be careful not to trust any pointers prior to asserts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171220090626.31643-1-chris@chris-wilson.co.uk
Currently on every submission, we recalculate the ELSP register offset
for the engine, after chasing the pointers to find the iomem base. Since
this is fixed for the lifetime of the driver, record the offset in the
execlists struct.
In practice the difference is negligible, it just happens to remove 27
bytes of eyesore pointer dancing from next to the hottest instruction
(which is itself due to stalling for a cache miss) in perf profiles of
the execlists_submission_tasklet().
v2: Trim off one more elsp local.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171207222434.17686-1-chris@chris-wilson.co.uk
Sagar noticed the check can be consolidated between the engine stats
implementation and the PMU.
My first choice was a static inline helper but that got into include
ordering mess quickly fast so I went with a macro instead. At some point
we should perhaps looking into taking out the non-ringubffer bits from
intel_ringbuffer.h into a new intel_engine.h or something.
v2: Use engine->flags. (Chris Wilson)
v3: Rebase and mark GuC as not yet supported. (Chris Wilson)
v4: Move flag setting to intel_engines_reset_default_submission.
(Chris Wilson)
v5: Move flag setting to logical_ring_setup.
v6: intel_engines_reset_default_submission is the wrong place to set the
flag - it needs to be in execlists_set_default_submission. (Sagar)
v7: Flag setting in logical_ring_setup is not required. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v6)
Link: https://patchwork.freedesktop.org/patch/msgid/20171129102805.22690-1-tvrtko.ursulin@linux.intel.com
Track total time requests have been executing on the hardware.
We add new kernel API to allow software tracking of time GPU
engines are spending executing requests.
Both per-engine and global API is added with the latter also
being exported for use by external users.
v2:
* Squashed with the internal API.
* Dropped static key.
* Made per-engine.
* Store time in monotonic ktime.
v3: Moved stats clearing to disable.
v4:
* Comments.
* Don't export the API just yet.
v5: Whitespace cleanup.
v6:
* Rename ref to active.
* Drop engine aggregate stats for now.
* Account initial busy period after enabling stats.
v7:
* Rebase.
v8:
* Move context in notification after the notifier. (Chris Wilson)
v9:
In cases where stats tracking is getting disabled while there is
an active context on an engine, add up the current value to the
total. This also implies we don't clear the total when tracking
is disabled any longer. There is no real need to do so because
we define the stats as relative while enabled, meaning
comparison between two samples while tracking is enabled is the
valid usage. However, when busy stats will later be plugged into
the perf PMU API, it is beneficial to not reset the total, since
the PMU core likes to do some counter disable/enable cycles on
startup, and while doing so during a single long context
executing on an engine we would lose some accuracy and so make
unit testing more difficult than needs to be.
v10:
* Fix accounting for preemption.
v11:
* Rebase for i915_modparams.enable_execlists removal.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-5-tvrtko.ursulin@linux.intel.com
Execlists and legacy ringbuffer submission are no longer feature
comparable (execlists now offer greater functionality that should
overcome their performance hit) and obsoletes the unsafe module
parameter, i.e. comparing the two modes of execution is no longer
useful, so remove the debug tool.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk
The hardware needs some time to process the information received in the
ExecList Submission Port, and expects us to not write anything more until
it has 'acknowledged' this new submission by sending an IDLE_ACTIVE or
PREEMPTED CSB event.
If we do not follow this, the driver could write new data into the ELSP
before HW had finishing fetching the previous one, putting us in
'undefined behaviour' space.
This seems to be the problem causing the spurious PREEMPTED & COMPLETE
events after a COMPLETE like the one below:
[] vcs0: sw rd pointer = 2, hw wr pointer = 0, current 'head' = 3.
[] vcs0: Execlist CSB[0]: 0x00000018 _ 0x00000007
[] vcs0: Execlist CSB[1]: 0x00000001 _ 0x00000000
[] vcs0: Execlist CSB[2]: 0x00000018 _ 0x00000007 <<< COMPLETE
[] vcs0: Execlist CSB[3]: 0x00000012 _ 0x00000007 <<< PREEMPTED & COMPLETE
[] vcs0: Execlist CSB[4]: 0x00008002 _ 0x00000006
[] vcs0: Execlist CSB[5]: 0x00000014 _ 0x00000006
The ELSP writes that lead to this CSB sequence show that the HW hadn't
started executing the previous execlist (the one with only ctx 0x6) by the
time the new one was submitted; this is a bit more clear in the data
show in the EXECLIST_STATUS register at the time of the ELSP write.
[] vcs0: ELSP[0] = 0x0_0 [execlist1] - status_reg = 0x0_302
[] vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302
[] vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308
[] vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308
Note that having to wait for this ack does not disable lite-restores,
although it may reduce their numbers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102035
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/<20171118003038.7935-1-michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-4-chris@chris-wilson.co.uk
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If IDLE_ACTIVE is set, then all other bits are invalid. For us, we can
assert that if we see a COMPLETE | PREEMPTED event, then it should be
impossible for it to also contain an IDLE_ACTIVE flag.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since we get a COMPLETE event when the context switch occurs on
RING_HEAD == RING_TAIL and a PREEMPTED event when a switch occurs
before that point, COMPLETE | PREEMPTED should cover all possible context
switch completion events. We can move the ELEMENT_SWITCH info message
from the COMPLETED_MASK into an assertion for when we are performing a
switch to port[1].
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since commit e1fee72c2e
Author: Oscar Mateo <oscar.mateo@intel.com>
Date: Thu Jul 24 17:04:40 2014 +0100
drm/i915/bdw: Avoid non-lite-restore preemptions
execlists has listened to (ACTIVE_IDLE | ELEMENT_SWITCH) for detecting
when one context completed and it either continued onto the next (in port
1) or idled. We would always see COMPLETE | ACTIVE_IDLE on the final
context-switch event, but on recent gen it appears that we now get
separate ACTIVE_IDLE and COMPLETE events. In particular, the ACTIVE_IDLE
events may not be coupled to a context (since it is a general state rather
than a specific context completion event).
v2: Update the history, execlists did originally start out by listening
to the COMPLETE event not ACTIVE_IDLE.
v3: Update preempt completion test to also use COMPLETE not ACTIVE_IDLE.
References: bspec/12255
References: https://bugs.freedesktop.org/show_bug.cgi?id=103800
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-1-chris@chris-wilson.co.uk
intel_lrc_irq_handler and i915_guc_irq_handler are HW submission related
tasklet functions. Name them with "submission_tasklet" suffix and
remove intel/i915 prefix as they are static. Also rename irq_tasklet
as just tasklet for clarity.
v2: s/_bh/_tasklet (Chris)
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-2-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
At the start of building a request, we would wait for roughly enough
space to fit the average request (to reduce the likelihood of having to
wait and abort partway through request construction). To achieve we
would try to begin a 0-length command packet, this just adds extra
confusion so make the wait-for-space explicit, as in the next patch we
want to move it from the backend to the i915_gem_request_alloc() so it
can ensure that the wait-for-space is the first operation in building a
new request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115151204.8105-2-chris@chris-wilson.co.uk
As we now record the default HW state and so only emit the "golden"
renderstate once to prepare the HW, there is no advantage in keeping the
renderstate batch around as it will never be used again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-8-chris@chris-wilson.co.uk
Take a copy of the HW state after a reset upon module loading by
executing a context switch from a blank context to the kernel context,
thus saving the default hw state over the blank context image.
We can then use the default hw state to initialise any future context,
ensuring that each starts with the default view of hw state.
v2: Unmap our default state from the GTT after stealing it from the
context. This should stop us from accidentally overwriting it via the
GTT (and frees up some precious GTT space).
Testcase: igt/gem_ctx_isolation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk
In the next few patches, we will want to both copy out of the context
image and write a valid image into a new context. To be completely safe,
we should then couple in our domain tracking to ensure that we don't
have any issues with stale data remaining in unwanted cachelines.
Historically, we omitted the .write=true from the call to set-gtt-domain
in i915_switch_context() in order to avoid a stall between every request
as we would want to wait for the previous context write from the gpu.
Since then, we limit the set-gtt-domain to only occur when we first bind
the vma, so once in use we will never stall, and we are sure to flush
the context following a load from swap.
Equally we never applied the lessons learnt from ringbuffer submission
to execlists; so time to apply the flush of the lrc after load as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-6-chris@chris-wilson.co.uk
Trying to enable printk debugging for GEM is fraught with the issue of
spam; interactions with HW are very frequent and often boring. However,
one instance where they are not so boring is just before a BUG; here
ftrace provides a facility to dump its ringbuffer on an oops. So for CI
let's enable trace_printk() to capture the last exchanges with HW as a
death rattle.
For example,
[ 79.234110] ------------[ cut here ]------------
[ 79.234137] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:907!
[ 79.234145] invalid opcode: 0000 [#1] SMP
[ 79.234153] Dumping ftrace buffer:
[ 79.234158] ---------------------------------
...
[ 79.314044] gem_conc-1059 1..s1 79203443us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.2, seqno=145
[ 79.314089] gem_conc-1059 1..s. 79220800us : intel_lrc_irq_handler: bcs0 csb[1/1]: status=0x00000018:0x00000005
[ 79.314133] gem_conc-1059 1..s. 79220803us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.1, seqno=145
[ 79.314177] gem_conc-1062 2..s1 79230458us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.1, seqno=146
[ 79.314220] gem_conc-1062 2..s1 79230515us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.2, seqno=147
[ 79.314265] gem_conc-1059 1..s1 79230951us : intel_lrc_irq_handler: bcs0 csb[2/3]: status=0x00000012:0x00000008
[ 79.314309] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.2, seqno=147
[ 79.314353] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 csb[3/3]: status=0x00008002:0x00000008
[ 79.314396] gem_conc-1059 1..s1 79230955us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.1, seqno=147
[ 79.314402] ---------------------------------
v2: Tweak the formatting to be more consistent between in/out.
v3: do {} while (0) stub macro protection
Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171109143019.16568-1-chris@chris-wilson.co.uk
Pretty similar to what we have on execlists.
We're reusing most of the GEM code, however, due to GuC quirks we need a
couple of extra bits.
Preemption is implemented as GuC action, and actions can be pretty slow.
Because of that, we're using a mutex to serialize them. Since we're
requesting preemption from the tasklet, the task of creating a workitem
and wrapping it in GuC action is delegated to a worker.
To distinguish that preemption has finished, we're using additional
piece of HWSP, and since we're not getting context switch interrupts,
we're also adding a user interrupt.
The fact that our special preempt context has completed unfortunately
doesn't mean that we're ready to submit new work. We also need to wait
for GuC to finish its own processing.
v2: Don't compile out the wait for GuC, handle workqueue flush on reset,
no need for ordered workqueue, put on a reviewer hat when looking at my own
patches (Chris)
Move struct work around in intel_guc, move user interruput outside of
conditional (Michał)
Keep ring around rather than chase though intel_context
v3: Extract WA for flushing ggtt writes to a helper (Chris)
Keep work_struct in intel_guc rather than engine (Michał)
Use ordered workqueue for inject_preempt worker to avoid GuC quirks.
v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele)
Drop stray newlines, use container_of for intel_guc in worker,
check for presence of workqueue when flushing it, rather than
enable_guc_submission modparam, reorder preempt postprocessing (Chris)
v5: Make wq NULL after destroying it
v6: Swap struct guc_preempt_work members (Michał)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com
We would also like to make use of execlist_cancel_port_requests and
unwind_incomplete_requests in GuC preemption backend.
Let's rename the functions to use the correct prefixes, so that we can
simply add the declarations in the following patch.
Similar thing for applies for can_preempt, except we're introducing
HAS_LOGICAL_RING_PREEMPTION macro instad, converting other users that
were previously touching device info directly.
v2: s/intel_engine/execlists and pass execlists to unwind (Chris)
v3: use locked version for exporting, drop const qual (Chris)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-11-michal.winiarski@intel.com
Let's separate the "emit" part from touching any internal structures,
this way we can have a generic "emit coherent GGTT write" function.
We would like to reuse this functionality for emitting HWSP write, to
confirm that preempt-to-idle has finished.
v2: Reorder args to match emit_pipe_control, s/render/rcs (Chris)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-8-michal.winiarski@intel.com
Now that we're handling request resubmission the same way as regular
submission (from the tasklet), we can move GuC initialization earlier,
before restarting the engines. This way, we're no longer being in the
state of flux during engine restart - we're already in user requested
submission mode.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025172519.10670-5-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the next patch, we will want to install a callback when the engines
(GT as a whole) become idle and similarly when they first become busy.
To enable that callback, first rename intel_engines_mark_idle() to
intel_engines_park() and provide the companion intel_engines_unpark().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk
Originally we set the priority to max upon inserting the request into
the execlists queue (and removing it from the scheduler lists). We could
then use the prio==INT_MAX as a shortcut within execlists_schedule() to
detect the end of the dependency chain. Since commit 1f181225f8
("drm/i915/execlists: Keep request->priority for its lifetime") this is
no longer true as we use the request completion as an indicator the
schedule dependency chain is complete instead. (This allows us to then
reschedule requests even when its context is in flight.) However, this
makes the GEM_BUG_ON() inside execlists_schedule() racy as we may change
the rq->prio at the same time. As the assertion is useful, let's keep
the assertion and remove the micro-optimisation.
Fixes: 1f181225f8 ("drm/i915/execlists: Keep request->priority for its lifetime")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171024115501.21033-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Back in commit a4b2b01523 ("drm/i915: Don't mark an execlists
context-switch when idle") we noticed the presence of late
context-switch interrupts. We were able to filter those out by looking
at whether the ELSP remained active, but in commit beecec9017
("drm/i915/execlists: Preemption!") that became problematic as we now
anticipate receiving a context-switch event for preemption while ELSP
may be empty. To restore the spurious interrupt suppression, add a
counter for the expected number of pending context-switches and skip if
we do not need to handle this interrupt to make forward progress.
v2: Don't forget to switch on for preempt.
v3: Reduce the counter to a on/off boolean tracker. Declare the HW as
active when we first submit, and idle after the final completion event
(with which we confirm the HW says it is idle), and track each source
of activity separately. With a finite number of sources, it should aide
us in debugging which gets stuck.
Fixes: beecec9017 ("drm/i915/execlists: Preemption!")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
In the next patch, we want to reduce the lock coverage within the
shrinker, and one of the dangerous walks we have is over obj->vma_list.
We are only walking the obj->vma_list in order to check whether it has
been permanently pinned by HW access, typically via use on the scanout.
But we have a couple of other long term pins, the context objects for
which we currently have to check the individual vma pin_count. If we
instead mark these using obj->pin_display, we can forgo the dangerous
and sometimes slow list iteration.
v2: Rearrange code to try and avoid confusion from false associations
due to arrangement of whitespace along with rebasing on obj->pin_global.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-4-chris@chris-wilson.co.uk
Let GVT-g VM read the CSB and CSB write pointer from virtual HWSP, not all
the host support this feature, need to check the BIT(3) of caps in PVINFO.
v3 : Remove unnecessary comments.
v4 : Separate VM enable patch with GVT-g implementation patch due to code
dependency.
v5 : Use inline for GVT virtual HWSP caps check function.
v6 : Comments refine.
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1508039725-1066-1-git-send-email-weinan.z.li@intel.com
There is function to tell how many ports we have, so use it.
We still have direct relationship with array size and port count,
so no harm was done.
Fixes: 76e70087d3 ("drm/i915: Make execlist port count variable")
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010114857.13108-1-mika.kuoppala@intel.com
Michel Thierry noticed that we were applying WaDisableCtxRestoreArbitration
even to gen9, which does not require the w/a. The rationale is that we
need to enable MI arbitration for execlists to work, and to be safe we
do that before every batch (in addition to every context switch into the
batch). Since this is not clear from the single line comment suggesting
the MI_ARB_ENABLE is solely for the w/a, add a little more detail.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171005191005.13462-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
When we write to ELSP, it triggers a context preemption at the earliest
arbitration point (3DPRIMITIVE, some PIPECONTROLs, a few other
operations and the explicit MI_ARB_CHECK). If this is to the same
context, it triggers a LITE_RESTORE where the RING_TAIL is merely
updated (used currently to chain requests from the same context
together, avoiding bubbles). However, if it is to a different context, a
full context-switch is performed and it will start to execute the new
context saving the image of the old for later execution.
Previously we avoided preemption by only submitting a new context when
the old was idle. But now we wish embrace it, and if the new request has
a higher priority than the currently executing request, we write to the
ELSP regardless, thus triggering preemption, but we tell the GPU to
switch to our special preemption context (not the target). In the
context-switch interrupt handler, we know that the previous contexts
have finished execution and so can unwind all the incomplete requests
and compute the new highest priority request to execute.
It would be feasible to avoid the switch-to-idle intermediate by
programming the ELSP with the target context. The difficulty is in
tracking which request that should be whilst maintaining the dependency
change, the error comes in with coalesced requests. As we only track the
most recent request and its priority, we may run into the issue of being
tricked in preempting a high priority request that was followed by a
low priority request from the same context (e.g. for PI); worse still
that earlier request may be our own dependency and the order then broken
by preemption. By injecting the switch-to-idle and then recomputing the
priority queue, we avoid the issue with tracking in-flight coalesced
requests. Having tried the preempt-to-busy approach, and failed to find
a way around the coalesced priority issue, Michal's original proposal to
inject an idle context (based on handling GuC preemption) succeeds.
The current heuristic for deciding when to preempt are only if the new
request is of higher priority, and has the privileged priority of
greater than 0. Note that the scheduler remains unfair!
v2: Disable for gen8 (bdw/bsw) as we need additional w/a for GPGPU.
Since, the feature is now conditional and not always available when we
have a scheduler, make it known via the HAS_SCHEDULER GETPARAM (now a
capability mask).
v3: Stylistic tweaks.
v4: Appease Joonas with a snippet of kerneldoc, only to fuel to fire of
the preempt vs preempting debate.
Suggested-by: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-8-chris@chris-wilson.co.uk
With preemption, we will want to "unsubmit" a request, taking it back
from the hw and returning it to the priority sorted execution list. In
order to know where to insert it into that list, we need to remember
its adjust priority (which may change even as it was being executed).
This also affects reset for execlists as we are now unsubmitting the
requests following the reset (rather than directly writing the ELSP for
the inflight contexts). This turns reset into an accidental preemption
point, as after the reset we may choose a different pair of contexts to
submit to hw.
GuC is not updated as this series doesn't add preemption to the GuC
submission, and so it can keep benefiting from the early pruning of the
DFS inside execlists_schedule() for a little longer. We also need to
find a way of reducing the cost of that DFS...
v2: Include priority in error-state
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-6-chris@chris-wilson.co.uk
Let the listener know that the context we just scheduled out was not
complete, and will be scheduled back in at a later point.
v2: Handle CONTEXT_STATUS_PREEMPTED in gvt by aliasing it to
CONTEXT_STATUS_OUT for the moment, gvt can expand upon the difference
later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-3-chris@chris-wilson.co.uk
This has been unused since commit afa8ce5b30
("drm/i915: Nuke legacy flip queueing code").
Changes since v1:
- Rebase on top of all the changes to modparams.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
\o/-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171004094416.31306-1-maarten.lankhorst@linux.intel.com
Avoid the repeated rbtree lookup for each request as we unwind them by
tracking the last priolist.
v2: Fix up my unhelpful suggestion of using default_priolist.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-4-chris@chris-wilson.co.uk
We use INT_MIN to denote the priority of a request that has not been
submitted to the scheduler; we treat INT_MIN as an invalid priority and
initialise the request to it. Give the value a name so it stands out.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com
In the future, we will want to unwind requests following a preemption
point. This requires the same steps as for unwinding upon a reset, so
extract the existing code to a separate function for later use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-2-chris@chris-wilson.co.uk
When cancelling requests, also send the notification to any listeners
(gvt) that the request is no longer scheduled on hw. They may require to
keep the in/out exactly balanced, and so the reuse after the reset may
confuse the listener.
Fixes: 221ab9719b ("drm/i915/execlists: Unwind incomplete requests on resets")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170926101720.9479-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Since we inherited the context image setup from gen8 which needed a
per-bb workaround (for GPGPU), we are submitting an empty per-bb buffer
on gen9. Now that we can skip adding the buffer to the context image,
remove the dangling per-bb. This slightly improves execution latency,
most notably on an idle engine.
References: https://bugs.freedesktop.org/show_bug.cgi?id=87725
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170921135444.27330-2-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The per-context and per-batch workaround buffers are optional, yet we
tell the GPU to execute them even if they contain no instructions. Doing
so incurs the dispatch latency, which we can avoid if we don't ask the
GPU to execute the no-op buffers. Allow ourselves to skip setup of empty
buffer, and then to only enable non-empty buffers in the context image.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170921135444.27330-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
As we emulate execlists on top of the GuC workqueue, it is not
restricted to just 2 ports and we can increase that number arbitrarily
to trade-off queue depth (i.e. scheduling latency) against pipeline
bubbles.
v2: rebase. better commit msg (Chris)
v3: rebase
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-5-mika.kuoppala@intel.com
When first execlist entry is processed, we move the port (contents).
Introduce function for this as execlist and guc use this common
operation.
v2: rebase. s/GEM_DEBUG_BUG/GEM_BUG (Chris)
v3: rebase
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-4-mika.kuoppala@intel.com
On reset and wedged path, we want to release the requests
that are tied to ports and then mark the ports to be unset.
Introduce a function for this.
v2: rebase
v3: drop local, keep GEM_BUG_ON (Michał, Chris)
v4: rebase
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-3-mika.kuoppala@intel.com
Engine's execlist related items have been increasing to
a point where a separate struct is warranted. Carve execlist
specific items to a dedicated struct to add clarity.
v2: add kerneldoc and fix whitespace (Joonas, Chris)
v3: csb_mmio changes, rebase
v4: s/\b(el|execlist)\b/execlists/ (Joonas)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (v3)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-1-mika.kuoppala@intel.com
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.
v5: pure rename
v6: fix
Credits-to: Coccinelle
@@
identifier n;
@@
(
- i915.n
+ i915_modparams.n
)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
To create an upper bound on number of GuC workitems, we need to change
the way that requests are being submitted. Rather than submitting each
request as an individual workitem, we can do coalescing in a similar way
we're handlig execlist submission ports. We also need to stop pretending
that we're doing "lite-restore" in GuC submission (we would create a
workitem each time we hit this condition). This allows us to completely
remove the reservation, replacing it with a compile time check.
v2: Also coalesce when replaying on reset (Daniele)
v3: Consistent wq_resv - per-request (Daniele)
v4: Squash removing wq_resv
v5: Reflect i915_guc_submit argument changes in doc
v6: Rebase on top of execlists reset/restart fix (Chris,Michał)
References: https://bugs.freedesktop.org/show_bug.cgi?id=101873
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170914083216.10192-2-michal.winiarski@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Given the mechanism to unwind and replay requests (designed to support
preemption), we have an alternative to the current method of
resubmitting the ELSP upon reset. Resubmitting ELSP turns out to be more
complicated than expected, due to having to handle lost context-switch
interrupts and so guessing what ELSP we need to resubmit later. Instead,
by unwinding the requests and clearing the ELSP tracking entirely, we
can then just dequeue the first pair of ready requests after resetting,
using the normal submission procedure.
Currently, the unwound requests have maximum priority and so are
guaranteed to be resubmitted upon resume. If we are lucky, we may be
able to coalesce a new request on top!
Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-4-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
In the next patch we will want to reinsert a request not at the end of
the priority queue, but at the front. Here we split insert_request()
into two, the first function retrieves the priority list (for reuse for
unsubmit later) and a wrapper function to insert at the end of that list
and to schedule the tasklet if we were first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-3-chris@chris-wilson.co.uk
During a reset, we may skip over completed requests and lost
context-switch interrupts. Following the reset, we may then may end up
with no active requests in the ELSP (and so do not resubmit to restart
the engine), but have a queue of requests ready for execution. This is
unlikely, it requires the last request to complete after the hang is
detected, but not impossible. The outcome of this is that the engine
stalls, possibly leading to full ring and indefinite wait under
struct_mutex, eventually leading to a full driver hang.
Alternatively, we can solve this by unsubmitting the incomplete requests
and just kickstarting the tasklet. Michał has patches for that, which I
initially disliked due to the extra complexity, but the complexity of
this "simple" restart is growing...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
When wedging the hw, we want to mark all in-flight requests as -EIO.
This is made slightly more complex by execlists who store the ready but
not yet submitted-to-hw requests on a private queue (an rbtree
priolist). Call into execlists to cancel not only the ELSP tracking for
the submitted requests, but also the queue of unsubmitted requests.
v2: Move the majority of engine_set_wedged to the backends (both legacy
ringbuffer and execlists handling their own lists).
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_eio/in-flight-contexts
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!
v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g
v5: Prevent the compiler reloading tail (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio is not enough to actually reduce any latency, but a small
reduction in overall cpu usage.
Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and for Michel in digging out the details.
v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
memory latency. (Mika)
v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio
read and we want to stop the compiler from issuing a later (v.slow) reload.
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
On gen8+ we're currently using the PPHWSP of the kernel ctx as the
global HWSP. However, when the kernel ctx gets submitted (e.g. from
__intel_autoenable_gt_powersave) the HW will use that page as both
HWSP and PPHWSP. This causes a conflict in the register arena of the
HWSP, i.e. dword indices below 0x30. We don't current utilize this arena,
but in the following patches we will take advantage of the cached
register state for handling execlist's context status interrupt.
To avoid the conflict, instead of re-using the PPHWSP of the kernel
ctx we can allocate a separate page for the HWSP like what happens for
pre-execlists platform.
v2: Add a use-case for the register arena of the HWSP.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-3-chris@chris-wilson.co.uk
Not only the context image consist of two parts (the PPHWSP, and the
logical context state), but we also allocate a header at the start of
for sharing data with GuC. Thus every lrc looks like this:
| [guc] | [hwsp] [logical state] |
|<- our header ->|<- context image ->|
So far, we have oversimplified whenever we use each of these parts of the
context, just because the GuC header happens to be in page 0, and the
(PP)HWSP is in page 1. But this had led to using the same define for more
than one meaning (as a page index in the lrc and as 1 page).
This patch adds defines for the GuC shared page, the PPHWSP page and the
start of the logical state. It also updated the places where the old
define was being used. Since we are not changing the size (or format) of
the context, there are no functional changes.
v2: Use PPHWSP index for hws again.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-1-chris@chris-wilson.co.uk
The context descriptor is stored inside the per-engine context state, as
we only need to compute it once and access it frequently. However,
currently only intel_lrc.c has easy access, but i915_guc_submission.c
would like to frequently read it as well, and more so only ever needs
the lower 32bits. Make it an inline as the compiler should be able to
retrieve the value in less instructions than it takes to do the function
call:
add/remove: 0/1 grow/shrink: 1/0 up/down: 8/-45 (-37)
function old new delta
i915_guc_submit 621 629 +8
intel_lr_context_descriptor 45 - -45
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170912214905.21987-1-chris@chris-wilson.co.uk
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Let's inherit workarounds from previous platforms that
according to wa_database and BSpec are still valid for
Cannonlake.
v2: Add missed workarounds.
v3: Rebase
v4: Remove bad chunk that was added to rc6 disable. (Ander)
Also remove A0 W/a that are not needed anymore.
v5: Rebase on top of CFL.
v6: Remove empty gen9_init_perctx_bb and gen9_init_indirectctx_bb
since they don't carry any gen10 related W/a. (by Oscar).
Also Remove A0 exclusive workaround.
v7: Remove more A0 exclusive workarounds. As pointed out by Oscar
many workarounds were changed to be A0 only so let's remove
them.
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170815231651.975-1-rodrigo.vivi@intel.com
During a global reset, we disable the irq. As we disable the irq, the
hardware may be raising a GT interrupt that we then ignore, leaving it
pending in the GTIIR. After the reset, we then re-enable the irq,
triggering the pending interrupt. However, that interrupt was for the
stale state from before the reset, and the contents of the CSB buffer
are now invalid.
v2: Add a comment to make it clear that the double clear is purely my
paranoia.
Reported-by: "Dong, Chuanxiao" <chuanxiao.dong@intel.com>
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Dong, Chuanxiao" <chuanxiao.dong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170807121919.30165-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20170818090509.5363-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
When doing a GPU reset, the CSB register will be trashed and we will
lose any context-switch notifications that happened since the tasklet
was disabled. If we find that all requests on this engine were
completed, we want to make sure that the ELSP tracker is similarly empty
so that we do not feed back in the completed requests upon recovering
from the reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-4-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.
Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.
The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).
This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.
The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).
Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.
v5: Drain submitted requests when enabling metric set to ensure no
lite-restore erases the context image we just updated (Lionel)
v6: In addition to drain, switch to kernel context & update all
context in place (Chris)
v7: Add missing mutex_unlock() if switching to kernel context fails
(Matthew)
v8: Simplify OA period/flex-eu-counters programming by using the
batchbuffer instead of modifying ctx-image (Lionel)
v9: Back to updating the context image (due to erroneous testing,
batchbuffer programming the OA unit doesn't actually work)
(Lionel)
Pin context before updating context image (Chris)
Drop MMIO programming now that we switch to a kernel context with
right values in initial context image (Chris)
v10: Just pin_map the contexts we want to modify or let the
configuration happen on first use (Chris)
v11: Update kernel context OA config through the batchbuffer rather
than on the fly ctx-image update (Lionel)
v12: Rework OA context registers update again by swithing away from
user contexts and reconfiguring the kernel context through the
batchbuffer and updating all the other contexts' context image.
Also take care to lock slice/subslice configuration when OA is
on. (Lionel)
v13: Request rpcs updates on all engine when updating the OA config
(Lionel)
v14: Drop any kind of rpcs management now that we monitor sseu
configuration changes in a later patch (Lionel)
Remove usleep after programming the NOA configs on Gen8+, this
doesn't seem to be needed (Lionel)
v15: Respect coding style for block comments (Chris)
v16: Add missing i915_add_request() in case we fail to emit OA
configuration (Matthew)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Indirect Context Offset Pointer has changed for Cannonlake.
INDIRECT_CTX_OFFSET[15:6] valid value for CNL is 19h per Spec.
v2: rebased to intel_lr_indirect_ctx_offset
v3: Commit message added per Tvrtko request.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496781040-20888-9-git-send-email-rodrigo.vivi@intel.com
If we do not require to perform priority bumping, and we haven't yet
submitted the request, we can update its priority in situ and skip
acquiring the engine locks -- thus avoiding any contention between us
and submit/execute.
v2: Remove the stack element from the list if we can do the early
assignment.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-10-chris@chris-wilson.co.uk
The i915_priolist are allocated within an atomic context on a path where
we wish to minimise latency. If we use a dedicated kmem_cache, we have
the advantage of a local freelist from which to service new requests
that should keep the latency impact of an allocation small. Though
currently we expect the majority of requests to be at default priority
(and so hit the preallocate priolist), once userspace starts using
priorities they are likely to use many fine grained policies improving
the utilisation of a private slab.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.
Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.
There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).
v2: Avoid use-after-free when deleting a depleted priolist
v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure!
Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function old new delta
execlists_submit_ports 262 471 +209
port_assign.isra - 136 +136
capture 6344 6359 +15
reset_common_ring 438 452 +14
execlists_submit_request 228 238 +10
gen8_init_common_ring 334 341 +7
intel_engine_is_idle 106 105 -1
i915_engine_info 2314 2290 -24
__i915_gem_set_wedged_BKL 485 411 -74
intel_lrc_irq_handler 1789 1604 -185
execlists_update_context 294 - -294
The most important change there is the improve to the
intel_lrc_irq_handler and excclist_submit_ports (net improvement since
execlists_update_context is now inlined).
v2: Use the port_api() for guc as well (even though currently we do not
pack any counters in there, yet) and hide all port->request_count inside
the helpers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
During execlist_context_deferred_alloc() we presumed that the context is
uninitialised (we only just allocated the state object for it!) and
chose to optimise away the later call to engine->init_context() if
engine->init_context were NULL. This breaks with GVT's contexts that are
marked as pre-initialised to avoid us annoyingly calling
engine->init_context(). The fix is to not override ce->initialised if it
is already true.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1494497262-24855-1-git-send-email-chuanxiao.dong@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since unifying ringbuffer/execlist submission to use
engine->pin_context, we ensure that the intel_ring is available before
we start constructing the request. We can therefore move the assignment
of the request->ring to the central i915_gem_request_alloc() and not
require it in every engine->request_alloc() callback. Another small step
towards simplification (of the core, but at a cost of handling error
pointers in less important callers of engine->pin_context).
v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code
generation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
Pre-calculate engine context size based on engine class and device
generation and store it in the engine instance.
v2:
- Squash and get rid of hw_context_size (Chris)
v3:
- Move after MMIO init for probing on Gen7 and 8 (Chris)
- Retained rounding (Tvrtko)
v4:
- Rebase for deferred legacy context allocation
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to keep track of the last location we ask the hw to read up to
(RING_TAIL) separately from our last write location into the ring, so
that in the event of a GPU reset we do not tell the HW to proceed into
a partially written request (which can happen if that request is waiting
for an external signal before being executed).
v2: Refactor intel_ring_reset() (Mika)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144
Testcase: igt/gem_exec_fence/await-hang
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
As we now share the execlist_port[] tracking for both execlists/guc, we
can reset the inflight count on both and report which requests are being
restarted.
Suggested-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170425103835.31871-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We indirectly hold the runtime-pm for the intel_lrc_irq_handler() by
virtue of dev_priv->gt.awake keeping a wakeref whilst the requests are
busy. As this is not obvious from the code, add a comment.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170411175850.2470-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Technically speaking, the context size is per engine class, not per
instance.
v2: Add MISSING_CASE (Tvrtko)
v3: Rebased
v4: Restore the interface back to hiding the class lookup (Chris)
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491905472-16189-1-git-send-email-oscar.mateo@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Or rather it is used only by intel_ring_pin() to extract the
drm_i915_private which we can easily pass in. As this is a relatively
rare operation, save the space in the struct, and as such it is even
break even in the extra code for passing around the parameter:
add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0)
function old new delta
intel_init_ring_buffer 906 918 +12
execlists_context_pin 1308 1311 +3
mock_engine 407 403 -4
intel_engine_create_ring 367 363 -4
intel_ring_pin 326 319 -7
Total: Before=1261794, After=1261794, chg +0.00%
v2: Reorder intel_init_ring_buffer to keep the ring setup together:
add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6)
function old new delta
intel_init_ring_buffer 906 912 +6
execlists_context_pin 1308 1311 +3
mock_engine 407 403 -4
intel_engine_create_ring 367 363 -4
intel_ring_pin 326 319 -7
Total: Before=1261794, After=1261788, chg -0.00%
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk
This reverts commit 6c943de668 ("drm/i915: Skip execlists_dequeue()
early if the list is empty").
The validity of using READ_ONCE there depends upon having a mb to
coordinate the assignment of engine->execlist_first inside
submit_request() and checking prior to taking the spinlock in
execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
memory barrier making this cross-cpu checking safe", but failed to
notice that this mb was *conditional* on the execlists being ready, i.e.
there wasn't the required mb when it was most necessary!
We could install an unconditional memory barrier to fixup the
READ_ONCE():
diff --git a/drivers/gpu/drm/i915/intel_lrc.c
b/drivers/gpu/drm/i915/intel_lrc.c
index 7dd732cb9f57..1ed164b16d44 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -616,6 +616,7 @@ static void execlists_submit_request(struct
drm_i915_gem_request *request)
if (insert_request(&request->priotree, &engine->execlist_queue))
{
engine->execlist_first = &request->priotree.node;
+ smp_wmb();
if (execlists_elsp_ready(engine))
But we have opted to remove the race as it should be rarely effective,
and saves us having to explain the necessary memory barriers which we
quite clearly failed at.
Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Fixes: 6c943de668 ("drm/i915: Skip execlists_dequeue() early if the list is empty")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Unlocking is dangerous. In this case we combine an early update to the
out-of-queue request, because we know that it will be inserted into the
correct FIFO priority-ordered slot when it becomes ready in the future.
However, given sufficient enthusiasm, it may become ready as we are
continuing to reschedule, and so may gazump the FIFO if we have since
dropped its spinlock. The result is that it may be executed too early,
before its dependencies.
v2: Move all work into the second phase over the topological sort. This
removes the shortcut on the out-of-rbtree request to ensure that we only
adjust its priority after adjusting all of its dependencies.
Fixes: 20311bd350 ("drm/i915/scheduler: Execute requests in order of priorities")
Testcase: igt/gem_exec_whisper
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170327202143.7972-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Whilst I like having the assertions clearly visible in the code, they
are quite repetitious! As we find new limits we want to incorporate into
the set of assertions, it make sense to refactor them to a common
routine.
v2: Add a guc holdout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
If the request->wa_tail is 0 (because it landed exactly on the end of
the ringbuffer), when we reconstruct request->tail following a reset we
fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
never able to catch up with RING_TAIL and the GPU spins endlessly. If
the ring contains a couple of breadcrumbs, even our hangcheck is unable
to catch the busy-looping as the ACTHD and seqno continually advance.
v2: Move the wrap into a common intel_ring_wrap().
Fixes: a3aabe86a3 ("drm/i915/execlists: Reinitialise context image after GPU hang")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
I noticed that gcc was spilling the CSB to the stack, so rearrange the
code to be more compact. Spilling in this function is slightly more
interesting due to the mmio reads acting as memory barriers and so
end up flushing the stack spills. Still miniscule to having to do at
least the pair of uncached reads :(
function old new delta
intel_lrc_irq_handler 1039 878 -161
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170325201053.21306-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We only need to care about the ordering of the clearing of the bit with
the uncached CSB read in order to correctly detect a new interrupt
before the read completes. The uncached read itself acts as a full
memory barrier, so we do not need to enforce another in the form of a
locked clear_bit.
v2: Clarify why the split and unlocked test/clear is harmless.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170323134803.10418-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Storing the position of the breadcrumb of the last retired request as
a separate last_retired_head is superfluous as we always copy that into
head prior to recalculation of the intel_ring.space.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk
Rather than impose the cost of a locked test before queuing a new
request, reduce it to a simple test_bit() with a following clear_bit()
prior to doing the CSB check. This ensure that if an interrupt does
occur whilst reading from the CSB, we still detect it (the interrupt
would trigger a rescheduling of the tasklet anyway).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170321113320.2603-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When switching back to execlists, we also now need to restore the
tasklet handler.
Reported-by: Oscar Mateo <oscar.mateo@intel.com>
Fixes: 31de73501a ("drm/i915/scheduler: emulate a scheduler for guc")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170318102859.24101-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Do an early read of the execlists' queue before we take the spinlock and
start checking. This is safe as the first writer to the execlists queue
will cause the tasklet to be run again after a memory barrier.
v2: Keep guc in sync with execlists queue changes
v3: Explain the mb between the tasklet running on one cpu and the
execlist_first update and schedule from a second cpu.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317120716.17191-1-chris@chris-wilson.co.uk
This should be impossible, but let's assert that we do not pin a context
4 billion times before retiring!
v2: Fix the assertion -- the patch had just one job to do!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171628.3228-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
It turns out that we may want to restore the original
engine->submit_request (and engine->schedule) callbacks from more than
just the guc <-> execlists transition. Move this to a vfunc so we can
have a common interface.
v2: Move initial selection to intel_engines_init_common(), repaint vfunc
with engine->set_default_submission (and a similar colour for the
helper).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-2-chris@chris-wilson.co.uk
GVTg has introduced the context status notifier to schedule the GVTg
workload. At that time, the notifier is bound to GVTg context only,
so GVTg is not aware of host workloads.
Now we are going to improve GVTg's guest workload scheduler policy,
and add Guc emulation support for new Gen graphics. Both these two
features require acknowledgment for all contexts running on hardware.
(But will not alter host workload.) So here try to make some change.
The change is simple:
1. Move the context status notifier head from i915_gem_context to
intel_engine_cs. Which means there is a notifier head per engine
instead of per context. Execlist driver still call notifier for
each context sched-in/out events of current engine.
2. At GVTg side, it binds a notifier_block for each physical engine
at GVTg initialization period. Then GVTg can hear all context
status events.
In this patch, GVTg do nothing for host context event, but later
will add a function there. But in any case, the notifier callback is
a noop if this is no active vGPU.
Since intel_gvt_init() is called at early initialization stage and
require the status notifier head has been initiated, I initiate it in
intel_engine_setup().
v2: remove a redundant newline. (chris)
Fixes: 3c7ba6359d ("drm/i915: Introduce execlist context status change notification")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100232
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170313024711.28591-1-changbin.du@intel.com
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This emulates execlists on top of the GuC in order to defer submission of
requests to the hardware. This deferral allows time for high priority
requests to gazump their way to the head of the queue, however it nerfs
the GuC by converting it back into a simple execlist (where the CPU has
to wake up after every request to feed new commands into the GuC).
v2: Drop hack status - though iirc there is still a lockdep inversion
between fence and engine->timeline->lock (which is impossible as the
nesting only occurs on different fences - hopefully just requires some
judicious lockdep annotation)
v3: Apply lockdep nesting to enabling signaling on the request, using
the pattern we already have in __i915_gem_request_submit();
v4: Replaying requests after a hang also now needs the timeline
spinlock, to disable the interrupts at least
v5: Hold wq lock for completeness, and emit a tracepoint for enabling signal
v6: Reorder interrupt checking for a happier gcc.
v7: Only signal the tasklet after a user-interrupt if using guc scheduling
v8: Restore lost update of rq through the i915_guc_irq_handler (Tvrtko)
v9: Avoid re-initialising the engine->irq_tasklet from inside a reset
v10: Hook up the execlists-style tracepoints
v11: Clear the execlists irq_posted bit after taking over the interrupt/tasklet
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316125619.6856-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
The replay bit of the ring mode register is not a valid bit for Gen8+.
Do not write to this bit.
Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
[Joonas: Fixed commit message line to be under 72 chars]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1487963724-4824-1-git-send-email-kelvin.gardiner@intel.com
Two new tracepoints placed at the call sites where requests are
actually passed to the GPU enable userspace to track engine
utilisation.
These tracepoints are only enabled when the
DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled.
v2: Fix compilation with !CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS.
v3: Name global seqno consistently across tracepoints.
v4: Remove port info from request out tracepoint. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Compact the name of the macro and reg_state variable, and cache
some data in local variables to make the function more compact
and more readable.
v2: Fixup some checkpatch warnings.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170221095839.30525-1-tvrtko.ursulin@linux.intel.com
The hardware requires that the tail pointer only advance in qword units,
so assert that the value we write is aligned to qwords, and similarly
enforce this restriction onto the request->tail.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217163833.731-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
We have a few open coded instances in the execlists code and an
almost suitable helper in intel_ringbuf.c
We can consolidate to a single helper if we change the existing
helper to emit directly to ring buffer memory and move the space
reservation outside it.
v2: Drop memcpy for memset. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170216122325.31391-2-tvrtko.ursulin@linux.intel.com
Use the "*batch++ = " style as in the ring emission for better
readability and also simplify the logic a bit by consolidating
the offset and size calculations and overflow checking. The
latter is a programming error so it is not required to check
for it after each write to the object, but rather do it once the
whole state has been written and fail the driver if something
went wrong.
v2: Rebase.
v3: Keep track of offsets and sizes in bytes for simplicity
and rename function pointer variable to _fn suffix.
(Chris Wilson)
v4: Fix size calc broken in v3 and add alignment warning. (Chris Wilson)
v5: Fix return code.
v6: I added an exit from loop in v5 but forgot to put back
the object teardown.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This removes the usage of intel_ring_emit in favour of
directly writing to the ring buffer.
intel_ring_emit was preventing the compiler for optimising
fetch and increment of the current ring buffer pointer and
therefore generating very verbose code for every write.
It had no useful purpose since all ringbuffer operations
are started and ended with intel_ring_begin and
intel_ring_advance respectively, with no bail out in the
middle possible, so it is fine to increment the tail in
intel_ring_begin and let the code manage the pointer
itself.
Useless instruction removal amounts to approximately
two and half kilobytes of saved text on my build.
Not sure if this has any measurable performance
implications but executing a ton of useless instructions
on fast paths cannot be good.
v2:
* Change return from intel_ring_begin to error pointer by
popular demand.
* Move tail increment to intel_ring_advance to enable some
error checking.
v3:
* Move tail advance back into intel_ring_begin.
* Rebase and tidy.
v4:
* Complete rebase after a few months since v3.
v5:
* Remove unecessary cast and fix !debug compile. (Chris Wilson)
v6:
* Make intel_ring_offset take request as well.
* Fix recording of request postfix plus a sprinkle of asserts.
(Chris Wilson)
v7:
* Use intel_ring_offset to get the postfix. (Chris Wilson)
* Convert GVT code as well.
v8:
* Rename *out++ to *cs++.
v9:
* Fix GVT out to cs conversion in GVT.
v10:
* Rebase for new intel_ring_begin in selftests.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com
After a brief discussion, we settled on a naming convention for the
conditional GEM debugging data that should be clearer to the casual
user: GEM_DEBUG
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Now that we have fast top-down insertion into the drm_mm, we can use it
for frequent runtime operations like insertion of the context object,
whereas before we limited it to the one-off insertion of the pinned
kernel context. Keeping the active context objects out of the mappable
region of the global GTT (except under memory pressure) improves our
ability to allocate mappable aperture region without triggering a GPU
stall.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210101422.1598-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Once the address space has been created (using 3 or 4 levels of page
tables), we should use that to program the appropriate type into the
contexts. This gives us the flexibility to handle different types of
address spaces at runtime.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170209144036.23664-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>