linux

Author	SHA1	Message	Date
Chris Wilson	46b3617dfe	drm/i915: Actually flush interrupts on reset not just wedging Commit `0f36a85c3b` ("drm/i915: Flush pending interrupt following a GPU reset") got confused and only applied the flush to the set-wedge path (which itself is proving troublesome), but we also need the serialisation on the regular reset path. Oops. Move the interrupt into reset_irq() and make it common to the reset and final set-wedge. v2: reset_irq() after port cancellation, as we assert that execlists->active is sane for cancellation (and is being reset by reset_irq). References: `0f36a85c3b` ("drm/i915: Flush pending interrupt following a GPU reset") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180323101824.14645-1-chris@chris-wilson.co.uk	2018-03-23 17:03:24 +00:00
Chris Wilson	0f36a85c3b	drm/i915: Flush pending interrupt following a GPU reset After resetting the GPU (or subset of engines), call synchronize_irq() to flush any pending irq before proceeding with the cleanup. For a device level reset, we disable the interupts around the reset, but when resetting just one engine, we have to avoid such global disabling. This leaves us open to an interrupt arriving for the engine as we try to reset it. We already do try to flush the IIR following the reset, but we have to ensure that the in-flight interrupt does not land after we start cleaning up after the reset; enter synchronize_irq(). As it current stands, we very rarely, but fatally, see sequences such as: 2.... 57964564us : execlists_reset_prepare: rcs0 2.... 57964613us : execlists_reset: rcs0 seqno=424 0d.h1 57964615us : gen8_cs_irq_handler: rcs0 CS active=1 2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060 2.... 57964703us : execlists_reset_finish: rcs0 0..s. 57964705us : execlists_submission_tasklet: rcs0 awake?=1, active=0, irq-posted?=1 v2: Move the sync into the execlists reset handler so that we coordinate the flush with disabling the interrupt handling and canceling the pending interrupt. v3: Just use synchronize_hardirq() to avoid the might_sleep(), we do not yet have threaded-irq to worry about. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180322073533.5313-4-chris@chris-wilson.co.uk Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>	2018-03-22 20:34:50 +00:00
Chris Wilson	9153e6b7c8	drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt We were relying on the uncached reads when processing the CSB to provide ourselves with the serialisation with the interrupt handler (so we could detect new interrupts in the middle of processing the old one). However, in commit `767a983ab2` ("drm/i915/execlists: Read the context-status HEAD from the HWSP") those uncached reads were eliminated (on one path at least) and along with them our serialisation. The result is that we would very rarely miss notification of a new interrupt and leave a context-switch unprocessed, hanging the GPU. Fixes: `767a983ab2` ("drm/i915/execlists: Read the context-status HEAD from the HWSP") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180321091027.21034-1-chris@chris-wilson.co.uk	2018-03-21 17:08:26 +00:00
Daniele Ceraolo Spurio	fa6f071d54	drm/i915: move gen8 irq shifts to intel_lrc.c The only usage outside the intel_lrc.c file is in the ringbuffer init, but the irq mask calculated there is then overwritten for all engines that have a non-zero shift, so we can drop it. This change is not aimed at code saving but at removing from intel_engines information that does not apply to all gens that have the engine. When checking without the temporary WARN_ON, code size is basically unchanged. v2: make the irq_shifts array static const v3: rebase, move irq_shifts array to logical_ring_default_irqs v4: move array inside the if and use u8 for it (Chris) Suggested-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-4-daniele.ceraolospurio@intel.com	2018-03-15 08:46:06 +00:00
Daniele Ceraolo Spurio	210060edc2	drm/i915: use engine->irq_keep_mask when resetting irqs The "reset" value and the "keep" value are the same. While we are here, add a TODO for gen11 interrupt reset Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-3-daniele.ceraolospurio@intel.com	2018-03-15 08:46:06 +00:00
Weinan Li	702791f7f2	drm/i915: add schedule out notification of preempted but completed request There is one corner case missing schedule out notification of the preempted request. The preempted request is just completed when preemption happen, then it will be canceled and won't be resubmitted later, GVT-g will lost the schedule out notification. Here add schedule out notification if found the preempted request has been completed. v2: - refine description, add completed check and notification in execlists_cancel_port_requests. (Chris) v3: - use ternary confitional, remove local variable. (Tvrtko) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Weinan Li <weinan.z.li@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1520302557-25079-1-git-send-email-weinan.z.li@intel.com	2018-03-08 13:50:11 +02:00
Lionel Landwerlin	8cc7669355	drm/i915: store all subslice masks Up to now, subslice mask was assumed to be uniform across slices. But starting with Cannonlake, slices can be asymmetric (for example slice0 has different number of subslices as slice1+). This change stores all subslices masks for all slices rather than having a single mask that applies to all slices. v2: Rework how we store total numbers in sseu_dev_info (Tvrtko) Fix CHV eu masks, was reading disabled as enabled (Tvrtko) Readability changes (Tvrtko) Add EU index helper (Tvrtko) v3: Turn ALIGN(v, 8) / 8 into DIV_ROUND_UP(v, BITS_PER_BYTE) (Tvrtko) Reuse sseu_eu_idx() for setting eu_mask on CHV (Tvrtko) Reformat debug prints for subslices (Tvrtko) v4: Change eu_mask helper into sseu_set_eus() (Tvrtko) v5: With Haswell reporting masks & counts, bump sseu_*_eus() functions to use u16 (Lionel) v6: Fix sseu_get_eus() for > 8 EUs per subslice (Lionel) v7: Change debugfs enabels for number of subslices per slice, will need a small igt/pm_sseu change (Lionel) Drop subslice_total field from sseu_dev_info, rely on sseu_subslice_total() to recompute the value instead (Lionel) v8: Remove unused function compute_subslice_total() (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-2-lionel.g.landwerlin@intel.com	2018-03-08 10:06:20 +00:00
Michel Thierry	fd034c77b5	drm/i915/icl: Add Indirect Context Offset for Gen11 v2: rebased to intel_lr_indirect_ctx_offset v3: rebase, move define to intel_lrc_reg.h BSpec: 11740 Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-5-mika.kuoppala@linux.intel.com Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2018-03-07 15:07:39 +02:00
Thomas Daniel	05f0addd9b	drm/i915/icl: Enhanced execution list support Enhanced Execlists is an upgraded version of execlists which supports up to 8 ports. The lrcs to be submitted are written to a submit queue (the ExecLists Submission Queue - ELSQ), which is then loaded on the HW. When writing to the ELSP register, the lrcs are written cyclically in the queue from position 0 to position 7. Alternatively, it is possible to write directly in the individual positions of the queue using the ELSQC registers. To be able to re-use all the existing code we're using the latter method and we're currently limiting ourself to only using 2 elements. v2: Rebase. v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio). v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio) v5: Reword commit, rename regs to be closer to specs, turn off preemption (Daniele), reuse engine->execlists.elsp (Chris) v6: use has_logical_ring_elsq to differentiate the new paths v7: add preemption support, rename els to submit_reg (Chris) v8: save the ctrl register inside the execlists struct, drop CSB handling updates (superseded by preempt_complete_status) (Chris) v9: s/drm_i915_gem_request/i915_request (Mika) v10: resolved conflict in inject_preempt_context (Mika) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-4-mika.kuoppala@linux.intel.com	2018-03-07 15:07:31 +02:00
Daniele Ceraolo Spurio	ac52da6af8	drm/i915/icl: new context descriptor support Starting from Gen11 the context descriptor format has been updated in the HW. The hw_id field has been considerably reduced in size and engine class and instance fields have been added. There is a slight name clashing issue because the field that we call hw_id is actually called SW Context ID in the specs for Gen11+. With the current size of the hw_id field we can have a maximum of 2k contexts at any time, but we could use the sw_counter field (which is sw defined) to increase that because the HW requirement is that engine_id + sw id + sw_counter is a unique number. GuC uses a similar method to support more contexts but does its tracking at lrc level. To avoid doing an implementation that will need to be reworked once GuC support lands, defer it for now and mark it as TODO. v2: rebased, add documentation, fix GEN11_ENGINE_INSTANCE_SHIFT v3: rebased, bring back lost code from i915_gem_context.c v4: make TODO comment more generic v5: be consistent with bit ordering, add extra checks (Chris) Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-3-mika.kuoppala@linux.intel.com Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2018-03-07 15:07:20 +02:00
Tvrtko Ursulin	d4ccceb055	drm/i915/icl: Ringbuffer interrupt handling On Gen11 interrupt masks need to be clear to allow C6 entry. We keep them all enabled knowing that we generate extra interrupts. v2: Rebase. v3: Remove gen 11 extra check in logical_render_ring_init. v4: Rebase fixes. v5: Rebase/refactor. v6: Rebase. v7: Rebase. v8: Update comment and commit message (Daniele) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-1-mika.kuoppala@linux.intel.com	2018-03-05 16:26:28 +02:00
Chris Wilson	a3e3883646	drm/i915/execlists: Split spinlock from its irq disabling side-effect During reset/wedging, we have to clean up the requests on the timeline and flush the pending interrupt state. Currently, we are abusing the irq disabling of the timeline spinlock to protect the irq state in conjunction to the engine's timeline requests, but this is accidental and conflates the spinlock with the irq state. A baffling state of affairs for the reader. Instead, explicitly disable irqs over the critical section, and separate modifying the irq state from the timeline's requests. Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180302143246.2579-4-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2018-03-02 23:11:12 +00:00
Chris Wilson	aebbc2d7b3	drm/i915/execlists: Move irq state manipulation inside irq disabled region Although this state (execlists->active and engine->irq_posted) itself is not protected by the engine->timeline spinlock, it does conveniently ensure that irqs are disabled. We can use this to protect our manipulation of the state and so ensure that the next IRQ to arrive sees consistent state and (hopefully) ignores the reset engine. Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180302131246.22036-1-chris@chris-wilson.co.uk	2018-03-02 23:11:11 +00:00
Chris Wilson	963ddd63c3	drm/i915: Suspend submission tasklets around wedging After staring hard at sequences like [ 28.199013] systemd-1 2..s. 26062228us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?] [ 28.199095] systemd-1 2..s. 26062229us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x1 [ 28.199177] systemd-1 2..s. 26062230us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024 [ 28.199258] systemd-1 2..s. 26062231us : execlists_submission_tasklet: rcs0 completed ctx=0 [ 28.199340] gem_eio-829 1..s1 26066853us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.1, seqno=1, prio=0 [ 28.199421] <idle>-0 2..s. 26066863us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?] [ 28.199503] <idle>-0 2..s. 26066865us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1 [ 28.199585] gem_eio-829 1..s1 26067077us : execlists_submission_tasklet: rcs0 in[1]: ctx=3.1, seqno=2, prio=0 [ 28.199667] gem_eio-829 1..s1 26067078us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.2, seqno=1, prio=0 [ 28.199749] <idle>-0 2..s. 26067084us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?] [ 28.199830] <idle>-0 2..s. 26067085us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001, active=0x1 [ 28.199912] <idle>-0 2..s. 26067086us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0 [ 28.199994] gem_eio-829 2..s. 28246084us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?] [ 28.200096] gem_eio-829 2..s. 28246088us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001, active=0x5 [ 28.200178] gem_eio-829 2..s. 28246089us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0 [ 28.200260] gem_eio-829 2..s. 28246127us : execlists_submission_tasklet: execlists_submission_tasklet:886 GEM_BUG_ON(buf[2 * head + 1] != port->context_id) the conclusion is that the only place where the ports are reset to zero, is from engine->cancel_requests called during i915_gem_set_wedged(). The race is horrible as it results from calling set-wedged on active HW (the GPU reset failed) and as such we need to be careful as the HW state changes beneath us. Fortunately, it's the same scary conditions as affect normal reset, so we can reuse the same machinery to disable state tracking as we clobber it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Fixes: `af7a8ffad9` ("drm/i915: Use rcu instead of stop_machine in set_wedged") Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-2-chris@chris-wilson.co.uk	2018-03-02 23:11:11 +00:00
Chris Wilson	f6322eddaf	drm/i915/preemption: Allow preemption between submission ports Sometimes we need to boost the priority of an in-flight request, which may lead to the situation where the second submission port then contains a higher priority context than the first and so we need to inject a preemption event. To do so we must always check inside execlists_dequeue() whether there is a priority inversion between the ports themselves as well as the head of the priority sorted queue, and we cannot just skip dequeuing if the queue is empty. As Michał noted, this doesn't simply extend to handling more than 2-port submission, as we may need to reorder within the array of executing requests which themselves are lower priority than the first. A task for later! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180222142229.14517-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2018-02-23 16:37:40 +00:00
Chris Wilson	e084039b58	drm/i915/execlists: Move the GEM_BUG_ON context matches CSB later Print out the current request/context before doing the GEM_BUG_ON, so that we can inspect the values in the ftrace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221152301.9178-1-chris@chris-wilson.co.uk	2018-02-22 09:08:39 +00:00
Chris Wilson	65cb8c0f04	drm/i915/execlists: Add a GEM_TRACE to show when the context is completed Include a GEM_TRACE to show when the context is complete and we advance the ELSP port. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221151553.9054-1-chris@chris-wilson.co.uk	2018-02-22 09:08:21 +00:00
Chris Wilson	561210706c	drm/i915/execlists: Remove the ring advancement under preemption Load an empty ringbuffer for preemption, ignoring the lite-restore workaround as we know the preempt context is always idle before preemption. Note that after some digging by Michal Winiarski, we found that RING_HEAD is no longer being updated (due to inhibiting context save restore) so this patch is already in effect! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221133236.29402-1-chris@chris-wilson.co.uk	2018-02-21 20:57:47 +00:00
Chris Wilson	e61e0f51ba	drm/i915: Rename drm_i915_gem_request to i915_request We want to de-emphasize the link between the request (dependency, execution and fence tracking) from GEM and so rename the struct from drm_i915_gem_request to i915_request. That is we may implement the GEM user interface on top of requests, but they are an abstraction for tracking execution rather than an implementation detail of GEM. (Since they are not tied to HW, we keep the i915 prefix as opposed to intel.) In short, the spatch: @@ @@ - struct drm_i915_gem_request + struct i915_request A corollary to contracting the type name, we also harmonise on using 'rq' shorthand for local variables where space if of the essence and repetition makes 'request' unwieldy. For globals and struct members, 'request' is still much preferred for its clarity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2018-02-21 20:57:22 +00:00
Chris Wilson	d081e021fd	drm/i915/execlists: Remove too early assert We can't assert that the execlists are active before we set the flag. So perform the assert after we are expected to have marked the execlists active. Fixes: `339ccd35b4` ("drm/i915: Assert that we always complete a submission to guc/execlists") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Tomi Sarvela <tomi.p.sarvela@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180216153210.30551-1-chris@chris-wilson.co.uk	2018-02-16 16:18:32 +00:00
Chris Wilson	339ccd35b4	drm/i915: Assert that we always complete a submission to guc/execlists The continual resubmission model for execlists (and emulated over guc) requires that we keep feeding requests into the HW in order to generate more CS interrupts to drain the rest of the queue. Add a couple of asserts to ensure that we don't skip a cycle and come to a grinding halt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180215162553.23348-1-chris@chris-wilson.co.uk	2018-02-16 14:14:14 +00:00
Chris Wilson	d637637491	drm/i915: Only allocate preempt context when required If we remove some hardcoded assumptions about the preempt context having a fixed id, reserved from use by normal user contexts, we may only allocate the i915_gem_context when required. Then the subsequent decisions on using preemption reduce to having the preempt context available. v2: Include an assert that we don't allocate the preempt context twice. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-3-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2018-02-08 07:30:16 +00:00
Chris Wilson	3fed180812	drm/i915: Move the scheduler feature bits into the purview of the engines Rather than having the high level ioctl interface guess the underlying implementation details, having the implementation declare what capabilities it exports. We define an intel_driver_caps, similar to the intel_device_info, which instead of trying to describe the HW gives details on what the driver itself supports. This is then populated by the engine backend for the new scheduler capability field for use elsewhere. v2: Use caps.scheduler for validating CONTEXT_PARAM_SET_PRIORITY (Mika) One less assumption of engine[RCS] \o/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tomasz Lis <tomasz.lis@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-2-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2018-02-08 07:30:11 +00:00
Rafael Antognolli	4b6ce6810a	drm/i915/cnl: WaPipeControlBefore3DStateSamplePattern This workaround should prevent a bug that can be hit on a context restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall (0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of NOOP(0x0) in the indirect context batch buffer, to ensure the engine is idle prior to programming 3DSTATE_SAMPLE_PATTERN. It's also not clear whether we should add those extra dwords because of the workaround itself, or if that's just padding for the WA BB (and next commands could come right after the PIPE_CONTROL). We keep them for now. References: HSD#1939868 v2: More descriptive changelog and comments. v3: Explain that PIPE_CONTROL is actually 6 dwords, and that we advance 10 more dwords because of that. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180205233330.14973-1-rafael.antognolli@intel.com	2018-02-06 08:59:39 +00:00
Chris Wilson	e840130a25	drm/i915/execlists: Move the reset bits to a more natural home In preparation for the next patch, we want the engine to appear idle after a reset (if there are no requests in flight). For execlists, this entails clearing the active status on reset, it will be regenerated on restarting the engine after the reset. In the process, note that a couple of other status flags and checks could be moved into the describing function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180205152431.12163-3-chris@chris-wilson.co.uk	2018-02-05 15:27:25 +00:00
Chris Wilson	073988d102	drm/i915/execlists: Remove the startup spam Execlists is now enabled by default and included in the list of capabilities printed out to dmesg and beyond. We do not need to mention it again, every time we restart the engine, so kill the spam. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180205092201.19476-6-chris@chris-wilson.co.uk	2018-02-05 13:24:14 +00:00
Chris Wilson	274de87606	drm/i915/execlists: Flush GTIIR on clearing CS interrupts during reset Be paranoid and flush the GTIIR after clearing the CS interrupt to be sure it has taken before we re-enable the interrupt handler. We still see early interrupts following reset, the tasklet handling the mmio read before it has been written by the CS. This hopefully reduces the frequency to 0... References: https://bugs.freedesktop.org/show_bug.cgi?id=104262 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180202145455.29876-1-chris@chris-wilson.co.uk	2018-02-02 20:31:52 +00:00
Kelvin Gardiner	225701fc20	drm/i915/icl: Set graphics mode register for gen11 This patch clears a single bit. The bit is 0 by default but expected not to be set. Explicitly clearing the bit in this patch is intended to indicate some thinking has occurred, and that we want this bit cleared and we are not just excepting the default value. We also stop setting GFX_RUN_LIST_ENABLE, which is correct since that bit is gone. v2 (from Paulo): fix indentation. v3: Changed GEN check to >= 11. Corrected author name. v4 (from Paulo): improve commit message (Daniele). Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180130134918.32283-9-paulo.r.zanoni@intel.com	2018-01-31 14:29:52 -02:00
Chris Wilson	1d2a19c256	drm/i915/lrc: Remove superfluous WARN_ON Remove the WARN_ON(ce->state) inside the static function only called when ce->state == NULL and downgrade the w/a batch setup warning into a developer only mode (GEM_WARN_ON). v2: Move the deferred alloc guard into the callee, eliminating the need for the WARN_ON: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-1 (-1) Function old new delta execlists_context_pin 1819 1818 -1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180126121846.12007-1-chris@chris-wilson.co.uk	2018-01-26 13:03:07 +00:00
Chris Wilson	09b1a4e4b5	drm/i915/lrc: Clear context restore/save inhibit flags for new contexts CTX_CONTEXT_CONTROL (CTX_SR_CTL) operates as a masked register and so will only apply the bits that are selected by the upper half. In the case of selectively enabling sr inhibit, this may mean the context keeps the current setting (so forgetting to save the context later, eventually leading to a very upset GPU!). Fixes: `517aaffe0c` ("drm/i915/execlists: Inhibit context save/restore for the fake preempt context") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180125112443.12745-1-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2018-01-25 18:04:25 +00:00
Chris Wilson	517aaffe0c	drm/i915/execlists: Inhibit context save/restore for the fake preempt context We only use the preempt context to inject an idle point into execlists. We never need to reference its logical state, so tell the GPU never to load it or save it. v2: BIT(2) for save-inhibit. N.B. Daniele mentioned this bit mbz for ICL, and has been moved into the submission process rather than the context image. Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180123210412.17653-1-chris@chris-wilson.co.uk	2018-01-24 09:40:15 +00:00
Michel Thierry	578f1ac689	drm/i915: Move LRC register offsets to a header file Newer platforms may have subtle offset changes, which will increase the number of defines, so it is probably better to start moving them to its own header file. Also move the macros used while setting the reg state. v2: Rename to intel_lrc_reg.h, to be consistent with i915_reg.h and intel_guc_reg.h (Chris) v3: License notice shenanigans. v4: Documentation/process/coding-style.rst is always right (Chris) v5: Rebase. Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180124004349.22126-2-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2018-01-24 09:22:55 +00:00
Michel Thierry	751d115302	drm/i915/lrc: Update reg_state macros to pass checkpatch The macros we use to init the reg_state had the following issues reported by checkpatch --strict. Macro argument reuse 'reg_state' - possible side-effects Macro argument reuse 'pos' - possible side-effects Macro argument reuse 'ppgtt' - possible side-effects spaces preferred around that '+' (ctx:VxV) So fix these issues before they are moved to a new header file. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180124004349.22126-1-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2018-01-24 09:19:38 +00:00
Chris Wilson	bb5db7e160	drm/i915/execlists: Skip forcewake for ELSP submission Now that we can read the CSB from the HWSP, we may avoid having to perform mmio reads entirely and so forgo the rigmarole of the forcewake dance. v2: Include forcewake hint for GEM_TRACE readback of mmio. If we don't hold fw ourselves, the reads may return garbage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180122100714.15137-1-chris@chris-wilson.co.uk	2018-01-22 18:27:04 +00:00
Tvrtko Ursulin	10bde236ef	drm/i915: Per-engine scratch VMA is mandatory We fail engine initialization if the scratch VMA cannot be created so there is no point in error handle it later. If the initialization ordering gets messed up, we can explode during development just as well. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180119100005.9072-2-tvrtko.ursulin@linux.intel.com	2018-01-22 17:15:31 +00:00
Tvrtko Ursulin	ae504be2e0	drm/i915: Downgrade incorrect engine constructor usage warnings to development Render engine constructor helpers must only be called from the render engine constructors, but there is no need to burden the production binaries with warnings which can only be triggered during development. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180119100005.9072-1-tvrtko.ursulin@linux.intel.com	2018-01-22 17:15:20 +00:00
Chris Wilson	c218ee03b9	drm/i915: Don't adjust priority on an already signaled fence When we retire a signaled fence, we free the dependency tree. However, we skip clearing the list so that if we then try to adjust the priority of the signaled fence, we may walk the list of freed dependencies. [ 3083.156757] ================================================================== [ 3083.156806] BUG: KASAN: use-after-free in execlists_schedule+0x199/0x660 [i915] [ 3083.156810] Read of size 8 at addr ffff8806bf20f400 by task Xorg/831 [ 3083.156815] CPU: 0 PID: 831 Comm: Xorg Not tainted 4.15.0-rc6-no-psn+ #1 [ 3083.156817] Hardware name: Notebook N24_25BU/N24_25BU, BIOS 5.12 02/17/2017 [ 3083.156818] Call Trace: [ 3083.156823] dump_stack+0x5c/0x7a [ 3083.156827] print_address_description+0x6b/0x290 [ 3083.156830] kasan_report+0x28f/0x380 [ 3083.156872] ? execlists_schedule+0x199/0x660 [i915] [ 3083.156914] execlists_schedule+0x199/0x660 [i915] [ 3083.156956] ? intel_crtc_atomic_check+0x146/0x4e0 [i915] [ 3083.156997] ? execlists_submit_request+0xe0/0xe0 [i915] [ 3083.157038] ? i915_vma_misplaced.part.4+0x25/0xb0 [i915] [ 3083.157079] ? __i915_vma_do_pin+0x7c8/0xc80 [i915] [ 3083.157121] ? intel_atomic_state_alloc+0x44/0x60 [i915] [ 3083.157130] ? drm_atomic_helper_page_flip+0x3e/0xb0 [drm_kms_helper] [ 3083.157145] ? drm_mode_page_flip_ioctl+0x7d2/0x850 [drm] [ 3083.157159] ? drm_ioctl_kernel+0xa7/0xf0 [drm] [ 3083.157172] ? drm_ioctl+0x45b/0x560 [drm] [ 3083.157211] i915_gem_object_wait_priority+0x14c/0x2c0 [i915] [ 3083.157251] ? i915_gem_get_aperture_ioctl+0x150/0x150 [i915] [ 3083.157290] ? i915_vma_pin_fence+0x1d8/0x320 [i915] [ 3083.157331] ? intel_pin_and_fence_fb_obj+0x175/0x250 [i915] [ 3083.157372] ? intel_rotation_info_size+0x60/0x60 [i915] [ 3083.157413] ? intel_link_compute_m_n+0x80/0x80 [i915] [ 3083.157428] ? drm_dev_printk+0x1b0/0x1b0 [drm] [ 3083.157443] ? drm_dev_printk+0x1b0/0x1b0 [drm] [ 3083.157485] intel_prepare_plane_fb+0x2f8/0x5a0 [i915] [ 3083.157527] ? intel_crtc_get_vblank_counter+0x80/0x80 [i915] [ 3083.157536] drm_atomic_helper_prepare_planes+0xa0/0x1c0 [drm_kms_helper] [ 3083.157587] intel_atomic_commit+0x12e/0x4e0 [i915] [ 3083.157605] drm_atomic_helper_page_flip+0xa2/0xb0 [drm_kms_helper] [ 3083.157621] drm_mode_page_flip_ioctl+0x7d2/0x850 [drm] [ 3083.157638] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm] [ 3083.157652] ? drm_lease_owner+0x1a/0x30 [drm] [ 3083.157668] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm] [ 3083.157681] drm_ioctl_kernel+0xa7/0xf0 [drm] [ 3083.157696] drm_ioctl+0x45b/0x560 [drm] [ 3083.157711] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm] [ 3083.157725] ? drm_getstats+0x20/0x20 [drm] [ 3083.157729] ? timerqueue_del+0x49/0x80 [ 3083.157732] ? __remove_hrtimer+0x62/0xb0 [ 3083.157735] ? hrtimer_try_to_cancel+0x173/0x210 [ 3083.157738] do_vfs_ioctl+0x13b/0x880 [ 3083.157741] ? ioctl_preallocate+0x140/0x140 [ 3083.157744] ? _raw_spin_unlock_irq+0xe/0x30 [ 3083.157746] ? do_setitimer+0x234/0x370 [ 3083.157750] ? SyS_setitimer+0x19e/0x1b0 [ 3083.157752] ? SyS_alarm+0x140/0x140 [ 3083.157755] ? __rcu_read_unlock+0x66/0x80 [ 3083.157757] ? __fget+0xc4/0x100 [ 3083.157760] SyS_ioctl+0x74/0x80 [ 3083.157763] entry_SYSCALL_64_fastpath+0x1a/0x7d [ 3083.157765] RIP: 0033:0x7f6135d0c6a7 [ 3083.157767] RSP: 002b:00007fff01451888 EFLAGS: 00003246 ORIG_RAX: 0000000000000010 [ 3083.157769] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f6135d0c6a7 [ 3083.157771] RDX: 00007fff01451950 RSI: 00000000c01864b0 RDI: 000000000000000c [ 3083.157772] RBP: 00007f613076f600 R08: 0000000000000001 R09: 0000000000000000 [ 3083.157773] R10: 0000000000000060 R11: 0000000000003246 R12: 0000000000000000 [ 3083.157774] R13: 0000000000000060 R14: 000000000000001b R15: 0000000000000060 [ 3083.157779] Allocated by task 831: [ 3083.157783] kmem_cache_alloc+0xc0/0x200 [ 3083.157822] i915_gem_request_await_dma_fence+0x2c4/0x5d0 [i915] [ 3083.157861] i915_gem_request_await_object+0x321/0x370 [i915] [ 3083.157900] i915_gem_do_execbuffer+0x1165/0x19c0 [i915] [ 3083.157937] i915_gem_execbuffer2+0x1ad/0x550 [i915] [ 3083.157950] drm_ioctl_kernel+0xa7/0xf0 [drm] [ 3083.157962] drm_ioctl+0x45b/0x560 [drm] [ 3083.157964] do_vfs_ioctl+0x13b/0x880 [ 3083.157966] SyS_ioctl+0x74/0x80 [ 3083.157968] entry_SYSCALL_64_fastpath+0x1a/0x7d [ 3083.157971] Freed by task 831: [ 3083.157973] kmem_cache_free+0x77/0x220 [ 3083.158012] i915_gem_request_retire+0x72c/0xa70 [i915] [ 3083.158051] i915_gem_request_alloc+0x1e9/0x8b0 [i915] [ 3083.158089] i915_gem_do_execbuffer+0xa96/0x19c0 [i915] [ 3083.158127] i915_gem_execbuffer2+0x1ad/0x550 [i915] [ 3083.158140] drm_ioctl_kernel+0xa7/0xf0 [drm] [ 3083.158153] drm_ioctl+0x45b/0x560 [drm] [ 3083.158155] do_vfs_ioctl+0x13b/0x880 [ 3083.158156] SyS_ioctl+0x74/0x80 [ 3083.158158] entry_SYSCALL_64_fastpath+0x1a/0x7d [ 3083.158162] The buggy address belongs to the object at ffff8806bf20f400 which belongs to the cache i915_dependency of size 64 [ 3083.158166] The buggy address is located 0 bytes inside of 64-byte region [ffff8806bf20f400, ffff8806bf20f440) [ 3083.158168] The buggy address belongs to the page: [ 3083.158171] page:00000000d43decc4 count:1 mapcount:0 mapping: (null) index:0x0 [ 3083.158174] flags: 0x17ffe0000000100(slab) [ 3083.158179] raw: 017ffe0000000100 0000000000000000 0000000000000000 0000000180200020 [ 3083.158182] raw: ffffea001afc16c0 0000000500000005 ffff880731b881c0 0000000000000000 [ 3083.158184] page dumped because: kasan: bad access detected [ 3083.158187] Memory state around the buggy address: [ 3083.158190] ffff8806bf20f300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158192] ffff8806bf20f380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158195] >ffff8806bf20f400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158196] ^ [ 3083.158199] ffff8806bf20f480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158201] ffff8806bf20f500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158203] ================================================================== Reported-by: Alexandru Chirvasitu <achirvasub@gmail.com> Reported-by: Mike Keehan <mike@keehan.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104436 Fixes: `1f181225f8` ("drm/i915/execlists: Keep request->priority for its lifetime") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Alexandru Chirvasitu <achirvasub@gmail.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180106105618.13532-1-chris@chris-wilson.co.uk	2018-01-08 09:16:02 +00:00
Chris Wilson	2221c5b7dd	drm/i915/execlists: Reduce list_for_each_safe+list_safe_reset_next After staring at the list_for_each_safe macros for a bit, our current invocation of list_safe_reset_next in execlists_schedule() simply reduces to list_for_each. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-11-chris@chris-wilson.co.uk	2018-01-03 12:09:46 +00:00
Chris Wilson	ce01b17377	drm/i915/execlists: Assert there are no simple cycles in the dependencies The dependency chain must be an acyclic graph. This is checked by the swfence, but for sanity, also do a simple check that we do not corrupt our list iteration in execlists_schedule() by a shallow dependency cycle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-10-chris@chris-wilson.co.uk	2018-01-03 12:09:44 +00:00
Chris Wilson	83cc84c5a8	drm/i915: Assert all signalers we depended on did indeed signal Back up our comment that all signalers should have been signaled before we ourselves were retired with an assert to that effect. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-9-chris@chris-wilson.co.uk	2018-01-03 12:05:41 +00:00
Chris Wilson	f3c9d40757	drm/i915/execlists: Tidy enabling execlists Move the register settings for enabling execlists into its own function for clarity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-18-chris@chris-wilson.co.uk	2018-01-03 11:02:43 +00:00
Chris Wilson	693cfbf058	drm/i915/execlists: Record elsp offset during engine setup Currently, we record the elsp register offset inside init-hw but we only need to do it once during engine setup (after we know the mmio iomapping). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-17-chris@chris-wilson.co.uk	2018-01-03 11:01:38 +00:00
Chris Wilson	4223221361	drm/i915/execlists: Clear context-switch interrupt earlier in the reset Move the clearing of the CS-interrupt into the engine reset phase, before the current init-hw phase. This helps clarify that we clear the pending interrupts prior to any restarting of the execlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-16-chris@chris-wilson.co.uk	2018-01-03 11:01:05 +00:00
Chris Wilson	193a98dc7c	drm/i915/execlists: Show preemption progress in GEM_TRACE We already emit a GEM_TRACE for when we start preemption, but we lack one to show when the preemption is completed and we return to the regular queue. This is to continue the investigation into the mysterious <0>[ 197.854177] <idle>-0 1..s1 197837017us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0], tail=0 [0] <0>[ 197.854209] drv_self-6008 2.... 197837390us : reset_common_ring: rcs0 seqno=15515 <0>[ 197.854240] drv_self-6008 2.... 197837415us : reset_common_ring: bcs0 seqno=0 <0>[ 197.854270] drv_self-6008 2.... 197837443us : reset_common_ring: vcs0 seqno=0 <0>[ 197.854300] drv_self-6008 2.... 197837463us : reset_common_ring: vcs1 seqno=0 <0>[ 197.854330] drv_self-6008 2.... 197837482us : reset_common_ring: vecs0 seqno=0 <0>[ 197.854360] ksoftirq-23 2..s. 197838341us : execlists_submission_tasklet: bcs0 in[0]: ctx=0.1, seqno=1dce7 <0>[ 197.854392] <idle>-0 1..s1 197838347us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=0 [0] <0>[ 197.854423] ksoftirq-23 2..s. 197838354us : execlists_submission_tasklet: vcs0 in[0]: ctx=0.1, seqno=1d027 <0>[ 197.854456] ksoftirq-23 2.Ns. 197838361us : execlists_submission_tasklet: vcs1 in[0]: ctx=0.1, seqno=1e738 <0>[ 197.854488] ksoftirq-23 2.Ns. 197838366us : execlists_submission_tasklet: vecs0 in[0]: ctx=0.1, seqno=235aa <0>[ 197.854520] ksoftirq-23 2.Ns. 197838376us : execlists_submission_tasklet: rcs0 in[0]: ctx=0.1, seqno=15518 <0>[ 197.854552] <idle>-0 1..s1 197853285us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0], tail=7 [7] <0>[ 197.854584] <idle>-0 1..s1 197853285us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000 <0>[ 197.854616] <idle>-0 1..s1 197853286us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171222132742.4272-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-12-22 18:32:51 +00:00
Chris Wilson	16a8739402	drm/i915: Tidy up GEM_TRACE around execlists Looking at the coordination of resets with the submission of execlists, it will be useful to have a GEM_TRACE for when we issue the reset. Whilst there tidy up the other GEM_TRACE to always include the engine name, and be careful not to trust any pointers prior to asserts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171220090626.31643-1-chris@chris-wilson.co.uk	2017-12-20 13:24:34 +00:00
Chris Wilson	16c8619a7c	drm/i915: Avoid context dereference inside execlists_submission_tasklet A lesson that has to be relearnt over and over again is that the request does not keep a reference to the context and so we cannot freely dereference the context from inside the execlists_submission_tasklet. In particular, we try to do so in the new GEM_TRACE() so convert those over to the port->context_id we keep for GEM debugging. This means the tracing now depends on DRM_I915_GEM_DEBUG. Fixes: `bccd3b8311` ("drm/i915: Use trace_printk to provide a death rattle for GEM") References: https://bugs.freedesktop.org/show_bug.cgi?id=104066 References: https://bugs.freedesktop.org/show_bug.cgi?id=104162 References: https://bugs.freedesktop.org/show_bug.cgi?id=104242 References: https://bugs.freedesktop.org/show_bug.cgi?id=104310 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171219220916.30882-1-chris@chris-wilson.co.uk	2017-12-19 23:04:45 +00:00
Chris Wilson	2fc7a06ad5	drm/i915/execlists: Cache ELSP register offset Currently on every submission, we recalculate the ELSP register offset for the engine, after chasing the pointers to find the iomem base. Since this is fixed for the lifetime of the driver, record the offset in the execlists struct. In practice the difference is negligible, it just happens to remove 27 bytes of eyesore pointer dancing from next to the hottest instruction (which is itself due to stalling for a cache miss) in perf profiles of the execlists_submission_tasklet(). v2: Trim off one more elsp local. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171207222434.17686-1-chris@chris-wilson.co.uk	2017-12-08 00:37:05 +00:00
Tvrtko Ursulin	cf669b4e9f	drm/i915: Consolidate checks for engine stats availability Sagar noticed the check can be consolidated between the engine stats implementation and the PMU. My first choice was a static inline helper but that got into include ordering mess quickly fast so I went with a macro instead. At some point we should perhaps looking into taking out the non-ringubffer bits from intel_ringbuffer.h into a new intel_engine.h or something. v2: Use engine->flags. (Chris Wilson) v3: Rebase and mark GuC as not yet supported. (Chris Wilson) v4: Move flag setting to intel_engines_reset_default_submission. (Chris Wilson) v5: Move flag setting to logical_ring_setup. v6: intel_engines_reset_default_submission is the wrong place to set the flag - it needs to be in execlists_set_default_submission. (Sagar) v7: Flag setting in logical_ring_setup is not required. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v6) Link: https://patchwork.freedesktop.org/patch/msgid/20171129102805.22690-1-tvrtko.ursulin@linux.intel.com	2017-11-29 12:29:40 +00:00
Tvrtko Ursulin	30e17b7847	drm/i915: Engine busy time tracking Track total time requests have been executing on the hardware. We add new kernel API to allow software tracking of time GPU engines are spending executing requests. Both per-engine and global API is added with the latter also being exported for use by external users. v2: * Squashed with the internal API. * Dropped static key. * Made per-engine. * Store time in monotonic ktime. v3: Moved stats clearing to disable. v4: * Comments. * Don't export the API just yet. v5: Whitespace cleanup. v6: * Rename ref to active. * Drop engine aggregate stats for now. * Account initial busy period after enabling stats. v7: * Rebase. v8: * Move context in notification after the notifier. (Chris Wilson) v9: In cases where stats tracking is getting disabled while there is an active context on an engine, add up the current value to the total. This also implies we don't clear the total when tracking is disabled any longer. There is no real need to do so because we define the stats as relative while enabled, meaning comparison between two samples while tracking is enabled is the valid usage. However, when busy stats will later be plugged into the perf PMU API, it is beneficial to not reset the total, since the PMU core likes to do some counter disable/enable cycles on startup, and while doing so during a single long context executing on an engine we would lose some accuracy and so make unit testing more difficult than needs to be. v10: * Fix accounting for preemption. v11: * Rebase for i915_modparams.enable_execlists removal. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-5-tvrtko.ursulin@linux.intel.com	2017-11-22 11:25:02 +00:00
Tvrtko Ursulin	73fd9d3816	drm/i915: Wrap context schedule notification No functional change just something which will be handy in the following patch. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-4-tvrtko.ursulin@linux.intel.com	2017-11-22 11:25:01 +00:00
Chris Wilson	fb5c551ad5	drm/i915: Remove i915.enable_execlists module parameter Execlists and legacy ringbuffer submission are no longer feature comparable (execlists now offer greater functionality that should overcome their performance hit) and obsoletes the unsafe module parameter, i.e. comparing the two modes of execution is no longer useful, so remove the debug tool. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk	2017-11-20 21:53:59 +00:00
Michel Thierry	ba74cb10c7	drm/i915/execlists: Delay writing to ELSP until HW has processed the previous write The hardware needs some time to process the information received in the ExecList Submission Port, and expects us to not write anything more until it has 'acknowledged' this new submission by sending an IDLE_ACTIVE or PREEMPTED CSB event. If we do not follow this, the driver could write new data into the ELSP before HW had finishing fetching the previous one, putting us in 'undefined behaviour' space. This seems to be the problem causing the spurious PREEMPTED & COMPLETE events after a COMPLETE like the one below: [] vcs0: sw rd pointer = 2, hw wr pointer = 0, current 'head' = 3. [] vcs0: Execlist CSB[0]: 0x00000018 _ 0x00000007 [] vcs0: Execlist CSB[1]: 0x00000001 _ 0x00000000 [] vcs0: Execlist CSB[2]: 0x00000018 _ 0x00000007 <<< COMPLETE [] vcs0: Execlist CSB[3]: 0x00000012 _ 0x00000007 <<< PREEMPTED & COMPLETE [] vcs0: Execlist CSB[4]: 0x00008002 _ 0x00000006 [] vcs0: Execlist CSB[5]: 0x00000014 _ 0x00000006 The ELSP writes that lead to this CSB sequence show that the HW hadn't started executing the previous execlist (the one with only ctx 0x6) by the time the new one was submitted; this is a bit more clear in the data show in the EXECLIST_STATUS register at the time of the ELSP write. [] vcs0: ELSP[0] = 0x0_0 [execlist1] - status_reg = 0x0_302 [] vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302 [] vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308 [] vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308 Note that having to wait for this ack does not disable lite-restores, although it may reduce their numbers. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102035 Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/<20171118003038.7935-1-michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-4-chris@chris-wilson.co.uk Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-11-20 17:01:38 +00:00
Chris Wilson	1f5f9edb44	drm/i915/execlists: Assert that we don't get mixed IDLE_ACTIVE \| COMPLETE events If IDLE_ACTIVE is set, then all other bits are invalid. For us, we can assert that if we see a COMPLETE \| PREEMPTED event, then it should be impossible for it to also contain an IDLE_ACTIVE flag. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-11-20 14:50:45 +00:00
Chris Wilson	d8747afb11	drm/i915/execlists: Reduce completed event mask to COMPLETE \| PREEMPTED Since we get a COMPLETE event when the context switch occurs on RING_HEAD == RING_TAIL and a PREEMPTED event when a switch occurs before that point, COMPLETE \| PREEMPTED should cover all possible context switch completion events. We can move the ELEMENT_SWITCH info message from the COMPLETED_MASK into an assertion for when we are performing a switch to port[1]. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-11-20 14:50:03 +00:00
Chris Wilson	e40dd22624	drm/i915/execlists: Listen to COMPLETE context event not ACTIVE_IDLE Since commit `e1fee72c2e` Author: Oscar Mateo <oscar.mateo@intel.com> Date: Thu Jul 24 17:04:40 2014 +0100 drm/i915/bdw: Avoid non-lite-restore preemptions execlists has listened to (ACTIVE_IDLE \| ELEMENT_SWITCH) for detecting when one context completed and it either continued onto the next (in port 1) or idled. We would always see COMPLETE \| ACTIVE_IDLE on the final context-switch event, but on recent gen it appears that we now get separate ACTIVE_IDLE and COMPLETE events. In particular, the ACTIVE_IDLE events may not be coupled to a context (since it is a general state rather than a specific context completion event). v2: Update the history, execlists did originally start out by listening to the COMPLETE event not ACTIVE_IDLE. v3: Update preempt completion test to also use COMPLETE not ACTIVE_IDLE. References: bspec/12255 References: https://bugs.freedesktop.org/show_bug.cgi?id=103800 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-1-chris@chris-wilson.co.uk	2017-11-20 14:49:16 +00:00
Sagar Arun Kamble	c6dce8f140	drm/i915: Update execlists tasklet naming intel_lrc_irq_handler and i915_guc_irq_handler are HW submission related tasklet functions. Name them with "submission_tasklet" suffix and remove intel/i915 prefix as they are static. Also rename irq_tasklet as just tasklet for clarity. v2: s/_bh/_tasklet (Chris) Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-2-git-send-email-sagar.a.kamble@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-11-16 15:01:31 +00:00
Chris Wilson	fd13821219	drm/i915: Make request's wait-for-space explicit At the start of building a request, we would wait for roughly enough space to fit the average request (to reduce the likelihood of having to wait and abort partway through request construction). To achieve we would try to begin a 0-length command packet, this just adds extra confusion so make the wait-for-space explicit, as in the next patch we want to move it from the backend to the i915_gem_request_alloc() so it can ensure that the wait-for-space is the first operation in building a new request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171115151204.8105-2-chris@chris-wilson.co.uk	2017-11-15 17:12:49 +00:00
Chris Wilson	7c2fa7faf1	drm/i915: Stop caching the "golden" renderstate As we now record the default HW state and so only emit the "golden" renderstate once to prepare the HW, there is no advantage in keeping the renderstate batch around as it will never be used again. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-8-chris@chris-wilson.co.uk	2017-11-10 17:23:22 +00:00
Chris Wilson	d2b4b97933	drm/i915: Record the default hw state after reset upon load Take a copy of the HW state after a reset upon module loading by executing a context switch from a blank context to the kernel context, thus saving the default hw state over the blank context image. We can then use the default hw state to initialise any future context, ensuring that each starts with the default view of hw state. v2: Unmap our default state from the GTT after stealing it from the context. This should stop us from accidentally overwriting it via the GTT (and frees up some precious GTT space). Testcase: igt/gem_ctx_isolation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk	2017-11-10 17:23:10 +00:00
Chris Wilson	f4e15af7e2	drm/i915: Mark the context state as dirty/written In the next few patches, we will want to both copy out of the context image and write a valid image into a new context. To be completely safe, we should then couple in our domain tracking to ensure that we don't have any issues with stale data remaining in unwanted cachelines. Historically, we omitted the .write=true from the call to set-gtt-domain in i915_switch_context() in order to avoid a stall between every request as we would want to wait for the previous context write from the gpu. Since then, we limit the set-gtt-domain to only occur when we first bind the vma, so once in use we will never stall, and we are sure to flush the context following a load from swap. Equally we never applied the lessons learnt from ringbuffer submission to execlists; so time to apply the flush of the lrc after load as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-6-chris@chris-wilson.co.uk	2017-11-10 17:20:29 +00:00
Chris Wilson	bccd3b8311	drm/i915: Use trace_printk to provide a death rattle for GEM Trying to enable printk debugging for GEM is fraught with the issue of spam; interactions with HW are very frequent and often boring. However, one instance where they are not so boring is just before a BUG; here ftrace provides a facility to dump its ringbuffer on an oops. So for CI let's enable trace_printk() to capture the last exchanges with HW as a death rattle. For example, [ 79.234110] ------------[ cut here ]------------ [ 79.234137] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:907! [ 79.234145] invalid opcode: 0000 [#1] SMP [ 79.234153] Dumping ftrace buffer: [ 79.234158] --------------------------------- ... [ 79.314044] gem_conc-1059 1..s1 79203443us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.2, seqno=145 [ 79.314089] gem_conc-1059 1..s. 79220800us : intel_lrc_irq_handler: bcs0 csb[1/1]: status=0x00000018:0x00000005 [ 79.314133] gem_conc-1059 1..s. 79220803us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.1, seqno=145 [ 79.314177] gem_conc-1062 2..s1 79230458us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.1, seqno=146 [ 79.314220] gem_conc-1062 2..s1 79230515us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.2, seqno=147 [ 79.314265] gem_conc-1059 1..s1 79230951us : intel_lrc_irq_handler: bcs0 csb[2/3]: status=0x00000012:0x00000008 [ 79.314309] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.2, seqno=147 [ 79.314353] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 csb[3/3]: status=0x00008002:0x00000008 [ 79.314396] gem_conc-1059 1..s1 79230955us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.1, seqno=147 [ 79.314402] --------------------------------- v2: Tweak the formatting to be more consistent between in/out. v3: do {} while (0) stub macro protection Suggested-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171109143019.16568-1-chris@chris-wilson.co.uk	2017-11-09 21:39:18 +00:00
Michał Winiarski	c41937fd99	drm/i915/guc: Preemption! With GuC Pretty similar to what we have on execlists. We're reusing most of the GEM code, however, due to GuC quirks we need a couple of extra bits. Preemption is implemented as GuC action, and actions can be pretty slow. Because of that, we're using a mutex to serialize them. Since we're requesting preemption from the tasklet, the task of creating a workitem and wrapping it in GuC action is delegated to a worker. To distinguish that preemption has finished, we're using additional piece of HWSP, and since we're not getting context switch interrupts, we're also adding a user interrupt. The fact that our special preempt context has completed unfortunately doesn't mean that we're ready to submit new work. We also need to wait for GuC to finish its own processing. v2: Don't compile out the wait for GuC, handle workqueue flush on reset, no need for ordered workqueue, put on a reviewer hat when looking at my own patches (Chris) Move struct work around in intel_guc, move user interruput outside of conditional (Michał) Keep ring around rather than chase though intel_context v3: Extract WA for flushing ggtt writes to a helper (Chris) Keep work_struct in intel_guc rather than engine (Michał) Use ordered workqueue for inject_preempt worker to avoid GuC quirks. v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele) Drop stray newlines, use container_of for intel_guc in worker, check for presence of workqueue when flushing it, rather than enable_guc_submission modparam, reorder preempt postprocessing (Chris) v5: Make wq NULL after destroying it v6: Swap struct guc_preempt_work members (Michał) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com	2017-10-26 21:35:21 +01:00
Michał Winiarski	a4598d1755	drm/i915: Rename helpers used for unwinding, use macro for can_preempt We would also like to make use of execlist_cancel_port_requests and unwind_incomplete_requests in GuC preemption backend. Let's rename the functions to use the correct prefixes, so that we can simply add the declarations in the following patch. Similar thing for applies for can_preempt, except we're introducing HAS_LOGICAL_RING_PREEMPTION macro instad, converting other users that were previously touching device info directly. v2: s/intel_engine/execlists and pass execlists to unwind (Chris) v3: use locked version for exporting, drop const qual (Chris) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-11-michal.winiarski@intel.com	2017-10-26 21:35:21 +01:00
Michał Winiarski	df77cd83d5	drm/i915: Extract "emit write" part of emit breadcrumb functions Let's separate the "emit" part from touching any internal structures, this way we can have a generic "emit coherent GGTT write" function. We would like to reuse this functionality for emitting HWSP write, to confirm that preempt-to-idle has finished. v2: Reorder args to match emit_pipe_control, s/render/rcs (Chris) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-8-michal.winiarski@intel.com	2017-10-26 21:35:21 +01:00
Michał Winiarski	9bdc3573a5	drm/i915/guc: Initialize GuC before restarting engines Now that we're handling request resubmission the same way as regular submission (from the tasklet), we can move GuC initialization earlier, before restarting the engines. This way, we're no longer being in the state of flux during engine restart - we're already in user requested submission mode. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171025172519.10670-5-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-10-25 19:41:04 +01:00
Chris Wilson	aba5e27858	drm/i915: Add a hook for making the engines idle (parking) and unparking In the next patch, we will want to install a callback when the engines (GT as a whole) become idle and similarly when they first become busy. To enable that callback, first rename intel_engines_mark_idle() to intel_engines_park() and provide the companion intel_engines_unpark(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk	2017-10-25 18:47:30 +01:00
Chris Wilson	64b80085dd	drm/i915/execlists: Remove the priority "optimisation" Originally we set the priority to max upon inserting the request into the execlists queue (and removing it from the scheduler lists). We could then use the prio==INT_MAX as a shortcut within execlists_schedule() to detect the end of the dependency chain. Since commit `1f181225f8` ("drm/i915/execlists: Keep request->priority for its lifetime") this is no longer true as we use the request completion as an indicator the schedule dependency chain is complete instead. (This allows us to then reschedule requests even when its context is in flight.) However, this makes the GEM_BUG_ON() inside execlists_schedule() racy as we may change the rq->prio at the same time. As the assertion is useful, let's keep the assertion and remove the micro-optimisation. Fixes: `1f181225f8` ("drm/i915/execlists: Keep request->priority for its lifetime") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171024115501.21033-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-10-24 15:54:38 +01:00
Chris Wilson	4a118ecbe9	drm/i915: Filter out spurious execlists context-switch interrupts Back in commit `a4b2b01523` ("drm/i915: Don't mark an execlists context-switch when idle") we noticed the presence of late context-switch interrupts. We were able to filter those out by looking at whether the ELSP remained active, but in commit `beecec9017` ("drm/i915/execlists: Preemption!") that became problematic as we now anticipate receiving a context-switch event for preemption while ELSP may be empty. To restore the spurious interrupt suppression, add a counter for the expected number of pending context-switches and skip if we do not need to handle this interrupt to make forward progress. v2: Don't forget to switch on for preempt. v3: Reduce the counter to a on/off boolean tracker. Declare the HW as active when we first submit, and idle after the final completion event (with which we confirm the HW says it is idle), and track each source of activity separately. With a finite number of sources, it should aide us in debugging which gets stuck. Fixes: `beecec9017` ("drm/i915/execlists: Preemption!") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-3-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2017-10-24 15:54:37 +01:00
Chris Wilson	3d574a6bbb	drm/i915: Remove walk over obj->vma_list for the shrinker In the next patch, we want to reduce the lock coverage within the shrinker, and one of the dangerous walks we have is over obj->vma_list. We are only walking the obj->vma_list in order to check whether it has been permanently pinned by HW access, typically via use on the scanout. But we have a couple of other long term pins, the context objects for which we currently have to check the individual vma pin_count. If we instead mark these using obj->pin_display, we can forgo the dangerous and sometimes slow list iteration. v2: Rearrange code to try and avoid confusion from false associations due to arrangement of whitespace along with rebasing on obj->pin_global. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-4-chris@chris-wilson.co.uk	2017-10-16 20:44:19 +01:00
Weinan Li	1fd51d9d97	drm/i915: enable to read CSB and CSB write pointer from HWSP in GVT-g VM Let GVT-g VM read the CSB and CSB write pointer from virtual HWSP, not all the host support this feature, need to check the BIT(3) of caps in PVINFO. v3 : Remove unnecessary comments. v4 : Separate VM enable patch with GVT-g implementation patch due to code dependency. v5 : Use inline for GVT virtual HWSP caps check function. v6 : Comments refine. Signed-off-by: Weinan Li <weinan.z.li@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1508039725-1066-1-git-send-email-weinan.z.li@intel.com	2017-10-16 13:56:29 +03:00
Mika Kuoppala	dc2279e169	drm/i915: Use execlists_num_ports instead of size of array There is function to tell how many ports we have, so use it. We still have direct relationship with array size and port count, so no harm was done. Fixes: `76e70087d3` ("drm/i915: Make execlist port count variable") Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171010114857.13108-1-mika.kuoppala@intel.com	2017-10-10 16:19:40 +03:00
Chris Wilson	279f5a00c9	drm/i915/execlists: Add a comment for the extra MI_ARB_ENABLE Michel Thierry noticed that we were applying WaDisableCtxRestoreArbitration even to gen9, which does not require the w/a. The rationale is that we need to enable MI arbitration for execlists to work, and to be safe we do that before every batch (in addition to every context switch into the batch). Since this is not clear from the single line comment suggesting the MI_ARB_ENABLE is solely for the w/a, add a little more detail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171005191005.13462-1-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2017-10-06 23:21:11 +01:00
Chris Wilson	beecec9017	drm/i915/execlists: Preemption! When we write to ELSP, it triggers a context preemption at the earliest arbitration point (3DPRIMITIVE, some PIPECONTROLs, a few other operations and the explicit MI_ARB_CHECK). If this is to the same context, it triggers a LITE_RESTORE where the RING_TAIL is merely updated (used currently to chain requests from the same context together, avoiding bubbles). However, if it is to a different context, a full context-switch is performed and it will start to execute the new context saving the image of the old for later execution. Previously we avoided preemption by only submitting a new context when the old was idle. But now we wish embrace it, and if the new request has a higher priority than the currently executing request, we write to the ELSP regardless, thus triggering preemption, but we tell the GPU to switch to our special preemption context (not the target). In the context-switch interrupt handler, we know that the previous contexts have finished execution and so can unwind all the incomplete requests and compute the new highest priority request to execute. It would be feasible to avoid the switch-to-idle intermediate by programming the ELSP with the target context. The difficulty is in tracking which request that should be whilst maintaining the dependency change, the error comes in with coalesced requests. As we only track the most recent request and its priority, we may run into the issue of being tricked in preempting a high priority request that was followed by a low priority request from the same context (e.g. for PI); worse still that earlier request may be our own dependency and the order then broken by preemption. By injecting the switch-to-idle and then recomputing the priority queue, we avoid the issue with tracking in-flight coalesced requests. Having tried the preempt-to-busy approach, and failed to find a way around the coalesced priority issue, Michal's original proposal to inject an idle context (based on handling GuC preemption) succeeds. The current heuristic for deciding when to preempt are only if the new request is of higher priority, and has the privileged priority of greater than 0. Note that the scheduler remains unfair! v2: Disable for gen8 (bdw/bsw) as we need additional w/a for GPGPU. Since, the feature is now conditional and not always available when we have a scheduler, make it known via the HAS_SCHEDULER GETPARAM (now a capability mask). v3: Stylistic tweaks. v4: Appease Joonas with a snippet of kerneldoc, only to fuel to fire of the preempt vs preempting debate. Suggested-by: Michal Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-8-chris@chris-wilson.co.uk	2017-10-04 17:52:46 +01:00
Chris Wilson	1f181225f8	drm/i915/execlists: Keep request->priority for its lifetime With preemption, we will want to "unsubmit" a request, taking it back from the hw and returning it to the priority sorted execution list. In order to know where to insert it into that list, we need to remember its adjust priority (which may change even as it was being executed). This also affects reset for execlists as we are now unsubmitting the requests following the reset (rather than directly writing the ELSP for the inflight contexts). This turns reset into an accidental preemption point, as after the reset we may choose a different pair of contexts to submit to hw. GuC is not updated as this series doesn't add preemption to the GuC submission, and so it can keep benefiting from the early pruning of the DFS inside execlists_schedule() for a little longer. We also need to find a way of reducing the cost of that DFS... v2: Include priority in error-state Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-6-chris@chris-wilson.co.uk	2017-10-04 17:52:46 +01:00
Chris Wilson	3ad7b52d96	drm/i915/execlists: Move bdw GPGPU w/a to emit_bb Move the re-enabling of MI arbitration from a per-bb w/a buffer to the emission of the batch buffer itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-5-chris@chris-wilson.co.uk	2017-10-04 17:52:45 +01:00
Chris Wilson	d6c0511300	drm/i915/execlists: Distinguish the incomplete context notifies Let the listener know that the context we just scheduled out was not complete, and will be scheduled back in at a later point. v2: Handle CONTEXT_STATUS_PREEMPTED in gvt by aliasing it to CONTEXT_STATUS_OUT for the moment, gvt can expand upon the difference later. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com> Cc: "Wang, Zhi A" <zhi.a.wang@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-3-chris@chris-wilson.co.uk	2017-10-04 17:52:45 +01:00
Maarten Lankhorst	8279aaf590	drm/i915: Remove use_mmio_flip modparm, v2. This has been unused since commit `afa8ce5b30` ("drm/i915: Nuke legacy flip queueing code"). Changes since v1: - Rebase on top of all the changes to modparams. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> \o/-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171004094416.31306-1-maarten.lankhorst@linux.intel.com	2017-10-04 15:51:09 +02:00
Michał Winiarski	097a94815f	drm/i915/execlists: Cache the last priolist lookup Avoid the repeated rbtree lookup for each request as we unwind them by tracking the last priolist. v2: Fix up my unhelpful suggestion of using default_priolist. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-4-chris@chris-wilson.co.uk	2017-09-29 12:46:08 +01:00
Chris Wilson	7d1ea609f6	drm/i915: Give the invalid priority a magic name We use INT_MIN to denote the priority of a request that has not been submitted to the scheduler; we treat INT_MIN as an invalid priority and initialise the request to it. Give the value a name so it stands out. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-3-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com	2017-09-29 12:45:34 +01:00
Chris Wilson	7e4992ac04	drm/i915/execlists: Move request unwinding to a separate function In the future, we will want to unwind requests following a preemption point. This requires the same steps as for unwinding upon a reset, so extract the existing code to a separate function for later use. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-2-chris@chris-wilson.co.uk	2017-09-29 12:45:21 +01:00
Chris Wilson	7e44fc289d	drm/i915/execlists: Notify context-out for lost requests When cancelling requests, also send the notification to any listeners (gvt) that the request is no longer scheduled on hw. They may require to keep the in/out exactly balanced, and so the reuse after the reset may confuse the listener. Fixes: `221ab9719b` ("drm/i915/execlists: Unwind incomplete requests on resets") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com> Cc: "Wang, Zhi A" <zhi.a.wang@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170926101720.9479-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2017-09-27 10:48:59 +01:00
Chris Wilson	3f9e6cd823	drm/i915/execlists: Microoptimise execlists_cancel_port_request() Just rearrange the code slightly to trim the number of iterations required. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170925124929.16974-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2017-09-25 20:36:02 +01:00
Chris Wilson	b8aa223341	drm/i915/lrc: Skip no-op per-bb buffer on gen9 Since we inherited the context image setup from gen8 which needed a per-bb workaround (for GPGPU), we are submitting an empty per-bb buffer on gen9. Now that we can skip adding the buffer to the context image, remove the dangling per-bb. This slightly improves execution latency, most notably on an idle engine. References: https://bugs.freedesktop.org/show_bug.cgi?id=87725 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170921135444.27330-2-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-09-25 10:21:45 +01:00
Chris Wilson	604a8f6f1e	drm/i915/lrc: Only enable per-context and per-bb buffers if set The per-context and per-batch workaround buffers are optional, yet we tell the GPU to execute them even if they contain no instructions. Doing so incurs the dispatch latency, which we can avoid if we don't ask the GPU to execute the no-op buffers. Allow ourselves to skip setup of empty buffer, and then to only enable non-empty buffers in the context image. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170921135444.27330-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-09-25 10:21:21 +01:00
Mika Kuoppala	76e70087d3	drm/i915: Make execlist port count variable As we emulate execlists on top of the GuC workqueue, it is not restricted to just 2 ports and we can increase that number arbitrarily to trade-off queue depth (i.e. scheduling latency) against pipeline bubbles. v2: rebase. better commit msg (Chris) v3: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-5-mika.kuoppala@intel.com	2017-09-25 11:33:53 +03:00
Mika Kuoppala	7a62cc6107	drm/i915: Add execlist_port_complete When first execlist entry is processed, we move the port (contents). Introduce function for this as execlist and guc use this common operation. v2: rebase. s/GEM_DEBUG_BUG/GEM_BUG (Chris) v3: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-4-mika.kuoppala@intel.com	2017-09-25 11:33:47 +03:00
Mika Kuoppala	cf4591d1ce	drm/i915: Wrap port cancellation into a function On reset and wedged path, we want to release the requests that are tied to ports and then mark the ports to be unset. Introduce a function for this. v2: rebase v3: drop local, keep GEM_BUG_ON (Michał, Chris) v4: rebase Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-3-mika.kuoppala@intel.com	2017-09-25 11:33:40 +03:00
Mika Kuoppala	19df9a5782	drm/i915: Move execlist initialization into intel_engine_cs.c Move execlist init into a common engine setup. As it is common to both guc and hw execlists. v2: rebase with csb changes v3: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-2-mika.kuoppala@intel.com	2017-09-25 11:33:32 +03:00
Mika Kuoppala	b620e87021	drm/i915: Make own struct for execlist items Engine's execlist related items have been increasing to a point where a separate struct is warranted. Carve execlist specific items to a dedicated struct to add clarity. v2: add kerneldoc and fix whitespace (Joonas, Chris) v3: csb_mmio changes, rebase v4: s/\b(el\|execlist)\b/execlists/ (Joonas) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (v3) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-1-mika.kuoppala@intel.com	2017-09-25 11:33:23 +03:00
Michal Wajdeczko	4f044a88a8	drm/i915: Rename global i915 to i915_modparams Our global struct with params is named exactly the same way as new preferred name for the drm_i915_private function parameter. To avoid such name reuse lets use different name for the global. v5: pure rename v6: fix Credits-to: Coccinelle @@ identifier n; @@ ( - i915.n + i915_modparams.n ) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ville Syrjala <ville.syrjala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com	2017-09-22 14:50:36 +03:00
Michał Winiarski	85e2fe679e	drm/i915/guc: Submit GuC workitems containing coalesced requests To create an upper bound on number of GuC workitems, we need to change the way that requests are being submitted. Rather than submitting each request as an individual workitem, we can do coalescing in a similar way we're handlig execlist submission ports. We also need to stop pretending that we're doing "lite-restore" in GuC submission (we would create a workitem each time we hit this condition). This allows us to completely remove the reservation, replacing it with a compile time check. v2: Also coalesce when replaying on reset (Daniele) v3: Consistent wq_resv - per-request (Daniele) v4: Squash removing wq_resv v5: Reflect i915_guc_submit argument changes in doc v6: Rebase on top of execlists reset/restart fix (Chris,Michał) References: https://bugs.freedesktop.org/show_bug.cgi?id=101873 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170914083216.10192-2-michal.winiarski@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-09-18 11:18:00 +01:00
Chris Wilson	221ab9719b	drm/i915/execlists: Unwind incomplete requests on resets Given the mechanism to unwind and replay requests (designed to support preemption), we have an alternative to the current method of resubmitting the ELSP upon reset. Resubmitting ELSP turns out to be more complicated than expected, due to having to handle lost context-switch interrupts and so guessing what ELSP we need to resubmit later. Instead, by unwinding the requests and clearing the ELSP tracking entirely, we can then just dequeue the first pair of ready requests after resetting, using the normal submission procedure. Currently, the unwound requests have maximum priority and so are guaranteed to be resubmitted upon resume. If we are lucky, we may be able to coalesce a new request on top! Suggested-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-4-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-09-18 11:00:12 +01:00
Chris Wilson	27606fd878	drm/i915/execlists: Split insert_request() In the next patch we will want to reinsert a request not at the end of the priority queue, but at the front. Here we split insert_request() into two, the first function retrieves the priority list (for reuse for unsubmit later) and a wrapper function to insert at the end of that list and to schedule the tasklet if we were first. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-3-chris@chris-wilson.co.uk	2017-09-18 11:00:10 +01:00
Chris Wilson	08dd3e1acc	drm/i915/execlists: Move insert_request() Move insert_request() earlier to avoid a forward declaration in a later patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-2-chris@chris-wilson.co.uk	2017-09-18 11:00:05 +01:00
Chris Wilson	523e7c9278	drm/i915/execlists: Kick start request processing after a reset During a reset, we may skip over completed requests and lost context-switch interrupts. Following the reset, we may then may end up with no active requests in the ELSP (and so do not resubmit to restart the engine), but have a queue of requests ready for execution. This is unlikely, it requires the last request to complete after the hang is detected, but not impossible. The outcome of this is that the engine stalls, possibly leading to full ring and indefinite wait under struct_mutex, eventually leading to a full driver hang. Alternatively, we can solve this by unsubmitting the incomplete requests and just kickstarting the tasklet. Michał has patches for that, which I initially disliked due to the extra complexity, but the complexity of this "simple" restart is growing... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-09-18 10:59:59 +01:00
Chris Wilson	27a5f61b37	drm/i915: Cancel all ready but queued requests when wedging When wedging the hw, we want to mark all in-flight requests as -EIO. This is made slightly more complex by execlists who store the ready but not yet submitted-to-hw requests on a private queue (an rbtree priolist). Call into execlists to cancel not only the ELSP tracking for the submitted requests, but also the queue of unsubmitted requests. v2: Move the majority of engine_set_wedged to the backends (both legacy ringbuffer and execlists handling their own lists). Reported-by: Michał Winiarski <michal.winiarski@intel.com> Testcase: igt/gem_eio/in-flight-contexts Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-09-18 10:59:55 +01:00
Chris Wilson	767a983ab2	drm/i915/execlists: Read the context-status HEAD from the HWSP The engine also provides a mirror of the CSB write pointer in the HWSP, but not of our read pointer. To take advantage of this we need to remember where we read up to on the last interrupt and continue off from there. This poses a problem following a reset, as we don't know where the hw will start writing from, and due to the use of power contexts we cannot perform that query during the reset itself. So we continue the current modus operandi of delaying the first read of the context-status read/write pointers until after the first interrupt. With this we should now have eliminated all uncached mmio reads in handling the context-status interrupt, though we still have the uncached mmio writes for submitting new work, and many uncached mmio reads in the global interrupt handler itself. Still a step in the right direction towards reducing our resubmit latency, although it appears lost in the noise! v2: Cannonlake moved the CSB write index v3: Include the sw/hwsp state in debugfs/i915_engine_info v4: Also revert to using CSB mmio for GVT-g v5: Prevent the compiler reloading tail (Mika) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-09-13 17:28:46 +01:00
Chris Wilson	6d2cb5aa38	drm/i915/execlists: Read the context-status buffer from the HWSP The engine provides a mirror of the CSB in the HWSP. If we use the cacheable reads from the HWSP, we can shave off a few mmio reads per context-switch interrupt (which are quite frequent!). Just removing a couple of mmio is not enough to actually reduce any latency, but a small reduction in overall cpu usage. Much appreciation for Ben dropping the bombshell that the CSB was in the HWSP and for Michel in digging out the details. v2: Don't be lazy, add the defines for the indices. v3: Include the HWSP in debugfs/i915_engine_info v4: Check for GVT-g, it currently depends on intercepting CSB mmio v5: Fixup GVT-g mmio path v6: Disable HWSP if VT-d is active as the iommu adds unpredictable memory latency. (Mika) v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio read and we want to stop the compiler from issuing a later (v.slow) reload. Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-09-13 17:24:32 +01:00
Daniele Ceraolo Spurio	486e93f72a	drm/i915/lrc: allocate separate page for HWSP On gen8+ we're currently using the PPHWSP of the kernel ctx as the global HWSP. However, when the kernel ctx gets submitted (e.g. from __intel_autoenable_gt_powersave) the HW will use that page as both HWSP and PPHWSP. This causes a conflict in the register arena of the HWSP, i.e. dword indices below 0x30. We don't current utilize this arena, but in the following patches we will take advantage of the cached register state for handling execlist's context status interrupt. To avoid the conflict, instead of re-using the PPHWSP of the kernel ctx we can allocate a separate page for the HWSP like what happens for pre-execlists platform. v2: Add a use-case for the register arena of the HWSP. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-3-chris@chris-wilson.co.uk	2017-09-13 15:02:39 +01:00
Michel Thierry	0b29c75a01	drm/i915/lrc: Clarify the format of the context image Not only the context image consist of two parts (the PPHWSP, and the logical context state), but we also allocate a header at the start of for sharing data with GuC. Thus every lrc looks like this: \| [guc] \| [hwsp] [logical state] \| \|<- our header ->\|<- context image ->\| So far, we have oversimplified whenever we use each of these parts of the context, just because the GuC header happens to be in page 0, and the (PP)HWSP is in page 1. But this had led to using the same define for more than one meaning (as a page index in the lrc and as 1 page). This patch adds defines for the GuC shared page, the PPHWSP page and the start of the logical state. It also updated the places where the old define was being used. Since we are not changing the size (or format) of the context, there are no functional changes. v2: Use PPHWSP index for hws again. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: intel-gvt-dev@lists.freedesktop.org Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-1-chris@chris-wilson.co.uk	2017-09-13 15:02:15 +01:00
Chris Wilson	2013ddebd2	drm/i915: Move the context descriptor to an inline helper The context descriptor is stored inside the per-engine context state, as we only need to compute it once and access it frequently. However, currently only intel_lrc.c has easy access, but i915_guc_submission.c would like to frequently read it as well, and more so only ever needs the lower 32bits. Make it an inline as the compiler should be able to retrieve the value in less instructions than it takes to do the function call: add/remove: 0/1 grow/shrink: 1/0 up/down: 8/-45 (-37) function old new delta i915_guc_submit 621 629 +8 intel_lr_context_descriptor 45 - -45 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170912214905.21987-1-chris@chris-wilson.co.uk Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>	2017-09-13 10:31:48 +01:00
Rodrigo Vivi	90007bca61	drm/i915/cnl: Introduce initial Cannonlake Workarounds. Let's inherit workarounds from previous platforms that according to wa_database and BSpec are still valid for Cannonlake. v2: Add missed workarounds. v3: Rebase v4: Remove bad chunk that was added to rc6 disable. (Ander) Also remove A0 W/a that are not needed anymore. v5: Rebase on top of CFL. v6: Remove empty gen9_init_perctx_bb and gen9_init_indirectctx_bb since they don't carry any gen10 related W/a. (by Oscar). Also Remove A0 exclusive workaround. v7: Remove more A0 exclusive workarounds. As pointed out by Oscar many workarounds were changed to be A0 only so let's remove them. Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170815231651.975-1-rodrigo.vivi@intel.com	2017-08-18 15:32:17 -07:00
Chris Wilson	64f09f00ca	drm/i915: Clear lost context-switch interrupts across reset During a global reset, we disable the irq. As we disable the irq, the hardware may be raising a GT interrupt that we then ignore, leaving it pending in the GTIIR. After the reset, we then re-enable the irq, triggering the pending interrupt. However, that interrupt was for the stale state from before the reset, and the contents of the CSB buffer are now invalid. v2: Add a comment to make it clear that the double clear is purely my paranoia. Reported-by: "Dong, Chuanxiao" <chuanxiao.dong@intel.com> Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Dong, Chuanxiao" <chuanxiao.dong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170807121919.30165-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20170818090509.5363-1-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2017-08-18 22:42:13 +01:00
Chris Wilson	cdb6ded42f	drm/i915: Flush the execlist ports if idle When doing a GPU reset, the CSB register will be trashed and we will lose any context-switch notifications that happened since the tasklet was disabled. If we find that all requests on this engine were completed, we want to make sure that the ELSP tracker is similarly empty so that we do not feed back in the completed requests upon recovering from the reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-4-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:38:45 +02:00
Chris Wilson	829a0af29f	drm/i915: Group all the global context information together Create a substruct to hold all the global context state under drm_i915_private. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk	2017-06-20 17:13:40 +01:00
Robert Bragg	19f81df285	drm/i915/perf: Add OA unit support for Gen 8+ Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all share (more-or-less) the same OA unit design. Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu. The periodic sampling frequency which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to have become per-context save and restored (while the OABUFFER destination is still a shared, system-wide resource). This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled. The driver intentionally avoids depending on command streamer programming to update OA state considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state. In addition to the risk of transiently-out-of-date state being loaded automatically; there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit when once the MUX configuration is complete). Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read() we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require the dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root. v5: Drain submitted requests when enabling metric set to ensure no lite-restore erases the context image we just updated (Lionel) v6: In addition to drain, switch to kernel context & update all context in place (Chris) v7: Add missing mutex_unlock() if switching to kernel context fails (Matthew) v8: Simplify OA period/flex-eu-counters programming by using the batchbuffer instead of modifying ctx-image (Lionel) v9: Back to updating the context image (due to erroneous testing, batchbuffer programming the OA unit doesn't actually work) (Lionel) Pin context before updating context image (Chris) Drop MMIO programming now that we switch to a kernel context with right values in initial context image (Chris) v10: Just pin_map the contexts we want to modify or let the configuration happen on first use (Chris) v11: Update kernel context OA config through the batchbuffer rather than on the fly ctx-image update (Lionel) v12: Rework OA context registers update again by swithing away from user contexts and reconfiguring the kernel context through the batchbuffer and updating all the other contexts' context image. Also take care to lock slice/subslice configuration when OA is on. (Lionel) v13: Request rpcs updates on all engine when updating the OA config (Lionel) v14: Drop any kind of rpcs management now that we monitor sseu configuration changes in a later patch (Lionel) Remove usleep after programming the NOA configs on Gen8+, this doesn't seem to be needed (Lionel) v15: Respect coding style for block comments (Chris) v16: Add missing i915_add_request() in case we fail to emit OA configuration (Matthew) Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2017-06-14 12:31:57 -07:00
Michel Thierry	7bd0a2c6e1	drm/i915/gen10: Set value of Indirect Context Offset for gen10 Indirect Context Offset Pointer has changed for Cannonlake. INDIRECT_CTX_OFFSET[15:6] valid value for CNL is 19h per Spec. v2: rebased to intel_lr_indirect_ctx_offset v3: Commit message added per Tvrtko request. Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1496781040-20888-9-git-send-email-rodrigo.vivi@intel.com	2017-06-07 07:29:58 -07:00
Chris Wilson	349bdb68bd	drm/i915/execlists: Reduce lock contention between schedule/submit_request If we do not require to perform priority bumping, and we haven't yet submitted the request, we can update its priority in situ and skip acquiring the engine locks -- thus avoiding any contention between us and submit/execute. v2: Remove the stack element from the list if we can do the early assignment. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-10-chris@chris-wilson.co.uk	2017-05-17 13:38:13 +01:00
Chris Wilson	c5cf9a9147	drm/i915: Create a kmem_cache to allocate struct i915_priolist from The i915_priolist are allocated within an atomic context on a path where we wish to minimise latency. If we use a dedicated kmem_cache, we have the advantage of a local freelist from which to service new requests that should keep the latency impact of an allocation small. Though currently we expect the majority of requests to be at default priority (and so hit the preallocate priolist), once userspace starts using priorities they are likely to use many fine grained policies improving the utilisation of a private slab. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk	2017-05-17 13:38:12 +01:00
Chris Wilson	6c067579e6	drm/i915: Split execlist priority queue into rbtree + linked list All the requests at the same priority are executed in FIFO order. They do not need to be stored in the rbtree themselves, as they are a simple list within a level. If we move the requests at one priority into a list, we can then reduce the rbtree to the set of priorities. This should keep the height of the rbtree small, as the number of active priorities can not exceed the number of active requests and should be typically only a few. Currently, we have ~2k possible different priority levels, that may increase to allow even more fine grained selection. Allocating those in advance seems a waste (and may be impossible), so we opt for allocating upon first use, and freeing after its requests are depleted. To avoid the possibility of an allocation failure causing us to lose a request, we preallocate the default priority (0) and bump any request to that priority if we fail to allocate it the appropriate plist. Having a request (that is ready to run, so not leading to corruption) execute out-of-order is better than leaking the request (and its dependency tree) entirely. There should be a benefit to reducing execlists_dequeue() to principally using a simple list (and reducing the frequency of both rbtree iteration and balancing on erase) but for typical workloads, request coalescing should be small enough that we don't notice any change. The main gain is from improving PI calls to schedule, and the explicit list within a level should make request unwinding simpler (we just need to insert at the head of the list rather than the tail and not have to make the rbtree search more complicated). v2: Avoid use-after-free when deleting a depleted priolist v3: Michał found the solution to handling the allocation failure gracefully. If we disable all priority scheduling following the allocation failure, those requests will be executed in fifo and we will ensure that this request and its dependencies are in strict fifo (even when it doesn't realise it is only a single list). Normal scheduling is restored once we know the device is idle, until the next failure! Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk	2017-05-17 13:38:09 +01:00
Chris Wilson	77f0d0e925	drm/i915/execlists: Pack the count into the low bits of the port.request add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187) function old new delta execlists_submit_ports 262 471 +209 port_assign.isra - 136 +136 capture 6344 6359 +15 reset_common_ring 438 452 +14 execlists_submit_request 228 238 +10 gen8_init_common_ring 334 341 +7 intel_engine_is_idle 106 105 -1 i915_engine_info 2314 2290 -24 __i915_gem_set_wedged_BKL 485 411 -74 intel_lrc_irq_handler 1789 1604 -185 execlists_update_context 294 - -294 The most important change there is the improve to the intel_lrc_irq_handler and excclist_submit_ports (net improvement since execlists_update_context is now inlined). v2: Use the port_api() for guc as well (even though currently we do not pack any counters in there, yet) and hide all port->request_count inside the helpers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk	2017-05-17 13:38:06 +01:00
Chuanxiao Dong	0d402a24df	drm/i915: set initialised only when init_context callback is NULL During execlist_context_deferred_alloc() we presumed that the context is uninitialised (we only just allocated the state object for it!) and chose to optimise away the later call to engine->init_context() if engine->init_context were NULL. This breaks with GVT's contexts that are marked as pre-initialised to avoid us annoyingly calling engine->init_context(). The fix is to not override ce->initialised if it is already true. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1494497262-24855-1-git-send-email-chuanxiao.dong@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-05-12 11:11:11 +01:00
Chris Wilson	266a240bf0	drm/i915: Use engine->context_pin() to report the intel_ring Since unifying ringbuffer/execlist submission to use engine->pin_context, we ensure that the intel_ring is available before we start constructing the request. We can therefore move the assignment of the request->ring to the central i915_gem_request_alloc() and not require it in every engine->request_alloc() callback. Another small step towards simplification (of the core, but at a cost of handling error pointers in less important callers of engine->pin_context). v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code generation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk	2017-05-04 11:54:43 +01:00
Joonas Lahtinen	63ffbcdadc	drm/i915: Sanitize engine context sizes Pre-calculate engine context size based on engine class and device generation and store it in the engine instance. v2: - Squash and get rid of hw_context_size (Chris) v3: - Move after MMIO init for probing on Gen7 and 8 (Chris) - Retained rounding (Tvrtko) v4: - Rebase for deferred legacy context allocation Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: intel-gvt-dev@lists.freedesktop.org Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-04-28 12:11:59 +03:00
Chris Wilson	e6ba9992de	drm/i915: Differentiate between sw write location into ring and last hw read We need to keep track of the last location we ask the hw to read up to (RING_TAIL) separately from our last write location into the ring, so that in the event of a GPU reset we do not tell the HW to proceed into a partially written request (which can happen if that request is waiting for an external signal before being executed). v2: Refactor intel_ring_reset() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Testcase: igt/gem_exec_fence/await-hang Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Fixes: `d55ac5bf97` ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-04-25 15:33:22 +01:00
Chris Wilson	6b764a594f	drm/i915: Report request restarts for both execlists/guc As we now share the execlist_port[] tracking for both execlists/guc, we can reset the inflight count on both and report which requests are being restarted. Suggested-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425103835.31871-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-04-25 14:02:36 +01:00
Chris Wilson	48921260b7	drm/i915/execlists: Document runtime pm for intel_lrc_irq_handler() We indirectly hold the runtime-pm for the intel_lrc_irq_handler() by virtue of dev_priv->gt.awake keeping a wakeref whilst the requests are busy. As this is not obvious from the code, add a comment. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170411175850.2470-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-04-12 09:44:08 +01:00
Daniele Ceraolo Spurio	ddfb570c20	drm/i915: Use the engine class to get the context size Technically speaking, the context size is per engine class, not per instance. v2: Add MISSING_CASE (Tvrtko) v3: Rebased v4: Restore the interface back to hiding the class lookup (Chris) Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1491905472-16189-1-git-send-email-oscar.mateo@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-04-11 19:44:04 +01:00
Chris Wilson	d822bb18ce	drm/i915: intel_ring.engine is unused Or rather it is used only by intel_ring_pin() to extract the drm_i915_private which we can easily pass in. As this is a relatively rare operation, save the space in the struct, and as such it is even break even in the extra code for passing around the parameter: add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0) function old new delta intel_init_ring_buffer 906 918 +12 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261794, chg +0.00% v2: Reorder intel_init_ring_buffer to keep the ring setup together: add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6) function old new delta intel_init_ring_buffer 906 912 +6 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261788, chg -0.00% Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk	2017-04-03 13:52:58 +01:00
Chris Wilson	18afa28892	Revert "drm/i915: Skip execlists_dequeue() early if the list is empty" This reverts commit `6c943de668` ("drm/i915: Skip execlists_dequeue() early if the list is empty"). The validity of using READ_ONCE there depends upon having a mb to coordinate the assignment of engine->execlist_first inside submit_request() and checking prior to taking the spinlock in execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a memory barrier making this cross-cpu checking safe", but failed to notice that this mb was conditional on the execlists being ready, i.e. there wasn't the required mb when it was most necessary! We could install an unconditional memory barrier to fixup the READ_ONCE(): diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 7dd732cb9f57..1ed164b16d44 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -616,6 +616,7 @@ static void execlists_submit_request(struct drm_i915_gem_request *request) if (insert_request(&request->priotree, &engine->execlist_queue)) { engine->execlist_first = &request->priotree.node; + smp_wmb(); if (execlists_elsp_ready(engine)) But we have opted to remove the race as it should be rarely effective, and saves us having to explain the necessary memory barriers which we quite clearly failed at. Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Fixes: `6c943de668` ("drm/i915: Skip execlists_dequeue() early if the list is empty") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>	2017-03-29 13:02:24 +01:00
Chris Wilson	a79a524e92	drm/i915: Avoid lock dropping between rescheduling Unlocking is dangerous. In this case we combine an early update to the out-of-queue request, because we know that it will be inserted into the correct FIFO priority-ordered slot when it becomes ready in the future. However, given sufficient enthusiasm, it may become ready as we are continuing to reschedule, and so may gazump the FIFO if we have since dropped its spinlock. The result is that it may be executed too early, before its dependencies. v2: Move all work into the second phase over the topological sort. This removes the shortcut on the out-of-rbtree request to ensure that we only adjust its priority after adjusting all of its dependencies. Fixes: `20311bd350` ("drm/i915/scheduler: Execute requests in order of priorities") Testcase: igt/gem_exec_whisper Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Link: http://patchwork.freedesktop.org/patch/msgid/20170327202143.7972-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-03-29 11:47:45 +01:00
Chris Wilson	ed1501d451	drm/i915: Refactor tests for validity of RING_TAIL Whilst I like having the assertions clearly visible in the code, they are quite repetitious! As we find new limits we want to incorporate into the set of assertions, it make sense to refactor them to a common routine. v2: Add a guc holdout. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-27 15:03:53 +01:00
Chris Wilson	a91fdf1293	drm/i915: Assert that the request->tail fits within the ring In addition to being qword-aligned, the RING_TAIL offset must be within the ring! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-27 15:03:21 +01:00
Chris Wilson	450362d3fe	drm/i915/execlists: Wrap tail pointer after reset tweaking If the request->wa_tail is 0 (because it landed exactly on the end of the ringbuffer), when we reconstruct request->tail following a reset we fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is never able to catch up with RING_TAIL and the GPU spins endlessly. If the ring contains a couple of breadcrumbs, even our hangcheck is unable to catch the busy-looping as the ACTHD and seqno continually advance. v2: Move the wrap into a common intel_ring_wrap(). Fixes: `a3aabe86a3` ("drm/i915/execlists: Reinitialise context image after GPU hang") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-27 15:02:56 +01:00
Chris Wilson	4af0d727b1	drm/i915/execlists: Trim irq handler I noticed that gcc was spilling the CSB to the stack, so rearrange the code to be more compact. Spilling in this function is slightly more interesting due to the mmio reads acting as memory barriers and so end up flushing the stack spills. Still miniscule to having to do at least the pair of uncached reads :( function old new delta intel_lrc_irq_handler 1039 878 -161 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170325201053.21306-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-03-27 12:48:44 +01:00
Chris Wilson	2e70b8c6bf	drm/i915/execlists: Relax the locked clear_bit(IRQ_EXECLIST) We only need to care about the ordering of the clearing of the bit with the uncached CSB read in order to correctly detect a new interrupt before the read completes. The uncached read itself acts as a full memory barrier, so we do not need to enforce another in the form of a locked clear_bit. v2: Clarify why the split and unlocked test/clear is harmless. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170323134803.10418-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-03-23 22:13:51 +00:00
Tvrtko Ursulin	9f7886d07f	drm/i915: Spinlocks in tasklets can use spin_(un)lock_irq The tasklets callbacks are only called from tasklet context so it is safe do to this. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170321105511.18269-1-tvrtko.ursulin@linux.intel.com	2017-03-21 15:04:55 +00:00
Chris Wilson	fe085f13c7	drm/i915: Remove intel_ring.last_retired_head Storing the position of the breadcrumb of the last retired request as a separate last_retired_head is superfluous as we always copy that into head prior to recalculation of the intel_ring.space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk	2017-03-21 14:21:50 +00:00
Chris Wilson	899f6204c0	drm/i915/execlists: Split the atomic test_and_clear_bit for irq handler Rather than impose the cost of a locked test before queuing a new request, reduce it to a simple test_bit() with a following clear_bit() prior to doing the CSB check. This ensure that if an interrupt does occur whilst reading from the CSB, we still detect it (the interrupt would trigger a rescheduling of the tasklet anyway). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170321113320.2603-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-03-21 14:14:55 +00:00
Chris Wilson	c9203e8277	drm/i915: Reset tasklet back to execlists after disabling guc When switching back to execlists, we also now need to restore the tasklet handler. Reported-by: Oscar Mateo <oscar.mateo@intel.com> Fixes: `31de73501a` ("drm/i915/scheduler: emulate a scheduler for guc") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170318102859.24101-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-03-20 10:13:14 +00:00
Chris Wilson	6c943de668	drm/i915: Skip execlists_dequeue() early if the list is empty Do an early read of the execlists' queue before we take the spinlock and start checking. This is safe as the first writer to the execlists queue will cause the tasklet to be run again after a memory barrier. v2: Keep guc in sync with execlists queue changes v3: Explain the mb between the tasklet running on one cpu and the execlist_first update and schedule from a second cpu. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170317120716.17191-1-chris@chris-wilson.co.uk	2017-03-17 15:53:26 +00:00
Chris Wilson	a533b4ba77	drm/i915: Assert that the context pin_counts do not overflow This should be impossible, but let's assert that we do not pin a context 4 billion times before retiring! v2: Fix the assertion -- the patch had just one job to do! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171628.3228-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-16 20:48:58 +00:00
Chris Wilson	ff44ad51eb	drm/i915: Move engine->submit_request selection to a vfunc It turns out that we may want to restore the original engine->submit_request (and engine->schedule) callbacks from more than just the guc <-> execlists transition. Move this to a vfunc so we can have a common interface. v2: Move initial selection to intel_engines_init_common(), repaint vfunc with engine->set_default_submission (and a similar colour for the helper). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-2-chris@chris-wilson.co.uk	2017-03-16 17:17:12 +00:00
Changbin Du	3fc03069bc	drm/i915: make context status notifier head be per engine GVTg has introduced the context status notifier to schedule the GVTg workload. At that time, the notifier is bound to GVTg context only, so GVTg is not aware of host workloads. Now we are going to improve GVTg's guest workload scheduler policy, and add Guc emulation support for new Gen graphics. Both these two features require acknowledgment for all contexts running on hardware. (But will not alter host workload.) So here try to make some change. The change is simple: 1. Move the context status notifier head from i915_gem_context to intel_engine_cs. Which means there is a notifier head per engine instead of per context. Execlist driver still call notifier for each context sched-in/out events of current engine. 2. At GVTg side, it binds a notifier_block for each physical engine at GVTg initialization period. Then GVTg can hear all context status events. In this patch, GVTg do nothing for host context event, but later will add a function there. But in any case, the notifier callback is a noop if this is no active vGPU. Since intel_gvt_init() is called at early initialization stage and require the status notifier head has been initiated, I initiate it in intel_engine_setup(). v2: remove a redundant newline. (chris) Fixes: `3c7ba6359d` ("drm/i915: Introduce execlist context status change notification") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100232 Signed-off-by: Changbin Du <changbin.du@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170313024711.28591-1-changbin.du@intel.com Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-03-16 16:24:35 +00:00
Chris Wilson	31de73501a	drm/i915/scheduler: emulate a scheduler for guc This emulates execlists on top of the GuC in order to defer submission of requests to the hardware. This deferral allows time for high priority requests to gazump their way to the head of the queue, however it nerfs the GuC by converting it back into a simple execlist (where the CPU has to wake up after every request to feed new commands into the GuC). v2: Drop hack status - though iirc there is still a lockdep inversion between fence and engine->timeline->lock (which is impossible as the nesting only occurs on different fences - hopefully just requires some judicious lockdep annotation) v3: Apply lockdep nesting to enabling signaling on the request, using the pattern we already have in __i915_gem_request_submit(); v4: Replaying requests after a hang also now needs the timeline spinlock, to disable the interrupts at least v5: Hold wq lock for completeness, and emit a tracepoint for enabling signal v6: Reorder interrupt checking for a happier gcc. v7: Only signal the tasklet after a user-interrupt if using guc scheduling v8: Restore lost update of rq through the i915_guc_irq_handler (Tvrtko) v9: Avoid re-initialising the engine->irq_tasklet from inside a reset v10: Hook up the execlists-style tracepoints v11: Clear the execlists irq_posted bit after taking over the interrupt/tasklet Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316125619.6856-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-03-16 14:45:07 +00:00
Mika Kuoppala	e71677698b	drm/i915: Avoid using word legacy with ppgtt The term legacy is subjective. Use 3lvl and 4lvl where appropriate. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-4-git-send-email-mika.kuoppala@intel.com	2017-03-03 16:46:23 +02:00
Mika Kuoppala	54af56dbf8	drm/i915: Don't mark pdps clear if pdps are not submitted Don't mark pdps clear if never do the necessary actions with the hardware to make them clear. v2: totally get rid of confusing ppgtt bool (Chris) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-2-git-send-email-mika.kuoppala@intel.com	2017-03-03 16:45:11 +02:00
Chris Wilson	0542524944	drm/i915: Generalise wait for execlists to be idle The code to check for execlists completion is generic, so move it to intel_engine_cs.c, where we can reuse the new intel_engine_is_idle(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-03 13:08:15 +00:00
Kelvin Gardiner	69060d9644	drm/i915/bdw: Do not write the replay bit of the ring mode register The replay bit of the ring mode register is not a valid bit for Gen8+. Do not write to this bit. Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> [Joonas: Fixed commit message line to be under 72 chars] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1487963724-4824-1-git-send-email-kelvin.gardiner@intel.com	2017-02-27 14:02:50 +02:00
Chris Wilson	fe9ae7a3bf	drm/i915/execlists: Detect an out-of-order context switch We require that the request is completed before the context is switched away. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170223145031.26210-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-24 12:57:32 +00:00
Tvrtko Ursulin	d7d96833f2	drm/i915/tracepoints: Add backend level request in and out tracepoints Two new tracepoints placed at the call sites where requests are actually passed to the GPU enable userspace to track engine utilisation. These tracepoints are only enabled when the DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled. v2: Fix compilation with !CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS. v3: Name global seqno consistently across tracepoints. v4: Remove port info from request out tracepoint. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-21 13:18:25 +00:00
Tvrtko Ursulin	56e51bf036	drm/i915: Tidy execlists_init_reg_state Compact the name of the macro and reg_state variable, and cache some data in local variables to make the function more compact and more readable. v2: Fixup some checkpatch warnings. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170221095839.30525-1-tvrtko.ursulin@linux.intel.com	2017-02-21 13:15:41 +00:00
Chris Wilson	944a36d472	drm/i915: Assert that the request->tail is always qword aligned The hardware requires that the tail pointer only advance in qword units, so assert that the value we write is aligned to qwords, and similarly enforce this restriction onto the request->tail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170217163833.731-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-02-20 14:32:25 +00:00
Tvrtko Ursulin	9f235dfa49	drm/i915: Consolidate gen8_emit_pipe_control We have a few open coded instances in the execlists code and an almost suitable helper in intel_ringbuf.c We can consolidate to a single helper if we change the existing helper to emit directly to ring buffer memory and move the space reservation outside it. v2: Drop memcpy for memset. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170216122325.31391-2-tvrtko.ursulin@linux.intel.com	2017-02-17 11:39:59 +00:00
Tvrtko Ursulin	097d4f1c12	drm/i915: Tidy workaround batch buffer emission Use the "*batch++ = " style as in the ring emission for better readability and also simplify the logic a bit by consolidating the offset and size calculations and overflow checking. The latter is a programming error so it is not required to check for it after each write to the object, but rather do it once the whole state has been written and fail the driver if something went wrong. v2: Rebase. v3: Keep track of offsets and sizes in bytes for simplicity and rename function pointer variable to _fn suffix. (Chris Wilson) v4: Fix size calc broken in v3 and add alignment warning. (Chris Wilson) v5: Fix return code. v6: I added an exit from loop in v5 but forgot to put back the object teardown. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5) Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-17 11:39:59 +00:00
Tvrtko Ursulin	4ac9659ef9	drm/i915: Remove duplicate intel_logical_ring_workarounds_emit intel_ring_workarounds_emit is exactly the same code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170214150017.16058-1-tvrtko.ursulin@linux.intel.com	2017-02-15 08:16:41 +00:00
Tvrtko Ursulin	73dec95e6b	drm/i915: Emit to ringbuffer directly This removes the usage of intel_ring_emit in favour of directly writing to the ring buffer. intel_ring_emit was preventing the compiler for optimising fetch and increment of the current ring buffer pointer and therefore generating very verbose code for every write. It had no useful purpose since all ringbuffer operations are started and ended with intel_ring_begin and intel_ring_advance respectively, with no bail out in the middle possible, so it is fine to increment the tail in intel_ring_begin and let the code manage the pointer itself. Useless instruction removal amounts to approximately two and half kilobytes of saved text on my build. Not sure if this has any measurable performance implications but executing a ton of useless instructions on fast paths cannot be good. v2: * Change return from intel_ring_begin to error pointer by popular demand. * Move tail increment to intel_ring_advance to enable some error checking. v3: * Move tail advance back into intel_ring_begin. * Rebase and tidy. v4: * Complete rebase after a few months since v3. v5: * Remove unecessary cast and fix !debug compile. (Chris Wilson) v6: * Make intel_ring_offset take request as well. * Fix recording of request postfix plus a sprinkle of asserts. (Chris Wilson) v7: * Use intel_ring_offset to get the postfix. (Chris Wilson) * Convert GVT code as well. v8: * Rename out++ to cs++. v9: * Fix GVT out to cs conversion in GVT. v10: * Rebase for new intel_ring_begin in selftests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com	2017-02-14 14:30:46 +00:00
Chris Wilson	ae9a043b0c	drm/i915: Rename conditional GEM execution macros After a brief discussion, we settled on a naming convention for the conditional GEM debugging data that should be clearer to the casual user: GEM_DEBUG Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-02-10 21:43:43 +00:00
Chris Wilson	72b72ae473	drm/i915: Always pin contexts into the high GGTT Now that we have fast top-down insertion into the drm_mm, we can use it for frequent runtime operations like insertion of the context object, whereas before we limited it to the one-off insertion of the pinned kernel context. Keeping the active context objects out of the mappable region of the global GTT (except under memory pressure) improves our ability to allocate mappable aperture region without triggering a GPU stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170210101422.1598-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-10 13:58:40 +00:00
Chris Wilson	949e8ab3a9	drm/i915: Use the size/type of address space to make decisions Once the address space has been created (using 3 or 4 levels of page tables), we should use that to program the appropriate type into the contexts. This gives us the flexibility to handle different types of address spaces at runtime. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170209144036.23664-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-09 17:09:27 +00:00

1 2 3 4 5 ...

683 Commits