linux

Author	SHA1	Message	Date
Mika Kuoppala	48d7fb181a	drm/i915: Remove lite restore defines We have switched from tail manipulation to forced context restore to implement WaIdleLiteRestore. Remove the old defines and comments. Note: we still do emit the WA tail, and use it as our first attempt to avoid forcing a full-restore instead of a lite-restore, we just have a much stronger backup mechanism for repeated preemptions. References: `f26a9e959a` ("drm/i915/gt: Detect if we miss WaIdleLiteRestore") Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200203163312.15475-1-mika.kuoppala@linux.intel.com	2020-02-08 11:36:55 +00:00
Rodrigo Vivi	c0f00d270e	Merge drm/drm-next into drm-intel-next-queued Moving the base forward since this one was so old. New base contains fixes that we needed. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-02-07 17:47:43 -08:00
Chris Wilson	5ba32c7be8	drm/i915/execlists: Always force a context reload when rewinding RING_TAIL If we rewind the RING_TAIL on a context, due to a preemption event, we must force the context restore for the RING_TAIL update to be properly handled. Rather than note which preemption events may cause us to rewind the tail, compare the new request's tail with the previously submitted RING_TAIL, as it turns out that timeslicing was causing unexpected rewinds. <idle>-0 0d.s2 1280851190us : __execlists_submission_tasklet: 0000:00:02.0 rcs0: expired last=130:4698, prio=3, hint=3 <idle>-0 0d.s2 1280851192us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 66:119966, current 119964 <idle>-0 0d.s2 1280851195us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4698, current 4695 <idle>-0 0d.s2 1280851198us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4696, current 4695 ^---- Note we unwind 2 requests from the same context <idle>-0 0d.s2 1280851208us : __i915_request_submit: 0000:00:02.0 rcs0: fence 130:4696, current 4695 <idle>-0 0d.s2 1280851213us : __i915_request_submit: 0000:00:02.0 rcs0: fence 134:1508, current 1506 ^---- But to apply the new timeslice, we have to replay the first request before the new client can start -- the unexpected RING_TAIL rewind <idle>-0 0d.s2 1280851219us : trace_ports: 0000:00:02.0 rcs0: submit { 130:4696, 134:1508 } synmark2-5425 2..s. 1280851239us : process_csb: 0000:00:02.0 rcs0: cs-irq head=5, tail=0 synmark2-5425 2..s. 1280851240us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x00008002:0x00000000 ^---- Preemption event for the ELSP update; note the lite-restore synmark2-5425 2..s. 1280851243us : trace_ports: 0000:00:02.0 rcs0: preempted { 130:4698, 66:119966 } synmark2-5425 2..s. 1280851246us : trace_ports: 0000:00:02.0 rcs0: promote { 130:4696, 134:1508 } synmark2-5425 2.... 1280851462us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4700, current 4695 synmark2-5425 2.... 
1280852111us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4702, current 4695 synmark2-5425 2.Ns1 1280852296us : process_csb: 0000:00:02.0 rcs0: cs-irq head=0, tail=2 synmark2-5425 2.Ns1 1280852297us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x00000814:0x00000000 synmark2-5425 2.Ns1 1280852299us : trace_ports: 0000:00:02.0 rcs0: completed { 130:4696!, 134:1508 } synmark2-5425 2.Ns1 1280852301us : process_csb: 0000:00:02.0 rcs0: csb[2]: status=0x00000818:0x00000040 synmark2-5425 2.Ns1 1280852302us : trace_ports: 0000:00:02.0 rcs0: completed { 134:1508, 0:0 } synmark2-5425 2.Ns1 1280852313us : process_csb: process_csb:2336 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists)) Fixes: `8ee36e048c` ("drm/i915/execlists: Minimalistic timeslicing") References: `82c69bf586` ("drm/i915/gt: Detect if we miss WaIdleLiteRestore") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Link: https://patchwork.freedesktop.org/patch/msgid/20200207211452.2860634-1-chris@chris-wilson.co.uk	2020-02-07 21:41:46 +00:00
Chris Wilson	793c226173	drm/i915/gt: Protect execlists_hold/unhold from new waiters As we may add new waiters to a request as it is being run, we need to mark the list iteration as being safe for concurrent addition. v2: Mika spotted that we used the same trick for signalers_list, so warn the compiler about the lockless walk there as well. Fixes: `32ff621fd7` ("drm/i915/gt: Allow temporary suspension of inflight requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200207110213.2734386-1-chris@chris-wilson.co.uk	2020-02-07 13:07:28 +00:00
Chris Wilson	f14f27b166	drm/i915/gt: Protect defer_request() from new waiters Mika spotted <4>[17436.705441] general protection fault: 0000 [#1] PREEMPT SMP PTI <4>[17436.705447] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.0+ #1 <4>[17436.705449] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018 <4>[17436.705512] RIP: 0010:__execlists_submission_tasklet+0xc4d/0x16e0 [i915] <4>[17436.705516] Code: c5 4c 8d 60 e0 75 17 e9 8c 07 00 00 49 8b 44 24 20 49 39 c5 4c 8d 60 e0 0f 84 7a 07 00 00 49 8b 5c 24 08 49 8b 87 80 00 00 00 <48> 39 83 d8 fe ff ff 75 d9 48 8b 83 88 fe ff ff a8 01 0f 84 b6 05 <4>[17436.705518] RSP: 0018:ffffc9000012ce80 EFLAGS: 00010083 <4>[17436.705521] RAX: ffff88822ae42000 RBX: 5a5a5a5a5a5a5a5a RCX: dead000000000122 <4>[17436.705523] RDX: ffff88822ae42588 RSI: ffff8881e32a7908 RDI: ffff8881c429fd48 <4>[17436.705525] RBP: ffffc9000012cf00 R08: ffff88822ae42588 R09: 00000000fffffffe <4>[17436.705527] R10: ffff8881c429fb80 R11: 00000000a677cf08 R12: ffff8881c42a0aa8 <4>[17436.705529] R13: ffff8881c429fd38 R14: ffff88822ae42588 R15: ffff8881c429fb80 <4>[17436.705532] FS: 0000000000000000(0000) GS:ffff88822ed00000(0000) knlGS:0000000000000000 <4>[17436.705534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[17436.705536] CR2: 00007f858c76d000 CR3: 0000000005610003 CR4: 00000000003606e0 <4>[17436.705538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[17436.705540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 <4>[17436.705542] Call Trace: <4>[17436.705545] <IRQ> <4>[17436.705603] execlists_submission_tasklet+0xc0/0x130 [i915] which is us consuming a partially initialised new waiter in defer_requests(). We can prevent this by initialising the i915_dependency prior to making it visible, and since we are using a concurrent list_add/iterator mark them up to the compiler. 
Fixes: `8ee36e048c` ("drm/i915/execlists: Minimalistic timeslicing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-2-chris@chris-wilson.co.uk	2020-02-07 10:55:49 +00:00
Daniele Ceraolo Spurio	faea179283	drm/i915: extract engine WA programming to common resume function The workarounds are a common "feature" across gens and submission mechanisms and we already call the other WA related functions from common engine ones (<setup/cleanup>_common), so it makes sense to do the same with WA application. Medium-term, this will help us reduce the duplication once the GuC resume function is added, but short-term it will also allow us to use the workaround lists for pre-gen8 engine workarounds. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200131075716.2212299-2-chris@chris-wilson.co.uk	2020-01-31 23:54:12 +00:00
Chris Wilson	e3793468b4	drm/i915: Use the async worker to avoid reclaim tainting the ggtt->mutex On Braswell and Broxton (also known as Cherryview and Apollolake), we need to serialise updates of the GGTT using the big stop_machine() hammer. This has the side effect of appearing to lockdep as a possible reclaim (since it uses the cpuhp mutex and that is tainted by per-cpu allocations). However, we want to use vm->mutex (including ggtt->mutex) from within the shrinker and so must avoid such possible taints. For this purpose, we introduced the asynchronous vma binding and we can apply it to the PIN_GLOBAL so long as we take care to add the necessary waits for the worker afterwards. Closes: https://gitlab.freedesktop.org/drm/intel/issues/211 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-3-chris@chris-wilson.co.uk	2020-01-30 21:35:43 +00:00
Chris Wilson	f1042cc853	drm/i915/execlists: Ignore discrepancies in pending[] across resets When we reset the engine, we first remove the guilty request from the active list. If it so happens that there is a pending preemption event to process before we handle the reset, when we inspect that event we find ourselves a little confused as we have bent the rules slightly to perform the reset. Just ignore any discrepancies inside reset, we know we'll start again from scratch afterwards. <0>[ 536.940213] <idle>-0 6..s1 537441383us : execlists_reset: 0000:00:02.0 vcs0: reset for CS error <0>[ 536.940213] i915_sel-7302 2d..1 537441386us : trace_ports: 0000:00:02.0 vcs0: submit { 10c59:2, 10c5a:2 } <0>[ 536.940213] <idle>-0 6d.s2 537471320us : __i915_request_unsubmit: 0000:00:02.0 vcs0: fence 10c59:2, current 1 <0>[ 536.940213] <idle>-0 6d.s2 537471321us : execlists_hold: 0000:00:02.0 vcs0: fence 10c59:2, current 1 on hold <0>[ 536.940213] <idle>-0 6.Ns1 537471328us : intel_engine_reset: 0000:00:02.0 vcs0: flags=10 <0>[ 536.940213] <idle>-0 6.Ns1 537471421us : execlists_reset_prepare: 0000:00:02.0 vcs0: depth<-1 <0>[ 536.940213] <idle>-0 6.Ns1 537471422us : intel_engine_stop_cs: 0000:00:02.0 vcs0: <0>[ 536.940213] <idle>-0 6.Ns1 537472424us : intel_engine_stop_cs: 0000:00:02.0 vcs0: timed out on STOP_RING -> IDLE <0>[ 536.940213] <idle>-0 6.Ns1 537472429us : __intel_gt_reset: 0000:00:02.0 engine_mask=4 <0>[ 536.940213] <idle>-0 6.Ns1 537472442us : execlists_reset_rewind: 0000:00:02.0 vcs0: <0>[ 536.940213] <idle>-0 6dNs2 537472443us : process_csb: 0000:00:02.0 vcs0: cs-irq head=4, tail=5 <0>[ 536.940213] <idle>-0 6dNs2 537472444us : process_csb: 0000:00:02.0 vcs0: csb[5]: status=0x00008002:0x20000060 <0>[ 536.940213] <idle>-0 6dNs2 537472464us : trace_ports: 0000:00:02.0 vcs0: preempted { 10c59:2, 0:0 } <0>[ 536.940213] <idle>-0 6dNs2 537472465us : trace_ports: 0000:00:02.0 vcs0: promote { 10c59:2*, 10c5a:2 } <0>[ 536.940213] <idle>-0 6dNs2 537472706us : 
assert_pending_valid: assert_pending_valid:1417 GEM_BUG_ON(!i915_request_is_active(rq)) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200129165935.1266132-1-chris@chris-wilson.co.uk	2020-01-30 00:50:58 +00:00
Chris Wilson	70a76a9b8e	drm/i915/gt: Hook up CS_MASTER_ERROR_INTERRUPT Now that we have offline error capture and can reset an engine from inside an atomic context while also preserving the GPU state for post-mortem analysis, it is time to handle error interrupts thrown by the command parser. This provides a much, much faster mechanism for us to detect known problems than using heartbeats/hangchecks, and also provides a mechanism for when those are disabled. However, it is limited to problems the HW can detect in the CS and so not a complete solution for detecting lockups. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200128204318.4182039-2-chris@chris-wilson.co.uk	2020-01-29 15:16:52 +00:00
Chris Wilson	8a5746982e	drm/i915/execlist: Mark up racy read of execlists->pending[0] We write to execlists->pending[0] in process_csb() to acknowledge the completion of the ELSP update, outside of the main spinlock. When we check the current status of the previous submission in __execlists_submission_tasklet() we should therefore use READ_ONCE() to reflect and document the unsynchronized read. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200128171614.3845825-1-chris@chris-wilson.co.uk	2020-01-29 13:13:50 +00:00
Umesh Nerlige Ramappa	6f280b133d	drm/i915/perf: Fix OA context id overlap with idle context id Engine context pinned in perf OA was set to same context id as the idle context. Set the context id to an unused value. Clear the sw context id field in lrc descriptor before ORing with ce->tag (Chris) Closes: https://gitlab.freedesktop.org/drm/intel/issues/756 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200124013701.40609-1-umesh.nerlige.ramappa@intel.com	2020-01-27 21:11:59 +00:00
Chris Wilson	989df3a7bd	drm/i915/execlists: Reclaim the hanging virtual request If we encounter a hang on a virtual engine, as we process the hang the request may already have been moved back to the virtual engine (we are processing the hang on the physical engine). We need to reclaim the request from the virtual engine so that the locking is consistent and local to the real engine on which we will hold the request for error state capturing. v2: Pull the reclamation into execlists_hold() and assert that cannot be called from outside of the reset (i.e. with the tasklet disabled). v3: Added selftest v4: Drop the reference owned by the virtual engine Fixes: `748317386a` ("drm/i915/execlists: Offline error capture") Testcase: igt/gem_exec_balancer/hang Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk	2020-01-22 17:10:15 +00:00
Chris Wilson	4ba5c086a1	drm/i915/execlists: Take a reference while capturing the guilty request Thanks to preempt-to-busy, we leave the request on the HW as we submit the preemption request. This means that the request may complete at any moment as we process HW events, and in particular the request may be retired as we are planning to capture it for a preemption timeout. Be more careful while obtaining the request to capture after a preemption timeout, and check to see if it completed before we were able to put it on the on-hold list. If we do see it did complete just before we capture the request, proclaim the preemption-timeout a false positive and pardon the reset as we should hit an arbitration point momentarily and so be able to process the preemption. Note that even after we move the request to be on hold it may be retired (as the reset to stop the HW comes after), so we do require to hold our own reference as we work on the request for capture (and all of the peeking at state within the request needs to be carefully protected). Fixes: `32ff621fd7` ("drm/i915/gt: Allow temporary suspension of inflight requests") Closes: https://gitlab.freedesktop.org/drm/intel/issues/997 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk	2020-01-22 17:10:15 +00:00
Dave Airlie	3d4743131b	Linux 5.5-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4k7i8eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvk0IAKRenVOdiudY77SQ VZjsteyrYTTQtPPv494ToIRjR0XQ+gYp8vyWzXTUC5Nm9Y9U3VzDqUPUjWszrSXE 6mU+tzcMc9qwuUxnIFn8zfg64ygw+37sn/w3xqeH4QmF9Z5Wl3EX3SdXTs7jp3RS VxiztkUNI5ZBV2GDtla5K/9qLPqCQnUYXIiyi5lAtBtiitZDVXFp7dy7hMgEiaEO +78K5Kh3xlt5ndDsBFOlwIb2Oof3KL7bBXntdbSBc/bjol6IRvAgln48HWCv59G2 jzAp2tj2KobX9GRAEPj+v4TQZEW0SXDNDi8MgQsM+3DYVCTmANsv57CBKRuf01+F nB1kAys= =zSnJ -----END PGP SIGNATURE----- Backmerge v5.5-rc7 into drm-next msm needs 5.5-rc4, go to the latest. Signed-off-by: Dave Airlie <airlied@redhat.com>	2020-01-20 11:42:57 +10:00
Chris Wilson	748317386a	drm/i915/execlists: Offline error capture Currently, we skip error capture upon forced preemption. We apply forced preemption when there is a higher priority request that should be running but is being blocked, and we skip inline error capture so that the preemption request is not further delayed by a user controlled capture -- extending the denial of service. However, preemption reset is also used for heartbeats and regular GPU hangs. By skipping the error capture, we remove the ability to debug GPU hangs. In order to capture the error without delaying the preemption request further, we can do an out-of-line capture by removing the guilty request from the execution queue and scheduling a worker to dump that request. When removing a request, we need to remove the entire context and all descendants from the execution queue, so that they do not jump past. Closes: https://gitlab.freedesktop.org/drm/intel/issues/738 Fixes: `3a7a92aba8` ("drm/i915/execlists: Force preemption") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk	2020-01-16 19:56:17 +00:00
Chris Wilson	32ff621fd7	drm/i915/gt: Allow temporary suspension of inflight requests In order to support out-of-line error capture, we need to remove the active request from HW and put it to one side while a worker compresses and stores all the details associated with that request. (As that compression may take an arbitrary user-controlled amount of time, we want to let the engine continue running on other workloads while the hanging request is dumped.) Not only do we need to remove the active request, but we also have to remove its context and all requests that were dependent on it (both in flight, queued and future submission). Finally once the capture is complete, we need to be able to resubmit the request and its dependents and allow them to execute. v2: Replace stack recursion with a simple list. v3: Check all the parents, not just the first, when searching for a stuck ancestor! References: https://gitlab.freedesktop.org/drm/intel/issues/738 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk	2020-01-16 19:56:16 +00:00
Chris Wilson	672c368f93	drm/i915: Keep track of request among the scheduling lists If we keep track of when the i915_request.sched.link is on the HW runlist, or in the priority queue we can simplify our interactions with the request (such as during rescheduling). This also simplifies the next patch where we introduce a new in-between list, for requests that are ready but neither on the run list or in the queue. v2: Update i915_sched_node.link explanation for current usage where it is a link on both the queue and on the runlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk	2020-01-16 19:56:15 +00:00
Chris Wilson	f3c0efc9fe	drm/i915/execlists: Leave resetting ring to intel_ring We need to allow concurrent intel_context_unpin, which means avoiding doing destructive operations like intel_ring_reset(). This was already fixed for intel_ring_unpin() in commit `0725d9a318` ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint"), but I overlooked that execlists_context_unpin() also made the same mistake. Reported-by: Matthew Brost <matthew.brost@intel.com> Fixes: `8413502238` ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") References: `0725d9a318` ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200115175829.2761329-1-chris@chris-wilson.co.uk	2020-01-16 12:39:44 +00:00
Chris Wilson	72ff2b8d5f	drm/i915/gt: Use the BIT when checking the flags, not the index In converting over to using set_bit()/test_bit(), when manually inspecting the rq->fence.flags, we need to use BIT(). Fixes: `e1c31fb5dd` ("drm/i915: Merge i915_request.flags with i915_request.fence.flags") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200115122509.2673075-1-chris@chris-wilson.co.uk	2020-01-15 14:01:58 +00:00
Linus Torvalds	63d264fe08	Intel ID: PSIRT-TA-201910-001 CVEID: CVE-2019-14615 Summary of Vulnerability ------------------------ Insufficient control flow in certain data structures for some Intel(R) Processors with Intel Processor Graphics may allow an unauthenticated user to potentially enable information disclosure via local access Products affected: ------------------ Intel CPU’s with Gen7, Gen7.5 and Gen9 Graphics. Public Disclosure Schedule: --------------------------- Intel is pursuing a coordinated disclosure of this vulnerability. The targeted public disclosure date is January 14 2020 Mitigation Summary ------------------ This patch provides mitigation for Gen9 hardware only. Patches for Gen7 and Gen7.5 will be provided later. Note that Gen8 is not impacted due to a previously implemented workaround. The mitigation involves using an existing hardware feature to forcibly clear down all EU state at each context switch. -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJeGHjXAAoJEID/Kx9323OZezwH/iLlbczb6HW7AbloQVa7KRNL cZ4VHHXmMEQPSprxFuOS21/hVW1rKZzbjTGGI0qbm4qNT3LiK92E0dcoMs1Tp9Xd eElZpkeO36pqdxc/a256N3xrpmhiMnmk33F36k4qGpt6YUxvFUyZ50re0e3pO03j wGJ1cMIbAKJQmMC23yQdD44y1TH32fGeUQvwbLgktHAS/r1DxqyaZZq1hSpOiZdV TqhFLQAXUw2Cxy3FmF7KgcedcZfii1Rq5Gz7iQeyix3CbNM9r+1UGqsjGacDcXS9 /GxhBCSKf35pOj7ZxgtLPCCdL5mSAtvQO/E+yLx3F9axG9bzzNGkLpEsWeCshp8= =3jTf -----END PGP SIGNATURE----- Merge tag 'Intel-CVE-2019-14615' from bundle by Akeem Abodunrin. Merge Intel Gen9 graphics fix from Akeem Abodunrin: "Insufficient control flow in certain data structures for some Intel Processors with Intel Processor Graphics may allow an unauthenticated user to potentially enable information disclosure via local access This provides mitigation for Gen9 hardware. Note that Gen8 is not impacted due to a previously implemented workaround. 
The mitigation involves using an existing hardware feature to forcibly clear down all EU state at each context switch" * tag 'Intel-CVE-2019-14615' of emailed bundle from Akeem G Abodunrin <akeem.g.abodunrin@intel.com>: drm/i915/gen9: Clear residual context state on context switch	2020-01-13 18:40:57 -08:00
Chris Wilson	6b7133b669	drm/i915/gt: Always reset the timeslice after a context switch Currently, we reset the timer after a preemption event. This has the side-effect that the timeslice runs into the second context after the first is completed after a normal promotion event, causing the second context to be swapped out early and switched for a third context. To be more fair, we want to reset the clock after promotion as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200113214546.1990139-1-chris@chris-wilson.co.uk	2020-01-13 22:14:54 +00:00
Akeem G Abodunrin	bc8a76a152	drm/i915/gen9: Clear residual context state on context switch Intel ID: PSIRT-TA-201910-001 CVEID: CVE-2019-14615 Intel GPU Hardware prior to Gen11 does not clear EU state during a context switch. This can result in information leakage between contexts. For Gen8 and Gen9, hardware provides a mechanism for fast cleardown of the EU state, by issuing a PIPE_CONTROL with bit 27 set. We can use this in a context batch buffer to explicitly cleardown the state on every context switch. As this workaround is already in place for gen8, we can borrow the code verbatim for Gen9. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Kumar Valsan Prathap <prathap.kumar.valsan@intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Balestrieri Francesco <francesco.balestrieri@intel.com> Cc: Bloomfield Jon <jon.bloomfield@intel.com> Cc: Dutt Sudeep <sudeep.dutt@intel.com>	2020-01-09 07:18:02 -08:00
Chris Wilson	b11b28ea0d	drm/i915/gt: Pull context activation into central intel_context_pin() While this is encroaching on midlayer territory, having already made the state allocation a previous step in pinning, we can now pull the common intel_context_active_acquire() into intel_context_pin() itself. This is a prelude to make the activation a separate step inside pinning, outside of the ce->pin_mutex Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200109085717.873326-2-chris@chris-wilson.co.uk	2020-01-09 13:48:00 +00:00
Tvrtko Ursulin	921f0c47f2	drm/i915: Revert "drm/i915/tgl: Wa_1607138340" This reverts commit `08fff7aedd`. For some yet unexplained reason not having this improves stability of some media workloads. Promise is that the media hang will be root caused properly and in the meantime absence of this workaround is unlikely to cause problems. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Francesco Balestrieri <francesco.balestrieri@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tony Ye <tony.ye@intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200108161954.29739-1-tvrtko.ursulin@linux.intel.com	2020-01-09 12:14:02 +00:00
Chris Wilson	d7cb6975f1	drm/i915/gt: Always force restore freshly pinned contexts It is highly unlikely, but still conceivable, that we submit a context with the same GGTT address as last active on the HW. In this case, with a matching LRCA, the HW would not restore the new context image causing a potential violation of our context isolation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200107172842.3315449-1-chris@chris-wilson.co.uk	2020-01-07 22:31:45 +00:00
Chris Wilson	7807a76b00	drm/i915/gt: Take responsibility for engine->release as the last step In order to avoid a double cleanup on error, take ownership of engine->release past the point of no [error] return. Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Fixes: `e26b6d4341` ("drm/i915/gt: Pull GT initialisation under intel_gt_init()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200107143118.3288995-1-chris@chris-wilson.co.uk	2020-01-07 15:53:26 +00:00
Chris Wilson	1325008f5c	drm/i915/gt: Mark up virtual engine uabi_instance Be sure to initialise the uabi_instance on the virtual engine to the special invalid value, just in case we ever peek at it from the uAPI. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `750e76b4f9` ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106123921.2543886-1-chris@chris-wilson.co.uk (cherry picked from commit `f75fc37b5e`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2020-01-07 12:50:00 +02:00
Chris Wilson	f75fc37b5e	drm/i915/gt: Mark up virtual engine uabi_instance Be sure to initialise the uabi_instance on the virtual engine to the special invalid value, just in case we ever peek at it from the uAPI. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `750e76b4f9` ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106123921.2543886-1-chris@chris-wilson.co.uk	2020-01-06 14:52:57 +00:00
Chris Wilson	ab17e6caa7	drm/i915/gt: Use memset_p to clear the ports Put memset_p to use to clear the array of pointers used for tracking the ELSP. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-6-chris@chris-wilson.co.uk	2020-01-06 14:38:57 +00:00
Chris Wilson	e1c31fb5dd	drm/i915: Merge i915_request.flags with i915_request.fence.flags As we already have a flags field buried within i915_request, reuse it! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-3-chris@chris-wilson.co.uk	2020-01-06 14:38:55 +00:00
Chris Wilson	1d0e2c9359	drm/i915/gt: Always poison the kernel_context image before unparking Keep scrubbing the kernel_context image with poison before we reset it in order to demonstrate that we will be resilient in the case where it is accidentally overwritten on idle. Suggested-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200102131707.1463945-5-chris@chris-wilson.co.uk	2020-01-03 11:26:01 +00:00
Chris Wilson	49a24e71b2	drm/i915/gt: Ignore stale context state upon resume We leave the kernel_context on the HW as we suspend (and while idle). There is no guarantee that is complete in memory, so we try to inhibit restoration from the kernel_context. Reinforce the inhibition by scrubbing the context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200102131707.1463945-3-chris@chris-wilson.co.uk	2020-01-03 11:26:01 +00:00
Chris Wilson	d1813ca2bb	drm/i915/gt: Clear LRC image inline When creating the initial LRC image, we also want to clear the MI_NOOPs and register values. Rather than use a blanket memset beforehand, apply the clears inline, close the context image and force inhibition of the uninitialised remainder. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200102131707.1463945-2-chris@chris-wilson.co.uk	2020-01-03 11:26:01 +00:00
Chris Wilson	6a505e644c	drm/i915/gt: Include a bunch more rcs image state Empirically the minimal context image we use for rcs is insufficient to state the engine. This is demonstrated if we poison the context image such that any uninitialised state is invalid, and so if the engine samples beyond our defined region, will fail to start. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200102131707.1463945-1-chris@chris-wilson.co.uk	2020-01-03 11:26:01 +00:00
Chris Wilson	2b64e616d5	drm/i915/gt: Leave RING_BB_STATE to default value Do not reset RING_BB_STATE, leaving it to the default state value. This prevents bdw/bsw from getting confused when executing batches from the GGTT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191230165821.3840449-2-chris@chris-wilson.co.uk	2019-12-30 20:32:07 +00:00
Chris Wilson	7b02b23e5d	drm/i915/gt: Avoid using tag 0 for the very first submission Assume that the HW starts off with tag 0 "active" and so avoid using tag 0 for our own first ELSP submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191229183153.3719869-2-chris@chris-wilson.co.uk	2019-12-30 13:44:25 +00:00
Chris Wilson	987281ab02	drm/i915/gt: Ensure that all new contexts clear STOP_RING Set up the RING_MI_MODE in new contexts to clear the STOP_RING bit, just in case they find it still set after a reset (as they are the first contexts to be run). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191229183153.3719869-1-chris@chris-wilson.co.uk	2019-12-30 13:43:54 +00:00
Chris Wilson	7d70a1233d	drm/i915/gt: Merge engine init/setup loops Now that we don't need to create GEM contexts in the middle of engine construction, we can pull the engine init/setup loops together. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191222144046.1674865-2-chris@chris-wilson.co.uk	2019-12-22 15:18:05 +00:00
Chris Wilson	e26b6d4341	drm/i915/gt: Pull GT initialisation under intel_gt_init() Begin pulling the GT setup underneath a single GT umbrella; let intel_gt take ownership of its engines! As hinted, the complication is the lifetime of the probed engine versus the active lifetime of the GT backends. We need to detect the engine layout early and keep it until the end so that we can sanitize state on takeover and release. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191222120752.1368352-1-chris@chris-wilson.co.uk	2019-12-22 12:51:32 +00:00
Chris Wilson	e6ba764802	drm/i915: Remove i915->kernel_context Allocate only an internal intel_context for the kernel_context, forgoing a global GEM context for internal use as we only require a separate address space (for our own protection). Now having weaned GT from requiring ce->gem_context, we can stop referencing it entirely. This also means we no longer have to create random and unnecessary GEM contexts for internal use. GEM contexts are now entirely for tracking GEM clients, and intel_context the execution environment on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk	2019-12-21 16:37:10 +00:00
Chris Wilson	a5e93b42f4	drm/i915/execlists: Select arb on/off around batches based on preemption Decide whether or not we need to disable arbitration within user batches based on our intel_engine_has_preemption() flag. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191213151331.1788371-1-chris@chris-wilson.co.uk	2019-12-20 12:00:34 +00:00
Chris Wilson	9f3ccd40ac	drm/i915: Drop GEM context as a direct link from i915_request Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk	2019-12-20 10:52:21 +00:00
Chris Wilson	d5e1935381	drm/i915/gt: Teach veng to defer the context allocation Since we added the context_alloc callback to intel_context_ops, we can safely install a custom hook for the deferred virtual context allocation. This means that all new contexts behave the same upon creation, simplifying later code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191219232932.189197-1-chris@chris-wilson.co.uk	2019-12-20 01:33:17 +00:00
Chris Wilson	7d1ff0d9fa	drm/i915/gt: Add breadcrumb retire to physical engine Avoid adding the retire workers to the virtual engine so that we don't end up in the unenviable situation of trying to free the virtual engine while its worker remains active. Fixes: `dc93c9b693` ("drm/i915/gt: Schedule request retirement when signaler idles") Closes: https://gitlab.freedesktop.org/drm/intel/issues/867 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191219221344.161523-1-chris@chris-wilson.co.uk	2019-12-19 23:24:48 +00:00
Chris Wilson	dc93c9b693	drm/i915/gt: Schedule request retirement when signaler idles Very similar to commit `4f88f8747f` ("drm/i915/gt: Schedule request retirement when timeline idles"), but this time instead of coupling into the execlists CS event interrupt, we couple into the breadcrumb interrupt and queue a timeline's retirement when the last signaler is completed. This should allow us to more rapidly park ringbuffer submission, and so help reduce power consumption on older systems. v2: Fixup intel_engine_add_retire() to handle concurrent callers References: `4f88f8747f` ("drm/i915/gt: Schedule request retirement when timeline idles") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191219124353.8607-1-chris@chris-wilson.co.uk	2019-12-19 17:03:56 +00:00
Chris Wilson	54400257ae	drm/i915/gt: Remove direct invocation of breadcrumb signaling Only signal the breadcrumbs from inside the irq_work, simplifying our interface and calling conventions. The micro-optimisation here is that by always using the irq_work interface, we know we are always inside an irq-off critical section for the breadcrumb signaling and can elide save/restore of the irq flags. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217095642.3124521-7-chris@chris-wilson.co.uk	2019-12-18 17:11:28 +00:00
Venkata Sandeep Dhanalakota	639f2f2489	drm/i915: Introduce new macros for tracing New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and GT_TRACE() are introduced to tag device name and engine name with contexts and requests tracing in i915. Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com	2019-12-13 20:16:23 +00:00
Chris Wilson	f26a9e959a	drm/i915/gt: Detect if we miss WaIdleLiteRestore In order to avoid confusing the HW, we must never submit an empty ring during lite-restore, that is we should always advance the RING_TAIL before submitting to stay ahead of the RING_HEAD. Normally this is prevented by keeping a couple of spare NOPs in the request->wa_tail so that on resubmission we can advance the tail. This relies on the request only being resubmitted once, which is the normal condition as it is seen once for ELSP[1] and then later in ELSP[0]. On preemption, the requests are unwound and the tail reset back to the normal end point (as we know the request is incomplete and therefore its RING_HEAD is even earlier). However, if this w/a should fail we would try and resubmit the request with the RING_TAIL already set to the location of this request's wa_tail potentially causing a GPU hang. We can spot when we do try and incorrectly resubmit without advancing the RING_TAIL and spare any embarrassment by forcing the context restore. In the case of preempt-to-busy, we leave the requests running on the HW while we unwind. As the ring is still live, we cannot rewind our rq->tail without forcing a reload so leave it set to rq->wa_tail and only force a reload if we resubmit after a lite-restore. (Normally, the forced reload will be a part of the preemption event.) Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Closes: https://gitlab.freedesktop.org/drm/intel/issues/673 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: stable@kernel.vger.org Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk (cherry picked from commit `82c69bf586`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2019-12-10 12:37:53 +02:00
Chris Wilson	82c69bf586	drm/i915/gt: Detect if we miss WaIdleLiteRestore In order to avoid confusing the HW, we must never submit an empty ring during lite-restore, that is we should always advance the RING_TAIL before submitting to stay ahead of the RING_HEAD. Normally this is prevented by keeping a couple of spare NOPs in the request->wa_tail so that on resubmission we can advance the tail. This relies on the request only being resubmitted once, which is the normal condition as it is seen once for ELSP[1] and then later in ELSP[0]. On preemption, the requests are unwound and the tail reset back to the normal end point (as we know the request is incomplete and therefore its RING_HEAD is even earlier). However, if this w/a should fail we would try and resubmit the request with the RING_TAIL already set to the location of this request's wa_tail potentially causing a GPU hang. We can spot when we do try and incorrectly resubmit without advancing the RING_TAIL and spare any embarrassment by forcing the context restore. In the case of preempt-to-busy, we leave the requests running on the HW while we unwind. As the ring is still live, we cannot rewind our rq->tail without forcing a reload so leave it set to rq->wa_tail and only force a reload if we resubmit after a lite-restore. (Normally, the forced reload will be a part of the preemption event.) Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Closes: https://gitlab.freedesktop.org/drm/intel/issues/673 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: stable@kernel.vger.org Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk	2019-12-10 10:08:07 +00:00
Chris Wilson	36deeddcd3	drm/i915/gt: Save irqstate around virtual_context_destroy As virtual_context_destroy() may be called from a request signal, it may be called from inside an irq-off section, and so we need to do a full save/restore of the irq state rather than blindly re-enable irqs upon unlocking. <4> [110.024262] WARNING: inconsistent lock state <4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G U <4> [110.024292] -------------------------------- <4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. <4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes: <4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915] <4> [110.024592] {IN-HARDIRQ-W} state was registered at: <4> [110.024612] lock_acquire+0xa7/0x1c0 <4> [110.024627] _raw_spin_lock_irqsave+0x33/0x50 <4> [110.024788] intel_engine_breadcrumbs_irq+0x38c/0x600 [i915] <4> [110.024808] irq_work_run_list+0x49/0x70 <4> [110.024824] irq_work_run+0x26/0x50 <4> [110.024839] smp_irq_work_interrupt+0x44/0x1e0 <4> [110.024855] irq_work_interrupt+0xf/0x20 <4> [110.024871] __do_softirq+0xb7/0x47f <4> [110.024885] irq_exit+0xba/0xc0 <4> [110.024898] do_IRQ+0x83/0x160 <4> [110.024910] ret_from_intr+0x0/0x1d <4> [110.024922] irq event stamp: 172864 <4> [110.024938] hardirqs last enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50 <4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40 <4> [110.024988] softirqs last enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f <4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0 <4> [110.025031] other info that might help us debug this: <4> [110.025049] Possible unsafe locking scenario: <4> [110.025065] CPU0 <4> [110.025075] ---- <4> [110.025084] lock(&(&rq->lock)->rlock); <4> [110.025099] <Interrupt> <4> [110.025109] lock(&(&rq->lock)->rlock); <4> [110.025124] * DEADLOCK * <4> 
[110.025144] 4 locks held by kworker/0:0/5: <4> [110.025156] #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620 <4> [110.025187] #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620 <4> [110.025219] #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915] <4> [110.025405] #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915] <4> [110.025634] stack backtrace: <4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U 5.4.0-rc8-CI-CI_DRM_7489+ #1 <4> [110.025675] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017 <4> [110.025856] Workqueue: events engine_retire [i915] <4> [110.025872] Call Trace: <4> [110.025891] dump_stack+0x71/0x9b <4> [110.025907] mark_lock+0x49a/0x500 <4> [110.025926] ? print_shortest_lock_dependencies+0x200/0x200 <4> [110.025946] mark_held_locks+0x49/0x70 <4> [110.025962] ? _raw_spin_unlock_irq+0x24/0x50 <4> [110.025978] lockdep_hardirqs_on+0xa2/0x1c0 <4> [110.025995] _raw_spin_unlock_irq+0x24/0x50 <4> [110.026171] virtual_context_destroy+0xc5/0x2e0 [i915] <4> [110.026376] __active_retire+0xb4/0x290 [i915] <4> [110.026396] dma_fence_signal_locked+0x9e/0x1b0 <4> [110.026613] i915_request_retire+0x451/0x930 [i915] <4> [110.026766] retire_requests+0x4d/0x60 [i915] <4> [110.026919] engine_retire+0x63/0xe0 [i915] Fixes: `b1e3177bd1` ("drm/i915: Coordinate i915_active with its own mutex") Fixes: `6d06779e86` ("drm/i915: Load balancing across a virtual engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205145934.663183-1-chris@chris-wilson.co.uk (cherry picked from commit `6f7ac82853`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2019-12-09 16:00:48 +02:00
Chris Wilson	6f7ac82853	drm/i915/gt: Save irqstate around virtual_context_destroy As virtual_context_destroy() may be called from a request signal, it may be called from inside an irq-off section, and so we need to do a full save/restore of the irq state rather than blindly re-enable irqs upon unlocking. <4> [110.024262] WARNING: inconsistent lock state <4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G U <4> [110.024292] -------------------------------- <4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. <4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes: <4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915] <4> [110.024592] {IN-HARDIRQ-W} state was registered at: <4> [110.024612] lock_acquire+0xa7/0x1c0 <4> [110.024627] _raw_spin_lock_irqsave+0x33/0x50 <4> [110.024788] intel_engine_breadcrumbs_irq+0x38c/0x600 [i915] <4> [110.024808] irq_work_run_list+0x49/0x70 <4> [110.024824] irq_work_run+0x26/0x50 <4> [110.024839] smp_irq_work_interrupt+0x44/0x1e0 <4> [110.024855] irq_work_interrupt+0xf/0x20 <4> [110.024871] __do_softirq+0xb7/0x47f <4> [110.024885] irq_exit+0xba/0xc0 <4> [110.024898] do_IRQ+0x83/0x160 <4> [110.024910] ret_from_intr+0x0/0x1d <4> [110.024922] irq event stamp: 172864 <4> [110.024938] hardirqs last enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50 <4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40 <4> [110.024988] softirqs last enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f <4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0 <4> [110.025031] other info that might help us debug this: <4> [110.025049] Possible unsafe locking scenario: <4> [110.025065] CPU0 <4> [110.025075] ---- <4> [110.025084] lock(&(&rq->lock)->rlock); <4> [110.025099] <Interrupt> <4> [110.025109] lock(&(&rq->lock)->rlock); <4> [110.025124] * DEADLOCK * <4> 
[110.025144] 4 locks held by kworker/0:0/5: <4> [110.025156] #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620 <4> [110.025187] #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620 <4> [110.025219] #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915] <4> [110.025405] #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915] <4> [110.025634] stack backtrace: <4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U 5.4.0-rc8-CI-CI_DRM_7489+ #1 <4> [110.025675] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017 <4> [110.025856] Workqueue: events engine_retire [i915] <4> [110.025872] Call Trace: <4> [110.025891] dump_stack+0x71/0x9b <4> [110.025907] mark_lock+0x49a/0x500 <4> [110.025926] ? print_shortest_lock_dependencies+0x200/0x200 <4> [110.025946] mark_held_locks+0x49/0x70 <4> [110.025962] ? _raw_spin_unlock_irq+0x24/0x50 <4> [110.025978] lockdep_hardirqs_on+0xa2/0x1c0 <4> [110.025995] _raw_spin_unlock_irq+0x24/0x50 <4> [110.026171] virtual_context_destroy+0xc5/0x2e0 [i915] <4> [110.026376] __active_retire+0xb4/0x290 [i915] <4> [110.026396] dma_fence_signal_locked+0x9e/0x1b0 <4> [110.026613] i915_request_retire+0x451/0x930 [i915] <4> [110.026766] retire_requests+0x4d/0x60 [i915] <4> [110.026919] engine_retire+0x63/0xe0 [i915] Fixes: `b1e3177bd1` ("drm/i915: Coordinate i915_active with its own mutex") Fixes: `6d06779e86` ("drm/i915: Load balancing across a virtual engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205145934.663183-1-chris@chris-wilson.co.uk	2019-12-05 17:27:55 +00:00
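The difference between blindly re-enabling interrupts and saving/restoring their state, which is what this fix turns on, can be shown with a toy userspace model. Nothing here is kernel code: `irqs_enabled` is a plain boolean standing in for the CPU interrupt flag, and the `model_*` and `destroy_*` functions are hypothetical stand-ins for the `spin_lock_irq()`/`spin_unlock_irq()` and `spin_lock_irqsave()`/`spin_unlock_irqrestore()` pairs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the CPU interrupt-enable flag. */
static bool irqs_enabled = true;

/* Models spin_lock_irq(): disable interrupts unconditionally. */
static void model_lock_irq(void)
{
	irqs_enabled = false;
}

/* Models spin_unlock_irq(): re-enable interrupts unconditionally. */
static void model_unlock_irq(void)
{
	irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remember the prior state, then disable. */
static bool model_lock_irqsave(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): put back exactly what was saved. */
static void model_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;
}

/* Problem pattern: leaves interrupts on even if the caller had them off. */
static void destroy_blind(void)
{
	model_lock_irq();
	/* ... critical section ... */
	model_unlock_irq();
}

/* Fixed pattern: the caller's interrupt state survives the call. */
static void destroy_saved(void)
{
	bool flags = model_lock_irqsave();

	/* ... critical section ... */
	model_unlock_irqrestore(flags);
}
```

If a caller enters `destroy_blind()` with interrupts already off, it returns with them on, which corresponds to the HARDIRQ-ON transition lockdep complains about in the splat; `destroy_saved()` leaves the flag as it found it in both cases.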
Chris Wilson	f70de8d2ca	drm/i915/gt: Track the context validity explicitly Rather than assume if and only if the engine->default_state is not set that the context is invalid, instead track when we know the context has valid state -- either because we have copied the default_state or we have completed a context switch to save the HW state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203124155.3019926-1-chris@chris-wilson.co.uk	2019-12-03 16:49:31 +00:00
Chris Wilson	49e74c8f9a	drm/i915/execlists: Skip nested spinlock for validating pending Only along the submission path can we guarantee that the locked request is indeed from a foreign engine, and so the nesting of engine/rq is permissible. On the submission tasklet (process_csb()), we may find ourselves competing with the normal nesting of rq/engine, invalidating our nesting. As we only use the spinlock for debug purposes, skip the debug if we cannot acquire the spinlock for safe validation - catching 99% of the bugs is better than causing a hard lockup. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `c95d31c3df` ("drm/i915/execlists: Lock the request while validating it during promotion") Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203152631.3107653-2-chris@chris-wilson.co.uk	2019-12-03 15:42:28 +00:00
Chris Wilson	80aac91b27	drm/i915/execlists: Add a couple more validity checks to assert_pending() Check the pending request submission is valid: that it at least has a reference for the submission and that the request is on the active list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203152631.3107653-1-chris@chris-wilson.co.uk	2019-12-03 15:42:28 +00:00
Chris Wilson	97c1635397	drm/i915/execlists: Ensure the tasklet is decoupled upon shutdown As we only cancel the timers asynchronously, they may still be running on another CPU as we shutdown, raising one last softirq. So be safe and make sure the tasklet is flushed before destroying the engine's memory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191129172542.1222810-1-chris@chris-wilson.co.uk	2019-11-29 20:09:14 +00:00
Chris Wilson	311770173f	drm/i915/gt: Schedule request retirement when timeline idles The major drawback of commit `7e34f4e4aa` ("drm/i915/gen8+: Add RC6 CTX corruption WA") is that it disables RC6 while Skylake (and friends) is active, and we do not consider the GPU idle until all outstanding requests have been retired and the engine switched over to the kernel context. If userspace is idle, this task falls onto our background idle worker, which only runs roughly once a second, meaning that userspace has to have been idle for a couple of seconds before we enable RC6 again. Naturally, this causes us to consume considerably more energy than before as powersaving is effectively disabled while a display server (here's looking at you Xorg) is running. As execlists will get a completion event as each context is completed, we can use this interrupt to queue a retire worker bound to this engine to cleanup idle timelines. We will then immediately notice the idle engine (without userspace intervention or the aid of the background retire worker) and start parking the GPU. Thus during light workloads, we will do much more work to idle the GPU faster... Hopefully with commensurate power saving! v2: Watch context completions and only look at those local to the engine when retiring to reduce the amount of excess work we perform. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315 References: `7e34f4e4aa` ("drm/i915/gen8+: Add RC6 CTX corruption WA") References: `2248a28384` ("drm/i915/gen8+: Add RC6 CTX corruption WA") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk (cherry picked from commit `4f88f8747f`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2019-11-25 16:39:07 +02:00
Chris Wilson	4ec5cc78c1	drm/i915/execlists: Fixup cancel_port_requests() I rushed a last minute correction to cancel_port_requests() to prevent the snooping of *execlists->active as the inflight array was being updated, without noticing we iterated the inflight array starting from active! Oops. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387 Fixes: `97f9af78f3` ("drm/i915/gt: Mark the execlists->active as the primary volatile access") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk (cherry picked from commit `da0ef77e1e`) [Joonas: Fixed Fixes: tag to match drm-intel-next-fixes] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2019-11-25 16:38:51 +02:00
Chris Wilson	97f9af78f3	drm/i915/gt: Mark the execlists->active as the primary volatile access Since we want to do a lockless read of the current active request, and that request is written to by process_csb also without serialisation, we need to instruct gcc to take care in reading the pointer itself. Otherwise, we have observed execlists_active() to report 0x40. [ 2400.760381] igt/para-4098 1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4 [ 2400.760826] igt/para-4098 1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000 [ 2400.761271] igt/para-4098 1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 } [ 2400.761726] igt/para-4097 0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940 which is impossible! The answer is that as we keep the existing execlists->active pointing into the array as we copy over that array, the unserialised read may see a partial pointer value. Fixes: `df40306902` ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk (cherry picked from commit `331bf90591`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2019-11-25 16:36:40 +02:00
Chris Wilson	ee33baa831	drm/i915: Mark up the calling context for intel_wakeref_put() Previously, we assumed we could use mutex_trylock() within an atomic context, falling back to a worker if contended. However, such trickery is illegal inside interrupt context, and so we need to always use a worker under such circumstances. As we normally are in process context, we can typically use a plain mutex, and only defer to a work when we know we are being called from an interrupt path. Fixes: `51fbd8de87` ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") References: `a0855d24fc` ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk (cherry picked from commit `07779a76ee`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2019-11-25 15:29:17 +02:00
Chris Wilson	4f88f8747f	drm/i915/gt: Schedule request retirement when timeline idles The major drawback of commit `7e34f4e4aa` ("drm/i915/gen8+: Add RC6 CTX corruption WA") is that it disables RC6 while Skylake (and friends) is active, and we do not consider the GPU idle until all outstanding requests have been retired and the engine switched over to the kernel context. If userspace is idle, this task falls onto our background idle worker, which only runs roughly once a second, meaning that userspace has to have been idle for a couple of seconds before we enable RC6 again. Naturally, this causes us to consume considerably more energy than before as powersaving is effectively disabled while a display server (here's looking at you Xorg) is running. As execlists will get a completion event as each context is completed, we can use this interrupt to queue a retire worker bound to this engine to cleanup idle timelines. We will then immediately notice the idle engine (without userspace intervention or the aid of the background retire worker) and start parking the GPU. Thus during light workloads, we will do much more work to idle the GPU faster... Hopefully with commensurate power saving! v2: Watch context completions and only look at those local to the engine when retiring to reduce the amount of excess work we perform. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315 References: `7e34f4e4aa` ("drm/i915/gen8+: Add RC6 CTX corruption WA") References: `2248a28384` ("drm/i915/gen8+: Add RC6 CTX corruption WA") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk	2019-11-25 13:17:18 +00:00
Chris Wilson	da0ef77e1e	drm/i915/execlists: Fixup cancel_port_requests() I rushed a last minute correction to cancel_port_requests() to prevent the snooping of *execlists->active as the inflight array was being updated, without noticing we iterated the inflight array starting from active! Oops. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387 Fixes: `331bf90591` ("drm/i915/gt: Mark the execlists->active as the primary volatile access") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk	2019-11-25 13:17:18 +00:00
Chris Wilson	331bf90591	drm/i915/gt: Mark the execlists->active as the primary volatile access Since we want to do a lockless read of the current active request, and that request is written to by process_csb also without serialisation, we need to instruct gcc to take care in reading the pointer itself. Otherwise, we have observed execlists_active() to report 0x40. [ 2400.760381] igt/para-4098 1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4 [ 2400.760826] igt/para-4098 1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000 [ 2400.761271] igt/para-4098 1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 } [ 2400.761726] igt/para-4097 0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940 which is impossible! The answer is that as we keep the existing execlists->active pointing into the array as we copy over that array, the unserialised read may see a partial pointer value. Fixes: `df40306902` ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk	2019-11-25 09:45:37 +00:00
Chris Wilson	93b0e8fe47	drm/i915: Mark intel_wakeref_get() as a sleeper Assume that intel_wakeref_get() may take the mutex, and perform other sleeping actions in the course of its callbacks and so use might_sleep() to ensure that all callers abide. Anything that cannot sleep has to use e.g. intel_wakeref_get_if_active() to guarantee its avoidance of the non-atomic paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121130528.309474-1-chris@chris-wilson.co.uk	2019-11-21 13:22:04 +00:00
Chris Wilson	c95d31c3df	drm/i915/execlists: Lock the request while validating it during promotion Since the request is already on the HW as we perform its validation, it and even its subsequent barrier may be concurrently retired before we process the assertions. If it is retired already and so off the HW, our assertions become void and we need to ignore them. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112363 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121103546.146487-1-chris@chris-wilson.co.uk	2019-11-21 12:14:45 +00:00
Chris Wilson	07779a76ee	drm/i915: Mark up the calling context for intel_wakeref_put() Previously, we assumed we could use mutex_trylock() within an atomic context, falling back to a worker if contended. However, such trickery is illegal inside interrupt context, and so we need to always use a worker under such circumstances. As we normally are in process context, we can typically use a plain mutex, and only defer to a work when we know we are being called from an interrupt path. Fixes: `51fbd8de87` ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") References: `a0855d24fc` ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk	2019-11-20 15:59:23 +00:00
Chris Wilson	98ae6fb3f1	drm/i915/execlists: Move reset_active() from schedule-out to schedule-in The gem_ctx_persistence/smoketest was detecting an odd coherency issue inside the LRC context image; that the address of the ring buffer did not match our associated struct intel_ring. As we set the address into the context image when we pin the ring buffer into place before the context is active, that leaves the question of where did it get overwritten. Either the HW context save occurred after our pin which would imply that our idle barriers are broken, or we overwrote the context image ourselves. It is only in reset_active() where we dabble inside the context image outside of a serialised path from schedule-out; but we could equally perform the operation inside schedule-in which is then fully serialised with the context pin -- and remains serialised by the engine pulse with kill_context(). (The only downside, aside from doing more work inside the engine->active.lock, was the plan to merge all the reset paths into doing their context scrubbing on schedule-out needs more thought.) Fixes: `d12acee84f` ("drm/i915/execlists: Cancel banned contexts on schedule-out") Testcase: igt/gem_ctx_persistence/smoketest Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191111133205.11590-3-chris@chris-wilson.co.uk (cherry picked from commit `31b61f0ef9`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2019-11-13 12:26:35 +02:00
Chris Wilson	31b61f0ef9	drm/i915/execlists: Move reset_active() from schedule-out to schedule-in The gem_ctx_persistence/smoketest was detecting an odd coherency issue inside the LRC context image; that the address of the ring buffer did not match our associated struct intel_ring. As we set the address into the context image when we pin the ring buffer into place before the context is active, that leaves the question of where did it get overwritten. Either the HW context save occurred after our pin which would imply that our idle barriers are broken, or we overwrote the context image ourselves. It is only in reset_active() where we dabble inside the context image outside of a serialised path from schedule-out; but we could equally perform the operation inside schedule-in which is then fully serialised with the context pin -- and remains serialised by the engine pulse with kill_context(). (The only downside, aside from doing more work inside the engine->active.lock, was the plan to merge all the reset paths into doing their context scrubbing on schedule-out needs more thought.) Fixes: `d12acee84f` ("drm/i915/execlists: Cancel banned contexts on schedule-out") Testcase: igt/gem_ctx_persistence/smoketest Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191111133205.11590-3-chris@chris-wilson.co.uk	2019-11-11 16:37:05 +00:00
Chris Wilson	69a48c1d28	drm/i915/execlists: Reduce barrier on context switch to a wmb() Having been forced to reduce Braswell back to using the aliasing ppgtt, the coherency issue we previously observed cannot impact us. Reduce the performance penalty imposed on all platforms from using the mfence to a mere sfence. References: `cf66b8a0ba` ("drm/i915/execlists: Apply a full mb before execution for Braswell") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191110185806.17413-10-chris@chris-wilson.co.uk	2019-11-11 13:27:03 +00:00
Chris Wilson	0a1f57b86c	drm/i915/execlists: Reset CSB pointers by mmio as well Sometimes Icelake forgets to reset the CSB pointers on a GPU reset, leading to it carry on updating the old tail of the buffer. <0>[ 618.138490] i915_sel-5636 3d..1 673425465us : trace_ports: vecs0: submit { 14de2:504, 0:0 } <0>[ 618.138490] i915_sel-5636 3.... 673425493us : intel_engine_reset: vecs0 flags=100 <0>[ 618.138490] i915_sel-5636 3.... 673425493us : execlists_reset_prepare: vecs0: depth<-0 <0>[ 618.138490] i915_sel-5636 3.... 673425493us : intel_engine_stop_cs: vecs0 <0>[ 618.138490] i915_sel-5636 3.... 673425523us : __intel_gt_reset: engine_mask=40 <0>[ 618.138490] i915_sel-5636 3.... 673425568us : execlists_reset: vecs0 <0>[ 618.138490] i915_sel-5636 3d..1 673425568us : process_csb: vecs0 cs-irq head=1, tail=2 <0>[ 618.138490] i915_sel-5636 3d..1 673425568us : process_csb: vecs0 csb[2]: status=0x00000001:0x40000000 <0>[ 618.138490] i915_sel-5636 3d..1 673425569us : trace_ports: vecs0: promote { 14de2:504, 0:0 } <0>[ 618.138490] i915_sel-5636 3d..1 673425570us : __i915_request_reset: vecs0 rq=14de2:504, guilty? yes <0>[ 618.138490] i915_sel-5636 3d..1 673425571us : __execlists_reset: vecs0 replay {head:2de0, tail:2e48} <0>[ 618.138490] i915_sel-5636 3d..1 673425572us : __i915_request_unsubmit: vecs0 fence 14de2:504, current 503 <0>[ 618.138490] i915_sel-5636 3.... 673435544us : intel_engine_cancel_stop_cs: vecs0 <0>[ 618.138490] i915_sel-5636 3.... 673435544us : process_csb: vecs0 cs-irq head=11, tail=11 <0>[ 618.138490] i915_sel-5636 3d..1 673435545us : __i915_request_submit: vecs0 fence 14de2:504, current 503 <0>[ 618.138490] i915_sel-5636 3d..1 673435546us : __execlists_submission_tasklet: vecs0: queue_priority_hint:-2147483648, submit:yes <0>[ 618.138490] i915_sel-5636 3d..1 673435548us : trace_ports: vecs0: submit { 14de2:504, 0:0 } <0>[ 618.138490] i915_sel-5636 3.... 673435549us : execlists_reset_finish: vecs0: depth->0 <0>[ 618.138490] ksoftirq-21 2..s. 
673435592us : process_csb: vecs0 cs-irq head=11, tail=3 <0>[ 618.138490] ksoftirq-21 2..s. 673435593us : process_csb: vecs0 csb[0]: status=0x00000001:0x40000000 <0>[ 618.138490] ksoftirq-21 2..s. 673435594us : trace_ports: vecs0: promote { 14de2:504, 0:0 } <0>[ 618.138490] ksoftirq-21 2..s. 673435596us : process_csb: vecs0 csb[1]: status=0x00000018:0x40000040 <0>[ 618.138490] ksoftirq-21 2..s. 673435597us : trace_ports: vecs0: completed { 14de2:504, 0:0 } <0>[ 618.138490] ksoftirq-21 2..s. 673435612us : process_csb: process_csb:2188 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists)) After the reset, we do another clflush before checking the CSB to be sure we see whatever was left in the CSB prior to the reset. So it is unlikely to be an incoherent view of the CSB, and more likely that Icelake didn't reset its pointers. References: `582a6f90aa` ("drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191104135307.21083-1-chris@chris-wilson.co.uk	2019-11-04 15:54:26 +00:00
Chris Wilson	3809875071	drm/i915/execlists: Ignore the inactive kernel context in assert_pending_valid Filter out warnings for the kernel context that is used to flush inactive contexts, as they do not pose a risk. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191101082919.21122-1-chris@chris-wilson.co.uk	2019-11-02 14:48:32 +00:00
Chris Wilson	b0b1024886	drm/i915/execlists: Verify context register state before execution Check that the context's ring register state still matches our expectations prior to execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191102125739.24626-1-chris@chris-wilson.co.uk	2019-11-02 13:39:13 +00:00
Daniele Ceraolo Spurio	9f37940756	drm/i915: drop lrc header page Recent GuC binaries (including all the ones we're currently using) don't require this shared area anymore, having moved the relevant entries into the stage pool instead. i915 itself doesn't write anything into it either, so we can safely drop it. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191031013040.25803-1-daniele.ceraolospurio@intel.com	2019-10-31 16:47:22 +00:00
Chris Wilson	b79029b2e8	drm/i915/gt: Make timeslice duration configurable Execlists uses a scheduling quantum (a timeslice) to alternate execution between ready-to-run contexts of equal priority. This ensures that all users (though only if they are of equal importance) have the opportunity to run and prevents livelocks where contexts may have implicit ordering due to userspace semaphores. However, not all workloads necessarily benefit from timeslicing and in the extreme some sysadmin may want to disable or reduce the timeslicing granularity. The timeslicing mechanism can be compiled out^W^W disabled (but should DCE!) with ./scripts/config --set-val DRM_I915_TIMESLICE_DURATION 0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191029091632.26281-1-chris@chris-wilson.co.uk	2019-10-29 16:23:55 +00:00
Michal Wajdeczko	19c17b763f	drm/i915/execlists: Use vfunc to check engine submission mode While processing CSB there is no need to look at GuC submission settings, just check if engine is configured for execlists mode. While today GuC submission is disabled, its settings are still based on modparam values that might not correctly reflect actual submission status in case of any fallback. Until that is fully fixed, use alternate method to confirm that engine really runs in execlists mode by comparing set_default_submission vfunc. v2: add other immediate use of new helper Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191028164520.31772-1-michal.wajdeczko@intel.com	2019-10-28 20:40:07 +00:00
Chris Wilson	a7f328fc78	drm/i915/execlists: Simply walk back along request timeline on reset The request's timeline will only contain requests from this context, in order of execution. Therefore, we can simply look back along this timeline to find the currently executing request. If we do find that the current context has completed its last request, that does not imply that all requests are completed in the context, so only advance the ring->head up to the end of the known completions! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191028124125.25176-1-chris@chris-wilson.co.uk	2019-10-28 16:09:44 +00:00
Chris Wilson	35865aef05	drm/i915/tgl: Adjust the location of RING_MI_MODE in the context image The location of RING_MI_MODE (used to stop the ring across resets) moved for Tigerlake. Fixup the new location and include a selftest to verify the location in the default context image. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191026082220.32632-1-chris@chris-wilson.co.uk	2019-10-26 09:48:34 +01:00
Tvrtko Ursulin	5932925ac1	drm/i915: Move intel_engine_context_in/out into intel_lrc.c Intel_lrc.c is the only caller and so to avoid some header file ordering issues in future patches move these two over there. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191025090952.10135-1-tvrtko.ursulin@linux.intel.com	2019-10-25 13:22:04 +01:00
Chris Wilson	2871ea85c1	drm/i915/gt: Split intel_ring_submission Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk	2019-10-24 12:14:21 +01:00
Chris Wilson	d12acee84f	drm/i915/execlists: Cancel banned contexts on schedule-out On schedule-out (CS completion) of a banned context, scrub the context image so that we do not replay the active payload. The intent is that we skip banned payloads on request submission so that the timeline advancement continues on in the background. However, if we are returning to a preempted request, i915_request_skip() is ineffective and instead we need to patch up the context image so that it continues from the start of the next request. v2: Fixup cancellation so that we only scrub the payload of the active request and do not short-circuit the breadcrumbs (which might cause other contexts to execute out of order). v3: Grammar pass Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-3-chris@chris-wilson.co.uk	2019-10-23 23:52:10 +01:00
Chris Wilson	3a7a92aba8	drm/i915/execlists: Force preemption If the preempted context takes too long to relinquish control, e.g. it is stuck inside a shader with arbitration disabled, evict that context with an engine reset. This ensures that preemptions are reasonably responsive, providing a tighter QoS for the more important context at the cost of flagging unresponsive contexts more frequently (i.e. instead of using an ~10s hangcheck, we now evict at ~100ms). The challenge lies in picking a timeout that can be reasonably serviced by HW for typical workloads, balancing the existing clients against the needs for responsiveness. Note that coupled with timeslicing, this will lead to rapid GPU "hang" detection with multiple active contexts vying for GPU time. The forced preemption mechanism can be compiled out with ./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-2-chris@chris-wilson.co.uk	2019-10-23 23:52:10 +01:00
Chris Wilson	0587152bf9	drm/i915: Drop assertion that ce->pin_mutex guards state updates The actual conditions are that we know the GPU is not accessing the context, and we hold a pin on the context image to allow CPU access. We used a fake lock on ce->pin_mutex so that we could try and use lockdep to assert that access is serialised, but the various different hardirq/softirq contexts where we need to fake holding the pin_mutex are causing more trouble. Still it would be nice if we did have a way to reassure ourselves that the direct update to the context image is serialised with GPU execution. In the meantime, stop lockdep complaining about false irq inversions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111923 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191022122845.25038-1-chris@chris-wilson.co.uk	2019-10-22 13:32:01 +01:00
Chris Wilson	253a774bb0	drm/i915/execlists: Don't merely skip submission if maybe timeslicing Normally, we try and skip submission if ELSP[1] is filled. However, we may desire to enable timeslicing due to the queue priority, even if ELSP[1] itself does not require timeslicing. That is, the queue is equal priority to ELSP[0] and higher priority than ELSP[1]. Previously, we would wait until the context switch to preempt the current ELSP[1], but with timeslicing, we want to preempt ELSP[0] and replace it with the queue. In writing the test case, it became quickly apparent that we were also suppressing the tasklet during promotion and so failing to notice when the queue started requiring timeslicing. Fixes: `2229adc813` ("drm/i915/execlist: Trim immediate timeslice expiry") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191018072027.31948-1-chris@chris-wilson.co.uk	2019-10-18 11:23:26 +01:00
Chris Wilson	2229adc813	drm/i915/execlist: Trim immediate timeslice expiry We perform timeslicing immediately upon receipt of a request that may be put into the second ELSP slot. The idea behind this was that since we didn't install the timer if the second ELSP slot was empty, we would not have any idea of how long ELSP[0] had been running and so giving the newcomer a chance on the GPU was fair. However, this causes us extra busy work that we may be able to avoid if we wait a jiffie for the first timeslice as normal. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191016100851.4979-1-chris@chris-wilson.co.uk	2019-10-16 14:05:45 +01:00
Mika Kuoppala	08fff7aedd	drm/i915/tgl: Wa_1607138340 Avoid possible cs hang with semaphores by disabling lite restore. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-11-mika.kuoppala@linux.intel.com	2019-10-15 18:25:52 +01:00
Mika Kuoppala	2e19af9438	drm/i915/tgl: Wa_1409600907 To avoid possible hang, we need to add depth stall if we flush the depth cache. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-8-mika.kuoppala@linux.intel.com	2019-10-15 18:23:10 +01:00
Mika Kuoppala	36a6b5d964	drm/i915/tgl: Add extra hdc flush workaround In order to ensure constant caches are invalidated properly with a0, we need extra hdc flush after invalidation. v2: use IS_TGL_REVID (Chris) References: HSDES#1604544889 Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-4-mika.kuoppala@linux.intel.com	2019-10-15 18:16:51 +01:00
Mika Kuoppala	4aa0b5d457	drm/i915/tgl: Add HDC Pipeline Flush Add hdc pipeline flush to ensure memory state is coherent in L3 when we are done. v2: Flush also in breadcrumbs (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-3-mika.kuoppala@linux.intel.com	2019-10-15 18:15:59 +01:00
Mika Kuoppala	62037ffff2	drm/i915/tgl: Include ro parts of l3 to invalidate Aim for completeness and invalidate also the ro parts in l3 cache. This might allow to get rid of the preparser disable/enable workaround on invalidation path. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-2-mika.kuoppala@linux.intel.com	2019-10-15 18:13:50 +01:00
Chris Wilson	8b390c1581	drm/i915/execlists: Clear semaphore immediately upon ELSP promotion There is no significance to our delay before clearing the semaphore the engine is waiting on, so release it as soon as we acknowledge the CS update following our preemption request. This should allow the GPU to resume work earlier, if it was stuck on the semaphore at the end of a request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191015093204.25693-1-chris@chris-wilson.co.uk	2019-10-15 11:51:13 +01:00
Chris Wilson	3c00660db1	drm/i915/execlists: Assert tasklet is locked for process_csb() We rely on only the tasklet being allowed to call into process_csb(), so assert that is locked when we do. As the tasklet uses a simple bitlock, there is no strong lockdep checking so we must make do with a plain assertion that the tasklet is running and assume that we are the tasklet! v2: Fixup intel_gt_sanitize() to prepare each engine for the reset so that the locks are marked as held during the reset v3: Check for existent function pointers for very early sanitisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014121336.30137-1-chris@chris-wilson.co.uk	2019-10-14 21:10:59 +01:00
Chris Wilson	89b6d1831d	drm/i915/execlists: Tweak virtual unsubmission Since commit `e2144503bf` ("drm/i915: Prevent bonded requests from overtaking each other on preemption") we have restricted requests to run on their chosen engine across preemption events. We can take this restriction into account to know that we will want to resubmit those requests onto the same physical engine, and so can shortcircuit the virtual engine selection process and keep the request on the same engine during unwind. References: `e2144503bf` ("drm/i915: Prevent bonded requests from overtaking each other on preemption") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Ramlingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191013203012.25208-1-chris@chris-wilson.co.uk	2019-10-14 12:51:17 +01:00
Chris Wilson	c3eb54aad9	drm/i915: Mark up "sentinel" requests Sometimes we want to emit a terminator request, a request that flushes the pipeline and allows no request to come after it. This can be used for a "preempt-to-idle" to ensure that upon processing the context-switch to that request, all other active contexts have been flushed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191012070136.32058-1-chris@chris-wilson.co.uk	2019-10-12 08:51:17 +01:00
Chris Wilson	d8ad5f5261	drm/i915/execlists: Prevent merging requests with conflicting flags We set out-of-bound parameters inside the i915_requests.flags field, such as disabling preemption or marking the end-of-context. We should not coalesce consecutive requests if they have differing instructions as we only inspect the last active request in a context. Thus if we allow a later request to be merged into the same execution context, it will mask any of the earlier flags. References: `2a98f4e65b` ("drm/i915: add infrastructure to hold off preemption on a request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-9-chris@chris-wilson.co.uk	2019-10-12 07:54:52 +01:00
Chris Wilson	cbbf278778	drm/i915/execlists: Only mark incomplete requests as -EIO on cancelling Only the requests that have not completed do we want to change the status of to signal the -EIO when cancelling the inflight set of requests upon wedging. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191011103345.26013-1-chris@chris-wilson.co.uk	2019-10-11 13:07:24 +01:00
Chris Wilson	c97fb526ca	drm/i915/execlists: Leave tell-tales as to why pending[] is bad Before we BUG out with bad pending state, leave a telltale as to which test failed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010071434.31195-2-chris@chris-wilson.co.uk	2019-10-11 09:43:06 +01:00
Chris Wilson	bd9bec5b6a	drm/i915/execlists: Mark up expected state during reset Move the BUG_ON around slightly and add some explanations for each to try and capture the expected state more carefully. We want to compare the expected active state of our bookkeeping as compared to the tracked HW state. References: https://bugs.freedesktop.org/show_bug.cgi?id=111937 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010083242.1387-1-chris@chris-wilson.co.uk	2019-10-10 13:52:34 +01:00
Daniele Ceraolo Spurio	9d41318c4e	drm/i915/tgl: simplify the lrc register list for !RCS There are small differences between the blitter and the video engines in the xcs context image (e.g. registers 0x200 and 0x204 only exist on the blitter). Since we never explicitly set a value for those register and given that we don't need to update the offsets in the lrc image when we change engine within the class for virtual engine because the HW can handle that, instead of having a separate define for the BCS we can just restrict the programming to the part we're interested in, which is common across the engines. Bspec: 45584 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-2-daniele.ceraolospurio@intel.com	2019-10-10 10:14:42 +01:00
Daniele Ceraolo Spurio	ba2c74da52	drm/i915/tgl: the BCS engine supports relative MMIO The specs don't mention any specific HW limitation on the blitter and manual inspection shows that the HW does set the relative MMIO bit in the LRI of the blitter context image, so we can remove our limitations. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-1-daniele.ceraolospurio@intel.com	2019-10-10 10:12:18 +01:00
Chris Wilson	c949ae4314	drm/i915/execlists: Protect peeking at execlists->active Now that we dropped the engine->active.lock serialisation from around process_csb(), direct submission can run concurrently to the interrupt handler. As such execlists->active may be advanced as we dequeue, dropping the reference to the request. We need to employ our RCU request protection to ensure that the request is not freed too early. Fixes: `df40306902` ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191009100955.21477-1-chris@chris-wilson.co.uk	2019-10-09 19:46:40 +01:00
Chris Wilson	20af04f3dd	drm/i915/execlists: Assign virtual_engine->uncore from first sibling Copy across the engine->uncore shortcut to the virtual_engine from its first physical engine, similar to the handling of the engine->gt backpointer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191008070342.4045-1-chris@chris-wilson.co.uk	2019-10-08 10:14:29 +01:00
Chris Wilson	08ad9a3846	drm/i915/execlists: Fix annotation for decoupling virtual request As we may signal a request and take the engine->active.lock within the signaler, the engine submission paths have to use a nested annotation on their requests -- but we guarantee that we can never submit on the same engine as the signaling fence. <4>[ 723.763281] WARNING: possible circular locking dependency detected <4>[ 723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ #1 Tainted: G U <4>[ 723.763288] ------------------------------------------------------ <4>[ 723.763291] gem_exec_await/1388 is trying to acquire lock: <4>[ 723.763294] ffff93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915] <4>[ 723.763378] but task is already holding lock: <4>[ 723.763381] ffff93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915] <4>[ 723.763420] which lock already depends on the new lock. <4>[ 723.763423] the existing dependency chain (in reverse order) is: <4>[ 723.763427] -> #2 (&i915_request_get(rq)->submit/1){-.-.}: <4>[ 723.763434] _raw_spin_lock_irqsave_nested+0x39/0x50 <4>[ 723.763478] __i915_sw_fence_complete+0x1b2/0x250 [i915] <4>[ 723.763513] intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915] <4>[ 723.763600] cs_irq_handler+0x49/0x50 [i915] <4>[ 723.763659] gen11_gt_irq_handler+0x17b/0x280 [i915] <4>[ 723.763690] gen11_irq_handler+0x54/0xf0 [i915] <4>[ 723.763695] __handle_irq_event_percpu+0x41/0x2d0 <4>[ 723.763699] handle_irq_event_percpu+0x2b/0x70 <4>[ 723.763702] handle_irq_event+0x2f/0x50 <4>[ 723.763706] handle_edge_irq+0xee/0x1a0 <4>[ 723.763709] do_IRQ+0x7e/0x160 <4>[ 723.763712] ret_from_intr+0x0/0x1d <4>[ 723.763717] __slab_alloc.isra.28.constprop.33+0x4f/0x70 <4>[ 723.763720] kmem_cache_alloc+0x28d/0x2f0 <4>[ 723.763724] vm_area_dup+0x15/0x40 <4>[ 723.763727] dup_mm+0x2dd/0x550 <4>[ 723.763730] copy_process+0xf21/0x1ef0 <4>[ 723.763734] _do_fork+0x71/0x670 <4>[ 723.763737] 
__se_sys_clone+0x6e/0xa0 <4>[ 723.763741] do_syscall_64+0x4f/0x210 <4>[ 723.763744] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 723.763747] -> #1 (&(&rq->lock)->rlock#2){-.-.}: <4>[ 723.763752] _raw_spin_lock+0x2a/0x40 <4>[ 723.763789] __unwind_incomplete_requests+0x3eb/0x450 [i915] <4>[ 723.763825] __execlists_submission_tasklet+0x9ec/0x1d60 [i915] <4>[ 723.763864] execlists_submission_tasklet+0x34/0x50 [i915] <4>[ 723.763874] tasklet_action_common.isra.5+0x47/0xb0 <4>[ 723.763878] __do_softirq+0xd8/0x4ae <4>[ 723.763881] irq_exit+0xa9/0xc0 <4>[ 723.763883] smp_apic_timer_interrupt+0xb7/0x280 <4>[ 723.763887] apic_timer_interrupt+0xf/0x20 <4>[ 723.763892] cpuidle_enter_state+0xae/0x450 <4>[ 723.763895] cpuidle_enter+0x24/0x40 <4>[ 723.763899] do_idle+0x1e7/0x250 <4>[ 723.763902] cpu_startup_entry+0x14/0x20 <4>[ 723.763905] start_secondary+0x15f/0x1b0 <4>[ 723.763908] secondary_startup_64+0xa4/0xb0 <4>[ 723.763911] -> #0 (&engine->active.lock){..-.}: <4>[ 723.763916] __lock_acquire+0x15d8/0x1ea0 <4>[ 723.763919] lock_acquire+0xa6/0x1c0 <4>[ 723.763922] _raw_spin_lock_irqsave+0x33/0x50 <4>[ 723.763956] execlists_submit_request+0x2b/0x1e0 [i915] <4>[ 723.764002] submit_notify+0xa8/0x13c [i915] <4>[ 723.764035] __i915_sw_fence_complete+0x81/0x250 [i915] <4>[ 723.764054] i915_sw_fence_wake+0x51/0x64 [i915] <4>[ 723.764054] __i915_sw_fence_complete+0x1ee/0x250 [i915] <4>[ 723.764054] dma_i915_sw_fence_wake_timer+0x14/0x20 [i915] <4>[ 723.764054] dma_fence_signal_locked+0x9e/0x1c0 <4>[ 723.764054] dma_fence_signal+0x1f/0x40 <4>[ 723.764054] vgem_fence_signal_ioctl+0x67/0xc0 [vgem] <4>[ 723.764054] drm_ioctl_kernel+0x83/0xf0 <4>[ 723.764054] drm_ioctl+0x2f3/0x3b0 <4>[ 723.764054] do_vfs_ioctl+0xa0/0x6f0 <4>[ 723.764054] ksys_ioctl+0x35/0x60 <4>[ 723.764054] __x64_sys_ioctl+0x11/0x20 <4>[ 723.764054] do_syscall_64+0x4f/0x210 <4>[ 723.764054] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 723.764054] other info that might help us debug this: <4>[ 723.764054] Chain exists 
of: &engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1 <4>[ 723.764054] Possible unsafe locking scenario: <4>[ 723.764054] CPU0 CPU1 <4>[ 723.764054] ---- ---- <4>[ 723.764054] lock(&i915_request_get(rq)->submit/1); <4>[ 723.764054] lock(&(&rq->lock)->rlock#2); <4>[ 723.764054] lock(&i915_request_get(rq)->submit/1); <4>[ 723.764054] lock(&engine->active.lock); <4>[ 723.764054] * DEADLOCK * Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111862 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004194758.19679-1-chris@chris-wilson.co.uk	2019-10-07 21:44:02 +01:00
Chris Wilson	2935ed5339	drm/i915: Remove logical HW ID With the introduction of ctx->engines[] we allow multiple logical contexts to be used on the same engine (e.g. with virtual engines). According to bspec, each logical context requires a unique tag in order for context-switching to occur correctly between them. [Simple experiments show that it is not so easy to trick the HW into performing a lite-restore with matching logical IDs, though my memory from early Broadwell experiments does suggest that it should be generating lite-restores.] We only need to keep a unique tag for the active lifetime of the context, and for as long as we need to identify that context. The HW uses the tag to determine if it should use a lite-restore (why not the LRCA?) and passes the tag back for various status identifies. The only status we need to track is for OA, so when using perf, we assign the specific context a unique tag. v2: Calculate required number of tags to fill ELSP. Fixes: `976b55f0e1` ("drm/i915: Allow a context to define its set of engines") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk	2019-10-04 15:39:30 +01:00
Chris Wilson	44d0a9c05b	drm/i915/execlists: Skip redundant resubmission If we unwind the active requests, and on resubmission discover that we intend to preempt the active contexts with themselves, simply skip the ELSP submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-1-chris@chris-wilson.co.uk	2019-10-04 12:52:24 +01:00
Chris Wilson	fcde8c7eea	drm/i915/selftests: Exercise potential false lite-restore If execlists's lite-restore is based on the common GEM context tag rather than the per-intel_context LRCA, then a context switch between two intel_contexts on the same engine derived from the same GEM context will perform a lite-restore instead of a full context switch. We can exploit this by poisoning the ringbuffer of the first context and trying to trick a simple RING_TAIL update (i.e. lite-restore) v2: Also check what happens if preempt ce[0] with ce[1] (both instances on the same engine from the same parent context) [Tvrtko] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191002183459.26614-1-chris@chris-wilson.co.uk	2019-10-03 00:26:02 +01:00
Chris Wilson	f8db4d051b	drm/i915: Initialise breadcrumb lists on the virtual engine With deferring the breadcrumb signalling to the virtual engine (thanks preempt-to-busy) we need to make sure the lists and irq-worker are ready to send a signal. [41958.710544] BUG: kernel NULL pointer dereference, address: 0000000000000000 [41958.710553] #PF: supervisor write access in kernel mode [41958.710556] #PF: error_code(0x0002) - not-present page [41958.710558] PGD 0 P4D 0 [41958.710562] Oops: 0002 [#1] SMP [41958.710565] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U 5.3.0+ #207 [41958.710568] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 [41958.710602] RIP: 0010:i915_request_enable_breadcrumb+0xe1/0x130 [i915] [41958.710605] Code: 8b 44 24 30 48 89 41 08 48 89 08 48 8b 85 98 01 00 00 48 8d 8d 90 01 00 00 48 89 95 98 01 00 00 49 89 4c 24 28 49 89 44 24 30 <48> 89 10 f0 80 4b 30 10 c6 85 88 01 00 00 00 e9 1a ff ff ff 48 83 [41958.710609] RSP: 0018:ffffc90000003de0 EFLAGS: 00010046 [41958.710612] RAX: 0000000000000000 RBX: ffff888735424480 RCX: ffff8887cddb2190 [41958.710614] RDX: ffff8887cddb3570 RSI: ffff888850362190 RDI: ffff8887cddb2188 [41958.710617] RBP: ffff8887cddb2000 R08: ffff8888503624a8 R09: 0000000000000100 [41958.710619] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8887cddb3548 [41958.710622] R13: 0000000000000000 R14: 0000000000000046 R15: ffff888850362070 [41958.710625] FS: 0000000000000000(0000) GS:ffff88885ea00000(0000) knlGS:0000000000000000 [41958.710628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [41958.710630] CR2: 0000000000000000 CR3: 0000000002c09002 CR4: 00000000001606f0 [41958.710633] Call Trace: [41958.710636] <IRQ> [41958.710668] __i915_request_submit+0x12b/0x160 [i915] [41958.710693] virtual_submit_request+0x67/0x120 [i915] [41958.710720] __unwind_incomplete_requests+0x131/0x170 [i915] [41958.710744] execlists_dequeue+0xb40/0xe00 [i915] [41958.710771] 
execlists_submission_tasklet+0x10f/0x150 [i915] [41958.710776] tasklet_action_common.isra.17+0x41/0xa0 [41958.710781] __do_softirq+0xc8/0x221 [41958.710785] irq_exit+0xa6/0xb0 [41958.710788] smp_apic_timer_interrupt+0x4d/0x80 [41958.710791] apic_timer_interrupt+0xf/0x20 [41958.710794] </IRQ> Fixes: `cb2377a919` ("drm/i915: Fixup preempt-to-busy vs reset of a virtual request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191001103518.9113-1-chris@chris-wilson.co.uk	2019-10-01 11:46:52 +01:00
Michał Winiarski	e123752374	drm/i915/execlists: Use per-process HWSP as scratch Some of our commands (MI_FLUSH_DW / PIPE_CONTROL) require a post-sync write operation to be performed. Currently we're using dedicated VMA for PIPE_CONTROL and global HWSP for MI_FLUSH_DW. On execlists platforms, each of our contexts has an area that can be used as scratch space. Let's use that instead. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190926133142.2838-2-chris@chris-wilson.co.uk	2019-09-26 18:44:35 +01:00
Chris Wilson	f9d4eae25d	drm/i915/execlists: Simplify gen12_csb_parse Having decided that we only care about the promotion predicate, we can simplify gen12_csb_parse to simply check whether we need to jump to a new queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190925130845.17952-1-chris@chris-wilson.co.uk	2019-09-25 19:26:47 +01:00
Chris Wilson	7dc56af526	drm/i915/selftests: Verify the LRC register layout between init and HW Before we submit the first context to HW, we need to construct a valid image of the register state. This layout is defined by the HW and should match the layout generated by HW when it saves the context image. Asserting that this should be equivalent should help avoid any undefined behaviour and verify that we haven't missed anything important! Of course, having insisted that the initial register state within the LRC should match that returned by HW, we need to ensure that it does. v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for constructing the lrc image. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk	2019-09-24 17:27:19 +01:00
Chris Wilson	e2144503bf	drm/i915: Prevent bonded requests from overtaking each other on preemption Force bonded requests to run on distinct engines so that they cannot be shuffled onto the same engine where timeslicing will reverse the order. A bonded request will often wait on a semaphore signaled by its master, creating an implicit dependency -- if we ignore that implicit dependency and allow the bonded request to run on the same engine and before its master, we will cause a GPU hang. [Whether it will hang the GPU is debatable, we should keep on timeslicing and each timeslice should be "accidentally" counted as forward progress, in which case it should run but at one-half to one-third speed.] We can prevent this inversion by restricting which engines we allow ourselves to jump to upon preemption, i.e. baking in the arrangement established at first execution. (We should also consider capturing the implicit dependency using i915_sched_add_dependency(), but first we need to think about the constraints that requires on the execution/retirement ordering.) Fixes: `8ee36e048c` ("drm/i915/execlists: Minimalistic timeslicing") References: `ee1136908e` ("drm/i915/execlists: Virtual engine bonding") Testcase: igt/gem_exec_balancer/bonded-slice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk	2019-09-23 20:44:14 +01:00
Chris Wilson	cb2377a919	drm/i915: Fixup preempt-to-busy vs reset of a virtual request Due to the nature of preempt-to-busy the execlists active tracking and the schedule queue may become temporarily desync'ed (between resubmission to HW and its ack from HW). This means that we may have unwound a request and passed it back to the virtual engine, but it is still inflight on the HW and may even result in a GPU hang. If we detect that GPU hang and try to reset, the hanging request->engine will no longer match the current engine, which means that the request is not on the execlists active list and we should not try to find an older incomplete request. Given that we have deduced this must be a request on a virtual engine, it is the single active request in the context and so must be guilty (as the context is still inflight, it is prevented from being executed on another engine as we process the reset). Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk	2019-09-23 20:44:14 +01:00
Chris Wilson	b647c7df01	drm/i915: Fixup preempt-to-busy vs resubmission of a virtual request As preempt-to-busy leaves the request on the HW as the resubmission is processed, that request may complete in the background and even cause a second virtual request to enter queue. This second virtual request breaks our "single request in the virtual pipeline" assumptions. Furthermore, as the virtual request may be completed and retired, we lose the reference the virtual engine assumes is held. Normally, just removing the request from the scheduler queue removes it from the engine, but the virtual engine keeps track of its singleton request via its ve->request. This pointer needs protecting with a reference. v2: Drop unnecessary motion of rq->engine = owner Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-1-chris@chris-wilson.co.uk	2019-09-23 20:43:59 +01:00
Chris Wilson	0d7cf7bc15	drm/i915/execlists: Refactor -EIO markup of hung requests Pull setting -EIO on the hung requests into its own utility function. Having allowed ourselves to short-circuit submission of completed requests, we can now do the mark_eio() prior to submission and avoid some redundant operations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-4-chris@chris-wilson.co.uk	2019-09-23 16:21:38 +01:00
Chris Wilson	c0bb487dc1	drm/i915: Only enqueue already completed requests If we are asked to submit a completed request, just move it onto the active-list without modifying its payload. If we try to emit the modified payload of a completed request, we risk racing with the ring->head update during retirement which may advance the head past our breadcrumb and so we generate a warning for the emission being behind the RING_HEAD. v2: Commentary for the sneaky, shared responsibility between functions. v3: Spelling mistakes and bonus assertion Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk	2019-09-23 16:21:37 +01:00
Chris Wilson	3231f8c011	drm/i915/execlists: Drop redundant list_del_init(&rq->sched.link) Since amalgamating the queued and active lists in commit `422d7df4f0` ("drm/i915: Replace engine->timeline with a plain list"), performing a i915_request_submit() will remove the request from the execlists priority queue. References: `422d7df4f0` ("drm/i915: Replace engine->timeline with a plain list") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-2-chris@chris-wilson.co.uk	2019-09-23 16:21:36 +01:00
Chris Wilson	ae911b23d2	drm/i915/execlists: Relax assertion for a pinned context image on reset A gpu hang can occur at any time, given a sufficiently angry gpu. An example is when it forgets to perform a context-switch at the end of a request, leaving us with a hanging GPU on a completed request. Here, we may retire the request, only leaving its context alive via the active barrier. When we reset the GPU on a completed request, we do not modify its context image (just updating the ring state) and can safely defer the assertion that we have the image pinned and ready to modify. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111639 Fixes: `dffa8feb30` ("drm/i915/perf: Assert locking for i915_init_oa_perf_state()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-1-chris@chris-wilson.co.uk	2019-09-23 16:21:36 +01:00
Chris Wilson	d19d71fc2b	drm/i915: Mark i915_request.timeline as a volatile, rcu pointer The request->timeline is only valid until the request is retired (i.e. before it is completed). Upon retiring the request, the context may be unpinned and freed, and along with it the timeline may be freed. We therefore need to be very careful when chasing rq->timeline that the pointer does not disappear beneath us. The vast majority of users are in a protected context, either during request construction or retirement, where the timeline->mutex is held and the timeline cannot disappear. It is those few off the beaten path (where we access a second timeline) that need extra scrutiny -- to be added in the next patch after first adding the warnings about dangerous access. One complication, where we cannot use the timeline->mutex itself, is during request submission onto hardware (under spinlocks). Here, we want to check on the timeline to finalize the breadcrumb, and so we need to impose a second rule to ensure that the request->timeline is indeed valid. As we are submitting the request, its context and timeline must be pinned, as it will be used by the hardware. Since it is pinned, we know the request->timeline must still be valid, and we cannot submit the idle barrier until after we release the engine->active.lock, ergo while submitting and holding that spinlock, a second thread cannot release the timeline. v2: Don't be lazy inside selftests; hold the timeline->mutex for as long as we need it, and tidy up acquiring the timeline with a bit of refactoring (i915_active_add_request) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk	2019-09-20 10:24:09 +01:00
Chris Wilson	c45e788d95	drm/i915/tgl: Suspend pre-parser across GTT invalidations Before we execute a batch, we must first issue any and all TLB invalidations so that batch picks up the new page table entries. Tigerlake's preparser is weakening our post-sync CS_STALL inside the invalidate pipe-control and allowing the loading of the batch buffer before we have setup its page table (and so it loads the wrong page and executes indefinitely). The igt_cs_tlb indicates that this issue can only be observed on rcs, even though the preparser is common to all engines. Alternatively, we could do TLB shootdown via mmio on updating the GTT. By inserting the pre-parser disable inside EMIT_INVALIDATE, we will also accidentally fixup execution that writes into subsequent batches, such as gem_exec_whisper and even relocations performed on the GPU. We should be careful not to allow this disable to become baked into the uABI! The issue is that if userspace relies on our disabling of the HW optimisation, when we are ready to enable that optimisation, userspace will then be broken... Testcase: igt/i915_selftests/live_gtt/igt_cs_tlb Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111753 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190919151811.9526-1-chris@chris-wilson.co.uk	2019-09-20 09:47:52 +01:00
Chris Wilson	c210e85b8f	drm/i915/tgl: Extend MI_SEMAPHORE_WAIT On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to update the length field and emit that extra parameter and any padding noop as required. v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT v3: Use int instead of bool in the addition so that readers are not left wondering about the intricacies of the C spec. Now they just have to worry what the integer value of a boolean operation is... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk	2019-09-17 15:33:21 +01:00
Chris Wilson	ee73e2795b	drm/i915/tgl: Disable preemption while being debugged We see failures where the context continues executing past a preemption event, eventually leading to situations where a request has executed before we have even submitted it to HW! It seems like tgl is ignoring our RING_TAIL updates, but more likely is that there is a missing update required for our semaphore waits around preemption. v2: And disable internal semaphore usage Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190912132313.12751-1-chris@chris-wilson.co.uk	2019-09-12 20:45:23 +01:00
Chris Wilson	a17592effd	drm/i915/execlists: Ensure the context is reloaded after a GPU reset After we manipulate the context to allow replay after a GPU reset, force that context to be reloaded. This should be a layer of paranoia, for if the GPU was reset, the context will no longer be resident! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190912092933.4729-2-chris@chris-wilson.co.uk	2019-09-12 12:59:46 +01:00
Chris Wilson	582a6f90aa	drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset After a GPU reset, we need to drain all the CS events so that we have an accurate picture of the execlists state at the time of the reset. Be paranoid and force a read of the CSB write pointer from memory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190912092933.4729-1-chris@chris-wilson.co.uk	2019-09-12 12:59:45 +01:00
Chris Wilson	198d253366	drm/i915/execlists: Ignore lost completion events Icelake hit an issue where it missed reporting a completion event and instead jumped straight to an idle->active event (skipping over the active->idle and not even hitting the lite-restore preemption). 661497511us : process_csb: rcs0 cs-irq head=11, tail=0 661497512us : process_csb: rcs0 csb[0]: status=0x10008002:0x00000020 [lite-restore] 661497512us : trace_ports: rcs0: preempted { 28cc8:11052, 0:0 } 661497513us : trace_ports: rcs0: promote { 28cc8:11054, 0:0 } 661497514us : __i915_request_submit: rcs0 fence 28cc8:11056, current 11052 661497514us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes 661497515us : trace_ports: rcs0: submit { 28cc8:11056, 0:0 } 661497530us : process_csb: rcs0 cs-irq head=0, tail=1 661497530us : process_csb: rcs0 csb[1]: status=0x10008002:0x00000020 [lite-restore] 661497531us : trace_ports: rcs0: preempted { 28cc8:11054!, 0:0 } 661497535us : trace_ports: rcs0: promote { 28cc8:11056, 0:0 } 661497540us : __i915_request_submit: rcs0 fence 28cc8:11058, current 11054 661497544us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes 661497545us : trace_ports: rcs0: submit { 28cc8:11058, 0:0 } 661497553us : process_csb: rcs0 cs-irq head=1, tail=2 661497553us : process_csb: rcs0 csb[2]: status=0x10000001:0x00000000 [idle->active] 661497574us : process_csb: process_csb:1538 GEM_BUG_ON(*execlists->active) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190907084334.28952-1-chris@chris-wilson.co.uk	2019-09-10 11:39:59 +01:00
Chris Wilson	fa9a09f150	drm/i915/execlists: Clear STOP_RING bit on reset During reset, we try to ensure no forward progress of the CS prior to the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this register is context saved and so we end up in the odd situation where we save the STOP_RING bit and so try to stop the engine again immediately upon resume. This is quite unexpected and causes us to complain about an early CS completion event! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111514 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190910080208.4223-1-chris@chris-wilson.co.uk	2019-09-10 11:04:17 +01:00
Chris Wilson	d810583fc2	drm/i915/execlists: Remove incorrect BUG_ON for schedule-out As we may unwind incomplete requests (for preemption) prior to processing the CSB and the schedule-out events, we may update rq->engine (resetting it to point back to the parent virtual engine) prior to calling execlists_schedule_out(), invalidating the assertion that the request still points to the inflight engine. (The likelihood of this is increased if the CSB interrupt processing is pushed to the ksoftirqd for being too slow and direct submission overtakes it.) Tvrtko summarised it as: "So unwind from direct submission resets rq->engine and races with process_csb from the tasklet which notices request has actually completed." Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Fixes: `df40306902` ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190907105046.19934-1-chris@chris-wilson.co.uk	2019-09-09 11:28:06 +01:00
Michel Thierry	5bf05dc58d	drm/i915/tgl: Register state context definition for Gen12 Gen12 has subtle changes in the reg state context offsets (some fields are gone, some are in a different location), compared to previous Gens. The simplest approach seems to be keeping Gen12 (and future platform) changes apart from the previous gens, while keeping the registers that are contiguous in functions we can reuse. v2: alias, virtual engine, rpcs, prune unused regs v3: use engine base (Daniele), take ctx_bb for all Bspec: 46255 Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Tested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> [ickle: Tweaked the GEM_WARN_ON after settling on a compromise with Daniele] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190906122314.2146-2-mika.kuoppala@linux.intel.com	2019-09-06 18:12:58 +01:00
Mika Kuoppala	cdb736fa8b	drm/i915: Use engine relative LRIs on context setup Daniele pointed out that relative mmio works differently on context restore. Instead of adding the engine mmio base to offset, it masks out the base and adds bits [12:2] to current engine base. This should allow us to construct context register state to be applicable to all instances, including virtual. And avoid the trouble of updating the registers on virtual instances when submitting work. v2: only enable for gen12 for now (Mika) v3: make enabling readable (Chris) Bspec: 20206 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190906134957.25909-1-mika.kuoppala@linux.intel.com	2019-09-06 18:12:25 +01:00
Chris Wilson	dffa8feb30	drm/i915/perf: Assert locking for i915_init_oa_perf_state() We use the context->pin_mutex to serialise updates to the OA config and the register values written into each new context. Document this relationship and assert we do hold the context->pin_mutex as used by gen8_configure_all_contexts() to serialise updates to the OA config itself. v2: Add a white-lie for when we call intel_gt_resume() from init. v3: Lie while we have the context pinned inside atomic reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk	2019-08-31 16:08:28 +01:00
Chris Wilson	0b718ba1e8	drm/i915/gtt: Downgrade Cherryview back to aliasing-ppgtt With the upcoming change in timing (dramatically reducing the latency between manipulating the ppGTT and execution), no amount of tweaking could save Cherryview, it would always fail to invalidate its TLB. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190830180000.24608-2-chris@chris-wilson.co.uk	2019-08-30 20:49:56 +01:00
Chris Wilson	11988e3938	drm/i915/execlists: Try rearranging breadcrumb flush The addition of the DC_FLUSH failed to ensure sanctity of the post-sync write as CI immediately got a completion CS-event before the breadcrumb was coherent. So let's try the other idea of moving the post-sync write into the CS_STALL. References: https://bugs.freedesktop.org/show_bug.cgi?id=111514 References: `e8f6b4952e` ("drm/i915/execlists: Flush the post-sync breadcrumb write harder") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190829081150.10271-2-chris@chris-wilson.co.uk	2019-08-29 23:47:36 +01:00
Chris Wilson	e8f6b4952e	drm/i915/execlists: Flush the post-sync breadcrumb write harder Quite rarely we see that the CS completion event fires before the breadcrumb is coherent, which presumably is a result of the CS_STALL not waiting for the post-sync operation. Try throwing in a DC_FLUSH into the following pipecontrol to see if that makes any difference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190827120615.31390-1-chris@chris-wilson.co.uk	2019-08-28 14:05:31 +01:00
Daniele Ceraolo Spurio	8a9a982767	drm/i915: use a separate context for gpu relocs The CS pre-parser can pre-fetch commands across memory sync points and starting from gen12 it is able to pre-fetch across BB_START and BB_END boundaries as well, so when we emit gpu relocs the pre-parser might fetch the target location of the reloc before the memory write lands. The parser can't pre-fetch across the ctx switch, so we use a separate context to guarantee that the memory is synchronized before the parser can get to it. Note that there is no risk of the CS doing a lite restore from the reloc context to the user context, even if the two have the same hw_id, because since gen11 the CS also checks the LRCA when deciding if it can lite-restore. v2: limit new context to gen12+, release in eb_destroy, add a comment in emit_fini_breadcrumb (Chris). Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190827185805.21799-1-daniele.ceraolospurio@intel.com	2019-08-27 21:14:43 +01:00
Chris Wilson	cccdce1dd0	drm/i915: Make engine's batch pool safe for use with virtual engines A virtual engine itself does not have a batch pool, but we can gleefully use any of its siblings instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190827135935.3831-1-chris@chris-wilson.co.uk	2019-08-27 16:42:12 +01:00
Chris Wilson	a20ab592d1	drm/i915/execlists: Set priority hint prior to submission Since we now run process_csb() outside of the engine->active.lock, we can process a CS-event immediately upon our ELSP write. As we currently inspect the pending queue after the ELSP write, there is an opportunity for a CS-event to update the pending queue before we can read it, making us chase an invalid pointer. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111427 Fixes: `df40306902` ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190821142336.21609-1-chris@chris-wilson.co.uk	2019-08-21 17:32:27 +01:00
Lucas De Marchi	13e53c5c53	drm/i915/tgl: Introduce initial Tiger Lake workarounds Add empty workaround hooks for Tiger Lake. The workarounds will be added on separate patches. We were already applying WaRsForcewakeAddDelayForAck, which is indeed still valid, so also update the comment. Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190817093902.2171-21-lucas.demarchi@intel.com	2019-08-20 15:23:33 +01:00
Daniele Ceraolo Spurio	f4785682c9	drm/i915/tgl: Gen12 csb support The CSB format has been reworked for Gen12 to include information on both the context we're switching away from and the context we're switching to. After the change, some of the events don't have their own bit anymore and need to be inferred from other values in the csb. One of the context IDs (0x7FF) has also been reserved to indicate the invalid ctx, i.e. engine idle. Note that the full context ID includes the SW counter as well, but since we currently only care if the context is valid or not we can ignore that part. v2: fix mask size, fix and expand comments (Tvrtko), use if-ladder (Chris) Bspec: 45555, 46144 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190820102201.29849-1-chris@chris-wilson.co.uk	2019-08-20 15:23:24 +01:00
Daniele Ceraolo Spurio	487f471da3	drm/i915/tgl: add Gen12 default indirect ctx offset Gen12 uses a new indirect ctx offset. Bspec: 11740 Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190817093902.2171-28-lucas.demarchi@intel.com	2019-08-20 14:23:45 +01:00
Chris Wilson	9559c87513	drm/i915/selftests: Check the context size Add a redzone to our context image and check the HW does not write into it after a context save, to verify that we have the correct context size. (This does vary with feature bits, so test with a live setup that should match how we run userspace.) v2: Check the redzone on every context unpin v3: Use a kernel context to prevent loading garbage for ringbuffer submission Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190817073711.5897-1-chris@chris-wilson.co.uk	2019-08-17 09:27:58 +01:00
Chris Wilson	df40306902	drm/i915/execlists: Lift process_csb() out of the irq-off spinlock If we only call process_csb() from the tasklet, though we lose the ability to bypass ksoftirqd interrupt processing on direct submission paths, we can push it out of the irq-off spinlock. The penalty is that we then allow schedule_out to be called concurrently with schedule_in requiring us to handle the usage count (baked into the pointer itself) atomically. As we do kick the tasklets (via local_bh_enable()) after our submission, there is a possibility there to see if we can pull the local softirq processing back from the ksoftirqd. v2: Store the 'switch_priority_hint' on submission, so that we can safely check during process_csb(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190816171608.11760-1-chris@chris-wilson.co.uk	2019-08-16 20:59:02 +01:00
Chris Wilson	e5dadff4b0	drm/i915: Protect request retirement with timeline->mutex Forgo the struct_mutex requirement for request retirement as we have been transitioning over to only using the timeline->mutex for controlling the lifetime of a request on that timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk	2019-08-15 23:21:13 +01:00
Chris Wilson	531958f6f3	drm/i915/gt: Track timeline activeness in enter/exit Lift moving the timeline to/from the active_list on enter/exit in order to shorten the active tracking span in comparison to the existing pin/unpin. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-1-chris@chris-wilson.co.uk	2019-08-15 23:16:05 +01:00
Mika Kuoppala	845f7f7ecb	drm/i915/icl: Add gen11 specific render breadcrumbs Flush according to what gen11 expects when writing breadcrumbs. As only the seqnowrite + flush differs between engine and gens, enclose the footer to helper. v2: avoid problem of sane local naming by not using them Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190815094929.358-1-mika.kuoppala@linux.intel.com	2019-08-15 13:13:23 +01:00
Mika Kuoppala	8a8b540a6d	drm/i915/icl: Add command cache invalidate On the set of invalidations, we need to add command cache invalidate as a new domain. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190815083055.14132-2-mika.kuoppala@linux.intel.com	2019-08-15 13:13:23 +01:00
Mika Kuoppala	cfba6bd8b0	drm/i915/icl: Implement gen11 flush including tile cache Add tile cache flushing for gen11. To relieve us from the burden of previous obsolete workarounds, make a dedicated flush/invalidate callback for gen11. To fortify an independent single flush, do post sync op as there are indications that without it we don't flush everything. This should also make this callback more readily usable in tgl (see l3 fabric flush). v2: whitespacing Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190815083055.14132-1-mika.kuoppala@linux.intel.com	2019-08-15 13:13:23 +01:00
Chris Wilson	5f15c1e6e1	drm/i915/guc: Use a local cancel_port_requests Since execlists and the guc have diverged in their port tracking, we cannot simply reuse the execlists cancellation code as it leads to unbalanced reference counting. Use a local, simpler routine for the guc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190812203626.3948-1-chris@chris-wilson.co.uk	2019-08-13 07:54:39 +01:00
Chris Wilson	f597625d12	drm/i915/execlists: Avoid sync calls during park Since we allow ourselves to use non-process context during parking, we cannot allow ourselves to sleep and in particular cannot call del_timer_sync() -- but we can use a plain del_timer(). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111375 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190812091045.29587-1-chris@chris-wilson.co.uk	2019-08-12 13:17:59 +01:00
Chris Wilson	75d0a7f31e	drm/i915: Lift timeline into intel_context Move the timeline from being inside the intel_ring to intel_context itself. This saves much pointer dancing and makes the relations of the context to its timeline much clearer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-4-chris@chris-wilson.co.uk	2019-08-09 20:18:30 +01:00
Chris Wilson	48ae397b6b	drm/i915: Push the ring creation flags to the backend Push the ring creation flags from the outer GEM context to the inner intel_context to avoid an unsightly back-reference from inside the backend. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-3-chris@chris-wilson.co.uk	2019-08-09 20:18:30 +01:00
Chris Wilson	4c60b1aaa2	drm/i915/gt: Make deferred context allocation explicit Refactor the backends to handle the deferred context allocation in a consistent manner, and allow calling it as an explicit first step in pinning a context for the first time. This should make it easier for backends to keep track of partially constructed contexts from initialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-2-chris@chris-wilson.co.uk	2019-08-09 20:18:30 +01:00
Chris Wilson	6cd34b10cd	drm/i915/execlists: Backtrack along timeline After a preempt-to-busy, we may find an active request that is caught between execution states. Walk back along the timeline instead of the execution list to be safe. [ 106.417541] i915 0000:00:02.0: Resetting rcs0 for preemption time out [ 106.417659] ================================================================== [ 106.418041] BUG: KASAN: slab-out-of-bounds in __execlists_reset+0x2f2/0x440 [i915] [ 106.418123] Read of size 8 at addr ffff888703506b30 by task swapper/1/0 [ 106.418194] [ 106.418267] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U 5.3.0-rc3+ #5 [ 106.418344] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 [ 106.418434] Call Trace: [ 106.418508] <IRQ> [ 106.418585] dump_stack+0x5b/0x90 [ 106.418941] ? __execlists_reset+0x2f2/0x440 [i915] [ 106.419022] print_address_description+0x67/0x32d [ 106.419376] ? __execlists_reset+0x2f2/0x440 [i915] [ 106.419731] ? __execlists_reset+0x2f2/0x440 [i915] [ 106.419810] __kasan_report.cold.6+0x1a/0x3c [ 106.419888] ? __trace_bprintk+0xc0/0xd0 [ 106.420239] ? __execlists_reset+0x2f2/0x440 [i915] [ 106.420318] check_memory_region+0x144/0x1c0 [ 106.420671] __execlists_reset+0x2f2/0x440 [i915] [ 106.421029] execlists_reset+0x3d/0x50 [i915] [ 106.421387] intel_engine_reset+0x203/0x3a0 [i915] [ 106.421744] ? igt_reset_nop+0x2b0/0x2b0 [i915] [ 106.421825] ? _raw_spin_trylock_bh+0xe0/0xe0 [ 106.421901] ? rcu_core+0x1b9/0x6a0 [ 106.422251] preempt_reset+0x9a/0xf0 [i915] [ 106.422333] tasklet_action_common.isra.15+0xc0/0x1e0 [ 106.422685] ? 
execlists_submit_request+0x200/0x200 [i915] [ 106.422764] __do_softirq+0x106/0x3cf [ 106.422840] irq_exit+0xdc/0xf0 [ 106.422914] smp_apic_timer_interrupt+0x81/0x1c0 [ 106.422988] apic_timer_interrupt+0xf/0x20 [ 106.423059] </IRQ> [ 106.423144] RIP: 0010:cpuidle_enter_state+0xc3/0x620 [ 106.423222] Code: 24 0f 1f 44 00 00 31 ff e8 da 87 9c ff 80 7c 24 10 00 74 12 9c 58 f6 c4 02 0f 85 33 05 00 00 31 ff e8 c1 77 a3 ff fb 45 85 e4 <0f> 89 bf 02 00 00 48 8d 7d 10 e8 4e 45 b9 ff c7 45 10 00 00 00 00 [ 106.423311] RSP: 0018:ffff88881c30fda8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13 [ 106.423390] RAX: 0000000000000000 RBX: ffffffff825b4c80 RCX: ffffffff810c8a00 [ 106.423465] RDX: dffffc0000000000 RSI: 0000000039f89620 RDI: ffff88881f6b00a8 [ 106.423540] RBP: ffff88881f6b5bf8 R08: 0000000000000002 R09: 000000000002ed80 [ 106.423616] R10: 0000003fdd956146 R11: ffff88881c2d1e47 R12: 0000000000000008 [ 106.423691] R13: 0000000000000008 R14: ffffffff825b4f80 R15: ffffffff825b4fc0 [ 106.423772] ? sched_idle_set_state+0x20/0x30 [ 106.423851] ? cpuidle_enter_state+0xa6/0x620 [ 106.423874] ? tick_nohz_idle_stop_tick+0x1d1/0x3f0 [ 106.423896] cpuidle_enter+0x37/0x60 [ 106.423919] do_idle+0x246/0x280 [ 106.423941] ? arch_cpu_idle_exit+0x30/0x30 [ 106.423964] ? __wake_up_common+0x46/0x240 [ 106.423986] cpu_startup_entry+0x14/0x20 [ 106.424009] start_secondary+0x1b0/0x200 [ 106.424031] ? 
set_cpu_sibling_map+0x990/0x990 [ 106.424054] secondary_startup_64+0xa4/0xb0 [ 106.424075] [ 106.424096] Allocated by task 626: [ 106.424119] save_stack+0x19/0x80 [ 106.424143] __kasan_kmalloc.constprop.7+0xc1/0xd0 [ 106.424165] kmem_cache_alloc+0xb2/0x1d0 [ 106.424277] i915_sched_lookup_priolist+0x1ab/0x320 [i915] [ 106.424385] execlists_submit_request+0x73/0x200 [i915] [ 106.424498] submit_notify+0x59/0x60 [i915] [ 106.424600] __i915_sw_fence_complete+0x9b/0x330 [i915] [ 106.424713] __i915_request_commit+0x4bf/0x570 [i915] [ 106.424818] intel_engine_pulse+0x213/0x310 [i915] [ 106.424925] context_close+0x22f/0x470 [i915] [ 106.425033] i915_gem_context_destroy_ioctl+0x7b/0xa0 [i915] [ 106.425058] drm_ioctl_kernel+0x131/0x170 [ 106.425081] drm_ioctl+0x2d9/0x4f1 [ 106.425104] do_vfs_ioctl+0x115/0x890 [ 106.425126] ksys_ioctl+0x35/0x70 [ 106.425147] __x64_sys_ioctl+0x38/0x40 [ 106.425169] do_syscall_64+0x66/0x220 [ 106.425191] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 106.425213] [ 106.425234] Freed by task 0: [ 106.425255] (stack is not available) [ 106.425276] [ 106.425297] The buggy address belongs to the object at ffff888703506a40 [ 106.425297] which belongs to the cache i915_priolist of size 104 [ 106.425321] The buggy address is located 136 bytes to the right of [ 106.425321] 104-byte region [ffff888703506a40, ffff888703506aa8) [ 106.425345] The buggy address belongs to the page: [ 106.425367] page:ffffea001c0d4180 refcount:1 mapcount:0 mapping:ffff88873e1cf740 index:0xffff888703506e40 compound_mapcount: 0 [ 106.425391] flags: 0x8000000000010200(slab\|head) [ 106.425415] raw: 8000000000010200 ffffea0020192b88 ffff8888174b5450 ffff88873e1cf740 [ 106.425439] raw: ffff888703506e40 000000000010000e 00000001ffffffff 0000000000000000 [ 106.425464] page dumped because: kasan: bad access detected [ 106.425486] [ 106.425506] Memory state around the buggy address: [ 106.425528] ffff888703506a00: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 [ 106.425551] 
ffff888703506a80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc [ 106.425573] >ffff888703506b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 106.425597] ^ [ 106.425619] ffff888703506b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 106.425642] ffff888703506c00: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 [ 106.425664] ================================================================== Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809073723.6593-1-chris@chris-wilson.co.uk	2019-08-09 13:32:29 +01:00
Jani Nikula	db94e9f133	drm/i915: extract i915_perf.h from i915_drv.h It used to be handy that we only had a couple of headers, but over time i915_drv.h has become unwieldy. Extract declarations to a separate header file corresponding to the implementation module, clarifying the modularity of the driver. Ensure the new header is self-contained, and do so with minimal further includes, using forward declarations as needed. Include the new header only where needed, and sort the modified include directives while at it and as needed. No functional changes. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/d7826e365695f691a3ac69a69ff6f2bbdb62700d.1565271681.git.jani.nikula@intel.com	2019-08-09 11:52:04 +03:00
Chris Wilson	c7302f2044	drm/i915: Defer final intel_wakeref_put to process context As we need to acquire a mutex to serialise the final intel_wakeref_put, we need to ensure that we are in process context at that time. However, we want to allow operation on the intel_wakeref from inside timer and other hardirq context, which means that we need to defer that final put to a workqueue. Inside the final wakeref puts, we are safe to operate in any context, as we are simply marking up the HW and state tracking for the potential sleep. It's only the serialisation with the potential sleeping getting that requires careful wait avoidance. This allows us to retain the immediate processing as before (we only need to sleep over the same races as the current mutex_lock). v2: Add a selftest to ensure we exercise the code while lockdep watches. v3: That test was extremely loud and complained about many things! v4: Not a whale! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111295 References: https://bugs.freedesktop.org/show_bug.cgi?id=111245 References: https://bugs.freedesktop.org/show_bug.cgi?id=111256 Fixes: `18398904ca` ("drm/i915: Only recover active engines") Fixes: `51fbd8de87` ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190808202758.10453-1-chris@chris-wilson.co.uk	2019-08-08 21:28:51 +01:00
Jani Nikula	a09d9a8002	drm/i915: avoid including intel_drv.h via i915_drv.h->i915_trace.h Disentangle i915_drv.h from intel_drv.h, which gets included via i915_trace.h. This necessitates including i915_trace.h wherever it's needed. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ed82bf259d3b725a1a1a3c3e9d6fb5c08bc4d489.1565085691.git.jani.nikula@intel.com	2019-08-07 12:43:14 +03:00
Chris Wilson	a1c9ca223c	drm/i915: Remove lrc default desc from GEM context We only compute the lrc_descriptor() on pinning the context, i.e. infrequently, so we do not benefit from storing the template as the addressing mode is also fixed for the lifetime of the intel_context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730133035.1977-9-chris@chris-wilson.co.uk	2019-08-01 17:37:02 +01:00
Chris Wilson	10e36489ab	drm/i915/execlists: Always clear pending&inflight requests on reset If we skip the reset as we found the engine inactive at the time of the reset, we still need to clear the residual inflight & pending request bookkeeping to reflect the current state of HW. Otherwise, we may end up stuck in a loop like: <7> [416.490346] hangcheck rcs0 <7> [416.490371] hangcheck Awake? 1 <7> [416.490376] hangcheck Hangcheck: 8003 ms ago <7> [416.490380] hangcheck Reset count: 0 (global 0) <7> [416.490383] hangcheck Requests: <7> [416.491210] hangcheck RING_START: 0x0017b000 <7> [416.491983] hangcheck RING_HEAD: 0x00000048 <7> [416.491992] hangcheck RING_TAIL: 0x00000048 <7> [416.492006] hangcheck RING_CTL: 0x00000000 <7> [416.492037] hangcheck RING_MODE: 0x00000200 [idle] <7> [416.492044] hangcheck RING_IMR: 00000000 <7> [416.492809] hangcheck ACTHD: 0x00000000_9ca00048 <7> [416.492824] hangcheck BBADDR: 0x00000000_00001004 <7> [416.492838] hangcheck DMA_FADDR: 0x00000000_00000000 <7> [416.492845] hangcheck IPEIR: 0x00000000 <7> [416.492852] hangcheck IPEHR: 0x00000000 <7> [416.492863] hangcheck Execlist status: 0x00018001 00000000, entries 12 <7> [416.492869] hangcheck Execlist CSB read 1, write 1, tasklet queued? no (enabled) <7> [416.492938] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 8307ms: signaled <7> [416.492972] hangcheck Queue priority hint: -4093 <7> [416.492979] hangcheck Q 20ffa:16fd8- prio=-4093 @ 8307ms: [i915] <7> [416.492985] hangcheck Q 20ffa:16fda prio=-4094 @ 8307ms: [i915] <7> [416.492990] hangcheck Q 20ffa:16fdc prio=-4094 @ 8307ms: [i915] <7> [416.492996] hangcheck Q 20ffa:16fde prio=-4094 @ 8307ms: [i915] <7> [416.493001] hangcheck Q 20ffa:16fe0 prio=-4094 @ 8307ms: [i915] <7> [416.493007] hangcheck Q 20ffa:16fe2 prio=-4094 @ 8307ms: [i915] <7> [416.493013] hangcheck Q 20ffa:16fe4 prio=-4094 @ 8307ms: [i915] <7> [416.493021] hangcheck ...skipping 21 queued requests... 
<7> [416.493027] hangcheck Q 20ffa:17010 prio=-4094 @ 8307ms: [i915] <7> [416.493081] hangcheck HWSP: <7> [416.493089] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [416.493094] hangcheck * <7> [416.493100] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000 <7> [416.493106] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000 <7> [416.493111] hangcheck * <7> [416.493117] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 <7> [416.493123] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [416.493127] hangcheck * <7> [416.493132] hangcheck Idle? no <6> [416.512124] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, hang on rcs0 <6> [416.512205] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. <6> [416.512207] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel <6> [416.512208] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. <6> [416.512210] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. <6> [416.512212] [drm] GPU crash dump saved to /sys/class/drm/card0/error <5> [416.513602] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0 <7> [424.489258] hangcheck rcs0 <7> [424.489263] hangcheck Awake? 
1 <7> [424.489267] hangcheck Hangcheck: 5954 ms ago <7> [424.489271] hangcheck Reset count: 1 (global 0) <7> [424.489274] hangcheck Requests: <7> [424.490128] hangcheck RING_START: 0x00000000 <7> [424.490870] hangcheck RING_HEAD: 0x00000000 <7> [424.490877] hangcheck RING_TAIL: 0x00000000 <7> [424.490887] hangcheck RING_CTL: 0x00000000 <7> [424.490897] hangcheck RING_MODE: 0x00000200 [idle] <7> [424.490904] hangcheck RING_IMR: 00000000 <7> [424.490917] hangcheck ACTHD: 0x00000000_00000000 <7> [424.490930] hangcheck BBADDR: 0x00000000_00000000 <7> [424.490943] hangcheck DMA_FADDR: 0x00000000_00000000 <7> [424.490950] hangcheck IPEIR: 0x00000000 <7> [424.490956] hangcheck IPEHR: 0x00000000 <7> [424.490968] hangcheck Execlist status: 0x00000001 00000000, entries 12 <7> [424.490972] hangcheck Execlist CSB read 11, write 11, tasklet queued? no (enabled) <7> [424.490983] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 16305ms: signaled <7> [424.490989] hangcheck Queue priority hint: -4093 <7> [424.490996] hangcheck Q 20ffa:16fd8- prio=-4093 @ 16305ms: [i915] <7> [424.491001] hangcheck Q 20ffa:16fda prio=-4094 @ 16305ms: [i915] <7> [424.491006] hangcheck Q 20ffa:16fdc prio=-4094 @ 16305ms: [i915] <7> [424.491011] hangcheck Q 20ffa:16fde prio=-4094 @ 16305ms: [i915] <7> [424.491016] hangcheck Q 20ffa:16fe0 prio=-4094 @ 16305ms: [i915] <7> [424.491022] hangcheck Q 20ffa:16fe2 prio=-4094 @ 16305ms: [i915] <7> [424.491048] hangcheck Q 20ffa:16fe4 prio=-4094 @ 16305ms: [i915] <7> [424.491057] hangcheck ...skipping 21 queued requests... 
<7> [424.491063] hangcheck Q 20ffa:17010 prio=-4094 @ 16305ms: [i915] <7> [424.491095] hangcheck HWSP: <7> [424.491102] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [424.491106] hangcheck * <7> [424.491113] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000 <7> [424.491118] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000 <7> [424.491122] hangcheck * <7> [424.491127] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000b <7> [424.491133] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [424.491136] hangcheck * <7> [424.491141] hangcheck Idle? no <5> [424.491834] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0 Where not having cleared the pending array on reset, it persists indefinitely. Fixes: `fff8102aae` ("drm/i915/execlists: Process interrupted context on reset") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730133035.1977-2-chris@chris-wilson.co.uk	2019-08-01 09:24:59 +01:00
Chris Wilson	f5d974f9d2	drm/i915/gt: Provide a local intel_context.vm Track the currently bound address space used by the HW context. Minor conversions to use the local intel_context.vm are made, leaving behind some more surgery required to make intel_context the primary through the selftests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730143209.4549-2-chris@chris-wilson.co.uk	2019-07-30 16:09:35 +01:00
Chris Wilson	a562772166	drm/i915: Inline engine->init_context into its caller We only use the init_context vfunc once while recording the default context state, and we use the same sequence in each backend (eliding steps that do not apply). Remove the vfunc for simplicity and de-duplication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190729113720.24830-1-chris@chris-wilson.co.uk	2019-07-30 11:50:42 +01:00
Chris Wilson	df8cf31e74	drm/i915/gt: Hook up intel_context_fini() Prior to freeing the struct, call the fini function to cleanup the common members. Currently this only calls the debug functions to mark the structs as destroyed, but may be extended to real work in future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190718070024.21781-2-chris@chris-wilson.co.uk	2019-07-22 23:20:07 +01:00
Chris Wilson	7d6b60dbc6	drm/i915/execlists: Cancel breadcrumb on preempting the virtual engine As we unwind the requests for a preemption event, we return a virtual request back to its original virtual engine (so that it is available for execution on any of its siblings). In the process, this means that its breadcrumb should no longer be associated with the original physical engine, and so we are forced to decouple it. Previously, as the request could not complete without our awareness, we would move it to the next real engine without any danger. However, preempt-to-busy allowed for requests to continue on the HW and complete in the background as we unwound, which meant that we could end up retiring the request before fixing up the breadcrumb link. [51679.517943] INFO: trying to register non-static key. [51679.517956] the code is fine but needs lockdep annotation. [51679.517960] turning off the locking correctness validator. [51679.517966] CPU: 0 PID: 3270 Comm: kworker/u8:0 Tainted: G U 5.2.0+ #717 [51679.517971] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 [51679.518012] Workqueue: i915 retire_work_handler [i915] [51679.518017] Call Trace: [51679.518026] dump_stack+0x67/0x90 [51679.518031] register_lock_class+0x52c/0x540 [51679.518038] ? find_held_lock+0x2d/0x90 [51679.518042] __lock_acquire+0x68/0x1800 [51679.518047] ? find_held_lock+0x2d/0x90 [51679.518073] ? __i915_sw_fence_complete+0xff/0x1c0 [i915] [51679.518079] lock_acquire+0x90/0x170 [51679.518105] ? i915_request_cancel_breadcrumb+0x29/0x160 [i915] [51679.518112] _raw_spin_lock+0x27/0x40 [51679.518138] ? 
i915_request_cancel_breadcrumb+0x29/0x160 [i915] [51679.518165] i915_request_cancel_breadcrumb+0x29/0x160 [i915] [51679.518199] i915_request_retire+0x43f/0x530 [i915] [51679.518232] retire_requests+0x4d/0x60 [i915] [51679.518263] i915_retire_requests+0xdf/0x1f0 [i915] [51679.518294] retire_work_handler+0x4c/0x60 [i915] [51679.518301] process_one_work+0x22c/0x5c0 [51679.518307] worker_thread+0x37/0x390 [51679.518311] ? process_one_work+0x5c0/0x5c0 [51679.518316] kthread+0x116/0x130 [51679.518320] ? kthread_create_on_node+0x40/0x40 [51679.518325] ret_from_fork+0x24/0x30 [51679.520177] ------------[ cut here ]------------ [51679.520189] list_del corruption, ffff88883675e2f0->next is LIST_POISON1 (dead000000000100) Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-4-chris@chris-wilson.co.uk	2019-07-19 12:53:29 +01:00
Chris Wilson	c30d5dc653	drm/i915/gt: Push engine stopping into reset-prepare Push the engine stop back into reset_prepare (where it already was!) This allows us to avoid dangerously setting the RING registers to 0 for logical contexts. If we clear the register on a live context, those invalid register values are recorded in the logical context state and replayed (with hilarious results). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-2-chris@chris-wilson.co.uk	2019-07-17 18:47:00 +01:00
Chris Wilson	fff8102aae	drm/i915/execlists: Process interrupted context on reset By stopping the rings, we may trigger an arbitration point resulting in a premature context-switch (i.e. a completion event before the request is actually complete). This clears the active context before the reset, but we must remember to rewind the incomplete context for replay upon resume. Fixes: `1863e3020a` ("drm/i915/execlists: Always reset the context's RING registers") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-3-chris@chris-wilson.co.uk	2019-07-17 14:44:57 +01:00
Chris Wilson	a9877da2d6	drm/i915/oa: Reconfigure contexts on the fly Avoid a global idle barrier by reconfiguring each context by rewriting them with MI_STORE_DWORD from the kernel context. v2: We only need to determine the desired register values once, they are the same for all contexts. v3: Don't remove the kernel context from the list of known GEM contexts; the world is not ready for that yet. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190716213443.9874-1-chris@chris-wilson.co.uk	2019-07-17 07:58:27 +01:00
Chris Wilson	09975b861a	drm/i915/execlists: Disable preemption under GVT Preempt-to-busy uses a GPU semaphore to enforce an idle-barrier across preemption, but mediated gvt does not fully support semaphores. v2: Fiddle around with the flags and settle on using has-semaphores for the core bits so that we retain the ability to preempt our own semaphores. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Xiaolin Zhang <xiaolin.zhang@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190709091233.8573-1-chris@chris-wilson.co.uk	2019-07-16 14:06:45 +01:00
Chris Wilson	cb823ed991	drm/i915/gt: Use intel_gt as the primary object for handling resets Having taken the first step in encapsulating the functionality by moving the related files under gt/, the next step is to start encapsulating by passing around the relevant structs rather than the global drm_i915_private. In this step, we pass intel_gt to intel_reset.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk	2019-07-12 21:06:56 +01:00
Chris Wilson	58d1b42714	drm/i915/execlists: Record preemption for selftests Put back the preemption counters lost in commit `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") so that our selftests that assert no preemption took place continue to function. v2: But a timeslice is only a "soft" preemption! Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190710064454.682-1-chris@chris-wilson.co.uk	2019-07-10 08:46:35 +01:00
Lionel Landwerlin	2a98f4e65b	drm/i915: add infrastructure to hold off preemption on a request We want to set this flag in the next commit on requests containing perf queries so that the result of the perf query can just be a delta of global counters, rather than doing post processing of the OA buffer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [ickle: add basic selftest for nopreempt] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190709164227.25859-1-chris@chris-wilson.co.uk	2019-07-09 21:26:40 +01:00
Lionel Landwerlin	46c5847e3d	drm/i915: enumerate scratch fields We have a bunch of offsets in the scratch buffer. As we're about to add some more, let's group all of the offsets in a common location. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190709123351.5645-6-lionel.g.landwerlin@intel.com	2019-07-09 21:26:40 +01:00
Chris Wilson	ab9e2f7776	drm/i915/gt: Pull engine w/a initialisation into common We need to setup the workarounds on all engines, with the knowledge about which platforms each workaround applies to kept together in the workaround list. As such, we can pull the w/a initialisation into the common setup and try to avoid duplicating knowledge about when to setup the workarounds. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190703135805.7310-2-chris@chris-wilson.co.uk	2019-07-04 19:22:11 +01:00
Chris Wilson	313443b16a	drm/i915/gt: Assume we hold forcewake for execlists resume We can assume the caller is holding a blanket forcewake for the register writes during resume, and so we can skip taking individual locks around each write inside execlists resume. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190703155225.9501-3-chris@chris-wilson.co.uk	2019-07-04 14:42:38 +01:00
Chris Wilson	2006058e99	drm/i915: Move the renderstate setup under gt/ The render state is used to initialise the default RCS context, and only used during early setup from within the gt code. As such, it makes a good candidate for placing within gt/, even if it is not yet entirely clean of our GEM heritage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190704091925.7391-1-chris@chris-wilson.co.uk	2019-07-04 11:48:22 +01:00
Chris Wilson	ad9e3792b0	drm/i915/execlists: Hesitate before slicing Be a little more hesitant before injecting a timeslice, and try to take into account any change in priority that is due for the running task before switching to another task. This will allow us to arbitrarily prevent switching away from a request if we deem it necessary to disable preemption, for instance. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-9-chris@chris-wilson.co.uk	2019-07-03 11:20:35 +01:00
Chris Wilson	8759aa4cc1	drm/i915/execlists: Refactor CSB state machine Daniele pointed out that the CSB status information will change with Tigerlake and suggested that we could rearrange our state machine to hide the differences in generation. gcc also prefers the explicit state machine, so make it so: process_csb 1980 1967 -13 Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190701100502.15639-4-chris@chris-wilson.co.uk	2019-07-01 17:23:54 +01:00
Chris Wilson	5f22e5b311	drm/i915: Rename intel_wakeref_[is]_active Our general rule is to use is/has as the verb for boolean functions, rename intel_wakeref_active to intel_wakeref_is_active so the question being asked is clear. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190625130128.11009-6-chris@chris-wilson.co.uk	2019-06-25 20:17:22 +01:00
Chris Wilson	07bfe6bf10	drm/i915/execlists: Convert recursive defer_request() into iterative As this engine owns the lock around rq->sched.link (for those waiters submitted to this engine), we can use that link as an element in a local list. We can thus replace the recursive algorithm with an iterative walk over the ordered list of waiters. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190625130128.11009-1-chris@chris-wilson.co.uk	2019-06-25 20:17:22 +01:00
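The recursive-to-iterative conversion described in the entry above can be sketched as follows. This is a minimal illustration, not the driver's code: `struct request`, its `link`, `waiters` and `deferred` fields, and this `defer_request()` signature are all hypothetical. The point it shows is that an intrusive link owned by the caller can double as the element of a local FIFO, so the transitive set of waiters is walked in order without recursion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4

/* Hypothetical, simplified request; not the i915 structure. */
struct request {
	int id;
	bool deferred;                        /* already placed on the local list? */
	struct request *link;                 /* intrusive link, reused for the local list */
	struct request *waiters[MAX_WAITERS]; /* requests that wait on this one */
	size_t num_waiters;
};

/*
 * Defer rq and everything that transitively waits on it, without recursion.
 * The local FIFO is threaded through request::link. Returns the number of
 * requests deferred; the visiting order is recorded in out[] (up to cap).
 */
static size_t defer_request(struct request *rq, struct request **out, size_t cap)
{
	struct request *head = rq, *tail = rq;
	size_t n = 0;

	assert(rq && !rq->deferred);
	rq->deferred = true;
	rq->link = NULL;

	while (head) {
		struct request *cur = head;

		/* Append any waiter not yet queued to the tail of the list. */
		for (size_t i = 0; i < cur->num_waiters; i++) {
			struct request *w = cur->waiters[i];

			if (w->deferred)
				continue; /* reached earlier via another path */
			w->deferred = true;
			w->link = NULL;
			tail->link = w;
			tail = w;
		}

		if (n < cap)
			out[n] = cur;
		n++;
		head = cur->link;
	}

	return n;
}
```

Because waiters are appended behind the request that discovered them, a request is always visited before the requests that depend on it, which is the ordering property the recursive version provided.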
Chris Wilson	8db7933ee3	drm/i915/execlists: Always clear ring_pause if we do not submit In the unlikely case (thank you CI!), we may find ourselves wanting to issue a preemption but having no runnable requests left. In this case, we set the semaphore before computing the preemption and so must unset it before forgetting (or else we leave the machine busywaiting until the next request comes along and so likely hang). v2: Replace readback with only a wmb after asserting the semaphore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190624092009.30189-1-chris@chris-wilson.co.uk	2019-06-24 11:42:37 +01:00
Chris Wilson	12c255b5da	drm/i915: Provide an i915_active.acquire callback If we introduce a callback for i915_active that is only called the first time we use the i915_active and is symmetrically paired with the i915_active.retire callback, we can replace the open-coded and non-atomic implementations -- which will be very fragile (i.e. broken) upon removing the struct_mutex serialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-4-chris@chris-wilson.co.uk	2019-06-21 19:47:55 +01:00
Chris Wilson	a93615f900	drm/i915: Throw away the active object retirement complexity Remove the accumulated optimisations that we have for i915_vma_retire and reduce it to the bare essential of tracking the active object reference. This allows us to only use atomic operations, and so will be able to avoid the struct_mutex requirement. The principal loss here is the shrinker MRU bumping, so now if we have to shrink, we will do so in much more random order and more likely to try and shrink recently used objects. That is a nuisance, but shrinking active objects is a second step we try to avoid and will always be a system-wide performance issue. The other loss here is in the automatic pruning of the reservation_object when idling. This is not as large an issue as upon reservation_object introduction as now adding new fences into the object replaces already signaled fences, keeping the array compact. But we do lose the auto-expiration of stale fences and unused arrays. That may be a noticeable problem for which we need to re-implement autopruning. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-3-chris@chris-wilson.co.uk	2019-06-21 19:47:51 +01:00
Tvrtko Ursulin	db56f97494	drm/i915: Eliminate dual personality of i915_scratch_offset Scratch vma lives under gt but the API used to work on i915. Make this consistent by renaming the function to intel_gt_scratch_offset and make it take struct intel_gt. v2: * Move to intel_gt. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-33-tvrtko.ursulin@linux.intel.com	2019-06-21 13:49:00 +01:00
Tvrtko Ursulin	f0c02c1b91	drm/i915: Rename i915_timeline to intel_timeline and move under gt Move all timeline code under gt and rename to intel_gt prefix. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com	2019-06-21 13:48:53 +01:00
Tvrtko Ursulin	4c6d51ea2a	drm/i915: Make timelines gt centric Our timelines are stored inside intel_gt so we can convert the interface to take exactly that and not i915. At the same time re-order the params to our more typical layout and replace the backpointer to the new containing structure. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-31-tvrtko.ursulin@linux.intel.com	2019-06-21 13:48:51 +01:00
Tvrtko Ursulin	ba4134a419	drm/i915: Save trip via top-level i915 in a few more places For gt related operations it makes more logical sense to stay in the realm of gt instead of dereferencing via driver i915. This patch handles a few of the easy ones with work requiring more refactoring still outstanding. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-30-tvrtko.ursulin@linux.intel.com	2019-06-21 13:48:48 +01:00
Tvrtko Ursulin	f937f5613b	drm/i915: Store backpointer to intel_gt in the engine It will come useful in the next patch. v2: * Do mock_engine as well. v3: * And the virtual engine... Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-11-tvrtko.ursulin@linux.intel.com	2019-06-21 13:48:26 +01:00
Chris Wilson	12fdaf19e0	drm/i915/execlists: Keep virtual context alive until after we kick The call to kick_siblings() dereferences the rq->context, so we should not drop our local reference until afterwards! v2: Stick to setting ce.inflight=NULL before kicking as this is what the other threads will check to see if the context is ready for takeover. Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621080729.2652-1-chris@chris-wilson.co.uk	2019-06-21 10:11:05 +01:00
Chris Wilson	8ee36e048c	drm/i915/execlists: Minimalistic timeslicing If we have multiple contexts of equal priority pending execution, activate a timer to demote the currently executing context in favour of the next in the queue when that timeslice expires. This enforces fairness between contexts (so long as they allow preemption -- forced preemption, in the future, will kick those who do not obey) and allows us to avoid userspace blocking forward progress with e.g. unbounded MI_SEMAPHORE_WAIT. For the starting point here, we use the jiffie as our timeslice so that we should be reasonably efficient wrt frequent CPU wakeups. Testcase: igt/gem_exec_scheduler/semaphore-resolve Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-2-chris@chris-wilson.co.uk	2019-06-20 16:52:36 +01:00
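The timeslicing rule in the entry above (demote the running context in favour of the next equal-priority context when the slice expires) can be sketched as a queue rotation. This is an illustration under stated assumptions only: `struct ctx`, `need_timeslice()` and `timeslice_expired()` are hypothetical names, the queue is assumed to be sorted by descending priority with the running context at index 0, and the timer itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context descriptor; queue[0] is the one currently running. */
struct ctx {
	int id;
	int prio;
};

/* A timeslice is only needed if the next context has the same priority. */
static bool need_timeslice(const struct ctx *queue, size_t count)
{
	assert(queue != NULL || count == 0);
	return count > 1 && queue[1].prio == queue[0].prio;
}

/* On slice expiry, move the running context behind its equal-priority peers. */
static void timeslice_expired(struct ctx *queue, size_t count)
{
	struct ctx running;
	size_t i;

	if (!need_timeslice(queue, count))
		return;

	running = queue[0];
	for (i = 1; i < count && queue[i].prio == running.prio; i++)
		queue[i - 1] = queue[i];
	queue[i - 1] = running;
}
```

Lower-priority contexts are never moved ahead by the rotation, so the sketch only enforces fairness among peers, matching the "equal priority" wording of the commit message.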
Chris Wilson	22b7a426bb	drm/i915/execlists: Preempt-to-busy When using a global seqno, we required a precise stop-the-world event to handle preemption and unwind the global seqno counter. To accomplish this, we would preempt to a special out-of-band context and wait for the machine to report that it was idle. Given an idle machine, we could very precisely see which requests had completed and which we needed to feed back into the run queue. However, now that we have scrapped the global seqno, we no longer need to precisely unwind the global counter and only track requests by their per-context seqno. This allows us to loosely unwind inflight requests while scheduling a preemption, with the enormous caveat that the requests we put back on the run queue are still _inflight_ (until the preemption request is complete). This makes request tracking much more messy, as at any point then we can see a completed request that we believe is not currently scheduled for execution. We also have to be careful not to rewind RING_TAIL past RING_HEAD on preempting to the running context, and for this we use a semaphore to prevent completion of the request before continuing. To accomplish this feat, we change how we track requests scheduled to the HW. Instead of appending our requests onto a single list as we submit, we track each submission to ELSP as its own block. Then upon receiving the CS preemption event, we promote the pending block to the inflight block (discarding what was previously being tracked). As normal CS completion events arrive, we then remove stale entries from the inflight tracker. v2: Be a tinge paranoid and ensure we flush the write into the HWS page for the GPU semaphore to pick in a timely fashion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk	2019-06-20 16:52:36 +01:00
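The block-based tracking described in the entry above (each ELSP submission is one block, the pending block is promoted on the preemption event, and completion events trim the inflight block) might look like this in miniature. Everything here is an assumption made for illustration: `struct tracker`, the two-port limit, the use of non-zero integers as request ids, and the three helper names are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_PORTS 2

/* Hypothetical tracker; ids are non-zero and both arrays are 0-terminated. */
struct tracker {
	int inflight[NUM_PORTS + 1]; /* what the HW is believed to be running */
	int pending[NUM_PORTS + 1];  /* the last block written to ELSP */
};

/* Record one ELSP submission as its own block. */
static void submit_block(struct tracker *t, const int *ids, size_t count)
{
	assert(count <= NUM_PORTS);
	memset(t->pending, 0, sizeof(t->pending));
	memcpy(t->pending, ids, count * sizeof(*ids));
}

/* Preemption event: pending replaces whatever was tracked as inflight. */
static void promote(struct tracker *t)
{
	memcpy(t->inflight, t->pending, sizeof(t->inflight));
	memset(t->pending, 0, sizeof(t->pending));
}

/* Completion event: drop the stale head entry and shift the rest up. */
static void complete_head(struct tracker *t)
{
	assert(t->inflight[0] != 0);
	memmove(&t->inflight[0], &t->inflight[1], NUM_PORTS * sizeof(int));
	t->inflight[NUM_PORTS] = 0;
}
```

Note that between `submit_block()` and `promote()` the old inflight block is still the one being tracked, which mirrors the caveat in the commit message that unwound requests remain inflight until the preemption completes.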
Chris Wilson	09c5ab384f	drm/i915: Keep rings pinned while the context is active Remember to keep the rings pinned as well as the context image until the GPU is no longer active. v2: Introduce a ring->pin_count primarily to hide the mock_ring that doesn't fit into the normal GGTT vma picture. v3: Order is important in teardown, ringbuffer submission needs to drop the pin count on the engine->kernel_context before it can gleefully free its ring. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110946 Fixes: `ce476c80b8` ("drm/i915: Keep contexts pinned until after the next kernel context switch") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190619170135.15281-1-chris@chris-wilson.co.uk	2019-06-19 19:49:14 +01:00
Chris Wilson	7359134101	drm/i915/execlists: Detect cross-contamination with GuC The process_csb routine from execlists_submission is incompatible with the GuC backend. Add a warning to detect if we accidentally end up in the wrong spot. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190618110736.31155-1-chris@chris-wilson.co.uk	2019-06-19 12:18:14 +01:00
Chris Wilson	44d89409a1	drm/i915: Make the semaphore saturation mask global The idea behind keeping the saturation mask local to a context backfired spectacularly. The premise with the local mask was that we would be more proactive in attempting to use semaphores after each time the context idled, and that all new contexts would attempt to use semaphores ignoring the current state of the system. This turns out to be horribly optimistic. If the system state is still oversaturated and the existing workloads have all stopped using semaphores, the new workloads would attempt to use semaphores and be deprioritised behind real work. The new contexts would not switch off using semaphores until their initial batch of low priority work had completed. Given sufficient backload load of equal user priority, this would completely starve the new work of any GPU time. To compensate, remove the local tracking in favour of keeping it as global state on the engine -- once the system is saturated and semaphores are disabled, everyone stops attempting to use semaphores until the system is idle again. One of the reason for preferring local context tracking was that it worked with virtual engines, so for switching to global state we could either do a complete check of all the virtual siblings or simply disable semaphores for those requests. This takes the simpler approach of disabling semaphores on virtual engines. The downside is that the decision that the engine is saturated is a local measure -- we are only checking whether or not this context was scheduled in a timely fashion, it may be legitimately delayed due to user priorities. We still have the same dilemma though, that we do not want to employ the semaphore poll unless it will be used. v2: Explain why we need to assume the worst wrt virtual engines. 
Fixes: `ca6e56f654` ("drm/i915: Disable semaphore busywaits on saturated systems") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk	2019-06-19 12:10:45 +01:00
Chris Wilson	422d7df4f0	drm/i915: Replace engine->timeline with a plain list To continue the onslaught of removing the assumption of a global execution ordering, another casualty is the engine->timeline. Without an actual timeline to track, it is overkill and we can replace it with a much less grand plain list. We still need a list of requests inflight, for the simple purpose of finding inflight requests (for retiring, resetting, preemption etc). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk	2019-06-14 19:03:40 +01:00
Chris Wilson	ce476c80b8	drm/i915: Keep contexts pinned until after the next kernel context switch We need to keep the context image pinned in memory until after the GPU has finished writing into it. Since it continues to write as we signal the final breadcrumb, we need to keep it pinned until the request after it is complete. Currently we know the order in which requests execute on each engine, and so to remove that presumption we need to identify a request/context-switch we know must occur after our completion. Any request queued after the signal must imply a context switch, for simplicity we use a fresh request from the kernel context. The sequence of operations for keeping the context pinned until saved is: - On context activation, we preallocate a node for each physical engine the context may operate on. This is to avoid allocations during unpinning, which may be from inside FS_RECLAIM context (aka the shrinker) - On context deactivation on retirement of the last active request (which is before we know the context has been saved), we add the preallocated node onto a barrier list on each engine - On engine idling, we emit a switch to kernel context. When this switch completes, we know that all previous contexts must have been saved, and so on retiring this request we can finally unpin all the contexts that were marked as deactivated prior to the switch. We can enhance this in future by flushing all the idle contexts on a regular heartbeat pulse of a switch to kernel context, which will also be used to check for hung engines. v2: intel_context_active_acquire/_release Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk	2019-06-14 19:03:32 +01:00
Chris Wilson	ab53497b57	drm/i915: Rename i915_hw_ppgtt to i915_ppgtt Keeping the _hw_ in there does not help to distinguish it from its only brethren i915_ggtt, so drop it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190611091238.15808-2-chris@chris-wilson.co.uk	2019-06-11 11:44:32 +01:00
Chris Wilson	e568ac3874	drm/i915: Pull kref into i915_address_space Make the kref common to both derived structs (i915_ggtt and i915_ppgtt) so that we can safely reference count an abstract ctx->vm address space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190611091238.15808-1-chris@chris-wilson.co.uk	2019-06-11 11:44:24 +01:00
Tvrtko Ursulin	f6e903db89	drm/i915: Tidy intel_execlists_submission_init Get to uncore from the engine for better logic organization and use already available i915 everywhere. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190607084521.16845-2-tvrtko.ursulin@linux.intel.com	2019-06-07 12:47:51 +01:00
Tvrtko Ursulin	dbc6518363	drm/i915: Convert some more bits to use engine mmio accessors Remove a couple dev_priv locals as a consequence. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190607084521.16845-1-tvrtko.ursulin@linux.intel.com	2019-06-07 12:47:49 +01:00
Chris Wilson	754f7a0b2a	drm/i915: Rename intel_context.active to .inflight Rename the engine this HW context is currently active upon (that we are flying upon) to disambiguate between the mixture of different active terms (and prevent conflict in future patches). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-14-chris@chris-wilson.co.uk	2019-05-28 12:45:29 +01:00
Chris Wilson	10be98a77c	drm/i915: Move more GEM objects under gem/ Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk	2019-05-28 12:45:29 +01:00
Chris Wilson	8475355f7a	drm/i915: Move shmem object setup to its own file Split the plain old shmem object into its own file to start decluttering i915_gem.c v2: Lose the confusing, hysterical raisins, suffix of _gtt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-4-chris@chris-wilson.co.uk	2019-05-28 12:45:29 +01:00
Chris Wilson	ee1136908e	drm/i915/execlists: Virtual engine bonding Some users require that when a master batch is executed on one particular engine, a companion batch is run simultaneously on a specific slave engine. For this purpose, we introduce virtual engine bonding, allowing maps of master:slaves to be constructed to constrain which physical engines a virtual engine may select given a fence on a master engine. For the moment, we continue to ignore the issue of preemption deferring the master request for later. Ideally, we would like to then also remove the slave and run something else rather than have it stall the pipeline. With load balancing, we should be able to move workload around it, but there is a similar stall on the master pipeline while it may wait for the slave to be executed. At the cost of more latency for the bonded request, it may be interesting to launch both on their engines in lockstep. (Bubbles abound.) Opens: Also what about bonding an engine as its own master? It doesn't break anything internally, so allow the silliness. v2: Emancipate the bonds v3: Couple in delayed scheduling for the selftests v4: Handle invalid mutually exclusive bonding v5: Mention what the uapi does v6: s/nbond/num_bonds/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk	2019-05-22 08:40:46 +01:00
Chris Wilson	78e41ddd21	drm/i915: Apply an execution_mask to the virtual_engine Allow the user to direct which physical engines of the virtual engine they wish to execute on, as sometimes it is necessary to override the load balancing algorithm. v2: Only kick the virtual engines on context-out if required Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk	2019-05-22 08:40:43 +01:00
Chris Wilson	6d06779e86	drm/i915: Load balancing across a virtual engine Having allowed the user to define a set of engines that they will want to only use, we go one step further and allow them to bind those engines into a single virtual instance. Submitting a batch to the virtual engine will then forward it to any one of the set in a manner as best to distribute load. The virtual engine has a single timeline across all engines (it operates as a single queue), so it is not able to concurrently run batches across multiple engines by itself; that is left up to the user to submit multiple concurrent batches to multiple queues. Multiple users will be load balanced across the system. The mechanism used for load balancing in this patch is a late greedy balancer. When a request is ready for execution, it is added to each engine's queue, and when an engine is ready for its next request it claims it from the virtual engine. The first engine to do so, wins, i.e. the request is executed at the earliest opportunity (idle moment) in the system. As not all HW is created equal, the user is still able to skip the virtual engine and execute the batch on a specific engine, all within the same queue. It will then be executed in order on the correct engine, with execution on other virtual engines being moved away due to the load detection. A couple of areas for potential improvement left! - The virtual engine always take priority over equal-priority tasks. Mostly broken up by applying FQ_CODEL rules for prioritising new clients, and hopefully the virtual and real engines are not then congested (i.e. all work is via virtual engines, or all work is to the real engine). - We require the breadcrumb irq around every virtual engine request. For normal engines, we eliminate the need for the slow round trip via interrupt by using the submit fence and queueing in order. 
For virtual engines, we have to allow any job to transfer to a new ring, and cannot coalesce the submissions, so require the completion fence instead, forcing the persistent use of interrupts. - We only drip feed single requests through each virtual engine and onto the physical engines, even if there was enough work to fill all ELSP, leaving small stalls with an idle CS event at the end of every request. Could we be greedy and fill both slots? Being lazy is virtuous for load distribution on less-than-full workloads though. Other areas of improvement are more general, such as reducing lock contention, reducing dispatch overhead, looking at direct submission rather than bouncing around tasklets etc. sseu: Lift the restriction to allow sseu to be reconfigured on virtual engines composed of RENDER_CLASS (rcs). v2: macroize check_user_mbz() v3: Cancel virtual engines on wedging v4: Commence commenting v5: Replace 64b sibling_mask with a list of class:instance v6: Drop the one-element array in the uabi v7: Assert it is a virtual engine in to_virtual_engine() v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2) Link: https://github.com/intel/media-driver/pull/283 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk	2019-05-22 08:40:38 +01:00
Chris Wilson	4cc79cbb01	drm/i915/execlists: Drop promotion on unsubmit With the disappearance of NEWCLIENT, we no longer need to provide the priority boost on preemption in order to prevent repeated gazumping, and we can remove the dead code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-5-chris@chris-wilson.co.uk	2019-05-17 16:05:08 +01:00
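
Illustration for commits 6d06779e86 and 78e41ddd21 above: a minimal Python sketch of a "late greedy" balancer with a per-request execution mask. It is not the i915 implementation. The names `Request`, `VirtualEngine`, `submit`, and `claim` are hypothetical, and the sketch assumes only what the commit messages state: the virtual engine is a single in-order queue, the first idle physical engine to ask claims the next request, and a mask can restrict which siblings may run a request.

```python
from collections import deque
from dataclasses import dataclass, field

ANY_ENGINE = -1  # all bits set: every sibling is permitted


@dataclass
class Request:
    name: str
    # Hypothetical stand-in for an execution mask: bit N set means
    # sibling N is allowed to run this request.
    execution_mask: int = ANY_ENGINE


@dataclass
class VirtualEngine:
    # Single timeline shared by all siblings, kept in submission order.
    queue: deque = field(default_factory=deque)

    def submit(self, request: Request) -> None:
        self.queue.append(request)

    def claim(self, sibling: int):
        """Called by physical engine `sibling` when it becomes idle.

        Returns the head request if this sibling may run it, else None.
        Only the head is considered, so ordering is preserved.
        """
        if not self.queue:
            return None
        head = self.queue[0]
        if not (head.execution_mask & (1 << sibling)):
            return None  # head waits for a permitted engine
        return self.queue.popleft()


ve = VirtualEngine()
ve.submit(Request("a"))
ve.submit(Request("b", execution_mask=0b10))  # sibling 1 only
ve.submit(Request("c"))

order = []
for sibling in (0, 0, 1, 0):  # order in which engines happen to go idle
    req = ve.claim(sibling)
    order.append((sibling, req.name if req else None))
print(order)
```

The decision of where a request runs is deferred until an engine is actually idle, which is the "late" part; taking whichever engine asks first is the "greedy" part. Because only the head of the queue is ever offered, a masked request can make other idle siblings wait, which this sketch accepts for simplicity.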

361 Commits