linux

Author	SHA1	Message	Date
Chris Wilson	1d2a19c256	drm/i915/lrc: Remove superfluous WARN_ON Remove the WARN_ON(ce->state) inside the static function only called when ce->state == NULL and downgrade the w/a batch setup warning into a developer only mode (GEM_WARN_ON). v2: Move the deferred alloc guard into the callee, eliminating the need for the WARN_ON: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-1 (-1) Function old new delta execlists_context_pin 1819 1818 -1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180126121846.12007-1-chris@chris-wilson.co.uk	2018-01-26 13:03:07 +00:00
Chris Wilson	09b1a4e4b5	drm/i915/lrc: Clear context restore/save inhibit flags for new contexts CTX_CONTEXT_CONTROL (CTX_SR_CTL) operates as a masked register and so will only apply the bits that are selected by the upper half. In the case of selectively enabling sr inhibit, this may mean the context keeps the current setting (so forgetting to save the context later, eventually leading to a very upset GPU!). Fixes: `517aaffe0c` ("drm/i915/execlists: Inhibit context save/restore for the fake preempt context") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180125112443.12745-1-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2018-01-25 18:04:25 +00:00
Chris Wilson	517aaffe0c	drm/i915/execlists: Inhibit context save/restore for the fake preempt context We only use the preempt context to inject an idle point into execlists. We never need to reference its logical state, so tell the GPU never to load it or save it. v2: BIT(2) for save-inhibit. N.B. Daniele mentioned this bit mbz for ICL, and has been moved into the submission process rather than the context image. Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180123210412.17653-1-chris@chris-wilson.co.uk	2018-01-24 09:40:15 +00:00
Michel Thierry	578f1ac689	drm/i915: Move LRC register offsets to a header file Newer platforms may have subtle offset changes, which will increase the number of defines, so it is probably better to start moving them to its own header file. Also move the macros used while setting the reg state. v2: Rename to intel_lrc_reg.h, to be consistent with i915_reg.h and intel_guc_reg.h (Chris) v3: License notice shenanigans. v4: Documentation/process/coding-style.rst is always right (Chris) v5: Rebase. Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180124004349.22126-2-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2018-01-24 09:22:55 +00:00
Michel Thierry	751d115302	drm/i915/lrc: Update reg_state macros to pass checkpatch The macros we use to init the reg_state had the following issues reported by checkpatch --strict. Macro argument reuse 'reg_state' - possible side-effects Macro argument reuse 'pos' - possible side-effects Macro argument reuse 'ppgtt' - possible side-effects spaces preferred around that '+' (ctx:VxV) So fix these issues before they are moved to a new header file. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180124004349.22126-1-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2018-01-24 09:19:38 +00:00
Chris Wilson	bb5db7e160	drm/i915/execlists: Skip forcewake for ELSP submission Now that we can read the CSB from the HWSP, we may avoid having to perform mmio reads entirely and so forgo the rigmarole of the forcewake dance. v2: Include forcewake hint for GEM_TRACE readback of mmio. If we don't hold fw ourselves, the reads may return garbage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180122100714.15137-1-chris@chris-wilson.co.uk	2018-01-22 18:27:04 +00:00
Tvrtko Ursulin	10bde236ef	drm/i915: Per-engine scratch VMA is mandatory We fail engine initialization if the scratch VMA cannot be created so there is no point in error handle it later. If the initialization ordering gets messed up, we can explode during development just as well. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180119100005.9072-2-tvrtko.ursulin@linux.intel.com	2018-01-22 17:15:31 +00:00
Tvrtko Ursulin	ae504be2e0	drm/i915: Downgrade incorrect engine constructor usage warnings to development Render engine constructor helpers must only be called from the render engine constructors, but there is no need to burden the production binaries with warnings which can only be triggered during development. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180119100005.9072-1-tvrtko.ursulin@linux.intel.com	2018-01-22 17:15:20 +00:00
Chris Wilson	c218ee03b9	drm/i915: Don't adjust priority on an already signaled fence When we retire a signaled fence, we free the dependency tree. However, we skip clearing the list so that if we then try to adjust the priority of the signaled fence, we may walk the list of freed dependencies. [ 3083.156757] ================================================================== [ 3083.156806] BUG: KASAN: use-after-free in execlists_schedule+0x199/0x660 [i915] [ 3083.156810] Read of size 8 at addr ffff8806bf20f400 by task Xorg/831 [ 3083.156815] CPU: 0 PID: 831 Comm: Xorg Not tainted 4.15.0-rc6-no-psn+ #1 [ 3083.156817] Hardware name: Notebook N24_25BU/N24_25BU, BIOS 5.12 02/17/2017 [ 3083.156818] Call Trace: [ 3083.156823] dump_stack+0x5c/0x7a [ 3083.156827] print_address_description+0x6b/0x290 [ 3083.156830] kasan_report+0x28f/0x380 [ 3083.156872] ? execlists_schedule+0x199/0x660 [i915] [ 3083.156914] execlists_schedule+0x199/0x660 [i915] [ 3083.156956] ? intel_crtc_atomic_check+0x146/0x4e0 [i915] [ 3083.156997] ? execlists_submit_request+0xe0/0xe0 [i915] [ 3083.157038] ? i915_vma_misplaced.part.4+0x25/0xb0 [i915] [ 3083.157079] ? __i915_vma_do_pin+0x7c8/0xc80 [i915] [ 3083.157121] ? intel_atomic_state_alloc+0x44/0x60 [i915] [ 3083.157130] ? drm_atomic_helper_page_flip+0x3e/0xb0 [drm_kms_helper] [ 3083.157145] ? drm_mode_page_flip_ioctl+0x7d2/0x850 [drm] [ 3083.157159] ? drm_ioctl_kernel+0xa7/0xf0 [drm] [ 3083.157172] ? drm_ioctl+0x45b/0x560 [drm] [ 3083.157211] i915_gem_object_wait_priority+0x14c/0x2c0 [i915] [ 3083.157251] ? i915_gem_get_aperture_ioctl+0x150/0x150 [i915] [ 3083.157290] ? i915_vma_pin_fence+0x1d8/0x320 [i915] [ 3083.157331] ? intel_pin_and_fence_fb_obj+0x175/0x250 [i915] [ 3083.157372] ? intel_rotation_info_size+0x60/0x60 [i915] [ 3083.157413] ? intel_link_compute_m_n+0x80/0x80 [i915] [ 3083.157428] ? drm_dev_printk+0x1b0/0x1b0 [drm] [ 3083.157443] ? drm_dev_printk+0x1b0/0x1b0 [drm] [ 3083.157485] intel_prepare_plane_fb+0x2f8/0x5a0 [i915] [ 3083.157527] ? intel_crtc_get_vblank_counter+0x80/0x80 [i915] [ 3083.157536] drm_atomic_helper_prepare_planes+0xa0/0x1c0 [drm_kms_helper] [ 3083.157587] intel_atomic_commit+0x12e/0x4e0 [i915] [ 3083.157605] drm_atomic_helper_page_flip+0xa2/0xb0 [drm_kms_helper] [ 3083.157621] drm_mode_page_flip_ioctl+0x7d2/0x850 [drm] [ 3083.157638] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm] [ 3083.157652] ? drm_lease_owner+0x1a/0x30 [drm] [ 3083.157668] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm] [ 3083.157681] drm_ioctl_kernel+0xa7/0xf0 [drm] [ 3083.157696] drm_ioctl+0x45b/0x560 [drm] [ 3083.157711] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm] [ 3083.157725] ? drm_getstats+0x20/0x20 [drm] [ 3083.157729] ? timerqueue_del+0x49/0x80 [ 3083.157732] ? __remove_hrtimer+0x62/0xb0 [ 3083.157735] ? hrtimer_try_to_cancel+0x173/0x210 [ 3083.157738] do_vfs_ioctl+0x13b/0x880 [ 3083.157741] ? ioctl_preallocate+0x140/0x140 [ 3083.157744] ? _raw_spin_unlock_irq+0xe/0x30 [ 3083.157746] ? do_setitimer+0x234/0x370 [ 3083.157750] ? SyS_setitimer+0x19e/0x1b0 [ 3083.157752] ? SyS_alarm+0x140/0x140 [ 3083.157755] ? __rcu_read_unlock+0x66/0x80 [ 3083.157757] ? __fget+0xc4/0x100 [ 3083.157760] SyS_ioctl+0x74/0x80 [ 3083.157763] entry_SYSCALL_64_fastpath+0x1a/0x7d [ 3083.157765] RIP: 0033:0x7f6135d0c6a7 [ 3083.157767] RSP: 002b:00007fff01451888 EFLAGS: 00003246 ORIG_RAX: 0000000000000010 [ 3083.157769] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f6135d0c6a7 [ 3083.157771] RDX: 00007fff01451950 RSI: 00000000c01864b0 RDI: 000000000000000c [ 3083.157772] RBP: 00007f613076f600 R08: 0000000000000001 R09: 0000000000000000 [ 3083.157773] R10: 0000000000000060 R11: 0000000000003246 R12: 0000000000000000 [ 3083.157774] R13: 0000000000000060 R14: 000000000000001b R15: 0000000000000060 [ 3083.157779] Allocated by task 831: [ 3083.157783] kmem_cache_alloc+0xc0/0x200 [ 3083.157822] i915_gem_request_await_dma_fence+0x2c4/0x5d0 [i915] [ 3083.157861] i915_gem_request_await_object+0x321/0x370 [i915] [ 3083.157900] i915_gem_do_execbuffer+0x1165/0x19c0 [i915] [ 3083.157937] i915_gem_execbuffer2+0x1ad/0x550 [i915] [ 3083.157950] drm_ioctl_kernel+0xa7/0xf0 [drm] [ 3083.157962] drm_ioctl+0x45b/0x560 [drm] [ 3083.157964] do_vfs_ioctl+0x13b/0x880 [ 3083.157966] SyS_ioctl+0x74/0x80 [ 3083.157968] entry_SYSCALL_64_fastpath+0x1a/0x7d [ 3083.157971] Freed by task 831: [ 3083.157973] kmem_cache_free+0x77/0x220 [ 3083.158012] i915_gem_request_retire+0x72c/0xa70 [i915] [ 3083.158051] i915_gem_request_alloc+0x1e9/0x8b0 [i915] [ 3083.158089] i915_gem_do_execbuffer+0xa96/0x19c0 [i915] [ 3083.158127] i915_gem_execbuffer2+0x1ad/0x550 [i915] [ 3083.158140] drm_ioctl_kernel+0xa7/0xf0 [drm] [ 3083.158153] drm_ioctl+0x45b/0x560 [drm] [ 3083.158155] do_vfs_ioctl+0x13b/0x880 [ 3083.158156] SyS_ioctl+0x74/0x80 [ 3083.158158] entry_SYSCALL_64_fastpath+0x1a/0x7d [ 3083.158162] The buggy address belongs to the object at ffff8806bf20f400 which belongs to the cache i915_dependency of size 64 [ 3083.158166] The buggy address is located 0 bytes inside of 64-byte region [ffff8806bf20f400, ffff8806bf20f440) [ 3083.158168] The buggy address belongs to the page: [ 3083.158171] page:00000000d43decc4 count:1 mapcount:0 mapping: (null) index:0x0 [ 3083.158174] flags: 0x17ffe0000000100(slab) [ 3083.158179] raw: 017ffe0000000100 0000000000000000 0000000000000000 0000000180200020 [ 3083.158182] raw: ffffea001afc16c0 0000000500000005 ffff880731b881c0 0000000000000000 [ 3083.158184] page dumped because: kasan: bad access detected [ 3083.158187] Memory state around the buggy address: [ 3083.158190] ffff8806bf20f300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158192] ffff8806bf20f380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158195] >ffff8806bf20f400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158196] ^ [ 3083.158199] ffff8806bf20f480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158201] ffff8806bf20f500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3083.158203] ================================================================== Reported-by: Alexandru Chirvasitu <achirvasub@gmail.com> Reported-by: Mike Keehan <mike@keehan.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104436 Fixes: `1f181225f8` ("drm/i915/execlists: Keep request->priority for its lifetime") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Alexandru Chirvasitu <achirvasub@gmail.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180106105618.13532-1-chris@chris-wilson.co.uk	2018-01-08 09:16:02 +00:00
Chris Wilson	2221c5b7dd	drm/i915/execlists: Reduce list_for_each_safe+list_safe_reset_next After staring at the list_for_each_safe macros for a bit, our current invocation of list_safe_reset_next in execlists_schedule() simply reduces to list_for_each. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-11-chris@chris-wilson.co.uk	2018-01-03 12:09:46 +00:00
Chris Wilson	ce01b17377	drm/i915/execlists: Assert there are no simple cycles in the dependencies The dependency chain must be an acyclic graph. This is checked by the swfence, but for sanity, also do a simple check that we do not corrupt our list iteration in execlists_schedule() by a shallow dependency cycle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-10-chris@chris-wilson.co.uk	2018-01-03 12:09:44 +00:00
Chris Wilson	83cc84c5a8	drm/i915: Assert all signalers we depended on did indeed signal Back up our comment that all signalers should have been signaled before we ourselves were retired with an assert to that effect. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-9-chris@chris-wilson.co.uk	2018-01-03 12:05:41 +00:00
Chris Wilson	f3c9d40757	drm/i915/execlists: Tidy enabling execlists Move the register settings for enabling execlists into its own function for clarity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-18-chris@chris-wilson.co.uk	2018-01-03 11:02:43 +00:00
Chris Wilson	693cfbf058	drm/i915/execlists: Record elsp offset during engine setup Currently, we record the elsp register offset inside init-hw but we only need to do it once during engine setup (after we know the mmio iomapping). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-17-chris@chris-wilson.co.uk	2018-01-03 11:01:38 +00:00
Chris Wilson	4223221361	drm/i915/execlists: Clear context-switch interrupt earlier in the reset Move the clearing of the CS-interrupt into the engine reset phase, before the current init-hw phase. This helps clarify that we clear the pending interrupts prior to any restarting of the execlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-16-chris@chris-wilson.co.uk	2018-01-03 11:01:05 +00:00
Chris Wilson	193a98dc7c	drm/i915/execlists: Show preemption progress in GEM_TRACE We already emit a GEM_TRACE for when we start preemption, but we lack one to show when the preemption is completed and we return to the regular queue. This is to continue the investigation into the mysterious <0>[ 197.854177] <idle>-0 1..s1 197837017us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0], tail=0 [0] <0>[ 197.854209] drv_self-6008 2.... 197837390us : reset_common_ring: rcs0 seqno=15515 <0>[ 197.854240] drv_self-6008 2.... 197837415us : reset_common_ring: bcs0 seqno=0 <0>[ 197.854270] drv_self-6008 2.... 197837443us : reset_common_ring: vcs0 seqno=0 <0>[ 197.854300] drv_self-6008 2.... 197837463us : reset_common_ring: vcs1 seqno=0 <0>[ 197.854330] drv_self-6008 2.... 197837482us : reset_common_ring: vecs0 seqno=0 <0>[ 197.854360] ksoftirq-23 2..s. 197838341us : execlists_submission_tasklet: bcs0 in[0]: ctx=0.1, seqno=1dce7 <0>[ 197.854392] <idle>-0 1..s1 197838347us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=0 [0] <0>[ 197.854423] ksoftirq-23 2..s. 197838354us : execlists_submission_tasklet: vcs0 in[0]: ctx=0.1, seqno=1d027 <0>[ 197.854456] ksoftirq-23 2.Ns. 197838361us : execlists_submission_tasklet: vcs1 in[0]: ctx=0.1, seqno=1e738 <0>[ 197.854488] ksoftirq-23 2.Ns. 197838366us : execlists_submission_tasklet: vecs0 in[0]: ctx=0.1, seqno=235aa <0>[ 197.854520] ksoftirq-23 2.Ns. 197838376us : execlists_submission_tasklet: rcs0 in[0]: ctx=0.1, seqno=15518 <0>[ 197.854552] <idle>-0 1..s1 197853285us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0], tail=7 [7] <0>[ 197.854584] <idle>-0 1..s1 197853285us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000 <0>[ 197.854616] <idle>-0 1..s1 197853286us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171222132742.4272-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-12-22 18:32:51 +00:00
Chris Wilson	16a8739402	drm/i915: Tidy up GEM_TRACE around execlists Looking at the coordination of resets with the submission of execlists, it will be useful to have a GEM_TRACE for when we issue the reset. Whilst there tidy up the other GEM_TRACE to always include the engine name, and be careful not to trust any pointers prior to asserts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171220090626.31643-1-chris@chris-wilson.co.uk	2017-12-20 13:24:34 +00:00
Chris Wilson	16c8619a7c	drm/i915: Avoid context dereference inside execlists_submission_tasklet A lesson that has to be relearnt over and over again is that the request does not keep a reference to the context and so we cannot freely dereference the context from inside the execlists_submission_tasklet. In particular, we try to do so in the new GEM_TRACE() so convert those over to the port->context_id we keep for GEM debugging. This means the tracing now depends on DRM_I915_GEM_DEBUG. Fixes: `bccd3b8311` ("drm/i915: Use trace_printk to provide a death rattle for GEM") References: https://bugs.freedesktop.org/show_bug.cgi?id=104066 References: https://bugs.freedesktop.org/show_bug.cgi?id=104162 References: https://bugs.freedesktop.org/show_bug.cgi?id=104242 References: https://bugs.freedesktop.org/show_bug.cgi?id=104310 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171219220916.30882-1-chris@chris-wilson.co.uk	2017-12-19 23:04:45 +00:00
Chris Wilson	2fc7a06ad5	drm/i915/execlists: Cache ELSP register offset Currently on every submission, we recalculate the ELSP register offset for the engine, after chasing the pointers to find the iomem base. Since this is fixed for the lifetime of the driver, record the offset in the execlists struct. In practice the difference is negligible, it just happens to remove 27 bytes of eyesore pointer dancing from next to the hottest instruction (which is itself due to stalling for a cache miss) in perf profiles of the execlists_submission_tasklet(). v2: Trim off one more elsp local. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171207222434.17686-1-chris@chris-wilson.co.uk	2017-12-08 00:37:05 +00:00
Tvrtko Ursulin	cf669b4e9f	drm/i915: Consolidate checks for engine stats availability Sagar noticed the check can be consolidated between the engine stats implementation and the PMU. My first choice was a static inline helper but that got into include ordering mess quickly fast so I went with a macro instead. At some point we should perhaps looking into taking out the non-ringubffer bits from intel_ringbuffer.h into a new intel_engine.h or something. v2: Use engine->flags. (Chris Wilson) v3: Rebase and mark GuC as not yet supported. (Chris Wilson) v4: Move flag setting to intel_engines_reset_default_submission. (Chris Wilson) v5: Move flag setting to logical_ring_setup. v6: intel_engines_reset_default_submission is the wrong place to set the flag - it needs to be in execlists_set_default_submission. (Sagar) v7: Flag setting in logical_ring_setup is not required. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v6) Link: https://patchwork.freedesktop.org/patch/msgid/20171129102805.22690-1-tvrtko.ursulin@linux.intel.com	2017-11-29 12:29:40 +00:00
Tvrtko Ursulin	30e17b7847	drm/i915: Engine busy time tracking Track total time requests have been executing on the hardware. We add new kernel API to allow software tracking of time GPU engines are spending executing requests. Both per-engine and global API is added with the latter also being exported for use by external users. v2: * Squashed with the internal API. * Dropped static key. * Made per-engine. * Store time in monotonic ktime. v3: Moved stats clearing to disable. v4: * Comments. * Don't export the API just yet. v5: Whitespace cleanup. v6: * Rename ref to active. * Drop engine aggregate stats for now. * Account initial busy period after enabling stats. v7: * Rebase. v8: * Move context in notification after the notifier. (Chris Wilson) v9: In cases where stats tracking is getting disabled while there is an active context on an engine, add up the current value to the total. This also implies we don't clear the total when tracking is disabled any longer. There is no real need to do so because we define the stats as relative while enabled, meaning comparison between two samples while tracking is enabled is the valid usage. However, when busy stats will later be plugged into the perf PMU API, it is beneficial to not reset the total, since the PMU core likes to do some counter disable/enable cycles on startup, and while doing so during a single long context executing on an engine we would lose some accuracy and so make unit testing more difficult than needs to be. v10: * Fix accounting for preemption. v11: * Rebase for i915_modparams.enable_execlists removal. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-5-tvrtko.ursulin@linux.intel.com	2017-11-22 11:25:02 +00:00
Tvrtko Ursulin	73fd9d3816	drm/i915: Wrap context schedule notification No functional change just something which will be handy in the following patch. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-4-tvrtko.ursulin@linux.intel.com	2017-11-22 11:25:01 +00:00
Chris Wilson	fb5c551ad5	drm/i915: Remove i915.enable_execlists module parameter Execlists and legacy ringbuffer submission are no longer feature comparable (execlists now offer greater functionality that should overcome their performance hit) and obsoletes the unsafe module parameter, i.e. comparing the two modes of execution is no longer useful, so remove the debug tool. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk	2017-11-20 21:53:59 +00:00
Michel Thierry	ba74cb10c7	drm/i915/execlists: Delay writing to ELSP until HW has processed the previous write The hardware needs some time to process the information received in the ExecList Submission Port, and expects us to not write anything more until it has 'acknowledged' this new submission by sending an IDLE_ACTIVE or PREEMPTED CSB event. If we do not follow this, the driver could write new data into the ELSP before HW had finishing fetching the previous one, putting us in 'undefined behaviour' space. This seems to be the problem causing the spurious PREEMPTED & COMPLETE events after a COMPLETE like the one below: [] vcs0: sw rd pointer = 2, hw wr pointer = 0, current 'head' = 3. [] vcs0: Execlist CSB[0]: 0x00000018 _ 0x00000007 [] vcs0: Execlist CSB[1]: 0x00000001 _ 0x00000000 [] vcs0: Execlist CSB[2]: 0x00000018 _ 0x00000007 <<< COMPLETE [] vcs0: Execlist CSB[3]: 0x00000012 _ 0x00000007 <<< PREEMPTED & COMPLETE [] vcs0: Execlist CSB[4]: 0x00008002 _ 0x00000006 [] vcs0: Execlist CSB[5]: 0x00000014 _ 0x00000006 The ELSP writes that lead to this CSB sequence show that the HW hadn't started executing the previous execlist (the one with only ctx 0x6) by the time the new one was submitted; this is a bit more clear in the data show in the EXECLIST_STATUS register at the time of the ELSP write. [] vcs0: ELSP[0] = 0x0_0 [execlist1] - status_reg = 0x0_302 [] vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302 [] vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308 [] vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308 Note that having to wait for this ack does not disable lite-restores, although it may reduce their numbers. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102035 Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/<20171118003038.7935-1-michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-4-chris@chris-wilson.co.uk Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-11-20 17:01:38 +00:00
Chris Wilson	1f5f9edb44	drm/i915/execlists: Assert that we don't get mixed IDLE_ACTIVE \| COMPLETE events If IDLE_ACTIVE is set, then all other bits are invalid. For us, we can assert that if we see a COMPLETE \| PREEMPTED event, then it should be impossible for it to also contain an IDLE_ACTIVE flag. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-11-20 14:50:45 +00:00
Chris Wilson	d8747afb11	drm/i915/execlists: Reduce completed event mask to COMPLETE \| PREEMPTED Since we get a COMPLETE event when the context switch occurs on RING_HEAD == RING_TAIL and a PREEMPTED event when a switch occurs before that point, COMPLETE \| PREEMPTED should cover all possible context switch completion events. We can move the ELEMENT_SWITCH info message from the COMPLETED_MASK into an assertion for when we are performing a switch to port[1]. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-11-20 14:50:03 +00:00
Chris Wilson	e40dd22624	drm/i915/execlists: Listen to COMPLETE context event not ACTIVE_IDLE Since commit `e1fee72c2e` Author: Oscar Mateo <oscar.mateo@intel.com> Date: Thu Jul 24 17:04:40 2014 +0100 drm/i915/bdw: Avoid non-lite-restore preemptions execlists has listened to (ACTIVE_IDLE \| ELEMENT_SWITCH) for detecting when one context completed and it either continued onto the next (in port 1) or idled. We would always see COMPLETE \| ACTIVE_IDLE on the final context-switch event, but on recent gen it appears that we now get separate ACTIVE_IDLE and COMPLETE events. In particular, the ACTIVE_IDLE events may not be coupled to a context (since it is a general state rather than a specific context completion event). v2: Update the history, execlists did originally start out by listening to the COMPLETE event not ACTIVE_IDLE. v3: Update preempt completion test to also use COMPLETE not ACTIVE_IDLE. References: bspec/12255 References: https://bugs.freedesktop.org/show_bug.cgi?id=103800 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-1-chris@chris-wilson.co.uk	2017-11-20 14:49:16 +00:00
Sagar Arun Kamble	c6dce8f140	drm/i915: Update execlists tasklet naming intel_lrc_irq_handler and i915_guc_irq_handler are HW submission related tasklet functions. Name them with "submission_tasklet" suffix and remove intel/i915 prefix as they are static. Also rename irq_tasklet as just tasklet for clarity. v2: s/_bh/_tasklet (Chris) Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-2-git-send-email-sagar.a.kamble@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-11-16 15:01:31 +00:00
Chris Wilson	fd13821219	drm/i915: Make request's wait-for-space explicit At the start of building a request, we would wait for roughly enough space to fit the average request (to reduce the likelihood of having to wait and abort partway through request construction). To achieve we would try to begin a 0-length command packet, this just adds extra confusion so make the wait-for-space explicit, as in the next patch we want to move it from the backend to the i915_gem_request_alloc() so it can ensure that the wait-for-space is the first operation in building a new request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171115151204.8105-2-chris@chris-wilson.co.uk	2017-11-15 17:12:49 +00:00
Chris Wilson	7c2fa7faf1	drm/i915: Stop caching the "golden" renderstate As we now record the default HW state and so only emit the "golden" renderstate once to prepare the HW, there is no advantage in keeping the renderstate batch around as it will never be used again. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-8-chris@chris-wilson.co.uk	2017-11-10 17:23:22 +00:00
Chris Wilson	d2b4b97933	drm/i915: Record the default hw state after reset upon load Take a copy of the HW state after a reset upon module loading by executing a context switch from a blank context to the kernel context, thus saving the default hw state over the blank context image. We can then use the default hw state to initialise any future context, ensuring that each starts with the default view of hw state. v2: Unmap our default state from the GTT after stealing it from the context. This should stop us from accidentally overwriting it via the GTT (and frees up some precious GTT space). Testcase: igt/gem_ctx_isolation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk	2017-11-10 17:23:10 +00:00
Chris Wilson	f4e15af7e2	drm/i915: Mark the context state as dirty/written In the next few patches, we will want to both copy out of the context image and write a valid image into a new context. To be completely safe, we should then couple in our domain tracking to ensure that we don't have any issues with stale data remaining in unwanted cachelines. Historically, we omitted the .write=true from the call to set-gtt-domain in i915_switch_context() in order to avoid a stall between every request as we would want to wait for the previous context write from the gpu. Since then, we limit the set-gtt-domain to only occur when we first bind the vma, so once in use we will never stall, and we are sure to flush the context following a load from swap. Equally we never applied the lessons learnt from ringbuffer submission to execlists; so time to apply the flush of the lrc after load as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-6-chris@chris-wilson.co.uk	2017-11-10 17:20:29 +00:00
Chris Wilson	bccd3b8311	drm/i915: Use trace_printk to provide a death rattle for GEM Trying to enable printk debugging for GEM is fraught with the issue of spam; interactions with HW are very frequent and often boring. However, one instance where they are not so boring is just before a BUG; here ftrace provides a facility to dump its ringbuffer on an oops. So for CI let's enable trace_printk() to capture the last exchanges with HW as a death rattle. For example, [ 79.234110] ------------[ cut here ]------------ [ 79.234137] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:907! [ 79.234145] invalid opcode: 0000 [#1] SMP [ 79.234153] Dumping ftrace buffer: [ 79.234158] --------------------------------- ... [ 79.314044] gem_conc-1059 1..s1 79203443us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.2, seqno=145 [ 79.314089] gem_conc-1059 1..s. 79220800us : intel_lrc_irq_handler: bcs0 csb[1/1]: status=0x00000018:0x00000005 [ 79.314133] gem_conc-1059 1..s. 79220803us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.1, seqno=145 [ 79.314177] gem_conc-1062 2..s1 79230458us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.1, seqno=146 [ 79.314220] gem_conc-1062 2..s1 79230515us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.2, seqno=147 [ 79.314265] gem_conc-1059 1..s1 79230951us : intel_lrc_irq_handler: bcs0 csb[2/3]: status=0x00000012:0x00000008 [ 79.314309] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.2, seqno=147 [ 79.314353] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 csb[3/3]: status=0x00008002:0x00000008 [ 79.314396] gem_conc-1059 1..s1 79230955us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.1, seqno=147 [ 79.314402] --------------------------------- v2: Tweak the formatting to be more consistent between in/out. v3: do {} while (0) stub macro protection Suggested-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171109143019.16568-1-chris@chris-wilson.co.uk	2017-11-09 21:39:18 +00:00
Michał Winiarski	c41937fd99	drm/i915/guc: Preemption! With GuC Pretty similar to what we have on execlists. We're reusing most of the GEM code, however, due to GuC quirks we need a couple of extra bits. Preemption is implemented as GuC action, and actions can be pretty slow. Because of that, we're using a mutex to serialize them. Since we're requesting preemption from the tasklet, the task of creating a workitem and wrapping it in GuC action is delegated to a worker. To distinguish that preemption has finished, we're using additional piece of HWSP, and since we're not getting context switch interrupts, we're also adding a user interrupt. The fact that our special preempt context has completed unfortunately doesn't mean that we're ready to submit new work. We also need to wait for GuC to finish its own processing. v2: Don't compile out the wait for GuC, handle workqueue flush on reset, no need for ordered workqueue, put on a reviewer hat when looking at my own patches (Chris) Move struct work around in intel_guc, move user interruput outside of conditional (Michał) Keep ring around rather than chase though intel_context v3: Extract WA for flushing ggtt writes to a helper (Chris) Keep work_struct in intel_guc rather than engine (Michał) Use ordered workqueue for inject_preempt worker to avoid GuC quirks. v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele) Drop stray newlines, use container_of for intel_guc in worker, check for presence of workqueue when flushing it, rather than enable_guc_submission modparam, reorder preempt postprocessing (Chris) v5: Make wq NULL after destroying it v6: Swap struct guc_preempt_work members (Michał) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com	2017-10-26 21:35:21 +01:00
Michał Winiarski	a4598d1755	drm/i915: Rename helpers used for unwinding, use macro for can_preempt We would also like to make use of execlist_cancel_port_requests and unwind_incomplete_requests in GuC preemption backend. Let's rename the functions to use the correct prefixes, so that we can simply add the declarations in the following patch. Similar thing for applies for can_preempt, except we're introducing HAS_LOGICAL_RING_PREEMPTION macro instad, converting other users that were previously touching device info directly. v2: s/intel_engine/execlists and pass execlists to unwind (Chris) v3: use locked version for exporting, drop const qual (Chris) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-11-michal.winiarski@intel.com	2017-10-26 21:35:21 +01:00
Michał Winiarski	df77cd83d5	drm/i915: Extract "emit write" part of emit breadcrumb functions Let's separate the "emit" part from touching any internal structures, this way we can have a generic "emit coherent GGTT write" function. We would like to reuse this functionality for emitting HWSP write, to confirm that preempt-to-idle has finished. v2: Reorder args to match emit_pipe_control, s/render/rcs (Chris) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-8-michal.winiarski@intel.com	2017-10-26 21:35:21 +01:00
Michał Winiarski	9bdc3573a5	drm/i915/guc: Initialize GuC before restarting engines Now that we're handling request resubmission the same way as regular submission (from the tasklet), we can move GuC initialization earlier, before restarting the engines. This way, we're no longer being in the state of flux during engine restart - we're already in user requested submission mode. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171025172519.10670-5-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-10-25 19:41:04 +01:00
Chris Wilson	aba5e27858	drm/i915: Add a hook for making the engines idle (parking) and unparking In the next patch, we will want to install a callback when the engines (GT as a whole) become idle and similarly when they first become busy. To enable that callback, first rename intel_engines_mark_idle() to intel_engines_park() and provide the companion intel_engines_unpark(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk	2017-10-25 18:47:30 +01:00
Chris Wilson	64b80085dd	drm/i915/execlists: Remove the priority "optimisation" Originally we set the priority to max upon inserting the request into the execlists queue (and removing it from the scheduler lists). We could then use the prio==INT_MAX as a shortcut within execlists_schedule() to detect the end of the dependency chain. Since commit `1f181225f8` ("drm/i915/execlists: Keep request->priority for its lifetime") this is no longer true as we use the request completion as an indicator the schedule dependency chain is complete instead. (This allows us to then reschedule requests even when its context is in flight.) However, this makes the GEM_BUG_ON() inside execlists_schedule() racy as we may change the rq->prio at the same time. As the assertion is useful, let's keep the assertion and remove the micro-optimisation. Fixes: `1f181225f8` ("drm/i915/execlists: Keep request->priority for its lifetime") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171024115501.21033-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-10-24 15:54:38 +01:00
Chris Wilson	4a118ecbe9	drm/i915: Filter out spurious execlists context-switch interrupts Back in commit `a4b2b01523` ("drm/i915: Don't mark an execlists context-switch when idle") we noticed the presence of late context-switch interrupts. We were able to filter those out by looking at whether the ELSP remained active, but in commit `beecec9017` ("drm/i915/execlists: Preemption!") that became problematic as we now anticipate receiving a context-switch event for preemption while ELSP may be empty. To restore the spurious interrupt suppression, add a counter for the expected number of pending context-switches and skip if we do not need to handle this interrupt to make forward progress. v2: Don't forget to switch on for preempt. v3: Reduce the counter to a on/off boolean tracker. Declare the HW as active when we first submit, and idle after the final completion event (with which we confirm the HW says it is idle), and track each source of activity separately. With a finite number of sources, it should aide us in debugging which gets stuck. Fixes: `beecec9017` ("drm/i915/execlists: Preemption!") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-3-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2017-10-24 15:54:37 +01:00
Chris Wilson	3d574a6bbb	drm/i915: Remove walk over obj->vma_list for the shrinker In the next patch, we want to reduce the lock coverage within the shrinker, and one of the dangerous walks we have is over obj->vma_list. We are only walking the obj->vma_list in order to check whether it has been permanently pinned by HW access, typically via use on the scanout. But we have a couple of other long term pins, the context objects for which we currently have to check the individual vma pin_count. If we instead mark these using obj->pin_display, we can forgo the dangerous and sometimes slow list iteration. v2: Rearrange code to try and avoid confusion from false associations due to arrangement of whitespace along with rebasing on obj->pin_global. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-4-chris@chris-wilson.co.uk	2017-10-16 20:44:19 +01:00
Weinan Li	1fd51d9d97	drm/i915: enable to read CSB and CSB write pointer from HWSP in GVT-g VM Let GVT-g VM read the CSB and CSB write pointer from virtual HWSP, not all the host support this feature, need to check the BIT(3) of caps in PVINFO. v3 : Remove unnecessary comments. v4 : Separate VM enable patch with GVT-g implementation patch due to code dependency. v5 : Use inline for GVT virtual HWSP caps check function. v6 : Comments refine. Signed-off-by: Weinan Li <weinan.z.li@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1508039725-1066-1-git-send-email-weinan.z.li@intel.com	2017-10-16 13:56:29 +03:00
Mika Kuoppala	dc2279e169	drm/i915: Use execlists_num_ports instead of size of array There is function to tell how many ports we have, so use it. We still have direct relationship with array size and port count, so no harm was done. Fixes: `76e70087d3` ("drm/i915: Make execlist port count variable") Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171010114857.13108-1-mika.kuoppala@intel.com	2017-10-10 16:19:40 +03:00
Chris Wilson	279f5a00c9	drm/i915/execlists: Add a comment for the extra MI_ARB_ENABLE Michel Thierry noticed that we were applying WaDisableCtxRestoreArbitration even to gen9, which does not require the w/a. The rationale is that we need to enable MI arbitration for execlists to work, and to be safe we do that before every batch (in addition to every context switch into the batch). Since this is not clear from the single line comment suggesting the MI_ARB_ENABLE is solely for the w/a, add a little more detail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171005191005.13462-1-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2017-10-06 23:21:11 +01:00
Chris Wilson	beecec9017	drm/i915/execlists: Preemption! When we write to ELSP, it triggers a context preemption at the earliest arbitration point (3DPRIMITIVE, some PIPECONTROLs, a few other operations and the explicit MI_ARB_CHECK). If this is to the same context, it triggers a LITE_RESTORE where the RING_TAIL is merely updated (used currently to chain requests from the same context together, avoiding bubbles). However, if it is to a different context, a full context-switch is performed and it will start to execute the new context saving the image of the old for later execution. Previously we avoided preemption by only submitting a new context when the old was idle. But now we wish embrace it, and if the new request has a higher priority than the currently executing request, we write to the ELSP regardless, thus triggering preemption, but we tell the GPU to switch to our special preemption context (not the target). In the context-switch interrupt handler, we know that the previous contexts have finished execution and so can unwind all the incomplete requests and compute the new highest priority request to execute. It would be feasible to avoid the switch-to-idle intermediate by programming the ELSP with the target context. The difficulty is in tracking which request that should be whilst maintaining the dependency change, the error comes in with coalesced requests. As we only track the most recent request and its priority, we may run into the issue of being tricked in preempting a high priority request that was followed by a low priority request from the same context (e.g. for PI); worse still that earlier request may be our own dependency and the order then broken by preemption. By injecting the switch-to-idle and then recomputing the priority queue, we avoid the issue with tracking in-flight coalesced requests. Having tried the preempt-to-busy approach, and failed to find a way around the coalesced priority issue, Michal's original proposal to inject an idle context (based on handling GuC preemption) succeeds. The current heuristic for deciding when to preempt are only if the new request is of higher priority, and has the privileged priority of greater than 0. Note that the scheduler remains unfair! v2: Disable for gen8 (bdw/bsw) as we need additional w/a for GPGPU. Since, the feature is now conditional and not always available when we have a scheduler, make it known via the HAS_SCHEDULER GETPARAM (now a capability mask). v3: Stylistic tweaks. v4: Appease Joonas with a snippet of kerneldoc, only to fuel to fire of the preempt vs preempting debate. Suggested-by: Michal Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-8-chris@chris-wilson.co.uk	2017-10-04 17:52:46 +01:00
Chris Wilson	1f181225f8	drm/i915/execlists: Keep request->priority for its lifetime With preemption, we will want to "unsubmit" a request, taking it back from the hw and returning it to the priority sorted execution list. In order to know where to insert it into that list, we need to remember its adjust priority (which may change even as it was being executed). This also affects reset for execlists as we are now unsubmitting the requests following the reset (rather than directly writing the ELSP for the inflight contexts). This turns reset into an accidental preemption point, as after the reset we may choose a different pair of contexts to submit to hw. GuC is not updated as this series doesn't add preemption to the GuC submission, and so it can keep benefiting from the early pruning of the DFS inside execlists_schedule() for a little longer. We also need to find a way of reducing the cost of that DFS... v2: Include priority in error-state Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-6-chris@chris-wilson.co.uk	2017-10-04 17:52:46 +01:00
Chris Wilson	3ad7b52d96	drm/i915/execlists: Move bdw GPGPU w/a to emit_bb Move the re-enabling of MI arbitration from a per-bb w/a buffer to the emission of the batch buffer itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-5-chris@chris-wilson.co.uk	2017-10-04 17:52:45 +01:00
Chris Wilson	d6c0511300	drm/i915/execlists: Distinguish the incomplete context notifies Let the listener know that the context we just scheduled out was not complete, and will be scheduled back in at a later point. v2: Handle CONTEXT_STATUS_PREEMPTED in gvt by aliasing it to CONTEXT_STATUS_OUT for the moment, gvt can expand upon the difference later. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com> Cc: "Wang, Zhi A" <zhi.a.wang@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-3-chris@chris-wilson.co.uk	2017-10-04 17:52:45 +01:00
Maarten Lankhorst	8279aaf590	drm/i915: Remove use_mmio_flip modparm, v2. This has been unused since commit `afa8ce5b30` ("drm/i915: Nuke legacy flip queueing code"). Changes since v1: - Rebase on top of all the changes to modparams. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> \o/-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171004094416.31306-1-maarten.lankhorst@linux.intel.com	2017-10-04 15:51:09 +02:00
Michał Winiarski	097a94815f	drm/i915/execlists: Cache the last priolist lookup Avoid the repeated rbtree lookup for each request as we unwind them by tracking the last priolist. v2: Fix up my unhelpful suggestion of using default_priolist. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-4-chris@chris-wilson.co.uk	2017-09-29 12:46:08 +01:00
Chris Wilson	7d1ea609f6	drm/i915: Give the invalid priority a magic name We use INT_MIN to denote the priority of a request that has not been submitted to the scheduler; we treat INT_MIN as an invalid priority and initialise the request to it. Give the value a name so it stands out. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-3-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com	2017-09-29 12:45:34 +01:00
Chris Wilson	7e4992ac04	drm/i915/execlists: Move request unwinding to a separate function In the future, we will want to unwind requests following a preemption point. This requires the same steps as for unwinding upon a reset, so extract the existing code to a separate function for later use. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-2-chris@chris-wilson.co.uk	2017-09-29 12:45:21 +01:00
Chris Wilson	7e44fc289d	drm/i915/execlists: Notify context-out for lost requests When cancelling requests, also send the notification to any listeners (gvt) that the request is no longer scheduled on hw. They may require to keep the in/out exactly balanced, and so the reuse after the reset may confuse the listener. Fixes: `221ab9719b` ("drm/i915/execlists: Unwind incomplete requests on resets") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com> Cc: "Wang, Zhi A" <zhi.a.wang@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170926101720.9479-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2017-09-27 10:48:59 +01:00
Chris Wilson	3f9e6cd823	drm/i915/execlists: Microoptimise execlists_cancel_port_request() Just rearrange the code slightly to trim the number of iterations required. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170925124929.16974-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2017-09-25 20:36:02 +01:00
Chris Wilson	b8aa223341	drm/i915/lrc: Skip no-op per-bb buffer on gen9 Since we inherited the context image setup from gen8 which needed a per-bb workaround (for GPGPU), we are submitting an empty per-bb buffer on gen9. Now that we can skip adding the buffer to the context image, remove the dangling per-bb. This slightly improves execution latency, most notably on an idle engine. References: https://bugs.freedesktop.org/show_bug.cgi?id=87725 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170921135444.27330-2-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-09-25 10:21:45 +01:00
Chris Wilson	604a8f6f1e	drm/i915/lrc: Only enable per-context and per-bb buffers if set The per-context and per-batch workaround buffers are optional, yet we tell the GPU to execute them even if they contain no instructions. Doing so incurs the dispatch latency, which we can avoid if we don't ask the GPU to execute the no-op buffers. Allow ourselves to skip setup of empty buffer, and then to only enable non-empty buffers in the context image. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170921135444.27330-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-09-25 10:21:21 +01:00
Mika Kuoppala	76e70087d3	drm/i915: Make execlist port count variable As we emulate execlists on top of the GuC workqueue, it is not restricted to just 2 ports and we can increase that number arbitrarily to trade-off queue depth (i.e. scheduling latency) against pipeline bubbles. v2: rebase. better commit msg (Chris) v3: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-5-mika.kuoppala@intel.com	2017-09-25 11:33:53 +03:00
Mika Kuoppala	7a62cc6107	drm/i915: Add execlist_port_complete When first execlist entry is processed, we move the port (contents). Introduce function for this as execlist and guc use this common operation. v2: rebase. s/GEM_DEBUG_BUG/GEM_BUG (Chris) v3: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-4-mika.kuoppala@intel.com	2017-09-25 11:33:47 +03:00
Mika Kuoppala	cf4591d1ce	drm/i915: Wrap port cancellation into a function On reset and wedged path, we want to release the requests that are tied to ports and then mark the ports to be unset. Introduce a function for this. v2: rebase v3: drop local, keep GEM_BUG_ON (Michał, Chris) v4: rebase Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-3-mika.kuoppala@intel.com	2017-09-25 11:33:40 +03:00
Mika Kuoppala	19df9a5782	drm/i915: Move execlist initialization into intel_engine_cs.c Move execlist init into a common engine setup. As it is common to both guc and hw execlists. v2: rebase with csb changes v3: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-2-mika.kuoppala@intel.com	2017-09-25 11:33:32 +03:00
Mika Kuoppala	b620e87021	drm/i915: Make own struct for execlist items Engine's execlist related items have been increasing to a point where a separate struct is warranted. Carve execlist specific items to a dedicated struct to add clarity. v2: add kerneldoc and fix whitespace (Joonas, Chris) v3: csb_mmio changes, rebase v4: s/\b(el\|execlist)\b/execlists/ (Joonas) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (v3) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-1-mika.kuoppala@intel.com	2017-09-25 11:33:23 +03:00
Michal Wajdeczko	4f044a88a8	drm/i915: Rename global i915 to i915_modparams Our global struct with params is named exactly the same way as new preferred name for the drm_i915_private function parameter. To avoid such name reuse lets use different name for the global. v5: pure rename v6: fix Credits-to: Coccinelle @@ identifier n; @@ ( - i915.n + i915_modparams.n ) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ville Syrjala <ville.syrjala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com	2017-09-22 14:50:36 +03:00
Michał Winiarski	85e2fe679e	drm/i915/guc: Submit GuC workitems containing coalesced requests To create an upper bound on number of GuC workitems, we need to change the way that requests are being submitted. Rather than submitting each request as an individual workitem, we can do coalescing in a similar way we're handlig execlist submission ports. We also need to stop pretending that we're doing "lite-restore" in GuC submission (we would create a workitem each time we hit this condition). This allows us to completely remove the reservation, replacing it with a compile time check. v2: Also coalesce when replaying on reset (Daniele) v3: Consistent wq_resv - per-request (Daniele) v4: Squash removing wq_resv v5: Reflect i915_guc_submit argument changes in doc v6: Rebase on top of execlists reset/restart fix (Chris,Michał) References: https://bugs.freedesktop.org/show_bug.cgi?id=101873 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170914083216.10192-2-michal.winiarski@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-09-18 11:18:00 +01:00
Chris Wilson	221ab9719b	drm/i915/execlists: Unwind incomplete requests on resets Given the mechanism to unwind and replay requests (designed to support preemption), we have an alternative to the current method of resubmitting the ELSP upon reset. Resubmitting ELSP turns out to be more complicated than expected, due to having to handle lost context-switch interrupts and so guessing what ELSP we need to resubmit later. Instead, by unwinding the requests and clearing the ELSP tracking entirely, we can then just dequeue the first pair of ready requests after resetting, using the normal submission procedure. Currently, the unwound requests have maximum priority and so are guaranteed to be resubmitted upon resume. If we are lucky, we may be able to coalesce a new request on top! Suggested-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-4-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-09-18 11:00:12 +01:00
Chris Wilson	27606fd878	drm/i915/execlists: Split insert_request() In the next patch we will want to reinsert a request not at the end of the priority queue, but at the front. Here we split insert_request() into two, the first function retrieves the priority list (for reuse for unsubmit later) and a wrapper function to insert at the end of that list and to schedule the tasklet if we were first. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-3-chris@chris-wilson.co.uk	2017-09-18 11:00:10 +01:00
Chris Wilson	08dd3e1acc	drm/i915/execlists: Move insert_request() Move insert_request() earlier to avoid a forward declaration in a later patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-2-chris@chris-wilson.co.uk	2017-09-18 11:00:05 +01:00
Chris Wilson	523e7c9278	drm/i915/execlists: Kick start request processing after a reset During a reset, we may skip over completed requests and lost context-switch interrupts. Following the reset, we may then may end up with no active requests in the ELSP (and so do not resubmit to restart the engine), but have a queue of requests ready for execution. This is unlikely, it requires the last request to complete after the hang is detected, but not impossible. The outcome of this is that the engine stalls, possibly leading to full ring and indefinite wait under struct_mutex, eventually leading to a full driver hang. Alternatively, we can solve this by unsubmitting the incomplete requests and just kickstarting the tasklet. Michał has patches for that, which I initially disliked due to the extra complexity, but the complexity of this "simple" restart is growing... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-09-18 10:59:59 +01:00
Chris Wilson	27a5f61b37	drm/i915: Cancel all ready but queued requests when wedging When wedging the hw, we want to mark all in-flight requests as -EIO. This is made slightly more complex by execlists who store the ready but not yet submitted-to-hw requests on a private queue (an rbtree priolist). Call into execlists to cancel not only the ELSP tracking for the submitted requests, but also the queue of unsubmitted requests. v2: Move the majority of engine_set_wedged to the backends (both legacy ringbuffer and execlists handling their own lists). Reported-by: Michał Winiarski <michal.winiarski@intel.com> Testcase: igt/gem_eio/in-flight-contexts Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-09-18 10:59:55 +01:00
Chris Wilson	767a983ab2	drm/i915/execlists: Read the context-status HEAD from the HWSP The engine also provides a mirror of the CSB write pointer in the HWSP, but not of our read pointer. To take advantage of this we need to remember where we read up to on the last interrupt and continue off from there. This poses a problem following a reset, as we don't know where the hw will start writing from, and due to the use of power contexts we cannot perform that query during the reset itself. So we continue the current modus operandi of delaying the first read of the context-status read/write pointers until after the first interrupt. With this we should now have eliminated all uncached mmio reads in handling the context-status interrupt, though we still have the uncached mmio writes for submitting new work, and many uncached mmio reads in the global interrupt handler itself. Still a step in the right direction towards reducing our resubmit latency, although it appears lost in the noise! v2: Cannonlake moved the CSB write index v3: Include the sw/hwsp state in debugfs/i915_engine_info v4: Also revert to using CSB mmio for GVT-g v5: Prevent the compiler reloading tail (Mika) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-09-13 17:28:46 +01:00
Chris Wilson	6d2cb5aa38	drm/i915/execlists: Read the context-status buffer from the HWSP The engine provides a mirror of the CSB in the HWSP. If we use the cacheable reads from the HWSP, we can shave off a few mmio reads per context-switch interrupt (which are quite frequent!). Just removing a couple of mmio is not enough to actually reduce any latency, but a small reduction in overall cpu usage. Much appreciation for Ben dropping the bombshell that the CSB was in the HWSP and for Michel in digging out the details. v2: Don't be lazy, add the defines for the indices. v3: Include the HWSP in debugfs/i915_engine_info v4: Check for GVT-g, it currently depends on intercepting CSB mmio v5: Fixup GVT-g mmio path v6: Disable HWSP if VT-d is active as the iommu adds unpredictable memory latency. (Mika) v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio read and we want to stop the compiler from issuing a later (v.slow) reload. Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-09-13 17:24:32 +01:00
Daniele Ceraolo Spurio	486e93f72a	drm/i915/lrc: allocate separate page for HWSP On gen8+ we're currently using the PPHWSP of the kernel ctx as the global HWSP. However, when the kernel ctx gets submitted (e.g. from __intel_autoenable_gt_powersave) the HW will use that page as both HWSP and PPHWSP. This causes a conflict in the register arena of the HWSP, i.e. dword indices below 0x30. We don't current utilize this arena, but in the following patches we will take advantage of the cached register state for handling execlist's context status interrupt. To avoid the conflict, instead of re-using the PPHWSP of the kernel ctx we can allocate a separate page for the HWSP like what happens for pre-execlists platform. v2: Add a use-case for the register arena of the HWSP. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-3-chris@chris-wilson.co.uk	2017-09-13 15:02:39 +01:00
Michel Thierry	0b29c75a01	drm/i915/lrc: Clarify the format of the context image Not only the context image consist of two parts (the PPHWSP, and the logical context state), but we also allocate a header at the start of for sharing data with GuC. Thus every lrc looks like this: \| [guc] \| [hwsp] [logical state] \| \|<- our header ->\|<- context image ->\| So far, we have oversimplified whenever we use each of these parts of the context, just because the GuC header happens to be in page 0, and the (PP)HWSP is in page 1. But this had led to using the same define for more than one meaning (as a page index in the lrc and as 1 page). This patch adds defines for the GuC shared page, the PPHWSP page and the start of the logical state. It also updated the places where the old define was being used. Since we are not changing the size (or format) of the context, there are no functional changes. v2: Use PPHWSP index for hws again. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: intel-gvt-dev@lists.freedesktop.org Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-1-chris@chris-wilson.co.uk	2017-09-13 15:02:15 +01:00
Chris Wilson	2013ddebd2	drm/i915: Move the context descriptor to an inline helper The context descriptor is stored inside the per-engine context state, as we only need to compute it once and access it frequently. However, currently only intel_lrc.c has easy access, but i915_guc_submission.c would like to frequently read it as well, and more so only ever needs the lower 32bits. Make it an inline as the compiler should be able to retrieve the value in less instructions than it takes to do the function call: add/remove: 0/1 grow/shrink: 1/0 up/down: 8/-45 (-37) function old new delta i915_guc_submit 621 629 +8 intel_lr_context_descriptor 45 - -45 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170912214905.21987-1-chris@chris-wilson.co.uk Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>	2017-09-13 10:31:48 +01:00
Rodrigo Vivi	90007bca61	drm/i915/cnl: Introduce initial Cannonlake Workarounds. Let's inherit workarounds from previous platforms that according to wa_database and BSpec are still valid for Cannonlake. v2: Add missed workarounds. v3: Rebase v4: Remove bad chunk that was added to rc6 disable. (Ander) Also remove A0 W/a that are not needed anymore. v5: Rebase on top of CFL. v6: Remove empty gen9_init_perctx_bb and gen9_init_indirectctx_bb since they don't carry any gen10 related W/a. (by Oscar). Also Remove A0 exclusive workaround. v7: Remove more A0 exclusive workarounds. As pointed out by Oscar many workarounds were changed to be A0 only so let's remove them. Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170815231651.975-1-rodrigo.vivi@intel.com	2017-08-18 15:32:17 -07:00
Chris Wilson	64f09f00ca	drm/i915: Clear lost context-switch interrupts across reset During a global reset, we disable the irq. As we disable the irq, the hardware may be raising a GT interrupt that we then ignore, leaving it pending in the GTIIR. After the reset, we then re-enable the irq, triggering the pending interrupt. However, that interrupt was for the stale state from before the reset, and the contents of the CSB buffer are now invalid. v2: Add a comment to make it clear that the double clear is purely my paranoia. Reported-by: "Dong, Chuanxiao" <chuanxiao.dong@intel.com> Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Dong, Chuanxiao" <chuanxiao.dong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170807121919.30165-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20170818090509.5363-1-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2017-08-18 22:42:13 +01:00
Chris Wilson	cdb6ded42f	drm/i915: Flush the execlist ports if idle When doing a GPU reset, the CSB register will be trashed and we will lose any context-switch notifications that happened since the tasklet was disabled. If we find that all requests on this engine were completed, we want to make sure that the ELSP tracker is similarly empty so that we do not feed back in the completed requests upon recovering from the reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-4-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:38:45 +02:00
Chris Wilson	829a0af29f	drm/i915: Group all the global context information together Create a substruct to hold all the global context state under drm_i915_private. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk	2017-06-20 17:13:40 +01:00
Robert Bragg	19f81df285	drm/i915/perf: Add OA unit support for Gen 8+ Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all share (more-or-less) the same OA unit design. Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu. The periodic sampling frequency which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to have become per-context save and restored (while the OABUFFER destination is still a shared, system-wide resource). This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled. The driver intentionally avoids depending on command streamer programming to update OA state considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state. In addition to the risk of transiently-out-of-date state being loaded automatically; there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit when once the MUX configuration is complete). Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read() we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require the dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root. v5: Drain submitted requests when enabling metric set to ensure no lite-restore erases the context image we just updated (Lionel) v6: In addition to drain, switch to kernel context & update all context in place (Chris) v7: Add missing mutex_unlock() if switching to kernel context fails (Matthew) v8: Simplify OA period/flex-eu-counters programming by using the batchbuffer instead of modifying ctx-image (Lionel) v9: Back to updating the context image (due to erroneous testing, batchbuffer programming the OA unit doesn't actually work) (Lionel) Pin context before updating context image (Chris) Drop MMIO programming now that we switch to a kernel context with right values in initial context image (Chris) v10: Just pin_map the contexts we want to modify or let the configuration happen on first use (Chris) v11: Update kernel context OA config through the batchbuffer rather than on the fly ctx-image update (Lionel) v12: Rework OA context registers update again by swithing away from user contexts and reconfiguring the kernel context through the batchbuffer and updating all the other contexts' context image. Also take care to lock slice/subslice configuration when OA is on. (Lionel) v13: Request rpcs updates on all engine when updating the OA config (Lionel) v14: Drop any kind of rpcs management now that we monitor sseu configuration changes in a later patch (Lionel) Remove usleep after programming the NOA configs on Gen8+, this doesn't seem to be needed (Lionel) v15: Respect coding style for block comments (Chris) v16: Add missing i915_add_request() in case we fail to emit OA configuration (Matthew) Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2017-06-14 12:31:57 -07:00
Michel Thierry	7bd0a2c6e1	drm/i915/gen10: Set value of Indirect Context Offset for gen10 Indirect Context Offset Pointer has changed for Cannonlake. INDIRECT_CTX_OFFSET[15:6] valid value for CNL is 19h per Spec. v2: rebased to intel_lr_indirect_ctx_offset v3: Commit message added per Tvrtko request. Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1496781040-20888-9-git-send-email-rodrigo.vivi@intel.com	2017-06-07 07:29:58 -07:00
Chris Wilson	349bdb68bd	drm/i915/execlists: Reduce lock contention between schedule/submit_request If we do not require to perform priority bumping, and we haven't yet submitted the request, we can update its priority in situ and skip acquiring the engine locks -- thus avoiding any contention between us and submit/execute. v2: Remove the stack element from the list if we can do the early assignment. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-10-chris@chris-wilson.co.uk	2017-05-17 13:38:13 +01:00
Chris Wilson	c5cf9a9147	drm/i915: Create a kmem_cache to allocate struct i915_priolist from The i915_priolist are allocated within an atomic context on a path where we wish to minimise latency. If we use a dedicated kmem_cache, we have the advantage of a local freelist from which to service new requests that should keep the latency impact of an allocation small. Though currently we expect the majority of requests to be at default priority (and so hit the preallocate priolist), once userspace starts using priorities they are likely to use many fine grained policies improving the utilisation of a private slab. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk	2017-05-17 13:38:12 +01:00
Chris Wilson	6c067579e6	drm/i915: Split execlist priority queue into rbtree + linked list All the requests at the same priority are executed in FIFO order. They do not need to be stored in the rbtree themselves, as they are a simple list within a level. If we move the requests at one priority into a list, we can then reduce the rbtree to the set of priorities. This should keep the height of the rbtree small, as the number of active priorities can not exceed the number of active requests and should be typically only a few. Currently, we have ~2k possible different priority levels, that may increase to allow even more fine grained selection. Allocating those in advance seems a waste (and may be impossible), so we opt for allocating upon first use, and freeing after its requests are depleted. To avoid the possibility of an allocation failure causing us to lose a request, we preallocate the default priority (0) and bump any request to that priority if we fail to allocate it the appropriate plist. Having a request (that is ready to run, so not leading to corruption) execute out-of-order is better than leaking the request (and its dependency tree) entirely. There should be a benefit to reducing execlists_dequeue() to principally using a simple list (and reducing the frequency of both rbtree iteration and balancing on erase) but for typical workloads, request coalescing should be small enough that we don't notice any change. The main gain is from improving PI calls to schedule, and the explicit list within a level should make request unwinding simpler (we just need to insert at the head of the list rather than the tail and not have to make the rbtree search more complicated). v2: Avoid use-after-free when deleting a depleted priolist v3: Michał found the solution to handling the allocation failure gracefully. If we disable all priority scheduling following the allocation failure, those requests will be executed in fifo and we will ensure that this request and its dependencies are in strict fifo (even when it doesn't realise it is only a single list). Normal scheduling is restored once we know the device is idle, until the next failure! Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk	2017-05-17 13:38:09 +01:00
Chris Wilson	77f0d0e925	drm/i915/execlists: Pack the count into the low bits of the port.request add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187) function old new delta execlists_submit_ports 262 471 +209 port_assign.isra - 136 +136 capture 6344 6359 +15 reset_common_ring 438 452 +14 execlists_submit_request 228 238 +10 gen8_init_common_ring 334 341 +7 intel_engine_is_idle 106 105 -1 i915_engine_info 2314 2290 -24 __i915_gem_set_wedged_BKL 485 411 -74 intel_lrc_irq_handler 1789 1604 -185 execlists_update_context 294 - -294 The most important change there is the improve to the intel_lrc_irq_handler and excclist_submit_ports (net improvement since execlists_update_context is now inlined). v2: Use the port_api() for guc as well (even though currently we do not pack any counters in there, yet) and hide all port->request_count inside the helpers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk	2017-05-17 13:38:06 +01:00
Chuanxiao Dong	0d402a24df	drm/i915: set initialised only when init_context callback is NULL During execlist_context_deferred_alloc() we presumed that the context is uninitialised (we only just allocated the state object for it!) and chose to optimise away the later call to engine->init_context() if engine->init_context were NULL. This breaks with GVT's contexts that are marked as pre-initialised to avoid us annoyingly calling engine->init_context(). The fix is to not override ce->initialised if it is already true. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1494497262-24855-1-git-send-email-chuanxiao.dong@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-05-12 11:11:11 +01:00
Chris Wilson	266a240bf0	drm/i915: Use engine->context_pin() to report the intel_ring Since unifying ringbuffer/execlist submission to use engine->pin_context, we ensure that the intel_ring is available before we start constructing the request. We can therefore move the assignment of the request->ring to the central i915_gem_request_alloc() and not require it in every engine->request_alloc() callback. Another small step towards simplification (of the core, but at a cost of handling error pointers in less important callers of engine->pin_context). v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code generation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk	2017-05-04 11:54:43 +01:00
Joonas Lahtinen	63ffbcdadc	drm/i915: Sanitize engine context sizes Pre-calculate engine context size based on engine class and device generation and store it in the engine instance. v2: - Squash and get rid of hw_context_size (Chris) v3: - Move after MMIO init for probing on Gen7 and 8 (Chris) - Retained rounding (Tvrtko) v4: - Rebase for deferred legacy context allocation Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: intel-gvt-dev@lists.freedesktop.org Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-04-28 12:11:59 +03:00
Chris Wilson	e6ba9992de	drm/i915: Differentiate between sw write location into ring and last hw read We need to keep track of the last location we ask the hw to read up to (RING_TAIL) separately from our last write location into the ring, so that in the event of a GPU reset we do not tell the HW to proceed into a partially written request (which can happen if that request is waiting for an external signal before being executed). v2: Refactor intel_ring_reset() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Testcase: igt/gem_exec_fence/await-hang Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Fixes: `d55ac5bf97` ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-04-25 15:33:22 +01:00
Chris Wilson	6b764a594f	drm/i915: Report request restarts for both execlists/guc As we now share the execlist_port[] tracking for both execlists/guc, we can reset the inflight count on both and report which requests are being restarted. Suggested-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425103835.31871-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-04-25 14:02:36 +01:00
Chris Wilson	48921260b7	drm/i915/execlists: Document runtime pm for intel_lrc_irq_handler() We indirectly hold the runtime-pm for the intel_lrc_irq_handler() by virtue of dev_priv->gt.awake keeping a wakeref whilst the requests are busy. As this is not obvious from the code, add a comment. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170411175850.2470-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-04-12 09:44:08 +01:00
Daniele Ceraolo Spurio	ddfb570c20	drm/i915: Use the engine class to get the context size Technically speaking, the context size is per engine class, not per instance. v2: Add MISSING_CASE (Tvrtko) v3: Rebased v4: Restore the interface back to hiding the class lookup (Chris) Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1491905472-16189-1-git-send-email-oscar.mateo@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-04-11 19:44:04 +01:00
Chris Wilson	d822bb18ce	drm/i915: intel_ring.engine is unused Or rather it is used only by intel_ring_pin() to extract the drm_i915_private which we can easily pass in. As this is a relatively rare operation, save the space in the struct, and as such it is even break even in the extra code for passing around the parameter: add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0) function old new delta intel_init_ring_buffer 906 918 +12 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261794, chg +0.00% v2: Reorder intel_init_ring_buffer to keep the ring setup together: add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6) function old new delta intel_init_ring_buffer 906 912 +6 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261788, chg -0.00% Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk	2017-04-03 13:52:58 +01:00
Chris Wilson	18afa28892	Revert "drm/i915: Skip execlists_dequeue() early if the list is empty" This reverts commit `6c943de668` ("drm/i915: Skip execlists_dequeue() early if the list is empty"). The validity of using READ_ONCE there depends upon having a mb to coordinate the assignment of engine->execlist_first inside submit_request() and checking prior to taking the spinlock in execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a memory barrier making this cross-cpu checking safe", but failed to notice that this mb was conditional on the execlists being ready, i.e. there wasn't the required mb when it was most necessary! We could install an unconditional memory barrier to fixup the READ_ONCE(): diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 7dd732cb9f57..1ed164b16d44 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -616,6 +616,7 @@ static void execlists_submit_request(struct drm_i915_gem_request *request) if (insert_request(&request->priotree, &engine->execlist_queue)) { engine->execlist_first = &request->priotree.node; + smp_wmb(); if (execlists_elsp_ready(engine)) But we have opted to remove the race as it should be rarely effective, and saves us having to explain the necessary memory barriers which we quite clearly failed at. Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Fixes: `6c943de668` ("drm/i915: Skip execlists_dequeue() early if the list is empty") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>	2017-03-29 13:02:24 +01:00
Chris Wilson	a79a524e92	drm/i915: Avoid lock dropping between rescheduling Unlocking is dangerous. In this case we combine an early update to the out-of-queue request, because we know that it will be inserted into the correct FIFO priority-ordered slot when it becomes ready in the future. However, given sufficient enthusiasm, it may become ready as we are continuing to reschedule, and so may gazump the FIFO if we have since dropped its spinlock. The result is that it may be executed too early, before its dependencies. v2: Move all work into the second phase over the topological sort. This removes the shortcut on the out-of-rbtree request to ensure that we only adjust its priority after adjusting all of its dependencies. Fixes: `20311bd350` ("drm/i915/scheduler: Execute requests in order of priorities") Testcase: igt/gem_exec_whisper Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Link: http://patchwork.freedesktop.org/patch/msgid/20170327202143.7972-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-03-29 11:47:45 +01:00
Chris Wilson	ed1501d451	drm/i915: Refactor tests for validity of RING_TAIL Whilst I like having the assertions clearly visible in the code, they are quite repetitious! As we find new limits we want to incorporate into the set of assertions, it make sense to refactor them to a common routine. v2: Add a guc holdout. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-27 15:03:53 +01:00
Chris Wilson	a91fdf1293	drm/i915: Assert that the request->tail fits within the ring In addition to being qword-aligned, the RING_TAIL offset must be within the ring! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-27 15:03:21 +01:00
Chris Wilson	450362d3fe	drm/i915/execlists: Wrap tail pointer after reset tweaking If the request->wa_tail is 0 (because it landed exactly on the end of the ringbuffer), when we reconstruct request->tail following a reset we fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is never able to catch up with RING_TAIL and the GPU spins endlessly. If the ring contains a couple of breadcrumbs, even our hangcheck is unable to catch the busy-looping as the ACTHD and seqno continually advance. v2: Move the wrap into a common intel_ring_wrap(). Fixes: `a3aabe86a3` ("drm/i915/execlists: Reinitialise context image after GPU hang") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-27 15:02:56 +01:00
Chris Wilson	4af0d727b1	drm/i915/execlists: Trim irq handler I noticed that gcc was spilling the CSB to the stack, so rearrange the code to be more compact. Spilling in this function is slightly more interesting due to the mmio reads acting as memory barriers and so end up flushing the stack spills. Still miniscule to having to do at least the pair of uncached reads :( function old new delta intel_lrc_irq_handler 1039 878 -161 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170325201053.21306-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-03-27 12:48:44 +01:00
Chris Wilson	2e70b8c6bf	drm/i915/execlists: Relax the locked clear_bit(IRQ_EXECLIST) We only need to care about the ordering of the clearing of the bit with the uncached CSB read in order to correctly detect a new interrupt before the read completes. The uncached read itself acts as a full memory barrier, so we do not need to enforce another in the form of a locked clear_bit. v2: Clarify why the split and unlocked test/clear is harmless. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170323134803.10418-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-03-23 22:13:51 +00:00
Tvrtko Ursulin	9f7886d07f	drm/i915: Spinlocks in tasklets can use spin_(un)lock_irq The tasklets callbacks are only called from tasklet context so it is safe do to this. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170321105511.18269-1-tvrtko.ursulin@linux.intel.com	2017-03-21 15:04:55 +00:00
Chris Wilson	fe085f13c7	drm/i915: Remove intel_ring.last_retired_head Storing the position of the breadcrumb of the last retired request as a separate last_retired_head is superfluous as we always copy that into head prior to recalculation of the intel_ring.space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk	2017-03-21 14:21:50 +00:00
Chris Wilson	899f6204c0	drm/i915/execlists: Split the atomic test_and_clear_bit for irq handler Rather than impose the cost of a locked test before queuing a new request, reduce it to a simple test_bit() with a following clear_bit() prior to doing the CSB check. This ensure that if an interrupt does occur whilst reading from the CSB, we still detect it (the interrupt would trigger a rescheduling of the tasklet anyway). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170321113320.2603-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-03-21 14:14:55 +00:00
Chris Wilson	c9203e8277	drm/i915: Reset tasklet back to execlists after disabling guc When switching back to execlists, we also now need to restore the tasklet handler. Reported-by: Oscar Mateo <oscar.mateo@intel.com> Fixes: `31de73501a` ("drm/i915/scheduler: emulate a scheduler for guc") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170318102859.24101-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-03-20 10:13:14 +00:00
Chris Wilson	6c943de668	drm/i915: Skip execlists_dequeue() early if the list is empty Do an early read of the execlists' queue before we take the spinlock and start checking. This is safe as the first writer to the execlists queue will cause the tasklet to be run again after a memory barrier. v2: Keep guc in sync with execlists queue changes v3: Explain the mb between the tasklet running on one cpu and the execlist_first update and schedule from a second cpu. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170317120716.17191-1-chris@chris-wilson.co.uk	2017-03-17 15:53:26 +00:00
Chris Wilson	a533b4ba77	drm/i915: Assert that the context pin_counts do not overflow This should be impossible, but let's assert that we do not pin a context 4 billion times before retiring! v2: Fix the assertion -- the patch had just one job to do! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171628.3228-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-16 20:48:58 +00:00
Chris Wilson	ff44ad51eb	drm/i915: Move engine->submit_request selection to a vfunc It turns out that we may want to restore the original engine->submit_request (and engine->schedule) callbacks from more than just the guc <-> execlists transition. Move this to a vfunc so we can have a common interface. v2: Move initial selection to intel_engines_init_common(), repaint vfunc with engine->set_default_submission (and a similar colour for the helper). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-2-chris@chris-wilson.co.uk	2017-03-16 17:17:12 +00:00
Changbin Du	3fc03069bc	drm/i915: make context status notifier head be per engine GVTg has introduced the context status notifier to schedule the GVTg workload. At that time, the notifier is bound to GVTg context only, so GVTg is not aware of host workloads. Now we are going to improve GVTg's guest workload scheduler policy, and add Guc emulation support for new Gen graphics. Both these two features require acknowledgment for all contexts running on hardware. (But will not alter host workload.) So here try to make some change. The change is simple: 1. Move the context status notifier head from i915_gem_context to intel_engine_cs. Which means there is a notifier head per engine instead of per context. Execlist driver still call notifier for each context sched-in/out events of current engine. 2. At GVTg side, it binds a notifier_block for each physical engine at GVTg initialization period. Then GVTg can hear all context status events. In this patch, GVTg do nothing for host context event, but later will add a function there. But in any case, the notifier callback is a noop if this is no active vGPU. Since intel_gvt_init() is called at early initialization stage and require the status notifier head has been initiated, I initiate it in intel_engine_setup(). v2: remove a redundant newline. (chris) Fixes: `3c7ba6359d` ("drm/i915: Introduce execlist context status change notification") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100232 Signed-off-by: Changbin Du <changbin.du@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170313024711.28591-1-changbin.du@intel.com Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-03-16 16:24:35 +00:00
Chris Wilson	31de73501a	drm/i915/scheduler: emulate a scheduler for guc This emulates execlists on top of the GuC in order to defer submission of requests to the hardware. This deferral allows time for high priority requests to gazump their way to the head of the queue, however it nerfs the GuC by converting it back into a simple execlist (where the CPU has to wake up after every request to feed new commands into the GuC). v2: Drop hack status - though iirc there is still a lockdep inversion between fence and engine->timeline->lock (which is impossible as the nesting only occurs on different fences - hopefully just requires some judicious lockdep annotation) v3: Apply lockdep nesting to enabling signaling on the request, using the pattern we already have in __i915_gem_request_submit(); v4: Replaying requests after a hang also now needs the timeline spinlock, to disable the interrupts at least v5: Hold wq lock for completeness, and emit a tracepoint for enabling signal v6: Reorder interrupt checking for a happier gcc. v7: Only signal the tasklet after a user-interrupt if using guc scheduling v8: Restore lost update of rq through the i915_guc_irq_handler (Tvrtko) v9: Avoid re-initialising the engine->irq_tasklet from inside a reset v10: Hook up the execlists-style tracepoints v11: Clear the execlists irq_posted bit after taking over the interrupt/tasklet Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316125619.6856-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-03-16 14:45:07 +00:00
Mika Kuoppala	e71677698b	drm/i915: Avoid using word legacy with ppgtt The term legacy is subjective. Use 3lvl and 4lvl where appropriate. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-4-git-send-email-mika.kuoppala@intel.com	2017-03-03 16:46:23 +02:00
Mika Kuoppala	54af56dbf8	drm/i915: Don't mark pdps clear if pdps are not submitted Don't mark pdps clear if never do the necessary actions with the hardware to make them clear. v2: totally get rid of confusing ppgtt bool (Chris) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-2-git-send-email-mika.kuoppala@intel.com	2017-03-03 16:45:11 +02:00
Chris Wilson	0542524944	drm/i915: Generalise wait for execlists to be idle The code to check for execlists completion is generic, so move it to intel_engine_cs.c, where we can reuse the new intel_engine_is_idle(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-03-03 13:08:15 +00:00
Kelvin Gardiner	69060d9644	drm/i915/bdw: Do not write the replay bit of the ring mode register The replay bit of the ring mode register is not a valid bit for Gen8+. Do not write to this bit. Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> [Joonas: Fixed commit message line to be under 72 chars] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1487963724-4824-1-git-send-email-kelvin.gardiner@intel.com	2017-02-27 14:02:50 +02:00
Chris Wilson	fe9ae7a3bf	drm/i915/execlists: Detect an out-of-order context switch We require that the request is completed before the context is switched away. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170223145031.26210-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-24 12:57:32 +00:00
Tvrtko Ursulin	d7d96833f2	drm/i915/tracepoints: Add backend level request in and out tracepoints Two new tracepoints placed at the call sites where requests are actually passed to the GPU enable userspace to track engine utilisation. These tracepoints are only enabled when the DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled. v2: Fix compilation with !CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS. v3: Name global seqno consistently across tracepoints. v4: Remove port info from request out tracepoint. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-21 13:18:25 +00:00
Tvrtko Ursulin	56e51bf036	drm/i915: Tidy execlists_init_reg_state Compact the name of the macro and reg_state variable, and cache some data in local variables to make the function more compact and more readable. v2: Fixup some checkpatch warnings. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170221095839.30525-1-tvrtko.ursulin@linux.intel.com	2017-02-21 13:15:41 +00:00
Chris Wilson	944a36d472	drm/i915: Assert that the request->tail is always qword aligned The hardware requires that the tail pointer only advance in qword units, so assert that the value we write is aligned to qwords, and similarly enforce this restriction onto the request->tail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170217163833.731-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-02-20 14:32:25 +00:00
Tvrtko Ursulin	9f235dfa49	drm/i915: Consolidate gen8_emit_pipe_control We have a few open coded instances in the execlists code and an almost suitable helper in intel_ringbuf.c We can consolidate to a single helper if we change the existing helper to emit directly to ring buffer memory and move the space reservation outside it. v2: Drop memcpy for memset. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170216122325.31391-2-tvrtko.ursulin@linux.intel.com	2017-02-17 11:39:59 +00:00
Tvrtko Ursulin	097d4f1c12	drm/i915: Tidy workaround batch buffer emission Use the "*batch++ = " style as in the ring emission for better readability and also simplify the logic a bit by consolidating the offset and size calculations and overflow checking. The latter is a programming error so it is not required to check for it after each write to the object, but rather do it once the whole state has been written and fail the driver if something went wrong. v2: Rebase. v3: Keep track of offsets and sizes in bytes for simplicity and rename function pointer variable to _fn suffix. (Chris Wilson) v4: Fix size calc broken in v3 and add alignment warning. (Chris Wilson) v5: Fix return code. v6: I added an exit from loop in v5 but forgot to put back the object teardown. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5) Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-17 11:39:59 +00:00
Tvrtko Ursulin	4ac9659ef9	drm/i915: Remove duplicate intel_logical_ring_workarounds_emit intel_ring_workarounds_emit is exactly the same code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170214150017.16058-1-tvrtko.ursulin@linux.intel.com	2017-02-15 08:16:41 +00:00
Tvrtko Ursulin	73dec95e6b	drm/i915: Emit to ringbuffer directly This removes the usage of intel_ring_emit in favour of directly writing to the ring buffer. intel_ring_emit was preventing the compiler for optimising fetch and increment of the current ring buffer pointer and therefore generating very verbose code for every write. It had no useful purpose since all ringbuffer operations are started and ended with intel_ring_begin and intel_ring_advance respectively, with no bail out in the middle possible, so it is fine to increment the tail in intel_ring_begin and let the code manage the pointer itself. Useless instruction removal amounts to approximately two and half kilobytes of saved text on my build. Not sure if this has any measurable performance implications but executing a ton of useless instructions on fast paths cannot be good. v2: * Change return from intel_ring_begin to error pointer by popular demand. * Move tail increment to intel_ring_advance to enable some error checking. v3: * Move tail advance back into intel_ring_begin. * Rebase and tidy. v4: * Complete rebase after a few months since v3. v5: * Remove unecessary cast and fix !debug compile. (Chris Wilson) v6: * Make intel_ring_offset take request as well. * Fix recording of request postfix plus a sprinkle of asserts. (Chris Wilson) v7: * Use intel_ring_offset to get the postfix. (Chris Wilson) * Convert GVT code as well. v8: * Rename out++ to cs++. v9: * Fix GVT out to cs conversion in GVT. v10: * Rebase for new intel_ring_begin in selftests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com	2017-02-14 14:30:46 +00:00
Chris Wilson	ae9a043b0c	drm/i915: Rename conditional GEM execution macros After a brief discussion, we settled on a naming convention for the conditional GEM debugging data that should be clearer to the casual user: GEM_DEBUG Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-02-10 21:43:43 +00:00
Chris Wilson	72b72ae473	drm/i915: Always pin contexts into the high GGTT Now that we have fast top-down insertion into the drm_mm, we can use it for frequent runtime operations like insertion of the context object, whereas before we limited it to the one-off insertion of the pinned kernel context. Keeping the active context objects out of the mappable region of the global GTT (except under memory pressure) improves our ability to allocate mappable aperture region without triggering a GPU stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170210101422.1598-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-10 13:58:40 +00:00
Chris Wilson	949e8ab3a9	drm/i915: Use the size/type of address space to make decisions Once the address space has been created (using 3 or 4 levels of page tables), we should use that to program the appropriate type into the contexts. This gives us the flexibility to handle different types of address spaces at runtime. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170209144036.23664-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-09 17:09:27 +00:00
Chris Wilson	c0dcb203fb	drm/i915: Restore context and pd for ringbuffer submission after reset Following a reset, the context and page directory registers are lost. However, the queue of requests that we resubmit after the reset may depend upon them - the registers are restored from a context image, but that restore may be inhibited and may simply be absent from the request if it was in the middle of a sequence using the same context. If we prime the CCID/PD registers with the first request in the queue (even for the hung request), we prevent invalid memory access for the following requests (and continually hung engines). v2: Magic BIT(8), reserved for future use but still appears unused. v3: Some commentary on handling innocent vs guilty requests v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my Ivybridge, but this bit probably exists for a reason. Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-07 21:34:58 +00:00
Chris Wilson	2ffe80aa44	drm/i915: Avoid unguarded reads from the request pointer In commit `86aa7e760a` ("drm/i915: Assert that the context-switch completion matches our context") I added a read to the irq tasklet handler that compared the on-chip status with that of our sw tracking, using an unguarded read of the request pointer to get the context and beyond. Whilst we hold a reference to the request, we do not hold anything on the context and if we are unlucky it may be reaped from a second thread retiring the request (since it may retire the request as soon as the breadcrumb is complete, even before we finish processing the context switch) as we try to read from the context pointer. Avoid the racy read from underneath the request by storing the expected result in the execlist_port[]. v2: Include commentary about port[].request being unprotected. Fixes: `86aa7e760a` ("drm/i915: Assert that the context-switch completion matches our context") Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Testcase: igt/gem_ctx_create Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170206170502.30944-2-chris@chris-wilson.co.uk	2017-02-06 20:41:01 +00:00
Zhi Wang	04da811b3d	drm/i915: Let execlist_update_context() cover !FULL_PPGTT mode. execlist_update_context() will try to update PDPs in a context before a ELSP submission only for full PPGTT mode, while PDPs was populated during context initialization. Now the latter code path is removed. Let execlist_update_context() also cover !FULL_PPGTT mode. Fixes: `34869776c7` ("drm/i915: check ppgtt validity when init reg state") Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhiyuan Lv <zhiyuan.lv@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1486377436-15380-1-git-send-email-zhi.a.wang@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-06 15:14:14 +00:00
Chris Wilson	22cc440eae	drm/i915: Print execlists restart after reset After resetting, show the requests that each engine restarts from in the debug log. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170204110519.7645-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-06 11:48:16 +00:00
Chris Wilson	453cfe2171	drm/i915/execlists: Add interrupt-pending check to intel_execlists_idle() Primarily this serves as a sanity check that the bit has been cleared before we suspend (and hasn't reappeared after resume). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170201131222.11882-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-01 15:23:06 +00:00
Chris Wilson	e2de845e0b	drm/i915/execlists: Skip resetting RING_CONTEXT_STATUS_PTR As we now flag when the GPU signals a context-switch and do not read the status register before we see that signal, we do not have to ensure that it is cleared upon reset (and can leave it to the GPU to reset it from the power context). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170201125338.12932-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-01 15:22:38 +00:00
Mika Kuoppala	2355cf088d	drm/i915: Create context desc template when context is created Move the invariant parts of context desc setup from execlist init to context creation. This is advantageous when we need to create different templates based on the context parametrization, ie. for svm capable contexts. v2: s/create/default, remove engine->ctx_desc_template Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1485522189-31984-1-git-send-email-mika.kuoppala@intel.com	2017-01-30 16:36:20 +02:00
Ander Conselvan de Oliveira	9fb5026f85	drm/i915/glk: Turn on workarounds that apply to Geminilake too Apply workarounds to Geminilake, and annotate those that are applied unconditionally when they apply to GLK based on the workaround database. v2: Fix commit message typos. (David) v3: Rebase. Cc: David Weinehall <david.weinehall@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1485422218-9102-1-git-send-email-ander.conselvan.de.oliveira@intel.com	2017-01-30 09:49:07 +02:00
Chris Wilson	48ea2554f4	drm/i915: Dequeue execlists on a new request if any port is available If the second ELSP port is available, schedule the execlists tasklet to see if the new request can occupy it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-7-chris@chris-wilson.co.uk	2017-01-24 16:00:25 +00:00
Chris Wilson	3833281adb	drm/i915: Only attempt to pass the first request to execlists Only the first request added to the execlist queue can be submitted. If this request is not the first request on the queue, it means that there are already higher priority requests waiting upon the tasklet and kicking it will make no difference. This is more relevant for a later patch, where we more eagerly try and kick the tasklet to handle the submission of new requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-6-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-01-24 16:00:13 +00:00
Chris Wilson	a37951ac9f	drm/i915: Skip the execlists CSB scan and rewrite if the ring is untouched If the CSB head/tail pointers are unchanged, we can skip the update of the CSB register afterwards. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-5-chris@chris-wilson.co.uk	2017-01-24 15:56:18 +00:00
Chris Wilson	f747026c2b	drm/i915: Only run execlist context-switch handler after an interrupt Mark when we run the execlist tasklet following the interrupt, so we don't probe a potentially uninitialised register when submitting the contexts multiple times before the hardware responds. v2: Use a shared engine->irq_posted v3: Always use locked bitops to be sure of atomicity wrt to other bits in the mask. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124152021.26587-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-01-24 15:56:01 +00:00
Chris Wilson	816ee798ec	drm/i915: Only disable execlist preemption for the duration of the request We need to prevent resubmission of the context immediately following an initial resubmit (which does a lite-restore preemption). Currently we do this by disabling all submission whilst the context is still active, but we can improve this by limiting the restriction to only until we receive notification from the context-switch interrupt that the lite-restore preemption is complete. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-2-chris@chris-wilson.co.uk	2017-01-24 15:55:16 +00:00
Chris Wilson	c816e605ff	drm/i915: Assert that we don't submit to execlists whilst a preempt is pending To complement the check in execlists_elsp_ready(), also assert that we don't submit the same context while it has a lite restore still pending. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-1-chris@chris-wilson.co.uk	2017-01-24 15:55:05 +00:00
Chris Wilson	b403c8feaf	drm/i915: Remove BXT TDL state w/a This w/a (WaClearTdlStateAckDirtyBits) was only used for preproduction hw, which is no longer in use. Remove the workaround to simplify the code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-5-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-23 15:53:02 +00:00
Chris Wilson	68bee61573	drm/i915: Remove BXT disable pixel mask clamping w/a This w/a (WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken) was only used for preproduction hw, which is no longer in use. Remove the workaround to simplify the code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-4-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-23 15:53:01 +00:00
Chris Wilson	32ebc29211	drm/i915: Remove BXT restore arbitration around ctx switch This w/a (WaDisableCtxRestoreArbitration) was only used for preproduction hw, which is no longer in use. Remove the workaround to simplify the code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-23 15:53:01 +00:00
Chris Wilson	f8dd2934c4	drm/i915: Remove BXT incoherent seqno write workaround This w/a was only used for preproduction hw, which is no longer in use. Remove the workaround to simplify the code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-23 15:53:00 +00:00
Chris Wilson	70962fbe5c	drm/i915: Remove disable_lite_restore_wa This w/a (WaEnableForceRestoreInCtxtDescForVCS) was only used for preproduction hw, which is no longer in use. Remove the workaround to simplify the code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-23 15:52:42 +00:00
Chris Wilson	86aa7e760a	drm/i915: Assert that the context-switch completion matches our context When execlists signals the context completion, it also provides the context id for the status event. Assert that id matches the one we expect. v2: The upper dword of the context status is a duplicate of the upper dword from elsp submission (i.e. includes the group id as well as the context id). Include this check as well. v3: Only check against lrc_desc (as this contains the hw_id check) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170123113132.18665-2-chris@chris-wilson.co.uk	2017-01-23 12:04:46 +00:00
Chris Wilson	a01cb37aff	drm/i915: Remove i915_vma_create from VMA API With the introduce of i915_vma_instance() for obtaining the VMA singleton for a (obj, vm, view) tuple, we can remove the i915_vma_create() in favour of a single entry point. We do incur a lookup onto an empty tree, but the i915_vma_create() were being called infrequently and during initialisation, so the small overhead is negligible. v2: Drop the i915_ prefix from the now static vma_create() function Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-4-chris@chris-wilson.co.uk	2017-01-19 10:17:39 +00:00
Chris Wilson	7c3f86b6dc	drm/i915: Invalidate the guc ggtt TLB upon insertion Move the GuC invalidation of its ggtt TLB to where we perform the ggtt modification rather than proliferate it into all the callers of the insert (which may or may not in fact have to do the insertion). v2: Just do the guc invalidate unconditionally, (afaict) it has no impact without the guc loaded on gen8+ v3: Conditionally invalidate the guc - just in case that register has not been validated for other modes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170112110050.25333-1-chris@chris-wilson.co.uk	2017-01-12 16:47:04 +00:00
Francisco Jerez	8726f2faa3	drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround. The WaDisableLSQCROPERFforOCL workaround has the side effect of disabling an L3SQ optimization that has huge performance implications and is unlikely to be necessary for the correct functioning of usual graphic workloads. Userspace is free to re-enable the workaround on demand, and is generally in a better position to determine whether the workaround is necessary than the DRM is (e.g. only during the execution of compute kernels that rely on both L3 fences and HDC R/W requests). The same workaround seems to apply to BDW (at least to production stepping G1) and SKL as well (the internal workaround database claims that it does for all steppings, while the BSpec workaround table only mentions pre-production steppings), but the DRM doesn't do anything beyond whitelisting the L3SQCREG4 register so userspace can enable it when it sees fit. Do the same on KBL platforms. Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%, and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master -- This is followed by a regression of 35% and 10% respectively for the same benchmarks and platform caused by my recent patch series switching userspace to use the dataport constant cache instead of the sampler to implement uniform pull constant loads, which caused us to hit more heavily the L3 cache (and on platforms other than KBL had the opposite effect of improving performance of the same two benchmarks). The overall effect on KBL of this change combined with the recent userspace change is respectively 4.6% and 2.6%. SynMark2 OglShMapPcf was affected by the constant cache changes (though it improved as it did on other platforms rather than regressing), but is not significantly affected by this patch (with statistical significance of 5% and sample size 20). v2: Drop some more code to avoid unused variable warning. Fixes: `738fa1b312` ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256 Signed-off-by: Francisco Jerez <currojerez@riseup.net> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: beignet@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v4.7+ Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [Removed double Fixes tag] Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com	2017-01-12 15:48:08 +02:00
Zhenyu Wang	34869776c7	drm/i915: check ppgtt validity when init reg state Check if ppgtt is valid for context when init reg state. For gvt context which has no i915 allocated ppgtt, failed to check that would cause kernel null ptr reference error. v2: remove !48bit ppgtt case as we'll always update before submit (Chris) Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170109131453.3943-1-zhenyuw@linux.intel.com	2017-01-12 08:39:41 +00:00
Chris Wilson	f51455d442	drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE Start converting over from the byte count to its semantic macro, either we want to allocate the size of a physical page in main memory or we want the size of a virtual page in the GTT. 4096 could mean either, but PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve code comprehension and future changes. In the future, we may want to use variable GTT page sizes and so have the challenge of knowing which hardcoded values were used to represent a physical page vs the virtual page. v2: Look for a few more 4096s to convert, discover IS_ALIGNED(). v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep bdw stolen w/a as 4096 until we know better. v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page sizes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-10 20:54:32 +00:00
Chris Wilson	984ff29f74	drm/i915: Simplify testing for am-I-the-kernel-context? The kernel context (dev_priv->kernel_context) is unique in that it is not associated with any user filp - it is the only one with ctx->file_priv == NULL. This is a simpler test than comparing it against dev_priv->kernel_context which involves some pointer dancing. In checking that this is true, we notice that the gvt context is allocating itself a i915_hw_ppgtt it doesn't use and not flagging that its file_priv should be invalid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk	2017-01-06 16:02:12 +00:00
Chris Wilson	f3b8f9126a	drm/i915/execlists: Reorder execlists register enabling Empirically we restart following a GPU reset more successfully if we call lrc_init_hws() (which contains a posting read) last. (The failure mode that was observed was that breadcrumb writes into the HWS from the recovered requests went astray leading to the context-switch maintaining forward progress, but the requests not being retired/completed.) For clarity, lrc_init_hws() is inlined (and the unused function then removed). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-3-chris@chris-wilson.co.uk	2017-01-05 15:34:42 +00:00
Chris Wilson	56f6e0a7e7	drm/i915: Assert that we do create the deferred context In order to convince static analyzers that the allocation function returns an error or sets ce->state, assert that it is set afterwards. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-2-chris@chris-wilson.co.uk	2017-01-05 15:34:41 +00:00

1 2 3 4 5 ...

655 Commits