linux

Author	SHA1	Message	Date
Chris Wilson	c0d5f32c50	drm/i915: Set guilty-flag on fence after detecting a hang The struct dma_fence carries a status field exposed to userspace by sync_file. This is inspected after the fence is signaled and can convey whether or not the request completed successfully, or in our case if we detected a hang during the request (signaled via -EIO in SYNC_IOC_FILE_INFO). v2: Mark all cancelled requests as failed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-2-chris@chris-wilson.co.uk	2017-01-10 20:49:30 +00:00
Chris Wilson	2edc6e0df1	drm/i915: Consolidate reset_request() Always reset the requests of the guilty context, including the hung request that we tell the hardware to skip. This should help if the reprogram fails entirely, but more importantly makes the guilty path more uniform (and simplifies the subsequent patch to tweak the cancelled requests). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-1-chris@chris-wilson.co.uk	2017-01-10 20:49:29 +00:00
Daniel Vetter	05adb1850a	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued Pull in latest drm-next from Dave Airlie to get at all the drm-misc goodies, specifically: - dma_fence error state handling rework (Chris needs that for error recovery) - crc support locking changes (Tomeu's i915 crc patches need that). Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2017-01-10 17:03:46 +01:00
Chris Wilson	8201c1fad4	drm/i915: Clip the partial view against the object not vma The VMA is later clipped against the vm_area_struct before insertion of the faulting PTE so we are free to create the partial view as we desire. If we use the object as the extents rather than the area, this partial can then be used for other areas. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-2-chris@chris-wilson.co.uk	2017-01-10 11:59:24 +00:00
Chris Wilson	2d4281bb93	drm/i915: Extract compute_partial_view() In order to reuse the partial view for selftesting, extract the common function for computing the view. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-1-chris@chris-wilson.co.uk	2017-01-10 11:59:24 +00:00
Chris Wilson	91d4e0aa92	drm/i915: Move ggtt fence/alignment to i915_gem_tiling.c Rename i915_gem_get_ggtt_size() and i915_gem_get_ggtt_alignment() to i915_gem_fence_size() and i915_gem_fence_alignment() respectively to better match usage. Similarly move the pair of functions into i915_gem_tiling.c next to the fence restrictions. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-6-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-10 08:12:22 +00:00
Chris Wilson	944397f04f	drm/i915: Store required fence size/alignment for GGTT vma The fence size/alignment is a combination of the vma size plus object tiling parameters. Those parameters are rarely changed, making the fence size/alignemnt roughly constant for the lifetime of the VMA. We can simplify subsequent calculations by precalculating the size/alignment required for GGTT vma taking fencing into account (with an update if we do change the tiling or stride). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-4-chris@chris-wilson.co.uk	2017-01-10 08:12:21 +00:00
Chris Wilson	5b30694b47	drm/i915: Align GGTT sizes to a fence tile row Ensure the view occupies the full tile row so that reads/writes into the VMA do not escape (via fenced detiling) into neighbouring objects - we will pad the object with scratch pages to satisfy the fence. This applies the lazy-tiling we employed on gen2/3 to gen4+. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-2-chris@chris-wilson.co.uk	2017-01-10 08:12:20 +00:00
Chris Wilson	6649a0b650	drm/i915: Extract tile_row_size for fencing Computing the tile row size of a tiled object (for use with fence registers) is repeated, so extract it to a common helper. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-10 08:12:09 +00:00
Dave Airlie	5c37daf5dd	Merge tag 'drm-intel-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-intel into drm-next More 4.11 stuff, holidays edition (i.e. not much): - docs and cleanups for shared dpll code (Ander) - some kerneldoc work (Chris) - fbc by default on gen9+ too, yeah! (Paulo) - fixes, polish and other small things all over gem code (Chris) - and a few small things on top Plus a backmerge, because Dave was enjoying time off too. * tag 'drm-intel-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-intel: (275 commits) drm/i915: Update DRIVER_DATE to 20170109 drm/i915: Drain freed objects for mmap space exhaustion drm/i915: Purge loose pages if we run out of DMA remap space drm/i915: Fix phys pwrite for struct_mutex-less operation drm/i915: Simplify testing for am-I-the-kernel-context? drm/i915: Use range_overflows() drm/i915: Use fixed-sized types for stolen drm/i915: Use phys_addr_t for the address of stolen memory drm/i915: Consolidate checks for memcpy-from-wc support drm/i915: Only skip requests once a context is banned drm/i915: Move a few more utility macros to i915_utils.h drm/i915: Clear ret before unbinding in i915_gem_evict_something() drm/i915/guc: Exclude the upper end of the Global GTT for the GuC drm/i915: Move a few utility macros into a separate header drm/i915/execlists: Reorder execlists register enabling drm/i915: Assert that we do create the deferred context drm/i915: Assert all timeline requests are gone before fini drm/i915: Revoke fenced GTT mmapings across GPU reset drm/i915: enable FBC on gen9+ too drm/i915: actually drive the BDW reserved IDs ...	2017-01-10 08:02:09 +10:00
Linus Torvalds	2fd8774c79	Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fixes from Konrad Rzeszutek Wilk: "This has one fix to make i915 work when using Xen SWIOTLB, and a feature from Geert to aid in debugging of devices that can't do DMA outside the 32-bit address space. The feature from Geert is on top of v4.10 merge window commit (specifically you pulling my previous branch), as his changes were dependent on the Documentation/ movement patches. I figured it would just easier than me trying than to cherry-pick the Documentation patches to satisfy git. The patches have been soaking since 12/20, albeit I updated the last patch due to linux-next catching an compiler error and adding an Tested-and-Reported-by tag" * 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Export swiotlb_max_segment to users swiotlb: Add swiotlb=noforce debug option swiotlb: Convert swiotlb_force from int to enum x86, swiotlb: Simplify pci_swiotlb_detect_override()	2017-01-06 10:53:21 -08:00
Konrad Rzeszutek Wilk	7453c549f5	swiotlb: Export swiotlb_max_segment to users So they can figure out what is the optimal number of pages that can be contingously stitched together without fear of bounce buffer. We also expose an mechanism for sub-users of SWIOTLB API, such as Xen-SWIOTLB to set the max segment value. And lastly if swiotlb=force is set (which mandates we bounce buffer everything) we set max_segment so at least we can bounce buffer one 4K page instead of a giant 512KB one for which we may not have space. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: Juergen Gross <jgross@suse.com>	2017-01-06 13:00:01 -05:00
Chris Wilson	b42a13d918	drm/i915: Drain freed objects for mmap space exhaustion As we now use a deferred free queue for objects, simply retiring the active objects is not enough to immediately free them and recover their mmap space - we must now also drain the freed object list. Fixes: `fbbd37b36f` ("drm/i915: Move object release to a freelist + worker" Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-3-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-01-06 16:54:47 +00:00
Chris Wilson	10466d2a59	drm/i915: Fix phys pwrite for struct_mutex-less operation Since commit `fe115628d5` ("drm/i915: Implement pwrite without struct-mutex") the lowlevel pwrite calls are now called without the protection of struct_mutex, but pwrite_phys was still asserting that it held the struct_mutex and later tried to drop and relock it. Fixes: `fe115628d5` ("drm/i915: Implement pwrite without struct-mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-01-06 16:53:44 +00:00
Chris Wilson	984ff29f74	drm/i915: Simplify testing for am-I-the-kernel-context? The kernel context (dev_priv->kernel_context) is unique in that it is not associated with any user filp - it is the only one with ctx->file_priv == NULL. This is a simpler test than comparing it against dev_priv->kernel_context which involves some pointer dancing. In checking that this is true, we notice that the gvt context is allocating itself a i915_hw_ppgtt it doesn't use and not flagging that its file_priv should be invalid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk	2017-01-06 16:02:12 +00:00
Chris Wilson	7ec73b7e36	drm/i915: Only skip requests once a context is banned If we skip before banning, we have an inconsistent interface between execbuf still queueing valid request but those requests already queued being cancelled. If we only cancel the pending requests once we stop accepting new requests, the user interface is more consistent. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: <stable@vger.kernel.org> # v4.9+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105170059.344-1-chris@chris-wilson.co.uk	2017-01-05 21:06:20 +00:00
Chris Wilson	40b326eefe	drm/i915: Move a few utility macros into a separate header In order to defeat some circular dependencies between headers to allow use of e.g. range_overflows() in a header, move the simple independent macros into their own header. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-4-chris@chris-wilson.co.uk	2017-01-05 15:34:45 +00:00
Chris Wilson	b1ed35d917	drm/i915: Revoke fenced GTT mmapings across GPU reset The fence registers are clobbered by a GPU reset. If there is concurrent user access to a fenced region via a GTT mmaping, the access will not be fenced during the reset (until we restore the fences afterwards). In order to prevent invalid access during the reset, before we clobber the fences first we must invalidate the GTT mmapings. Access to the mmap will then be forced to fault in the page, and in handling the fault, i915_gem_fault() will take the struct_mutex and wait upon the reset to complete. v2: Fix up commentary. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99274 Testcase: igt/gem_mmap_gtt/hang Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170104145110.1486-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-01-04 18:44:30 +00:00
Daniel Vetter	a402eae64d	Linux 4.10-rc2 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYaYNlAAoJEHm+PkMAQRiGtCUH/18PMUJpHqRKjxL3Yscw+QZC RmGlD/hwBRLUSgiTCfURNGKP4QZv2kQW7BGsGC72oL01lmxozsU72ixUIO+wXzDY K2b0OOKGZZWzFtaVm7Qs+5JhHAEKZcT046mLD8sjJuqkrFAhmNLKdwHjihKBEkm9 J3s2tpdXdN0x/Uyga/GY9khEYIrvLPeBoKSz+JXcQKdC0iq3/+PMpWnN47QCNScr 7azojkJkj/rs2cqVdOi7Wbh6PSqIvPsl8E3qJefpaVJF/IQaU1pFdy5g8kYm4V7T fr6HgIbuN4EQWdN/5cgKrUdpQyV7D8iYx02klk4R8WgfS0QMYoUcsg+XsTd02TI= =OhGe -----END PGP SIGNATURE----- Merge tag 'v4.10-rc2' into drm-intel-next-queued Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've done the backmerge directly because Dave is on vacation. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2017-01-04 11:35:18 +01:00
Chris Wilson	2471eb5fb6	drm/i915: Prevent timeline updates whilst performing reset As the fence may be signaled concurrently from an interrupt on another device, it is possible for the list of requests on the timeline to be modified as we walk it. Take both (the context's timeline and the global timeline) locks to prevent such modifications. Fixes: `80b204bce8` ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk (cherry picked from commit `00c25e3f40`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2017-01-03 11:41:57 +02:00
Chris Wilson	64d1461ce0	drm/i915: Silence allocation failure during sg_trim() As trimming the sg table is merely an optimisation that gracefully fails if we cannot allocate a new table, we do not need to report the failure either. Fixes: `0c40ce130e` ("drm/i915: Trim the object sg table") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk (cherry picked from commit `8bfc478fa4`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2017-01-03 11:41:48 +02:00
Chris Wilson	c3f923b554	drm/i915: Don't clflush before release phys object When we teardown the backing storage for the phys object, we copy from the coherent contiguous block back to the shmemfs object, clflushing as we go. Trying to clflush the invalid sg beforehand just oops and would be redundant (due to it already being coherent, and clflushed afterwards). Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk (cherry picked from commit `e5facdf964`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2017-01-03 11:41:38 +02:00
Chris Wilson	6095868a27	drm/i915: Complete kerneldoc for struct i915_gem_context The existing kerneldoc was outdated, so time for a refresh. v2: Use single line kdoc, mention functions for manipulation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-3-chris@chris-wilson.co.uk	2016-12-31 11:41:47 +00:00
Chris Wilson	00c25e3f40	drm/i915: Prevent timeline updates whilst performing reset As the fence may be signaled concurrently from an interrupt on another device, it is possible for the list of requests on the timeline to be modified as we walk it. Take both (the context's timeline and the global timeline) locks to prevent such modifications. Fixes: `80b204bce8` ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk	2016-12-23 16:08:10 +00:00
Chris Wilson	8bfc478fa4	drm/i915: Silence allocation failure during sg_trim() As trimming the sg table is merely an optimisation that gracefully fails if we cannot allocate a new table, we do not need to report the failure either. Fixes: `0c40ce130e` ("drm/i915: Trim the object sg table") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk	2016-12-23 16:07:32 +00:00
Chris Wilson	e5facdf964	drm/i915: Don't clflush before release phys object When we teardown the backing storage for the phys object, we copy from the coherent contiguous block back to the shmemfs object, clflushing as we go. Trying to clflush the invalid sg beforehand just oops and would be redundant (due to it already being coherent, and clflushed afterwards). Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk	2016-12-23 16:07:20 +00:00
Chris Wilson	bdeb978506	drm/i915: Repeat flush of idle work during suspend The idle work handler is self-arming - if it detects that it needs to run again it will queue itself from its work handler. Take greater care when trying to drain the idle work, and double check that it is flushed. The free worker has a similar issue where it is armed by an RCU task which may be running concurrently with us. This should hopefully help with the sporadic WARN_ON(dev_priv->gt.awake) from i915_gem_suspend. v2: Reuse drain_freed_objects. v3: Don't try to flush the freed objects from the shrinker, as it may be underneath the struct_mutex already. v4: do while and comment upon the excess rcu_barrier in drain_freed_objects Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-2-chris@chris-wilson.co.uk	2016-12-23 16:07:11 +00:00
Chris Wilson	28f412e02b	drm/i915: Break after walking all GGTT vma in bump_inactive_ggtt Since commit `db6c2b4151` ("drm/i915: Store the vma in an rbtree under the object") the vma are once again sorted into GGTT first, then ppGTT so that the typical case of walking the GGTT vma can stop as soon as we find a non-ppGTT. Apply that optimisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-1-chris@chris-wilson.co.uk	2016-12-23 16:06:56 +00:00
Dave Airlie	4a401ceeef	Merge tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes First set of i915 fixes for code in next. * tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel: drm/i915: skip the first 4k of stolen memory on everything >= gen8 drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping drm/i915: Fix use after free in logical_render_ring_init drm/i915: disable PSR by default on HSW/BDW drm/i915: Fix setting of boost freq tunable drm/i915: tune down the fast link training vs boot fail drm/i915: Reorder phys backing storage release drm/i915/gen9: Fix PCODE polling during SAGV disabling drm/i915/gen9: Fix PCODE polling during CDCLK change notification drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating drm/i915: drop the struct_mutex when wedged or trying to reset	2016-12-23 05:28:02 +10:00
Chris Wilson	abb0deacb5	drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping If we at first do not succeed with attempting to remap our physical pages using a coalesced scattergather list, try again with one scattergather entry per page. This should help with swiotlb as it uses a limited buffer size and only searches for contiguous chunks within its buffer aligned up to the next boundary - i.e. we may prematurely cause a failure as we are unable to utilize the unused space between large chunks and trigger an error such as: i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes) Reported-by: Juergen Gross <jgross@suse.com> Tested-by: Juergen Gross <jgross@suse.com> Fixes: `871dfbd67d` ("drm/i915: Allow compaction upto SWIOTLB max segment size") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (cherry picked from commit `d766ef5300`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-12-20 16:30:47 +02:00
Chris Wilson	057f803ff1	drm/i915: Reorder phys backing storage release In commit `a4f5ea64f0` ("drm/i915: Refactor object page API"), I reordered the object->pages teardown to be more friendly wrt to a separate obj->mm.lock. However, I overlooked the phys object and left it with a dangling use-after-free of its phys_handle. Move the allocation of the phys handle to get_pages and it release to put_pages to prevent the invalid access and to improve symmetry. v2: Add commentary about always aligning to page size. Testcase: igt/drv_selftest/objects Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: `a4f5ea64f0` ("drm/i915: Refactor object page API") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk (cherry picked from commit `dbb4351bab`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-12-20 16:29:26 +02:00
Chris Wilson	c2dc6cc946	drm/i915: Add a test that we terminate the trimmed sgtable as expected In commit `0c40ce130e` ("drm/i915: Trim the object sg table"), we expect to copy exactly orig_st->nents across and allocate the table thusly. The copy loop should therefore end with the new_sg being NULL. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-2-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-12-20 12:31:52 +00:00
Chris Wilson	d766ef5300	drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping If we at first do not succeed with attempting to remap our physical pages using a coalesced scattergather list, try again with one scattergather entry per page. This should help with swiotlb as it uses a limited buffer size and only searches for contiguous chunks within its buffer aligned up to the next boundary - i.e. we may prematurely cause a failure as we are unable to utilize the unused space between large chunks and trigger an error such as: i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes) Reported-by: Juergen Gross <jgross@suse.com> Tested-by: Juergen Gross <jgross@suse.com> Fixes: `871dfbd67d` ("drm/i915: Allow compaction upto SWIOTLB max segment size") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-12-20 12:30:56 +00:00
Chris Wilson	e8a9c58fcd	drm/i915: Unify active context tracking between legacy/execlists/guc The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating a minor havoc in the seqno accounting, as we will consider the current request to already be completed (prior to deferred seqno assignment) but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution. We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short circuit the pinning for all active contexts. We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficultly. And finally an ulterior motive for unifying context handling was to prepare for mock requests. v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk	2016-12-18 16:18:50 +00:00
Matthew Auld	966d5bf5eb	drm/i915: convert to using range_overflows Convert some of the obvious hand-rolled ranged overflow sanity checks to our shiny new range_overflows macro. Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-4-matthew.auld@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-12-16 21:22:12 +00:00
Jan Kara	1a29d85eb0	mm: use vmf->address instead of of vmf->virtual_address Every single user of vmf->virtual_address typed that entry to unsigned long before doing anything with it so the type of virtual_address does not really provide us any additional safety. Just use masked vmf->address which already has the appropriate type. Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:09 -08:00
Chris Wilson	dbb4351bab	drm/i915: Reorder phys backing storage release In commit `a4f5ea64f0` ("drm/i915: Refactor object page API"), I reordered the object->pages teardown to be more friendly wrt to a separate obj->mm.lock. However, I overlooked the phys object and left it with a dangling use-after-free of its phys_handle. Move the allocation of the phys handle to get_pages and it release to put_pages to prevent the invalid access and to improve symmetry. v2: Add commentary about always aligning to page size. Testcase: igt/drv_selftest/objects Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: `a4f5ea64f0` ("drm/i915: Refactor object page API") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk	2016-12-12 12:24:36 +00:00
Jani Nikula	73f67aa8cc	drm/i915: distinguish G33 and Pineview from each other Pineview deserves to use its own platform enum (which was already added, unused, previously). IS_G33() no longer matches Pineview, and gets replaced by IS_G33() \|\| IS_PINEVIEW() or equivalent. Pineview is no longer an outlier among platform definitions. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1481143689-19672-1-git-send-email-jani.nikula@intel.com	2016-12-07 23:28:33 +02:00
Jani Nikula	c0f86832e3	drm/i915: rename BROADWATER and CRESTLINE to I965G and I965GM, respectively Add more consistency to our naming. Pineview remains the outlier. Keep using code names for gen5+. v2: rebased Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1481105584-23033-1-git-send-email-jani.nikula@intel.com	2016-12-07 15:18:33 +02:00
Chris Wilson	85fd4f58d7	drm/i915: Mark all non-vma being inserted into the address spaces We need to distinguish between full i915_vma structs and simple drm_mm_nodes when considering eviction (i.e. we must be careful not to treat a mere drm_mm_node as a much larger i915_vma causing memory corruption, if we are lucky). To do this, color these not-a-vma with -1 (I915_COLOR_UNEVICTABLE). v2...v200: New name for -1. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk	2016-12-05 20:49:17 +00:00
Chris Wilson	ce1135c7de	drm/i915: Complete requests in nop_submit_request Since the submit/execute split in commit `d55ac5bf97` ("drm/i915: Defer transfer onto execution timeline to actual hw submission") the global seqno advance was deferred until the submit_request callback. After wedging the GPU, we were installing a nop_submit_request handler (to avoid waking up the dead hw) but I had missed converting this over to the new scheme. Under the new scheme, we have to explicitly call i915_gem_submit_request() from the submit_request handler to mark the request as on the hardware. If we don't the request is always pending, and any waiter will continue to wait indefinitely and hangcheck will not be able to resolve the lockup. References: https://bugs.freedesktop.org/show_bug.cgi?id=98748 Testcase: igt/gem_eio/in-flight Fixes: `d55ac5bf97` ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk (cherry picked from commit `3dcf93f7f2`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-12-05 10:38:01 +02:00
Tvrtko Ursulin	cb15d9f8c3	drm/i915: More GEM init dev_priv cleanup Simplifies the code to pass the right parameter in. v2: Commit message. (Joonas Lahtinen) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-12-01 18:01:16 +00:00
Tvrtko Ursulin	bf9e8429ab	drm/i915: Make various init functions take dev_priv Like GEM init, GUC init, MOCS init and context creation. Enables them to lose dev_priv locals. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-12-01 18:01:15 +00:00
Tvrtko Ursulin	12d79d7828	drm/i915: Make GEM object create and create from data take dev_priv Makes all GEM object constructors consistent. v2: Fix compilation in GVT code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)	2016-12-01 18:01:08 +00:00
Tvrtko Ursulin	187685cb90	drm/i915: Make GEM object alloc/free and stolen created take dev_priv Where it is more appropriate and also to be consistent with the direction of the driver. v2: Leave out object alloc/free inlining. (Joonas Lahtinen) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-12-01 18:00:15 +00:00
Chris Wilson	49d73912cb	drm/i915: Convert vm->dev backpointer to vm->i915 99% of the time we access i915_address_space->dev we want the i915 device and not the drm device, so let's store the drm_i915_private backpointer instead. The only real complication here are the inlines in i915_vma.h where drm_i915_private is not yet defined and so we have to choose an alternate path for our asserts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-29 11:38:00 +00:00
Chris Wilson	20e4933c47	drm/i915: Stop the machine as we install the wedged submit_request handler In order to prevent a race between the old callback submitting an incomplete request and i915_gem_set_wedged() installing its nop handler, we must ensure that the swap occurs when the machine is idle (stop_machine). v2: move context lost from out of BKL. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-4-chris@chris-wilson.co.uk	2016-11-22 17:42:19 +00:00
Chris Wilson	3dcf93f7f2	drm/i915: Complete requests in nop_submit_request Since the submit/execute split in commit `d55ac5bf97` ("drm/i915: Defer transfer onto execution timeline to actual hw submission") the global seqno advance was deferred until the submit_request callback. After wedging the GPU, we were installing a nop_submit_request handler (to avoid waking up the dead hw) but I had missed converting this over to the new scheme. Under the new scheme, we have to explicitly call i915_gem_submit_request() from the submit_request handler to mark the request as on the hardware. If we don't the request is always pending, and any waiter will continue to wait indefinitely and hangcheck will not be able to resolve the lockup. References: https://bugs.freedesktop.org/show_bug.cgi?id=98748 Testcase: igt/gem_eio/in-flight Fixes: `d55ac5bf97` ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk	2016-11-22 17:42:18 +00:00
Chris Wilson	d9e9da64c4	drm/i915: Don't deref context->file_priv ERR_PTR upon reset When a user context is closed, it's file_priv backpointer is replaced by ERR_PTR(-EBADF); be careful not to chase this invalid pointer after a hang and a GPU reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `b083a0870c` ("drm/i915: Add per client max context ban limit") Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-1-chris@chris-wilson.co.uk	2016-11-22 17:42:17 +00:00
Mika Kuoppala	bc1d53c647	drm/i915: Wipe hang stats as an embedded struct Bannable property, banned status, guilty and active counts are properties of i915_gem_context. Make them so. v2: rebase Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1479309634-28574-1-git-send-email-mika.kuoppala@intel.com	2016-11-21 14:36:56 +02:00
Mika Kuoppala	b083a0870c	drm/i915: Add per client max context ban limit If we have a bad client submitting unfavourably across different contexts, creating new ones, the per context scoring of badness doesn't remove the root cause, the offending client. To counter, keep track of per client context bans. Deny access if client is responsible for more than 3 context bans in it's lifetime. v2: move ban check to context create ioctl (Chris) v3: add commentary about hangs needed to reach client ban (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>	2016-11-21 14:36:40 +02:00
Mika Kuoppala	841021713a	drm/i915: Add bannable context parameter Now when driver has per context scoring of 'hanging badness' and also subsequent hangs during short windows are allowed, if there is progress made in between, it does not make sense to expose a ban timing window as a context parameter anymore. Let the scoring be the sole indicator for ban policy and substitute ban period context parameter as a boolean to get/set context bannable property. v2: allow non root to opt into being banned (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>	2016-11-21 14:36:40 +02:00
Mika Kuoppala	e5e1fc47ea	drm/i915: Use request retirement as context progress As hangcheck score was removed, the active decay of score was removed also. This removed feature for hangcheck to detect if the gpu client was accidentally or maliciously causing intermittent hangs. Reinstate the scoring as a per context property, so that if one context starts to act unfavourably, ban it. v2: ban_period_secs as a gate to score check (Chris) v3: decay in proper spot. scores as tunables (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>	2016-11-21 14:36:40 +02:00
Mika Kuoppala	3fe3b030bd	drm/i915: Decouple hang detection from hangcheck period Hangcheck state accumulation has gained more steps along the years, like head movement and more recently the subunit inactivity check. As the subunit sampling is only done if the previous state check showed inactivity, we have added more stages (and time) to reach a hang verdict. Asymmetric engine states led to different actual weight of 'one hangcheck unit' and it was demonstrated in some hangs that due to difference in stages, simpler engines were accused falsely of a hang as their scoring was much more quicker to accumulate above the hang treshold. To completely decouple the hangcheck guilty score from the hangcheck period, convert hangcheck score to a rough period of inactivity measurement. As these are tracked as jiffies, they are meaningful also across reset boundaries. This makes finding a guilty engine more accurate across multi engine activity scenarios, especially across asymmetric engines. We lose the ability to detect cross batch malicious attempts to hinder the progress. Plan is to move this functionality to be part of context banning which is more natural fit, later in the series. v2: use time_before macros (Chris) reinstate the pardoning of moving engine after hc (Chris) v3: avoid global state for per engine stall detection (Chris) v4: take timeline last retirement into account (Chris) v5: do debug print on pardoning, split out retirement timestamp (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>	2016-11-21 14:36:40 +02:00
Chris Wilson	05c348377d	drm/i915: Skip final clflush if LLC is coherent If the LLC is coherent with the object, we do not need to worry about whether main memory and cache mismatch when we hand the object back to the system. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-2-chris@chris-wilson.co.uk	2016-11-18 22:33:49 +00:00
Chris Wilson	a6a7cc4b7d	drm/i915: Always flush the dirty CPU cache when pinning the scanout Currently we only clflush the scanout if it is in the CPU domain. Also flush if we have a pending CPU clflush. We also want to treat the dirtyfb path similar, and flush any pending writes there as well. v2: Only send the fb flush message if flushing the dirt on flip v3: Make flush-for-flip and dirtyfb look more alike since they serve similar roles as end-of-frame marker. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2 Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-1-chris@chris-wilson.co.uk	2016-11-18 22:33:22 +00:00
Chris Wilson	b17993b7b2	drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error On the DMA mapping error path, sg may be NULL (it has already been marked as the last scatterlist entry), and we should avoid dereferencing it again. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `e227330223` ("drm/i915: avoid leaking DMA mappings") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: stable@vger.kernel.org Link: http://patchwork.freedesktop.org/patch/msgid/20161114112930.2033-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>	2016-11-18 20:51:53 +00:00
Chris Wilson	5b8c8aec8e	drm/i915: Move frontbuffer CS write tracking from ggtt vma to object I tried to avoid having to track the write for every VMA by only tracking writes to the ggtt. However, for the purposes of frontbuffer tracking this is insufficient as we need to invalidate around writes not just to the the ggtt but all aliased ppgtt views of the framebuffer. By moving the critical section to the object and only doing so for framebuffer writes we can reduce the tracking even further by only watching framebuffers and not vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-18 11:15:59 +00:00
Matthew Auld	ea84aa776f	drm/i915: don't leak global_timeline We need to clean up the global_timeline in i915_gem_load_cleanup. v2: don't forget about the struct_mutex, and also WARN_ON if we have any remaining timelines before purging the global_timeline. v3: it might be a good idea to first remove the global_timeline...duh! Fixes: `73cb97010d` ("drm/i915: Combine seqno + tracking into a global timeline struct") Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1479415087-13216-1-git-send-email-matthew.auld@intel.com Link: http://patchwork.freedesktop.org/patch/msgid/20161117210411.14044-2-chris@chris-wilson.co.uk	2016-11-17 21:05:37 +00:00
Chris Wilson	c4c29d7b59	drm/i915: Demote i915_gem_open() debugging from DRIVER to USER We use DRM_DEBUG() when reporting on user actions, to try and keep intentional errors out of the CI dmesg. Demote the debug from i915_gem_open() similarly so that it is only apparent with drm.debug & 1 like its brethren. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161109104507.21228-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-11-17 14:23:20 +00:00
Tvrtko Ursulin	275a991c03	drm/i915: dev_priv cleanup in i915_gem_gtt.c Started with removing INTEL_INFO(dev) and cascaded into a quite big trickle of function prototype changes. Still, I think it is for the better. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-17 13:56:12 +00:00
Tvrtko Ursulin	4362f4f6dd	drm/i915: Use dev_priv in INTEL_INFO in i915_gem_fence_reg.c Plus a small cascade of function prototype changes. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-17 13:56:06 +00:00
Tvrtko Ursulin	c6be607abc	drm/i915: dev_priv and a small cascade of cleanups in i915_gem.c Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-17 13:55:26 +00:00
Chris Wilson	6b5e90f58c	drm/i915/scheduler: Boost priorities for flips Boost the priority of any rendering required to show the next pageflip as we want to avoid missing the vblank by being delayed by invisible workload. We prioritise avoiding jank and jitter in the GUI over starving background tasks. v2: Descend dma_fence_array when boosting priorities. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-10-chris@chris-wilson.co.uk	2016-11-14 21:01:22 +00:00
Chris Wilson	20311bd350	drm/i915/scheduler: Execute requests in order of priorities Track the priority of each request and use it to determine the order in which we submit requests to the hardware via execlists. The priority of the request is determined by the user (eventually via the context) but may be overridden at any time by the driver. When we set the priority of the request, we bump the priority of all of its dependencies to match - so that a high priority drawing operation is not stuck behind a background task. When the request is ready to execute (i.e. we have signaled the submit fence following completion of all its dependencies, including third party fences), we put the request into a priority sorted rbtree to be submitted to the hardware. If the request is higher priority than all pending requests, it will be submitted on the next context-switch interrupt as soon as the hardware has completed the current request. We do not currently preempt any current execution to immediately run a very high priority request, at least not yet. One more limitation, is that this is first implementation is for execlists only so currently limited to gen8/gen9. v2: Replace recursive priority inheritance bumping with an iterative depth-first search list. v3: list_next_entry() for walking lists v4: Explain how the dfs solves the recursion problem with PI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk	2016-11-14 21:01:21 +00:00
Chris Wilson	52e5420907	drm/i915/scheduler: Record all dependencies upon request construction The scheduler needs to know the dependencies of each request for the lifetime of the request, as it may choose to reschedule the requests at any time and must ensure the dependency tree is not broken. This is in additional to using the fence to only allow execution after all dependencies have been completed. One option was to extend the fence to support the bidirectional dependency tracking required by the scheduler. However the mismatch in lifetimes between the submit fence and the request essentially meant that we had to build a completely separate struct (and we could not simply reuse the existing waitqueue in the fence for one half of the dependency tracking). The extra dependency tracking simply did not mesh well with the fence, and keeping it separate both keeps the fence implementation simpler and allows us to extend the dependency tracking into a priority tree (whilst maintaining support for reordering the tree). To avoid the additional allocations and list manipulations, the use of the priotree is disabled when there are no schedulers to use it. v2: Create a dedicated slab for i915_dependency. Rename the lists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk	2016-11-14 21:00:28 +00:00
Chris Wilson	663f71e73f	drm/i915: Remove engine->execlist_lock The execlist_lock is now completely subsumed by the engine->timeline->lock, and so we can remove the redundant layer of locking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-5-chris@chris-wilson.co.uk	2016-11-14 21:00:25 +00:00
Chris Wilson	bb89485e99	drm/i915: Create distinct lockclasses for execution vs user timelines In order to simplify the lockdep annotation, as they become more complex in the future with deferred execution and multiple paths through the same functions, create a separate lockclass for the user timeline and the hardware execution timeline. We should only ever be locking the user timeline and the execution timeline in parallel so we only need to create two lock classes, rather than a separate class for every timeline. v2: Rename the lock classes to be more consistent with other lockdep. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-2-chris@chris-wilson.co.uk	2016-11-14 21:00:21 +00:00
Chris Wilson	2b3c83176e	drm/i915: Stop skipping the final clflush back to system pages When we release the shmem backing storage, we make sure that the pages are coherent with the cpu cache. However, our clflush routine was skipping the flush as the object had no pages at release time. Fix this by explicitly flushing the sg_table we are decoupling. Fixes: `03ac84f183` ("drm/i915: Pass around sg_table to get_pages/put_pages backend") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-2-chris@chris-wilson.co.uk	2016-11-11 16:21:52 +00:00
Chris Wilson	9caa34aa93	drm/i915: Only wait upon the execution timeline when unlocked In order to walk the list of all timelines, we currently require the struct_mutex. We are sometimes called prior to the struct_mutex being taken by the caller (i.e !I915_WAIT_LOCKED) in which case we can only trust the global execution timelines (as these are owned by the device). This means in the unlocked phase we can only wait upon the currently executing requests and not all queued. [ 175.743243] general protection fault: 0000 [#1] SMP [ 175.743263] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iwlwifi aesni_intel aes_x86_64 lrw snd_soc_rt5640 gf128mul snd_soc_rl6231 snd_soc_core glue_helper snd_compress snd_pcm_dmaengine snd_hda_codec_hdmi ablk_helper snd_hda_codec_realtek cryptd snd_hda_codec_generic serio_raw cfg80211 snd_hda_intel snd_hda_codec ir_lirc_codec snd_hda_core lirc_dev snd_hwdep snd_pcm lpc_ich mei_me mei snd_seq_midi shpchp snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer rc_rc6_mce acpi_als nuvoton_cir kfifo_buf rc_core snd industrialio snd_soc_sst_acpi soundcore snd_soc_sst_match i2c_designware_platform 8250_dw i2c_designware_core dw_dmac spi_pxa2xx_platform mac_hid acpi_pad parport_pc ppdev lp parport [ 175.743509] autofs4 i915 e1000e psmouse ptp pps_core xhci_pci ehci_pci ahci xhci_hcd ehci_hcd libahci video sdhci_acpi sdhci i2c_hid hid [ 175.743560] CPU: 2 PID: 2386 Comm: wtdg_monitor.sh Tainted: G U 4.9.0-rc4-nightly+ #2 [ 175.743581] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0358.2016.0606.1423 06/06/2016 [ 175.743603] task: ffff88024509ba80 task.stack: ffffc9007bd18000 [ 175.743618] RIP: 0010:[<ffffffffa01af29b>] [<ffffffffa01af29b>] i915_gem_wait_for_idle+0x3b/0x140 [i915] [ 175.743660] RSP: 0000:ffffc9007bd1b9b8 EFLAGS: 00010297 [ 175.743674] RAX: ffff88024489d248 RBX: 0000000000000000 RCX: 0000000000000000 [ 175.743691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880244898000 [ 175.743708] RBP: ffffc9007bd1b9f0 R08: 0000000000000000 R09: 0000000000000001 [ 175.743724] R10: 00000028eaf42792 R11: 0000000000000001 R12: dead000000000100 [ 175.743741] R13: dead000000000148 R14: ffffc9007bd1ba5f R15: 0000000000000005 [ 175.743758] FS: 00007f2638330700(0000) GS:ffff880256d00000(0000) knlGS:0000000000000000 [ 175.743777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 175.743791] CR2: 00007f885c8cea40 CR3: 00000002416b5000 CR4: 00000000003406e0 [ 175.743808] Stack: [ 175.743816] ffff88024489d248 000000004509ba80 ffff880244898000 ffff88024509ba80 [ 175.743840] 00000000ffff8b69 ffffc9007bd1ba5f ffffc9007bd1ba5e ffffc9007bd1ba28 [ 175.743863] ffffffffa01b661d 00000000ffffffff 0000000000000000 ffff880244898000 [ 175.743886] Call Trace: [ 175.743906] [<ffffffffa01b661d>] i915_gem_shrinker_lock_uninterruptible.constprop.5+0x5d/0xc0 [i915] [ 175.743937] [<ffffffffa01b6cd0>] i915_gem_shrinker_oom+0x30/0x1b0 [i915] [ 175.743955] [<ffffffff8109ca79>] notifier_call_chain+0x49/0x70 [ 175.743971] [<ffffffff8109cd9d>] __blocking_notifier_call_chain+0x4d/0x70 [ 175.743988] [<ffffffff8109cdd6>] blocking_notifier_call_chain+0x16/0x20 [ 175.744005] [<ffffffff811885dc>] out_of_memory+0x22c/0x480 [ 175.744020] [<ffffffff81205542>] __alloc_pages_slowpath+0x851/0x8ec [ 175.744037] [<ffffffff8118ca51>] __alloc_pages_nodemask+0x2c1/0x310 [ 175.744054] [<ffffffff811d8ea8>] alloc_pages_current+0x88/0x120 [ 175.744070] [<ffffffff811833a4>] __page_cache_alloc+0xb4/0xc0 [ 175.744086] [<ffffffff811865ca>] filemap_fault+0x29a/0x500 [ 175.744101] [<ffffffff81299aa6>] ext4_filemap_fault+0x36/0x50 [ 175.744117] [<ffffffff811b3d4a>] __do_fault+0x6a/0xe0 [ 175.744131] [<ffffffff811b97ee>] handle_mm_fault+0xd0e/0x1330 [ 175.744147] [<ffffffff8106738c>] __do_page_fault+0x23c/0x4d0 [ 175.744162] [<ffffffff81067650>] do_page_fault+0x30/0x80 [ 175.744177] [<ffffffff817ffbe8>] page_fault+0x28/0x30 [ 175.744191] Code: 41 57 41 56 41 55 41 54 53 48 83 ec 10 4c 8b a7 48 52 00 00 89 75 d4 48 89 45 c8 49 39 c4 74 78 4d 8d 6c 24 48 41 bf 05 00 00 00 <49> 8b 5d 00 48 85 db 74 50 8b 83 20 01 00 00 85 c0 74 15 48 8b [ 175.744320] RIP [<ffffffffa01af29b>] i915_gem_wait_for_idle+0x3b/0x140 [i915] [ 175.744351] RSP <ffffc9007bd1b9b8> Fixes: `80b204bce8` ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-1-chris@chris-wilson.co.uk	2016-11-11 16:21:52 +00:00
Tvrtko Ursulin	0031fb9685	drm/i915: Assorted dev_priv cleanups A small selection of macros which can only accept dev_priv from now on and a resulting trickle of fixups. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>	2016-11-11 14:58:26 +00:00
Joonas Lahtinen	b42fe9ca0a	drm/i915: Split out i915_vma.c As a side product, had to split two other files; - i915_gem_fence_reg.h - i915_gem_object.h (only parts that needed immediate untanglement) I tried to move code in as big chunks as possible, to make review easier. i915_vma_compare was moved to a header temporarily. v2: - Use i915_gem_fence_reg.{c,h} v3: - Rebased v4: - Fix building when DEBUG_GEM is enabled by reordering a bit. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com	2016-11-11 14:34:54 +02:00
Tvrtko Ursulin	0c40ce130e	drm/i915: Trim the object sg table At the moment we allocate enough sg table entries assuming we will not be able to do any coalescing. But since in practice we most often can, and more so very effectively, this ends up wasting a lot of memory. A simple and effective way of trimming the over-allocated entries is to copy the table over to a new one allocated to the exact size. Experiments on my freshly logged and idle desktop (KDE) showed that by doing this we can save approximately 1 MiB of RAM, or when running a typical benchmark like gl_manhattan I have even seen a 6 MiB saving. More complicated techniques such as only copying the last used page and freeing the rest are left to the reader. v2: * Update commit message. * Use temporary sg_table on stack. (Chris Wilson) v3: * Commit message update. * Comment added. * Replace memcpy with copy assignment. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478704423-7447-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-11-10 09:29:25 +00:00
Chris Wilson	d0da48cf92	drm/i915: Remove chipset flush after cache flush We always flush the chipset prior to executing with the GPU, so we can skip the flush during ordinary domain management. This should help mitigate some of the potential performance regressions, but likely trivial, from doing the flush unconditionally before execbuf introduced in commit `dcd79934b0` ("drm/i915: Unconditionally flush any chipset buffers before execbuf") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161106130001.9509-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-08 11:04:04 +00:00
Imre Deak	31ab49abde	drm/i915: Add assert for no pending GPU requests during suspend/resume in LR mode During resume we will reset the SW/HW tracking for each ring head/tail pointers and so are not prepared to replay any pending requests (as opposed to GPU reset time). Add an assert for this both to the suspend and the resume code. v2: - Check for ELSP port idle already during suspend and check !gt.awake during resume. (Chris) v3: - Move the !gt.awake check to i915_gem_resume(). v4: - s/intel_lr_engines_idle/intel_execlists_idle/ (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-4-git-send-email-imre.deak@intel.com	2016-11-07 14:48:05 +02:00
Imre Deak	0cb5670baa	drm/i915: Make sure engines are idle during GPU idling in LR mode We assume that the GPU is idle once receiving the seqno via the last request's user interrupt. In execlist mode the corresponding context completed interrupt can be delayed though and until this latter interrupt arrives we consider the request to be pending on the ELSP submit port. This can cause a problem during system suspend where this last request will be seen by the resume code as still pending. Such pending requests are normally replayed after a GPU reset, but during resume we reset both SW and HW tracking of the ring head/tail pointers, so replaying the pending request with its stale tail pointer will leave the ring in an inconsistent state. A subsequent request submission can lead then to the GPU executing from uninitialized area in the ring behind the above stale tail pointer. Fix this by making sure any pending request on the ELSP port is completed before suspending. I used a polling wait since the completion time I measured was <1ms and since normally we only need to wait during system suspend. GPU idling during runtime suspend is scheduled with a delay (currently 50-100ms) after the retirement of the last request at which point the context completed interrupt must have arrived already. The chance of this bug was increased by commit `1c777c5d1d` Author: Imre Deak <imre.deak@intel.com> Date: Wed Oct 12 17:46:37 2016 +0300 drm/i915/hsw: Fix GPU hang during resume from S3-devices state but it could happen even without the explicit GPU reset, since we disable interrupts afterwards during the suspend sequence. v2: - Do an unlocked poll-wait first. (Chris) v3-4: - s/intel_lr_engines_idle/intel_execlists_idle/ and move i915.enable_execlists check to the new helper. (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98470 Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-3-git-send-email-imre.deak@intel.com	2016-11-07 14:48:05 +02:00
Imre Deak	93c97dc17f	drm/i915: Avoid early GPU idling due to race with new request There is a small race where a new request can be submitted and retired after the idle worker started to run which leads to idling the GPU too early. Fix this by deferring the idling to the pending instance of the worker. This scenario was pointed out by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-2-git-send-email-imre.deak@intel.com	2016-11-07 14:48:04 +02:00
Chris Wilson	767a222e47	drm/i915: Limit Valleyview and earlier to only using mappable scanout Valleyview appears to be limited to only scanning out from the first 512MiB of the Global GTT. Lets presume that this behaviour was inherited from the display block copied from g4x (not Ironlake) and all earlier generations are similarly affected, though testing suggests different symptoms. For simplicity, impose that these platforms must scanout from the mappable region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this catches Cherryview which does not appear to be limited to the low aperture for its scanout.) v2: Use HAS_GMCH_DISPLAY() to more clearly convey my intent about limiting this workaround to the old style of display engine. v3: Update changelog to reflect testing by Ville Syrjälä v4: Include the changes to the comments as well Reported-by: Luis Botello <luis.botello.ortega@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98036 Fixes: `2efb813d53` ("drm/i915: Fallback to using unmappable memory for scanout") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+ Link: http://patchwork.freedesktop.org/patch/msgid/20161107110128.28762-1-chris@chris-wilson.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com	2016-11-07 12:25:50 +00:00
Chris Wilson	0ef723cbce	drm/i915: Round tile chunks up for constructing partial VMAs When we split a large object up into chunks for GTT faulting (because we can't fit the whole object into the aperture) we have to align our cuts with the fence registers. Each partial VMA must cover a complete set of tile rows or the offset into each partial VMA is not aligned with the whole image. Currently we enforce a minimum size on each partial VMA, but this minimum size itself was not aligned to the tile row causing distortion. Reported-by: Andreas Reis <andreas.reis@gmail.com> Reported-by: Chris Clayton <chris2553@googlemail.com> Reported-by: Norbert Preining <preining@logic.at> Tested-by: Norbert Preining <preining@logic.at> Tested-by: Chris Clayton <chris2553@googlemail.com> Fixes: `03af84fe7f` ("drm/i915: Choose partial chunksize based on tile row size") Fixes: `a61007a83a` ("drm/i915: Fix partial GGTT faulting") # enabling patch Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98402 Testcase: igt/gem_mmap_gtt/medium-copy-odd Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+ Link: http://patchwork.freedesktop.org/patch/msgid/20161107105443.27855-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-07 11:39:59 +00:00
Chris Wilson	2c3a3f44dc	drm/i915: Fix pages pin counting around swizzle quirk commit `bc0629a767` ("drm/i915: Track pages pinned due to swizzling quirk") fixed one problem, but revealed a whole lot more. The root cause of the pin count mismatch for the swizzle quirk (for L-shaped memory on gen3/4) was that we were incrementing the pages_pin_count upon getting the backing pages but then overwriting the pages_pin_count to set it to 1 afterwards. With a little bit of adjustment to satisfy the GEM_BUG_ON sanitychecks, the fix is to replace the explicit atomic_set with an atomic_inc. v2: Consistently use atomics (not mix atomics and helpers) within the lowlevel get_pages routines. This makes the atomic operations much clearer. Fixes: `1233e2db19` ("drm/i915: Move object backing storage manipulation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161104103001.27643-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-04 11:55:39 +00:00
Tvrtko Ursulin	a933568eb6	drm/i915: Tidy slab cache allocations We can use the preferred KMEM_CACHE helper for brevity. Also simplifiy error unwind by only setting the ENOMEM error code once. v2: Add forgotten changes. (Joonas Lahtinen) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1) Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478099699-28652-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-11-03 13:52:57 +00:00
Joonas Lahtinen	56cea32382	drm/i915: Unify global_list into global_link $ sed -i -r 's/\bglobal_list\b/global_link/g' .c .h Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478081764-8058-1-git-send-email-joonas.lahtinen@linux.intel.com	2016-11-02 15:17:13 +02:00
Tvrtko Ursulin	3599a91cc8	drm/i915: Allow shrinking of userptr objects once again Commit `1bec9b0bda` ("drm/i915/shrinker: Only shmemfs objects are backed by swap") stopped considering the userptr objects in shrinker callbacks. Restore that so idle userptr objects can be discarded in order to free up memory. One change further to what was introduced in `1bec9b0bda` is to start considering userptr objects in oom but that should also be a correct thing to do. v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `1bec9b0bda` ("drm/i915/shrinker: Only shmemfs objects are backed by swap") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: <stable@vger.kernel.org> Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-11-01 16:35:26 +00:00
Ville Syrjälä	a26e523921	drm/i915: Pass dev_priv to IS_BROADWATER/IS_CRESTLINE Unify our approach to things by passing around dev_priv instead of dev. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1477946245-14134-21-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-01 16:40:38 +02:00
Chris Wilson	548625ee8f	drm/i915: Improve lockdep tracking for obj->mm.lock The shrinker may appear to recurse into obj->mm.lock as the shrinker may be called from a direct reclaim path whilst handling get_pages. We filter out recursing on the same obj->mm.lock by inspecting obj->mm.pages, but we do want to take the lock on a second object in order to reap their pages. lockdep spots the recursion on the same lockclass and needs annotation to avoid a false positive. To keep the two paths distinct, create an enum to indicate which subclass of obj->mm.lock we are using. This removes the false positive and avoids masking real bugs. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-01 13:01:44 +00:00
Chris Wilson	db6c2b4151	drm/i915: Store the vma in an rbtree under the object With full-ppgtt one of the main bottlenecks is the lookup of the VMA underneath the object. For execbuf there is merit in having a very fast direct lookup of ctx:handle to the vma using a hashtree, but that still leaves a large number of other lookups. One way to speed up the lookup would be to use a rhashtable, but that requires extra allocations and may exhibit poor worse case behaviour. An alternative is to use an embedded rbtree, i.e. no extra allocations and deterministic behaviour, but at the slight cost of O(lgN) lookups (instead of O(1) for rhashtable). The major of such tree will be very shallow and so not much slower, and still scales much, much better than the current unsorted list. v2: Bump vma_compare() to return a long, as we return the result of comparing two pointers. References: https://bugs.freedesktop.org/show_bug.cgi?id=87726 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk	2016-11-01 13:00:40 +00:00
Chris Wilson	bc0629a767	drm/i915: Track pages pinned due to swizzling quirk If we have a tiled object and an unknown CPU swizzle pattern, we pin the pages to prevent the object from being swapped out (and us corrupting the contents as we do not know the access pattern and so cannot convert it to linear and back to tiled on reuse). This requires us to remember to drop the extra pinning when freeing the object, or else we trigger warnings about the pin leak. In commit `fbbd37b36f` ("drm/i915: Move object release to a freelist + worker"), the object free path was deferred to a worker, but the unpinning of the quirk, along with marking the object as reclaimable, was left on the immediate path (so that if required we could reclaim the pages under memory pressure as early as possible). However, this split introduced a bug where the pages were no longer being unpinned if they were marked as unneeded. [ 231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915] [ 231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj)) [ 231.800405] Modules linked in: [ 231.800406] snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915] [ 231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G U 4.9.0-rc2-CI-CI_DRM_1780+ #1 [ 231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009 [ 231.800456] Workqueue: events __i915_gem_free_work [i915] [ 231.800459] ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000 [ 231.800465] ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000 [ 231.800469] ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0 [ 231.800474] Call Trace: [ 231.800479] [<ffffffff8142dd65>] dump_stack+0x67/0x92 [ 231.800484] [<ffffffff8107e4e6>] __warn+0xc6/0xe0 [ 231.800487] [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50 [ 231.800491] [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340 [ 231.800520] [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915] [ 231.800548] [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915] [ 231.800552] [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0 [ 231.800555] [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0 [ 231.800558] [<ffffffff8109c789>] worker_thread+0x49/0x490 [ 231.800561] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0 [ 231.800563] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0 [ 231.800566] [<ffffffff810a2aab>] kthread+0xeb/0x110 [ 231.800569] [<ffffffff810a29c0>] ? kthread_park+0x60/0x60 [ 231.800573] [<ffffffff818164a7>] ret_from_fork+0x27/0x40 Moving to a separate flag for tracking the quirked pin is overkill for the bug (since we only have to interchange the two tests in i915_gem_free_object) but it does reduce a complicated test on all objects and provide a sanitycheck for uncommon code paths. Fixes: `fbbd37b36f` ("drm/i915: Move object release to a freelist + worker") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-2-chris@chris-wilson.co.uk	2016-11-01 10:48:41 +00:00
Chris Wilson	cb399eabc4	drm/i915: Avoid accessing request->timeline outside of its lifetime Whilst waiting on a request, we may do so without holding any locks or any guards beyond a reference to the request. In order to avoid taking locks within request deallocation, we drop references to its timeline (via the context and ppgtt) upon retirement. We should avoid chasing such pointers outside of their control, in particular we inspect the request->timeline to see if we may restore the RPS waitboost for a client. If we instead look at the engine->timeline, we will have similar behaviour on both full-ppgtt and !full-ppgtt systems and reduce the amount of reward we give towards stalling clients (i.e. only if the client stalls and the GPU is uncontended does it reclaim its boost). This restores behaviour back to pre-timelines, whilst fixing: [ 645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0 [ 645.078577] Read of size 4 by task gem_exec_schedu/28408 [ 645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64 [ 645.078724] Hardware name: / , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015 [ 645.078816] ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200 [ 645.078998] ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200 [ 645.079172] ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796 [ 645.079345] Call Trace: [ 645.079404] [<ffffffff8143d059>] dump_stack+0x68/0x9f [ 645.079467] [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70 [ 645.079534] [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0 [ 645.079601] [<ffffffff8122a244>] kasan_report+0x34/0x40 [ 645.079676] [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0 [ 645.079741] [<ffffffff81229951>] __asan_load4+0x61/0x80 [ 645.079807] [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0 [ 645.079876] [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590 [ 645.079944] [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500 [ 645.080016] [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0 [ 645.080084] [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210 [ 645.080157] [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0 [ 645.080226] [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690 [ 645.080296] [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690 [ 645.080366] [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0 [ 645.080433] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640 [ 645.080508] [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0 [ 645.080603] [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120 [ 645.080670] [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210 [ 645.080738] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20 [ 645.080804] [<ffffffff8120268c>] ? do_mmap+0x47c/0x580 [ 645.080871] [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140 [ 645.080938] [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150 [ 645.081011] [<ffffffff81108c53>] ? up_write+0x23/0x50 [ 645.081078] [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140 [ 645.081145] [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90 [ 645.081214] [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0 [ 645.082030] [<ffffffff81288408>] ? __fget+0x168/0x250 [ 645.082106] [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1 [ 645.082176] [<ffffffff81288592>] ? __fget_light+0xa2/0xc0 [ 645.082242] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70 [ 645.082309] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192 [ 645.082431] Allocated: [ 645.082480] PID = 28408 [ 645.082535] [ 645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20 [ 645.082623] [ 645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0 [ 645.082716] [ 645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0 [ 645.082817] [ 645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220 [ 645.082908] [ 645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560 [ 645.083027] [ 645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0 [ 645.083152] [ 645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640 [ 645.083243] [ 645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20 [ 645.083334] [ 645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70 [ 645.083432] [ 645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 645.083551] Freed: [ 645.083599] PID = 27629 [ 645.083648] [ 645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20 [ 645.083738] [ 645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0 [ 645.083830] [ 645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0 [ 645.083922] [ 645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170 [ 645.084021] [ 645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180 [ 645.084139] [ 645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230 [ 645.084257] [ 645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230 [ 645.084380] [ 645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630 [ 645.084500] [ 645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280 [ 645.085313] [ 645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0 [ 645.085440] [ 645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920 [ 645.085532] [ 645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840 [ 645.085622] [ 645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0 [ 645.085718] [ 645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40 [ 645.085811] Memory state around the buggy address: [ 645.085869] ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 645.085956] ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 645.086138] ^ [ 645.086193] ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 645.086283] ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb v2: Add a comment to document the hint like nature of intel_engine_last_submit() Fixes: `73cb97010d` ("drm/i915: Combine seqno + tracking into a global timeline struct") Fixes: `80b204bce8` ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-1-chris@chris-wilson.co.uk	2016-11-01 10:48:40 +00:00
Chris Wilson	7d5d59e527	drm/i915: Use the full hammer when shutting down the rcu tasks To flush all call_rcu() tasks (here from i915_gem_free_object()) we need to call rcu_barrier() (not synchronize_rcu()). If we don't then we may still have objects being freed as we continue to teardown the driver - in particular, the recently released rings may race with the memory manager shutdown resulting in sporadic: [ 142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40 [ 142.217187] Memory manager not clean during takedown. [ 142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel] [ 142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1 [ 142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013 [ 142.217200] ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000 [ 142.217202] ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48 [ 142.217204] ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0 [ 142.217206] Call Trace: [ 142.217209] [<ffffffff8142dd65>] dump_stack+0x67/0x92 [ 142.217211] [<ffffffff8107e4e6>] __warn+0xc6/0xe0 [ 142.217213] [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50 [ 142.217214] [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40 [ 142.217236] [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915] [ 142.217246] [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915] [ 142.217253] [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915] [ 142.217260] [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915] [ 142.217268] [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915] [ 142.217269] [<ffffffff8147d214>] pci_device_remove+0x34/0xb0 [ 142.217271] [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150 [ 142.217272] [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0 [ 142.217273] [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0 [ 142.217274] [<ffffffff8157c787>] driver_unregister+0x27/0x50 [ 142.217276] [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70 [ 142.217287] [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915] [ 142.217289] [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0 [ 142.217291] [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 142.217292] ---[ end trace 6fd164859c154772 ]--- [ 142.217505] [drm:show_leaks] ERROR node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at [<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0 [<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360 [<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915] [<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915] [<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915] [<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915] [<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915] [<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915] [<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915] [<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915] [<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915] [<ffffffff8147d315>] pci_device_probe+0x85/0xf0 [<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430 [<ffffffff8157baee>] __driver_attach+0xde/0xe0 In particular note that the node was being poisoned as we inspected the list, a clear indication that the object is being freed as we make the assertion. v2: Don't loop, just assert that we do all the work required as that will be better at detecting further errors. Fixes: `fbbd37b36f` ("drm/i915: Move object release to a freelist + worker") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161101084843.3961-1-chris@chris-wilson.co.uk	2016-11-01 09:30:08 +00:00
Chris Wilson	80b204bce8	drm/i915: Enable multiple timelines With the infrastructure converted over to tracking multiple timelines in the GEM API whilst preserving the efficiency of using a single execution timeline internally, we can now assign a separate timeline to every context with full-ppgtt. v2: Add a comment to indicate the xfer between timelines upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk	2016-10-28 20:53:57 +01:00
Chris Wilson	28176ef4cf	drm/i915: Reserve space in the global seqno during request allocation A restriction on our global seqno is that they cannot wrap, and that we cannot use the value 0. This allows us to detect when a request has not yet been submitted, its global seqno is still 0, and ensures that hardware semaphores are monotonic as required by older hardware. To meet these restrictions when we defer the assignment of the global seqno, we must check that we have an available slot in the global seqno space during request construction. If that test fails, we wait for all requests to be completed and reset the hardware back to 0. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk	2016-10-28 20:53:56 +01:00
Chris Wilson	65e4760e39	drm/i915: Introduce a global_seqno for each request Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking execution of requests simple, and vital for preserving a single waiter (i.e. so that we can order the waiters so that only the earliest to wakeup need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) and that used internally for execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	3033acab07	drm/i915: Queue the idling context switch after all other timelines Before suspend, we wait for the switch to the kernel context. In order for all the other context images to be complete upon suspend, that switch must be the last operation by the GPU (i.e. this idling request must not overtake any pending requests). To make this request execute last, we make it depend on every other inflight request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk	2016-10-28 20:53:52 +01:00
Chris Wilson	73cb97010d	drm/i915: Combine seqno + tracking into a global timeline struct Our timelines are more than just a seqno. They also provide an ordered list of requests to be executed. Due to the restriction of handling individual address spaces, we are limited to a timeline per address space but we use a fence context per engine within. Our first step to introducing independent timelines per context (i.e. to allow each context to have a queue of requests to execute that have a defined set of dependencies on other requests) is to provide a timeline abstraction for the global execution queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk	2016-10-28 20:53:51 +01:00
Chris Wilson	d07f0e59b2	drm/i915: Move GEM activity tracking into a common struct reservation_object In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.) v2: Reimplement busy-ioctl (by walking the reservation object), postpone the ABI change for another day. Similarly use the reservation object to find the last_write request (if active and from i915) for choosing display CS flips. Caveats: * busy-ioctl: busy-ioctl only reports on the native fences, it will not warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is being rendered to by external fences. It also will not report the same busy state as wait-ioctl (or polling on the dma-buf) in the same circumstances. On the plus side, it does retain reporting of which i915 engines are engaged with this object. * non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering, see "drm/i915: Restore nonblocking awaits for modesetting" * dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 66s), mainly due to atomic operations (maintaining the fence refcounts). * loss of object-level retirement callbacks, emulated by VMA retirement tracking. * minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk	2016-10-28 20:53:50 +01:00
Chris Wilson	f0cd518206	drm/i915: Use lockless object free Having moved the locked phase of freeing an object to a separate worker, we can now declare to the core that we only need the unlocked variant of driver->gem_free_object, and can use the simple unreference internally. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk	2016-10-28 20:53:50 +01:00
Chris Wilson	fbbd37b36f	drm/i915: Move object release to a freelist + worker We want to hide the latency of releasing objects and their backing storage from the submission, so we move the actual free to a worker. This allows us to switch to struct_mutex freeing of the object in the next patch. Furthermore, if we know that the object we are dereferencing remains valid for the duration of our access, we can forgo the usual synchronisation barriers and atomic reference counting. To ensure this we defer freeing an object til after an RCU grace period, such that any lookup of the object within an RCU read critical section will remain valid until after we exit that critical section. We also employ this delay for rate-limiting the serialisation on reallocation - we have to slow down object creation in order to prevent resource starvation (in particular, files). v2: Return early in i915_gem_tiling() ioctl to skip over superfluous work on error. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk	2016-10-28 20:53:49 +01:00
Chris Wilson	40e62d5d6b	drm/i915: Acquire the backing storage outside of struct_mutex in set-domain As we can locklessly (well struct_mutex-lessly) acquire the backing storage, do so in set-domain-ioctl to reduce the contention on the struct_mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-18-chris@chris-wilson.co.uk	2016-10-28 20:53:49 +01:00
Chris Wilson	fe115628d5	drm/i915: Implement pwrite without struct-mutex We only need struct_mutex within pwrite for a brief window where we need to serialise with rendering and control our cache domains. Elsewhere we can rely on the backing storage being pinned, and forgive userspace any races against us. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-17-chris@chris-wilson.co.uk	2016-10-28 20:53:48 +01:00
Chris Wilson	bb6dc8d96b	drm/i915: Implement pread without struct-mutex We only need struct_mutex within pread for a brief window where we need to serialise with rendering and control our cache domains. Elsewhere we can rely on the backing storage being pinned, and forgive userspace any races against us. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-16-chris@chris-wilson.co.uk	2016-10-28 20:53:48 +01:00
Chris Wilson	1233e2db19	drm/i915: Move object backing storage manipulation to its own locking Break the allocation of the backing storage away from struct_mutex into a per-object lock. This allows parallel page allocation, provided we can do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT fault), i.e. before execbuf! The increased cost of the atomic counters are hidden behind i915_vma_pin() for the typical case of execbuf, i.e. as the object is typically bound between execbufs, the page_pin_count is static. The cost will be felt around set-domain and pwrite, but offset by the improvement from reduced struct_mutex contention. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk	2016-10-28 20:53:47 +01:00
Chris Wilson	03ac84f183	drm/i915: Pass around sg_table to get_pages/put_pages backend The plan is to move obj->pages out from under the struct_mutex into its own per-object lock. We need to prune any assumption of the struct_mutex from the get_pages/put_pages backends, and to make it easier we pass around the sg_table to operate on rather than indirectly via the obj. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk	2016-10-28 20:53:47 +01:00
Chris Wilson	a4f5ea64f0	drm/i915: Refactor object page API The plan is to make obtaining the backing storage for the object avoid struct_mutex (i.e. use its own locking). The first step is to update the API so that normal users only call pin/unpin whilst working on the backing storage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk	2016-10-28 20:53:46 +01:00
Chris Wilson	96d7763452	drm/i915: Use a radixtree for random access to the object's backing storage A while ago we switched from a contiguous array of pages into an sglist, for that was both more convenient for mapping to hardware and avoided the requirement for a vmalloc array of pages on every object. However, certain GEM API calls (like pwrite, pread as well as performing relocations) do desire access to individual struct pages. A quick hack was to introduce a cache of the last access such that finding the following page was quick - this works so long as the caller desired sequential access. Walking backwards, or multiple callers, still hits a slow linear search for each page. One solution is to store each successful lookup in a radix tree. v2: Rewrite building the radixtree for clarity, hopefully. v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from within an atomic section and so relax the allocation context to a simple GFP_KERNEL and mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk	2016-10-28 20:53:45 +01:00
Chris Wilson	4c7d62c6b8	drm/i915: Markup GEM API with lockdep asserts Add lockdep_assert_held(struct_mutex) to the API preamble of the internal GEM interfaces. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk	2016-10-28 20:53:45 +01:00
Chris Wilson	f8a7fde456	drm/i915: Defer active reference until required We only need the active reference to keep the object alive after the handle has been deleted (so as to prevent a synchronous gem_close). Why then pay the price of a kref on every execbuf when we can insert that final active ref just in time for the handle deletion? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk	2016-10-28 20:53:43 +01:00
Chris Wilson	e95433c73a	drm/i915: Rearrange i915_wait_request() accounting with callers Our low-level wait routine has evolved from our generic wait interface that handled unlocked, RPS boosting, waits with time tracking. If we push our GEM fence tracking to use reservation_objects (required for handling multiple timelines), we lose the ability to pass the required information down to i915_wait_request(). However, if we push the extra functionality from i915_wait_request() to the individual callsites (i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use of those extras, we can both simplify our low level wait and prepare for extending the GEM interface for use of reservation_objects. v2: Rewrite i915_wait_request() kerneldocs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk	2016-10-28 20:53:43 +01:00
Chris Wilson	c92ac094a9	drm/i915: Remove superfluous wait_for_error() from throttle-ioctl The throttle-ioctl never touches the struct_mutex. It does, however, as part of its ABI report whether the hardware is terminally wedged. For that purposes, it only has to report the current state and not incur the cost of checking/waiting every invocation, as we do not have to wait for a reset before waiting on a request to ensure completion (that is baked into the wait request implementation). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-3-chris@chris-wilson.co.uk	2016-10-28 20:53:42 +01:00
Chris Wilson	b52992c06c	drm/i915: Support asynchronous waits on struct fence from i915_gem_request We will need to wait on DMA completion (as signaled via struct fence) before executing our i915_gem_request. Therefore we want to expose a method for adding the await on the fence itself to the request. v2: Add a comment detailing a failure to handle a signal-on-any fence-array. v3: Pretend that magic numbers don't exist. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk	2016-10-28 20:53:41 +01:00
Tvrtko Ursulin	3299e7e434	drm/i915: Remove two invalid warns Objects can have multiple VMAs used for display in which case assertion that objects must not be pinned for display more times than the current VMA is incorrect. v2: Commit message update. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `058d88c433` ("drm/i915: Track pinned VMA") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1477413635-3876-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-10-26 09:04:56 +01:00
Tvrtko Ursulin	07ee2bce6a	drm/i915: Rotated view does not need a fence We do not need to set up a fence for the rotated view. Display does not need it and no one can access it. v2: Move code to __i915_vma_set_map_and_fenceable. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `05a20d098d` ("drm/i915: Move map-and-fenceable tracking to the VMA") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-10-26 09:04:55 +01:00
Chris Wilson	de867c20b9	drm/i915: Include the kernel uptime in the error state As well as knowing when the error occurred, it is more interesting to me to know how long after booting the error occurred, and for good measure record the time since last hw initialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161025121602.1457-1-chris@chris-wilson.co.uk	2016-10-25 13:22:43 +01:00
Daniel Vetter	f9bf1d97e8	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued Backmerge because Chris Wilson needs the very latest&greates of Gustavo Padovan's sync_file work, specifically the refcounting changes from: commit `30cd85dd6e` Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Date: Wed Oct 19 15:48:32 2016 -0200 dma-buf/sync_file: hold reference to fence when creating sync_file Also good to sync in general since git tends to get confused with the cherry-picking going on. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2016-10-25 08:57:53 +02:00
Dave Airlie	5481e27f6f	Merge tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel into drm-next - first slice of the gvt device model (Zhenyu et al) - compression support for gpu error states (Chris) - sunset clause on gpu errors resulting in dmesg noise telling users how to report them - .rodata diet from Tvrtko - switch over lots of macros to only take dev_priv (Tvrtko) - underrun suppression for dp link training (Ville) - lspcon (hmdi 2.0 on skl/bxt) support from Shashank Sharma, polish from Jani - gen9 wm fixes from Paulo&Lyude - updated ddi programming for kbl (Rodrigo) - respect alternate aux/ddc pins (from vbt) for all ddi ports (Ville) * tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel: (227 commits) drm/i915: Update DRIVER_DATE to 20161024 drm/i915: Stop setting SNB min-freq-table 0 on powersave setup drm/i915/dp: add lane_count check in intel_dp_check_link_status drm/i915: Fix whitespace issues drm/i915: Clean up DDI DDC/AUX CH sanitation drm/i915: Respect alternate_ddc_pin for all DDI ports drm/i915: Respect alternate_aux_channel for all DDI ports drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7 drm/i915: KBL - Recommended buffer translation programming for DisplayPort drm/i915: Move down skl/kbl ddi iboost and n_edp_entires fixup drm/i915: Add a sunset clause to GPU hang logging drm/i915: Stop reporting error details in dmesg as well as the error-state drm/i915/gvt: do not ignore return value of create_scratch_page drm/i915/gvt: fix spare warnings on odd constant _Bool cast drm/i915/gvt: mark symbols static where possible drm/i915/gvt: fix sparse warnings on different address spaces drm/i915/gvt: properly access enabled intel_engine_cs drm/i915/gvt: Remove defunct vmap_batch() drm/i915/gvt: Use common mapping routines for shadow_bb object drm/i915/gvt: Use common mapping routines for indirect_ctx object ...	2016-10-25 16:39:43 +10:00
Chris Wilson	7c108fd8fe	drm/i915: Move fence cancellation to runtime suspend At the moment, we have dependency on the RPM as a barrier itself in both i915_gem_release_all_mmaps() and i915_gem_restore_fences(). i915_gem_restore_fences() is also called along !runtime pm paths, but we can move the markup of lost fences alongside releasing the mmaps into a common i915_gem_runtime_suspend(). This has the advantage of locating all the tricky barrier dependencies into one location. v2: Just mark the fence as invalid (fence->dirty) so that upon waking we will be sure to clear the fence after use, or restore it to the correct value before use. This makes sure that if the fence is left intact across the sleep, we do not leave it pointing to a region of GTT for the next unsuspecting user. Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Imre Deak <imre.deak@linux.intel.com> Reviewed-by: Imre Deak <imre.deak@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-5-chris@chris-wilson.co.uk	2016-10-24 13:45:38 +01:00
Chris Wilson	3594a3e21f	drm/i915: Remove superfluous locking around userfault_list Now that we have reduced the access to the list to either (a) under the struct_mutex whilst holding the RPM wakeref (so that concurrent writers to the list are serialised by struct_mutex) and (b) under the atomic runtime suspend (which cannot run concurrently with any other accessor due to the atomic nature of the runtime suspend) we can remove the extra locking around the list itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-3-chris@chris-wilson.co.uk	2016-10-24 13:45:36 +01:00
Chris Wilson	9c870d0367	drm/i915: Use RPM as the barrier for controlling user mmap access We can remove the false coupling between RPM and struct mutex by the observation that we can use the RPM wakeref as the barrier around user mmap access. That is as we tear down the user's PTE atomically from within rpm suspend and then to fault in new PTE requires the rpm wakeref, means that no user access is possible through those PTE without RPM being awake. Having made that observation, we can then remove the presumption of having to take rpm outside of struct_mutex and so allow fine grained acquisition of a wakeref around hw access rather than having to remember to acquire the wakeref early on. v2: Rejig placement of the new intel_runtime_pm_get() to be as tight as possible around the GTT pread/pwrite. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-2-chris@chris-wilson.co.uk	2016-10-24 13:45:35 +01:00
Chris Wilson	275f039db5	drm/i915: Move user fault tracking to a separate list We want to decouple RPM and struct_mutex, but currently RPM has to walk the list of bound objects and remove userspace mmapping before we suspend (otherwise userspace may continue to access the GTT whilst it is powered down). This currently requires the struct_mutex to walk the bound_list, but if we move that to a separate list and lock we can take the first step towards removing the struct_mutex. v2: Split runtime suspend unmapping vs regular unmapping, to make the locking (and barriers) clearer. Add the object to the userfault_list prior to inserting the first PTE, the race between add/revoke depends upon struct_mutex for regular unmappings and rpm for runtime-suspend. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1 Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-1-chris@chris-wilson.co.uk	2016-10-24 13:45:35 +01:00
Chris Wilson	4ff340f061	drm/i915: Limit the scattergather coalescing to 32bits The scattergather list uses a 32bit size counter, we should avoid exceeding it. v2: Also we should use unsigned int to match sg->length. Fixes: `871dfbd67d` ("drm/i915: Allow compaction upto SWIOTLB max segment size") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-3-chris@chris-wilson.co.uk	2016-10-18 14:22:27 +01:00
Chris Wilson	b4bcbe2a90	drm/i915: Document our internal limit on object size In many places, we try to count pages using a 32 bit integer. That implies if we are asked to create an object larger than 43bits, we will subtly crash much later. Catch this on the boundary, and add a warning to remind ourselves later on our exabyte systems. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-2-chris@chris-wilson.co.uk	2016-10-18 14:22:26 +01:00
Chris Wilson	3ef7f22893	drm/i915: Bump object bookkeeping to u64 from size_t Internally we allow for using more objects than a single process can allocate, i.e. we allow for a 64bit GPU address space even on a 32bit system. Using size_t may oveerflow. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-1-chris@chris-wilson.co.uk	2016-10-18 14:22:26 +01:00
Michał Winiarski	4fb84d991e	drm/i915: Remove unused "valid" parameter from pte_encode We never used any invalid ptes, those were put in place for a possibility of doing gpu faults. However our batchbuffers are not restricted in length, so everything needs to be pointing to something and thus out-of-bounds is pointing to scratch. Remove the valid flag as it is always true. v2: Expand commit msg, patch reorder (Mika) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com	2016-10-14 12:40:32 +01:00
Tvrtko Ursulin	5db9401983	drm/i915: Make IS_GEN macros only take dev_priv Saves 1416 bytes of .rodata strings. v2: Add parantheses around dev_priv. (Ville Syrjala) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476352990-2504-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-10-14 12:23:22 +01:00
Tvrtko Ursulin	772c2a519c	drm/i915: Make IS_HASWELL only take dev_priv Saves 2432 bytes of .rodata strings. v2: Add parantheses around dev_priv. (Ville Syrjala) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2016-10-14 12:23:19 +01:00
Tvrtko Ursulin	8652744b64	drm/i915: Make IS_BROADWELL only take dev_priv Saves 1808 bytes of .rodata strings. v2: Add parantheses around dev_priv. (Ville Syrjala) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2016-10-14 12:23:19 +01:00
Tvrtko Ursulin	fd6b8f43c9	drm/i915: Make IS_IVYBRIDGE only take dev_priv Saves 848 bytes of .rodata strings. v2: Add parantheses around dev_priv. (Ville Syrjala) v3: Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2016-10-14 12:23:19 +01:00
Tvrtko Ursulin	50a0bc9054	drm/i915: Make INTEL_DEVID only take dev_priv Saves 4472 bytes of .rodata strings. v2: Add parantheses around dev_priv. (Ville Syrjala) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2016-10-14 12:23:19 +01:00
Tvrtko Ursulin	6e266956a5	drm/i915: Make INTEL_PCH_TYPE & co only take dev_priv This saves 1872 bytes of .rodata strings. v2: * Rebase. * Add parantheses around dev_priv. (Ville Syrjala) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2016-10-14 12:23:19 +01:00
Akash Goel	3b3f1650b1	drm/i915: Allocate intel_engine_cs structure only for the enabled engines With the possibility of addition of many more number of rings in future, the drm_i915_private structure could bloat as an array, of type intel_engine_cs, is embedded inside it. struct intel_engine_cs engine[I915_NUM_ENGINES]; Though this is still fine as generally there is only a single instance of drm_i915_private structure used, but not all of the possible rings would be enabled or active on most of the platforms. Some memory can be saved by allocating intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory and continue using the static engine/ring IDs, 'engine' is defined as an array of pointers. struct intel_engine_cs engine[I915_NUM_ENGINES]; dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for i915.o file (but for i915.ko file text size remain same as 1193131 bytes). v2: - Remove the engine iterator field added in drm_i915_private structure, instead pass a local iterator variable to the for_each_engine* macros. (Chris) - Do away with intel_engine_initialized() and instead directly use the NULL pointer check on engine pointer. (Chris) v3: - Remove for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris) - Protect the access to Render engine Fault register with a NULL check, as engine specific init is done later in Driver load sequence. v4: - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris) - Kill the superfluous init_engine_lists(). v5: - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to allocation of intel_engine_cs structure. (Chris) v6: - Rebase. v7: - Optimize the for_each_engine_masked() macro. (Chris) - Change the type of 'iter' local variable to enum intel_engine_id. (Chris) - Rebase. v8: Rebase. v9: Rebase. v10: - For index calculation use engine ID instead of pointer based arithmetic in intel_engine_sync_index() as engine pointers are not contiguous now (Chris) - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas) - Use for_each_engine macro for cleanup in intel_engines_init() and remove check for NULL engine pointer in cleanup() routines. (Joonas) v11: Rebase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com	2016-10-14 09:58:43 +01:00
Chris Wilson	ad16d2ed8f	drm/i915: Skip unbinding large unmappable global buffers If the user requests a mappable binding to the global GTT, we will first unbind an existing mapping if it doesn't match. We will unbind even if there is no possibility that the object can fit in the mappable aperture. This may lead to a ping-pong migration of the object, for example igt/gem_exec_big. v2: Comment upon the reasoning, or lack thereof!, behind the choice of magic numbers. Testcase: igt/gem_exec_big Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161013085504.30705-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com	2016-10-13 15:47:47 +01:00
Imre Deak	1c777c5d1d	drm/i915/hsw: Fix GPU hang during resume from S3-devices state Currently resuming on HSW from S3 pm_test/devices state leads to an unrecoverable GPU hang. Resetting the GPU during suspend fixes this. For a full S3 cycle this change only means the reset happens earlier (before reaching S3). For S4 the reset will happen now both during the freeze and quiesce phases, which is a benefit since it will guarantee that the GPU is idle before creating and loading the hibernation image. Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1476283597-580-1-git-send-email-imre.deak@intel.com	2016-10-13 12:11:31 +03:00
Linus Torvalds	6b25e21fa6	Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Core: - Fence destaging work - DRIVER_LEGACY to split off legacy drm drivers - drm_mm refactoring - Splitting drm_crtc.c into chunks and documenting better - Display info fixes - rbtree support for prime buffer lookup - Simple VGA DAC driver Panel: - Add Nexus 7 panel - More simple panels i915: - Refactoring GEM naming - Refactored vma/active tracking - Lockless request lookups - Better stolen memory support - FBC fixes - SKL watermark fixes - VGPU improvements - dma-buf fencing support - Better DP dongle support amdgpu: - Powerplay for Iceland asics - Improved GPU reset support - UVD/VEC powergating support for CZ/ST - Preinitialised VRAM buffer support - Virtual display support - Initial SI support - GTT rework - PCI shutdown callback support - HPD IRQ storm fixes amdkfd: - bugfixes tilcdc: - Atomic modesetting support mediatek: - AAL + GAMMA engine support - Hook up gamma LUT - Temporal dithering support imx: - Pixel clock from devicetree - drm bridge support for LVDS bridges - active plane reconfiguration - VDIC deinterlacer support - Frame synchronisation unit support - Color space conversion support analogix: - PSR support - Better panel on/off support rockchip: - rk3399 vop/crtc support - PSR support vc4: - Interlaced vblank timing - 3D rendering CPU overhead reduction - HDMI output fixes tda998x: - HDMI audio ASoC support sunxi: - Allwinner A33 support - better TCON support msm: - DT binding cleanups - Explicit fence-fd support sti: - remove sti415/416 support etnaviv: - MMUv2 refactoring - GC3000 support exynos: - Refactoring HDMI DCC/PHY - G2D pm regression fix - Page fault issues with wait for vblank There is no nouveau work in this tree, as Ben didn't get a pull request in, and he was fighting moving to atomic and adding mst support, so maybe best it waits for a cycle" * tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits) drm/crtc: constify drm_crtc_index parameter drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next drm/i915/guc: Unwind GuC workqueue reservation if request construction fails drm/i915: Reset the breadcrumbs IRQ more carefully drm/i915: Force relocations via cpu if we run out of idle aperture drm/i915: Distinguish last emitted request from last submitted request drm/i915: Allow DP to work w/o EDID drm/i915: Move long hpd handling into the hotplug work drm/i915/execlists: Reinitialise context image after GPU hang drm/i915: Use correct index for backtracking HUNG semaphores drm/i915: Unalias obj->phys_handle and obj->userptr drm/i915: Just clear the mmiodebug before a register access drm/i915/gen9: only add the planes actually affected by ddb changes drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED drm/i915/bxt: Fix HDMI DPLL configuration drm/i915/gen9: fix the watermark res_blocks value drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations drm/i915/gen9: minimum scanlines for Y tile is not always 4 drm/i915/gen9: fix the WaWmMemoryReadLatency implementation drm/i915/kbl: KBL also needs to run the SAGV code ...	2016-10-11 18:12:22 -07:00
Chris Wilson	908b123225	drm/i915: Convert open-coded use of vma_pages() If we want to know how many pages a VMA spans, we can use vma_pages() to find out. We have one such invocation inside our faulthandler, so convert it. (We have two other that want the size in bytes rather than pages, food for future thought.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161011090656.29554-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>	2016-10-11 10:54:20 +01:00
Chris Wilson	871dfbd67d	drm/i915: Allow compaction upto SWIOTLB max segment size commit `1625e7e549` ("drm/i915: make compact dma scatter lists creation work with SWIOTLB backend") took a heavy handed approach to undo the scatterlist compaction in the face of SWIOTLB. (The compaction hit a bug whereby we tried to pass a segment larger than SWIOTLB could handle.) We can be a little more intelligent and try compacting the scatterlist up to the maximum SWIOTLB segment size (when using SWIOTLB). v2: Tidy sg_mark_end() and cpp Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> CC: Imre Deak <imre.deak@intel.com> CC: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161011082021.14606-2-chris@chris-wilson.co.uk	2016-10-11 10:15:01 +01:00
Chris Wilson	465350d0db	drm/i915: Remove self-harming shrink_all on get_pages_gtt fail When we notice the system under memory pressure, we try to evict some driver pages before asking the VM to shrink all caches. As a final step in that process, we tried to evict everything, including active buffers. This is harming ourselves, and we can mix shrinking all caches as well as our residual buffers (after the first pass of trying to shrink just our own buffers). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161011082021.14606-1-chris@chris-wilson.co.uk	2016-10-11 10:15:01 +01:00
Chris Wilson	105f1a65b0	drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next The conflict resolution of v4.8-rc8 backmerge to drm-next pulled back in a few lines of dead code due to the code movement around i915_gem_reset(), fix that up. Fixes: `ca09fb9f60` ("Merge tag 'v4.8-rc8' into drm-next") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161010125017.23911-1-chris@chris-wilson.co.uk	2016-10-10 16:12:21 +03:00
Chris Wilson	ec7ce653d9	drm/i915: Only shrink the unbound objects during freeze At the point of creating the hibernation image, the runtime power manage core is disabled - and using the rpm functions triggers a warn. i915_gem_shrink_all() tries to unbind objects, which requires device access and so tries to how an rpm reference triggering a warning: [ 44.235420] ------------[ cut here ]------------ [ 44.235424] WARNING: CPU: 2 PID: 2199 at drivers/gpu/drm/i915/intel_runtime_pm.c:2688 intel_runtime_pm_get_if_in_use+0xe6/0xf0 [ 44.235426] WARN_ON_ONCE(ret < 0) [ 44.235445] Modules linked in: ctr ccm arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib crc_ccitt mac80211 cmac cfg80211 btusb rfcomm bnep btrtl btbcm btintel bluetooth dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper cryptd snd_hda_codec hid_multitouch joydev snd_hda_core binfmt_misc i2c_hid serio_raw snd_pcm acpi_pad snd_timer snd i2c_designware_platform 8250_dw nls_iso8859_1 i2c_designware_core lpc_ich mfd_core soundcore usbhid hid psmouse ahci libahci [ 44.235447] CPU: 2 PID: 2199 Comm: kworker/u8:8 Not tainted 4.8.0-rc5+ #130 [ 44.235447] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A07 11/11/2015 [ 44.235450] Workqueue: events_unbound async_run_entry_fn [ 44.235453] 0000000000000000 ffff8801b2f7fb98 ffffffff81306c2f ffff8801b2f7fbe8 [ 44.235454] 0000000000000000 ffff8801b2f7fbd8 ffffffff81056c01 00000a801f50ecc0 [ 44.235456] ffff88020ce50000 ffff88020ce59b60 ffffffff81a60b5c ffffffff81414840 [ 44.235456] Call Trace: [ 44.235459] [<ffffffff81306c2f>] dump_stack+0x4d/0x6e [ 44.235461] [<ffffffff81056c01>] __warn+0xd1/0xf0 [ 44.235464] [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30 [ 44.235465] [<ffffffff81056c6f>] warn_slowpath_fmt+0x4f/0x60 [ 44.235468] [<ffffffff814e73ce>] ? pm_runtime_get_if_in_use+0x6e/0xa0 [ 44.235469] [<ffffffff81433526>] intel_runtime_pm_get_if_in_use+0xe6/0xf0 [ 44.235471] [<ffffffff81458a26>] i915_gem_shrink+0x306/0x360 [ 44.235473] [<ffffffff81343fd4>] ? pci_platform_power_transition+0x24/0x90 [ 44.235475] [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30 [ 44.235476] [<ffffffff81458dfb>] i915_gem_shrink_all+0x1b/0x30 [ 44.235478] [<ffffffff814560b3>] i915_gem_freeze_late+0x33/0x90 [ 44.235479] [<ffffffff81414877>] i915_pm_freeze_late+0x37/0x40 [ 44.235481] [<ffffffff814e9b8e>] dpm_run_callback+0x4e/0x130 [ 44.235483] [<ffffffff814ea5db>] __device_suspend_late+0xdb/0x1f0 [ 44.235484] [<ffffffff814ea70f>] async_suspend_late+0x1f/0xa0 [ 44.235486] [<ffffffff81077557>] async_run_entry_fn+0x37/0x150 [ 44.235488] [<ffffffff8106f518>] process_one_work+0x148/0x3f0 [ 44.235490] [<ffffffff8106f8eb>] worker_thread+0x12b/0x490 [ 44.235491] [<ffffffff8106f7c0>] ? process_one_work+0x3f0/0x3f0 [ 44.235492] [<ffffffff81074d09>] kthread+0xc9/0xe0 [ 44.235495] [<ffffffff816e257f>] ret_from_fork+0x1f/0x40 [ 44.235496] [<ffffffff81074c40>] ? kthread_park+0x60/0x60 [ 44.235497] ---[ end trace e438706b97c7f132 ]--- Alternatively, to actually shrink everything we have to do so slightly earlier in the hibernation process. To keep lockdep silent, we need to take struct_mutex for the shrinker even though we know that we are the only user during the freeze. Fixes: `7aab2d534e` ("drm/i915: Shrink objects prior to hibernation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-2-chris@chris-wilson.co.uk (cherry picked from commit `6a800eabba`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-10-10 16:06:36 +03:00
Chris Wilson	ac75694125	drm/i915: Restore current RPS state after reset Following commit `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") we no longer mark the context as lost on reset as we keep the requests (and contexts) alive. However, RPS remains reset and we need to restore the current state to match the in-flight requests. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97824 Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-1-chris@chris-wilson.co.uk (cherry picked from commit `f2a91d1a6f`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-10-10 16:06:35 +03:00
Chris Wilson	77c607013e	drm/i915: Double check hangcheck.seqno after reset Check that there was not a late recovery between us declaring the GPU hung and processing the reset. If the GPU did recover by itself, let the request remain on the active list and see if it hangs again! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-5-chris@chris-wilson.co.uk	2016-10-05 08:40:06 +01:00
Chris Wilson	9e60ab0387	drm/i915: Disable irqs across GPU reset Whilst we reset the GPU, we want to prevent execlists from submitting new work (which it does via an interrupt handler). To achieve this we disable the irq (and drain the irq tasklet) around the reset. When we enable it again afters, the interrupt queue should be empty and we can reinitialise from a known state without fear of the tasklet running concurrently. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-4-chris@chris-wilson.co.uk	2016-10-05 08:40:06 +01:00
Dave Airlie	ca09fb9f60	Linux 4.8-rc8 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJX6H4uAAoJEHm+PkMAQRiG5sMH/3yzrMiUCSokdS+cvY+jgKAG JS58JmRvBPz2mRaU3MRPBGRDeCz/Nc9LggL2ZcgM+E1ZYirlYyQfIED3lkqk5R07 kIN1wmb+kQhXyU4IY3fEX7joqyKC6zOy4DUChPkBQU0/0+VUmdVmcJvsuPlnMZtf g95m0BdYTui+eDezASRqOEp3Lb5ONL4c3ao4yBP0LHF033ctj3VJQiyi5uERPZJ0 5e6Mo7Wxn78t9WqJLQAiEH46kTwT2plNlxf3XXqTenfIdbWhqE873HPGeSMa3VQV VywXTpCpSPQsA8BYg66qIbebdKOhs9MOviHVfqDtwQlvwhjlBDya0gNHfI5fSy4= =Y/L5 -----END PGP SIGNATURE----- Merge tag 'v4.8-rc8' into drm-next Linux 4.8-rc8 There was a lot of fallout in the imx/amdgpu/i915 drivers, so backmerge it now to avoid troubles. * tag 'v4.8-rc8': (1442 commits) Linux 4.8-rc8 fault_in_multipages_readable() throws set-but-unused error mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing radix tree: fix sibling entry handling in radix_tree_descend() radix tree test suite: Test radix_tree_replace_slot() for multiorder entries fix memory leaks in tracing_buffers_splice_read() tracing: Move mutex to protect against resetting of seq data MIPS: Fix delay slot emulation count in debugfs MIPS: SMP: Fix possibility of deadlock when bringing CPUs online mm: delete unnecessary and unsafe init_tlb_ubc() huge tmpfs: fix Committed_AS leak shmem: fix tmpfs to handle the huge= option properly blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx MIPS: Fix pre-r6 emulation FPU initialisation arm64: kgdb: handle read-only text / modules arm64: Call numa_store_cpu_info() earlier. locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text nvme-rdma: only clear queue flags after successful connect i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended perf/core: Limit matching exclusive events to one PMU ...	2016-09-28 12:08:49 +10:00
Al Viro	4bce9f6ee8	get rid of separate multipage fault-in primitives * the only remaining callers of "short" fault-ins are just as happy with generic variants (both in lib/iov_iter.c); switch them to multipage variants, kill the "short" ones * rename the multipage variants to now available plain ones. * get rid of compat macro defining iov_iter_fault_in_multipage_readable by expanding it in its only user. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-09-27 18:12:24 -04:00
Chris Wilson	6a800eabba	drm/i915: Only shrink the unbound objects during freeze At the point of creating the hibernation image, the runtime power manage core is disabled - and using the rpm functions triggers a warn. i915_gem_shrink_all() tries to unbind objects, which requires device access and so tries to how an rpm reference triggering a warning: [ 44.235420] ------------[ cut here ]------------ [ 44.235424] WARNING: CPU: 2 PID: 2199 at drivers/gpu/drm/i915/intel_runtime_pm.c:2688 intel_runtime_pm_get_if_in_use+0xe6/0xf0 [ 44.235426] WARN_ON_ONCE(ret < 0) [ 44.235445] Modules linked in: ctr ccm arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib crc_ccitt mac80211 cmac cfg80211 btusb rfcomm bnep btrtl btbcm btintel bluetooth dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper cryptd snd_hda_codec hid_multitouch joydev snd_hda_core binfmt_misc i2c_hid serio_raw snd_pcm acpi_pad snd_timer snd i2c_designware_platform 8250_dw nls_iso8859_1 i2c_designware_core lpc_ich mfd_core soundcore usbhid hid psmouse ahci libahci [ 44.235447] CPU: 2 PID: 2199 Comm: kworker/u8:8 Not tainted 4.8.0-rc5+ #130 [ 44.235447] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A07 11/11/2015 [ 44.235450] Workqueue: events_unbound async_run_entry_fn [ 44.235453] 0000000000000000 ffff8801b2f7fb98 ffffffff81306c2f ffff8801b2f7fbe8 [ 44.235454] 0000000000000000 ffff8801b2f7fbd8 ffffffff81056c01 00000a801f50ecc0 [ 44.235456] ffff88020ce50000 ffff88020ce59b60 ffffffff81a60b5c ffffffff81414840 [ 44.235456] Call Trace: [ 44.235459] [<ffffffff81306c2f>] dump_stack+0x4d/0x6e [ 44.235461] [<ffffffff81056c01>] __warn+0xd1/0xf0 [ 44.235464] [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30 [ 44.235465] [<ffffffff81056c6f>] warn_slowpath_fmt+0x4f/0x60 [ 44.235468] [<ffffffff814e73ce>] ? pm_runtime_get_if_in_use+0x6e/0xa0 [ 44.235469] [<ffffffff81433526>] intel_runtime_pm_get_if_in_use+0xe6/0xf0 [ 44.235471] [<ffffffff81458a26>] i915_gem_shrink+0x306/0x360 [ 44.235473] [<ffffffff81343fd4>] ? pci_platform_power_transition+0x24/0x90 [ 44.235475] [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30 [ 44.235476] [<ffffffff81458dfb>] i915_gem_shrink_all+0x1b/0x30 [ 44.235478] [<ffffffff814560b3>] i915_gem_freeze_late+0x33/0x90 [ 44.235479] [<ffffffff81414877>] i915_pm_freeze_late+0x37/0x40 [ 44.235481] [<ffffffff814e9b8e>] dpm_run_callback+0x4e/0x130 [ 44.235483] [<ffffffff814ea5db>] __device_suspend_late+0xdb/0x1f0 [ 44.235484] [<ffffffff814ea70f>] async_suspend_late+0x1f/0xa0 [ 44.235486] [<ffffffff81077557>] async_run_entry_fn+0x37/0x150 [ 44.235488] [<ffffffff8106f518>] process_one_work+0x148/0x3f0 [ 44.235490] [<ffffffff8106f8eb>] worker_thread+0x12b/0x490 [ 44.235491] [<ffffffff8106f7c0>] ? process_one_work+0x3f0/0x3f0 [ 44.235492] [<ffffffff81074d09>] kthread+0xc9/0xe0 [ 44.235495] [<ffffffff816e257f>] ret_from_fork+0x1f/0x40 [ 44.235496] [<ffffffff81074c40>] ? kthread_park+0x60/0x60 [ 44.235497] ---[ end trace e438706b97c7f132 ]--- Alternatively, to actually shrink everything we have to do so slightly earlier in the hibernation process. To keep lockdep silent, we need to take struct_mutex for the shrinker even though we know that we are the only user during the freeze. Fixes: `7aab2d534e` ("drm/i915: Shrink objects prior to hibernation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-2-chris@chris-wilson.co.uk	2016-09-21 16:57:06 +01:00
Chris Wilson	f2a91d1a6f	drm/i915: Restore current RPS state after reset Following commit `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") we no longer mark the context as lost on reset as we keep the requests (and contexts) alive. However, RPS remains reset and we need to restore the current state to match the in-flight requests. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97824 Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-1-chris@chris-wilson.co.uk	2016-09-21 16:52:39 +01:00
Chris Wilson	7aab2d534e	drm/i915: Shrink objects prior to hibernation In an attempt to keep the hibernation image as same as possible, let's try and discard any unwanted pages and our own page arrays. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909190218.16831-1-chris@chris-wilson.co.uk	2016-09-09 20:07:46 +01:00
Chris Wilson	a2bc4695bb	drm/i915: Prepare object synchronisation for asynchronicity We are about to specialize object synchronisation to enable nonblocking execbuf submission. First we make a copy of the current object synchronisation for execbuffer. The general i915_gem_object_sync() will be removed following the removal of CS flips in the near future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk	2016-09-09 14:23:06 +01:00
Chris Wilson	5590af3e11	drm/i915: Drive request submission through fence callbacks Drive final request submission from a callback from the fence. This way the request is queued until all dependencies are resolved, at which point it is handed to the backend for queueing to hardware. At this point, no dependencies are set on the request, so the callback is immediate. A side-effect of imposing a heavier-irqsafe spinlock for execlist submission is that we lose the softirq enabling after scheduling the execlists tasklet. To compensate, we manually kickstart the softirq by disabling and enabling the bh around the fence signaling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk	2016-09-09 14:23:05 +01:00
Chris Wilson	821ed7df6e	drm/i915: Update reset path to fix incomplete requests Update reset path in preparation for engine reset which requires identification of incomplete requests and associated context and fixing their state so that engine can resume correctly after reset. The request that caused the hang will be skipped and head is reset to the start of breadcrumb. This allows us to resume from where we left-off. Since this request didn't complete normally we also need to cleanup elsp queue manually. This is vital if we employ nonblocking request submission where we may have a web of dependencies upon the hung request and so advancing the seqno manually is no longer trivial. ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS We change the way we count pending batches. Only the active context involved in the reset is marked as either innocent or guilty, and not mark the entire world as pending. By inspection this only affects igt/gem_reset_stats (which assumes implementation details) and not piglit. ARB_robustness gives this guide on how we expect the user of this interface to behave: * Provide a mechanism for an OpenGL application to learn about graphics resets that affect the context. When a graphics reset occurs, the OpenGL context becomes unusable and the application must create a new context to continue operation. Detecting a graphics reset happens through an inexpensive query. And with regards to the actual meaning of the reset values: Certain events can result in a reset of the GL context. Such a reset causes all context state to be lost. Recovery from such events requires recreation of all objects in the affected context. The current status of the graphics reset state is returned by enum GetGraphicsResetStatusARB(); The symbolic constant returned indicates if the GL context has been in a reset state at any point since the last call to GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context has not been in a reset state since the last call. GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected that is attributable to the current GL context. INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that is not attributable to the current GL context. UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose cause is unknown. The language here is explicit in that we must mark up the guilty batch, but is loose enough for us to relax the innocent (i.e. pending) accounting as only the active batches are involved with the reset. In the future, we are looking towards single engine resetting (with minimal locking), where it seems inappropriate to mark the entire world as innocent since the reset occurred on a different engine. Reducing the information available means we only have to encounter the pain once, and also reduces the information leaking from one context to another. v2: Legacy ringbuffer submission required a reset following hibernation, or else we restore stale values to the RING_HEAD and walked over stolen garbage. v3: GuC requires replaying the requests after a reset. v4: Restore engine IRQ after reset (so waiters will be woken!) Rearm hangcheck if resetting with a waiter. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk	2016-09-09 14:23:05 +01:00
Chris Wilson	22dd3bb919	drm/i915: Mark up all locked waiters In the next patch we want to handle reset directly by a locked waiter in order to avoid issues with returning before the reset is handled. To handle the reset, we must first know whether we hold the struct_mutex. If we do not hold the struct_mtuex we can not perform the reset, but we do not block the reset worker either (and so we can just continue to wait for request completion) - otherwise we must relinquish the mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk	2016-09-09 14:23:03 +01:00
Chris Wilson	ea746f3659	drm/i915: Expand bool interruptible to pass flags to i915_wait_request() We need finer control over wakeup behaviour during i915_wait_request(), so expand the current bool interruptible to a bitmask. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk	2016-09-09 14:23:03 +01:00

1 2 3 4 5 ...

1754 Commits