-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbz8FfAAoJEAx081l5xIa+t8AQAJ6KaMM7rYlRDzIr1Vuh0++2
kFmfEQnSZnOWxO+zpyQpCJQr+/1aAHQ6+QzWq+/fX/bwH01H1Q4+z6t+4QoPqYlw
rUYXZYRxnqGVPYx8hwNLmRbJcsXMVKly1SemBXIoabKkEtPNX5AZ4FXCR2WEbsV3
/YLxsYQv2KMd8aoeC9Hupa07Jj9GfOtEo0a9B1hKmo+XiF9HqPadxaofqOpQ6MCh
54itBgP7Kj4mwwr8KxG2JCNJagG5aG8q3yiEwaU5b0KzWya4o1wrOAJcBaEAIxpj
JAgPde+f2L6w9dQbBBWeVFYKNn0jJqJdmbPs5Ek3i/NNFyx01Mn/3vlTZoRUqJN7
TGXwOI/BWz1iTaHyFPqVH6RPQAoUUDeCwgHkXonogFxvQLpiFG+dRNqxue0XVUMX
9tDSdZefWPoH3n9J/gDhwbV2Qbw/2n6yzCRYCb8HkqX1Y1JTmdYVgKvcnOwyYwsJ
QzcVkWUJ31UAaZcTLCEW6SVqcUR0mso3LJAPSKp2NJiVLL8mSd/ViUTUbxRNkkXf
H0abVGDjWAAZaT5uqNVqg4kV1Vc4Kj+/9QtspW4ktGezOz9DsctwJtfhTgOmT8Fx
zlEwWmAbf1iJP9UgqI7r4+Nq24saqUYmIX0bowEasLIRO+l14Pf9mQJjgKRcMs/j
SK4W5EreSFosKsxtQU4H
=3yxi
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2018-10-24' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This is going to rebuild more than drm as it adds a new helper to
list.h for doing bulk updates. Seemed like a reasonable addition to
me.
Otherwise the usual merge window stuff lots of i915 and amdgpu, not so
much nouveau, and piles of everything else.
Core:
- Adds a new list.h helper for doing bulk list updates for TTM.
- Don't leak fb address in smem_start to userspace (comes with EXPORT
workaround for people using mali out of tree hacks)
- udmabuf device to turn memfd regions into dma-buf
- Per-plane blend mode property
- ref/unref replacements with get/put
- fbdev conflicting framebuffers code cleaned up
- host-endian format variants
- panel orientation quirk for Acer One 10
bridge:
- TI SN65DSI86 chip support
vkms:
- GEM support.
- Cursor support
amdgpu:
- Merge amdkfd and amdgpu into one module
- CEC over DP AUX support
- Picasso APU support + VCN dynamic powergating
- Raven2 APU support
- Vega20 enablement + kfd support
- ACP powergating improvements
- ABGR/XBGR display support
- VCN jpeg support
- xGMI support
- DC i2c/aux cleanup
- Ycbcr 4:2:0 support
- GPUVM improvements
- Powerplay and powerplay endian fixes
- Display underflow fixes
vmwgfx:
- Move vmwgfx specific TTM code to vmwgfx
- Split out vmwgfx buffer/resource validation code
- Atomic operation rework
bochs:
- use more helpers
- format/byteorder improvements
qxl:
- use more helpers
i915:
- GGTT coherency getparam
- Turn off resource streamer API
- More Icelake enablement + DMC firmware
- Full PPGTT for Ivybridge, Haswell and Valleyview
- DDB distribution based on resolution
- Limited range DP display support
nouveau:
- CEC over DP AUX support
- Initial HDMI 2.0 support
virtio-gpu:
- vmap support for PRIME objects
tegra:
- Initial Tegra194 support
- DMA/IOMMU integration fixes
msm:
- a6xx perf improvements + clock prefix
- GPU preemption optimisations
- a6xx devfreq support
- cursor support
rockchip:
- PX30 support
- rgb output interface support
mediatek:
- HDMI output support on mt2701 and mt7623
rcar-du:
- Interlaced modes on Gen3
- LVDS on R8A77980
- D3 and E3 SoC support
hisilicon:
- misc fixes
mxsfb:
- runtime pm support
sun4i:
- R40 TCON support
- Allwinner A64 support
- R40 HDMI support
omapdrm:
- Driver rework changing display pipeline ordering to use common code
- DMM memory barrier and irq fixes
- Errata workarounds
exynos:
- out-bridge support for LVDS bridge driver
- Samsung 16x16 tiled format support
- Plane alpha and pixel blend mode support
tilcdc:
- suspend/resume update
mali-dp:
- misc updates"
* tag 'drm-next-2018-10-24' of git://anongit.freedesktop.org/drm/drm: (1382 commits)
firmware/dmc/icl: Add missing MODULE_FIRMWARE() for Icelake.
drm/i915/icl: Fix signal_levels
drm/i915/icl: Fix DDI/TC port clk_off bits
drm/i915/icl: create function to identify combophy port
drm/i915/gen9+: Fix initial readout for Y tiled framebuffers
drm/i915: Large page offsets for pread/pwrite
drm/i915/selftests: Disable shrinker across mmap-exhaustion
drm/i915/dp: Link train Fallback on eDP only if fallback link BW can fit panel's native mode
drm/i915: Fix intel_dp_mst_best_encoder()
drm/i915: Skip vcpi allocation for MSTB ports that are gone
drm/i915: Don't unset intel_connector->mst_port
drm/i915: Only reset seqno if actually idle
drm/i915: Use the correct crtc when sanitizing plane mapping
drm/i915: Restore vblank interrupts earlier
drm/i915: Check fb stride against plane max stride
drm/amdgpu/vcn:Fix uninitialized symbol error
drm: panel-orientation-quirks: Add quirk for Acer One 10 (S1003)
drm/amd/amdgpu: Fix debugfs error handling
drm/amdgpu: Update gc_9_0 golden settings.
drm/amd/powerplay: update PPtable with DC BTC and Tvr SocLimit fields
...
Latency is in the eye of the beholder. In the case where a client stops
and waits for the gpu, give that request chain a small priority boost
(not so that it overtakes higher priority clients, to preserve the
external ordering) so that ideally the wait completes earlier.
v2: Tvrtko recommends to keep the boost-from-user-stall as small as
possible and to allow new client flows to be preferred for interactivity
over stalls.
Testcase: igt/gem_sync/switch-default
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-3-chris@chris-wilson.co.uk
Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries. This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry). It is also a change in emphasis; exceptional entries are
intimidating and different. As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Partial views are small but there can be many of them, and since the sg
list space for them is allocated pessimistically, we can save some slab by
trimming the unused tail entries.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926080353.20867-1-tvrtko.ursulin@linux.intel.com
Right now RESET_PCH_HANDSHAKE_ENABLE is enabled all the times inside
of intel_power_domains_init_hw() and if PCH is NOP it is unsed in
i915_gem_init_hw().
So making skl_pch_reset_handshake() handle both cases and calling
it for the missing gens in intel_power_domains_init_hw().
Ivybridge have a different register and bits but with the same
objective so moving it too.
v2(Rodrigo):
- handling IVYBRIDGE case inside intel_pch_reset_handshake()
v4(Rodrigo and Ville):
- moving the enable/disable decision to callers
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180918204714.27306-2-jose.souza@intel.com
In commit 9144d75e22 ("include/linux/bitops.h: introduce BITS_PER_TYPE"),
we made BITS_PER_TYPE available to all and now we can use the macro to
replace some open-coded computation of sizeof(T) * BITS_PER_BYTE.
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926104707.17410-1-chris@chris-wilson.co.uk
Once we have flushed the first request through the system to both load a
context and record the default state; tell the GPU to park and idle
itself, putting itself immediately (hopefully at least) into a
powersaving state, and allowing ourselves to start from known state
after setting up all our bookkeeping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920161343.1117-1-chris@chris-wilson.co.uk
If we copy all the contents of the sg across and not just the page link,
we can then also put it to work in fake_get_huge_pages and beyond.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920142707.19659-1-matthew.auld@intel.com
The prior assumption was that we did not need to reset the CSB on
wedging when cancelling the outstanding requests as it would be cleaned
up in the subsequent reset prior to restarting the GPU. However, what
was not accounted for was that in preparing for the reset, we would try
to process the outstanding CSB entries. If the GPU happened to complete
a CS event just as we were performing the cancellation of requests, that
event would be kept in the CSB until the reset -- but our bookkeeping
was cleared, causing confusion when trying to complete the CS event.
v2: Use a sanitize on unwedge to avoid interfering with eio suspend
(where we intentionally disable GPU reset).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107925
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-3-chris@chris-wilson.co.uk
That we use a WB mapping for updating the RING_TAIL register inside the
context image even on !llc machines has been a source of consternation
for every reader. It appears to work on bsw+, but it may just have been
that we have been incredibly bad at detecting the errors.
v2: With extra enthusiasm.
v3: Drop force of map type for pinned default_state as by the time we
pin it, the map type is always WB and doesn't conflict with the earlier
use by ce->state.
v4: Transfer engine->default_state from MAP_WC to MAP_WB on creation so
we do not need the MAP_FORCE littered around the backends
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914123504.2062-3-chris@chris-wilson.co.uk
Check we can indeed acquire a WB mapping of the context image on module
load. Later this will give us the opportunity to validate that we can
switch from WC to WB as required.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914123504.2062-2-chris@chris-wilson.co.uk
We currently try to pin and allocate the whole buffer at a time. If that
object is larger than RAM, we will try to pin the whole of physical
memory, force the machine into oom, and then still fail the allocation.
If the request is obviously too large, error out early. We opt to do
this in the backend to make it easy to use alternate paths that do not
require the entire object pinned, or may easily handle proxy objects
that are larger than physical memory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-4-chris@chris-wilson.co.uk
We do not explicitly mark the PTE for the user's GTT mmap as being
wrprotect, so we don't get a refault when we would need to change a
read-only mmapping into read-write. As such, we must presume that if the
vma has PROT_WRITE it may be written to, although this is supposed to be
indicated by set-domain there are cases (e.g. after swap) where
userspace may not be aware of the implicit domain change.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-2-chris@chris-wilson.co.uk
Although we cannot do a full system-level test of suspend/hibernate from
deep with the kernel selftests, we can exercise the GEM subsystem in
isolation and simulate the external effects (such as losing stolen
contents and trashing the register state).
v2: Don't forget to hold rpm
v3: Suspend the GTT mappings, and more rpm!
References: https://bugs.freedesktop.org/show_bug.cgi?id=96526
References: 5ab57c7020 ("drm/i915: Flush logical context image out to memory upon suspend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jakub Bartmiński <jakub.bartminski@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Jakub Bartmiński <jakub.bartminski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180830134806.21939-1-chris@chris-wilson.co.uk
Hitting the timeout and finding that all engines are actually idle is
indicative of an interrupt delivery problem. This problem is an issue
that we need to fix, so make sure we log it and provide the GEM trace.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-1-chris@chris-wilson.co.uk
If we issue a device level GPU reset on the older gen, it will disable
key components of the GMCH and the display engine. The purpose of
wedging is to simply prevent further GEM usage without disabling KMS, so
we need to be careful when we do issue the reset on wedging.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180726085033.4044-3-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
If we fail during GEM initialisation, we scrub the HW state by
performing a device level GPU resuet. However, we want to leave the
system in a usable state (with functioning KMS but no GEM) so after
scrubbing the HW state, we need to restore some sane defaults and
re-enable the low-level common parts of the GPU (such as the GMCH).
v2: Restore GTT entries.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180726085033.4044-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Not all chipsets have an internal buffer delaying the visibility of
writes via the GGTT being visible by other physical paths, but we use a
very heavy workaround for all. We only need to apply that workarounds to
the chipsets we know suffer from the delay and the resulting coherency
issue.
Similarly, the same inconsistent coherency fouls up our ABI promise that
a write into a mmap_gtt is immediately visible to others. Since the HW
has made that a lie, let userspace know when that contract is broken.
(Not that userspace would want to use mmap_gtt on those chipsets for
other performance reasons...)
Testcase: igt/drv_selftest/live_coherency
Testcase: igt/gem_mmap_gtt/coherency
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180720101910.11153-1-chris@chris-wilson.co.uk
If the driver is wedged, we skip idling the GPU. However, we may still
have a few requests still not retired following the wedging (since they
will be waiting for a background worker trying to acquire struct_mutex).
As we hold the struct_mutex, always do a quick request retirement in
order to flush the wedged path.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107257
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717084121.28185-1-chris@chris-wilson.co.uk
Our I915g (early gen3, the oldest machine we have in the farm) is still
reporting occasional incoherency performing the following operations:
1) write through GGTT (indirect write into memory)
2) write through either CPU or WC (direct write into memory)
3) read from GGTT (indirect read)
Instead of reporting the value from (2), the read from GGTT reports the
earlier value written via the GGTT. We have made sure that the writes are
flushed from the CPU (commit 3a32497f0d ("drm/i915/selftests: Provide
full mb() around clflush") and commit add00e6d89 ("drm/i915: Flush the
WCB following a WC write")), but still see the error, just less
frequently. The only remaining cache that might be affected here is a
chipset cache, so flush that as well.
Testcase: igt/drv_selftest/live_coherency #gdg
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717092655.28417-1-chris@chris-wilson.co.uk
If the user created a read-only object, they should not be allowed to
circumvent the write protection using the pwrite ioctl.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-5-chris@chris-wilson.co.uk
If the user has created a read-only object, they should not be allowed
to circumvent the write protection by using a GGTT mmapping. Deny it.
Also most machines do not support read-only GGTT PTEs, so again we have
to reject attempted writes. Fortunately, this is known a priori, so we
can at least reject in the call to create the mmap (with a sanity check
in the fault handler).
v2: Check the vma->vm_flags during mmap() to allow readonly access.
v3: Remove VM_MAYWRITE to curtail mprotect()
Testcase: igt/gem_userptr_blits/readonly_mmap*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> #v1
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-4-chris@chris-wilson.co.uk
Let's reorder things so that we can do onion teardown rather than double
goto.
References: b96f6ebfd0 ("drm/i915: Correctly handle error path in i915_gem_init_hw")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712124810.25241-1-michal.winiarski@intel.com
On unwinding following a critical failure inside GEM init, we also need
to be sure to flush the workers before unloading the module.
Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180710094421.16223-1-chris@chris-wilson.co.uk
In the next patch, we will make a fairly minor change to flush
outstanding resets before suspend. In order to keep churn to a minimum
in that functional patch, we fix up the comments and coding style now.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709130208.11730-7-chris@chris-wilson.co.uk
With a broken GPU we expect it to fail during the initial
GPU setup where do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. In this a case where any gpu hang is
unacceptable, both from a timeliness and practical standpoint.
We can therefore set a timeout on our wait-for-idle that is shorter than
the hangcheck (which may be up to 60s for a declaring a wedged driver)
and so detect the broken GPU much more quickly during driver load (and
so prevent stalling userspace for ages).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-2-chris@chris-wilson.co.uk
Usually we have no idea about the upper bound we need to wait to catch
up with userspace when idling the device, but in a few situations we
know the system was idle beforehand and can provide a short timeout in
order to very quickly catch a failure, long before hangcheck kicks in.
In the following patches, we will use the timeout to curtain two overly
long waits, where we know we can expect the GPU to complete within a
reasonable time or declare it broken.
In particular, with a broken GPU we expect it to fail during the initial
GPU setup where do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. In this a case where any gpu hang is
unacceptable, both from a timeliness and practical standpoint.
The other improvement is that in selftests, we do not need to arm an
independent timer to inject a wedge, as we can just limit the timeout on
the wait directly.
v2: Include the timeout parameter in the trace.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
In the next patch, we will want a third distinct class of timeline that
may overlap with the current pair of client and engine timeline classes.
Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise
the different timeline classes with an explicit subclass.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk
In the next patch, we will want to start skipping requests on failing to
complete their payloads. So export the utility function current used to
make requests inoperable following a failed gpu reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-2-chris@chris-wilson.co.uk
If we have just completed a WC write, we must ensure that the WCB (Write
Combining Buffer) is flushed out to main memory before we can expect to
see the results. This is especially important when mixing WC with GTT as
the physical paths are different and cachelines are not naturally flushed.
Testcase: igt/drv_selftests/live_coherency #gdg
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706115402.18547-1-chris@chris-wilson.co.uk
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 141432
Addresses-Coverity-ID: 141433
Addresses-Coverity-ID: 141434
Addresses-Coverity-ID: 141435
Addresses-Coverity-ID: 141436
Addresses-Coverity-ID: 1357360
Addresses-Coverity-ID: 1357403
Addresses-Coverity-ID: 1357433
Addresses-Coverity-ID: 1392622
Addresses-Coverity-ID: 1415273
Addresses-Coverity-ID: 1435752
Addresses-Coverity-ID: 1441500
Addresses-Coverity-ID: 1454596
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628223541.GA17665@embeddedor.com
If the whole object is already pinned by HW for use as scanout, we will
fail to move it to the mappable region and so must resort to using a
partial VMA covering the whole object.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104513
Fixes: aa136d9d72 ("drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180630090509.469-1-chris@chris-wilson.co.uk
We're fetching GuC/HuC firmwares directly from uc level during
init_early stage but this breaks guc/huc struct isolation and
also strict SW-only initialization rule for init_early. Move fw
fetching to init phase and do it separately per guc/huc struct.
v2: don't forget to move wopcm_init - Michele
v3: fetch in init_misc phase - Michal
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com> #2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628141522.62788-2-michal.wajdeczko@intel.com
In the next^W forthcoming patch, we will start to defer retiring the
request from the engine list if it is still active on the submission
backend. To preserve the semantics that after wait-for-idle completes
the system is idle and fully retired, we need to therefore wait for the
backends to idle before calling i915_retire_requests().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627115334.16282-1-chris@chris-wilson.co.uk
Since we trigger 10,000s of hangs and resets during selftesting, we emit
many, many thousands of lines of useless debug messages. Reduce the
frequency by only logging a change in state of a guilty context.
Fixes: 14921f3cef ("drm/i915: Fix context ban and hang accounting for client")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180618073135.10849-1-chris@chris-wilson.co.uk
commit b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on
reset/sanitization") and commit 1288786b18 ("drm/i915: Move GEM sanitize
from resume_early to resume") show the conflicting requirements on the
code. We must reset the GPU before trashing live state on a fast resume
(hibernation debug, or error paths), but we must only reset our state
tracking iff the GPU is reset (or power cycled). This is tricky if we
are disabling GPU reset to simulate broken hardware; we reset our state
tracking but the GPU is left intact and recovers from its stale state.
v2: Again without the assertion for forcewake, no longer required since
commit b3ee09a4de ("drm/i915/ringbuffer: Fix context restore upon reset")
as the contexts are reset from the CS ensuring everything is powered up.
Fixes: b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
Fixes: 1288786b18 ("drm/i915: Move GEM sanitize from resume_early to resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk
As we want to be able to call i915_reset_engine and co from a softirq or
timer context, we need to be irqsafe at all times. So we have to forgo
the simple spin_lock_irq for the full spin_lock_irqsave.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615093137.14270-1-chris@chris-wilson.co.uk
If client is smart or lucky enough to create a new context
after each hang, our context banning mechanism will never
catch up, and as a result of that it will be saved from
client banning. This can result in a never ending streak of
gpu hangs caused by bad or malicious client, preventing
access from other legit gpu clients.
Fix this by always incrementing per client ban score if
it hangs in short successions regardless of context ban
scoring. The exception are non bannable contexts. They remain
detached from client ban scoring mechanism.
v2: xchg timestamp, tidyup (Chris)
v3: comment, bannable & banned together (Chris)
Fixes: b083a0870c ("drm/i915: Add per client max context ban limit")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615104429.31477-1-mika.kuoppala@linux.intel.com
For symmetry, simplicity and ensuring the request is always truly idle
upon its completion, always emit the closing flush prior to emitting the
request breadcrumb. Previously, we would only emit the flush if we had
started a user batch, but this just leaves all the other paths open to
speculation (do they affect the GPU caches or not?) With mm switching, a
key requirement is that the GPU is flushed and invalidated before hand,
so for absolute safety, we want that closing flush be mandatory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk
Due to a silent conflict (silent because we are trying to fix the CI
test that is meant to exercising these failures!) between commit
51e645b665 ("drm/i915: Mark the GPU as wedged without error on fault
injection") and commit 8571a05a9d ("drm/i915: Use GEM suspend when
aborting initialisation"), we failed to actually squash the error
message after injecting the load failure.
Rearrange the code to export i915_load_failure() for better logging of
real errors (and quiet logging of injected errors).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180609111058.2660-1-chris@chris-wilson.co.uk
If we have been instructed (by CI) to inject a fault to load the module
with a wedged GPU, do so quietly less we upset CI.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180607134558.31150-1-chris@chris-wilson.co.uk
In preparation for vm_fault_t becoming a distinct type, convert the
fault handler (i915_gem_fault()) over to the new interface.
Based on a patch by Souptick Joarder
References: 1c8f422059 ("mm: change return type to vm_fault_t")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180606214520.20220-1-chris@chris-wilson.co.uk
As part of our GEM initialisation now, we send a request to the hardware
in order to record the initial GPU state. This coupled with deferred
idle workers, makes aborting on error tricky. We already have the
mechanism in place to wait on the GPU and cancel all the deferred
workers for suspend, so let's reuse it during the error teardown. It is
already used in places for later init error handling, but doing so at
this point is slightly ugly due to the mutex dance (it's ok, the module
load is still single threaded).
Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180606145441.4460-1-chris@chris-wilson.co.uk
In the near future, I want to subclass gen6_hw_ppgtt as it contains a
few specialised members and I wish to add more. To avoid the ugliness of
using ppgtt->base.base, rename the i915_hw_ppgtt base member
(i915_address_space) as vm, which is our common shorthand for an
i915_address_space local.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605153758.18422-1-chris@chris-wilson.co.uk
In function gem_init_hw() we are calling uc_init_hw() but in case
of error later in function, we missed to call matching uc_fini_hw()
v2: pulled out from the series
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605122443.23776-1-michal.wajdeczko@intel.com
This adds a new vGPU cap info bit VGT_CAPS_HUGE_GTT, which is to detect
whether the host supports shadowing of huge gtt pages. If host does
support it, remove the page sizes restriction for vGPU.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1525770425-5373-1-git-send-email-changbin.du@intel.com
We should keep i915_gem_init/fini functions together for easier
tracking of their symmetry.
v2: rebased, pulled out from the series
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180604090032.20840-1-michal.wajdeczko@intel.com
Let's not take any chances by using a shortcut to mark the objects as in
the CPU domain upon freezing (all pages will be written to disk and so
on restore all objects will start from the CPU domain). Currently, we
simply mark the objects as being in the CPU domain, bypassing the
flushes. Let's call the full domain transfer function so that we have
less special case code (and symmetry with the suspend path) even though
it will be mostly redundant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180601144125.18026-2-chris@chris-wilson.co.uk
As we have already suspended the device, this should be a no-op except
for marking that all writes are indeed complete. The downside is that
we then have to walk all the lists of objects for what should be a no-op
(in some cases they will be mmio read to ensure the GGTT writes are
indeed flushed, and clflushes to ensure that cpu writes are in memory).
It seems prudent and the safer course for us to ensure all writes are
flushed to memory before suspend.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180601144125.18026-1-chris@chris-wilson.co.uk
Now that we always switch to the kernel context upon idling, we can
make that assertion.
References: 4dfacb0bcb ("drm/i915: Switch to kernel context before idling at runtime")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531224057.6036-1-chris@chris-wilson.co.uk
During testing we encounter a conflict between SUSPEND_TEST_DEVICES and
disabling reset (gem_eio/suspend). This results in the device continuing
on without being reset, but since it has gone through HW sanitization to
account for the suspend/resume cycle, we have to assume the device has
been reset to its defaults. A simple way around this is to skip the
sanitize phase for SUSPEND_TEST_DEVICES by moving it to suspend-late.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-4-chris@chris-wilson.co.uk
During suspend we want to flush out all active contexts and their
rendering. To do so we queue a request from the kernel's context, once
we know that request is done, we know the GPU is completely idle. To
speed up that switch bump the GPU clocks.
Switching to the kernel context prior to idling is also used to enforce
a barrier before changing OA properties, and when evicting active
rendering from the global GTT. All cases where we do want to
race-to-idle.
v2: Limit the boosting to only the switch before suspend.
v3: Limit it to the wait-for-idle on suspend.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Tested-by: David Weinehall <david.weinehall@linux.intel.com> #v1
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-2-chris@chris-wilson.co.uk
We can reduce our exposure to random neutrinos by resting on the kernel
context having flushed out the user contexts to system memory and
beyond. The corollary is that we then we require two passes through the
idle handler to go to sleep, which on a truly idle system involves an
extra pass through the slow and irregular retire work handler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-1-chris@chris-wilson.co.uk
Since we use i915_gem_find_active_request() from inside
intel_engine_dump() and may call that at any time, we do not guarantee
that the engine is paused nor that the signal kthreads and irq handler
are suspended, so we cannot assert that the breadcrumb doesn't advance
and that the irq hasn't happened on another CPU signaling the request we
believe to be idle.
The second assert removed (that request->engine == engine) remains
valid, but is now more rigorously checked during retirement.
Fixes: f636edb214 ("drm/i915: Make i915_engine_info pretty printer to standalone")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180529132922.6831-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We were not very carefully checking to see if an older request on the
engine was an earlier switch-to-kernel-context before deciding to emit a
new switch. The end result would be that we could get into a permanent
loop of trying to emit a new request to perform the switch simply to
flush the existing switch.
What we need is a means of tracking the completion of each timeline
versus the kernel context, that is to detect if a more recent request
has been submitted that would result in a switch away from the kernel
context. To realise this, we need only to look in our syncmap on the
kernel context and check that we have synchronized against all active
rings.
v2: Since all ringbuffer clients currently share the same timeline, we do
have to use the gem_context to distinguish clients.
As a bonus, include all the tracing used to debug the death inside
suspend.
v3: Test, test, test. Construct a selftest to exercise and assert the
expected behaviour that multiple switch-to-contexts do not emit
redundant requests.
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Fixes: a89d1f921c ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk
To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.
v2: Set mock_context->ops so we don't crash and burn in selftests.
Cleanups from Tvrtko.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk
In the next patch, we want to store the intel_context pointer inside
i915_request, as it is frequently access via a convoluted dance when
submitting the request to hw. Having two context pointers inside
i915_request leads to confusion so first rename the existing
i915_gem_context pointer to i915_request.gem_context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk
We cannot call kthread_park() from softirq context, so let's avoid it
entirely during the reset. We wanted to suspend the signaler so that it
would not mark a request as complete at the same time as we marked it as
being in error. Instead of parking the signaling, stop the engine from
advancing so that the GPU doesn't emit the breadcrumb for our chosen
"guilty" request.
v2: Refactor setting STOP_RING so that we don't have the same code thrice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michałt Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-8-chris@chris-wilson.co.uk
In preparation to more carefully handling incomplete preemption during
reset by execlists, we move the existing code wholesale to the backends
under a couple of new reset vfuncs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-4-chris@chris-wilson.co.uk
When setting up reset, we may need to recursively prepare an engine. In
which case we should only synchronously flush the tasklets on the outer
most call, the inner calls will then be inside an atomic section where
the tasklet will never be run (and so the sync will never complete).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-2-chris@chris-wilson.co.uk
The idea was to try and let the existing tasklet run to completion
before we began the reset, but it involves a racy check against anything
else that tries to run the tasklet. Rather than acknowledge and ignore
the race, let it be and don't try and be too clever.
The tasklet will resume execution after reset (after spinning a bit
during reset), but before we allow it to resume we will have cleared all
the pending state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-1-chris@chris-wilson.co.uk
When we process the outstanding requests upon banning a context, we need
to acquire both the engine and the client's timeline, nesting the locks.
This requires explicit markup as the two timelines are now of the same
class, since commit a89d1f921c ("drm/i915: Split i915_gem_timeline into
individual timelines").
Testcase: igt/gem_eio/banned
Fixes: a89d1f921c ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180512084957.9829-1-chris@chris-wilson.co.uk
When called from process context tasklet_schedule() defers itself to
ksoftirqd. From experience this may cause unacceptable latencies of over
200ms in executing the submission tasklet, our goal is to reprioritise
the HW execution queue and trigger HW preemption immediately, so disable
bh over the call to schedule and force the tasklet to run afterwards if
scheduled.
v2: Keep rcu_read_lock() around for PREEMPT_RCU
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180507135731.10587-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When userspace is passing around swapbuffers using DRI, we frequently
have to open and close the same object in the foreign address space.
This shows itself as the same object being rebound at roughly 30fps
(with a second object also being rebound at 30fps), which involves us
having to rewrite the page tables and maintain the drm_mm range manager
every time.
However, since the object still exists and it is only the local handle
that disappears, if we are lazy and do not unbind the VMA immediately
when the local user closes the object but defer it until the GPU is
idle, then we can reuse the same VMA binding. We still have to be
careful to mark the handle and lookup tables as closed to maintain the
uABI, just allowing the underlying VMA to be resurrected if the user is
able to access the same object from the same context again.
If the object itself is destroyed (neither userspace keeping a handle to
it), the VMA will be reaped immediately as usual.
In the future, this will be even more useful as instantiating a new VMA
for use on the GPU will become heavier. A nuisance indeed, so nip it in
the bud.
v2: s/__i915_vma_final_close/i915_vma_destroy/ etc.
v3: Leave a hint as to why we deferred the unbind on close.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-1-chris@chris-wilson.co.uk
We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of mirky semantics and some clumsy
pointer chasing.
By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.
v2: Tweak wait_for_idle to stop the compiling thinking that ret may be
uninitialised.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
In the future, we want to move a request between engines. To achieve
this, we first realise that we have two timelines in effect here. The
first runs through the GTT is required for ordering vma access, which is
tracked currently by engine. The second is implied by sequential
execution of commands inside the ringbuffer. This timeline is one that
maps to userspace's expectations when submitting requests (i.e. given the
same context, batch A is executed before batch B). As the rings's
timelines map to userspace and the GTT timeline an implementation
detail, move the timeline from the GTT into the ring itself (per-context
in logical-ring-contexts/execlists, or a global per-engine timeline for
the shared ringbuffers in legacy submission.
The two timelines are still assumed to be equivalent at the moment (no
migrating requests between engines yet) and so we can simply move from
one to the other without adding extra ordering.
v2: Reinforce that one isn't allowed to mix the engine execution
timeline with the client timeline from userspace (on the ring).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
We don't need to track every ring for its lifetime as they are managed
by the contexts/engines. What we do want to track are the live rings so
that we can sporadically clean up requests if userspace falls behind. We
can simply restrict the gt->rings list to being only gt->live_rings.
v2: s/live/active/ for consistency with gt.active_requests
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.
As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.
In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!
v2: We need both engine-order and ring-order processing to maintain our
knowledge of where individual rings have completed upto as well as
knowing what was last executing on any engine. And finally by decoupling
retiring the contexts on the engine and the timelines along the rings,
we do have to keep a reference to the context on each request
(previously it was guaranteed by the context being pinned).
v3: Not just a reference to the context, but we need to keep it pinned
as we manipulate the rings; i.e. we need a pin for both the manipulation
of the engine state during its retirements, and a separate pin for the
manipulation of the ring state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
Make life easier in upcoming patches by moving the context_pin and
context_unpin vfuncs into inline helpers.
v2: Fixup mock_engine to mark the context as pinned on use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk
The majority of the engine state dumping is too voluminous to be useful
outside of a controlled setup, though a few do accompany severe errors.
Keep the debug dumps next to the errors, but hide the others behind a CI
compile flag. This becomes more useful when adding more dumps to latency
sensitive paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180426103219.22181-1-chris@chris-wilson.co.uk
Today we only want to pass along the priority to engine->schedule(), but
in the future we want to have much more control over the various aspects
of the GPU during a context's execution, for example controlling the
frequency allowed. As we need an ever growing number of parameters for
scheduling, move those into a struct for convenience.
v2: Move the anonymous struct into its own function for legibility and
ye olde gcc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
There are different kind of workarounds (those that modify registers that
live in the context image, those that modify global registers, those that
whitelist registers, etc...) and they have different requirements in terms
of where they are applied and how. Also, by splitting them apart, it should
be easier to decide where a new workaround should go.
v2:
- Add multiple MISSING_CASE
- Rebased
v3:
- Rename mmio_workarounds to gt_workarounds (Chris, Mika)
- Create empty placeholders for BDW and CHV GT WAs
- Rebased
v4: Rebased
v5:
- Rebased
- FORCE_TO_NONPRIV register exists since BDW, so make a path
for it to achieve universality, even if empty (Chris)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: appease checkpatch]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1523376767-18480-2-git-send-email-oscar.mateo@intel.com
Resetting the GPU doesn't affect the RPS/RC6 state, so we can stop
forcibly reloading the registers.
Ville suggested this many moons ago, I said at that time that sanitizing
was no harm and meant that our bookkeeping was kept consistent with the
HW. However, in a forthcoming series, we want to split rps/rc6 GT
powermanagement and one of the key simplifications is the control of
when we enable it. Performing a crude sanitize in the middle of
i915_gem_reset() is then a huge wart.
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180410133354.13425-1-chris@chris-wilson.co.uk
Currently, we rely on inspecting the hangcheck state from within the
i915_reset() routines to determine which engines were guilty of the
hang. This is problematic for cases where we want to run
i915_handle_error() and call i915_reset() independently of hangcheck.
Instead of relying on the indirect parameter passing, turn it into an
explicit parameter providing the set of stalled engines which then are
treated as guilty until proven innocent.
While we are removing the implicit stalled parameter, also make the
reason into an explicit parameter to i915_reset(). We still need a
back-channel for i915_handle_error() to hand over the task to the locked
waiter, but let's keep that its own channel rather than incriminate
another.
This leaves stalled/seqno as being private to hangcheck, with no more
nefarious snooping by reset, be it whole-device or per-engine. \o/
The only real issue now is that this makes it crystal clear that we
don't actually do any testing of hangcheck per se in
drv_selftest/live_hangcheck, merely of resets!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-2-chris@chris-wilson.co.uk
If we are resetting just one engine, we know it has stalled. So we can
pass the stalled parameter directly to i915_gem_reset_engine(), which
alleviates the necessity to poke at the generic engine->hangcheck.stalled
magic variable, leaving that under control of hangcheck as its name
implies. Other than simplifying by removing the indirect parameter along
this path, this allows us to introduce new reset mechanisms that run
independently of hangcheck.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-1-chris@chris-wilson.co.uk
We will want to park GEM before disengaging the drive^W^W^W unwedging.
Since we already do the work for idling, expose the guts as a new
function that we can then reuse.
v2: Just skip if already parked; makes it more forgiving to use by
future callers.
v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to
i915_gem_park so that we can evaluate it for symmetry more easily.
Calling GEM from inside i915_request looks to be a bit of a layering
violation, for the moment I am imaging them as being notify_cb.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk
In upcoming patch, we want to perform more actions in early
initialization of the uC. This reordering will help resolve
new dependencies that will be introduced by future patch.
v2: s/i915_gem_load_init/i915_gem_init_early (Chris)
v3: s/i915_gem_load_cleanup/i915_gem_cleanup_early (Michal)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180323123451.59244-1-michal.wajdeczko@intel.com
If we fail to reset the GPU, we declare the machine wedged. However, the
GPU may well still be running in the background with an in-flight
request. So despite our efforts in cleaning up the request queue and
faking the breadcrumb in the HWSP, the GPU may eventually write the
in-flght seqno there breaking all of our assumptions and throwing the
driver into a deep turmoil, wedging beyond wedged.
To avoid this we ideally want to reset the GPU. Since that has already
failed, make sure the rings have the stop bit set instead. This is part
of the normal GPU reset sequence, but that is actually disabled by
igt/gem_eio to force the wedged state. If we assume the worst, we must
poke at the bit again before we give up.
v2: Move the intel_gpu_reset() from set-wedged in the reset error path
into i915_gem_set_wedged() itself. Even if the reset fails (e.g. if it is
disabled by gem_eio), it still tries to make sure the engines are
stopped. For i915_gem_set_wedged() callers from outside of i915_reset(),
this should make sure the GPU is disabled while the driver is marked as
being wedged.
Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180315151015.22741-1-chris@chris-wilson.co.uk
We still have an odd race with wedging/unwedging as shown by igt/gem_eio
that defies expectations. Add some more trace_printks to try and
visualize the flow over the precipice.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180315131451.4060-1-chris@chris-wilson.co.uk
GuC WOPCM registers are write-once registers. Current driver code accesses
these registers without checking the accessibility to these registers which
will lead to unpredictable driver behaviors if these registers were touch
by other components (such as faulty BIOS code).
This patch moves the GuC WOPCM registers updating code into intel_wopcm.c
and adds check before and after the update to GuC WOPCM registers so that
we can make sure the driver is in a known state after writing to these
write-once registers.
v6:
- Made sure module reloading won't bug the kernel while doing
locking status checking
v7:
- Fixed patch format issues
v8:
- Fixed coding style issue on register lock bit macro definition (Sagar)
v9:
- Avoided to use redundant !! to cast uint to bool (Chris)
- Return error code instead of GEM_BUG_ON for locked with invalid register
values case (Sagar)
- Updated guc_wopcm_hw_init to use guc_wopcm as first parameter (Michal)
- Added code to set and validate the HuC_LOADING_AGENT_GUC bit in GuC
WOPCM offset register based on the presence of HuC firmware (Michal)
- Use bit fields instead of macros for GuC WOPCM flags (Michal)
v10:
- Refined variable names, removed redundant comments (Joonas)
- Introduced lockable_reg to handle the write once register write and
propagate the write error to caller (Joonas)
- Used lockable_reg abstraction to avoid locking bit check on generic
i915_reg_t (Michal)
- Added log message for error paths (Michal)
- Removed hw_updated flag and only relies on real hardware status
v11:
- Replaced lockable_reg with simplified function (Michal)
- Used new macros for locking bits of WOPCM size/offset registers instead
of using BIT(0) directly (Michal)
- use intel_wopcm_init_hw() called from intel_gem_init_hw() to do GuC
WOPCM register setup instead of calling from intel_uc_init_hw() (Michal)
v12:
- Updated function kernel-doc to align with code changes (Michal)
- Updated code to use wopcm pointer directly (Michal)
v13:
- Updated the ordering of s-o-b/cc/r-b tags (Sagar)
BSpec: 10875, 10833
Signed-off-by: Jackie Li <yaodong.li@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> (v11)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v12)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1520987574-19351-5-git-send-email-yaodong.li@intel.com
Hardware may have specific restrictions on GuC WOPCM offset and size. On
Gen9, the value of the GuC WOPCM size register needs to be larger than the
value of GuC WOPCM offset register + a Gen9 specific offset (144KB) for
reserved GuC WOPCM. Fail to enforce such a restriction on GuC WOPCM size
will lead to GuC firmware execution failures. On the other hand, with
current static GuC WOPCM offset and size values (512KB for both offset and
size), the GuC WOPCM size verification will fail on Gen9 even if it can be
fixed by lowering the GuC WOPCM offset by calculating its value based on
HuC firmware size (which is likely less than 200KB on Gen9), so that we can
have a GuC WOPCM size value which is large enough to pass the GuC WOPCM
size check.
This patch updates the reserved GuC WOPCM size for RC6 context on Gen9 to
24KB to strictly align with the Gen9 GuC WOPCM layout. It also adds support
to verify the GuC WOPCM size aganist the Gen9 hardware restrictions. To
meet all above requirements, let's provide dynamic partitioning of the
WOPCM that will be based on platform specific HuC/GuC firmware sizes.
v2:
- Removed intel_wopcm_init (Ville/Sagar/Joonas)
- Renamed and Moved the intel_wopcm_partition into intel_guc (Sagar)
- Removed unnecessary function calls (Joonas)
- Init GuC WOPCM partition as soon as firmware fetching is completed
v3:
- Fixed indentation issues (Chris)
- Removed layering violation code (Chris/Michal)
- Created separat files for GuC wopcm code (Michal)
- Used inline function to avoid code duplication (Michal)
v4:
- Preset the GuC WOPCM top during early GuC init (Chris)
- Fail intel_uc_init_hw() as soon as GuC WOPCM partitioning failed
v5:
- Moved GuC DMA WOPCM register updating code into intel_wopcm.c
- Took care of the locking status before writing to GuC DMA
Write-Once registers. (Joonas)
v6:
- Made sure the GuC WOPCM size to be multiple of 4K (4K aligned)
v8:
- Updated comments and fixed naming issues (Sagar/Joonas)
- Updated commit message to include more description about the hardware
restriction on GuC WOPCM size (Sagar)
v9:
- Minor changes variable names and code comments (Sagar)
- Added detailed GuC WOPCM layout drawing (Sagar/Michal)
- Refined macro definitions to be reader friendly (Michal)
- Removed redundent check to valid flag (Michal)
- Unified first parameter for exported GuC WOPCM functions (Michal)
- Refined the name and parameter list of hardware restriction checking
functions (Michal)
v10:
- Used shorter function name for internal functions (Joonas)
- Moved init-ealry function into c file (Joonas)
- Consolidated and removed redundant size checks (Joonas/Michal)
- Removed unnecessary unlikely() from code which is only called once
during boot (Joonas)
- More fixes to kernel-doc format and content (Michal)
- Avoided the use of PAGE_MASK for 4K pages (Michal)
- Added error log messages to error paths (Michal)
v11:
- Replaced intel_guc_wopcm with more generic intel_wopcm and attached
intel_wopcm to drm_i915_private instead intel_guc (Michal)
- dynamic calculation of GuC non-wopcm memory start (a.k.a WOPCM Top
offset from GuC WOPCM base) (Michal)
- Moved WOPCM marco definitions into .c source file (Michal)
- Exported WOPCM layout diagram as kernel-doc (Michal)
v12:
- Updated naming, function kernel-doc to align with new changes (Michal)
v13:
- Updated the ordering of s-o-b/cc/r-b tags (Sagar)
- Corrected one tense error in comment (Sagar)
- Corrected typos and removed spurious comments (Joonas)
Bspec: 12690
Signed-off-by: Jackie Li <yaodong.li@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Spotswood <john.a.spotswood@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v8)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v9)
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> (v11)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v12)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1520987574-19351-2-git-send-email-yaodong.li@intel.com
If we timeout waiting for the GPU to idle, something went seriously
wrong. We currently dump the engine state, but we can also dump the
ftrace buffer showing our last operations (when available).
In passing, note that since commit 559e040f1f ("drm/i915: Show the GPU
state when declaring wedged", we now show the engine state twice, once
in detecting the failed idle and then again on declaring wedged.
v2: ftrace_dump() takes a parameter specifying whether to dump all cpu
buffers or the local cpu's.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180309101114.1138-1-chris@chris-wilson.co.uk
i915_gem_obj_pin_to_display() calls frontbuffer_flush with origin set to
DIRTYFB. The callers however are at a vantage point to decide if hardware
frontbuffer tracking can do the flush for us. For example, legacy cursor
updates, like flips, write to MMIO registers, which then triggers PSR flush
by the hardware. Moving frontbuffer_flush out will enable us to skip a
software initiated flush by setting origin to FLIP. Thanks to Chris for the
idea.
v2:
Rebased due to Ville adding intel_plane_pin_fb().
Minor code reordering as fb_obj_flush doesn't need struct_mutex (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307033420.3086-1-dhinakaran.pandiyan@intel.com
Instead of dancing around uC on reset/suspend/resume scenarios,
explicitly sanitize uC when we sanitize GEM to force uC reload
and start from known beginning.
v2: don't forget about reset path (Daniele)
sanitize uc before gem initiated full reset (Daniele)
v3: drop redundant disable_communication in init_hw (Daniele)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180312130308.22952-3-michal.wajdeczko@intel.com
tasklet_kill() will spin waiting for the current tasklet to be executed.
However, if tasklet_disable() has been called, then the tasklet is never
executed but permanently put back onto the runlist until
tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
disable/enable pair. This is the case when we call set-wedge from inside
i915_reset(), and another request was submitted to us concurrent to the
reset.
Fixes: 963ddd63c3 ("drm/i915: Suspend submission tasklets around wedging")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-6-chris@chris-wilson.co.uk
Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.
Fixes: af7a8ffad9 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk
Before we reset the GPU after marking the device as wedged, we wait for
all the remaining requests to be completed (and marked as EIO).
Afterwards, we should flush the request lists so the next batch start
with the driver in an idle state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-1-chris@chris-wilson.co.uk
Currently, we only allow ourselves to prune the fences so long as
all the waits completed (i.e. all the fences we checked were signaled),
and that the reservation snapshot did not change across the wait.
However, if we only waited for a subset of the reservation object, i.e.
just waiting for the last writer to complete as opposed to all readers
as well, then we would erroneously conclude we could prune the fences as
indeed although all of our waits were successful, they did not represent
the totality of the reservation object.
v2: We only need to check the shared fences due to construction (i.e.
all of the shared fences will be later than the exclusive fence, if
any).
Fixes: e54ca97747 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307171303.29466-1-chris@chris-wilson.co.uk
We want to use higher level 'uc' functions as the main entry points to
the GuC/HuC code to hide some details and keep code layered.
While here, move call to disable_guc_interrupts after sending suspend
action to the GuC to allow it work also with CTB as comm mechanism.
v2: update commit msg (Sagar)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302111550.21328-1-michal.wajdeczko@intel.com
Now that we can pass arbitrary commands into the base __wait_for()
macro, we can reimplement the open-coded wait-for inside
i915_gem_idle_work_handler() using the new macro. This means that instead
of using ktime, we now use jiffies, and benefit from the exponential sleep
backoff that allows a fast response if the HW settles quickly.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180301103338.5380-1-chris@chris-wilson.co.uk
We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)
In short, the spatch:
@@
@@
- struct drm_i915_gem_request
+ struct i915_request
A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space if of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>