The main difference with previous GENs is that starting from Gen11
each VCS and VECS engine has its own power well, which only exist
if the related engine exists in the HW.
The fallback forcewake request workaround is only needed on gen9
according to the HSDES WA entry (1604254524), so we can go back to using
the simpler fw_domains_get/put functions.
BSpec: 18331
v2: fix fwtable, use array to test shadow tables, create new
accessors to avoid check on every access (Tvrtko)
v3 (from Paulo): Rebase.
v4:
- Range 09400-097FF should be FORCEWAKE_ALL (Daniele)
- Use the BIT macro for forcewake domains (Daniele)
- Add a comment about the range ordering (Oscar)
- Updated commit message (Oscar)
v5: Rebased
v6: Use I915_MAX_VCS/VECS (Michal)
v7: translate FORCEWAKE_ALL to available domains
v8: rebase, add clarification on fallback ack in commit message.
v9: fix rebase issue, change check in fw_domains_init from IS_GEN11
to GEN >= 11
v10: Generate is_genX_shadowed with a macro (Daniele)
Include gen11_fw_ranges in the selftest (Michel)
v11: Simplify FORCEWAKE_ALL, new line between NEEDS_FORCEWAKEs (Tvrtko)
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-6-mika.kuoppala@linux.intel.com
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Driver Changes:
- Lift alpha_support protection from Cannonlake (Rodrigo)
* Meaning the driver should mostly work for the hardware we had
at our disposal when testing
* Used to be preliminary_hw_support
- Add missing Cannonlake PCI device ID of 0x5A4C (Rodrigo)
- Cannonlake port register fix (Mahesh)
- Fix Dell Venue 8 Pro black screen after modeset (Hans)
- Fix for always returning zero out-fence from execbuf (Daniele)
- Fix HDMI audio when no no relevant video output is active (Jani)
- Fix memleak of VBT data on driver_unload (Hans)
- Fix for KASAN found locking issue (Maarten)
- RCU barrier consolidation to improve igt/gem_sync/idle (Chris)
- Optimizations to IRQ handlers (Chris)
- vblank tracking improvements (64-bit resolution, PM) (Dhinakaran)
- Pipe select bit corrections (Ville)
- Reduce runtime computed device_info fields (Chris)
- Tune down some WARN_ONs to GEM_BUG_ON now that CI has good coverage (Chris)
- A bunch of kerneldoc warning fixes (Chris)
* tag 'drm-intel-next-2018-02-21' of git://anongit.freedesktop.org/drm/drm-intel: (113 commits)
drm/i915: Update DRIVER_DATE to 20180221
drm/i915/fbc: Use PLANE_HAS_FENCE to determine if the plane is fenced
drm/i915/fbdev: Use the PLANE_HAS_FENCE flags from the time of pinning
drm/i915: Move the policy for placement of the GGTT vma into the caller
drm/i915: Also check view->type for a normal GGTT view
drm/i915: Drop WaDoubleCursorLP3Latency:ivb
drm/i915: Set the primary plane pipe select bits on gen4
drm/i915: Don't set cursor pipe select bits on g4x+
drm/i915: Assert that we don't overflow frontbuffer tracking bits
drm/i915: Track number of pending freed objects
drm/i915/: Initialise trans_min for skl_compute_transition_wm()
drm/i915: Clear the in-use marker on execbuf failure
drm/i915: Prune gen8_gt_irq_handler
drm/i915: Track GT interrupt handling using the master iir
drm/i915: Remove WARN_ONCE for failing to pm_runtime_if_in_use
drm: intel_dpio_phy: fix kernel-doc comments at nested struct
drm/i915: Release connector iterator on a digital port conflict.
drm/i915/execlists: Remove too early assert
drm/i915: Assert that we always complete a submission to guc/execlists
drm: move read_domains and write_domain into i915
...
Add HDCP support to i915 drm driver.
* tag 'topic/hdcp-2018-02-13' of git://anongit.freedesktop.org/drm/drm-misc: (26 commits)
drm/i915: fix misalignment in HDCP register def
drm/i915: Reauthenticate HDCP on failure
drm/i915: Detect panel's hdcp capability
drm/i915: Optimize HDCP key load
drm/i915: Retry HDCP bksv read
drm/i915: Connector info in HDCP debug msgs
drm/i915: Stop encryption for repeater with no sink
drm/i915: Handle failure from 2nd stage HDCP auth
drm/i915: Downgrade hdcp logs from INFO to DEBUG_KMS
drm/i915: Restore HDCP DRM_INFO when with no downstream
drm/i915: Check for downstream topology errors
drm/i915: Start repeater auth on READY/CP_IRQ
drm/i915: II stage HDCP auth for repeater only
drm/i915: Extending HDCP for HSW, BDW and BXT+
drm/i915/dp: Fix compilation of intel_dp_hdcp_check_link
drm/i915: Only disable HDCP when it's active
drm/i915: Don't allow HDCP on PORT E/F
drm/i915: Implement HDCP for DisplayPort
drm/i915: Implement HDCP for HDMI
drm/i915: Add function to output Aksv over GMBUS
...
This patch adds a little more control to a couple wait_for routines such
that we can avoid open-coding read/wait/timeout patterns which:
- need the value of the register after the wait_for
- run arbitrary operation for the read portion
This patch also chooses the correct sleep function (based on
timers-howto.txt) for the polling interval the caller specifies.
Changes in v2:
- Added to the series
Changes in v3:
- Rebased on drm-intel-next-queued and the new Wmin/max _wait_for
- Removed msleep option
Changes in v4:
- Removed ; for OP in _wait_for (Chris)
- Moved reg_value definition above ret (Chris)
Changes in v4:
- checkpatch whitespace fix
Changes in v5:
- None
Changes in v6:
- None
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180108195545.218615-3-seanpaul@chromium.org
It has been many years since the last confirmed sighting (and fix) of an
RC6 related bug (usually a system hang). Remove the parameter to stop
users from setting dangerous values, as they often set it during triage
and end up disabling the entire runtime pm instead (the option is not a
fine scalpel!).
Furthermore, it allows users to set known dangerous values which were
intended for testing and not for production use. For testing, we can
always patch in the required setting without having to expose ourselves
to random abuse.
v2: Fixup NEEDS_WaRsDisableCoarsePowerGating fumble, and document the
lack of ilk support better.
v3: Clear intel_info->rc6p if we don't support rc6 itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171201113030.18360-1-chris@chris-wilson.co.uk
assert_rpm_wakelock_held is triggered from i915_pmic_bus_access_notifier
even though it gets unregistered on (runtime) suspend, this is caused
by a race happening under the following circumstances:
intel_runtime_pm_put does:
atomic_dec(&dev_priv->pm.wakeref_count);
pm_runtime_mark_last_busy(kdev);
pm_runtime_put_autosuspend(kdev);
And pm_runtime_put_autosuspend calls intel_runtime_suspend from
a workqueue, so there is ample of time between the atomic_dec() and
intel_runtime_suspend() unregistering the notifier. If the notifier
gets called in this windowd assert_rpm_wakelock_held falsely triggers
(at this point we're not runtime-suspended yet).
This commit adds disable_rpm_wakeref_asserts and
enable_rpm_wakeref_asserts calls around the
intel_uncore_forcewake_get(FORCEWAKE_ALL) call in
i915_pmic_bus_access_notifier fixing the false-positive WARN_ON.
Changes in v2:
-Reword comment explaining why disabling the wakeref asserts is
ok and necessary
Cc: stable@vger.kernel.org
Reported-by: FKr <bugs-freedesktop@ubermail.me>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110150301.9601-2-hdegoede@redhat.com
intel_uncore_forcewake_reset() does forcewake puts and gets as such
we need to make sure that no-one tries to access the PUNIT->PMIC bus
(on systems where this bus is shared) while it runs, otherwise bad
things happen.
Normally this is taken care of by the i915_pmic_bus_access_notifier()
which does an intel_uncore_forcewake_get(FORCEWAKE_ALL) when some other
driver tries to access the PMIC bus, so that later forcewake gets are
no-ops (for the duration of the bus access).
But intel_uncore_forcewake_reset gets called in 3 cases:
1) Before registering the pmic_bus_access_notifier
2) After unregistering the pmic_bus_access_notifier
3) To reset forcewake state on a GPU reset
In all 3 cases the i915_pmic_bus_access_notifier() protection is
insufficient.
This commit fixes this race by calling iosf_mbi_punit_acquire() before
calling intel_uncore_forcewake_reset(). In the case where it is called
directly after unregistering the pmic_bus_access_notifier, we need to
hold the punit-lock over both calls to avoid a race where
intel_uncore_fw_release_timer() may execute between the 2 calls.
Reviewed-by: Imre Deak <imre.deak@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171019111620.26761-3-hdegoede@redhat.com
There is a possibility on gen9 hardware to miss the forcewake ack
message. The recommended workaround is to use another free
bit and toggle it until original bit is successfully acknowledged.
Some future gen9 revs might or might not fix the underlying issue but
using fallback forcewake bit dance can be considered as harmless:
without the ack timeout we never reach the fallback bit forcewake.
Thus as of now we adopt a blanket approach for all gen9 and leave
the bypassing the fallback bit approach for future patches if
corresponding hw revisions do appear.
Commit 83e3337204 ("drm/i915: Increase maximum polling time to 50ms
for forcewake request/clear ack") did increase the forcewake timeout.
If the issue was a delayed ack, future work could include finding
a suitable timeout value both for primary ack and reserve toggle
to reduce the worst case latency.
v2: use bit 15, naming, comment (Chris), only wait fallback ack
v3: fix return on fallback, backoff after fallback write (Chris)
v4: udelay on first pass, grammar (Chris)
v4: s/reserve/fallback
References: HSDES #1604254524
References: https://bugs.freedesktop.org/show_bug.cgi?id=102051
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171102094836.2506-1-mika.kuoppala@linux.intel.com
This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.
In the case of i915 directly submission to ELSP, driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before
resetting it.
Once the reset is successful, i915 takes over again and handles the
resubmission. The scheduler in i915 knows which requests are pending so
after resetting a engine, pending workloads/requests are resubmitted
again.
v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc function names.
v3: Removed debug message about engine restarting from which request,
since the new baseline do it regardless of submission mode. (Chris)
v4: Rebase.
v5: Do not pass unnecessary reporting flags to the fw (Jeff);
tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
v6: Rename the existing reset engine function and share a similar
interface between guc and non-guc paths (Chris).
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171031225309.10888-1-michel.thierry@intel.com
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Forcewake is not affected by the engine reset on gen6+. Indeed the
reason why we added intel_uncore_forcewake_reset() to
gen6_reset_engines() was to keep the bookkeeping intact because the
reset did not touch the forcewake bit (yet we cancelled the forcewake
consumers)! This was done in commit 521198a2e7:
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Fri Aug 23 16:52:30 2013 +0300
drm/i915: sanitize forcewake registers on reset
In reset we try to restore the forcewake state to
pre reset state, using forcewake_count. The reset
doesn't seem to clear the forcewake bits so we
get warn on forcewake ack register not clearing.
That futzing of the forcewake bookkeeping was dropped in commit
0294ae7b44 ("drm/i915: Consolidate forcewake resetting to a single
function"), but it did not make the realisation that the remaining
intel_uncore_forcewake_reset() was redundant.
The new danger with using intel_uncore_forcewake_reset() with per-engine
resets is that the driver and hw are still in an active state as we
perform the reset. We may be using the forcewake to read protected
registers elsewhere and those results may be clobbered by the concurrent
dropping of forcewake.
Reported-by: Michel Thierry <michel.thierry@intel.com>
Fixes: 142bc7d99b ("drm/i915: Modify error handler for per engine hang recovery")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170817173229.20324-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
This is a preparatory patch which modifies error handler to do per engine
hang recovery. The actual patch which implements this sequence follows
later in the series. The aim is to prepare existing recovery function to
adapt to this new function where applicable (which fails at this point
because core implementation is lacking) and continue recovery using legacy
full gpu reset.
A helper function is also added to query the availability of engine
reset. A subsequent patch will add the capability to query which type
of reset is present (engine -> full -> no-reset) via the get-param
ioctl.
It has been decided that the error events that are used to notify user of
reset will only be sent in case if full chip reset. In case of just
single (or multiple) engine resets, userspace won't be notified by these
events.
Note that this implementation of engine reset is for i915 directly
submitting to the ELSP, where the driver manages the hang detection,
recovery and resubmission. With GuC submission these tasks are shared
between driver and firmware; i915 will still responsible for detecting a
hang, and when it does it will have to request GuC to reset that Engine and
remind the firmware about the outstanding submissions. This will be
added in different patch.
v2: rebase, advertise engine reset availability in platform definition,
add note about GuC submission.
v3: s/*engine_reset*/*reset_engine*/. (Chris)
Handle reset as 2 level resets, by first going to engine only and fall
backing to full/chip reset as needed, i.e. reset_engine will need the
struct_mutex.
v4: Pass the engine mask to i915_reset. (Chris)
v5: Rebase, update selftests.
v6: Rebase, prepare for mutex-less reset engine.
v7: Pass reset_engine mask as a function parameter, and iterate over the
engine mask for reset_engine. (Chris)
v8: Use i915.reset >=2 in has_reset_engine; remove redundant reset
logging; add a reset-engine-in-progress flag to prevent concurrent
resets, and avoid dual purposing of reset-backoff. (Chris)
v9: Support reset of different engines in parallel (Chris)
v10: Handle reset-engine flag locking better (Chris)
v11: Squash in reporting of per-engine-reset availability.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ian Lister <ian.lister@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-4-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-5-chris@chris-wilson.co.uk
Currently the timer is armed for 1ms after the first use and is killed
immediately, dropping the forcewake as early as possible. However, for
very frequent operations the forcewake dance has a large impact on
latency and keeping the timer alive until we are idle is preferred. To
achieve this, if we call intel_uncore_forcewake_get whilst the timer is
alive (repeated use), then set a flag to restart the timer on expiry
rather than drop the forcewake usage count. The timer is racy, the
consequence of the race is to expire the timer earlier than is now
desired but does not impact on correct behaviour. The offset the race
slightly, we set the active flag again on intel_uncore_forcewake_put.
The effect should be to reduce the jitter of reacquiring the fw every
1ms on a busy system. However, the cost is to keep the timer alive for
an extra 1ms on a nearly idle system. We chose to incur the jitter
previously to keep the timer off for as much as possible.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170526132209.14640-1-chris@chris-wilson.co.uk
ELK seems to very picky about the preconditions to reset.
Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does
not like if reset occurs when there is active ring.
Ville found out that there is workaround with name
'WaMediaResetMainRingCleanup' which suggests that we need to
cleanup rings before resetting. It is unclear what cleanup
exactly means but evidence shows that stopping the ring
does have an effect on reset reliability. This patch makes
reset successful on hangs induced by chained batches (the igt ones).
Note that if the hang is inside a shader, it is possible
that our attempts to stop the ring achieves anything.
v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris)
v3: specify platform on testcases, comment tidyup (Chris)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942
Testcase: igt/gem_busy/*-hang #elk
Testcase: igt/gem_ringfill/hang-* #elk
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170519091340.21439-1-mika.kuoppala@intel.com
Replace the handcrafter loop when checking for fifo slots
with atomic wait for. This brings this wait in line with
the other waits on register access. We also get a readable
timeout constraint, so make it to fail after 10ms.
Chris suggested that we should fail silently as the fifo debug
handler, now attached to unclaimed mmio handling, will take care of the
possible errors at later stage.
Note that the decision to wait was changed so that we avoid
allocating the first reserved entry. Nor do we reduce the count
if we fail the wait, removing the possiblity to wrap the
count if the hw fifo returned zero.
v2: remove unclaimed check on timeout (Chris)
v3: use void return (Chris)
References: https://bugs.freedesktop.org/show_bug.cgi?id=100247
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1491493182-31540-1-git-send-email-mika.kuoppala@intel.com
Remove the per-mmio checking of the FIFO debug register into the common
conditional mmio debug handling. Based on patch from Chris Wilson.
v2: postpone warn on fifodbg for unclaimed reg debugs
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Looks like intel_guc_reset had the ability to sleep under the
uncore spinlock since forever but it wasn't detected until the
recent changes annotated the wait for register with might_sleep.
I have fixed it by removing holding of the uncore spinlock over
the call to gen6_hw_domain_reset, since I do not see that is
really needed. But there is always a possibility I am missing
some nasty detail so please double check.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>