Allow a brief period for continued access to a dead intel_context by
deferring the release of the struct until after an RCU grace period.
As we are using a dedicated slab cache for the contexts, we can defer
the release of the slab pages via RCU, with the caveat that individual
structs may be reused from the freelist within an RCU grace period. To
handle that, we have to avoid clearing members of the zombie struct.
This is required for a later patch to handle locking around virtual
requests in the signaler, as those requests may want to move between
engines and be destroyed while we are holding b->irq_lock on a physical
engine.
v2: Drop mutex_reinit(), if we never mark the mutex as destroyed we
don't need to reset the debug code, at the loss of having the mutex
debug code spot us attempting to destroy a locked mutex.
v3: As the intended use will remain strongly referenced counted, with
very little inflight access across reuse, drop the ctor.
v4: Drop the unrequired change to remove the temporary reference around
dropping the active context, and add back some more missing ctor
operations.
v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced
later] maybe accessed under RCU and so needs special care not to be
reinitialised.
v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-3-chris@chris-wilson.co.uk
(cherry picked from commit 14d1eaf088)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
compelted request. Therefore we need to potentially cleanup the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the sibling's execlists_dequeue() have
finished peeking into the virtual engines, for which we serialise with
RCU.
v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-4-chris@chris-wilson.co.uk
(cherry picked from commit 46eecfccb4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Move the register slow register write and readback from out of the
critical path for execlists submission and delay it until the following
worker, shaving off around 200us. Note that the same signal_irq_work() is
allowed to run concurrently on each CPU (but it will only be queued once,
once running though it can be requeued and reexecuted) so we have to
remember to lock the global interactions as we cannot rely on the
signal_irq_work() itself providing the serialisation (in constrast to a
tasklet).
By pushing the arm/disarm into the central signaling worker we can close
the race for disarming the interrupt (and dropping its associated
GT wakeref) on parking the engine. If we loose the race, that GT wakeref
may be held indefinitely, preventing the machine from sleeping while
the GPU is ostensibly idle.
v2: Move the self-arming parking of the signal_irq_work to a flush of
the irq-work from intel_breadcrumbs_park().
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2271
Fixes: e23005604b ("drm/i915/gt: Hold context/request reference while breadcrumbs are active")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-1-chris@chris-wilson.co.uk
(cherry picked from commit 9d5612ca16)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
After having written the entire OA buffer with reports, the HW will
write again at the beginning of the OA buffer. It'll indicate it by
setting the WRAP bits in the OASTATUS register.
When a wrap happens and that at the end of the read vfunc we write the
OASTATUS register back to clear the REPORT_LOST bit, we sometimes see
that the OATAILPTR register is reset to a previous position on Gen8/9
(apparently not the case on Gen11+). This leads the next call to the
read vfunc to process reports we've already read. Because we've marked
those as read by clearing the reason & timestamp dwords, they're
discarded and a "Skipping spurious, invalid OA report" message is
emitted.
The workaround to avoid this OATAILPTR value reset seems to be to set
the wrap bits when writing back OASTATUS.
This change has no impact on userspace, it only avoids a bunch of
DRM_NOTE("Skipping spurious, invalid OA report\n") messages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 19f81df285 ("drm/i915/perf: Add OA unit support for Gen 8+")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117130124.829979-1-lionel.g.landwerlin@intel.com
(cherry picked from commit 059a0beb48)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
For DG1 we have a little of mix up wrt to DDI/port names and indexes.
Bspec refers to the ports as DDIA, DDIB, DDI USBC1 and DDI USBC2
(besides the DDIA, DDIB, DDIC, DDID), but the previous naming is the
most unambiguous one. This means that for any register on Display Engine
we should use the index of A, B, D and E. However in some places this is
not true:
- VBT: uses C and D and have to be mapped to D/E
- IO/Combo: uses C and D, but we already differentiate those when
we created the phy vs port distinction.
This additional mapping for VBT and phy are already covered in previous
patches, so now we can initialize all the DDIs as A, B, D and E.
v2: Squash previous patch enabling just ports A and B since most of the
pumbling code is already merged now
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Clinton Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117084836.2318234-1-lucas.demarchi@intel.com
After having written the entire OA buffer with reports, the HW will
write again at the beginning of the OA buffer. It'll indicate it by
setting the WRAP bits in the OASTATUS register.
When a wrap happens and that at the end of the read vfunc we write the
OASTATUS register back to clear the REPORT_LOST bit, we sometimes see
that the OATAILPTR register is reset to a previous position on Gen8/9
(apparently not the case on Gen11+). This leads the next call to the
read vfunc to process reports we've already read. Because we've marked
those as read by clearing the reason & timestamp dwords, they're
discarded and a "Skipping spurious, invalid OA report" message is
emitted.
The workaround to avoid this OATAILPTR value reset seems to be to set
the wrap bits when writing back OASTATUS.
This change has no impact on userspace, it only avoids a bunch of
DRM_NOTE("Skipping spurious, invalid OA report\n") messages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 19f81df285 ("drm/i915/perf: Add OA unit support for Gen 8+")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117130124.829979-1-lionel.g.landwerlin@intel.com
Add the new vma_set_file() function to allow changing
vma->vm_file with the necessary refcount dance.
v2: add more users of this.
v3: add missing EXPORT_SYMBOL, rebase on mmap cleanup,
add comments why we drop the reference on two occasions.
v4: make it clear that changing an anonymous vma is illegal.
v5: move vma_set_file to mm/util.c
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v2)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://patchwork.freedesktop.org/patch/399360/
EDID can declare the maximum supported bpc up to 16,
and apparently there are displays that do so. Currently
we assume 12 bpc is tha max. Fix the assumption and
toss in a MISSING_CASE() for any other value we don't
expect to see.
This fixes modesets with a display with EDID max bpc > 12.
Previously any modeset would just silently fail on platforms
that didn't otherwise limit this via the max_bpc property.
In particular we don't add the max_bpc property to HDMI
ports on gmch platforms, and thus we would see the raw
max_bpc coming from the EDID.
I suppose we could already adjust this to also allow 16bpc,
but seeing as no current platform supports that there is
little point.
Cc: stable@vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2632
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201110210447.27454-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 2ca5a7b85b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Skip iterating over bigjoiner slaves, only the master has the state we
care about.
Add the width of the bigjoiner slave to the reconstructed fb.
Hide the bigjoiner slave to userspace, and double the mode on bigjoiner
master.
And last, disable bigjoiner slave from primary if reconstruction fails.
v3:
* Fix the ddi_get_config slave error (Ankit Nautiyal)
v2:
* Unsupported bigjoiner config for initial fb (Ville)
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
[vsyrjala:
* Don't do any hw->uapi state copy for bigjoiner slave
* We still have hw.mode so no need to pass it in
* Appease checkpatch]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117194718.11462-7-manasi.d.navare@intel.com
Enabling is done in a special sequence and so should plane updates
be. Ideally the end user never notices the second pipe is used.
This way ideally everything will be tear free, and updates are
really atomic as userspace expects it.
This uses generic modeset_enables() calls like trans port sync
but still has special handling for disable since for slave we
should not disable things like encoder, plls that are not enabled
for slave.
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
[vsyrjala: Appease checkpatch]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117194718.11462-6-manasi.d.navare@intel.com
When the clock is higher than the dotclock, try with 2 pipes enabled.
If we can enable 2, then we will go into big joiner mode, and steal
the adjacent crtc.
This only links the crtc's in software, no hardware or plane
programming is done yet. Blobs are also copied from the master's
crtc_state, so it doesn't depend at commit time on the other
crtc_state.
v6:
* Enable dSC for any mode->hdisplay > 5120
v5:
* Remove intel_dp_max_dotclock (Manasi)
v4:
* Fixes in intel_crtc_compute_config (Ville)
v3:
* Manual Rebase (Manasi)
Changes since v1:
- Rename pipe timings to transcoder timings, as they are now different.
Changes since v2:
- Rework bigjoiner checks; always disable slave when recalculating
master. No need to have a separate bigjoiner pass any more.
- Use pipe_mode instead of transcoder_mode, to clean up the code.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
[vsyrjala:
* hskew isn't a thing
* Do the dsc compute if bigjoiner is enabled, not the other way around]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117194718.11462-4-manasi.d.navare@intel.com
Small changes to intel_dp_mode_valid(), allow listing modes that
can only be supported in the bigjoiner configuration, which is
not supported yet.
v13:
* Allow bigjoiner if hdisplay >5120
v12:
* slice_count logic simplify (Ville)
* Fix unnecessary changes in downstream_mode_valid (Ville)
v11:
* Make intel_dp_can_bigjoiner non static
so it can be used in intel_display (Manasi)
v10:
* Simplify logic (Ville)
* Allow bigjoiner on edp (Ville)
v9:
* Restric Bigjoiner on PORT A (Ville)
v8:
* use source dotclock for max dotclock (Manasi)
v7:
* Add can_bigjoiner() helper (Ville)
* Pass bigjoiner to plane_size validation (Ville)
v6:
* Rebase after dp_downstream mode valid changes (Manasi)
v5:
* Increase max plane width to support 8K with bigjoiner (Maarten)
v4:
* Rebase (Manasi)
Changes since v1:
- Disallow bigjoiner on eDP.
Changes since v2:
- Rename intel_dp_downstream_max_dotclock to intel_dp_max_dotclock,
and split off the downstream and source checking to its own function.
(Ville)
v3:
* Rebase (Manasi)
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[vsyrjala:
* Keep bigjoiner disabled until everything is ready
* Appease checkpatch]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117194718.11462-3-manasi.d.navare@intel.com
The WA specifies that we need to toggle a SDE chicken bit on and then
off as the final step in preparation for s0ix entry.
Bspec: 33450
Bspec: 8402
However, something is happening after we toggle the bit that causes
the WA to be invalidated. This makes dispcnlunit1_cp_xosc_clkreq
active being already in s0ix state i.e SLP_S0 counter incremented.
Tweaking the Wa_14010685332 by setting the bit on suspend and clearing
it on resume turns down the dispcnlunit1_cp_xosc_clkreq.
B.Spec has Documented this tweaked sequence of WA as an alternative.
Let keep this tweaked WA for Gen11 platforms and keep untweaked WA for
other platforms which never observed this issue.
v2 (MattR):
- Change the comment on the workaround to give PCH names rather than
platform names. Although the bspec is setup to list workarounds by
platform, the hardware team has confirmed that the actual issue being
worked around here is something that was introduced back in the
Cannon Lake PCH and carried forward to subsequent PCH's.
- Extend the untweaked version of the workaround to include PCH_CNP as
well. Note that since PCH_CNP is used to represent CMP, this will
apply on CML and some variants of RKL too.
- Cap the untweaked version of the workaround so that it won't apply to
"fake" PCH's (i.e., DG1). The issue we're working around really is
an issue in the PCH itself, not the South Display, so it shouldn't
apply when there isn't a real PCH.
v3:
- use intel_de_rmw(). [Rodrigo]
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Bob Paauwe <bob.j.paauwe@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201110121700.4338-1-anshuman.gupta@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
EDID can declare the maximum supported bpc up to 16,
and apparently there are displays that do so. Currently
we assume 12 bpc is tha max. Fix the assumption and
toss in a MISSING_CASE() for any other value we don't
expect to see.
This fixes modesets with a display with EDID max bpc > 12.
Previously any modeset would just silently fail on platforms
that didn't otherwise limit this via the max_bpc property.
In particular we don't add the max_bpc property to HDMI
ports on gmch platforms, and thus we would see the raw
max_bpc coming from the EDID.
I suppose we could already adjust this to also allow 16bpc,
but seeing as no current platform supports that there is
little point.
Cc: stable@vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2632
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201110210447.27454-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Currently the DPLL .get_freq() uses pll->state.hw_state which
is not the thing we actually read out (except during driver
load/resume). Outside of that pll->state.hw_state is just the
thing we committed last time around. During state check we
just read the thing into crtc_state->dpll_hw_state, so that
is what we should use for calculating the DPLL output frequency.
I think we used to do this so that the results of the readout
were actually used, but somehow it got changed when the
.get_freq() refactoring happened.
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201109231239.17002-3-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>