It's been invariant since
commit ccbc1b9794
Author: Jason Ekstrand <jason@jlekstrand.net>
Date: Thu Jul 8 10:48:30 2021 -0500
drm/i915/gem: Don't allow changing the VM on running contexts (v4)
this just completes the deed. I've tried to split out prep work for
more careful review as much as possible, this is what's left:
- get_ppgtt gets simplified since we don't need to grab a temporary
reference - we can rely on the temporary reference for the gem_ctx
while we inspect the vm. The new vm_id still needs a full
i915_vm_open ofc. This also removes the final caller of context_get_vm_rcu
- A pile of selftests can now just look at ctx->vm instead of
rcu_dereference_protected( , true) or similar things.
- All callers of i915_gem_context_vm also disappear.
- I've changed the hugepage selftest to set scrub_64K without any
locking, because when we inspect that setting we're also not taking
any locks either. It works because it's a selftests that's careful
(single threaded gives you nice ordering) and not a live driver
where races can happen from anywhere.
These can only be split up further if we have some intermediate state
with a bunch more rcu_dereference_protected(ctx->vm, true), just to
shut up lockdep and sparse.
The conversion to __rcu happened in
commit a4e7ccdac3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 4 14:40:09 2019 +0100
drm/i915: Move context management under GEM
Note that we're not breaking the actual bugfix in there: The real
bugfix is pushing the i915_vm_relase onto a separate worker, to avoid
locking inversion issues. The rcu conversion was just thrown in for
entertainment value on top (no vm lookup isn't even close to anything
that's a hotpath where removing the single spinlock can be measured).
v2: Rebase over the change to move the i915_vm_put() into
i915_gem_context_release().
v3: Trivial conflict against repainted shed.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-9-daniel.vetter@ffwll.ch
When the APIs were added to manage VMs more directly from userspace, the
questionable choice was made to allow changing out the VM on a context
at any time. This is horribly racy and there's absolutely no reason why
any userspace would want to do this outside of testing that exact race.
By removing support for CONTEXT_PARAM_VM from ctx_setparam, we make it
impossible to change out the VM after the context has been fully
created. This lets us delete a bunch of deferred task code as well as a
duplicated (and slightly different) copy of the code which programs the
PPGTT registers.
v2 (Jason Ekstrand):
- Expand the commit message
v3 (Daniel Vetter):
- Don't drop the __rcu on the vm pointer
v4 (Jason Ekstrand):
- Make it more obvious that I915_CONTEXT_PARAM_VM returns -EINVAL
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-26-jason@jlekstrand.net
This was done by the following semantic patch:
@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)
@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E
@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E
@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E
@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)
@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)
@@
identifier def.id;
@@
- id
+ ver
It also takes care of renaming the variable we assign to GRAPHICS_VER()
so to use "ver" rather than "gen".
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210605155356.4183026-4-lucas.demarchi@intel.com
We need to generalise our accessor for the page directories and tables from
using the simple kmap_atomic to support local memory, and this setup
must be done on acquisition of the backing storage prior to entering
fence execution contexts. Here we replace the kmap with the object
mapping code that for simple single page shmemfs object will return a
plain kmap, that is then kept for the lifetime of the page directory.
Note that keeping the mapping around is a potential concern here, since
while the vma is pinned the mapping remains there for the PDs
underneath, or at least until the used_count reaches zero, at which
point we can safely destroy the mapping. For 32b this will be even worse
since the address space is more limited, but since this change mostly
impacts full ppGTT platforms, the justification is that for modern
platforms we shouldn't care too much about 32b.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210427085417.120246-3-matthew.auld@intel.com
The GEM object is grossly overweight for the practicality of tracking
large numbers of individual pages, yet it is currently our only
abstraction for tracking DMA allocations. Since those allocations need
to be reserved upfront before an operation, and that we need to break
away from simple system memory, we need to ditch using plain struct page
wrappers.
In the process, we drop the WC mapping as we ended up clflushing
everything anyway due to various issues across a wider range of
platforms. Though in a future step, we need to drop the kmap_atomic
approach which suggests we need to pre-map all the pages and keep them
mapped.
v2: Verify our large scratch page is suitably DMA aligned; and manually
clear the scratch since we are allocating plain struct pages full of
prior content.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The history of i915_vma_close() is confusing, as is its use. As the
lifetime of the i915_vma is currently bounded by the object it is
attached to, we needed a means of identify when a vma was no longer in
use by userspace (via the user's fd). This is further complicated by
that only ppgtt vma should be closed at the user's behest, as the ggtt
were always shared.
Now that we attach the vma to a lut on the user's context, the open
count does indicate how many unique and open context/vm are referencing
this vma from the user. As such, we can and should just use the
open_count to track when the vma is still in use by userspace.
It's a poor man's replacement for reference counting.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1193
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422190558.30509-1-chris@chris-wilson.co.uk
Allocate only an internal intel_context for the kernel_context, forgoing
a global GEM context for internal use as we only require a separate
address space (for our own protection).
Now having weaned GT from requiring ce->gem_context, we can stop
referencing it entirely. This also means we no longer have to create random
and unnecessary GEM contexts for internal use.
GEM contexts are now entirely for tracking GEM clients, and intel_context
the execution environment on the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk
As the engine->kernel_context is used within the engine-pm barrier, we
have to be careful when emitting requests outside of the barrier, as the
strict timeline locking rules do not apply. Instead, we must ensure the
engine_park() cannot be entered as we build the request, which is
simplest by taking an explicit engine-pm wakeref around the request
construction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk
As drm now exports a method to create an anonymous struct file around a
drm_device for internal use, make use of it to avoid our horrible hacks.
Danial suggested that the mock_file_put() wrapper was suitable for
drm-core, along with the mock_drm_getfile() [and that the vestigal
mock_drm_file() in this patch should perhaps be the drm interface
itself]. However, the eventual goal is to remove the mock_drm_file() and
use the struct file and fput() directly, in this patch we take a simple
transition in that direction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20191107180601.30815-3-chris@chris-wilson.co.uk
With the introduction of ctx->engines[] we allow multiple logical
contexts to be used on the same engine (e.g. with virtual engines).
According to bspec, aach logical context requires a unique tag in order
for context-switching to occur correctly between them. [Simple
experiments show that it is not so easy to trick the HW into performing
a lite-restore with matching logical IDs, though my memory from early
Broadwell experiments do suggest that it should be generating
lite-restores.]
We only need to keep a unique tag for the active lifetime of the
context, and for as long as we need to identify that context. The HW
uses the tag to determine if it should use a lite-restore (why not the
LRCA?) and passes the tag back for various status identifies. The only
status we need to track is for OA, so when using perf, we assign the
specific context a unique tag.
v2: Calculate required number of tags to fill ELSP.
Fixes: 976b55f0e1 ("drm/i915: Allow a context to define its set of engines")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk
Replace the struct_mutex requirement for pinning the i915_vma with the
local vm->mutex instead. Note that the vm->mutex is tainted by the
shrinker (we require unbinding from inside fs-reclaim) and so we cannot
allocate while holding that mutex. Instead we have to preallocate
workers to do allocate and apply the PTE updates after we have we
reserved their slot in the drm_mm (using fences to order the PTE writes
with the GPU work and with later unbind).
In adding the asynchronous vma binding, one subtle requirement is to
avoid coupling the binding fence into the backing object->resv. That is
the asynchronous binding only applies to the vma timeline itself and not
to the pages as that is a more global timeline (the binding of one vma
does not need to be ordered with another vma, nor does the implicit GEM
fencing depend on a vma, only on writes to the backing store). Keeping
the vma binding distinct from the backing store timelines is verified by
a number of async gem_exec_fence and gem_exec_schedule tests. The way we
do this is quite simple, we keep the fence for the vma binding separate
and only wait on it as required, and never add it to the obj->resv
itself.
Another consequence in reducing the locking around the vma is the
destruction of the vma is no longer globally serialised by struct_mutex.
A natural solution would be to add a kref to i915_vma, but that requires
decoupling the reference cycles, possibly by introducing a new
i915_mm_pages object that is own by both obj->mm and vma->pages.
However, we have not taken that route due to the overshadowing lmem/ttm
discussions, and instead play a series of complicated games with
trylocks to (hopefully) ensure that only one destruction path is called!
v2: Add some commentary, and some helpers to reduce patch churn.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk