If we want to percolate information back from the HW, up through the GEM
context, we need to wait until the intel_context is scheduled out for
the last time. This is handled by the retirement of the intel_context's
barrier, i.e. by listening to the pulse after the notional unpin. So
wait until the intel_context is finally retired before releasing the
engine, so that we can inspect the final context state and pass it on.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200406155840.1728-3-chris@chris-wilson.co.uk
Now that we can peek at GEM->engines[] and obtain a reference to them
using RCU, do so for instances where we can safely iterate the
potentially old copy of the engines. For setting, we can do this when we
know the engine properties are copied over before swapping, so we know
the new engines already have the global property and we update the old
before they are discarded. For reading, we only need to be safe; as we
do so on behalf of the user, their races are their own problem.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200402124218.6375-1-chris@chris-wilson.co.uk
We cached the number of vma bound to the object in order to speed up
shrinker decisions. This has been superseded by being more proactive in
removing objects we cannot shrink from the shrinker lists, and so we can
drop the clumsy attempt at atomically counting the bind count and
comparing it to the number of pinned mappings of the object. This will
only get more clumsier with asynchronous binding and unbinding.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200401223924.16667-1-chris@chris-wilson.co.uk
A recent commit in clang added -Wtautological-compare to -Wall, which is
enabled for i915 after -Wtautological-compare is disabled for the rest
of the kernel so we see the following warning on x86_64:
../drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1433:22: warning:
result of comparison of constant 576460752303423487 with expression of
type 'unsigned int' is always false
[-Wtautological-constant-out-of-range-compare]
if (unlikely(remain > N_RELOC(ULONG_MAX)))
~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
../include/linux/compiler.h:78:42: note: expanded from macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
^
1 warning generated.
It is not wrong in the case where ULONG_MAX > UINT_MAX but it does not
account for the case where this file is built for 32-bit x86, where
ULONG_MAX == UINT_MAX and this check is still relevant.
Cast remain to unsigned long, which keeps the generated code the same
(verified with clang-11 on x86_64 and GCC 9.2.0 on x86 and x86_64) and
the warning is silenced so we can catch more potential issues in the
future.
Closes: https://github.com/ClangBuiltLinux/linux/issues/778
Suggested-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200214054706.33870-1-natechancellor@gmail.com
If the caller allows and we do not have to wait for any signals,
immediately execute the work within the caller's process. By doing so we
avoid the overhead of scheduling a new task, and the latency in
executing it, at the cost of pulling that work back into the immediate
context. (Sometimes we still prefer to offload the task to another cpu,
especially if we plan on executing many such tasks in parallel for this
client.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325120227.8044-2-chris@chris-wilson.co.uk
It looks like some callers expect a non-volatile object, that they do not
want the contents of the pages lost if they happen to not be looking at it.
The shrinker however sees that we mark the pages as DONTNEED and
believes that it can freely reap them. However, since the huge object
use plain pages, they cannot be swapped out as they have no backing
storge, and the only way we can shrink them is by discarding the
contents. In light of the callers wanting to keep the contents around,
both IS_SHRINKABLE and marking the pages as volatile are incorrect.
If we drop the IS_SHRINKABLE flag we avoid the immediate issue of the
shrinker accidentally removing valuable content. We will have to
remember that a huge object is not suitable for exercising the shrinker
interaction -- although we can introduce a shrinkable one if we require.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200323130821.47914-1-matthew.auld@intel.com
As we store the handle lookup inside a radix tree, we do not need the
gem_context->mutex except until we need to insert our lookup into the
common radix tree. This takes a small bit of rearranging to ensure that
the lut we insert into the tree is ready prior to actually inserting it
(as soon as it is exposed via the radixtree, it is visible to any other
submission).
v2: For brownie points, remove the goto spaghetti.
v3: Tighten up the closed-handle checks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-8-chris@chris-wilson.co.uk
On Gen11 powergating half the execution units is a functional
requirement when using the VME samplers. Not fullfilling this
requirement can lead to hangs.
This unfortunately plays fairly poorly with the NOA requirements. NOA
requires a stable power configuration to maintain its configuration.
As a result using OA (and NOA feeding into it) so far has required us
to use a power configuration that can work for all contexts. The only
power configuration fullfilling this is powergating half the execution
units.
This makes performance analysis for 3D workloads somewhat pointless.
Failing to find a solution that would work for everybody, this change
introduces a new i915-perf stream open parameter that punts the
decision off to userspace. If this parameter is omitted, the existing
Gen11 behavior remains (half EU array powergating).
This change takes the initiative to move all perf related sseu
configuration into i915_perf.c
v2: Make parameter priviliged if different from default
v3: Fix context modifying its sseu config while i915-perf is enabled
v4: Always consider global sseu a privileged operation (Tvrtko)
Override req_sseu point in intel_sseu_make_rpcs() (Tvrtko)
Remove unrelated changes (Tvrtko)
v5: Some typos (Tvrtko)
Process sseu param in read_properties_unlocked() (Tvrtko)
v6: Actually commit the bits from v5...
Fixup some checkpath warnings
v7: Only compare engine uabi field (Chris)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-3-lionel.g.landwerlin@intel.com
For our convenience, and to avoid frequent allocations, we placed some
lists we use for execbuf inside the common i915_vma struct. As we look
to parallelise execbuf, such fields guarded by the struct_mutex BKL must
be pulled under local control. Instead of using the i915_vma as our
primary means of tracking the user's list of objects and their virtual
mappings, we use a local eb_vma with the same lists as before (just now
local not global).
This should allow us to only perform the lookup of vma used for
execution once during the execbuf ioctl, as currently we need to remove
our secrets from inside i915_vma everytime we drop the struct_mutex as
another execbuf may use the shared locations.
Once potential user visible consequence is that we can remove the
requirement that the execobj[] be unique, and only require that they do
not conflict (i.e. you cannot softpin the same object into two locations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200303204345.1859734-2-chris@chris-wilson.co.uk
With the goal of removing the serialisation from around execbuf, we will
no longer have the privilege of there being a single execbuf in flight
at any time and so will only be able to inspect the user's flags within
the carefully controlled execbuf context. i915_gem_evict_for_node() is
the only user outside of execbuf that currently peeks at the flag to
convert an overlapping softpinned request from ENOSPC to EINVAL. Retract
this nicety and only report ENOSPC if the location is in current use,
either due to this execbuf or another.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200303204345.1859734-1-chris@chris-wilson.co.uk