linux/drivers/gpu/drm/i915/gem/i915_gem_object.c

684 lines
19 KiB
C

drm/i915: Split obj->cache_coherent to track r/w Another month, another story in the cache coherency saga. This time, we come to the realisation that i915_gem_object_is_coherent() has been reporting whether we can read from the target without requiring a cache invalidate; but we were using it in places for testing whether we could write into the object without requiring a cache flush. So split the tracking into two, one to decide before reads, one after writes. See commit e27ab73d17ef ("drm/i915: Mark CPU cache as dirty on every transition for CPU writes") for the previous entry in this saga. v2: Be verbose v3: Remove unused function (i915_gem_object_is_coherent) v4: Fix inverted coherency check prior to execbuf (from v2) v5: Add comment for nasty code where we are optimising on gcc's behalf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555 Testcase: igt/kms_mmap_write_crc Testcase: igt/kms_pwrite_crc Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-11 11:11:16 +00:00
/*
 * Copyright © 2017 Intel Corporation
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 *
 */
#include <linux/sched/mm.h>
#include "display/intel_frontbuffer.h"
#include "i915_drv.h"
#include "i915_gem_clflush.h"
#include "i915_gem_context.h"
#include "i915_gem_mman.h"
#include "i915_gem_object.h"
#include "i915_memcpy.h"
#include "i915_trace.h"
static struct kmem_cache *slab_objects;

static const struct drm_gem_object_funcs i915_gem_object_funcs;

struct drm_i915_gem_object *i915_gem_object_alloc(void)
{
	struct drm_i915_gem_object *obj;

	obj = kmem_cache_zalloc(slab_objects, GFP_KERNEL);
	if (!obj)
		return NULL;
	obj->base.funcs = &i915_gem_object_funcs;

	return obj;
}

void i915_gem_object_free(struct drm_i915_gem_object *obj)
{
	/* kmem_cache_free() returns void, so there is nothing to return. */
	kmem_cache_free(slab_objects, obj);
}
void i915_gem_object_init(struct drm_i915_gem_object *obj,
			  const struct drm_i915_gem_object_ops *ops,
			  struct lock_class_key *key, unsigned flags)
{
	/*
	 * A gem object is embedded both in a struct ttm_buffer_object :/ and
	 * in a drm_i915_gem_object. Make sure they are aliased.
	 */
	BUILD_BUG_ON(offsetof(typeof(*obj), base) !=
		     offsetof(typeof(*obj), __do_not_access.base));

	spin_lock_init(&obj->vma.lock);
	INIT_LIST_HEAD(&obj->vma.list);

	INIT_LIST_HEAD(&obj->mm.link);

	INIT_LIST_HEAD(&obj->lut_list);
	spin_lock_init(&obj->lut_lock);

	spin_lock_init(&obj->mmo.lock);
	obj->mmo.offsets = RB_ROOT;

	init_rcu_head(&obj->rcu);

	obj->ops = ops;
	GEM_BUG_ON(flags & ~I915_BO_ALLOC_FLAGS);
	obj->flags = flags;

	obj->mm.madv = I915_MADV_WILLNEED;
	INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN);
	mutex_init(&obj->mm.get_page.lock);
	INIT_RADIX_TREE(&obj->mm.get_dma_page.radix, GFP_KERNEL | __GFP_NOWARN);
	mutex_init(&obj->mm.get_dma_page.lock);
}
/**
 * i915_gem_object_set_cache_coherency - Mark up the object's coherency levels
 * for a given cache_level
 * @obj: #drm_i915_gem_object
 * @cache_level: cache level
 */
void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
					 unsigned int cache_level)
{
	obj->cache_level = cache_level;

	if (cache_level != I915_CACHE_NONE)
		obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |
				       I915_BO_CACHE_COHERENT_FOR_WRITE);
	else if (HAS_LLC(to_i915(obj->base.dev)))
		obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ;
	else
		obj->cache_coherent = 0;

	obj->cache_dirty =
		!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE);
}
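The coherency selection above can be read as a small decision table. The following is a minimal userspace sketch of that table, not the driver code itself: every `sketch_`/`SKETCH_` name is hypothetical, the enum and flag values are stand-ins chosen for illustration, and the `has_llc` parameter stands in for the driver's `HAS_LLC()` check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's cache levels and coherency flags. */
enum sketch_cache_level {
	SKETCH_CACHE_NONE = 0,
	SKETCH_CACHE_LLC,
	SKETCH_CACHE_WT,
};

#define SKETCH_COHERENT_FOR_READ	(1u << 0)
#define SKETCH_COHERENT_FOR_WRITE	(1u << 1)

struct sketch_coherency {
	unsigned int cache_coherent;
	bool cache_dirty;
};

/*
 * Mirror the three branches of the function above: any cached level is
 * coherent for reads and writes, an uncached object on an LLC platform is
 * coherent for reads only, and anything else is not coherent at all.
 */
static struct sketch_coherency
sketch_set_cache_coherency(enum sketch_cache_level level, bool has_llc)
{
	struct sketch_coherency c;

	if (level != SKETCH_CACHE_NONE)
		c.cache_coherent = SKETCH_COHERENT_FOR_READ |
				   SKETCH_COHERENT_FOR_WRITE;
	else if (has_llc)
		c.cache_coherent = SKETCH_COHERENT_FOR_READ;
	else
		c.cache_coherent = 0;

	/* An object that is not write-coherent starts out marked dirty. */
	c.cache_dirty = !(c.cache_coherent & SKETCH_COHERENT_FOR_WRITE);

	return c;
}
```

In this sketch an uncached object is always marked dirty, whether or not the platform has an LLC, because only the write-coherent bit feeds `cache_dirty`.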
static void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
{
	struct drm_i915_gem_object *obj = to_intel_bo(gem);
	struct drm_i915_file_private *fpriv = file->driver_priv;
	struct i915_lut_handle bookmark = {};
	struct i915_mmap_offset *mmo, *mn;
	struct i915_lut_handle *lut, *ln;
	LIST_HEAD(close);

	spin_lock(&obj->lut_lock);
	list_for_each_entry_safe(lut, ln, &obj->lut_list, obj_link) {
		struct i915_gem_context *ctx = lut->ctx;

		if (ctx && ctx->file_priv == fpriv) {
			i915_gem_context_get(ctx);
			list_move(&lut->obj_link, &close);
		}

		/* Break long locks, and carefully continue on from this spot */
		if (&ln->obj_link != &obj->lut_list) {
			list_add_tail(&bookmark.obj_link, &ln->obj_link);
			if (cond_resched_lock(&obj->lut_lock))
				list_safe_reset_next(&bookmark, ln, obj_link);
			__list_del_entry(&bookmark.obj_link);
		}
	}
	spin_unlock(&obj->lut_lock);

	spin_lock(&obj->mmo.lock);
	rbtree_postorder_for_each_entry_safe(mmo, mn, &obj->mmo.offsets, offset)
		drm_vma_node_revoke(&mmo->vma_node, file);
	spin_unlock(&obj->mmo.lock);

	list_for_each_entry_safe(lut, ln, &close, obj_link) {
		struct i915_gem_context *ctx = lut->ctx;
		struct i915_vma *vma;

		/*
		 * We allow the process to have multiple handles to the same
		 * vma, in the same fd namespace, by virtue of flink/open.
		 */

		mutex_lock(&ctx->lut_mutex);
		vma = radix_tree_delete(&ctx->handles_vma, lut->handle);
		if (vma) {
			GEM_BUG_ON(vma->obj != obj);
			GEM_BUG_ON(!atomic_read(&vma->open_count));
			i915_vma_close(vma);
		}
		mutex_unlock(&ctx->lut_mutex);

		i915_gem_context_put(lut->ctx);
		i915_lut_handle_free(lut);
		i915_gem_object_put(obj);
	}
}
void __i915_gem_free_object_rcu(struct rcu_head *head)
{
	struct drm_i915_gem_object *obj =
		container_of(head, typeof(*obj), rcu);
	struct drm_i915_private *i915 = to_i915(obj->base.dev);

	dma_resv_fini(&obj->base._resv);
	i915_gem_object_free(obj);

	GEM_BUG_ON(!atomic_read(&i915->mm.free_count));
	atomic_dec(&i915->mm.free_count);
}
static void __i915_gem_object_free_mmaps(struct drm_i915_gem_object *obj)
{
	/* Skip serialisation and waking the device if known to be not used. */

	if (obj->userfault_count)
		i915_gem_object_release_mmap_gtt(obj);

	if (!RB_EMPTY_ROOT(&obj->mmo.offsets)) {
		struct i915_mmap_offset *mmo, *mn;

		i915_gem_object_release_mmap_offset(obj);

		rbtree_postorder_for_each_entry_safe(mmo, mn,
						     &obj->mmo.offsets,
						     offset) {
			drm_vma_offset_remove(obj->base.dev->vma_offset_manager,
					      &mmo->vma_node);
			kfree(mmo);
		}
		obj->mmo.offsets = RB_ROOT;
	}
}
void __i915_gem_free_object(struct drm_i915_gem_object *obj)
{
	trace_i915_gem_object_destroy(obj);

	if (!list_empty(&obj->vma.list)) {
		struct i915_vma *vma;

		/*
		 * Note that the vma keeps an object reference while
		 * it is active, so it *should* not sleep while we
		 * destroy it. Our debug code insists it *might*.
		 * For the moment, play along.
		 */
		spin_lock(&obj->vma.lock);
		while ((vma = list_first_entry_or_null(&obj->vma.list,
						       struct i915_vma,
						       obj_link))) {
			GEM_BUG_ON(vma->obj != obj);
			spin_unlock(&obj->vma.lock);

			__i915_vma_put(vma);

			spin_lock(&obj->vma.lock);
		}
		spin_unlock(&obj->vma.lock);
	}

	__i915_gem_object_free_mmaps(obj);

	GEM_BUG_ON(!list_empty(&obj->lut_list));

	atomic_set(&obj->mm.pages_pin_count, 0);
	__i915_gem_object_put_pages(obj);
	GEM_BUG_ON(i915_gem_object_has_pages(obj));
	bitmap_free(obj->bit_17);

	if (obj->base.import_attach)
		drm_prime_gem_destroy(&obj->base, NULL);

	drm_gem_free_mmap_offset(&obj->base);

	if (obj->ops->release)
		obj->ops->release(obj);

	if (obj->mm.n_placements > 1)
		kfree(obj->mm.placements);

	if (obj->shares_resv_from)
		i915_vm_resv_put(obj->shares_resv_from);
}
static void __i915_gem_free_objects(struct drm_i915_private *i915,
				    struct llist_node *freed)
{
	struct drm_i915_gem_object *obj, *on;

	llist_for_each_entry_safe(obj, on, freed, freed) {
		might_sleep();
		if (obj->ops->delayed_free) {
			obj->ops->delayed_free(obj);
			continue;
		}
		__i915_gem_free_object(obj);

		/* But keep the pointer alive for RCU-protected lookups */
		call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
		cond_resched();
	}
}

void i915_gem_flush_free_objects(struct drm_i915_private *i915)
{
	struct llist_node *freed = llist_del_all(&i915->mm.free_list);

	if (unlikely(freed))
		__i915_gem_free_objects(i915, freed);
}

static void __i915_gem_free_work(struct work_struct *work)
{
	struct drm_i915_private *i915 =
		container_of(work, struct drm_i915_private, mm.free_work);

	i915_gem_flush_free_objects(i915);
}
static void i915_gem_free_object(struct drm_gem_object *gem_obj)
{
	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
	struct drm_i915_private *i915 = to_i915(obj->base.dev);

	GEM_BUG_ON(i915_gem_object_is_framebuffer(obj));

	/*
	 * Before we free the object, make sure any pure RCU-only
	 * read-side critical sections are complete, e.g.
	 * i915_gem_busy_ioctl(). For the corresponding synchronized
	 * lookup see i915_gem_object_lookup_rcu().
	 */
	atomic_inc(&i915->mm.free_count);

	/*
	 * This serializes freeing with the shrinker. Since the free
	 * is delayed, first by RCU then by the workqueue, we want the
	 * shrinker to be able to free pages of unreferenced objects,
	 * or else we may oom whilst there are plenty of deferred
	 * freed objects.
	 */
	i915_gem_object_make_unshrinkable(obj);

	/*
	 * Since we require blocking on struct_mutex to unbind the freed
	 * object from the GPU before releasing resources back to the
	 * system, we can not do that directly from the RCU callback (which may
	 * be a softirq context), but must instead then defer that work onto a
	 * kthread. We use the RCU callback rather than move the freed object
	 * directly onto the work queue so that we can mix between using the
	 * worker and performing frees directly from subsequent allocations for
	 * crude but effective memory throttling.
	 */

	if (llist_add(&obj->freed, &i915->mm.free_list))
		queue_work(i915->wq, &i915->mm.free_work);
}
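The deferred free above relies on `llist_add()` reporting whether the list was empty before the add, so the worker is queued only by the first producer after each drain. The following is a minimal userspace sketch of that pattern using C11 atomics. It is an illustration under stated assumptions, not the kernel's `llist` implementation: all `sketch_` names are hypothetical, and the real driver queues a work item where this sketch merely returns a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical intrusive node and list head, loosely modelled on llist. */
struct sketch_node {
	struct sketch_node *next;
};

struct sketch_list {
	_Atomic(struct sketch_node *) first;
};

static void sketch_list_init(struct sketch_list *list)
{
	atomic_init(&list->first, (struct sketch_node *)NULL);
}

/*
 * Push a node onto the front of the list without taking a lock.
 * Returns true if the list was empty before the push, which is the
 * caller's cue to schedule the consumer exactly once.
 */
static bool sketch_push(struct sketch_list *list, struct sketch_node *node)
{
	struct sketch_node *first = atomic_load(&list->first);

	do {
		/* On a failed exchange, 'first' is reloaded with the new head. */
		node->next = first;
	} while (!atomic_compare_exchange_weak(&list->first, &first, node));

	return first == NULL;
}

/* Detach the whole chain in one step, leaving the list empty. */
static struct sketch_node *sketch_take_all(struct sketch_list *list)
{
	return atomic_exchange(&list->first, (struct sketch_node *)NULL);
}
```

A consumer in this sketch would call `sketch_take_all()` and walk the returned chain through `next`, much as `i915_gem_flush_free_objects()` drains `mm.free_list` with `llist_del_all()`. Pushes that arrive while the list is non-empty return false, so they do not queue the worker a second time.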
void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj,
					 enum fb_op_origin origin)
{
	struct intel_frontbuffer *front;

	front = __intel_frontbuffer_get(obj);
	if (front) {
		intel_frontbuffer_flush(front, origin);
		intel_frontbuffer_put(front);
	}
}

void __i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj,
					      enum fb_op_origin origin)
{
	struct intel_frontbuffer *front;

	front = __intel_frontbuffer_get(obj);
	if (front) {
		intel_frontbuffer_invalidate(front, origin);
		intel_frontbuffer_put(front);
	}
}
static void
i915_gem_object_read_from_page_kmap(struct drm_i915_gem_object *obj, u64 offset, void *dst, int size)
{
	void *src_map;
	void *src_ptr;

	src_map = kmap_atomic(i915_gem_object_get_page(obj, offset >> PAGE_SHIFT));

	src_ptr = src_map + offset_in_page(offset);
	if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
		drm_clflush_virt_range(src_ptr, size);
	memcpy(dst, src_ptr, size);

	kunmap_atomic(src_map);
}

static void
i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, u64 offset, void *dst, int size)
{
	void __iomem *src_map;
	void __iomem *src_ptr;
	dma_addr_t dma = i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT);

	src_map = io_mapping_map_wc(&obj->mm.region->iomap,
				    dma - obj->mm.region->region.start,
				    PAGE_SIZE);

	src_ptr = src_map + offset_in_page(offset);
	if (!i915_memcpy_from_wc(dst, (void __force *)src_ptr, size))
		memcpy_fromio(dst, src_ptr, size);

	io_mapping_unmap(src_map);
}
/**
 * i915_gem_object_read_from_page - read data from the page of a GEM object
 * @obj: GEM object to read from
 * @offset: offset within the object
 * @dst: buffer to store the read data
 * @size: size to read
 *
 * Reads data from @obj at the specified offset. The requested region to read
 * from can't cross a page boundary. The caller must ensure that @obj pages
 * are pinned and that @obj is synced wrt. any related writes.
 *
 * Returns 0 on success or -ENODEV if the type of @obj's backing store is
 * unsupported.
 */
int i915_gem_object_read_from_page(struct drm_i915_gem_object *obj, u64 offset, void *dst, int size)
{
	GEM_BUG_ON(offset >= obj->base.size);
	GEM_BUG_ON(offset_in_page(offset) > PAGE_SIZE - size);
	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));

	if (i915_gem_object_has_struct_page(obj))
		i915_gem_object_read_from_page_kmap(obj, offset, dst, size);
	else if (i915_gem_object_has_iomem(obj))
		i915_gem_object_read_from_page_iomap(obj, offset, dst, size);
	else
		return -ENODEV;

	return 0;
}
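The rule that a read must not cross a page boundary is enforced above by comparing the in-page offset against `PAGE_SIZE - size`. The following userspace sketch restates that check as a predicate. It rests on assumptions: the `sketch_` names are hypothetical, a 4096-byte page is assumed, and the explicit guard against sizes larger than a page is an addition of this sketch (the driver takes `int size` and has no such guard).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Byte offset of 'offset' within its page (page size is a power of two). */
static size_t sketch_offset_in_page(uint64_t offset)
{
	return (size_t)(offset & (SKETCH_PAGE_SIZE - 1));
}

/*
 * True if [offset, offset + size) stays within a single page.
 * The size guard keeps the unsigned subtraction below from wrapping.
 */
static bool sketch_read_fits_in_page(uint64_t offset, size_t size)
{
	if (size > SKETCH_PAGE_SIZE)
		return false;

	return sketch_offset_in_page(offset) <= SKETCH_PAGE_SIZE - size;
}
```

For instance, under these assumptions a 6-byte read at in-page offset 4090 ends exactly on the page boundary and is accepted, while a 7-byte read at the same offset would spill into the next page and is rejected.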
/**
 * i915_gem_object_evictable - Whether object is likely evictable after unbind.
 * @obj: The object to check
 *
 * This function checks whether the object is likely evictable after unbind.
 * If the object is not locked when checking, the result is only advisory.
 * If the object is locked when checking, and the function returns true,
 * then an eviction should indeed be possible. But since unlocked vma
 * unpinning and unbinding is currently possible, the object can actually
 * become evictable even if this function returns false.
 *
 * Return: true if the object may be evictable. False otherwise.
 */
bool i915_gem_object_evictable(struct drm_i915_gem_object *obj)
{
	struct i915_vma *vma;
	int pin_count = atomic_read(&obj->mm.pages_pin_count);

	if (!pin_count)
		return true;

	spin_lock(&obj->vma.lock);
	list_for_each_entry(vma, &obj->vma.list, obj_link) {
		if (i915_vma_is_pinned(vma)) {
			spin_unlock(&obj->vma.lock);
			return false;
		}
		if (atomic_read(&vma->pages_count))
			pin_count--;
	}
	spin_unlock(&obj->vma.lock);
	GEM_WARN_ON(pin_count < 0);

	return pin_count == 0;
}
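The evictability check above is a small piece of pin accounting: each bound vma that holds pages is assumed to account for one page pin, and the object is treated as evictable only if no vma is pinned and no other pins remain. The following userspace sketch models that accounting with a plain array instead of the locked vma list. The `sketch_` names and the flattened struct are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of the vma state the check looks at. */
struct sketch_vma {
	bool pinned;		/* stands in for i915_vma_is_pinned() */
	int pages_count;	/* stands in for vma->pages_count */
};

/*
 * Returns true if every page pin can be attributed to an unpinned vma
 * that would drop it on unbind, following the loop in the function above.
 */
static bool sketch_evictable(int pages_pin_count,
			     const struct sketch_vma *vmas, size_t n)
{
	int pin_count = pages_pin_count;
	size_t i;

	if (!pin_count)
		return true;

	for (i = 0; i < n; i++) {
		if (vmas[i].pinned)
			return false;
		if (vmas[i].pages_count)
			pin_count--;
	}

	return pin_count == 0;
}
```

In this model two unpinned vmas holding pages explain a pin count of two, so the object counts as evictable; a pin count of three with the same vmas leaves one pin unaccounted for, so it does not.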
/**
 * i915_gem_object_migratable - Whether the object is migratable out of the
 * current region.
 * @obj: Pointer to the object.
 *
 * Return: Whether the object is allowed to be resident in other
 * regions than the current while pages are present.
 */
bool i915_gem_object_migratable(struct drm_i915_gem_object *obj)
{
	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);

	if (!mr)
		return false;

	return obj->mm.n_placements > 1;
}
/**
 * i915_gem_object_has_struct_page - Whether the object is page-backed
 * @obj: The object to query.
 *
 * This function should only be called while the object is locked or pinned,
 * otherwise the page backing may change under the caller.
 *
 * Return: True if page-backed, false otherwise.
 */
bool i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
{
#ifdef CONFIG_LOCKDEP
	if (IS_DGFX(to_i915(obj->base.dev)) &&
	    i915_gem_object_evictable((void __force *)obj))
		assert_object_held_shared(obj);
#endif
	return obj->mem_flags & I915_BO_FLAG_STRUCT_PAGE;
}

/**
 * i915_gem_object_has_iomem - Whether the object is iomem-backed
 * @obj: The object to query.
 *
 * This function should only be called while the object is locked or pinned,
 * otherwise the iomem backing may change under the caller.
 *
 * Return: True if iomem-backed, false otherwise.
 */
bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj)
{
#ifdef CONFIG_LOCKDEP
	if (IS_DGFX(to_i915(obj->base.dev)) &&
	    i915_gem_object_evictable((void __force *)obj))
		assert_object_held_shared(obj);
#endif
	return obj->mem_flags & I915_BO_FLAG_IOMEM;
}
/**
* i915_gem_object_can_migrate - Whether an object likely can be migrated
*
* @obj: The object to migrate
* @id: The region intended to migrate to
*
* Check whether the object backend supports migration to the
* given region. Note that pinning may affect the ability to migrate as
* returned by this function.
*
* This function is primarily intended as a helper for checking the
* possibility to migrate objects and might be slightly less permissive
* than i915_gem_object_migrate() when it comes to objects with the
* I915_BO_ALLOC_USER flag set.
*
* Return: true if migration is possible, false otherwise.
*/
bool i915_gem_object_can_migrate(struct drm_i915_gem_object *obj,
enum intel_region_id id)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
unsigned int num_allowed = obj->mm.n_placements;
struct intel_memory_region *mr;
unsigned int i;
GEM_BUG_ON(id >= INTEL_REGION_UNKNOWN);
GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED);
mr = i915->mm.regions[id];
if (!mr)
return false;
if (obj->mm.region == mr)
return true;
if (!i915_gem_object_evictable(obj))
return false;
if (!obj->ops->migrate)
return false;
if (!(obj->flags & I915_BO_ALLOC_USER))
return true;
if (num_allowed == 0)
return false;
for (i = 0; i < num_allowed; ++i) {
if (mr == obj->mm.placements[i])
return true;
}
return false;
}
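/*
 * Illustrative userspace sketch only, not part of the driver: it mirrors
 * the order of the checks in i915_gem_object_can_migrate() above. All
 * sketch_* names are hypothetical, and the "does the region exist" test
 * is omitted because the sketch has no region table. Note that an object
 * already in the target region is reported as migratable before the
 * evictability and migrate-hook checks are applied.
 */

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_region { SKETCH_SMEM, SKETCH_LMEM };

/* Hypothetical stand-in for the object state the check consults. */
struct sketch_obj {
	enum sketch_region region;		/* models obj->mm.region */
	bool evictable;				/* models i915_gem_object_evictable() */
	bool has_migrate_op;			/* models obj->ops->migrate != NULL */
	bool alloc_user;			/* models I915_BO_ALLOC_USER */
	const enum sketch_region *placements;	/* models obj->mm.placements */
	unsigned int n_placements;		/* models obj->mm.n_placements */
};

bool sketch_can_migrate(const struct sketch_obj *obj, enum sketch_region id)
{
	unsigned int i;

	/* Already in the target region: trivially possible. */
	if (obj->region == id)
		return true;

	/* Pinned objects and backends without a migrate hook cannot move. */
	if (!obj->evictable || !obj->has_migrate_op)
		return false;

	/* Kernel-internal objects are not restricted to a placement list. */
	if (!obj->alloc_user)
		return true;

	/* User objects may only move to a region in their placement list. */
	for (i = 0; i < obj->n_placements; i++) {
		if (obj->placements[i] == id)
			return true;
	}

	return false;
}
```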
/**
* i915_gem_object_migrate - Migrate an object to the desired region id
* @obj: The object to migrate.
* @ww: An optional struct i915_gem_ww_ctx. If NULL, the backend may
* not be successful in evicting other objects to make room for this object.
* @id: The region id to migrate to.
*
* Attempt to migrate the object to the desired memory region. The
* object backend must support migration and the object may not be
* pinned (explicitly pinned pages or pinned vmas). The object must
* be locked.
* On successful completion, the object will have pages pointing to
* memory in the new region, but an async migration task may not have
* completed yet, and to accomplish that, i915_gem_object_wait_migration()
* must be called.
*
* Note: the @ww parameter is not used yet, but included to make sure
* callers put some effort into obtaining a valid ww ctx if one is
* available.
*
* Return: 0 on success. Negative error code on failure. In particular may
* return -ENXIO on lack of region space, -EDEADLK for deadlock avoidance
* if @ww is set, -EINTR or -ERESTARTSYS if signal pending, and
* -EBUSY if the object is pinned.
*/
int i915_gem_object_migrate(struct drm_i915_gem_object *obj,
struct i915_gem_ww_ctx *ww,
enum intel_region_id id)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
struct intel_memory_region *mr;
GEM_BUG_ON(id >= INTEL_REGION_UNKNOWN);
GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED);
assert_object_held(obj);
mr = i915->mm.regions[id];
GEM_BUG_ON(!mr);
if (!i915_gem_object_can_migrate(obj, id))
return -EINVAL;
if (!obj->ops->migrate) {
if (GEM_WARN_ON(obj->mm.region != mr))
return -EINVAL;
return 0;
}
return obj->ops->migrate(obj, mr);
}
/**
* i915_gem_object_placement_possible - Check whether the object can be
* placed in a certain memory type
* @obj: Pointer to the object
* @type: The memory type to check
*
* Return: True if the object can be placed in @type. False otherwise.
*/
bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
enum intel_memory_type type)
{
unsigned int i;
if (!obj->mm.n_placements) {
switch (type) {
case INTEL_MEMORY_LOCAL:
return i915_gem_object_has_iomem(obj);
case INTEL_MEMORY_SYSTEM:
return i915_gem_object_has_pages(obj);
default:
/* Ignore stolen for now */
GEM_BUG_ON(1);
return false;
}
}
for (i = 0; i < obj->mm.n_placements; i++) {
if (obj->mm.placements[i]->type == type)
return true;
}
return false;
}
void i915_gem_init__objects(struct drm_i915_private *i915)
{
INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
}
void i915_objects_module_exit(void)
{
kmem_cache_destroy(slab_objects);
}
int __init i915_objects_module_init(void)
{
slab_objects = KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN);
if (!slab_objects)
return -ENOMEM;
return 0;
}
static const struct drm_gem_object_funcs i915_gem_object_funcs = {
.free = i915_gem_free_object,
.close = i915_gem_close_object,
.export = i915_gem_prime_export,
};
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftests/huge_gem_object.c"
#include "selftests/huge_pages.c"
#include "selftests/i915_gem_migrate.c"
#include "selftests/i915_gem_object.c"
#include "selftests/i915_gem_coherency.c"
#endif