Use the mm reference from the fence. This allows removing the
svm_bo->svms pointer, which was problematic because we cannot assume
that the struct kfd_process containing the svms is still allocated
without holding a refcount on the process.
Use mmget_not_zero to ensure the mm is still valid, and drop the svm_bo
reference if it isn't.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Unified memory with xnack off should be tracked, as userptr mappings
and legacy allocations do. To avoid oversuscribe system memory when
xnack off.
[How]
Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
API and call them on every prange creation and free.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Giant svm range split to smaller ranges, align the range start address
to max svm range pages to improve MMU TLB usage.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will be used to split giant svm range into smaller ranges, to
support VRAM overcommitment by giant range and improve GPU retry fault
recover on giant range.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SVM range unmapped from GPUs when range is unmapped from CPU, or with
xnack on from MMU notifier when range is evicted or migrated.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Output user queue eviction and restore event. User queue eviction may be
triggered by svm or userptr MMU notifier, TTM eviction, device suspend
and CRIU checkpoint and restore.
User queue restore may be rescheduled if eviction happens again while
restore.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For migration start and end event, output timestamp when migration
starts, ends, svm range address and size, GPU id of migration source and
destination and svm range attributes,
Migration trigger could be prefetch, CPU or GPU page fault and TTM
eviction.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use ktime_get_boottime_ns() as timestamp to correlate with other
APIs. Output timestamp when GPU recoverable fault starts and ends to
recover the fault, if migration happened or only GPU page table is
updated to recover, fault address, if read or write fault.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Migration range from system memory to VRAM, if system page can not be
locked or unmapped, we do partial migration and leave some pages in
system memory. Several bugs found to copy pages and update GPU mapping
for this situation:
1. copy to vram should use migrate->npage which is total pages of range
as npages, not migrate->cpages which is number of pages can be migrated.
2. After partial copy, set VRAM res cursor as j + 1, j is number of
system pages copied plus 1 page to skip copy.
3. copy to ram, should collect all continuous VRAM pages and copy
together.
4. Call amdgpu_vm_update_range, should pass in offset as bytes, not
as number of pages.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
MMU notifier callback may pass in mm with mm->mm_users==0 when process
is exiting, use mmget_no_zero to avoid accessing invalid mm in deferred
list work after mm is gone.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a AMDGPU_GEM_CREATE_DISCARDABLE flag to note that the content of a BO
doesn't needs to be preserved during eviction.
KFD was already using a similar functionality for SVM BOs so replace the
internal flag with the new UAPI.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change SVM range mapping flags or access attributes don't trigger
migration, if range is already mapped on GPUs we should update GPU
mapping and pass flush_tlb flag true to amdgpu vm.
Change SVM range preferred_loc or migration granularity don't need
update GPU mapping, skip the validate_and_map.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To avoid unnecessary unmap SVM range from GPUs if range is not mapped on
GPUs when migrating the range. This flag will also be used to flush TLB
when updating the existing mapping on GPUs.
It is protected by prange->migrate_mutex and mmap read lock in MMU
notifier callback.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 36bf93216e.
It causes SVM regressions on Vega10 with XNACK-ON. Just revert it
at the moment.
./kfdtest --gtest_filter=KFDSVMRangeTest.MigratePolicyTest
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The idea is from
commit a50fe70780 ("drm/amdkfd: Only apply heavy-weight TLB flush on Aldebaran")
and
commit f61c40c075 ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus").
At the moment, heavy-weight TLB could cause problems on ASICs except
Aldebaran and Arcturus.
A simple hipMallocManaged/hipFree program could trigger this issue.
[ 97.787657] amdgpu 0000:01:00.0: amdgpu: wait for kiq fence error: 0.
[ 106.868758] amdgpu: qcm fence wait loop timeout expired
[ 106.868966] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
[ 106.869203] amdgpu: Failed to evict process queues
[ 106.869261] amdgpu: Failed to quiesce KFD
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Application could change XNACK enabled to disabled while KFD is draining
stale retry fault, therefore the check for whether to drain retry faults
must be before the check for whether xnack_enabled, to avoid report
incorrect vm fault after application changes XNACK mode.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Testing the valid bit is not enough to figure out if we
need to invalidate the TLB or not.
During eviction it is quite likely that we move a BO from VRAM to GTT and
update the page tables immediately to the new GTT address.
Rework the whole function to get all the necessary parameters directly as
value.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Migrate vram to ram may fail to find the vma if process is exiting
and vma is removed, evict svm bo worker sets prange->svm_bo to NULL
and warn svm_bo ref count != 1 only if migrating vram to ram
successfully.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In CRIU resume stage, resume all the shared virtual memory ranges from
the data stored inside the resuming kfd process during CRIU restore
phase. Also setup xnack mode and free up the resources.
KFD_IOCTL_SVM_ATTR_CLR_FLAGS is not available for querying via get_attr
interface but we must clear the flags during restore as there might be
some default flags set when the prange is created. Also handle the
invalid PREFETCH atribute values saved during checkpoint by replacing
them with another dummy KFD_IOCTL_SVM_ATTR_SET_FLAGS attribute.
(rajneesh: Fixed the checkpatch reported problems)
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
During CRIU restore phase, the VMAs for the virtual address ranges are
not at their final location yet so in this stage, only cache the data
required to successfully resume the svm ranges during an imminent CRIU
resume phase.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
During checkpoint stage, save the shared virtual memory ranges and
attributes for the target process. A process may contain a number of svm
ranges and each range might contain a number of attributes. While not
all attributes may be applicable for a given prange but during
checkpoint we store all possible values for the max possible attribute
types.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A KFD process may contain a number of virtual address ranges for shared
virtual memory management and each such range can have many SVM
attributes spanning across various nodes within the process boundary.
This change reports the total number of such SVM ranges and
their total private data size by extending the PROCESS_INFO op of the the
CRIU IOCTL to discover the svm ranges in the target process and a future
patches brings in the required support for checkpoint and restore for
SVM ranges.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Both svm_range_get_attr and svm_range_set_attr helpers use mm struct
from current but for a Checkpoint or Restore operation, the current->mm
will fetch the mm for the CRIU master process. So modify these helpers to
accept the task mm for a target kfd process to support Checkpoint
Restore.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
All warnings (new ones prefixed by >>):
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function
'svm_range_deferred_list_work':
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2067:22: warning:
variable 'p' set but not used [-Wunused-but-set-variable]
2067 | struct kfd_process *p;
|
Fixes: 367c9b0f1b ("drm/amdkfd: Ensure mm remain valid in svm deferred_list work")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-By: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kfd_process_notifier_release flush svm_range_restore_work
which calls svm_range_list_lock_and_flush_work to flush deferred_list
work, but if deferred_list work mmput release the last user, it will
call exit_mmap -> notifier_release, it is deadlock with below backtrace.
Move flush svm_range_restore_work to kfd_process_wq_release to avoid
deadlock. Then svm_range_restore_work take task->mm ref to avoid mm is
gone while validating and mapping ranges to GPU.
Workqueue: events svm_range_deferred_list_work [amdgpu]
Call Trace:
wait_for_completion+0x94/0x100
__flush_work+0x12a/0x1e0
__cancel_work_timer+0x10e/0x190
cancel_delayed_work_sync+0x13/0x20
kfd_process_notifier_release+0x98/0x2a0 [amdgpu]
__mmu_notifier_release+0x74/0x1f0
exit_mmap+0x170/0x200
mmput+0x5d/0x130
svm_range_deferred_list_work+0x104/0x230 [amdgpu]
process_one_work+0x220/0x3c0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reported-by: Ruili Ji <ruili.ji@amd.com>
Tested-by: Ruili Ji <ruili.ji@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
svm_deferred_list work should continue to handle deferred_range_list
which maybe split to child range to avoid child range leak, and remove
ranges mmu interval notifier to avoid mm mm_count leak. So taking mm
reference when adding range to deferred list, to ensure mm is valid in
the scheduled deferred_list_work, and drop the mm referrence after range
is handled.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reported-by: Ruili Ji <ruili.ji@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The remove_list head was only used for keeping track of existing ranges
that are to be removed from the svms->list. The update_list was used for
new or existing ranges that need updated attributes. These two cases are
mutually exclusive (i.e. the same range will never be on both lists).
Therefore we can use the update_list head to track the remove_list and
save another 16 bytes in the svm_range struct.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are seven list_heads in struct svm_range: list, update_list,
remove_list, insert_list, svm_bo_list, deferred_list, child_list. This
patch and the next one remove two of them that are redundant.
The insert_list head was only used for new ranges that are not on the
svms->list yet. So we can use that list head for keeping track of
new ranges before they get added, and use list_move_tail to move them
to the svms->list when ready.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add svm_range_bo_unref_async to schedule work to wait for svm_bo
eviction work done and then free svm_bo. __do_munmap put_page
is atomic context, call svm_range_bo_unref_async to avoid warning
invalid wait context. Other non atomic context call svm_range_bo_unref.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes the warnings below, and also drops the display_count
variable, as it's unused.
In function 'svm_range_map_to_gpu':
warning: variable 'bo_va' set but not used [-Wunused-but-set-variable]
1172 | struct amdgpu_bo_va bo_va;
| ^~~~~
...
In function 'dcn201_update_clocks':
warning: variable 'enter_display_off' set but not used [-Wunused-but-set-variable]
132 | bool enter_display_off = false;
| ^~~~~~~~~~~~~~~~~
Changes since v1:
- As suggested by Rodrigo Siqueira:
1. Drop display_count variable.
- As suggested by Felix Kuehling:
1. Remove block surrounding amdgpu_xgmi_same_hive.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If an existing SVM range overlaps an svm_range_set_attr call, we would
normally split it in order to update only the overlapping part.
However, if the attributes of the existing range would not be changed
splitting it is unnecessary.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The existing function doesn't compare the access bitmaps and flags.
This can result in failure to update those attributes in existing
ranges when all other attributes remained unchanged.
Because the access and flags attributes modify only some bits in the
respective bitmaps, we cannot compare them directly. Instead we need to
check whether applying the attributes to a particular range would
change the bitmaps.
A PREFETCH_LOC attribute must always trigger a migration, even if the
attribute value remains unchanged. E.g. if some pages were migrated due
to a CPU page fault, a prefetch must still be executed to migrate pages
back to VRAM.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add null-pointer check after the last svm_range_new call. This was
originally reported by Zhou Qingyang <zhou1615@umn.edu> based on a
static analyzer.
To avoid duplicating the unwinding code from svm_range_handle_overlap,
I merged the two functions into one.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Zhou Qingyang <zhou1615@umn.edu>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes various warnings relating to erroneous docstring syntax, of
which some are listed below:
warning: Function parameter or member 'adev' not described in
'amdgpu_atomfirmware_ras_rom_addr'
...
warning: expecting prototype for amdgpu_atpx_validate_functions().
Prototype was for amdgpu_atpx_validate() instead
...
warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new'
...
warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of
waves in-flight on this device
...
warning: This comment starts with '/**', but isn't a kernel-doc
comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
process_info->lock is used to protect kfd_bo_list, vm_list_head, n_vms
and userptr valid/inval list, svm_range_restore_work and
svm_range_set_attr don't access those, so do not need to take
process_info lock. This will avoid potential circular locking issue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>