This allows user mode to map doorbell pages into GPUVM address space.
That way GPUs can submit to user mode queues (self-dispatch).
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is used for interoperability between ROCm compute and graphics
APIs. It allows importing graphics driver BOs into the ROCm SVM
address space for zero-copy GPU access.
The API is split into two steps (query and import) to allow user mode
to manage the virtual address space allocation for the imported buffer.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The same statement is later done in kgd_set_pasid_vmid_mapping(), so no
need to do it in set_pasid_vmid_mapping().
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After amdkfd module is merged into amdgpu, KFD can call amdgpu directly
and no longer needs to use the function pointer. Replace those function
pointers with functions if they are not ASIC dependent.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c: In function 'destroy_queue_cpsch':
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:1366:7: warning:
variable 'preempt_all_queues' set but not used [-Wunused-but-set-variable]
It never used since introduct in
commit 992839ad64 ("drm/amdkfd: Add static user-mode queues support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This mm_struct pointer should never be dereferenced. If running in
a user thread, just use current->mm. If running in a kernel worker
use get_task_mm to get a safe reference to the mm_struct.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Firmware have the workaround to replace the atomic Ops with read-modify-write on CP side.
User should not expect atomic Ops on system memory works normally if system didn't not
support it.
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-By: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wavefront context save data is of interest to userspace clients for
debugging static wavefront state. The MQD contains two parameters
required to parse the control stack and the control stack itself
is kept in the MQD from gfx9 onwards.
Add an ioctl to fetch the context save area and control stack offsets
and to copy the control stack to a userspace address if it is kept in
the MQD.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Because CRAT_CU_FLAGS_IOMMU_PRESENT was not set in some BIOS crat, we
need to workaround this.
For future compatibility, we also overwrite the bit in capability according
to the value of needs_iommu_device.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CWSR fails on Raven if the control stack is MTYPE_UC, which is used
for regular GART mappings. As a workaround we map it using MTYPE_NC.
The MEC firmware expects the control stack at one page offset from the
start of the MQD so it is part of the MQD allocation on GFXv9. AMDGPU
added a memory allocation flag just for this purpose.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move some KFD-related (but used in amdgpu_drv.c) definitions from
kfd_priv.h to kgd_kfd_interface.h so we don't need to include kfd_priv.h
in amdgpu_drv.c. This fixes a build failure when AMDGPU is enabled but
MMU_NOTIFIER is not.
This patch also disables KFD-related module options when HSA_AMD is not
enabled.
v2: rebase (Alex)
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For compute vm acquired from amdgpu, vm.pasid is managed
by kfd. Decouple pasid from such vm on process destroy
to avoid duplicate pasid release.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To make a amdgpu vm to a compute vm, the old pasid will be freed and
replaced with a pasid managed by kfd. Kfd can't reuse original pasid
allocated by amdgpu because kfd uses different pasid policy with amdgpu.
For example, all graphic devices share one same pasid in a process.
v2: rebase (Alex)
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After merging KFD into amdgpu, move module parameters defined in KFD to
amdgpu_drv.c, where other module parameters are declared.
v2: add kernel-doc comments
v3: rebase and fix parameter variable name (Alex)
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since KFD is only supported by single GPU driver, it makes sense to merge
amdgpu and amdkfd into one module. This patch is the initial step: merge
Kconfig and Makefile.
v2: also remove kfd from drm Kconfig
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
User mode queue submissions don't go through KFD. Therefore we don't
know exactly when compute is idle or not idle. We use the existence
of user mode queues on a device as an approximation.
register_process is called when the first queue of a process is
created. Conversely unregister_process is called when the last queue
is destroyed. The first process that is registered takes compute
out of idle. The last process that is unregisters sets compute back
to idle.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CU-masking allows a KFD client to control the set of CUs used by a
user mode queue for executing compute dispatches. This can be used
for optimizing the partitioning of the GPU and minimize conflicts
between concurrent tasks.
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
On Raven multiple PPRs can be queued up by the hardware. When the
first of those requests is handled by the IOMMU driver, the memory
access succeeds. After that the application may be done with the
memory and unmap it. At that point the page table entries are
invalidated, but there are still outstanding duplicate PPRs for those
addresses. When the IOMMU driver processes those duplicate requests,
it finds invalid page table entries and triggers an invalid PPR fault.
As a workaround, don't signal invalid PPR faults on Raven to avoid
segfaulting applications that haven't done anything wrong. As a side
effect, real GPU memory access faults may go unnoticed by the
application.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
On Raven Invalid PPRs (peripheral page requests) can be reported
because multiple PPRs can be still queued when memory is freed.
Apply a rate limit to avoid flooding the log in this case.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
If there are several memory banks that has the same properties in CRAT,
we aggregate them into one memory bank. This cleans up memory banks on
APUs (e.g. Raven) where the CRAT reports each memory channel as a
separate bank. This only confuses user mode, which only deals with
virtual memory.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>