KFD (kernel fusion driver) is the kernel driver
for the compute backend for usermode compute
stack.
v2: squash in updates (Alex)
v3: squash in rebase fixes (Alex)
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can't have devices that are not completely initialized in kfd topology.
Otherwise it is a race condition when user access not completely
initialized device. This also addresses a kfd_topology_add_device accessing
NULL dqm pointer issue.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
New stuff for 5.3:
- Add new thermal sensors for vega asics
- Various RAS fixes
- Add sysfs interface for memory interface utilization
- Use HMM rather than mmu notifier for user pages
- Expose xgmi topology via kfd
- SR-IOV fixes
- Fixes for manual driver reload
- Add unique identifier for vega asics
- Clean up user fence handling with UVD/VCE/VCN blocks
- Convert DC to use core bpc attribute rather than a custom one
- Add GWS support for KFD
- Vega powerplay improvements
- Add CRC support for DCE 12
- SR-IOV support for new security policy
- Various cleanups
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190529220944.14464-1-alexander.deucher@amd.com
On device initialization, KFD allocates all (64) gws which
is shared by all KFD processes.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the VegaM information to KFD
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Existing QUEUE_TYPE_SDMA means PCIe optimized SDMA queues.
Introduce a new QUEUE_TYPE_SDMA_XGMI, which is optimized
for non-PCIe transfer such as XGMI.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix compute profile switching on process termination.
Add a dedicated reference counter to keep track of entry/exit to/from
compute profile. This enables switching compute profiles for other
reasons than process creation or termination.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix compute profile switching on process termination.
Add a dedicated reference counter to keep track of entry/exit to/from
compute profile. This enables switching compute profiles for other
reasons than process creation or termination.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This was added to amdgpu but was missed in amdkfd
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.rg
Method of getting firmware version is the same across ASICs, so remove
them from ASIC-specific files and create one in amdgpu_amdkfd.c. This new
created get_fw_version simply reads fw_version from adev->gfx than parsing
the ucode header.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
RAS ECC event will combine with GPU reset event, due to
ECC interrupts are caused by uncorrectable error that triggers
GPU reset.
v2: Fix misleading-indentation warning
v3: fix build with CONFIG_HSA_AMD disabled
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
New vega10 ids.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Add Vega12 and Polaris12 device info and device IDs to KFD.
Signed-off-by: Gang Ba <gaba@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add amdgpu_amdkfd_ prefix to amdgpu functions served for amdkfd usage.
v2: fix indentation
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After amdkfd module is merged into amdgpu, KFD can call amdgpu directly
and no longer needs to use the function pointer. Replace those function
pointers with functions if they are not ASIC dependent.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Firmware have the workaround to replace the atomic Ops with read-modify-write on CP side.
User should not expect atomic Ops on system memory works normally if system didn't not
support it.
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-By: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add Vega20 device IDs, device info and enable it in KFD.
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Vega20 supports 8 SDMA queues per engine
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Also save the version in struct kfd_dev so we only need to query
it once.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KFD module doesn't support TONGA SRIOV, if init KFD module in TONGA SRIOV
environment, it will let compute ring IB test fail.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the flags of properties according to Asic type and pcie
capabilities.
Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CWSR fails on Raven if the control stack is MTYPE_UC, which is used
for regular GART mappings. As a workaround we map it using MTYPE_NC.
The MEC firmware expects the control stack at one page offset from the
start of the MQD so it is part of the MQD allocation on GFXv9. AMDGPU
added a memory allocation flag just for this purpose.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Thunk will generate the XGMI topology information when necessary with the hive_id
for each specified device
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add DID and kfd_device_info for Raven.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
On Raven there is only one SDMA engine instead of previously assumed two,
so we need to adapt our code to this new scenario.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Lock KFD and evict existing queues on reset. Notify user mode by
signaling hw_exception events.
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Upon VM Fault, the VMID and PASID written by HW are zeros in
Hawaii. Instead of reading from ih_ring_entry, read directly
from the registers. This workaround fix the soft hang issues
caused by mishandled VM Fault in Hawaii.
Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This is no longer needed with the memalloc_nofs_save/restore in
dqm_lock/unlock.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Since the assembly code is inside "#if 0", it is ineffective. Despite that,
during debugging, we need to change the assembly code, extract it into
a separate file and compile the new file into hex values using sp3.
That process also requires us to remove "#if 0" and modify lines starting
with "#", so that sp3 can successfully compile the new file.
With this change, all the above chore is no longer needed, and
cwsr_trap_handler_gfx*.asm can be directly used by sp3 to generate its
hex values.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
* Report 64-bit doorbells as HSA_CAP_DOORBELL_TYPE_2_0 in topology
* Report cache information in topology (duplicates GFXv8 info for now)
* Add device info for Vega10 support in KFD
Raven is not enabled at this time as it needs additional changes in
DQM to work with a single SDMA engine.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Report failure to enable atomics only on GPUs that require them.
This allows GPUs that don't require atomics to function, but can
benefit if they are available. This is the case for Vega10, which
doesn't use atomics for basic functioning of the MEC, AQL and HWS
microcode. So it can work without atomics. But shader programs can
still use atomic instructions on systems that support PCIe atomics.
Signed-off-by: welu <Wei.Lu2@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This prepares for GFXv9 (Vega10), which has 64-bit doorbells.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
These interfaces allow KGD to stop and resume all GPU user mode queue
access to a process address space. This is needed for handling MMU
notifiers of userptrs mapped for GPU access in KFD VMs.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When an MMU notifier runs in memory reclaim context, it can deadlock
trying to take locks that are already held in the thread causing the
memory reclaim. The solution is to avoid memory reclaim while holding
locks that are taken in MMU notifiers by using GFP_NOIO.
This commit fixes memory allocations done while holding the dqm->lock
which is needed in the MMU notifier (dqm->ops.evict_process_queues).
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When the TTM memory manager in KGD evicts BOs, all user mode queues
potentially accessing these BOs must be evicted temporarily. Once
user mode queues are evicted, the eviction fence is signaled,
allowing the migration of the BO to proceed.
A delayed worker is scheduled to restore all the BOs belonging to
the evicted process and restart its queues.
During suspend/resume of the GPU we also evict all processes to allow
KGD to save BOs in system memory, since VRAM will be lost.
v2:
* Account for eviction when updating of q->is_active in MQD manager
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on
ASIC information. Also allow building KFD without IOMMUv2 support.
This is still useful for dGPUs and prepares for enabling KFD on
architectures that don't support AMD IOMMUv2.
v2:
* Centralize IOMMUv2 code to avoid #ifdefs in too many places
v3:
* Imply AMD_IOMMU_V2 in Kconfig
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Some dGPUs don't support HWS. Allow them to use a per-device
sched_policy that may be different from the global default.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This will be needed for most dGPUs.
CC: linux-pci@vger.kernel.org
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Allow HWS to to execute multiple processes on the hardware
concurrently. The number of concurrent processes is limited by
the number of VMIDs allocated to the HWS.
A module parameter can be used for limiting this further or turn
it off altogether (mainly for debugging purposes).
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This hardware feature allows the GPU to preempt shader execution in
the middle of a compute wave, save the state and restore it later
to resume execution.
Memory for saving the state is allocated per queue in user mode and
the address and size passed to the create_queue ioctl. The size
depends on the number of waves that can be in flight simultaneously
on a given ASIC.
Signed-off-by: Shaoyun.liu <shaoyun.liu@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
In systems under heavy load the IH work may experience significant
scheduling delays.
Under load + system workqueue:
Max Latency: 7.023695 ms
Avg Latency: 0.263994 ms
Under load + high priority workqueue:
Max Latency: 1.162568 ms
Avg Latency: 0.163213 ms
Further work is required to measure the impact of per-cpu settings on IH
performance.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The hard-coded values related to VMID were removed in KFD, as those
values can be calculated in the KFD initialization function.
v2: remove unnecessary local variable
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When we do suspend/resume through "sudo pm-suspend" while there is
HSA activity running, upon resume we will encounter HWS hanging, which
is caused by memory read/write failures. The root cause is that when
suspend, we neglected to unbind pasid from kfd device.
Another major change is that the bind/unbinding is changed to be
performed on a per process basis, instead of whether there are queues
in dqm.
v2:
- free IOMMU device if kfd_bind_processes_to_device fails in kfd_resume
- add comments to kfd_bind/unbind_processes_to/from_device
- minor cleanups
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The idea is to let kfd init and resume function share the same code path
as much as possible, rather than to have two copies of almost identical
code. That way improves the code readability and maintainability.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
PASID management is moving into KGD. Limiting the PASID range to the
number of doorbell pages is no longer practical.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To match current firmware. The map process packet has been extended
to support scratch. This is a non-backwards compatible change and
it's about two years old. So no point keeping the old version around
conditionally.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
v2: Turned WARN into dev_warn and made the message more helpful
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
In most cases, BUG_ONs can be replaced with WARN_ON with an error
return. In some void functions just turn them into a WARN_ON and
possibly an early exit.
v2:
* Cleaned up error handling in pm_send_unmap_queue
* Removed redundant WARN_ON in kfd_process_destroy_delayed
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
gtt_sa_bitmap is accessed by bitmap functions, which operate on longs.
Therefore the array should be allocated in long units. Also round up
in case the number of bits is not a multiple of BITS_PER_LONG.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Remove BUG_ONs that check for NULL pointer arguments that are
dereferenced in the same function. Dereferencing the NULL pointer
will generate a BUG anyway, so the explicit check is redundant and
unnecessary overhead.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Upstream prefers the !x notation to x==NULL or x==false. Along those lines
change the ==true or !=NULL references as well. Also make the references
to !x the same, excluding () for readability.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Consolidate log commands so that dev_info(NULL, "Error...") uses the more
accurate pr_err, remove the module name from the log (can be seen via
dynamic debugging with +m), and the function name (can be seen via
dynamic debugging with +f). We also don't need debug messages saying
what function we're in. Those can be added by devs when needed
Don't print vendor and device ID in error messages. They are typically
the same for all GPUs in a multi-GPU system. So this doesn't add any
value to the message.
Lastly, remove parentheses around %d, %i and 0x%llX.
According to kernel.org:
"Printing numbers in parentheses (%d) adds no value and should be
avoided."
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Using checkpatch.pl -f <file> showed a number of style issues. This
patch addresses as many of them as possible. Some long lines have been
left for readability, but attempts to minimize them have been made.
v2: Broke long lines in gfx_v7 get_fw_version
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Update the KGD to KFD interface to allow sharing pipes with queue
granularity instead of pipe granularity.
This allows for more interesting pipe/queue splits.
v2: fix overflow check for res.queue_mask
v3: fix shift overflow when setting res.queue_mask
v4: fix comment in is_pipeline_enabled()
v5: clamp res.queue_mask to the first MEC only
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds the PCI IDs of supported CZ devices to the
supported_devices structure in amdkfd. That structure is used during the
amdkfd probing stage, to check if the currently probed device is eligible
to be handled by amdkfd.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds two missing properties initializations to the device
info structure of CZ.
As we don't have CZ support yet, it isn't critical, but its important to
fix this now instead of forgetting about it later.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the skeleton H/W debugger module support. This code
enables registration and unregistration of a single HSA process at a
time.
The module saves the process's pasid and use it to verify that only the
registered process is allowed to execute debugger operations through the
kernel driver.
v2: rename get_dbgmgr_mutex to kfd_get_dbgmgr_mutex to namespace it
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds support for static user-mode queues in QCM.
Queues which are designated as static can NOT be preempted by
the CP microcode when it is executing its scheduling algorithm.
This is needed for supporting the debugger feature, because we
can't allow the CP to preempt queues which are currently being debugged.
The number of queues that can be designated as static is limited by the
number of HQDs (Hardware Queue Descriptors).
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds Peripheral Page Request (PPR) failure processing
and reporting.
Bad address or pointer to a system memory block with inappropriate
read/write permission cause such PPR failure during a user queue
processing. PPR request handling is done by IOMMU driver notifying
AMDKFD module on PPR failure.
The process triggering a PPR failure will be notified by
appropriate event or SIGTERM signal will be sent to it.
v3:
- Change all bool fields in struct kfd_memory_exception_failure to
uint32_t
Signed-off-by: Alexey Skidanov <alexey.skidanov@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the events module (kfd_events.c) and the interrupt
handle module for Kaveri (cik_event_interrupt.c).
The patch updates the interrupt_is_wanted(), so that it now calls the
interrupt isr function specific for the device that received the
interrupt. That function(implemented in cik_event_interrupt.c)
returns whether this interrupt is of interest to us or not.
The patch also updates the interrupt_wq(), so that it now calls the
device's specific wq function, which checks the interrupt source
and tries to signal relevant events.
v2:
Increase limit of signal events to 4096 per process
Remove bitfields from struct cik_ih_ring_entry
Rename radeon_kfd_event_mmap to kfd_event_mmap
Add debug prints to allocate_free_slot and allocate_signal_page
Make allocate_event_notification_slot return a correct value
Add warning prints to create_signal_event
Remove error print from IOCTL path
Reformatted debug prints in kfd_event_mmap
Map correct size (as received from mmap) in kfd_event_mmap
v3:
Reduce limit of signal events back to 256 per process
Fix allocation of kernel memory for signal events
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the interrupt handling module, kfd_interrupt.c, and its
related members in different data structures to the amdkfd driver.
The amdkfd interrupt module maintains an internal interrupt ring
per amdkfd device. The internal interrupt ring contains interrupts
that needs further handling. The extra handling is deferred to
a later time through a workqueue.
There's no acknowledgment for the interrupts we use. The hardware
simply queues a new interrupt each time without waiting.
The fixed-size internal queue means that it's possible for us to lose
interrupts because we have no back-pressure to the hardware.
However, only interrupts that are "wanted" by amdkfd, are copied into
the amdkfd s/w interrupt ring, in order to minimize the chances
for overflow of the ring.
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The current code can only support one kgd instance. We have to
support multiple kgd instances in one system. i.e two amdgpu or two
radeon or one amdgpu + one radeon or more than two kgd instances.
Signed-off-by: Xihan Zhang <xihan.zhang@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This backmerges drm-fixes into drm-next mainly for the amdkfd
stuff, I'm not 100% confident, but it builds and the amdkfd
folks can fix anything up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Conflicts:
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
Backmerge Linus tree after rc5 + drm-fixes went in.
There were a few amdkfd conflicts I wanted to avoid,
and Ben requested this for nouveau also.
Conflicts:
drivers/gpu/drm/amd/amdkfd/Makefile
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_priv.h
drivers/gpu/drm/amd/include/kgd_kfd_interface.h
drivers/gpu/drm/i915/intel_runtime_pm.c
drivers/gpu/drm/radeon/radeon_kfd.c
This patch replaces the two current amdkfd module parameters with a new one.
The current parameters that are being replaced are:
- Maximum number of HSA processes
- Maximum number of queues per process
The new parameter that replaces them is called "Maximum queues per device"
This replacement achieves two goals:
- Allows the user to have as many HSA processes as it wants (until
a maximum of 512 HSA processes in Kaveri).
- Removes the limitation the user had on maximum number of queues per HSA
process. E.g. the user can now have processes which only have one queue and
other processes which have hundreds of queues, while before the user
couldn't have more than 128 queues per process (as default).
The default value of the new parameter is 4096 (32 * 128, which were the
defaults of the old parameters). There is almost no additional GART memory
required for the default case. As a reminder, this amount of queues requires a
little bit below 4MB of GART memory.
v2:
In addition, This patch defines a new counter for queues accounting in the DQM
structure. This is done because the current counter only counts active queues
which allows the user to create more queues than the
max_num_of_queues_per_device module parameter allows.
However, we need the current counter for the runlist packet build process, so
the solution is to have a dedicated counter for this accounting.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Ben Goz <ben.goz@amd.com>
The work queue couldn't reliably prevent the SW ring buffer from
overflowing, so dmesg was spammed by
kfd kfd: Interrupt ring overflow, dropping interrupt.
messages when running e.g. the Atlantis Substance demo from
https://wiki.unrealengine.com/Linux_Demos on Kaveri.
Since the SW ring buffer doesn't actually do anything at this point, just
remove it for now. When actual interrupt processing code is added to
amdkfd, it should try to do things immediately and only defer to work
queues when necessary.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch does some re-org on the device_queue_manager structure. It takes out
all the function pointers from the structure and puts them in a new structure,
called device_queue_manager_ops. Then, it puts an instance of that structure
inside device_queue_manager.
This re-org is done to prepare the DQM module to support more than one AMD APU
(Kaveri).
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Instead of creating a BUG if trying to free a NULL GART sub-allocation object,
just return 0 (success).
This is done to mirror behavior of kfree.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds a new property to kfd_device_info structure. That structure
holds information that is H/W specific.
The new property is called asic_family and its purpose is to distinguish
between different asic families in amdkfd operations, mainly in QCM (queue
control & management)
This patch also adds a new enum, to select different ASICs. We set the current
kfd_device_info instance as Kaveri and create a new instance which describes
the new AMD APU, codenamed 'Carrizo'.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch changes the calls to allocate the gart memory for amdkfd from the
old interface (radeon_sa) to the new one (kfd_gtt_sa)
The new gart sub-allocator is initialized with chunk size equal to 512 bytes.
This is because the KV MQD is 512 Bytes and most of the sub-allocations are
MQDs.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch makes the gart's buffer size calculation more accurate. This buffer
is needed per GPU.
It takes into account maximum number of MQDs, runlist packets, kernel queues
and reserves 512KB for other misc allocations.
The total size is just shy of 4MB, for 32 processes and 128 queues per
process, which are the defaults for amdkfd kernel module parameters.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds new kfd gtt sub-allocator functions that service the amdkfd
driver when it wants to use gtt memory.
The sub-allocator uses a bitmap to handle the memory area that was transferred
to it during init. It divides the memory area into chunks, according to chunk
size parameter.
The allocation function will allocate contiguous chunks from that memory area,
according to the requested size. If the requested size is smaller than the
chunk size, a single chunk will be allocated.
v2: Do some more verifications on parameters that are passed into
kfd_gtt_sa_init()
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds the number of watch points to the node capabilities in the
topology module
Signed-off-by: Alexey Skidanov <Alexey.Skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds the interrupt handling module, in kfd_interrupt.c, and its
related members in different data structures to the amdkfd driver.
The amdkfd interrupt module maintains an internal interrupt ring per amdkfd
device. The internal interrupt ring contains interrupts that needs further
handling. The extra handling is deferred to a later time through a workqueue.
There's no acknowledgment for the interrupts we use. The hardware simply queues
a new interrupt each time without waiting.
The fixed-size internal queue means that it's possible for us to lose
interrupts because we have no back-pressure to the hardware.
v3:
Move amdkfd from drm/radeon/ to drm/amd/
Change device init
Made sure spin lock is taken only if init is complete
Moved bool field to the end of the structure
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The queue scheduler divides into two sections, one section is process bounded
and the other section is device bounded.
The device bounded section is handled by this module.
The DQM module handles queue setup, update and tear-down from the device side.
It also supports suspend/resume operation.
v3: Changed device_init, added the use of the new gart allocation functions an
Added documentation.
v4:
Fixed a race in DQM queue scheduler where dqm->lock must be held when accessing
dqm->queue_count and dqm->processes_count. This fixes runlist IB allocation
failures when DQM is under load.
Fixed race in DQM queue destruction where queues being destroyed must be
removed from qpd->queues_list prior to preemption, or concurrent queue
creation activity may reschedule them while their MQD is destroyed.
Fixed EOP queue size setting in CP_HPD_EOP_CONTROL, because the size is
specified as (log2(size_dwords)-1). The previous calculation assumed the
size was specified in bytes, which caused interference between EOP queues
when multiple MEC pipelines were active.
v5:
Move amdkfd from drm/radeon/ to drm/amd/
Change format of mqd structure to match latest KV firmware
Add support for AQL queues creation to enable working with open-source HSA
runtime
Remove unused unmap_queue function
Various fixes (Style, typos)
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds the functions to bind and unbind pasid
from a device through the amd_iommu driver.
The unbind function is called when the mm_struct of the
process is released.
The bind function is not called here because it is called
only in the IOCTLs which are not yet implemented at this
stage of the patchset.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds the process module and three helper modules:
- kfd_process, which handles process which open /dev/kfd
- kfd_doorbell, which provides helper functions for doorbell allocation,
release and mapping to userspace
- kfd_pasid, which provides helper functions for pasid allocation and release
- kfd_aperture, which provides helper functions for managing the LDS, Local GPU
memory and Scratch memory apertures of the process
This patch only contains the basic kfd_process module, which doesn't contain
the reference to the queue scheduler. This was done to allow easier code review.
Also, this patch doesn't contain the calls to the IOMMU driver for binding the
pasid to the device. Again, this was done to allow easier code review
The kfd_process object is created when a process opens /dev/kfd and is closed
when the mm_struct of that process is teared-down.
v3:
Removed kfd_vidmem.c file
Replaced direct mmput call to mmu_notifier release
Removed typedefs
Moved bool field to end of the structure
Added new kernel params for gart usage limitation
Added initialization of sa manager
Fixed debug messages
Remove support for LDS in 32 bit
Changed code to support mmap of doorbell pages from userspace
Added documentation for apertures
v4: Replaced RCU by SRCU for kfd_process list management
v5:
Move amdkfd from drm/radeon/ to drm/amd/
Rename kfd_aperture.c to kfd_flat_memory.c
Protect against multiple init calls
MQD size is H/W dependent so moved it to device info structure
Rename kfd_mem_obj structure's members
Use delayed function for process tear-down
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds the topology module to the driver. The topology is exposed to
userspace through the sysfs.
The calls to add and remove a device to/from topology are done by the radeon
driver.
v3:
The CPU information, that is provided in the topology section of the amdkfd
driver, is extracted from the CRAT table. Unlike the CPU information located
in /sys/devices/system/cpu/cpu*, which is extracted from the SRAT table.
While the CPU information provided by the CRAT and the SRAT tables might be
identical, the node topology might be different. The SRAT table contains the
topology of CPU nodes only. The CRAT table contains the topology of CPU and GPU
nodes together (and can be interleaved). For example CPU node 1 in SRAT can be
CPU node 3 in CRAT. Furthermore it's worth to mention that the CRAT table
contains only HSA compatible nodes (nodes which are compliant with the HSA
spec).
To recap, amdkfd exposes a different kind of topology than the one exposed by
/sys/devices/system/cpu/cpu even though it may contain similar information.
v4:
The topology module doesn't support uevent handling and doesn't notify the
userspace about runtime modifications. It is up to the userspace to acquire
snapshots of the topology information created by the amdkfd and exposed
in sysfs.
The following is an example of how the topology looks on a Kaveri A10-7850K
system with amdkfd installed:
/sys/devices/virtual/kfd/kfd/
|
--- topology/
|
|--- generation_id
|--- system_properties
|--- nodes/
|
|--- 0/
|
|--- gpu_id
|--- name
|--- properties
|--- caches/
|
|--- 0/
|
|--- properties
|--- 1/
|
|--- properties
|--- 2/
|
|--- properties
|--- io_links/
|
|--- mem_banks/
|
|--- 0/
|
|--- properties
|--- 1/
|
|--- properties
|--- 2/
|
|--- properties
|--- 3/
|
|--- properties
v5:
Move amdkfd from drm/radeon/ to drm/amd/
Add a check if dev->gpu pointer is null before accessing it in the
node_show function in kfd_topology.c
This situation may occur when amdkfd is loaded and there is a GPU with a CRAT
table, but that GPU isn't supported by amdkfd
Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds the amdkfd skeleton driver. The driver does nothing except
define a /dev/kfd device.
It returns -ENODEV on all amdkfd IOCTLs.
v3: Move bool field to the end of structure, removed the pmc ioctls and added
a meaningful error message for ioctl error.
v5:
Create a new folder drm/amd and move amdkfd from drm/radeon/ to drm/amd/
Remove scheduler_class from kfd_priv.h as it was never used
Add skeleton implementation of the Get Version IOCTL
v6:
Update module version to the correct number and remove the "default m" from the
Kconfig file
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>