linux

Author	SHA1	Message	Date
Philip Yang	dcdb4d904b	drm/amdkfd: fix sysfs kobj leak 3 cases of kobj leak, which causes memory leak: kobj_type must have release() method to free memory from release callback. Don't need NULL default_attrs to init kobj. sysfs files created under kobj_status should be removed with kobj_status as parent kobject. Remove queue sysfs files when releasing queue from process MMU notifier release callback. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-30 00:18:23 -04:00
Philip Yang	75ae84c89b	drm/amdkfd: add helper function for kfd sysfs create No functionality change. Modify kfd_sysfs_create_file to use kobject as parameter, so it becomes common helper function to remove duplicate code and will simplify new kfd sysfs file create in future. Move pr_warn to helper function if sysfs file create failed. Set helper function as void return because caller doesn't use the helper function return value. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-30 00:18:23 -04:00
xinhui pan	56f221b638	drm/amdkfd: Walk through list with dqm lock hold To avoid any list corruption. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-18 17:14:13 -04:00
Eric Huang	c9cfbf7f44	drm/amdkfd: Set iolink non-coherent in topology Fix non-coherent bit of iolink properties flag which always is 0. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-18 17:11:17 -04:00
Amber Lin	a7b2451d31	drm/amdkfd: Fix circular lock in nocpsch path Calling free_mqd inside of destroy_queue_nocpsch_locked can cause a circular lock. destroy_queue_nocpsch_locked is called under a DQM lock, which is taken in MMU notifiers, potentially in FS reclaim context. Taking another lock, which is BO reservation lock from free_mqd, while causing an FS reclaim inside the DQM lock creates a problematic circular lock dependency. Therefore move free_mqd out of destroy_queue_nocpsch_locked and call it after unlocking DQM. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-15 17:25:42 -04:00
Nirmoy Das	391629bdfc	drm/amdgpu: remove amdgpu_vm_pt Page table entries are now in embedded in VM BO, so we do not need struct amdgpu_vm_pt. This patch replaces struct amdgpu_vm_pt with struct amdgpu_vm_bo_base. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-15 17:25:42 -04:00
Felix Kuehling	5a75ea56e3	drm/amdkfd: Disable SVM per GPU, not per process When some GPUs don't support SVM, don't disabe it for the entire process. That would be inconsistent with the information the process got from the topology, which indicates SVM support per GPU. Instead disable SVM support only for the unsupported GPUs. This is done by checking any per-device attributes against the bitmap of supported GPUs. Also use the supported GPU bitmap to initialize access bitmaps for new SVM address ranges. Don't handle recoverable page faults from unsupported GPUs. (I don't think there will be unsupported GPUs that can generate recoverable page faults. But better safe than sorry.) Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-15 17:25:41 -04:00
Jonathan Kim	63f6e01237	drm/amdkfd: fix circular locking on get_wave_state get_wave_state acquires the mmap_lock on copy_to_user but so do mmu_notifiers. mmu_notifiers allows dqm locking so do get_wave_state outside the dqm_lock to prevent circular locking. v2: squash in unused variable removal. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-15 17:25:40 -04:00
Alex Sierra	7642c56a20	drm/amdkfd: move CoherentHostAccess prop to HSA_CAPABILITY CoherentHostAccess flag support has moved from HSA_MEMORYPROPERTY to HSA_CAPABILITY struct. Proper changes have made also at the thunk to support this change. CoherentHostAccess: whether or not device memory can be coherently accessed by the host CPU. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-11 16:05:27 -04:00
Eric Huang	3be4dca197	drm/amdkfd: Add memory sync before TLB flush on unmap It is to fix a failure for SDMA updating PTEs. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-11 16:03:40 -04:00
Dave Airlie	c707b73f0c	Merge tag 'amd-drm-next-5.14-2021-06-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-06-09: amdgpu: - SR-IOV fixes - Smartshift updates - GPUVM TLB flush updates - 16bpc fixed point display fix for DCE11 - BACO cleanups and core refactoring - Aldebaran updates - Initial Yellow Carp support - RAS fixes - PM API cleanup - DC visual confirm updates - DC DP MST fixes - DC DML fixes - Misc code cleanups and bug fixes amdkfd: - Initial Yellow Carp support radeon: - memcpy_to/from_io fixes UAPI: - Add Yellow Carp chip family id Used internally in the kernel driver and by mesa Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210610031649.4006-1-alexander.deucher@amd.com	2021-06-10 13:47:13 +10:00
Dave Airlie	09b020bb05	Merge tag 'drm-misc-next-2021-06-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.14: UAPI Changes: * drm/panfrost: Export AFBC_FEATURES register to userspace Cross-subsystem Changes: * dma-buf: Fix debug printing; Rename dma_resv_() functions + changes in callers; Cleanups Core Changes: Add prefetching memcpy for WC * Avoid circular dependency on CONFIG_FB * Cleanups * Documentation fixes throughout DRM * ttm: Make struct ttm_resource the base of all managers + changes in all users of TTM; Add a generic memcpy for page-based iomem; Remove use of VM_MIXEDMAP; Cleanups Driver Changes: * drm/bridge: Add TI SN65DSI83 and SN65DSI84 + DT bindings * drm/hyperv: Add DRM driver for HyperV graphics output * drm/msm: Fix module dependencies * drm/panel: KD53T133: Support rotation * drm/pl111: Fix module dependencies * drm/qxl: Fixes * drm/stm: Cleanups * drm/sun4i: Be explicit about format modifiers * drm/vc4: Use struct gpio_desc; Cleanups * drm/vgem: Cleanups * drm/vmwgfx: Use ttm_bo_move_null() if there's nothing to copy * fbdev/mach64: Cleanups * fbdev/mb862xx: Use DEVICE_ATTR_RO Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YMBw3DF2b9udByfT@linux-uq9g	2021-06-10 11:28:09 +10:00
Wan Jiabing	272d57c3aa	drm/amdkfd: remove duplicate include of kfd_svm.h kfd_svm.h is included duplicately in commit `42de677f79` ("drm/amdkfd: register svm range"). After checking possible related header files, remove the former one to make the code format more reasonable. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-07 14:57:52 -04:00
Hawking Zhang	4a1d4b6d38	drm/amdkfd: add sdma poison consumption handling Follow the same apporach as GFX to handle SDMA poison consumption. Send SIGBUS to application when receives SDMA_ECC interrupt and issue gpu reset either mode 2 or mode 1 to get the engine back Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li<dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-07 14:57:24 -04:00
Philip Yang	0dc2bafb08	drm/amdkfd: pages_addr offset must be 0 for system range prange->offset is for VRAM range mm_nodes, if multiple ranges share same mm_nodes, migrate range back to VRAM will reuse the VRAM at offset of the same mm_nodes. For system memory pages_addr array, the offset is always 0, otherwise, update GPU mapping will use incorrect system memory page, and cause system memory corruption. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-07 14:57:18 -04:00
Aaron Liu	bf9d4e88c2	drm/amdkfd: add yellow carp KFD support This patch is to add GFX10 based Yellow Carp KFD support. We will bypass IOMMU v2. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-04 16:03:09 -04:00
Eric Huang	31f3324378	drm/amdkfd: Make TLB flush conditional on mapping It is to optimize memory mapping latency, and also aviod a page fault in a corner case of changing valid PDE into PTE. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-04 12:40:01 -04:00
Eric Huang	1098d658be	drm/amdkfd: Add heavy-weight TLB flush after unmapping It is a part of memory mapping optimization. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-04 12:40:00 -04:00
Eric Huang	3543b055b8	drm/amdkfd: Add flush-type parameter to kfd_flush_tlb It is to provide more tlb flush types option for different case scenario. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-06-04 12:40:00 -04:00
Christian König	2fdcb55dfc	drm/amdkfd: use resource cursor in svm_migrate_copy_to_vram v2 Access to the mm_node is now forbidden. So instead of hand wiring that use the cursor functionality. v2: fix handling as pointed out by Philip. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by and Tested-by: Philip Yang <philip.yang@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210602100914.46246-5-christian.koenig@amd.com	2021-06-04 15:16:46 +02:00
Dave Airlie	5745d647d5	Merge tag 'amd-drm-next-5.14-2021-06-02' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-06-02: amdgpu: - GC/MM register access macro clean up for SR-IOV - Beige Goby updates - W=1 Fixes - Aldebaran fixes - Misc display fixes - ACPI ATCS/ATIF handling rework - SR-IOV fixes - RAS fixes - 16bpc fixed point format support - Initial smartshift support - RV/PCO power tuning fixes for suspend/resume - More buffer object subclassing work - Add new INFO query for additional vbios information - Add new placement for preemptable SG buffers amdkfd: - Misc fixes radeon: - W=1 Fixes - Misc cleanups UAPI: - Add new INFO query for additional vbios information Useful for debugging vbios related issues. Proposed umr patch: https://patchwork.freedesktop.org/patch/433297/ - 16bpc fixed point format support IGT test: https://lists.freedesktop.org/archives/igt-dev/2021-May/031507.html Proposed Vulkan patch: `a25d480207` - Add a new GEM flag which is only used internally in the kernel driver. Userspace is not allowed to set it. drm: - 16bpc fixed point format fourcc Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210602214009.4553-1-alexander.deucher@amd.com	2021-06-04 06:13:57 +10:00
Christian König	d3116756a7	drm/ttm: rename bo->mem and make it a pointer When we want to decouble resource management from buffer management we need to be able to handle resources separately. Add a resource pointer and rename bo->mem so that all code needs to change to access the pointer instead. No functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210430092508.60710-4-christian.koenig@amd.com	2021-06-02 11:07:25 +02:00
Thomas Zimmermann	304ba5dca4	Merge drm/drm-next into drm-misc-next Backmerging from drm/drm-next to the patches for AMD devices for v5.14. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2021-05-22 07:17:05 +02:00
Guenter Roeck	6a593769c7	drm/amd/amdkfd: Drop unnecessary NULL check after container_of The first parameter passed to container_of() is the pointer to the work structure passed to the worker and never NULL. The NULL check on the result of container_of() is therefore unnecessary and misleading. Remove it. This change was made automatically with the following Coccinelle script. @@ type t; identifier v; statement s; @@ <+... ( t v = container_of(...); \| v = container_of(...); ) ... when != v - if (\( !v \\| v == NULL \) ) s ...+> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-21 18:18:04 -04:00
Dave Airlie	9a91e5e0af	Merge tag 'amd-drm-next-5.14-2021-05-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-05-21: amdgpu: - RAS fixes - SR-IOV fixes - More BO management cleanups - Aldebaran fixes - Display fixes - Support for new GPU, Beige Goby - Backlight fixes amdkfd: - RAS fixes - DMA mapping fixes - HMM SVM fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210521045743.4047-1-alexander.deucher@amd.com	2021-05-21 15:59:05 +10:00
Dave Airlie	c99c4d0ca5	Merge tag 'amd-drm-next-5.14-2021-05-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-05-19: amdgpu: - Aldebaran updates - More LTTPR display work - Vangogh updates - SDMA 5.x GCR fixes - RAS fixes - PCIe ASPM support - Modifier fixes - Enable TMZ on Renoir - Buffer object code cleanup - Display overlay fixes - Initial support for multiple eDP panels - Initial SR-IOV support for Aldebaran - DP link training refactor - Misc code cleanups and bug fixes - SMU regression fixes for variable sized arrays - MAINTAINERS fixes for amdgpu amdkfd: - Initial SR-IOV support for Aldebaran - Topology fixes - Initial HMM SVM support - Misc code cleanups and bug fixes radeon: - Misc code cleanups and bug fixes - SMU regression fixes for variable sized arrays - Flickering fix for Oland with multiple 4K displays UAPI: - amdgpu: Drop AMDGPU_GEM_CREATE_SHADOW flag. This was always a kernel internal flag and userspace use of it has always been blocked. It's no longer needed so remove it. - amdkgd: HMM SVM support Overview: https://patchwork.freedesktop.org/series/85562/ Porposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/tree/fxkamd/hmm-wip Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210520031258.231896-1-alexander.deucher@amd.com	2021-05-21 15:29:40 +10:00
Andrey Grodzovsky	e9669fb782	drm/amdgpu: Add early fini callback Use it to call disply code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finilization into this callback to prevent MMIO accesses post cpi remove. v7: Split kfd suspend from device exit to expdite HW related stuff to amdgpu_pci_remove v8: Squash previous KFD commit into this commit to avoid compile break. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com	2021-05-19 23:48:50 -04:00
Dennis Li	96b62c8aa4	drm/amdkfd: fix a resource leakage issue The function kfd_lookup_process_by_pasid will increase the reference count of kfd_process object, its caller should call kfd_unref_process to decrease the reference count. Otherwise resource leakage will happen. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:44:12 -04:00
Chengming Gui	c86eb51705	drm/amdkfd: add kfd2kgd funcs for beige_goby kfd support Add the function pointer. Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:40:42 -04:00
Chengming Gui	5cf607cc35	drm/amdkfd: support beige_goby KFD Add KFD support for beige_goby v2: fix asic name typo v3: squash in updates (Alex) v4: squash in needs_atomics fix (Alex) Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:40:40 -04:00
Philip Yang	765385ec00	drm/amdkfd: heavy-weight flush TLB after unmap Need do a heavy-weight TLB flush to make sure we have no more dirty data in the cache for the unmapped pages. Define enum TLB_FLUSH_TYPE, add flush_type parameter to amdgpu_amdkfd_flush_gpu_tlb_pasid. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:37:50 -04:00
Philip Yang	7a3ae1e249	Revert "drm/amdkfd: flush TLB after updating GPU page table" This reverts commit `1704ac8e43`. After "drm/amdgpu: flush TLB if valid PDE turns into PTE" is checked in, this workaround is not needed. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:34:01 -04:00
Philip Yang	bf546940d5	drm/amdgpu: flush TLB if valid PDE turns into PTE Mapping huge page, 2MB aligned address with 2MB size, uses PDE0 as PTE. If previously valid PDE0, PDE0.V=1 and PDE0.P=0 turns into PTE, this requires TLB flush, otherwise page table walker will not read updated PDE0. Change page table update mapping to return table_freed flag to indicate the previously valid PDE may have turned into a PTE if page table is freed. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:33:54 -04:00
Christian König	0ccc3ccf5b	drm/amdgpu: re-apply "use the new cursor in the VM code" v2 Now that we found the underlying problem we can re-apply this patch. This reverts commit `6b44b667e2`. v2: rebase on KFD changes Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:30:21 -04:00
Felix Kuehling	2b2339eeaf	drm/amdgpu: Albebaran: MTYPE_NC for coarse-grain remote memory MTYPE UC was used for a specific use case that ended up not being implemented. Use NC for better performance for coarse-grained memory where cache coherence during shader execution is not required. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:29:56 -04:00
Felix Kuehling	0c6f7777cf	drm/amdgpu: Arcturus: MTYPE_NC for coarse-grain remote memory MTYPE UC was used for a specific use case that ended up not being implemented. Use NC for better performance for coarse-grained memory where cache coherence during shader execution is not required. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:29:53 -04:00
Dennis Li	e2b1f9f52b	drm/amdkfd: refine the poison data consumption handling The user applications maybe register the KFD_EVENT_TYPE_HW_EXCEPTION and KFD_EVENT_TYPE_MEMORY events, driver could notify them when poison data consumed. Beside that, some applications maybe register SIGBUS signal hander. These applications will handle poison data by themselves, exit or re-create context to re-dispatch works. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:29:44 -04:00
Philip Yang	a9a76beed2	drm/amdkfd: new range accessible by all GPUs If xnack is on, new range is created to recover retry vm fault or created by SVM API calls, set all GPUs have access to the range. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-19 22:29:37 -04:00
Philip Yang	04fe3fd10e	drm/amdkfd: handle errors returned by svm_migrate_copy_to_vram/ram If migration copy failed because process is killed, or out of VRAM or system memory, pass error code back to caller to handle error gracefully. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:08:32 -04:00
Luben Tuikov	8ab0d6f030	drm/amdgpu: Rename to ras_*_enabled Rename, ras_hw_supported --> ras_hw_enabled, and ras_features --> ras_enabled, to show that ras_enabled is a subset of ras_hw_enabled, which itself is a subset of the ASIC capability. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:08:12 -04:00
Eric Huang	ddec8d3be0	drm/amdkfd: add ACPI SRAT parsing for topology In NPS4 BIOS we need to find the closest numa node when creating topology io link between cpu and gpu, if PCI driver doesn't set it. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:07:12 -04:00
Mike Li	74abbdedc3	drm/amdkfd: Update L1 and add L2/3 cache information The L1 cache information has been updated and the L2/L3 information has been added. The changes have been made for Vega10 and newer ASICs. There are no changes for the older ASICs before Vega10. Signed-off-by: Mike Li <Tianxinmike.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:45 -04:00
Jonathan Kim	bdd2465730	drm/amdkfd: fix no atomics settings in the kfd topology To account for various PCIe and xGMI setups, check the no atomics settings for a device in relation to every direct peer. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:45 -04:00
Felix Kuehling	2e4ec25162	drm/amdkfd: Make svm_migrate_put_sys_page static This function is only used in this source file. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:44 -04:00
Philip Yang	1704ac8e43	drm/amdkfd: flush TLB after updating GPU page table To workaround the situation that vm retry fault keep coming after page table update. We are investigating the root cause, but once this issue happens, application will stuck and sometimes have to reboot to recover. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:44 -04:00
Zhigang Luo	cecd91b4f7	drm/amdkfd: Add Aldebaran virtualization support update kfd_supported_devices to enable Aldebaran virtualization support Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Jonathan Kim	559f418ed6	drm/amdkfd: report the numa weight between host and device over xgmi GPUs connected to CPUs over xGMI are bidirectional so set weight by a single hop both ways. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Jonathan Kim	deb689832f	drm/amdkfd: report atomics support in io_links over xgmi Link atomics support over xGMI should be reported independently of PCIe. Do not set NO_ATOMICS flags on devices that support xGMI but that do not have atomics support over PCIe. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Linus Torvalds	4f9701057a	Merge tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Big cleanup of almost unsused parts of the IOMMU API by Christoph Hellwig. This mostly affects the Freescale PAMU driver. - New IOMMU driver for Unisoc SOCs - ARM SMMU Updates from Will: - Drop vestigial PREFETCH_ADDR support (SMMUv3) - Elide TLB sync logic for empty gather (SMMUv3) - Fix "Service Failure Mode" handling (SMMUv3) - New Qualcomm compatible string (SMMUv2) - Removal of the AMD IOMMU performance counter writeable check on AMD. It caused long boot delays on some machines and is only needed to work around an errata on some older (possibly pre-production) chips. If someone is still hit by this hardware issue anyway the performance counters will just return 0. - Support for targeted invalidations in the AMD IOMMU driver. Before that the driver only invalidated a single 4k page or the whole IO/TLB for an address space. This has been extended now and is mostly useful for emulated AMD IOMMUs. - Several fixes for the Shared Virtual Memory support in the Intel VT-d driver - Mediatek drivers can now be built as modules - Re-introduction of the forcedac boot option which got lost when converting the Intel VT-d driver to the common dma-iommu implementation. - Extension of the IOMMU device registration interface and support iommu_ops to be const again when drivers are built as modules. * tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (84 commits) iommu: Streamline registration interface iommu: Statically set module owner iommu/mediatek-v1: Add error handle for mtk_iommu_probe iommu/mediatek-v1: Avoid build fail when build as module iommu/mediatek: Always enable the clk on resume iommu/fsl-pamu: Fix uninitialized variable warning iommu/vt-d: Force to flush iotlb before creating superpage iommu/amd: Put newline after closing bracket in warning iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()' iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86 iommu/amd: Remove performance counter pre-initialization test Revert "iommu/amd: Fix performance counter initialization" iommu/amd: Remove duplicate check of devid iommu/exynos: Remove unneeded local variable initialization iommu/amd: Page-specific invalidations for more than one page iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command iommu/vt-d: Avoid unnecessary cache flush in pasid entry teardown iommu/vt-d: Invalidate PASID cache when root/context entry changed iommu/vt-d: Remove WO permissions on second-level paging entries iommu/vt-d: Report the right page fault address ...	2021-05-01 09:33:00 -07:00
Harish Kasiviswanathan	8baa6018b7	drm/amdkfd: Add Aldebaran gws support v2: updated MEC FW version after validating gws with debugger Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00

... 5 6 7 8 9 ...

1190 Commits