linux/drivers/gpu/drm/amd/amdkfd
Mukul Joshi d69fd951e6 drm/amdkfd: Fix circular locking dependency warning
[  150.887733] ======================================================
[  150.893903] WARNING: possible circular locking dependency detected
[  150.905917] ------------------------------------------------------
[  150.912129] kfdtest/4081 is trying to acquire lock:
[  150.917002] ffff8f7f3762e118 (&mm->mmap_sem#2){++++}, at:
                                 __might_fault+0x3e/0x90
[  150.924490]
               but task is already holding lock:
[  150.930320] ffff8f7f49d229e8 (&dqm->lock_hidden){+.+.}, at:
                                destroy_queue_cpsch+0x29/0x210 [amdgpu]
[  150.939432]
               which lock already depends on the new lock.

[  150.947603]
               the existing dependency chain (in reverse order) is:
[  150.955074]
               -> #3 (&dqm->lock_hidden){+.+.}:
[  150.960822]        __mutex_lock+0xa1/0x9f0
[  150.964996]        evict_process_queues_cpsch+0x22/0x120 [amdgpu]
[  150.971155]        kfd_process_evict_queues+0x3b/0xc0 [amdgpu]
[  150.977054]        kgd2kfd_quiesce_mm+0x25/0x60 [amdgpu]
[  150.982442]        amdgpu_amdkfd_evict_userptr+0x35/0x70 [amdgpu]
[  150.988615]        amdgpu_mn_invalidate_hsa+0x41/0x60 [amdgpu]
[  150.994448]        __mmu_notifier_invalidate_range_start+0xa4/0x240
[  151.000714]        copy_page_range+0xd70/0xd80
[  151.005159]        dup_mm+0x3ca/0x550
[  151.008816]        copy_process+0x1bdc/0x1c70
[  151.013183]        _do_fork+0x76/0x6c0
[  151.016929]        __x64_sys_clone+0x8c/0xb0
[  151.021201]        do_syscall_64+0x4a/0x1d0
[  151.025404]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  151.030977]
               -> #2 (&adev->notifier_lock){+.+.}:
[  151.036993]        __mutex_lock+0xa1/0x9f0
[  151.041168]        amdgpu_mn_invalidate_hsa+0x30/0x60 [amdgpu]
[  151.047019]        __mmu_notifier_invalidate_range_start+0xa4/0x240
[  151.053277]        copy_page_range+0xd70/0xd80
[  151.057722]        dup_mm+0x3ca/0x550
[  151.061388]        copy_process+0x1bdc/0x1c70
[  151.065748]        _do_fork+0x76/0x6c0
[  151.069499]        __x64_sys_clone+0x8c/0xb0
[  151.073765]        do_syscall_64+0x4a/0x1d0
[  151.077952]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  151.083523]
               -> #1 (mmu_notifier_invalidate_range_start){+.+.}:
[  151.090833]        change_protection+0x802/0xab0
[  151.095448]        mprotect_fixup+0x187/0x2d0
[  151.099801]        setup_arg_pages+0x124/0x250
[  151.104251]        load_elf_binary+0x3a4/0x1464
[  151.108781]        search_binary_handler+0x6c/0x210
[  151.113656]        __do_execve_file.isra.40+0x7f7/0xa50
[  151.118875]        do_execve+0x21/0x30
[  151.122632]        call_usermodehelper_exec_async+0x17e/0x190
[  151.128393]        ret_from_fork+0x24/0x30
[  151.132489]
               -> #0 (&mm->mmap_sem#2){++++}:
[  151.138064]        __lock_acquire+0x11a1/0x1490
[  151.142597]        lock_acquire+0x90/0x180
[  151.146694]        __might_fault+0x68/0x90
[  151.150879]        read_sdma_queue_counter+0x5f/0xb0 [amdgpu]
[  151.156693]        update_sdma_queue_past_activity_stats+0x3b/0x90 [amdgpu]
[  151.163725]        destroy_queue_cpsch+0x1ae/0x210 [amdgpu]
[  151.169373]        pqm_destroy_queue+0xf0/0x250 [amdgpu]
[  151.174762]        kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
[  151.180577]        kfd_ioctl+0x223/0x400 [amdgpu]
[  151.185284]        ksys_ioctl+0x8f/0xb0
[  151.189118]        __x64_sys_ioctl+0x16/0x20
[  151.193389]        do_syscall_64+0x4a/0x1d0
[  151.197569]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  151.203141]
               other info that might help us debug this:

[  151.211140] Chain exists of:
                 &mm->mmap_sem#2 --> &adev->notifier_lock --> &dqm->lock_hidden

[  151.222535]  Possible unsafe locking scenario:

[  151.228447]        CPU0                    CPU1
[  151.232971]        ----                    ----
[  151.237502]   lock(&dqm->lock_hidden);
[  151.241254]                                lock(&adev->notifier_lock);
[  151.247774]                                lock(&dqm->lock_hidden);
[  151.254038]   lock(&mm->mmap_sem#2);

This commit fixes the warning by ensuring get_user() is not called
while reading SDMA stats with dqm_lock held as get_user() could cause a
page fault which leads to the circular locking scenario.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:27 -04:00
..
cik_event_interrupt.c drm/amdkfd: Eliminate get_atc_vmid_pasid_mapping_valid 2019-10-03 09:11:04 -05:00
cik_int.h drm/amdkfd: Clean up reference of radeon 2018-07-11 22:33:08 -04:00
cik_regs.h drm/amdkfd: Delete a duplicate statement in set_pasid_vmid_mapping() 2018-11-05 14:21:13 -05:00
cwsr_trap_handler_gfx8.asm drm/amdkfd: Remove dead code from gfx8/gfx9 trap handlers 2019-07-30 23:22:18 -05:00
cwsr_trap_handler_gfx9.asm drm/amdkfd: Remove dead code from gfx8/gfx9 trap handlers 2019-07-30 23:22:18 -05:00
cwsr_trap_handler_gfx10.asm drm/amdkfd: Support debugger in Navi1x trap handler 2020-07-01 01:59:11 -04:00
cwsr_trap_handler.h drm/amdkfd: Support debugger in Navi1x trap handler 2020-07-01 01:59:11 -04:00
Kconfig drm/amdgpu: fix license on Kconfig and Makefiles 2019-12-11 15:22:08 -05:00
kfd_chardev.c drm/amdkfd: Track GPU memory utilization per process 2020-04-30 16:47:34 -04:00
kfd_crat.c drm/amdkfd: Support Sienna_Cichlid KFD v4 2020-07-01 01:59:11 -04:00
kfd_crat.h drm/amdkfd: Adjust weight to represent num_hops info when report xgmi iolink 2019-05-24 12:20:48 -05:00
kfd_dbgdev.c drm/amdkfd: Eliminate unnecessary kernel queue function pointers 2019-12-05 16:24:36 -05:00
kfd_dbgdev.h drm/amdkfd: Clean up reference of radeon 2018-07-11 22:33:08 -04:00
kfd_dbgmgr.c drm/amdkfd: Use hex print format for pasid 2019-10-03 09:11:03 -05:00
kfd_dbgmgr.h
kfd_debugfs.c drm/amdkfd: Fix permissions of hang_hws 2020-01-07 11:54:30 -05:00
kfd_device_queue_manager_cik.c drm/amdkfd: Introduce asic-specific mqd_manager_init function 2019-05-24 12:21:02 -05:00
kfd_device_queue_manager_v9.c drm/amdkfd: Consistently apply noretry setting 2019-07-16 13:02:55 -05:00
kfd_device_queue_manager_v10.c drm/amdkfd: Add navi10 support to amdkfd. (v3) 2019-06-21 18:59:24 -05:00
kfd_device_queue_manager_vi.c drm/amdkfd: Introduce asic-specific mqd_manager_init function 2019-05-24 12:21:02 -05:00
kfd_device_queue_manager.c drm/amdkfd: Fix circular locking dependency warning 2020-07-01 01:59:27 -04:00
kfd_device_queue_manager.h drm/amdkfd: Fix circular locking dependency warning 2020-07-01 01:59:27 -04:00
kfd_device.c drm/amdkfd: Add eviction debug messages 2020-07-01 01:59:21 -04:00
kfd_doorbell.c drm/amdkfd: Use better name to indicate the offset is in dwords 2019-11-13 15:29:45 -05:00
kfd_events.c drm/amdkfd: Use pr_debug to print the message of reaching event limit 2020-03-10 15:54:27 -04:00
kfd_events.h drm/amdkfd: Implement GPU reset handlers in KFD 2018-07-11 22:32:56 -04:00
kfd_flat_memory.c drm/amdkfd: Support Sienna_Cichlid KFD v4 2020-07-01 01:59:11 -04:00
kfd_int_process_v9.c drm/amdkfd: Fix boolreturn.cocci warnings 2020-05-21 12:37:19 -04:00
kfd_interrupt.c drm/amdkfd: fix a potential NULL pointer dereference (v2) 2019-10-03 09:11:00 -05:00
kfd_iommu.c drm/amdkfd: report the real PCI bus number 2020-05-21 12:48:42 -04:00
kfd_iommu.h
kfd_kernel_queue.c drm/amdkfd: Enable over-subscription with >1 GWS queue 2020-04-28 16:20:30 -04:00
kfd_kernel_queue.h drm/amdkfd: Eliminate unnecessary kernel queue function pointers 2019-12-05 16:24:36 -05:00
kfd_module.c drm/amdkfd: add missing void argument to function kgd2kfd_init 2019-10-07 15:10:26 -05:00
kfd_mqd_manager_cik.c drm/amdkfd: DIQ should not use HIQ way to allocate memory 2019-11-22 14:27:11 -05:00
kfd_mqd_manager_v9.c drm/amdkfd: Add more comments on GFX9 user CP queue MQD workaround 2020-03-06 14:34:30 -05:00
kfd_mqd_manager_v10.c drm/amdkfd: use map_queues for hiq on gfx v10 as well 2020-01-16 13:34:57 -05:00
kfd_mqd_manager_vi.c drm/amdkfd: Remove duplicate functions update_mqd_hiq() 2019-11-22 14:27:11 -05:00
kfd_mqd_manager.c drm/amdkfd: Extend CU mask to 8 SEs (v3) 2019-08-02 10:19:11 -05:00
kfd_mqd_manager.h drm/amdkfd: Extend CU mask to 8 SEs (v3) 2019-08-02 10:19:11 -05:00
kfd_packet_manager_v9.c drm/amdkfd: Enable over-subscription with >1 GWS queue 2020-04-28 16:20:30 -04:00
kfd_packet_manager_vi.c drm/amdkfd: Rename kfd_kernel_queue_*.c to kfd_packet_manager_*.c 2019-11-19 09:47:23 -05:00
kfd_packet_manager.c drm/amdkfd: Support Sienna_Cichlid KFD v4 2020-07-01 01:59:11 -04:00
kfd_pasid.c drm/amdkfd: Simplify kfd2kgd interface 2018-11-05 14:21:07 -05:00
kfd_pm4_headers_ai.h drm/amdkfd: Support bigger gds size 2019-07-18 14:18:03 -05:00
kfd_pm4_headers_diq.h
kfd_pm4_headers_vi.h drm/amdkfd: Delete alloc_format field from map_queue struct 2019-05-24 12:21:03 -05:00
kfd_pm4_headers.h
kfd_pm4_opcodes.h
kfd_priv.h drm/amdkfd: Add eviction debug messages 2020-07-01 01:59:21 -04:00
kfd_process_queue_manager.c drm/amdkfd: New IOCTL to allocate queue GWS (v2) 2020-04-28 16:20:30 -04:00
kfd_process.c drm/amdkfd: Fix circular locking dependency warning 2020-07-01 01:59:27 -04:00
kfd_queue.c
kfd_topology.c drm/amdkfd: Fix reference count leaks. 2020-07-01 01:59:21 -04:00
kfd_topology.h drm/amdkfd: Report domain with topology 2020-05-01 09:59:51 -04:00
Makefile drm/amdkfd: Rename kfd_kernel_queue_*.c to kfd_packet_manager_*.c 2019-11-19 09:47:23 -05:00
soc15_int.h