linux

Author	SHA1	Message	Date
Nirmoy Das	75e1658ea0	drm/amdgpu: call release_firmware() without a NULL check The release_firmware() function is NULL tolerant so we do not need to check for NULL param before calling it. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-07-01 01:59:27 -04:00
Kevin Wang	5d5bd5e32e	drm/amdgpu: restrict the hw sched jobs number to power of two the module parameter sched_hw_submission is probably from user mode, and the kernel need to check whether it is legal. 1. align hw sched jobs to power of 2 and set minimum number is 2. 2. use kernel api is_power_of_2() to simplify driver code. Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-07-01 01:59:23 -04:00
Colton Lewis	282fd22b46	drm/amd: correct trivial kernel-doc inconsistencies Silence documentation warnings by correcting kernel-doc comments. ./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3388: warning: Excess function parameter 'suspend' description in 'amdgpu_device_suspend' ./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3485: warning: Excess function parameter 'resume' description in 'amdgpu_device_resume' ./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:418: warning: Excess function parameter 'tbo' description in 'amdgpu_vram_mgr_del' ./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:418: warning: Excess function parameter 'place' description in 'amdgpu_vram_mgr_del' ./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:279: warning: Excess function parameter 'tbo' description in 'amdgpu_gtt_mgr_del' ./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:279: warning: Excess function parameter 'place' description in 'amdgpu_gtt_mgr_del' ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:332: warning: Function parameter or member 'hdcp_workqueue' not described in 'amdgpu_display_manager' ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:332: warning: Function parameter or member 'cached_dc_state' not described in 'amdgpu_display_manager' Signed-off-by: Colton Lewis <colton.w.lewis@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-07-01 01:59:19 -04:00
Alex Deucher	b7221f2b46	drm/amdgpu: skip BAR resizing if the bios already did it No need to do it again. Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-07-01 01:59:17 -04:00
Stanley.Yang	5278a159cf	drm/amdgpu: support reserve bad page for virt (v3) v1: rename some functions name, only init ras error handler data for supported asic. v2: fix potential memory leak. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-07-01 01:59:17 -04:00
Tianci.Yin	cc375d8c52	drm/amdgpu: temporarily read bounding box from gpu_info fw for navi12 The bounding box is still needed by Navi12, temporarily read it from gpu_info firmware. Should be droped when DAL no longer needs it. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Tianci.Yin <tianci.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-07-01 01:59:15 -04:00
Jerry (Fangzhi) Zuo	81d9bfb8c5	drm/amdgpu/dc: Add missing Sienna_Cichlid chip id Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-07-01 01:59:10 -04:00
Likun Gao	11e8aef52e	drm/amdgpu: set asic family and ip blocks for sienna_cichlid Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-06-03 13:51:57 -04:00
Likun Gao	c0a43457dc	drm/amdgpu: add sienna_cichlid gpu info firmware v2 gpu info fw contains chip specific parameters. v2: fix fw_name Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-06-03 13:51:57 -04:00
Likun Gao	ccaf72d3c2	drm/amdgpu: add sienna_cichlid asic type Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-06-03 13:51:56 -04:00
Alex Deucher	4292b0b202	drm/amdgpu: clean up discovery testing Rather than checking of the variable is enabled and the chip is the right family check for the presence of the discovery table. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-29 13:55:07 -04:00
Alex Deucher	258620d0b3	drm/amdgpu: skip gpu_info firmware if discovery info is available The GPU info firmware is only applicable at bring up when the IP discovery table is not present. If it's available, use that first and then fallback to parsing the gpu info firmware. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-29 13:55:07 -04:00
Evan Quan	b265bdbd9f	drm/amdgpu: added a sysfs interface for thermal throttling related V4 User can check and set the enablement of throttling logging and the interval between each logging. V2: simplify the sysfs interface(no string parsing) V3: add proper lock protection on updating throttling_logging_rs.interval V4: documentation cosmetic per Luben's suggestion Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-29 13:55:07 -04:00
Alex Deucher	da87c30b17	drm/amdgpu: put some case statments in family order SI and CIK came before VI and newer asics. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-28 14:00:49 -04:00
Alex Deucher	e1ad2d53bc	drm/amdgpu: simplify CZ/ST and KV/KB/ML checks Just check for APU. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-28 14:00:49 -04:00
Alex Deucher	70534d1ee8	drm/amdgpu: simplify raven and renoir checks Just check for APU. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-28 14:00:49 -04:00
Alex Deucher	54f78a7655	drm/amdgpu: add apu flags (v2) Add some APU flags to simplify handling of different APU variants. It's easier to understand the special cases if we use names flags rather than checking device ids and silicon revisions. v2: rebase on latest code Acked-by: Evan Quan <evan.quan@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-22 13:41:53 -04:00
Alex Deucher	6e29c227a4	drm/amdgpu: move gpu_info parsing after common early init We need to get the silicon revision id before we parse the firmware in order to load the correct gpu info firmware for raven2 variants. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1103 Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-22 13:41:32 -04:00
Alex Deucher	6ba57b7a8f	drm/amdgpu: move discovery gfx config fetching Move it into the fw_info function since it's logically part of the same functionality. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-22 13:41:22 -04:00
Jiange Zhao	728e7e0cd6	drm/amdgpu: Add autodump debugfs node for gpu reset v8 When GPU got timeout, it would notify an interested part of an opportunity to dump info before actual GPU reset. A usermode app would open 'autodump' node under debugfs system and poll() for readable/writable. When a GPU reset is due, amdgpu would notify usermode app through wait_queue_head and give it 10 minutes to dump info. After usermode app has done its work, this 'autodump' node is closed. On node closure, amdgpu gets to know the dump is done through the completion that is triggered in release(). There is no write or read callback because necessary info can be obtained through dmesg and umr. Messages back and forth between usermode app and amdgpu are unnecessary. v2: (1) changed 'registered' to 'app_listening' (2) add a mutex in open() to prevent race condition v3 (chk): grab the reset lock to avoid race in autodump_open, rename debugfs file to amdgpu_autodump, provide autodump_read as well, style and code cleanups v4: add 'bool app_listening' to differentiate situations, so that the node can be reopened; also, there is no need to wait for completion when no app is waiting for a dump. v5: change 'bool app_listening' to 'enum amdgpu_autodump_state' add 'app_state_mutex' for race conditions: (1)Only 1 user can open this file node (2)wait_dump() can only take effect after poll() executed. (3)eliminated the race condition between release() and wait_dump() v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex' removed state checking in amdgpu_debugfs_wait_dump Improve on top of version 3 so that the node can be reopened. v7: move reinit_completion into open() so that only one user can open it. v8: remove complete_all() from amdgpu_debugfs_wait_dump(). Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-18 11:23:37 -04:00
Nirmoy Das	77f3a5cd70	drm/amdgpu: cleanup sysfs file handling Create sysfs file using attributes. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-08 14:32:24 -04:00
Christian König	9d11eb0d0c	drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2 This should speed up debugging VRAM access a lot. v2: add HDP flush/invalidate Unrevert: RAS issue at root of the issue has been addressed Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-06 16:50:40 -04:00
Nathan Chancellor	54b7feb93f	drm/amdgpu: Avoid integer overflow in amdgpu_device_suspend_display_audio When building with Clang: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4160:53: warning: overflow in expression; result is -294967296 with type 'long' [-Winteger-overflow] expires = ktime_get_mono_fast_ns() + NSEC_PER_SEC * 4L; ^ 1 warning generated. Multiplication happens first due to order of operations and both NSEC_PER_SEC and 4 are long literals so the expression overflows. To avoid this, make 4 an unsigned long long literal, which matches the type of expires (u64). Fixes: `3f12acc8d6` ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3") Link: https://github.com/ClangBuiltLinux/linux/issues/1017 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-05-05 13:12:55 -04:00
Evan Quan	3f12acc8d6	drm/amdgpu: put the audio codec into suspend state before gpu reset V3 At default, the autosuspend delay of audio controller is 3S. If the gpu reset is triggered within 3S(after audio controller idle), the audio controller may be unable into suspended state. Then the sudden gpu reset will cause some audio errors. The change here is targeted to resolve this. However if the audio controller is in use when the gpu reset triggered, this change may be still not enough to put the audio controller into suspend state. Under this case, the gpu reset will still proceed but there will be a warning message printed("failed to suspend display audio"). V2: limit this for BACO and mode1 reset only V3: try 1st to use pm_runtime_autosuspend_expiration() to query how much time is left. Use default setting on failure Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-30 16:48:10 -04:00
Luben Tuikov	c6252390fc	drm/amdgpu: implement TMZ accessor (v3) Implement an accessor of adev->tmz.enabled. Let not code around access it as "if (adev->tmz.enabled)" as the organization may change. Instead... Recruit "bool amdgpu_is_tmz(adev)" to return exactly this Boolean value. That is, this function is now an accessor of an already initialized and set adev and adev->tmz. Add "void amdgpu_gmc_tmz_set(adev)" to check and set adev->gmc.tmz_enabled at initialization time. After which one uses "bool amdgpu_is_tmz(adev)" to query whether adev supports TMZ. Also, remove circular header file include. v2: Remove amdgpu_tmz.[ch] as requested. v3: Move TMZ into GMC. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-28 16:20:29 -04:00
Huang Rui	01a8dcec1a	drm/amdgpu: add function to check tmz capability (v4) Add a function to check tmz capability with kernel parameter and ASIC type. v2: use a per device tmz variable instead of global amdgpu_tmz. v3: refine the comments for the function. (Luben) v4: add amdgpu_tmz.c/h for future use. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-28 16:20:28 -04:00
Jason Yan	b6e79d9a31	drm/amdgpu: remove conversion to bool in amdgpu_device.c The '>' expression itself is bool, no need to convert it to bool again. This fixes the following coccicheck warning: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3004:68-73: WARNING: conversion to bool not needed here Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-27 15:52:16 -04:00
Evan Quan	fde812b32c	drm/amdgpu: drop redundant cg/pg ungate on runpm enter CG/PG ungate is already performed in ip_suspend_phase1. Otherwise, the CG/PG ungate will be performed twice. That will cause gfxoff disablement is performed twice also on runpm enter while gfxoff enablemnt once on rump exit. That will put gfxoff into disabled state. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-27 15:51:56 -04:00
Evan Quan	94fa566058	drm/amdgpu: move kfd suspend after ip_suspend_phase1 This sequence change should be safe as what did in ip_suspend_phase1 is to suspend DCE only. And this is a prerequisite for coming redundant cg/pg ungate dropping. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-27 15:51:49 -04:00
Dennis Li	a891d239f9	drm/amdgpu: set error query ready after all IPs late init If set error query ready in amdgpu_ras_late_init, which will cause some IP blocks aren't initialized, but their error query is ready. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:49 -04:00
Evan Quan	7dd8c205ea	drm/amdgpu: code cleanup around gpu reset Make code more readable. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:49 -04:00
Evan Quan	9e94d22c00	drm/amdgpu: optimize the gpu reset for XGMI setup V2 This is basically just some code cosmetic. The current design for XGMI setup gput reset is to operate on current device(adev) first and then on other devices from the hive(by another 'for' loop). But actually we can do some sort to the device list(to put current device 1st position) and handle all the devices in a single 'for' loop. V2: added missing hive->hive_lock protection Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:49 -04:00
Evan Quan	52fb44cf30	drm/amdgpu: correct cancel_delayed_work_sync on gpu reset As for XGMI setup, it should be performed on other devices from the hive also. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:49 -04:00
Evan Quan	a2f63ee8b5	drm/amdgpu: correct fbdev suspend on gpu reset As for XGMI setup, it needs to be performed on all the devices from the same hive. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:49 -04:00
Kevin Wang	e05185b341	drm/amdgpu: clean up unused variable about ring lru clean up unused variable: 1. ring_lru_list 2. ring_lru_list_lock related-commit: drm/amdgpu: remove ring lru handling Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:49 -04:00
Yong Zhao	d69b8971e5	drm/amdgpu: Print CU information by default during initialization This is convenient for multiple teams to obtain the information. Also, add device info by using dev_info(). Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:49 -04:00
Jonathan Kim	d84a430d9f	drm/amdgpu: fix race between pstate and remote buffer map Vega20 arbitrates pstate at hive level and not device level. Last peer to remote buffer unmap could drop P-State while another process is still remote buffer mapped. With this fix, P-States still needs to be disabled for now as SMU bug was discovered on synchronous P2P transfers. This should be fixed in the next FW update. Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:46 -04:00
Kent Russell	fdd21e62b0	Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2" This reverts commit `c12b84d6e0`. The original patch causes a RAS event and subsequent kernel hard-hang when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and Arcturus dmesg output at hang time: [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected! amdgpu 0000:67:00.0: GPU reset begin! Evicting PASID 0x8000 queues Started evicting pasid 0x8000 qcm fence wait loop timeout expired The cp might be in an unrecoverable state due to an unsuccessful queues preemption Failed to evict process queues Failed to suspend process 0x8000 Finished evicting pasid 0x8000 Started restoring pasid 0x8000 Finished restoring pasid 0x8000 [drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT amdgpu: [powerplay] Failed to send message 0x26, response 0x0 amdgpu: [powerplay] Failed to set soft min gfxclk ! amdgpu: [powerplay] Failed to upload DPM Bootup Levels! amdgpu: [powerplay] Failed to send message 0x7, response 0x0 amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features! amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features! amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM! [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR suspend of IP block <powerplay> failed -5 Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:46 -04:00
Prike Liang	ced1ba9761	drm/amdgpu: fix the hw hang during perform system reboot and reset The system reboot failed as some IP blocks enter power gate before perform hw resource destory. Meanwhile use unify interface to set device CGPG to ungate state can simplify the amdgpu poweroff or reset ungate guard. Fixes: `487eca11a3` ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Tested-by: Mengbing Wang <Mengbing.Wang@amd.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-22 18:11:45 -04:00
Aurabindo Pillai	dd4fa6c1b8	drm/amd/amdgpu: remove hardcoded module name in prints Let format prefixes take care of printing the module name through pr_fmt and dev_fmt definitions. Signed-off-by: Aurabindo Pillai <mail@aurabindo.in> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-13 12:02:40 -04:00
Evan Quan	dadce777e0	drm/amdgpu: fix wrong vram lost counter increment V2 Vram lost counter is wrongly increased by two during baco reset. V2: assumed vram lost for mode1 reset on all ASICs Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-13 12:02:08 -04:00
Hawking Zhang	2eee0229f6	drm/amdgpu: support access regs outside of mmio bar add indirect access support to registers outside of mmio bar. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:18 -04:00
Hawking Zhang	f384ff95f6	drm/amdgpu: retire AMDGPU_REGS_KIQ flag all the register access through kiq is redirected to amdgpu_kiq_rreg/amdgpu_kiq_wreg Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:18 -04:00
Hawking Zhang	ec59847e74	drm/amdgpu: retire RREG32_IDX/WREG32_IDX those are not needed anymore Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:18 -04:00
Hawking Zhang	dec0520aff	drm/amdgpu: remove inproper workaround for vega10 the workaround is not needed for soc15 ASICs except for vega10. it is even not needed with latest vega10 vbios. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:18 -04:00
Prike Liang	a23ca7f76d	drm/amdgpu: fix gfx hang during suspend with video playback (v2) The system will be hang up during S3 suspend because of SMU is pending for GC not respose the register CP_HQD_ACTIVE access request.This issue root cause of accessing the GC register under enter GFX CGGPG and can be fixed by disable GFX CGPG before perform suspend. v2: Use disable the GFX CGPG instead of RLC safe mode guard. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Tested-by: Mengbing Wang <Mengbing.Wang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:17 -04:00
Jack Zhang	b639c22c98	drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset [PATCH 2/2] kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate Without this change, sriov tdr code path will never free those allocated memories and get memory leak. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:15 -04:00
Jack Zhang	fe9824d15e	drm/amdkfd Avoid destroy hqd when GPU is on reset This reverts commit 5161bba4311f in order to split it into two different patches, and this will make it easier to understand. [PATCH 1/2] porting to gfx10 from commit `1b0bfcff46` ("drm/amdgpu: Avoid destroy hqd when GPU is on reset") Originally, MEC is touched without GPU initialized first. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:15 -04:00
Nirmoy Das	1c6d567bdf	drm/amdgpu: rework sched_list generation Generate HW IP's sched_list in amdgpu_ring_init() instead of amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(), ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary. This patch also stores sched_list for all HW IPs in one big array in struct amdgpu_device which makes amdgpu_ctx_init_entity() much more leaner. v2: fix a coding style issue do not use drm hw_ip const to populate amdgpu_ring_type enum v3: remove ctx reference and move sched array and num_sched to a struct use num_scheds to detect uninitialized scheduler list v4: use array_index_nospec for user space controlled variables fix possible checkpatch.pl warnings Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:14 -04:00
Jack Zhang	04bef61e5d	drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate Without this change, sriov tdr code path will never free those allocated memories and get memory leak. v2:add a bugfix for kiq ring test fail Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-09 10:43:14 -04:00
Jiawei	b7b2a316b9	drm/amdgpu: extend compute job timeout extend compute lockup timeout to 60000 for SR-IOV. Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Jiawei <Jiawei.Gu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-01 14:44:43 -04:00
Monk Liu	2f2941324c	drm/amdgpu: postpone entering fullaccess mode if host support new handshake we only need to enter fullaccess_mode in ip_init() part, otherwise we need to do it before reading vbios (becuase host prepares vbios for VF only after received REQ_GPU_INIT event under legacy handshake) Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-01 14:44:43 -04:00
Monk Liu	dffa11b4f7	drm/amdgpu: adjust sequence of ip_discovery init and timeout_setting what: 1)move timtout setting before ip_early_init to reduce exclusive mode cost for SRIOV 2)move ip_discovery_init() to inside of amdgpu_discovery_reg_base_init() it is a prepare for the later upcoming patches. why: in later upcoming patches we would use a new mailbox event -- "req_gpu_init_data", which is a callback hooked in adev->virt.ops and this callback send a new event "REQ_GPU_INIT_DAT" to host to notify host to do some preparation like "IP discovery/vbios on the VF FB" and this callback must be: A) invoked after set_ip_block() because virt.ops is configured during set_ip_block() B) invoked before ip_discovery_init() becausen ip_discovery_init() need host side prepares everything in VF FB first. current place of ip_discovery_init() is before we can invoke callback of adev->virt.ops, thus we must move ip_discovery_init() to a place after the adev->virt.ops all settle done, and the perfect place is in amdgpu_discovery_reg_base_init() Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-01 14:44:43 -04:00
Monk Liu	122078de16	drm/amdgpu: equip new req_init_data handshake by this new handshake host side can prepare vbios/ip-discovery and pf&vf exchange data upon recieving this request without stopping world switch. this way the world switch is less impacted by VF's exclusive mode request Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-01 14:44:43 -04:00
John Clements	61380faa4b	drm/amdgpu: disable ras query and iject during gpu reset added flag to ras context to indicate if ras query functionality is ready Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-01 14:44:42 -04:00
Monk Liu	3aa0115d23	drm/amdgpu: cleanup all virtualization detection routine we need to move virt detection much earlier because: 1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always be at DE5 (dw) mmio offset from vega10, this way there is no need to implement detect_hw_virt() routine in each nbio/chip file. for VI SRIOV chip (tonga & fiji), the BIF_IOV_FUNC_IDENTIFIER is at 0x1503 2) we need to acknowledged we are SRIOV VF before we do IP discovery because the IP discovery content will be updated by host everytime after it recieved a new coming "REQ_GPU_INIT_DATA" request from guest (there will be patches for this new handshake soon). Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-01 14:44:42 -04:00
Kent Russell	bd607166af	drm/amdgpu: Enable reading FRU chip via I2C v3 Allow for reading of information like manufacturer, product number and serial number from the FRU chip. Report the serial number as the new sysfs file serial_number. Note that this only works on server cards, as consumer cards do not feature the FRU chip, which contains this information. v2: Add documentation to amdgpu.rst, add helper functions, rename functions for consistency, fix bad starting offset v3: Remove testing definitions Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-04-01 14:44:41 -04:00
John Clements	43c4d57618	drm/amdgpu: protect RAS sysfs during GPU reset MMHub EDC becomes dirty after BACO reset EDC registers should be cleared early on in reset phase Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-20 10:45:00 -04:00
Monk Liu	2e0cc4d48b	drm/amdgpu: revise RLCG access path what changed: 1)provide new implementation interface for the rlcg access path 2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op function can access reg that need RLCG path help now even debugfs's reg_op can used to dump wave. tested-by: Monk Liu <monk.liu@amd.com> tested-by: Zhou pengju <pengju.zhou@amd.com> Signed-off-by: Zhou pengju <pengju.zhou@amd.com> Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-16 16:17:55 -04:00
Evan Quan	565d194155	drm/amdgpu: add fbdev suspend/resume on gpu reset This can fix the baco reset failure seen on Navi10. And this should be a low risk fix as the same sequence is already used for system suspend/resume. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-13 11:52:34 -04:00
Monk Liu	752c683dbb	drm/amdgpu: fix IB test MCBP bug 1)for gfx IB test we shouldn't insert DE meta data 2)we should make sure IB test finished before we send event 3 to hypervisor otherwise the IDLE from event 3 will preempt IB test, which is not designed as a compatible structure for MCBP Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-05 00:28:11 -05:00
Yintian Tao	d2790e10d3	drm/amdgpu: no need to clean debugfs at amdgpu drm_minor_unregister will invoke drm_debugfs_cleanup to clean all the child node under primary minor node. We don't need to invoke amdgpu_debugfs_fini and amdgpu_debugfs_regs_cleanup to clean agian. Otherwise, it will raise the NULL pointer like below. [ 45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 45.047256] PGD 0 P4D 0 [ 45.047713] Oops: 0002 [#1] SMP PTI [ 45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G W OE 4.18.0-15-generic #16~18.04.1-Ubuntu [ 45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 45.050651] RIP: 0010:down_write+0x1f/0x40 [ 45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01 [ 45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246 [ 45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814 [ 45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8 [ 45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00 [ 45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300 [ 45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860 [ 45.059221] FS: 00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000 [ 45.060809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0 [ 45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 45.065897] Call Trace: [ 45.066426] debugfs_remove+0x36/0xa0 [ 45.067131] amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu] [ 45.068019] amdgpu_debugfs_fini+0x2c/0x50 [amdgpu] [ 45.068756] amdgpu_pci_remove+0x49/0x70 [amdgpu] [ 45.069439] pci_device_remove+0x3e/0xc0 [ 45.070037] device_release_driver_internal+0x18a/0x260 [ 45.070842] driver_detach+0x3f/0x80 [ 45.071325] bus_remove_driver+0x59/0xd0 [ 45.071850] driver_unregister+0x2c/0x40 [ 45.072377] pci_unregister_driver+0x22/0xa0 [ 45.073043] amdgpu_exit+0x15/0x57c [amdgpu] [ 45.073683] __x64_sys_delete_module+0x146/0x280 [ 45.074369] do_syscall_64+0x5a/0x120 [ 45.074916] entry_SYSCALL_64_after_hwframe+0x44/0xa9 v2: remove all debugfs cleanup/fini code at amdgpu v3: squash in unused variable removal Signed-off-by: Yintian Tao <yttao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-28 16:59:22 -05:00
Dave Airlie	a2ae604da7	Merge tag 'amd-drm-next-5.7-2020-02-26' of git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-next-5.7-2020-02-26: amdgpu: - Rework VM update handling in preparation for HMM support - HDCP srm support - PSR fixes - DC watermark fixes - OLED panel support - SR-IOV fixes - BACO fixes - Optimize debugging vram access - RAS fixes - Use BACO for runtime pm - HDCP fixes - XGMI fixes - DDC fixes - DC clock programming optimizations and fixes - PSP fw loading sequence updates - Drop DRIVER_USE_AGP - Remove legacy drm load and unload callbacks amdkfd: - Add runtime pm support radeon: - Drop DRIVER_USE_AGP Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227043142.4075-1-alexander.deucher@amd.com	2020-02-28 15:40:26 +10:00
Alex Deucher	c6385e503a	drm/amdgpu: drop legacy drm load and unload callbacks We've moved the debugfs handling into a centralized place so we can remove the legacy load an unload callbacks. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-26 14:21:13 -05:00
Alex Deucher	cd9e29e717	drm/amdgpu/firmware: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for firmware. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-26 14:21:12 -05:00
Alex Deucher	f9d64e6c4a	drm/amdgpu/regs: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for register access files. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-26 14:21:12 -05:00
Alex Deucher	3f5cea671c	drm/amdgpu/gem: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for gem. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-26 14:21:12 -05:00
Alex Deucher	923ffa6b02	drm/amdgpu: rename amdgpu_debugfs_preempt_cleanup to amdgpu_debugfs_fini. It will be used for other things in the future. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-26 14:21:12 -05:00
Yong Zhao	8bdab6bb1c	drm/amdgpu: Increase timout on emulator to tenfold instead of twice Since emulators are slower, sometime some operations like flushing tlb through FM need more than twice the regular timout of 100ms, so increase the timeout to 1s on emulators. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-26 14:21:03 -05:00
Dave Airlie	1b245ec5b6	drm-misc-next for 5.7: UAPI Changes: - lima: Add support for heap buffers Cross-subsystem Changes: Core Changes: - Implement mode_config mode_valid for memory constrained drivers - Bus format negociation between bridges - Consolidate fake vblank events for drivers without vblank interrupts - drm/bufs: dma_alloc related cleanups - drm/dp_mst: Various fixes - drm/print: New drm_device based print helpers - Thomas is a drm-misc maintainer now! Driver Changes: - DPMS cleanups for atomic drivers - Removal of owner field in SPI tinydrm drivers - Removal of explicit dependency on DT for tinydrm drivers - Conversion to YAML schemas for DT bindings - tidss: New driver - virtio: various reworks and fixes - Our usual dozen or so new panels or bridges -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXkEjOgAKCRDj7w1vZxhR xeaDAQD+1MludG4RmfQhATe4jTsPC1r2x63OF2CA0ChMGHXJyQEA8qqQ+8y1Cd/u PZ3PpcTl4qYYHgzJ6FwW7kDPTvlaZQE= =IJAt -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2020-02-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.7: UAPI Changes: - lima: Add support for heap buffers Cross-subsystem Changes: Core Changes: - Implement mode_config mode_valid for memory constrained drivers - Bus format negociation between bridges - Consolidate fake vblank events for drivers without vblank interrupts - drm/bufs: dma_alloc related cleanups - drm/dp_mst: Various fixes - drm/print: New drm_device based print helpers - Thomas is a drm-misc maintainer now! Driver Changes: - DPMS cleanups for atomic drivers - Removal of owner field in SPI tinydrm drivers - Removal of explicit dependency on DT for tinydrm drivers - Conversion to YAML schemas for DT bindings - tidss: New driver - virtio: various reworks and fixes - Our usual dozen or so new panels or bridges Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200210093421.xu4sofldm6wm6xq6@gilmour.lan	2020-02-21 05:44:40 +10:00
Rajneesh Bhardwaj	9593f4d6a6	drm/amdkfd: refactor runtime pm for baco So far the kfd driver implemented same routines for runtime and system wide suspend and resume (s2idle or mem). During system wide suspend the kfd aquires an atomic lock that prevents any more user processes to create queues and interact with kfd driver and amd gpu. This mechanism created problem when amdgpu device is runtime suspended with BACO enabled. Any application that relies on kfd driver fails to load because the driver reports a locked kfd device since gpu is runtime suspended. However, in an ideal case, when gpu is runtime suspended the kfd driver should be able to: - auto resume amdgpu driver whenever a client requests compute service - prevent runtime suspend for amdgpu while kfd is in use This change refactors the amdgpu and amdkfd drivers to support BACO and runtime power management. Reviewed-by: Oak Zeng <oak.zeng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-12 16:00:54 -05:00
Christian König	c12b84d6e0	drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2 This should speed up debugging VRAM access a lot. v2: add HDP flush/invalidate Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-07 11:45:25 -05:00
Christian König	ce05ac56e6	drm/amdgpu: optimize amdgpu_device_vram_access a bit. Only write the _HI register when necessary. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-07 11:45:17 -05:00
Jack Zhang	86b93fd62d	drm/amdgpu/sriov Don't send msg when smu suspend For sriov and pp_onevf_mode, do not send message to set smu status, because smu doesn't support these messages under VF. Besides, it should skip smu_suspend when pp_onevf_mode is disabled. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-02-07 11:44:24 -05:00
Alex Deucher	2cb44fb093	drm/amdgpu: enable GPU reset by default on renoir Everything is in place. Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-27 16:46:45 -05:00
Alex Deucher	658c663947	drm/amdgpu: enable GPU reset by default on Navi Has been working fine for a while. Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-27 16:46:45 -05:00
Chris Wilson	7e13ad8964	drm: Avoid drm_global_mutex for simple inc/dec of dev->open_count Since drm_global_mutex is a true global mutex across devices, we don't want to acquire it unless absolutely necessary. For maintaining the device local open_count, we can use atomic operations on the counter itself, except when making the transition to/from 0. Here, we tackle the easy portion of delaying acquiring the drm_global_mutex for the final release by using atomic_dec_and_mutex_lock(), leaving the global serialisation across the device opens. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Thomas Hellström (VMware) <thomas_os@shipmail.org> Reviewed-by: Thomas Hellström <thellstrom@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200124130107.125404-1-chris@chris-wilson.co.uk	2020-01-24 17:41:34 +00:00
Nirmoy Das	a9d4fe2fd6	drm/amdgpu: remove unnecessary conversion to bool Better clean that up before some automation starts to complain about it Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-22 16:55:27 -05:00
chen gong	c68dbcd8f9	drm/amdgpu: add kiq version interface for RREG32/WREG32 Reading some registers by mmio will result in hang when GPU is in "gfxoff" state.This problem can be solved by GPU in "ring command packages" way. Signed-off-by: chen gong <curry.gong@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-22 16:34:22 -05:00
chen gong	d33a99c4b6	drm/amdgpu: provide a generic function interface for reading/writing register by KIQ Move amdgpu_virt_kiq_rreg/amdgpu_virt_kiq_wreg function to amdgpu_gfx.c, and rename them to amdgpu_kiq_rreg/amdgpu_kiq_wreg.Make it generic and flexible. Signed-off-by: chen gong <curry.gong@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-22 16:34:14 -05:00
Pan, Xinhui	bd05221123	drm/amdgpu: add the lost mutex_init back Initialize notifier_lock. Bug: https://gitlab.freedesktop.org/drm/amd/issues/1016 Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-17 16:03:09 -05:00
Hawking Zhang	e9d4cf918f	drm/amdgpu: add arcturus to gpu recovery check code path support check if dirver should try gpu recovery for arcturus Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-16 13:40:54 -05:00
Evan Quan	9530273ec9	drm/amd/powerplay: cover the powerplay implementation details V3 This can save users much troubles. As they do not actually need to care whether swSMU or traditional powerplay routine should be used. V2: apply the fixes to vi.c and cik.c also V3: squash in oops fix Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-14 10:18:08 -05:00
Jack Zhang	895bd048fb	amd/amdgpu/sriov tdr enablement with pp_onevf_mode Under sriov and pp_onevf mode, 1.take resume instead of hw_init for smc recover to avoid potential memory leak. 2.add return condition inside smc resume function for sriov_pp_onevf_mode and pm_enabled param. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-07 11:58:19 -05:00
zhengbin	2a9b90ae47	drm/amdgpu: use true, false for bool variable in amdgpu_device.c Fixes coccicheck warning: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3961:1-19: WARNING: Assignment of 0/1 to bool variable drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3981:1-19: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-23 15:00:01 -05:00
Ma Feng	e3c00faa7a	drm/amdgpu: Remove unneeded variable 'ret' in amdgpu_device.c Fixes coccicheck warning: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1036:5-8: Unneeded variable: "ret". Return "0" on line 1079 Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ma Feng <mafeng.ma@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-23 14:58:56 -05:00
Jane Jian	d83c7a07a7	drm/amdgpu: update VCN1(dual instances) fw types ID and VCN ip block type Previously there is no VCN1 type ID in psp gfx interface. Also add VCN ip block type unless the reinit after FLR for sriov would fail. Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:33:26 -05:00
Andrey Grodzovsky	c96cf2823d	drm/amdgpu: Switch from system_highpri_wq to system_unbound_wq This is to avoid queueing jobs to same CPU during XGMI hive reset because there is a strict timeline for when the reset commands must reach all the GPUs in the hive. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:13 -05:00
Andrey Grodzovsky	c6a6e2db99	drm/amdgpu: Redo XGMI reset synchronization. Use task barrier in XGMI hive to synchronize ASIC resets across devices in XGMI hive. v2: Return right away with a warning if no xgmi hive, update doc. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:13 -05:00
Andrey Grodzovsky	041a62bc06	drm/amdgpu: reverts commit `ce316fa55e`. In preparation for doing XGMI reset synchronization using task barrier. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:12 -05:00
Monk Liu	5a7489a7e1	drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV issues: MEC is ruined by the amdkfd_pre_reset after VF FLR done fix: amdkfd_pre_reset() would ruin MEC after hypervisor finished the VF FLR, the correct sequence is do amdkfd_pre_reset before VF FLR but there is a limitation to block this sequence: if we do pre_reset() before VF FLR, it would go KIQ way to do register access and stuck there, because KIQ probably won't work by that time (e.g. you already made GFX hang) so the best way right now is to simply remove it. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:12 -05:00
Nirmoy Das	f880799d7f	amd/amdgpu: add sched array to IPs with multiple run-queues This sched array can be passed on to entity creation routine instead of manually creating such sched array on every context creation. v2: squash in missing break fix Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:12 -05:00
Nirmoy Das	0c88b43032	drm/amdgpu: replace vm_pte's run-queue list with drm gpu scheds list drm_sched_entity_init() takes drm gpu scheduler list instead of drm_sched_rq list. This makes conversion of drm_sched_rq list to drm gpu scheduler list unnecessary Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:12 -05:00
Emily Deng	8973d9ec8f	drm/amdgpu/sriov: Tonga sriov also need load firmware with smu Fix Tonga sriov load driver fail issue. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewd-by Yintian Tao <Yintian.tao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:06 -05:00
Yong Zhao	d7f72fe482	drm/amdgpu: Add CU info print log The log will be useful for easily getting the CU info on various emulation models or ASICs. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:06 -05:00
Simon Ser	93b09a9a89	drm/amdgpu: log when amdgpu.dc=1 but ASIC is unsupported This makes it easier to figure out whether the kernel parameter has been taken into account. Signed-off-by: Simon Ser <contact@emersion.fr> Cc: Harry Wentland <hwentlan@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-11 15:22:08 -05:00
Yintian Tao	c9ffa427db	drm/amd/powerplay: enable pp one vf mode for vega10 Originally, due to the restriction from PSP and SMU, VF has to send message to hypervisor driver to handle powerplay change which is complicated and redundant. Currently, SMU and PSP can support VF to directly handle powerplay change by itself. Therefore, the old code about the handshake between VF and PF to handle powerplay will be removed and VF will use new the registers below to handshake with SMU. mmMP1_SMN_C2PMSG_101: register to handle SMU message mmMP1_SMN_C2PMSG_102: register to handle SMU parameter mmMP1_SMN_C2PMSG_103: register to handle SMU response v2: remove module parameter pp_one_vf v3: fix the parens v4: forbid vf to change smu feature v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute v6: change skip condition at vega10_copy_table_to_smc Signed-off-by: Yintian Tao <yttao@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-11 15:22:07 -05:00
Le Ma	00eaa57172	drm/amdgpu: clear err_event_athub flag after reset exit Otherwise next err_event_athub error cannot call gpu reset. And following resume sequence will not be affected by this flag. v2: create function to clear amdgpu_ras_in_intr for modularity of ras driver Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-05 16:26:11 -05:00
Le Ma	b823821f22	drm/amdgpu: support full gpu reset workflow when ras err_event_athub occurs This athub fatal error can be recovered by baco without system-level reboot, so add a mode to use baco for the recovery. Not affect the default psp reset situations for now. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-05 16:26:03 -05:00
Le Ma	ce316fa55e	drm/amdgpu: add concurrent baco reset support for XGMI Currently each XGMI node reset wq does not run in parrallel if bound to same cpu. Make change to bound the xgmi_reset_work item to different cpus. XGMI requires all nodes enter into baco within very close proximity before any node exit baco. So schedule the xgmi_reset_work wq twice for enter/exit baco respectively. To use baco for XGMI, PMFW supported for baco on XGMI needs to be involved. The case that PSP reset and baco reset coexist within an XGMI hive never exist and is not in the consideration. v2: define use_baco flag to simplify the code for xgmi baco sequence Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-05 16:25:53 -05:00

1 2 3 4 5 ...

789 Commits