linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-10 14:11:52 +00:00

Author	SHA1	Message	Date
Michal Wajdeczko	16ba2b28df	drm/xe/pf: Add function to sanitize VF resources On current platforms it is a PF driver responsibility to clear some of the VF's resources during a VF FLR. Add simple function that will clear configured VF resources (GGTT, LMEM). We will start using this function soon. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828210809.1528-2-michal.wajdeczko@intel.com	2024-08-30 10:51:06 +02:00
Daniele Ceraolo Spurio	02a416afbe	drm/xe/gsc: Wedge the device if the GSCCS reset fails Due to the special handling of the GSCCS in HW, we can't escalate to GT reset when we receive the reset failure interrupt; the specs indicate that we should trigger an FLR instead, but we do not have support for that at the moment, so the HW will stay permanently in a broken state. We should therefore mark the device as wedged, the same as if the GT reset had failed. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828221457.2752868-1-daniele.ceraolospurio@intel.com	2024-08-29 14:18:52 -07:00
Daniele Ceraolo Spurio	5ee2d63ca1	drm/xe/gsc: Add debugfs to print GSC info This is useful for debug, in case something goes wrong with the GSC. The info includes the version information and the current value of the HECI1 status registers. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828215158.2743994-5-daniele.ceraolospurio@intel.com	2024-08-29 10:32:20 -07:00
Daniele Ceraolo Spurio	f7c2ea682d	drm/xe/gsc: Track the platform in the compatibility version The GSC compatibility version number is reset for each new platform. To indicate this, the version includes a number that identifies the platform (102 = MTL, 104 = LNL); this matches what happens for the release version, where the major number also identifies a platform. To make it clearer in our logs that the compatibility version is specific to the platform, it is useful to include this platform number. However, given that our binary names already include the platform, it is not necessary to add this extra number there. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828215158.2743994-4-daniele.ceraolospurio@intel.com	2024-08-29 10:32:19 -07:00
Daniele Ceraolo Spurio	7293859c51	drm/xe/gsc: Fix FW status if the firmware is already loaded We set the FW status to "TRANSFERRED" after the load completes and to "RUNNING" once we're done with proxy init, so do the same if we're trying to re-load the FW and it is already loaded. Note that there is no difference in driver behavior between the 2 states, but it's useful to be accurate when we dump the status for debug. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828215158.2743994-3-daniele.ceraolospurio@intel.com	2024-08-29 10:32:18 -07:00
Daniele Ceraolo Spurio	2160f6f6e3	drm/xe/gsc: Do not attempt to load the GSC multiple times The GSC HW is only reset by driver FLR or D3cold entry. We don't support the former at runtime, while the latter is only supported on DGFX, for which we don't support GSC. Therefore, if GSC failed to load previously there is no need to try again because the HW is stuck in the error state. An assert has been added so that if we ever add DGFX support we'll know we need to handle the D3 case. v2: use "< 0" instead of "!= 0" in the FW state error check (Julia). Fixes: `dd0e89e5ed` ("drm/xe/gsc: GSC FW load") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828215158.2743994-2-daniele.ceraolospurio@intel.com	2024-08-29 10:32:17 -07:00
Jani Nikula	87d8ecf015	drm/xe: replace #include <drm/xe_drm.h> with <uapi/drm/xe_drm.h> include/drm/xe_drm.h does not exist. Prefer the explicit uapi include. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240827091539.4136838-1-jani.nikula@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 15:17:54 -04:00
Karthik Poosa	a7f657097e	drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16 WRITE_I1 sub-command of the POWER_SETUP pcode command accepts a u16 parameter instead of u32. This change prevents potential illegal sub-command errors. v2: Mask uval instead of changing the prototype. (Badal) v3: Rephrase commit message. (Badal) Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: `92d44a422d` ("drm/xe/hwmon: Expose card reactive critical power") Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240827155301.183383-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 14:59:26 -04:00
Ilia Levi	aeb4ae66cb	drm/xe: move the kernel lrc from hwe to execlist port The kernel lrc is used solely by the execlist infra. Move it to the execlist port struct and initialize it only when execlists are used. v2: Rebase, improve error handling readability (Jonathan) Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826100655.1719060-1-ilia.levi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 14:50:13 -04:00
Balasubramani Vivekanandan	3adcf970dc	drm/xe/bmg: Drop force_probe requirement Battlemage platform is sufficiently tested and found stable. CI is also pretty stable. Remove the force_probe requirement to enable the platform support by default. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828082152.3194814-1-balasubramani.vivekanandan@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-28 10:47:03 -07:00
Himal Prasad Ghimiray	c72084163c	drm/xe: Fix NPD in ggtt_node_remove() Make sure that ggtt_node_remove() is invoked only if both node and ggtt are not null. Move the null checks to the caller function xe_ggtt_node_remove(). v2: Move null check below declarations (Tejas) Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828092229.3606503-1-himal.prasad.ghimiray@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 13:15:40 -04:00
Thomas Hellström	379cad69bd	drm/xe: Use separate rpm lockdep map for non-d3cold-capable devices For non-d3cold-capable devices we'd like to be able to wake up the device from reclaim. In particular, for Lunar Lake we'd like to be able to blit CCS metadata to system at shrink time; at least from kswapd where it's reasonably OK to wait for rpm resume and a preceding rpm suspend. Therefore use a separate lockdep map for such devices and prime it reclaim-tainted. v2: - Rename lockmap acquire- and release functions. (Rodrigo Vivi). - Reinstate the old xe_pm_runtime_lockdep_prime() function and rename it to xe_rpm_might_enter_cb(). (Matthew Auld). - Introduce a separate xe_pm_runtime_lockdep_prime function called from module init for known required locking orders. v3: - Actually hook up the prime function at module init. v4: - Rebase. v5: - Don't use reclaim-safe RPM with sriov. Cc: "Vivi, Rodrigo" <rodrigo.vivi@intel.com> Cc: "Auld, Matthew" <matthew.auld@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826143450.92511-1-thomas.hellstrom@linux.intel.com	2024-08-28 16:29:15 +02:00
Nirmoy Das	789e51597d	Revert "drm/ttm: Add a flag to allow drivers to skip clear-on-free" Remove TTM_TT_FLAG_CLEARED_ON_FREE now that XE stopped using this flag. This reverts commit `decbfaf06d`. Cc: Christian König <christian.koenig@amd.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828083635.23601-2-nirmoy.das@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-28 06:45:53 -07:00
Nirmoy Das	7546a8201b	Revert "drm/xe/lnl: Offload system clear page activity to GPU" This optimization relied on having to clear CCS on allocations. If there is no need to clear CCS on allocations then this would mostly help in reducing CPU utilization. Revert this patch at this moment because of: 1 Currently Xe can't do clear on free and using a invalid ttm flag, TTM_TT_FLAG_CLEARED_ON_FREE which could poison global ttm pool on multi-device setup. 2 Also for LNL CPU:WB doesn't require clearing CCS as such BO will not be allowed to bind with compression PTE. Subsequent patch will disable clearing CCS for CPU:WB BOs for LNL. This reverts commit `2368306180`. Cc: Christian König <christian.koenig@amd.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828083635.23601-1-nirmoy.das@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-28 06:45:52 -07:00
Thomas Zimmermann	014125c64d	drm/xe: Support 'nomodeset' kernel command-line option Setting 'nomodeset' on the kernel command line disables all graphics drivers with modesetting capabilities, leaving only firmware drivers, such as simpledrm or efifb. Most DRM drivers automatically support 'nomodeset' via DRM's module helper macros. In xe, which uses regular module_init(), manually call drm_firmware_drivers_only() to test for 'nomodeset'. Do not register the driver if set. v2: - use xe's init table (Lucas) - do NULL test for init/exit functions Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240827121003.97429-1-tzimmermann@suse.de Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-27 12:35:19 -07:00
Himal Prasad Ghimiray	8a04e34268	drm/xe: Remove unrequired NULL check in xe_sched_job_free_fences dma_fence_chain_free() can handle NULL input, there is no need for NULL check by caller. Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820090230.3258128-3-himal.prasad.ghimiray@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-08-27 10:32:57 +02:00
Himal Prasad Ghimiray	19f01d4bbe	drm/xe: Remove unrequired NULL checks in xe_sync_entry_cleanup dma_fence_put() and dma_fence_chain_free() can handle NULL input, there is no need for NULL check by caller. Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820090230.3258128-2-himal.prasad.ghimiray@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-08-27 10:32:57 +02:00
Himal Prasad Ghimiray	11b7309dbe	drm/xe: Remove extra dma_fence_put on xe_sync_entry_add_deps failure drm_sched_job_add_dependency() drops references even in case of error, no need for caller to call dma_fence_put. Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820090230.3258128-1-himal.prasad.ghimiray@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-08-27 10:32:57 +02:00
Lucas De Marchi	9c57bc0865	drm/xe/lnl: Drop force_probe requirement Lunar Lake has been usable for a while in a desktop setup. Bugs are sporadically showing up in CI, but being promptly fixed. Nothing very concerning. All the uapi changes related to fundamental platform usage have been finalized. Remove the force_probe requirement and enable the platform by default. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240822224615.793540-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-26 21:56:19 -07:00
Apoorva Singh	65112db0c2	drm/xe: Remove NULL check of lrc->bo in xe_lrc_snapshot_capture() - lrc->bo NULL check is not needed in xe_lrc_snapshot_capture() as it's already been taken care of in xe_lrc_init(). Signed-off-by: Apoorva Singh <apoorva.singh@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240816080355.897256-1-apoorva.singh@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-08-26 10:27:33 +02:00
Nathan Chancellor	ff9c674d11	drm/xe: Fix total initialization in xe_ggtt_print_holes() Clang warns (or errors with CONFIG_DRM_WERROR or CONFIG_WERROR): drivers/gpu/drm/xe/xe_ggtt.c:810:3: error: variable 'total' is uninitialized when used here [-Werror,-Wuninitialized] 810 \| total += hole_size; \| ^~~~~ drivers/gpu/drm/xe/xe_ggtt.c:798:11: note: initialize the variable 'total' to silence this warning 798 \| u64 total; \| ^ \| = 0 1 error generated. Move the zero initialization of total from xe_gt_sriov_pf_config_print_available_ggtt() to xe_ggtt_print_holes() to resolve the warning. Fixes: `136367290e` ("drm/xe: Introduce xe_ggtt_print_holes") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823-drm-xe-fix-total-in-xe_ggtt_print_holes-v1-1-12b02d079327@kernel.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-24 06:11:26 -07:00
Vinod Govindapillai	66a0f6b9f5	drm/xe/display: handle HPD polling in display runtime suspend/resume In XE, display runtime suspend / resume routines are called only if d3cold is allowed. This makes the driver unable to detect any HPDs once the device goes into runtime suspend state in platforms like LNL. Update the display runtime suspend / resume routines to include HPD polling regardless of d3cold status. While xe_display_pm_suspend/resume() performs steps during runtime suspend/resume that shouldn't happen, like suspending MST and they are missing other steps like enabling DC9, this patchset is meant to keep the current behavior wrt. these, leaving the corresponding updates for a follow-up v2: have a separate function for display runtime s/r (Rodrigo) v3: better streamlining of system s/r and runtime s/r calls (Imre) v4: rebased Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823112148.327015-4-vinod.govindapillai@intel.com	2024-08-23 22:10:55 +03:00
Imre Deak	1228241654	drm/xe: Handle polling only for system s/r in xe_display_pm_suspend/resume() This is a preparation for the follow-up patch where polling will be handled properly for all cases during runtime suspend/resume. v2: rebased Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823112148.327015-3-vinod.govindapillai@intel.com	2024-08-23 22:10:55 +03:00
Imre Deak	a64e7e5b05	drm/xe: Suspend/resume user access only during system s/r Enable/Disable user access only during system suspend/resume. This should not happen during runtime s/r v2: rebased Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823112148.327015-2-vinod.govindapillai@intel.com	2024-08-23 22:10:54 +03:00
Matthew Brost	501d943893	drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map Preferred way to create kernel BOs is xe_managed_bo_create_pin_map, use it. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820172958.1095143-7-matthew.brost@intel.com	2024-08-23 09:54:32 -07:00
Matthew Brost	6eb2aad402	drm/xe: Move hw_engine_fini to devm managed Kernel BOs are destroyed with GGTT mappings, this is hardware interaction so use devm. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820172958.1095143-5-matthew.brost@intel.com	2024-08-23 09:54:13 -07:00
Matthew Brost	a323782567	drm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini Not a big deal if CT is down as driver is unloading, no need to warn. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820172958.1095143-4-matthew.brost@intel.com	2024-08-23 09:54:12 -07:00
Matthew Brost	b5de6a5ced	drm/xe: Set firmware state to loadable before registering guc_fini_hw The guc_fini_hw registered calls __xe_uc_fw_status which is only expected to be called after initializing fw state. Move this before registering guc_fini_hw. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820172958.1095143-3-matthew.brost@intel.com	2024-08-23 09:54:11 -07:00
Matthew Brost	5b993d00d7	drm/xe: Move ggtt_fini to devm managed ggtt->scratch is destroyed via devm, ggtt_fini sets ggtt->scratch to NULL, ggtt->scratch in GGTT clears, so ensure ggtt->scratch is set NULL before the BO is destroyed. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820172958.1095143-2-matthew.brost@intel.com	2024-08-23 09:54:10 -07:00
Matthew Brost	25ebe10e3f	Revert "drm/xe: Invalidate media_gt TLBs in PT code" This reverts commit `40520283e0`. We can't install dma-fence-chain in timeline sync objs. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823162207.2168887-1-matthew.brost@intel.com	2024-08-23 09:51:47 -07:00
Rodrigo Vivi	919bb54e98	drm/xe: Fix missing runtime outer protection for ggtt_remove_node Defer the ggtt node removal to a thread if runtime_pm is not active. The ggtt node removal can be called from multiple places, including places where we cannot protect with outer callers and places we are within other locks. So, try to grab the runtime reference if the device is already active, otherwise defer the removal to a separate thread from where we are sure we can wake the device up. v2: - use xe wq instead of system wq (Matt and CI) - Avoid GFP_KERNEL to be future proof since this removal can be called from outside our drivers and we don't want to block if atomic is needed. (Brost) v3: amend forgot chunk declaring xe_device. v4: Use a xe_ggtt_region to encapsulate the node and removal info, without the need for any memory allocation at runtime. v5: Actually fill the delayed_removal.invalidate (Brost) v6: - Ensure that ggtt_region is not freed before work finishes (Auld) - Own wq to ensure that the queued works are flushed before ggtt_fini (Brost) v7: also free ggtt_region on early !bound return (Auld) v8: Address the null deref (CI) v9: Based on the new xe_ggtt_node for the proper care of the lifetime of the object. v10: Redo the lost v5 change. (Brost) v11: Simplify the invalidate_on_remove (Lucas) Cc: Matthew Auld <matthew.auld@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-12-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00
Rodrigo Vivi	34e804220f	drm/xe: Make xe_ggtt_node struct independent In some rare cases, the drm_mm node cannot be removed synchronously due to runtime PM conditions. In this situation, the node removal will be delegated to a workqueue that will be able to wake up the device before removing the node. However, in this situation, the lifetime of the xe_ggtt_node cannot be restricted to the lifetime of the parent object. So, this patch introduces the infrastructure so the xe_ggtt_node struct can be allocated in advance and freed when needed. By having the ggtt backpointer, it also ensures that the init function is always called before any attempt to insert or reserve the node in the GGTT. v2: s/xe_ggtt_node_force_fini/xe_ggtt_node_fini and use it internally (Brost) v3: - Use GFP_NOFS for node allocation (CI) - Avoid ggtt argument, now that we have it inside the node (Lucas) - Fix some missed fini cases (CI) v4: - Fix SRIOV critical case where config->ggtt_region was lost (Michal) - Avoid ggtt argument also on removal (missed case on v3) (Michal) - Remove useless checks (Michal) - Return 0 instead of negative errno on a u32 addr. (Michal) - s/xe_ggtt_assign/xe_ggtt_node_assign for coherence, while we are touching it (Michal) v5: - Fix VFs' ggtt_balloon Cc: Matthew Auld <matthew.auld@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-11-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00
Rodrigo Vivi	15ca09499b	drm/xe: Refactor xe_ggtt balloon functions to make the node clear These operations are related to node. Convert them to the new appropriate name space xe_ggtt_node. v2: Also move arguments around for consistency (Lucas). v3: s/node_balloon/node_insert_balloon and s/node_deballoon/node_remove_balloon (Michal). Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-10-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00
Rodrigo Vivi	136367290e	drm/xe: Introduce xe_ggtt_print_holes Introduce a new xe_ggtt_print_holes helper that attends the SRIOV demand and finishes the goal of limiting drm_mm access to xe_ggtt. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-9-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00
Rodrigo Vivi	1144e0dff5	drm/xe: Introduce xe_ggtt_largest_hole Introduce a new xe_ggtt_largest_hole helper that attends the SRIOV demand and continue with the goal of limiting drm_mm access to xe_ggtt. v2: Fix a typo (Michal) Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-8-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00
Rodrigo Vivi	8b5ccc9743	drm/xe: Limit drm_mm_node_allocated access to xe_ggtt_node Continue with the encapsulation of drm_mm_node inside xe_ggtt. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-7-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:44 -04:00
Rodrigo Vivi	0567f18e07	drm/xe: Rename xe_ggtt_node related functions Bring some consistency and prepare for more xe_ggtt_node related functions to be introduced. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-6-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:44 -04:00
Rodrigo Vivi	6062ea9398	drm/xe: Encapsulate drm_mm_node inside xe_ggtt_node The xe_ggtt component uses drm_mm to manage the GGTT. The drm_mm_node is just a node inside drm_mm, but in Xe we use that only in the GGTT context. So, this patch encapsulates the drm_mm_node into a xe_ggtt's new struct. This is the first step towards limiting all the drm_mm access through xe_ggtt. The ultimate goal is to have a better control of the node insertion and removal, so the removal can be delegated to a delayed workqueue. v2: Fix includes and typos (Michal and Brost) Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-5-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:44 -04:00
Rodrigo Vivi	6dbd43dced	drm/{i915, xe}: Avoid direct inspection of dpt_vma from outside dpt DPT code is so dependent on i915 vma implementation and it is not ported yet to Xe. This patch limits inspection to DPT's VMA struct to intel_dpt component only, so the Xe GGTT code can evolve. Cc: Matthew Brost <matthew.brost@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:44 -04:00
Rodrigo Vivi	df99acc7ba	drm/xe: Remove unnecessary drm_mm.h includes These includes are no longer necessary, and where appropriate are replaced by the linux/types.h one. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:44 -04:00
Rodrigo Vivi	244fe16663	drm/xe: Introduce GGTT documentation Document xe_ggtt and ensure it is part of the built kernel docs. v2: - Accepted all Michal's suggestions - Rebased on top of new set_pte per platform/wa function pointer v3: - Typos and other acronym fixes (Michal) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> #v1 Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:44 -04:00
Rodrigo Vivi	69f0925c67	drm/xe: Removed unused xe_ggtt_printk Apparently this was only useful when enabling ggtt support for the very first time and never used again. It is also not useful now that we have the ggtt_dump available through debugfs. Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:44 -04:00
Matthew Auld	321d6b4b9c	drm/xe: fixup xe_alloc_pf_queue kzalloc expects number of bytes, therefore we should convert the number of dw into bytes, otherwise we are likely just accessing beyond the array causing all kinds of carnage. Also fixup the error handling while we are here. v2: - Prefer kcalloc (dim) Fixes: `3338e4f90c` ("drm/xe: Use topology to determine page fault queue size") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821171917.417386-2-matthew.auld@intel.com	2024-08-21 19:38:24 -07:00
Matthew Brost	40520283e0	drm/xe: Invalidate media_gt TLBs in PT code Testing on LNL has shown media GT's TLBs need to be invalidated via the GuC, update PT code appropriately. v2: - Do dma_fence_get before first call of invalidation_fence_init (Himal) - No need to check for valid chain fence (Himal) Fixes: `3330361543` ("drm/xe/lnl: Add LNL platform definition") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820161632.987369-1-matthew.brost@intel.com	2024-08-21 19:34:07 -07:00
Matthew Brost	77cc3f6c58	drm/xe: Invalidate media_gt TLBs Testing on LNL has shown media TLBs need to be invalidated via the GuC, update xe_vm_invalidate_vma appropriately. v2: Fix 2 tile case v3: Include missing local change Fixes: `3330361543` ("drm/xe/lnl: Add LNL platform definition") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820160129.986889-1-matthew.brost@intel.com	2024-08-21 08:53:50 -07:00
Matthew Brost	32a42c93b7	drm/xe: Free job before xe_exec_queue_put Free job depends on job->vm being valid, the last xe_exec_queue_put can destroy the VM. Prevent UAF by freeing job before xe_exec_queue_put. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820202309.1260755-1-matthew.brost@intel.com	2024-08-21 08:38:37 -07:00
Matthew Brost	60db6f540a	drm/xe: Drop HW fence pointer to HW fence ctx The HW fence ctx objects are not ref counted rather tied to the life of an LRC object. HW fences reference the HW fence ctx, HW fences can outlive LRCs thus resulting in UAF. Drop the HW fence pointer to HW fence ctx rather just store what is needed directly in HW fence. v2: - Fix typo in commit (Ashutosh) - Use snprintf (Ashutosh) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240815193522.16008-1-matthew.brost@intel.com	2024-08-20 13:06:00 -07:00
Stuart Summers	df2dbc925f	drm/xe/guc: Bump the G2H queue size to account for page faults With the increase in the size of the recoverable page fault queue, we want to ensure the initial messages from GuC in the G2H buffer have space while we transfer those out to the actual pf_queue. Bump the G2H queue size to account for this increase in the pf_queue size. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4c2b6974801bcffd8a010d838c8733fa4092573d.1723862633.git.stuart.summers@intel.com	2024-08-20 09:45:54 -07:00
Stuart Summers	3338e4f90c	drm/xe: Use topology to determine page fault queue size Currently the page fault queue size is hard coded. However the hardware supports faulting for each EU and each CS. For some applications running on hardware with a large number of EUs and CSs, this can result in an overflow of the page fault queue. Add a small calculation to determine the page fault queue size based on the number of EUs and CSs in the platform as determined by fuses. Signed-off-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/24d582a3b48c97793b8b6a402f34b4b469471636.1723862633.git.stuart.summers@intel.com	2024-08-20 09:45:51 -07:00
Stuart Summers	7586fc52b1	drm/xe: Fix missing workqueue destroy in xe_gt_pagefault On driver reload we never free up the memory for the pagefault and access counter workqueues. Add those destroy calls here. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/c9a951505271dc3a7aee76de7656679f69c11518.1723862633.git.stuart.summers@intel.com	2024-08-20 09:40:30 -07:00


1295281 Commits