linux

Author	SHA1	Message	Date
Alan Previn	a0f1f7b4f7	drm/i915/guc: Print the GuC error capture output register list. Print the GuC captured error state register list (string names and values) when gpu_coredump_state printout is invoked via the i915 debugfs for flushing the gpu error-state that was captured prior. Since GuC could have reported multiple engine register dumps in a single notification event, parse the captured data (appearing as a stream of structures) to identify each dump as a different 'engine-capture-group-output'. Finally, for each 'engine-capture-group-output' that is found, verify if the engine register dump corresponds to the engine_coredump content that was previously populated by the i915_gpu_coredump function. That function would have copied the context's vma's including the bacth buffer during the G2H-context-reset notification that occurred earlier. Perform this verification check by comparing guc_id, lrca and engine- instance obtained from the 'engine-capture-group-output' vs a copy of that same info taken during i915_gpu_coredump. If they match, then print those vma's as well (such as the batch buffers). NOTE: the output format was verified using the gem_exec_capture IGT test. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-14-alan.previn.teres.alexis@intel.com	2022-03-22 10:33:31 -07:00
Alan Previn	a6f0f9cf33	drm/i915/guc: Plumb GuC-capture into gpu_coredump Add a flags parameter through all of the coredump creation functions. Add a bitmask flag to indicate if the top level gpu_coredump event is triggered in response to a GuC context reset notification. Using that flag, ensure all coredump functions that read or print mmio-register values related to work submission or command-streamer engines are skipped and replaced with a calls guc-capture module equivalent functions to retrieve or print the register dump. While here, split out display related register reading and printing into its own function that is called agnostic to whether GuC had triggered the reset. For now, introduce an empty printing function that can filled in on a subsequent patch just to handle formatting. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-13-alan.previn.teres.alexis@intel.com	2022-03-22 10:33:31 -07:00
Matt Roper	cc1338f259	drm/i915/xehp: Update topology dumps for Xe_HP When running on Xe_HP or beyond, let's use an updated format for describing topology in our error state dumps and debugfs to give a more accurate view of the hardware: - Just report DSS directly without the legacy "slice0" output that's no longer meaningful. - Indicate whether each DSS is accessible for geometry and/or compute. - Rename "rcs_topology" to "sseu_topology" since the information reported is common to both RCS and CCS engines now. v2: - Name static functions in a more consistent manner. (Lucas) Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220311225459.385515-2-matthew.d.roper@intel.com	2022-03-14 09:33:59 -07:00
Matt Roper	239bbb2fe9	drm/i915/gt: Remove GEN12_SFC_DONE_MAX from register defs header We shouldn't really be keeping track of how many SFC_DONE registers our platforms can have, but rather how many SFC hardware units there can be (each SFC unit will have one corresponding SFC_DONE register). So drop the stray GEN12_SFC_DONE_MAX definition we had in the register definition file and replace it with an I915_MAX_SFC that follows the pattern we use for other hardware units. Note that our hardware has a 2:1:1 ratio of VD:VE:SFC, and as far as we know that pattern should carry forward to future platforms, so we'll define it as #VCS/2. Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220311062835.163744-1-matthew.d.roper@intel.com	2022-03-11 08:18:27 -08:00
Rodrigo Vivi	30424ebae8	Merge tag 'drm-intel-gt-next-2022-02-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-intel-next UAPI Changes: - Weak parallel submission support for execlists Minimal implementation of the parallel submission support for execlists backend that was previously only implemented for GuC. Support one sibling non-virtual engine. Core Changes: - Two backmerges of drm/drm-next for header file renames/changes and i915_regs reorganization Driver Changes: - Add new DG2 subplatform: DG2-G12 (Matt R) - Add new DG2 workarounds (Matt R, Ram, Bruce) - Handle pre-programmed WOPCM registers for DG2+ (Daniele) - Update guc shim control programming on XeHP SDV+ (Daniele) - Add RPL-S C0/D0 stepping information (Anusha) - Improve GuC ADS initialization to work on ARM64 on dGFX (Lucas) - Fix KMD and GuC race on accessing PMU busyness (Umesh) - Use PM timestamp instead of RING TIMESTAMP for reference in PMU with GuC (Umesh) - Report error on invalid reset notification from GuC (John) - Avoid WARN splat by holding RPM wakelock during PXP unbind (Juston) - Fixes to parallel submission implementation (Matt B.) - Improve GuC loading status check/error reports (John) - Tweak TTM LRU priority hint selection (Matt A.) - Align the plane_vma to min_page_size of stolen mem (Ram) - Introduce vma resources and implement async unbinding (Thomas) - Use struct vma_resource instead of struct vma_snapshot (Thomas) - Return some TTM accel move errors instead of trying memcpy move (Thomas) - Fix a race between vma / object destruction and unbinding (Thomas) - Remove short-term pins from execbuf (Maarten) - Update to GuC version 69.0.3 (John, Michal Wa.) - Improvements to GT reset paths in GuC backend (Matt B.) - Use shrinker_release_pages instead of writeback in shmem object hooks (Matt A., Tvrtko) - Use trylock instead of blocking lock when freeing GEM objects (Maarten) - Allocate intel_engine_coredump_alloc with ALLOW_FAIL (Matt B.) - Fixes to object unmapping and purging (Matt A) - Check for wedged device in GuC backend (John) - Avoid lockdep splat by locking dpt_obj around set_cache_level (Maarten) - Allow dead vm to unbind vma's without lock (Maarten) - s/engine->i915/i915/ for DG2 engine workarounds (Matt R) - Use to_gt() helper for GGTT accesses (Michal Wi.) - Selftest improvements (Matt B., Thomas, Ram) - Coding style and compiler warning fixes (Matt B., Jasmine, Andi, Colin, Gustavo, Dan) From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Yg4i2aCZvvee5Eai@jlahtine-mobl.ger.corp.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Fixed conflicts while applying, using the fixups/drm-intel-gt-next.patch from drm-rerere's 1f2b1742abdd ("2022y-02m-23d-16h-07m-57s UTC: drm-tip rerere cache update")]	2022-02-23 15:03:51 -05:00
Jani Nikula	5f2ec9095c	drm/i915: don't include drm_cache.h in i915_drv.h Include it only in files that use it. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/14edab4a193ea3f73f387a88e3836c8555401871.1644507885.git.jani.nikula@intel.com	2022-02-14 13:19:37 +02:00
Jani Nikula	24524e3f43	drm/i915: move the DRIVER_* macros to i915_driver.[ch] The macros are more at home in i915_driver.[ch]. v2: Rebase Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220209123121.3337496-1-jani.nikula@intel.com	2022-02-10 11:44:25 +02:00
Matt Roper	0d6419e9c8	drm/i915: Move GT registers to their own header file This is a huge, chaotic mass of registers copied over as-is without any real cleanup. We'll come back and organize these better, align on consistent coding style, remove dead code, etc. in separate patches later that will be easier to review. v2: - Add missing include in intel_pxp_irq.c v3: - Correct a few indentation errors (Lucas) - Minor conflict resolution Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-6-matthew.d.roper@intel.com	2022-02-02 07:59:14 -08:00
Rodrigo Vivi	063565aca3	Merge drm/drm-next into drm-intel-next Catch-up with 5.17-rc2 and trying to align with drm-intel-gt-next for a possible topic branch for merging the split of i915_regs... Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2022-01-31 13:19:33 -05:00
Matthew Brost	4f72fc3c7f	drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL Allocate intel_engine_coredump_alloc with ALLOW_FAIL rather than GFP_KERNEL to fully decouple the error capture from fence signalling. v2: (John Harrison) - Fix typo in commit message (s/do/to) Fixes: `8b91cdd4f8` ("drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220121043118.24886-2-matthew.brost@intel.com	2022-01-21 15:46:41 -08:00
Matt Roper	202b1f4c12	drm/i915/gt: Move engine registers to their own header Let's continue breaking up and cleaning up the massive i915_reg.h file by moving all registers that are defined in relation to an engine base to their own header. There are probably a bunch of other "engine registers" that we haven't moved yet (especially those that belong to the render engine in the 0x2??? range), but this is a relatively straightforward first step. Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220111051600.3429104-8-matthew.d.roper@intel.com	2022-01-11 14:03:25 -08:00
Thomas Hellström	60dc43d119	drm/i915: Use struct vma_resource instead of struct vma_snapshot There is always a struct vma_resource guaranteed to be alive when we access a corresponding struct vma_snapshot. So ditch the latter and instead of allocating vma_snapshots, reference the already existning vma_resource. This requires a couple of extra members in struct vma_resource but that's a small price to pay for the simplification. v2: - Fix a missing include and declaration (kernel test robot <lkp@intel.com>) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-7-thomas.hellstrom@linux.intel.com	2022-01-11 10:54:11 +01:00
Thomas Hellström	39a2bd34c9	drm/i915: Use the vma resource as argument for gtt binding / unbinding When introducing asynchronous unbinding, the vma itself may no longer be alive when the actual binding or unbinding takes place. Update the gtt i915_vma_ops accordingly to take a struct i915_vma_resource instead of a struct i915_vma for the bind_vma() and unbind_vma() ops. Similarly change the insert_entries() op for struct i915_address_space. Replace a couple of i915_vma_snapshot members with their newly introduced i915_vma_resource counterparts, since they have the same lifetime. Also make sure to avoid changing the struct i915_vma_flags (in particular the bind flags) async. That should now only be done sync under the vm mutex. v2: - Update the vma_res::bound_flags when binding to the aliased ggtt v6: - Remove I915_VMA_ALLOC_BIT (Matthew Auld) - Change some members of struct i915_vma_resource from unsigned long to u64 (Matthew Auld) v7: - Fix vma resource size parameters to be u64 rather than unsigned long (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-3-thomas.hellstrom@linux.intel.com	2022-01-11 10:53:14 +01:00
Michał Winiarski	2cbc876daa	drm/i915: Use to_gt() helper Use to_gt() helper consistently throughout the codebase. Pure mechanical s/i915->gt/to_gt(i915). No functional changes. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214193346.21231-10-andi.shyti@linux.intel.com	2021-12-17 21:51:59 -08:00
Thomas Hellström	63cb9da6fc	drm/i915: Fix coredump of perma-pinned vmas When updating the error capture code and introducing vma snapshots, we introduced code to hold the vma in memory while capturing it, calling i915_active_acquire_if_busy(). Now that function isn't relevant for perma-pinned vmas and caused important vmas to be dropped from the coredump. Like for example the GuC log. Fix this by instead requiring those vmas to be pinned while capturing. Tested by running the initial subtests of the gem_exec_capture igt test with GuC submission enabled and verifying that a GuC log blob appears in the output. Fixes: `ff20afc4ce` ("drm/i915: Update error capture code to avoid using the current vma state") Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reported-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211208082245.86933-1-thomas.hellstrom@linux.intel.com	2021-12-13 12:14:01 +01:00
Thomas Hellström	ff20afc4ce	drm/i915: Update error capture code to avoid using the current vma state With asynchronous migrations, the vma state may be several migrations ahead of the state that matches the request we're capturing. Address that by introducing an i915_vma_snapshot structure that can be used to snapshot relevant state at request submission. In order to make sure we access the correct memory, the snapshots take references on relevant sg-tables and memory regions. Also move the capture list allocation out of the fence signaling critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to avoid compiling in members and functions used for error capture when they're not used. Finally, Introduce lockdep annotation. v4: - Break out the capture allocation mode change to a separate patch. v5: - Fix compilation error in the !CONFIG_DRM_I915_CAPTURE_ERROR case (kernel test robot) v6: - Use #if IS_ENABLED() instead of #ifdef to match driver style. - Move yet another change of allocation mode to the separate patch. - Commit message rework due to patch reordering. v7: - Adjust for removal of region refcounting. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211129202245.472043-1-thomas.hellstrom@linux.intel.com	2021-12-01 16:53:22 +01:00
Tvrtko Ursulin	cca0846923	drm/i915: Use per device iommu check With both integrated and discrete Intel GPUs in a system, the current global check of intel_iommu_gfx_mapped, as done from intel_vtd_active() may not be completely accurate. In this patch we add i915 parameter to intel_vtd_active() in order to prepare it for multiple GPUs and we also change the check away from Intel specific intel_iommu_gfx_mapped (global exported by the Intel IOMMU driver) to probing the presence of IOMMU on a specific device using device_iommu_mapped(). This will return true both for IOMMU pass-through and address translation modes which matches the current behaviour. If in the future we wanted to distinguish between these two modes we could either use iommu_get_domain_for_dev() and check for __IOMMU_DOMAIN_PAGING bit indicating address translation, or ask for a new API to be exported from the IOMMU core code. v2: * Check for dmar translation specifically, not just iommu domain. (Baolu) v3: * Go back to plain "any domain" check for now, rewrite commit message. v4: * Use device_iommu_mapped. (Robin, Baolu) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211126141424.493753-1-tvrtko.ursulin@linux.intel.com	2021-12-01 09:21:47 +00:00
Thomas Hellström	8b91cdd4f8	drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code The capture code is typically run entirely in the fence signalling critical path. We're about to add lockdep annotation in an upcoming patch which reveals a lockdep splat similar to the below one. Fix the associated potential deadlocks using __GFP_KSWAPD_RECLAIM (which is the same as GFP_WAIT, but open-coded for clarity) rather than GFP_KERNEL for memory allocation in the capture path. This has the potential drawback that capture might fail in situations with memory pressure. [ 234.842048] WARNING: possible circular locking dependency detected [ 234.842050] 5.15.0-rc7+ #20 Tainted: G U W [ 234.842052] ------------------------------------------------------ [ 234.842054] gem_exec_captur/1180 is trying to acquire lock: [ 234.842056] ffffffffa3e51c00 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x4d/0x330 [ 234.842063] but task is already holding lock: [ 234.842064] ffffffffa3f57620 (dma_fence_map){++++}-{0:0}, at: i915_vma_snapshot_resource_pin+0x27/0x30 [i915] [ 234.842138] which lock already depends on the new lock. [ 234.842140] the existing dependency chain (in reverse order) is: [ 234.842142] -> #2 (dma_fence_map){++++}-{0:0}: [ 234.842145] __dma_fence_might_wait+0x41/0xa0 [ 234.842149] dma_resv_lockdep+0x1dc/0x28f [ 234.842151] do_one_initcall+0x58/0x2d0 [ 234.842154] kernel_init_freeable+0x273/0x2bf [ 234.842157] kernel_init+0x16/0x120 [ 234.842160] ret_from_fork+0x1f/0x30 [ 234.842163] -> #1 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: [ 234.842166] fs_reclaim_acquire+0x6d/0xd0 [ 234.842168] __kmalloc_node+0x51/0x3a0 [ 234.842171] alloc_cpumask_var_node+0x1b/0x30 [ 234.842174] native_smp_prepare_cpus+0xc7/0x292 [ 234.842177] kernel_init_freeable+0x160/0x2bf [ 234.842179] kernel_init+0x16/0x120 [ 234.842181] ret_from_fork+0x1f/0x30 [ 234.842184] -> #0 (fs_reclaim){+.+.}-{0:0}: [ 234.842186] __lock_acquire+0x1161/0x1dc0 [ 234.842189] lock_acquire+0xb5/0x2b0 [ 234.842192] fs_reclaim_acquire+0xa1/0xd0 [ 234.842193] __kmalloc+0x4d/0x330 [ 234.842196] i915_vma_coredump_create+0x78/0x5b0 [i915] [ 234.842253] intel_engine_coredump_add_vma+0x36/0xe0 [i915] [ 234.842307] __i915_gpu_coredump+0x290/0x5e0 [i915] [ 234.842365] i915_capture_error_state+0x57/0xa0 [i915] [ 234.842415] intel_gt_handle_error+0x348/0x3e0 [i915] [ 234.842462] intel_gt_debugfs_reset_store+0x3c/0x90 [i915] [ 234.842504] simple_attr_write+0xc1/0xe0 [ 234.842507] full_proxy_write+0x53/0x80 [ 234.842509] vfs_write+0xbc/0x350 [ 234.842513] ksys_write+0x58/0xd0 [ 234.842514] do_syscall_64+0x38/0x90 [ 234.842516] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 234.842519] other info that might help us debug this: [ 234.842521] Chain exists of: fs_reclaim --> mmu_notifier_invalidate_range_start --> dma_fence_map [ 234.842526] Possible unsafe locking scenario: [ 234.842528] CPU0 CPU1 [ 234.842529] ---- ---- [ 234.842531] lock(dma_fence_map); [ 234.842532] lock(mmu_notifier_invalidate_range_start); [ 234.842535] lock(dma_fence_map); [ 234.842537] lock(fs_reclaim); [ 234.842539] * DEADLOCK * [ 234.842540] 4 locks held by gem_exec_captur/1180: [ 234.842543] #0: ffff9007812d9460 (sb_writers#17){.+.+}-{0:0}, at: ksys_write+0x58/0xd0 [ 234.842547] #1: ffff900781d9ecb8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_write+0x3a/0xe0 [ 234.842552] #2: ffffffffc11913a8 (capture_mutex){+.+.}-{3:3}, at: i915_capture_error_state+0x1a/0xa0 [i915] [ 234.842602] #3: ffffffffa3f57620 (dma_fence_map){++++}-{0:0}, at: i915_vma_snapshot_resource_pin+0x27/0x30 [i915] [ 234.842656] stack backtrace: [ 234.842658] CPU: 0 PID: 1180 Comm: gem_exec_captur Tainted: G U W 5.15.0-rc7+ #20 [ 234.842661] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 234.842664] Call Trace: [ 234.842666] dump_stack_lvl+0x57/0x72 [ 234.842669] check_noncircular+0xde/0x100 [ 234.842672] ? __lock_acquire+0x3bf/0x1dc0 [ 234.842675] __lock_acquire+0x1161/0x1dc0 [ 234.842678] lock_acquire+0xb5/0x2b0 [ 234.842680] ? __kmalloc+0x4d/0x330 [ 234.842683] ? finish_task_switch.isra.0+0xf2/0x360 [ 234.842686] ? i915_vma_coredump_create+0x78/0x5b0 [i915] [ 234.842734] fs_reclaim_acquire+0xa1/0xd0 [ 234.842737] ? __kmalloc+0x4d/0x330 [ 234.842739] __kmalloc+0x4d/0x330 [ 234.842742] i915_vma_coredump_create+0x78/0x5b0 [i915] [ 234.842793] ? capture_vma+0xbe/0x110 [i915] [ 234.842844] intel_engine_coredump_add_vma+0x36/0xe0 [i915] [ 234.842892] __i915_gpu_coredump+0x290/0x5e0 [i915] [ 234.842939] i915_capture_error_state+0x57/0xa0 [i915] [ 234.842985] intel_gt_handle_error+0x348/0x3e0 [i915] [ 234.843032] ? __mutex_lock+0x81/0x830 [ 234.843035] ? simple_attr_write+0x3a/0xe0 [ 234.843038] ? __lock_acquire+0x3bf/0x1dc0 [ 234.843041] intel_gt_debugfs_reset_store+0x3c/0x90 [i915] [ 234.843083] ? _copy_from_user+0x45/0x80 [ 234.843086] simple_attr_write+0xc1/0xe0 [ 234.843089] full_proxy_write+0x53/0x80 [ 234.843091] vfs_write+0xbc/0x350 [ 234.843094] ksys_write+0x58/0xd0 [ 234.843096] do_syscall_64+0x38/0x90 [ 234.843098] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 234.843101] RIP: 0033:0x7fa467480877 [ 234.843103] Code: 75 05 48 83 c4 58 c3 e8 37 4e ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 234.843108] RSP: 002b:00007ffd14d79b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 234.843112] RAX: ffffffffffffffda RBX: 00007ffd14d79b60 RCX: 00007fa467480877 [ 234.843114] RDX: 0000000000000014 RSI: 00007ffd14d79b60 RDI: 0000000000000007 [ 234.843116] RBP: 0000000000000007 R08: 0000000000000000 R09: 00007ffd14d79ab0 [ 234.843119] R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000014 [ 234.843121] R13: 0000000000000000 R14: 00007ffd14d79b60 R15: 0000000000000005 v5: - Use __GFP_KSWAPD_RECLAIM rather than __GFP_NOWAIT for clarity. (Daniel Vetter) v6: - Include an instance in execlists_capture_work(). - Rework the commit message due to patch reordering. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211108174547.979714-3-thomas.hellstrom@linux.intel.com	2021-11-26 08:26:10 +01:00
Thomas Hellström	e45b98ba62	drm/i915: Avoid allocating a page array for the gpu coredump The gpu coredump typically takes place in a dma_fence signalling critical path, and hence can't use GFP_KERNEL allocations, as that means we might hit deadlocks under memory pressure. However changing to __GFP_KSWAPD_RECLAIM which will be done in an upcoming patch will instead mean a lower chance of the allocation succeeding. In particular large contigous allocations like the coredump page vector. Remove the page vector in favor of a linked list of single pages. Use the page lru list head as the list link, as the page owner is allowed to do that. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211108174547.979714-2-thomas.hellstrom@linux.intel.com	2021-11-26 08:26:08 +01:00
Matt Roper	45f63790e4	drm/i915: Check SFC fusing before recording/dumping SFC_DONE On Xe_HP and beyond the SFC unit may be fused off, even if the corresponding media engines are present. Check the SFC-specific fusing before trying to dump the SFC_DONE instances. Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210917161203.812251-3-matthew.d.roper@intel.com	2021-09-20 21:42:10 -07:00
Joonas Lahtinen	d5dd580deb	Merge drm/drm-next into drm-intel-gt-next Close the divergence which has caused patches not to apply and have a solid baseline for the PXP patches that Rodrigo will send a topic branch PR for. Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2021-09-15 13:23:27 +03:00
Linus Torvalds	477f70cd2a	Merge tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm Pull drm updates from Dave Airlie: "Highlights: - i915 has seen a lot of refactoring and uAPI cleanups due to a change in the upstream direction going forward This has all been audited with known userspace, but there may be some pitfalls that were missed. - i915 now uses common TTM to enable discrete memory on DG1/2 GPUs - i915 enables Jasper and Elkhart Lake by default and has preliminary XeHP/DG2 support - amdgpu adds support for Cyan Skillfish - lots of implicit fencing rules documented and fixed up in drivers - msm now uses the core scheduler - the irq midlayer has been removed for non-legacy drivers - the sysfb code now works on more than x86. Otherwise the usual smattering of stuff everywhere, panels, bridges, refactorings. Detailed summary: core: - extract i915 eDP backlight into core - DP aux bus support - drm_device.irq_enabled removed - port drivers to native irq interfaces - export gem shadow plane handling for vgem - print proper driver name in framebuffer registration - driver fixes for implicit fencing rules - ARM fixed rate compression modifier added - updated fb damage handling - rmfb ioctl logging/docs - drop drm_gem_object_put_locked - define DRM_FORMAT_MAX_PLANES - add gem fb vmap/vunmap helpers - add lockdep_assert(once) helpers - mark drm irq midlayer as legacy - use offset adjusted bo mapping conversion vgaarb: - cleanups fbdev: - extend efifb handling to all arches - div by 0 fixes for multiple drivers udmabuf: - add hugepage mapping support dma-buf: - non-dynamic exporter fixups - document implicit fencing rules amdgpu: - Initial Cyan Skillfish support - switch virtual DCE over to vkms based atomic - VCN/JPEG power down fixes - NAVI PCIE link handling fixes - AMD HDMI freesync fixes - Yellow Carp + Beige Goby fixes - Clockgating/S0ix/SMU/EEPROM fixes - embed hw fence in job - rework dma-resv handling - ensure eviction to system ram amdkfd: - uapi: SVM address range query added - sysfs leak fix - GPUVM TLB optimizations - vmfault/migration counters i915: - Enable JSL and EHL by default - preliminary XeHP/DG2 support - remove all CNL support (never shipped) - move to TTM for discrete memory support - allow mixed object mmap handling - GEM uAPI spring cleaning - add I915_MMAP_OBJECT_FIXED - reinstate ADL-P mmap ioctls - drop a bunch of unused by userspace features - disable and remove GPU relocations - revert some i915 misfeatures - major refactoring of GuC for Gen11+ - execbuffer object locking separate step - reject caching/set-domain on discrete - Enable pipe DMC loading on XE-LPD and ADL-P - add PSF GV point support - Refactor and fix DDI buffer translations - Clean up FBC CFB allocation code - Finish INTEL_GEN() and friends macro conversions nouveau: - add eDP backlight support - implicit fence fix msm: - a680/7c3 support - drm/scheduler conversion panfrost: - rework GPU reset virtio: - fix fencing for planes ast: - add detect support bochs: - move to tiny GPU driver vc4: - use hotplug irqs - HDMI codec support vmwgfx: - use internal vmware device headers ingenic: - demidlayering irq rcar-du: - shutdown fixes - convert to bridge connector helpers zynqmp-dsub: - misc fixes mgag200: - convert PLL handling to atomic mediatek: - MT8133 AAL support - gem mmap object support - MT8167 support etnaviv: - NXP Layerscape LS1028A SoC support - GEM mmap cleanups tegra: - new user API exynos: - missing unlock fix - build warning fix - use refcount_t" * tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits) drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box drm/amd/display: Remove duplicate dml init drm/amd/display: Update bounding box states (v2) drm/amd/display: Update number of DCN3 clock states drm/amdgpu: disable GFX CGCG in aldebaran drm/amdgpu: Clear RAS interrupt status on aldebaran drm/amdgpu: Add support for RAS XGMI err query drm/amdkfd: Account for SH/SE count when setting up cu masks. drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain drm/amdgpu: drop redundant cancel_delayed_work_sync call drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend drm/amdkfd: map SVM range with correct access permission drm/amdkfd: check access permisson to restore retry fault drm/amdgpu: Update RAS XGMI Error Query drm/amdgpu: Add driver infrastructure for MCA RAS drm/amd/display: Add Logging for HDMI color depth information drm/amd/amdgpu: consolidate PSP TA init shared buf functions drm/amd/amdgpu: add name field back to ras_common_if drm/amdgpu: Fix build with missing pm_suspend_target_state module export ...	2021-09-01 11:26:46 -07:00
Matt Roper	24d032e235	drm/i915: Only access SFC_DONE when media domain is not fused off The SFC_DONE register lives within the corresponding VD0/VD2/VD4/VD6 forcewake domain and is not accessible if the vdbox in that domain is fused off and the forcewake is not initialized. This mistake went unnoticed because until recently we were using the wrong register offset for the SFC_DONE register; once the register offset was corrected, we started hitting errors like <4> [544.989065] i915 0000:cc:00.0: Uninitialized forcewake domain(s) 0x80 accessed at 0x1ce000 on parts with fused-off vdbox engines. Fixes: `e50dbdbfd9` ("drm/i915/tgl: Add SFC instdone to error state") Fixes: `9c9c6d0ab0` ("drm/i915: Correct SFC_DONE register offset") Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210806174130.1058960-1-matthew.d.roper@intel.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com> (cherry picked from commit `c5589bb5dc`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Changed Fixes tag to match the cherry-picked `82929a2140`]	2021-08-12 06:04:38 -04:00
Matt Roper	89f2e7ab4d	drm/i915/dg2: Report INSTDONE_GEOM values in error state Xe_HPG adds some additional INSTDONE_GEOM debug registers; the Mesa team has indicated that having these reported in the error state would be useful for debugging GPU hangs. These registers are replicated per-DSS with gslice steering. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210805163647.801064-4-matthew.d.roper@intel.com	2021-08-11 08:21:37 -07:00
Matt Roper	fa9899dad3	drm/i915/xehp: Loop over all gslices for INSTDONE processing We no longer have traditional slices on Xe_HP platforms, but the INSTDONE registers are replicated according to gslice representation which is similar. We can mostly re-use the existing instdone code with just a few modifications: * Create an alternate instdone loop macro that will iterate over the flat DSS space, but still provide the gslice/dss steering values for compatibility with the legacy code. * We should allocate INSTDONE storage space according to the maximum number of gslices rather than the maximum number of legacy slices to ensure we have enough storage space to hold all of the values. XeHP design has 8 gslices, whereas older platforms never had more than 3 slices. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210805163647.801064-3-matthew.d.roper@intel.com	2021-08-11 08:21:12 -07:00
Matthew Brost	573ba126ae	drm/i915/guc: Capture error state on context reset We receive notification of an engine reset from GuC at its completion. Meaning GuC has potentially cleared any HW state we may have been interested in capturing. GuC resumes scheduling on the engine post-reset, as the resets are meant to be transparent, further muddling our error state. There is ongoing work to define an API for a GuC debug state dump. The suggestion for now is to manually disable FW initiated resets in cases where debug state is needed. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-19-matthew.brost@intel.com	2021-07-27 17:31:59 -07:00
Thomas Hellström	0ff375759f	drm/i915: Update object placement flags to be mutable The object ops i915_GEM_OBJECT_HAS_IOMEM and the object I915_BO_ALLOC_STRUCT_PAGE flags are considered immutable by much of our code. Introduce a new mem_flags member to hold these and make sure checks for these flags being set are either done under the object lock or with pages properly pinned. The flags will change during migration under the object lock. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210624084240.270219-2-thomas.hellstrom@linux.intel.com	2021-06-24 18:50:56 +01:00
Matthew Brost	349a2bc5aa	drm/i915: Move active tracking to i915_sched_engine Move active request tracking and its lock to i915_sched_engine. This lock is also the submission lock so having it in the i915_sched_engine is the correct place. v3: (Jason Ekstrand) Add kernel doc v6: Rebase Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.comk> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-5-matthew.brost@intel.com	2021-06-18 15:13:33 -07:00
Lucas De Marchi	651e7d4857	drm/i915: replace IS_GEN and friends with GRAPHICS_VER This was done by the following semantic patch: @@ expression i915; @@ - INTEL_GEN(i915) + GRAPHICS_VER(i915) @@ expression i915; expression E; @@ - INTEL_GEN(i915) >= E + GRAPHICS_VER(i915) >= E @@ expression dev_priv; expression E; @@ - !IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) != E @@ expression dev_priv; expression E; @@ - IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) == E @@ expression dev_priv; expression from, until; @@ - IS_GEN_RANGE(dev_priv, from, until) + IS_GRAPHICS_VER(dev_priv, from, until) @def@ expression E; identifier id =~ "^gen$"; @@ - id = GRAPHICS_VER(E) + ver = GRAPHICS_VER(E) @@ identifier def.id; @@ - id + ver It also takes care of renaming the variable we assign to GRAPHICS_VER() so to use "ver" rather than "gen". Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210606045050.103862-2-lucas.demarchi@intel.com	2021-06-07 00:59:48 -07:00
Anusha Srivatsa	03256487fe	drm/i915/dmc: Add intel_dmc_has_payload() helper We check for dmc_payload being there at various points in the driver. Replace it with the helper. v2: rebased. v3: Move intel_dmc to intel_dmc.h in another patch (Lucas) v4: Remove headers not needed from intel_dmc.h Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210526220256.4097-3-anusha.srivatsa@intel.com	2021-06-02 23:19:04 -07:00
Anusha Srivatsa	32f9402d56	drm/i915/dmc: s/intel_csr.c/intel_dmc.c and s/intel_csr.h/intel_dmc.h Finally, rename the header and source file from csr to dmc. v2: Add file rename in Documentation. - Place headers in orders. (Jani) Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210518213444.11420-6-anusha.srivatsa@intel.com	2021-05-19 18:47:04 -07:00
Anusha Srivatsa	0633cdcbaa	drm/i915/dmc: Rename macro names containing csr Rename all occurences of CSR_* with DMC_* Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210518213444.11420-4-anusha.srivatsa@intel.com	2021-05-19 18:47:00 -07:00
Anusha Srivatsa	ec2b1485a0	drm/i915/dmc: s/HAS_CSR/HAS_DMC No functional change. Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210518213444.11420-3-anusha.srivatsa@intel.com	2021-05-19 18:46:58 -07:00
Anusha Srivatsa	c24760cf42	drm/i915/dmc: s/intel_csr/intel_dmc No functional change. v2: Chchpatch fixes. Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210518213444.11420-2-anusha.srivatsa@intel.com	2021-05-19 18:46:56 -07:00
Ville Syrjälä	e7c46e43bd	drm/i915: Nuke display error state I doubt anyone has used the display error state since CS flips went the way of the dodo. Just nuke it. It might be semi interesting to have something like this for FIFO underruns and the like, but as it stands this wouldn't provide a sufficient amount of information. So would need an extensive rewrite anyway. The lockless power well handling is also racy, so this could just be contributing noise to test results if we end up accessing something with the relevant power well already disabled. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210505191140.14215-1-ville.syrjala@linux.intel.com Acked-by: Jani Nikula <jani.nikula@intel.com>	2021-05-07 08:06:49 +03:00
Jani Nikula	35bb28ece9	Merge drm/drm-next into drm-intel-next Sync up with upstream. Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2021-03-11 08:52:53 +02:00
Thomas Zimmermann	8ff5446a7c	drm/i915: Remove references to struct drm_device.pdev Using struct drm_device.pdev is deprecated. Convert i915 to struct drm_device.dev. No functional changes. v6: * also remove assignment in selftests/ in a later patch (Chris) v5: * remove assignment in later patch (Chris) v3: * rebased v2: * move gt/ and gvt/ changes into separate patches Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210128133127.2311-2-tzimmermann@suse.de	2021-02-02 13:58:42 +02:00
CQ Tang	c97498363f	drm/i915/error: Fix object page offset within a region io_mapping_map_wc() expects the offset to be relative to the iomapping base address. Currently we just pass in the physical address for the page which only works if the region.start starts at zero. Signed-off-by: CQ Tang <cq.tang@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210119133106.66294-2-matthew.auld@intel.com	2021-01-19 20:36:27 +00:00
Chris Wilson	f170523a7b	drm/i915/gt: Consolidate the CS timestamp clocks Pull the GT clock information [used to derive CS timestamps and PM interval] under the GT so that is it local to the users. In doing so, we consolidate the two references for the same information, of which the runtime-info took note of a potential clock source override and scaling factors. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201223122359.22562-2-chris@chris-wilson.co.uk	2020-12-23 21:10:41 +00:00
Dave Airlie	334a168393	Merge tag 'drm-intel-gt-next-2020-11-12-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next Cross-subsystem Changes: - DMA mapped scatterlist fixes in i915 to unblock merging of https://lkml.org/lkml/2020/9/27/70 (Tvrtko, Tom) Driver Changes: - Fix for user reported issue #2381 (Graphical output stops with "switching to inteldrmfb from simple"): Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init (Ville, Chris) - Fix for Tigerlake (and earlier) to avoid spurious empty CSB events leading to hang (Chris, Bruce) - Delay execlist processing for Tigerlake to avoid hang (Chris) - Fix for Tigerlake RCS engine health check through heartbeat (Chris) - Fix for Tigerlake reserved MOCS entries (Ayaz, Chris) - Fix Media power gate sequence on Tigerlake (Rodrigo) - Enable eLLC caching of display buffers for SKL+ (Ville) - Support parsing of oversize batches on Gen9 (Matt, Chris) - Exclude low pages (128KiB) of stolen from use to avoid thrashing during reset (Chris) - Flush engines before Tigerlake breadcrumbs (Chris) - Use the local HWSP offset during submission (Chris) - Flush coherency domains on first set-domain-ioctl (Chris, Zbigniew) - Use the active reference on the vma while capturing to avoid use-after-free (Chris) - Fix MOCS PTE setting for gen9+ (Ville) - Avoid NULL dereference on IPS driver callback while unbinding i915 (Chris) - Avoid NULL dereference from PT/PD stash allocation error (Matt) - Hold request reference for canceling an active context (Chris) - Avoid infinite loop on x86-32 when mapping a lot of objects (Chris) - Disallow WC mappings when processor doesn't support them (Chris) - Return correct error in i915_gem_object_copy_blt() error path (Dan) - Return correct error in intel_context_create_request() error path (Maarten) - Tune down GuC communication enabled/disabled messages to debug (Jani) - Fix rebased commit "Remove i915_request.lock requirement for execution callbacks" (Chris) - Cancel outstanding work after disabling heartbeats on an engine (Chris) - Signal cancelled requests (Chris) - Retire cancelled requests on unload (Chris) - Scrub HW state on driver remove (Chris) - Undo forced context restores after trivial preemptions (Chris) - Handle PCI unbind in PMU code (Tvrtko) - Fix CPU hotplug with multiple GPUs in PMU code (Trtkko) - Correctly set SFC capability for video engines (Venkata) - Update GuC code to use firmware v49.0.1 (John, Matthew B., Daniele, Oscar, Michel, Rodrigo, Michal) - Improve GuC warnings on loading failure (John) - Avoid ownership race in buffer pool by clearing age (Chris) - Use MMIO to read CSB in case of failure (Chris, Mika) - Show engine properties in engine state dump to indicate changes (Chris, Joonas) - Break up error capture compression loops with cond_resched() (Chris) - Reduce GPU error capture mutex hold time to avoid khungtaskd (Chris) - Serialise debugfs i915_gem_objects with ctx->mutex (Chris) - Always test execution status on closing the context and close if not persistent (Chris) - Avoid mixing integer types during batch copies (Chris, Jared) - Skip over MI_NOOP when parsing to avoid overhead (Chris) - Hold onto an explicit ref to i915_vma_work.pinned (Chris) - Perform all asynchronous waits prior to marking payload start (Chris) - Pull phys pread/pwrite implementations to the backend (Matt) - Improve record of hung engines in error state (Tvrtko) - Allow backends to override pread implementation (Matt) - Reinforce LRC poisoning checks to confirm context survives execution (Chris) - Fix memory region max size calculation (Matt) - Fix order when adding blocks to memory region (Matt) - Eliminate unused intel_virtual_engine_get_sibling func (Chris) - Cleanup kasan warning for on-stack (unsigned long) casting (Chris) - Onion unwind for scratch page allocation failure (Chris) - Poison stolen pages before use (Chris) - Selftest improvements (Chris) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201112163407.GA20320@jlahtine-mobl.ger.corp.intel.com	2020-11-13 15:01:57 +10:00
Tvrtko Ursulin	2dae0c8529	drm/i915: Use ABI engine class in error state ecode Instead of printing out the internal engine mask, which can change between kernel versions making it difficult to map to actual engines, present a bitmask of hanging engines ABI classes. For example: [drm] GPU HANG: ecode 9:8:24dffffd, in gem_exec_schedu [1334] Engine ABI class is useful to quickly categorize render vs media etc hangs in bug reports. Considering virtual engine even more so than the current scheme. v2: * Do not re-order fields. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201105113842.1395391-1-tvrtko.ursulin@linux.intel.com	2020-11-09 12:00:22 +00:00
Tvrtko Ursulin	bda3002485	drm/i915: Improve record of hung engines in error state Between events which trigger engine and GPU resets and capturing the error state we lose information on which engine triggered the reset. Improve this by passing in the hung engine mask down to error capture. Result is that the list of engines in user visible "GPU HANG: ecode <gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just active engines. Most importantly the displayed process is now the one which was actually hung. v2: * Stub prototype. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com	2020-11-09 11:59:43 +00:00
Chris Wilson	db9bc2d35f	drm/i915: Use the active reference on the vma while capturing During error capture, we need to take a reference to the vma from before the reset in order to catpure the contents of the vma later. Currently we are using both an active reference and a kref, but due to nature of the i915_vma reference handling, that kref is on the vma->obj and not the vma itself. This means the vma may be destroyed as soon as it is idle, that is in between the i915_active_release(&vma->active) and the i915_vma_put(vma): <3> [197.866181] BUG: KASAN: use-after-free in intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <3> [197.866339] Read of size 8 at addr ffff8881258cb800 by task gem_exec_captur/1041 <3> [197.866467] <4> [197.866512] CPU: 2 PID: 1041 Comm: gem_exec_captur Not tainted 5.9.0-g5e4234f97efba-kasan_200+ #1 <4> [197.866521] Hardware name: Intel Corp. Broxton P/Apollolake RVP1A, BIOS APLKRVPA.X64.0150.B11.1608081044 08/08/2016 <4> [197.866530] Call Trace: <4> [197.866549] dump_stack+0x99/0xd0 <4> [197.866760] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.866783] print_address_description.constprop.8+0x3e/0x60 <4> [197.866797] ? kmsg_dump_rewind_nolock+0xd4/0xd4 <4> [197.866819] ? lockdep_hardirqs_off+0xd4/0x120 <4> [197.867037] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867249] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867270] kasan_report.cold.10+0x1f/0x37 <4> [197.867492] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867710] intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867949] i915_gpu_coredump.part.29+0x150/0x7b0 [i915] <4> [197.868186] i915_capture_error_state+0x5e/0xc0 [i915] <4> [197.868396] intel_gt_handle_error+0x6eb/0xa20 [i915] <4> [197.868624] ? intel_gt_reset_global+0x370/0x370 [i915] <4> [197.868644] ? check_flags+0x50/0x50 <4> [197.868662] ? __lock_acquire+0xd59/0x6b00 <4> [197.868678] ? register_lock_class+0x1ad0/0x1ad0 <4> [197.868944] i915_wedged_set+0xcf/0x1b0 [i915] <4> [197.869147] ? i915_wedged_get+0x90/0x90 [i915] <4> [197.869371] ? i915_wedged_get+0x90/0x90 [i915] <4> [197.869398] simple_attr_write+0x153/0x1c0 <4> [197.869428] full_proxy_write+0xee/0x180 <4> [197.869442] ? __sb_start_write+0x1f3/0x310 <4> [197.869465] vfs_write+0x1a3/0x640 <4> [197.869492] ksys_write+0xec/0x1c0 <4> [197.869507] ? __ia32_sys_read+0xa0/0xa0 <4> [197.869525] ? lockdep_hardirqs_on_prepare+0x32b/0x4e0 <4> [197.869541] ? syscall_enter_from_user_mode+0x1c/0x50 <4> [197.869566] do_syscall_64+0x33/0x80 <4> [197.869579] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <4> [197.869590] RIP: 0033:0x7fd8b7aee281 <4> [197.869604] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53 <4> [197.869613] RSP: 002b:00007ffea3b72008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 <4> [197.869625] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd8b7aee281 <4> [197.869633] RDX: 0000000000000002 RSI: 00007fd8b81a82e7 RDI: 000000000000000d <4> [197.869641] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000034 <4> [197.869650] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd8b81a82e7 <4> [197.869658] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000 <3> [197.869707] <3> [197.869757] Allocated by task 1041: <4> [197.869833] kasan_save_stack+0x19/0x40 <4> [197.869843] __kasan_kmalloc.constprop.5+0xc1/0xd0 <4> [197.869853] kmem_cache_alloc+0x106/0x8e0 <4> [197.870059] i915_vma_instance+0x212/0x1930 [i915] <4> [197.870270] eb_lookup_vmas+0xe06/0x1d10 [i915] <4> [197.870475] i915_gem_do_execbuffer+0x131d/0x4080 [i915] <4> [197.870682] i915_gem_execbuffer2_ioctl+0x103/0x5d0 [i915] <4> [197.870701] drm_ioctl_kernel+0x1d2/0x270 <4> [197.870710] drm_ioctl+0x40d/0x85c <4> [197.870721] __x64_sys_ioctl+0x10d/0x170 <4> [197.870731] do_syscall_64+0x33/0x80 <4> [197.870740] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <3> [197.870748] <3> [197.870798] Freed by task 22: <4> [197.870865] kasan_save_stack+0x19/0x40 <4> [197.870875] kasan_set_track+0x1c/0x30 <4> [197.870884] kasan_set_free_info+0x1b/0x30 <4> [197.870894] __kasan_slab_free+0x111/0x160 <4> [197.870903] kmem_cache_free+0xcd/0x710 <4> [197.871109] i915_vma_parked+0x618/0x800 [i915] <4> [197.871307] __gt_park+0xdb/0x1e0 [i915] <4> [197.871501] ____intel_wakeref_put_last+0xb1/0x190 [i915] <4> [197.871516] process_one_work+0x8dc/0x15d0 <4> [197.871525] worker_thread+0x82/0xb30 <4> [197.871535] kthread+0x36d/0x440 <4> [197.871545] ret_from_fork+0x22/0x30 <3> [197.871553] <3> [197.871602] The buggy address belongs to the object at ffff8881258cb740 which belongs to the cache i915_vma of size 968 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2553 Fixes: `2850748ef8` ("drm/i915: Pull i915_vma_pin under the vm->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201016092527.29039-1-chris@chris-wilson.co.uk (cherry picked from commit `178536b829`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-10-19 13:29:57 -04:00
Chris Wilson	178536b829	drm/i915: Use the active reference on the vma while capturing During error capture, we need to take a reference to the vma from before the reset in order to catpure the contents of the vma later. Currently we are using both an active reference and a kref, but due to nature of the i915_vma reference handling, that kref is on the vma->obj and not the vma itself. This means the vma may be destroyed as soon as it is idle, that is in between the i915_active_release(&vma->active) and the i915_vma_put(vma): <3> [197.866181] BUG: KASAN: use-after-free in intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <3> [197.866339] Read of size 8 at addr ffff8881258cb800 by task gem_exec_captur/1041 <3> [197.866467] <4> [197.866512] CPU: 2 PID: 1041 Comm: gem_exec_captur Not tainted 5.9.0-g5e4234f97efba-kasan_200+ #1 <4> [197.866521] Hardware name: Intel Corp. Broxton P/Apollolake RVP1A, BIOS APLKRVPA.X64.0150.B11.1608081044 08/08/2016 <4> [197.866530] Call Trace: <4> [197.866549] dump_stack+0x99/0xd0 <4> [197.866760] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.866783] print_address_description.constprop.8+0x3e/0x60 <4> [197.866797] ? kmsg_dump_rewind_nolock+0xd4/0xd4 <4> [197.866819] ? lockdep_hardirqs_off+0xd4/0x120 <4> [197.867037] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867249] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867270] kasan_report.cold.10+0x1f/0x37 <4> [197.867492] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867710] intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867949] i915_gpu_coredump.part.29+0x150/0x7b0 [i915] <4> [197.868186] i915_capture_error_state+0x5e/0xc0 [i915] <4> [197.868396] intel_gt_handle_error+0x6eb/0xa20 [i915] <4> [197.868624] ? intel_gt_reset_global+0x370/0x370 [i915] <4> [197.868644] ? check_flags+0x50/0x50 <4> [197.868662] ? __lock_acquire+0xd59/0x6b00 <4> [197.868678] ? register_lock_class+0x1ad0/0x1ad0 <4> [197.868944] i915_wedged_set+0xcf/0x1b0 [i915] <4> [197.869147] ? i915_wedged_get+0x90/0x90 [i915] <4> [197.869371] ? i915_wedged_get+0x90/0x90 [i915] <4> [197.869398] simple_attr_write+0x153/0x1c0 <4> [197.869428] full_proxy_write+0xee/0x180 <4> [197.869442] ? __sb_start_write+0x1f3/0x310 <4> [197.869465] vfs_write+0x1a3/0x640 <4> [197.869492] ksys_write+0xec/0x1c0 <4> [197.869507] ? __ia32_sys_read+0xa0/0xa0 <4> [197.869525] ? lockdep_hardirqs_on_prepare+0x32b/0x4e0 <4> [197.869541] ? syscall_enter_from_user_mode+0x1c/0x50 <4> [197.869566] do_syscall_64+0x33/0x80 <4> [197.869579] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <4> [197.869590] RIP: 0033:0x7fd8b7aee281 <4> [197.869604] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53 <4> [197.869613] RSP: 002b:00007ffea3b72008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 <4> [197.869625] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd8b7aee281 <4> [197.869633] RDX: 0000000000000002 RSI: 00007fd8b81a82e7 RDI: 000000000000000d <4> [197.869641] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000034 <4> [197.869650] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd8b81a82e7 <4> [197.869658] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000 <3> [197.869707] <3> [197.869757] Allocated by task 1041: <4> [197.869833] kasan_save_stack+0x19/0x40 <4> [197.869843] __kasan_kmalloc.constprop.5+0xc1/0xd0 <4> [197.869853] kmem_cache_alloc+0x106/0x8e0 <4> [197.870059] i915_vma_instance+0x212/0x1930 [i915] <4> [197.870270] eb_lookup_vmas+0xe06/0x1d10 [i915] <4> [197.870475] i915_gem_do_execbuffer+0x131d/0x4080 [i915] <4> [197.870682] i915_gem_execbuffer2_ioctl+0x103/0x5d0 [i915] <4> [197.870701] drm_ioctl_kernel+0x1d2/0x270 <4> [197.870710] drm_ioctl+0x40d/0x85c <4> [197.870721] __x64_sys_ioctl+0x10d/0x170 <4> [197.870731] do_syscall_64+0x33/0x80 <4> [197.870740] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <3> [197.870748] <3> [197.870798] Freed by task 22: <4> [197.870865] kasan_save_stack+0x19/0x40 <4> [197.870875] kasan_set_track+0x1c/0x30 <4> [197.870884] kasan_set_free_info+0x1b/0x30 <4> [197.870894] __kasan_slab_free+0x111/0x160 <4> [197.870903] kmem_cache_free+0xcd/0x710 <4> [197.871109] i915_vma_parked+0x618/0x800 [i915] <4> [197.871307] __gt_park+0xdb/0x1e0 [i915] <4> [197.871501] ____intel_wakeref_put_last+0xb1/0x190 [i915] <4> [197.871516] process_one_work+0x8dc/0x15d0 <4> [197.871525] worker_thread+0x82/0xb30 <4> [197.871535] kthread+0x36d/0x440 <4> [197.871545] ret_from_fork+0x22/0x30 <3> [197.871553] <3> [197.871602] The buggy address belongs to the object at ffff8881258cb740 which belongs to the cache i915_vma of size 968 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2553 Fixes: `2850748ef8` ("drm/i915: Pull i915_vma_pin under the vm->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201016092527.29039-1-chris@chris-wilson.co.uk	2020-10-16 15:10:36 +01:00
Chris Wilson	7d55531476	drm/i915: Break up error capture compression loops with cond_resched() As the error capture will compress user buffers as directed to by the user, it can take an arbitrary amount of time and space. Break up the compression loops with a call to cond_resched(), that will allow other processes to schedule (avoiding the soft lockups) and also serve as a warning should we try to make this loop atomic in the future. Testcase: igt/gem_exec_capture/many-* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-2-chris@chris-wilson.co.uk (cherry picked from commit `293f43c80c`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-09-30 14:24:30 -04:00
Chris Wilson	f2acf74068	drm/i915: Reduce GPU error capture mutex hold time Shrink the hold time for the error capture mutex to just around the acquire/release of the PTE used for reading back the object via the Global GTT. For platforms that do not need the GGTT read back, we can skip the mutex entirely and allow concurrent error capture. Where we do use the GGTT, by restricting the hold time around the slow readback and compression, we are more resilient against softlockups (khungtaskd) as the heartbeat may well also trigger an error while the first is on going, and this allows the heartbeat reset to skip past the capture and not be stalled. Testcase: igt/gem_exec_capture/many-* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-3-chris@chris-wilson.co.uk	2020-09-18 11:53:04 +01:00
Chris Wilson	293f43c80c	drm/i915: Break up error capture compression loops with cond_resched() As the error capture will compress user buffers as directed to by the user, it can take an arbitrary amount of time and space. Break up the compression loops with a call to cond_resched(), that will allow other processes to schedule (avoiding the soft lockups) and also serve as a warning should we try to make this loop atomic in the future. Testcase: igt/gem_exec_capture/many-* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-2-chris@chris-wilson.co.uk	2020-09-18 11:52:44 +01:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Chris Wilson	68172f2c0b	drm/i915: Pull printing GT capabilities on error to err_print_gt We try not to assume that we captured any information, and so have to check that error->gt exists before reporting. This check was missed in err_print_capabilities, so lets break up the capability info and push it into the GT dump. We are still a long way from yamlifying this output! Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `792592e72a` ("drm/i915: Move the engine mask to intel_gt_info") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200710193239.5419-1-chris@chris-wilson.co.uk	2020-07-10 22:06:35 +01:00
Venkata Sandeep Dhanalakota	0b6613c6b9	drm/i915/sseu: Move sseu_info under gt_info SSEUs are a GT capability, so track them under gt_info. Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-8-daniele.ceraolospurio@intel.com	2020-07-08 21:13:09 +01:00

1 2 3 4 5 ...

429 Commits