Newer platforms have DSS that aren't necessarily available for both
geometry and compute, two queries will need to exist. This introduces
the first, when passing a valid engine class and engine instance in the
flags returns a topology describing geometry.
Based on past discussion, we currently only support this new query item
on Xe_HP and beyond; earlier platforms do not need to worry about
geometry and compute pipelines having access to different topology and
should continue to use the existing topology query.
v2: fix white space errors
v3: change flags from hosting 2 8 bit numbers to holding a
i915_engine_class_instance struct
v4: add error if non rcs engine passed.
v5 (by MattR):
- Improve kerneldoc and cross references to related structs/enums.
(Daniel)
- Clarify that geometry query is only supported on render engines
(Francisco)
- Clarify that the new query is only supported on Xe_HP+.
- Fix checkpatch warnings.
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
UMD (mesa): https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14143
Testcase: igt@i915_query@test-query-geometry-subslices
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20220414192230.749771-4-matthew.d.roper@intel.com
The latest GuC firmware drops the context descriptor pool in favour of
passing all creation data in the create H2G. It also greatly simplifies
the work queue and removes the process descriptor used for multi-LRC
submission. So, remove all mention of LRC and process descriptors and
update the registration code accordingly.
Unfortunately, the new API also removes the ability to set default
values for the scheduling policies at context registration time.
Instead, a follow up H2G must be sent. The individual scheduling
policy update H2G commands are also dropped in favour of a single KLV
based H2G. So, change the update wrappers accordingly and call this
during context registration..
Of course, this second H2G per registration might fail due to being
backed up. The registration code has a complicated state machine to
cope with the actual registration call failing. However, if that works
then there is no support for unwinding if a further call should fail.
Unwinding would require sending a H2G to de-register - but that can't
be done because the CTB is already backed up.
So instead, add a new flag to say whether the context has a pending
policy update. This is set if the policy H2G fails at registration
time. The submission code checks for this flag and retries the policy
update if set. If that call fails, the submission path early exists
with a retry error. This is something that is already supported for
other reasons.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220412225955.1802543-2-John.C.Harrison@Intel.com
[why]
These static variables save the RLC Scratch registers address.
When we install multiple GPUs (for example: XGMI setting) and
multiple GPUs call the function at same time. The RLC Scratch
registers address are changed each other. Then it caused
reading/writing from/to wrong GPU.
[how]
Removed the static from the variables. The variables are
on the stack.
Fixes: 5d447e2967 ("drm/amdgpu: add helper for rlcg indirect reg access")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the waiters to the wait queue during initialization, while holding the
event spinlock. Otherwise the waiter will not get activated if the event
signals before being added to the wait queue.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 863fa85e6a.
While we were testing DCN3.1 with a hub, we noticed that only one of 2
connected displays lights up when using some specific display
resolution. In summary, this was the setup:
1. Displays:
* Sharp LQ156M1JW26 (eDP): 1080@240
* BENQ SW320 (DP): 4k@60
* BENQ EX3203R (DP): 4k@60
2. Hub: Club3D CSV-7300
3. ASIC: DCN3.1
After bisecting this issue, we figured out the commit mentioned above
introduced this issue. We are investigating why this patch introduced
this regression, but we need to revert it for now.
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Mark Broadworth <Mark.Broadworth@amd.com>
Cc: Michael Strauss <michael.strauss@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
VM might already be freed when amdgpu_vm_tlb_seq_cb() is called.
We see the calltrace below.
Fix it by keeping the last flush fence around and wait for it to signal
BUG kmalloc-4k (Not tainted): Poison overwritten
0xffff9c88630414e8-0xffff9c88630414e8 @offset=5352. First byte 0x6c
instead of 0x6b Allocated in amdgpu_driver_open_kms+0x9d/0x360 [amdgpu]
age=44 cpu=0 pid=2343
__slab_alloc.isra.0+0x4f/0x90
kmem_cache_alloc_trace+0x6b8/0x7a0
amdgpu_driver_open_kms+0x9d/0x360 [amdgpu]
drm_file_alloc+0x222/0x3e0 [drm]
drm_open+0x11d/0x410 [drm]
Freed in amdgpu_driver_postclose_kms+0x3e9/0x550 [amdgpu] age=22 cpu=1
pid=2485
kfree+0x4a2/0x580
amdgpu_driver_postclose_kms+0x3e9/0x550 [amdgpu]
drm_file_free+0x24e/0x3c0 [drm]
drm_close_helper.isra.0+0x90/0xb0 [drm]
drm_release+0x97/0x1a0 [drm]
__fput+0xb6/0x280
____fput+0xe/0x10
task_work_run+0x64/0xb0
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If lookup_event_by_id() returns a NULL "ev" pointer then the
spin_lock(&ev->lock) will crash. This was detected by Smatch:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_events.c:644 kfd_set_event()
error: we previously assumed 'ev' could be null (see line 639)
Fixes: 5273e82c5f ("drm/amdkfd: Improve concurrency of event handling")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When we are swapping out the local memory obj on flat-ccs capable platform,
we need to capture the ccs data too along with main meory and we need to
restore it when we are swapping in the content.
When lmem object is swapped into a smem obj, smem obj will
have the extra pages required to hold the ccs data corresponding to the
lmem main memory. So main memory of lmem will be copied into the initial
pages of the smem and then ccs data corresponding to the main memory
will be copied to the subsequent pages of smem. ccs data is 1/256 of
lmem size.
Swapin happens exactly in reverse order. First main memory of lmem is
restored from the smem's initial pages and the ccs data will be restored
from the subsequent pages of smem.
Extracting and restoring the CCS data is done through a special cmd called
XY_CTRL_SURF_COPY_BLT
v2: Fixing the ccs handling
v3: Handle the ccs data at same loop as main memory [Thomas]
v4: changes for emit_copy_ccs
v5: handle non-flat-ccs scenario
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-10-ramalingam.c@intel.com
On Xe-HP and later devices, dedicated compression control state (CCS)
stored in local memory is used for each surface, to support the
3D and media compression formats.
The memory required for the CCS of the entire local memory is 1/256 of
the local memory size. So before the kernel boot, the required memory
is reserved for the CCS data and a secure register will be programmed
with the CCS base address
So when an object is allocated in local memory, dont need to explicitly
allocate the space for ccs data. But when the obj is evicted into the
smem, to hold the compression related data along with the obj extra space
is needed in smem. i.e obj_size + (obj_size/256).
Hence when a smem pages are allocated for an obj with lmem placement
possibility we create with the extra pages required for the ccs data for
the obj size.
v2:
Used imperative wording [Thomas]
v3:
Inflate the pages only when obj's placement is lmem only
v4:
GEM_BUG_ON if the ttm->num_pages > obj page size [Thomas]
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Christian Koenig <christian.koenig@amd.com>
cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-9-ramalingam.c@intel.com
Xe-HP and latest devices support Flat CCS which reserved a portion of
the device memory to store compression metadata, during the clearing of
device memory buffer object we also need to clear the associated
CCS buffer.
XY_CTRL_SURF_COPY_BLT is a BLT cmd used for reading and writing the
ccs surface of a lmem memory. So on Flat-CCS capable platform we use
XY_CTRL_SURF_COPY_BLT to clear the CCS meta data.
v2: Fixed issues with platform naming [Lucas]
v3: Rebased [Ram]
Used the round_up funcs [Bob]
v4: Fixed ccs blk calculation [Ram]
Added Kdoc on flat-ccs.
v5: GENMASK is used [Matt]
mocs fix [Matt]
Comments Fix [Matt]
Flush address programming [Ram]
v6: FLUSH_DW is fixed
Few coding style fix
v7: Adopting the XY_FAST_COLOR_BLT (Thomas]
v8: XY_CTRL_SURF_COPY_BLT for ccs clearing.
v9: emit_copy_ccs is used.
v10: ctrl_surf cmds are filled in caller itself. [Thomas]
only one ctrl surf cmd is used as size of lmem is <=8M [Thomas]
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-6-ramalingam.c@intel.com
Enabling gfxoff quirk results in perfectly usable graphical user
interface on MacBook Pro (15-inch, 2019) with Radeon Pro Vega 20 4 GB.
Without the quirk, X server is completely unusable as every few seconds
there is gpu reset due to ring gfx timeout.
Signed-off-by: Tomasz Moń <desowin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
DP/HDMI audio on AMD PRO VII stops working after S3:
[ 149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
[ 149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
[ 149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
[ 149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot
[ 150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot
...
[ 155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535
The offending commit is daf8de0874 ("drm/amdgpu: always reset the asic in
suspend (v2)"). Commit 34452ac303 ("drm/amdgpu: don't use BACO for
reset in S3 ") doesn't help, so the issue is something different.
Assuming that to make HDA resume to D0 fully realized, it needs to be
successfully put to D3 first. And this guesswork proves working, by
moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
function is in D3.
Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
"Pre-multiplied" is the default pixel blend mode for KMS/DRM, as
documented in supported_modes of drm_plane_create_blend_mode_property():
https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/drm_blend.c
In this mode, both 'pixel alpha' and 'plane alpha' participate in the
calculation, as described by the pixel blend mode formula in KMS/DRM
documentation:
out.rgb = plane_alpha * fg.rgb +
(1 - (plane_alpha * fg.alpha)) * bg.rgb
Considering the blend config mechanisms we have in the driver so far,
the alpha mode that better fits this blend mode is the
_PER_PIXEL_ALPHA_COMBINED_GLOBAL_GAIN, where the value for global_gain
is the plane alpha (global_alpha).
With this change, alpha property stops to be ignored. It also addresses
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1734
v2:
* keep the 8-bit value for global_alpha_value (Nicholas)
* correct the logical ordering for combined global gain (Nicholas)
* apply to dcn10 too (Nicholas)
Signed-off-by: Melissa Wen <mwen@igalia.com>
Tested-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Simon Ser <contact@emersion.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Let's make sure FBC is always disabled when we start to take
over the hardware state.
I suspect this should never really happen, since the only time
when we really should be taking over with the display already
active is when the previous state was progammed by the BIOS,
which likely shouldn't use FBC. This could be driver init,
or S4 resume when the boot kernel doesn't load i915. But I
suppose no harm in keeping this code around for exra safety
since it's quite trivial.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220315140001.1172-7-ville.syrjala@linux.intel.com
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Currently, the IO-links to the device being removed from topology,
are not cleared. As a result, there would be dangling links left in
the KFD topology. This patch aims to fix the following:
1. Cleanup all IO links to the device being removed.
2. Ensure that node numbering in sysfs and nodes proximity domain
values are consistent after the device is removed:
a. Adding a device and removing a GPU device are made mutually
exclusive.
b. The global proximity domain counter is no longer required to be
an atomic counter. A normal 32-bit counter can be used instead.
3. Update generation_count to let user-mode know that topology has
changed due to device removal.
CC: Shuotao Xu <shuotaoxu@microsoft.com>
Reviewed-by: Shuotao Xu <shuotaoxu@microsoft.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>