In order to debug ras error, driver will print IPID/SYND/MISC0
register value if detect correctable or uncorrectable error.
Provide umc_query_error_status_helper function to reduce code
redundancy.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of the 'amdgpu_ring_priority_level' type,
the 'amdgpu_gfx_pipe_priority' type was used,
which is an error when setting ring priority.
This is a minor error, but may cause problems in the future.
Instead of AMDGPU_RING_PRIO_2 = 2, we can use AMDGPU_RING_PRIO_MAX = 3,
but AMDGPU_RING_PRIO_2 = 2 is used for compatibility with
AMDGPU_GFX_PIPE_PRIO_HIGH = 2, and not change the behavior of the
code.
Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The data revision was not changed to 5 from 4 when the CG flags
were extended to 64-bits. Since this was missed I took
the opportunity to add future upper 64-bits of PG flags
as well so we don't need to bump it again when that comes.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
driver loading failed on VEGA10 SRIOV VF with linux host due to a wide
range of stolen reserved vram.
Since VEGA10 SRIOV VF need to reserve vram for firmware with windows
Hyper_V host specifically, check hypervisor type to only reserve
memory for it, and the range of the reserved vram can be limited
to between 5M-7M area.
Fixes: faad5ccac1 ("drm/amdgpu: Add stolen reserved memory for MI25 SRIOV.")
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enabling gfxoff quirk results in perfectly usable graphical user
interface on MacBook Pro (15-inch, 2019) with Radeon Pro Vega 20 4 GB.
Without the quirk, X server is completely unusable as every few seconds
there is gpu reset due to ring gfx timeout.
Signed-off-by: Tomasz Moń <desowin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DP/HDMI audio on AMD PRO VII stops working after S3:
[ 149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
[ 149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
[ 149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
[ 149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot
[ 150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot
...
[ 155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535
The offending commit is daf8de0874 ("drm/amdgpu: always reset the asic in
suspend (v2)"). Commit 34452ac303 ("drm/amdgpu: don't use BACO for
reset in S3 ") doesn't help, so the issue is something different.
Assuming that to make HDA resume to D0 fully realized, it needs to be
successfully put to D3 first. And this guesswork proves working, by
moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
function is in D3.
Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Variable igp_lane_info always is 0. 0 & any value = 0 and false.
In this way, all сonditional statements will false.
The code was leftover from when the code was ported from radeon
where igp_lane_info was derived from the vbios on supported
platforms.
[update commit message - Alex]
Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For VG20 + XGMI bridge, all mappings PTEs cache in TC, this may have
stall invalid PTEs in TC because one cache line has 8 pages. Need always
flush_tlb after updating mapping.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Testing the valid bit is not enough to figure out if we
need to invalidate the TLB or not.
During eviction it is quite likely that we move a BO from VRAM to GTT and
update the page tables immediately to the new GTT address.
Rework the whole function to get all the necessary parameters directly as
value.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flush TLB needs wait for GPU update fence done. MMU notify callback to
unmap range from GPUs uses unlocked GPU page table update, so add tlb_cb
to unlocked update fence to increase vm->tlb_seq.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To fix two issues with unlocked update fence:
1. vm->last_unlocked store the latest fence without taking refcount.
2. amdgpu_vm_bo_update_mapping returns old fence, not the latest fence.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
RAS error query support addition for JPEG 2.6
V2: removed unused options and corrected comment format.
Moved register definition to header file.
V3: poison query status check added.
Removed the error query support
V4: Return statement refactored.
Signed-off-by: Mohammad Zafar Ziya <Mohammadzafar.ziya@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add vcn and jpeg ras support options
V2: vcn and jpeg ras flag enabled for aldebaran asic only
V3: vcn and jpeg ras flag disabled for error counter query
Generic poison query interface added
VCN and JPEG ras enabled based on IP version check
V4: vcn and jpeg ras flag moved under ecc flag for dGPU
Signed-off-by: Mohammad Zafar Ziya <Mohammadzafar.ziya@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some video card has more than one vcn instance, passing 0 to
vcn_v3_0_pause_dpg_mode is incorrect.
Error msg:
Register(1) [mmUVD_POWER_STATUS] failed to reach value
0x00000001 != 0x00000002
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ATOMIC and DRIVER log categories do not typically contain per-frame log
messages. This patch re-classifies some messages in amd to chattier
categories to keep ATOMIC/DRIVER quiet.
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For VCN FW to detect ASIC type, in order to use different mailbox registers.
V2: simplify codes and fix format issue.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of tracking the VM updates through the dependencies just use a
sequence counter for page table updates which indicates the need to
flush the TLB.
This reduces the need to flush the TLB drastically.
v2: squash in NULL check fix (Christian)
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Separate the VM page table backend operations from the state machine since
the amdgpu_vm.c file is becoming to complex.
The allocating, freeing and updating page tables and page directories can
easily be moved into a separate file.
While at it cleanup everything checkpatch.pl reported and rename the
functions a bit to make more clear that they belong together.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move the page tables to the idle list after updating the PDEs.
We have gone back and forth with that a couple of times because of problems
with the inter PD dependencies, but it should work now that we have the
state handling cleanly separated.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add help functions to query and reset RAS UTCL2 poison status.
v2: implement it on amdgpu side and kfd only calls it.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS
v2: squash in warning fix (Alex)
Signed-off-by: Tushar Patel <tushar.patel@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It is a hardware issue that VCN can't handle a GTT
backing stored TMZ buffer on CHIP_RAVEN series ASIC.
Move such a TMZ buffer to VRAM domain before command
submission as a workaround.
v2:
- Use patch_cs_in_place callback.
v3:
- Bail out early if unsecure IBs.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org