On GPUs with RAS, poison can propagate between processes if VRAM is not
cleared when it is freed or allocated. The reason is, that not all write
accesses clear RAS poison. 32-byte writes by the SDMA engine do clear RAS
poison. Clearing memory in the background when it is freed should avoid
major performance impact. KFD has been doing this already for a long time.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
In rmmod procedure, kfd sends cp a dequeue request, but the
request does not get response, then an error message "cp
queue pipe 4 queue 0 preemption failed" printed.
[how]
Performing kfd suspending after disabling gfxoff can fix it.
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Extend secondary function handling for runtime pm beyond audio
to USB and UCSI.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Seems more logical to enable runtime pm at the end of
the init sequence so we don't end up entering runtime
suspend before init is finished.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to set the APU flag from IP discovery before
we evaluate this code.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit df4f0041c6.
Xgmi ras initialization had been moved from .late_init to early_init,
the defect of repeated calling amdgpu_ras_register_ras_block had been
fixed, so revert this patch.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move xgmi ras initialization from .late_init to .early_init, which let
xgmi ras can be initialized only once.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pmfw read ecc info registers in the following order,
umc0: ch_inst 0, 1, 2 ... 7
umc1: ch_inst 0, 1, 2 ... 7
The position of the register value stored in eccinfo
table is calculated according to the below formula,
channel_index = umc_inst * channel_in_umc + ch_inst
Driver directly use the index of eccinfo table array
as channel index, it's not correct, driver needs convert
eccinfo table array index to channel index according to
channel_idx_tbl.
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence obj, which is bumped
earlier by amdgpu_cs_get_fence(). This may result in reference count
leaks.
Fix it by decreasing the refcount of specific object before returning
the error code.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Older radeon boards (r2xx-r5xx) had secondary PCI functions
which we solely there for supporting multi-head on OSs with
special requirements. Add them to the unsupported list
as well so we don't attempt to bind to them. The driver
would fail to bind to them anyway, but this does so
in a cleaner way that should not confuse the user.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yellow carp has been outputting versions like `1093.24.0`, but this
is supposed to be 69.24.0. That is the MSB is being interpreted
incorrectly.
The MSB is not part of the major version, but has generally been
treated that way thus far. It's actually the program, and used to
distinguish between two programs from a similar family but different
codebase.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Older radeon boards (r2xx-r5xx) had secondary PCI functions
which we solely there for supporting multi-head on OSs with
special requirements. Add them to the unsupported list
as well so we don't attempt to bind to them. The driver
would fail to bind to them anyway, but this does so
in a cleaner way that should not confuse the user.
Cc: stable@vger.kernel.org
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The current code assumes that the RGB444 and YUV444 formats are the
same, but the HDMI 2.0 specification states that:
The three DC_XXbit bits above only indicate support for RGB 4:4:4 at
that pixel size. Support for YCBCR 4:4:4 in Deep Color modes is
indicated with the DC_Y444 bit. If DC_Y444 is set, then YCBCR 4:4:4
is supported for all modes indicated by the DC_XXbit flags.
So if we have YUV444 support and any DC_XXbit flag set but the DC_Y444
flag isn't, we'll assume that we support that deep colour mode for
YUV444 which breaks the specification.
In order to fix this, let's split the edid_hdmi_dc_modes field in struct
drm_display_info into two fields, one for RGB444 and one for YUV444.
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: d0c94692e0 ("drm/edid: Parse and handle HDMI deep color modes.")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220120151625.594595-4-maxime@cerno.tech
Pmfw read ecc info registers and store values in
eccinfo_table in the following order
umc0 ch_inst 0, 1, 2 ... 7
umc1 ch_inst 0, 1, 2 ... 7
...
umc3 ch_inst 0, 1, 2 ... 7
Driver should convert eccinfo_table_idx to channel_index according
to channel_idx_tbl.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull drm fixes from Dave Airlie:
"Thanks to Daniel for taking care of things while I was out, just a set
of merge window fixes that came in this week, two i915 display fixes
and a bunch of misc amdgpu, along with a radeon regression fix.
amdgpu:
- SR-IOV fix
- VCN harvest fix
- Suspend/resume fixes
- Tahiti fix
- Enable GPU recovery on yellow carp
radeon:
- Fix error handling regression in radeon_driver_open_kms
i915:
- Update EHL display voltage swing table
- Fix programming the ADL-P display TC voltage swing"
* tag 'drm-next-2022-01-21' of git://anongit.freedesktop.org/drm/drm:
drm/radeon: fix error handling in radeon_driver_open_kms
drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV
drm/amdgpu: apply vcn harvest quirk
drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence
drm/i915/display/ehl: Update voltage swing table
drm/amd/display: Revert W/A for hard hangs on DCN20/DCN21
drm/amdgpu: drop flags check for CHIP_IP_DISCOVERY
drm/amdgpu: Fix rejecting Tahiti GPUs
drm/amdgpu: don't do resets on APUs which don't support it
drm/amdgpu: invert the logic in amdgpu_device_should_recover_gpu()
drm/amdgpu: Enable recovery on yellow carp
Debug VRAM access through SDMA has several broken parts resulting in
silent MMIO fallback.
BO kernel creation takes the location of the cpu addr pointer, not
the pointer itself for address kmap.
drm_dev_enter return true on success so change access check.
The source BO is reserved but not pinned so find the address using the
cursor offset relative to its memory domain start.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
That's just a leftover from old radeon days and was preventing CS and GART
bindings before the hardware was initialized. But nowdays that is
perfectly valid.
The only thing we need to warn about are GART binding before the table
is even allocated.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch will modify a pair of functions for pcie port wreg/rreg.
AMD GPU have had an independent NBIO block from SOC15 arch.
If the dirver wants to read/write the address space of the pcie devices,
it has to go through the NBIO block.
This patch will move the pcie port wreg/rreg functions to
"amdgpu_device.c", so that to reuse the functions on the
future GPU ASICs.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch will add vram check function for GMC block.
It will write pattern data to the vram and then read back from the vram,
so that to verify the work status of vram.
This patch will cover gmc v6/7/8/9/10.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
This fixes 892deb4826 ("drm/amdgpu: Separate vf2pf work item init from virt data exchange").
we should read pf2vf data based at mman.fw_vram_usage_va after gmc
sw_init. commit 892deb4826 breaks this logic.
[How]
calling amdgpu_virt_exchange_data in amdgpu_virt_init_data_exchange to
set the right base in the right sequence.
v2:
call amdgpu_virt_init_data_exchange after gmc sw_init to make data
exchange workqueue run
v3:
clean up the code logic
v4:
add some comment and make the code more readable
Fixes: 892deb4826 ("drm/amdgpu: Separate vf2pf work item init from virt data exchange")
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>