The current approach breaks S3/S4, because an ASIC reset is needed for
them, and putting the SMU out of service (via SMU_MSG_PrepareMp1ForUnload)
makes that ASIC reset fail. With the current design, an ASIC reset is
also performed on driver reload, which brings the ASIC back to a clean
state, so the SMU_MSG_PrepareMp1ForUnload operation is not really
necessary. Thus we simply drop the SMU_MSG_PrepareMp1ForUnload
operation. We may revise the whole driver reloading sequence when there
is a better design.
Fixes: 72aeb6ee0c ("drm/amd/pm: fix driver reload SMC firmware fail issue for smu13")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the SRIOV case, the register smnMp1_PMI_3_FIFO returns an invalid
value, which causes a "shift out of bounds". On Ubuntu 22.04 this issue
is detected and the related call trace is reported in dmesg.
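A shift by 32 or more on a 32-bit value is undefined behaviour in C, which is what the sanitizer reports as "shift out of bounds". The user-space sketch below shows one way to validate a register value before using it as a shift count. The helper name `pmi_fifo_to_mask()` and the choice to treat out-of-range values as "no data" are assumptions for illustration, not the actual driver change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: turn a value read from a register such as
 * smnMp1_PMI_3_FIFO into a single-bit mask. */
static bool pmi_fifo_to_mask(uint32_t reg_val, uint32_t *mask)
{
	/* Under SRIOV the register may hold an invalid value; any shift
	 * count >= 32 would be undefined behaviour, so reject it. */
	if (reg_val >= 32)
		return false;

	*mask = 1U << reg_val;
	return true;
}
```

The caller then decides what an invalid reading means (skip, retry, or report), instead of silently computing a garbage mask.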
Signed-off-by: lin cao <lin.cao@amd.com>
Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd-drm-next-5.20-2022-07-14:
amdgpu:
- DCN3.2 updates
- DC SubVP support
- DP MST fixes
- Audio fixes
- DC code cleanup
- SMU13 updates
- Adjust GART size on newer APUs for S/G display
- Soft reset for GFX 11
- Soft reset for SDMA 6
- Add gfxoff status query for vangogh
- Improve BO domain pinning
- Fix timestamps for cursor only commits
- MES fixes
- DCN 3.1.4 support
- Misc fixes
- Misc code cleanup
amdkfd:
- Simplify GPUVM validation
- Unified memory for CWSR save/restore area
- fix possible list corruption on queue failure
radeon:
- Fix bogus power of two warning
UAPI:
- Unified memory for CWSR save/restore area for KFD
Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080952.html
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220714214716.8203-1-alexander.deucher@amd.com
divide error: 0000 [#1] SMP PTI
CPU: 3 PID: 78925 Comm: tee Not tainted 5.15.50-1-lts #1
Hardware name: MSI MS-7A59/Z270 SLI PLUS (MS-7A59), BIOS 1.90 01/30/2018
RIP: 0010:smu_v11_0_set_fan_speed_rpm+0x11/0x110 [amdgpu]
The fan speed is user-configurable through a sysfs file.
I accidentally set it to zero, and the driver crashed.
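The crash comes from dividing by the user-supplied speed. A minimal sketch of the guard, assuming an RPM-to-tachometer-period conversion of roughly this shape; the function name, the formula and the constants are illustrative assumptions, not copied from smu_v11_0_set_fan_speed_rpm():

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical conversion of a requested fan speed (RPM) into a
 * tachometer period. The formula is a placeholder; the point is the
 * zero check before the division. */
static int fan_rpm_to_tach_period(uint32_t crystal_clock_freq,
				  uint32_t speed_rpm, uint32_t *tach_period)
{
	/* A user can write 0 to the sysfs file; dividing by it would
	 * raise the "divide error" shown in the trace above. */
	if (speed_rpm == 0)
		return -EINVAL;

	*tach_period = (uint32_t)(60ULL * crystal_clock_freq * 10000ULL /
				  (8ULL * speed_rpm));
	return 0;
}
```

Rejecting the value with -EINVAL lets the sysfs write fail cleanly instead of taking the kernel down.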
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Yefim Barashkin <mr.b34r@kolabnow.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[v2]
simplified fix after Lijo's feedback
removed clocks.num_levels from calculation of loop count
removed unsafe accesses to shim table freq_values
retained corner case output only min,now if
clocks.num_levels == 1 && now > min
[v1]
added a check to populate and use SCLK shim table freq_values only
if using dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL or
AMD_DPM_FORCED_LEVEL_PERF_DETERMINISM
removed clocks.num_levels from calculation of shim table size
removed unsafe accesses to shim table freq_values
output gfx_table values if using other dpm levels
added a check for freq_match when using freq_values, for the case now == min_clk
== Test ==
LOGFILE=aldebaran-sclk.test.log
# PCI address of the AMD GPU, e.g. "03:00.0"
AMDGPU_PCI_ADDR=$(lspci -nn | grep "VGA\|Display" | cut -d " " -f 1)
# hwmon entry whose sysfs link points at that PCI device
AMDGPU_HWMON=$(ls -la /sys/class/hwmon | grep "$AMDGPU_PCI_ADDR" | awk '{print $9}')
HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON}
lspci -nn | grep "VGA\|Display" > "$LOGFILE"
FILES="pp_od_clk_voltage
pp_dpm_sclk"
for f in $FILES
do
	echo "=== $f ===" >> "$LOGFILE"
	cat "$HWMON_DIR/device/$f" >> "$LOGFILE"
done
cat "$LOGFILE"
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[v2]
No Changes, added RB
[v1]
The size of pp_clock_levels_with_latency is PP_MAX_CLOCK_LEVELS, not MAX_NUM_CLOCKS.
Both are currently defined as 16; changed in case one of the values is modified in the future.
Changed the code in both arcturus and aldebaran.
Also removed the unneeded variable count and used the min_t() macro.
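The idea is to bound the copy loop by the capacity of the destination array. The kernel's min_t(type, x, y) casts both operands to type before comparing; the user-space sketch below mimics it with a local MIN_T macro. The simplified struct and the fill_levels() helper are hypothetical stand-ins for the real pp_clock_levels_with_latency handling.

```c
#include <assert.h>
#include <stdint.h>

#define PP_MAX_CLOCK_LEVELS 16

/* Local stand-in for the kernel's min_t(): cast both operands to the
 * given type, then pick the smaller. Unlike the kernel macro it
 * evaluates its arguments twice, so pass side-effect-free expressions. */
#define MIN_T(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/* Hypothetical, simplified version of pp_clock_levels_with_latency. */
struct clock_levels {
	uint32_t num_levels;
	uint32_t clocks_in_khz[PP_MAX_CLOCK_LEVELS];
};

/* Copy at most PP_MAX_CLOCK_LEVELS entries: the bound comes from the
 * destination array, not from an unrelated constant. */
static void fill_levels(struct clock_levels *out, const uint32_t *freqs_mhz,
			uint32_t count)
{
	uint32_t i;

	out->num_levels = MIN_T(uint32_t, count, PP_MAX_CLOCK_LEVELS);
	for (i = 0; i < out->num_levels; i++)
		out->clocks_in_khz[i] = freqs_mhz[i] * 1000;
}
```

Because num_levels is computed once, no separate counter variable is needed inside the loop.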
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some APUs with SMU13 are showing the following message:
`amdgpu 0000:63:00.0: amdgpu: Unexpected and unhandled version: 3.1`
This warning isn't relevant for smu info 3.1, as no bootup information
is present in the table.
Fixes: 593a54f180 ("drm/amd/pm: correct the way for retrieving bootup clocks")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Setting the mp1 unload state makes the SMC FW stop accepting any SMU
message, so skip setting the mp1 unload state to avoid failures in the
following cases:
- runtime pm case.
- gpu reset case.
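The decision above can be sketched as a single predicate. The struct and its field names are hypothetical simplifications of the driver's device state; the only point is that the unload state is requested when the driver is really going away, not when it will talk to the SMU again shortly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified device state. */
struct dev_state {
	bool in_runtime_pm;	/* suspending for runtime power management */
	bool in_gpu_reset;	/* suspending as part of a GPU reset */
};

/* Once the unload state is set, the SMC FW no longer accepts SMU
 * messages, so skip it in flows that resume talking to the SMU. */
static bool should_set_mp1_unload_state(const struct dev_state *s)
{
	return !s->in_runtime_pm && !s->in_gpu_reset;
}
```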
Fixes: 72aeb6ee0c ("drm/amd/pm: fix driver reload SMC firmware fail issue for smu13")
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The EccInfo_t struct in driver_if.h is defined as below in the official
release, version 68.55.0:
typedef struct {
	uint64_t mca_umc_status;
	uint64_t mca_umc_addr;
	uint16_t ce_count_lo_chip;
	uint16_t ce_count_hi_chip;
	uint32_t eccPadding;
	uint64_t mca_ceumc_addr;
} EccInfo_t;
It differs from the debug version used during development to print the
correctable error address, so adjust the EccInfo_t struct accordingly.
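Because the driver and the PMFW must agree on this layout byte for byte, compile-time checks catch drift early. The sketch below reuses the struct as listed above and asserts the offsets implied by natural alignment (8 + 8 + 2 + 2 + 4 + 8 = 32 bytes, no hidden padding). Using C11 static_assert here is an illustrative choice; kernel code would typically use BUILD_BUG_ON() or static_assert from its own headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EccInfo_t as listed for PMFW release 68.55.0. */
typedef struct {
	uint64_t mca_umc_status;
	uint64_t mca_umc_addr;
	uint16_t ce_count_lo_chip;
	uint16_t ce_count_hi_chip;
	uint32_t eccPadding;
	uint64_t mca_ceumc_addr;
} EccInfo_t;

/* Any change in field order or width breaks the build here instead of
 * silently misreading firmware data at runtime. */
static_assert(offsetof(EccInfo_t, ce_count_lo_chip) == 16, "ce_count_lo_chip offset");
static_assert(offsetof(EccInfo_t, eccPadding) == 20, "eccPadding offset");
static_assert(offsetof(EccInfo_t, mca_ceumc_addr) == 24, "mca_ceumc_addr offset");
static_assert(sizeof(EccInfo_t) == 32, "EccInfo_t size");
```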
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For SMU IP v13.0.4, the smnMP1_FIRMWARE_FLAGS address is different,
so correct the address used to read it.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Issue call trace:
[ 402.773695] [drm] failed to load ucode SMC(0x2C)
[ 402.773754] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
[ 402.773762] [drm:psp_load_smu_fw [amdgpu]] *ERROR* PSP load smu failed!
[ 402.966758] [drm:psp_v13_0_ring_destroy [amdgpu]] *ERROR* Fail to stop psp ring
[ 402.966949] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[ 402.967116] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[ 402.967252] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
[ 402.967255] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
If the mp1 state is not reset during kernel driver unload, the PSP fails
to load the PMFW the second time the driver is loaded.
Add PPSMC_MSG_PrepareMp1ForUnload support for smu_v13_0_0/smu_v13_0_7.
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support BAMACO reset on smu_v13_0_7. BAMACO is treated as a subset of
BACO for low latency, and it is only used on specific platforms.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull bitmap updates from Yury Norov:
- bitmap: optimize bitmap_weight() usage, from me
- lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
Carvalho Chehab
- include/linux/find: Fix documentation, from Anna-Maria Behnsen
- bitmap: fix conversion from/to fix-sized arrays, from me
- bitmap: Fix return values to be unsigned, from Kees Cook
It has been in linux-next for at least a week with no problems.
* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
nodemask: Fix return values to be unsigned
bitmap: Fix return values to be unsigned
KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
KVM: x86: hyper-v: fix type of valid_bank_mask
ia64: cleanup remove_siblinginfo()
drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
lib/bitmap: add test for bitmap_{from,to}_arr64
lib: add bitmap_{from,to}_arr64
lib/bitmap: extend comment for bitmap_(from,to)_arr32()
include/linux/find: Fix documentation
lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
MAINTAINERS: add cpumask and nodemask files to BITMAP_API
arch/x86: replace nodes_weight with nodes_empty where appropriate
mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
irq: mips: replace cpumask_weight with cpumask_empty where appropriate
drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
arch/x86: replace cpumask_weight with cpumask_empty where appropriate
...
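Several of the commits above replace cpumask_weight() with cpumask_empty() where the caller only needs to know whether any bit is set. The user-space sketch below illustrates why: counting walks every word, while an emptiness test can stop at the first non-zero word. The *_sketch helpers are illustrative re-implementations, not the kernel's lib/bitmap.c code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Count set bits in one word (clears the lowest set bit each pass). */
static unsigned int popcount_ul(unsigned long x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1;
		n++;
	}
	return n;
}

/* Like bitmap_weight(): always visits every word of the bitmap. */
static unsigned int bitmap_weight_sketch(const unsigned long *map, size_t nbits)
{
	size_t full = nbits / BITS_PER_LONG, rem = nbits % BITS_PER_LONG;
	unsigned int w = 0;

	for (size_t i = 0; i < full; i++)
		w += popcount_ul(map[i]);
	if (rem)	/* ignore bits beyond nbits in the last word */
		w += popcount_ul(map[full] & ((1UL << rem) - 1));
	return w;
}

/* Like bitmap_empty(): returns as soon as any valid bit is found. */
static bool bitmap_empty_sketch(const unsigned long *map, size_t nbits)
{
	size_t full = nbits / BITS_PER_LONG, rem = nbits % BITS_PER_LONG;

	for (size_t i = 0; i < full; i++)
		if (map[i])
			return false;
	if (rem && (map[full] & ((1UL << rem) - 1)))
		return false;
	return true;
}
```

So a test written as `weight(mask) == 0` gives the same answer as `empty(mask)`, but the latter can finish early on large, non-empty masks.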
So we can eventually use them in the common SMU code for
accessing the SMU mailboxes without needing a lot of
per-ASIC logic in the common code.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>