Implement functions to get and set GFXOFF's entry count and residency
for vangogh.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To meet the requirement for multi container usecase which needs
a quicker reset and not causing VRAM lost, adding the Mode2
reset handler for sienna_cichlid.
v2: move skip mode2 flag part separately
v3: remove the use of asic_reset_res
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the case of SRIOV, the register smnMp1_PMI_3_FIFO will get an invalid
value which will cause the "shift out of bound". In Ubuntu22.04, this
issue will be checked an related call trace will be reported in dmesg.
Signed-off-by: lin cao <lin.cao@amd.com>
Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd-drm-next-5.20-2022-07-14:
amdgpu:
- DCN3.2 updates
- DC SubVP support
- DP MST fixes
- Audio fixes
- DC code cleanup
- SMU13 updates
- Adjust GART size on newer APUs for S/G display
- Soft reset for GFX 11
- Soft reset for SDMA 6
- Add gfxoff status query for vangogh
- Improve BO domain pinning
- Fix timestamps for cursor only commits
- MES fixes
- DCN 3.1.4 support
- Misc fixes
- Misc code cleanup
amdkfd:
- Simplify GPUVM validation
- Unified memory for CWSR save/restore area
- fix possible list corruption on queue failure
radeon:
- Fix bogus power of two warning
UAPI:
- Unified memory for CWSR save/restore area for KFD
Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080952.html
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220714214716.8203-1-alexander.deucher@amd.com
divide error: 0000 [#1] SMP PTI
CPU: 3 PID: 78925 Comm: tee Not tainted 5.15.50-1-lts #1
Hardware name: MSI MS-7A59/Z270 SLI PLUS (MS-7A59), BIOS 1.90 01/30/2018
RIP: 0010:smu_v11_0_set_fan_speed_rpm+0x11/0x110 [amdgpu]
Speed is user-configurable through a file.
I accidentally set it to zero, and the driver crashed.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Yefim Barashkin <mr.b34r@kolabnow.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[v2]
No Changes, added RB
[v1]
Size of pp_clock_levels_with_latency is PP_MAX_CLOCK_LEVELS, not MAX_NUM_CLOCKS.
Both are currently defined as 16, modifying in case one value is modified in future
Changed code in both arcturus and aldabaran.
Also removed unneeded var count, and used min_t function
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull bitmap updates from Yury Norov:
- bitmap: optimize bitmap_weight() usage, from me
- lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
Carvalho Chehab
- include/linux/find: Fix documentation, from Anna-Maria Behnsen
- bitmap: fix conversion from/to fix-sized arrays, from me
- bitmap: Fix return values to be unsigned, from Kees Cook
It has been in linux-next for at least a week with no problems.
* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
nodemask: Fix return values to be unsigned
bitmap: Fix return values to be unsigned
KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
KVM: x86: hyper-v: fix type of valid_bank_mask
ia64: cleanup remove_siblinginfo()
drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
lib/bitmap: add test for bitmap_{from,to}_arr64
lib: add bitmap_{from,to}_arr64
lib/bitmap: extend comment for bitmap_(from,to)_arr32()
include/linux/find: Fix documentation
lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
MAINTAINERS: add cpumask and nodemask files to BITMAP_API
arch/x86: replace nodes_weight with nodes_empty where appropriate
mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
irq: mips: replace cpumask_weight with cpumask_empty where appropriate
drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
arch/x86: replace cpumask_weight with cpumask_empty where appropriate
...
So we can eventaully use them in the common smu code for
accessing the SMU mailboxes without needing a lot of
per asic logic in the common code.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
support reading smartshift apu and dgpu power for smu11 based asic
v2: add new version of SmuMetrics and make calculation more readable (Lijo)
v3: avoid calculations that result in -ve values and skip related checks
v4: use the current power limit on dGPU and exclude smu 11_0_7 (Lijo)
v5: remove redundant code (Lijo)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On Sienna Cichild, not all platforms use PMFW based fan control
(ex: fanless systems). On such ASICs fan control by PMFW will be
disabled in PPTable. Disable hwmon knobs for fan control also as
it is not possible to report or control fan speed on such platforms
through driver.
v3: FeaturesToRun casted as uint64_t
Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since unique_id is only supported in PMFW 0x3A5300 and higher, we will
only be able to use it inside Smu_Metrics_V3_t, which requires PMFW
0x3A4900 and higher. Remove the unique_id/serial_number references from
the v1 and v2 tables to avoid any confusion, and return 0 if metrics_v1
or metrics_v2 are used to try to get the unique_id/serial_number.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is being added to SMU Metrics, so add the required tie-ins in the
kernel. Also create the corresponding unique_id sysfs file.
v2: Add FW version check, remove SMU mutex
v3: Fix style warning
v4: Add MP1 IP_VERSION check to FW version check
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There's always miss the SMU feature enabled checked in the NPI phase,
so let validate the SMU feature enable message directly rather than
add more and more MP1 version check.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evaluating `pcie_aspm_enabled` as part of driver probe has the implication
that if one PCIe bridge with an AMD GPU connected doesn't support ASPM
then none of them do. This is an invalid assumption as the PCIe core will
configure ASPM for individual PCIe bridges.
Create a new helper function that can be called by individual dGPUs to
react to the `amdgpu_aspm` module parameter without having negative results
for other dGPUs on the PCIe bus.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(v4)
Modifications to satisfy checkpatch --strict
(v3)
Rewrote patchset to order patches as (API, hw impl, usecase)
- implement emit_clk_levels for navi10, based on print_clk_levels,
but using sysfs_emit without smu_cmn_get_sysfs() workaround
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For Cyan Skillfish and Renoir, there is no interface provided by PMFW
to retrieve the enabled features. So, we assume all features are enabled.
Fixes: 7ade3ca9cd ("drm/amd/pm: correct the usage for 'supported' member of smu_feature structure")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The following scenarios make the driver cache for enabled ppfeatures
outdated and invalid:
- Other tools interact with PMFW to change the enabled ppfeatures.
- PMFW may enable/disable some features behind driver's back. E.g.
for sienna_cichild, on gfxoff entering, PMFW will disable gfx
related DPM features. All those are performed without driver's
notice.
Also considering driver does not actually interact with PMFW such
frequently, the benefit brought by such cache is very limited.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The supported features should be retrieved just after EnableAllDpmFeatures message
complete. And the check(whether some dpm feature is supported) is only needed when we
decide to enable or disable it.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use uint64_t instead of an array of uint32_t. This can avoid
some non-necessary intermediate uint32_t -> uint64_t conversions.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Return an error if someone tries to use the i2c bus when the
SMU is not running. Otherwise we can end up sending commands
to the SMU which will either get ignored or could cause other
issues depending on what state the GPU and SMU are in.
Cc: Luben.Tuikov@amd.com
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Expose both SMU I2C buses. Some boards use the same bus for both the RAS
and FRU EEPROMs and others use different buses. This enables the
additional I2C bus and sets the right buses to use for RAS and FRU EEPROM
access.
Cc: Roy Sun <Roy.Sun@amd.com>
Co-developed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The existing way cannot handle Beige Goby well as a different
PPTable data structure(PPTable_beige_goby_t instead of PPTable_t)
is used there.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some clients(e.g., kfd) query sclk/mclk through this function.
As cyan skillfish doesn't support dpm, for sclk, set min/max
to CYAN_SKILLFISH_SCLK_MIN/CYAN_SKILLFISH_SCLK_MAX(to maintain the
existing logic).For others, set both min and max to current value.
Before this patch:
# /opt/rocm/opencl/bin/clinfo
Max clock frequency: 0Mhz
After this patch:
# /opt/rocm/opencl/bin/clinfo
Max clock frequency: 2000Mhz
v2:
- Maintain the existing min/max sclk logic.(Lijo)
v3:
- Avoid fetching metrics table twice.(Lijo)
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yellow carp has been outputting versions like `1093.24.0`, but this
is supposed to be 69.24.0. That is the MSB is being interpreted
incorrectly.
The MSB is not part of the major version, but has generally been
treated that way thus far. It's actually the program, and used to
distinguish between two programs from a similar family but different
codebase.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable power level, power limit and fan speed
information retrieval in one VF mode.
This is required so that tool ROCM-SMI
can provide this information to users.
Signed-off-by: Marina Nikolic <Marina.Nikolic@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
support ECC TABLE message, this table include umc ras error count
and error address
V2:
Return after smu version check fail
V3:
Return -EOPNOTSUPP, if fail to get smc ver.
V4:
ECCTABLE typo corrected and sentence rephrased.
Signed-off-by: mziya <Mohammadzafar.ziya@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Those gfxoff controls added for some specific ASICs are unnecessary.
The functionalities are not affected without them. Also to align with
other ASICs, they should also be dropped.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>