Commit Graph

31 Commits

Author SHA1 Message Date
Luben Tuikov
38a268b391 drm/amd/pm: Enhanced reporting also for a stuck command
Also print the message index and parameter of the stuck command.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 17:40:46 -05:00
Evan Quan
8b514e898e drm/amd/pm: fix runpm hang when amdgpu loaded prior to sound driver
Current RUNPM mechanism relies on PMFW to master the timing for BACO
in/exit. And that needs cooperation from sound driver for dstate
change notification for function 1(audio). Otherwise(on sound driver
missing), BACO cannot be kicked in correctly and hang will be observed
on RUNPM exit.

By switching back to legacy message way on sound driver missing,
we are able to fix the runpm hang observed for the scenario below:
amdgpu driver loaded -> runpm suspend kicked -> sound driver loaded

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reported-and-tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-09-16 09:56:24 -04:00
Darren Powell
828db598bf amdgpu/pm: Replace navi10 usage of sprintf with sysfs_emit
initial modification of files
  smu_cmn.c
  navi10_ppt.c

=== Test ===
AMDGPU_PCI_ADDR=`lspci -nn | grep "VGA\|Display" | cut -d " " -f 1`
AMDGPU_HWMON=`ls -la /sys/class/hwmon | grep $AMDGPU_PCI_ADDR | awk '{print $9}'`
HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON}
LOGFILE=pp_printf.test.log

lspci -nn | grep "VGA\|Display"  > $LOGFILE
FILES="pp_dpm_sclk
pp_sclk_od
pp_mclk_od
pp_dpm_pcie
pp_od_clk_voltage
pp_power_profile_mode "

for f in $FILES
do
  echo === $f === >> $LOGFILE
  cat $HWMON_DIR/device/$f >> $LOGFILE
done
cat $LOGFILE

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-10 10:34:37 -04:00
Luben Tuikov
544dcd74b7 drm/amd/pm: Fix a bug in semaphore double-lock
Fix a bug in smu_cmn_send_msg_without_waiting() in
that this function does not need to take the
smu->message_lock mutex in order to send a message
down to the SMU. The mutex is acquired by the
caller of this function instead.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Changfeng Zhu <Changfeng.Zhu@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Fixes: 5810323ba6 ("drm/amd/pm: Fix a bug communicating with the SMU (v5)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-28 22:15:44 -04:00
Luben Tuikov
5810323ba6 drm/amd/pm: Fix a bug communicating with the SMU (v5)
This fixes a bug which if we probe a non-existing
I2C device, and the SMU returns 0xFF, from then on
we can never communicate with the SMU, because the
code before this patch reads and interprets 0xFF
as a terminal error, and thus we never write 0
into register 90 to clear the status (and
subsequently send a new command to the SMU.)

It is not an error that the SMU returns status
0xFF. This means that the SMU executed the last
command successfully (execution status), but the
command result is an error of some sort (execution
result), depending on what the command was.

When doing a status check of the SMU, before we
send a new command, the only status which
precludes us from sending a new command is 0--the
SMU hasn't finished executing a previous command,
and 0xFC--the SMU is busy.

This bug was seen as the following line in the
kernel log,

amdgpu: Msg issuing pre-check failed(0xff) and SMU may be not in the right state!

when subsequent SMU commands, not necessarily
related to I2C, were sent to the SMU.

This patch fixes this bug.

v2: Add a comment to the description of
__smu_cmn_poll_stat() to explain why we're NOT
defining the SMU FW return codes as macros, but
are instead hard-coding them. Such a change, can
be followed up by a subsequent patch.

v3: The changes are,
a) Add comments to break labels in
   __smu_cmn_reg2errno().

b) When an unknown/unspecified/undefined result is
   returned back from the SMU, map that to
   -EREMOTEIO, to distinguish failure at the SMU
   FW.

c) Add kernel-doc to
   smu_cmn_send_msg_without_waiting(),
   smu_cmn_wait_for_response(),
   smu_cmn_send_smc_msg_with_param().

d) In smu_cmn_send_smc_msg_with_param(), since we
   wait for completion of the command, if the
   result of the completion is
   undefined/unknown/unspecified, we print that to
   the kernel log.

v4: a) Add macros as requested, though redundant, to
    be removed when SMU consolidates for all
    ASICs--see comment in code.
    b) Get out if the SMU code is unknown.

v5: Rename the macro names.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Fixes: fcb1fe9c9e ("drm/amd/powerplay: pre-check the SMU state before issuing message")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23 10:09:28 -04:00
Evan Quan
1e75be2b67 drm/amd/pm: update the cached dpm feature status
For some ASICs, the real dpm feature disablement job is handled by
PMFW during baco reset and custom pptable loading. Cached dpm feature
status need to be updated to pair that.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-11 16:02:30 -04:00
Graham Sider
c23083cd37 drm/amd/pm: Add common throttler translation func
Defines smu_cmn_get_indep_throttler_status which performs ASIC
independent translation given a corresponding lookup table.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-10 11:44:25 -04:00
Graham Sider
22a7dcf580 drm/amd/pm: Add u64 throttler status field to gpu_metrics
This patch set adds support for a new ASIC independant u64 throttler
status field (indep_throttle_status). Piggybacks off the
gpu_metrics_v1_3 bump and similarly bumps gpu_metrics_v2 version (to
v2_2) to add field.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-10 11:44:25 -04:00
Dave Airlie
5745d647d5 Merge tag 'amd-drm-next-5.14-2021-06-02' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.14-2021-06-02:

amdgpu:
- GC/MM register access macro clean up for SR-IOV
- Beige Goby updates
- W=1 Fixes
- Aldebaran fixes
- Misc display fixes
- ACPI ATCS/ATIF handling rework
- SR-IOV fixes
- RAS fixes
- 16bpc fixed point format support
- Initial smartshift support
- RV/PCO power tuning fixes for suspend/resume
- More buffer object subclassing work
- Add new INFO query for additional vbios information
- Add new placement for preemptable SG buffers

amdkfd:
- Misc fixes

radeon:
- W=1 Fixes
- Misc cleanups

UAPI:
- Add new INFO query for additional vbios information
  Useful for debugging vbios related issues.  Proposed umr patch:
  https://patchwork.freedesktop.org/patch/433297/
- 16bpc fixed point format support
  IGT test:
  https://lists.freedesktop.org/archives/igt-dev/2021-May/031507.html
  Proposed Vulkan patch:
  a25d480207
- Add a new GEM flag which is only used internally in the kernel driver.  Userspace
  is not allowed to set it.

drm:
- 16bpc fixed point format fourcc

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210602214009.4553-1-alexander.deucher@amd.com
2021-06-04 06:13:57 +10:00
Andrey Grodzovsky
7afefb81b7 drm/amdgpu: Rename flag which prevents HW access
Make it's name not feature but function descriptive.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210521204122.762288-1-andrey.grodzovsky@amd.com
2021-05-25 11:53:52 -04:00
David M Nieto
0b7db431e3 drm/amdgpu/pm: Update metrics table (v2)
v2: removed static dpm and frequency ranges from table

expand metrics table with voltages and frequency ranges

Signed-off-by: David M Nieto <david.nieto@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-21 10:31:30 -04:00
Evan Quan
cfd053be1f drm/amd/pm: expose pmfw attached timestamp on Aldebaran
Available with 68.18.0 and later PMFWs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10 18:06:46 -04:00
Darren Powell
1e4a53de01 amdgpu/pm: add extra info to SMU msg pre-check failed message
Insert the value of the response to error message emitted when the
SMU msg pre-check failes

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28 23:35:49 -04:00
charles sun
055162645a drm/amd/pm: increase time out value when sending msg to SMU
when do S3 stress, low rate that PowerUpVcn message will get response
more than 1s, so here increase the timeout to 2s

Signed-off-by: charles sun <charles.sun@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:54:44 -04:00
Lijo Lazar
e42569d02a drm/amd/pm: Modify mode2 msg sequence on aldebaran
v1: During mode2 reset, PCI space is lost after message is sent.
Restore PCI space before waiting for response from firmware.

v2: Move mode2 sequence to aldebaran and update PMFW version.
Handle generic sequence in smu13 without PMFW version check.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:57 -04:00
Evan Quan
1689fca0d6 drm/amd/pm: fix Navi1x runtime resume failure V2
The RLC was put into a wrong state on runtime suspend. Thus the RLC
autoload will fail on the succeeding runtime resume. By adding an
intermediate PPSMC_MSG_PrepareMp1ForUnload(some GC hard reset involved,
designed for PnP), we can bring RLC back into the desired state.

V2: integrate INTERRUPTS_ENABLED flag clearing into current
    mp1 state set routines

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:37:53 -04:00
Horace Chen
4215a11923 drm/amdgpu: enable one vf mode on sienna cichlid vf
sienna cichlid needs one vf mode which allows vf to set and get
clock status from guest vm. So now expose the required interface
and allow some smu request on VF mode. Also since this asic blocked
direct MMIO access, use KIQ to send SMU request under sriov vf.

OD use same command as getting pp table which is not allowed for
 sienna cichlid, so remove OD feature under sriov vf.

Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Monk Liu<monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-05 15:10:35 -05:00
Evan Quan
152bb95c38 drm/amd/pm: update existing gpu_metrics interfaces V2
Update the gpu_metrics interface implementations to use the latest
upgraded data structures.

V2: fit the data type change of energy_accumulator

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-05 15:10:28 -05:00
Nathan Chancellor
8addf37c24 drm/amd/pm/swsmu: Avoid using structure_size uninitialized in smu_cmn_init_soft_gpu_metrics
Clang warns:

drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:764:2: warning:
variable 'structure_size' is used uninitialized whenever switch default
is taken [-Wsometimes-uninitialized]
        default:
        ^~~~~~~
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:770:23: note:
uninitialized use occurs here
        memset(header, 0xFF, structure_size);
                             ^~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:753:25: note:
initialize the variable 'structure_size' to silence this warning
        uint16_t structure_size;
                               ^
                                = 0
1 warning generated.

Return in the default case, as the size of the header will not be known.

Fixes: de4b7cd8cb ("drm/amd/pm/swsmu: unify the init soft gpu metrics function")
Link: https://github.com/ClangBuiltLinux/linux/issues/1304
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-02-24 09:28:54 -05:00
Kevin Wang
de4b7cd8cb drm/amd/pm/swsmu: unify the init soft gpu metrics function
the soft gpu metrics is not asic related data structure.
unify them to reduce duplicate code.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-02-09 15:28:42 -05:00
Huang Rui
e484de44b1 drm/amd/pm: print the timeout of smc message
This patch is to help firmware designer to know the smc message timeout
status.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-01-21 09:55:05 -05:00
Huang Rui
966737906c drm/amd/pm: fix the return value of pm message
0 should be right driver return value, 0x1 is the right firmware
return value. So switch to 0 at last.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Xiaojian Du <xiaojian.du@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-01-13 23:48:34 -05:00
Huang Rui
743538838c drm/amdgpu: revise the mode2 reset for vangogh
PCIE MMIO bar needs to be restored firstly after the reset event
triggers. So it's unable to access the registers to wait for response
from SMU. Becasue the value of mmMP1_SMN_C2PMSG_90 is invalid at that
moment.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-01-13 23:48:13 -05:00
Huang Rui
4738154021 drm/amd/pm: don't mark all apu as true on feature mask
VHG based APU will support feature mask checking.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-01-13 23:47:33 -05:00
Huang Rui
11db224b34 drm/amd/pm: enhance the real response for smu message (v2)
The user prefers to know the real response value from C2PMSG 90 register
which is written by firmware not -EIO.

v2: return C2PMSG 90 value

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-01-13 23:47:11 -05:00
Evan Quan
0339258bd9 drm/amd/pm: invalidate hdp before CPU access the memory written by GPU
To eliminate the possible influence by outdated HDP read cache.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-12-08 23:01:24 -05:00
Xiaojian Du
ac70c6c62a drm/amd/pm: add one new function to get 32 bit feature mask for vangogh
This patch is to add one new function to get 32 bit feature mask for
vangogh.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-30 00:58:32 -04:00
Kevin Wang
7aeef2aacc drm/amd/swsmu: correct wrong feature bit mapping
1. when smc feature bit isn't mapped,
the feature state isn't showed on sysfs node of pp_features.
2. add pp_features table title

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-21 16:13:57 -04:00
Evan Quan
10144762e2 drm/amd/pm: postpone SOCCLK/UCLK enablement after DAL initialization(V2)
This is needed for Navi1X only. And it may help for display missing
or hang issue seen on some high resolution monitors.

V2: no UCLK DPM enablement for Navi10 A0 secure SKU

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-17 17:46:13 -04:00
Andrey Grodzovsky
bf36b52e78 drm/amdgpu: Avoid accessing HW when suspending SW state
At this point the ASIC is already post reset by the HW/PSP
so the HW not in proper state to be configured for suspension,
some blocks might be even gated and so best is to avoid touching it.

v2: Rename in_dpc to more meaningful name

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-15 17:24:39 -04:00
Evan Quan
e098bc9612 drm/amd/pm: optimize the power related source code layout
The target is to provide a clear entry point(for power routines).
Also this can help to maintain a clear view about the frameworks
used on different ASICs. Hopefully all these can make power part
more friendly to play with.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:41 -04:00