Commit Graph

88297 Commits

Author SHA1 Message Date
Linus Torvalds
440b7895c9 Merge tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morron:
 "Seventeen hotfixes, mainly for MM.

  Five are cc:stable and the remainder address post-6.0 issues"

* tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nouveau: fix migrate_to_ram() for faulting page
  mm/huge_memory: do not clobber swp_entry_t during THP split
  hugetlb: fix memory leak associated with vma_lock structure
  mm/page_alloc: reduce potential fragmentation in make_alloc_exact()
  mm: /proc/pid/smaps_rollup: fix maple tree search
  mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
  mm/mmap: fix MAP_FIXED address return on VMA merge
  mm/mmap.c: __vma_adjust(): suppress uninitialized var warning
  mm/mmap: undo ->mmap() when mas_preallocate() fails
  init: Kconfig: fix spelling mistake "satify" -> "satisfy"
  ocfs2: clear dinode links count in case of error
  ocfs2: fix BUG when iput after ocfs2_mknod fails
  gcov: support GCC 12.1 and newer compilers
  zsmalloc: zs_destroy_pool: add size_class NULL check
  mm/mempolicy: fix mbind_range() arguments to vma_merge()
  mailmap: update email for Qais Yousef
  mailmap: update Dan Carpenter's email address
2022-10-21 12:33:03 -07:00
Alistair Popple
97061d4411 nouveau: fix migrate_to_ram() for faulting page
Commit 16ce101db8 ("mm/memory.c: fix race when faulting a device private
page") changed the migrate_to_ram() callback to take a reference on the
device page to ensure it can't be freed while handling the fault. 
Unfortunately the corresponding update to Nouveau to accommodate this
change was inadvertently dropped from that patch causing GPU to CPU
migration to fail so add it here.

Link: https://lkml.kernel.org/r/20221019122934.866205-1-apopple@nvidia.com
Fixes: 16ce101db8 ("mm/memory.c: fix race when faulting a device private page")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:24 -07:00
Dave Airlie
cbc543c59e Merge tag 'drm-misc-fixes-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc2:
- Fix a buffer overflow in format_helper_test.
- Set DDC pointer in drmm_connector_init.
- Compiler fixes for panfrost.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c4d05683-8ebe-93b8-d24c-d1d2c68f12c4@linux.intel.com
2022-10-21 09:56:14 +10:00
Dave Airlie
a4294d5c77 Merge tag 'amd-drm-fixes-6.1-2022-10-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.1-2022-10-20:

amdgpu:
- Fix gfx9 APU regression caused by PCI AER fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221020135225.562807-1-alexander.deucher@amd.com
2022-10-21 08:10:32 +10:00
Dave Airlie
f046ca4a18 Merge tag 'amd-drm-fixes-6.1-2022-10-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.1-2022-10-19:

amdgpu:
- Mode2 reset fixes for Sienna Cichlid
- Revert broken fan speed sensor fix
- SMU 13.x fixes
- GC 11.x fixes
- RAS fixes
- SR-IOV fixes
- Fix BO move breakage on SI
- Misc compiler fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019191357.6208-1-alexander.deucher@amd.com
2022-10-21 08:10:16 +10:00
Dave Airlie
8865dd756f Merge tag 'drm-misc-fixes-2022-10-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Short summary of fixes pull:

 * vc4: HDMI fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Y0gGdlujszCstDeP@linux-uq9g
2022-10-21 08:08:30 +10:00
Alex Deucher
50b0e4d4da drm/amdgpu: fix sdma doorbell init ordering on APUs
Commit 8795e182b0 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
uncovered a bug in amdgpu that required a reordering of the driver
init sequence to avoid accessing a special register on the GPU
before it was properly set up leading to an PCI AER error.  This
reordering uncovered a different hw programming ordering dependency
in some APUs where the SDMA doorbells need to be programmed before
the GFX doorbells. To fix this, move the SDMA doorbell programming
back into the soc15 common code, but use the actual doorbell range
values directly rather than the values stored in the ring structure
since those will not be initialized at this point.

This is a partial revert, but with the doorbell assignment
fixed so the proper doorbell index is set before it's used.

Fixes: e3163bc8ff ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: skhan@linuxfoundation.org
Cc: stable@vger.kernel.org
2022-10-20 09:35:51 -04:00
Steven Price
72655fb942 drm/panfrost: replace endian-specific types with native ones
__le32 and __le64 types aren't portable and are not available on
FreeBSD (which uses the same uAPI).

Instead of attempting to always output little endian, just use native
endianness in the dumps. Tools can detect the endianness in use by
looking at the 'magic' field, but equally we don't expect big-endian to
be used with Mali (there are no known implementations out there).

Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7252
Fixes: 730c2bf4ad ("drm/panfrost: Add support for devcoredump")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017104602.142992-3-steven.price@arm.com
2022-10-20 11:02:11 +01:00
Maxime Ripard
a91e5e3e22 drm/connector: Set DDC pointer in drmm_connector_init
Commit 35a3b82f1b ("drm/connector: Introduce drmm_connector_init")
introduced the function drmm_connector_init() with a parameter for an
optional ddc pointer to the i2c controller used to access the DDC bus.

However, the underlying call to __drm_connector_init() was always
setting it to NULL instead of passing the ddc argument around.

This resulted in unexpected null pointer dereference on platforms
expecting to get a DDC controller.

Fixes: 35a3b82f1b ("drm/connector: Introduce drmm_connector_init")
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20221019143442.1798964-1-maxime@cerno.tech
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2022-10-20 10:53:13 +02:00
David Gow
7089003304 drm: tests: Fix a buffer overflow in format_helper_test
The xrgb2101010 format conversion test (unlike for other formats) does
an endianness conversion on the results. However, it always converts
TEST_BUF_SIZE 32-bit integers, which results in reading from (and
writing to) more memory than in present in the result buffer. Instead,
use the buffer size, divided by sizeof(u32).

The issue could be reproduced with KASAN:
./tools/testing/kunit/kunit.py run --kunitconfig drivers/gpu/drm/tests \
	--kconfig_add CONFIG_KASAN=y --kconfig_add CONFIG_KASAN_VMALLOC=y \
	--kconfig_add CONFIG_KASAN_KUNIT_TEST=y \
	drm_format_helper_test.*xrgb2101010

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Fixes: 4531143196 ("drm/format-helper: Add KUnit tests for drm_fb_xrgb8888_to_xrgb2101010()")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Maíra Canal <mairacanal@riseup.net>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019073239.3779180-1-davidgow@google.com
2022-10-20 09:56:05 +02:00
Thomas Zimmermann
1aca5ce036 Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get v6.1-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2022-10-20 09:09:00 +02:00
Christian König
01f2cf5384 drm/amdgpu: use DRM_SCHED_FENCE_DONT_PIPELINE for VM updates
Make sure that we always have a CPU round trip to let the submission
code correctly decide if a TLB flush is necessary or not.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-2-christian.koenig@amd.com
2022-10-19 12:45:00 +02:00
Christian König
7b476affcc drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag
Setting this flag on a scheduler fence prevents pipelining of jobs
depending on this fence. In other words we always insert a full CPU
round trip before dependent jobs are pushed to the pipeline.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-1-christian.koenig@amd.com
2022-10-19 12:42:51 +02:00
Arunpravin Paneer Selvam
8273b40486 drm/amdgpu: Fix for BO move issue
A user reported a bug on CAPE VERDE system where uvd_v3_1
IP component failed to initialize as there is an issue with
BO move code from one memory to other.

In function amdgpu_mem_visible() called by amdgpu_bo_move(),
when there are no blocks to compare or if we have a single
block then break the loop.

Fixes: 312b4dc11d ("drm/amdgpu: Fix VRAM BO swap issue")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:14:07 -04:00
YuBiao Wang
2abe92c7ad drm/amdgpu: dequeue mes scheduler during fini
[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.

[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.

v2: Move the dequeue operation inside kiq_hw_fini.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:59 -04:00
Kenneth Feng
5ce4726a13 drm/amd/pm: enable thermal alert on smu_v13_0_10
enable thermal alert on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:34 -04:00
Yifan Zha
97a3d6090f drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
[Why]
L1 blocks most of GC registers accessing by MMIO.

[How]
Use RLCG interface to program GC registers under SRIOV VF in full access time.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:24 -04:00
Nathan Chancellor
e688ba3e27 drm/amdkfd: Fix type of reset_type parameter in hqd_destroy() callback
When booting a kernel compiled with CONFIG_CFI_CLANG on a machine with
an RX 6700 XT, there is a CFI failure in kfd_destroy_mqd_cp():

  [   12.894543] CFI failure at kfd_destroy_mqd_cp+0x2a/0x40 [amdgpu] (target: hqd_destroy_v10_3+0x0/0x260 [amdgpu]; expected type: 0x8594d794)

Clang's kernel Control Flow Integrity (kCFI) makes sure that all
indirect call targets have a type that exactly matches the function
pointer prototype. In this case, hqd_destroy()'s third parameter,
reset_type, should have a type of 'uint32_t' but every implementation of
this callback has a third parameter type of 'enum kfd_preempt_type'.

Update the function pointer prototype to match reality so that there is
no more CFI violation.

Link: https://github.com/ClangBuiltLinux/linux/issues/1738
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:12 -04:00
Guenter Roeck
8a70b2d89e drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o
Building 32-bit images may fail with the following error.

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:
	In function ‘dml32_UseMinimumDCFCLK’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:3142:1:
	error: the frame size of 1096 bytes is larger than 1024 bytes

This is seen when building i386:allmodconfig with any of the following
compilers.

	gcc (Debian 12.2.0-3) 12.2.0
	gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

The problem is not seen if the compiler supports GCC_PLUGIN_LATENT_ENTROPY
because in that case CONFIG_FRAME_WARN is already set to 2048 even for
32-bit builds.

dml32_UseMinimumDCFCLK() was introduced with commit dda4fb85e4
("drm/amd/display: DML changes for DCN32/321"). It declares a large
number of local variables. Increase the frame size for the affected
file to 2048, similar to other files in the same directory, to enable
32-bit build tests with affected compilers.

Fixes: dda4fb85e4 ("drm/amd/display: DML changes for DCN32/321")
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reported-by: Łukasz Bartosik <ukaszb@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:03 -04:00
Tim Huang
31c261a7ff drm/amd/pm: add SMU IP v13.0.4 IF version define to V7
The pmfw has changed the driver interface version, so keep same with the
fw.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:51 -04:00
Tim Huang
853fdb4916 drm/amd/pm: update SMU IP v13.0.4 driver interface version
Update the SMU driver interface version to V7.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:44 -04:00
ZhenGuo Yin
5fa993737b drm/amd/pm: Init pm_attr_list when dpm is disabled
[Why]
In SRIOV multi-vf, dpm is always disabled, and pm_attr_list won't
be initialized. There will be a NULL pointer call trace after
removing the dpm check condition in amdgpu_pm_sysfs_fini.
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:amdgpu_device_attr_remove_groups+0x20/0x90 [amdgpu]
Call Trace:
  <TASK>
  amdgpu_pm_sysfs_fini+0x2f/0x40 [amdgpu]
  amdgpu_device_fini_hw+0xdf/0x290 [amdgpu]

[How]
List pm_attr_list should be initialized when dpm is disabled.

Fixes: a6ad27cec5 ("drm/amd/pm: Remove redundant check condition")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:36 -04:00
Evan Quan
3059cd8c5f drm/amd/pm: disable cstate feature for gpu reset scenario
Suggested by PMFW team and same as what did for gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:20 -04:00
Evan Quan
ba2f09960e drm/amd/pm: fulfill SMU13.0.7 cstate control interface
Fulfill the functionality for cstate control.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:14 -04:00
Evan Quan
528c0e66e0 drm/amd/pm: fulfill SMU13.0.0 cstate control interface
Fulfill the functionality for cstate control.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:01 -04:00
YiPeng Chai
3bd026c3e3 drm/amdgpu: Add sriov vf ras support in amdgpu_ras_asic_supported
V2:
Add sriov vf ras support in amdgpu_ras_asic_supported.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:11:48 -04:00
YiPeng Chai
001ebcf5b9 drm/amdgpu: Enable ras support for mp0 v13_0_0 and v13_0_10
V1:
Enable ras support for CHIP_IP_DISCOVERY asic type.

V2:
1. Change commit comment.
2. Enable ras support for mp0 v13_0_0 and v13_0_10.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:11:37 -04:00
YiPeng Chai
7cd3f6c3ac drm/amdgpu: Enable gmc soft reset on gmc_v11_0_3
Enable gmc soft reset on gmc_v11_0_3.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:48 -04:00
Likun Gao
b7a76a2914 drm/amdgpu: skip mes self test for gc 11.0.3
Temporary disable mes self teset for gc 11.0.3.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:41 -04:00
Kenneth Feng
f700486cd1 drm/amd/pm: skip loading pptable from driver on secure board for smu_v13_0_10
skip loading pptable from driver on secure board since it's loaded from psp.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Guan Yu <Guan.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:31 -04:00
Kenneth Feng
657e07221c drm/amd/amdgpu: enable gfx clock gating features on smu_v13_0_10
enable gfx clock gating features on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:23 -04:00
Kenneth Feng
4c7f9a3c15 drm/amd/pm: remove the pptable id override on smu_v13_0_10
remove the pptable id override on smu_v13_0_10,
and the id is fetched from vbios now.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:17 -04:00
Kenneth Feng
4d72a4e4fb drm/amd/pm: temporarily disable thermal alert on smu_v13_0_10
temporarily disable thermal alert on smu_v13_0_10 due to kfd test fail.
will enable it again after confirming the thermal hardware setting.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:10 -04:00
Asher Song
4545ae2ed3 drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly"
This reverts commit 16fb4dca95.

Unfortunately, that commit causes fan monitors can't be read and written
properly.

Fixes: 16fb4dca95 ("drm/amdgpu: getting fan speed pwm for vega10 properly")
Signed-off-by: Asher Song <Asher.Song@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:49 -04:00
Victor Zhao
a31e62873f drm/amdgpu: Refactor mode2 reset logic for v11.0.7
- refactor mode2 on v11.0.7 to align with aldebaran
- comment out using mode2 reset as default for now, will introduce
another controller to replace previous reset_level_mask

v2: squash in unused variable removal (Alex)

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:40 -04:00
Victor Zhao
a340847b02 Revert "drm/amdgpu: let mode2 reset fallback to default when failure"
This reverts commit dac6b80818.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:33 -04:00
Victor Zhao
afbaa15501 Revert "drm/amdgpu: add debugfs amdgpu_reset_level"
This reverts commit 5bd8d53f6f.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6f ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:25 -04:00
Danijel Slivka
65f8682b9a drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.

v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT
which indicates that VF MMIO write access is not allowed in sriov runtime

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:07:58 -04:00
Linus Torvalds
f1947d7c8a Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
 "This time with some large scale treewide cleanups.

  The intent of this pull is to clean up the way callers fetch random
  integers. The current rules for doing this right are:

   - If you want a secure or an insecure random u64, use get_random_u64()

   - If you want a secure or an insecure random u32, use get_random_u32()

     The old function prandom_u32() has been deprecated for a while
     now and is just a wrapper around get_random_u32(). Same for
     get_random_int().

   - If you want a secure or an insecure random u16, use get_random_u16()

   - If you want a secure or an insecure random u8, use get_random_u8()

   - If you want secure or insecure random bytes, use get_random_bytes().

     The old function prandom_bytes() has been deprecated for a while
     now and has long been a wrapper around get_random_bytes()

   - If you want a non-uniform random u32, u16, or u8 bounded by a
     certain open interval maximum, use prandom_u32_max()

     I say "non-uniform", because it doesn't do any rejection sampling
     or divisions. Hence, it stays within the prandom_*() namespace, not
     the get_random_*() namespace.

     I'm currently investigating a "uniform" function for 6.2. We'll see
     what comes of that.

  By applying these rules uniformly, we get several benefits:

   - By using prandom_u32_max() with an upper-bound that the compiler
     can prove at compile-time is ≤65536 or ≤256, internally
     get_random_u16() or get_random_u8() is used, which wastes fewer
     batched random bytes, and hence has higher throughput.

   - By using prandom_u32_max() instead of %, when the upper-bound is
     not a constant, division is still avoided, because
     prandom_u32_max() uses a faster multiplication-based trick instead.

   - By using get_random_u16() or get_random_u8() in cases where the
     return value is intended to indeed be a u16 or a u8, we waste fewer
     batched random bytes, and hence have higher throughput.

  This series was originally done by hand while I was on an airplane
  without Internet. Later, Kees and I worked on retroactively figuring
  out what could be done with Coccinelle and what had to be done
  manually, and then we split things up based on that.

  So while this touches a lot of files, the actual amount of code that's
  hand fiddled is comfortably small"

* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  prandom: remove unused functions
  treewide: use get_random_bytes() when possible
  treewide: use get_random_u32() when possible
  treewide: use get_random_{u8,u16}() when possible, part 2
  treewide: use get_random_{u8,u16}() when possible, part 1
  treewide: use prandom_u32_max() when possible, part 2
  treewide: use prandom_u32_max() when possible, part 1
2022-10-16 15:27:07 -07:00
Linus Torvalds
5e714bf171 Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:

 - fix a race which causes page refcounting errors in ZONE_DEVICE pages
   (Alistair Popple)

 - fix userfaultfd test harness instability (Peter Xu)

 - various other patches in MM, mainly fixes

* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
  highmem: fix kmap_to_page() for kmap_local_page() addresses
  mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
  mm/selftest: uffd: explain the write missing fault check
  mm/hugetlb: use hugetlb_pte_stable in migration race check
  mm/hugetlb: fix race condition of uffd missing/minor handling
  zram: always expose rw_page
  LoongArch: update local TLB if PTE entry exists
  mm: use update_mmu_tlb() on the second thread
  kasan: fix array-bounds warnings in tests
  hmm-tests: add test for migrate_device_range()
  nouveau/dmem: evict device private memory during release
  nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
  mm/migrate_device.c: add migrate_device_range()
  mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
  mm/memremap.c: take a pgmap reference on page allocation
  mm: free device private pages have zero refcount
  mm/memory.c: fix race when faulting a device private page
  mm/damon: use damon_sz_region() in appropriate place
  mm/damon: move sz_damon_region to damon_sz_region
  lib/test_meminit: add checks for the allocation functions
  ...
2022-10-14 12:28:43 -07:00
Nathan Chancellor
2130b87b22 drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n
After commit 8799c0be89 ("drm/amd/display: Fix vblank refcount in vrr
transition"), a build with CONFIG_DEBUG_FS=n is broken due to a
misplaced brace, along the lines of:

  In file included from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_trace.h:39,
                   from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:41:
  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: At top level:
  ./include/drm/drm_atomic.h:864:9: error: expected identifier or ‘(’ before ‘for’
    864 |         for ((__i) = 0;                                                 \
        |         ^~~
  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8317:9: note: in expansion of macro ‘for_each_new_crtc_in_state’
   8317 |         for_each_new_crtc_in_state(state, crtc, new_crtc_state, j)
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~

Move the brace within the #ifdef so that the file can be built with or
without CONFIG_DEBUG_FS.

Fixes: 8799c0be89 ("drm/amd/display: Fix vblank refcount in vrr transition")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-14 11:13:19 -07:00
Linus Torvalds
9c9155a350 Merge tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm
Pull more drm updates from Dave Airlie:
 "Round of fixes for the merge window stuff, bunch of amdgpu and i915
  changes, this should have the gcc11 warning fix, amongst other
  changes.

  amdgpu:
   - DC mutex fix
   - DC SubVP fixes
   - DCN 3.2.x fixes
   - DCN 3.1.x fixes
   - SDMA 6.x fixes
   - Enable DPIA for 3.1.4
   - VRR fixes
   - VRAM BO swapping fix
   - Revert dirty fb helper change
   - SR-IOV suspend/resume fixes
   - Work around GCC array bounds check fail warning
   - UMC 8.10 fixes
   - Misc fixes and cleanups

  i915:
   - Round to closest in g4x+ HDMI clock readout
   - Update MOCS table for EHL
   - Fix PSR_IMR/IIR field handling
   - Fix watermark calculations for gen12+/DG2 modifiers
   - Reject excessive dotclocks early
   - Fix revocation of non-persistent contexts
   - Handle migration for dpt
   - Fix display problems after resume
   - Allow control over the flags when migrating
   - Consider DG2_RC_CCS_CC when migrating buffers"

* tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm: (110 commits)
  drm/amd/display: Add HUBP surface flip interrupt handler
  drm/i915/display: consider DG2_RC_CCS_CC when migrating buffers
  drm/i915: allow control over the flags when migrating
  drm/amd/display: Simplify bool conversion
  drm/amd/display: fix transfer function passed to build_coefficients()
  drm/amd/display: add a license to cursor_reg_cache.h
  drm/amd/display: make virtual_disable_link_output static
  drm/amd/display: fix indentation in dc.c
  drm/amd/display: make dcn32_split_stream_for_mpc_or_odm static
  drm/amd/display: fix build error on arm64
  drm/amd/display: 3.2.207
  drm/amd/display: Clean some DCN32 macros
  drm/amdgpu: Add poison mode query for umc v8_10_0
  drm/amdgpu: Update umc v8_10_0 headers
  drm/amdgpu: fix coding style issue for mca notifier
  drm/amdgpu: define convert_error_address for umc v8.7
  drm/amdgpu: define RAS convert_error_address API
  drm/amdgpu: remove check for CE in RAS error address query
  drm/i915: Fix display problems after resume
  drm/amd/display: fix array-bounds error in dc_stream_remove_writeback() [take 2]
  ...
2022-10-13 21:56:34 -07:00
Dave Airlie
fc3523a833 Merge tag 'amd-drm-fixes-6.1-2022-10-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-6.1-2022-10-12:

amdgpu:
- DC mutex fix
- DC SubVP fixes
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- SDMA 6.x fixes
- Enable DPIA for 3.1.4
- VRR fixes
- VRAM BO swapping fix
- Revert dirty fb helper change
- SR-IOV suspend/resume fixes
- Work around GCC array bounds check fail warning
- UMC 8.10 fixes
- Misc fixes and cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221012162650.8810-1-alexander.deucher@amd.com
2022-10-14 07:47:25 +10:00
Dave Airlie
e55978a4f2 Merge tag 'drm-intel-next-fixes-2022-10-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Fix revocation of non-persistent contexts (Tvrtko Ursulin)
- Handle migration for dpt (Matthew Auld)
- Fix display problems after resume (Thomas Hellström)
- Allow control over the flags when migrating (Matthew Auld)
- Consider DG2_RC_CCS_CC when migrating buffers (Matthew Auld)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y0gK9QmCmktLLzqp@tursulin-desk
2022-10-14 07:46:50 +10:00
Maxime Ripard
4190e8bbcb drm/vc4: hdmi: Check the HSM rate at runtime_resume
If our HSM clock has not been properly initialized, any register access
will silently lock up the system.

Let's check that this can't happen by adding a check for the rate before
any register access, and error out otherwise.

Link: https://lore.kernel.org/dri-devel/20220922145448.w3xfywkn5ecak2et@pengutronix.de/
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220929-rpi-pi3-unplugged-fixes-v1-2-cd22e962296c@cerno.tech
2022-10-13 13:57:04 +02:00
Maxime Ripard
ae71ab585c drm/vc4: hdmi: Enforce the minimum rate at runtime_resume
This is a revert of commit fd5894fa24 ("drm/vc4: hdmi: Remove clock
rate initialization"), with the code slightly moved around.

It turns out that we can't downright remove that code from the driver,
since the Pi0-3 and Pi4 are in different cases, and it only works for
the Pi4.

Indeed, the commit mentioned above was relying on the RaspberryPi
firmware clocks driver to initialize the rate if it wasn't done by the
firmware. However, the Pi0-3 are using the clk-bcm2835 clock driver that
wasn't doing this initialization. We therefore end up with the clock not
being assigned a rate, and the CPU stalling when trying to access a
register.

We can't move that initialization in the clk-bcm2835 driver, since the
HSM clock we depend on is actually part of the HDMI power domain, so any
rate setup is only valid when the power domain is enabled. Thus, we
reinstated the minimum rate setup at runtime_suspend, which should
address both issues.

Link: https://lore.kernel.org/dri-devel/20220922145448.w3xfywkn5ecak2et@pengutronix.de/
Fixes: fd5894fa24 ("drm/vc4: hdmi: Remove clock rate initialization")
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220929-rpi-pi3-unplugged-fixes-v1-1-cd22e962296c@cerno.tech
2022-10-13 13:56:52 +02:00
Maxime Ripard
d1c0b7de4d drm/vc4: Add module dependency on hdmi-codec
The VC4 HDMI controller driver relies on the HDMI codec ASoC driver. In
order to set it up properly, in vc4_hdmi_audio_init(), our HDMI driver
will register a device matching the HDMI codec driver, and then register
an ASoC card using that codec.

However, if vc4 is compiled as a module, chances are that the hdmi-codec
driver will be too. In such a case, the module loader will have a very
narrow window to load the module between the device registration and the
card registration.

If it fails to load the module in time, the card registration will fail
with EPROBE_DEFER, and we'll abort the audio initialisation,
unregistering the HDMI codec device in the process.

The next time the bind callback will be run, it's likely that we end up
missing that window again, effectively preventing vc4 to probe entirely.

In order to prevent this, we can create a soft dependency of the vc4
driver on the HDMI codec one so that we're sure the HDMI codec will be
loaded before the VC4 module is, and thus we'll never end up in the
previous situation.

Fixes: 91e99e1139 ("drm/vc4: hdmi: Register HDMI codec")
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220902144111.3424560-1-maxime@cerno.tech
2022-10-13 13:44:40 +02:00
Alistair Popple
249881232e nouveau/dmem: evict device private memory during release
When the module is unloaded or a GPU is unbound from the module it is
possible for device private pages to still be mapped in currently running
processes.  This can lead to a hangs and RCU stall warnings when unbinding
the device as memunmap_pages() will wait in an uninterruptible state until
all device pages have been freed which may never happen.

Fix this by migrating device mappings back to normal CPU memory prior to
freeing the GPU memory chunks and associated device private pages.

Link: https://lkml.kernel.org/r/66277601fb8fda9af408b33da9887192bf895bda.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:50 -07:00
Alistair Popple
d9b719394a nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
the migrate_to_ram() callback and is used to copy data from GPU to CPU
memory.  It is currently specific to fault handling, however a future
patch implementing eviction of data during teardown needs similar
functionality.

Refactor out the core functionality so that it is not specific to fault
handling.

Link: https://lkml.kernel.org/r/20573d7b4e641a78fde9935f948e64e71c9e709e.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:49 -07:00
Alistair Popple
ef23345089 mm: free device private pages have zero refcount
Since 27674ef6c7 ("mm: remove the extra ZONE_DEVICE struct page
refcount") device private pages have no longer had an extra reference
count when the page is in use.  However before handing them back to the
owning device driver we add an extra reference count such that free pages
have a reference count of one.

This makes it difficult to tell if a page is free or not because both free
and in use pages will have a non-zero refcount.  Instead we should return
pages to the drivers page allocator with a zero reference count.  Kernel
code can then safely use kernel functions such as get_page_unless_zero().

Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:49 -07:00