Drivers' .remove and .shutdown callbacks are executed on different code
paths. The former is called when a device is removed from the bus, while
the latter is called at system shutdown time to quiesce the device.
This means that some overlap exists between the two, because both have to
take care of properly shutting down the hardware. But currently the logic
used in these two callbacks isn't consistent in msm drivers, which could
lead to kernel panic.
For example, on .remove the component is deleted and its .unbind callback
leads to the hardware being shutdown but only if the DRM device has been
marked as registered.
That check doesn't exist in the .shutdown logic and this can lead to the
driver calling drm_atomic_helper_shutdown() for a DRM device that hasn't
been properly initialized.
A situation like this can happen if drivers for expected sub-devices fail
to probe, since the .bind callback will never be executed. If that is the
case, drm_atomic_helper_shutdown() will attempt to take mutexes that are
only initialized if drm_mode_config_init() is called during a device bind.
This bug was attempted to be fixed in commit 623f279c77 ("drm/msm: fix
shutdown hook in case GPU components failed to bind"), but unfortunately
it still happens in some cases as the one mentioned above, i.e:
systemd-shutdown[1]: Powering off.
kvm: exiting hardware virtualization
platform wifi-firmware.0: Removing from iommu group 12
platform video-firmware.0: Removing from iommu group 10
------------[ cut here ]------------
WARNING: CPU: 6 PID: 1 at drivers/gpu/drm/drm_modeset_lock.c:317 drm_modeset_lock_all_ctx+0x3c4/0x3d0
...
Hardware name: Google CoachZ (rev3+) (DT)
pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : drm_modeset_lock_all_ctx+0x3c4/0x3d0
lr : drm_modeset_lock_all_ctx+0x48/0x3d0
sp : ffff80000805bb80
x29: ffff80000805bb80 x28: ffff327c00128000 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000001 x24: ffffc95d820ec030
x23: ffff327c00bbd090 x22: ffffc95d8215eca0 x21: ffff327c039c5800
x20: ffff327c039c5988 x19: ffff80000805bbe8 x18: 0000000000000034
x17: 000000040044ffff x16: ffffc95d80cac920 x15: 0000000000000000
x14: 0000000000000315 x13: 0000000000000315 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : ffff80000805bc28 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
x2 : ffff327c00128000 x1 : 0000000000000000 x0 : ffff327c039c59b0
Call trace:
drm_modeset_lock_all_ctx+0x3c4/0x3d0
drm_atomic_helper_shutdown+0x70/0x134
msm_drv_shutdown+0x30/0x40
platform_shutdown+0x28/0x40
device_shutdown+0x148/0x350
kernel_power_off+0x38/0x80
__do_sys_reboot+0x288/0x2c0
__arm64_sys_reboot+0x28/0x34
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0x44/0xec
do_el0_svc+0x2c/0xc0
el0_svc+0x2c/0x84
el0t_64_sync_handler+0x11c/0x150
el0t_64_sync+0x18c/0x190
---[ end trace 0000000000000000 ]---
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000010eab1000
[0000000000000018] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
...
Hardware name: Google CoachZ (rev3+) (DT)
pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : ww_mutex_lock+0x28/0x32c
lr : drm_modeset_lock_all_ctx+0x1b0/0x3d0
sp : ffff80000805bb50
x29: ffff80000805bb50 x28: ffff327c00128000 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000018
x23: ffff80000805bc10 x22: ffff327c039c5ad8 x21: ffff327c039c5800
x20: ffff80000805bbe8 x19: 0000000000000018 x18: 0000000000000034
x17: 000000040044ffff x16: ffffc95d80cac920 x15: 0000000000000000
x14: 0000000000000315 x13: 0000000000000315 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : ffff80000805bc28 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
x2 : ffff327c00128000 x1 : 0000000000000000 x0 : 0000000000000018
Call trace:
ww_mutex_lock+0x28/0x32c
drm_modeset_lock_all_ctx+0x1b0/0x3d0
drm_atomic_helper_shutdown+0x70/0x134
msm_drv_shutdown+0x30/0x40
platform_shutdown+0x28/0x40
device_shutdown+0x148/0x350
kernel_power_off+0x38/0x80
__do_sys_reboot+0x288/0x2c0
__arm64_sys_reboot+0x28/0x34
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0x44/0xec
do_el0_svc+0x2c/0xc0
el0_svc+0x2c/0x84
el0t_64_sync_handler+0x11c/0x150
el0t_64_sync+0x18c/0x190
Code: aa0103f4 d503201f d2800001 aa0103e3 (c8e37c02)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Kernel Offset: 0x495d77c00000 from 0xffff800008000000
PHYS_OFFSET: 0xffffcd8500000000
CPU features: 0x800,00c2a015,19801c82
Memory Limit: none
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Fixes: 9d5cbf5fe4 ("drm/msm: add shutdown support for display platform_driver")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220816134612.916527-1-javierm@redhat.com
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
Pull char / misc driver updates from Greg KH:
"Here is the large set of char and misc and other driver subsystem
changes for 6.0-rc1.
Highlights include:
- large set of IIO driver updates, additions, and cleanups
- new habanalabs device support added (loads of register maps much
like GPUs have)
- soundwire driver updates
- phy driver updates
- slimbus driver updates
- tiny virt driver fixes and updates
- misc driver fixes and updates
- interconnect driver updates
- hwtracing driver updates
- fpga driver updates
- extcon driver updates
- firmware driver updates
- counter driver update
- mhi driver fixes and updates
- binder driver fixes and updates
- speakup driver fixes
All of these have been in linux-next for a while without any reported
problems"
* tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (634 commits)
drivers: lkdtm: fix clang -Wformat warning
char: remove VR41XX related char driver
misc: Mark MICROCODE_MINOR unused
spmi: trace: fix stack-out-of-bound access in SPMI tracing functions
dt-bindings: iio: adc: Add compatible for MT8188
iio: light: isl29028: Fix the warning in isl29028_remove()
iio: accel: sca3300: Extend the trigger buffer from 16 to 32 bytes
iio: fix iio_format_avail_range() printing for none IIO_VAL_INT
iio: adc: max1027: unlock on error path in max1027_read_single_value()
iio: proximity: sx9324: add empty line in front of bullet list
iio: magnetometer: hmc5843: Remove duplicate 'the'
iio: magn: yas530: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: magnetometer: ak8974: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: light: veml6030: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: light: vcnl4035: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: light: vcnl4000: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: light: tsl2591: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
iio: light: tsl2583: Use DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
iio: light: isl29028: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
iio: light: gp2ap002: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
...
Rename "GEM CMA" helpers to "GEM DMA" helpers - considering the
hierarchy of APIs (mm/cma -> dma -> gem dma) calling them "GEM
DMA" seems to be more applicable.
Besides that, commit e57924d4ae ("drm/doc: Task to rename CMA helpers")
requests to rename the CMA helpers and implies that people seem to be
confused about the naming.
In order to do this renaming the following script was used:
```
#!/bin/bash
DIRS="drivers/gpu include/drm Documentation/gpu"
REGEX_SYM_UPPER="[0-9A-Z_\-]"
REGEX_SYM_LOWER="[0-9a-z_\-]"
REGEX_GREP_UPPER="(${REGEX_SYM_UPPER}*)(GEM)_CMA_(${REGEX_SYM_UPPER}*)"
REGEX_GREP_LOWER="(${REGEX_SYM_LOWER}*)(gem)_cma_(${REGEX_SYM_LOWER}*)"
REGEX_SED_UPPER="s/${REGEX_GREP_UPPER}/\1\2_DMA_\3/g"
REGEX_SED_LOWER="s/${REGEX_GREP_LOWER}/\1\2_dma_\3/g"
# Find all upper case 'CMA' symbols and replace them with 'DMA'.
for ff in $(grep -REHl "${REGEX_GREP_UPPER}" $DIRS)
do
sed -i -E "$REGEX_SED_UPPER" $ff
done
# Find all lower case 'cma' symbols and replace them with 'dma'.
for ff in $(grep -REHl "${REGEX_GREP_LOWER}" $DIRS)
do
sed -i -E "$REGEX_SED_LOWER" $ff
done
# Replace all occurrences of 'CMA' / 'cma' in comments and
# documentation files with 'DMA' / 'dma'.
for ff in $(grep -RiHl " cma " $DIRS)
do
sed -i -E "s/ cma / dma /g" $ff
sed -i -E "s/ CMA / DMA /g" $ff
done
# Rename all 'cma_obj's to 'dma_obj'.
for ff in $(grep -RiHl "cma_obj" $DIRS)
do
sed -i -E "s/cma_obj/dma_obj/g" $ff
done
```
Only a few more manual modifications were needed, e.g. reverting the
following modifications in some DRM Kconfig files
- select CMA if HAVE_DMA_CONTIGUOUS
+ select DMA if HAVE_DMA_CONTIGUOUS
as well as manually picking the occurrences of 'CMA'/'cma' in comments and
documentation which relate to "GEM CMA", but not "FB CMA".
Also drivers/gpu/drm/Makefile was fixed up manually after renaming
drm_gem_cma_helper.c to drm_gem_dma_helper.c.
This patch is compile-time tested building a x86_64 kernel with
`make allyesconfig && make drivers/gpu/drm`.
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> #drivers/gpu/drm/arm
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220802000405.949236-4-dakr@redhat.com
Next for v5.20
GPU:
- a619 support
- Fix for unclocked GMU register access
- Devcore dump enhancements
Core:
- client utilization via fdinfo support
- fix fence rollover issue
- gem: Lockdep false-positive warning fix
- gem: Switch to pfn mappings
DPU:
- constification of HW catalog
- support for using encoder as CRC source
- WB support on sc7180
- WB resolution fixes
DP:
- dropped custom bulk clock implementation
- made dp_bridge_mode_valid() return MODE_CLOCK_HIGH where applicable
- fix link retraining on resolution change
MDP5:
- MSM8953 perf data
HDMI:
- YAML'ification of schema
- dropped obsolete GPIO support
- misc cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGtuqswBGPw-kCYzJvckK2RR1XTeUEgaXwVG_mvpbv3gPA@mail.gmail.com
To avoid preventing the display from coming up before the rootfs is
mounted, without resorting to packing fw in the initrd, the GPU has
this limbo state where the device is probed, but we aren't ready to
start sending commands to it. This is particularly problematic for
a6xx, since the GMU (which requires fw to be loaded) is the one that
is controlling the power/clk/icc votes.
So defer enabling runpm until we are ready to call gpu->hw_init(),
as that is a point where we know we have all the needed fw and are
ready to start sending commands to the coproc's.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/489337/
Link: https://lore.kernel.org/r/20220613182036.2567963-1-robdclark@gmail.com