linux/mm
Charan Teja Kalla 5ec8e8ea8b mm/sparsemem: fix race in accessing memory_section->usage
The below race is observed on a PFN which falls into the device memory
region with the system memory configuration where PFN's are such that
[ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL].  Since normal zone start and end
pfn contains the device memory PFN's as well, the compaction triggered
will try on the device memory PFN's too though they end up in NOP(because
pfn_to_online_page() returns NULL for ZONE_DEVICE memory sections).  When
from other core, the section mappings are being removed for the
ZONE_DEVICE region, that the PFN in question belongs to, on which
compaction is currently being operated is resulting into the kernel crash
with CONFIG_SPASEMEM_VMEMAP enabled.  The crash logs can be seen at [1].

compact_zone()			memunmap_pages
-------------			---------------
__pageblock_pfn_to_page
   ......
 (a)pfn_valid():
     valid_section()//return true
			      (b)__remove_pages()->
				  sparse_remove_section()->
				    section_deactivate():
				    [Free the array ms->usage and set
				     ms->usage = NULL]
     pfn_section_valid()
     [Access ms->usage which
     is NULL]

NOTE: From the above it can be said that the race is reduced to between
the pfn_valid()/pfn_section_valid() and the section deactivate with
SPASEMEM_VMEMAP enabled.

The commit b943f045a9af("mm/sparse: fix kernel crash with
pfn_section_valid check") tried to address the same problem by clearing
the SECTION_HAS_MEM_MAP with the expectation of valid_section() returns
false thus ms->usage is not accessed.

Fix this issue by the below steps:

a) Clear SECTION_HAS_MEM_MAP before freeing the ->usage.

b) RCU protected read side critical section will either return NULL
   when SECTION_HAS_MEM_MAP is cleared or can successfully access ->usage.

c) Free the ->usage with kfree_rcu() and set ms->usage = NULL.  No
   attempt will be made to access ->usage after this as the
   SECTION_HAS_MEM_MAP is cleared thus valid_section() return false.

Thanks to David/Pavan for their inputs on this patch.

[1] https://lore.kernel.org/linux-mm/994410bb-89aa-d987-1f50-f514903c55aa@quicinc.com/

On Snapdragon SoC, with the mentioned memory configuration of PFN's as
[ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL], we are able to see bunch of
issues daily while testing on a device farm.

For this particular issue below is the log.  Though the below log is
not directly pointing to the pfn_section_valid(){ ms->usage;}, when we
loaded this dump on T32 lauterbach tool, it is pointing.

[  540.578056] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
[  540.578068] Mem abort info:
[  540.578070]   ESR = 0x0000000096000005
[  540.578073]   EC = 0x25: DABT (current EL), IL = 32 bits
[  540.578077]   SET = 0, FnV = 0
[  540.578080]   EA = 0, S1PTW = 0
[  540.578082]   FSC = 0x05: level 1 translation fault
[  540.578085] Data abort info:
[  540.578086]   ISV = 0, ISS = 0x00000005
[  540.578088]   CM = 0, WnR = 0
[  540.579431] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBSBTYPE=--)
[  540.579436] pc : __pageblock_pfn_to_page+0x6c/0x14c
[  540.579454] lr : compact_zone+0x994/0x1058
[  540.579460] sp : ffffffc03579b510
[  540.579463] x29: ffffffc03579b510 x28: 0000000000235800 x27:000000000000000c
[  540.579470] x26: 0000000000235c00 x25: 0000000000000068 x24:ffffffc03579b640
[  540.579477] x23: 0000000000000001 x22: ffffffc03579b660 x21:0000000000000000
[  540.579483] x20: 0000000000235bff x19: ffffffdebf7e3940 x18:ffffffdebf66d140
[  540.579489] x17: 00000000739ba063 x16: 00000000739ba063 x15:00000000009f4bff
[  540.579495] x14: 0000008000000000 x13: 0000000000000000 x12:0000000000000001
[  540.579501] x11: 0000000000000000 x10: 0000000000000000 x9 :ffffff897d2cd440
[  540.579507] x8 : 0000000000000000 x7 : 0000000000000000 x6 :ffffffc03579b5b4
[  540.579512] x5 : 0000000000027f25 x4 : ffffffc03579b5b8 x3 :0000000000000001
[  540.579518] x2 : ffffffdebf7e3940 x1 : 0000000000235c00 x0 :0000000000235800
[  540.579524] Call trace:
[  540.579527]  __pageblock_pfn_to_page+0x6c/0x14c
[  540.579533]  compact_zone+0x994/0x1058
[  540.579536]  try_to_compact_pages+0x128/0x378
[  540.579540]  __alloc_pages_direct_compact+0x80/0x2b0
[  540.579544]  __alloc_pages_slowpath+0x5c0/0xe10
[  540.579547]  __alloc_pages+0x250/0x2d0
[  540.579550]  __iommu_dma_alloc_noncontiguous+0x13c/0x3fc
[  540.579561]  iommu_dma_alloc+0xa0/0x320
[  540.579565]  dma_alloc_attrs+0xd4/0x108

[quic_charante@quicinc.com: use kfree_rcu() in place of synchronize_rcu(), per David]
  Link: https://lkml.kernel.org/r/1698403778-20938-1-git-send-email-quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/1697202267-23600-1-git-send-email-quic_charante@quicinc.com
Fixes: f46edbd1b1 ("mm/sparsemem: add helpers track active portions of a section at boot")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 11:58:43 -08:00
..
damon mm/damon/core-test: test max_nr_accesses overflow caused divide-by-zero 2023-12-20 14:48:13 -08:00
kasan kasan: memset free track in qlink_free 2023-12-29 11:58:42 -08:00
kfence LoongArch changes for v6.6 2023-09-08 12:16:52 -07:00
kmsan kmsan: use stack_depot_save instead of __stack_depot_save 2023-12-10 16:51:46 -08:00
backing-dev.c writeback: remove redundant checks for root memcg 2023-08-21 13:37:48 -07:00
balloon_compaction.c
bootmem_info.c bootmem: use kmemleak_free_part_phys in put_page_bootmem 2023-10-25 16:47:13 -07:00
cma_debug.c
cma_sysfs.c
cma.c mm: cma: remove unnecessary initialization of ret 2023-12-12 10:57:08 -08:00
cma.h
compaction.c mm: compaction: avoid fast_isolate_freepages blindly choose improper pageblock 2023-12-12 10:57:08 -08:00
debug_page_alloc.c
debug_page_ref.c
debug_vm_pgtable.c mm: fix multiple typos in multiple files 2023-10-25 16:47:14 -07:00
debug.c
dmapool_test.c
dmapool.c
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c
failslab.c
filemap.c sync mm-stable with mm-hotfixes-stable to pick up depended-upon changes 2023-12-20 14:47:18 -08:00
folio-compat.c mm: remove page_add_new_anon_rmap and lru_cache_add_inactive_or_unevictable 2023-12-29 11:58:27 -08:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h
gup.c mm/gup: fix follow_devmap_p[mu]d() on page==NULL handling 2023-12-10 16:51:52 -08:00
highmem.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c userfaultfd: UFFDIO_MOVE uABI 2023-12-29 11:58:24 -08:00
hugetlb_cgroup.c mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER 2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: move mmap lock to vmemmap_remap_range() 2023-12-12 10:57:08 -08:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hugetlb.c hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write 2023-12-06 16:12:43 -08:00
hwpoison-inject.c
init-mm.c mm: move dummy_vm_ops out of a header 2023-08-21 13:37:46 -07:00
internal.h mm: use vma_pages() for vma objects 2023-12-12 10:57:08 -08:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig mm/thp: add CONFIG_TRANSPARENT_HUGEPAGE_NEVER option 2023-12-12 10:57:07 -08:00
Kconfig.debug
khugepaged.c mm/khugepaged: remove redundant try_to_freeze() 2023-12-29 11:58:42 -08:00
kmemleak.c kmemleak: avoid RCU stalls when freeing metadata for per-CPU pointers 2023-12-12 10:57:07 -08:00
ksm.c mm/ksm: add tracepoint for ksm advisor 2023-12-29 11:58:27 -08:00
list_lru.c mm/list_lru.c: remove unused list_lru_from_kmem() 2023-12-20 14:48:11 -08:00
maccess.c
madvise.c mm: return a folio from read_swap_cache_async() 2023-12-29 11:58:32 -08:00
Makefile mm: vmscan: move shrinker-related code into a separate file 2023-10-04 10:32:23 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c NUMA: optimize detection of memory with no node id assigned by firmware 2023-12-10 16:51:34 -08:00
memcontrol.c mm: memcg: restore subtree stats flushing 2023-12-20 14:48:11 -08:00
memfd.c memfd: drop warning for missing exec-related flags 2023-10-04 10:32:22 -07:00
memory_hotplug.c mm/memory_hotplug: split memmap_on_memory requests across memblocks 2023-12-10 16:51:34 -08:00
memory-failure.c sync mm-stable with mm-hotfixes-stable to pick up depended-upon changes 2023-12-20 14:47:18 -08:00
memory-tiers.c dax, kmem: calculate abstract distance with general interface 2023-10-16 15:44:39 -07:00
memory.c mm: convert swap_readpage() to swap_read_folio() 2023-12-29 11:58:31 -08:00
mempolicy.c Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
mempool.c mempool: introduce mempool_use_prealloc_only 2023-12-29 11:58:39 -08:00
memremap.c mm: remove stale example from comment 2023-12-29 11:58:26 -08:00
memtest.c mm: memtest: convert to memtest_report_meminfo() 2023-08-21 13:37:47 -07:00
migrate_device.c mm: convert migrate_vma_insert_page() to use a folio 2023-12-29 11:58:26 -08:00
migrate.c mm: migrate: fix getting incorrect page mapping during page migration 2023-12-29 11:58:32 -08:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c mm: mlock: avoid folio_within_range() on KSM pages 2023-10-25 16:47:14 -07:00
mm_init.c mm/mm_init.c: append newline to the unavailable ranges log-message 2023-12-10 16:51:51 -08:00
mm_slot.h
mmap_lock.c
mmap.c mmap: remove the IA64-specific vma expansion implementation 2023-12-10 16:51:39 -08:00
mmu_gather.c mm: fix kernel-doc warning from tlb_flush_rmaps() 2023-08-24 16:20:30 -07:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
mprotect.c mm: mprotect: use a folio in change_pte_range() 2023-10-25 16:47:12 -07:00
mremap.c mm: abstract VMA merge and extend into vma_merge_extend() helper 2023-10-18 14:34:18 -07:00
msync.c
nommu.c Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
oom_kill.c mm, oom:dump_tasks add rss detailed information printing 2023-12-10 16:51:53 -08:00
page_alloc.c mm: page_alloc: simplify __free_pages_ok() 2023-12-20 14:48:14 -08:00
page_counter.c
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c
page_io.c mm: convert swap_readpage() to swap_read_folio() 2023-12-29 11:58:31 -08:00
page_isolation.c mm/hugetlb: get rid of page_hstate() 2023-08-18 10:12:39 -07:00
page_owner.c mm/page_owner: record and dump free_pid and free_tgid 2023-12-10 16:51:40 -08:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: thp: introduce multi-size THP sysfs interface 2023-12-20 14:48:12 -08:00
page-writeback.c mm: return void from folio_start_writeback() and related functions 2023-12-10 16:51:37 -08:00
pagewalk.c mm: pagewalk: assert write mmap lock only for walking the user page tables 2023-12-10 16:51:53 -08:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c mm/readahead: do not allow order-1 folio 2023-12-12 10:57:06 -08:00
rmap.c mm: remove references to page_add_new_anon_rmap in comments 2023-12-29 11:58:26 -08:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem_quota.c shmem: Add default quota limit mount options 2023-08-09 09:15:40 +02:00
shmem.c mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio 2023-12-29 11:58:32 -08:00
show_mem.c mm: refactor si_mem_available() 2023-10-04 10:32:19 -07:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shrinker.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shuffle.c
shuffle.h
slab_common.c RCU pull request for v6.7 2023-10-30 18:01:41 -10:00
slab.c kasan: rename and document kasan_(un)poison_object_data 2023-12-29 11:58:40 -08:00
slab.h mm: kmem: scoped objcg protection 2023-10-25 16:47:11 -07:00
slub.c kasan: rename and document kasan_(un)poison_object_data 2023-12-29 11:58:40 -08:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2023-12-29 11:58:43 -08:00
swap_cgroup.c
swap_slots.c
swap_state.c mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio 2023-12-29 11:58:32 -08:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio 2023-12-29 11:58:32 -08:00
swapfile.c mm: remove page_swap_info() 2023-12-29 11:58:32 -08:00
truncate.c fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
usercopy.c
userfaultfd.c mm: remove some calls to page_add_new_anon_rmap() 2023-12-29 11:58:25 -08:00
util.c mm/util: use kmap_local_page() in memcmp_pages() 2023-12-10 16:51:49 -08:00
vmalloc.c mm/vmalloc: fix the unchecked dereference warning in vread_iter() 2023-11-01 12:38:35 -07:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-08-16 12:21:32 +01:00
vmscan.c mm: memcg: restore subtree stats flushing 2023-12-20 14:48:11 -08:00
vmstat.c mm: memcg: add per-memcg zswap writeback stat 2023-12-12 10:57:02 -08:00
workingset.c mm: memcg: restore subtree stats flushing 2023-12-20 14:48:11 -08:00
z3fold.c mm/z3fold: remove obsolete comment for struct z3fold_pool 2023-08-21 13:37:51 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c zsmalloc: use copy_page for full page copy 2023-10-18 14:34:16 -07:00
zswap.c mm: pass a folio to __swap_writepage() 2023-12-29 11:58:29 -08:00