linux/mm
David Hildenbrand c145e0b47c mm: streamline COW logic in do_swap_page()
Currently, our COW logic differs between:
* triggering a read-fault to swap in first and then triggering a write-fault
  -> do_swap_page() + do_wp_page()
* triggering a write-fault to swap in
  -> do_swap_page() + do_wp_page() only if we fail reuse in do_swap_page()

The COW logic in do_swap_page() differs from our reuse logic in
do_wp_page().  The COW logic in do_wp_page() -- page_count() == 1 --
currently makes sure that we don't have a remaining reference, e.g.,
via GUP, on the target page we want to reuse: if there is any unexpected
reference, we have to copy to avoid information leaks.
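
As an illustration only, the reuse rule described above can be modeled in
a few lines of userspace C.  This is a toy sketch, not kernel code:
struct toy_page, toy_get_page() and toy_can_reuse() are hypothetical names,
and the single counter merely stands in for what page_count() reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the reference count is modeled. */
struct toy_page {
	int refcount;	/* stands in for page_count(): mappings, GUP, caches, ... */
};

/* Take an extra reference, e.g. what a GUP user would hold in this model. */
static void toy_get_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount++;
}

/* Reuse in place only if the faulting side holds the sole reference. */
static bool toy_can_reuse(const struct toy_page *page)
{
	return page->refcount == 1;
}
```

Any additional reference, whatever its origin, makes toy_can_reuse() return
false, which corresponds to "we have to copy" in the text above.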

As do_swap_page() behaves differently, in environments with swap enabled
we can currently have an unintended information leak from the parent to
the child, similar to the one known from CVE-2020-29374:

	1. Parent writes to anonymous page
	-> Page is mapped writable and modified
	2. Page is swapped out
	-> Page is unmapped and replaced by swap entry
	3. fork()
	-> Swap entries are copied to child
	4. Child pins page R/O
	-> Page is mapped R/O into child
	5. Child unmaps page
	-> Child still holds GUP reference
	6. Parent writes to page
	-> Page is reused in do_swap_page()
	-> Child can observe changes

Exchanging 2. and 3. should have the same effect.
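
The sequence above can be replayed in a small userspace model.  This is a
sketch under simplifying assumptions, not kernel code: all toy_* names are
hypothetical, swap-out and fork() (steps 2 and 3) are not modeled because
they do not change the page content, and the "ignore extra references"
policy only approximates a reuse check that does not look at GUP
references.

```c
#include <assert.h>

/* Hypothetical stand-in for an anonymous page. */
struct toy_page {
	int data;	/* page content */
	int refcount;	/* parent's reference plus any extra ones (e.g. GUP) */
};

enum toy_policy {
	TOY_IGNORE_EXTRA_REFS,	/* reuse even if someone else holds a reference */
	TOY_REQUIRE_SOLE_REF,	/* reuse only if refcount == 1, else copy */
};

/* Parent write fault: returns the page the parent ends up writing to. */
static struct toy_page *toy_parent_write(struct toy_page *page,
					 struct toy_page *spare,
					 enum toy_policy policy, int value)
{
	struct toy_page *target = page;

	assert(page->refcount >= 1);
	if (policy == TOY_REQUIRE_SOLE_REF && page->refcount != 1) {
		/* COW: copy the content and drop the parent's old reference. */
		*spare = (struct toy_page){ .data = page->data, .refcount = 1 };
		page->refcount--;
		target = spare;
	}
	target->data = value;
	return target;
}

/* Replays steps 1 and 4-6; returns what the child sees through its reference. */
static int toy_child_view_after_parent_write(enum toy_policy policy)
{
	struct toy_page page = { .data = 1, .refcount = 1 };	/* step 1 */
	struct toy_page spare;
	struct toy_page *pinned;

	pinned = &page;		/* step 4: child takes a R/O GUP reference */
	page.refcount++;
	/* step 5: child unmaps; its GUP reference stays, refcount unchanged */
	(void)toy_parent_write(&page, &spare, policy, 2);	/* step 6 */
	return pinned->data;
}
```

In this model the child observes the parent's new value (2) when extra
references are ignored, and keeps seeing the old value (1) when reuse
requires a sole reference.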

Let's apply the same COW logic as in do_wp_page(), conditionally trying to
remove the page from the swapcache after freeing the swap entry but
before actually mapping our page.  We can change the order now that we use
try_to_free_swap(), which doesn't care about the mapcount, instead of
reuse_swap_page().

To handle references from the LRU pagevecs, conditionally drain the local
LRU pagevecs when required.  To keep it simple for now, however, don't
consider page_count() when deciding whether to drain.
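
The resulting order of operations can be sketched as a userspace toy model.
It is not the actual patch: the toy_* names and the bookkeeping fields are
hypothetical, and the model assumes that the swapcache and a local LRU
pagevec each hold exactly one reference while the page sits in them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one page during a write-fault swap-in. */
struct toy_page {
	int refcount;		/* all references, including the faulting task's */
	int swap_users;		/* swap entries still referring to this page */
	bool in_swapcache;	/* swapcache holds one reference while set */
	bool in_lru_pagevec;	/* local LRU pagevec holds one reference while set */
};

/* Drain the local pagevec: its reference on the page goes away. */
static void toy_lru_drain(struct toy_page *page)
{
	if (page->in_lru_pagevec) {
		page->in_lru_pagevec = false;
		page->refcount--;
	}
}

/* Drop the page from the swapcache once no swap entry needs it anymore. */
static void toy_try_to_free_swap(struct toy_page *page)
{
	if (page->in_swapcache && page->swap_users == 0) {
		page->in_swapcache = false;
		page->refcount--;
	}
}

/* Returns true if the page may be mapped writable (reused) right away. */
static bool toy_swapin_write_reuse(struct toy_page *page)
{
	assert(page->refcount >= 1 && page->swap_users >= 1);

	toy_lru_drain(page);		/* 1. get rid of the pagevec reference */
	page->swap_users--;		/* 2. free our own swap entry */
	toy_try_to_free_swap(page);	/* 3. then try to leave the swapcache */
	return page->refcount == 1;	/* 4. reuse only with a sole reference */
}
```

For example, a page referenced only by the faulting task, the swapcache and
a pagevec ends up reusable, while one extra reference (such as a GUP
reference) or a second swap entry forces the copy path in this model.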

Link: https://lkml.kernel.org/r/20220131162940.210846-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Liang Zhang <zhangliang5@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:50 -07:00
damon Folio changes for 5.18 2022-03-22 17:03:12 -07:00
kasan kasan: disable LOCKDEP when printing reports 2022-03-24 19:06:50 -07:00
kfence kfence: allow use of a deferrable timer 2022-03-22 15:57:11 -07:00
backing-dev.c remove congestion tracking framework 2022-03-22 15:57:01 -07:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
bootmem_info.c bootmem: Use page->index instead of page->freelist 2022-01-06 12:27:03 +01:00
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma.c mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c mm: compaction: cleanup the compaction trace events 2022-03-22 15:57:09 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-04 09:25:04 -08:00
debug.c mm: unexport page_init_poison 2022-03-24 19:06:45 -07:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c remove inode_congested() 2022-03-22 15:57:01 -07:00
failslab.c
filemap.c mm: filemap_unaccount_folio() large skip mapcount fixup 2022-03-24 19:06:45 -07:00
folio-compat.c mm/rmap: Convert rmap_walk() to take a folio 2022-03-21 13:01:35 -04:00
frontswap.c frontswap: remove support for multiple ops 2022-01-22 08:33:38 +02:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
gup.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
highmem.c mm/highmem: remove unnecessary done label 2022-03-22 15:57:11 -07:00
hmm.c mm/hmm.c: remove unneeded local variable ret 2022-03-22 15:57:12 -07:00
huge_memory.c mm/huge_memory: make is_transparent_hugepage() static 2022-03-24 19:06:50 -07:00
hugetlb_cgroup.c hugetlb: add hugetlb.*.numa_stat file 2022-01-15 16:30:29 +02:00
hugetlb_vmemmap.c mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key 2022-03-22 15:57:08 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hugetlb.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
hwpoison-inject.c mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler 2022-03-22 15:57:07 -07:00
init-mm.c kernel/fork: Initialize mm's PASID 2022-02-14 19:51:47 +01:00
internal.h Folio changes for 5.18 2022-03-22 17:03:12 -07:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-09-08 11:50:24 -07:00
Kconfig Folio changes for 5.18 2022-03-22 17:03:12 -07:00
Kconfig.debug mm: page table check 2022-01-15 16:30:28 +02:00
khugepaged.c mm/rmap: Convert try_to_unmap() to take a folio 2022-03-21 12:59:03 -04:00
kmemleak.c mm/kmemleak: avoid scanning potential huge holes 2022-02-04 09:25:05 -08:00
ksm.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
list_lru.c mm/list_lru: optimize memcg_reparent_list_lru_node() 2022-03-22 15:57:08 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c mm: enable MADV_DONTNEED for hugetlb mappings 2022-03-24 19:06:50 -07:00
Makefile mm: move the migrate_vma_* device migration code into its own file 2022-03-03 12:47:33 -05:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c memblock: use kfree() to release kmalloced memblock regions 2022-02-20 08:45:39 +02:00
memcontrol.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory_hotplug.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
memory-failure.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
memory.c mm: streamline COW logic in do_swap_page() 2022-03-24 19:06:50 -07:00
mempolicy.c mempolicy: mbind_range() set_policy() after vma_merge() 2022-03-22 15:57:09 -07:00
mempool.c mm: remove spurious blkdev.h includes 2021-10-18 06:17:01 -06:00
memremap.c mm: delete __ClearPageWaiters() 2022-03-24 19:06:45 -07:00
memtest.c
migrate_device.c mm/migrate: Convert remove_migration_ptes() to folios 2022-03-21 13:01:35 -04:00
migrate.c mm/migration: add trace events for base page and HugeTLB migrations 2022-03-24 19:06:45 -07:00
mincore.c
mlock.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmap.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mmu_gather.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mprotect.c memory tiering: skip to scan fast memory 2022-03-22 15:57:09 -07:00
mremap.c mm/mremap:: use vma_lookup() instead of find_vma() 2022-03-22 15:57:05 -07:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
oom_kill.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
page_alloc.c kasan, page_alloc: allow skipping memory init for HW_TAGS 2022-03-24 19:06:47 -07:00
page_counter.c mm/page_counter: remove an incorrect call to propagate_protected_usage() 2022-01-15 16:30:27 +02:00
page_ext.c mm: make some vars and functions static or __init 2022-01-15 16:30:31 +02:00
page_idle.c mm/rmap: Constify the rmap_walk_control argument 2022-03-21 13:01:35 -04:00
page_io.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
page_isolation.c Revert "mm/page_isolation: unset migratetype directly for non Buddy page" 2022-02-04 09:25:04 -08:00
page_owner.c mm/page_owner.c: record tgid 2022-03-24 19:06:44 -07:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_table_check.c mm/page_table_check.c: use strtobool for param parsing 2022-03-22 15:57:11 -07:00
page_vma_mapped.c mm: Convert page_vma_mapped_walk to work on PFNs 2022-03-21 12:59:02 -04:00
page-writeback.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h mm: memcg/percpu: account extra objcg space to memory cgroups 2022-01-15 16:30:31 +02:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
rmap.c mm/migration: add trace events for base page and HugeTLB migrations 2022-03-24 19:06:45 -07:00
rodata_test.c
secretmem.c fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio 2022-03-16 13:37:05 -04:00
shmem.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab_common.c mm/slab_common: use helper function is_power_of_2() 2022-02-21 11:38:12 +01:00
slab.c mm: introduce kmem_cache_alloc_lru 2022-03-22 15:57:03 -07:00
slab.h mm: introduce kmem_cache_alloc_lru 2022-03-22 15:57:03 -07:00
slob.c slab updates for 5.18 2022-03-23 12:33:21 -07:00
slub.c slab updates for 5.18 2022-03-23 12:33:21 -07:00
sparse-vmemmap.c mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP 2022-03-22 15:57:08 -07:00
sparse.c mm/sparse: make mminit_validate_memmodel_limits() static 2022-03-22 15:57:05 -07:00
swap_cgroup.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
swap_slots.c treewide: Add missing includes masked by cgroup -> bpf dependency 2021-12-03 10:58:13 -08:00
swap_state.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
swap.c mm: delete __ClearPageWaiters() 2022-03-24 19:06:45 -07:00
swapfile.c userfaultfd: provide unmasked address on page-fault 2022-03-22 15:57:08 -07:00
truncate.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
usercopy.c Merge branch 'akpm' (patches from Andrew) 2022-03-22 16:11:53 -07:00
userfaultfd.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
util.c ARM: 2022-03-24 11:58:57 -07:00
vmacache.c
vmalloc.c kasan, vmalloc: only tag normal vmalloc allocations 2022-03-24 19:06:48 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
vmstat.c mm: only re-generate demotion targets when a numa node changes its N_CPU state 2022-03-22 15:57:11 -07:00
workingset.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
z3fold.c mm/z3fold: add kerneldoc fields for z3fold_pool 2021-07-01 11:06:03 -07:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c zsmalloc: replace get_cpu_var with local_lock 2022-01-22 08:33:37 +02:00
zswap.c mm/zswap.c: allow handling just same-value filled pages 2022-03-22 15:57:11 -07:00