linux/mm
Oscar Salvador a8b2c2ce89 mm,hwpoison: take free pages off the buddy freelists
The crux of the matter is that historically we left poisoned pages in the
buddy system because we have some checks in place when allocating a page
that are gatekeeper for poisoned pages.  Unfortunately, we do have other
users (e.g: compaction [1]) that scan buddy freelists and try to get a
page from there without checking whether the page is HWPoison.

As I stated already, I think it is fundamentally wrong to keep HWPoison
pages within the buddy systems, checks in place or not.

Let us fix this the same way we did for soft_offline [2], taking the page
off the buddy freelist so it is completely unreachable.

Note that this is fairly simple to trigger, as we only need to poison free
buddy pages (madvise MADV_HWPOISON) and then run some sort of memory
stress system.

Just for a matter of reference, I put a dump_page() in compaction_alloc()
to trigger for HWPoison patches:

    page:0000000012b2982b refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1d5db
    flags: 0xfffffc0800000(hwpoison)
    raw: 000fffffc0800000 ffffea00007573c8 ffffc90000857de0 0000000000000000
    raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000
    page dumped because: compaction_alloc

    CPU: 4 PID: 123 Comm: kcompactd0 Tainted: G            E     5.9.0-rc2-mm1-1-default+ #5
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
     dump_stack+0x6d/0x8b
     compaction_alloc+0xb2/0xc0
     migrate_pages+0x2a6/0x12a0
     compact_zone+0x5eb/0x11c0
     proactive_compact_node+0x89/0xf0
     kcompactd+0x2d0/0x3a0
     kthread+0x118/0x130
     ret_from_fork+0x22/0x30

After that, if e.g: a process faults in the page,  it will get killed
unexpectedly.
Fix it by containing the page immediatelly.

Besides that, two more changes can be noticed:

* MF_DELAYED no longer suits as we are fixing the issue by containing
  the page immediately, so it does no longer rely on the allocation-time
  checks to stop HWPoison to be handed over.
  gain unless it is unpoisoned, so we fixed the situation.
  Because of that, let us use MF_RECOVERED from now on.

* The second block that handles PageBuddy pages is no longer needed:
  We call shake_page and then check whether the page is Buddy
  because shake_page calls drain_all_pages, which sends pcp-pages back to
  the buddy freelists, so we could have a chance to handle free pages.
  Currently, get_hwpoison_page already calls drain_all_pages, and we call
  get_hwpoison_page right before coming here, so we should be on the safe
  side.

[1] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u
[2] https://patchwork.kernel.org/cover/11792607/

[osalvador@suse.de: take the poisoned subpage off the buddy frelists]
  Link: https://lkml.kernel.org/r/20201013144447.6706-4-osalvador@suse.de

Link: https://lkml.kernel.org/r/20201013144447.6706-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:44 -08:00
..
kasan kasan: print workqueue stack 2020-12-15 12:13:42 -08:00
backing-dev.c bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag 2020-09-24 13:43:39 -06:00
balloon_compaction.c
cleancache.c
cma_debug.c
cma.c cma: don't quit at first error when activating reserved areas 2020-08-12 10:57:57 -07:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
compaction.c mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate 2020-11-14 11:26:03 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped. 2020-10-16 11:11:14 -07:00
debug.c mm, dump_page: rename head_mapcount() --> head_compound_mapcount() 2020-10-13 18:38:29 -07:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm: memcontrol: add file_thp, shmem_thp to memory.stat 2020-12-15 12:13:39 -08:00
frame_vector.c
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup_test.c mm/gup_test.c: mark gup_test_init as __init function 2020-12-15 12:13:38 -08:00
gup_test.h selftests/vm: gup_test: introduce the dump_pages() sub-test 2020-12-15 12:13:38 -08:00
gup.c mm/gup: combine put_compound_head() and unpin_user_page() 2020-12-15 12:13:39 -08:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
hugetlb_cgroup.c hugetlb_cgroup: fix offline of hugetlb cgroup with reservations 2020-12-06 10:19:07 -08:00
hugetlb.c vm_ops: rename .split() callback to .may_split() 2020-12-15 12:13:41 -08:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mm, page_alloc: disable pcplists during memory offline 2020-12-15 12:13:43 -08:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig mm/gup_test: GUP_TEST depends on DEBUG_FS 2020-12-15 12:13:38 -08:00
Kconfig.debug
khugepaged.c mm: memcontrol: add file_thp, shmem_thp to memory.stat 2020-12-15 12:13:39 -08:00
kmemleak.c mm/kmemleak: rely on rcu for task stack scanning 2020-10-13 18:38:27 -07:00
ksm.c docs: get rid of :c:type explicit declarations for structs 2020-10-15 07:49:40 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm/madvise: remove racy mm ownership check 2020-12-08 20:57:18 -08:00
Makefile mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: enhance the kernel-doc markups 2020-12-15 12:13:41 -08:00
memblock.c arm, arm64: move free_unused_memmap() to generic mm 2020-12-15 12:13:42 -08:00
memcontrol.c mm: memcontrol: account pagetables per node 2020-12-15 12:13:40 -08:00
memfd.c
memory_hotplug.c mm, page_alloc: disable pcplists during memory offline 2020-12-15 12:13:43 -08:00
memory-failure.c mm,hwpoison: take free pages off the buddy freelists 2020-12-15 12:13:44 -08:00
memory.c mm: cleanup: remove unused tsk arg from __access_remote_vm 2020-12-15 12:13:40 -08:00
mempolicy.c mm: mempolicy: fix potential pte_unmap_unlock pte error 2020-11-02 12:14:19 -08:00
mempool.c mm/mempool: add 'else' to split mutually exclusive case 2020-10-13 18:38:34 -07:00
memremap.c mm/mremap_pages: fix static key devmap_managed_key updates 2020-11-02 12:14:18 -08:00
memtest.c
migrate.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
mmap_lock.c mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mmap.c mm: forbid splitting special mappings 2020-12-15 12:13:41 -08:00
mmu_gather.c
mmu_notifier.c mm: track mmu notifiers in fs_reclaim_acquire/release 2020-12-15 12:13:41 -08:00
mmzone.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2020-12-15 12:13:42 -08:00
mprotect.c mm: Introduce arch_validate_flags() 2020-09-04 12:46:07 +01:00
mremap.c mremap: check if it's possible to split original vma 2020-12-15 12:13:41 -08:00
msync.c
nommu.c mm: cleanup: remove unused tsk arg from __access_remote_vm 2020-12-15 12:13:40 -08:00
oom_kill.c mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-13 18:38:35 -07:00
page_alloc.c mm/page_alloc: speed up the iteration of max_order 2020-12-15 12:13:44 -08:00
page_counter.c mm/page_counter: use page_counter_read in page_counter_set_max 2020-12-15 12:13:40 -08:00
page_ext.c mm: fix page_owner initializing issue for arm32 2020-12-15 12:13:38 -08:00
page_idle.c
page_io.c mm/page_io.c: remove useless out label in __swap_writepage() 2020-10-13 18:38:30 -07:00
page_isolation.c mm, page_alloc: disable pcplists during memory offline 2020-12-15 12:13:43 -08:00
page_owner.c mm/page_owner: record timestamp and pid 2020-12-15 12:13:38 -08:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h
page_vma_mapped.c mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte 2020-12-15 12:13:41 -08:00
page-writeback.c mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback) 2020-11-24 15:23:19 -08:00
pagewalk.c
percpu-internal.h mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: convert flexible array initializers to use struct_size() 2020-10-30 23:02:28 +00:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c
process_vm_access.c mm/process_vm_access: Add missing #include <linux/compat.h> 2020-10-27 12:41:29 -07:00
ptdump.c
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c mm: memcontrol: add file_thp, shmem_thp to memory.stat 2020-12-15 12:13:39 -08:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab_common.c mm: slab: clarify krealloc()'s behavior with __GFP_ZERO 2020-12-15 12:13:37 -08:00
slab.c mm: introduce debug_pagealloc_{map,unmap}_pages() helpers 2020-12-15 12:13:43 -08:00
slab.h mm: extract might_alloc() debug check 2020-12-15 12:13:41 -08:00
slob.c mm: extract might_alloc() debug check 2020-12-15 12:13:41 -08:00
slub.c mm/slub: let number of online CPUs determine the slub page order 2020-12-15 12:13:38 -08:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG 2020-10-16 11:11:18 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c mm/swap_state: skip meaningless swap cache readahead when ra_info.win == 0 2020-12-15 12:13:39 -08:00
swap.c mm: remove pagevec_lookup_range_nr_tag() 2020-12-15 12:13:39 -08:00
swapfile.c mm/swapfile.c: use memset to fill the swap_map with SWAP_HAS_CACHE 2020-12-15 12:13:39 -08:00
truncate.c mm/truncate: add parameter explanation for invalidate_mapping_pagevec 2020-12-15 12:13:38 -08:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm/vmscan: protect the workingset on anonymous LRU 2020-08-12 10:57:55 -07:00
util.c mm/util.c: update the kerneldoc for kstrdup_const() 2020-10-16 11:11:17 -07:00
vmacache.c
vmalloc.c mm/vmalloc.c: fix kasan shadow poisoning size 2020-12-15 12:13:42 -08:00
vmpressure.c
vmscan.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
vmstat.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2020-12-15 12:13:42 -08:00
workingset.c mm: memcg/slab: rename *_lruvec_slab_state to *_lruvec_kmem_state 2020-12-15 12:13:40 -08:00
z3fold.c mm/z3fold.c: use xx_zalloc instead xx_alloc and memset 2020-10-13 18:38:34 -07:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
zswap.c