linux/mm
Gerald Schaefer 2247bb335a mm/hugetlb: fix memory offline with hugepage size > memory block size
Patch series "mm/hugetlb: memory offline issues with hugepages", v4.

This addresses several issues with hugepages and memory offline.  While
the first patch fixes a panic, and is therefore rather important, the
last patch is just a performance optimization.

The second patch fixes a theoretical issue with reserved hugepages,
while still leaving some ugly usability issue, see description.

This patch (of 3):

dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a "gigantic"
hugetlb page with a size > memory block size.

When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly.  In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not work on the
head page of the compound hugetlb page which will result in a NULL
hstate from page_hstate().

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page().  This means
that an unused pre-allocated gigantic page that has any part of itself
inside the memory block that is going offline will be dissolved
completely.  Losing an unused gigantic hugepage is preferable to failing
the memory offline, for example in the situation where a (possibly
faulty) memory DIMM needs to go offline.

Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Link: http://lkml.kernel.org/r/20160926172811.94033-2-gerald.schaefer@de.ibm.com
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rui Teng <rui.teng@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00
..
kasan kasan: remove the unnecessary WARN_ONCE from quarantine.c 2016-08-11 16:58:14 -07:00
backing-dev.c block: fix bdi vs gendisk lifetime mismatch 2016-08-04 14:19:16 -06:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm/bootmem.c: replace kzalloc() by kzalloc_node() 2016-10-07 18:46:28 -07:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma_debug.c
cma.c mm/cma: silence warnings due to max() usage 2016-05-27 14:49:37 -07:00
cma.h
compaction.c mm, compaction: restrict fragindex to costly orders 2016-10-07 18:46:29 -07:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
debug.c mm: avoid endless recursion in dump_page() 2016-09-19 15:36:16 -07:00
dmapool.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED 2016-06-09 14:23:11 -07:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c do_generic_file_read(): fail immediately if killed 2016-10-07 18:46:27 -07:00
frame_vector.c mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm 2016-02-16 10:11:12 +01:00
frontswap.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
gup.c - ARM: GICv3 ITS emulation and various fixes. Removal of the old 2016-08-02 16:11:27 -04:00
highmem.c mm/highmem: make nr_free_highpages() handles all highmem zones by itself 2016-05-19 19:12:14 -07:00
huge_memory.c thp: reduce usage of huge zero page's atomic counter 2016-10-07 18:46:28 -07:00
hugetlb_cgroup.c mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size 2016-05-20 17:58:30 -07:00
hugetlb.c mm/hugetlb: fix memory offline with hugepage size > memory block size 2016-10-07 18:46:29 -07:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm, compaction: make full priority ignore pageblock suitability 2016-10-07 18:46:29 -07:00
interval_tree.c
Kconfig mm: clarify COMPACTION Kconfig text 2016-08-26 17:39:35 -07:00
Kconfig.debug PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO 2016-09-13 02:35:27 +02:00
khugepaged.c mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin() 2016-09-19 15:36:16 -07:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c kmemleak: don't hang if user disables scanning early 2016-07-28 16:07:41 -07:00
ksm.c mm,ksm: fix endless looping in allocating memory when ksm enable 2016-09-28 16:19:01 -07:00
list_lru.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
maccess.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
madvise.c mm: make mmap_sem for write waits killable for mm syscalls 2016-05-23 17:04:14 -07:00
Makefile Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
memblock.c mm/memblock.c: expose total reserved memory 2016-10-07 18:46:28 -07:00
memcontrol.c mm: memcontrol: consolidate cgroup socket tracking 2016-10-07 18:46:29 -07:00
memory_hotplug.c mem-hotplug: use nodes that contain memory as mask in new_node_page() 2016-09-28 16:19:02 -07:00
memory-failure.c mm: hwpoison: remove incorrect comments 2016-07-28 16:07:41 -07:00
memory.c mm: fix cache mode tracking in vm_insert_mixed() 2016-10-07 18:46:28 -07:00
mempolicy.c mm: use zonelist name instead of using hardcoded index 2016-10-07 18:46:28 -07:00
mempool.c Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" 2016-07-28 16:07:41 -07:00
memtest.c
migrate.c mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations 2016-07-28 16:07:41 -07:00
mincore.c mm, swap: use offset of swap entry as key of swap cache 2016-10-07 18:46:28 -07:00
mlock.c mm: mlock: avoid increase mm->locked_vm on mlock() when already mlock2(,MLOCK_ONFAULT) 2016-10-07 18:46:28 -07:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 17:29:01 -07:00
mmu_context.c mm/mmu_context, sched/core: Fix mmu_context.h assumption 2016-04-28 11:44:19 +02:00
mmu_notifier.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
mmzone.c mm, page_alloc: inline the fast path of the zonelist iterator 2016-05-19 19:12:14 -07:00
mprotect.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
mremap.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c mm: nobootmem: move the comment of free_all_bootmem 2016-10-07 18:46:29 -07:00
nommu.c mm: introduce fault_env 2016-07-26 16:19:19 -07:00
oom_kill.c mm: don't emit warning from pagefault_out_of_memory() 2016-10-07 18:46:29 -07:00
page_alloc.c mm, page_alloc: pull no_progress_loops update to should_reclaim_retry() 2016-10-07 18:46:29 -07:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm/page_ext: support extra space allocation by page_ext user 2016-10-07 18:46:27 -07:00
page_idle.c mm, vmscan: move lru_lock to the node 2016-07-28 16:07:41 -07:00
page_io.c mm/page_io.c: replace some BUG_ON()s with VM_BUG_ON_PAGE() 2016-10-07 18:46:29 -07:00
page_isolation.c mm/page_isolation: clean up confused code 2016-07-26 16:19:19 -07:00
page_owner.c mm/page_owner: don't define fields on struct page_ext by hard-coding 2016-10-07 18:46:27 -07:00
page_poison.c mm: check the return value of lookup_page_ext for all call sites 2016-06-03 15:06:22 -07:00
page-writeback.c mm: don't use radix tree writeback tags for pages in swap cache 2016-10-07 18:46:28 -07:00
pagewalk.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c
percpu.c percpu: fix synchronization between synchronous map extension and chunk destruction 2016-05-25 11:48:25 -04:00
pgtable-generic.c mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range 2016-03-17 15:09:34 -07:00
process_vm_access.c mm/gup: Introduce get_user_pages_remote() 2016-02-16 10:04:09 +01:00
quicklist.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
readahead.c mm: silently skip readahead for DAX inodes 2016-08-26 17:39:35 -07:00
rmap.c rmap: fix compound check logic in page_remove_file_rmap 2016-08-10 16:40:56 -07:00
shmem.c mm/shmem.c: constify anon_ops 2016-10-07 18:46:29 -07:00
slab_common.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
slab.c slab: Convert to hotplug state machine 2016-09-06 18:30:20 +02:00
slab.h mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB 2016-07-28 16:07:41 -07:00
slob.c mm: slab: free kmem_cache_node after destroy sysfs file 2016-02-18 16:23:24 -08:00
slub.c slub: Convert to hotplug state machine 2016-09-06 18:30:20 +02:00
sparse-vmemmap.c treewide: replace obsolete _refok by __ref 2016-08-02 17:31:41 -04:00
sparse.c treewide: replace obsolete _refok by __ref 2016-08-02 17:31:41 -04:00
swap_cgroup.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap_state.c mm, swap: use offset of swap entry as key of swap cache 2016-10-07 18:46:28 -07:00
swap.c thp: reduce usage of huge zero page's atomic counter 2016-10-07 18:46:28 -07:00
swapfile.c mm, swap: use offset of swap entry as key of swap cache 2016-10-07 18:46:28 -07:00
truncate.c truncate: handle file thp 2016-07-26 16:19:19 -07:00
usercopy.c mm: usercopy: Check for module addresses 2016-09-20 16:07:39 -07:00
userfaultfd.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
util.c mm: move most file-based accounting to the node 2016-07-28 16:07:41 -07:00
vmacache.c mm: unrig VMA cache hit ratio 2016-10-07 18:46:27 -07:00
vmalloc.c mm/vmalloc.c: fix align value calculation error 2016-10-07 18:46:26 -07:00
vmpressure.c mm/vmpressure.c: fix subtree pressure detection 2016-02-03 08:28:43 -08:00
vmscan.c mm: use zonelist name instead of using hardcoded index 2016-10-07 18:46:28 -07:00
vmstat.c cpu: fix node state for whether it contains CPU 2016-10-07 18:46:28 -07:00
workingset.c mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() 2016-09-30 15:26:52 -07:00
z3fold.c mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup 2016-06-03 16:02:55 -07:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: Delete an unnecessary check before the function call "iput" 2016-07-28 16:07:41 -07:00
zswap.c mm/zswap: use workqueue to destroy pool 2016-05-20 17:58:30 -07:00