linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-14 16:12:02 +00:00


Huang Ying c79b57e462 mm: hugetlb: clear target sub-page last when clearing huge page

Huge pages help reduce the TLB miss rate, but they have a larger cache footprint, which can sometimes cause problems. For example, clearing a huge page on the x86_64 platform has a 2M cache footprint. A Xeon E5 v3 2699 CPU has 18 cores and 36 threads but only 45M of LLC (last-level cache), so on average there is 2.5M of LLC per core and 1.25M per thread.

If cache pressure is heavy while a huge page is being cleared, and the huge page is cleared from beginning to end, the beginning of the huge page may already have been evicted from the cache by the time the end has been cleared. Yet the application may well access the beginning of the huge page right after it has been cleared.

To help in this situation, this patch changes the order in which sub-pages are cleared. In many situations, for example in a page fault handler, we know the address the application will access after the huge page is cleared. Instead of clearing the huge page from beginning to end, we clear the sub-pages farthest from the target sub-page first and clear the target sub-page last. This makes the target sub-page the most cache-hot, and the sub-pages around it more cache-hot too. If the address the application will access is unknown, the beginning of the huge page is assumed to be that address.

With this patch, throughput increases by ~28.3% in the vm-scalability anon-w-seq test case with 72 processes on a 2-socket Xeon E5 v3 2699 system (36 cores, 72 threads). The test case creates 72 processes; each process mmaps a large anonymous memory area and writes to it from beginning to end. For each process, the other processes can be seen as another workload that generates heavy cache pressure. At the same time, the cache miss rate dropped from ~33.4% to ~31.7%, IPC (instructions per cycle) increased from 0.56 to 0.74, and the time spent in user space dropped by ~7.9%.

Christopher Lameter suggested also clearing the bytes inside a sub-page from end to beginning, but tests showed no visible performance difference, possibly because a page is small compared with the cache size.

Thanks to Andi Kleen for proposing to use the address to be accessed to determine the order in which sub-pages are cleared.

The hugetlbfs access address could be improved; that will be done in another patch.

[ying.huang@intel.com: improve readability of clear_huge_page()]
Link: http://lkml.kernel.org/r/20170830051842.1397-1-ying.huang@intel.com
Link: http://lkml.kernel.org/r/20170815014618.15842-1-ying.huang@intel.com
Suggested-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:30 -07:00
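The clearing order described in this commit can be sketched in userspace C. This is a hedged simulation, not the kernel's `clear_huge_page()`. It assumes 4 KiB sub-pages, a huge page naturally aligned to its own size with a power-of-two sub-page count, and it uses `memset()` as a stand-in for the kernel's per-sub-page clearing. The names `compute_clear_order()`, `target_subpage()`, `clear_huge_buffer()` and `SKETCH_PAGE_SIZE` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: assume 4 KiB base pages (sub-pages). */
#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical helper: index of the sub-page containing addr_hint.
 * Assumes the huge page is naturally aligned to nr * SKETCH_PAGE_SIZE
 * and that nr is a power of two.
 */
static int target_subpage(unsigned long addr_hint, int nr)
{
	unsigned long huge_size = (unsigned long)nr * SKETCH_PAGE_SIZE;
	unsigned long start = addr_hint & ~(huge_size - 1);

	return (int)((addr_hint - start) / SKETCH_PAGE_SIZE);
}

/*
 * Hypothetical helper: fill order[0..nr-1] with sub-page indices in the
 * order they are cleared, so that `target` is cleared last and its
 * neighbours shortly before it.
 */
static void compute_clear_order(int nr, int target, int *order)
{
	int pos = 0, base, l, i;

	assert(nr > 0 && target >= 0 && target < nr);

	if (2 * target <= nr) {
		/* Target in the first half: clear the far tail first, end to start. */
		base = 0;
		l = target;
		for (i = nr - 1; i >= 2 * target; i--)
			order[pos++] = i;
	} else {
		/* Target in the second half: clear the far head first. */
		base = 2 * target - nr;
		l = nr - target;
		for (i = 0; i < base; i++)
			order[pos++] = i;
	}

	/* Close in on the target from both edges of the remaining window. */
	for (i = 0; i < l; i++) {
		order[pos++] = base + i;
		order[pos++] = base + 2 * l - 1 - i;
	}
}

/*
 * Hypothetical helper: zero a buffer of nr sub-pages in that order.
 * `buf` stands in for the huge page; `order` is caller-provided scratch
 * space of nr ints.
 */
static void clear_huge_buffer(unsigned char *buf, int nr, int target,
			      int *order)
{
	int i;

	compute_clear_order(nr, target, order);
	for (i = 0; i < nr; i++)
		memset(buf + (size_t)order[i] * SKETCH_PAGE_SIZE, 0,
		       SKETCH_PAGE_SIZE);
}
```

For a huge page of 8 sub-pages with target sub-page 3, this sketch clears sub-pages in the order 7, 6, 0, 5, 1, 4, 2, 3. The sub-pages farthest from the target go first, and the window around the target is then cleared alternately from both edges inward, so the target and its nearest neighbours are the most recently written and most likely still cached. Passing target 0 reproduces the commit's fallback when the access address is unknown: the buffer is cleared from the end down to the beginning.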
kasan	Merge branch 'linus' into locking/core, to pick up fixes	2017-08-10 12:20:53 +02:00
backing-dev.c	bdi: Drop 'parent' argument from bdi_register[_va]()	2017-04-20 12:09:55 -06:00
balloon_compaction.c	mm/balloon_compaction.c: don't zero ballooned pages	2017-08-10 15:54:07 -07:00
bootmem.c	mm/bootmem.c: cosmetic improvement of code readability	2017-02-22 16:41:29 -08:00
cleancache.c	fs: switch ->s_uuid to uuid_t	2017-06-05 16:59:12 +02:00
cma_debug.c	mm/cma_debug.c: fix stack corruption due to sprintf usage	2017-08-18 15:32:02 -07:00
cma.c	cma: fix calculation of aligned offset	2017-07-10 16:32:32 -07:00
cma.h	cma: Store a name in the cma structure	2017-04-18 20:41:12 +02:00
compaction.c	mm, compaction: skip over holes in __reset_isolation_suitable	2017-07-06 16:24:32 -07:00
debug_page_ref.c	mm/page_ref: add tracepoint to track down page reference manipulation	2016-03-17 15:09:34 -07:00
debug.c	mm: make tlb_flush_pending global	2017-08-10 15:54:07 -07:00
dmapool.c	lib/vsprintf.c: remove %Z support	2017-02-27 18:43:47 -08:00
early_ioremap.c	x86/mm: Add support to access boot related data in the clear	2017-07-18 11:38:02 +02:00
fadvise.c	mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED	2016-12-20 09:48:46 -08:00
failslab.c
filemap.c	mm: use find_get_pages_range() in filemap_range_has_page()	2017-09-06 17:27:27 -07:00
frame_vector.c	treewide: use kv[mz]alloc* rather than opencoded variants	2017-05-08 17:15:13 -07:00
frontswap.c	mm, frontswap: convert frontswap_enabled to static key	2016-07-26 16:19:19 -07:00
gup.c	mm/gup: make __gup_device_* require THP	2017-09-06 17:27:26 -07:00
highmem.c	mm/highmem: make nr_free_highpages() handles all highmem zones by itself	2016-05-19 19:12:14 -07:00
huge_memory.c	mm: hugetlb: clear target sub-page last when clearing huge page	2017-09-06 17:27:30 -07:00
hugetlb_cgroup.c	mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size	2016-05-20 17:58:30 -07:00
hugetlb.c	mm, hugetlb: do not allocate non-migrateable gigantic pages from movable zones	2017-09-06 17:27:29 -07:00
hwpoison-inject.c	mm: hwpoison: call shake_page() unconditionally	2017-05-03 15:52:12 -07:00
init-mm.c	mm: Add a user_ns owner to mm_struct and fix ptrace permission checks	2016-11-22 11:49:48 -06:00
internal.h	mm, oom: do not rely on TIF_MEMDIE for memory reserves access	2017-09-06 17:27:30 -07:00
interval_tree.c
Kconfig	mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups	2017-09-06 17:27:29 -07:00
Kconfig.debug	mm: enable page poisoning early at boot	2017-05-03 15:52:10 -07:00
khugepaged.c	mm: make PR_SET_THP_DISABLE immediately active	2017-07-10 16:32:31 -07:00
kmemcheck.c	mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU	2017-04-18 11:42:36 -07:00
kmemleak-test.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak.c	mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects	2017-07-06 16:24:34 -07:00
ksm.c	mm/ksm.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
list_lru.c	mm/list_lru.c: fix list_lru_count_node() to be race free	2017-07-10 16:32:33 -07:00
maccess.c	x86: remove more uaccess_32.h complexity	2016-05-22 17:21:27 -07:00
madvise.c	mm, madvise: ensure poisoned pages are removed from per-cpu lists	2017-08-31 16:33:15 -07:00
Makefile	percpu: expose statistics about percpu memory via debugfs	2017-06-20 15:31:38 -04:00
memblock.c	mm/memblock.c: reversed logic in memblock_discard()	2017-08-25 16:12:46 -07:00
memcontrol.c	mm: replace TIF_MEMDIE checks by tsk_is_oom_victim	2017-09-06 17:27:30 -07:00
memory_hotplug.c	mm, memory_hotplug: get rid of zonelists_mutex	2017-09-06 17:27:26 -07:00
memory-failure.c	x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages	2017-08-17 10:30:49 +02:00
memory.c	mm: hugetlb: clear target sub-page last when clearing huge page	2017-09-06 17:27:30 -07:00
mempolicy.c	mm/mempolicy: fix use after free when calling get_mempolicy	2017-08-18 15:32:02 -07:00
mempool.c	sched/wait: Rename wait_queue_t => wait_queue_entry_t	2017-06-20 12:18:27 +02:00
memtest.c
migrate.c	Sanitize 'move_pages()' permission checks	2017-08-20 13:26:27 -07:00
mincore.c	mm: remove shmem_mapping() shmem_zero_setup() duplicates	2017-02-24 17:46:56 -08:00
mlock.c	mlock: fix mlock count can not decrease in race condition	2017-06-02 15:07:38 -07:00
mm_init.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
mmap.c	mm: oom: let oom_reap_task and exit_mmap run concurrently	2017-09-06 17:27:30 -07:00
mmu_context.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
mmu_notifier.c	mm/mmu_notifier: kill invalidate_page	2017-08-31 16:13:00 -07:00
mmzone.c	mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist()	2017-02-22 16:41:29 -08:00
mprotect.c	mm: migrate: prevent racy access to tlb_flush_pending	2017-08-10 15:54:07 -07:00
mremap.c	mm/mremap: fail map duplication attempts for private mappings	2017-09-06 17:27:26 -07:00
msync.c
nobootmem.c	mm: discard memblock data later	2017-08-18 15:32:01 -07:00
nommu.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
oom_kill.c	mm: oom: let oom_reap_task and exit_mmap run concurrently	2017-09-06 17:27:30 -07:00
page_alloc.c	mm, oom: do not rely on TIF_MEMDIE for memory reserves access	2017-09-06 17:27:30 -07:00
page_counter.c
page_ext.c	mm, page_ext: periodically reschedule during page_ext_init()	2017-09-06 17:27:26 -07:00
page_idle.c	mm/page_idle.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
page_io.c	mm: test code to write THP to swap device as a whole	2017-09-06 17:27:28 -07:00
page_isolation.c	mm: unify new_node_page and alloc_migrate_target	2017-07-10 16:32:31 -07:00
page_owner.c	mm, page_owner: don't grab zone->lock for init_pages_in_zone()	2017-09-06 17:27:26 -07:00
page_poison.c	mm: enable page poisoning early at boot	2017-05-03 15:52:10 -07:00
page_vma_mapped.c	mm/hugetlb: add size parameter to huge_pte_offset()	2017-07-06 16:24:34 -07:00
page-writeback.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
pagewalk.c	mm/hugetlb: add size parameter to huge_pte_offset()	2017-07-06 16:24:34 -07:00
percpu-internal.h	percpu: fix early calls for spinlock in pcpu_stats	2017-06-21 13:53:52 -04:00
percpu-km.c	percpu: fix static checker warnings in pcpu_destroy_chunk	2017-06-29 11:23:38 -04:00
percpu-stats.c	percpu: expose statistics about percpu memory via debugfs	2017-06-20 15:31:38 -04:00
percpu-vm.c	percpu: fix static checker warnings in pcpu_destroy_chunk	2017-06-29 11:23:38 -04:00
percpu.c	percpu: resolve err may not be initialized in pcpu_alloc	2017-06-21 12:00:45 -04:00
pgtable-generic.c	mm: convert generic code to 5-level paging	2017-03-09 11:48:47 -08:00
process_vm_access.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>	2017-03-02 08:42:28 +01:00
quicklist.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
readahead.c	mm: don't cap request size based on read-ahead setting	2016-12-12 18:55:08 -08:00
rmap.c	mm/rmap: update to new mmu_notifier semantic v2	2017-08-31 16:12:59 -07:00
rodata_test.c	mm: remove rodata_test_data export, add pr_fmt	2017-05-03 15:52:09 -07:00
shmem.c	mm, swap: VMA based swap readahead	2017-09-06 17:27:29 -07:00
slab_common.c	mm: allow slab_nomerge to be set at build time	2017-07-06 16:24:31 -07:00
slab.c	mm: memcontrol: account slab stats per lruvec	2017-07-06 16:24:35 -07:00
slab.h	locking/lockdep: Rework FS_RECLAIM annotation	2017-08-10 12:29:03 +02:00
slob.c	locking/lockdep: Rework FS_RECLAIM annotation	2017-08-10 12:29:03 +02:00
slub.c	mm/slub.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
sparse-vmemmap.c	mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations	2017-09-06 17:27:26 -07:00
sparse.c	mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations	2017-09-06 17:27:26 -07:00
swap_cgroup.c	mm, THP, swap: delay splitting THP during swap out	2017-07-06 16:24:31 -07:00
swap_slots.c	mm/swap_slots.c: don't disable preemption while taking the per-CPU cache	2017-07-10 16:32:32 -07:00
swap_state.c	mm, swap: add sysfs interface for VMA based swap readahead	2017-09-06 17:27:29 -07:00
swap.c	mm: remove nr_pages argument from pagevec_lookup{,_range}()	2017-09-06 17:27:27 -07:00
swapfile.c	swap: choose swap device according to numa node	2017-09-06 17:27:30 -07:00
truncate.c	mm/truncate.c: fix THP handling in invalidate_mapping_pages()	2017-07-10 16:32:32 -07:00
usercopy.c	mm/usercopy: Drop extra is_vmalloc_or_module() check	2017-04-05 12:30:18 -07:00
userfaultfd.c	userfaultfd: shmem: wire up shmem_mfill_zeropage_pte	2017-09-06 17:27:28 -07:00
util.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
vmacache.c	sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
vmalloc.c	mm/vmalloc.c: don't reinvent the wheel but use existing llist API	2017-09-06 17:27:29 -07:00
vmpressure.c	mm, vmpressure: pass-through notification support	2017-07-10 16:32:31 -07:00
vmscan.c	mm, THP, swap: add THP swapping out fallback counting	2017-09-06 17:27:28 -07:00
vmstat.c	mm, swap: add swap readahead hit statistics	2017-09-06 17:27:29 -07:00
workingset.c	mm: memcontrol: per-lruvec stats infrastructure	2017-07-06 16:24:35 -07:00
z3fold.c	z3fold: use per-cpu unbuddied lists	2017-09-06 17:27:30 -07:00
zbud.c
zpool.c
zsmalloc.c	zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse	2017-09-06 17:27:26 -07:00
zswap.c	mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()	2017-07-06 16:24:35 -07:00