linux/mm
Shaohua Li a128ca71fb mm: delete unnecessary TTU_* flags
Patch series "mm: fix some MADV_FREE issues", v5.

We are trying to use MADV_FREE in jemalloc.  Several issues are found.
Without solving the issues, jemalloc can't use the MADV_FREE feature.

 - Doesn't support system without swap enabled. Because if swap is off,
   we can't or can't efficiently age anonymous pages. And since
   MADV_FREE pages are mixed with other anonymous pages, we can't
   reclaim MADV_FREE pages. In current implementation, MADV_FREE will
   fallback to MADV_DONTNEED without swap enabled. But in our
   environment, a lot of machines don't enable swap. This will prevent
   our setup using MADV_FREE.

 - Increases memory pressure. page reclaim bias file pages reclaim
   against anonymous pages. This doesn't make sense for MADV_FREE pages,
   because those pages could be freed easily and refilled with very
   slight penality. Even page reclaim doesn't bias file pages, there is
   still an issue, because MADV_FREE pages and other anonymous pages are
   mixed together. To reclaim a MADV_FREE page, we probably must scan a
   lot of other anonymous pages, which is inefficient. In our test, we
   usually see oom with MADV_FREE enabled and nothing without it.

 - Accounting. There are two accounting problems. We don't have a global
   accounting. If the system is abnormal, we don't know if it's a
   problem from MADV_FREE side. The other problem is RSS accounting.
   MADV_FREE pages are accounted as normal anon pages and reclaimed
   lazily, so application's RSS becomes bigger. This confuses our
   workloads. We have monitoring daemon running and if it finds
   applications' RSS becomes abnormal, the daemon will kill the
   applications even kernel can reclaim the memory easily.

To address the first the two issues, we can either put MADV_FREE pages
into a separate LRU list (Minchan's previous patches and V1 patches), or
put them into LRU_INACTIVE_FILE list (suggested by Johannes).  The
patchset use the second idea.  The reason is LRU_INACTIVE_FILE list is
tiny nowadays and should be full of used once file pages.  So we can
still efficiently reclaim MADV_FREE pages there without interference
with other anon and active file pages.  Putting the pages into inactive
file list also has an advantage which allows page reclaim to prioritize
MADV_FREE pages and used once file pages.  MADV_FREE pages are put into
the lru list and clear SwapBacked flag, so PageAnon(page) &&
!PageSwapBacked(page) will indicate a MADV_FREE pages.  These pages will
directly freed without pageout if they are clean, otherwise normal swap
will reclaim them.

For the third issue, the previous post adds global accounting and a
separate RSS count for MADV_FREE pages.  The problem is we never get
accurate accounting for MADV_FREE pages.  The pages are mapped to
userspace, can be dirtied without notice from kernel side.  To get
accurate accounting, we could write protect the page, but then there is
extra page fault overhead, which people don't want to pay.  Jemalloc
guys have concerns about the inaccurate accounting, so this post drops
the accounting patches temporarily.  The info exported to
/proc/pid/smaps for MADV_FREE pages are kept, which is the only place we
can get accurate accounting right now.

This patch (of 6):

Johannes pointed out TTU_LZFREE is unnecessary.  It's true because we
always have the flag set if we want to do an unmap.  For cases we don't
do an unmap, the TTU_LZFREE part of code should never run.

Also the TTU_UNMAP is unnecessary.  If no other flags set (for example,
TTU_MIGRATION), an unmap is implied.

The patch includes Johannes's cleanup and dead TTU_ACTION macro removal
code

Link: http://lkml.kernel.org/r/4be3ea1bc56b26fd98a54d0a6f70bec63f6d8980.1487965799.git.shli@fb.com
Signed-off-by: Shaohua Li <shli@fb.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03 15:52:08 -07:00
..
kasan kasan: report only the first error by default 2017-03-31 17:13:30 -07:00
backing-dev.c bdi: Drop 'parent' argument from bdi_register[_va]() 2017-04-20 12:09:55 -06:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm/bootmem.c: cosmetic improvement of code readability 2017-02-22 16:41:29 -08:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma_debug.c mm: cma_alloc: allow to specify GFP mask 2017-02-24 17:46:55 -08:00
cma.c mm: cma: print allocation failure reason and bitmap status 2017-02-24 17:46:55 -08:00
cma.h
compaction.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
debug.c mm, debug: print raw struct page data in __dump_page() 2016-12-12 18:55:08 -08:00
dmapool.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED 2016-12-20 09:48:46 -08:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-02 11:18:50 -07:00
frame_vector.c mm: replace get_vaddr_frames() write/force parameters with gup_flags 2016-10-19 08:11:24 -07:00
frontswap.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
gup.c Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" 2017-04-23 11:45:20 +02:00
highmem.c mm/highmem: make nr_free_highpages() handles all highmem zones by itself 2016-05-19 19:12:14 -07:00
huge_memory.c thp: fix MADV_DONTNEED vs. MADV_FREE race 2017-04-13 18:24:21 -07:00
hugetlb_cgroup.c mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size 2016-05-20 17:58:30 -07:00
hugetlb.c mm/hugetlb.c: don't call region_abort if region_chg fails 2017-03-31 17:13:30 -07:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2016-11-22 11:49:48 -06:00
internal.h mm: delete NR_PAGES_SCANNED and pgdat_reclaimable() 2017-05-03 15:52:08 -07:00
interval_tree.c
Kconfig mm: remove AVR32 arch special handling in mm/Kconfig 2017-05-01 09:36:31 +02:00
Kconfig.debug mm: add arch-independent testcases for RODATA 2017-02-27 18:43:48 -08:00
khugepaged.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h> 2017-03-02 08:42:28 +01:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c mm: fix section name for .data..ro_after_init 2017-03-31 17:13:30 -07:00
ksm.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h> 2017-03-02 08:42:28 +01:00
list_lru.c mm/list_lru.c: avoid error-path NULL pointer deref 2016-10-27 18:43:42 -07:00
maccess.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
madvise.c userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED 2017-03-09 17:01:10 -08:00
Makefile mm: add arch-independent testcases for RODATA 2017-02-27 18:43:48 -08:00
memblock.c mm/memblock.c: fix memblock_next_valid_pfn() 2017-03-09 17:01:10 -08:00
memcontrol.c mm: do not call mem_cgroup_free() from within mem_cgroup_alloc() 2017-03-09 17:01:10 -08:00
memory_hotplug.c mm: add private lock to serialize memory hotplug operations 2017-03-16 16:56:18 -07:00
memory-failure.c mm: delete unnecessary TTU_* flags 2017-05-03 15:52:08 -07:00
memory.c Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc 2017-04-02 10:33:48 -04:00
mempolicy.c mm/mempolicy.c: fix error handling in set_mempolicy and mbind. 2017-04-08 10:57:55 -07:00
mempool.c Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" 2016-07-28 16:07:41 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c mm: remove unnecessary reclaimability check from NUMA balancing target 2017-05-03 15:52:07 -07:00
mincore.c mm: remove shmem_mapping() shmem_zero_setup() duplicates 2017-02-24 17:46:56 -08:00
mlock.c Merge branch 'prep-for-5level' 2017-03-10 08:59:07 -08:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-02 15:20:00 -08:00
mmu_context.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
mmu_notifier.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
mmzone.c mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist() 2017-02-22 16:41:29 -08:00
mprotect.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
mremap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
nommu.c Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
oom_kill.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
page_alloc.c mm: remove unnecessary back-off function when retrying page reclaim 2017-05-03 15:52:08 -07:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm/page_ext: support extra space allocation by page_ext user 2016-10-07 18:46:27 -07:00
page_idle.c mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs() 2017-02-24 17:46:55 -08:00
page_io.c writeback: add wbc_to_write_flags() 2016-11-02 10:24:03 -06:00
page_isolation.c mm, page_alloc: avoid page_to_pfn() when merging buddies 2017-02-22 16:41:27 -08:00
page_owner.c mm/page_owner: don't define fields on struct page_ext by hard-coding 2016-10-07 18:46:27 -07:00
page_poison.c mm: check the return value of lookup_page_ext for all call sites 2016-06-03 15:06:22 -07:00
page_vma_mapped.c mm: fix page_vma_mapped_walk() for ksm pages 2017-04-08 00:47:48 -07:00
page-writeback.c mm/page-writeback.c: use setup_deferrable_timer 2017-05-03 15:52:08 -07:00
pagewalk.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c percpu: remove unused chunk_alloc parameter from pcpu_get_pages() 2017-03-06 15:56:55 -05:00
percpu.c Merge branch 'sched/core' into locking/core 2017-04-04 11:31:12 +02:00
pgtable-generic.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
process_vm_access.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
quicklist.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
readahead.c mm: don't cap request size based on read-ahead setting 2016-12-12 18:55:08 -08:00
rmap.c mm: delete unnecessary TTU_* flags 2017-05-03 15:52:08 -07:00
rodata_test.c mm: add arch-independent testcases for RODATA 2017-02-27 18:43:48 -08:00
shmem.c Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
slab_common.c kasan: drain quarantine of memcg slab objects 2017-02-24 17:46:56 -08:00
slab.c slab: avoid IPIs when creating kmem caches 2017-05-03 15:52:07 -07:00
slab.h slab: remove synchronous synchronize_sched() from memcg cache deactivation path 2017-02-22 16:41:27 -08:00
slob.c slab: introduce __kmemcg_cache_deactivate() 2017-02-22 16:41:27 -08:00
slub.c slub: make sysfs directories for memcg sub-caches optional 2017-02-22 16:41:27 -08:00
sparse-vmemmap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
sparse.c mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next 2017-02-22 16:41:29 -08:00
swap_cgroup.c mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() 2017-04-08 00:47:49 -07:00
swap_slots.c mm, swap: Remove WARN_ON_ONCE() in free_swap_slot() 2017-03-21 14:13:19 -07:00
swap_state.c mm/swap: skip readahead only when swap slot cache is enabled 2017-02-22 16:41:30 -08:00
swap.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-01 23:54:56 -07:00
swapfile.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
truncate.c fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
usercopy.c mm/usercopy: Drop extra is_vmalloc_or_module() check 2017-04-05 12:30:18 -07:00
userfaultfd.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
util.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
vmacache.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
vmalloc.c A reasonably busy cycle for documentation this time around. There is a new 2017-05-02 10:21:17 -07:00
vmpressure.c mm: vmpressure: fix sending wrong events on underflow 2017-02-24 17:46:56 -08:00
vmscan.c mm: delete unnecessary TTU_* flags 2017-05-03 15:52:08 -07:00
vmstat.c mm: delete NR_PAGES_SCANNED and pgdat_reclaimable() 2017-05-03 15:52:08 -07:00
workingset.c mm: workingset: fix premature shadow node shrinking with cgroups 2017-03-31 17:13:30 -07:00
z3fold.c z3fold: fix page locking in z3fold_alloc() 2017-04-13 18:24:20 -07:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: expand class bit 2017-04-13 18:24:21 -07:00
zswap.c zswap: don't param_set_charp while holding spinlock 2017-02-27 18:43:45 -08:00