linux/mm
Johannes Weiner 42a3003535 mm: memcontrol: fix recursive statistics correctness & scalabilty
Right now, when somebody needs to know the recursive memory statistics
and events of a cgroup subtree, they need to walk the entire subtree and
sum up the counters manually.

There are two issues with this:

1. When a cgroup gets deleted, its stats are lost. The state counters
   should all be 0 at that point, of course, but the events are not.
   When this happens, the event counters, which are supposed to be
   monotonic, can go backwards in the parent cgroups.

2. During regular operation, we always have a certain number of lazily
   freed cgroups sitting around that have been deleted, have no tasks,
   but have a few cache pages remaining. These groups' statistics do not
   change until we eventually hit memory pressure, but somebody
   watching, say, memory.stat on an ancestor has to iterate those every
   time.

This patch addresses both issues by introducing recursive counters at
each level that are propagated from the write side when stats change.

Upward propagation happens when the per-cpu caches spill over into the
local atomic counter.  This is the same thing we do during charge and
uncharge, except that the latter uses atomic RMWs, which are more
expensive; stat changes happen at around the same rate.  In a sparse
file test (page faults and reclaim at maximum CPU speed) with 5 cgroup
nesting levels, perf shows __mod_memcg_page state at ~1%.

Link: http://lkml.kernel.org/r/20190412151507.2769-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 19:52:53 -07:00
..
kasan arm64 updates for 5.2 2019-05-06 17:54:22 -07:00
backing-dev.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-01-22 14:39:38 -07:00
balloon_compaction.c
cleancache.c
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
cma.c mm/cma.c: fix crash on CMA allocation if bitmap allocation fails 2019-05-14 09:47:47 -07:00
cma.h
compaction.c mm: move buddy list manipulations into helpers 2019-05-14 19:52:48 -07:00
debug_page_ref.c
debug.c mm: update references to page _refcount 2019-05-14 19:52:47 -07:00
dmapool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
early_ioremap.c
fadvise.c
failslab.c mm: no need to check return value of debugfs_create functions 2019-03-05 21:07:17 -08:00
filemap.c mm: delete find_get_entries_tag 2019-05-14 09:47:51 -07:00
frame_vector.c
frontswap.c
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
gup.c mm: introduce put_user_page*(), placeholder versions 2019-05-14 09:47:47 -07:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/mmu_notifier: convert user range->blockable to helper function 2019-05-14 09:47:49 -07:00
huge_memory.c mm/huge_memory.c: make __thp_get_unmapped_area static 2019-05-14 09:47:51 -07:00
hugetlb_cgroup.c
hugetlb.c hugetlbfs: always use address space in inode for resv_map pointer 2019-05-14 09:47:50 -07:00
hwpoison-inject.c
init-mm.c
internal.h mm, compaction: capture a page under direct compaction 2019-03-05 21:07:17 -08:00
interval_tree.c
Kconfig mm/Kconfig: update "Memory Model" help text 2019-05-14 09:47:51 -07:00
Kconfig.debug mm: remove redundant 'default n' from Kconfig-s 2019-05-14 09:47:50 -07:00
khugepaged.c mm/mmu_notifier: use correct mmu_notifier events for each invalidation 2019-05-14 09:47:49 -07:00
kmemleak-test.c
kmemleak.c Merge branch 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-05-06 13:11:48 -07:00
ksm.c mm/mmu_notifier: use correct mmu_notifier events for each invalidation 2019-05-14 09:47:49 -07:00
list_lru.c numa: make "nr_node_ids" unsigned int 2019-03-05 21:07:19 -08:00
maccess.c Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses" 2019-02-25 09:10:51 -08:00
madvise.c mm/mmu_notifier: use correct mmu_notifier events for each invalidation 2019-05-14 09:47:49 -07:00
Makefile mm: shuffle initial free memory to improve memory-side-cache utilization 2019-05-14 19:52:48 -07:00
memblock.c mm: memblock: make keeping memblock memory opt-in rather than opt-out 2019-05-14 09:47:50 -07:00
memcontrol.c mm: memcontrol: fix recursive statistics correctness & scalabilty 2019-05-14 19:52:53 -07:00
memfd.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
memory_hotplug.c mm: shuffle initial free memory to improve memory-side-cache utilization 2019-05-14 19:52:48 -07:00
memory-failure.c mm: hwpoison: fix thp split handing in soft_offline_in_use_page() 2019-03-05 21:07:13 -08:00
memory.c mm: introduce new vm_map_pages() and vm_map_pages_zero() API 2019-05-14 09:47:50 -07:00
mempolicy.c mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified 2019-03-29 10:01:37 -07:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memtest.c
migrate.c mm/mmu_notifier: use correct mmu_notifier events for each invalidation 2019-05-14 09:47:49 -07:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-14 19:52:48 -07:00
mlock.c mm: remove zone_lru_lock() function, access ->lru_lock directly 2019-03-05 21:07:21 -08:00
mm_init.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
mmap.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-04-19 09:46:05 -07:00
mmu_context.c
mmu_gather.c asm-generic/tlb: Remove tlb_table_flush() 2019-04-03 10:33:02 +02:00
mmu_notifier.c mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper 2019-05-14 09:47:49 -07:00
mmzone.c
mprotect.c mm/mprotect.c: fix compilation warning because of unused 'mm' variable 2019-05-14 09:47:51 -07:00
mremap.c mm/mmu_notifier: contextual information for event triggering invalidation 2019-05-14 09:47:49 -07:00
msync.c
nommu.c mm: introduce new vm_map_pages() and vm_map_pages_zero() API 2019-05-14 09:47:50 -07:00
oom_kill.c mm/mmu_notifier: contextual information for event triggering invalidation 2019-05-14 09:47:49 -07:00
page_alloc.c mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
page_counter.c
page_ext.c memblock: drop memblock_alloc_*_nopanic() variants 2019-03-12 10:04:02 -07:00
page_idle.c mm: remove zone_lru_lock() function, access ->lru_lock directly 2019-03-05 21:07:21 -08:00
page_io.c mm/page_io.c: fix polled swap page in 2019-01-04 13:13:48 -08:00
page_isolation.c mm/page_isolation.c: remove redundant pfn_valid_within() in __first_valid_page() 2019-05-14 09:47:46 -07:00
page_owner.c mm/page_owner: Simplify stack trace handling 2019-04-29 12:37:50 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-03-05 21:07:13 -08:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-10-31 08:54:11 -07:00
page-writeback.c mm/page-writeback: introduce tracepoint for wait_on_page_writeback() 2019-05-14 09:47:51 -07:00
pagewalk.c
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE 2019-03-13 12:25:31 -07:00
percpu-stats.c percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-vm.c
percpu.c Merge branch 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu 2019-05-13 15:34:03 -07:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
rmap.c mm/rmap.c: use the pra.mapcount to do the check 2019-05-14 09:47:49 -07:00
rodata_test.c
shmem.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
shuffle.c mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab_common.c mm: add support for kmem caches in DMA32 zone 2019-03-29 10:01:37 -07:00
slab.c mm/slab.c: fix an infinite loop in leaks_show() 2019-05-14 09:47:45 -07:00
slab.h mm: add support for kmem caches in DMA32 zone 2019-03-29 10:01:37 -07:00
slob.c slob: use slab_list instead of lru 2019-05-14 09:47:44 -07:00
slub.c mm/slub.c: update the comment about slab frozen 2019-05-14 09:47:45 -07:00
sparse-vmemmap.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sparse.c mm/sparse.c: clean up obsolete code comment 2019-05-14 09:47:48 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
swap.c mm/swap.c: __pagevec_lru_add_fn: typo fix 2019-05-14 09:47:48 -07:00
swapfile.c mm: swapoff: shmem_unuse() stop eviction without igrab() 2019-04-19 09:46:04 -07:00
truncate.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
usercopy.c mm/usercopy.c: no check page span for stack objects 2019-01-08 17:15:11 -08:00
userfaultfd.c hugetlb: use same fault hash key for shared and private mappings 2019-05-14 09:47:48 -07:00
util.c mm: fix false-positive OVERCOMMIT_GUESS failures 2019-05-14 09:47:50 -07:00
vmacache.c
vmalloc.c mm/vmalloc.c: convert vmap_lazy_nr to atomic_long_t 2019-05-14 19:52:48 -07:00
vmpressure.c
vmscan.c mm: memcontrol: make cgroup stats and events query API explicitly local 2019-05-14 19:52:53 -07:00
vmstat.c mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n 2019-04-19 09:46:04 -07:00
workingset.c mm: memcontrol: make cgroup stats and events query API explicitly local 2019-05-14 19:52:53 -07:00
z3fold.c mm/z3fold.c: support page migration 2019-05-14 09:47:50 -07:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: fix fall-through annotation 2018-10-26 16:26:35 -07:00
zswap.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00