linux/mm
Vladimir Davydov 4949148ad4 mm: charge/uncharge kmemcg from generic page allocator paths
Currently, to charge a non-slab allocation to kmemcg one has to use
alloc_kmem_pages helper with __GFP_ACCOUNT flag.  A page allocated with
this helper should finally be freed using free_kmem_pages, otherwise it
won't be uncharged.

This API suits its current users fine, but it turns out to be impossible
to use along with page reference counting, i.e.  when an allocation is
supposed to be freed with put_page, as it is the case with pipe or unix
socket buffers.

To overcome this limitation, this patch moves charging/uncharging to
generic page allocator paths, i.e.  to __alloc_pages_nodemask and
free_pages_prepare, and zaps alloc/free_kmem_pages helpers.  This way,
one can use any of the available page allocation functions to get the
allocated page charged to kmemcg - it's enough to pass __GFP_ACCOUNT,
just like in case of kmalloc and friends.  A charged page will be
automatically uncharged on free.

To make it possible, we need to mark pages charged to kmemcg somehow.
To avoid introducing a new page flag, we make use of page->_mapcount for
marking such pages.  Since pages charged to kmemcg are not supposed to
be mapped to userspace, it should work just fine.  There are other
(ab)users of page->_mapcount - buddy and balloon pages - but we don't
conflict with them.

In case kmemcg is compiled out or not used at runtime, this patch
introduces no overhead to generic page allocator paths.  If kmemcg is
used, it will be plus one gfp flags check on alloc and plus one
page->_mapcount check on free, which shouldn't hurt performance, because
the data accessed are hot.

Link: http://lkml.kernel.org/r/a9736d856f895bcb465d9f257b54efe32eda6f99.1464079538.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
..
kasan kasan/quarantine: fix bugs on qlist_move_cache() 2016-07-15 14:54:27 +09:00
backing-dev.c mm: throttle on IO only when there are too many dirty and writeback pages 2016-05-20 17:58:30 -07:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma_debug.c
cma.c mm/cma: silence warnings due to max() usage 2016-05-27 14:49:37 -07:00
cma.h
compaction.c mm/page_alloc: introduce post allocation processing on page allocator 2016-07-26 16:19:19 -07:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
debug.c mm: introduce page reference manipulation functions 2016-03-17 15:09:34 -07:00
dmapool.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
early_ioremap.c
fadvise.c mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED 2016-06-09 14:23:11 -07:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c Revert "mm: make faultaround produce old ptes" 2016-06-24 17:23:52 -07:00
frame_vector.c mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm 2016-02-16 10:11:12 +01:00
frontswap.c
gup.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
highmem.c mm/highmem: make nr_free_highpages() handles all highmem zones by itself 2016-05-19 19:12:14 -07:00
huge_memory.c mm/mmu_gather: track page size with mmu gather and force flush if page size change 2016-07-26 16:19:19 -07:00
hugetlb_cgroup.c mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size 2016-05-20 17:58:30 -07:00
hugetlb.c mm/mmu_gather: track page size with mmu gather and force flush if page size change 2016-07-26 16:19:19 -07:00
hwpoison-inject.c
init-mm.c
internal.h mm/page_alloc: introduce post allocation processing on page allocator 2016-07-26 16:19:19 -07:00
interval_tree.c
Kconfig mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM 2016-05-27 14:49:37 -07:00
Kconfig.debug mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c mm: prevent KASAN false positives in kmemleak 2016-06-24 17:23:52 -07:00
ksm.c mm: migrate: support non-lru movable page migration 2016-07-26 16:19:19 -07:00
list_lru.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
maccess.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
madvise.c mm: make mmap_sem for write waits killable for mm syscalls 2016-05-23 17:04:14 -07:00
Makefile z3fold: the 3-fold allocator for compressed pages 2016-05-20 17:58:30 -07:00
memblock.c mm/memblock.c: remove unnecessary always-true comparison 2016-05-20 17:58:30 -07:00
memcontrol.c mm: memcontrol: cleanup kmem charge functions 2016-07-26 16:19:19 -07:00
memory_hotplug.c memory-hotplug: more general validation of zone during online 2016-07-26 16:19:19 -07:00
memory-failure.c mm/memory-failure.c: replace "MCE" with "Memory failure" 2016-05-20 17:58:30 -07:00
memory.c mm/mmu_gather: track page size with mmu gather and force flush if page size change 2016-07-26 16:19:19 -07:00
mempolicy.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
mempool.c mm: mempool: kasan: don't poot mempool objects in quarantine 2016-06-24 17:23:52 -07:00
memtest.c
migrate.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
mincore.c mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
mlock.c mm: make mmap_sem for write waits killable for mm syscalls 2016-05-23 17:04:14 -07:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c x86/vdso: Add mremap hook to vm_special_mapping 2016-07-08 14:17:51 +02:00
mmu_context.c mm/mmu_context, sched/core: Fix mmu_context.h assumption 2016-04-28 11:44:19 +02:00
mmu_notifier.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
mmzone.c mm, page_alloc: inline the fast path of the zonelist iterator 2016-05-19 19:12:14 -07:00
mprotect.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
mremap.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
msync.c
nobootmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
nommu.c mm: remove more IS_ERR_VALUE abuses 2016-05-27 15:57:31 -07:00
oom_kill.c mm: oom: add memcg to oom_control 2016-07-26 16:19:19 -07:00
page_alloc.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
page_counter.c
page_ext.c mm: use early_pfn_to_nid in page_ext_init 2016-05-27 14:49:37 -07:00
page_idle.c mm: add page_check_address_transhuge() helper 2016-01-15 17:56:32 -08:00
page_io.c Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
page_isolation.c mm/page_isolation: clean up confused code 2016-07-26 16:19:19 -07:00
page_owner.c mm/page_owner: use stackdepot to store stacktrace 2016-07-26 16:19:19 -07:00
page_poison.c mm: check the return value of lookup_page_ext for all call sites 2016-06-03 15:06:22 -07:00
page-writeback.c fs/fs-writeback.c: add a new writeback list for sync 2016-07-26 16:19:19 -07:00
pagewalk.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c
percpu.c percpu: fix synchronization between synchronous map extension and chunk destruction 2016-05-25 11:48:25 -04:00
pgtable-generic.c mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range 2016-03-17 15:09:34 -07:00
process_vm_access.c mm/gup: Introduce get_user_pages_remote() 2016-02-16 10:04:09 +01:00
quicklist.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
readahead.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
rmap.c mm: thp: refix false positive BUG in page_move_anon_rmap() 2016-07-15 14:54:27 +09:00
shmem.c tmpfs: fix regression hang in fallocate undo 2016-07-10 20:08:44 -07:00
slab_common.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
slab.c mm/slab: use list_move instead of list_del/list_add 2016-07-26 16:19:19 -07:00
slab.h mm: memcontrol: cleanup kmem charge functions 2016-07-26 16:19:19 -07:00
slob.c mm: slab: free kmem_cache_node after destroy sysfs file 2016-02-18 16:23:24 -08:00
slub.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
sparse-vmemmap.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
sparse.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap_cgroup.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap_state.c mm: thp: broken page count after commit aa88b68c3b 2016-06-09 14:23:11 -07:00
swap.c mm/swap.c: flush lru pvecs on compound page arrival 2016-06-24 17:23:52 -07:00
swapfile.c mm: thp: calculate the mapcount correctly for THP pages during WP faults 2016-05-12 15:52:50 -07:00
truncate.c dax: New fault locking 2016-05-19 15:20:54 -06:00
userfaultfd.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
util.c mm: migrate: support non-lru movable page migration 2016-07-26 16:19:19 -07:00
vmacache.c
vmalloc.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
vmpressure.c mm/vmpressure.c: fix subtree pressure detection 2016-02-03 08:28:43 -08:00
vmscan.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
vmstat.c mm: check the return value of lookup_page_ext for all call sites 2016-06-03 15:06:22 -07:00
workingset.c mm: workingset: printk missing log level, use pr_info() 2016-07-15 14:54:27 +09:00
z3fold.c mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup 2016-06-03 16:02:55 -07:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c
zsmalloc.c zsmalloc: use OBJ_TAG_BIT for bit shifter 2016-07-26 16:19:19 -07:00
zswap.c mm/zswap: use workqueue to destroy pool 2016-05-20 17:58:30 -07:00