linux/mm
Johannes Weiner d7365e783e mm: memcontrol: fix missed end-writeback page accounting
Commit 0a31bc97c8 ("mm: memcontrol: rewrite uncharge API") changed
page migration to uncharge the old page right away.  The page is locked,
unmapped, truncated, and off the LRU, but it could race with writeback
ending, which then doesn't unaccount the page properly:

test_clear_page_writeback()              migration
                                           wait_on_page_writeback()
  TestClearPageWriteback()
                                           mem_cgroup_migrate()
                                             clear PCG_USED
  mem_cgroup_update_page_stat()
    if (PageCgroupUsed(pc))
      decrease memcg pages under writeback

  release pc->mem_cgroup->move_lock

The per-page statistics interface is heavily optimized to avoid a
function call and a lookup_page_cgroup() in the file unmap fast path,
which means it doesn't verify whether a page is still charged before
clearing PageWriteback() and it has to do it in the stat update later.

Rework it so that it looks up the page's memcg once at the beginning of
the transaction and then uses it throughout.  The charge will be
verified before clearing PageWriteback() and migration can't uncharge
the page as long as that is still set.  The RCU lock will protect the
memcg past uncharge.

As far as losing the optimization goes, the following test results are
from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
three times in a nested fashion, so that there are two negative passes
that don't account but still go through the new transaction overhead.
There is no actual difference:

 old:     33.195102545 seconds time elapsed       ( +-  0.01% )
 new:     33.199231369 seconds time elapsed       ( +-  0.03% )

The time spent in page_remove_rmap()'s callees still adds up to the
same, but the time spent in the function itself seems reduced:

     # Children      Self  Command        Shared Object       Symbol
 old:     0.12%     0.11%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap
 new:     0.12%     0.08%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: <stable@vger.kernel.org>	[3.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29 16:33:15 -07:00
..
backing-dev.c Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block 2014-10-18 11:53:51 -07:00
balloon_compaction.c mm/balloon_compaction: add vmstat counters and kpageflags bit 2014-10-09 22:26:01 -04:00
bootmem.c mm/bootmem.c: use include/linux/ headers 2014-10-09 22:26:00 -04:00
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
cma.c drivers: dma-contiguous: add initialization from device tree 2014-10-14 02:18:12 +02:00
compaction.c mm/compaction.c: avoid premature range skip in isolate_migratepages_range 2014-10-29 16:33:13 -07:00
debug-pagealloc.c
debug.c mm/debug.c: use pr_emerg() 2014-10-09 22:25:59 -04:00
dmapool.c mm/dmapool.c: fixed a brace coding style issue 2014-10-09 22:26:00 -04:00
early_ioremap.c mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
fadvise.c
failslab.c
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
filemap.c mm/filemap.c: remove trailing whitespace 2014-10-09 22:26:00 -04:00
fremap.c mm: mark remap_file_pages() syscall as deprecated 2014-06-06 16:08:17 -07:00
frontswap.c swap: change swap_list_head to plist, add swap_avail_head 2014-06-04 16:54:07 -07:00
gup.c mm: introduce a general RCU get_user_pages_fast() 2014-10-09 22:26:00 -04:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c mm, thp: fix collapsing of hugepages on madvise 2014-10-29 16:33:14 -07:00
hugetlb_cgroup.c hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked 2014-08-29 16:28:16 -07:00
hugetlb.c mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA 2014-10-09 22:25:57 -04:00
hwpoison-inject.c mm/hwpoison-inject.c: remove unnecessary null test before debugfs_remove_recursive 2014-08-06 18:01:19 -07:00
init-mm.c
internal.h mm, compaction: pass gfp mask to compact_control 2014-10-09 22:25:55 -04:00
interval_tree.c mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA 2014-10-09 22:25:57 -04:00
iov_iter.c Add copy_to_iter(), copy_from_iter() and iov_iter_zero() 2014-10-09 02:39:03 -04:00
Kconfig mm/balloon_compaction: add vmstat counters and kpageflags bit 2014-10-09 22:26:01 -04:00
Kconfig.debug
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c mm: introduce kmemleak_update_trace() 2014-06-06 16:08:17 -07:00
ksm.c mm: ksm use pr_err instead of printk 2014-10-09 22:26:00 -04:00
list_lru.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
maccess.c
madvise.c mm: update the description for madvise_remove 2014-08-06 18:01:18 -07:00
Makefile Fixup for 3.18: use PATCHv2 of "mm: Support compiling out madvise and fadvise" 2014-10-12 09:21:57 -04:00
memblock.c mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() 2014-09-10 15:42:12 -07:00
memcontrol.c mm: memcontrol: fix missed end-writeback page accounting 2014-10-29 16:33:15 -07:00
memory_hotplug.c memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() 2014-10-29 16:33:14 -07:00
memory-failure.c cgroup: remove redundant check in cgroup_ino() 2014-09-19 09:16:23 -04:00
memory.c zap_pte_range: update addr when forcing flush after TLB batching faiure 2014-10-28 13:16:28 -07:00
mempolicy.c mm: mempolicy: skip inaccessible VMAs when setting MPOL_MF_LAZY 2014-10-09 22:26:02 -04:00
mempool.c mm/mempool.c: update the kmemleak stack trace for mempool allocations 2014-06-06 16:08:17 -07:00
migrate.c mm/balloon_compaction: redesign ballooned pages management 2014-10-09 22:26:01 -04:00
mincore.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
mlock.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 15:44:12 +02:00
mm_init.c mm: bring back /sys/kernel/mm 2014-01-27 21:02:39 -08:00
mmap.c mm, thp: fix collapsing of hugepages on madvise 2014-10-29 16:33:14 -07:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c kvm: Fix page ageing bugs 2014-09-24 14:07:58 +02:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared 2014-10-14 02:18:28 +02:00
mremap.c mm/mremap.c: use linux headers 2014-10-09 22:26:00 -04:00
msync.c msync: fix incorrect fstart calculation 2014-07-03 09:21:53 -07:00
nobootmem.c mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() 2014-09-10 15:42:12 -07:00
nommu.c percpu_counter: add @gfp to percpu_counter_init() 2014-09-08 09:51:29 +09:00
oom_kill.c OOM, PM: OOM killed task shouldn't escape PM suspend 2014-10-21 23:44:21 +02:00
page_alloc.c OOM, PM: OOM killed task shouldn't escape PM suspend 2014-10-21 23:44:21 +02:00
page_cgroup.c cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. 2014-10-29 16:33:13 -07:00
page_io.c fix __swap_writepage() compile failure on old gcc versions 2014-06-14 19:30:48 -05:00
page_isolation.c mm: memory-hotplug: enable memory hotplug to handle hugepage 2013-09-11 15:57:48 -07:00
page-writeback.c mm: memcontrol: fix missed end-writeback page accounting 2014-10-29 16:33:15 -07:00
pagewalk.c mm: use VM_BUG_ON_MM where possible 2014-10-09 22:25:58 -04:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: fix how @gfp is interpreted by the percpu allocator 2014-10-08 12:01:52 -04:00
pgtable-generic.c mm: actually clear pmd_numa before invalidating 2014-08-29 16:28:15 -07:00
process_vm_access.c start adding the tag to iov_iter 2014-05-06 17:32:49 -04:00
quicklist.c
readahead.c mm/readahead.c: remove unused file_ra_state from count_history_pages 2014-08-06 18:01:15 -07:00
rmap.c mm: memcontrol: fix missed end-writeback page accounting 2014-10-29 16:33:15 -07:00
shmem.c shmem: support RENAME_WHITEOUT 2014-10-24 00:14:37 +02:00
slab_common.c memcg: move memcg_update_cache_size() to slab_common.c 2014-10-09 22:25:59 -04:00
slab.c mm/slab: fix unaligned access on sparc64 2014-10-14 02:18:12 +02:00
slab.h mm/slab: use percpu allocator for cpu cache 2014-10-09 22:25:51 -04:00
slob.c mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller() 2014-10-09 22:25:50 -04:00
slub.c mm/slab_common: commonize slab merge logic 2014-10-09 22:25:51 -04:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:35:54 -07:00
swap_state.c mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache 2014-10-09 22:25:59 -04:00
swap.c mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache 2014-10-09 22:25:59 -04:00
swapfile.c mm: memcontrol: rewrite uncharge API 2014-08-08 15:57:17 -07:00
truncate.c vfs: fix data corruption when blocksize < pagesize for mmaped data 2014-10-01 21:49:18 -04:00
util.c proc/maps: make vm_is_stack() logic namespace-friendly 2014-10-09 22:25:50 -04:00
vmacache.c mm,vmacache: optimize overflow system-wide flushing 2014-06-04 16:53:57 -07:00
vmalloc.c mm/vmalloc.c: use seq_open_private() instead of seq_open() 2014-10-09 22:25:56 -04:00
vmpressure.c arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
vmscan.c mm: memcontrol: fix transparent huge page allocations under pressure 2014-10-09 22:25:59 -04:00
vmstat.c vmstat: on-demand vmstat workers V8 2014-10-09 22:26:02 -04:00
workingset.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
zbud.c zbud: avoid accessing last unused freelist 2014-10-09 22:26:03 -04:00
zpool.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zsmalloc.c zsmalloc: simplify init_zspage free obj linking 2014-10-09 22:26:03 -04:00
zswap.c mm/zswap.c: add __init to zswap_entry_cache_destroy() 2014-08-08 15:57:18 -07:00