linux/mm
Michel Lespinasse bf181b9f9d mm anon rmap: replace same_anon_vma linked list with an interval tree.
When a large VMA (anon or private file mapping) is first touched, which
will populate its anon_vma field, and then split into many regions through
the use of mprotect(), the original anon_vma ends up linking all of the
vmas on a linked list.  This can cause rmap to become inefficient, as we
have to walk potentially thousands of irrelevent vmas before finding the
one a given anon page might fall into.

By replacing the same_anon_vma linked list with an interval tree (where
each avc's interval is determined by its vma's start and last pgoffs), we
can make rmap efficient for this use case again.

While the change is large, all of its pieces are fairly simple.

Most places that were walking the same_anon_vma list were looking for a
known pgoff, so they can just use the anon_vma_interval_tree_foreach()
interval tree iterator instead.  The exception here is ksm, where the
page's index is not known.  It would probably be possible to rework ksm so
that the index would be known, but for now I have decided to keep things
simple and just walk the entirety of the interval tree there.

When updating vma's that already have an anon_vma assigned, we must take
care to re-index the corresponding avc's on their interval tree.  This is
done through the use of anon_vma_interval_tree_pre_update_vma() and
anon_vma_interval_tree_post_update_vma(), which remove the avc's from
their interval tree before the update and re-insert them after the update.
 The anon_vma stays locked during the update, so there is no chance that
rmap would miss the vmas that are being updated.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:41 +09:00
..
backing-dev.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
bootmem.c bootmem: Fix the short description of reserve_bootmem() 2012-08-27 07:57:21 -07:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm: compaction: capture a suitable high-order page immediately when it is made available 2012-10-09 16:22:21 +09:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: fix implicit stat.h usage in dmapool.c 2011-10-31 09:20:12 -04:00
fadvise.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap_xip.c mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
filemap.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
fremap.c mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
frontswap.c frontswap: support exclusive gets if tmem backend is capable 2012-09-21 10:38:12 -04:00
highmem.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
huge_memory.c mm anon rmap: replace same_anon_vma linked list with an interval tree. 2012-10-09 16:22:41 +09:00
hugetlb_cgroup.c hugetlb/cgroup: remove exclude and wakeup rmdir calls from migrate 2012-07-31 18:42:41 -07:00
hugetlb.c mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: adjust final #endif position in mm/internal.h 2012-10-09 16:22:24 +09:00
interval_tree.c mm anon rmap: replace same_anon_vma linked list with an interval tree. 2012-10-09 16:22:41 +09:00
Kconfig thp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE 2012-10-09 16:22:29 +09:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: use rbtree instead of prio tree 2012-10-09 16:22:39 +09:00
ksm.c mm anon rmap: replace same_anon_vma linked list with an interval tree. 2012-10-09 16:22:41 +09:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: prepare VM_DONTDUMP for using in drivers 2012-10-09 16:22:18 +09:00
Makefile mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
memblock.c mm/memblock: Use NULL instead of 0 for pointers 2012-09-05 08:32:30 +02:00
memcontrol.c cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them 2012-09-14 12:01:16 -07:00
memory_hotplug.c memory hotplug: fix section info double registration bug 2012-09-17 15:00:38 -07:00
memory-failure.c mm anon rmap: replace same_anon_vma linked list with an interval tree. 2012-10-09 16:22:41 +09:00
memory.c mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
mempolicy.c mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() 2012-10-09 16:22:22 +09:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm: memcg: fix compaction/migration failing due to memcg limits 2012-07-31 18:42:48 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c mm anon rmap: replace same_anon_vma linked list with an interval tree. 2012-10-09 16:22:41 +09:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm/mmu_notifier: init notifier if necessary 2012-10-09 16:22:23 +09:00
mmzone.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm anon rmap: remove anon_vma_moveto_tail 2012-10-09 16:22:41 +09:00
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-11 16:04:50 -07:00
nommu.c mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
oom_kill.c oom: remove deprecated oom_adj 2012-10-09 16:22:24 +09:00
page_alloc.c mm: compaction: capture a suitable high-order page immediately when it is made available 2012-10-09 16:22:21 +09:00
page_cgroup.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
page_io.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
page_isolation.c memory-hotplug: fix kswapd looping forever problem 2012-07-31 18:42:45 -07:00
page-writeback.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
pagewalk.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c sections: fix section conflicts in mm/percpu.c 2012-10-06 03:04:44 +09:00
pgtable-generic.c thp: introduce pmdp_invalidate() 2012-10-09 16:22:29 +09:00
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
rmap.c mm anon rmap: replace same_anon_vma linked list with an interval tree. 2012-10-09 16:22:41 +09:00
shmem.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
slab_common.c Revert "mm/sl[aou]b: Move sysfs_slab_add to common" 2012-09-05 12:07:44 +03:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-10-07 07:53:13 +09:00
slab.h Revert "mm/sl[aou]b: Move sysfs_slab_add to common" 2012-09-05 12:07:44 +03:00
slob.c Merge branch 'slab/tracing' into slab/for-linus 2012-10-03 09:57:17 +03:00
slub.c Merge branch 'slab/common-for-cgroups' into slab/for-linus 2012-10-03 09:56:37 +03:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c mm/sparse: remove index_init_lock 2012-07-31 18:42:49 -07:00
swap_state.c mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages 2012-07-31 18:42:47 -07:00
swap.c mm: fix nonuniform page status when writing new file with small buffer 2012-10-09 16:22:19 +09:00
swapfile.c mm: swapfile: clean up unuse_pte race handling 2012-07-31 18:42:48 -07:00
truncate.c mm/fs: remove truncate_range 2012-05-29 16:22:23 -07:00
util.c mm: Use __do_krealloc to do the krealloc job 2012-09-04 10:22:58 +03:00
vmalloc.c mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
vmscan.c mm/vmscan: fix error number for failed kthread 2012-10-09 16:22:24 +09:00
vmstat.c workqueue: make deferrable delayed_work initializer names consistent 2012-08-21 13:18:23 -07:00