linux/lib
Johannes Weiner 449dd6984d mm: keep page cache radix tree nodes in check
Previously, page cache radix tree nodes were freed after reclaim emptied
out their page pointers.  But now reclaim stores shadow entries in their
place, which are only reclaimed when the inodes themselves are
reclaimed.  This is problematic for bigger files that are still in use
after they have a significant amount of their cache reclaimed, without
any of those pages actually refaulting.  The shadow entries will just
sit there and waste memory.  In the worst case, the shadow entries will
accumulate until the machine runs out of memory.

To get this under control, the VM will track radix tree nodes
exclusively containing shadow entries on a per-NUMA node list.  Per-NUMA
rather than global because we expect the radix tree nodes themselves to
be allocated node-locally and we want to reduce cross-node references of
otherwise independent cache workloads.  A simple shrinker will then
reclaim these nodes on memory pressure.

A few things need to be stored in the radix tree node to implement the
shadow node LRU and allow tree deletions coming from the list:

1. There is no index available that would describe the reverse path
   from the node up to the tree root, which is needed to perform a
   deletion.  To solve this, encode in each node its offset inside the
   parent.  This can be stored in the unused upper bits of the same
   member that stores the node's height at no extra space cost.

2. The number of shadow entries needs to be counted in addition to the
   regular entries, to quickly detect when the node is ready to go to
   the shadow node LRU list.  The current entry count is an unsigned
   int but the maximum number of entries is 64, so a shadow counter
   can easily be stored in the unused upper bits.

3. Tree modification needs tree lock and tree root, which are located
   in the address space, so store an address_space backpointer in the
   node.  The parent pointer of the node is in a union with the 2-word
   rcu_head, so the backpointer comes at no extra cost as well.

4. The node needs to be linked to an LRU list, which requires a list
   head inside the node.  This does increase the size of the node, but
   it does not change the number of objects that fit into a slab page.

[akpm@linux-foundation.org: export the right function]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
..
fonts partly revert commit 8a10bc9: parisc/sti_console: prefer Linux fonts over built-in ROM fonts 2014-03-23 16:44:42 +01:00
lz4 lz4: fix compression/decompression signedness mismatch 2013-09-11 15:59:45 -07:00
lzo lib/lzo: Update LZO compression to current upstream version 2013-02-20 19:36:01 +01:00
mpi MPILIB: add module description and license 2013-09-25 17:17:01 +01:00
raid6 md update for v3.12 2013-09-10 13:03:41 -07:00
reed_solomon
xz decompressors: fix typo "POWERPC" 2013-03-13 15:21:48 -07:00
zlib_deflate zlib: slim down zlib_deflate() workspace when possible 2011-03-22 17:44:17 -07:00
zlib_inflate inflate_fast: sout is already a short so ptr arith was off by one. 2010-03-12 15:52:44 -08:00
.gitignore X.509: Implement simple static OID registry 2012-10-08 13:50:18 +10:30
argv_split.c argv_split(): teach it to handle mutable strings 2013-04-29 18:28:19 -07:00
asn1_decoder.c Nothing all that exciting; a new module-from-fd syscall for those who want 2012-12-19 07:55:08 -08:00
assoc_array.c assoc_array: remove global variable 2014-01-23 09:25:10 -08:00
atomic64_test.c atomic64_test: simplify the #ifdef for atomic64_dec_if_positive() test 2012-07-30 17:25:16 -07:00
atomic64.c lib: atomic64: Initialize locks statically to fix early users 2012-12-20 13:50:16 -08:00
audit.c audit: support the "standard" <asm-generic/unistd.h> 2011-05-04 14:41:28 -04:00
average.c lib: Ensure EWMA does not store wrong intermediate values 2014-01-16 23:46:06 -08:00
bcd.c usb/core: use bin2bcd() for bcdDevice in RH 2012-09-10 11:13:16 -07:00
bch.c lib: add shared BCH ECC library 2011-03-11 14:25:50 +00:00
bitmap.c propagate name change to comments in kernel source 2012-12-06 10:39:54 +01:00
bitrev.c
bsearch.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
btree.c btree: catch NULL value before it does harm 2012-06-07 14:43:55 -07:00
bug.c taint: add explicit flag to show whether lock dep is still OK. 2013-01-21 17:17:57 +10:30
build_OID_registry X.509: do not emit any informational output 2013-06-19 17:54:06 +02:00
bust_spinlocks.c printk: Provide a wake_up_klogd() off-case 2013-03-22 16:41:20 -07:00
check_signature.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
checksum.c asm-generic headers: Allow yet more arch overrides in checksum.h 2013-02-11 20:00:33 +05:30
clz_ctz.c lib: add weak clz/ctz functions 2013-07-09 10:33:30 -07:00
clz_tab.c lib: Fix multiple definitions of clz_tab 2012-02-02 10:34:23 +11:00
cmdline.c lib/cmdline.c: declare exported symbols immediately 2014-01-23 16:36:57 -08:00
cordic.c Docs: wording: functions -> algorithm 2011-10-29 21:20:22 +02:00
cpu_rmap.c Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
cpu-notifier-error-inject.c cpu: rewrite cpu-notifier-error-inject module 2012-07-30 17:25:22 -07:00
cpumask.c lib/cpumask.c: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c crypto: crct10dif - Add fallback for broken initrds 2013-09-12 15:31:34 +10:00
crc7.c
crc8.c lib: crc8: add new library module providing crc8 algorithm 2011-06-03 15:01:06 -04:00
crc16.c
crc32.c lib: crc32: reduce number of cases for crc32{, c}_combine 2013-11-04 15:27:08 -05:00
crc32defs.h crc32: select an algorithm via Kconfig 2012-03-23 16:58:38 -07:00
ctype.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
debug_locks.c mutex: Add support for wound/wait style locks 2013-06-26 12:10:56 +02:00
debugobjects.c lib/debugobjects.c: remove unnecessary work pending test 2013-11-13 12:09:22 +09:00
dec_and_lock.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
decompress_bunzip2.c decompress_bunzip2: remove invalid vi modeline 2011-12-06 10:00:05 +01:00
decompress_inflate.c lib/decompressors: fix "no limit" output buffer length 2013-09-11 15:58:38 -07:00
decompress_unlz4.c lib/decompress_unlz4.c: always set an error return code on failures 2014-01-23 16:37:04 -08:00
decompress_unlzma.c treewide: Fix comment and string typo 'bufer' 2011-12-06 09:53:40 +01:00
decompress_unlzo.c lib/lzo: Rename lzo1x_decompress.c to lzo1x_decompress_safe.c 2013-02-20 19:36:00 +01:00
decompress_unxz.c Fix common misspellings 2011-03-31 11:26:23 -03:00
decompress.c lib: add support for LZ4-compressed kernel 2013-07-09 10:33:30 -07:00
devres.c lib/devres.c: fix misplaced #endif 2013-02-27 19:10:09 -08:00
digsig.c lib/digsig.c: use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)) 2013-11-13 12:09:22 +09:00
div64.c math64: New separate div64_u64_rem helper 2013-08-23 09:02:14 -04:00
dma-debug.c dma debug: account for cachelines and read-only mappings in overlap tracking 2014-03-04 07:55:47 -08:00
dump_stack.c x86, asmlinkage: Make dump_stack visible 2013-08-06 14:21:01 -07:00
dynamic_debug.c dynamic_debug: replace obselete simple_strtoul() with kstrtouint() 2014-01-27 21:02:39 -08:00
dynamic_queue_limits.c bql: Avoid possible inconsistent calculation. 2012-05-31 18:18:17 -04:00
earlycpio.c earlycpio.c: Fix the confusing comment of find_cpio_data(). 2013-08-14 23:24:01 +02:00
extable.c module: trim exception table on init free. 2009-06-12 21:47:04 +09:30
fault-inject.c debugfs: add get/set for atomic types 2013-06-03 13:55:01 -07:00
fdt_ro.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt_rw.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt_strerror.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt_sw.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt_wip.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
find_last_bit.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
find_next_bit.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
flex_array.c reciprocal_divide: update/correction of the algorithm 2014-01-21 23:17:20 -08:00
flex_proportions.c lib/flex_proportions.c: fix corruption of denominator in flexible proportions 2012-09-25 08:59:21 -07:00
gcd.c lib/gcd.c: prevent possible div by 0 2012-10-06 03:04:57 +09:00
gen_crc32table.c sections: fix const sections for crc32 table 2012-10-06 03:04:46 +09:00
genalloc.c lib/genalloc.c: add check gen_pool_dma_alloc() if dma pointer is not NULL 2014-01-29 16:22:39 -08:00
halfmd4.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
hash.c lib: hash: follow-up fixups for arch hash 2013-12-19 00:14:53 -05:00
hexdump.c lib: introduce upper case hex ascii helpers 2013-09-20 15:38:26 -04:00
hweight.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
idr.c idr: Add new function idr_is_empty() 2014-02-17 16:27:47 +01:00
inflate.c MN10300: Don't try and #include <linux/slab.h> in lib/inflate.c from bootloader 2010-08-12 09:51:35 -07:00
int_sqrt.c lib/int_sqrt.c: optimize square root algorithm 2013-04-29 18:28:19 -07:00
interval_tree_test_main.c random32: rename random32 to prandom 2012-12-17 17:15:26 -08:00
interval_tree.c mm: interval tree updates 2012-10-09 16:22:40 +09:00
iomap_copy.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
iomap.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
iommu-helper.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
ioremap.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
iovec.c Hoist memcpy_fromiovec/memcpy_toiovec into lib/ 2013-05-20 10:24:22 +09:30
irq_regs.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
is_single_threaded.c kernel: is_current_single_threaded: don't use ->mmap_sem 2009-07-17 09:11:31 +10:00
jedec_ddr_data.c ddr: add LPDDR2 data from JESD209-2 2012-05-02 00:04:06 -07:00
kasprintf.c lib/kasprintf.c: use kmalloc_track_caller() to get accurate traces for kvasprintf 2012-10-11 08:50:15 +09:00
Kconfig Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2013-11-21 19:46:00 -08:00
Kconfig.debug locktorture: Add a lock-torture kernel module 2014-02-23 09:04:29 -08:00
Kconfig.kgdb treewide: Fix typo in printk 2013-06-18 13:48:45 +02:00
Kconfig.kmemcheck kmemcheck: depend on HAVE_ARCH_KMEMCHECK 2009-07-01 22:28:44 +02:00
kfifo.c kfifo: kfifo_copy_{to,from}_user: fix copied bytes calculation 2013-11-15 09:32:23 +09:00
klist.c klist: del waiter from klist_remove_waiters before wakeup waitting process 2013-05-21 10:16:39 -07:00
kobject_uevent.c net: fix "queues" uevent between network namespaces 2014-01-19 20:02:02 -08:00
kobject.c sysfs, kobject: add sysfs wrapper for kernfs_enable_ns() 2014-02-07 16:08:57 -08:00
kstrtox.c lib/kstrtox.c: remove redundant cleanup 2014-01-23 16:36:57 -08:00
kstrtox.h lib/kstrtox: common code between kstrto*() and simple_strto*() functions 2011-10-31 17:30:56 -07:00
lcm.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
libcrc32c.c libcrc32c: Fix "crc32c undefined" compilation error 2008-12-25 11:01:42 +11:00
list_debug.c rcu: Fix broken strings in RCU's source code. 2012-07-06 06:01:49 -07:00
list_sort.c lib/: rename random32() to prandom_u32() 2013-04-29 18:28:42 -07:00
llist.c llists-move-llist_reverse_order-from-raid5-to-llistc-fix 2013-11-15 09:32:22 +09:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c sched: Introduce preempt_count accessor functions 2013-09-25 14:07:32 +02:00
lockref.c lockref: include mutex.h rather than reinvent arch_mutex_cpu_relax 2013-11-27 20:37:33 -08:00
lru_cache.c lru_cache: introduce lc_get_cumulative() 2013-03-22 22:17:36 -06:00
Makefile * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit). 2014-02-07 11:27:30 -08:00
md5.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
memory-notifier-error-inject.c memory: memory notifier error injection module 2012-07-30 17:25:22 -07:00
memweight.c string: introduce memweight() 2012-07-30 17:25:16 -07:00
net_utils.c net: core: move mac_pton() to lib/net_utils.c 2013-06-05 12:00:27 -07:00
nlattr.c netlink: don't compare the nul-termination in nla_strcmp 2014-04-01 15:25:02 -04:00
notifier-error-inject.c mode_t, whack-a-mole at 11... 2013-04-09 14:13:05 -04:00
notifier-error-inject.h fault-injection: notifier error injection 2012-07-30 17:25:22 -07:00
of-reconfig-notifier-error-inject.c powerpc+of: Rename and fix OF reconfig notifier error inject module 2012-12-14 10:32:52 +11:00
oid_registry.c Give the OID registry file module info to avoid kernel tainting 2013-05-05 14:38:00 -07:00
parser.c lib/parser.c: put EXPORT_SYMBOLs in the conventional place 2014-01-23 16:36:55 -08:00
pci_iomap.c lib: add NO_GENERIC_PCI_IOPORT_MAP 2012-01-31 23:19:47 +02:00
percpu_counter.c percpu_counter: unbreak __percpu_counter_add() 2014-01-17 11:32:24 +11:00
percpu_ida.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2014-02-14 10:45:18 -08:00
percpu_test.c percpu: add test module for various percpu operations 2013-11-13 12:09:11 +09:00
percpu-refcount.c percpu-refcount: Add a WARN() for ref going negative 2014-01-21 04:40:56 -05:00
plist.c lib/plist.c: make plist test announcements KERN_DEBUG 2012-10-06 03:04:58 +09:00
pm-notifier-error-inject.c PM: PM notifier error injection module 2012-07-30 17:25:22 -07:00
prio_heap.c lib: fix sparse shadowed variable warning 2009-01-06 15:59:11 -08:00
proportions.c locking, lib/proportions: Annotate prop_local_percpu::lock as raw 2011-09-13 11:11:50 +02:00
radix-tree.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
random32.c random32: avoid attempt to late reseed if in the middle of seeding 2014-03-28 16:04:19 -04:00
ratelimit.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
rational.c lib: Change mail address of Oskar Schirmer 2012-05-17 15:18:37 +02:00
rbtree_test.c rbtree/test: test rbtree_postorder_for_each_entry_safe() 2014-01-23 16:37:03 -08:00
rbtree.c rbtree: add postorder iteration functions 2013-09-11 15:59:19 -07:00
reciprocal_div.c reciprocal_divide: update/correction of the algorithm 2014-01-21 23:17:20 -08:00
scatterlist.c lib/scatterlist: export sg_miter_skip() 2013-12-08 17:56:37 -08:00
sha1.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
show_mem.c lib/show_mem.c: show num_poisoned_pages when oom 2014-01-21 16:19:48 -08:00
smp_processor_id.c sched: Introduce preempt_count accessor functions 2013-09-25 14:07:32 +02:00
sort.c generic swap(): lib/sort.c: rename swap to swap_func 2009-01-08 08:31:14 -08:00
stmp_device.c lib: add support for stmp-style devices 2012-04-20 23:27:08 +02:00
string_helpers.c lib/string_helpers: introduce generic string_unescape 2013-04-30 17:04:03 -07:00
string.c asmlinkage Make __stack_chk_failed and memcmp visible 2014-02-13 18:13:43 -08:00
strncpy_from_user.c word-at-a-time: make the interfaces truly generic 2012-05-26 11:33:40 -07:00
strnlen_user.c lib: Fix generic strnlen_user for 32-bit big-endian machines 2012-05-27 20:59:46 -07:00
swiotlb.c memblock, nobootmem: add memblock_virt_alloc_low() 2014-01-27 21:02:38 -08:00
syscall.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
test_module.c test: add minimal module for verification testing 2014-01-23 16:36:57 -08:00
test_user_copy.c test: check copy_to/from_user boundary validation 2014-01-23 16:36:57 -08:00
test-kstrtox.c lib/test-kstrtox.c: mark const init data with __initconst instead of __initdata 2012-05-29 16:22:32 -07:00
test-string_helpers.c lib/string_helpers: introduce generic string_unescape 2013-04-30 17:04:03 -07:00
textsearch.c textsearch: doc - fix spelling in lib/textsearch.c. 2011-01-24 23:33:30 -08:00
timerqueue.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
ts_bm.c
ts_fsm.c
ts_kmp.c
ucs2_string.c Move utf16 functions to kernel core and rename 2013-04-15 21:23:03 +01:00
usercopy.c Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS 2013-04-30 17:04:09 -07:00
uuid.c uuid: use prandom_bytes() 2013-04-29 18:28:42 -07:00
vsprintf.c vsprintf: Add support for IORESOURCE_UNSET in %pR 2014-02-26 14:42:09 -07:00