linux/mm
KOSAKI Motohiro 7b51755c3b vmscan: kill hibernation specific reclaim logic and unify it
shrink_all_zone() was introduced by commit d6277db4ab (swsusp: rework
memory shrinker) for hibernate performance improvement.  and
sc.swap_cluster_max was introduced by commit a06fe4d307 (Speed freeing
memory for suspend).

commit a06fe4d307 said

   Without the patch:
   Freed  14600 pages in  1749 jiffies = 32.61 MB/s (Anomolous!)
   Freed  88563 pages in 14719 jiffies = 23.50 MB/s
   Freed 205734 pages in 32389 jiffies = 24.81 MB/s

   With the patch:
   Freed  68252 pages in   496 jiffies = 537.52 MB/s
   Freed 116464 pages in   569 jiffies = 798.54 MB/s
   Freed 209699 pages in   705 jiffies = 1161.89 MB/s

At that time, their patch was pretty worth.  However, Modern Hardware
trend and recent VM improvement broke its worth.  From several reason, I
think we should remove shrink_all_zones() at all.

detail:

1) Old days, shrink_zone()'s slowness was mainly caused by stupid io-throttle
  at no i/o congestion.
  but current shrink_zone() is sane, not slow.

2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works
  fine on numa system.
  example)
    System has 4GB memory and each node have 2GB. and hibernate need 1GB.

    optimal)
       steal 500MB from each node.
    shrink_all_zones)
       steal 1GB from node-0.

  Oh, Cache balancing logic was broken. ;)
  Unfortunately, Desktop system moved ahead NUMA at nowadays.
  (Side note, if hibernate require 2GB, shrink_all_zones() never success
   on above machine)

3) if the node has several I/O flighting pages, shrink_all_zones() makes
  pretty bad result.

  schenario) hibernate need 1GB

  1) shrink_all_zones() try to reclaim 1GB from Node-0
  2) but it only reclaimed 990MB
  3) stupidly, shrink_all_zones() try to reclaim 1GB from Node-1
  4) it reclaimed 990MB

  Oh, well. it reclaimed twice much than required.
  In the other hand, current shrink_zone() has sane baling out logic.
  then, it doesn't make overkill reclaim. then, we lost shrink_zones()'s risk.

4) SplitLRU VM always keep active/inactive ratio very carefully. inactive list only
  shrinking break its assumption. it makes unnecessary OOM risk. it obviously suboptimal.

Now, shrink_all_memory() is only the wrapper function of do_try_to_free_pages().
it bring good reviewability and debuggability, and solve above problems.

side note: Reclaim logic unificication makes two good side effect.
 - Fix recursive reclaim bug on shrink_all_memory().
   it did forgot to use PF_MEMALLOC. it mean the system be able to stuck into deadlock.
 - Now, shrink_all_memory() got lockdep awareness. it bring good debuggability.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:18 -08:00
..
backing-dev.c flusher: Fix PF_FROZEN race 2009-12-03 13:49:43 +01:00
bootmem.c bootmem: Add free_bootmem_late() 2009-11-10 12:31:43 +01:00
bounce.c block: remove some includings of blktrace_api.h 2009-06-16 11:19:36 +02:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapools: protect page_list walk in show_pools() 2009-06-30 18:56:00 -07:00
fadvise.c readahead: move max_sane_readahead() calls into force_page_cache_readahead() 2009-06-16 19:47:28 -07:00
failslab.c kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c 2009-04-03 12:23:01 +02:00
filemap_xip.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
filemap.c kill wait_on_page_writeback_range 2009-12-10 15:02:50 +01:00
fremap.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
highmem.c highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE 2009-11-10 04:15:47 +01:00
hugetlb.c mm: add gfp flags for NODEMASK_ALLOC slab allocations 2009-12-15 08:53:13 -08:00
hwpoison-inject.c HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs 2009-09-16 11:50:17 +02:00
init-mm.c mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
internal.h mm: CONFIG_MMU for PG_mlocked 2009-12-15 08:53:17 -08:00
Kconfig mm: stop ptlock enlarging struct page 2009-12-15 08:53:17 -08:00
Kconfig.debug trivial: improve help text for mm debug config options 2009-09-21 15:14:57 +02:00
kmemcheck.c kmemcheck: add hooks for the page allocator 2009-06-15 15:48:33 +02:00
kmemleak-test.c percpu: clean up percpu variable definitions 2009-06-24 15:13:48 +09:00
kmemleak.c tree-wide: fix typos "aquire" -> "acquire", "cumsumed" -> "consumed" 2009-11-09 09:40:57 +01:00
ksm.c ksm: cond_resched in unstable tree 2009-11-09 09:55:44 -08:00
maccess.c [S390] maccess: add weak attribute to probe_kernel_write 2009-06-12 10:27:37 +02:00
madvise.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2009-09-24 07:53:22 -07:00
Makefile percpu: kill legacy percpu allocator 2009-10-02 13:29:29 +09:00
memcontrol.c tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
memory_hotplug.c mm: clear node in N_HIGH_MEMORY and stop kswapd when all memory is offlined 2009-12-15 08:53:13 -08:00
memory-failure.c mm: CONFIG_MMU for PG_mlocked 2009-12-15 08:53:17 -08:00
memory.c mm: sigbus instead of abusing oom 2009-12-15 08:53:17 -08:00
mempolicy.c hugetlb: derive huge pages nodes allowed from task mempolicy 2009-12-15 08:53:12 -08:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: define PAGE_MAPPING_FLAGS 2009-12-15 08:53:17 -08:00
mincore.c [CVE-2009-0029] System call wrappers part 14 2009-01-14 14:15:24 +01:00
mlock.c mm: mlocking in try_to_unmap_one 2009-12-15 08:53:17 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mmap: don't return ENOMEM when mapcount is temporarily exceeded in munmap() 2009-12-15 08:53:11 -08:00
mmu_context.c mm: reduce atomic use on use_mm fast path 2009-09-22 07:17:42 -07:00
mmu_notifier.c ksm: add mmu_notifier set_pte_at_notify() 2009-09-22 07:17:31 -07:00
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c perf: Do the big rename: Performance Counters -> Performance Events 2009-09-21 14:28:04 +02:00
mremap.c Take arch_mmap_check() into get_unmapped_area() 2009-12-11 06:44:58 -05:00
msync.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
nommu.c NOMMU: Don't pass NULL pointers to fput() in do_mmap_pgoff() 2009-10-31 12:11:37 -07:00
oom_kill.c oom: dump stack and VM state when oom killer panics 2009-12-15 08:53:10 -08:00
page_alloc.c mm: CONFIG_MMU for PG_mlocked 2009-12-15 08:53:17 -08:00
page_cgroup.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
page_io.c swap: rework map_swap_page() again 2009-12-15 08:53:16 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c writeback: remove unused nonblocking and congestion checks 2009-12-03 13:54:25 +01:00
pagewalk.c
percpu.c Merge branch 'for-linus' into for-next 2009-12-08 10:02:12 +09:00
prio_tree.c
quicklist.c cpumask: use new-style cpumask ops in mm/quicklist. 2009-09-24 09:34:52 +09:30
readahead.c readahead: introduce context readahead algorithm 2009-06-16 19:47:30 -07:00
rmap.c mm: pass address down to rmap ones 2009-12-15 08:53:17 -08:00
shmem_acl.c shmfs: use 'check_acl' instead of 'permission' 2009-09-08 11:08:46 -07:00
shmem.c swap_info: note SWAP_MAP_SHMEM 2009-12-15 08:53:16 -08:00
slab.c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-14 10:13:22 -08:00
slob.c slab: remove duplicate kmem_cache_init_late() declarations 2009-08-06 11:36:25 +03:00
slub.c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-14 10:13:22 -08:00
sparse-vmemmap.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
sparse.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
swap_state.c mm: add_to_swap_cache() does not return -EEXIST 2009-09-22 07:17:35 -07:00
swap.c mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
swapfile.c mm: define PAGE_MAPPING_FLAGS 2009-12-15 08:53:17 -08:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c mm: fix comments for invalidate_inode_pages2() 2009-12-04 15:39:48 +01:00
util.c fix a struct file leak in do_mmap_pgoff() 2009-12-11 06:44:57 -05:00
vmalloc.c vmalloc(): adjust gfp mask passed on nested vmalloc() invocation 2009-12-15 08:53:13 -08:00
vmscan.c vmscan: kill hibernation specific reclaim logic and unify it 2009-12-15 08:53:18 -08:00
vmstat.c vmscan: stop kswapd waiting on congestion when the min watermark is not being met 2009-12-15 08:53:16 -08:00