linux/mm
Mel Gorman 782fd30406 mm: avoid waking kswapd for THP allocations when compaction is deferred or contended
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following

  Hmm,  so it's just took longer to hit the problem and observe
  kswapd0 spinning on my CPU again - it's not as endless like before -
  but still it easily eats minutes - it helps to turn off  Firefox
  or TB  (memory hungry apps) so kswapd0 stops soon - and restart
  those apps again.  (And I still have like >1GB of cached memory)

  kswapd0         R  running task        0    30      2 0x00000000
  Call Trace:
    preempt_schedule+0x42/0x60
    _raw_spin_unlock+0x55/0x60
    put_super+0x31/0x40
    drop_super+0x22/0x30
    prune_super+0x149/0x1b0
    shrink_slab+0xba/0x510

The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction.  That is one part of the
problem but not the root cause as file-backed pages could also be
reclaimed.

The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.

If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided.  However, if there
are a storm of THP requests that are simply rejected, it will still be
the the case that kswapd is awake for a prolonged period of time as
pgdat->kswapd_max_order is updated each time.  This is noticed by the
main kswapd() loop and it will not call kswapd_try_to_sleep().  Instead
it will loopp, shrinking a small number of pages and calling
shrink_slab() on each iteration.

This patch defers when kswapd gets woken up for THP allocations.  For
!THP allocations, kswapd is always woken up.  For THP allocations,
kswapd is woken up iff the process is willing to enter into direct
reclaim/compaction.

[akpm@linux-foundation.org: fix typo in comment]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-30 08:51:18 -08:00
..
backing-dev.c backing-dev: use kstrto* in preference to simple_strtoul 2012-08-25 16:58:14 +08:00
bootmem.c revert "mm: fix-up zone present pages" 2012-11-16 14:33:04 -08:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm: compaction: correct the nr_strict va isolated check for CMA 2012-10-19 14:07:47 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: fix implicit stat.h usage in dmapool.c 2011-10-31 09:20:12 -04:00
fadvise.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap_xip.c mm: move all mmu notifier invocations to be done outside the PT lock 2012-10-09 16:22:58 +09:00
filemap.c readahead: fault retry breaks mmap file read random detection 2012-10-09 16:22:47 +09:00
fremap.c remap_file_pages: correctly handle the case of a NULL vm_ops pointer 2012-10-19 13:37:57 -07:00
frontswap.c frontswap: support exclusive gets if tmem backend is capable 2012-09-21 10:38:12 -04:00
highmem.c mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address 2012-11-16 14:33:04 -08:00
huge_memory.c mm: huge_memory: Fix build error. 2012-10-15 07:59:15 -07:00
hugetlb_cgroup.c hugetlb/cgroup: remove exclude and wakeup rmdir calls from migrate 2012-07-31 18:42:41 -07:00
hugetlb.c mm: document PageHuge somewhat 2012-10-09 16:23:03 +09:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm, thp: fix mlock statistics 2012-10-09 16:23:03 +09:00
interval_tree.c mm: add CONFIG_DEBUG_VM_RB build option 2012-10-09 16:22:42 +09:00
Kconfig mm: enable CONFIG_COMPACTION by default 2012-10-09 16:22:53 +09:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: use rbtree instead of prio tree 2012-10-09 16:22:39 +09:00
ksm.c mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end 2012-10-09 16:22:58 +09:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: prepare VM_DONTDUMP for using in drivers 2012-10-09 16:22:18 +09:00
Makefile mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
memblock.c x86, mm: Trim memory in memblock to be page aligned 2012-10-24 11:52:21 -07:00
memcontrol.c memcg: fix hotplugged memory zone oops 2012-11-16 14:33:04 -08:00
memory_hotplug.c revert "mm: fix-up zone present pages" 2012-11-16 14:33:04 -08:00
memory-failure.c mm anon rmap: replace same_anon_vma linked list with an interval tree. 2012-10-09 16:22:41 +09:00
memory.c mm: fix build warning for uninitialized value 2012-11-16 14:33:03 -08:00
mempolicy.c mm, mempolicy: fix printing stack contents in numa_maps 2012-10-16 18:00:50 -07:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm: memcg: fix compaction/migration failing due to memcg limits 2012-07-31 18:42:48 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c mm, thp: fix mlock statistics 2012-10-09 16:23:03 +09:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c mm: add anon_vma_lock to validate_mm() 2012-11-16 14:33:03 -08:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm/mmu_notifier: allocate mmu_notifier in advance 2012-10-25 14:37:53 -07:00
mmzone.c memcg: fix hotplugged memory zone oops 2012-11-16 14:33:04 -08:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: move all mmu notifier invocations to be done outside the PT lock 2012-10-09 16:22:58 +09:00
msync.c
nobootmem.c revert "mm: fix-up zone present pages" 2012-11-16 14:33:04 -08:00
nommu.c mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
oom_kill.c oom: remove deprecated oom_adj 2012-10-09 16:22:24 +09:00
page_alloc.c mm: avoid waking kswapd for THP allocations when compaction is deferred or contended 2012-11-30 08:51:18 -08:00
page_cgroup.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
page_io.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
page_isolation.c mm/page_alloc: refactor out __alloc_contig_migrate_alloc() 2012-10-09 16:22:52 +09:00
page-writeback.c CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug 2012-09-28 20:27:49 +08:00
pagewalk.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c sections: fix section conflicts in mm/percpu.c 2012-10-06 03:04:44 +09:00
pgtable-generic.c thp: introduce pmdp_invalidate() 2012-10-09 16:22:29 +09:00
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
rmap.c mm: fix XFS oops due to dirty pages without buffers on s390 2012-10-25 14:37:52 -07:00
shmem.c tmpfs: change final i_blocks BUG to WARNING 2012-11-16 14:33:04 -08:00
slab_common.c mm, slab: release slab_mutex earlier in kmem_cache_destroy() 2012-10-10 09:25:08 +03:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-10-07 07:53:13 +09:00
slab.h Revert "mm/sl[aou]b: Move sysfs_slab_add to common" 2012-09-05 12:07:44 +03:00
slob.c Merge branch 'testing/driver-warnings' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into fixes 2012-10-19 15:40:18 -07:00
slub.c Merge branch 'slab/common-for-cgroups' into slab/for-linus 2012-10-03 09:56:37 +03:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c mm/vmemmap: fix wrong use of virt_to_page 2012-11-30 08:51:17 -08:00
swap_state.c mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages 2012-07-31 18:42:47 -07:00
swap.c mm: remove vma arg from page_evictable 2012-10-09 16:22:55 +09:00
swapfile.c swapfile: fix name leak in swapoff 2012-11-16 14:33:04 -08:00
truncate.c mm: use clear_page_mlock() in page_remove_rmap() 2012-10-09 16:22:56 +09:00
util.c mm: Use __do_krealloc to do the krealloc job 2012-09-04 10:22:58 +03:00
vmalloc.c mm: use %pK for /proc/vmallocinfo 2012-10-09 16:23:03 +09:00
vmscan.c mm: vmscan: fix endless loop in kswapd balancing 2012-11-30 08:51:17 -08:00
vmstat.c mm: remove unevictable_pgs_mlockfreed 2012-10-09 16:22:59 +09:00