linux/drivers/mtd
Mel Gorman 82b212f400 Revert "mm: remove __GFP_NO_KSWAPD"
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following

  Hmm,  so it's just took longer to hit the problem and observe
  kswapd0 spinning on my CPU again - it's not as endless like before -
  but still it easily eats minutes - it helps to	turn off  Firefox
  or TB  (memory hungry apps) so kswapd0 stops soon - and restart
  those apps again.  (And I still have like >1GB of cached memory)

  kswapd0         R  running task        0    30      2 0x00000000
  Call Trace:
    preempt_schedule+0x42/0x60
    _raw_spin_unlock+0x55/0x60
    put_super+0x31/0x40
    drop_super+0x22/0x30
    prune_super+0x149/0x1b0
    shrink_slab+0xba/0x510

The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction.  That is one part of the
problem but not the root cause as file-backed pages could also be
reclaimed.

The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.

If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided.  However, if there
are a storm of THP requests that are simply rejected, it will still be
the the case that kswapd is awake for a prolonged period of time as
pgdat->kswapd_max_order is updated each time.  This is noticed by the
main kswapd() loop and it will not call kswapd_try_to_sleep().  Instead
it will loopp, shrinking a small number of pages and calling
shrink_slab() on each iteration.

The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing.  As 3.7 is very close to release and this
is not a bug we should release with, a safer path is to revert "mm:
remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
out the balance_pgdat() logic in general.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26 17:41:24 -08:00
..
chips mtd: cfi_cmdset_0001: Fix problem with unlocking timeout 2012-09-29 15:29:08 +01:00
devices mtd: slram: invalid checking of absolute end address 2012-11-09 17:02:50 +02:00
lpddr mtd: lpddr: replace open-coded ARRAY_SIZE with macro 2012-05-13 22:47:31 -05:00
maps mtd: autcpu12-nvram: drop frees of devm_ alloc'd data 2012-09-29 15:33:36 +01:00
nand mtd: nand: fix Samsung SLC detection regression 2012-11-15 15:37:43 +02:00
onenand mtd: onenand: Make flexonenand_set_boundary static 2012-11-09 17:02:50 +02:00
tests mtd: mtd_nandecctest: add double bit error detection tests 2012-09-29 15:48:02 +01:00
ubi This pull request contains the UBI fastmap support implemented by Richard 2012-10-08 20:40:45 +09:00
afs.c mtd: introduce mtd_read interface 2012-01-09 18:25:19 +00:00
ar7part.c mtd: introduce mtd_read interface 2012-01-09 18:25:19 +00:00
bcm47xxpart.c mtd: bcm47part driver for BCM47XX chipsets 2012-09-29 15:32:31 +01:00
bcm63xxpart.c mtd: bcm63xxpart: handle Broadcom partition order 2012-05-13 22:55:03 -05:00
cmdlinepart.c mtd: cmdlinepart: make the partitions rule more strict 2012-09-29 15:44:25 +01:00
ftl.c mtd: do not use mtd->sync directly 2012-01-09 18:26:21 +00:00
inftlcore.c mtd: add leading underscore to all mtd functions 2012-03-27 00:20:01 +01:00
inftlmount.c mtd: introduce mtd_block_markbad interface 2012-01-09 18:25:48 +00:00
Kconfig mtd: bcm47part driver for BCM47XX chipsets 2012-09-29 15:32:31 +01:00
Makefile mtd: bcm47part driver for BCM47XX chipsets 2012-09-29 15:32:31 +01:00
mtd_blkdevs.c mtd: mtdblock: call mtd_sync() only if opened for write 2012-03-27 00:11:11 +01:00
mtdblock_ro.c mtd: introduce mtd_write interface 2012-01-09 18:25:20 +00:00
mtdblock.c mtd: mtdblock: call mtd_sync() only if opened for write 2012-03-27 00:11:11 +01:00
mtdchar.c mtd: Disable mtdchar mmap on MMU systems 2012-10-09 15:08:42 +01:00
mtdconcat.c mtd: unify initialization of erase_info->fail_addr 2012-03-27 01:02:24 +01:00
mtdcore.c Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-26 17:41:24 -08:00
mtdcore.h mtd: hide parse_mtd_partitions 2011-09-11 15:02:13 +03:00
mtdoops.c UAPI Disintegration 2012-10-09 2012-10-09 15:04:25 +01:00
mtdpart.c UAPI Disintegration 2012-10-09 2012-10-09 15:04:25 +01:00
mtdsuper.c VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
mtdswap.c mtd: do not use mtd->block_markbad directly 2012-01-09 18:26:26 +00:00
nftlcore.c mtd: nftlcore: remove out-of-date and now irrelevant piece of code 2012-03-27 00:24:03 +01:00
nftlmount.c mtd: introduce mtd_block_markbad interface 2012-01-09 18:25:48 +00:00
ofpart.c mtd: ofpart: Fix incorrect NULL check in parse_ofoldpart_partitions() 2012-10-10 09:12:39 +01:00
redboot.c mtd: redboot: remove useless code 2012-03-27 00:24:14 +01:00
rfd_ftl.c mtd: do not use mtd->sync directly 2012-01-09 18:26:21 +00:00
sm_ftl.c mtd: kill MTD_NAND_VERIFY_WRITE 2012-09-29 15:00:46 +01:00
sm_ftl.h mtd: sm_ftl: cosmetic, use bool when possible 2010-10-25 01:33:08 +01:00
ssfdc.c mtd: introduce mtd_block_isbad interface 2012-01-09 18:25:47 +00:00