linux/fs
Mel Gorman dac1d27bc8 mm: use zonelists instead of zones when direct reclaiming pages
The following patches replace multiple zonelists per node with two zonelists
that are filtered based on the GFP flags.  The patches as a set fix a bug with
regard to the use of MPOL_BIND and ZONE_MOVABLE.  With this patchset, the
MPOL_BIND will apply to the two highest zones when the highest zone is
ZONE_MOVABLE.  This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that filters
only custom zonelists.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist.

The second patch introduces a helper function node_zonelist() for looking up
the appropriate zonelist for a GFP mask which simplifies patches later in the
set.

The third patch defines/remembers the "preferred zone" for numa statistics, as
it is no longer always the first zone in a zonelist.

The forth patch replaces multiple zonelists with two zonelists that are
filtered.  The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fifth patch introduces helper macros for retrieving the zone and node
indices of entries in a zonelist.

The final patch introduces filtering of the zonelists based on a nodemask.
Two zonelists exist per node, one for normal allocations and one for
__GFP_THISNODE.

Performance results varied depending on the machine configuration.  In real
workloads the gain/loss will depend on how much the userspace portion of the
benchmark benefits from having more cache available due to reduced referencing
of zonelists.

These are the range of performance losses/gains when running against
2.6.24-rc4-mm1.  The set and these machines are a mix of i386, x86_64 and
ppc64 both NUMA and non-NUMA.
			     loss   to  gain
Total CPU time on Kernbench: -0.86% to  1.13%
Elapsed   time on Kernbench: -0.79% to  0.76%
page_test from aim9:         -4.37% to  0.79%
brk_test  from aim9:         -0.71% to  4.07%
fork_test from aim9:         -1.84% to  4.60%
exec_test from aim9:         -0.71% to  1.08%

This patch:

The allocator deals with zonelists which indicate the order in which zones
should be targeted for an allocation.  Similarly, direct reclaim of pages
iterates over an array of zones.  For consistency, this patch converts direct
reclaim to use a zonelist.  No functionality is changed by this patch.  This
simplifies zonelist iterators in the next patch.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
..
9p [PATCH] restore sane ->umount_begin() API 2008-04-25 09:23:25 -04:00
adfs mount options: fix adfs 2008-02-08 09:22:39 -08:00
affs mount options: fix affs 2008-02-08 09:22:39 -08:00
afs AFS: Do not describe debug parameters with their value 2008-04-16 07:43:48 -07:00
autofs mount options: fix autofs 2008-02-08 09:22:40 -08:00
autofs4 Introduce path_put() 2008-02-14 21:13:33 -08:00
befs mount options: fix befs 2008-02-08 09:22:40 -08:00
bfs iget: stop BFS from using iget() and read_inode() 2008-02-07 08:42:27 -08:00
cifs [PATCH] restore sane ->umount_begin() API 2008-04-25 09:23:25 -04:00
coda Introduce path_put() 2008-02-14 21:13:33 -08:00
configfs Introduce path_put() 2008-02-14 21:13:33 -08:00
cramfs fs: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:16:44 -04:00
debugfs debugfs: fix sparse warnings 2008-03-04 14:47:06 -08:00
devpts mount options: fix devpts 2008-02-08 09:22:40 -08:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm 2008-04-22 13:44:23 -07:00
ecryptfs eCryptfs: Swap dput() and mntput() 2008-03-19 18:53:36 -07:00
efs efs: update error msg to not refer to deleted read_inode() 2008-04-02 15:28:19 -07:00
exportfs exportfs: update documentation 2007-10-22 08:13:21 -07:00
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial 2008-04-21 16:36:46 -07:00
ext3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial 2008-04-21 16:36:46 -07:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial 2008-04-21 16:36:46 -07:00
fat [PATCH] r/o bind mounts: elevate write count for ioctls() 2008-04-19 00:29:24 -04:00
freevxfs iget: stop FreeVXFS from using iget() and read_inode() 2008-02-07 08:42:28 -08:00
fuse [PATCH] restore sane ->umount_begin() API 2008-04-25 09:23:25 -04:00
gfs2 mm: remove nopage 2008-04-28 08:58:18 -07:00
hfs hfs_bnode_find() can fail, resulting in hfs_bnode_split() breakage 2008-03-17 09:46:55 -07:00
hfsplus [PATCH] r/o bind mounts: elevate write count for ioctls() 2008-04-19 00:29:24 -04:00
hostfs uml: fix hostfs tv_usec calculations 2008-02-05 09:44:30 -08:00
hpfs mount options: fix hpfs 2008-02-08 09:22:40 -08:00
hppfs [PATCH] sanitize hppfs 2008-03-19 06:42:18 -04:00
hugetlbfs [PATCH] double iput() on failure exit in hugetlb 2008-03-19 06:55:01 -04:00
isofs zisofs: fix readpage() outside i_size 2008-03-19 18:53:36 -07:00
jbd jbd/jbd2 NULL noise 2008-03-30 14:18:41 -07:00
jbd2 jbd/jbd2 NULL noise 2008-03-30 14:18:41 -07:00
jffs2 [JFFS2] Introduce dbg_readinode2 log level, use it to shut read_dnode() up 2008-04-23 16:43:15 +01:00
jfs [PATCH] r/o bind mounts: elevate write count for ioctls() 2008-04-19 00:29:24 -04:00
lockd locks: don't call ->copy_lock methods on return of conflicting locks 2008-04-25 13:00:11 -04:00
minix iget: stop the MINIX filesystem from using iget() and read_inode() 2008-02-07 08:42:28 -08:00
msdos
ncpfs [PATCH] r/o bind mounts: elevate write count for ncp_ioctl() 2008-04-19 00:29:23 -04:00
nfs [PATCH] restore sane ->umount_begin() API 2008-04-25 09:23:25 -04:00
nfs_common
nfsd nfsd: don't allow setting ctime over v4 2008-04-25 13:00:11 -04:00
nls sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
ntfs is_vmalloc_addr(): Check if an address is within the vmalloc boundaries 2008-02-05 09:44:14 -08:00
ocfs2 [PATCH] r/o bind mounts: elevate write count for ioctls() 2008-04-19 00:29:24 -04:00
openpromfs iget: stop OPENPROMFS from using iget() and read_inode() 2008-02-07 08:42:29 -08:00
partitions block: send disk "change" event for rescan_partitions() 2008-04-19 19:10:24 -07:00
proc make swap_pte_to_pagemap_entry() static 2008-04-28 08:58:18 -07:00
qnx4 iget: stop QNX4 from using iget() and read_inode() 2008-02-07 08:42:28 -08:00
ramfs Remove valueless definition of hard-selected RAMFS option 2007-10-17 08:42:56 -07:00
reiserfs Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-04-21 15:41:27 -07:00
romfs ROMFS: Fix up an error in iget removal 2008-03-19 18:53:36 -07:00
smbfs NULL noise: fs/*, mm/*, kernel/* 2008-03-30 14:18:41 -07:00
sysfs [SCSI] sysfs: make group is_valid return a mode_t 2008-04-22 15:16:31 -05:00
sysv iget: stop the SYSV filesystem from using iget() and read_inode() 2008-02-07 08:42:29 -08:00
udf udf: use crc_itu_t from lib instead of udf_crc 2008-04-17 14:29:56 +02:00
ufs fs/ufs/balloc.c: fix sparc64 printk warning 2008-03-19 18:53:37 -07:00
vfat Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p) 2008-02-07 08:42:26 -08:00
xfs Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-04-21 15:41:27 -07:00
aio.c aio: io_getevents() should return if io_destroy() is invoked 2008-04-28 08:58:17 -07:00
anon_inodes.c [PATCH] fix up new filp allocators 2008-03-19 06:54:05 -04:00
attr.c VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations 2007-10-18 14:37:22 -07:00
bad_inode.c iget: introduce a function to register iget failure 2008-02-07 08:42:26 -08:00
binfmt_aout.c aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT 2008-02-08 09:22:30 -08:00
binfmt_elf_fdpic.c pid namespaces: changes to show virtual ids to user 2007-10-19 11:53:40 -07:00
binfmt_elf.c [PATCH] sanitize handling of shared descriptor tables in failing execve() 2008-04-25 09:23:53 -04:00
binfmt_em86.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
binfmt_flat.c FLAT binaries: drop BINFMT_FLAT bad header magic warning 2008-02-14 20:58:05 -08:00
binfmt_misc.c [PATCH] sanitize handling of shared descriptor tables in failing execve() 2008-04-25 09:23:53 -04:00
binfmt_script.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
binfmt_som.c [PATCH] sanitize handling of shared descriptor tables in failing execve() 2008-04-25 09:23:53 -04:00
bio.c block: convert bio_copy_user to bio_copy_user_iov 2008-04-21 09:50:08 +02:00
block_dev.c fs/block_dev.c: remove #if 0'ed code 2008-02-19 10:04:00 +01:00
buffer.c mm: use zonelists instead of zones when direct reclaiming pages 2008-04-28 08:58:18 -07:00
char_dev.c fs/char_dev.c: chrdev_open marked static and removed from fs.h 2008-02-08 09:22:42 -08:00
compat_binfmt_elf.c x86: compat_binfmt_elf 2008-01-30 13:31:46 +01:00
compat_ioctl.c d_path: Make d_path() use a struct path 2008-02-14 21:17:09 -08:00
compat.c Merge branch 'linus_origin' into hotfixes 2008-02-15 13:36:30 -05:00
dcache.c [patch 2/7] vfs: mountinfo: add seq_file_root() 2008-04-23 00:04:38 -04:00
dcookies.c d_path: Make d_path() use a struct path 2008-02-14 21:17:09 -08:00
direct-io.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
dnotify.c
dquot.c quota: add possibly missing iput() when quotaon and quotaoff races 2008-03-19 18:53:35 -07:00
drop_caches.c
eventfd.c fs/eventfd.c should #include <linux/syscalls.h> 2008-02-06 10:41:03 -08:00
eventpoll.c lockdep: annotate epoll 2008-02-05 09:44:07 -08:00
exec.c [PATCH] sanitize unshare_files/reset_files_struct 2008-04-25 09:23:59 -04:00
fcntl.c [PATCH] sanitize locate_fd() 2008-04-25 09:24:05 -04:00
fifo.c
file_table.c [PATCH] r/o bind mounts: debugging for missed calls 2008-04-19 00:29:28 -04:00
file.c get rid of NR_OPEN and introduce a sysctl_nr_open 2008-02-06 10:41:06 -08:00
filesystems.c
fs-writeback.c fs: fix kernel-doc notation warnings 2008-03-19 18:53:36 -07:00
generic_acl.c
inode.c [PATCH] r/o bind mounts: write count for file_update_time() 2008-04-19 00:29:24 -04:00
inotify_user.c Introduce path_put() 2008-02-14 21:13:33 -08:00
inotify.c inotify: remove debug code 2008-02-06 10:41:07 -08:00
internal.h [PATCH] move a bunch of declarations to fs/internal.h 2008-04-21 23:11:01 -04:00
ioctl.c fix up kerneldoc in fs/ioctl.c a little bit 2008-02-09 11:08:33 -08:00
ioprio.c cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions 2008-01-28 11:38:15 +01:00
Kconfig Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6 2008-04-24 11:46:16 -07:00
Kconfig.binfmt [SPARC]: Remove SunOS and Solaris binary support. 2008-04-21 15:10:15 -07:00
libfs.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
locks.c Export __locks_copy_lock() so modular lockd builds 2008-04-25 15:49:46 -07:00
Makefile x86: compat_binfmt_elf Kconfig 2008-01-30 13:31:46 +01:00
mbcache.c vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs 2008-04-15 19:35:41 -07:00
mpage.c docbook: fix filesystems.tmpl source files 2008-03-03 10:47:13 -08:00
namei.c [PATCH] r/o bind mounts: elevate write count for open()s 2008-04-19 00:29:25 -04:00
namespace.c [PATCH] restore sane ->umount_begin() API 2008-04-25 09:23:25 -04:00
nfsctl.c Introduce path_put() 2008-02-14 21:13:33 -08:00
no-block.c
open.c [PATCH] r/o bind mounts: debugging for missed calls 2008-04-19 00:29:28 -04:00
pipe.c [PATCH] double-free of inode on alloc_file() failure exit in create_write_pipe() 2008-04-22 19:54:57 -04:00
pnode.c [patch 7/7] vfs: mountinfo: show dominating group id 2008-04-23 00:05:09 -04:00
pnode.h [patch 7/7] vfs: mountinfo: show dominating group id 2008-04-23 00:05:09 -04:00
posix_acl.c
quota_v1.c
quota_v2.c
quota.c Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p) 2008-02-07 08:42:26 -08:00
read_write.c fs: use loff_t type instead of long long 2008-04-22 15:17:11 -07:00
read_write.h
readdir.c Use mutex_lock_killable in vfs_readdir 2007-12-06 17:39:54 -05:00
select.c trivial: small cleanups 2008-04-21 22:15:06 +00:00
seq_file.c [patch 2/7] vfs: mountinfo: add seq_file_root() 2008-04-23 00:04:38 -04:00
signalfd.c signalfd: fix for incorrect SI_QUEUE user data reporting 2008-04-11 08:06:44 -07:00
splice.c splice: fix infinite loop in generic_file_splice_read() 2008-04-10 08:24:25 +02:00
stack.c
stat.c Introduce path_put() 2008-02-14 21:13:33 -08:00
super.c [PATCH] move a bunch of declarations to fs/internal.h 2008-04-21 23:11:01 -04:00
sync.c
timerfd.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
utimes.c [PATCH] r/o bind mounts: elevate write count for do_utimes() 2008-04-19 00:29:24 -04:00
xattr_acl.c
xattr.c [PATCH] remove unused label in xattr.c (noise from ro-bind) 2008-04-23 00:04:04 -04:00