linux/mm
Hugh Dickins c4cc6d07b2 swapin_readahead: excise NUMA bogosity
For three years swapin_readahead has been cluttered with fanciful CONFIG_NUMA
code, advancing addr, and stepping on to the next vma at the boundary, to line
up the mempolicy for each page allocation.

It _might_ be a good idea to allocate swap more according to vma layout; but
the fact is, that's not how we do it at all, 2.6 even less than 2.4: swap is
allocated as needed for pages as they sink to the bottom of the inactive LRUs.
 Sometimes that may match vma layout, but not so often that it's worth going
to these misleading vma->vm_next lengths: rip all that out.

Originally I intended to retain the incrementation of addr, but correct its
initial value: valid_swaphandles generally supplies an offset below the target
addr (this is readaround rather than readahead), but addr has not been
adjusted accordingly, so in the interleave case it has usually been allocating
the target page from the "wrong" node (though that may not matter very much).

But look at the equivalent shmem_swapin code: either by oversight or by
design, though it has all the apparatus for choosing a new mempolicy per page,
it uses the same idx throughout, choosing the same mempolicy and interleave
node for each page of the cluster.

Which is actually a much better strategy: each node has its own LRUs and its
own kswapd, so if you're betting on any particular relationship between swap
and node, the best bet is that nearby swap entries belong to pages from the
same node - even when the mempolicy of the target page is to interleave.  And
examining a map of nodes corresponding to swap entries on a numa=fake system
bears this out.  (We could later tweak swap allocation to make it even more
likely, but this patch is merely about removing cruft.)

So, neither adjust nor increment addr in swapin_readahead, and then
shmem_swapin can use it too; the pseudo-vma to pass policy need only be set up
once per cluster, and so few fields of pvma are used, let's skip the memset -
from shmem_alloc_page also.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:14 -08:00
..
allocpercpu.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
backing-dev.c mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init 2007-12-05 09:21:18 -08:00
bootmem.c
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
fadvise.c
filemap_xip.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
filemap.c fix writev regression: pan hanging unkillable and un-straceable 2008-02-03 07:55:39 +11:00
fremap.c sys_remap_file_pages: fix ->vm_file accounting 2008-02-05 09:44:07 -08:00
highmem.c Create the ZONE_MOVABLE zone 2007-07-17 10:22:59 -07:00
hugetlb.c fix hugepages leak due to pagetable page sharing 2008-01-24 08:07:27 -08:00
internal.h Breakout page_order() to internal.h to avoid special knowledge of the buddy allocator 2007-10-16 09:43:01 -07:00
Kconfig sh: Bump number of quicklists for SH-5. 2008-01-28 13:18:55 +09:00
madvise.c speed up madvise_need_mmap_write() usage 2007-07-16 09:05:36 -07:00
Makefile memory unplug: page isolation 2007-10-16 09:43:02 -07:00
memory_hotplug.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
memory.c swapin_readahead: excise NUMA bogosity 2008-02-05 09:44:14 -08:00
mempolicy.c Migration: find correct vma in new_vma_page() 2007-11-14 18:45:38 -08:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c Typo fixes retrun -> return 2007-10-20 02:13:26 +02:00
mincore.c
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mmap.c vm audit: add VM_DONTEXPAND to mmap for drivers that need it 2008-02-04 07:55:38 -08:00
mmzone.c
mprotect.c fix mprotect vma_wants_writenotify prot 2007-10-23 08:32:06 -07:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c vmalloc: add const to void* parameters 2008-02-05 09:44:14 -08:00
oom_kill.c sched: sched_rt_entity 2008-01-25 21:08:27 +01:00
page_alloc.c mm: fix section mismatch warning in page_alloc.c 2008-01-17 15:38:58 -08:00
page_io.c Drop 'size' argument from bio_endio and bi_end_io 2007-10-10 09:25:57 +02:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
page-writeback.c Revert "writeback: introduce writeback_control.more_io to indicate more io" 2008-01-14 21:21:29 -08:00
pdflush.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: bdi init hooks 2007-10-17 08:42:45 -07:00
rmap.c [S390] Optimize storage key handling for anonymous pages 2007-11-20 11:13:46 +01:00
shmem_acl.c
shmem.c swapin_readahead: excise NUMA bogosity 2008-02-05 09:44:14 -08:00
slab.c cpu-hotplug: replace per-subsystem mutexes with get_online_cpus() 2008-01-25 21:08:02 +01:00
slob.c Avoid double memclear() in SLOB/SLUB 2007-12-09 10:17:52 -08:00
slub.c SLUB: Do not upset lockdep 2008-02-04 10:56:02 -08:00
sparse-vmemmap.c memory hotplug fix: fix section mismatch in vmammap_allock_block() 2007-11-29 09:24:54 -08:00
sparse.c is_vmalloc_addr(): Check if an address is within the vmalloc boundaries 2008-02-05 09:44:14 -08:00
swap_state.c mm: clarify __add_to_swap_cache locking 2007-10-16 09:42:53 -07:00
swap.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
swapfile.c Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION 2007-07-29 16:45:38 -07:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c r/o bind mounts: filesystem helpers for custom 'struct file's 2007-10-17 08:43:04 -07:00
truncate.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
util.c fix mm/util.c:krealloc() 2007-11-14 18:45:41 -08:00
vmalloc.c vmalloc: clean up page array indexing 2008-02-05 09:44:14 -08:00
vmscan.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
vmstat.c vmstat: fix section mismatch warning 2007-11-14 18:45:42 -08:00