linux/mm
Martin Hicks bfbb38fb80 [PATCH] VM: add may_swap flag to scan_control
Here's the next round of these patches.  These are totally different in
an attempt to meet the "simpler" request after the last patches.  For
reference the earlier threads are:

http://marc.theaimsgroup.com/?l=linux-kernel&m=110839604924587&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=111461480721249&w=2

This set of patches replaces my other vm- patches that are currently in
-mm.  So they're against 2.6.12-rc5-mm1 about half way through the -mm
patchset.

As I said already this patch is a lot simpler.  The reclaim is turned on
or off on a per-zone basis using a syscall.  I haven't tested the x86
syscall, so it might be wrong.  It uses the existing reclaim/pageout
code with the small addition of a may_swap flag to scan_control
(patch 1/4).

I also added __GFP_NORECLAIM (patch 3/4) so that certain allocation
types can be flagged to never cause reclaim.  This was a deficiency
that was in all of my earlier patch sets.  Previously, doing a big
buffered read would fill one zone with page cache and then start to
reclaim from that same zone, leaving the other zones untouched.
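
As a rough illustration of the idea (not the code from patch 3/4), an
allocation-time check for such a flag could look like the sketch below.
The names gfp_model_t, GFP_MODEL_WAIT, GFP_MODEL_NORECLAIM and
should_try_zone_reclaim(), and the bit values, are hypothetical and
chosen only for this user-space example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's allocation flag word. */
typedef unsigned int gfp_model_t;

#define GFP_MODEL_WAIT      0x01u  /* illustrative bit value only */
#define GFP_MODEL_NORECLAIM 0x02u  /* illustrative bit value only */

/*
 * Decide whether an allocation should trigger early reclaim on a zone.
 * Assumption: reclaim is attempted only when it is enabled for the zone,
 * the zone is short on free pages, and the caller did not opt out.
 */
static bool should_try_zone_reclaim(bool zone_reclaim_enabled,
                                    bool zone_is_low,
                                    gfp_model_t flags)
{
    if (!zone_reclaim_enabled || !zone_is_low)
        return false;
    if (flags & GFP_MODEL_NORECLAIM)
        return false;   /* flagged allocations never cause reclaim */
    return true;
}
```

In this model, a page cache allocation made on behalf of a large buffered
read would pass GFP_MODEL_NORECLAIM and simply fall through to another
zone instead of reclaiming from the one it just filled.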

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without it, the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

                        wall  user   sys  %cpu  ctx sw.  sleeps
                        ----  ----   ---  ----  -------  ------
No patch                1009  1384   847   258   298170  504402
w/patch, no reclaim      880  1376   667   288   254064  396745
w/patch & reclaim       1079  1385   926   252   291625  548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to see that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c

This patch:

This adds an extra switch to the scan_control struct.  It simply lets the
reclaim code know if it's allowed to swap pages out.

This was required for a simple per-zone reclaimer.  Without this addition
pages would be swapped out as soon as a zone ran out of memory and the early
reclaim kicked in.
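
A simplified user-space model of the switch, assuming that the reclaim
path skips anonymous pages that would need swap space whenever may_swap
is clear.  The types page_model and scan_control_model and the helpers
can_reclaim() and count_reclaimable() are hypothetical stand-ins, not
the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a page on the LRU. */
struct page_model {
    bool anon;          /* anonymous memory: needs swap to be reclaimed */
    bool in_swap_cache; /* already has swap space assigned */
};

/* Hypothetical stand-in for scan_control; only the new switch is modelled. */
struct scan_control_model {
    int may_swap;       /* nonzero if reclaim is allowed to swap pages out */
};

/* Return true if this page may be reclaimed under the given scan control. */
static bool can_reclaim(const struct page_model *page,
                        const struct scan_control_model *sc)
{
    assert(page != NULL && sc != NULL);
    /* An anonymous page without swap space would have to be swapped out. */
    if (page->anon && !page->in_swap_cache && !sc->may_swap)
        return false;
    return true;
}

/* Count how many of the n pages the reclaimer would be allowed to take. */
static size_t count_reclaimable(const struct page_model *pages, size_t n,
                                const struct scan_control_model *sc)
{
    size_t reclaimable = 0;
    for (size_t i = 0; i < n; i++)
        if (can_reclaim(&pages[i], sc))
            reclaimable++;
    return reclaimable;
}
```

With may_swap clear, a per-zone reclaimer built this way frees only page
cache and pages already in the swap cache; ordinary memory-pressure
reclaim would set may_swap and keep its existing behaviour.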

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
bootmem.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
fadvise.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
filemap.c [PATCH] broken fault_in_pages_readable call in generic_file_buffered_write() 2005-06-06 14:42:23 -07:00
fremap.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
highmem.c [PATCH] count bounce buffer pages in vmstat 2005-05-01 08:58:37 -07:00
hugetlb.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
internal.h Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
madvise.c [PATCH] madvise: merge the maps 2005-06-21 18:46:13 -07:00
Makefile Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
memory.c [PATCH] do_swap_page() can map random data if swap read fails 2005-05-17 07:59:20 -07:00
mempolicy.c [PATCH] mempolicy.c GFP fix 2005-04-24 12:28:34 -07:00
mempool.c [PATCH] use smp_mb/wmb/rmb where possible 2005-05-01 08:58:47 -07:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
mmap.c Fix get_unmapped_area sanity tests 2005-05-19 22:43:37 -07:00
mprotect.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
mremap.c [PATCH] mm acct accounting fix 2005-05-17 07:59:12 -07:00
msync.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
nommu.c [PATCH] mm/nommu.c: try to fix __vmalloc 2005-05-17 07:59:17 -07:00
oom_kill.c [PATCH] oom-killer disable for iscsi/lvm2/multipath userland critical sections 2005-04-16 15:24:05 -07:00
page_alloc.c [PATCH] mm: add /proc/zoneinfo 2005-06-21 18:46:14 -07:00
page_io.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
page-writeback.c [PATCH] DocBook: fix some descriptions 2005-05-01 08:59:26 -07:00
pdflush.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
rmap.c [PATCH] try_to_unmap_cluster() passes out-of-bounds pte to pte_unmap() 2005-05-24 20:08:13 -07:00
shmem.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
slab.c [SLAB] Introduce kmem_cache_name 2005-06-18 22:46:19 -07:00
swap_state.c [PATCH] mm: use __GFP_NOMEMALLOC 2005-05-01 08:58:37 -07:00
swap.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
swapfile.c [PATCH] swapout oops fix 2005-05-17 07:59:18 -07:00
thrash.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
tiny-shmem.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
truncate.c [PATCH] DocBook: fix some descriptions 2005-05-01 08:59:26 -07:00
vmalloc.c [PATCH] x86_64: Fixed guard page handling again in iounmap 2005-05-20 15:48:20 -07:00
vmscan.c [PATCH] VM: add may_swap flag to scan_control 2005-06-21 18:46:14 -07:00