linux/mm
Christoph Lameter 0697212a41 [PATCH] Swapless page migration: add R/W migration entries
Implement read/write migration ptes

We take the upper two swapfiles for the two types of migration ptes and define
a series of macros in swapops.h.
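
As a rough illustration of the encoding described above, here is a minimal user-space sketch. It is not the kernel's swapops.h: `struct entry` is a hypothetical stand-in for swp_entry_t, and the 5-bit type field is an assumption chosen so that the read type comes out as 30, the value quoted later in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a 5-bit swap type field with the top two types reserved. */
#define MAX_SWAPFILES_SHIFT 5
#define MAX_SWAPFILES       ((1 << MAX_SWAPFILES_SHIFT) - 2)  /* 30 */
#define SWP_MIGRATION_READ  MAX_SWAPFILES                     /* 30 */
#define SWP_MIGRATION_WRITE (MAX_SWAPFILES + 1)               /* 31 */

/* Hypothetical stand-in for swp_entry_t: type and offset kept apart. */
struct entry {
	unsigned int type;
	unsigned long offset;	/* page frame number for migration entries */
};

/* Build a migration entry for a page; "write" selects the entry type. */
static struct entry make_migration_entry(unsigned long pfn, bool write)
{
	struct entry e = { write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ, pfn };
	return e;
}

static bool is_migration_entry(struct entry e)
{
	return e.type == SWP_MIGRATION_READ || e.type == SWP_MIGRATION_WRITE;
}

static bool is_write_migration_entry(struct entry e)
{
	return e.type == SWP_MIGRATION_WRITE;
}

/* Downgrade an entry to read-only while keeping the page it points to. */
static void make_migration_entry_read(struct entry *e)
{
	assert(is_migration_entry(*e));
	e->type = SWP_MIGRATION_READ;
}
```

In this model an ordinary swap entry uses types 0 through 29, so a single comparison on the type field is enough to tell a migration entry from a real swap entry.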

The VM is modified to handle the migration entries.  Migration entries can
only be encountered when the page they are pointing to is locked.  This limits
the number of places one has to fix.  We also check in copy_pte_range and in
mprotect_pte_range() for migration ptes.

We check for migration ptes in do_swap_page and call a function that will
then wait on the page lock.  This allows us to effectively stop all accesses
to the page.

Migration entries are created by try_to_unmap when it is called for
migration, and are removed by local functions in migrate.c.

From: Hugh Dickins <hugh@veritas.com>

  Several times while testing swapless page migration (I've no NUMA, just
  hacking it up to migrate recklessly while running load), I've hit the
  BUG_ON(!PageLocked(p)) in migration_entry_to_page.

  This comes from an orphaned migration entry, unrelated to the current
  correctly locked migration, but hit by remove_anon_migration_ptes as it
  checks an address in each vma of the anon_vma list.

  Such an orphan may be left behind if an earlier migration raced with fork:
  copy_one_pte can duplicate a migration entry from parent to child, after
  remove_anon_migration_ptes has checked the child vma, but before it has
  removed it from the parent vma.  (If the process were later to fault on this
  orphaned entry, it would hit the same BUG from migration_entry_wait.)

  This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
  not.  There's no such problem with file pages, because vma_prio_tree_add
  adds child vma after parent vma, and the page table locking at each end is
  enough to serialize.  Follow that example with anon_vma: add new vmas to the
  tail instead of the head.

  (There's no corresponding problem when inserting migration entries,
  because a missed pte will leave the page count and mapcount high, which is
  allowed for.  And there's no corresponding problem when migrating via swap,
  because a leftover swap entry will be correctly faulted.  But the swapless
  method has no refcounting of its entries.)
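
The ordering argument above can be illustrated with a small user-space list. This is only a sketch: `struct node` and the helper names are hypothetical and merely mirror the shape of the kernel's list helpers. It shows that a head-to-tail walk reaches the parent before the child only when the child is added at the tail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list; "id" stands in for a vma. */
struct node {
	struct node *prev, *next;
	int id;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void add_between(struct node *n, struct node *prev, struct node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head: the new node is walked first. */
static void list_add(struct node *n, struct node *head)
{
	add_between(n, head, head->next);
}

/* Insert right before the head: the new node is walked last. */
static void list_add_tail(struct node *n, struct node *head)
{
	add_between(n, head->prev, head);
}

/* Position of "id" in a head-to-tail walk, or -1 if it is absent. */
static int walk_position(struct node *head, int id)
{
	int pos = 0;

	for (struct node *p = head->next; p != head; p = p->next, pos++)
		if (p->id == id)
			return pos;
	return -1;
}

/* Returns 1 if the walk visits the child (id 2) after the parent (id 1). */
static int child_after_parent(int use_tail)
{
	struct node head, parent = { .id = 1 }, child = { .id = 2 };

	list_init(&head);
	list_add_tail(&parent, &head);
	if (use_tail)
		list_add_tail(&child, &head);
	else
		list_add(&child, &head);

	int pp = walk_position(&head, 1), pc = walk_position(&head, 2);
	assert(pp >= 0 && pc >= 0);
	return pc > pp;
}
```

With head insertion the walker checks the child before the parent, which is the window in which fork can copy an entry into a vma that has already been checked. With tail insertion the parent is handled first, so the child is still ahead of the walker when the copy happens.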

From: Ingo Molnar <mingo@elte.hu>

  pte_unmap_unlock() takes the pte pointer as an argument.

From: Hugh Dickins <hugh@veritas.com>

  Several times while testing swapless page migration, gcc has tried to exec
  a pointer instead of a string: smells like COW mappings are not being
  properly write-protected on fork.

  The protection in copy_one_pte looks very convincing, until at last you
  realize that the second arg to make_migration_entry is a boolean "write",
  and SWP_MIGRATION_READ is 30.

  Anyway, it's better done like in change_pte_range, using
  is_write_migration_entry and make_migration_entry_read.
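
The mistake described above is easy to reproduce in a user-space model. The sketch below is an assumption-laden illustration, not the code from memory.c: `struct entry`, `protect_buggy` and `protect_fixed` are hypothetical names, and SWP_MIGRATION_WRITE is assumed to be 31.

```c
#include <assert.h>

#define SWP_MIGRATION_READ  30	/* value quoted in the message above */
#define SWP_MIGRATION_WRITE 31	/* assumption: the next type up */

/* Hypothetical stand-in for swp_entry_t. */
struct entry {
	unsigned int type;
	unsigned long offset;
};

/* The second argument is a truth value, not an entry type. */
static struct entry make_migration_entry(unsigned long pfn, int write)
{
	struct entry e = { write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ, pfn };
	return e;
}

static int is_write_migration_entry(struct entry e)
{
	return e.type == SWP_MIGRATION_WRITE;
}

static void make_migration_entry_read(struct entry *e)
{
	assert(e->type == SWP_MIGRATION_READ || e->type == SWP_MIGRATION_WRITE);
	e->type = SWP_MIGRATION_READ;
}

/* Buggy: 30 is nonzero, so this always builds a write entry. */
static struct entry protect_buggy(struct entry e)
{
	return make_migration_entry(e.offset, SWP_MIGRATION_READ);
}

/* Fixed: test for a write entry, then downgrade it in place. */
static struct entry protect_fixed(struct entry e)
{
	if (is_write_migration_entry(e))
		make_migration_entry_read(&e);
	return e;
}
```

The buggy variant looks like it requests a read entry, yet because any nonzero value counts as "write" it leaves the entry writable, so a COW mapping is not write-protected.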

From: Hugh Dickins <hugh@veritas.com>

  Remove unnecessary obfuscation from sys_swapon's range check on swap type,
  which blew up causing memory corruption once swapless migration made
  MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
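
To see why the two limits diverge, consider the sketch below. It does not reproduce the old or new sys_swapon code; the function names and the `swap_info` array are hypothetical, and the 5-bit type field is an assumption. It contrasts a check derived from the width of the type field with a direct comparison against MAX_SWAPFILES.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a 5-bit type field with the top two types reserved. */
#define MAX_SWAPFILES_SHIFT 5
#define MAX_SWAPFILES       ((1 << MAX_SWAPFILES_SHIFT) - 2)  /* 30 */

/* Hypothetical per-swapfile table with one slot per real swap type. */
static int swap_info[MAX_SWAPFILES];

/* Width-based check: accepts every type the field can encode (0..31). */
static bool type_ok_by_width(unsigned int type)
{
	return type < (1u << MAX_SWAPFILES_SHIFT);
}

/* Direct check: accepts only types that have a slot in the table. */
static bool type_ok_direct(unsigned int type)
{
	return type < MAX_SWAPFILES;
}

/* Look up a slot; the assertion guards against indexing past the end. */
static int *swap_slot(unsigned int type)
{
	assert(type_ok_direct(type));
	return &swap_info[type];
}
```

Types 30 and 31 pass the width-based check but have no slot in the table, so using them as an index would write past the end of the array. The direct comparison rejects them.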

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
From: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:50 -07:00
..
bootmem.c [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory 2006-04-09 11:53:16 -07:00
fadvise.c [PATCH] sys_sync_file_range() 2006-03-31 12:18:54 -08:00
filemap_xip.c [PATCH] replace inode_update_time with file_update_time 2006-01-10 08:01:30 -08:00
filemap.c [PATCH] writeback: fix range handling 2006-06-23 07:42:49 -07:00
filemap.h [PATCH] xip: reduce code duplication 2005-06-24 00:06:41 -07:00
fremap.c VM: add common helper function to create the page tables 2005-11-29 14:03:14 -08:00
highmem.c BUG_ON() Conversion in mm/highmem.c 2006-04-02 13:47:35 +02:00
hugetlb.c [PATCH] tightening hugetlb strict accounting 2006-06-23 07:42:48 -07:00
internal.h [PATCH] remove set_page_count() outside mm/ 2006-03-22 07:54:02 -08:00
Kconfig [PATCH] mm: make page migration dependent on swap and NUMA 2006-03-25 08:22:50 -08:00
madvise.c [PATCH] Fix MADV_REMOVE protection checking 2006-04-17 18:22:18 -07:00
Makefile [PATCH] uninline zone helpers 2006-03-27 08:44:48 -08:00
memory_hotplug.c [PATCH] wait_table and zonelist initializing for memory hotadd: update zonelists 2006-06-23 07:42:46 -07:00
memory.c [PATCH] Swapless page migration: add R/W migration entries 2006-06-23 07:42:50 -07:00
mempolicy.c [PATCH] Remove cond_resched in gather_stats() 2006-04-20 07:54:03 -07:00
mempool.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-03-26 09:41:18 -08:00
migrate.c [PATCH] Swapless page migration: add R/W migration entries 2006-06-23 07:42:50 -07:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
mmap.c [PATCH] overcommit: use totalreserve_pages 2006-04-11 06:18:32 -07:00
mmzone.c [PATCH] uninline zone helpers 2006-03-27 08:44:48 -08:00
mprotect.c [PATCH] Swapless page migration: add R/W migration entries 2006-06-23 07:42:50 -07:00
mremap.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
msync.c The comment describing how MS_ASYNC works in msync.c is confusing 2006-03-24 18:30:53 +01:00
nommu.c [PATCH] overcommit: use totalreserve_pages for nommu 2006-04-11 06:18:32 -07:00
oom_kill.c [PATCH] mm: fix typos in comments in mm/oom_kill.c 2006-06-23 07:42:47 -07:00
page_alloc.c [PATCH] squash duplicate page_to_pfn and pfn_to_page 2006-06-23 07:42:47 -07:00
page_io.c [PATCH] mm: split page table lock 2005-10-29 21:40:42 -07:00
page-writeback.c [PATCH] writeback: fix range handling 2006-06-23 07:42:49 -07:00
pdflush.c [PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap 2006-01-08 20:12:41 -08:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c [PATCH] ext3_readdir: use generic readahead 2006-03-23 07:38:09 -08:00
rmap.c [PATCH] Swapless page migration: add R/W migration entries 2006-06-23 07:42:50 -07:00
shmem.c [PATCH] migration: remove unnecessary PageSwapCache checks 2006-06-23 07:42:46 -07:00
slab.c [PATCH] slab: redzone double-free detection 2006-06-23 07:42:49 -07:00
slob.c [PATCH] mm/slob.c: for_each_possible_cpu(), not NR_CPUS 2006-04-19 09:13:49 -07:00
sparse.c [PATCH] SPARSEMEM incorrectly calculates section number 2006-05-21 12:59:17 -07:00
swap_state.c BUG_ON() Conversion in mm/swap_state.c 2006-04-01 01:25:12 +02:00
swap.c [PATCH] for_each_possible_cpu: fixes for generic part 2006-03-28 09:16:05 -08:00
swapfile.c [PATCH] Swapless page migration: add R/W migration entries 2006-06-23 07:42:50 -07:00
thrash.c [PATCH] temporarily disable swap token on memory pressure 2005-11-28 14:42:25 -08:00
tiny-shmem.c [PATCH] do_truncate() call fix in tiny-shmem.c 2006-01-12 09:08:49 -08:00
truncate.c [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
util.c [PATCH] slab: optimize constant-size kzalloc calls 2006-03-25 08:22:49 -08:00
vmalloc.c [PATCH] mm: introduce remap_vmalloc_range() 2006-06-23 07:42:49 -07:00
vmscan.c [PATCH] writeback: fix range handling 2006-06-23 07:42:49 -07:00