linux/mm
Hugh Dickins fb59e9f1e9 memcg: fix oops on NULL lru list
While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
 I couldn't see what racing tasks on other cpus were doing, but surmise that
another must have been in mem_cgroup_charge_common on the same page, between
its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
which I'd almost changed to a kmalloc).

Normally such a race cannot happen, the ref_cnt prevents it, the final
uncharge cannot race with the initial charge.  But force_empty buggers the
ref_cnt, that's what it's all about; and thereafter forced pages are
vulnerable to races such as this (just think of a shared page also mapped into
an mm of another mem_cgroup than that just emptied).  And remain vulnerable
until they're freed indefinitely later.

This patch just fixes the oops by moving the unlock_page_cgroups down below
adding to and removing from the list (only possible given the previous patch);
and while we're at it, we might as well make it an invariant that
page->page_cgroup is always set while pc is on lru.

But this behaviour of force_empty seems highly unsatisfactory to me: why have
a ref_cnt if we always have to cope with it being violated (as in the earlier
page migration patch).  We may prefer force_empty to move pages to an orphan
mem_cgroup (could be the root, but better not), from which other cgroups could
recover them; we might need to reverse the locking again; but no time now for
such concerns.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:15 -08:00
..
allocpercpu.c alloc_percpu() fails to allocate percpu data 2008-03-04 16:35:11 -08:00
backing-dev.c mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init 2007-12-05 09:21:18 -08:00
bootmem.c Introduce flags for reserve_bootmem() 2008-02-07 08:42:25 -08:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
dmapool.c pool: Improve memory usage for devices which can't cross boundaries 2007-12-04 10:39:58 -05:00
fadvise.c check ADVICE of fadvise64_64 even if get_xip_page is given 2008-02-05 09:44:19 -08:00
filemap_xip.c Use pgoff_t instead of unsigned long 2008-02-08 09:22:32 -08:00
filemap.c remove final fastcall users 2008-02-13 16:21:18 -08:00
fremap.c sys_remap_file_pages: fix ->vm_file accounting 2008-02-05 09:44:07 -08:00
highmem.c mm: remove fastcall from mm/ 2008-02-05 09:44:18 -08:00
hugetlb.c hugetlb: ensure we do not reference a surplus page after handing it to buddy 2008-02-23 17:12:13 -08:00
internal.h Solve section mismatch for free_area_init_core. 2008-02-23 17:13:24 -08:00
Kconfig sh: Bump number of quicklists for SH-5. 2008-01-28 13:18:55 +09:00
madvise.c
Makefile Memory controller: rename to Memory Resource Controller 2008-03-04 16:35:12 -08:00
memcontrol.c memcg: fix oops on NULL lru list 2008-03-04 16:35:15 -08:00
memory_hotplug.c Page allocator: clean up pcp draining functions 2008-02-05 09:44:17 -08:00
memory.c memcg: when do_swap's do_wp_page fails 2008-03-04 16:35:14 -08:00
mempolicy.c d_path: Make seq_path() use a struct path argument 2008-02-14 21:17:08 -08:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c memcg: fix VM_BUG_ON from page migration 2008-03-04 16:35:14 -08:00
mincore.c
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mmap.c mm: special mapping nopage 2008-02-08 18:57:39 -08:00
mmzone.c
mprotect.c fix mprotect vma_wants_writenotify prot 2007-10-23 08:32:06 -07:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c
nommu.c nommu: add new vmalloc_user() and remap_vmalloc_range() interfaces. 2008-02-05 09:44:21 -08:00
oom_kill.c Memory controller: rename to Memory Resource Controller 2008-03-04 16:35:12 -08:00
page_alloc.c memcg: bad page if page_cgroup when free 2008-03-04 16:35:15 -08:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
page-writeback.c writeback: speed up writeback of big dirty files 2008-02-05 09:44:19 -08:00
pagewalk.c maps4: introduce a generic page walker 2008-02-05 09:44:16 -08:00
pdflush.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: bdi init hooks 2007-10-17 08:42:45 -07:00
rmap.c memcg: mm_match_cgroup not vm_match_cgroup 2008-03-04 16:35:14 -08:00
shmem_acl.c
shmem.c memcg: mem_cgroup_charge never NULL 2008-03-04 16:35:15 -08:00
slab.c slab: avoid double initialization & do initialization in 1 place 2008-02-14 15:30:01 -08:00
slob.c slob: reduce external fragmentation by using three free lists 2008-02-05 09:44:19 -08:00
slub.c slub: fix possible NULL pointer dereference 2008-03-03 12:22:32 -08:00
sparse-vmemmap.c memory hotplug fix: fix section mismatch in vmammap_allock_block() 2007-11-29 09:24:54 -08:00
sparse.c mm: fix section mismatch warning in sparse.c 2008-02-05 09:44:19 -08:00
swap_state.c memcgroup: revert swap_state mods 2008-02-07 08:42:20 -08:00
swap.c memcg: move_lists on page not page_cgroup 2008-03-04 16:35:14 -08:00
swapfile.c d_path: Make seq_path() use a struct path argument 2008-02-14 21:17:08 -08:00
thrash.c
tiny-shmem.c Remove unused code from mm/tiny-shmem.c 2008-02-05 09:44:17 -08:00
truncate.c docbook: fix kernel-api source files 2008-03-03 10:47:14 -08:00
util.c fix mm/util.c:krealloc() 2007-11-14 18:45:41 -08:00
vmalloc.c CONFIG_HIGHPTE vs. sub-page page tables. 2008-02-08 09:22:42 -08:00
vmscan.c memcg: move_lists on page not page_cgroup 2008-03-04 16:35:14 -08:00
vmstat.c vmstat: remove prefetch 2008-02-05 09:44:18 -08:00