linux/mm
Vladimir Davydov 2edefe1155 memcg, slab: fix races in per-memcg cache creation/destruction
We obtain a per-memcg cache from a root kmem_cache by dereferencing an
entry of the root cache's memcg_params::memcg_caches array.  If we find
no cache for a memcg there on allocation, we initiate the memcg cache
creation (see memcg_kmem_get_cache()).  The cache creation proceeds
asynchronously in memcg_create_kmem_cache() in order to avoid lock
clashes, so there can be several threads trying to create the same
kmem_cache concurrently, but only one of them should succeed.  However,
due to a race in the code, this does not always hold.  The point is that the
memcg_caches array can be relocated when we activate kmem accounting for
a memcg (see memcg_update_all_caches(), memcg_update_cache_size()).  If
memcg_update_cache_size() and memcg_create_kmem_cache() proceed
concurrently as described below, we can leak a kmem_cache.

Assume two threads schedule creation of the same kmem_cache.  One of them
successfully creates it.  The other one should then fail, but if
memcg_create_kmem_cache() interleaves with memcg_update_cache_size() as
follows, it won't:

  memcg_create_kmem_cache()             memcg_update_cache_size()
  (called w/o mutexes held)             (called with slab_mutex,
                                         set_limit_mutex held)
  -------------------------             -------------------------

  mutex_lock(&memcg_cache_mutex)

                                        s->memcg_params=kzalloc(...)

  new_cachep=cache_from_memcg_idx(cachep,idx)
  // new_cachep==NULL => proceed to creation

                                        s->memcg_params->memcg_caches[i]
                                            =cur_params->memcg_caches[i]

  // kmem_cache_create_memcg takes slab_mutex
  // so we will hang around until
  // memcg_update_cache_size finishes, but
  // nothing will prevent it from succeeding so
  // memcg_caches[idx] will be overwritten in
  // memcg_register_cache!

  new_cachep = kmem_cache_create_memcg(...)
  mutex_unlock(&memcg_cache_mutex)
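
The interleaving above can be replayed deterministically in userspace.
The following is only a sketch with hypothetical toy_* types standing in
for kmem_cache and memcg_params; it is not kernel code and uses no
threads, it just runs the steps in the problematic order and counts the
cache that becomes unreachable.

```c
#include <assert.h>
#include <stdlib.h>

#define NSLOTS 4

/* Hypothetical stand-ins for kmem_cache, memcg_params and the root cache. */
struct toy_cache { int id; };
struct toy_params { struct toy_cache *caches[NSLOTS]; };
struct toy_root { struct toy_params *params; };

/* Replays the steps in the racy order; returns the number of leaked caches. */
static int replay_race(void)
{
	struct toy_cache first = { 1 }, second = { 2 };
	struct toy_params *old_params = calloc(1, sizeof(*old_params));
	struct toy_params *new_params = calloc(1, sizeof(*new_params));
	struct toy_root root;
	int idx = 0, leaked = 0, saw_null;

	assert(old_params && new_params);
	root.params = old_params;
	old_params->caches[idx] = &first;	/* first creator already succeeded */

	/* updater: installs the zeroed array before copying the entries */
	root.params = new_params;

	/* second creator: looks up the slot and finds it empty */
	saw_null = (root.params->caches[idx] == NULL);

	/* updater: copies the old entries into the new array */
	for (int i = 0; i < NSLOTS; i++)
		new_params->caches[i] = old_params->caches[i];

	/* second creator: acts on the stale lookup and overwrites the slot */
	if (saw_null) {
		if (root.params->caches[idx] != NULL)
			leaked++;	/* 'first' is no longer reachable */
		root.params->caches[idx] = &second;
	}

	free(old_params);
	free(new_params);
	return leaked;
}
```

In this replay the lookup happens between the installation of the new
array and the copy of the old entries, which is exactly the window the
diagram describes.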

Fix this by moving the check for existence of the memcg cache into
kmem_cache_create_memcg(), where it runs under the slab_mutex, and make
that function return NULL if the cache already exists.
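
The idea of checking for existence under the same lock that guards
registration can be sketched in userspace as follows.  This is a minimal
illustration under stated assumptions: toy_slab_mutex, toy_slots and
toy_create() are hypothetical names, and a pthread mutex stands in for
the slab_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical stand-in for kmem_cache. */
struct toy_cache { int id; };

static pthread_mutex_t toy_slab_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct toy_cache *toy_slots[NSLOTS];

/*
 * Register 'cand' in slot 'idx'.  The existence check and the store
 * happen under one lock, so a second creator cannot overwrite the slot.
 * Returns 'cand' on success, NULL if a cache is already registered.
 */
static struct toy_cache *toy_create(int idx, struct toy_cache *cand)
{
	struct toy_cache *ret = NULL;

	assert(idx >= 0 && idx < NSLOTS);

	pthread_mutex_lock(&toy_slab_mutex);
	if (toy_slots[idx] == NULL) {
		toy_slots[idx] = cand;
		ret = cand;
	}
	pthread_mutex_unlock(&toy_slab_mutex);

	return ret;
}
```

With this shape, a caller that loses the race gets NULL back and the
slot keeps pointing at the cache registered first.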

A similar race is possible when destroying a memcg cache (see
kmem_cache_destroy()).  Since memcg_unregister_cache(), which clears the
pointer in the memcg_caches array, is called w/o protection, we can race
with memcg_update_cache_size() and fail to clear the pointer.  Therefore
memcg_unregister_cache() should be moved before we release the
slab_mutex.
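
The destruction side can be sketched the same way.  In this userspace
illustration (hypothetical toy_* names, a pthread mutex in place of the
slab_mutex) the slot is cleared while the lock is held, so a relocation
that also takes the lock can never copy a pointer to a destroyed cache
into the new array.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for kmem_cache. */
struct toy_cache { int id; };

static pthread_mutex_t toy_slab_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct toy_cache **toy_slots;	/* relocatable array of caches */
static size_t toy_nslots;

/* Grow the array under the lock, copying the existing entries. */
static int toy_relocate(size_t new_n)
{
	struct toy_cache **bigger;
	int ret = 0;

	pthread_mutex_lock(&toy_slab_mutex);
	if (new_n > toy_nslots) {
		bigger = calloc(new_n, sizeof(*bigger));
		if (!bigger) {
			ret = -1;
		} else {
			if (toy_nslots)
				memcpy(bigger, toy_slots,
				       toy_nslots * sizeof(*bigger));
			free(toy_slots);
			toy_slots = bigger;
			toy_nslots = new_n;
		}
	}
	pthread_mutex_unlock(&toy_slab_mutex);
	return ret;
}

/* Unregister 'c' from slot 'idx' before the lock is released. */
static void toy_destroy(size_t idx, struct toy_cache *c)
{
	pthread_mutex_lock(&toy_slab_mutex);
	assert(idx < toy_nslots && toy_slots[idx] == c);
	toy_slots[idx] = NULL;
	pthread_mutex_unlock(&toy_slab_mutex);
}
```

Because both functions serialize on the same mutex, the relocation sees
the slot either before or after it is cleared, never in between.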

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
backing-dev.c
balloon_compaction.c mm: print more details for bad_page() 2014-01-23 16:36:50 -08:00
bootmem.c
bounce.c
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
compaction.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c
filemap.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
fremap.c mm: fix use-after-free in sys_remap_file_pages 2014-01-02 14:40:30 -08:00
frontswap.c
highmem.c
huge_memory.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
hugetlb_cgroup.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
hugetlb.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
hwpoison-inject.c mm/hwpoison: add '#' to hwpoison_inject 2014-01-21 16:19:48 -08:00
init-mm.c
internal.h mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
interval_tree.c
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter 2014-01-21 16:19:48 -08:00
memcontrol.c memcg, slab: fix races in per-memcg cache creation/destruction 2014-01-23 16:36:51 -08:00
memory_hotplug.c mm: print more details for bad_page() 2014-01-23 16:36:50 -08:00
memory-failure.c mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages 2014-01-21 16:19:49 -08:00
memory.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
mempolicy.c mm/mempolicy: fix !vma in new_vma_page() 2013-12-18 19:04:52 -08:00
mempool.c
migrate.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
mincore.c
mlock.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
mm_init.c
mmap.c mm/mmap.c: add mlock_future_check() helper 2014-01-21 16:19:44 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c mm: numa: do not automatically migrate KSM pages 2014-01-21 16:19:48 -08:00
mremap.c
msync.c
nobootmem.c mm/memblock: switch to use NUMA_NO_NODE instead of MAX_NUMNODES 2014-01-21 16:19:46 -08:00
nommu.c mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
oom_kill.c oom_kill: add rcu_read_lock() into find_lock_task_mm() 2014-01-21 16:19:46 -08:00
page_alloc.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
page_cgroup.c Merge branch 'akpm' (incoming from Andrew) 2014-01-21 19:05:45 -08:00
page_io.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
page_isolation.c
page-writeback.c
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c Merge branch 'akpm' (incoming from Andrew) 2014-01-21 19:05:45 -08:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
shmem.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
slab_common.c memcg, slab: fix races in per-memcg cache creation/destruction 2014-01-23 16:36:51 -08:00
slab.c
slab.h memcg, slab: fix barrier usage when accessing memcg_caches 2014-01-23 16:36:51 -08:00
slob.c
slub.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
swap_state.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
swap.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
swapfile.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
truncate.c
util.c mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
vmalloc.c mm/vmalloc: interchage the implementation of vmalloc_to_{pfn,page} 2014-01-21 16:19:44 -08:00
vmpressure.c
vmscan.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
vmstat.c
zbud.c
zswap.c mm/zswap.c: change params from hidden to ro 2014-01-23 16:36:50 -08:00