linux/mm
Konstantin Khlebnikov e9714acf8c mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas
Currently the kernel sets mm->exe_file during sys_execve() and then tracks
number of vmas with VM_EXECUTABLE flag in mm->num_exe_file_vmas, as soon
as this counter drops to zero kernel resets mm->exe_file to NULL.  Plus it
resets mm->exe_file at last mmput() when mm->mm_users drops to zero.

VMA with VM_EXECUTABLE flag appears after mapping file with flag
MAP_EXECUTABLE, such vmas can appears only at sys_execve() or after vma
splitting, because sys_mmap ignores this flag.  Usually binfmt module sets
mm->exe_file and mmaps executable vmas with this file, they hold
mm->exe_file while task is running.

comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
where all this stuff was introduced:

> The kernel implements readlink of /proc/pid/exe by getting the file from
> the first executable VMA.  Then the path to the file is reconstructed and
> reported as the result.
>
> Because of the VMA walk the code is slightly different on nommu systems.
> This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
> walking the VMAs to find the first executable file-backed VMA we store a
> reference to the exec'd file in the mm_struct.
>
> That reference would prevent the filesystem holding the executable file
> from being unmounted even after unmapping the VMAs.  So we track the number
> of VM_EXECUTABLE VMAs and drop the new reference when the last one is
> unmapped.  This avoids pinning the mounted filesystem.

exe_file's vma accounting is hooked into every file mmap/unmmap and vma
split/merge just to fix some hypothetical pinning fs from umounting by mm,
which already unmapped all its executable files, but still alive.

Seems like currently nobody depends on this behaviour.  We can try to
remove this logic and keep mm->exe_file until final mmput().

mm->exe_file is still protected with mm->mmap_sem, because we want to
change it via new sys_prctl(PR_SET_MM_EXE_FILE).  Also via this syscall
task can change its mm->exe_file and unpin mountpoint explicitly.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:18 +09:00
..
backing-dev.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
bootmem.c bootmem: Fix the short description of reserve_bootmem() 2012-08-27 07:57:21 -07:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm: compaction: Abort async compaction if locks are contended or taking too long 2012-08-21 16:45:03 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: fix implicit stat.h usage in dmapool.c 2011-10-31 09:20:12 -04:00
fadvise.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap_xip.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
filemap.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
fremap.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
frontswap.c frontswap: support exclusive gets if tmem backend is capable 2012-09-21 10:38:12 -04:00
highmem.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
huge_memory.c mm: kill vma flag VM_INSERTPAGE 2012-10-09 16:22:17 +09:00
hugetlb_cgroup.c hugetlb/cgroup: remove exclude and wakeup rmdir calls from migrate 2012-07-31 18:42:41 -07:00
hugetlb.c mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables 2012-07-31 18:42:50 -07:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: compaction: Abort async compaction if locks are contended or taking too long 2012-08-21 16:45:03 -07:00
Kconfig mm: factor out memory isolate functions 2012-07-31 18:42:45 -07:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: Replace list_for_each_continue_rcu with new interface 2012-09-23 07:42:52 -07:00
ksm.c mm: kill vma flag VM_INSERTPAGE 2012-10-09 16:22:17 +09:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: Hold a file reference in madvise_remove 2012-07-06 10:34:38 -07:00
Makefile mm: factor out memory isolate functions 2012-07-31 18:42:45 -07:00
memblock.c mm/memblock: Use NULL instead of 0 for pointers 2012-09-05 08:32:30 +02:00
memcontrol.c cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them 2012-09-14 12:01:16 -07:00
memory_hotplug.c memory hotplug: fix section info double registration bug 2012-09-17 15:00:38 -07:00
memory-failure.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
memory.c mm: kill vma flag VM_INSERTPAGE 2012-10-09 16:22:17 +09:00
mempolicy.c Remove user-triggerable BUG from mpol_to_str 2012-09-06 09:37:58 -07:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm: memcg: fix compaction/migration failing due to memcg limits 2012-07-31 18:42:48 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas 2012-10-09 16:22:18 +09:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm: mmu_notifier: fix freed page still mapped in secondary MMU 2012-07-31 18:42:49 -07:00
mmzone.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: account the total_vm in the vm_stat_account() 2012-07-31 18:42:39 -07:00
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-11 16:04:50 -07:00
nommu.c mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas 2012-10-09 16:22:18 +09:00
oom_kill.c mm, memcg: move all oom handling to memcontrol.c 2012-07-31 18:42:45 -07:00
page_alloc.c mm: remove __GFP_NO_KSWAPD 2012-10-09 16:22:15 +09:00
page_cgroup.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
page_io.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
page_isolation.c memory-hotplug: fix kswapd looping forever problem 2012-07-31 18:42:45 -07:00
page-writeback.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
pagewalk.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu-km.c
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c sections: fix section conflicts in mm/percpu.c 2012-10-06 03:04:44 +09:00
pgtable-generic.c arch/tile: allow building Linux with transparent huge pages enabled 2012-05-25 12:48:21 -04:00
prio_tree.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
rmap.c mm: remove swap token code 2012-05-29 16:22:19 -07:00
shmem.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
slab_common.c Revert "mm/sl[aou]b: Move sysfs_slab_add to common" 2012-09-05 12:07:44 +03:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-10-07 07:53:13 +09:00
slab.h Revert "mm/sl[aou]b: Move sysfs_slab_add to common" 2012-09-05 12:07:44 +03:00
slob.c Merge branch 'slab/tracing' into slab/for-linus 2012-10-03 09:57:17 +03:00
slub.c Merge branch 'slab/common-for-cgroups' into slab/for-linus 2012-10-03 09:56:37 +03:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c mm/sparse: remove index_init_lock 2012-07-31 18:42:49 -07:00
swap_state.c mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages 2012-07-31 18:42:47 -07:00
swap.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
swapfile.c mm: swapfile: clean up unuse_pte race handling 2012-07-31 18:42:48 -07:00
truncate.c mm/fs: remove truncate_range 2012-05-29 16:22:23 -07:00
util.c mm: Use __do_krealloc to do the krealloc job 2012-09-04 10:22:58 +03:00
vmalloc.c mm: make vb_alloc() more foolproof 2012-07-31 18:42:39 -07:00
vmscan.c memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails 2012-09-17 15:00:37 -07:00
vmstat.c workqueue: make deferrable delayed_work initializer names consistent 2012-08-21 13:18:23 -07:00