linux/fs
David Drysdale 51f39a1f0c syscalls: implement execveat() system call
This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts).  The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.

Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns.  The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.

This patch (of 4):

Add a new execveat(2) system call.  execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers.  This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).

Based on patches by Meredydd Luff.

Signed-off-by: David Drysdale <drysdale@google.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rich Felker <dalias@aerifal.cx>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:51 -08:00
..
9p assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
adfs adfs: add __printf verification, fix format/argument mismatches 2014-08-08 15:57:24 -07:00
affs assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-12-11 14:27:06 -08:00
autofs4 assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
befs befs: remove dead code 2014-12-13 12:42:51 -08:00
bfs fs/bfs: use bfs prefix for dump_imap 2014-08-08 15:57:24 -07:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-12-12 11:15:23 -08:00
cachefiles assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
ceph Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
cifs Merge branch 'akpm' (patchbomb from Andrew) 2014-12-10 18:34:42 -08:00
coda move d_rcu from overlapping d_child to overlapping d_alias 2014-11-03 15:20:29 -05:00
configfs assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
cramfs fs/cramfs/inode.c: use linux/uaccess.h 2014-08-08 15:57:25 -07:00
debugfs Merge tag 'trace-seq-file-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into for-next 2014-11-19 13:02:53 -05:00
devpts
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
ecryptfs kill f_dentry uses 2014-11-19 13:01:25 -05:00
efivarfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
efs fs/efs/namei.c: return is not a function 2014-08-08 15:57:18 -07:00
exofs Boaz Harrosh - Fix broken email address 2014-10-19 20:22:32 +03:00
exportfs move d_rcu from overlapping d_child to overlapping d_alias 2014-11-03 15:20:29 -05:00
ext2 ext2: Convert to private i_dquot field 2014-11-10 10:06:10 +01:00
ext3 ext3: Convert to private i_dquot field 2014-11-10 10:06:10 +01:00
ext4 Lots of bugs fixes, including Zheng and Jan's extent status shrinker 2014-12-12 09:28:03 -08:00
f2fs f2fs: avoid to ra unneeded blocks in recover flow 2014-12-08 14:19:09 -08:00
fat fat: fix data past EOF resulting from fsx testsuite 2014-12-13 12:42:51 -08:00
freevxfs
fscache fs/fscache/object-list.c: use __seq_open_private() 2014-10-13 17:52:21 +01:00
fuse assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
hfs fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp 2014-12-10 17:41:16 -08:00
hfsplus
hostfs hostfs: support rename flags 2014-08-07 14:40:09 -04:00
hpfs fs/hpfs/dnode.c: fix suspect code indent 2014-08-08 15:57:22 -07:00
hppfs vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
hugetlbfs mm: convert i_mmap_mutex to rwsem 2014-12-13 12:42:45 -08:00
isofs isofs: avoid unused function warning 2014-11-19 13:09:37 -05:00
jbd jbd: Deletion of an unnecessary check before the function call "iput" 2014-11-18 10:15:29 +01:00
jbd2 Lots of bugs fixes, including Zheng and Jan's extent status shrinker 2014-12-12 09:28:03 -08:00
jffs2 [jffs2] kill wbuf_queued/wbuf_dwork_lock 2014-10-09 02:39:01 -04:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
kernfs switch d_materialise_unique() users to d_splice_alias() 2014-11-19 13:01:20 -05:00
lockd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
logfs fs/logfs/readwrite.c: kernel-doc warning fixes 2014-08-06 18:01:12 -07:00
minix minix zmap block counts calculation fix 2014-08-08 15:57:20 -07:00
ncpfs Merge branch 'akpm' (patchbomb from Andrew) 2014-12-10 18:34:42 -08:00
nfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
nfs_common lockd: move lockd's grace period handling into its own module 2014-09-17 16:33:11 -04:00
nfsd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-12-11 14:27:06 -08:00
nilfs2 nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races 2014-12-10 17:41:16 -08:00
nls
notify Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
ntfs assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
ocfs2 Merge branch 'akpm' (patchbomb from Andrew) 2014-12-10 18:34:42 -08:00
omfs FS/OMFS: block number sanity check during fill_super operation 2014-10-14 02:18:22 +02:00
openpromfs
overlayfs Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
proc Merge branch 'akpm' (patchbomb from Andrew) 2014-12-10 18:34:42 -08:00
pstore pstore-ram: Allow optional mapping with pgprot_noncached 2014-12-11 13:38:31 -08:00
qnx4
qnx6 fs/qnx6: update debugging to current functions 2014-08-08 15:57:26 -07:00
quota vfs: Remove i_dquot field from inode 2014-11-10 10:06:18 +01:00
ramfs fs/ramfs/file-nommu.c: replace count*size kzalloc by kcalloc 2014-08-08 15:57:18 -07:00
reiserfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-12-12 10:08:06 -08:00
romfs fs/romfs/super.c: add blank line after declarations 2014-08-08 15:57:25 -07:00
squashfs fs/squashfs/super.c: logging cleanup 2014-08-06 18:01:13 -07:00
sysfs
sysv
ubifs UBIFS: fix a couple bugs in UBIFS xattr length calculation 2014-11-07 12:32:22 +02:00
udf udf: One function call less in udf_fill_super() after error detection 2014-11-19 21:56:06 +01:00
ufs fs/ufs/balloc.c: remove unused variable 2014-10-14 02:18:20 +02:00
xfs xfs: update for 3.19-rc1 2014-12-12 09:48:17 -08:00
aio.c Merge git://git.kvack.org/~bcrl/aio-fixes 2014-11-25 18:55:44 -08:00
anon_inodes.c
attr.c
bad_inode.c bad_inode: add ->rename2() 2014-08-07 14:40:09 -04:00
binfmt_aout.c assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
binfmt_elf_fdpic.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_elf.c Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2014-12-11 17:56:37 -08:00
binfmt_em86.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_flat.c
binfmt_misc.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_script.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_som.c
block_dev.c fs: add freeze_super/thaw_super fs hooks 2014-11-17 10:35:17 +00:00
buffer.c fs: clarify rate limit suppressed buffer I/O errors 2014-10-21 13:55:11 -06:00
char_dev.c fs/char_dev.c: remove pointless assignment from __register_chrdev_region() 2014-12-10 17:41:04 -08:00
compat_binfmt_elf.c
compat_ioctl.c
compat.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
coredump.c coredump: add %i/%I in core_pattern to report the tid of the crashed thread 2014-10-14 02:18:21 +02:00
dcache.c Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
dcookies.c
direct-io.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
drop_caches.c mm: vmscan: invoke slab shrinkers from shrink_zone() 2014-12-13 12:42:48 -08:00
eventfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
eventpoll.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
exec.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
fcntl.c security: make security_file_set_fowner, f_setown and __f_setown void return 2014-09-09 16:01:36 -04:00
fhandle.c
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
file.c fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0) 2014-12-10 17:41:10 -08:00
filesystems.c
fs_pin.c make fs/{namespace,super}.c forget about acct.h 2014-08-07 14:40:09 -04:00
fs_struct.c
fs-writeback.c
inode.c mm: convert i_mmap_mutex to rwsem 2014-12-13 12:42:45 -08:00
internal.h vfs: export __inode_permission() to modules 2014-10-24 00:14:35 +02:00
ioctl.c fs: add freeze_super/thaw_super fs hooks 2014-11-17 10:35:17 +00:00
Kconfig overlay filesystem 2014-10-24 00:14:38 +02:00
Kconfig.binfmt binfmt_elf: allow arch code to examine PT_LOPROC ... PT_HIPROC headers 2014-11-24 07:45:02 +01:00
libfs.c move d_rcu from overlapping d_child to overlapping d_alias 2014-11-03 15:20:29 -05:00
locks.c locks: flock_make_lock should return a struct file_lock (or PTR_ERR) 2014-10-07 14:06:13 -04:00
Makefile ovl: rename filesystem type to "overlay" 2014-11-20 16:39:59 +01:00
mbcache.c
mount.h vfs: Add a function to lazily unmount all mounts from any dentry. 2014-10-09 02:38:55 -04:00
mpage.c vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
namei.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
namespace.c vfs: introduce clone_private_mount() 2014-10-24 00:14:36 +02:00
no-block.c
open.c new helper: audit_file() 2014-11-19 13:01:26 -05:00
pipe.c
pnode.c get rid of propagate_umount() mistakenly treating slaves as busy. 2014-08-30 18:31:41 -04:00
pnode.h
posix_acl.c
proc_namespace.c namespaces: Use task_lock and not rcu to protect nsproxy 2014-07-29 18:08:50 -07:00
read_write.c cachefiles_write_page(): switch to __kernel_write() 2014-10-09 02:39:05 -04:00
readdir.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
select.c
seq_file.c fs, seq_file: fallback to vmalloc instead of oom kill processes 2014-12-13 12:42:49 -08:00
signalfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
splice.c vfs: export do_splice_direct() to modules 2014-10-24 00:14:35 +02:00
stack.c fs: fix comment for 'CONFIG_LBADF' 2014-08-26 09:35:56 +02:00
stat.c
statfs.c
super.c vfs: Remove i_dquot field from inode 2014-11-10 10:06:18 +01:00
sync.c kill f_dentry uses 2014-11-19 13:01:25 -05:00
timerfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
utimes.c
xattr.c new helper: audit_file() 2014-11-19 13:01:26 -05:00