linux/fs
Eric W. Biederman 6e4eab577a fs: Add user namespace member to struct super_block
Start marking filesystems with a user namespace owner, s_user_ns.  In
this change this is only used for permission checks of who may mount a
filesystem.  Ultimately s_user_ns will be used for translating ids and
checking capabilities for filesystems mounted from user namespaces.

The default policy for setting s_user_ns is implemented in sget(),
which arranges for s_user_ns to be set to current_user_ns() and to
ensure that the mounter of the filesystem has CAP_SYS_ADMIN in that
user_ns.

The guts of sget are split out into another function sget_userns().
The function sget_userns calls alloc_super with the specified user
namespace or it verifies the existing superblock that was found
has the expected user namespace, and fails with EBUSY when it is not.
This failing prevents users with the wrong privileges mounting a
filesystem.

The reason for the split of sget_userns from sget is that in some
cases such as mount_ns and kernfs_mount_ns a different policy for
permission checking of mounts and setting s_user_ns is necessary, and
the existence of sget_userns() allows those policies to be
implemented.

The helper mount_ns is expected to be used for filesystems such as
proc and mqueuefs which present per namespace information.  The
function mount_ns is modified to call sget_userns instead of sget to
ensure the user namespace owner of the namespace whose information is
presented by the filesystem is used on the superblock.

For sysfs and cgroup the appropriate permission checks are already in
place, and kernfs_mount_ns is modified to call sget_userns so that
the init_user_ns is the only user namespace used.

For the cgroup filesystem cgroup namespace mounts are bind mounts of a
subset of the full cgroup filesystem and as such s_user_ns must be the
same for all of them as there is only a single superblock.

Mounts of sysfs that vary based on the network namespace could in principle
change s_user_ns but it keeps the analysis and implementation of kernfs
simpler if that is not supported, and at present there appear to be no
benefits from supporting a different s_user_ns on any sysfs mount.

Getting the details of setting s_user_ns correct has been
a long process.  Thanks to Pavel Tikhorirorv who spotted a leak
in sget_userns.  Thanks to Seth Forshee who has kept the work alive.

Thanks-to: Seth Forshee <seth.forshee@canonical.com>
Thanks-to: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-06-23 15:41:55 -05:00
..
9p switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
adfs
affs affs: fix remount failure when there are no options changed 2016-05-28 16:50:24 -07:00
afs remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
autofs4 dcache_{readdir,dir_lseek}() users: switch to ->iterate_shared 2016-05-02 19:49:32 -04:00
befs fs/befs/io.c:befs_bread(): remove unneeded initialization to NULL 2016-05-23 17:04:14 -07:00
bfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
btrfs Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-06-04 11:56:28 -07:00
cachefiles FS-Cache: make check_consistency callback return int 2016-06-01 10:29:39 +02:00
ceph ceph: use i_version to check validity of fscache 2016-06-01 10:32:14 +02:00
cifs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
coda introduce a parallel variant of ->iterate() 2016-05-02 19:49:29 -04:00
configfs configfs_readdir(): make safe under shared lock 2016-05-09 11:41:13 -04:00
cramfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
crypto fscrypto/f2fs: allow fs-specific key prefix for fs encryption 2016-05-07 10:32:33 -07:00
debugfs Merge 4.6-rc4 into driver-core-next 2016-04-19 04:28:28 +09:00
devpts devpts: Make each mount of devpts an independent filesystem. 2016-06-05 10:36:01 -07:00
dlm mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
ecryptfs switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
efivarfs fs/efivarfs/inode.c: use generic UUID library 2016-05-20 17:58:30 -07:00
efs fs/efs/super.c: fix return value 2016-05-20 17:58:30 -07:00
exofs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2016-05-17 17:05:30 -07:00
exportfs introduce a parallel variant of ->iterate() 2016-05-02 19:49:29 -04:00
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
f2fs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
fat Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
freevxfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
fscache FS-Cache: wake write waiter after invalidating writes 2016-06-01 10:29:09 +02:00
fuse switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
hfs switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
hfsplus switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
hostfs hostfs: switch to ->iterate_shared() 2016-05-12 19:49:30 -04:00
hpfs hpfs: implement the show_options method 2016-05-28 16:50:24 -07:00
hugetlbfs mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
isofs Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
jbd2 Fix a number of bugs, most notably a potential stale data exposure 2016-05-24 12:55:26 -07:00
jffs2 switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
jfs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
kernfs fs: Add user namespace member to struct super_block 2016-06-23 15:41:55 -05:00
lockd
logfs logfs: no need to lock directory in lseek 2016-05-09 11:42:19 -04:00
minix simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
ncpfs mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
nfs nfs: fix anonymous member initializer build failure with older compilers 2016-05-27 17:20:27 -07:00
nfs_common
nfsd vfs: Pass data, ns, and ns->userns to mount_ns 2016-06-23 15:41:53 -05:00
nilfs2 nilfs2: fix block comments 2016-05-23 17:04:14 -07:00
nls
notify fsnotify: avoid spurious EMFILE errors from inotify_init() 2016-05-19 19:12:14 -07:00
ntfs fs: simplify the generic_write_sync prototype 2016-05-01 19:58:39 -04:00
ocfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
omfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
openpromfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
orangefs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
overlayfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
proc proc: Convert proc_mount to use mount_ns. 2016-06-23 15:41:54 -05:00
pstore mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
qnx4 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
qnx6 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
quota fs/quota: use nla_put_u64_64bit() 2016-04-26 12:00:48 -04:00
ramfs tmpfs/ramfs: fix VM_MAYSHARE mappings for NOMMU 2016-05-20 17:58:30 -07:00
reiserfs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
romfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
squashfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
sysfs mnt: Refactor fs_fully_visible into mount_too_revealing 2016-06-23 15:41:46 -05:00
sysv simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
tracefs
ubifs This pull request contains mostly cleanups and minor 2016-05-27 18:49:29 -07:00
udf Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
ufs simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
xfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
aio.c aio: make aio_setup_ring killable 2016-05-23 17:04:14 -07:00
anon_inodes.c
attr.c
bad_inode.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
binfmt_aout.c fs: fix binfmt_aout.c build error 2016-05-28 16:34:59 -07:00
binfmt_elf_fdpic.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:51:59 -07:00
binfmt_elf.c mm: remove more IS_ERR_VALUE abuses 2016-05-27 15:57:31 -07:00
binfmt_em86.c
binfmt_flat.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
binfmt_misc.c
binfmt_script.c
block_dev.c DAX error handling for 4.7 2016-05-26 19:34:26 -07:00
buffer.c mm, page_alloc: avoid looking up the first zone in a zonelist twice 2016-05-19 19:12:14 -07:00
char_dev.c chrdev: emit a warning when we go below dynamic major range 2016-03-29 10:11:44 -07:00
compat_binfmt_elf.c
compat_ioctl.c Merge 4.5-rc4 into char-misc-next 2016-02-14 14:25:59 -08:00
compat.c Fix a number of bugs, most notably a potential stale data exposure 2016-05-24 12:55:26 -07:00
coredump.c coredump: make coredump_wait wait for mmap_sem for write killable 2016-05-23 17:04:14 -07:00
dax.c Filesystem DAX locking for 4.7 2016-05-26 20:00:28 -07:00
dcache.c Merge branch 'hash' of git://ftp.sciencehorizons.net/linux 2016-05-28 16:15:25 -07:00
dcookies.c
direct-io.c direct-io: fix direct write stale data exposure from concurrent buffered read 2016-05-27 14:49:37 -07:00
drop_caches.c
eventfd.c eventfd: document lockless access in eventfd_poll 2016-03-22 15:36:02 -07:00
eventpoll.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
exec.c exec: make exec path waiting for mmap_sem killable 2016-05-23 17:04:14 -07:00
fcntl.c
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
file_table.c
file.c give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() 2016-05-02 19:49:28 -04:00
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c mm,writeback: don't use memory reserves for wb_start_writeback 2016-05-20 17:58:30 -07:00
inode.c parallel lookups: actual switch to rwsem 2016-05-02 19:49:28 -04:00
internal.h
ioctl.c
Kconfig dax: Make huge page handling depend of CONFIG_BROKEN 2016-05-19 15:13:17 -06:00
Kconfig.binfmt ELF/MIPS build fix 2016-05-23 17:04:14 -07:00
libfs.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
locks.c
Makefile Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux 2016-03-26 12:59:04 -07:00
mbcache.c mbcache: add reusable flag to cache entries 2016-02-22 22:44:04 -05:00
mount.h
mpage.c mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
namei.c devpts: Make each mount of devpts an independent filesystem. 2016-06-05 10:36:01 -07:00
namespace.c mnt: Refactor fs_fully_visible into mount_too_revealing 2016-06-23 15:41:46 -05:00
no-block.c
nsfs.c
open.c Merge branch 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 14:41:03 -07:00
pipe.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
pnode.c propogate_mnt: Handle the first propogated copy being a slave 2016-05-05 09:54:45 -05:00
pnode.h
posix_acl.c switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-03-16 13:09:08 -04:00
read_write.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:46:23 -07:00
readdir.c restore killability of old mutex_lock_killable(&inode->i_mutex) users 2016-05-26 00:13:25 -04:00
select.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
seq_file.c Make file credentials available to the seqfile interfaces 2016-04-14 12:56:09 -07:00
signalfd.c
splice.c Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
stack.c
stat.c
statfs.c
super.c fs: Add user namespace member to struct super_block 2016-06-23 15:41:55 -05:00
sync.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
timerfd.c
userfaultfd.c userfaultfd: don't pin the user memory in userfaultfd_file_create() 2016-05-20 17:58:30 -07:00
utimes.c
xattr.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00