linux/fs
Nick Piggin b3e19d924b fs: scale mntget/mntput
The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only taking the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:33 +11:00
..
9p fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
adfs fs: dcache reduce branches in lookup path 2011-01-07 17:50:28 +11:00
affs fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
afs fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
autofs4 fs: rcu-walk aware d_revalidate method 2011-01-07 17:50:29 +11:00
befs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
bfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
btrfs btrfs: provide simple rcu-walk ACL implementation 2011-01-07 17:50:30 +11:00
cachefiles llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
ceph fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
cifs fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
coda fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
configfs fs: dcache reduce branches in lookup path 2011-01-07 17:50:28 +11:00
cramfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
debugfs convert get_sb_single() users 2010-10-29 04:16:28 -04:00
devpts convert get_sb_single() users 2010-10-29 04:16:28 -04:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm 2010-10-22 17:33:16 -07:00
ecryptfs fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
efs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
exofs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
exportfs fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
ext2 ext2,3,4: provide simple rcu-walk ACL implementation 2011-01-07 17:50:30 +11:00
ext3 ext2,3,4: provide simple rcu-walk ACL implementation 2011-01-07 17:50:30 +11:00
ext4 ext2,3,4: provide simple rcu-walk ACL implementation 2011-01-07 17:50:30 +11:00
fat fs: rcu-walk aware d_revalidate method 2011-01-07 17:50:29 +11:00
freevxfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
fscache Add a dummy printk function for the maintenance of unused printks 2010-08-12 09:51:35 -07:00
fuse fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
gfs2 fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
hfs fs: rcu-walk aware d_revalidate method 2011-01-07 17:50:29 +11:00
hfsplus fs: dcache reduce branches in lookup path 2011-01-07 17:50:28 +11:00
hostfs fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
hpfs fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
hppfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
hugetlbfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
isofs fs: dcache reduce branches in lookup path 2011-01-07 17:50:28 +11:00
jbd Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-10-27 20:13:18 -07:00
jbd2 jbd2: fix /proc/fs/jbd2/<dev> when using an external journal 2010-11-17 21:46:26 -05:00
jffs2 fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
jfs fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
lockd BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
logfs fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
minix fs: dcache reduce branches in lookup path 2011-01-07 17:50:28 +11:00
ncpfs fs: rcu-walk aware d_revalidate method 2011-01-07 17:50:29 +11:00
nfs fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
nfs_common
nfsd fs: dcache scale dentry refcount 2011-01-07 17:50:21 +11:00
nilfs2 fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
nls
notify fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
ntfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
ocfs2 fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
omfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
openpromfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
partitions Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2010-10-25 07:45:10 -07:00
proc fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
qnx4 fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
quota quota: Fix possible oops in __dquot_initialize() 2010-10-28 01:30:06 +02:00
ramfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
reiserfs fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
romfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
squashfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
sysfs fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
sysv fs: dcache reduce branches in lookup path 2011-01-07 17:50:28 +11:00
ubifs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
udf fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
ufs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
xfs xfs: provide simple rcu-walk ACL implementation 2011-01-07 17:50:30 +11:00
aio.c new helper: ihold() 2010-10-25 21:26:11 -04:00
anon_inodes.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
attr.c check ATTR_SIZE contraints in inode_change_ok 2010-08-09 16:47:39 -04:00
bad_inode.c fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
binfmt_aout.c Don't dump task struct in a.out core-dumps 2010-10-14 10:57:40 -07:00
binfmt_elf_fdpic.c
binfmt_elf.c ARM: 6342/1: fix ASLR of PIE executables 2010-10-08 10:02:53 +01:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c convert get_sb_single() users 2010-10-29 04:16:28 -04:00
binfmt_script.c Make do_execve() take a const filename pointer 2010-08-17 18:07:43 -07:00
binfmt_som.c
bio-integrity.c fs/bio-integrity.c: return -ENOMEM on kmalloc failure 2010-08-23 13:36:59 +02:00
bio.c bio: take care not overflow page count when mapping/copying user data 2010-11-10 14:40:43 +01:00
block_dev.c fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
buffer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
char_dev.c Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl 2010-10-22 10:52:56 -07:00
compat_binfmt_elf.c
compat_ioctl.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
compat.c exec: copy-and-paste the fixes into compat_do_execve() paths 2010-11-30 17:56:38 -08:00
dcache.c fs: implement faster dentry memcmp 2011-01-07 17:50:32 +11:00
dcookies.c
direct-io.c fs/direct-io.c: fix truncation error in dio_complete() return 2010-10-26 16:52:13 -07:00
drop_caches.c simplify checks for I_CLEAR/I_FREEING 2010-08-09 16:47:44 -04:00
eventfd.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
eventpoll.c epoll: make epoll_wait() use the hrtimer range feature 2010-10-27 18:03:18 -07:00
exec.c install_special_mapping skips security_file_mmap check. 2010-12-15 12:30:36 -08:00
fcntl.c fasync: Fix placement of FASYNC flag comment 2010-10-27 18:17:02 -07:00
fifo.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
file_table.c fs: allow for more than 2^31 files 2010-10-26 16:52:15 -07:00
file.c vfs: use kmalloc() to allocate fdmem if possible 2010-08-11 08:59:02 -07:00
filesystems.c fs: rcu-walk for path lookup 2011-01-07 17:50:27 +11:00
fs_struct.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
fs-writeback.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2010-10-30 09:05:48 -07:00
generic_acl.c fs: provide simple rcu-walk generic_check_acl implementation 2011-01-07 17:50:29 +11:00
inode.c fs: avoid inode RCU freeing for pseudo fs 2011-01-07 17:50:26 +11:00
internal.h fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
ioctl.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2010-11-19 19:46:45 -08:00
ioprio.c ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() 2010-11-15 10:23:31 +01:00
Kconfig Merge 'staging-next' to Linus's tree 2010-10-28 09:44:56 -07:00
Kconfig.binfmt coredump: default CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y 2010-10-27 18:03:12 -07:00
libfs.c fs: dcache reduce branches in lookup path 2011-01-07 17:50:28 +11:00
locks.c fs: dcache scale dentry refcount 2011-01-07 17:50:21 +11:00
Makefile Merge 'staging-next' to Linus's tree 2010-10-28 09:44:56 -07:00
mbcache.c mbcache: Limit the maximum number of cache entries 2010-08-18 06:24:41 -04:00
mpage.c
namei.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
namespace.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
nfsctl.c
no-block.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
open.c fix open/umount race 2010-10-29 04:14:56 -04:00
pipe.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
pnode.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
pnode.h
posix_acl.c
read_write.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
read_write.h
readdir.c vfs: fix warning: 'dirent' is used uninitialized in this function 2010-08-09 20:45:05 -07:00
select.c epoll: make epoll_wait() use the hrtimer range feature 2010-10-27 18:03:18 -07:00
seq_file.c fs: take dcache_lock inside __d_path 2010-10-25 21:26:12 -04:00
signalfd.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2010-10-26 10:13:10 -07:00
splice.c Export 'get_pipe_info()' to other users 2010-11-28 14:09:57 -08:00
stack.c
stat.c Mark arguments to certain syscalls as being const 2010-08-13 16:53:13 -07:00
statfs.c add f_flags to struct statfs(64) 2010-08-09 16:48:44 -04:00
super.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
sync.c get rid of file_fsync() 2010-08-09 16:47:43 -04:00
timerfd.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
utimes.c Mark arguments to certain syscalls as being const 2010-08-13 16:53:13 -07:00
xattr_acl.c
xattr.c