linux/fs
Jens Axboe fe0f07d08e direct-io: only inc/dec inode->i_dio_count for file systems
do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.

For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:

clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
 | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
 | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
 | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
 | 99.99th=[  165]

After:

clat percentiles (usec):
 |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
 | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
 | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
 | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
 | 99.99th=[  438]

In other setups, Robert Elliott reported seeing good performance
improvements:

https://lkml.org/lkml/2015/4/3/557

The more applications accessing the device, the worse it gets.

Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-24 15:45:28 -04:00
..
9p fs/9p: fix readdir() 2015-04-24 15:45:03 -04:00
adfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
affs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
afs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
autofs4 VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
befs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
bfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
btrfs direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
cachefiles VFS: fs/cachefiles: d_backing_inode() annotations 2015-04-15 15:06:59 -04:00
ceph VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
cifs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
coda VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
configfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
cramfs
debugfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
devpts VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dlm netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
ecryptfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
efivarfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
efs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
exofs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
exportfs VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
ext2 VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
ext3 VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
ext4 direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
f2fs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
fat VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
freevxfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
fscache
fuse VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
gfs2 VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
hfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
hfsplus VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
hostfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
hpfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
hppfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
hugetlbfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
isofs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
jbd
jbd2 jbd2: complain about descriptor block checksum errors 2015-01-19 15:59:58 -05:00
jffs2 VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
jfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
kernfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
lockd Merge branch 'for-3.20' of git://linux-nfs.org/~bfields/linux 2015-02-12 10:39:41 -08:00
logfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
minix VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
ncpfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
nfs direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
nfs_common
nfsd VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
nilfs2 VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
nls
notify fanotify: fix event filtering with FAN_ONDIR set 2015-03-12 18:46:08 -07:00
ntfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
ocfs2 VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
omfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
openpromfs
overlayfs ovl: upper fs should not be R/O 2015-03-18 10:29:48 +01:00
proc VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
pstore VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
qnx4
qnx6 VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
quota VFS: fs library helpers: d_inode() annotations 2015-04-15 15:06:58 -04:00
ramfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
reiserfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
romfs make new_sync_{read,write}() static 2015-04-11 22:29:40 -04:00
squashfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
sysfs driver core patches for 3.20-rc1 2015-02-15 11:11:47 -08:00
sysv VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
ubifs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
udf VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
ufs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
aio.c mirror O_APPEND and O_DIRECT into iocb->ki_flags 2015-04-11 22:30:22 -04:00
anon_inodes.c
attr.c
bad_inode.c don't bother with most of the bad_file_ops methods 2015-02-20 04:03:58 -05:00
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c x86, mm/ASLR: Fix stack randomization on 64-bit systems 2015-02-19 12:21:36 +01:00
binfmt_em86.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_flat.c
binfmt_misc.c VFS: assorted weird filesystems: d_inode() annotations 2015-04-15 15:06:58 -04:00
binfmt_script.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
block_dev.c direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
buffer.c
char_dev.c fs: introduce f_op->mmap_capabilities for nommu mmap support 2015-01-20 14:02:58 -07:00
compat_binfmt_elf.c
compat_ioctl.c Bluetooth: bnep: Add support for get bnep features via ioctl 2015-04-03 23:21:34 +02:00
compat.c
coredump.c coredump: accept any write method 2015-04-11 22:29:39 -04:00
dax.c direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
dcache.c VFS: Impose ordering on accesses of d_inode and d_flags 2015-04-15 15:05:28 -04:00
dcookies.c
direct-io.c direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
drop_caches.c vmscan: per memory cgroup slab shrinkers 2015-02-12 18:54:09 -08:00
eventfd.c eventfd: don't take the spinlock in eventfd_poll 2015-02-17 14:34:52 -08:00
eventpoll.c epoll: optimize setting task running after blocking 2015-02-13 21:21:40 -08:00
exec.c fs: create proper filename objects using getname_kernel() 2015-01-23 00:22:20 -05:00
fcntl.c vfs: renumber FMODE_NONOTIFY and add to uniqueness check 2015-01-08 15:10:52 -08:00
fhandle.c
file_table.c ->aio_read and ->aio_write removed 2015-04-11 22:29:43 -04:00
file.c fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0) 2014-12-10 17:41:10 -08:00
filesystems.c
fs_pin.c switch the IO-triggering parts of umount to fs_pin 2015-01-25 23:17:29 -05:00
fs_struct.c
fs-writeback.c fs: add dirtytime_expire_seconds sysctl 2015-03-17 12:23:32 -04:00
inode.c direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
internal.h trylock_super(): replacement for grab_super_passive() 2015-02-22 11:38:42 -05:00
ioctl.c fsioctl.c: make generic_block_fiemap() signal-tolerant 2015-02-10 14:30:30 -08:00
Kconfig dax: does not work correctly with virtual aliasing caches 2015-02-16 17:56:04 -08:00
Kconfig.binfmt fs/binfmt_som: Drop kernel support for HP-UX SOM binaries 2015-02-17 16:29:36 +01:00
libfs.c VFS: fs library helpers: d_inode() annotations 2015-04-15 15:06:58 -04:00
locks.c locks: fix file_lock deletion inside loop 2015-03-27 07:18:20 -04:00
Makefile Merge branch 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2015-02-17 14:25:58 -08:00
mbcache.c
mount.h switch the IO-triggering parts of umount to fs_pin 2015-01-25 23:17:29 -05:00
mpage.c
namei.c VFS: Make pathwalk use d_is_reg() rather than S_ISREG() 2015-04-15 15:05:30 -04:00
namespace.c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
no-block.c
nsfs.c VFS: assorted weird filesystems: d_inode() annotations 2015-04-15 15:06:58 -04:00
open.c ->aio_read and ->aio_write removed 2015-04-11 22:29:43 -04:00
pipe.c VFS: assorted weird filesystems: d_inode() annotations 2015-04-15 15:06:58 -04:00
pnode.c
pnode.h
posix_acl.c VFS: assorted d_backing_inode() annotations 2015-04-15 15:06:59 -04:00
proc_namespace.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
read_write.c new_sync_write(): discard ->ki_pos unless the return value is positive 2015-04-11 22:29:46 -04:00
readdir.c
select.c all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
seq_file.c bitmap, cpumask, nodemask: remove dedicated formatting functions 2015-02-13 21:21:39 -08:00
signalfd.c
splice.c vmsplice_to_user(): switch to import_iovec() 2015-04-11 22:27:11 -04:00
stack.c
stat.c VFS: assorted d_backing_inode() annotations 2015-04-15 15:06:59 -04:00
statfs.c
super.c trylock_super(): replacement for grab_super_passive() 2015-02-22 11:38:42 -05:00
sync.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
timerfd.c
utimes.c
xattr.c