linux/fs
Jens Axboe 03ba3782e8 writeback: switch to per-bdi threads for flushing data
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1      0 608848   2652 375372    0    0     0 71024  604    24  1 10 48 42
 0  1      0 549644   2712 433736    0    0     0 60692  505    27  1  8 48 44
 1  0      0 476928   2784 505192    0    0     4 29540  553    24  0  9 53 37
 0  1      0 457972   2808 524008    0    0     0 54876  331    16  0  4 38 58
 0  1      0 366128   2928 614284    0    0     4 92168  710    58  0 13 53 34
 0  1      0 295092   3000 684140    0    0     0 62924  572    23  0  9 53 37
 0  1      0 236592   3064 741704    0    0     4 58256  523    17  0  8 48 44
 0  1      0 165608   3132 811464    0    0     0 57460  560    21  0  8 54 38
 0  1      0 102952   3200 873164    0    0     4 74748  540    29  1 10 48 41
 0  1      0  48604   3252 926472    0    0     0 53248  469    29  0  7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  1      0 678716   5792 303380    0    0     0 74064  565    50  1 11 52 36
 1  0      0 662488   5864 319396    0    0     4   352  302   329  0  2 47 51
 0  1      0 599312   5924 381468    0    0     0 78164  516    55  0  9 51 40
 0  1      0 519952   6008 459516    0    0     4 78156  622    56  1 11 52 37
 1  1      0 436640   6092 541632    0    0     0 82244  622    54  0 11 48 41
 0  1      0 436640   6092 541660    0    0     0     8  152    39  0  0 51 49
 0  1      0 332224   6200 644252    0    0     4 102800  728    46  1 13 49 36
 1  0      0 274492   6260 701056    0    0     4 12328  459    49  0  7 50 43
 0  1      0 211220   6324 763356    0    0     0 106940  515    37  1 10 51 39
 1  0      0 160412   6376 813468    0    0     0  8224  415    43  0  6 49 45
 1  1      0  85980   6452 886556    0    0     4 113516  575    39  1 11 54 34
 0  2      0  85968   6452 886620    0    0     0  1640  158   211  0  0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:25 +02:00
..
9p 9p: remove unnecessary v9fses->options which duplicates the mount string 2009-08-17 16:42:28 -05:00
adfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
affs affs: add ->sync_fs 2009-06-11 21:36:14 -04:00
afs AFS: Stop readlink() on AFS crashing due to NULL 'file' ptr 2009-08-27 12:22:08 -07:00
autofs switch follow_down() 2009-06-11 21:36:01 -04:00
autofs4 autofs4 - fix missed case when changing to use struct path 2009-08-31 17:44:05 -10:00
befs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-06-17 08:46:57 -07:00
bfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
btrfs btrfs: fix inode rbtree corruption 2009-08-21 10:09:44 +02:00
cachefiles enforce ->sync_fs is only called for rw superblock 2009-06-11 21:36:06 -04:00
cifs [CIFS] Update readme to reflect forceuid mount parms 2009-08-04 03:53:28 +00:00
coda splice: implement default splice_read method 2009-05-11 14:13:10 +02:00
configfs configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-04-30 10:48:26 -07:00
cramfs fs/cramfs: return f_fsid for statfs(2) 2009-04-02 19:05:08 -07:00
debugfs debugfs: use specified mode to possibly mark files read/write only 2009-06-15 21:30:28 -07:00
devpts devpts: remove module-related code 2009-06-24 08:15:24 -04:00
dlm dlm: free socket in error exit path 2009-07-14 12:28:43 -05:00
ecryptfs eCryptfs: parse_tag_3_packet check tag 3 packet encrypted key size 2009-07-28 14:26:06 -07:00
efs get rid of BKL in fs/efs 2009-06-17 00:36:36 -04:00
exofs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
exportfs
ext2 ext[234]: move over to 'check_acl' permission model 2009-09-08 11:09:04 -07:00
ext3 ext[234]: move over to 'check_acl' permission model 2009-09-08 11:09:04 -07:00
ext4 ext[234]: move over to 'check_acl' permission model 2009-09-08 11:09:04 -07:00
fat headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
freevxfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
fscache FS-Cache: Fixup renamed filenames in comments in internal.h 2009-05-27 10:20:13 -07:00
fuse Revert "fuse: Fix build error" as unnecessary 2009-07-11 11:22:34 -07:00
gfs2 GFS2: Fix permissions on "recover" file 2009-08-14 14:04:46 +01:00
hfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
hfsplus headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
hostfs hostfs: set maximum filesize in superblock for proper LFS support 2009-06-30 18:56:03 -07:00
hpfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
hppfs hppfs: hppfs_read_file() may return -ERROR 2009-04-02 19:04:53 -07:00
hugetlbfs mm: fix hugetlb bug due to user_shm_unlock call 2009-08-24 12:53:01 -07:00
isofs isofs: fix Joliet regression 2009-07-10 19:18:59 -07:00
jbd jbd: fix race between write_metadata_buffer and get_write_access 2009-07-21 11:54:42 +02:00
jbd2 jbd2: fix race between write_metadata_buffer and get_write_access 2009-07-13 17:55:35 -04:00
jffs2 jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' 2009-09-08 11:09:04 -07:00
jfs jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' 2009-09-08 11:09:04 -07:00
lockd headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
minix Making fs/minix/minix.h double including safe 2009-06-22 11:34:42 -07:00
ncpfs NLS: update handling of Unicode 2009-06-15 21:44:43 -07:00
nfs NFSv4: Fix an infinite looping problem with the nfs4_state_manager 2009-08-24 16:28:42 -07:00
nfs_common
nfsd headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
nilfs2 nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key 2009-08-31 12:03:06 +09:00
nls NLS: update handling of Unicode 2009-06-15 21:44:43 -07:00
notify inotify: update the group mask on mark addition 2009-08-28 12:51:14 -04:00
ntfs ntfs: use is_power_of_2() function for clarity. 2009-06-16 19:47:48 -07:00
ocfs2 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 2009-09-05 13:38:37 -07:00
omfs switch omfs to simple_fsync() 2009-06-11 21:36:13 -04:00
openpromfs
partitions partitions: fix broken uevent_suppress conversion 2009-07-12 13:02:09 -07:00
proc mm: revert "oom: move oom_adj value" 2009-08-18 16:31:13 -07:00
qnx4 fs/qnx4: sanitize includes 2009-06-11 21:36:12 -04:00
quota quota: Silence lockdep on quota_on 2009-07-30 17:31:23 +02:00
ramfs fs/ramfs/file-nommu.c needs include/linux/sched.h 2009-07-29 19:10:36 -07:00
reiserfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
romfs ROMFS: romfs_dev_read() error ignored 2009-05-09 10:49:41 -04:00
smbfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
squashfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
sysfs sysfs: fix hardlink count on device_move 2009-07-28 13:45:21 -07:00
sysv get rid of BKL in fs/sysv 2009-06-17 00:36:37 -04:00
ubifs writeback: get rid of generic_sync_sb_inodes() export 2009-09-11 09:20:25 +02:00
udf udf: Fix loading of VAT inode when drive wrongly reports number of recorded blocks 2009-07-30 17:28:26 +02:00
ufs ufs: sector_t cannot be negative 2009-06-18 13:03:46 -07:00
xfs jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' 2009-09-08 11:09:04 -07:00
aio.c eventfd: revised interface and cleanups 2009-06-30 18:55:58 -07:00
anon_inodes.c fs: Provide empty .set_page_dirty() aop for anon inodes 2009-06-18 14:46:10 +02:00
attr.c vfs: Use lowercase names of quota functions 2009-03-26 02:18:35 +01:00
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c elf_core_dump: use rcu_read_lock() to access ->real_parent 2009-06-18 13:03:52 -07:00
binfmt_elf.c binfmt_elf: fix PT_INTERP bss handling 2009-09-09 20:03:47 -07:00
binfmt_em86.c
binfmt_flat.c flat: fix uninitialized ptr with shared libs 2009-08-07 10:39:57 -07:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c Don't crap into descriptor table in binfmt_som 2009-03-31 23:00:28 -04:00
bio-integrity.c block: Create bip slabs with embedded integrity vectors 2009-07-01 10:56:25 +02:00
bio.c block: fix sg SG_DXFER_TO_FROM_DEV regression 2009-07-10 20:31:53 +02:00
block_dev.c PM / Hibernate: Replace bdget call with simple atomic_inc of i_count 2009-07-29 21:07:55 +02:00
buffer.c writeback: switch to per-bdi threads for flushing data 2009-09-11 09:20:25 +02:00
char_dev.c headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
compat_binfmt_elf.c
compat_ioctl.c compat_ioctl: hook up compat handler for FIEMAP ioctl 2009-08-07 10:39:56 -07:00
compat.c exec: do not sleep in TASK_TRACED under ->cred_guard_mutex 2009-09-05 11:30:42 -07:00
dcache.c dcache: extrace and use d_unlinked() 2009-06-11 21:36:06 -04:00
dcookies.c
direct-io.c block: Do away with the notion of hardsect_size 2009-05-22 23:22:54 +02:00
drop_caches.c mm: remove __invalidate_mapping_pages variant 2009-06-16 19:47:43 -07:00
eventfd.c eventfd: revised interface and cleanups 2009-06-30 18:55:58 -07:00
eventpoll.c epoll: fix nested calls support 2009-06-18 13:03:41 -07:00
exec.c exec: do not sleep in TASK_TRACED under ->cred_guard_mutex 2009-09-05 11:30:42 -07:00
fcntl.c headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
fifo.c
file_table.c fs: move mark_files_ro into file_table.c 2009-06-11 21:36:02 -04:00
file.c
filesystems.c fs: Mark get_filesystem_list() as __init function. 2009-04-20 23:02:52 -04:00
fs_struct.c Get rid of indirect include of fs_struct.h 2009-03-31 23:00:27 -04:00
fs-writeback.c writeback: switch to per-bdi threads for flushing data 2009-09-11 09:20:25 +02:00
generic_acl.c New helper - current_umask() 2009-03-31 23:00:26 -04:00
inode.c vfs: add __destroy_inode 2009-08-07 14:38:29 -03:00
internal.h Trim a bit of crap from fs.h 2009-06-11 21:36:07 -04:00
ioctl.c fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls 2009-06-24 08:15:27 -04:00
ioprio.c
Kconfig fs/Kconfig: move nilfs2 out 2009-07-14 12:34:17 +09:00
Kconfig.binfmt
libfs.c vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signed 2009-08-18 16:31:12 -07:00
locks.c lockd: call locks_release_private to cleanup per-filesystem state 2009-04-24 16:36:03 -04:00
Makefile nilfs2: update makefile and Kconfig 2009-04-07 08:31:16 -07:00
mbcache.c
mpage.c ext4: Properly initialize the buffer_head state 2009-05-13 15:13:42 -04:00
namei.c Make 'check_acl()' a first-class filesystem op 2009-09-08 11:07:44 -07:00
namespace.c vfs: mnt_want_write_file(): fix special file handling 2009-08-07 10:39:56 -07:00
nfsctl.c
no-block.c
open.c fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls 2009-06-24 08:15:27 -04:00
pipe.c lockdep: Fix lockdep annotation for pipe_double_lock() 2009-07-22 21:14:14 +02:00
pnode.c
pnode.h
posix_acl.c
read_write.c splice: implement default splice_read method 2009-05-11 14:13:10 +02:00
read_write.h
readdir.c
select.c poll/select: initialize triggered field of struct poll_wqueues 2009-08-15 18:40:11 -07:00
seq_file.c seq_file: add function to write binary data 2009-06-18 13:03:57 -07:00
signalfd.c
splice.c splice: fix kmaps in default_file_splice_write() 2009-05-19 11:37:46 +02:00
stack.c
stat.c kill vfs_stat_fd / vfs_lstat_fd 2009-04-20 23:02:52 -04:00
super.c writeback: switch to per-bdi threads for flushing data 2009-09-11 09:20:25 +02:00
sync.c writeback: switch to per-bdi threads for flushing data 2009-09-11 09:20:25 +02:00
timerfd.c timerfd: add flags check 2009-02-18 15:37:53 -08:00
utimes.c
xattr_acl.c
xattr.c fs: introduce mnt_clone_write 2009-06-11 21:36:02 -04:00