linux/fs
Theodore Ts'o 00d4e7362e ext4: fix potential deadlock in ext4_nonda_switch()
In ext4_nonda_switch(), if the file system is getting full we used to
call writeback_inodes_sb_if_idle().  The problem is that we can be
holding i_mutex already, and this causes a potential deadlock when
writeback_inodes_sb_if_idle() when it tries to take s_umount.  (See
lockdep output below).

As it turns out we don't need need to hold s_umount; the fact that we
are in the middle of the write(2) system call will keep the superblock
pinned.  Unfortunately writeback_inodes_sb() checks to make sure
s_umount is taken, and the VFS uses a different mechanism for making
sure the file system doesn't get unmounted out from under us.  The
simplest way of dealing with this is to just simply grab s_umount
using a trylock, and skip kicking the writeback flusher thread in the
very unlikely case that we can't take a read lock on s_umount without
blocking.

Also, we now check the cirteria for kicking the writeback thread
before we decide to whether to fall back to non-delayed writeback, so
if there are any outstanding delayed allocation writes, we try to get
them resolved as soon as possible.

   [ INFO: possible circular locking dependency detected ]
   3.6.0-rc1-00042-gce894ca #367 Not tainted
   -------------------------------------------------------
   dd/8298 is trying to acquire lock:
    (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46

   but task is already holding lock:
    (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3

   which lock already depends on the new lock.

   2 locks held by dd/8298:
    #0:  (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3
    #1:  (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3

   stack backtrace:
   Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
   Call Trace:
    [<c015b79c>] ? console_unlock+0x345/0x372
    [<c06d62a1>] print_circular_bug+0x190/0x19d
    [<c019906c>] __lock_acquire+0x86d/0xb6c
    [<c01999db>] ? mark_held_locks+0x5c/0x7b
    [<c0199724>] lock_acquire+0x66/0xb9
    [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
    [<c06db935>] down_read+0x28/0x58
    [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
    [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
    [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4
    [<c0271ece>] ext4_da_write_begin+0x27/0x193
    [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb
    [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205
    [<c01ddce7>] generic_file_aio_write+0x78/0xd3
    [<c026d336>] ext4_file_write+0x480/0x4a6
    [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c
    [<c0180944>] ? sched_clock_cpu+0x11a/0x13e
    [<c01967e9>] ? trace_hardirqs_off+0xb/0xd
    [<c018099f>] ? local_clock+0x37/0x4e
    [<c0209f2c>] do_sync_write+0x67/0x9d
    [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44
    [<c020a7b9>] vfs_write+0x7b/0xe6
    [<c020a9a6>] sys_write+0x3b/0x64
    [<c06dd4bd>] syscall_call+0x7/0xb

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-09-19 22:42:36 -04:00
..
9p 9p: Push file_update_time() into v9fs_vm_page_mkwrite() 2012-07-31 01:02:46 +04:00
adfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
affs affs: use memweight() 2012-07-30 17:25:16 -07:00
afs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
autofs4 switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
befs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
bfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
cachefiles fs: cachefiles: add support for large files in filesystem caching 2012-07-30 17:25:21 -07:00
ceph ceph: simplify+fix atomic_open 2012-08-02 09:11:19 -07:00
cifs CIFS: Add SMB2 support for rmdir 2012-07-27 15:17:50 -05:00
coda don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
configfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
cramfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
debugfs Driver core merge for 3.6-rc1 2012-07-26 11:25:33 -07:00
devpts VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
dlm dlm: fix missing dir remove 2012-07-16 14:24:43 -05:00
ecryptfs - Fixes a bug when the lower filesystem mount options include 'acl', but the 2012-08-02 10:56:34 -07:00
efs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
exofs Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-07-23 12:27:27 -07:00
exportfs switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
ext3 ext3: use memweight() 2012-07-30 17:25:16 -07:00
ext4 ext4: fix potential deadlock in ext4_nonda_switch() 2012-09-19 22:42:36 -04:00
fat Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
freevxfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
fscache
fuse fuse: Convert to new freezing mechanism 2012-07-31 09:45:50 +04:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
hfs hfs: get rid of hfs_sync_super 2012-07-22 23:58:09 +04:00
hfsplus hfsplus: use -ENOMEM when kzalloc() fails 2012-07-30 17:25:19 -07:00
hostfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
hpfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
hppfs switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
hugetlbfs hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages 2012-07-31 18:42:40 -07:00
isofs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-07-24 17:40:44 -07:00
jbd jbd: Check return value of blkdev_issue_flush() 2012-07-09 23:38:36 +02:00
jbd2 jbd2: don't write superblock when if its empty 2012-08-18 22:29:40 -04:00
jffs2 don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
jfs don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
lockd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
logfs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
minix minixfs: fix block limit check 2012-07-30 17:25:19 -07:00
ncpfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
nfs Merge branch 'akpm' (Andrew's patch-bomb) 2012-07-31 19:25:39 -07:00
nfs_common
nfsd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
nilfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
nls nls: fix (and rename) mac NLS table files and config options 2012-06-01 19:51:22 -07:00
notify switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
ocfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
omfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
openpromfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
proc proc: do not allow negative offsets on /proc/<pid>/environ 2012-07-30 17:25:20 -07:00
pstore pstore/ram: Make tracing log versioned 2012-07-17 16:48:09 -07:00
qnx4 qnx4fs: use memweight() 2012-07-30 17:25:16 -07:00
qnx6 stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-07-24 17:40:44 -07:00
ramfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
reiserfs don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
romfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
squashfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
sysfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
sysv fs/sysv: stop using write_super and s_dirt 2012-07-22 23:58:12 +04:00
ubifs * Added another debugfs knob for forcing UBIFS R/O mode without flushing caches 2012-07-23 15:50:52 -07:00
udf Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-07-24 17:40:44 -07:00
ufs fs/ufs: get rid of write_super 2012-07-22 23:58:16 +04:00
xfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
aio.c aio: now fput() is OK from interrupt context; get rid of manual delayed __fput() 2012-07-22 23:57:59 +04:00
anon_inodes.c
attr.c notify_change(): check that i_mutex is held 2012-07-14 16:35:42 +04:00
bad_inode.c don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
binfmt_aout.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
binfmt_elf_fdpic.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
binfmt_elf.c binfmt_elf: switch elf_map() to vm_mmap/vm_munmap 2012-05-30 21:04:55 -04:00
binfmt_em86.c
binfmt_flat.c binfmt_flat: use vm_munmap, we are missing ->mmap_sem there 2012-05-30 21:04:56 -04:00
binfmt_misc.c vfs: Rename end_writeback() to clear_inode() 2012-05-06 13:43:41 +08:00
binfmt_script.c
binfmt_som.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
bio-integrity.c
bio.c Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block 2012-05-30 08:52:42 -07:00
block_dev.c vfs: Create function for iterating over block devices 2012-07-22 23:58:45 +04:00
buffer.c fs: Protect write paths by sb_start_write - sb_end_write 2012-07-31 09:45:47 +04:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-06-01 11:53:44 -07:00
dcache.c __d_unalias() should refuse to move mountpoints 2012-07-14 16:35:15 +04:00
dcookies.c
direct-io.c fs/direct-io.c: adjust suspicious bit operation 2012-07-14 16:32:46 +04:00
drop_caches.c
eventfd.c eventfd: change int to __u64 in eventfd_signal() 2012-05-31 17:49:32 -07:00
eventpoll.c PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND 2012-07-17 21:37:27 +02:00
exec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
fcntl.c c/r: fcntl: add F_GETOWNER_UIDS option 2012-07-30 17:25:21 -07:00
fhandle.c
fifo.c fifo: Do not restart open() if it already found a partner 2012-07-16 08:33:14 -07:00
file_table.c fs: Add freezing handling to mnt_want_write() / mnt_drop_write() 2012-07-31 09:40:38 +04:00
file.c
filesystems.c
fs_struct.c get rid of ->mnt_longterm 2012-07-14 16:32:47 +04:00
fs-writeback.c ext4: fix potential deadlock in ext4_nonda_switch() 2012-09-19 22:42:36 -04:00
generic_acl.c
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
internal.h fs: Add freezing handling to mnt_want_write() / mnt_drop_write() 2012-07-31 09:40:38 +04:00
ioctl.c
ioprio.c Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block 2012-05-30 08:52:42 -07:00
Kconfig
Kconfig.binfmt C6X: add support to build with BINFMT_ELF_FDPIC 2012-05-15 09:17:34 -04:00
libfs.c VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
locks.c locks: remove unused lm_release_private 2012-08-01 09:01:46 -07:00
Makefile
mbcache.c
mount.h get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
mpage.c
namei.c fs: Push mnt_want_write() outside of i_mutex 2012-07-31 01:02:49 +04:00
namespace.c fs: Add freezing handling to mnt_want_write() / mnt_drop_write() 2012-07-31 09:40:38 +04:00
no-block.c
open.c fs: Protect write paths by sb_start_write - sb_end_write 2012-07-31 09:45:47 +04:00
pipe.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
pnode.c VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors 2012-07-14 16:37:27 +04:00
pnode.h
posix_acl.c
proc_namespace.c get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
read_write.c vfs: allow custom EOF in generic_file_llseek code 2012-07-23 00:00:15 +04:00
read_write.h
readdir.c switch readdir/getdents to fget_light/fput_light 2012-05-29 23:28:29 -04:00
select.c posix_types.h: Cleanup stale __NFDBITS and related definitions 2012-07-26 13:36:43 -07:00
seq_file.c seq_file: Add seq_vprintf function and export it 2012-06-11 13:16:35 +01:00
signalfd.c switch signalfd4() to fget_light/fput_light 2012-05-29 23:28:30 -04:00
splice.c fs: Protect write paths by sb_start_write - sb_end_write 2012-07-31 09:45:47 +04:00
stack.c
stat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
statfs.c switch statfs to fget_light/fput_light 2012-05-29 23:28:31 -04:00
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
sync.c vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes 2012-07-22 23:59:01 +04:00
timerfd.c
utimes.c switch utimes() to fget_light/fput_light 2012-05-29 23:28:32 -04:00
xattr_acl.c
xattr.c fs/xattr.c:getxattr(): improve handling of allocation failures 2012-07-30 17:25:11 -07:00