linux/fs
Josef Bacik c495144bc6 btrfs: move the dio_sem higher up the callchain
We're getting a lockdep splat because we take the dio_sem under the
log_mutex.  What we really need is to protect fsync() from logging an
extent map for an extent we never waited on higher up, so just guard the
whole thing with dio_sem.

======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Not tainted
------------------------------------------------------
aio-dio-invalid/30928 is trying to acquire lock:
0000000092621cfd (&mm->mmap_sem){++++}, at: get_user_pages_unlocked+0x5a/0x1e0

but task is already holding lock:
00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (&ei->dio_sem){++++}:
       lock_acquire+0xbd/0x220
       down_write+0x51/0xb0
       btrfs_log_changed_extents+0x80/0xa40
       btrfs_log_inode+0xbaf/0x1000
       btrfs_log_inode_parent+0x26f/0xa80
       btrfs_log_dentry_safe+0x50/0x70
       btrfs_sync_file+0x357/0x540
       do_fsync+0x38/0x60
       __ia32_sys_fdatasync+0x12/0x20
       do_fast_syscall_32+0x9a/0x2f0
       entry_SYSENTER_compat+0x84/0x96

-> #4 (&ei->log_mutex){+.+.}:
       lock_acquire+0xbd/0x220
       __mutex_lock+0x86/0xa10
       btrfs_record_unlink_dir+0x2a/0xa0
       btrfs_unlink+0x5a/0xc0
       vfs_unlink+0xb1/0x1a0
       do_unlinkat+0x264/0x2b0
       do_fast_syscall_32+0x9a/0x2f0
       entry_SYSENTER_compat+0x84/0x96

-> #3 (sb_internal#2){.+.+}:
       lock_acquire+0xbd/0x220
       __sb_start_write+0x14d/0x230
       start_transaction+0x3e6/0x590
       btrfs_evict_inode+0x475/0x640
       evict+0xbf/0x1b0
       btrfs_run_delayed_iputs+0x6c/0x90
       cleaner_kthread+0x124/0x1a0
       kthread+0x106/0x140
       ret_from_fork+0x3a/0x50

-> #2 (&fs_info->cleaner_delayed_iput_mutex){+.+.}:
       lock_acquire+0xbd/0x220
       __mutex_lock+0x86/0xa10
       btrfs_alloc_data_chunk_ondemand+0x197/0x530
       btrfs_check_data_free_space+0x4c/0x90
       btrfs_delalloc_reserve_space+0x20/0x60
       btrfs_page_mkwrite+0x87/0x520
       do_page_mkwrite+0x31/0xa0
       __handle_mm_fault+0x799/0xb00
       handle_mm_fault+0x7c/0xe0
       __do_page_fault+0x1d3/0x4a0
       async_page_fault+0x1e/0x30

-> #1 (sb_pagefaults){.+.+}:
       lock_acquire+0xbd/0x220
       __sb_start_write+0x14d/0x230
       btrfs_page_mkwrite+0x6a/0x520
       do_page_mkwrite+0x31/0xa0
       __handle_mm_fault+0x799/0xb00
       handle_mm_fault+0x7c/0xe0
       __do_page_fault+0x1d3/0x4a0
       async_page_fault+0x1e/0x30

-> #0 (&mm->mmap_sem){++++}:
       __lock_acquire+0x42e/0x7a0
       lock_acquire+0xbd/0x220
       down_read+0x48/0xb0
       get_user_pages_unlocked+0x5a/0x1e0
       get_user_pages_fast+0xa4/0x150
       iov_iter_get_pages+0xc3/0x340
       do_direct_IO+0xf93/0x1d70
       __blockdev_direct_IO+0x32d/0x1c20
       btrfs_direct_IO+0x227/0x400
       generic_file_direct_write+0xcf/0x180
       btrfs_file_write_iter+0x308/0x58c
       aio_write+0xf8/0x1d0
       io_submit_one+0x3a9/0x620
       __ia32_compat_sys_io_submit+0xb2/0x270
       do_int80_syscall_32+0x5b/0x1a0
       entry_INT80_compat+0x88/0xa0

other info that might help us debug this:

Chain exists of:
  &mm->mmap_sem --> &ei->log_mutex --> &ei->dio_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->dio_sem);
                               lock(&ei->log_mutex);
                               lock(&ei->dio_sem);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***

1 lock held by aio-dio-invalid/30928:
 #0: 00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

stack backtrace:
CPU: 0 PID: 30928 Comm: aio-dio-invalid Not tainted 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 dump_stack+0x7c/0xbb
 print_circular_bug.isra.37+0x297/0x2a4
 check_prev_add.constprop.45+0x781/0x7a0
 ? __lock_acquire+0x42e/0x7a0
 validate_chain.isra.41+0x7f0/0xb00
 __lock_acquire+0x42e/0x7a0
 lock_acquire+0xbd/0x220
 ? get_user_pages_unlocked+0x5a/0x1e0
 down_read+0x48/0xb0
 ? get_user_pages_unlocked+0x5a/0x1e0
 get_user_pages_unlocked+0x5a/0x1e0
 get_user_pages_fast+0xa4/0x150
 iov_iter_get_pages+0xc3/0x340
 do_direct_IO+0xf93/0x1d70
 ? __alloc_workqueue_key+0x358/0x490
 ? __blockdev_direct_IO+0x14b/0x1c20
 __blockdev_direct_IO+0x32d/0x1c20
 ? btrfs_run_delalloc_work+0x40/0x40
 ? can_nocow_extent+0x490/0x490
 ? kvm_clock_read+0x1f/0x30
 ? can_nocow_extent+0x490/0x490
 ? btrfs_run_delalloc_work+0x40/0x40
 btrfs_direct_IO+0x227/0x400
 ? btrfs_run_delalloc_work+0x40/0x40
 generic_file_direct_write+0xcf/0x180
 btrfs_file_write_iter+0x308/0x58c
 aio_write+0xf8/0x1d0
 ? kvm_clock_read+0x1f/0x30
 ? __might_fault+0x3e/0x90
 io_submit_one+0x3a9/0x620
 ? io_submit_one+0xe5/0x620
 __ia32_compat_sys_io_submit+0xb2/0x270
 do_int80_syscall_32+0x5b/0x1a0
 entry_INT80_compat+0x88/0xa0

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-19 12:20:04 +02:00
..
9p
adfs adfs: use timespec64 for time conversion 2018-08-22 10:52:51 -07:00
affs
afs afs: Fix afs_server struct leak 2018-10-12 17:36:40 +02:00
autofs Merge branch 'akpm' (patches from Andrew) 2018-08-22 12:34:08 -07:00
befs
bfs
btrfs btrfs: move the dio_sem higher up the callchain 2018-10-19 12:20:04 +02:00
cachefiles
ceph ceph: avoid a use-after-free in ceph_destroy_options() 2018-09-06 16:18:04 +02:00
cifs smb3: fix lease break problem introduced by compounding 2018-10-02 18:54:09 -05:00
coda
configfs
cramfs
crypto
debugfs
devpts
dlm
ecryptfs
efivarfs
efs
exofs
exportfs
ext2 ext2, dax: set ext2_dax_aops for dax files 2018-09-19 15:03:04 +02:00
ext4 Various ext4 bug fixes; primarily making ext4 more robust against 2018-09-17 09:13:47 +02:00
f2fs f2fs-for-4.19-rc1 2018-08-22 13:29:39 -07:00
fat fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters() 2018-10-13 09:31:03 +02:00
freevxfs
fscache
fuse
gfs2 gfs2: Fix iomap buffered write support for journaled files (2) 2018-10-12 17:14:42 +02:00
hfs hfs: prevent crash on exit from failed search 2018-08-23 18:48:42 -07:00
hfsplus hfsplus: prevent crash on exit from failed search 2018-08-23 18:48:42 -07:00
hostfs
hpfs hpfs: remove unnecessary checks on the value of r when assigning error code 2018-08-25 12:42:33 -07:00
hugetlbfs mm: zero out the vma in vma_init() 2018-08-22 10:52:44 -07:00
isofs
jbd2
jffs2
jfs
kernfs
lockd
minix
nfs NFS client bugfixes for Linux 4.19 2018-09-14 19:25:28 -10:00
nfs_common
nfsd vfs: swap names of {do,vfs}_clone_file_range() 2018-09-24 10:54:01 +02:00
nilfs2 nilfs2: convert to SPDX license tags 2018-09-04 16:45:02 -07:00
nls
notify fsnotify: fix ignore mask logic in fsnotify() 2018-09-03 14:57:41 +02:00
ntfs
ocfs2 ocfs2: fix a GCC warning 2018-10-13 09:31:02 +02:00
omfs
openpromfs
orangefs
overlayfs ovl: fix format of setxattr debug 2018-10-04 14:49:10 +02:00
proc proc: restrict kernel stack dumps to root 2018-10-05 16:32:05 -07:00
pstore pstore/ram: Fix failure-path memory leak in ramoops_init 2018-09-30 10:15:41 -07:00
qnx4
qnx6
quota
ramfs
reiserfs reiserfs: fix broken xattr handling (heap corruption, bad retval) 2018-08-22 10:52:50 -07:00
romfs
squashfs
sysfs
sysv fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp 2018-08-22 10:52:51 -07:00
tracefs
ubifs ubifs: Fix WARN_ON logic in exit path 2018-10-13 11:05:02 +02:00
udf udf: Fix mounting of Win7 created UDF filesystems 2018-08-24 11:13:32 +02:00
ufs
xfs xfs: fix data corruption w/ unaligned reflink ranges 2018-10-06 11:44:39 +10:00
aio.c
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c
buffer.c notifier: Remove notifier header file wherever not used 2018-08-30 12:56:40 +02:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c
coredump.c
d_path.c
dax.c filesystem-dax: Fix dax_layout_busy_page() livelock 2018-10-08 11:38:44 -07:00
dcache.c
dcookies.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c fs/eventpoll.c: simplify ep_is_linked() callers 2018-08-22 10:52:49 -07:00
exec.c
fcntl.c
fhandle.c
file_table.c
file.c
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c
inode.c
internal.h
ioctl.c vfs: swap names of {do,vfs}_clone_file_range() 2018-09-24 10:54:01 +02:00
iomap.c iomap: set page dirty after partial delalloc on mkwrite 2018-09-29 13:51:01 +10:00
Kconfig
Kconfig.binfmt
libfs.c
locks.c
Makefile
mbcache.c
mount.h
mpage.c
namei.c namei: allow restricted O_CREAT of FIFOs and regular files 2018-08-23 18:48:43 -07:00
namespace.c Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
no-block.c
nsfs.c
open.c
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c vfs: swap names of {do,vfs}_clone_file_range() 2018-09-24 10:54:01 +02:00
readdir.c
select.c
seq_file.c
signalfd.c
splice.c
stack.c
stat.c
statfs.c
super.c Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
sync.c
timerfd.c
userfaultfd.c mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
utimes.c
xattr.c sysfs: Do not return POSIX ACL xattrs via listxattr 2018-09-18 07:30:48 -04:00