linux/fs
Filipe Manana f7fa1107f3 Btrfs: fix race between cloning range ending at eof and writeback
The recent rework that makes btrfs' remap_file_range operation use the
generic helper generic_remap_file_range_prep() introduced a race between
writeback and cloning a range that covers the eof extent of the source
file into a destination offset that is greater then the same file's size.

This happens because we now wait for writeback to complete before doing
the truncation of the eof block, while previously we did the truncation
and then waited for writeback to complete. This leads to a race between
writeback of the truncated block and cloning the file extents in the
source range, because we copy each file extent item we find in the fs
root into a buffer, then release the path and then increment the reference
count for the extent referred in that file extent item we copied, which
can no longer exist if writeback of the truncated eof block completes
after we copied the file extent item into the buffer and before we
incremented the reference count. This is illustrated by the following
diagram:

        CPU 1                                       CPU 2

  btrfs_clone_files()
    btrfs_cont_expand()
      btrfs_truncate_block()
         --> zeroes part of the
             page containg eof,
             marking it for
            delalloc

    btrfs_clone()
      --> finds extent item
          covering eof,
          points to extent
          at bytenr X
      --> copies it into a
          local buffer
      --> releases path

                                        writeback starts

                                        btrfs_finish_ordered_io()
                                          insert_reserved_file_extent()
                                            __btrfs_drop_extents()
                                              --> creates delayed
                                                  reference to drop
                                                  the extent at
                                                  bytenr X

      --> starts transaction
      --> creates delayed
          reference to
          increment extent
          at bytenr X

                    <delayed references are run, due to a transaction
                     commit for example, and the transaction is aborted
                     with -EIO because we attempt to increment reference
                     count for the extent at bytenr X after we freed it>

When this race is hit the running transaction ends up getting aborted with
an -EIO error and a trace like the following is produced:

[ 4382.553858] WARNING: CPU: 2 PID: 3648 at fs/btrfs/extent-tree.c:1552 lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
(...)
[ 4382.556293] CPU: 2 PID: 3648 Comm: btrfs Tainted: G        W         4.20.0-rc6-btrfs-next-41 #1
[ 4382.556294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[ 4382.556308] RIP: 0010:lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
(...)
[ 4382.556310] RSP: 0018:ffffac784408f738 EFLAGS: 00010202
[ 4382.556311] RAX: 0000000000000001 RBX: ffff8980673c3a48 RCX: 0000000000000001
[ 4382.556312] RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
[ 4382.556312] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 4382.556313] R10: 0000000000000001 R11: ffff897f40000000 R12: 0000000000001000
[ 4382.556313] R13: 00000000c224f000 R14: ffff89805de9bd40 R15: ffff8980453f4548
[ 4382.556315] FS:  00007f5e759178c0(0000) GS:ffff89807b300000(0000) knlGS:0000000000000000
[ 4382.563130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4382.563562] CR2: 00007f2e9789fcbc CR3: 0000000120512001 CR4: 00000000003606e0
[ 4382.564005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4382.564451] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4382.564887] Call Trace:
[ 4382.565343]  insert_inline_extent_backref+0x55/0xe0 [btrfs]
[ 4382.565796]  __btrfs_inc_extent_ref.isra.60+0x88/0x260 [btrfs]
[ 4382.566249]  ? __btrfs_run_delayed_refs+0x93/0x1650 [btrfs]
[ 4382.566702]  __btrfs_run_delayed_refs+0xa22/0x1650 [btrfs]
[ 4382.567162]  btrfs_run_delayed_refs+0x7e/0x1d0 [btrfs]
[ 4382.567623]  btrfs_commit_transaction+0x50/0x9c0 [btrfs]
[ 4382.568112]  ? _raw_spin_unlock+0x24/0x30
[ 4382.568557]  ? block_rsv_release_bytes+0x14e/0x410 [btrfs]
[ 4382.569006]  create_subvol+0x3c8/0x830 [btrfs]
[ 4382.569461]  ? btrfs_mksubvol+0x317/0x600 [btrfs]
[ 4382.569906]  btrfs_mksubvol+0x317/0x600 [btrfs]
[ 4382.570383]  ? rcu_sync_lockdep_assert+0xe/0x60
[ 4382.570822]  ? __sb_start_write+0xd4/0x1c0
[ 4382.571262]  ? mnt_want_write_file+0x24/0x50
[ 4382.571712]  btrfs_ioctl_snap_create_transid+0x117/0x1a0 [btrfs]
[ 4382.572155]  ? _copy_from_user+0x66/0x90
[ 4382.572602]  btrfs_ioctl_snap_create+0x66/0x80 [btrfs]
[ 4382.573052]  btrfs_ioctl+0x7c1/0x30e0 [btrfs]
[ 4382.573502]  ? mem_cgroup_commit_charge+0x8b/0x570
[ 4382.573946]  ? do_raw_spin_unlock+0x49/0xc0
[ 4382.574379]  ? _raw_spin_unlock+0x24/0x30
[ 4382.574803]  ? __handle_mm_fault+0xf29/0x12d0
[ 4382.575215]  ? do_vfs_ioctl+0xa2/0x6f0
[ 4382.575622]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[ 4382.576020]  do_vfs_ioctl+0xa2/0x6f0
[ 4382.576405]  ksys_ioctl+0x70/0x80
[ 4382.576776]  __x64_sys_ioctl+0x16/0x20
[ 4382.577137]  do_syscall_64+0x60/0x1b0
[ 4382.577488]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[ 4382.578837] RSP: 002b:00007ffe04bf64c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 4382.579174] RAX: ffffffffffffffda RBX: 00005564136f3050 RCX: 00007f5e74724dd7
[ 4382.579505] RDX: 00007ffe04bf64d0 RSI: 000000005000940e RDI: 0000000000000003
[ 4382.579848] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000044
[ 4382.580164] R10: 0000000000000541 R11: 0000000000000202 R12: 00005564136f3010
[ 4382.580477] R13: 0000000000000003 R14: 00005564136f3035 R15: 00005564136f3050
[ 4382.580792] irq event stamp: 0
[ 4382.581106] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
[ 4382.581441] hardirqs last disabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
[ 4382.581772] softirqs last  enabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
[ 4382.582095] softirqs last disabled at (0): [<0000000000000000>]           (null)
[ 4382.582413] ---[ end trace d3c188e3e9367382 ]---
[ 4382.623855] BTRFS: error (device sdc) in btrfs_run_delayed_refs:2981: errno=-5 IO failure
[ 4382.624295] BTRFS info (device sdc): forced readonly

Fix this by waiting for writeback to complete after truncating the eof
block.

Fixes: 34a28e3d77 ("Btrfs: use generic_remap_file_range_prep() for cloning and deduplication")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-09 14:52:22 +01:00
..
9p Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
adfs adfs: use timespec64 for time conversion 2018-08-22 10:52:51 -07:00
affs affs: fix potential memory leak when parsing option 'prefix' 2018-05-28 12:36:41 +02:00
afs Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-30 10:47:50 -08:00
autofs Merge branch 'akpm' (patches from Andrew) 2018-08-22 12:34:08 -07:00
befs fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
bfs bfs: add sanity check at bfs_fill_super() 2018-11-03 10:09:38 -07:00
btrfs Btrfs: fix race between cloning range ending at eof and writeback 2019-01-09 14:52:22 +01:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-11-30 16:00:58 +00:00
ceph ceph: make 'nocopyfrom' a default mount option 2018-12-11 18:22:17 +01:00
cifs CIFS: Avoid returning EBUSY to upper layer VFS 2018-12-07 00:59:23 -06:00
coda vfs: change inode times to use struct timespec64 2018-06-05 16:57:31 -07:00
configfs configfs: fix registered group removal 2018-07-17 06:14:07 -07:00
cramfs Make the Cramfs code more robust against filesystem corruptions, 2018-10-30 12:46:25 -07:00
crypto crypto: speck - remove Speck 2018-09-04 11:35:03 +08:00
debugfs Revert "debugfs: inode: debugfs_create_dir uses mode permission from parent" 2018-06-12 20:52:16 -07:00
devpts devpts: Convert to new IDA API 2018-08-21 23:54:17 -04:00
dlm iov_iter: Separate type from direction and use accessor functions 2018-10-24 00:41:07 +01:00
ecryptfs ecryptfs_rename(): verify that lower dentries are still OK after lock_rename() 2018-10-09 23:33:17 -04:00
efivarfs efivars: Call guid_parse() against guid_t type of variable 2018-07-22 14:13:44 +02:00
efs
exofs fs/exofs: only use true/false for asignment of bool type variable 2018-10-18 02:04:59 -04:00
exportfs exportfs: do not read dentry after free 2018-11-23 09:08:17 -05:00
ext2 ext2: fix potential use after free 2018-11-27 10:21:15 +01:00
ext4 A large number of ext4 bug fixes, mostly buffer and memory leaks on 2018-11-11 16:53:02 -06:00
f2fs Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
fat fat: truncate inode timestamp updates in setattr 2018-10-31 08:54:14 -07:00
freevxfs freevxfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
fscache fscache: fix race between enablement and dropping of object 2018-11-30 15:57:31 +00:00
fuse fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS 2018-12-11 21:47:28 +01:00
gfs2 Merge tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 2018-11-16 11:38:14 -06:00
hfs hfs: do not free node before using 2018-11-30 14:56:14 -08:00
hfsplus hfsplus: do not free node before using 2018-11-30 14:56:14 -08:00
hostfs vfs: discard ATTR_ATTR_FLAG 2018-08-17 16:20:28 -07:00
hpfs hpfs: remove unnecessary checks on the value of r when assigning error code 2018-08-25 12:42:33 -07:00
hugetlbfs mm: zero out the vma in vma_init() 2018-08-22 10:52:44 -07:00
isofs Update email address 2018-09-29 22:47:48 -04:00
jbd2 jbd2: fix use after free in jbd2_log_do_checkpoint() 2018-10-05 18:44:40 -04:00
jffs2 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-10-24 11:22:39 +01:00
jfs jfs: remove redundant dquot_initialize() in jfs_evict_inode() 2018-09-20 09:28:49 -05:00
kernfs mm: zero-seek shrinkers 2018-10-26 16:26:33 -07:00
lockd lockd: fix access beyond unterminated strings in prints 2018-10-29 16:58:04 -04:00
minix minix_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
nfs nfs: don't dirty kernel pages read by direct-io 2018-12-02 09:43:56 -05:00
nfs_common net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
nfsd nfsd: COPY and CLONE operations require the saved filehandle to be set 2018-11-08 12:11:45 -05:00
nilfs2 nilfs2: Use xa_erase_irq 2018-11-05 14:57:05 -05:00
nls
notify fanotify: fix handling of events on child sub-directory 2018-11-08 15:43:48 +01:00
ntfs ntfs: don't open-code ERR_CAST 2018-10-12 22:46:50 -04:00
ocfs2 ocfs2: fix potential use after free 2018-11-30 14:56:15 -08:00
omfs omfs_lookup(): report IO errors, use d_splice_alias() 2018-05-22 14:27:58 -04:00
openpromfs openpromfs: switch to d_splice_alias() 2018-05-22 14:27:57 -04:00
orangefs Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
overlayfs Revert "ovl: relax permission checking on underlying layers" 2018-12-04 11:31:30 +01:00
proc New gcc plugin: stackleak 2018-11-01 11:46:27 -07:00
pstore pstore fix: 2018-11-30 09:03:15 -08:00
qnx4 qnx4_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
qnx6 qnx6_lookup: switch to d_splice_alias() 2018-05-22 14:27:54 -04:00
quota fs/quota: Fix spectre gadget in do_quotactl 2018-08-22 18:17:48 +02:00
ramfs
reiserfs reiserfs: remove workaround code for GCC 3.x 2018-10-31 08:54:14 -07:00
romfs romfs_lookup: switch to d_splice_alias() 2018-05-22 14:27:55 -04:00
squashfs Squashfs: Compute expected length from inode size rather than block length 2018-08-02 09:34:02 -07:00
sysfs Driver core patches for 4.19-rc1 2018-08-18 11:44:53 -07:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-11-10 08:02:40 -05:00
tracefs tracefs: Annotate tracefs_ops with __ro_after_init 2018-07-31 11:32:44 -04:00
ubifs ubifs: Remove unneeded semicolon 2018-10-23 13:49:02 +02:00
udf udf: Allow mounting volumes with incorrect identification strings 2018-11-19 10:27:59 +01:00
ufs fs/ufs: use ktime_get_real_seconds for sb and cg timestamps 2018-08-17 16:20:27 -07:00
xfs xfs: fix inverted return from xfs_btree_sblock_verify_crc 2018-12-04 08:50:49 -08:00
aio.c for-linus-20181214 2018-12-14 12:18:30 -08:00
anon_inodes.c anon_inode_getfile(): switch to alloc_file_pseudo() 2018-07-12 10:04:27 -04:00
attr.c fs: Fix attr.c kernel-doc 2018-07-03 16:44:45 -04:00
bad_inode.c get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
binfmt_aout.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_elf_fdpic.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
binfmt_elf.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
binfmt_em86.c
binfmt_flat.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_misc.c turn filp_clone_open() into inline wrapper for dentry_open() 2018-07-10 23:29:03 -04:00
binfmt_script.c
block_dev.c iov_iter: Use accessor function 2018-10-24 00:40:44 +01:00
buffer.c for-linus-20181102 2018-11-02 11:25:48 -07:00
char_dev.c block, char_dev: Use correct format specifier for unsigned ints 2018-03-15 17:59:24 +01:00
compat_binfmt_elf.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
compat_ioctl.c media updates for v4.20-rc1 2018-10-29 14:29:58 -07:00
compat.c ncpfs: remove compat functionality 2018-06-05 19:23:26 +02:00
coredump.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
d_path.c split d_path() and friends into a separate file 2018-03-29 15:07:46 -04:00
dax.c dax: Fix unlock mismatch with updated API 2018-12-04 21:32:00 -08:00
dcache.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
dcookies.c fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall 2018-04-02 20:15:39 +02:00
direct-io.c fs: fix lost error code in dio_complete 2018-11-30 08:35:14 -07:00
drop_caches.c
eventfd.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
eventpoll.c fs/eventpoll.c: simplify ep_is_linked() callers 2018-08-22 10:52:49 -07:00
exec.c Revert "exec: make de_thread() freezable" 2018-12-04 16:04:20 +01:00
fcntl.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
fhandle.c
file_table.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
file.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
filesystems.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
fs_pin.c
fs_struct.c
fs-writeback.c fs: Convert writeback to XArray 2018-10-21 10:46:42 -04:00
inode.c mm: don't reclaim inodes with many attached pages 2018-11-18 10:15:09 -08:00
internal.h overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
ioctl.c vfs: rework data cloning infrastructure 2018-11-02 09:33:08 -07:00
iomap.c fs/iomap.c: get/put the page in iomap_page_create/release() 2018-12-14 15:05:45 -08:00
Kconfig autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
Kconfig.binfmt kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
libfs.c fs, dax: prepare for dax-specific address_space_operations 2018-03-30 11:34:55 -07:00
locks.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
Makefile autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
mbcache.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
mount.h
mpage.c mpage: mpage_readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
namei.c namei: allow restricted O_CREAT of FIFOs and regular files 2018-08-23 18:48:43 -07:00
namespace.c mnt: fix __detach_mounts infinite loop 2018-11-12 01:02:34 -06:00
no-block.c
nsfs.c
open.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
pipe.c Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 19:58:36 -07:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c vfs: allow some remap flags to be passed to vfs_clone_file_range 2018-12-04 08:50:49 -08:00
readdir.c fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64() 2018-04-02 20:16:02 +02:00
select.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
seq_file.c fs/seq_file.c: simplify seq_file iteration code and interface 2018-08-17 16:20:28 -07:00
signalfd.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
splice.c splice: don't read more than available pipe space 2018-12-04 08:50:49 -08:00
stack.c
stat.c y2038: Remove newstat family from default syscall set 2018-08-29 15:42:20 +02:00
statfs.c kernel: add kcompat_sys_{f,}statfs64() 2018-07-12 14:49:48 +01:00
super.c fsnotify: add super block object type 2018-09-03 15:14:01 +02:00
sync.c Changes for this release: 2018-04-04 12:44:02 -07:00
timerfd.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
userfaultfd.c userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered 2018-12-14 15:05:45 -08:00
utimes.c y2038: utimes: Rework #ifdef guards for compat syscalls 2018-08-29 15:42:23 +02:00
xattr.c sysfs: Do not return POSIX ACL xattrs via listxattr 2018-09-18 07:30:48 -04:00