linux/fs
Filipe Manana ff1f8250a9 Btrfs: fix race between block group creation and their cache writeout
So creating a block group has 2 distinct phases:

Phase 1 - creates the btrfs_block_group_cache item and adds it to the
rbtree fs_info->block_group_cache_tree and to the corresponding list
space_info->block_groups[];

Phase 2 - adds the block group item to the extent tree and corresponding
items to the chunk tree.

The first phase adds the block_group_cache_item to a list of pending block
groups in the transaction handle, and phase 2 happens when
btrfs_end_transaction() is called against the transaction handle.

It happens that once phase 1 completes, other concurrent tasks that use
their own transaction handle, but points to the same running transaction
(struct btrfs_trans_handle->transaction), can use this block group for
space allocations and therefore mark it dirty. Dirty block groups are
tracked in a list belonging to the currently running transaction (struct
btrfs_transaction) and not in the transaction handle (btrfs_trans_handle).

This is a problem because once a task calls btrfs_commit_transaction(),
it calls btrfs_start_dirty_block_groups() which will see all dirty block
groups and attempt to start their writeout, including those that are
still attached to the transaction handle of some concurrent task that
hasn't called btrfs_end_transaction() yet - which means those block
groups haven't gone through phase 2 yet and therefore when
write_one_cache_group() is called, it won't find the block group items
in the extent tree and abort the current transaction with -ENOENT,
turning the fs into readonly mode and require a remount.

Fix this by ignoring -ENOENT when looking for block group items in the
extent tree when we attempt to start the writeout of the block group
caches outside the critical section of the transaction commit. We will
try again later during the critical section and if there we still don't
find the block group item in the extent tree, we then abort the current
transaction.

This issue happened twice, once while running fstests btrfs/067 and once
for btrfs/078, which produced the following trace:

[ 3278.703014] WARNING: CPU: 7 PID: 18499 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
[ 3278.707329] BTRFS: Transaction aborted (error -2)
(...)
[ 3278.731555] Call Trace:
[ 3278.732396]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
[ 3278.733860]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
[ 3278.735312]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
[ 3278.736874]  [<ffffffffa03ada6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
[ 3278.738302]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
[ 3278.739520]  [<ffffffffa03ada6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
[ 3278.741222]  [<ffffffffa03b9e56>] write_one_cache_group+0xae/0xbf [btrfs]
[ 3278.742797]  [<ffffffffa03c487b>] btrfs_start_dirty_block_groups+0x170/0x2b2 [btrfs]
[ 3278.744492]  [<ffffffffa03d309c>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
[ 3278.746084]  [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf
[ 3278.747249]  [<ffffffffa03e5660>] btrfs_sync_file+0x313/0x387 [btrfs]
[ 3278.748744]  [<ffffffff8117acad>] vfs_fsync_range+0x95/0xa4
[ 3278.749958]  [<ffffffff81435b54>] ? ret_from_sys_call+0x1d/0x58
[ 3278.751218]  [<ffffffff8117acd8>] vfs_fsync+0x1c/0x1e
[ 3278.754197]  [<ffffffff8117ae54>] do_fsync+0x34/0x4e
[ 3278.755192]  [<ffffffff8117b07c>] SyS_fsync+0x10/0x14
[ 3278.756236]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
[ 3278.757366] ---[ end trace 9a4d4df4969709aa ]---

Fixes: 1bbc621ef2 ("Btrfs: allow block group cache writeout
                      outside critical section in commit")

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-05-11 07:59:10 -07:00
..
9p VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
adfs
affs fs/affs/super.c: fix switch indentation 2015-02-17 14:34:53 -08:00
afs Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
autofs4 autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation 2015-02-22 11:43:34 -05:00
befs fs/befs/linuxvfs.c: remove unnecessary casting 2015-02-17 14:34:50 -08:00
bfs
btrfs Btrfs: fix race between block group creation and their cache writeout 2015-05-11 07:59:10 -07:00
cachefiles Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions 2015-02-22 11:38:41 -05:00
ceph Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-22 17:42:14 -08:00
cifs Revert "locks: keep a count of locks on the flctx lists" 2015-02-16 14:32:03 -05:00
coda VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
configfs configfs: Fix potential NULL d_inode dereference 2015-02-20 04:56:43 -05:00
cramfs
debugfs debugfs: leave freeing a symlink body until inode eviction 2015-02-22 11:38:43 -05:00
devpts
dlm netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
ecryptfs eCryptfs: don't pass fs-specific ioctl commands through 2015-03-03 02:03:56 -06:00
efivarfs * Move efivarfs from the misc filesystem section to pseudo filesystem, 2015-01-29 19:16:40 +01:00
efs
exofs vfs: remove get_xip_mem 2015-02-16 17:56:03 -08:00
exportfs VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
ext2 ext2: get rid of most mentions of XIP in ext2 2015-02-16 17:56:04 -08:00
ext3 ext3: destroy sbi mutexes in put_super 2015-01-05 11:13:55 +01:00
ext4 Ext4 bug fixes for 3.20. We also reserved code points for encryption 2015-02-22 18:05:13 -08:00
f2fs Merge tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs 2015-02-12 19:28:50 -08:00
fat fs: fat: use MSDOS_SB macro to get msdos_sb_info 2015-02-17 14:34:51 -08:00
freevxfs
fscache fs/fscache/object-list.c: use __seq_open_private() 2014-10-13 17:52:21 +01:00
fuse fuse: explicitly set /dev/fuse file's private_data 2015-03-19 15:29:22 +01:00
gfs2 VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
hfs fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp 2014-12-10 17:41:16 -08:00
hfsplus VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
hostfs
hpfs
hppfs VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
hugetlbfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
isofs isofs: Fix bug in the way to check if the year is a leap year 2015-01-07 09:51:49 +01:00
jbd jbd: Deletion of an unnecessary check before the function call "iput" 2014-11-18 10:15:29 +01:00
jbd2 jbd2: complain about descriptor block checksum errors 2015-01-19 15:59:58 -05:00
jffs2 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-22 17:42:14 -08:00
jfs Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
kernfs kernfs: handle poll correctly on 'direct_read' files. 2015-03-16 21:51:20 +01:00
lockd Merge branch 'for-3.20' of git://linux-nfs.org/~bfields/linux 2015-02-12 10:39:41 -08:00
logfs
minix
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 14:56:45 -08:00
nfs NFSv4.1: Clear the old state by our client id before establishing a new lease 2015-03-03 21:52:30 -05:00
nfs_common lockd: move lockd's grace period handling into its own module 2014-09-17 16:33:11 -04:00
nfsd Subject: nfsd: don't recursively call nfsd4_cb_layout_fail 2015-03-19 15:49:27 -04:00
nilfs2 nilfs2: fix deadlock of segment constructor during recovery 2015-03-12 18:46:08 -07:00
nls
notify fanotify: fix event filtering with FAN_ONDIR set 2015-03-12 18:46:08 -07:00
ntfs fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info 2015-01-20 14:03:04 -07:00
ocfs2 ocfs2: make append_dio an incompat feature 2015-03-12 18:46:07 -07:00
omfs FS/OMFS: block number sanity check during fill_super operation 2014-10-14 02:18:22 +02:00
openpromfs
overlayfs ovl: upper fs should not be R/O 2015-03-18 10:29:48 +01:00
proc pagemap: do not leak physical addresses to non-privileged userspace 2015-03-17 09:31:30 -07:00
pstore pstore: Fix sprintf format specifier in pstore_dump() 2015-01-16 16:01:29 -08:00
qnx4
qnx6
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2015-02-10 15:52:38 -08:00
ramfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
reiserfs VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
romfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
squashfs Squashfs: Add LZ4 compression configuration option 2014-11-27 18:48:44 +00:00
sysfs driver core patches for 3.20-rc1 2015-02-15 11:11:47 -08:00
sysv
ubifs Merge branch 'for-linus-v3.20' of git://git.infradead.org/linux-ubifs 2015-02-15 10:11:39 -08:00
udf udf: remove bool assignment to 0/1 2015-02-05 16:34:25 +01:00
ufs fs/ufs/super.c: fix potential race condition 2015-02-17 14:34:51 -08:00
xfs xfs: cancel failed transaction in xfs_fs_commit_blocks() 2015-02-24 10:15:18 +11:00
aio.c fs/aio.c: Remove duplicate function name in pr_debug messages 2015-02-20 04:56:44 -05:00
anon_inodes.c
attr.c
bad_inode.c don't bother with most of the bad_file_ops methods 2015-02-20 04:03:58 -05:00
binfmt_aout.c assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
binfmt_elf_fdpic.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_elf.c x86, mm/ASLR: Fix stack randomization on 64-bit systems 2015-02-19 12:21:36 +01:00
binfmt_em86.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_flat.c
binfmt_misc.c unfuck binfmt_misc.c (broken by commit e6084d4) 2014-12-17 08:27:14 -05:00
binfmt_script.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
block_dev.c Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-block 2015-02-12 14:13:23 -08:00
buffer.c fs: clarify rate limit suppressed buffer I/O errors 2014-10-21 13:55:11 -06:00
char_dev.c fs: introduce f_op->mmap_capabilities for nommu mmap support 2015-01-20 14:02:58 -07:00
compat_binfmt_elf.c
compat_ioctl.c
compat.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
coredump.c coredump: Fix typo in comment 2015-02-20 04:56:44 -05:00
dax.c dax: add dax_zero_page_range 2015-02-16 17:56:04 -08:00
dcache.c VFS: Split DCACHE_FILE_TYPE into regular and special types 2015-02-22 11:38:38 -05:00
dcookies.c
direct-io.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
drop_caches.c vmscan: per memory cgroup slab shrinkers 2015-02-12 18:54:09 -08:00
eventfd.c eventfd: don't take the spinlock in eventfd_poll 2015-02-17 14:34:52 -08:00
eventpoll.c epoll: optimize setting task running after blocking 2015-02-13 21:21:40 -08:00
exec.c fs: create proper filename objects using getname_kernel() 2015-01-23 00:22:20 -05:00
fcntl.c vfs: renumber FMODE_NONOTIFY and add to uniqueness check 2015-01-08 15:10:52 -08:00
fhandle.c
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
file.c fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0) 2014-12-10 17:41:10 -08:00
filesystems.c
fs_pin.c switch the IO-triggering parts of umount to fs_pin 2015-01-25 23:17:29 -05:00
fs_struct.c
fs-writeback.c trylock_super(): replacement for grab_super_passive() 2015-02-22 11:38:42 -05:00
inode.c Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
internal.h trylock_super(): replacement for grab_super_passive() 2015-02-22 11:38:42 -05:00
ioctl.c fsioctl.c: make generic_block_fiemap() signal-tolerant 2015-02-10 14:30:30 -08:00
Kconfig dax: does not work correctly with virtual aliasing caches 2015-02-16 17:56:04 -08:00
Kconfig.binfmt fs/binfmt_som: Drop kernel support for HP-UX SOM binaries 2015-02-17 16:29:36 +01:00
libfs.c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
locks.c locks: fix generic_delete_lease tracepoint to use victim pointer 2015-03-14 09:45:35 -04:00
Makefile Merge branch 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2015-02-17 14:25:58 -08:00
mbcache.c
mount.h switch the IO-triggering parts of umount to fs_pin 2015-01-25 23:17:29 -05:00
mpage.c vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
namei.c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
namespace.c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
no-block.c
nsfs.c take the targets of /proc/*/ns/* symlinks to separate fs 2014-12-10 21:30:20 -05:00
open.c Merge branch 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 15:27:47 -08:00
pipe.c
pnode.c mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. 2014-12-02 10:46:50 -06:00
pnode.h
posix_acl.c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
proc_namespace.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
read_write.c Merge branch 'iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 15:48:33 -08:00
readdir.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
select.c all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
seq_file.c bitmap, cpumask, nodemask: remove dedicated formatting functions 2015-02-13 21:21:39 -08:00
signalfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
splice.c fs: add vfs_iter_{read,write} helpers 2015-01-29 00:13:13 -05:00
stack.c
stat.c
statfs.c
super.c trylock_super(): replacement for grab_super_passive() 2015-02-22 11:38:42 -05:00
sync.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
timerfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
utimes.c
xattr.c new helper: audit_file() 2014-11-19 13:01:26 -05:00