linux/fs/btrfs
Filipe Manana 0376374a98 Btrfs: fix locking bugs when defragging leaves
When running fstests btrfs/070, with a higher number of fsstress
operations, I ran frequently into two different locking bugs when
defragging directories.

The first bug produced the following traces:

[133860.229792] ------------[ cut here ]------------
[133860.251062] WARNING: CPU: 2 PID: 26057 at fs/btrfs/locking.c:46 btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]()
[133860.253576] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 psmouse parport
[133860.282566] CPU: 2 PID: 26057 Comm: btrfs Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[133860.284393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[133860.286827]  0000000000000000 ffff880207697b78 ffffffff812566f4 0000000000000000
[133860.288341]  ffff880207697bb0 ffffffff8104d0a6 ffffffffa052d4c1 ffff880178f60e00
[133860.294219]  ffff880178f60e00 0000000000000000 00000000000000f6 ffff880207697bc0
[133860.295831] Call Trace:
[133860.306518]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
[133860.307473]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
[133860.308619]  [<ffffffffa052d4c1>] ? btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]
[133860.310068]  [<ffffffff8104d172>] warn_slowpath_null+0x1a/0x1c
[133860.312552]  [<ffffffffa052d4c1>] btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]
[133860.314630]  [<ffffffffa04d5787>] btrfs_set_lock_blocking+0xe/0x10 [btrfs]
[133860.323596]  [<ffffffffa04d99cb>] btrfs_realloc_node+0xb3/0x341 [btrfs]
[133860.325233]  [<ffffffffa050e396>] btrfs_defrag_leaves+0x239/0x2fa [btrfs]
[133860.332427]  [<ffffffffa04fc2ce>] btrfs_defrag_root+0x63/0xca [btrfs]
[133860.337259]  [<ffffffffa052a34e>] btrfs_ioctl_defrag+0x78/0x14e [btrfs]
[133860.340147]  [<ffffffffa052b00b>] btrfs_ioctl+0x746/0x24c6 [btrfs]
[133860.344833]  [<ffffffff81087481>] ? arch_local_irq_save+0x9/0xc
[133860.346343]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[133860.353248]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[133860.354242]  [<ffffffff8113adba>] ? __might_fault+0xa5/0xa7
[133860.355232]  [<ffffffff81171139>] ? cp_new_stat+0x15d/0x174
[133860.356237]  [<ffffffff8117c610>] do_vfs_ioctl+0x427/0x4e6
[133860.358587]  [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e
[133860.360195]  [<ffffffff8118574d>] ? __fget_light+0x4d/0x71
[133860.361380]  [<ffffffff8117c726>] SyS_ioctl+0x57/0x79
[133860.363578]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[133860.366217] ---[ end trace 2cadb2f653437e49 ]---
[133860.367399] ------------[ cut here ]------------
[133860.368162] kernel BUG at fs/btrfs/locking.c:307!
[133860.369430] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[133860.370205] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 psmouse parport
[133860.370205] CPU: 2 PID: 26057 Comm: btrfs Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[133860.370205] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[133860.370205] task: ffff8800aec6db40 ti: ffff880207694000 task.ti: ffff880207694000
[133860.370205] RIP: 0010:[<ffffffffa052d466>]  [<ffffffffa052d466>] btrfs_assert_tree_locked+0x10/0x14 [btrfs]
[133860.370205] RSP: 0018:ffff880207697bc0  EFLAGS: 00010246
[133860.370205] RAX: 0000000000000000 RBX: ffff880178f60e00 RCX: 0000000000000000
[133860.370205] RDX: ffff88023ec4fb50 RSI: 00000000ffffffff RDI: ffff880178f60e00
[133860.370205] RBP: ffff880207697bc0 R08: 0000000000000001 R09: 0000000000000000
[133860.370205] R10: 0000160000000000 R11: ffffffff81651000 R12: ffff880178f60e00
[133860.370205] R13: 0000000000000000 R14: 00000000000000f6 R15: ffff8801ff409000
[133860.370205] FS:  00007f763efd48c0(0000) GS:ffff88023ec40000(0000) knlGS:0000000000000000
[133860.370205] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[133860.370205] CR2: 0000000002158048 CR3: 000000003fd6c000 CR4: 00000000000006e0
[133860.370205] Stack:
[133860.370205]  ffff880207697bd8 ffffffffa052d4d0 0000000000000000 ffff880207697be8
[133860.370205]  ffffffffa04d5787 ffff880207697c80 ffffffffa04d99cb ffff8801ff409590
[133860.370205]  ffff880207697ca8 000000f507697c80 ffff880183c11bb8 0000000000000000
[133860.370205] Call Trace:
[133860.370205]  [<ffffffffa052d4d0>] btrfs_set_lock_blocking_rw+0x66/0xbd [btrfs]
[133860.370205]  [<ffffffffa04d5787>] btrfs_set_lock_blocking+0xe/0x10 [btrfs]
[133860.370205]  [<ffffffffa04d99cb>] btrfs_realloc_node+0xb3/0x341 [btrfs]
[133860.370205]  [<ffffffffa050e396>] btrfs_defrag_leaves+0x239/0x2fa [btrfs]
[133860.370205]  [<ffffffffa04fc2ce>] btrfs_defrag_root+0x63/0xca [btrfs]
[133860.370205]  [<ffffffffa052a34e>] btrfs_ioctl_defrag+0x78/0x14e [btrfs]
[133860.370205]  [<ffffffffa052b00b>] btrfs_ioctl+0x746/0x24c6 [btrfs]
[133860.370205]  [<ffffffff81087481>] ? arch_local_irq_save+0x9/0xc
[133860.370205]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[133860.370205]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[133860.370205]  [<ffffffff8113adba>] ? __might_fault+0xa5/0xa7
[133860.370205]  [<ffffffff81171139>] ? cp_new_stat+0x15d/0x174
[133860.370205]  [<ffffffff8117c610>] do_vfs_ioctl+0x427/0x4e6
[133860.370205]  [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e
[133860.370205]  [<ffffffff8118574d>] ? __fget_light+0x4d/0x71
[133860.370205]  [<ffffffff8117c726>] SyS_ioctl+0x57/0x79
[133860.370205]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f

This bug happened because we assumed that by setting keep_locks to 1 in
our search path, our path after a call to btrfs_search_slot() would have
all nodes locked, which is not always true because unlock_up() (called by
btrfs_search_slot()) will unlock a node in a path if the slot of the node
below it doesn't point to the last item or beyond the last item. For
example, when the tree has a heigth of 2 and path->slots[0] has a value
smaller than btrfs_header_nritems(path->nodes[0]) - 1, the node at level 2
will be unlocked (also because lowest_unlock is set to 1 due to the fact
that the value passed as ins_len to btrfs_search_slot is 0).
This resulted in btrfs_find_next_key(), called before btrfs_realloc_node(),
to release out path and call again btrfs_search_slot(), but this time with
the cow parameter set to 0, meaning the resulting path got only read locks.
Therefore when we called btrfs_realloc_node(), with path->nodes[1] having
a read lock, it resulted in the warning and BUG_ON when calling
btrfs_set_lock_blocking() against the node, as that function expects the
node to have a write lock.

The second bug happened often when the first bug didn't happen, and made
us hang and hitting the following warning at fs/btrfs/locking.c:

   251  void btrfs_tree_lock(struct extent_buffer *eb)
   252  {
   253          WARN_ON(eb->lock_owner == current->pid);

This happened because the tree search we made at btrfs_defrag_leaves()
before calling btrfs_find_next_key() locked a leaf and all the other
nodes in the path, so btrfs_find_next_key() had no need to release the
path and make a new search (with path->lowest_level set to 1). This
made btrfs_realloc_node() attempt to write lock the same leaf again,
resulting in a hang/deadlock.

So fix these issues by calling btrfs_find_next_key() after calling
btrfs_realloc_node() and setting the search path's lowest_level to 1
to avoid the hang/deadlock when attempting to write lock the leaves
at btrfs_realloc_node().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-18 02:51:32 +00:00
..
tests Btrfs: tests: checking for NULL instead of IS_ERR() 2015-11-25 05:19:50 -08:00
acl.c btrfs: remove useless ACL check 2014-06-09 17:20:42 -07:00
async-thread.c btrfs: async_thread: Fix workqueue 'max_active' value when initializing 2015-08-31 11:46:40 -07:00
async-thread.h btrfs: async_thread: Fix workqueue 'max_active' value when initializing 2015-08-31 11:46:40 -07:00
backref.c Btrfs: use btrfs_get_fs_root in resolve_indirect_ref 2015-11-25 05:22:08 -08:00
backref.h btrfs: cleanup, remove inode_item_info helper 2015-01-14 19:23:47 +01:00
btrfs_inode.h Btrfs: Direct I/O: Fix space accounting 2015-09-21 13:47:55 -07:00
check-integrity.c Merge branch 'cleanups/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4 2015-10-21 18:21:40 -07:00
check-integrity.h block: submit_bio_wait() conversions 2013-11-24 16:33:41 -07:00
compression.c Merge branch 'cleanups/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4 2015-10-21 18:21:40 -07:00
compression.h btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00
ctree.c Merge branch 'cleanups/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4 2015-10-21 18:21:40 -07:00
ctree.h Btrfs: fix scrub preventing unused block groups from being deleted 2015-11-25 05:22:08 -08:00
delayed-inode.c btrfs: comment the rest of implicit barriers before waitqueue_active 2015-10-10 18:42:00 +02:00
delayed-inode.h Btrfs: introduce the delayed inode ref deletion for the single link inode 2014-01-28 13:20:09 -08:00
delayed-ref.c btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans 2015-10-26 19:44:39 -07:00
delayed-ref.h btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans 2015-10-26 19:44:39 -07:00
dev-replace.c Merge branch 'fix/waitqueue-barriers' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4 2015-10-12 16:24:40 -07:00
dev-replace.h
dir-item.c Btrfs: make xattr replace operations atomic 2014-11-20 17:20:07 -08:00
disk-io.c btrfs: qgroup: exit the rescan worker during umount 2015-11-05 10:32:20 +00:00
disk-io.h Btrfs: add btrfs_read_dev_one_super() to read one specific SB 2015-10-01 17:29:38 +02:00
export.c BTRFS: support NFSv2 export 2015-10-06 06:55:23 -07:00
export.h
extent_io.c btrfs: extent_io: Introduce new function clear_record_extent_bits() 2015-10-21 18:37:44 -07:00
extent_io.h btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function 2015-10-21 18:37:45 -07:00
extent_map.c Btrfs: do not move em to modified list when unpinning 2014-11-21 11:59:54 -08:00
extent_map.h Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
extent-tree.c Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list 2015-12-10 11:22:38 +00:00
extent-tree.h btrfs: qgroup: Add new qgroup calculation function 2015-06-10 09:25:49 -07:00
file-item.c Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:52:48 -07:00
file.c Btrfs: check prepare_uptodate_page() error code earlier 2015-12-15 09:09:38 -08:00
free-space-cache.c Merge branch 'for-chris-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.4 2015-12-15 09:09:59 -08:00
free-space-cache.h Btrfs: keep track of largest extent in bitmaps 2015-10-21 18:55:40 -07:00
hash.c btrfs: LLVMLinux: Remove VLAIS 2014-10-14 10:51:22 +02:00
hash.h Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
inode-item.c Btrfs: consolidate btrfs_error() to btrfs_std_error() 2015-09-29 16:30:00 +02:00
inode-map.c btrfs: qgroup: Cleanup old inaccurate facilities 2015-10-21 18:41:06 -07:00
inode-map.h
inode.c Btrfs: fix leaking of ordered extents after direct IO write error 2015-12-17 10:59:51 +00:00
ioctl.c btrfs: check unsupported filters in balance arguments 2015-10-26 19:38:26 -07:00
Kconfig rcu: Make SRCU optional by using CONFIG_SRCU 2015-01-06 11:04:29 -08:00
locking.c btrfs: comment the rest of implicit barriers before waitqueue_active 2015-10-10 18:42:00 +02:00
locking.h btrfs: fix lockups from btrfs_clear_path_blocking 2014-11-19 10:34:35 -08:00
lzo.c btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00
Makefile Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
math.h btrfs: cleanup 64bit/32bit divs, compile time constants 2015-03-03 17:23:57 +01:00
ordered-data.c Btrfs: change how we wait for pending ordered extents 2015-10-21 18:51:40 -07:00
ordered-data.h Btrfs: change how we wait for pending ordered extents 2015-10-21 18:51:40 -07:00
orphan.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
print-tree.c btrfs: remove parameter blocksize from read_tree_block 2014-10-02 17:14:50 +02:00
print-tree.h
props.c btrfs: cleanup iterating over prop_handlers array 2015-10-21 18:28:48 +02:00
props.h Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
qgroup.c btrfs: qgroup: account shared subtree during snapshot delete 2015-11-25 05:27:33 -08:00
qgroup.h btrfs: qgroup: Check if qgroup reserved space leaked 2015-10-21 18:41:10 -07:00
raid56.c btrfs: comment waitqueue_active implied by locks 2015-10-10 18:35:10 +02:00
raid56.h Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation 2015-08-09 07:34:26 -07:00
rcu-string.h
reada.c btrfs: reada: Fix returned errno code 2015-10-21 18:29:50 +02:00
relocation.c Btrfs: fix regression running delayed references when using qgroups 2015-10-25 19:53:26 +00:00
root-tree.c Merge branch 'cleanups/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4 2015-10-21 18:21:40 -07:00
scrub.c Btrfs: fix scrub preventing unused block groups from being deleted 2015-11-25 05:22:08 -08:00
send.c btrfs: fix resending received snapshot with parent 2015-10-13 20:04:10 +01:00
send.h
struct-funcs.c
super.c Btrfs: add fragment=* debug mount option 2015-10-21 18:51:43 -07:00
sysfs.c Btrfs: rename super_kobj to fsid_kobj 2015-09-29 16:29:59 +02:00
sysfs.h Btrfs: rename btrfs_kobj_rm_device to btrfs_sysfs_rm_device_link 2015-09-29 16:29:59 +02:00
transaction.c Btrfs: fix memory leaks after transaction is aborted 2015-12-17 10:59:48 +00:00
transaction.h Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list 2015-12-10 11:22:38 +00:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c Btrfs: fix regression running delayed references when using qgroups 2015-10-25 19:53:26 +00:00
tree-log.h Btrfs: fix metadata inconsistencies after directory fsync 2015-03-26 17:56:23 -07:00
ulist.c btrfs: ulist: Add ulist_del() function. 2015-06-10 09:26:17 -07:00
ulist.h btrfs: ulist: Add ulist_del() function. 2015-06-10 09:26:17 -07:00
uuid-tree.c Btrfs: make btrfs_search_forward return with nodes unlocked 2014-09-17 13:38:02 -07:00
volumes.c Btrfs: fix race when finishing dev replace leading to transaction abort 2015-12-17 10:59:46 +00:00
volumes.h btrfs: fix clashing number of the enhanced balance usage filter 2015-11-25 05:19:50 -08:00
xattr.c Btrfs: fix race when listing an inode's xattrs 2015-11-09 18:34:40 +00:00
xattr.h btrfs: use generic posix ACL infrastructure 2014-01-25 23:58:18 -05:00
zlib.c btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00