linux/fs
Jeff Mahoney 144439376b btrfs: fix lockup in find_free_extent with read-only block groups
If we have a block group that is all of the following:
1) uncached in memory
2) is read-only
3) has a disk cache state that indicates we need to recreate the cache

AND the file system has enough free space fragmentation such that the
request for an extent of a given size can't be honored;

AND have a single CPU core;

AND it's the block group with the highest starting offset such that
there are no opportunities (like reading from disk) for the loop to
yield the CPU;

We can end up with a lockup.

The root cause is simple.  Once we're in the position that we've read in
all of the other block groups directly and none of those block groups
can honor the request, there are no more opportunities to sleep.  We end
up trying to start a caching thread which never gets run if we only have
one core.  This *should* present as a hung task waiting on the caching
thread to make some progress, but it doesn't.  Instead, it degrades into
a busy loop because of the placement of the read-only check.

During the first pass through the loop, block_group->cached will be set
to BTRFS_CACHE_STARTED and have_caching_bg will be set.  Then we hit the
read-only check and short circuit the loop.  We're not yet in
LOOP_CACHING_WAIT, so we skip that loop back before going through the
loop again for other raid groups.

Then we move to LOOP_CACHING_WAIT state.

During the this pass through the loop, ->cached will still be
BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter
cache_block_group, do a lot of nothing, and return, and also set
have_caching_bg again.  Then we hit the read-only check and short circuit
the loop.  The same thing happens as before except now we DO trigger
the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the
top.  We do this forever.

There are two fixes in this patch since they address the same underlying
bug.

The first is to add a cond_resched to the end of the loop to ensure
that the caching thread always has an opportunity to run.  This will
fix the soft lockup issue, but find_free_extent will still loop doing
nothing until the thread has completed.

The second is to move the read-only check to the top of the loop.  We're
never going to return an allocation within a read-only block group so
we may as well skip it early.  The check for ->cached == BTRFS_CACHE_ERROR
would cause the same problem except that BTRFS_CACHE_ERROR is considered
a "done" state and we won't re-set have_caching_bg again.

Many thanks to Stephan Kulow <coolo@suse.de> for his excellent help in
the testing process.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-07-24 16:04:02 +02:00
..
9p 9p: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
adfs
affs fs/affs: add rename exchange 2017-05-05 15:24:52 -04:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-05-02 16:40:27 -07:00
autofs4 Fix dead URLs to ftp.kernel.org 2017-03-28 16:16:52 +02:00
befs befs: make export work with cold dcache 2017-05-05 11:35:35 +01:00
bfs Tigran has moved 2017-05-12 15:57:15 -07:00
btrfs btrfs: fix lockup in find_free_extent with read-only block groups 2017-07-24 16:04:02 +02:00
cachefiles sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
ceph ceph: unify inode i_ctime update 2017-06-14 19:37:23 +02:00
cifs [CIFS] Minor cleanup of xattr query function 2017-05-12 20:59:10 -05:00
coda coda: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
configfs configfs: Introduce config_item_get_unless_zero() 2017-06-12 13:20:20 +02:00
cramfs
crypto fscrypt: introduce helper function for filename matching 2017-05-04 11:44:37 -04:00
debugfs fs: constify tree_descr arrays passed to simple_fill_super() 2017-04-26 23:54:06 -04:00
devpts
dlm net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
ecryptfs ecryptfs: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
efivarfs
efs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
exofs fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag 2017-05-08 17:15:14 -07:00
exportfs Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
ext2 dax, xfs, ext4: compile out iomap-dax paths in the FS_DAX=n case 2017-05-13 17:52:16 -07:00
ext4 ext4: fix fdatasync(2) after extent manipulation operations 2017-05-29 13:24:55 -04:00
f2fs crypto: Work around deallocated stack frame reference gcc bug on sparc. 2017-06-08 17:36:03 +08:00
fat fat: fix using uninitialized fields of fat_inode/fsinfo_inode 2017-03-09 17:01:10 -08:00
freevxfs
fscache KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload() 2017-03-02 10:09:00 +11:00
fuse Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-05-20 16:12:30 -07:00
gfs2 gfs2: Make flush bios explicitely sync 2017-05-24 13:35:20 +02:00
hfs fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag 2017-05-08 17:15:14 -07:00
hfsplus fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag 2017-05-08 17:15:14 -07:00
hostfs vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
hpfs sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
hugetlbfs mm: larger stack guard gap, between vmas 2017-06-19 21:50:20 +08:00
isofs sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
jbd2 jbd2: preserve original nofs flag during journal restart 2017-05-21 22:32:23 -04:00
jffs2 jffs2: fix spelling mistake: "requestied" -> "requested" 2017-04-19 11:35:55 -07:00
jfs jfs: Remove jfs_get_inode_flags() 2017-04-19 14:21:23 +02:00
kernfs kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file() 2017-03-17 10:25:59 +09:00
lockd Another RDMA update from Chuck Lever, and a bunch of miscellaneous 2017-05-10 13:29:23 -07:00
minix statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
ncpfs ncpfs: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
nfs NFS client bugfixes for Linux 4.12 2017-06-04 11:56:53 -07:00
nfs_common netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
nfsd nfsd4: fix null dereference on replay 2017-05-23 14:20:58 -04:00
nilfs2 fs: Remove SB_I_DYNBDI flag 2017-04-20 12:09:55 -06:00
nls
notify fanotify: don't expose EOPENSTALE to userspace 2017-04-25 15:48:06 +02:00
ntfs ntfs: Use ERR_CAST() to avoid cross-structure cast 2017-05-28 10:11:48 -07:00
ocfs2 ocfs2: Use ERR_CAST() to avoid cross-structure cast 2017-05-28 10:11:49 -07:00
omfs sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
openpromfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
orangefs Some cleanups: 2017-05-05 13:36:10 -07:00
overlayfs ovl: filter trusted xattr for non-admin 2017-05-29 15:15:27 +02:00
proc mm: larger stack guard gap, between vmas 2017-06-19 21:50:20 +08:00
pstore Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
qnx4
qnx6
quota ext4: fix quota charging for shared xattr blocks 2017-05-24 18:24:07 -04:00
ramfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
reiserfs reiserfs: Make flush bios explicitely sync 2017-05-24 13:35:20 +02:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-01-24 16:26:14 -08:00
squashfs fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version 2017-02-24 17:46:57 -08:00
sysfs sysfs: be careful of error returns from ops->show() 2017-04-08 17:33:32 +02:00
sysv statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
tracefs fs: constify tree_descr arrays passed to simple_fill_super() 2017-04-26 23:54:06 -04:00
ubifs This pull request contains updates for both UBI and UBIFS: 2017-05-13 10:23:12 -07:00
udf udf: use kmap_atomic for memcpy copying 2017-04-24 16:28:02 +02:00
ufs Merge branch 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-06-17 17:30:07 +09:00
xfs Changes since last update: 2017-06-17 17:34:41 +09:00
aio.c Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
anon_inodes.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
attr.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
bad_inode.c statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
binfmt_aout.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
binfmt_elf_fdpic.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
binfmt_elf.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
binfmt_em86.c
binfmt_flat.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
binfmt_misc.c fs: constify tree_descr arrays passed to simple_fill_super() 2017-04-26 23:54:06 -04:00
binfmt_script.c
block_dev.c Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm 2017-05-12 15:43:10 -07:00
buffer.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-09 09:12:53 -07:00
char_dev.c chardev: add helper function to register char devs with a struct device 2017-03-21 06:44:32 +01:00
compat_binfmt_elf.c fs/binfmt: Convert obsolete cputime type to nsecs 2017-02-01 09:13:51 +01:00
compat_ioctl.c fs: compat: Remove warning from COMPATIBLE_IOCTL 2017-04-29 17:47:19 -04:00
compat.c fs/compat.c: trim unused includes 2017-04-17 12:52:27 -04:00
coredump.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
dax.c dax: fix race between colliding PMD & PTE entries 2017-06-02 15:07:37 -07:00
dcache.c Hang/soft lockup in d_invalidate with simultaneous calls 2017-06-15 06:52:09 -04:00
dcookies.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
direct-io.c fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
drop_caches.c
eventfd.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
eventpoll.c epoll: Add busy poll support to epoll with socket fds. 2017-03-24 20:49:31 -07:00
exec.c x86/arch_prctl: Add ARCH_[GET|SET]_CPUID 2017-03-20 16:10:34 +01:00
fcntl.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-09 09:12:53 -07:00
fhandle.c fhandle: move compat syscalls from compat.c 2017-04-17 12:52:26 -04:00
file_table.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
file.c mm, vmalloc: use __GFP_HIGHMEM implicitly 2017-05-08 17:15:13 -07:00
filesystems.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fs_pin.c
fs_struct.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
fs-writeback.c writeback: fix memory leak in wb_queue_work() 2017-03-13 08:27:34 -06:00
inode.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-09 09:12:53 -07:00
internal.h Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-09 09:12:53 -07:00
ioctl.c sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API dependency 2017-03-02 08:42:37 +01:00
iomap.c fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag 2017-05-08 17:15:14 -07:00
Kconfig block, dax: move "select DAX" from BLOCK to FS_DAX 2017-05-08 10:55:27 -07:00
Kconfig.binfmt
libfs.c fs: constify tree_descr arrays passed to simple_fill_super() 2017-04-26 23:54:06 -04:00
locks.c locks: Set FL_CLOSE when removing flock locks on close() 2017-04-21 10:45:01 -04:00
Makefile logfs: remove from tree 2016-12-14 23:48:11 -05:00
mbcache.c mbcache: document that "find" functions only return reusable entries 2016-12-03 15:55:01 -05:00
mount.h fsnotify: Free fsnotify_mark_connector when there is no mark attached 2017-04-10 17:37:35 +02:00
mpage.c fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
namei.c Merge branch 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-12 11:39:59 -07:00
namespace.c fs: don't forget to put old mntns in mntns_install 2017-06-15 06:53:05 -04:00
no-block.c
nsfs.c ns: allow ns_entries to have custom symlink content 2017-05-08 17:15:12 -07:00
open.c Merge branch 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-12 11:39:59 -07:00
pipe.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
pnode.c mnt: Tuck mounts under others instead of creating shadow/side mounts. 2017-02-04 00:01:06 +13:00
pnode.h mnt: Tuck mounts under others instead of creating shadow/side mounts. 2017-02-04 00:01:06 +13:00
posix_acl.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
proc_namespace.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
read_write.c fs: pass on flags in compat_writev 2017-06-16 18:40:51 +09:00
readdir.c readdir: move compat syscalls from compat.c 2017-04-17 12:52:24 -04:00
select.c treewide: use kv[mz]alloc* rather than opencoded variants 2017-05-08 17:15:13 -07:00
seq_file.c mm: introduce kv[mz]alloc helpers 2017-05-08 17:15:12 -07:00
signalfd.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
splice.c Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-02 11:38:06 -07:00
stack.c
stat.c ufs: restore maintaining ->i_blocks 2017-06-09 16:28:01 -04:00
statfs.c statfs: move compat syscalls from compat.c 2017-04-17 12:52:23 -04:00
super.c bdi: Drop 'parent' argument from bdi_register[_va]() 2017-04-20 12:09:55 -06:00
sync.c vfs: use helper for calling f_op->fsync() 2017-02-20 16:51:23 +01:00
timerfd.c timerfd: Only check CAP_WAKE_ALARM when it is needed 2017-03-01 12:53:44 +01:00
userfaultfd.c userfaultfd: shmem: handle coredumping in handle_userfault() 2017-06-17 06:37:05 +09:00
utimes.c utimes: move compat syscalls from compat.c 2017-04-17 12:52:23 -04:00
xattr.c treewide: use kv[mz]alloc* rather than opencoded variants 2017-05-08 17:15:13 -07:00