linux/fs/ext4
Jeff Moyer 491caa4363 ext4: fix race between sync and completed io work
The following command line will leave the aio-stress process unkillable
on an ext4 file system (in my case, mounted on /mnt/test):

aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2

This is using the aio-stress program from the xfstests test suite.
That particular command line tells aio-stress to do random writes to
20 files from 20 threads (one thread per file).  The files are NOT
preallocated, so you will get writes to random offsets within the
file, thus creating holes and extending i_size.  It also opens the
file with O_DIRECT and O_SYNC.

On to the problem.  When an I/O requires unwritten extent conversion,
it is queued onto the completed_io_list for the ext4 inode.  Two code
paths will pull work items from this list.  The first is the
ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
which is called via the fsync path (and O_SYNC handling, as well).
There are two issues I've found in these code paths.  First, if the
fsync path beats the work routine to a particular I/O, the work
routine will free the io_end structure!  It does not take into account
the fact that the io_end may still be in use by the fsync path.  I've
fixed this issue by adding yet another IO_END flag, indicating that
the io_end is being processed by the fsync path.

The second problem is that the work routine will make an assignment to
io->flag outside of the lock.  I have witnessed this result in a hang
at umount.  Moving the flag setting inside the lock resolved that
problem.

The problem was introduced by commit b82e384c7b ("ext4: optimize
locking for end_io extent conversion"), which first appeared in 3.2.
As such, the fix should be backported to that release (probably along
with the unwritten extent conversion race fix).

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
CC: stable@kernel.org
2012-03-05 10:29:52 -05:00
..
acl.c switch posix_acl_equiv_mode() to umode_t * 2011-08-01 02:10:06 -04:00
acl.h fs: take the ACL checks to common code 2011-07-25 14:30:23 -04:00
balloc.c ext4: fix balloc.c printk-format-warning 2012-02-20 17:57:24 -05:00
bitmap.c ext4: Change unsigned long to unsigned int 2008-11-05 00:14:04 -05:00
block_validity.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
dir.c ext4: remove an unneeded NULL check in __ext4_check_dir_entry() 2012-02-20 17:53:05 -05:00
ext4_extents.h ext4: Fix bigalloc quota accounting and i_blocks value 2011-09-09 19:04:51 -04:00
ext4_jbd2.c jbd2: add debugging information to jbd2_journal_dirty_metadata() 2011-09-04 10:18:14 -04:00
ext4_jbd2.h ext4: expand commit callback and 2012-02-20 17:53:02 -05:00
ext4.h ext4: fix race between sync and completed io work 2012-03-05 10:29:52 -05:00
extents.c Merge branch 'for_linus' into for_linus_merged 2012-01-10 11:54:07 -05:00
file.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2011-11-02 10:06:20 -07:00
fsync.c ext4: fix race between sync and completed io work 2012-03-05 10:29:52 -05:00
hash.c ext4: Add support for non-native signed/unsigned htree hash algorithms 2008-10-28 13:21:44 -04:00
ialloc.c ext4: fix race when setting bitmap_uptodate flag 2012-02-20 17:52:46 -05:00
indirect.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
inode.c ext4: clean up the flags passed to __blockdev_direct_IO 2012-03-05 10:19:52 -05:00
ioctl.c Merge branch 'for_linus' into for_linus_merged 2012-01-10 11:54:07 -05:00
Kconfig ext4: Don't ask about supporting ext2/3 in ext4 if ext4 is not configured 2009-12-21 10:54:09 -05:00
Makefile ext4: move ext4_ind_* functions from inode.c to indirect.c 2011-06-27 19:40:50 -04:00
mballoc.c ext4: remove EXT4_MB_{BITMAP,BUDDY} macros 2012-02-20 17:54:06 -05:00
mballoc.h ext4: remove EXT4_MB_{BITMAP,BUDDY} macros 2012-02-20 17:54:06 -05:00
migrate.c ext4: using PTR_ERR() on the wrong variable in ext4_ext_migrate() 2012-02-20 17:53:06 -05:00
mmp.c ext4: Fix endianness bug when reading the MMP block 2012-02-27 01:09:03 -05:00
move_extent.c ext4: add some tracepoints in ext4/extents.c 2011-09-09 19:18:51 -04:00
namei.c ext4: format flag in dx_probe() 2012-02-20 23:09:36 -05:00
page-io.c ext4: fix race between sync and completed io work 2012-03-05 10:29:52 -05:00
resize.c ext4: fix resize when resizing within single group 2012-02-20 23:02:06 -05:00
super.c ext4: try to deprecate noacl and noxattr_user mount options 2012-03-04 22:06:20 -05:00
symlink.c ext4: symlink must be handled via filesystem specific operation 2010-05-16 02:00:00 -04:00
truncate.h ext4: move common truncate functions to header file 2011-06-27 19:16:04 -04:00
xattr_security.c Merge branch 'for_linus' into for_linus_merged 2012-01-10 11:54:07 -05:00
xattr_trusted.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
xattr_user.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
xattr.c ext4: avoid deadlock on sync-mounted FS w/o journal 2012-02-20 23:06:18 -05:00
xattr.h fs/vfs/security: pass last path component to LSM on inode creation 2011-02-01 11:12:29 -05:00