linux/fs
Amir Goldstein 47c7d0b195 xfs: fix incorrect log_flushed on fsync
When calling into _xfs_log_force{,_lsn}() with a pointer
to a log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffer

xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.
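The reporting rule and the fsync decision described above can be modeled in a few lines of C. This is a minimal sketch, not XFS code: struct log_force_path, log_flushed_value() and fsync_sends_explicit_preflush() are hypothetical names, and the model reduces _xfs_log_force_lsn() to the two cases listed earlier and xfs_file_fsync() to its log_flushed check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of which path a log force took for one caller. */
struct log_force_path {
	bool flushed_active_buffer; /* case 1: this caller ran xlog_sync() */
	bool waited_on_syncing;     /* case 2: this caller ran xlog_wait() */
};

/* log_flushed as described above: either case, or both, sets it to 1. */
static int log_flushed_value(struct log_force_path p)
{
	return (p.flushed_active_buffer || p.waited_on_syncing) ? 1 : 0;
}

/*
 * Simplified fsync tail: the file's pages are already written out, so an
 * explicit PREFLUSH to the data device is sent only when the log force
 * did not report that a flush happened.
 */
static bool fsync_sends_explicit_preflush(int log_flushed)
{
	assert(log_flushed == 0 || log_flushed == 1); /* model uses a 0/1 flag */
	return log_flushed == 0;
}
```

Under this rule, a caller that only waited on another task's syncing log buffer (case 2) still sees log_flushed == 1 and skips its own PREFLUSH.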

This optimization is incorrect in the following sequence of events:

 Task A                    Task B
 -------------------------------------------------------
 xfs_file_fsync()
   _xfs_log_force_lsn()
     xlog_sync()
        [submit PREFLUSH]
                           xfs_file_fsync()
                             file_write_and_wait_range()
                               [submit WRITE X]
                               [endio  WRITE X]
                             _xfs_log_force_lsn()
                               xlog_wait()
        [endio  PREFLUSH]

Write X is not guaranteed to be on persistent storage
when the PREFLUSH request is completed, because write X was submitted
after the PREFLUSH request, but xfs_file_fsync() of task B will
be notified of log_flushed=1 and will skip the explicit flush.

If the system crashes after the fsync of task B, write X may not be
present on disk after reboot.
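The race can be checked with logical timestamps. A minimal sketch, assuming the usual block-layer rule that a PREFLUSH only guarantees persistence of writes that had completed before the flush was submitted; the T_* event numbers and both functions are hypothetical and simply follow the order of the diagram above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical logical clock values, in the order shown in the diagram. */
enum {
	T_PREFLUSH_SUBMIT = 1, /* task A: xlog_sync() submits PREFLUSH */
	T_WRITE_X_SUBMIT  = 2, /* task B: submit WRITE X */
	T_WRITE_X_ENDIO   = 3, /* task B: endio WRITE X */
	T_XLOG_WAIT       = 4, /* task B: xlog_wait(), gets log_flushed=1 */
	T_PREFLUSH_ENDIO  = 5, /* task A: endio PREFLUSH */
};

static_assert(T_PREFLUSH_SUBMIT < T_WRITE_X_SUBMIT &&
	      T_WRITE_X_SUBMIT < T_WRITE_X_ENDIO &&
	      T_WRITE_X_ENDIO < T_XLOG_WAIT &&
	      T_XLOG_WAIT < T_PREFLUSH_ENDIO,
	      "events must follow the diagram's order");

/* A PREFLUSH covers a write only if the write completed before submission. */
static bool preflush_covers(int flush_submit, int write_endio)
{
	return write_endio < flush_submit;
}

/*
 * Is WRITE X durable when task B's fsync returns?  If B sends its own
 * PREFLUSH, that flush is submitted after WRITE X completed and covers it;
 * if B skips it because log_flushed == 1, only task A's earlier PREFLUSH
 * is left to rely on.
 */
static bool write_x_durable(bool b_skips_explicit_flush)
{
	if (!b_skips_explicit_flush)
		return true;
	return preflush_covers(T_PREFLUSH_SUBMIT, T_WRITE_X_ENDIO);
}
```

write_x_durable(true) is false: skipping the flush because xlog_wait() reported log_flushed=1 leaves write X exposed, while write_x_durable(false) is true.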

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block I/O operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute operations on a file and record the block I/O
  on the log device
- Every now and then, fsync the file, store the md5 of the file and mark
  the location in the log
- Then replay the log onto the device up to each mark, mount the fs and
  compare the md5 of the file to the stored value
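The replay-and-compare step can be sketched as a toy C model. Assumptions, all hypothetical: the device is an array of one-byte blocks, the log is an in-memory array of writes and marks (not the dm-log-writes on-disk format), the whole device stands in for the file, and 32-bit FNV-1a replaces md5 so the sketch needs no crypto library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 8 /* hypothetical device size, one byte per block */

/* One recorded entry: a block write, or a mark placed after an fsync. */
struct log_entry {
	int is_mark;
	size_t block; /* used for writes only */
	uint8_t data; /* used for writes only */
};

/* 32-bit FNV-1a, standing in for md5 in this sketch. */
static uint32_t fnv1a(const uint8_t *buf, size_t len)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Replay entries onto a zeroed device, stopping at the mark_no-th mark
 * (0-based), and return the checksum of the device at that point.
 * If the log has fewer marks, every write is replayed.
 */
static uint32_t replay_to_mark(const struct log_entry *log, size_t n, int mark_no)
{
	uint8_t dev[NBLOCKS];
	int marks_seen = 0;

	memset(dev, 0, sizeof(dev));
	for (size_t i = 0; i < n; i++) {
		if (log[i].is_mark) {
			if (marks_seen++ == mark_no)
				break;
			continue;
		}
		assert(log[i].block < NBLOCKS);
		dev[log[i].block] = log[i].data;
	}
	return fnv1a(dev, sizeof(dev));
}
```

A harness would compare replay_to_mark(log, n, k) with the checksum stored when mark k was taken; a mismatch means the replayed device does not hold what the file held at that fsync.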

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
9p 9p: Implement show_options 2017-07-11 06:08:58 -04:00
adfs
affs affs: Implement show_options 2017-07-11 06:06:17 -04:00
afs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
autofs4 Fix up over-eager 'wait_queue_t' renaming 2017-07-10 11:40:19 -07:00
befs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
bfs bfs: fix sanity checks for empty files 2017-07-12 16:26:00 -07:00
btrfs Merge branch 'for-4.13-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-07-28 12:26:59 -07:00
cachefiles sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming 2017-06-20 12:19:14 +02:00
ceph ceph: fix race in concurrent readdir 2017-07-17 14:54:59 +02:00
cifs Add wait_for_random_bytes() and get_random_*_wait() functions so that 2017-07-15 12:44:02 -07:00
coda fs: implement vfs_iter_write using do_iter_write 2017-06-29 17:49:23 -04:00
configfs configfs: Introduce config_item_get_unless_zero() 2017-06-12 13:20:20 +02:00
cramfs
crypto The first major feature for ext4 this merge window is the largedir 2017-07-09 09:31:22 -07:00
debugfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
devpts pty: fix the cached path of the pty slave file descriptor in the master 2017-08-17 09:10:48 -07:00
dlm
ecryptfs ecryptfs: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
efivarfs VFS: Kill off s_options and helpers 2017-07-11 06:09:21 -04:00
efs
exofs mm: drop "wait" parameter from write_one_page() 2017-07-05 18:44:22 -04:00
exportfs
ext2 ext2: preserve i_mode if ext2_set_acl() fails 2017-07-18 11:23:56 +02:00
ext4 ext4: fix copy paste error in ext4_swap_extents() 2017-08-06 01:33:07 -04:00
f2fs f2fs: avoid cpu lockup 2017-07-17 19:23:18 -07:00
fat
freevxfs
fscache
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse 2017-08-11 11:20:48 -07:00
gfs2 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
hfs fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag 2017-05-08 17:15:14 -07:00
hfsplus hfsplus: Don't clear SGID when inheriting ACLs 2017-07-18 18:23:39 +02:00
hostfs
hpfs
hugetlbfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
isofs isofs: Fix off-by-one in 'session' mount option parsing 2017-07-18 12:33:16 +02:00
jbd2 Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
jffs2 jffs2: fix spelling mistake: "requestied" -> "requested" 2017-04-19 11:35:55 -07:00
jfs JFS fixes for 4.13 2017-07-25 08:51:57 -07:00
kernfs
lockd sunrpc: mark all struct svc_version instances as const 2017-07-13 15:58:03 -04:00
minix Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-08 10:50:54 -07:00
ncpfs mm: per-cgroup memory reclaim stats 2017-07-06 16:24:35 -07:00
nfs Some more NFS client bugfixes for 4.13 2017-08-11 13:54:09 -07:00
nfs_common
nfsd nfsd: Fix a memory scribble in the callback channel 2017-07-17 13:15:06 -04:00
nilfs2 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-03 13:08:04 -07:00
nls
notify dentry name snapshots 2017-07-07 20:09:10 -04:00
ntfs ntfs: Use ERR_CAST() to avoid cross-structure cast 2017-05-28 10:11:48 -07:00
ocfs2 ocfs2: don't clear SGID when inheriting ACLs 2017-08-02 17:16:13 -07:00
omfs omfs: Implement show_options 2017-07-06 03:31:46 -04:00
openpromfs
orangefs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
overlayfs ovl: check for bad and whiteout index on lookup 2017-07-20 11:08:21 +02:00
proc mm: fix KSM data corruption 2017-08-10 15:54:07 -07:00
pstore Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
qnx4
qnx6
quota quota: correct space limit check 2017-08-07 16:51:28 +02:00
ramfs ramfs: Implement show_options 2017-07-06 03:31:46 -04:00
reiserfs reiserfs: preserve i_mode if __reiserfs_set_acl() fails 2017-07-18 11:24:08 +02:00
romfs
squashfs
sysfs
sysv mm: drop "wait" parameter from write_one_page() 2017-07-05 18:44:22 -04:00
tracefs VFS: Don't use save/replace_mount_options if not using generic_show_options 2017-07-06 03:31:46 -04:00
ubifs ubifs: Set double hash cookie also for RENAME_EXCHANGE 2017-07-14 22:50:57 +02:00
udf udf: Convert udf_disk_stamp_to_time() to use mktime64() 2017-06-14 11:21:02 +02:00
ufs Writeback error handling fixes (pile #1) 2017-07-07 18:39:15 -07:00
xfs xfs: fix incorrect log_flushed on fsync 2017-09-01 13:08:26 -07:00
aio.c fs: add O_DIRECT and aio support for sending down write life time hints 2017-06-27 12:05:36 -06:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks 2017-08-16 20:32:02 +02:00
binfmt_em86.c
binfmt_flat.c binfmt_flat: Use %u to format u32 2017-07-16 09:24:05 -07:00
binfmt_misc.c fs: constify tree_descr arrays passed to simple_fill_super() 2017-04-26 23:54:06 -04:00
binfmt_script.c
block_dev.c Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
buffer.c fs/buffer.c: make bh_lru_install() more efficient 2017-07-10 16:32:30 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'work.__copy_in_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-08 10:15:02 -07:00
compat.c
coredump.c
dax.c Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
dcache.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
dcookies.c
direct-io.c fs: add O_DIRECT and aio support for sending down write life time hints 2017-06-27 12:05:36 -06:00
drop_caches.c
eventfd.c There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
eventpoll.c kcmp: fs/epoll: wrap kcmp code with CONFIG_CHECKPOINT_RESTORE 2017-07-12 16:26:01 -07:00
exec.c exec: Limit arg stack to at most 75% of _STK_LIM 2017-07-07 20:05:08 -07:00
fcntl.c vfs: fix flock compat thinko 2017-07-07 13:48:18 -07:00
fhandle.c
file_table.c fs: new infrastructure for writeback error handling and reporting 2017-07-06 07:02:25 -04:00
file.c fs/file.c: replace alloc_fdmem() with kvmalloc() alternative 2017-07-06 16:24:30 -07:00
filesystems.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
fs_pin.c sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming 2017-06-20 12:19:14 +02:00
fs_struct.c
fs-writeback.c writeback: rework wb_[dec|inc]_stat family of functions 2017-07-12 16:26:05 -07:00
inode.c xfs: evict all inodes involved with log redo item 2017-09-01 10:55:30 -07:00
internal.h xfs: evict all inodes involved with log redo item 2017-09-01 10:55:30 -07:00
ioctl.c
iomap.c iomap: return VM_FAULT_* codes from iomap_page_mkwrite 2017-09-01 10:55:30 -07:00
Kconfig fs/Kconfig: kill CONFIG_PERCPU_RWSEM some more 2017-07-12 16:26:00 -07:00
Kconfig.binfmt
libfs.c fs: convert __generic_file_fsync to use errseq_t based reporting 2017-07-06 07:02:29 -04:00
locks.c fs/locks: pass kernel struct flock to fcntl_getlk/setlk 2017-05-27 06:07:19 -04:00
Makefile
mbcache.c ext4: xattr inode deduplication 2017-06-22 11:44:55 -04:00
mount.h Now that IPC and other changes have landed, enable manual markings for 2017-07-19 08:55:18 -07:00
mpage.c There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
namei.c Now that IPC and other changes have landed, enable manual markings for 2017-07-19 08:55:18 -07:00
namespace.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
no-block.c
nsfs.c VFS: Provide empty name qstr 2017-07-06 03:27:09 -04:00
open.c Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
pipe.c VFS: Provide empty name qstr 2017-07-06 03:27:09 -04:00
pnode.c mnt: Make propagate_umount less slow for overlapping mount propagation trees 2017-05-23 08:41:17 -05:00
pnode.h
posix_acl.c
proc_namespace.c
read_write.c Merge branch 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-07 21:48:15 -07:00
readdir.c
select.c Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 20:57:13 -07:00
seq_file.c mm: introduce kv[mz]alloc helpers 2017-05-08 17:15:12 -07:00
signalfd.c sched/wait: Rename wait_queue_t => wait_queue_entry_t 2017-06-20 12:18:27 +02:00
splice.c fs: implement vfs_iter_write using do_iter_write 2017-06-29 17:49:23 -04:00
stack.c
stat.c ufs: restore maintaining ->i_blocks 2017-06-09 16:28:01 -04:00
statfs.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-08 10:50:54 -07:00
super.c VFS: Kill off s_options and helpers 2017-07-11 06:09:21 -04:00
sync.c fs: remove call_fsync helper function 2017-07-05 18:44:23 -04:00
timerfd.c timerfd: Use get_itimerspec64() and put_itimerspec64() 2017-06-30 04:14:38 -04:00
userfaultfd.c userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage 2017-08-10 15:54:07 -07:00
utimes.c
xattr.c treewide: use kv[mz]alloc* rather than opencoded variants 2017-05-08 17:15:13 -07:00