linux/fs/xfs
Brian Foster 5d11fb4b9a xfs: rework zero range to prevent invalid i_size updates
The zero range operation is analogous to fallocate with the exception of
converting the range to zeroes. I.e., it attempts to allocate zeroed
blocks over the range specified by the caller. The XFS implementation
kills all delalloc blocks currently over the aligned range, converts the
range to allocated zero blocks (unwritten extents) and handles the
partial pages at the ends of the range by sending writes through the
pagecache.

The current implementation suffers from several problems associated with
inode size. If the aligned range covers an extending I/O, said I/O is
discarded and an inode size update from a previous write never makes it
to disk. Further, if an unaligned zero range extends beyond eof, the
page write induced for the partial end page can itself increase the
inode size, even if the zero range request is not supposed to update
i_size (via KEEP_SIZE, similar to an fallocate beyond EOF).
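
The size rule this paragraph relies on can be written down as a small
model. This is only a sketch of the documented fallocate(2) semantics
(FALLOC_FL_KEEP_SIZE leaves the file size alone, otherwise the size grows
to the end of the range if that lies past EOF). The helper name
isize_after_zero_range() is hypothetical and is not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): the file size a caller
 * should observe after a zero range request over [offset, offset + len).
 */
uint64_t isize_after_zero_range(uint64_t isize, uint64_t offset,
				uint64_t len, bool keep_size)
{
	uint64_t end;

	assert(len <= UINT64_MAX - offset);	/* range must not wrap */
	end = offset + len;

	/* With KEEP_SIZE the size never changes, even for a range past EOF. */
	if (keep_size)
		return isize;

	/* Otherwise the size grows only if the range ends beyond current EOF. */
	return end > isize ? end : isize;
}
```

For a 128 KiB file and a KEEP_SIZE request at offset 1 MiB of length
54321, the model returns 131072. Any page write that pushes the size to
1102897 in that case violates the rule.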

The latter behavior not only incorrectly increases the inode size, but
can lead to stray delalloc blocks on the inode. Typically, post-eof
preallocation blocks are either truncated on release or inode eviction
or explicitly written to by xfs_zero_eof() on natural file size
extension. If the inode size increases due to zero range, however,
associated blocks leak into the address space having never been
converted or mapped to pagecache pages. A direct I/O to such an
uncovered range cannot convert the extent via writeback and will BUG().
For example:

$ xfs_io -fc "pwrite 0 128k" -c "fzero -k 1m 54321" <file>
...
$ xfs_io -d -c "pread 128k 128k" <file>
<BUG>

If the entire delalloc extent happens to not have page coverage
whatsoever (e.g., delalloc conversion couldn't find a large enough free
space extent), even a full file writeback won't convert what's left of
the extent and we'll assert on inode eviction.

Rework xfs_zero_file_space() to avoid buffered I/O for partial pages.
Use the existing hole punch and prealloc mechanisms as primitives for
zero range. This implementation is neither efficient nor ideal, as we
write back dirty data over the range and remove existing extents rather
than convert them to unwritten. The former writeback, however, is currently
the only mechanism available to ensure consistency between pagecache and
extent state. Even a pagecache truncate/delalloc punch prior to hole
punch has led to inconsistencies due to racing with writeback.
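
To make the range arithmetic behind "hole punch, then prealloc" concrete,
here is a minimal sketch. It assumes a power-of-two block size and that
the prealloc step covers the request rounded outward to block boundaries.
That composition is an assumption made for illustration and is not quoted
from the patch. The split between whole blocks (removed) and partial edge
blocks (zeroed in place) follows the hole punch behaviour described in
fallocate(2). All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end). */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/* Round x down to a multiple of blksize (blksize must be a power of two). */
uint64_t align_down(uint64_t x, uint64_t blksize)
{
	return x & ~(blksize - 1);
}

/* Round x up to a multiple of blksize (blksize must be a power of two). */
uint64_t align_up(uint64_t x, uint64_t blksize)
{
	return align_down(x + blksize - 1, blksize);
}

/* Blocks lying entirely inside the request: a hole punch removes these. */
struct byte_range zero_range_whole_blocks(uint64_t offset, uint64_t len,
					  uint64_t blksize)
{
	struct byte_range r;

	assert(blksize != 0 && (blksize & (blksize - 1)) == 0);
	r.start = align_up(offset, blksize);
	r.end = align_down(offset + len, blksize);
	if (r.end < r.start)	/* request sits inside a single block */
		r.end = r.start;
	return r;
}

/* Request rounded outward: the span a follow-up prealloc would cover. */
struct byte_range zero_range_prealloc_span(uint64_t offset, uint64_t len,
					   uint64_t blksize)
{
	struct byte_range r;

	assert(blksize != 0 && (blksize & (blksize - 1)) == 0);
	r.start = align_down(offset, blksize);
	r.end = align_up(offset + len, blksize);
	return r;
}
```

With an assumed 4096-byte block size, a request at offset 1048576 of
length 54321 has whole blocks in [1048576, 1101824) and an outward-aligned
span of [1048576, 1105920). The last block is only partially covered by
the request, which is the case that previously went through the pagecache.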

This provides a consistent, correct implementation of zero range that
survives fsstress/fsx testing without assert failures. The
implementation can be optimized from this point forward once the
fundamental issue of pagecache and delalloc extent state consistency is
addressed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-30 10:35:11 +11:00
libxfs Merge branch 'xfs-misc-fixes-for-3.18-3' into for-next 2014-10-13 10:22:45 +11:00
Kconfig xfs: require 64-bit sector_t 2014-07-30 09:12:05 +10:00
kmem.c xfs: kill time.h 2014-10-02 09:18:13 +10:00
kmem.h xfs: simplify kmem_{zone_}zalloc 2013-11-06 16:31:27 -06:00
Makefile xfs: add xfs_mount sysfs kobject 2014-07-15 08:07:01 +10:00
mrlock.h xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00
uuid.c
uuid.h
xfs_acl.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_acl.h xfs: use generic posix ACL infrastructure 2014-01-25 23:58:21 -05:00
xfs_aops.c Merge branch 'xfs-misc-fixes-for-3.18-3' into for-next 2014-10-13 10:22:45 +11:00
xfs_aops.h direct-io: Implement generic deferred AIO completions 2013-09-04 09:23:46 -04:00
xfs_attr_inactive.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_attr_list.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_attr.h xfs: kill xfs_vnodeops.[ch] 2013-08-12 16:53:39 -05:00
xfs_bit.c xfs: fix static and extern sparse warnings 2013-10-30 13:59:56 -05:00
xfs_bmap_util.c xfs: rework zero range to prevent invalid i_size updates 2014-10-30 10:35:11 +11:00
xfs_bmap_util.h xfs: refine the allocation stack switch 2014-07-15 07:08:24 +10:00
xfs_buf_item.c Merge branch 'xfs-buf-iosubmit' into for-next 2014-10-02 09:11:14 +10:00
xfs_buf_item.h xfs: decouple inode and bmap btree header files 2013-10-23 16:28:49 -05:00
xfs_buf.c Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block 2014-10-18 11:53:51 -07:00
xfs_buf.h xfs: check xfs_buf_read_uncached returns correctly 2014-10-02 09:05:32 +10:00
xfs_dir2_readdir.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_discard.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_discard.h
xfs_dquot_item.c xfs: remove the quotaoff log format from the quotaoff log item 2013-12-13 11:34:08 +11:00
xfs_dquot_item.h xfs: remove the quotaoff log format from the quotaoff log item 2013-12-13 11:34:08 +11:00
xfs_dquot.c xfs: quotacheck leaves dquot buffers without verifiers 2014-08-04 12:43:26 +10:00
xfs_dquot.h xfs: run an eofblocks scan on ENOSPC/EDQUOT 2014-07-24 19:49:28 +10:00
xfs_error.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_error.h xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_export.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_export.h
xfs_extent_busy.c xfs: decouple inode and bmap btree header files 2013-10-23 16:28:49 -05:00
xfs_extent_busy.h xfs: decouple inode and bmap btree header files 2013-10-23 16:28:49 -05:00
xfs_extfree_item.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_extfree_item.h xfs: split out EFI/EFD log item format definition 2013-08-12 16:07:13 -05:00
xfs_file.c Merge branch 'xfs-misc-fixes-for-3.18-1' into for-next 2014-09-09 13:25:31 +10:00
xfs_filestream.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_filestream.h xfs: add filestream allocator tracepoints 2014-04-23 07:11:52 +10:00
xfs_fs.h Merge branch 'xfs-misc-fixes-3.17-1' into for-next 2014-08-04 13:54:14 +10:00
xfs_fsops.c xfs: check xfs_buf_read_uncached returns correctly 2014-10-02 09:05:32 +10:00
xfs_fsops.h
xfs_globals.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_icache.c xfs: remove second xfs_quota.h inclusion in xfs_icache.c 2014-09-23 16:05:55 +10:00
xfs_icache.h xfs: run an eofblocks scan on ENOSPC/EDQUOT 2014-07-24 19:49:28 +10:00
xfs_icreate_item.c xfs: format log items write directly into the linear CIL buffer 2013-12-13 11:34:02 +11:00
xfs_icreate_item.h xfs: separate icreate log format definitions from xfs_icreate_item.h 2013-08-12 16:10:35 -05:00
xfs_inode_item.c xfs: xfs_iflush_done checks the wrong log item callback 2014-10-03 09:09:50 +10:00
xfs_inode_item.h xfs: remove the inode log format from the inode log item 2013-12-13 11:34:05 +11:00
xfs_inode.c Merge branch 'xfs-misc-fixes-for-3.18-3' into for-next 2014-10-13 10:22:45 +11:00
xfs_inode.h xfs: check for inode size overflow in xfs_new_eof() 2014-10-02 09:21:53 +10:00
xfs_ioctl32.c xfs: compat_xfs_bstat does not have forkoff 2014-10-02 09:17:58 +10:00
xfs_ioctl32.h xfs: compat_xfs_bstat does not have forkoff 2014-10-02 09:17:58 +10:00
xfs_ioctl.c Merge branch 'xfs-misc-fixes-for-3.18-3' into for-next 2014-10-13 10:22:45 +11:00
xfs_ioctl.h xfs: consolidate extent swap code 2013-08-12 16:56:06 -05:00
xfs_iomap.c xfs: check for null dquot in xfs_quota_calc_throttle() 2014-10-02 09:27:09 +10:00
xfs_iomap.h xfs: get rid of count from xfs_iomap_write_allocate() 2013-10-01 15:42:34 -05:00
xfs_iops.c xfs: flush entire last page of old EOF on truncate up 2014-09-23 22:55:00 +10:00
xfs_iops.h xfs: use generic posix ACL infrastructure 2014-01-25 23:58:21 -05:00
xfs_itable.c xfs: Check error during inode btree iteration in xfs_bulkstat() 2014-10-30 10:34:52 +11:00
xfs_itable.h xfs: introduce xfs_bulkstat_ag_ichunk 2014-08-04 11:22:31 +10:00
xfs_linux.h xfs: kill time.h 2014-10-02 09:18:13 +10:00
xfs_log_cil.c xfs: xlog_cil_force_lsn doesn't always wait correctly 2014-09-23 15:57:59 +10:00
xfs_log_priv.h xfs: add xlog sysfs kobject and attribute handlers 2014-07-15 08:07:29 +10:00
xfs_log_recover.c Merge branch 'xfs-buf-iosubmit' into for-next 2014-10-02 09:11:14 +10:00
xfs_log.c xfs: introduce xfs_buf_submit[_wait] 2014-10-02 09:05:14 +10:00
xfs_log.h xfs: log vector rounding leaks log space 2014-05-20 08:18:09 +10:00
xfs_message.c xfs: decouple log and transaction headers 2013-10-23 16:17:44 -05:00
xfs_message.h xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00
xfs_mount.c Merge branch 'xfs-buf-iosubmit' into for-next 2014-10-02 09:11:14 +10:00
xfs_mount.h xfs: add xfs_mount sysfs kobject 2014-07-15 08:07:01 +10:00
xfs_mru_cache.c xfs: mark all internal workqueues as freezable 2014-09-09 11:44:46 +10:00
xfs_mru_cache.h xfs: embedd mru_elem into parent structure 2014-04-23 07:11:51 +10:00
xfs_qm_bhv.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_qm_syscalls.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_qm.c xfs: xfs_qm_dquot_isolate needs locking annotations for sparse 2014-09-29 10:43:40 +10:00
xfs_qm.h xfs: mark xfs_qm_quotacheck as static 2014-07-24 20:49:57 +10:00
xfs_quota.h xfs: split dquot buffer operations out 2013-10-23 14:28:35 -05:00
xfs_quotaops.c xfs: fix uflags detection at xfs_fs_rm_xquota 2014-07-24 21:27:17 +10:00
xfs_rtalloc.c Merge branch 'xfs-buf-iosubmit' into for-next 2014-10-02 09:11:14 +10:00
xfs_rtalloc.h xfs: combine xfs_rtmodify_summary and xfs_rtget_summary 2014-09-09 11:58:42 +10:00
xfs_stats.c xfs: support the XFS_BTNUM_FINOBT free inode btree type 2014-04-24 16:00:52 +10:00
xfs_stats.h xfs: support the XFS_BTNUM_FINOBT free inode btree type 2014-04-24 16:00:52 +10:00
xfs_super.c xfs: xfs_kset should be static 2014-09-29 10:46:08 +10:00
xfs_super.h xfs: require 64-bit sector_t 2014-07-30 09:12:05 +10:00
xfs_symlink.c xfs: check resblks before calling xfs_dir_canenter 2014-09-09 11:57:52 +10:00
xfs_symlink.h xfs: push down inactive transaction mgmt for remote symlinks 2013-10-08 14:53:02 -05:00
xfs_sysctl.c xfs: Convert use of typedef ctl_table to struct ctl_table 2013-06-17 17:42:25 -05:00
xfs_sysctl.h xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.h xfs: add debug sysfs attribute set 2014-09-09 11:52:42 +10:00
xfs_trace.c xfs: add filestream allocator tracepoints 2014-04-23 07:11:52 +10:00
xfs_trace.h xfs: introduce xfs_buf_submit[_wait] 2014-10-02 09:05:14 +10:00
xfs_trans_ail.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_trans_buf.c xfs: introduce xfs_buf_submit[_wait] 2014-10-02 09:05:14 +10:00
xfs_trans_dquot.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_trans_extfree.c xfs: decouple log and transaction headers 2013-10-23 16:17:44 -05:00
xfs_trans_inode.c xfs: kill time.h 2014-10-02 09:18:13 +10:00
xfs_trans_priv.h xfs: remove unused ail pointer arg from xfs_trans_ail_cursor_done() 2014-04-14 19:06:05 +10:00
xfs_trans.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs_trans.h xfs: format log items write directly into the linear CIL buffer 2013-12-13 11:34:02 +11:00
xfs_types.h xfs: require 64-bit sector_t 2014-07-30 09:12:05 +10:00
xfs_xattr.c xfs: global error sign conversion 2014-06-25 14:58:08 +10:00
xfs.h xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00