linux/fs/xfs
Brian Foster 7f884dc198 xfs: fix quota block reservation leak when tp allocates and frees blocks
Al Viro reports that generic/231 fails frequently on XFS and bisected
the problem to the following commit:

	5d11fb4b xfs: rework zero range to prevent invalid i_size updates

... which is just the first commit that happens to cause fsx to
reproduce the problem. fsx reproduces via zero range calls. The
aforementioned commit overhauls zero range to use hole punch and
fallocate. As it turns out, the problem is reproducible on demand using
basic hole punch as follows:

$ mkfs.xfs -f -m crc=1,finobt=1 <dev>
$ mount <dev> /mnt -o uquota
$ xfs_io -f -c "falloc 0 50m" /mnt/file
$ for i in $(seq 1 20); do xfs_io -c "fpunch ${i}m 32k" /mnt/file; done
$ rm -f /mnt/file
$ repquota -us /mnt
...
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --     32K      0K      0K              3     0     0

A file is allocated with a single 50m extent. The extent count increases
via hole punches until the bmap converts to btree format. The file is
removed but quota reports 32k of space usage for the user. This
reservation is effectively leaked for the lifetime of the mount.

The reason this occurs is because the quota block reservation tracking
is confused when a transaction happens to free and allocate blocks at
the same time. Consider the following sequence of events:

- tp is allocated from xfs_free_file_space() and reserves several blocks
  for btree management. Blocks are reserved against the dquot and marked
  as such in the transaction (qtrx->qt_blk_res).
- 8 blocks are accounted free when the 32k range is punched out.
  xfs_trans_mod_dquot() is called with XFS_TRANS_DQ_BCOUNT and sets
  ->qt_bcount_delta to -8.
- Subsequently, a block is allocated against the same transaction by
  xfs_bmap_extents_to_btree() for btree conversion. A call to
  xfs_trans_mod_dquot() increases qt_blk_res_used to 1 and qt_bcount_delta
  to -7.
- The transaction is dup'd and committed by xfs_bmap_finish().
  xfs_trans_dup_dqinfo() sets the first transaction up such that it has a
  matching qt_blk_res and qt_blk_res_used of 1. The remaining unused
  reservation is transferred to the duplicate tp.

When the transactions are committed, the dquots are fixed up in
xfs_trans_apply_dquot_deltas() according to one of two methods:

1.) If the transaction holds a block reservation (->qt_blk_res != 0),
_only_ the unused portion reservation is unaccounted from the dquot.
Note that the tp duplication behavior of xfs_bmap_finish() makes it such
that qt_blk_res is typically 0 for tp's with unused reservation.
2.) Otherwise, the dquot is fixed up based on the block delta
(->qt_bcount_delta) created by the transaction.

Therefore, if a transaction has a negative qt_bcount_delta and positive
qt_blk_res_used, the former set of blocks that have been removed from
the file are never factored out of the in-core dquot reservation.
Instead, *_apply_dquot_deltas() sees 1 block used out of a 1 block
reservation and believes there is nothing to fix up. The on-disk
d_bcount is updated independently from qt_bcount_delta, and thus is
correct (and allows the quota usage to correct on remount).

To deal with this situation, we effectively want the "used reservation"
part of the transaction to be consistent with any freed blocks with
respect to quota tracking. For example, if 8 blocks are freed, the
subsequent single block allocation does not need to consume the initial
reservation made by the tp. Instead, it simply borrows one from the
previously freed. One possible implementation of such borrowing is to
avoid the blks_res_used increment when bcount_delta is negative. This
alone is flawed logic in that it only handles the case where blocks are
freed before allocated, however.

Rather than add more complexity to manage synchronization between
bcount_delta and blks_res_used, kill the latter entirely. blk_res_used
is only updated in one place and always in sync with delta_bcount.
Therefore, the net block reservation consumption of the transaction is
always available from bcount_delta. Calculate the reservation
consumption on the fly where necessary based on whether the tp has a
reservation and results in a positive net block delta on the inode.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-01 07:15:37 +10:00
..
libxfs xfs: always log the inode on unwritten extent conversion 2015-06-01 07:15:23 +10:00
Kconfig xfs: require 64-bit sector_t 2014-07-30 09:12:05 +10:00
kmem.c xfs: change kmem_free to use generic kvfree() 2015-02-02 09:54:18 +11:00
kmem.h xfs: change kmem_free to use generic kvfree() 2015-02-02 09:54:18 +11:00
Makefile xfs: implement pNFS export operations 2015-02-16 11:49:23 +11:00
mrlock.h
uuid.c
uuid.h
xfs_acl.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_acl.h xfs: move acl structures to xfs_format.h 2014-11-28 14:24:37 +11:00
xfs_aops.c xfs: update for 4.1-rc1 2015-04-24 07:08:41 -07:00
xfs_aops.h xfs: don't allocate an ioend for direct I/O completions 2015-02-02 10:02:09 +11:00
xfs_attr_inactive.c xfs: pass attr geometry to attr leaf header conversion functions 2015-04-13 11:26:02 +10:00
xfs_attr_list.c xfs: pass attr geometry to attr leaf header conversion functions 2015-04-13 11:26:02 +10:00
xfs_attr.h
xfs_bit.c
xfs_bmap_util.c Merge branch 'xfs-misc-fixes-for-4.1-3' into for-next 2015-04-13 11:40:16 +10:00
xfs_bmap_util.h xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate 2015-03-25 15:08:56 +11:00
xfs_buf_item.c xfs: clarify async write failure ratelimit message 2015-02-24 10:14:04 +11:00
xfs_buf_item.h
xfs_buf.c list_lru: add helpers to isolate items 2015-02-12 18:54:10 -08:00
xfs_buf.h xfs: split metadata and log buffer completion to separate workqueues 2014-12-04 09:43:17 +11:00
xfs_dir2_readdir.c Merge branch 'xfs-misc-fixes-for-3.19-2' into for-next 2014-12-04 09:46:17 +11:00
xfs_discard.c xfs: pass mp to XFS_WANT_CORRUPTED_GOTO 2015-02-23 22:39:08 +11:00
xfs_discard.h
xfs_dquot_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_dquot_item.h xfs: remove the quotaoff log format from the quotaoff log item 2013-12-13 11:34:08 +11:00
xfs_dquot.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_dquot.h xfs: fix implicit bool to int conversion 2015-01-09 10:48:58 +11:00
xfs_error.c xfs: %pF is only for function pointers 2015-03-25 14:56:21 +11:00
xfs_error.h xfs: pass mp to XFS_WANT_CORRUPTED_RETURN 2015-02-23 22:39:13 +11:00
xfs_export.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs_export.h
xfs_extent_busy.c xfs: merge xfs_ag.h into xfs_format.h 2014-11-28 14:25:04 +11:00
xfs_extent_busy.h
xfs_extfree_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_extfree_item.h
xfs_file.c xfs: update for 4.1-rc1 2015-04-24 07:08:41 -07:00
xfs_filestream.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
xfs_filestream.h xfs: add filestream allocator tracepoints 2014-04-23 07:11:52 +10:00
xfs_fsops.c xfs: remove xfs_mod_incore_sb API 2015-02-23 21:24:37 +11:00
xfs_fsops.h
xfs_globals.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_icache.c xfs: inodes are new until the dentry cache is set up 2015-02-23 22:38:08 +11:00
xfs_icache.h xfs: merge xfs_ag.h into xfs_format.h 2014-11-28 14:25:04 +11:00
xfs_icreate_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_icreate_item.h
xfs_inode_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_inode_item.h
xfs_inode.c Merge branch 'xfs-misc-fixes-for-4.1-2' into for-next 2015-03-25 15:12:30 +11:00
xfs_inode.h Merge branch 'xfs-mmap-lock' into for-next 2015-02-24 10:27:47 +11:00
xfs_ioctl32.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs_ioctl32.h xfs: compat_xfs_bstat does not have forkoff 2014-10-02 09:17:58 +10:00
xfs_ioctl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
xfs_ioctl.h
xfs_iomap.c xfs: Remove icsb infrastructure 2015-02-23 21:22:31 +11:00
xfs_iomap.h xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten 2015-01-09 10:48:12 +11:00
xfs_iops.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
xfs_iops.h xfs: inodes are new until the dentry cache is set up 2015-02-23 22:38:08 +11:00
xfs_itable.c xfs: pass mp to XFS_WANT_CORRUPTED_RETURN 2015-02-23 22:39:13 +11:00
xfs_itable.h xfs: bulkstat chunk formatting cursor is broken 2014-11-07 08:30:30 +11:00
xfs_linux.h xfs: Remove icsb infrastructure 2015-02-23 21:22:31 +11:00
xfs_log_cil.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_log_priv.h xfs: add xlog sysfs kobject and attribute handlers 2014-07-15 08:07:29 +10:00
xfs_log_recover.c xfs: Remove icsb infrastructure 2015-02-23 21:22:31 +11:00
xfs_log.c Merge branch 'xfs-sb-logging-rework' into for-next 2015-01-22 09:20:53 +11:00
xfs_log.h xfs: log vector rounding leaks log space 2014-05-20 08:18:09 +10:00
xfs_message.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_message.h
xfs_mount.c xfs: remove xfs_mod_incore_sb API 2015-02-23 21:24:37 +11:00
xfs_mount.h xfs: remove xfs_mod_incore_sb API 2015-02-23 21:24:37 +11:00
xfs_mru_cache.c xfs: xfs_mru_cache_insert() should use GFP_NOFS 2015-03-25 14:57:53 +11:00
xfs_mru_cache.h xfs: embedd mru_elem into parent structure 2014-04-23 07:11:51 +10:00
xfs_pnfs.c Merge branch 'xfs-misc-fixes-for-4.1-3' into for-next 2015-04-13 11:40:16 +10:00
xfs_pnfs.h xfs: unlock i_mutex in xfs_break_layouts 2015-04-13 11:38:29 +10:00
xfs_qm_bhv.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_qm_syscalls.c xfs: Convert to using ->get_state callback 2015-03-04 16:06:36 +01:00
xfs_qm.c Merge branch 'xfs-misc-fixes-for-4.1' into for-next 2015-02-24 10:24:07 +11:00
xfs_qm.h xfs: Convert to using ->get_state callback 2015-03-04 16:06:36 +01:00
xfs_quota.h xfs: fix quota block reservation leak when tp allocates and frees blocks 2015-06-01 07:15:37 +10:00
xfs_quotaops.c xfs: Add support for Q_SETINFO 2015-03-04 16:06:38 +01:00
xfs_rtalloc.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_rtalloc.h xfs: combine xfs_rtmodify_summary and xfs_rtget_summary 2014-09-09 11:58:42 +10:00
xfs_stats.c xfs: support the XFS_BTNUM_FINOBT free inode btree type 2014-04-24 16:00:52 +10:00
xfs_stats.h xfs: support the XFS_BTNUM_FINOBT free inode btree type 2014-04-24 16:00:52 +10:00
xfs_super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
xfs_super.h xfs: Remove icsb infrastructure 2015-02-23 21:22:31 +11:00
xfs_symlink.c xfs: inodes are new until the dentry cache is set up 2015-02-23 22:38:08 +11:00
xfs_symlink.h
xfs_sysctl.c xfs: remove deprecated sysctls 2015-01-09 10:47:43 +11:00
xfs_sysctl.h xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.h xfs: add debug sysfs attribute set 2014-09-09 11:52:42 +10:00
xfs_trace.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trace.h Merge branch 'xfs-dio-extend-fix' into for-next 2015-04-16 22:13:18 +10:00
xfs_trans_ail.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trans_buf.c xfs: only trace buffer items if they exist 2015-02-10 09:23:40 +11:00
xfs_trans_dquot.c xfs: fix quota block reservation leak when tp allocates and frees blocks 2015-06-01 07:15:37 +10:00
xfs_trans_extfree.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trans_inode.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trans_priv.h xfs: remove unused ail pointer arg from xfs_trans_ail_cursor_done() 2014-04-14 19:06:05 +10:00
xfs_trans.c xfs: replace xfs_mod_incore_sb_batched 2015-02-23 21:24:11 +11:00
xfs_trans.h
xfs_xattr.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs.h