linux/fs/xfs
Darrick J. Wong 509955823c xfs: log recovery should replay deferred ops in order
As part of testing log recovery with dm_log_writes, Amir Goldstein
discovered an error in the deferred ops recovery that lead to corruption
of the filesystem metadata if a reflink+rmap filesystem happened to shut
down midway through a CoW remap:

"This is what happens [after failed log recovery]:

"Phase 1 - find and verify superblock...
"Phase 2 - using internal log
"        - zero log...
"        - scan filesystem freespace and inode maps...
"        - found root inode chunk
"Phase 3 - for each AG...
"        - scan (but don't clear) agi unlinked lists...
"        - process known inodes and perform inode discovery...
"        - agno = 0
"data fork in regular inode 134 claims CoW block 376
"correcting nextents for inode 134
"bad data fork in inode 134
"would have cleared inode 134"

Hou Tao dissected the log contents of exactly such a crash:

"According to the implementation of xfs_defer_finish(), these ops should
be completed in the following sequence:

"Have been done:
"(1) CUI: Oper (160)
"(2) BUI: Oper (161)
"(3) CUD: Oper (194), for CUI Oper (160)
"(4) RUI A: Oper (197), free rmap [0x155, 2, -9]

"Should be done:
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI A
"(8) RUD: for RUI B

"Actually be done by xlog_recover_process_intents()
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI B
"(8) RUD: for RUI A

"So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
from the log record in post_mount.log (generated after umount) and the trace
print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
entry [0x155, 2, -9] are freed."

When reconstructing the internal log state from the log items found on
disk, it's required that deferred ops replay in exactly the same order
that they would have had the filesystem not gone down.  However,
replaying unfinished deferred ops can create /more/ deferred ops.  These
new deferred ops are finished in the wrong order.  This causes fs
corruption and replay crashes, so let's create a single defer_ops to
handle the subsequent ops created during replay, then use one single
transaction at the end of log recovery to ensure that everything is
replayed in the same order as they're supposed to be.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Analyzed-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-27 09:34:08 -08:00
..
libxfs xfs: abstract out dev_t conversions 2017-11-21 01:44:53 -08:00
scrub xfs: check the uniqueness of the AGFL entries 2017-11-09 19:27:32 -08:00
Kconfig xfs: create an ioctl to scrub AG metadata 2017-10-26 15:38:23 -07:00
kmem.c
kmem.h slab, slub, slob: add slab_flags_t 2017-11-15 18:21:01 -08:00
Makefile xfs: use a b+tree for the in-core extent list 2017-11-06 11:53:41 -08:00
mrlock.h
xfs_acl.c xfs: don't change inode mode if ACL update fails 2017-10-11 10:21:06 -07:00
xfs_acl.h xfs: Don't clear SGID when inheriting ACLs 2017-06-27 18:23:21 -07:00
xfs_aops.c xfs: trim writepage mapping to within eof 2017-10-16 12:26:50 -07:00
xfs_aops.h xfs: perform dax_device lookup at mount 2017-08-31 09:31:47 -07:00
xfs_attr_inactive.c xfs: fail if xattr inactivation hits a hole 2017-10-26 15:38:22 -07:00
xfs_attr_list.c xfs: remove u_int* type usage 2017-11-09 15:50:29 -08:00
xfs_attr.h xfs: scrub extended attributes 2017-10-26 15:38:26 -07:00
xfs_bmap_item.c xfs: log recovery should replay deferred ops in order 2017-11-27 09:34:08 -08:00
xfs_bmap_item.h xfs: log recovery should replay deferred ops in order 2017-11-27 09:34:08 -08:00
xfs_bmap_util.c xfs: remove support for inlining data/extents into the inode fork 2017-11-06 11:53:40 -08:00
xfs_bmap_util.h xfs: simplify the xfs_getbmap interface 2017-10-26 15:38:20 -07:00
xfs_buf_item.c xfs: fix compiler warnings 2017-09-02 08:22:19 -07:00
xfs_buf_item.h xfs: remove unnecessary dirty bli format check for ordered bufs 2017-09-01 10:55:30 -07:00
xfs_buf.c xfs: move error injection tags into their own file 2017-11-01 15:03:16 -07:00
xfs_buf.h xfs: buffer lru reference count error injection tag 2017-10-26 15:38:23 -07:00
xfs_dir2_readdir.c xfs: introduce the xfs_iext_cursor abstraction 2017-11-06 11:53:40 -08:00
xfs_discard.c
xfs_discard.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfs_dquot_item.c
xfs_dquot_item.h
xfs_dquot.c xfs: remove unreachable error injection code in xfs_qm_dqget 2017-11-06 11:57:39 -08:00
xfs_dquot.h
xfs_error.c xfs: mark xfs_errortag_ktype static 2017-11-06 11:57:39 -08:00
xfs_error.h xfs: move error injection tags into their own file 2017-11-01 15:03:16 -07:00
xfs_export.c
xfs_export.h
xfs_extent_busy.c
xfs_extent_busy.h
xfs_extfree_item.c
xfs_extfree_item.h
xfs_file.c libnvdimm for 4.15 2017-11-17 09:51:57 -08:00
xfs_filestream.c
xfs_filestream.h
xfs_fsmap.c xfs: move two more RT specific functions into CONFIG_XFS_RT 2017-10-16 12:26:50 -07:00
xfs_fsmap.h
xfs_fsops.c
xfs_fsops.h
xfs_globals.c
xfs_icache.c xfs: return a distinct error code value for IGET_INCORE cache misses 2017-10-26 15:38:23 -07:00
xfs_icache.h
xfs_icreate_item.c
xfs_icreate_item.h
xfs_inode_item.c xfs: use a b+tree for the in-core extent list 2017-11-06 11:53:41 -08:00
xfs_inode_item.h xfs: remove inode log format typedef 2017-11-01 15:03:16 -07:00
xfs_inode.c xfs: always free inline data before resetting inode fork during ifree 2017-11-27 09:33:25 -08:00
xfs_inode.h xfs: remove if_rdev 2017-10-26 15:38:27 -07:00
xfs_ioctl32.c xfs: create an ioctl to scrub AG metadata 2017-10-26 15:38:23 -07:00
xfs_ioctl32.h
xfs_ioctl.c xfs: remove u_int* type usage 2017-11-09 15:50:29 -08:00
xfs_ioctl.h xfs: remove u_int* type usage 2017-11-09 15:50:29 -08:00
xfs_iomap.c libnvdimm for 4.15 2017-11-17 09:51:57 -08:00
xfs_iomap.h xfs: update i_size after unwritten conversion in dio completion 2017-09-26 10:55:19 -07:00
xfs_iops.c xfs: truncate pagecache before writeback in xfs_setattr_size() 2017-11-03 09:45:56 -07:00
xfs_iops.h
xfs_itable.c xfs: remove if_rdev 2017-10-26 15:38:27 -07:00
xfs_itable.h xfs: create inode pointer verifiers 2017-10-26 15:38:23 -07:00
xfs_linux.h xfs: abstract out dev_t conversions 2017-11-21 01:44:53 -08:00
xfs_log_cil.c xfs: Fix leak of discard bio 2017-08-04 13:43:36 -07:00
xfs_log_priv.h locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
xfs_log_recover.c xfs: log recovery should replay deferred ops in order 2017-11-27 09:34:08 -08:00
xfs_log.c xfs: mark xlog_verify_dest_ptr STATIC 2017-11-06 11:57:39 -08:00
xfs_log.h
xfs_message.c
xfs_message.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfs_mount.c xfs: on failed mount, force-reclaim inodes after unmounting quota controls 2017-11-09 19:27:33 -08:00
xfs_mount.h xfs: convert drop_writes to use the errortag mechanism 2017-06-27 18:23:20 -07:00
xfs_mru_cache.c
xfs_mru_cache.h
xfs_ondisk.h xfs: Don't log uninitialised fields in inode structures 2017-10-11 10:21:06 -07:00
xfs_pnfs.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfs_pnfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfs_qm_bhv.c
xfs_qm_syscalls.c
xfs_qm.c xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves 2017-09-01 13:08:26 -07:00
xfs_qm.h
xfs_quota.h
xfs_quotaops.c VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) 2017-07-17 08:45:34 +01:00
xfs_refcount_item.c xfs: log recovery should replay deferred ops in order 2017-11-27 09:34:08 -08:00
xfs_refcount_item.h xfs: log recovery should replay deferred ops in order 2017-11-27 09:34:08 -08:00
xfs_reflink.c xfs: simplify xfs_reflink_convert_cow 2017-11-06 11:53:40 -08:00
xfs_reflink.h
xfs_rmap_item.c
xfs_rmap_item.h
xfs_rtalloc.c xfs: remove the ip argument to xfs_defer_finish 2017-09-01 10:55:30 -07:00
xfs_rtalloc.h xfs: create block pointer check functions 2017-10-26 15:38:23 -07:00
xfs_stats.c
xfs_stats.h
xfs_super.c Convert fs/*/* to SB_I_VERSION 2017-10-18 18:51:27 -04:00
xfs_super.h
xfs_symlink.c xfs: remove the ip argument to xfs_defer_finish 2017-09-01 10:55:30 -07:00
xfs_symlink.h xfs: allow reading of already-locked remote symbolic link 2017-06-20 10:45:22 -07:00
xfs_sysctl.c
xfs_sysctl.h
xfs_sysfs.c xfs: replace log_badcrc_factor knob with error injection tag 2017-06-27 18:23:21 -07:00
xfs_sysfs.h
xfs_trace.c
xfs_trace.h libnvdimm for 4.15 2017-11-17 09:51:57 -08:00
xfs_trans_ail.c xfs: move error injection tags into their own file 2017-11-01 15:03:16 -07:00
xfs_trans_bmap.c
xfs_trans_buf.c xfs: disallow marking previously dirty buffers as ordered 2017-09-01 10:55:30 -07:00
xfs_trans_dquot.c
xfs_trans_extfree.c
xfs_trans_inode.c xfs: refactor xfs_trans_roll 2017-09-01 10:55:30 -07:00
xfs_trans_priv.h xfs: Properly retry failed inode items in case of error during buffer writeback 2017-08-22 09:22:23 -07:00
xfs_trans_refcount.c
xfs_trans_rmap.c
xfs_trans.c xfs: refactor xfs_trans_roll 2017-09-01 10:55:30 -07:00
xfs_trans.h xfs: disallow marking previously dirty buffers as ordered 2017-09-01 10:55:30 -07:00
xfs_xattr.c
xfs.h xfs: always define STATIC to static noinline 2017-11-06 11:53:58 -08:00