Remove the header definitions for these ioctls. The just-removed
implementation has allowed callers to read stale disk contents for more
than **21 years** and nobody noticed or complained, which implies a lack
of users aside from exploit programs.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHlp28ACgkQ+H93GTRK
tOt77Q/+KP7XtwaloGbkBfMzk9JhVh4wISagQ8BxZ5Jb8cHg9ekWJz585he9siNY
mfcuaLh5onWiBRsfqo4RXg03X4E3c/U+Q3oqe405O3TZkK2LOZMjkPE1ijDdTej6
V9WhCTCRGuJA01sKgFuuwFJxJRjIROE29FPgRQP9EvLtEITIxLbHMFZRYKpYh143
EhNwzQQwwPg/L4/m1qmWfC3L+Z7qTm6EQhOnpyzxlKxyX7qXIjBHi6WvErcDeJ3F
pS7v1fwcZxctrh7PrKwhXSrkbMmd5J3p5qI/MrCtGEKWNXk+rv6AC0n4gcXvpJ/v
wL0OTyik9pwA8V0XPQcuWvQXmrm8vR2XvMok6gXkHB1jCfzYAwJsrHQbop4pyCFe
U3HU46x0g7UFXY7jUjztD8YNIT1+B+ducetCCGAhI97HiQrSsqSvgvPFZNle7Cef
Oheab4iIs1zUblNrVzyGCQmK42ankypxPbfrrtvLi7SFrLRAGXeWeqDf1RXJnt5b
xrOqCe1hgXR4RJrkTPWiiQindLlhDuywfa+Q1Y5fYZatsTtgceE/HIOg80x4pPgR
4Ip7hW9lsjoDckpu0bC0bvYiqhrYM1eztpUToYdy7FeOkQKkPHO9xm/m1tHbqzmi
bF3hkBo6bLByXiY/ZXzrQGrErJ6OTdNVpsR1vYjoaycrQt6wznI=
=hq3i
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull more xfs irix ioctl housecleaning from Darrick Wong:
"Withdraw the XFS_IOC_ALLOCSP* and XFS_IOC_FREESP* ioctl definitions.
This is the third and final of a series of small pull requests that
perform some long overdue housecleaning of XFS ioctls. This time,
we're withdrawing all variants of the ALLOCSP and FREESP ioctls from
XFS' userspace API. This might be a little premature since we've only
just removed the functionality, but as I pointed out in the last pull
request, nobody (including fstests) noticed that it was broken for 20
years.
In response to the patch, we received a single comment from someone
who stated that they 'augment' the ioctl for their own purposes, but
otherwise acquiesced to the withdrawal. I still want to try to clobber
these old ioctl definitions in 5.17.
So remove the header definitions for these ioctls. The just-removed
implementation has allowed callers to read stale disk contents for
more than **21 years** and nobody noticed or complained, which implies
a lack of users aside from exploit programs"
* tag 'xfs-5.17-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove the XFS_IOC_{ALLOC,FREE}SP* definitions
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHlpbEACgkQ+H93GTRK
tOs8MBAAjOyUswNqVMYgMB6eww7oT8j54jGdTmo+d+N2hIoX4OAV6GdCI6te/7DE
yz6TtzbQ9fs9lDYsdRsMjfwr18R8N1m0D23cpFSJ6qq6e2NJi0mqVwmC8mPRNAJH
68yL0p6O4Tk5jj7kae+zsv50nsSANXclfoHjsZQ0DGOuGagmPkNVqT82KIC4n3dn
aw66xhuaiLLJE+4boLe4NexRVBbyOuHQ2uB7xUnKwc9tHvjAf8EFCIhUV1wp0qZf
CfA2wg8+Jzwrqz/gVRKUZOjz7LeIY6E2qCBrA+DATv2dcv7QhmvGHaQ9OkrvIE72
CbvI92IhvOcKFzpfMrRYGhOh7KE6SkxLGqsAXgnjPoFQkCDudgCaExBO96RMMd6u
cX43mXWZbUl++Sh2GhPD/xkiskLRZFjiHJbKBX/5nwjU2BzTHQY/7Jy07fIkR4c4
IrkKgiXfSJT4j/KeAMkBpZ7THMjRMSUgwliSWHL0QWUz5Bou8WRnHUl8CMsu9vDJ
fYeekXDQYuAX+UrcsDlbA0UukigOLSIZiQTAEgSbIkd/+Zb6U6e0IF7pTTZJ9uFs
bndLFYqZtEAySDrMCBM+W8VYmR48EDxfN8xsdS1kbZIqEdNhmkEMj9tMf0rs+FRi
lo1vMi08O7VcuyiyNrKs0e1d1Gkd2jwmwIskSQweslP5BbfPzRE=
=d+fT
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs ioctl housecleaning from Darrick Wong:
"This is the first of a series of small pull requests that perform some
long overdue housecleaning of XFS ioctls. This first pull request
removes the FSSETDM ioctl, which was used to set DMAPI event
attributes on XFS files. The DMAPI support has never been merged
upstream and the implementation of FSSETDM itself was removed two
years ago, so let's withdraw it completely.
- Withdraw the ioctl definition for the FSSETDM ioctl"
* tag 'xfs-5.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove the XFS_IOC_FSSETDM definitions
Now that we've made these ioctls defunct, move them from xfs_fs.h to
xfs_ioctl.c, which effectively removes them from the publicly supported
ioctl interfaces for XFS.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Remove the definitions for these ioctls, since the functionality (and,
weirdly, the 32-bit compat ioctl definitions) were removed from the
kernel in November 2019.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
- Simplify the dax_operations API
- Eliminate bdev_dax_pgoff() in favor of the filesystem maintaining
and applying a partition offset to all its DAX iomap operations.
- Remove wrappers and device-mapper stacked callbacks for
->copy_from_iter() and ->copy_to_iter() in favor of moving
block_device relative offset responsibility to the
dax_direct_access() caller.
- Remove the need for an @bdev in filesystem-DAX infrastructure
- Remove unused uio helpers copy_from_iter_flushcache() and
copy_mc_to_iter() as only the non-check_copy_size() versions are
used for DAX.
- Prepare XFS for the pending (next merge window) DAX+reflink support
- Remove deprecated DEV_DAX_PMEM_COMPAT support
- Cleanup a straggling misuse of the GUID api
Tags offered after the branch was cut:
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/Ydb/3P+8nvjCjYfO@redhat.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYd3dTAAKCRDfioYZHlFs
Z//UAP9zetoTE+O7zJG7CXja4jSopSadbdbh6QKSXaqfKBPvQQD+N4US3wA2bGv8
f/qCY62j2Hj3hUTGHs9RvTyw3JsSYAA=
=QvDs
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax and libnvdimm updates from Dan Williams:
"The bulk of this is a rework of the dax_operations API after
discovering the obstacles it posed to the work-in-progress DAX+reflink
support for XFS and other copy-on-write filesystem mechanics.
Primarily the need to plumb a block_device through the API to handle
partition offsets was a sticking point and Christoph untangled that
dependency in addition to other cleanups to make landing the
DAX+reflink support easier.
The DAX_PMEM_COMPAT option has been around for 4 years and not only
are distributions shipping userspace that understand the current
configuration API, but some are not even bothering to turn this option
on anymore, so it seems a good time to remove it per the deprecation
schedule. Recall that this was added after the device-dax subsystem
moved from /sys/class/dax to /sys/bus/dax for its sysfs organization.
All recent functionality depends on /sys/bus/dax.
Some other miscellaneous cleanups and reflink prep patches are
included as well.
Summary:
- Simplify the dax_operations API:
- Eliminate bdev_dax_pgoff() in favor of the filesystem
maintaining and applying a partition offset to all its DAX iomap
operations.
- Remove wrappers and device-mapper stacked callbacks for
->copy_from_iter() and ->copy_to_iter() in favor of moving
block_device relative offset responsibility to the
dax_direct_access() caller.
- Remove the need for an @bdev in filesystem-DAX infrastructure
- Remove unused uio helpers copy_from_iter_flushcache() and
copy_mc_to_iter() as only the non-check_copy_size() versions are
used for DAX.
- Prepare XFS for the pending (next merge window) DAX+reflink support
- Remove deprecated DEV_DAX_PMEM_COMPAT support
- Cleanup a straggling misuse of the GUID api"
* tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits)
iomap: Fix error handling in iomap_zero_iter()
ACPI: NFIT: Import GUID before use
dax: remove the copy_from_iter and copy_to_iter methods
dax: remove the DAXDEV_F_SYNC flag
dax: simplify dax_synchronous and set_dax_synchronous
uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
iomap: turn the byte variable in iomap_zero_iter into a ssize_t
memremap: remove support for external pgmap refcounts
fsdax: don't require CONFIG_BLOCK
iomap: build the block based code conditionally
dax: fix up some of the block device related ifdefs
fsdax: shift partition offset handling into the file systems
dax: return the partition offset from fs_dax_get_by_bdev
iomap: add a IOMAP_DAX flag
xfs: pass the mapping flags to xfs_bmbt_to_iomap
xfs: use xfs_direct_write_iomap_ops for DAX zeroing
xfs: move dax device handling into xfs_{alloc,free}_buftarg
ext4: cleanup the dax handling in ext4_fill_super
ext2: cleanup the dax handling in ext2_fill_super
fsdax: decouple zeroing from the iomap buffered I/O code
...
To prepare for looking at the IOMAP_DAX flag in xfs_bmbt_to_iomap pass in
the input mapping flags to xfs_bmbt_to_iomap.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-24-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine.
Adding a getxattr operation after xattr corrupted, I can reproduce it 100%.
The deadlock as below:
[983.923403] task:setfattr state:D stack: 0 pid:17639 ppid: 14687 flags:0x00000080
[ 983.923405] Call Trace:
[ 983.923410] __schedule+0x2c4/0x700
[ 983.923412] schedule+0x37/0xa0
[ 983.923414] schedule_timeout+0x274/0x300
[ 983.923416] __down+0x9b/0xf0
[ 983.923451] ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[ 983.923453] down+0x3b/0x50
[ 983.923471] xfs_buf_lock+0x33/0xf0 [xfs]
[ 983.923490] xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[ 983.923508] xfs_buf_get_map+0x4c/0x320 [xfs]
[ 983.923525] xfs_buf_read_map+0x53/0x310 [xfs]
[ 983.923541] ? xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923560] xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
[ 983.923575] ? xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923590] xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923606] xfs_da3_node_read+0x1f/0x40 [xfs]
[ 983.923621] xfs_da3_node_lookup_int+0x69/0x4a0 [xfs]
[ 983.923624] ? kmem_cache_alloc+0x12e/0x270
[ 983.923637] xfs_attr_node_hasname+0x6e/0xa0 [xfs]
[ 983.923651] xfs_has_attr+0x6e/0xd0 [xfs]
[ 983.923664] xfs_attr_set+0x273/0x320 [xfs]
[ 983.923683] xfs_xattr_set+0x87/0xd0 [xfs]
[ 983.923686] __vfs_removexattr+0x4d/0x60
[ 983.923688] __vfs_removexattr_locked+0xac/0x130
[ 983.923689] vfs_removexattr+0x4e/0xf0
[ 983.923690] removexattr+0x4d/0x80
[ 983.923693] ? __check_object_size+0xa8/0x16b
[ 983.923695] ? strncpy_from_user+0x47/0x1a0
[ 983.923696] ? getname_flags+0x6a/0x1e0
[ 983.923697] ? _cond_resched+0x15/0x30
[ 983.923699] ? __sb_start_write+0x1e/0x70
[ 983.923700] ? mnt_want_write+0x28/0x50
[ 983.923701] path_removexattr+0x9b/0xb0
[ 983.923702] __x64_sys_removexattr+0x17/0x20
[ 983.923704] do_syscall_64+0x5b/0x1a0
[ 983.923705] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 983.923707] RIP: 0033:0x7f080f10ee1b
When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in
xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it
free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job.
Then subsequent removexattr will hang because of it.
This bug was introduced by kernel commit 07120f1abd ("xfs: Add xfs_has_attr and subroutines").
It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state
in this case. But xfs_attr_node_hasname will free state itself instead of caller if
xfs_da3_node_lookup_int fails.
Fix this bug by moving the step of free state into caller.
Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and
xfs_attr_node_removename_setup function because we should free state ourselves.
Fixes: 07120f1abd ("xfs: Add xfs_has_attr and subroutines")
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Sync this one last bit of discrepancy between kernel and userspace
libxfs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
The xfs_perag structure and initialization is unused in userspace,
so #ifdef it out with __KERNEL__ to facilitate the xfsprogs sync
and build.
Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The owner info parameter is always NULL, so get rid of the parameter.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
We only use EFIs to free metadata blocks -- not regular data/attr fork
extents. Remove all the fields that we never use, for a net reduction
of 16 bytes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
xfs_bmap_add_free isn't a block mapping function; it schedules deferred
freeing operations for a later point in a compound transaction chain.
While it's primarily used by bunmapi, its use has expanded beyond that.
Move it to xfs_alloc.c and rename the function since it's now general
freeing functionality. Bring the slab cache bits in line with the
way we handle the other intent items.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Create slab caches for the high-level structures that coordinate
deferred intent items, since they're used fairly heavily.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Rearrange these structs to reduce the amount of unused padding bytes.
This saves eight bytes for each of the three structs changed here, which
means they're now all (rmap/bmap are 64 bytes, refc is 32 bytes) even
powers of two.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Now that we've gotten rid of the kmem_zone_t typedef, rename the
variables to _cache since that's what they are.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Remove these typedefs by referencing kmem_cache directly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Now that we have the infrastructure to track the max possible height of
each btree type, we can create a separate slab cache for cursors of each
type of btree. For smaller indices like the free space btrees, this
means that we can pack more cursors into a slab page, improving slab
utilization.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Add code for all five btree types so that we can compute the absolute
maximum possible btree height for each btree type. This is a setup for
the next patch, which makes every btree type have its own cursor cache.
The functions are exported so that we can have xfs_db report the
absolute maximum btree heights for each btree type, rather than making
everyone run their own ad-hoc computations.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Nobody uses this symbol anymore, so kill it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Instead of assuming that the hardcoded XFS_BTREE_MAXLEVELS value is big
enough to handle the maximally tall rmap btree when all blocks are in
use and maximally shared, let's compute the maximum height assuming the
rmapbt consumes as many blocks as possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
During review of the next patch, Dave remarked that he found these two
btree geometry calculation functions lacking in documentation and that
they performed more work than was really necessary.
These functions take the same parameters and have nearly the same logic;
the only real difference is in the return values. Reword the function
comment to make it clearer what each function does, and move them to be
adjacent to reinforce their relation.
Clean up both of them to stop opencoding the howmany functions, stop
using the uint typedefs, and make them both support computations for
more than 2^32 leaf records, since we're going to need all of the above
for files with large data forks and large rmap btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Compute the actual maximum AG btree height for deciding if a per-AG
block reservation is critically low. This only affects the sanity check
condition, since we /generally/ will trigger on the 10% threshold. This
is a long-winded way of saying that we're removing one more usage of
XFS_BTREE_MAXLEVELS.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Years ago when XFS was thought to be much more simple, we introduced
m_ag_maxlevels to specify the maximum btree height of per-AG btrees for
a given filesystem mount. Then we observed that inode btrees don't
actually have the same height and split that off; and now we have rmap
and refcount btrees with much different geometries and separate
maxlevels variables.
The 'ag' part of the name doesn't make much sense anymore, so rename
this to m_alloc_maxlevels to reinforce that this is the maximum height
of the *free space* btrees. This sets us up for the next patch, which
will add a variable to track the maximum height of all AG btrees.
(Also take the opportunity to improve adjacent comments and fix minor
style problems.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
To support future btree code, we need to be able to size btree cursors
dynamically for very large btrees. Switch the maxlevels computation to
use the precomputed values in the superblock, and create cursors that
can handle a certain height. For now, we retain the btree cursor cache
that can handle up to 9-level btrees, though a subsequent patch
introduces separate caches for each btree type, where each cache's
objects will be exactly tall enough to handle the specific btree type.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Encode the maximum btree height in the cursor, since we're soon going to
allow smaller cursors for AG btrees and larger cursors for file btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Refactor btree allocation to a common helper.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reduce the size of the btree cursor structure some more by rearranging
fields to eliminate unused space. While we're at it, fix the ragged
indentation and a spelling error.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Split out the btree level information into a separate struct and put it
at the end of the cursor structure as a VLA. Files with huge data forks
(and in the future, the realtime rmap btree) will require the ability to
support many more levels than a per-AG btree cursor, which means that
we're going to create per-btree type cursor caches to conserve memory
for the more common case.
Note that a subsequent patch actually introduces dynamic cursor heights.
This one merely rearranges the structure to prepare for that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
We're never going to run more than 4 billion btree operations on a
refcount cursor, so shrink the field to an unsigned int to reduce the
structure size. Fix whitespace alignment too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
The for_each_perag*() set of macros are hacky in that some (i.e.
those based on sb_agcount) rely on the assumption that perag
iteration terminates naturally with a NULL perag at the specified
end_agno. Others allow for the final AG to have a valid perag and
require the calling function to clean up any potential leftover
xfs_perag reference on termination of the loop.
Aside from providing a subtly inconsistent interface, the former
variant is racy with growfs because growfs can create discoverable
post-eofs perags before the final superblock update that completes
the grow operation and increases sb_agcount. This leads to the
following assert failure (reproduced by xfs/104) in the perag free
path during unmount:
XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/libxfs/xfs_ag.c, line: 195
This occurs because one of the many for_each_perag() loops in the
code that is expected to terminate with a NULL pag (and thus has no
post-loop xfs_perag_put() check) raced with a growfs and found a
non-NULL post-EOFS perag, but terminated naturally based on the
end_agno check without releasing the post-EOFS perag.
Rework the iteration logic to lift the agno check from the main for
loop conditional to the iteration helper function. The for loop now
purely terminates on a NULL pag and xfs_perag_next() avoids taking a
reference to any perag beyond end_agno in the first place.
Fixes: f250eedcf7 ("xfs: make for_each_perag... a first class citizen")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The for_each_perag_from() iteration macro relies on sb_agcount to
process every perag currently within EOFS from a given starting
point. It's perfectly valid to have perag structures beyond
sb_agcount, however, such as if a growfs is in progress. If a perag
loop happens to race with growfs in this manner, it will actually
attempt to process the post-EOFS perag where ->pag_agno ==
sb_agcount. This is reproduced by xfs/104 and manifests as the
following assert failure in superblock write verifier context:
XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
Update the corresponding macro to only process perags that are
within the current sb_agcount.
Fixes: 58d43a7e32 ("xfs: pass perags around in fsmap data dev functions")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Rename the next_agno variable to be consistent across the several
iteration macros and shorten line length.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Fold the loop iteration logic into a helper in preparation for
further fixups. No functional change in this patch.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Remove the few leftover instances of the xfs_dinode_t typedef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Remove the few leftover instances of the xfs_dinode_t typedef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Remove the few leftover instances of the xfs_dinode_t typedef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Warn if we ever bump nlevels higher than the allowed maximum cursor
height.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Get rid of this old typedef before we start changing other things.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The btree geometry computation function has an off-by-one error in that
it does not allow maximally tall btrees (nlevels == XFS_BTREE_MAXLEVELS).
This can result in repairs failing unnecessarily on very fragmented
filesystems. Subsequent patches to remove MAXLEVELS usage in favor of
the per-btree type computations will make this a much more likely
occurrence.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When log recovery tries to recover a transaction that had log intent
items attached to it, it has to save certain parts of the transaction
state (reservation, dfops chain, inodes with no automatic unlock) so
that it can finish single-stepping the recovered transactions before
finishing the chains.
This is done with the xfs_defer_ops_capture and xfs_defer_ops_continue
functions. Right now they open-code this functionality, so let's port
this to the formalized resource capture structure that we introduced in
the previous patch. This enables us to hold up to two inodes and two
buffers during log recovery, the same way we do for regular runtime.
With this patch applied, we'll be ready to support atomic extent swap
which holds two inodes; and logged xattrs which holds one inode and one
xattr leaf buffer.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Transaction users are allowed to flag up to two buffers and two inodes
for ownership preservation across a deferred transaction roll. Hoist
the variables and code responsible for this out of xfs_defer_trans_roll
so that we can use it for the defer capture mechanism.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Stop directly referencing b_bn in code outside the buffer cache, as
b_bn is supposed to be used only as an internal cache index.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Introduce a helper function xfs_buf_daddr() to extract the disk
address of the buffer from the struct xfs_buf. This will replace
direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as
the XFS_BUF_ADDR() macro.
This patch introduces the helper function and replaces all uses of
XFS_BUF_ADDR() as this is just a simple sed replacement.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
All callers to xfs_dinode_good_version() and XFS_DINODE_SIZE() in
both the kernel and userspace have a xfs_mount structure available
which means they can use mount features checks instead looking
directly are the superblock.
Convert these functions to take a mount and use a xfs_has_v3inodes()
check and move it out of the libxfs/xfs_format.h file as it really
doesn't have anything to do with the definition of the on-disk
format.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Rather than open coding XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5
checks everywhere, add a simple wrapper to encapsulate this and make
the code easier to read.
This allows us to remove the xfs_sb_version_has_v3inode() wrapper
which is only used in xfs_format.h now and is just a version number
check.
There are a couple of places where we should be checking the mount
feature bits rather than the superblock version (e.g. remount), so
those are converted to use xfs_has_crc(mp) instead.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The vast majority of these wrappers are now unused. Remove them
leaving just the small subset of wrappers that are used to either
add feature bits or make the mount features field setup code
simpler.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
This is a conversion of the remaining xfs_sb_version_has..(sbp)
checks to use xfs_has_..(mp) feature checks.
This was largely done with a vim replacement macro that did:
:0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR>
A couple of other variants were also used, and the rest touched up
by hand.
$ size -t fs/xfs/built-in.a
text data bss dec hex filename
before 1127533 311352 484 1439369 15f689 (TOTALS)
after 1125360 311352 484 1437196 15ee0c (TOTALS)
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The superblock verifiers are one of the last places that use the sb
version functions to do feature checks. This are all quite simple
uses, and there aren't many of them so open code them all.
Also, move the good version number check into xfs_sb.c instead of it
being an inline function in xfs_format.h
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>