Commit Graph

8863 Commits

Author SHA1 Message Date
Linus Torvalds
8751b21ad9 New code for 6.12:
* Introduce new ioctls to exchange contents of two files.
     The first ioctl does the preparation work to exchange the contents of two
     files while the second ioctl performs the actual exchange if the target
     file has not been changed since a given sampling point.
 
   * Fixes
     - Fix bugs associated with calculating the maximum range of realtime
       extents to scan for free space.
     - Copy keys instead of records when resizing the incore BMBT root block.
     - Do not report FITRIMming more bytes than possibly exist in the
       filesystem.
     - Modify xfs_fs.h to prevent C++ compilation errors.
     - Do not over eagerly free post-EOF speculative preallocation.
     - Ensure st_blocks never goes to zero during COW writes
 
   * Cleanups/refactors
     - Use Xarray to hold per-AG data instead of a Radix tree.
     - Cleanup the following functionality,
       - Realtime bitmap.
       - Inode allocator.
       - Quota.
       - Inode rooted btree code.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZtmy/gAKCRAH7y4RirJu
 9H3GAP9CnoiZu+U/QmNL5T15fgNGs+BQDrUNbmbn3bNlmIZviQEAi3p+50OlT0nP
 lcQ/89NJ6uDFNBiphpkGajlp5vn2BQ0=
 =7wy/
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.12-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:
 "New code:

   - Introduce new ioctls to exchange contents of two files.

     The first ioctl does the preparation work to exchange the contents
     of two files while the second ioctl performs the actual exchange if
     the target file has not been changed since a given sampling point.

  Fixes:

   - Fix bugs associated with calculating the maximum range of realtime
     extents to scan for free space.

   - Copy keys instead of records when resizing the incore BMBT root
     block.

   - Do not report FITRIMming more bytes than possibly exist in the
     filesystem.

   - Modify xfs_fs.h to prevent C++ compilation errors.

   - Do not over eagerly free post-EOF speculative preallocation.

   - Ensure st_blocks never goes to zero during COW writes

  Cleanups/refactors:

   - Use Xarray to hold per-AG data instead of a Radix tree.

   - Cleanups to:
      - realtime bitmap
      - inode allocator
      - quota
      - inode rooted btree code"

* tag 'xfs-6.12-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (61 commits)
  xfs: ensure st_blocks never goes to zero during COW writes
  xfs: use xas_for_each_marked in xfs_reclaim_inodes_count
  xfs: convert perag lookup to xarray
  xfs: simplify tagged perag iteration
  xfs: move the tagged perag lookup helpers to xfs_icache.c
  xfs: use kfree_rcu_mightsleep to free the perag structures
  xfs: use LIST_HEAD() to simplify code
  xfs: Remove duplicate xfs_trans_priv.h header
  xfs: remove unnecessary check
  xfs: Use xfs set and clear mp state helpers
  xfs: reclaim speculative preallocations for append only files
  xfs: simplify extent lookup in xfs_can_free_eofblocks
  xfs: check XFS_EOFBLOCKS_RELEASED earlier in xfs_release_eofblocks
  xfs: only free posteof blocks on first close
  xfs: don't free post-EOF blocks on read close
  xfs: skip all of xfs_file_release when shut down
  xfs: don't bother returning errors from xfs_file_release
  xfs: refactor f_op->release handling
  xfs: remove the i_mode check in xfs_release
  xfs: standardize the btree maxrecs function parameters
  ...
2024-09-19 07:03:55 +02:00
Linus Torvalds
9ea925c806 Updates for timers and timekeeping:
- Core:
 
 	- Overhaul of posix-timers in preparation of removing the
 	  workaround for periodic timers which have signal delivery
 	  ignored.
 
         - Remove the historical extra jiffie in msleep()
 
 	  msleep() adds an extra jiffie to the timeout value to ensure
 	  minimal sleep time. The timer wheel ensures minimal sleep
 	  time since the large rewrite to a non-cascading wheel, but the
 	  extra jiffie in msleep() remained unnoticed. Remove it.
 
         - Make the timer slack handling correct for realtime tasks.
 
 	  The procfs interface is inconsistent and does neither reflect
 	  reality nor conforms to the man page. Show the correct 0 slack
 	  for real time tasks and enforce it at the core level instead of
 	  having inconsistent individual checks in various timer setup
 	  functions.
 
         - The usual set of updates and enhancements all over the place.
 
   - Drivers:
 
         - Allow the ACPI PM timer to be turned off during suspend
 
 	- No new drivers
 
 	- The usual updates and enhancements in various drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbn7jQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobqnD/9COlU0nwsulABI/aNIrsh6iYvnCC9v
 14CcNta7Qn+157Wfw9BWOyHdNhR1/fPCXE8jJ71zTyIOeW27HV2JyTtxTwe9ZcdK
 ViHAaj7YcIjcVUEC3StCoRCPnvLslEw4qJA5AOQuDyMivdQn+YVa2c0baJxKaXZt
 xk4HZdMj4NAS0jRKnoZSwtKW/+Oz6rR4GAWrZo+Zs1/8ur3HfqnQfi8lJ1hJtLLW
 V7XDCVRvamVi6Ah3ocYPPp/1P6yeQDA1ge9aMddqaza5STWISXRtSnFMUmYP3rbS
 FaL8TyL+ilfny8pkGB2WlG6nLuSbtvogtdEh1gG1k1RmZt44kAtk8ba/KiWFPBSb
 zK9cjojRMBS71f9G4kmb5F4rnXoLsg1YbD1Nzhz3wq2Cs1Z90dc2QwMren0zoQ1x
 Fn56ueRyAiagBlnrSaKyso/2RvqJTNoSdi3RkpjYeAph0UoDCqvTvKjGAf1mWiw1
 T/1lUWSVqWHnzZbM7XXzzajIN9bl6A7bbqlcAJ2O9vZIDt7273DG+bQym9Vh6Why
 0LTGGERHxzKBsG7WRg+2Gmvv6S18UPKRo8tLtlA758rHlFuPTZCShWrIriwSNl1K
 Hxon+d4BparSnm1h9W/NHPKJA574UbWRCBjdk58IkAj8DxZZY4ORD9SMP+ggkV7G
 F6p9cgoDNP9KFg==
 =jE0N
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Core:

   - Overhaul of posix-timers in preparation of removing the workaround
     for periodic timers which have signal delivery ignored.

   - Remove the historical extra jiffie in msleep()

     msleep() adds an extra jiffie to the timeout value to ensure
     minimal sleep time. The timer wheel ensures minimal sleep time
     since the large rewrite to a non-cascading wheel, but the extra
     jiffie in msleep() remained unnoticed. Remove it.

   - Make the timer slack handling correct for realtime tasks.

     The procfs interface is inconsistent and does neither reflect
     reality nor conforms to the man page. Show the correct 0 slack for
     real time tasks and enforce it at the core level instead of having
     inconsistent individual checks in various timer setup functions.

   - The usual set of updates and enhancements all over the place.

  Drivers:

   - Allow the ACPI PM timer to be turned off during suspend

   - No new drivers

   - The usual updates and enhancements in various drivers"

* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  ntp: Make sure RTC is synchronized when time goes backwards
  treewide: Fix wrong singular form of jiffies in comments
  cpu: Use already existing usleep_range()
  timers: Rename next_expiry_recalc() to be unique
  platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
  clocksource/drivers/jcore: Use request_percpu_irq()
  clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
  clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
  clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
  clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
  platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
  clocksource: acpi_pm: Add external callback for suspend/resume
  clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
  dt-bindings: timer: rockchip: Add rk3576 compatible
  timers: Annotate possible non critical data race of next_expiry
  timers: Remove historical extra jiffie for timeout in msleep()
  hrtimer: Use and report correct timerslack values for realtime tasks
  hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
  timers: Add sparse annotation for timer_sync_wait_running().
  signal: Replace BUG_ON()s
  ...
2024-09-17 07:25:37 +02:00
Linus Torvalds
ee25861f26 vfs-6.12.fallocate
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEwAAKCRCRxhvAZXjc
 omD7AQCZuWPXkEGYFD37MJZuRXNEoq7Tuj6yd0O2b5khUpzvyAD+MPuthGiCMPsu
 voPpUP83x7T0D3JsEsCAXtNeVRcIBQI=
 =xTs6
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fallocate updates from Christian Brauner:
 "This contains work to try and cleanup some the fallocate mode
  handling. Currently, it confusingly mixes operation modes and an
  optional flag.

  The work here tries to better define operation modes and optional
  flags allowing the core and filesystem code to use switch statements
  to switch on the operation mode"

* tag 'vfs-6.12.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  xfs: refactor xfs_file_fallocate
  xfs: move the xfs_is_always_cow_inode check into xfs_alloc_file_space
  xfs: call xfs_flush_unmap_range from xfs_free_file_space
  fs: sort out the fallocate mode vs flag mess
  ext4: remove tracing for FALLOC_FL_NO_HIDE_STALE
  block: remove checks for FALLOC_FL_NO_HIDE_STALE
2024-09-16 09:34:08 +02:00
Thomas Gleixner
2f7eedca6c Merge branch 'linus' into timers/core
To update with the latest fixes.
2024-09-10 13:49:53 +02:00
Anna-Maria Behnsen
bd7c8ff9fe treewide: Fix wrong singular form of jiffies in comments
There are several comments all over the place, which uses a wrong singular
form of jiffies.

Replace 'jiffie' by 'jiffy'. No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Link: https://lore.kernel.org/all/20240904-devel-anna-maria-b4-timers-flseep-v1-3-e98760256370@linutronix.de
2024-09-08 20:47:40 +02:00
Christoph Hellwig
90fa22da6d xfs: ensure st_blocks never goes to zero during COW writes
COW writes remove the amount overwritten either directly for delalloc
reservations, or in earlier deferred transactions than adding the new
amount back in the bmap map transaction.  This means st_blocks on an
inode where all data is overwritten using the COW path can temporarily
show a 0 st_blocks.  This can easily be reproduced with the pending
zoned device support where all writes use this path and trips the
check in generic/615, but could also happen on a reflink file without
that.

Fix this by temporarily add the pending blocks to be mapped to
i_delayed_blks while the item is queued.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:47 +05:30
Christoph Hellwig
866cf1dd3d xfs: use xas_for_each_marked in xfs_reclaim_inodes_count
xfs_reclaim_inodes_count iterates over all AGs to sum up the reclaimable
inodes counts.  There is no point in grabbing a reference to the them or
unlock the RCU critical section for each iteration, so switch to the
more efficient xas_for_each_marked iterator.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:46 +05:30
Christoph Hellwig
32fa4059fe xfs: convert perag lookup to xarray
Convert the perag lookup from the legacy radix tree to the xarray,
which allows for much nicer iteration and bulk lookup semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:46 +05:30
Christoph Hellwig
f9ffd095c8 xfs: simplify tagged perag iteration
Pass the old perag structure to the tagged loop helpers so that they can
grab the old agno before releasing the reference.  This removes the need
to separately track the agno and the iterator macro, and thus also
obsoletes the for_each_perag_tag syntactic sugar.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:44 +05:30
Christoph Hellwig
f48f0a8e00 xfs: move the tagged perag lookup helpers to xfs_icache.c
The tagged perag helpers are only used in xfs_icache.c in the kernel code
and not at all in xfsprogs.  Move them to xfs_icache.c in preparation for
switching to an xarray, for which I have no plan to implement the tagged
lookup functions for userspace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:43 +05:30
Christoph Hellwig
4ef7c6d39d xfs: use kfree_rcu_mightsleep to free the perag structures
Using the kfree_rcu_mightsleep is simpler and removes the need for a
rcu_head in the perag structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:43 +05:30
Hongbo Li
70045dafdf xfs: use LIST_HEAD() to simplify code
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD().

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:42 +05:30
Jiapeng Chong
9db384feea xfs: Remove duplicate xfs_trans_priv.h header
./fs/xfs/libxfs/xfs_defer.c: xfs_trans_priv.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9491
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:41 +05:30
Dan Carpenter
fb8b941c75 xfs: remove unnecessary check
We checked that "pip" is non-NULL at the start of the if else statement
so there is no need to check again here.  Delete the check.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:40 +05:30
John Garry
ca57120dfe xfs: Use xfs set and clear mp state helpers
Use the set and clear mp state helpers instead of open-coding.

It is noted that in some instances calls to atomic operation set_bit() and
clear_bit() are being replaced with test_and_set_bit() and
test_and_clear_bit(), respectively, as there is no specific helpers for
set_bit() and clear_bit() only. However should be ok, as we are just
ignoring the returned value from those "test" variants.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:39 +05:30
Christoph Hellwig
9372dce08b xfs: reclaim speculative preallocations for append only files
The XFS XFS_DIFLAG_APPEND maps to the VFS S_APPEND flag, which forbids
writes that don't append at the current EOF.

But the commit originally adding XFS_DIFLAG_APPEND support (commit
a23321e766d in xfs xfs-import repository) also checked it to skip
releasing speculative preallocations, which doesn't make any sense.

Another commit (dd9f438e32 in the xfs-import repository) later extended
that flag to also report these speculation preallocations which should
not exist in getbmap.

Remove these checks as nothing XFS_DIFLAG_APPEND implies that
preallocations beyond EOF should exist, but explicitly check for
XFS_DIFLAG_APPEND in xfs_file_release to bypass the algorithm that
discard preallocations on the first close as append only files aren't
expected to be written to only once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:39 +05:30
Christoph Hellwig
11f4c3a53a xfs: simplify extent lookup in xfs_can_free_eofblocks
xfs_can_free_eofblocks just cares if there is an extent beyond EOF.
Replace the call to xfs_bmapi_read with a xfs_iext_lookup_extent
as we've already checked that extents are read in earlier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:38 +05:30
Christoph Hellwig
b717089efe xfs: check XFS_EOFBLOCKS_RELEASED earlier in xfs_release_eofblocks
If the XFS_EOFBLOCKS_RELEASED flag is set, we are not going to free the
eofblocks, so don't bother locking the inode or performing the checks in
xfs_can_free_eofblocks.  Also switch to a test_and_set operation once
the iolock has been acquire so that only the caller that sets it actually
frees the post-EOF blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:38 +05:30
Darrick J. Wong
f1204d9645 xfs: only free posteof blocks on first close
Certain workloads fragment files on XFS very badly, such as a software
package that creates a number of threads, each of which repeatedly run
the sequence: open a file, perform a synchronous write, and close the
file, which defeats the speculative preallocation mechanism.  We work
around this problem by only deleting posteof blocks the /first/ time a
file is closed to preserve the behavior that unpacking a tarball lays
out files one after the other with no gaps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: rebased, updated comment, renamed the flag]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:38 +05:30
Dave Chinner
816e3599ca xfs: don't free post-EOF blocks on read close
When we have a workload that does open/read/close in parallel with other
allocation, the file becomes rapidly fragmented. This is due to close()
calling xfs_file_release() and removing the speculative preallocation
beyond EOF.

Add a check for a writable context to xfs_file_release to skip the
post-EOF block freeing (an the similarly pointless flushing on truncate
down).

Before:

Test 1: sync write fragmentation counts

/mnt/scratch/file.0: 919
/mnt/scratch/file.1: 916
/mnt/scratch/file.2: 919
/mnt/scratch/file.3: 920
/mnt/scratch/file.4: 920
/mnt/scratch/file.5: 921
/mnt/scratch/file.6: 916
/mnt/scratch/file.7: 918

After:

Test 1: sync write fragmentation counts

/mnt/scratch/file.0: 24
/mnt/scratch/file.1: 24
/mnt/scratch/file.2: 11
/mnt/scratch/file.3: 24
/mnt/scratch/file.4: 3
/mnt/scratch/file.5: 24
/mnt/scratch/file.6: 24
/mnt/scratch/file.7: 23

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: wordsmithing, fix commit message]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: ported to the new ->release code structure]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:38 +05:30
Christoph Hellwig
c741d79c1a xfs: skip all of xfs_file_release when shut down
There is no point in trying to free post-EOF blocks when the file system
is shutdown, as it will just error out ASAP.  Instead return instantly
when xfs_file_release is called on a shut down file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:38 +05:30
Christoph Hellwig
98e44e2bc0 xfs: don't bother returning errors from xfs_file_release
While ->release returns int, the only caller ignores the return value.
As we're only doing cleanup work there isn't much of a point in
return a value to start with, so just document the situation instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:38 +05:30
Christoph Hellwig
5d3ca62611 xfs: refactor f_op->release handling
Currently f_op->release is split in not very obvious ways.  Fix that by
folding xfs_release into xfs_file_release.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:37 +05:30
Christoph Hellwig
6e13dbebd5 xfs: remove the i_mode check in xfs_release
xfs_release is only called from xfs_file_release, which is wired up as
the f_op->release handler for regular files only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-09-03 10:07:37 +05:30
Darrick J. Wong
411a71256d xfs: standardize the btree maxrecs function parameters
Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs
so that we have consistent calling conventions.  This doesn't affect the
kernel that much, but enables us to clean up userspace a bit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:20 -07:00
Darrick J. Wong
79124b3740 xfs: replace shouty XFS_BM{BT,DR} macros
Replace all the shouty bmap btree and bmap disk root macros with actual
functions.

sed \
 -e 's/XFS_BMBT_BLOCK_LEN/xfs_bmbt_block_len/g' \
 -e 's/XFS_BMBT_REC_ADDR/xfs_bmbt_rec_addr/g' \
 -e 's/XFS_BMBT_KEY_ADDR/xfs_bmbt_key_addr/g' \
 -e 's/XFS_BMBT_PTR_ADDR/xfs_bmbt_ptr_addr/g' \
 -e 's/XFS_BMDR_REC_ADDR/xfs_bmdr_rec_addr/g' \
 -e 's/XFS_BMDR_KEY_ADDR/xfs_bmdr_key_addr/g' \
 -e 's/XFS_BMDR_PTR_ADDR/xfs_bmdr_ptr_addr/g' \
 -e 's/XFS_BMAP_BROOT_PTR_ADDR/xfs_bmap_broot_ptr_addr/g' \
 -e 's/XFS_BMAP_BROOT_SPACE_CALC/xfs_bmap_broot_space_calc/g' \
 -e 's/XFS_BMAP_BROOT_SPACE/xfs_bmap_broot_space/g' \
 -e 's/XFS_BMDR_SPACE_CALC/xfs_bmdr_space_calc/g' \
 -e 's/XFS_BMAP_BMDR_SPACE/xfs_bmap_bmdr_space/g' \
 -i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch])

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:20 -07:00
Darrick J. Wong
de55149b66 xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block.  However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records.  We /should/ be copying keys.

Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)).  However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:20 -07:00
Darrick J. Wong
c460f0f1a2 xfs: fix FITRIM reporting again
Don't report FITRIMming more bytes than possibly exist in the
filesystem.

Fixes: 410e8a18f8 ("xfs: don't bother reporting blocks trimmed via FITRIM")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:20 -07:00
Darrick J. Wong
64dfa18d6e xfs: fix C++ compilation errors in xfs_fs.h
Several people reported C++ compilation errors due to things that C
compilers allow but C++ compilers do not.  Fix both of these problems,
and hope there aren't more of these brown paper bags in 2 months when we
finally get these fixes through the process into a released xfsprogs.

NOTE: I am submitting this bugfix over the objections of a former
maintainer, who insists that we should remove this function from the
published userspace ABI instead of fixing the C++ compilation errors.
No deprecation period, no discussion, just a hard drop of an already
provided and correct C function, which would be in contravention of
Linus' rules.  IOWs, removing ABI that have already shipped in a
released kernel requires a careful deprecation period, so I will let
that maintainer run that process.

Reported-by: kernel@mattwhitlock.name
Reported-by: sam@gentoo.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219203
Fixes: 233f4e12bb ("xfs: add parent pointer ioctls")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:20 -07:00
Darrick J. Wong
2c4162be6c xfs: refactor loading quota inodes in the regular case
Create a helper function to load quota inodes in the case where the
dqtype and the sb quota inode fields correspond.  This is true for
nearly all the iget callsites in the quota code, except for when we're
switching the group and project quota inodes.  We'll need this in
subsequent patches to make the metadir handling less convoluted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:20 -07:00
Darrick J. Wong
2ca7b9d7b8 xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c
Move this function out of xfs_ioctl.c to reduce the clutter in there,
and make the entire getfsmap implementation self-contained in a single
file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Darrick J. Wong
516f91035c xfs: rearrange xfs_fsmap.c a little bit
The order of the functions in this file has gotten a little confusing
over the years.  Specifically, the two data device implementations
(bnobt and rmapbt) could be adjacent in the source code instead of split
in two by the logdev and rtdev fsmap implementations.  We're about to
add more functionality to this file, so rearrange things now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
33912286cb xfs: replace m_rsumsize with m_rsumblocks
Track the RT summary file size in blocks, just like the RT bitmap
file.  While we have users of both units, blocks are used slightly
more often and this matches the bitmap file for consistency.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
1fc51cf11d xfs: remove xfs_{rtbitmap,rtsummary}_wordcount
xfs_rtbitmap_wordcount and xfs_rtsummary_wordcount are currently unused,
so remove them to simplify refactoring other rtbitmap helpers.  They
can be added back or simply open coded when actually needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Darrick J. Wong
0902819fe6 xfs: add xchk_setup_nothing and xchk_nothing helpers
Add common helpers for no-op scrubbing methods.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
ec12f97f1b xfs: make the rtalloc start hint a xfs_rtblock_t
0 is a valid start RT extent, and with pending changes it will become
both more common and non-unique.  Switch to pass a xfs_rtblock_t instead
so that we can use NULLRTBLOCK to determine if a hint was set or not.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
b2dd85f414 xfs: factor out a xfs_rtallocate_align helper
Split the code to calculate the aligned allocation request from
xfs_bmap_rtalloc into a separate self-contained helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
fd048a1bb3 xfs: rework the rtalloc fallback handling
xfs_rtallocate currently has two fallbacks, when an allocation fails:

 1) drop the requested extent size alignment, if any, and retry
 2) ignore the locality hint

Oddly enough it does those in order, as trying a different location
is more in line with what the user asked for, and does it in a very
unstructured way.

Lift the fallback to try to allocate without the locality hint into
xfs_rtallocate to both perform them in a more sensible order and to
clean up the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
a9f646af43 xfs: factor out a xfs_rtallocate helper
Split out a helper from xfs_rtallocate that performs the actual
allocation.  This keeps the scope of the xfs_rtalloc_args structure
contained, and prepares for rtgroups support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
1e21d1897f xfs: clean up the ISVALID macro in xfs_bmap_adjacent
Turn the  ISVALID macro defined and used inside in xfs_bmap_adjacent
that relies on implict context into a proper inline function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
df8b181f15 xfs: simplify xfs_rtalloc_query_range
There isn't much of a good reason to pass the xfs_rtalloc_rec structures
that describe extents to xfs_rtalloc_query_range as we really just want
a lower and upper bound xfs_rtxnum_t.  Pass the rtxnum directly and
simply the interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
fa0fc38b25 xfs: remove xfs_rtb_to_rtxrem
Simplify the number of block number conversion helpers by removing
xfs_rtb_to_rtxrem.  Any recent compiler is smart enough to eliminate
the double divisions if using separate xfs_rtb_to_rtx and
xfs_rtb_to_rtxoff calls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Darrick J. Wong
9e9be9840f xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block
This function tries to find a suitable free space extent starting from
a particular rtbitmap block.  Some time ago, I added a clamping function
to prevent the free space scans from running off the end of the bitmap,
but I didn't quite get the logic right.

Let's say there's an allocation request with a minlen of 5 and a maxlen
of 32 and we're scanning the last rtbitmap block.  If we come within 4
rtx of the end of the rt volume, maxlen will get clamped to 4.  If the
next 3 rtx are free, we could have satisfied the allocation, but the
code setting partial besti/bestlen for "minlen < maxlen" will think that
we're doing a non-variable allocation and ignore it.

The root of this problem is overwriting maxlen; I should have stuffed
the results in a different variable, which would not have introduced
this bug.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Darrick J. Wong
74c234bbe5 xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near
The near rt allocator employs two allocation strategies -- first it
tries to allocate at exactly @start.  If that fails, it will pivot back
and forth around that starting point looking for an appropriately sized
free space.

However, I clamped maxlen ages ago to prevent the exact allocation scan
from running off the end of the rt volume.  This, I realize, was
excessive.  If the allocation request is (say) for 32 rtx but the start
position is 5 rtx from the end of the volume, we clamp maxlen to 5.  If
the exact allocation fails, we then pivot back and forth looking for 5
rtx, even though the original intent was to try to get 32 rtx.

If we then find 5 rtx when we could have gotten 32 rtx, we've not done
as well as we could have.  This may be moot if the caller immediately
comes back for more space, but it might not be.  Either way, we can do
better here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Darrick J. Wong
62c3d24968 xfs: clean up xfs_rtallocate_extent_exact a bit
Before we start doing more surgery on the rt allocator, let's clean up
the exact allocator so that it doesn't change its arguments and uses the
helper introduced in the previous patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Darrick J. Wong
e6a74dcf9b xfs: refactor aligning bestlen to prod
There are two places in xfs_rtalloc.c where we want to make sure that a
count of rt extents is aligned with a particular prod(uct) factor.  In
one spot, we actually use rounddown(), albeit unnecessarily if prod < 2.
In the other case, we open-code this rounding inefficiently by promoting
the 32-bit length value to a 64-bit value and then performing a 64-bit
division to figure out the subtraction.

Refactor this into a single helper that uses the correct types and
division method for the type, and skips the division entirely unless
prod is large enough to make a difference.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Darrick J. Wong
e99aa0401e xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block
The loop conditional here is not quite correct because an rtbitmap block
can represent rtextents beyond the end of the rt volume.  There's no way
that it makes sense to scan for free space beyond EOFS, so don't do it.
This overrun has been present since v2.6.0.

Also fix the type of bestlen, which was incorrectly converted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Darrick J. Wong
cb59233e82 xfs: don't return too-short extents from xfs_rtallocate_extent_block
If xfs_rtallocate_extent_block is asked for a variable-sized allocation,
it will try to return the best-sized free extent, which is apparently
the largest one that it finds starting in this rtbitmap block.  It will
then trim the size of the extent as needed to align it with prod.

However, it misses one thing -- rounding down the best-fit candidate to
the required alignment could make the extent shorter than minlen.  In
the case where minlen > 1, we'd rather the caller relaxed its alignment
requirements and tried again, as the allocator already supports that.

Returning a too-short extent that causes xfs_bmapi_write to return
ENOSR if there aren't enough nmaps to handle multiple new allocations,
which can then cause filesystem shutdowns.

I haven't seen this happen on any production systems, but then I don't
think it's very common to set a per-file extent size hint on realtime
files.  I tripped it while working on the rtgroups feature and pounding
on the realtime allocator enthusiastically.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
86a0264ef2 xfs: ensure rtx mask/shift are correct after growfs
When growfs sets an extent size, it doesn't updated the m_rtxblklog and
m_rtxblkmask values, which could lead to incorrect usage of them if they
were set before and can't be used for the new extent size.

Add a xfs_mount_sb_set_rextsize helper that updates the two fields, and
also use it when calculating the new RT geometry instead of disabling
the optimization there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00
Christoph Hellwig
a18a69bbec xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock
After going great length to calculate the transaction reservation for
the new geometry, we should also use it to allocate the transaction it
was calculated for.

Fixes: 578bd4ce71 ("xfs: recompute growfsrtfree transaction reservation while growing rt volume")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-09-01 08:58:19 -07:00