linux/fs/xfs/Kconfig
Darrick J. Wong 4e98cc905c xfs: allow scrub to hook metadata updates in other writers
Certain types of filesystem metadata can only be checked by scanning
every file in the entire filesystem.  Specific examples of this include
quota counts, file link counts, and reverse mappings of file extents.
Directory and parent pointer reconstruction may also fall into this
category.  File scanning is much trickier than scanning AG metadata
because we have to take inode locks in the same order as the rest of
[VX]FS, we can't be holding buffer locks when we do that, and scanning
the whole filesystem takes time.

Earlier versions of the online repair patchset relied heavily on
fsfreeze as a means to quiesce the filesystem so that we could take
locks in the proper order without worrying about concurrent updates from
other writers.  Reviewers of those patches opined that freezing the
entire fs to check and repair something was not sufficiently better than
unmounting to run fsck offline.  I don't agree with that 100%, but the
message was clear: find a way to repair things that minimizes the
quiet period where nobody can write to the filesystem.

Generally, building btree indexes online can be split into two phases: a
collection phase where we compute the records that will be put into the
new btree; and a construction phase, where we construct the physical
btree blocks and persist them.  While it's simple to hold resource locks
for the entirety of the two phases to ensure that the new index is
consistent with the rest of the system, we don't need to hold resource
locks during the collection phase if we have a means to receive live
updates of other work going on elsewhere in the system.

The goal of this patch, then, is to enable online fsck to learn about
metadata updates going on in other threads while it constructs a shadow
copy of the metadata records to verify or correct the real metadata.  To
minimize the overhead when online fsck isn't running, we use srcu
notifiers because they prioritize fast access to the notifier call chain
(particularly when the chain is empty) at the cost of slower notifier
registration and removal.  Online fsck should be relatively infrequent,
so this is an acceptable tradeoff.

The intended usage model is fairly simple.  Code that modifies a
metadata structure of interest should declare an xfs_hooks structure in
some well defined place, and call xfs_hooks_call whenever an update
happens.  Online fsck code should define a struct notifier_block and
use xfs_hooks_add to attach the block to the chain, along with a
function to be called.  This function should synchronize with the fsck
scanner to update whatever in-memory data the scanner is collecting.
When the scan is finished, xfs_hooks_del removes the notifier from the
chain and waits for any calls still in flight to complete.
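
To make that flow concrete, here is a minimal sketch written against
the generic Linux notifier API that the xfs_hooks wrappers are layered
over.  Every name in it (example_rmap_hooks, example_scan_hook, and so
on) is hypothetical, and the real helpers differ in detail:

#include <linux/notifier.h>

/* Writer side: one chain per metadata structure of interest. */
static BLOCKING_NOTIFIER_HEAD(example_rmap_hooks);

/* Called from the update path whenever an rmap record changes. */
static void example_rmap_update(void *update_info)
{
	blocking_notifier_call_chain(&example_rmap_hooks, 0, update_info);
}

/* Scrub side: the notifier folds the update into the shadow records. */
static int example_scan_hook(struct notifier_block *nb,
		unsigned long action, void *data)
{
	/* Lock the scanner's in-memory records, apply 'data', unlock. */
	return NOTIFY_DONE;
}

static struct notifier_block example_scan_nb = {
	.notifier_call	= example_scan_hook,
};

static void example_scan_files(void)
{
	blocking_notifier_chain_register(&example_rmap_hooks,
			&example_scan_nb);
	/* ... walk every file, collecting shadow records ... */
	blocking_notifier_chain_unregister(&example_rmap_hooks,
			&example_scan_nb);
}

The XFS_LIVE_HOOKS symbol in the Kconfig below exists so that this
machinery is only built when online scrub support is enabled.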

Originally, I selected srcu notifiers over blocking notifiers to
implement live hooks because they seemed to have fewer impacts to
scalability.  The per-call cost of srcu_notifier_call_chain is higher
(19ns) than blocking_notifier_call_chain (4ns) in the single threaded case, but
blocking notifiers use an rwsem to stabilize the list.  Cacheline
bouncing for that rwsem is costly to runtime code when there are a lot
of CPUs running regular filesystem operations.  If there are no hooks
installed, this is a total waste of CPU time.

Therefore, I stuck with srcu notifiers, despite trading off single
threaded performance for multithreaded performance.  I also wasn't
thrilled with the very high teardown time for srcu notifiers, since the
caller has to wait for the next rcu grace period.  This can take a long
time if there are a lot of CPUs.

Then I discovered the jump label implementation of static keys.

Jump labels use kernel code patching to replace a branch with a nop sled
when the key is disabled.  IOWs, they can eliminate the overhead of
the notifier _call_chain functions when no hooks are enabled.  This
makes blocking
notifiers competitive again -- scrub runs faster because teardown of the
chain is a lot cheaper, and runtime code only pays the rwsem locking
overhead when scrub is actually running.

With jump labels enabled, calls to empty notifier chains are elided from
the call sites when there are no hooks registered, which means that the
overhead is 0.36ns when fsck is not running.  This is perfect for most
of the architectures that XFS is expected to run on (e.g. x86, powerpc,
arm64, s390x, riscv).

For architectures that don't support jump labels (e.g. m68k) the runtime
overhead of checking the static key is an atomic counter read.  This
isn't great, but it's still cheaper than taking a shared rwsem.
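
Roughly, the call-site pattern looks like the following sketch, which
combines a static key with the hypothetical notifier chain from the
earlier example; again, the names are illustrative and not the actual
xfs_hooks implementation:

#include <linux/jump_label.h>
#include <linux/notifier.h>

static BLOCKING_NOTIFIER_HEAD(example_rmap_hooks);
static DEFINE_STATIC_KEY_FALSE(example_hooks_enabled);

/* Hot path: the branch is patched to a nop until scrub flips the key. */
static void example_rmap_update(void *update_info)
{
	if (static_branch_unlikely(&example_hooks_enabled))
		blocking_notifier_call_chain(&example_rmap_hooks, 0,
				update_info);
}

/* Scrub setup and teardown pay the (slow) code patching cost instead. */
static int example_hook_add(struct notifier_block *nb)
{
	int error;

	error = blocking_notifier_chain_register(&example_rmap_hooks, nb);
	if (!error)
		static_branch_inc(&example_hooks_enabled);
	return error;
}

static void example_hook_del(struct notifier_block *nb)
{
	static_branch_dec(&example_hooks_enabled);
	blocking_notifier_chain_unregister(&example_rmap_hooks, nb);
}

This is also why the XFS_LIVE_HOOKS and XFS_DRAIN_INTENTS symbols below
select JUMP_LABEL if HAVE_ARCH_JUMP_LABEL.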

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-02-22 12:30:45 -08:00

# SPDX-License-Identifier: GPL-2.0-only
config XFS_FS
	tristate "XFS filesystem support"
	depends on BLOCK
	select EXPORTFS
	select LIBCRC32C
	select FS_IOMAP
	help
	  XFS is a high performance journaling filesystem which originated
	  on the SGI IRIX platform. It is completely multi-threaded, can
	  support large files and large filesystems, extended attributes,
	  variable block sizes, is extent based, and makes extensive use of
	  Btrees (directories, extents, free space) to aid both performance
	  and scalability.

	  Refer to the documentation at <http://oss.sgi.com/projects/xfs/>
	  for complete details. This implementation is on-disk compatible
	  with the IRIX version of XFS.

	  To compile this file system support as a module, choose M here: the
	  module will be called xfs. Be aware, however, that if the file
	  system of your root partition is compiled as a module, you'll need
	  to use an initial ramdisk (initrd) to boot.

config XFS_SUPPORT_V4
	bool "Support deprecated V4 (crc=0) format"
	depends on XFS_FS
	default y
	help
	  The V4 filesystem format lacks certain features that are supported
	  by the V5 format, such as metadata checksumming, strengthened
	  metadata verification, and the ability to store timestamps past the
	  year 2038. Because of this, the V4 format is deprecated. All users
	  should upgrade by backing up their files, reformatting, and restoring
	  from the backup.

	  Administrators and users can detect a V4 filesystem by running
	  xfs_info against a filesystem mountpoint and checking for a string
	  beginning with "crc=". If the string "crc=0" is found, the
	  filesystem is a V4 filesystem. If no such string is found, please
	  upgrade xfsprogs to the latest version and try again.

	  This option will become default N in September 2025. Support for the
	  V4 format will be removed entirely in September 2030. Distributors
	  can say N here to withdraw support earlier.

	  To continue supporting the old V4 format (crc=0), say Y.
	  To close off an attack surface, say N.

config XFS_SUPPORT_ASCII_CI
	bool "Support deprecated case-insensitive ascii (ascii-ci=1) format"
	depends on XFS_FS
	default y
	help
	  The ASCII case insensitivity filesystem feature only works correctly
	  on systems that have been coerced into using ISO 8859-1, and it does
	  not work on extended attributes. The kernel has no visibility into
	  the locale settings in userspace, so it corrupts UTF-8 names.
	  Enabling this feature makes XFS vulnerable to mixed case sensitivity
	  attacks. Because of this, the feature is deprecated. All users
	  should upgrade by backing up their files, reformatting, and restoring
	  from the backup.

	  Administrators and users can detect such a filesystem by running
	  xfs_info against a filesystem mountpoint and checking for a string
	  beginning with "ascii-ci=". If the string "ascii-ci=1" is found, the
	  filesystem is a case-insensitive filesystem. If no such string is
	  found, please upgrade xfsprogs to the latest version and try again.

	  This option will become default N in September 2025. Support for the
	  feature will be removed entirely in September 2030. Distributors
	  can say N here to withdraw support earlier.

	  To continue supporting case-insensitivity (ascii-ci=1), say Y.
	  To close off an attack surface, say N.

config XFS_QUOTA
	bool "XFS Quota support"
	depends on XFS_FS
	select QUOTACTL
	help
	  If you say Y here, you will be able to set limits for disk usage on
	  a per user and/or a per group basis under XFS. XFS considers quota
	  information as filesystem metadata and uses journaling to provide a
	  higher level guarantee of consistency. The on-disk data format for
	  quota is also compatible with the IRIX version of XFS, allowing a
	  filesystem to be migrated between Linux and IRIX without any need
	  for conversion.

	  If unsure, say N. More comprehensive documentation can be found in
	  README.quota in the xfsprogs package. XFS quota can be used either
	  with or without the generic quota support enabled (CONFIG_QUOTA) -
	  they are completely independent subsystems.

config XFS_POSIX_ACL
	bool "XFS POSIX ACL support"
	depends on XFS_FS
	select FS_POSIX_ACL
	help
	  POSIX Access Control Lists (ACLs) support permissions for users and
	  groups beyond the owner/group/world scheme.

	  If you don't know what Access Control Lists are, say N.

config XFS_RT
	bool "XFS Realtime subvolume support"
	depends on XFS_FS
	help
	  If you say Y here you will be able to mount and use XFS filesystems
	  which contain a realtime subvolume. The realtime subvolume is a
	  separate area of disk space where only file data is stored. It was
	  originally designed to provide deterministic data rates suitable
	  for media streaming applications, but is also useful as a generic
	  mechanism for ensuring data and metadata/log I/Os are completely
	  separated. Regular file I/Os are isolated to a separate device
	  from all other requests, and this can be done quite transparently
	  to applications via the inherit-realtime directory inode flag.

	  See the xfs man page in section 5 for additional information.

	  If unsure, say N.

config XFS_DRAIN_INTENTS
	bool
	select JUMP_LABEL if HAVE_ARCH_JUMP_LABEL

config XFS_LIVE_HOOKS
	bool
	select JUMP_LABEL if HAVE_ARCH_JUMP_LABEL

config XFS_ONLINE_SCRUB
	bool "XFS online metadata check support"
	default n
	depends on XFS_FS
	depends on TMPFS && SHMEM
	select XFS_LIVE_HOOKS
	select XFS_DRAIN_INTENTS
	help
	  If you say Y here you will be able to check metadata on a
	  mounted XFS filesystem. This feature is intended to reduce
	  filesystem downtime by supplementing xfs_repair. The key
	  advantage here is to look for problems proactively so that
	  they can be dealt with in a controlled manner.

	  This feature is considered EXPERIMENTAL. Use with caution!

	  See the xfs_scrub man page in section 8 for additional information.

	  If unsure, say N.

config XFS_ONLINE_SCRUB_STATS
	bool "XFS online metadata check usage data collection"
	default y
	depends on XFS_ONLINE_SCRUB
	select DEBUG_FS
	help
	  If you say Y here, the kernel will gather usage data about
	  the online metadata check subsystem. This includes the number
	  of invocations, the outcomes, and the results of repairs, if any.
	  This may slow down scrub slightly due to the use of high precision
	  timers and the need to merge per-invocation information into the
	  filesystem counters.

	  Usage data are collected in /sys/kernel/debug/xfs/scrub.

	  If unsure, say N.

config XFS_ONLINE_REPAIR
	bool "XFS online metadata repair support"
	default n
	depends on XFS_FS && XFS_ONLINE_SCRUB
	help
	  If you say Y here you will be able to repair metadata on a
	  mounted XFS filesystem. This feature is intended to reduce
	  filesystem downtime by fixing minor problems before they cause the
	  filesystem to go down. However, it requires that the filesystem be
	  formatted with secondary metadata, such as reverse mappings and inode
	  parent pointers.

	  This feature is considered EXPERIMENTAL. Use with caution!

	  See the xfs_scrub man page in section 8 for additional information.

	  If unsure, say N.

config XFS_WARN
	bool "XFS Verbose Warnings"
	depends on XFS_FS && !XFS_DEBUG
	help
	  Say Y here to get an XFS build with many additional warnings.
	  It converts ASSERT checks to WARN, so will log any out-of-bounds
	  conditions that occur that would otherwise be missed. It is much
	  lighter weight than XFS_DEBUG and does not modify algorithms and will
	  not cause the kernel to panic on non-fatal errors.

	  However, similar to XFS_DEBUG, it is only advisable to use this if you
	  are debugging a particular problem.

config XFS_DEBUG
	bool "XFS Debugging support"
	depends on XFS_FS
	help
	  Say Y here to get an XFS build with many debugging features,
	  including ASSERT checks, function wrappers around macros,
	  and extra sanity-checking functions in various code paths.

	  Note that the resulting code will be HUGE and SLOW, and probably
	  not useful unless you are debugging a particular problem.

	  Say N unless you are an XFS developer, or you play one on TV.

config XFS_ASSERT_FATAL
	bool "XFS fatal asserts"
	default y
	depends on XFS_FS && XFS_DEBUG
	help
	  Set the default DEBUG mode ASSERT failure behavior.

	  Say Y here to cause DEBUG mode ASSERT failures to result in fatal
	  errors that BUG() the kernel by default. If you say N, ASSERT failures
	  result in warnings.

	  This behavior can be modified at runtime via sysfs.