linux

Author	SHA1	Message	Date
Al Viro	51c6546c30	follow_dotdot{,_rcu}(): change calling conventions Instead of returning NULL when we are in root, just make it return the current position (and set seqp and inodep accordingly). That collapses the calls of step_into() in handle_dots() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-07-05 16:16:28 -04:00
Al Viro	82ef069805	namei: get rid of pointless unlikely(read_seqcount_retry(...)) read_seqcount_retry() et.al. are inlined and there's enough annotations for compiler to figure out that those are unlikely to return non-zero. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-07-05 16:16:16 -04:00
Al Viro	20aac6c609	__follow_mount_rcu(): verify that mount_lock remains unchanged Validate mount_lock seqcount as soon as we cross into mount in RCU mode. Sure, ->mnt_root is pinned and will remain so until we do rcu_read_unlock() anyway, and we will eventually fail to unlazy if the mount_lock had been touched, but we might run into a hard error (e.g. -ENOENT) before trying to unlazy. And it's possible to end up with RCU pathwalk racing with rename() and umount() in a way that would fail with -ENOENT while non-RCU pathwalk would've succeeded with any timings. Once upon a time we hadn't needed that, but analysis had been subtle, brittle and went out of window as soon as RENAME_EXCHANGE had been added. It's narrow, hard to hit and won't get you anything other than stray -ENOENT that could be arranged in much easier way with the same priveleges, but it's a bug all the same. Cc: stable@kernel.org X-sky-is-falling: unlikely Fixes: `da1ce0670c` "vfs: add cross-rename" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-07-05 16:14:14 -04:00
David Howells	85e4ea1049	fscache: Fix invalidation/lookup race If an NFS file is opened for writing and closed, fscache_invalidate() will be asked to invalidate the file - however, if the cookie is in the LOOKING_UP state (or the CREATING state), then request to invalidate doesn't get recorded for fscache_cookie_state_machine() to do something with. Fix this by making __fscache_invalidate() set a flag if it sees the cookie is in the LOOKING_UP state to indicate that we need to go to invalidation. Note that this requires a count on the n_accesses counter for the state machine, which that will release when it's done. fscache_cookie_state_machine() then shifts to the INVALIDATING state if it sees the flag. Without this, an nfs file can get corrupted if it gets modified locally and then read locally as the cache contents may not get updated. Fixes: `d24af13e2e` ("fscache: Implement cookie invalidation") Reported-by: Max Kellermann <mk@cm4all.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Max Kellermann <mk@cm4all.com> Link: https://lore.kernel.org/r/YlWWbpW5Foynjllo@rabbit.intern.cm-ag [1]	2022-07-05 16:12:55 +01:00
Jia Zhu	65aa5f6fd8	cachefiles: narrow the scope of flushed requests when releasing fd When an anonymous fd is released, only flush the requests associated with it, rather than all of requests in xarray. Fixes: `9032b6e858` ("cachefiles: implement on-demand read") Signed-off-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://listman.redhat.com/archives/linux-cachefs/2022-June/006937.html	2022-07-05 16:12:21 +01:00
Yue Hu	5c4588aea6	fscache: Introduce fscache_cookie_is_dropped() FSCACHE_COOKIE_STATE_DROPPED will be read more than once, so let's add a helper to avoid code duplication. Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://listman.redhat.com/archives/linux-cachefs/2022-May/006919.html	2022-07-05 16:12:20 +01:00
Yue Hu	bf17455b9c	fscache: Fix if condition in fscache_wait_on_volume_collision() After waiting for the volume to complete the acquisition with timeout, the if condition under which potential volume collision occurs should be acquire the volume is still pending rather than not pending so that we will continue to wait until the pending flag is cleared. Also, use the existing test pending wrapper directly instead of test_bit(). Fixes: `62ab633523` ("fscache: Implement volume registration") Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://listman.redhat.com/archives/linux-cachefs/2022-May/006918.html	2022-07-05 16:12:20 +01:00
Colin Ian King	cc83b0c7e3	fs/ntfs3: Remove duplicated assignment to variable r The assignment to variable r is duplicated, the second assignment is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-07-05 16:07:54 +03:00
Dan Carpenter	4838ec0d80	fs/ntfs3: Unlock on error in attr_insert_range() This error path needs to call up_write(&ni->file.run_lock) and do some other clean up before returning. Fixes: `aa30eccb24` ("fs/ntfs3: Fallocate (FALLOC_FL_INSERT_RANGE) implementation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-07-05 16:07:53 +03:00
Pavel Skripkin	e66af07ca2	fs/ntfs3: Make ntfs_update_mftmirr return void None of callers check the return value of ntfs_update_mftmirr(), so make it return void to make code simpler. Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-07-05 16:07:53 +03:00
Pavel Skripkin	321460ca3b	fs/ntfs3: Fix NULL deref in ntfs_update_mftmirr If ntfs_fill_super() wasn't called then sbi->sb will be equal to NULL. Code should check this ptr before dereferencing. Syzbot hit this issue via passing wrong mount param as can be seen from log below Fail log: ntfs3: Unknown parameter 'iochvrset' general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 1 PID: 3589 Comm: syz-executor210 Not tainted 5.18.0-rc3-syzkaller-00016-gb253435746d9 #0 ... Call Trace: <TASK> put_ntfs+0x1ed/0x2a0 fs/ntfs3/super.c:463 ntfs_fs_free+0x6a/0xe0 fs/ntfs3/super.c:1363 put_fs_context+0x119/0x7a0 fs/fs_context.c:469 do_new_mount+0x2b4/0xad0 fs/namespace.c:3044 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] Fixes: `82cae269cf` ("fs/ntfs3: Add initialization of super block") Reported-and-tested-by: syzbot+c95173762127ad76a824@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-07-05 16:07:46 +03:00
Vincent Whitchurch	3093484301	mm/smaps: add Pss_Dirty Pss is the sum of the sizes of clean and dirty private pages, and the proportional sizes of clean and dirty shared pages: Private = Private_Dirty + Private_Clean Shared_Proportional = Shared_Dirty_Proportional + Shared_Clean_Proportional Pss = Private + Shared_Proportional The Shared*Proportional fields are not present in smaps, so it is not always possible to determine how much of the Pss is from dirty pages and how much is from clean pages. This information can be useful for measuring memory usage for the purpose of optimisation, since clean pages can usually be discarded by the kernel immediately while dirty pages cannot. The smaps routines in the kernel already have access to this data, so add a Pss_Dirty to show it to userspace. Pss_Clean is not added since it can be calculated from Pss and Pss_Dirty. Link: https://lkml.kernel.org/r/20220620081251.2928103-1-vincent.whitchurch@axis.com Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-03 18:08:49 -07:00
Roman Gushchin	e33c267ab7	mm: shrinkers: provide shrinkers with names Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-03 18:08:40 -07:00
Ryusuke Konishi	5924e6ec15	nilfs2: fix incorrect masking of permission flags for symlinks The permission flags of newly created symlinks are wrongly dropped on nilfs2 with the current umask value even though symlinks should have 777 (rwxrwxrwx) permissions: $ umask 0022 $ touch file && ln -s file symlink; ls -l file symlink -rw-r--r--. 1 root root 0 Jun 23 16:29 file lrwxr-xr-x. 1 root root 4 Jun 23 16:29 symlink -> file This fixes the bug by inserting a missing check that excludes symlinks. Link: https://lkml.kernel.org/r/1655974441-5612-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: Tommy Pettersson <ptp@lysator.liu.se> Reported-by: Ciprian Craciun <ciprian.craciun@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-03 15:42:33 -07:00
Linus Torvalds	20855e4cb3	Merge tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "This fixes some stalling problems and corrects the last of the problems (I hope) observed during testing of the new atomic xattr update feature. - Fix statfs blocking on background inode gc workers - Fix some broken inode lock assertion code - Fix xattr leaf buffer leaks when cancelling a deferred xattr update operation - Clean up xattr recovery to make it easier to understand. - Fix xattr leaf block verifiers tripping over empty blocks. - Remove complicated and error prone xattr leaf block bholding mess. - Fix a bug where an rt extent crossing EOF was treated as "posteof" blocks and cleaned unnecessarily. - Fix a UAF when log shutdown races with unmount" * tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: prevent a UAF when log IO errors race with unmount xfs: dont treat rt extents beyond EOF as eofblocks to be cleared xfs: don't hold xattr leaf buffers across transaction rolls xfs: empty xattr leaf header blocks are not corruption xfs: clean up the end of xfs_attri_item_recover xfs: always free xattri_leaf_bp when cancelling a deferred op xfs: use invalidate_lock to check the state of mmap_lock xfs: factor out the common lock flags assert xfs: introduce xfs_inodegc_push() xfs: bound maximum wait time for inodegc work	2022-07-03 09:42:17 -07:00
Linus Torvalds	69cb6c6556	Merge tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Notable regression fixes: - Fix NFSD crash during NFSv4.2 READ_PLUS operation - Fix incorrect status code returned by COMMIT operation" * tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix READ_PLUS crasher NFSD: restore EINVAL error translation in nfsd_commit()	2022-07-02 11:20:56 -07:00
Yang Li	e3baced02a	9p: Fix some kernel-doc comments Remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/9p/fid.c:35: warning: Function parameter or member 'pfid' not described in 'v9fs_fid_add' fs/9p/fid.c:35: warning: Excess function parameter 'fid' description in 'v9fs_fid_add' fs/9p/fid.c:80: warning: Function parameter or member 'pfid' not described in 'v9fs_open_fid_add' fs/9p/fid.c:80: warning: Excess function parameter 'fid' description in 'v9fs_open_fid_add' Link: https://lkml.kernel.org/r/20220615012039.43479-1-yang.lee@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> [Dominique: further adjust comment] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Dominique Martinet	dafbe68973	9p fid refcount: cleanup p9_fid_put calls Simplify p9_fid_put cleanup path in many 9p functions since the function is noop on null or error fids. Also make the *_add_fid() helpers "steal" the fid by nulling its pointer, so put after them will be noop. This should lead to no change of behaviour Link: https://lkml.kernel.org/r/20220612085330.1451496-7-asmadeus@codewreck.org Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Dominique Martinet	b48dbb998d	9p fid refcount: add p9_fid_get/put wrappers I was recently reminded that it is not clear that p9_client_clunk() was actually just decrementing refcount and clunking only when that reaches zero: make it clear through a set of helpers. This will also allow instrumenting refcounting better for debugging next patch Link: https://lkml.kernel.org/r/20220612085330.1451496-5-asmadeus@codewreck.org Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Tyler Hicks	b296d05746	9p: Fix minor typo in code comment Fix s/patch/path/ typo and make it clear that we're talking about multiple path components. Link: https://lkml.kernel.org/r/20220527000003.355812-6-tyhicks@linux.microsoft.com Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Tyler Hicks	47b1e3432b	9p: Remove unnecessary variable for old fids while walking from d_parent Remove the ofid variable that's local to the conditional block in favor of the old_fid variable that's local to the entire function. Link: https://lkml.kernel.org/r/20220527000003.355812-5-tyhicks@linux.microsoft.com Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Tyler Hicks	c58c72d301	9p: Make the path walk logic more clear about when cloning is required Cloning during a path component walk is only needed when the old fid used for the walk operation is the root fid. Make that clear by comparing the two fids rather than using an additional variable. Link: https://lkml.kernel.org/r/20220527000003.355812-4-tyhicks@linux.microsoft.com Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Tyler Hicks	cba83f47fc	9p: Track the root fid with its own variable during lookups Improve readability by using a new variable when dealing with the root fid. The root fid requires special handling in comparison to other fids used during fid lookup. Link: https://lkml.kernel.org/r/20220527000003.355812-3-tyhicks@linux.microsoft.com Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Zhang Jiaming	5036793d7d	exec: Fix a spelling mistake Change 'wont't' to 'won't'. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Reviewed-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220629072932.27506-1-jiaming@nfschina.com	2022-07-01 15:06:14 -07:00
Linus Torvalds	76ff294e16	Merge tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: - Allocate a fattr for _nfs4_discover_trunking() - Fix module reference count leak in nfs4_run_state_manager() * tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4: Add an fattr allocation to _nfs4_discover_trunking() NFS: restore module put when manager exits.	2022-07-01 11:11:32 -07:00
Linus Torvalds	6f8693ea2b	Merge tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client Pull ceph fix from Ilya Dryomov: "A ceph filesystem fix, marked for stable. There appears to be a deeper issue on the MDS side, but for now we are going with this one-liner to avoid busy looping and potential soft lockups" * tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client: ceph: wait on async create before checking caps for syncfs	2022-07-01 11:06:21 -07:00
Linus Torvalds	0a35d1622d	Merge tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two minor tweaks: - While we still can, adjust the send/recv based flags to be in ->ioprio rather than in ->addr2. This is consistent with eg accept, and also doesn't waste a full 64-bit field for flags (Pavel) - 5.18-stable fix for re-importing provided buffers. Not much real world relevance here as it'll only impact non-pollable files gone async, which is more of a practical test case rather than something that is used in the wild (Dylan)" * tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block: io_uring: fix provided buffer import io_uring: keep sendrecv flags in ioprio	2022-07-01 10:52:01 -07:00
Dave Chinner	af1c2146a5	xfs: introduce per-cpu CIL tracking structure The CIL push lock is highly contended on larger machines, becoming a hard bottleneck that about 700,000 transaction commits/s on >16p machines. To address this, start moving the CIL tracking infrastructure to utilise per-CPU structures. We need to track the space used, the amount of log reservation space reserved to write the CIL, the log items in the CIL and the busy extents that need to be completed by the CIL commit. This requires a couple of per-cpu counters, an unordered per-cpu list and a globally ordered per-cpu list. Create a per-cpu structure to hold these and all the management interfaces needed, as well as the hooks to handle hotplug CPUs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2022-07-02 02:13:52 +10:00
Dave Chinner	31151cc342	xfs: rework per-iclog header CIL reservation For every iclog that a CIL push will use up, we need to ensure we have space reserved for the iclog header in each iclog. It is extremely difficult to do this accurately with a per-cpu counter without expensive summing of the counter in every commit. However, we know what the maximum CIL size is going to be because of the hard space limit we have, and hence we know exactly how many iclogs we are going to need to write out the CIL. We are constrained by the requirement that small transactions only have reservation space for a single iclog header built into them. At commit time we don't know how much of the current transaction reservation is made up of iclog header reservations as calculated by xfs_log_calc_unit_res() when the ticket was reserved. As larger reservations have multiple header spaces reserved, we can steal more than one iclog header reservation at a time, but we only steal the exact number needed for the given log vector size delta. As a result, we don't know exactly when we are going to steal iclog header reservations, nor do we know exactly how many we are going to need for a given CIL. To make things simple, start by calculating the worst case number of iclog headers a full CIL push will require. Record this into an atomic variable in the CIL. Then add a byte counter to the log ticket that records exactly how much iclog header space has been reserved in this ticket by xfs_log_calc_unit_res(). This tells us exactly how much space we can steal from the ticket at transaction commit time. Now, at transaction commit time, we can check if the CIL has a full iclog header reservation and, if not, steal the entire reservation the current ticket holds for iclog headers. This minimises the number of times we need to do atomic operations in the fast path, but still guarantees we get all the reservations we need. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2022-07-02 02:12:52 +10:00
Dave Chinner	12380d237b	xfs: lift init CIL reservation out of xc_cil_lock The xc_cil_lock is the most highly contended lock in XFS now. To start the process of getting rid of it, lift the initial reservation of the CIL log space out from under the xc_cil_lock. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2022-07-02 02:11:52 +10:00
Dave Chinner	88591e7f06	xfs: use the CIL space used counter for emptiness checks In the next patches we are going to make the CIL list itself per-cpu, and so we cannot use list_empty() to check is the list is empty. Replace the list_empty() checks with a flag in the CIL to indicate we have committed at least one transaction to the CIL and hence the CIL is not empty. We need this flag to be an atomic so that we can clear it without holding any locks in the commit fast path, but we also need to be careful to avoid atomic operations in the fast path. Hence we use the fact that test_bit() is not an atomic op to first check if the flag is set and then run the atomic test_and_clear_bit() operation to clear it and steal the initial unit reservation for the CIL context checkpoint. When we are switching to a new context in a push, we place the setting of the XLOG_CIL_EMPTY flag under the xc_push_lock. THis allows all the other places that need to check whether the CIL is empty to use test_bit() and still be serialised correctly with the CIL context swaps that set the bit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2022-07-02 02:10:52 +10:00
Darrick J. Wong	7561cea5db	xfs: prevent a UAF when log IO errors race with unmount KASAN reported the following use after free bug when running generic/475: XFS (dm-0): Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) XFS (dm-0): Ending recovery (logdev: internal) Buffer I/O error on dev dm-0, logical block 20639616, async page read Buffer I/O error on dev dm-0, logical block 20639617, async page read XFS (dm-0): log I/O error -5 XFS (dm-0): Filesystem has been shut down due to log error (0x2). XFS (dm-0): Unmounting Filesystem XFS (dm-0): Please unmount the filesystem and rectify the problem(s). ================================================================== BUG: KASAN: use-after-free in do_raw_spin_lock+0x246/0x270 Read of size 4 at addr ffff888109dd84c4 by task 3:1H/136 CPU: 3 PID: 136 Comm: 3:1H Not tainted 5.19.0-rc4-xfsx #rc4 8e53ab5ad0fddeb31cee5e7063ff9c361915a9c4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Workqueue: xfs-log/dm-0 xlog_ioend_work [xfs] Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x2b8/0x661 ? do_raw_spin_lock+0x246/0x270 kasan_report+0xab/0x120 ? do_raw_spin_lock+0x246/0x270 do_raw_spin_lock+0x246/0x270 ? rwlock_bug.part.0+0x90/0x90 xlog_force_shutdown+0xf6/0x370 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] xlog_ioend_work+0x100/0x190 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] process_one_work+0x672/0x1040 worker_thread+0x59b/0xec0 ? __kthread_parkme+0xc6/0x1f0 ? process_one_work+0x1040/0x1040 ? process_one_work+0x1040/0x1040 kthread+0x29e/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 154099: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kmem_alloc+0x8d/0x2e0 [xfs] xlog_cil_init+0x1f/0x540 [xfs] xlog_alloc_log+0xd1e/0x1260 [xfs] xfs_log_mount+0xba/0x640 [xfs] xfs_mountfs+0xf2b/0x1d00 [xfs] xfs_fs_fill_super+0x10af/0x1910 [xfs] get_tree_bdev+0x383/0x670 vfs_get_tree+0x7d/0x240 path_mount+0xdb7/0x1890 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 154151: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x110/0x190 slab_free_freelist_hook+0xab/0x180 kfree+0xbc/0x310 xlog_dealloc_log+0x1b/0x2b0 [xfs] xfs_unmountfs+0x119/0x200 [xfs] xfs_fs_put_super+0x6e/0x2e0 [xfs] generic_shutdown_super+0x12b/0x3a0 kill_block_super+0x95/0xd0 deactivate_locked_super+0x80/0x130 cleanup_mnt+0x329/0x4d0 task_work_run+0xc5/0x160 exit_to_user_mode_prepare+0xd4/0xe0 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This appears to be a race between the unmount process, which frees the CIL and waits for in-flight iclog IO; and the iclog IO completion. When generic/475 runs, it starts fsstress in the background, waits a few seconds, and substitutes a dm-error device to simulate a disk falling out of a machine. If the fsstress encounters EIO on a pure data write, it will exit but the filesystem will still be online. The next thing the test does is unmount the filesystem, which tries to clean the log, free the CIL, and wait for iclog IO completion. If an iclog was being written when the dm-error switch occurred, it can race with log unmounting as follows: Thread 1 Thread 2 xfs_log_unmount xfs_log_clean xfs_log_quiesce xlog_ioend_work <observe error> xlog_force_shutdown test_and_set_bit(XLOG_IOERROR) xfs_log_force <log is shut down, nop> xfs_log_umount_write <log is shut down, nop> xlog_dealloc_log xlog_cil_destroy <wait for iclogs> spin_lock(&log->l_cilp->xc_push_lock) <KABOOM> Therefore, free the CIL after waiting for the iclogs to complete. I /think/ this race has existed for quite a few years now, though I don't remember the ~2014 era logging code well enough to know if it was a real threat then or if the actual race was exposed only more recently. Fixes: `ac983517ec` ("xfs: don't sleep in xlog_cil_force_lsn on shutdown") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-07-01 09:09:52 -07:00
Amir Goldstein	e252f2ed1c	fanotify: introduce FAN_MARK_IGNORE This flag is a new way to configure ignore mask which allows adding and removing the event flags FAN_ONDIR and FAN_EVENT_ON_CHILD in ignore mask. The legacy FAN_MARK_IGNORED_MASK flag would always ignore events on directories and would ignore events on children depending on whether the FAN_EVENT_ON_CHILD flag was set in the (non ignored) mask. FAN_MARK_IGNORE can be used to ignore events on children without setting FAN_EVENT_ON_CHILD in the mark's mask and will not ignore events on directories unconditionally, only when FAN_ONDIR is set in ignore mask. The new behavior is non-downgradable. After calling fanotify_mark() with FAN_MARK_IGNORE once, calling fanotify_mark() with FAN_MARK_IGNORED_MASK on the same object will return EEXIST error. Setting the event flags with FAN_MARK_IGNORE on a non-dir inode mark has no meaning and will return ENOTDIR error. The meaning of FAN_MARK_IGNORED_SURV_MODIFY is preserved with the new FAN_MARK_IGNORE flag, but with a few semantic differences: 1. FAN_MARK_IGNORED_SURV_MODIFY is required for filesystem and mount marks and on an inode mark on a directory. Omitting this flag will return EINVAL or EISDIR error. 2. An ignore mask on a non-directory inode that survives modify could never be downgraded to an ignore mask that does not survive modify. With new FAN_MARK_IGNORE semantics we make that rule explicit - trying to update a surviving ignore mask without the flag FAN_MARK_IGNORED_SURV_MODIFY will return EEXIST error. The conveniene macro FAN_MARK_IGNORE_SURV is added for (FAN_MARK_IGNORE \| FAN_MARK_IGNORED_SURV_MODIFY), because the common case should use short constant names. Link: https://lore.kernel.org/r/20220629144210.2983229-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2022-07-01 14:53:01 +02:00
Amir Goldstein	8afd7215aa	fanotify: cleanups for fanotify_mark() input validations Create helper fanotify_may_update_existing_mark() for checking for conflicts between existing mark flags and fanotify_mark() flags. Use variable mark_cmd to make the checks for mark command bits cleaner. Link: https://lore.kernel.org/r/20220629144210.2983229-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2022-07-01 14:51:49 +02:00
Amir Goldstein	31a371e419	fanotify: prepare for setting event flags in ignore mask Setting flags FAN_ONDIR FAN_EVENT_ON_CHILD in ignore mask has no effect. The FAN_EVENT_ON_CHILD flag in mask implicitly applies to ignore mask and ignore mask is always implicitly applied to events on directories. Define a mark flag that replaces this legacy behavior with logic of applying the ignore mask according to event flags in ignore mask. Implement the new logic to prepare for supporting an ignore mask that ignores events on children and ignore mask that does not ignore events on directories. To emphasize the change in terminology, also rename ignored_mask mark member to ignore_mask and use accessors to get only the effective ignored events or the ignored events and flags. This change in terminology finally aligns with the "ignore mask" language in man pages and in most of the comments. Link: https://lore.kernel.org/r/20220629144210.2983229-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2022-07-01 14:51:13 +02:00
Oliver Ford	c05787b4c2	fs: inotify: Fix typo in inotify comment Correct spelling in comment. Signed-off-by: Oliver Ford <ojford@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220518145959.41-1-ojford@gmail.com	2022-07-01 14:49:56 +02:00
Yushan Zhou	72b5d5aef2	kernfs: fix potential NULL dereference in __kernfs_remove When lockdep is enabled, lockdep_assert_held_write would cause potential NULL pointer dereference. Fix the following smatch warnings: fs/kernfs/dir.c:1353 __kernfs_remove() warn: variable dereferenced before check 'kn' (see line 1346) Fixes: `393c371408` ("kernfs: switch global kernfs_rwsem lock to per-fs lock") Signed-off-by: Yushan Zhou <katrinzhou@tencent.com> Link: https://lore.kernel.org/r/20220630082512.3482581-1-zys.zljxml@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-07-01 10:36:36 +02:00
Amir Goldstein	868f9f2f8e	vfs: fix copy_file_range() regression in cross-fs copies A regression has been reported by Nicolas Boichat, found while using the copy_file_range syscall to copy a tracefs file. Before commit `5dae222a5f` ("vfs: allow copy_file_range to copy across devices") the kernel would return -EXDEV to userspace when trying to copy a file across different filesystems. After this commit, the syscall doesn't fail anymore and instead returns zero (zero bytes copied), as this file's content is generated on-the-fly and thus reports a size of zero. Another regression has been reported by He Zhe - the assertion of WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when copying from a sysfs file whose read operation may return -EOPNOTSUPP. Since we do not have test coverage for copy_file_range() between any two types of filesystems, the best way to avoid these sort of issues in the future is for the kernel to be more picky about filesystems that are allowed to do copy_file_range(). This patch restores some cross-filesystem copy restrictions that existed prior to commit `5dae222a5f` ("vfs: allow copy_file_range to copy across devices"), namely, cross-sb copy is not allowed for filesystems that do not implement ->copy_file_range(). Filesystems that do implement ->copy_file_range() have full control of the result - if this method returns an error, the error is returned to the user. Before this change this was only true for fs that did not implement the ->remap_file_range() operation (i.e. nfsv3). Filesystems that do not implement ->copy_file_range() still fall-back to the generic_copy_file_range() implementation when the copy is within the same sb. This helps the kernel can maintain a more consistent story about which filesystems support copy_file_range(). nfsd and ksmbd servers are modified to fall-back to the generic_copy_file_range() implementation in case vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of server-side-copy. fall-back to generic_copy_file_range() is not implemented for the smb operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct change of behavior. Fixes: `5dae222a5f` ("vfs: allow copy_file_range to copy across devices") Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/ Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/ Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/ Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/ Reported-by: Nicolas Boichat <drinkcat@chromium.org> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Luis Henriques <lhenriques@suse.de> Fixes: `64bf5ff58d` ("vfs: no fallback for ->copy_file_range") Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/ Reported-by: He Zhe <zhe.he@windriver.com> Tested-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-06-30 15:16:38 -07:00
Scott Mayhew	4f40a5b554	NFSv4: Add an fattr allocation to _nfs4_discover_trunking() This was missed in `c3ed222745` ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.") and causes a panic when mounting with '-o trunkdiscovery': PID: 1604 TASK: ffff93dac3520000 CPU: 3 COMMAND: "mount.nfs" #0 [ffffb79140f738f8] machine_kexec at ffffffffaec64bee #1 [ffffb79140f73950] __crash_kexec at ffffffffaeda67fd #2 [ffffb79140f73a18] crash_kexec at ffffffffaeda76ed #3 [ffffb79140f73a30] oops_end at ffffffffaec2658d #4 [ffffb79140f73a50] general_protection at ffffffffaf60111e [exception RIP: nfs_fattr_init+0x5] RIP: ffffffffc0c18265 RSP: ffffb79140f73b08 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff93dac304a800 RCX: 0000000000000000 RDX: ffffb79140f73bb0 RSI: ffff93dadc8cbb40 RDI: d03ee11cfaf6bd50 RBP: ffffb79140f73be8 R8: ffffffffc0691560 R9: 0000000000000006 R10: ffff93db3ffd3df8 R11: 0000000000000000 R12: ffff93dac4040000 R13: ffff93dac2848e00 R14: ffffb79140f73b60 R15: ffffb79140f73b30 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffb79140f73b08] _nfs41_proc_get_locations at ffffffffc0c73d53 [nfsv4] #6 [ffffb79140f73bf0] nfs4_proc_get_locations at ffffffffc0c83e90 [nfsv4] #7 [ffffb79140f73c60] nfs4_discover_trunking at ffffffffc0c83fb7 [nfsv4] #8 [ffffb79140f73cd8] nfs_probe_fsinfo at ffffffffc0c0f95f [nfs] #9 [ffffb79140f73da0] nfs_probe_server at ffffffffc0c1026a [nfs] RIP: 00007f6254fce26e RSP: 00007ffc69496ac8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6254fce26e RDX: 00005600220a82a0 RSI: 00005600220a64d0 RDI: 00005600220a6520 RBP: 00007ffc69496c50 R8: 00005600220a8710 R9: 003035322e323231 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc69496c50 R13: 00005600220a8440 R14: 0000000000000010 R15: 0000560020650ef9 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b Fixes: `c3ed222745` ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2022-06-30 16:13:00 -04:00
NeilBrown	080abad71e	NFS: restore module put when manager exits. Commit `f49169c97f` ("NFSD: Remove svc_serv_ops::svo_module") removed calls to module_put_and_kthread_exit() from threads that acted as SUNRPC servers and had a related svc_serv_ops structure. This was correct. It ALSO removed the module_put_and_kthread_exit() call from nfs4_run_state_manager() which is NOT a SUNRPC service. Consequently every time the NFSv4 state manager runs the module count increments and won't be decremented. So the nfsv4 module cannot be unloaded. So restore the module_put_and_kthread_exit() call. Fixes: `f49169c97f` ("NFSD: Remove svc_serv_ops::svo_module") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2022-06-30 16:13:00 -04:00
Dylan Yudaken	09007af2b6	io_uring: fix provided buffer import io_import_iovec uses the s pointer, but this was changed immediately after the iovec was re-imported and so it was imported into the wrong place. Change the ordering. Fixes: `2be2eb02e2` ("io_uring: ensure reads re-import for selected buffers") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630132006.2825668-1-dylany@fb.com [axboe: ensure we don't half-import as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-30 11:34:41 -06:00
Kaixu Xia	f8189d5d5f	dax: set did_zero to true when zeroing successfully It is unnecessary to check and set did_zero value in while() loop in dax_zero_iter(), we can set did_zero to true only when zeroing successfully at last. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-06-30 10:05:11 -07:00
Kaixu Xia	98eb8d9502	iomap: set did_zero to true when zeroing successfully It is unnecessary to check and set did_zero value in while() loop in iomap_zero_iter(), we can set did_zero to true only when zeroing successfully at last. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-06-30 10:04:18 -07:00
Linus Torvalds	9fb3bb25d1	Merge tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "A fix for recently added fanotify API to have stricter checks and refuse some invalid flag combinations to make our life easier in the future" * tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: refine the validation checks on non-dir inode mask	2022-06-30 09:57:18 -07:00
Chris Mason	d58562ca6c	iomap: skip pages past eof in iomap_do_writepage() iomap_do_writepage() sends pages past i_size through folio_redirty_for_writepage(), which normally isn't a problem because truncate and friends clean them very quickly. When the system has cgroups configured, we can end up in situations where one cgroup has almost no dirty pages at all, and other cgroups consume the entire background dirty limit. This is especially common in our XFS workloads in production because they have cgroups using O_DIRECT for almost all of the IO mixed in with cgroups that do more traditional buffered IO work. We've hit storms where the redirty path hits millions of times in a few seconds, on all a single file that's only ~40 pages long. This leads to long tail latencies for file writes because the pdflush workers are hogging the CPU from some kworkers bound to the same CPU. Reproducing this on 5.18 was tricky because `869ae85dae` ("xfs: flush new eof page on truncate...") ends up writing/waiting most of these dirty pages before truncate gets a chance to wait on them. The actual repro looks like this: /* * run me in a cgroup all alone. Start a second cgroup with dd * streaming IO into the block device. / int main(int ac, char av) { int fd; int ret; char buf[BUFFER_SIZE]; char filename = av[1]; memset(buf, 0, BUFFER_SIZE); if (ac != 2) { fprintf(stderr, "usage: looper filename\n"); exit(1); } fd = open(filename, O_WRONLY \| O_CREAT, 0600); if (fd < 0) { err(errno, "failed to open"); } fprintf(stderr, "looping on %s\n", filename); while(1) { /* * skip past page 0 so truncate doesn't write and wait * on our extent before changing i_size / ret = lseek(fd, 8192, SEEK_SET); if (ret < 0) err(errno, "lseek"); ret = write(fd, buf, BUFFER_SIZE); if (ret != BUFFER_SIZE) err(errno, "write failed"); / start IO so truncate has to wait after i_size is 0 / ret = sync_file_range(fd, 16384, 4095, SYNC_FILE_RANGE_WRITE); if (ret < 0) err(errno, "sync_file_range"); ret = ftruncate(fd, 0); if (ret < 0) err(errno, "truncate"); usleep(1000); } } And this bpftrace script will show when you've hit a redirty storm: kretprobe:xfs_vm_writepages { delete(@dirty[pid]); } kprobe:xfs_vm_writepages { @dirty[pid] = 1; } kprobe:folio_redirty_for_writepage /@dirty[pid] > 0/ { $inode = ((struct folio )arg1)->mapping->host->i_ino; @inodes[$inode] = count(); @redirty++; if (@redirty > 90000) { printf("inode %d redirty was %d", $inode, @redirty); exit(); } } This patch has the same number of failures on xfstests as unpatched 5.18: Failures: generic/648 xfs/019 xfs/050 xfs/168 xfs/299 xfs/348 xfs/506 xfs/543 I also ran it through a long stress of multiple fsx processes hammering. (Johannes Weiner did significant tracing and debugging on this as well) Signed-off-by: Chris Mason <clm@fb.com> Co-authored-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Domas Mituzas <domas@fb.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-30 09:52:57 -07:00
Pavel Begunkov	29c1ac230e	io_uring: keep sendrecv flags in ioprio We waste a u64 SQE field for flags even though we don't need as many bits and it can be used for something more useful later. Store io_uring specific send/recv flags in sqe->ioprio instead of ->addr2. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: `0455d4ccec` ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg") [axboe: change comment in io_uring.h as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-30 07:15:50 -06:00
Linus Torvalds	732f306943	Merge tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd Pull ksmbd server fixes from Steve French: - seek null check (don't use f_seek op directly and blindly) - offset validation in FSCTL_SET_ZERO_DATA - fallocate fix (relates e.g. to xfstests generic/091 and 263) - two cleanup fixes - fix socket settings on some arch * tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: use vfs_llseek instead of dereferencing NULL ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA ksmbd: remove duplicate flag set in smb2_write ksmbd: smbd: Remove useless license text when SPDX-License-Identifier is already used ksmbd: use SOCK_NONBLOCK type for kernel_accept()	2022-06-29 09:20:40 -07:00
Jeff Layton	8692969e91	ceph: wait on async create before checking caps for syncfs Currently, we'll call ceph_check_caps, but if we're still waiting on the reply, we'll end up spinning around on the same inode in flush_dirty_session_caps. Wait for the async create reply before flushing caps. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55823 Fixes: `fbed7045f5` ("ceph: wait for async create reply before sending any cap messages") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-06-29 18:02:57 +02:00
Darrick J. Wong	8944c6fb8a	xfs: dont treat rt extents beyond EOF as eofblocks to be cleared On a system with a realtime volume and a 28k realtime extent, generic/491 fails because the test opens a file on a frozen filesystem and closing it causes xfs_release -> xfs_can_free_eofblocks to mistakenly think that the the blocks of the realtime extent beyond EOF are posteof blocks to be freed. Realtime extents cannot be partially unmapped, so this is pointless. Worse yet, this triggers posteof cleanup, which stalls on a transaction allocation, which is why the test fails. Teach the predicate to account for realtime extents properly. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-06-29 08:47:56 -07:00
Darrick J. Wong	e53bcffad0	xfs: don't hold xattr leaf buffers across transaction rolls Now that we've established (again!) that empty xattr leaf buffers are ok, we no longer need to bhold them to transactions when we're creating new leaf blocks. Get rid of the entire mechanism, which should simplify the xattr code quite a bit. The original justification for using bhold here was to prevent the AIL from trying to write the empty leaf block into the fs during the brief time that we release the buffer lock. The reason for /that/ was to prevent recovery from tripping over the empty ondisk block. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-29 08:47:56 -07:00

... 16 17 18 19 20 ...

77928 Commits