Commit Graph

76795 Commits

Author SHA1 Message Date
Xiubo Li
a74379543d ceph: try to queue a writeback if revoking fails
If the pagecaches writeback just finished and the i_wrbuffer_ref
reaches zero it will try to trigger ceph_check_caps(). But if just
before ceph_check_caps() the i_wrbuffer_ref could be increased
again by mmap/cache write, then the Fwb revoke will fail.

We need to try to queue a writeback in this case instead of
triggering the writeback by BDI's delayed work per 5 seconds.

URL: https://tracker.ceph.com/issues/46904
URL: https://tracker.ceph.com/issues/55377
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Luís Henriques
55ab552080 ceph: fix statfs for subdir mounts
When doing a mount using as base a directory that has 'max_bytes' quotas
statfs uses that value as the total; if a subdirectory is used instead,
the same 'max_bytes' too in statfs, unless there is another quota set.

Unfortunately, if this subdirectory only has the 'max_files' quota set,
then statfs uses the filesystem total.  Fix this by making sure we only
lookup realms that contain the 'max_bytes' quota.

Cc: Ryan Taylor <rptaylor@uvic.ca>
URL: https://tracker.ceph.com/issues/55090
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
825978fd6a ceph: fix possible deadlock when holding Fwb to get inline_data
1, mount with wsync.
2, create a file with O_RDWR, and the request was sent to mds.0:

   ceph_atomic_open()-->
     ceph_mdsc_do_request(openc)
     finish_open(file, dentry, ceph_open)-->
       ceph_open()-->
         ceph_init_file()-->
           ceph_init_file_info()-->
             ceph_uninline_data()-->
             {
               ...
               if (inline_version == 1 || /* initial version, no data */
                   inline_version == CEPH_INLINE_NONE)
                     goto out_unlock;
               ...
             }

The inline_version will be 1, which is the initial version for the
new create file. And here the ci->i_inline_version will keep with 1,
it's buggy.

3, buffer write to the file immediately:

   ceph_write_iter()-->
     ceph_get_caps(file, need=Fw, want=Fb, ...);
     generic_perform_write()-->
       a_ops->write_begin()-->
         ceph_write_begin()-->
           netfs_write_begin()-->
             netfs_begin_read()-->
               netfs_rreq_submit_slice()-->
                 netfs_read_from_server()-->
                   rreq->netfs_ops->issue_read()-->
                     ceph_netfs_issue_read()-->
                     {
                       ...
                       if (ci->i_inline_version != CEPH_INLINE_NONE &&
                           ceph_netfs_issue_op_inline(subreq))
                         return;
                       ...
                     }
     ceph_put_cap_refs(ci, Fwb);

The ceph_netfs_issue_op_inline() will send a getattr(Fsr) request to
mds.1.

4, then the mds.1 will request the rd lock for CInode::filelock from
the auth mds.0, the mds.0 will do the CInode::filelock state transation
from excl --> sync, but it need to revoke the Fxwb caps back from the
clients.

While the kernel client has aleady held the Fwb caps and waiting for
the getattr(Fsr).

It's deadlock!

URL: https://tracker.ceph.com/issues/55377
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
3459bd0c55 ceph: redirty the page for writepage on failure
When run out of memories we should redirty the page before failing
the writepage. Or we will hit BUG_ON(folio_get_private(folio)) in
ceph_dirty_folio().

URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
5eed80fba6 ceph: try to choose the auth MDS if possible for getattr
If any 'x' caps is issued we can just choose the auth MDS instead
of the random replica MDSes. Because only when the Locker is in
LOCK_EXEC state will the loner client could get the 'x' caps. And
if we send the getattr requests to any replica MDS it must auth pin
and tries to rdlock from the auth MDS, and then the auth MDS need
to do the Locker state transition to LOCK_SYNC. And after that the
lock state will change back.

This cost much when doing the Locker state transition and usually
will need to revoke caps from clients.

URL: https://tracker.ceph.com/issues/55240
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
f7a2d0688a ceph: disable updating the atime since cephfs won't maintain it
Since CephFS makes no attempt to maintain atime, we shouldn't
try to update it in mmap and generic read cases and ignore updating
it in direct and sync read cases.

And even we update it in mmap and generic read cases we will drop
it and won't sync it to MDS. And we are seeing the atime will be
updated and then dropped to the floor again and again.

URL: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VSJM7T4CS5TDRFF6XFPIYMHP75K73PZ6/
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
1b2ba3c561 ceph: flush the mdlog for filesystem sync
Before waiting for a request's safe reply, we will send the mdlog flush
request to the relevant MDS. And this will also flush the mdlog for all
the other unsafe requests in the same session, so we can record the last
session and no need to flush mdlog again in the next loop. But there
still have cases that it may send the mdlog flush requst twice or more,
but that should be not often.

Rename wait_unsafe_requests() to
flush_mdlog_and_wait_mdsc_unsafe_requests() to make it more
descriptive.

[xiubli: fold in MDS request refcount leak fix from Jeff]

URL: https://tracker.ceph.com/issues/55284
URL: https://tracker.ceph.com/issues/55411
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
ae06706330 ceph: rename unsafe_request_wait()
Rename it to flush_mdlog_and_wait_inode_unsafe_requests() to make
it more descriptive.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
261998c300 ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check
From the posix and the initial statx supporting commit comments,
the AT_STATX_DONT_SYNC is a lightweight stat and the
AT_STATX_FORCE_SYNC is a heaverweight one. And also checked all
the other current usage about these two flags they are all doing
the same, that is only when the AT_STATX_FORCE_SYNC is not set
and the AT_STATX_DONT_SYNC is set will they skip sync retriving
the attributes from storage.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
68e5ec2ec9 ceph: no need to invalidate the fscache twice
Fixes: 400e1286c0 ("ceph: conversion to new fscache API")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Jakob Koschel
3ffa9d6f99 ceph: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean.

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Jakob Koschel
57a5df0e86 ceph: use dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
7ffe4fcea7 ceph: update the dlease for the hashed dentry when removing
The MDS will always refresh the dentry lease when removing the files
or directories. And if the dentry is still hashed, we can update
the dentry lease and no need to do the lookup from the MDS later.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
546a5d6122 ceph: stop retrying the request when exceeding 256 times
The type of 'r_attempts' in kernel 'ceph_mds_request' is 'int',
while in 'ceph_mds_request_head' the type of 'num_retry' is '__u8'.
So in case the request retries exceeding 256 times, the MDS will
receive a incorrect retry seq.

In this case it's ususally a bug in MDS and continue retrying the
request makes no sense. For now let's limit it to 256. In future
this could be fixed in ceph code, so avoid using the hardcode here.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
1980b1bf17 ceph: stop forwarding the request when exceeding 256 times
The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
while in 'ceph_mds_request_head' the type is '__u8'. So in case
the request bounces between MDSes exceeding 256 times, the client
will get stuck.

In this case it's ususally a bug in MDS and continue bouncing the
request makes no sense.

URL: https://tracker.ceph.com/issues/55130
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
6c1dc50284 ceph: remove unused CEPH_MDS_LEASE_RELEASE related code
The ceph_mdsc_lease_release() has been removed by commit 8aa152c778
(ceph: remove ceph_mdsc_lease_release). ceph_mdsc_lease_send_msg will
never be called with CEPH_MDS_LEASE_RELEASE.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Venky Shankar
d7a2dc5230 ceph: allow ceph.dir.rctime xattr to be updatable
`rctime' has been a pain point in cephfs due to its buggy
nature - inconsistent values reported and those sorts.
Fixing rctime is non-trivial needing an overall redesign
of the entire nested statistics infrastructure.

As a workaround, PR

     http://github.com/ceph/ceph/pull/37938

allows this extended attribute to be manually set. This allows
users to "fixup" inconsistent rctime values. While this sounds
messy, its probably the wisest approach allowing users/scripts
to workaround buggy rctime values.

The above PR enables Ceph MDS to allow manually setting
rctime extended attribute with the corresponding user-land
changes. We may as well allow the same to be done via kclient
for parity.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Yufen Yu
908ea65416 f2fs: add f2fs_init_write_merge_io function
Almost all other initialization of variables in f2fs_fill_super are
extraced to a single function. Also do it for write_io[], which can
make code more clean.

This patch just refactors the code, theres no functional change.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-25 10:48:56 -07:00
Paulo Alcantara
de3a9e943d cifs: fix ntlmssp on old servers
Some older servers seem to require the workstation name during ntlmssp
to be at most 15 chars (RFC1001 name length), so truncate it before
sending when using insecure dialects.

Link: https://lore.kernel.org/r/e6837098-15d9-acb6-7e34-1923cf8c6fe1@winds.org
Reported-by: Byron Stanoszek <gandalf@winds.org>
Tested-by: Byron Stanoszek <gandalf@winds.org>
Fixes: 49bd49f983 ("cifs: send workstation name during ntlmssp session setup")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-25 07:41:22 -05:00
Jens Axboe
54739cc6b4 io_uring: make prep and issue side of req handlers named consistently
Almost all of them are, the odd ones out are the poll remove and the
files update request. Name them like the others, which is:

io_#cmdname_prep	for request preparation
io_#cmdname		for request issue

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-25 05:37:06 -06:00
Jens Axboe
ecddc25d13 io_uring: make timeout prep handlers consistent with other prep handlers
All other opcodes take a {req, sqe} set for prep handling, split out
a timeout prep handler so that timeout and linked timeouts can use
the same one.

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-25 05:36:54 -06:00
Chao Yu
78901cfa44 f2fs: avoid unneeded error handling for revoke_entry_slab allocation
In __f2fs_commit_atomic_write(), we will guarantee success of
revoke_entry_slab allocation, so let's avoid unneeded error handling.

Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-24 23:06:51 -07:00
Sungjong Seo
66d34fcbbe f2fs: allow compression for mmap files in compress_mode=user
Since commit e3c548323d ("f2fs: let's allow compression for mmap files"),
it has been allowed to compress mmap files. However, in compress_mode=user,
it is not allowed yet. To keep the same concept in both compress_modes,
f2fs_ioc_(de)compress_file() should also allow it.

Let's remove checking mmap files in f2fs_ioc_(de)compress_file() so that
the compression for mmap files is also allowed in compress_mode=user.

Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-24 23:06:38 -07:00
Linus Torvalds
fdaf9a5840 Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache
Pull page cache updates from Matthew Wilcox:

 - Appoint myself page cache maintainer

 - Fix how scsicam uses the page cache

 - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS

 - Remove the AOP flags entirely

 - Remove pagecache_write_begin() and pagecache_write_end()

 - Documentation updates

 - Convert several address_space operations to use folios:
     - is_dirty_writeback
     - readpage becomes read_folio
     - releasepage becomes release_folio
     - freepage becomes free_folio

 - Change filler_t to require a struct file pointer be the first
   argument like ->read_folio

* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
  nilfs2: Fix some kernel-doc comments
  Appoint myself page cache maintainer
  fs: Remove aops->freepage
  secretmem: Convert to free_folio
  nfs: Convert to free_folio
  orangefs: Convert to free_folio
  fs: Add free_folio address space operation
  fs: Convert drop_buffers() to use a folio
  fs: Change try_to_free_buffers() to take a folio
  jbd2: Convert release_buffer_page() to use a folio
  jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
  reiserfs: Convert release_buffer_page() to use a folio
  fs: Remove last vestiges of releasepage
  ubifs: Convert to release_folio
  reiserfs: Convert to release_folio
  orangefs: Convert to release_folio
  ocfs2: Convert to release_folio
  nilfs2: Remove comment about releasepage
  nfs: Convert to release_folio
  jfs: Convert to release_folio
  ...
2022-05-24 19:55:07 -07:00
Linus Torvalds
8642174b52 Merge tag 'iomap-5.19-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
 "There's a couple of corrections sent in by Andreas for some accounting
  errors.

  The biggest change this time around is that writeback errors longer
  clear pageuptodate nor does XFS invalidate the page cache anymore.
  This brings XFS (and gfs2/zonefs) behavior in line with every other
  Linux filesystem driver, and fixes some UAF bugs that only cropped up
  after willy turned on multipage folios for XFS in 5.18-rc1.

  Regrettably, it took all the way to the end of the 5.18 cycle to find
  the source of these bugs and reach a consensus that XFS' writeback
  failure behavior from 20 years ago is no longer necessary.

  Summary:

   - Fix a couple of accounting errors in the buffered io code.

   - Discontinue the practice of marking folios !uptodate and
     invalidating them when writeback fails.

     This fixes some UAF bugs when multipage folios are enabled, and
     brings the behavior of XFS/gfs/zonefs into alignment with the
     behavior of all the other Linux filesystems"

* tag 'iomap-5.19-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: don't invalidate folios after writeback errors
  iomap: iomap_write_end cleanup
  iomap: iomap_write_failed fix
2022-05-24 19:21:30 -07:00
Linus Torvalds
f289811258 Merge tag 'dlm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
 "This includes several large patches to improve endian handling and
  remove sparse warnings. The code previously used in/out, in-place
  endianness conversion functions.

  Other code cleanup includes the list iterator changes.

  Finally, a long standing bug was found and fixed, caused by missed
  decrement on an lock struct ref count"

* tag 'dlm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (28 commits)
  dlm: use kref_put_lock in __put_lkb
  dlm: use kref_put_lock in put_rsb
  dlm: remove unnecessary error assign
  dlm: fix missing lkb refcount handling
  fs: dlm: cast resource pointer to uintptr_t
  dlm: replace usage of found with dedicated list iterator variable
  dlm: remove usage of list iterator for list_add() after the loop body
  dlm: fix pending remove if msg allocation fails
  dlm: fix wake_up() calls for pending remove
  dlm: check required context while close
  dlm: cleanup lock handling in dlm_master_lookup
  dlm: remove found label in dlm_master_lookup
  dlm: remove __user conversion warnings
  dlm: move conversion to compile time
  dlm: use __le types for dlm messages
  dlm: use __le types for rcom messages
  dlm: use __le types for dlm header
  dlm: use __le types for options header
  dlm: add __CHECKER__ for false positives
  dlm: move global to static inits
  ...
2022-05-24 19:09:16 -07:00
Linus Torvalds
fea3043314 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Various bug fixes and cleanups for ext4.

  In particular, move the crypto related fucntions from fs/ext4/super.c
  into a new fs/ext4/crypto.c, and fix a number of bugs found by fuzzers
  and error injection tools"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
  ext4: only allow test_dummy_encryption when supported
  ext4: fix bug_on in __es_tree_search
  ext4: avoid cycles in directory h-tree
  ext4: verify dir block before splitting it
  ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state
  ext4: fix bug_on in ext4_writepages
  ext4: refactor and move ext4_ioctl_get_encryption_pwsalt()
  ext4: cleanup function defs from ext4.h into crypto.c
  ext4: move ext4 crypto code to its own file crypto.c
  ext4: fix memory leak in parse_apply_sb_mount_options()
  ext4: reject the 'commit' option on ext2 filesystems
  ext4: remove duplicated #include of dax.h in inode.c
  ext4: fix race condition between ext4_write and ext4_convert_inline_data
  ext4: convert symlink external data block mapping to bdev
  ext4: add nowait mode for ext4_getblk()
  ext4: fix journal_ioprio mount option handling
  ext4: mark group as trimmed only if it was fully scanned
  ext4: fix use-after-free in ext4_rename_dir_prepare
  ext4: add unmount filesystem message
  ext4: remove unnecessary conditionals
  ...
2022-05-24 19:04:46 -07:00
Linus Torvalds
7208c9842c Merge tag 'gfs2-v5.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:

 - Clean up the allocation of glocks that have an address space attached

 - Quota locking fix and quota iomap conversion

 - Fix the FITRIM error reporting

 - Some list iterator cleanups

* tag 'gfs2-v5.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Convert function bh_get to use iomap
  gfs2: use i_lock spin_lock for inode qadata
  gfs2: Return more useful errors from gfs2_rgrp_send_discards()
  gfs2: Use container_of() for gfs2_glock(aspace)
  gfs2: Explain some direct I/O oddities
  gfs2: replace 'found' with dedicated list iterator variable
2022-05-24 19:00:41 -07:00
Linus Torvalds
bd1b7c1384 Merge tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "Features:

   - subpage:
      - support for PAGE_SIZE > 4K (previously only 64K)
      - make it work with raid56

   - repair super block num_devices automatically if it does not match
     the number of device items

   - defrag can convert inline extents to regular extents, up to now
     inline files were skipped but the setting of mount option
     max_inline could affect the decision logic

   - zoned:
      - minimal accepted zone size is explicitly set to 4MiB
      - make zone reclaim less aggressive and don't reclaim if there are
        enough free zones
      - add per-profile sysfs tunable of the reclaim threshold

   - allow automatic block group reclaim for non-zoned filesystems, with
     sysfs tunables

   - tree-checker: new check, compare extent buffer owner against owner
     rootid

  Performance:

   - avoid blocking on space reservation when doing nowait direct io
     writes (+7% throughput for reads and writes)

   - NOCOW write throughput improvement due to refined locking (+3%)

   - send: reduce pressure to page cache by dropping extent pages right
     after they're processed

  Core:

   - convert all radix trees to xarray

   - add iterators for b-tree node items

   - support printk message index

   - user bulk page allocation for extent buffers

   - switch to bio_alloc API, use on-stack bios where convenient, other
     bio cleanups

   - use rw lock for block groups to favor concurrent reads

   - simplify workques, don't allocate high priority threads for all
     normal queues as we need only one

   - refactor scrub, process chunks based on their constraints and
     similarity

   - allocate direct io structures on stack and pass around only
     pointers, avoids allocation and reduces potential error handling

  Fixes:

   - fix count of reserved transaction items for various inode
     operations

   - fix deadlock between concurrent dio writes when low on free data
     space

   - fix a few cases when zones need to be finished

  VFS, iomap:

   - add helper to check if sb write has started (usable for assertions)

   - new helper iomap_dio_alloc_bio, export iomap_dio_bio_end_io"

* tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (173 commits)
  btrfs: zoned: introduce a minimal zone size 4M and reject mount
  btrfs: allow defrag to convert inline extents to regular extents
  btrfs: add "0x" prefix for unsupported optional features
  btrfs: do not account twice for inode ref when reserving metadata units
  btrfs: zoned: fix comparison of alloc_offset vs meta_write_pointer
  btrfs: send: avoid trashing the page cache
  btrfs: send: keep the current inode open while processing it
  btrfs: allocate the btrfs_dio_private as part of the iomap dio bio
  btrfs: move struct btrfs_dio_private to inode.c
  btrfs: remove the disk_bytenr in struct btrfs_dio_private
  btrfs: allocate dio_data on stack
  iomap: add per-iomap_iter private data
  iomap: allow the file system to provide a bio_set for direct I/O
  btrfs: add a btrfs_dio_rw wrapper
  btrfs: zoned: zone finish unused block group
  btrfs: zoned: properly finish block group on metadata write
  btrfs: zoned: finish block group when there are no more allocatable bytes left
  btrfs: zoned: consolidate zone finish functions
  btrfs: zoned: introduce btrfs_zoned_bg_is_full
  btrfs: improve error reporting in lookup_inline_extent_backref
  ...
2022-05-24 18:52:35 -07:00
Linus Torvalds
3842007b1a Merge tag 'zonefs-5.19-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
 "A single patch to fix zonefs_init_file_inode() return value"

* tag 'zonefs-5.19-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Fix zonefs_init_file_inode() return value
2022-05-24 18:48:36 -07:00
Linus Torvalds
65965d9530 Merge tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs (and fscache) updates from Gao Xiang:
 "After working on it on the mailing list for more than half a year, we
  finally form 'erofs over fscache' feature into shape. Hopefully it
  could bring more possibility to the communities.

  The story mainly started from a new project what we called "RAFS v6" [1]
  for Nydus image service almost a year ago, which enhances EROFS to be
  a new form of one bootstrap (which includes metadata representing the
  whole fs tree) + several data-deduplicated content addressable blobs
  (actually treated as multiple devices). Each blob can represent one
  container image layer but not quite exactly since all new data can be
  fully existed in the previous blobs so no need to introduce another
  new blob.

  It is actually not a new idea (at least on my side it's much like a
  simpilied casync [2] for now) and has many benefits over per-file
  blobs or some other exist ways since typically each RAFS v6 image only
  has dozens of device blobs instead of thousands of per-file blobs.
  It's easy to be signed with user keys as a golden image, transfered
  untouchedly with minimal overhead over the network, kept in some type
  of storage conveniently, and run with (optional) runtime verification
  but without involving too many irrelevant features crossing the system
  beyond EROFS itself. At least it's our final goal and we're keeping
  working on it. There was also a good summary of this approach from the
  casync author [3].

  Regardless further optimizations, this work is almost done in the
  previous Linux release cycles. In this round, we'd like to introduce
  on-demand load for EROFS with the fscache/cachefiles infrastructure,
  considering the following advantages:

   - Introduce new file-based backend to EROFS. Although each image only
     contains dozens of blobs but in densely-deployed runC host for
     example, there could still be massive blobs on a machine, which is
     messy if each blob is treated as a device. In contrast, fscache and
     cachefiles are really great interfaces for us to make them work.

   - Introduce on-demand load to fscache and EROFS. Previously, fscache
     is mainly used to caching network-likewise filesystems, now it can
     support on-demand downloading for local fses too with the exact
     localfs on-disk format. It has many advantages which we're been
     described in the latest patchset cover letter [4]. In addition to
     that, most importantly, the cached data is still stored in the
     original local fs on-disk format so that it's still the one signed
     with private keys but only could be partially available. Users can
     fully trust it during running. Later, users can also back up
     cachefiles easily to another machine.

   - More reliable on-demand approach in principle. After data is all
     available locally, user daemon can be no longer online in some use
     cases, which helps daemon crash recovery (filesystems can still in
     service) and hot-upgrade (user daemon can be upgraded more
     frequently due to new features or protocols introduced.)

   - Other format can also be converted to EROFS filesystem format over
     the internet on the fly with the new on-demand load feature and
     mounted. That is entirely possible with on-demand load feature as
     long as such archive format metadata can be fetched in advance like
     stargz.

  In addition, although currently our target user is Nydus image service [5],
  but laterly, it can be used for other use cases like on-demand system
  booting, etc. As for the fscache on-demand load feature itself,
  strictly it can be used for other local fses too. Laterly we could
  promote most code to the iomap infrastructure and also enhance it in
  the read-write way if other local fses are interested.

  Thanks David Howells for taking so much time and patience on this
  these months, many thanks with great respect here again! Thanks Jeffle
  for working on this feature and Xin Yin from Bytedance for
  asynchronous I/O implementation as well as Zichen Tian, Jia Zhu, and
  Yan Song for testing, much appeciated. We're also exploring more
  possibly over fscache cache management over FSDAX for secure
  containers and working on more improvements and useful features for
  fscache, cachefiles, and on-demand load.

  In addition to "erofs over fscache", NFS export and idmapped mount are
  also completed in this cycle for container use cases as well.

  Summary:

   - Add erofs on-demand load support over fscache

   - Support NFS export for erofs

   - Support idmapped mounts for erofs

   - Don't prompt for risk any more when using big pcluster

   - Fix buffer copy overflow of ztailpacking feature

   - Several minor cleanups"

[1] https://lore.kernel.org/r/20210730194625.93856-1-hsiangkao@linux.alibaba.com
[2] https://github.com/systemd/casync
[3] http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
[4] https://lore.kernel.org/r/20220509074028.74954-1-jefflexu@linux.alibaba.com
[5] https://github.com/dragonflyoss/image-service

* tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (29 commits)
  erofs: scan devices from device table
  erofs: change to use asynchronous io for fscache readpage/readahead
  erofs: add 'fsid' mount option
  erofs: implement fscache-based data readahead
  erofs: implement fscache-based data read for inline layout
  erofs: implement fscache-based data read for non-inline layout
  erofs: implement fscache-based metadata read
  erofs: register fscache context for extra data blobs
  erofs: register fscache context for primary data blob
  erofs: add erofs_fscache_read_folios() helper
  erofs: add anonymous inode caching metadata for data blobs
  erofs: add fscache context helper functions
  erofs: register fscache volume
  erofs: add fscache mode check helper
  erofs: make erofs_map_blocks() generally available
  cachefiles: document on-demand read mode
  cachefiles: add tracepoints for on-demand read mode
  cachefiles: enable on-demand read mode
  cachefiles: implement on-demand read
  cachefiles: notify the user daemon when withdrawing cookie
  ...
2022-05-24 18:42:04 -07:00
Linus Torvalds
850f6033cd Merge tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:

 - fix referencing wrong parent directory information during rename

 - introduce a sys_tz mount option to use system timezone

 - improve performance while zeroing a cluster with dirsync mount option

 - fix slab-out-bounds in exat_clear_bitmap() reported from syzbot

* tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: check if cluster num is valid
  exfat: reduce block requests when zeroing a cluster
  block: add sync_blockdev_range()
  exfat: introduce mount option 'sys_tz'
  exfat: fix referencing wrong parent directory information after renaming
2022-05-24 18:30:27 -07:00
Linus Torvalds
f30fabe78a Merge tag 'fs.idmapped.v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull fs idmapping updates from Christian Brauner:
 "This contains two minor updates:

   - An update to the idmapping documentation by Rodrigo making it
     easier to understand that we first introduce several use-cases that
     fail without idmapped mounts simply to explain how they can be
     handled with idmapped mounts.

   - When changing a mount's idmapping we now hold writers to make it
     more robust.

     This is similar to turning a mount ro with the difference that in
     contrast to turning a mount ro changing the idmapping can only ever
     be done once while a mount can transition between ro and rw as much
     as it wants.

     The vfs layer itself takes care to retrieve the idmapping of a
     mount once ensuring that the idmapping used for vfs permission
     checking is identical to the idmapping passed down to the
     filesystem. All filesystems with FS_ALLOW_IDMAP raised take the
     same precautions as the vfs in code-paths that are outside of
     direct control of the vfs such as ioctl()s.

     However, holding writers makes this more robust and predictable for
     both the kernel and userspace.

     This is a minor user-visible change. But it is extremely unlikely
     to matter. The caller must've created a detached mount via
     OPEN_TREE_CLONE and then handed that O_PATH fd to another process
     or thread which then must've gotten a writable fd for that mount
     and started creating files in there while the caller is still
     changing mount properties. While not impossible it will be an
     extremely rare corner-case and should in general be considered a
     bug in the application. Consider making a mount MOUNT_ATTR_NOEXEC
     or MOUNT_ATTR_NODEV while allowing someone else to perform lookups
     or exec'ing in parallel by handing them a copy of the
     OPEN_TREE_CLONE fd or another fd beneath that mount.

     I've pinged all major users of idmapped mounts pointing out this
     change and none of them have active writers on a mount while still
     changing mount properties. It would've been strange if they did.

  The rest and majority of the work will be coming through the overlayfs
  tree this cycle. In addition to overlayfs this cycle should also see
  support for idmapped mounts on erofs as I've acked a patch to this
  effect a little while ago"

* tag 'fs.idmapped.v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: hold writers when changing mount's idmapping
  docs: Add small intro to idmap examples
2022-05-24 18:19:06 -07:00
Linus Torvalds
0350785b0a Merge tag 'integrity-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull IMA updates from Mimi Zohar:
 "New is IMA support for including fs-verity file digests and signatures
  in the IMA measurement list as well as verifying the fs-verity file
  digest based signatures, both based on policy.

  In addition, are two bug fixes:

   - avoid reading UEFI variables, which cause a page fault, on Apple
     Macs with T2 chips.

   - remove the original "ima" template Kconfig option to address a boot
     command line ordering issue.

  The rest is a mixture of code/documentation cleanup"

* tag 'integrity-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  integrity: Fix sparse warnings in keyring_handler
  evm: Clean up some variables
  evm: Return INTEGRITY_PASS for enum integrity_status value '0'
  efi: Do not import certificates from UEFI Secure Boot for T2 Macs
  fsverity: update the documentation
  ima: support fs-verity file digest based version 3 signatures
  ima: permit fsverity's file digests in the IMA measurement list
  ima: define a new template field named 'd-ngv2' and templates
  fs-verity: define a function to return the integrity protected file digest
  ima: use IMA default hash algorithm for integrity violations
  ima: fix 'd-ng' comments and documentation
  ima: remove the IMA_TEMPLATE Kconfig option
  ima: remove redundant initialization of pointer 'file'.
2022-05-24 13:50:39 -07:00
Linus Torvalds
a6b450573b Merge tag 'execve-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:

 - Fix binfmt_flat GOT handling for riscv (Niklas Cassel)

 - Remove unused/broken binfmt_flat shared library and coredump code
   (Eric W. Biederman)

* tag 'execve-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_flat: Remove shared library support
  binfmt_flat: Drop vestiges of coredump support
  binfmt_flat: do not stop relocating GOT entries prematurely on riscv
2022-05-24 12:49:48 -07:00
Eric Biggers
5f41fdaea6 ext4: only allow test_dummy_encryption when supported
Make the test_dummy_encryption mount option require that the encrypt
feature flag be already enabled on the filesystem, rather than
automatically enabling it.  Practically, this means that "-O encrypt"
will need to be included in MKFS_OPTIONS when running xfstests with the
test_dummy_encryption mount option.  (ext4/053 also needs an update.)

Moreover, as long as the preconditions for test_dummy_encryption are
being tightened anyway, take the opportunity to start rejecting it when
!CONFIG_FS_ENCRYPTION rather than ignoring it.

The motivation for requiring the encrypt feature flag is that:

- Having the filesystem auto-enable feature flags is problematic, as it
  bypasses the usual sanity checks.  The specific issue which came up
  recently is that in kernel versions where ext4 supports casefold but
  not encrypt+casefold (v5.1 through v5.10), the kernel will happily add
  the encrypt flag to a filesystem that has the casefold flag, making it
  unmountable -- but only for subsequent mounts, not the initial one.
  This confused the casefold support detection in xfstests, causing
  generic/556 to fail rather than be skipped.

- The xfstests-bld test runners (kvm-xfstests et al.) already use the
  required mkfs flag, so they will not be affected by this change.  Only
  users of test_dummy_encryption alone will be affected.  But, this
  option has always been for testing only, so it should be fine to
  require that the few users of this option update their test scripts.

- f2fs already requires it (for its equivalent feature flag).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Link: https://lore.kernel.org/r/20220519204437.61645-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-24 15:34:27 -04:00
Baokun Li
d36f6ed761 ext4: fix bug_on in __es_tree_search
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/extents_status.c:199!
[...]
RIP: 0010:ext4_es_end fs/ext4/extents_status.c:199 [inline]
RIP: 0010:__es_tree_search+0x1e0/0x260 fs/ext4/extents_status.c:217
[...]
Call Trace:
 ext4_es_cache_extent+0x109/0x340 fs/ext4/extents_status.c:766
 ext4_cache_extents+0x239/0x2e0 fs/ext4/extents.c:561
 ext4_find_extent+0x6b7/0xa20 fs/ext4/extents.c:964
 ext4_ext_map_blocks+0x16b/0x4b70 fs/ext4/extents.c:4384
 ext4_map_blocks+0xe26/0x19f0 fs/ext4/inode.c:567
 ext4_getblk+0x320/0x4c0 fs/ext4/inode.c:980
 ext4_bread+0x2d/0x170 fs/ext4/inode.c:1031
 ext4_quota_read+0x248/0x320 fs/ext4/super.c:6257
 v2_read_header+0x78/0x110 fs/quota/quota_v2.c:63
 v2_check_quota_file+0x76/0x230 fs/quota/quota_v2.c:82
 vfs_load_quota_inode+0x5d1/0x1530 fs/quota/dquot.c:2368
 dquot_enable+0x28a/0x330 fs/quota/dquot.c:2490
 ext4_quota_enable fs/ext4/super.c:6137 [inline]
 ext4_enable_quotas+0x5d7/0x960 fs/ext4/super.c:6163
 ext4_fill_super+0xa7c9/0xdc00 fs/ext4/super.c:4754
 mount_bdev+0x2e9/0x3b0 fs/super.c:1158
 mount_fs+0x4b/0x1e4 fs/super.c:1261
[...]
==================================================================

Above issue may happen as follows:
-------------------------------------
ext4_fill_super
 ext4_enable_quotas
  ext4_quota_enable
   ext4_iget
    __ext4_iget
     ext4_ext_check_inode
      ext4_ext_check
       __ext4_ext_check
        ext4_valid_extent_entries
         Check for overlapping extents does't take effect
   dquot_enable
    vfs_load_quota_inode
     v2_check_quota_file
      v2_read_header
       ext4_quota_read
        ext4_bread
         ext4_getblk
          ext4_map_blocks
           ext4_ext_map_blocks
            ext4_find_extent
             ext4_cache_extents
              ext4_es_cache_extent
               ext4_es_cache_extent
                __es_tree_search
                 ext4_es_end
                  BUG_ON(es->es_lblk + es->es_len < es->es_lblk)

The error ext4 extents is as follows:
0af3 0300 0400 0000 00000000    extent_header
00000000 0100 0000 12000000     extent1
00000000 0100 0000 18000000     extent2
02000000 0400 0000 14000000     extent3

In the ext4_valid_extent_entries function,
if prev is 0, no error is returned even if lblock<=prev.
This was intended to skip the check on the first extent, but
in the error image above, prev=0+1-1=0 when checking the second extent,
so even though lblock<=prev, the function does not return an error.
As a result, bug_ON occurs in __es_tree_search and the system panics.

To solve this problem, we only need to check that:
1. The lblock of the first extent is not less than 0.
2. The lblock of the next extent  is not less than
   the next block of the previous extent.
The same applies to extent_idx.

Cc: stable@kernel.org
Fixes: 5946d08937 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220518120816.1541863-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-24 15:34:17 -04:00
Jan Kara
3ba733f879 ext4: avoid cycles in directory h-tree
A maliciously corrupted filesystem can contain cycles in the h-tree
stored inside a directory. That can easily lead to the kernel corrupting
tree nodes that were already verified under its hands while doing a node
split and consequently accessing unallocated memory. Fix the problem by
verifying traversed block numbers are unique.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220518093332.13986-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-24 15:34:13 -04:00
Jan Kara
46c116b920 ext4: verify dir block before splitting it
Before splitting a directory block verify its directory entries are sane
so that the splitting code does not access memory it should not.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220518093332.13986-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-24 15:34:08 -04:00
Theodore Ts'o
c878bea3c9 ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state
The EXT4_FC_REPLAY bit in sbi->s_mount_state is used to indicate that
we are in the middle of replay the fast commit journal.  This was
actually a mistake, since the sbi->s_mount_info is initialized from
es->s_state.  Arguably s_mount_state is misleadingly named, but the
name is historical --- s_mount_state and s_state dates back to ext2.

What should have been used is the ext4_{set,clear,test}_mount_flag()
inline functions, which sets EXT4_MF_* bits in sbi->s_mount_flags.

The problem with using EXT4_FC_REPLAY is that a maliciously corrupted
superblock could result in EXT4_FC_REPLAY getting set in
s_mount_state.  This bypasses some sanity checks, and this can trigger
a BUG() in ext4_es_cache_extent().  As a easy-to-backport-fix, filter
out the EXT4_FC_REPLAY bit for now.  We should eventually transition
away from EXT4_FC_REPLAY to something like EXT4_MF_REPLAY.

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20220420192312.1655305-1-phind.uet@gmail.com
Link: https://lore.kernel.org/r/20220517174028.942119-1-tytso@mit.edu
Reported-by: syzbot+c7358a3cd05ee786eb31@syzkaller.appspotmail.com
2022-05-24 15:33:58 -04:00
Ronnie Sahlberg
d87c48ce4d cifs: cache the dirents for entries in a cached directory
This adds caching of the directory entries for a cached directory while we keep
a lease on the directory.

Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-24 14:33:11 -05:00
Bob Peterson
c360abbb9d gfs2: Convert function bh_get to use iomap
Before this patch, function bh_get used block_map to figure out the
block it needed to read in from the quota_change file. This patch
changes it to use iomap directly to make it more efficient.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-24 21:29:14 +02:00
Bob Peterson
5fcff61eea gfs2: use i_lock spin_lock for inode qadata
Before this patch, functions gfs2_qa_get and _put used the i_rw_mutex to
prevent simultaneous access to its i_qadata. But i_rw_mutex is now used
for many other things, including iomap_begin and end, which causes a
conflict according to lockdep. We cannot just remove the lock since
simultaneous opens (gfs2_open -> gfs2_open_common -> gfs2_qa_get) can
then stomp on each others values for i_qadata.

This patch solves the conflict by using the i_lock spin_lock in the inode
to prevent simultaneous access.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-24 21:29:14 +02:00
Andrew Price
f4a47561fc gfs2: Return more useful errors from gfs2_rgrp_send_discards()
The bug that 27ca8273f ("gfs2: Make sure FITRIM minlen is rounded up to
fs block size") fixes was a little confusing as the user saw
"Input/output error" which masked the -EINVAL that sb_issue_discard()
returned.

sb_issue_discard() can fail for various reasons, so we should return its
return value from gfs2_rgrp_send_discards() to avoid all errors being
reported as IO errors.

This improves error reporting for FITRIM and makes no difference to the
-o discard code path because the return value from
gfs2_rgrp_send_discards() gets thrown away in that case (and the option
switches off). Presumably that's why it was ok to just return -EIO in
the past, before FITRIM was implemented.

Tested with xfstests.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-24 21:29:14 +02:00
Kees Cook
11d8b79e84 gfs2: Use container_of() for gfs2_glock(aspace)
Clang's structure layout randomization feature gets upset when it sees
struct address_space (which is randomized) cast to struct gfs2_glock.
This is due to seeing the mapping pointer as being treated as an array
of gfs2_glock, rather than "something else, before struct address_space":

In file included from fs/gfs2/acl.c:23:
fs/gfs2/meta_io.h:44:12: error: casting from randomized structure pointer type 'struct address_space *' to 'struct gfs2_glock *'
	return (((struct gfs2_glock *)mapping) - 1)->gl_name.ln_sbd;
		^

Replace the instances of open-coded pointer math with container_of()
usage, and update the allocator to match.

Some cleanups and conversion of gfs2_glock_get() and
gfs2_glock_dealloc() by Andreas.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/202205041550.naKxwCBj-lkp@intel.com
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bill Wendling <morbo@google.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-24 21:29:14 +02:00
Andreas Gruenbacher
53bb540fd5 gfs2: Explain some direct I/O oddities
Add some comments explaining the oddities of partial direct I/O reads
and writes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-24 21:29:14 +02:00
Linus Torvalds
51518aa68c Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
 "A couple small cleanups for fs/verity/"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fs-verity: Use struct_size() helper in enable_verity()
  fs-verity: remove unused parameter desc_size in fsverity_create_info()
2022-05-24 12:22:56 -07:00
Linus Torvalds
c1f4cfdbef Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
 "Some cleanups for fs/crypto/:

   - Split up the misleadingly-named FS_CRYPTO_BLOCK_SIZE constant.

   - Consistently report the encryption implementation that is being
     used.

   - Add helper functions for the test_dummy_encryption mount option
     that work properly with the new mount API. ext4 and f2fs will use
     these"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: add new helper functions for test_dummy_encryption
  fscrypt: factor out fscrypt_policy_to_key_spec()
  fscrypt: log when starting to use inline encryption
  fscrypt: split up FS_CRYPTO_BLOCK_SIZE
2022-05-24 12:17:45 -07:00
Shyam Prasad N
5752bf645f cifs: avoid parallel session setups on same channel
After allowing channels to reconnect in parallel, it now
becomes important to take care that multiple processes do not
call negotiate/session setup in parallel on the same channel.

This change avoids that by marking a channel as "in_reconnect".
During session setup if the channel in question has this flag
set, we return immediately.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-24 14:16:32 -05:00
Shyam Prasad N
dd3cd8709e cifs: use new enum for ses_status
ses->status today shares statusEnum with server->tcpStatus.
This has been confusing, and tcon->status has deviated to use
a new enum. Follow suit and use new enum for ses_status as well.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-24 14:11:17 -05:00