Commit Graph

77928 Commits

Author SHA1 Message Date
Christian Brauner
81a1807d80 attr: fix kernel doc
When building kernel documentation new warnings were generated because
the name in the parameter documentation didn't match the parameter name.

Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-27 16:08:31 +02:00
Keith Busch
bf8d08532b iomap: add support for dma aligned direct-io
Use the address alignment requirements from the block_device for direct
io instead of requiring addresses be aligned to the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220610195830.3574005-12-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27 06:29:11 -06:00
Darrick J. Wong
f94e08b602 xfs: clean up the end of xfs_attri_item_recover
The end of this function could use some cleanup -- the EAGAIN
conditionals make it harder to figure out what's going on with the
disposal of xattri_leaf_bp, and the dual error/ret variables aren't
needed.  Turn the EAGAIN case into a separate block documenting all the
subtleties of recovering in the middle of an xattr update chain, which
makes the rest of the prologue much simpler.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-06-26 14:43:28 -07:00
Darrick J. Wong
b822ea17fd xfs: always free xattri_leaf_bp when cancelling a deferred op
While running the following fstest with logged xattrs DISabled, I
noticed the following:

# FSSTRESS_AVOID="-z -f unlink=1 -f rmdir=1 -f creat=2 -f mkdir=2 -f
getfattr=3 -f listfattr=3 -f attr_remove=4 -f removefattr=4 -f
setfattr=20 -f attr_set=60" ./check generic/475

INFO: task u9:1:40 blocked for more than 61 seconds.
      Tainted: G           O      5.19.0-rc2-djwx #rc2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:u9:1            state:D stack:12872 pid:   40 ppid:     2 flags:0x00004000
Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs]
Call Trace:
 <TASK>
 __schedule+0x2db/0x1110
 schedule+0x58/0xc0
 schedule_timeout+0x115/0x160
 __down_common+0x126/0x210
 down+0x54/0x70
 xfs_buf_lock+0x2d/0xe0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
 xfs_buf_item_unpin+0x227/0x3a0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
 xfs_trans_committed_bulk+0x18e/0x320 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
 xlog_cil_committed+0x2ea/0x360 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
 xlog_cil_push_work+0x60f/0x690 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
 process_one_work+0x1df/0x3c0
 worker_thread+0x53/0x3b0
 kthread+0xea/0x110
 ret_from_fork+0x1f/0x30
 </TASK>

This appears to be the result of shortform_to_leaf creating a new leaf
buffer as part of adding an xattr to a file.  The new leaf buffer is
held and attached to the xfs_attr_intent structure, but then the
filesystem shuts down.  Instead of the usual path (which adds the attr
to the held leaf buffer which releases the hold), we instead cancel the
entire deferred operation.

Unfortunately, xfs_attr_cancel_item doesn't release any attached leaf
buffers, so we leak the locked buffer.  The CIL cannot do anything
about that, and hangs.  Fix this by teaching it to release leaf buffers,
and make XFS a little more careful about not leaving a dangling
reference.

The prologue of xfs_attri_item_recover is (in this author's opinion) a
little hard to figure out, so I'll clean that up in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-06-26 14:43:28 -07:00
Kaixu Xia
82af880639 xfs: use invalidate_lock to check the state of mmap_lock
We should use invalidate_lock and XFS_MMAPLOCK_SHARED to check the state
of mmap_lock rw_semaphore in xfs_isilocked(), rather than i_rwsem and
XFS_IOLOCK_SHARED.

Fixes: 2433480a7e ("xfs: Convert to use invalidate_lock")
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-06-26 14:43:28 -07:00
Kaixu Xia
ca76a761ea xfs: factor out the common lock flags assert
There are similar lock flags assert in xfs_ilock(), xfs_ilock_nowait(),
xfs_iunlock(), thus we can factor it out into a helper that is clear.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-06-26 14:43:28 -07:00
Linus Torvalds
413c1f1491 Merge tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
 "Minor things, mainly - mailmap updates, MAINTAINERS updates, etc.

  Fixes for this merge window:

   - fix for a damon boot hang, from SeongJae

   - fix for a kfence warning splat, from Jason Donenfeld

   - fix for zero-pfn pinning, from Alex Williamson

   - fix for fallocate hole punch clearing, from Mike Kravetz

  Fixes for previous releases:

   - fix for a performance regression, from Marcelo

   - fix for a hwpoisining BUG from zhenwei pi"

* tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: add entry for Christian Marangi
  mm/memory-failure: disable unpoison once hw error happens
  hugetlbfs: zero partial pages during fallocate hole punch
  mm: memcontrol: reference to tools/cgroup/memcg_slabinfo.py
  mm: re-allow pinning of zero pfns
  mm/kfence: select random number before taking raw lock
  MAINTAINERS: add maillist information for LoongArch
  MAINTAINERS: update MM tree references
  MAINTAINERS: update Abel Vesa's email
  MAINTAINERS: add MEMORY HOT(UN)PLUG section and add David as reviewer
  MAINTAINERS: add Miaohe Lin as a memory-failure reviewer
  mailmap: add alias for jarkko@profian.com
  mm/damon/reclaim: schedule 'damon_reclaim_timer' only after 'system_wq' is initialized
  kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal
  mm: lru_cache_disable: use synchronize_rcu_expedited
  mm/page_isolation.c: fix one kernel-doc comment
2022-06-26 14:00:55 -07:00
Linus Torvalds
82708bb1eb Merge tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - zoned relocation fixes:
      - fix critical section end for extent writeback, this could lead
        to out of order write
      - prevent writing to previous data relocation block group if space
        gets low

 - reflink fixes:
      - fix race between reflinking and ordered extent completion
      - proper error handling when block reserve migration fails
      - add missing inode iversion/mtime/ctime updates on each iteration
        when replacing extents

 - fix deadlock when running fsync/fiemap/commit at the same time

 - fix false-positive KCSAN report regarding pid tracking for read locks
   and data race

 - minor documentation update and link to new site

* tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Documentation: update btrfs list of features and link to readthedocs.io
  btrfs: fix deadlock with fsync+fiemap+transaction commit
  btrfs: don't set lock_owner when locking extent buffer for reading
  btrfs: zoned: fix critical section of relocation inode writeback
  btrfs: zoned: prevent allocation from previous data relocation BG
  btrfs: do not BUG_ON() on failure to migrate space when replacing extents
  btrfs: add missing inode updates on each iteration when replacing extents
  btrfs: fix race between reflinking and ordered extent completion
2022-06-26 10:11:36 -07:00
Christian Brauner
b27c82e129 attr: port attribute changes to new types
Now that we introduced new infrastructure to increase the type safety
for filesystems supporting idmapped mounts port the first part of the
vfs over to them.

This ports the attribute changes codepaths to rely on the new better
helpers using a dedicated type.

Before this change we used to take a shortcut and place the actual
values that would be written to inode->i_{g,u}id into struct iattr. This
had the advantage that we moved idmappings mostly out of the picture
early on but it made reasoning about changes more difficult than it
should be.

The filesystem was never explicitly told that it dealt with an idmapped
mount. The transition to the value that needed to be stored in
inode->i_{g,u}id appeared way too early and increased the probability of
bugs in various codepaths.

We know place the same value in struct iattr no matter if this is an
idmapped mount or not. The vfs will only deal with type safe
vfs{g,u}id_t. This makes it massively safer to perform permission checks
as the type will tell us what checks we need to perform and what helpers
we need to use.

Fileystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to
inode->i_{g,u}id since they are different types. Instead they need to
use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the
vfs{g,u}id into the filesystem.

The other nice effect is that filesystems like overlayfs don't need to
care about idmappings explicitly anymore and can simply set up struct
iattr accordingly directly.

Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1]
Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-26 18:18:56 +02:00
Christian Brauner
0e363cf3fa security: pass down mount idmapping to setattr hook
Before this change we used to take a shortcut and place the actual
values that would be written to inode->i_{g,u}id into struct iattr. This
had the advantage that we moved idmappings mostly out of the picture
early on but it made reasoning about changes more difficult than it
should be.

The filesystem was never explicitly told that it dealt with an idmapped
mount. The transition to the value that needed to be stored in
inode->i_{g,u}id appeared way too early and increased the probability of
bugs in various codepaths.

We know place the same value in struct iattr no matter if this is an
idmapped mount or not. The vfs will only deal with type safe
vfs{g,u}id_t. This makes it massively safer to perform permission checks
as the type will tell us what checks we need to perform and what helpers
we need to use.

Adapt the security_inode_setattr() helper to pass down the mount's
idmapping to account for that change.

Link: https://lore.kernel.org/r/20220621141454.2914719-8-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-26 18:18:56 +02:00
Christian Brauner
71e7b535b8 quota: port quota helpers mount ids
Port the is_quota_modification() and dqout_transfer() helper to type
safe vfs{g,u}id_t. Since these helpers are only called by a few
filesystems don't introduce a new helper but simply extend the existing
helpers to pass down the mount's idmapping.

Note, that this is a non-functional change, i.e. nothing will have
happened here or at the end of this series to how quota are done! This
a change necessary because we will at the end of this series make
ownership changes easier to reason about by keeping the original value
in struct iattr for both non-idmapped and idmapped mounts.

For now we always pass the initial idmapping which makes the idmapping
functions these helpers call nops.

This is done because we currently always pass the actual value to be
written to i_{g,u}id via struct iattr. While this allowed us to treat
the {g,u}id values in struct iattr as values that can be directly
written to inode->i_{g,u}id it also increases the potential for
confusion for filesystems.

Now that we are have dedicated types to prevent this confusion we will
ultimately only map the value from the idmapped mount into a filesystem
value that can be written to inode->i_{g,u}id when the filesystem
actually updates the inode. So pass down the initial idmapping until we
finished that conversion at which point we pass down the mount's
idmapping.

Since struct iattr uses an anonymous union with overlapping types as
supported by the C standard, filesystems that haven't converted to
ia_vfs{g,u}id won't see any difference and things will continue to work
as before. In other words, no functional changes intended with this
change.

Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-26 18:18:55 +02:00
Christian Brauner
35faf3109a fs: port to iattr ownership update helpers
Earlier we introduced new helpers to abstract ownership update and
remove code duplication. This converts all filesystems supporting
idmapped mounts to make use of these new helpers.

For now we always pass the initial idmapping which makes the idmapping
functions these helpers call nops.

This is done because we currently always pass the actual value to be
written to i_{g,u}id via struct iattr. While this allowed us to treat
the {g,u}id values in struct iattr as values that can be directly
written to inode->i_{g,u}id it also increases the potential for
confusion for filesystems.

Now that we are have dedicated types to prevent this confusion we will
ultimately only map the value from the idmapped mount into a filesystem
value that can be written to inode->i_{g,u}id when the filesystem
actually updates the inode. So pass down the initial idmapping until we
finished that conversion at which point we pass down the mount's
idmapping.

No functional changes intended.

Link: https://lore.kernel.org/r/20220621141454.2914719-6-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-26 18:18:55 +02:00
Linus Torvalds
97d4d02697 Merge tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fix from Namjae Jeon:

 - Use updated exfat_chain directly instead of snapshot values in
   rename.

* tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: use updated exfat_chain directly during renaming
2022-06-26 08:41:04 -07:00
Linus Torvalds
918c30dffd Merge tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
 "Fixes addressing important multichannel, and reconnect issues.

  Multichannel mounts when the server network interfaces changed, or ip
  addresses changed, uncovered problems, especially in reconnect, but
  the patches for this were held up until recently due to some lock
  conflicts that are now addressed.

  Included in this set of fixes:

   - three fixes relating to multichannel reconnect, dynamically
     adjusting the list of server interfaces to avoid problems during
     reconnect

   - a lock conflict fix related to the above

   - two important fixes for negotiate on secondary channels (null
     netname can unintentionally cause multichannel to be disabled to
     some servers)

   - a reconnect fix (reporting incorrect IP address in some cases)"

* tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update cifs_ses::ip_addr after failover
  cifs: avoid deadlocks while updating iface
  cifs: periodically query network interfaces from server
  cifs: during reconnect, update interface if necessary
  cifs: change iface_list from array to sorted linked list
  smb3: use netname when available on secondary channels
  smb3: fix empty netname context on secondary channels
2022-06-26 08:34:52 -07:00
Jason A. Donenfeld
067baa9a37 ksmbd: use vfs_llseek instead of dereferencing NULL
By not checking whether llseek is NULL, this might jump to NULL. Also,
it doesn't check FMODE_LSEEK. Fix this by using vfs_llseek(), which
always does the right thing.

Fixes: f441584858 ("cifsd: add file operations")
Cc: stable@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-25 19:52:49 -05:00
Eric Biggers
c5bca38d2e f2fs: use the updated test_dummy_encryption helper functions
Switch f2fs over to the functions that are replacing
fscrypt_set_test_dummy_encryption().  Since f2fs hasn't been converted
to the new mount API yet, this doesn't really provide a benefit for
f2fs.  But it allows fscrypt_set_test_dummy_encryption() to be removed.

Also take the opportunity to eliminate an #ifdef.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-06-25 12:11:56 -07:00
Linus Torvalds
29eeafc661 Merge tag 'f2fs-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
 "Some urgent fixes to avoid generating corrupted inodes caused by
  compressed and inline_data files.

  In addition, avoid a wrong error report which prevents a roll-forward
  recovery"

* tag 'f2fs-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: do not count ENOENT for error case
  f2fs: fix iostat related lock protection
  f2fs: attach inline_data after setting compression
2022-06-25 09:19:51 -07:00
Ard Biesheuvel
2d82e6227e efi: vars: Move efivar caching layer into efivarfs
Move the fiddly bits of the efivar layer into its only remaining user,
efivarfs, and confine its use to that particular module. All other uses
of the EFI variable store have no need for this additional layer of
complexity, given that they either only read variables, or read and
write variables into a separate GUIDed namespace, and cannot be used to
manipulate EFI variables that are covered by the EFI spec and/or affect
the boot flow.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-24 20:40:19 +02:00
Ard Biesheuvel
3a75f9f2f9 efi: vars: Use locking version to iterate over efivars linked lists
Both efivars and efivarfs uses __efivar_entry_iter() to go over the
linked list that shadows the list of EFI variables held by the firmware,
but fail to call the begin/end helpers that are documented as a
prerequisite.

So switch to the proper version, which is efivar_entry_iter(). Given
that in both cases, efivar_entry_remove() is invoked with the lock held
already, don't take the lock there anymore.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-24 20:40:19 +02:00
Ard Biesheuvel
ec3507b2ca efi: vars: Don't drop lock in the middle of efivar_init()
Even though the efivars_lock lock is documented as protecting the
efivars->ops pointer (among other things), efivar_init() happily
releases and reacquires the lock for every EFI variable that it
enumerates. This used to be needed because the lock was originally a
spinlock, which prevented the callback that is invoked for every
variable from being able to sleep. However, releasing the lock could
potentially invalidate the ops pointer, but more importantly, it might
allow a SetVariable() runtime service call to take place concurrently,
and the UEFI spec does not define how this affects an enumeration that
is running in parallel using the GetNextVariable() runtime service,
which is what efivar_init() uses.

In the meantime, the lock has been converted into a semaphore, and the
only reason we need to drop the lock is because the efivarfs pseudo
filesystem driver will otherwise deadlock when it invokes the efivars
API from the callback to create the efivar_entry items and insert them
into the linked list. (EFI pstore is affected in a similar way)

So let's switch to helpers that can be used while the lock is already
taken. This way, we can hold on to the lock throughout the enumeration.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-24 20:40:18 +02:00
Ard Biesheuvel
8ca869b245 pstore: Add priv field to pstore_record for backend specific use
The EFI pstore backend will need to store per-record variable name data
when we switch away from the efivars layer. Add a priv field to struct
pstore_record, and document it as holding a backend specific pointer
that is assumed to be a kmalloc()d buffer, and will be kfree()d when the
entire record is freed.

Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-24 20:40:04 +02:00
Paulo Alcantara
af3a6d1018 cifs: update cifs_ses::ip_addr after failover
cifs_ses::ip_addr wasn't being updated in cifs_session_setup() when
reconnecting SMB sessions thus returning wrong value in
/proc/fs/cifs/DebugData.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-24 13:34:28 -05:00
Linus Torvalds
598f240487 Merge tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "A few fixes that should go into the 5.19 release. All are fixing
  issues that either happened in this release, or going to stable.

  In detail:

   - A small series of fixlets for the poll handling, all destined for
     stable (Pavel)

   - Fix a merge error from myself that caused a potential -EINVAL for
     the recv/recvmsg flag setting (me)

   - Fix a kbuf recycling issue for partial IO (me)

   - Use the original request for the inflight tracking (me)

   - Fix an issue introduced this merge window with trace points using a
     custom decoder function, which won't work for perf (Dylan)"

* tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block:
  io_uring: use original request task for inflight tracking
  io_uring: move io_uring_get_opcode out of TP_printk
  io_uring: fix double poll leak on repolling
  io_uring: fix wrong arm_poll error handling
  io_uring: fail links when poll fails
  io_uring: fix req->apoll_events
  io_uring: fix merge error in checking send/recv addr2 flags
  io_uring: mark reissue requests with REQ_F_PARTIAL_IO
2022-06-24 11:02:26 -07:00
Alexander Aring
8d614a4457 fs: dlm: remove timeout from dlm_user_adopt_orphan
Remove the unused timeout parameter from dlm_user_adopt_orphan().

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:53 -05:00
Alexander Aring
2bb2a3d66c fs: dlm: remove waiter warnings
This patch removes warning messages that could be logged when
remote requests had been waiting on a reply message for some timeout
period (which could be set through configfs, but was rarely enabled.)
The improved midcomms layer now carefully tracks all messages and
replies, and logs much more useful messages if there is an actual
problem.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:52 -05:00
Alexander Aring
dfc020f334 fs: dlm: fix grammar in lowcomms output
This patch fixes some grammar output in lowcomms implementation by
removing the "successful" word which should be "successfully" but it
can never be unsuccessfully so we remove it.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:50 -05:00
Alexander Aring
f10da927a5 fs: dlm: add comment about lkb IFL flags
This patch adds comments about the difference between the lower 2 bytes
of lkb flags and the 2 upper bytes of the lkb IFL flags. In short the
upper 2 bytes will be handled as internal flags whereas the lower 2
bytes are part of the DLM protocol and are used to exchange messages.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:49 -05:00
Alexander Aring
3182599f5f fs: dlm: handle recovery result outside of ls_recover
This patch cleans up the handling of recovery results by moving it from
ls_recover() to the caller do_ls_recovery(). This makes the error handling
clearer.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:48 -05:00
Alexander Aring
682bb91b6b fs: dlm: make new_lockspace() wait until recovery completes
Make dlm_new_lockspace() wait until a full recovery completes
sucessfully or fails. Previously, dlm_new_lockspace() returned
to the caller after dlm_recover_members() finished, which is
only partially through recovery.  The result of the previous
behavior is that the new lockspace would not be usable for some
time (especially with overlapping recoveries), and some errors
in the later part of recovery could not be returned to the caller.

Kernel callers gfs2 and cluster-md have their own wait handling to
wait for recovery to complete after calling dlm_new_lockspace().
This continues to work, but will be unnecessary.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:47 -05:00
Alexander Aring
7e09b15cfe fs: dlm: call dlm_lsop_recover_prep once
A lockspace can be "stopped" multiple times consecutively before
being "started" (when recoveries overlap.)  In this case, the
lsop_recover_prep callback only needs to be called once when the
lockspace is first stopped, and not repeatedly for each stop.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:46 -05:00
Alexander Aring
ca8031d917 fs: dlm: update comments about recovery and membership handling
Make clear that a particular recovery iteration must not be aborted
before membership changes are applied to the members list (ls_nodes)
and midcomms layer.  Interrupting recovery before this can result
in missing node-specific changes in midcomms or through lsops.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:40 -05:00
Alexander Aring
5d92a30e90 fs: dlm: add resource name to tracepoints
This patch adds the resource name to dlm tracepoints.  The name
usually comes through the lkb_resource, but in some cases a resource
may not yet be associated with an lkb, in which case the name and
namelen parameters are used.

It should be okay to access the lkb_resource and the res_name field at
the time when the tracepoint is invoked. The resource is assigned to a
lkb and it's reference is being held during the tracepoint call. During
this time the resource cannot be freed. Also a lkb will never switch
its assigned resource. The name of a dlm_rsb is assigned at creation
time and should never be changed during runtime as well.

The TP_printk() call uses always a hexadecimal string array
representation for the resource name (which is not necessarily ascii.)

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:53:09 -05:00
Alexander Aring
0c4c516fa2 fs: dlm: remove additional dereference of lksb
This patch removes a dereference of lksb of lkb when calling ast
tracepoint. First it reduces additional overhead, even if traces
are not active. Second we can deference it in TP_fast_assign from
the existing lkb parameter.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:53:08 -05:00
Alexander Aring
cd1e8ca9f3 fs: dlm: change ast and bast trace order
This patch moves the trace calls for ast and bast to before the
ast and bast callback functions are called rather than after.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:53:06 -05:00
Alexander Aring
b92a4e3f86 fs: dlm: change posix lock sigint handling
This patch changes the handling of a plock operation that was interrupted
while waiting for a user space reply from dlm_controld.  (This is not
the lock blocking state, i.e. locks_lock_file_wait().)

Currently, when an op is interrupted while waiting on user space, the
op is removed.  When the user space result later arrives, a kernel
message is loggged: "dev_write no op...".  This can be seen from a test
such as "stress-ng --fcntl 100" and interrupting it with ctrl-c.

Now, leave the op in place when interrupted and remove it when the
result arrives (the result will be ignored.)  With this change, the
logged message is not expected to appear, and would indicate a bug.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:53:05 -05:00
Alexander Aring
4d413ae9ce fs: dlm: use dlm_plock_info for do_unlock_close
This patch refactors do_unlock_close() by using only struct dlm_plock_info
as a parameter.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:53:04 -05:00
Alexander Aring
ea06d4cabf fs: dlm: change plock interrupted message to debug again
This patch reverses the commit bcfad4265c ("dlm: improve plock logging
if interrupted") by moving it to debug level and notifying the user an op
was removed.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:52:36 -05:00
Shyam Prasad N
8da33fd11c cifs: avoid deadlocks while updating iface
We use cifs_tcp_ses_lock to protect a lot of things.
Not only does it protect the lists of connections, sessions,
tree connects, open file lists, etc., we also use it to
protect some fields in each of it's entries.

In this case, cifs_mark_ses_for_reconnect takes the
cifs_tcp_ses_lock to traverse the lists, and then calls
cifs_update_iface. However, that can end up calling
cifs_put_tcp_session, which picks up the same lock again.

Avoid this by taking a ref for the session, drop the lock,
and then call update iface.

Also, in cifs_update_iface, avoid nested locking of iface_lock
and chan_lock, as much as possible. When unavoidable, we need
to pick iface_lock first.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-24 09:17:56 -05:00
Namjae Jeon
b5e5f9dfc9 ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA
FileOffset should not be greater than BeyondFinalZero in FSCTL_ZERO_DATA.
And don't call ksmbd_vfs_zero_data() if length is zero.

Cc: stable@vger.kernel.org
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-23 23:30:46 -05:00
Namjae Jeon
18e39fb960 ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA
generic/091, 263 test failed since commit f66f8b94e7 ("cifs: when
extending a file with falloc we should make files not-sparse").
FSCTL_ZERO_DATA sets the range of bytes to zero without extending file
size. The VFS_FALLOCATE_FL_KEEP_SIZE flag should be used even on
non-sparse files.

Cc: stable@vger.kernel.org
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-23 23:30:46 -05:00
Hyunchul Lee
745bbc0995 ksmbd: remove duplicate flag set in smb2_write
The writethrough flag is set again if is_rdma_channel is false.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-23 23:30:46 -05:00
Zhang Jiaming
fe39dc98fb gfs2: Fix spelling mistake in comment
Change 'accomodate' to 'accommodate'.

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-24 01:49:50 +02:00
Dave Chinner
5e672cd69f xfs: introduce xfs_inodegc_push()
The current blocking mechanism for pushing the inodegc queue out to
disk can result in systems becoming unusable when there is a long
running inodegc operation. This is because the statfs()
implementation currently issues a blocking flush of the inodegc
queue and a significant number of common system utilities will call
statfs() to discover something about the underlying filesystem.

This can result in userspace operations getting stuck on inodegc
progress, and when trying to remove a heavily reflinked file on slow
storage with a full journal, this can result in delays measuring in
hours.

Avoid this problem by adding "push" function that expedites the
flushing of the inodegc queue, but doesn't wait for it to complete.

Convert xfs_fs_statfs() and xfs_qm_scall_getquota() to use this
mechanism so they don't block but still ensure that queued
operations are expedited.

Fixes: ab23a77687 ("xfs: per-cpu deferred inode inactivation queues")
Reported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[djwong: fix _getquota_next to use _inodegc_push too]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-06-23 13:34:38 -07:00
Dave Chinner
7cf2b0f961 xfs: bound maximum wait time for inodegc work
Currently inodegc work can sit queued on the per-cpu queue until
the workqueue is either flushed of the queue reaches a depth that
triggers work queuing (and later throttling). This means that we
could queue work that waits for a long time for some other event to
trigger flushing.

Hence instead of just queueing work at a specific depth, use a
delayed work that queues the work at a bound time. We can still
schedule the work immediately at a given depth, but we no long need
to worry about leaving a number of items on the list that won't get
processed until external events prevail.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-06-23 13:34:38 -07:00
Alexander Aring
19d7ca051d fs: dlm: add pid to debug log
This patch adds the pid information which requested the lock operation
to the debug log output.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-23 14:41:39 -05:00
Alexander Aring
976a062434 fs: dlm: plock use list_first_entry
This patch will use the list helper list_first_entry() instead of using
list_entry() to get the first element of a list.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-23 14:22:10 -05:00
Linus Torvalds
fa1796a835 Merge tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Check for NULL in kretprobe_dispatcher()

   NULL can now be passed in, make sure it can handle it

 - Clean up unneeded #endif #ifdef of the same preprocessor
   check in the middle of the block.

 - Comment clean up

 - Remove unneeded initialization of the "ret" variable in
   __trace_uprobe_create()

* tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/uprobes: Remove unwanted initialization in __trace_uprobe_create()
  tracefs: Fix syntax errors in comments
  tracing: Simplify conditional compilation code in tracing_set_tracer()
  tracing/kprobes: Check whether get_kretprobe() returns NULL in kretprobe_dispatcher()
2022-06-23 12:24:49 -05:00
Jens Axboe
386e4fb696 io_uring: use original request task for inflight tracking
In prior kernels, we did file assignment always at prep time. This meant
that req->task == current. But after deferring that assignment and then
pushing the inflight tracking back in, we've got the inflight tracking
using current when it should in fact now be using req->task.

Fixup that error introduced by adding the inflight tracking back after
file assignments got modifed.

Fixes: 9cae36a094 ("io_uring: reinstate the inflight tracking")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-23 11:06:43 -06:00
Dan Carpenter
2c09d1443b pstore/zone: cleanup "rcnt" type
The info->read() function returns ssize_t.  That means that info->read()
either returns either negative error codes or a positive number
representing the bytes read.

The "rcnt" variable should be declared as ssize_t as well.  Most places
do this correctly but psz_kmsg_recover_meta() needed to be fixed.

This code casts the "rcnt" to int.  That is unnecessary when "rcnt"
is already signed.  It's also slightly wrong because if info->read()
returned a very high (more than INT_MAX) number of bytes then this might
treat that as an error.  This bug cannot happen in real life, so it
doesn't affect run time, but static checkers correctly complain that it
is wrong.

fs/pstore/zone.c:366 psz_kmsg_recover_data() warn: casting 'rcnt' truncates high bits

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/YrRtPSFHDVJzV6d+@kili
2022-06-23 08:27:52 -07:00
Shyam Prasad N
6e1c1c08cd cifs: periodically query network interfaces from server
Currently, we only query the server for network interfaces
information at the time of mount, and never afterwards.
This can be a problem, especially for services like Azure,
where the IP address of the channel endpoints can change
over time.

With this change, we schedule a 600s polling of this info
from the server for each tree connect.

An alternative for periodic polling was to do this only at
the time of reconnect. But this could delay the reconnect
time slightly. Also, there are some challenges w.r.t how
we have cifs_reconnect implemented today.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-22 19:51:43 -05:00