linux

Author	SHA1	Message	Date
Christian Brauner	81a1807d80	attr: fix kernel doc When building kernel documentation new warnings were generated because the name in the parameter documentation didn't match the parameter name. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-27 16:08:31 +02:00
Keith Busch	bf8d08532b	iomap: add support for dma aligned direct-io Use the address alignment requirements from the block_device for direct io instead of requiring addresses be aligned to the block size. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-12-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Darrick J. Wong	f94e08b602	xfs: clean up the end of xfs_attri_item_recover The end of this function could use some cleanup -- the EAGAIN conditionals make it harder to figure out what's going on with the disposal of xattri_leaf_bp, and the dual error/ret variables aren't needed. Turn the EAGAIN case into a separate block documenting all the subtleties of recovering in the middle of an xattr update chain, which makes the rest of the prologue much simpler. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-06-26 14:43:28 -07:00
Darrick J. Wong	b822ea17fd	xfs: always free xattri_leaf_bp when cancelling a deferred op While running the following fstest with logged xattrs DISabled, I noticed the following: # FSSTRESS_AVOID="-z -f unlink=1 -f rmdir=1 -f creat=2 -f mkdir=2 -f getfattr=3 -f listfattr=3 -f attr_remove=4 -f removefattr=4 -f setfattr=20 -f attr_set=60" ./check generic/475 INFO: task u9:1:40 blocked for more than 61 seconds. Tainted: G O 5.19.0-rc2-djwx #rc2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:u9:1 state:D stack:12872 pid: 40 ppid: 2 flags:0x00004000 Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs] Call Trace: <TASK> __schedule+0x2db/0x1110 schedule+0x58/0xc0 schedule_timeout+0x115/0x160 __down_common+0x126/0x210 down+0x54/0x70 xfs_buf_lock+0x2d/0xe0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xfs_buf_item_unpin+0x227/0x3a0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xfs_trans_committed_bulk+0x18e/0x320 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xlog_cil_committed+0x2ea/0x360 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xlog_cil_push_work+0x60f/0x690 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] process_one_work+0x1df/0x3c0 worker_thread+0x53/0x3b0 kthread+0xea/0x110 ret_from_fork+0x1f/0x30 </TASK> This appears to be the result of shortform_to_leaf creating a new leaf buffer as part of adding an xattr to a file. The new leaf buffer is held and attached to the xfs_attr_intent structure, but then the filesystem shuts down. Instead of the usual path (which adds the attr to the held leaf buffer which releases the hold), we instead cancel the entire deferred operation. Unfortunately, xfs_attr_cancel_item doesn't release any attached leaf buffers, so we leak the locked buffer. The CIL cannot do anything about that, and hangs. Fix this by teaching it to release leaf buffers, and make XFS a little more careful about not leaving a dangling reference. 
The prologue of xfs_attri_item_recover is (in this author's opinion) a little hard to figure out, so I'll clean that up in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-06-26 14:43:28 -07:00
Kaixu Xia	82af880639	xfs: use invalidate_lock to check the state of mmap_lock We should use invalidate_lock and XFS_MMAPLOCK_SHARED to check the state of mmap_lock rw_semaphore in xfs_isilocked(), rather than i_rwsem and XFS_IOLOCK_SHARED. Fixes: `2433480a7e` ("xfs: Convert to use invalidate_lock") Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-26 14:43:28 -07:00
Kaixu Xia	ca76a761ea	xfs: factor out the common lock flags assert There are similar lock flags assert in xfs_ilock(), xfs_ilock_nowait(), xfs_iunlock(), thus we can factor it out into a helper that is clear. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-26 14:43:28 -07:00
Linus Torvalds	413c1f1491	Merge tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Minor things, mainly - mailmap updates, MAINTAINERS updates, etc. Fixes for this merge window: - fix for a damon boot hang, from SeongJae - fix for a kfence warning splat, from Jason Donenfeld - fix for zero-pfn pinning, from Alex Williamson - fix for fallocate hole punch clearing, from Mike Kravetz Fixes for previous releases: - fix for a performance regression, from Marcelo - fix for a hwpoisoning BUG from zhenwei pi" * tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: add entry for Christian Marangi mm/memory-failure: disable unpoison once hw error happens hugetlbfs: zero partial pages during fallocate hole punch mm: memcontrol: reference to tools/cgroup/memcg_slabinfo.py mm: re-allow pinning of zero pfns mm/kfence: select random number before taking raw lock MAINTAINERS: add maillist information for LoongArch MAINTAINERS: update MM tree references MAINTAINERS: update Abel Vesa's email MAINTAINERS: add MEMORY HOT(UN)PLUG section and add David as reviewer MAINTAINERS: add Miaohe Lin as a memory-failure reviewer mailmap: add alias for jarkko@profian.com mm/damon/reclaim: schedule 'damon_reclaim_timer' only after 'system_wq' is initialized kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal mm: lru_cache_disable: use synchronize_rcu_expedited mm/page_isolation.c: fix one kernel-doc comment	2022-06-26 14:00:55 -07:00
Linus Torvalds	82708bb1eb	Merge tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - zoned relocation fixes: - fix critical section end for extent writeback, this could lead to out of order write - prevent writing to previous data relocation block group if space gets low - reflink fixes: - fix race between reflinking and ordered extent completion - proper error handling when block reserve migration fails - add missing inode iversion/mtime/ctime updates on each iteration when replacing extents - fix deadlock when running fsync/fiemap/commit at the same time - fix false-positive KCSAN report regarding pid tracking for read locks and data race - minor documentation update and link to new site * tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Documentation: update btrfs list of features and link to readthedocs.io btrfs: fix deadlock with fsync+fiemap+transaction commit btrfs: don't set lock_owner when locking extent buffer for reading btrfs: zoned: fix critical section of relocation inode writeback btrfs: zoned: prevent allocation from previous data relocation BG btrfs: do not BUG_ON() on failure to migrate space when replacing extents btrfs: add missing inode updates on each iteration when replacing extents btrfs: fix race between reflinking and ordered extent completion	2022-06-26 10:11:36 -07:00
Christian Brauner	b27c82e129	attr: port attribute changes to new types Now that we introduced new infrastructure to increase the type safety for filesystems supporting idmapped mounts port the first part of the vfs over to them. This ports the attribute changes codepaths to rely on the new better helpers using a dedicated type. Before this change we used to take a shortcut and place the actual values that would be written to inode->i_{g,u}id into struct iattr. This had the advantage that we moved idmappings mostly out of the picture early on but it made reasoning about changes more difficult than it should be. The filesystem was never explicitly told that it dealt with an idmapped mount. The transition to the value that needed to be stored in inode->i_{g,u}id appeared way too early and increased the probability of bugs in various codepaths. We now place the same value in struct iattr no matter if this is an idmapped mount or not. The vfs will only deal with type safe vfs{g,u}id_t. This makes it massively safer to perform permission checks as the type will tell us what checks we need to perform and what helpers we need to use. Filesystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to inode->i_{g,u}id since they are different types. Instead they need to use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the vfs{g,u}id into the filesystem. The other nice effect is that filesystems like overlayfs don't need to care about idmappings explicitly anymore and can simply set up struct iattr accordingly directly.
Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1] Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-26 18:18:56 +02:00
Christian Brauner	0e363cf3fa	security: pass down mount idmapping to setattr hook Before this change we used to take a shortcut and place the actual values that would be written to inode->i_{g,u}id into struct iattr. This had the advantage that we moved idmappings mostly out of the picture early on but it made reasoning about changes more difficult than it should be. The filesystem was never explicitly told that it dealt with an idmapped mount. The transition to the value that needed to be stored in inode->i_{g,u}id appeared way too early and increased the probability of bugs in various codepaths. We now place the same value in struct iattr no matter if this is an idmapped mount or not. The vfs will only deal with type safe vfs{g,u}id_t. This makes it massively safer to perform permission checks as the type will tell us what checks we need to perform and what helpers we need to use. Adapt the security_inode_setattr() helper to pass down the mount's idmapping to account for that change. Link: https://lore.kernel.org/r/20220621141454.2914719-8-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-26 18:18:56 +02:00
Christian Brauner	71e7b535b8	quota: port quota helpers mount ids Port the is_quota_modification() and dquot_transfer() helpers to type safe vfs{g,u}id_t. Since these helpers are only called by a few filesystems don't introduce a new helper but simply extend the existing helpers to pass down the mount's idmapping. Note, that this is a non-functional change, i.e. nothing will have happened here or at the end of this series to how quota are done! This change is necessary because we will at the end of this series make ownership changes easier to reason about by keeping the original value in struct iattr for both non-idmapped and idmapped mounts. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping. Since struct iattr uses an anonymous union with overlapping types as supported by the C standard, filesystems that haven't converted to ia_vfs{g,u}id won't see any difference and things will continue to work as before. In other words, no functional changes intended with this change.
Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-26 18:18:55 +02:00
Christian Brauner	35faf3109a	fs: port to iattr ownership update helpers Earlier we introduced new helpers to abstract ownership update and remove code duplication. This converts all filesystems supporting idmapped mounts to make use of these new helpers. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping. No functional changes intended. Link: https://lore.kernel.org/r/20220621141454.2914719-6-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-26 18:18:55 +02:00
Linus Torvalds	97d4d02697	Merge tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fix from Namjae Jeon: - Use updated exfat_chain directly instead of snapshot values in rename. * tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: use updated exfat_chain directly during renaming	2022-06-26 08:41:04 -07:00
Linus Torvalds	918c30dffd	Merge tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs client fixes from Steve French: "Fixes addressing important multichannel, and reconnect issues. Multichannel mounts when the server network interfaces changed, or ip addresses changed, uncovered problems, especially in reconnect, but the patches for this were held up until recently due to some lock conflicts that are now addressed. Included in this set of fixes: - three fixes relating to multichannel reconnect, dynamically adjusting the list of server interfaces to avoid problems during reconnect - a lock conflict fix related to the above - two important fixes for negotiate on secondary channels (null netname can unintentionally cause multichannel to be disabled to some servers) - a reconnect fix (reporting incorrect IP address in some cases)" * tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update cifs_ses::ip_addr after failover cifs: avoid deadlocks while updating iface cifs: periodically query network interfaces from server cifs: during reconnect, update interface if necessary cifs: change iface_list from array to sorted linked list smb3: use netname when available on secondary channels smb3: fix empty netname context on secondary channels	2022-06-26 08:34:52 -07:00
Jason A. Donenfeld	067baa9a37	ksmbd: use vfs_llseek instead of dereferencing NULL By not checking whether llseek is NULL, this might jump to NULL. Also, it doesn't check FMODE_LSEEK. Fix this by using vfs_llseek(), which always does the right thing. Fixes: `f441584858` ("cifsd: add file operations") Cc: stable@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: Ronnie Sahlberg <lsahlber@redhat.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-25 19:52:49 -05:00
Eric Biggers	c5bca38d2e	f2fs: use the updated test_dummy_encryption helper functions Switch f2fs over to the functions that are replacing fscrypt_set_test_dummy_encryption(). Since f2fs hasn't been converted to the new mount API yet, this doesn't really provide a benefit for f2fs. But it allows fscrypt_set_test_dummy_encryption() to be removed. Also take the opportunity to eliminate an #ifdef. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-06-25 12:11:56 -07:00
Linus Torvalds	29eeafc661	Merge tag 'f2fs-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Some urgent fixes to avoid generating corrupted inodes caused by compressed and inline_data files. In addition, avoid a wrong error report which prevents a roll-forward recovery" * tag 'f2fs-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: do not count ENOENT for error case f2fs: fix iostat related lock protection f2fs: attach inline_data after setting compression	2022-06-25 09:19:51 -07:00
Ard Biesheuvel	2d82e6227e	efi: vars: Move efivar caching layer into efivarfs Move the fiddly bits of the efivar layer into its only remaining user, efivarfs, and confine its use to that particular module. All other uses of the EFI variable store have no need for this additional layer of complexity, given that they either only read variables, or read and write variables into a separate GUIDed namespace, and cannot be used to manipulate EFI variables that are covered by the EFI spec and/or affect the boot flow. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-06-24 20:40:19 +02:00
Ard Biesheuvel	3a75f9f2f9	efi: vars: Use locking version to iterate over efivars linked lists Both efivars and efivarfs use __efivar_entry_iter() to go over the linked list that shadows the list of EFI variables held by the firmware, but fail to call the begin/end helpers that are documented as a prerequisite. So switch to the proper version, which is efivar_entry_iter(). Given that in both cases, efivar_entry_remove() is invoked with the lock held already, don't take the lock there anymore. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-06-24 20:40:19 +02:00
Ard Biesheuvel	ec3507b2ca	efi: vars: Don't drop lock in the middle of efivar_init() Even though the efivars_lock lock is documented as protecting the efivars->ops pointer (among other things), efivar_init() happily releases and reacquires the lock for every EFI variable that it enumerates. This used to be needed because the lock was originally a spinlock, which prevented the callback that is invoked for every variable from being able to sleep. However, releasing the lock could potentially invalidate the ops pointer, but more importantly, it might allow a SetVariable() runtime service call to take place concurrently, and the UEFI spec does not define how this affects an enumeration that is running in parallel using the GetNextVariable() runtime service, which is what efivar_init() uses. In the meantime, the lock has been converted into a semaphore, and the only reason we need to drop the lock is because the efivarfs pseudo filesystem driver will otherwise deadlock when it invokes the efivars API from the callback to create the efivar_entry items and insert them into the linked list. (EFI pstore is affected in a similar way) So let's switch to helpers that can be used while the lock is already taken. This way, we can hold on to the lock throughout the enumeration. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-06-24 20:40:18 +02:00
Ard Biesheuvel	8ca869b245	pstore: Add priv field to pstore_record for backend specific use The EFI pstore backend will need to store per-record variable name data when we switch away from the efivars layer. Add a priv field to struct pstore_record, and document it as holding a backend specific pointer that is assumed to be a kmalloc()d buffer, and will be kfree()d when the entire record is freed. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-06-24 20:40:04 +02:00
Paulo Alcantara	af3a6d1018	cifs: update cifs_ses::ip_addr after failover cifs_ses::ip_addr wasn't being updated in cifs_session_setup() when reconnecting SMB sessions thus returning wrong value in /proc/fs/cifs/DebugData. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-24 13:34:28 -05:00
Linus Torvalds	598f240487	Merge tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "A few fixes that should go into the 5.19 release. All are fixing issues that either happened in this release, or going to stable. In detail: - A small series of fixlets for the poll handling, all destined for stable (Pavel) - Fix a merge error from myself that caused a potential -EINVAL for the recv/recvmsg flag setting (me) - Fix a kbuf recycling issue for partial IO (me) - Use the original request for the inflight tracking (me) - Fix an issue introduced this merge window with trace points using a custom decoder function, which won't work for perf (Dylan)" * tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block: io_uring: use original request task for inflight tracking io_uring: move io_uring_get_opcode out of TP_printk io_uring: fix double poll leak on repolling io_uring: fix wrong arm_poll error handling io_uring: fail links when poll fails io_uring: fix req->apoll_events io_uring: fix merge error in checking send/recv addr2 flags io_uring: mark reissue requests with REQ_F_PARTIAL_IO	2022-06-24 11:02:26 -07:00
Alexander Aring	8d614a4457	fs: dlm: remove timeout from dlm_user_adopt_orphan Remove the unused timeout parameter from dlm_user_adopt_orphan(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:53 -05:00
Alexander Aring	2bb2a3d66c	fs: dlm: remove waiter warnings This patch removes warning messages that could be logged when remote requests had been waiting on a reply message for some timeout period (which could be set through configfs, but was rarely enabled.) The improved midcomms layer now carefully tracks all messages and replies, and logs much more useful messages if there is an actual problem. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:52 -05:00
Alexander Aring	dfc020f334	fs: dlm: fix grammar in lowcomms output This patch fixes some grammar output in lowcomms implementation by removing the "successful" word which should be "successfully" but it can never be unsuccessfully so we remove it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:50 -05:00
Alexander Aring	f10da927a5	fs: dlm: add comment about lkb IFL flags This patch adds comments about the difference between the lower 2 bytes of lkb flags and the 2 upper bytes of the lkb IFL flags. In short the upper 2 bytes will be handled as internal flags whereas the lower 2 bytes are part of the DLM protocol and are used to exchange messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:49 -05:00
Alexander Aring	3182599f5f	fs: dlm: handle recovery result outside of ls_recover This patch cleans up the handling of recovery results by moving it from ls_recover() to the caller do_ls_recovery(). This makes the error handling clearer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:48 -05:00
Alexander Aring	682bb91b6b	fs: dlm: make new_lockspace() wait until recovery completes Make dlm_new_lockspace() wait until a full recovery completes successfully or fails. Previously, dlm_new_lockspace() returned to the caller after dlm_recover_members() finished, which is only partially through recovery. The result of the previous behavior is that the new lockspace would not be usable for some time (especially with overlapping recoveries), and some errors in the later part of recovery could not be returned to the caller. Kernel callers gfs2 and cluster-md have their own wait handling to wait for recovery to complete after calling dlm_new_lockspace(). This continues to work, but will be unnecessary. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:47 -05:00
Alexander Aring	7e09b15cfe	fs: dlm: call dlm_lsop_recover_prep once A lockspace can be "stopped" multiple times consecutively before being "started" (when recoveries overlap.) In this case, the lsop_recover_prep callback only needs to be called once when the lockspace is first stopped, and not repeatedly for each stop. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:46 -05:00
Alexander Aring	ca8031d917	fs: dlm: update comments about recovery and membership handling Make clear that a particular recovery iteration must not be aborted before membership changes are applied to the members list (ls_nodes) and midcomms layer. Interrupting recovery before this can result in missing node-specific changes in midcomms or through lsops. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:40 -05:00
Alexander Aring	5d92a30e90	fs: dlm: add resource name to tracepoints This patch adds the resource name to dlm tracepoints. The name usually comes through the lkb_resource, but in some cases a resource may not yet be associated with an lkb, in which case the name and namelen parameters are used. It should be okay to access the lkb_resource and the res_name field at the time when the tracepoint is invoked. The resource is assigned to a lkb and its reference is being held during the tracepoint call. During this time the resource cannot be freed. Also a lkb will never switch its assigned resource. The name of a dlm_rsb is assigned at creation time and should never be changed during runtime as well. The TP_printk() call always uses a hexadecimal string array representation for the resource name (which is not necessarily ascii.) Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:09 -05:00
Alexander Aring	0c4c516fa2	fs: dlm: remove additional dereference of lksb This patch removes a dereference of lksb of lkb when calling ast tracepoint. First it reduces additional overhead, even if traces are not active. Second we can dereference it in TP_fast_assign from the existing lkb parameter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:08 -05:00
Alexander Aring	cd1e8ca9f3	fs: dlm: change ast and bast trace order This patch moves the trace calls for ast and bast to before the ast and bast callback functions are called rather than after. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:06 -05:00
Alexander Aring	b92a4e3f86	fs: dlm: change posix lock sigint handling This patch changes the handling of a plock operation that was interrupted while waiting for a user space reply from dlm_controld. (This is not the lock blocking state, i.e. locks_lock_file_wait().) Currently, when an op is interrupted while waiting on user space, the op is removed. When the user space result later arrives, a kernel message is logged: "dev_write no op...". This can be seen from a test such as "stress-ng --fcntl 100" and interrupting it with ctrl-c. Now, leave the op in place when interrupted and remove it when the result arrives (the result will be ignored.) With this change, the logged message is not expected to appear, and would indicate a bug. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:05 -05:00
Alexander Aring	4d413ae9ce	fs: dlm: use dlm_plock_info for do_unlock_close This patch refactors do_unlock_close() by using only struct dlm_plock_info as a parameter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:04 -05:00
Alexander Aring	ea06d4cabf	fs: dlm: change plock interrupted message to debug again This patch reverses the commit `bcfad4265c` ("dlm: improve plock logging if interrupted") by moving it to debug level and notifying the user an op was removed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:52:36 -05:00
Shyam Prasad N	8da33fd11c	cifs: avoid deadlocks while updating iface We use cifs_tcp_ses_lock to protect a lot of things. Not only does it protect the lists of connections, sessions, tree connects, open file lists, etc., we also use it to protect some fields in each of its entries. In this case, cifs_mark_ses_for_reconnect takes the cifs_tcp_ses_lock to traverse the lists, and then calls cifs_update_iface. However, that can end up calling cifs_put_tcp_session, which picks up the same lock again. Avoid this by taking a ref for the session, drop the lock, and then call update iface. Also, in cifs_update_iface, avoid nested locking of iface_lock and chan_lock, as much as possible. When unavoidable, we need to pick iface_lock first. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-24 09:17:56 -05:00
Namjae Jeon	b5e5f9dfc9	ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA FileOffset should not be greater than BeyondFinalZero in FSCTL_ZERO_DATA. And don't call ksmbd_vfs_zero_data() if length is zero. Cc: stable@vger.kernel.org Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-23 23:30:46 -05:00
Namjae Jeon	18e39fb960	ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA generic/091, 263 test failed since commit `f66f8b94e7` ("cifs: when extending a file with falloc we should make files not-sparse"). FSCTL_ZERO_DATA sets the range of bytes to zero without extending file size. The VFS_FALLOCATE_FL_KEEP_SIZE flag should be used even on non-sparse files. Cc: stable@vger.kernel.org Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-23 23:30:46 -05:00
Hyunchul Lee	745bbc0995	ksmbd: remove duplicate flag set in smb2_write The writethrough flag is set again if is_rdma_channel is false. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-23 23:30:46 -05:00
Zhang Jiaming	fe39dc98fb	gfs2: Fix spelling mistake in comment Change 'accomodate' to 'accommodate'. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-24 01:49:50 +02:00
Dave Chinner	5e672cd69f	xfs: introduce xfs_inodegc_push() The current blocking mechanism for pushing the inodegc queue out to disk can result in systems becoming unusable when there is a long running inodegc operation. This is because the statfs() implementation currently issues a blocking flush of the inodegc queue and a significant number of common system utilities will call statfs() to discover something about the underlying filesystem. This can result in userspace operations getting stuck on inodegc progress, and when trying to remove a heavily reflinked file on slow storage with a full journal, this can result in delays measuring in hours. Avoid this problem by adding a "push" function that expedites the flushing of the inodegc queue, but doesn't wait for it to complete. Convert xfs_fs_statfs() and xfs_qm_scall_getquota() to use this mechanism so they don't block but still ensure that queued operations are expedited. Fixes: `ab23a77687` ("xfs: per-cpu deferred inode inactivation queues") Reported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Dave Chinner <dchinner@redhat.com> [djwong: fix _getquota_next to use _inodegc_push too] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-23 13:34:38 -07:00
Dave Chinner	7cf2b0f961	xfs: bound maximum wait time for inodegc work Currently inodegc work can sit queued on the per-cpu queue until the workqueue is either flushed or the queue reaches a depth that triggers work queuing (and later throttling). This means that we could queue work that waits for a long time for some other event to trigger flushing. Hence instead of just queueing work at a specific depth, use a delayed work that queues the work at a bound time. We can still schedule the work immediately at a given depth, but we no longer need to worry about leaving a number of items on the list that won't get processed until external events prevail. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-23 13:34:38 -07:00
Alexander Aring	19d7ca051d	fs: dlm: add pid to debug log This patch adds the pid information which requested the lock operation to the debug log output. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-23 14:41:39 -05:00
Alexander Aring	976a062434	fs: dlm: plock use list_first_entry This patch will use the list helper list_first_entry() instead of using list_entry() to get the first element of a list. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-23 14:22:10 -05:00
Linus Torvalds	fa1796a835	Merge tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Check for NULL in kretprobe_dispatcher() NULL can now be passed in, make sure it can handle it - Clean up unneeded #endif #ifdef of the same preprocessor check in the middle of the block. - Comment clean up - Remove unneeded initialization of the "ret" variable in __trace_uprobe_create() * tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/uprobes: Remove unwanted initialization in __trace_uprobe_create() tracefs: Fix syntax errors in comments tracing: Simplify conditional compilation code in tracing_set_tracer() tracing/kprobes: Check whether get_kretprobe() returns NULL in kretprobe_dispatcher()	2022-06-23 12:24:49 -05:00
Jens Axboe	386e4fb696	io_uring: use original request task for inflight tracking In prior kernels, we did file assignment always at prep time. This meant that req->task == current. But after deferring that assignment and then pushing the inflight tracking back in, we've got the inflight tracking using current when it should in fact now be using req->task. Fixup that error introduced by adding the inflight tracking back after file assignments got modified. Fixes: `9cae36a094` ("io_uring: reinstate the inflight tracking") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-23 11:06:43 -06:00
Dan Carpenter	2c09d1443b	pstore/zone: cleanup "rcnt" type The info->read() function returns ssize_t. That means that info->read() returns either negative error codes or a positive number representing the bytes read. The "rcnt" variable should be declared as ssize_t as well. Most places do this correctly but psz_kmsg_recover_meta() needed to be fixed. This code casts the "rcnt" to int. That is unnecessary when "rcnt" is already signed. It's also slightly wrong because if info->read() returned a very high (more than INT_MAX) number of bytes then this might treat that as an error. This bug cannot happen in real life, so it doesn't affect run time, but static checkers correctly complain that it is wrong. fs/pstore/zone.c:366 psz_kmsg_recover_data() warn: casting 'rcnt' truncates high bits Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/YrRtPSFHDVJzV6d+@kili	2022-06-23 08:27:52 -07:00
Shyam Prasad N	6e1c1c08cd	cifs: periodically query network interfaces from server Currently, we only query the server for network interfaces information at the time of mount, and never afterwards. This can be a problem, especially for services like Azure, where the IP address of the channel endpoints can change over time. With this change, we schedule a 600s polling of this info from the server for each tree connect. An alternative for periodic polling was to do this only at the time of reconnect. But this could delay the reconnect time slightly. Also, there are some challenges w.r.t how we have cifs_reconnect implemented today. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-22 19:51:43 -05:00

77928 Commits