linux

Author	SHA1	Message	Date
Darrick J. Wong	7be3bd8856	xfs: empty xattr leaf header blocks are not corruption TLDR: Revert commit `51e6104fdb` ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") because it was wrong. Every now and then we get a corruption report from the kernel or xfs_repair about empty leaf blocks in the extended attribute structure. We've long thought that these shouldn't be possible, but prior to 5.18 one would shake loose in the recoveryloop fstests about once a month. A new addition to the xattr leaf block verifier in 5.19-rc1 makes this happen every 7 minutes on my testing cloud. I added a ton of logging to detect any time we set the header count on an xattr leaf block to zero. This produced the following dmesg output on generic/388: XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_create+0x187/0x230 xfs_attr_shortform_to_leaf+0xd1/0x2f0 xfs_attr_set_iter+0x73e/0xa90 xfs_xattri_finish_update+0x45/0x80 xfs_attr_finish_item+0x1b/0xd0 xfs_defer_finish_noroll+0x19c/0x770 __xfs_trans_commit+0x153/0x3e0 xfs_attr_set+0x36b/0x740 xfs_xattr_set+0x89/0xd0 __vfs_setxattr+0x67/0x80 __vfs_setxattr_noperm+0x6e/0x120 vfs_setxattr+0x97/0x180 setxattr+0x88/0xa0 path_setxattr+0xc3/0xe0 __x64_sys_setxattr+0x27/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 So now we know that someone is creating empty xattr leaf blocks as part of converting a sf xattr structure into a leaf xattr structure. The conversion routine logs any existing sf attributes in the same transaction that creates the leaf block, so we know this is a setxattr to a file that has no attributes at all. Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log recovery. 
I also augmented buffer item recovery to call ->verify_struct on any attr leaf blocks and complain if it finds a failure: XFS (sda4): Unmounting Filesystem XFS (sda4): Mounting V5 Filesystem XFS (sda4): Starting recovery (logdev: internal) XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_verify+0x3b8/0x420 xlog_recover_buf_commit_pass2+0x60a/0x6c0 xlog_recover_items_pass2+0x4e/0xc0 xlog_recover_commit_trans+0x33c/0x350 xlog_recovery_process_trans+0xa5/0xe0 xlog_recover_process_data+0x8d/0x140 xlog_do_recovery_pass+0x19b/0x720 xlog_do_log_recovery+0x62/0xc0 xlog_do_recover+0x33/0x1d0 xlog_recover+0xda/0x190 xfs_log_mount+0x14c/0x360 xfs_mountfs+0x517/0xa60 xfs_fs_fill_super+0x6bc/0x950 get_tree_bdev+0x175/0x280 vfs_get_tree+0x1a/0x80 path_mount+0x6f5/0xaa0 __x64_sys_mount+0x103/0x140 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fc61e241eae And a moment later, the _delwri_submit of the recovered buffers trips the same verifier and recovery fails: XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78 XFS (sda4): Unmount and run xfs_repair XFS (sda4): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00 ........;....... 00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00 .....).x........ 00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d ......I......1>. 00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00 ................ 00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00 .P.............. 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518). 
Shutting down filesystem. XFS (sda4): Please unmount the filesystem and rectify the problem(s) XFS (sda4): log mount/recovery failed: error -117 XFS (sda4): log mount failed I think I see what's going on here -- setxattr is racing with something that shuts down the filesystem: Thread 1 Thread 2 -------- -------- xfs_attr_sf_addname xfs_attr_shortform_to_leaf <create empty leaf> xfs_trans_bhold(leaf) xattri_dela_state = XFS_DAS_LEAF_ADD <roll transaction> <flush log> <shut down filesystem> xfs_trans_bhold_release(leaf) <discover fs is dead, bail> Thread 3 -------- <cycle mount, start recovery> xlog_recover_buf_commit_pass2 xlog_recover_do_reg_buffer <replay empty leaf buffer from recovered buf item> xfs_buf_delwri_queue(leaf) xfs_buf_delwri_submit _xfs_buf_ioapply(leaf) xfs_attr3_leaf_write_verify <trip over empty leaf buffer> <fail recovery> As you can see, the bhold keeps the leaf buffer locked and thus prevents the AIL from tripping over the ichdr.count==0 check in the write verifier. Unfortunately, it doesn't prevent the log from getting flushed to disk, which sets up log recovery to fail. So. It's clear that the kernel has always had the ability to persist attr leaf blocks with ichdr.count==0, which means that it's part of the ondisk format now. Unfortunately, this check has been added and removed multiple times throughout history. It first appeared in[1] kernel 3.10 as part of the early V5 format patches. The check was later discovered to break log recovery and hence disabled[2] during log recovery in kernel 4.10. Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to weed out the empty leaf blocks. This was still not correct because log recovery would recover an empty attr leaf block successfully only for regular xattr operations to trip over the empty block during regular operation. Therefore, the check was removed entirely[4] in kernel 5.7 but removal of the xfs_repair check was forgotten. 
The continued complaints from xfs_repair led to us mistakenly re-adding[5] the verifier check for kernel 5.19. Remove it once again. [1] `517c22207b` ("xfs: add CRCs to attr leaf blocks") [2] `2e1d23370e` ("xfs: ignore leaf attr ichdr.count in verifier during log replay") [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0") [4] `f28cef9e4d` ("xfs: don't fail verifier on empty attr3 leaf block") [5] `51e6104fdb` ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Looking at the rest of the xattr code, it seems that files with empty leaf blocks behave as expected -- listxattr reports no attributes; getxattr on any xattr returns nothing as expected; removexattr does nothing; and setxattr can add attributes just fine. Original-bug: `517c22207b` ("xfs: add CRCs to attr leaf blocks") Still-not-fixed-by: `2e1d23370e` ("xfs: ignore leaf attr ichdr.count in verifier during log replay") Removed-in: `f28cef9e4d` ("xfs: don't fail verifier on empty attr3 leaf block") Fixes: `51e6104fdb` ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-06-29 08:47:56 -07:00
Andreas Gruenbacher	6feaec8147	gfs2: List traversal in do_promote is safe In do_promote(), we're never removing the current entry from the list and so the list traversal is actually safe. Switch back to list_for_each_entry(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-29 17:01:59 +02:00
Bob Peterson	0befb8511e	gfs2: do_promote glock holder stealing fix In do_promote(), when the glock had no strong holders, we were accidentally calling demote_incompat_holders() with new_gh == NULL, so no weak holders were considered incompatible. Instead, the new holder should have been passed in. For doing that, the HIF_HOLDER flag needs to be set in new_gh to prevent may_grant() from complaining. This means that the new holder will now be recognized as a current holder, so skip over it explicitly in demote_incompat_holders() to prevent it from being dequeued. To further clarify things, we can now rename new_gh to current_gh in demote_incompat_holders(); after all, the HIF_HOLDER flag is already set, which means the new holder is already a current holder. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-29 17:00:55 +02:00
Andreas Gruenbacher	8f0028fc60	gfs2: Use better variable name In do_promote() and add_to_queue(), use current_gh as the variable name for the first strong holder we could find: this matches the variable name in may_grant(), and more clearly indicates that we're interested in one (any) of the current strong holders. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-29 17:00:21 +02:00
Andreas Gruenbacher	5f38a4d3c4	gfs2: Make go_instantiate take a glock Make go_instantiate take a glock instead of a glock holder as its argument: this handler is supposed to instantiate the object associated with the glock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-29 16:59:07 +02:00
Andreas Gruenbacher	86c30a01f5	gfs2: Add new go_held glock operation Right now, inode_go_instantiate() contains functionality that relates to how a glock is held rather than the glock itself, like waiting for pending direct I/O to complete and completing interrupted truncates. This code is meant to be run each time a holder is acquired, but go_instantiate is actually only called once, when the glock is instantiated. To fix that, introduce a new go_held glock operation that is called each time a glock holder is acquired. Move the holder specific code in inode_go_instantiate() over to inode_go_held(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-29 16:56:41 +02:00
Andreas Gruenbacher	de3f906f0a	gfs2: Revert 'Fix "truncate in progress" hang' Now that interrupted truncates are completed in the context of the process taking the glock, there is no need for the glock state engine to delegate that task to gfs2_quotad or for quotad to perform those truncates anymore. Get rid of the obsolete associated infrastructure. Reverts commit `813e0c46c9` ("GFS2: Fix "truncate in progress" hang"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2022-06-29 16:54:59 +02:00
Andreas Gruenbacher	53d6913295	gfs2: Instantiate glocks outside of glock state engine Instantiate glocks outside of the glock state engine: there is no real reason for instantiating them inside the glock state engine; it only complicates the code. Instead, instantiate them in gfs2_glock_wait() and gfs2_glock_async_wait() using the new gfs2_glock_holder_ready() helper. On top of that, the only other place that acquires a glock without using gfs2_glock_wait() or gfs2_glock_async_wait() is gfs2_upgrade_iopen_glock(), so call gfs2_glock_holder_ready() there as well. If a dinode has a pending truncate, the glock-specific instantiate function for inodes wakes up the truncate function in the quota daemon. Waiting for the completion of the truncate was previously done by the glock state engine, but we now need to wait in inode_go_instantiate(). This also means that gfs2_instantiate() will now no longer return any "special" error codes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-29 16:53:22 +02:00
Andreas Gruenbacher	bdff777cbb	gfs2: Fix up gfs2_glock_async_wait Since commit `1fc05c8d84` ("gfs2: cancel timed-out glock requests"), a pending locking request can be canceled by calling gfs2_glock_dq() on the pending holder. In gfs2_glock_async_wait(), when we time out, use that to cancel the remaining locking requests and dequeue the locking requests already granted. That's simpler as well as more efficient than waiting for all locking requests to eventually be granted and dequeuing them then. In addition, gfs2_glock_async_wait() promises that by the time the function completes, all glocks are either granted or dequeued, but the implementation doesn't keep that promise if individual locking requests fail. Fix that as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-29 16:47:44 +02:00
Matthew Wilcox (Oracle)	0b768a9610	nfs: Leave pages in the pagecache if readpage failed The pagecache handles readpage failing by itself; it doesn't want filesystems to remove pages from under it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	6e8e79fc84	buffer: Remove check for PageError If a buffer is completed with an error, its uptodate flag will be clear, so the page_uptodate variable will have been set to 0. There's no need to check PageError here. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	9329883a1c	orangefs: Remove test for folio error The page cache clears the error bit before calling ->read_folio(), so this condition could never have been true. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	ba9863127c	iomap: Remove test for folio error Just because there has been a read error doesn't mean we should avoid marking this part of the folio as uptodate. Indeed, it may overwrite the error part of the folio and let us mark the entire folio uptodate. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	3b60d53df0	jfs: Remove check for PageUptodate Pages returned from read_mapping_page() are always uptodate, so this check is unnecessary. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	da028b6b64	remap_range: Remove check of uptodate flag read_mapping_folio() returns an ERR_PTR if the folio is not uptodate, so this check is simply dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	771075e15e	ufs: Remove checks for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, or a page that is not Uptodate, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	b0c971e7b7	reiserfs: Remove check for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	19cb4273a2	ntfs3: Remove check for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	62a3a4dd47	ntfs: Remove check for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	79ea65563a	nilfs2: Remove check for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this test is not needed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:07 -04:00
Matthew Wilcox (Oracle)	750cd7d0e6	ext2: Remove check for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this test is not needed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	17bb554879	ntfs: Remove check for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	ca02bcabd7	hfsplus: Remove check for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	c9ed489c66	hfs: Remove check for PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	54c6260fa8	freevxfs: Remove check of PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	59fc647405	afs: Remove check of PageError If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	f6e0e17344	nilfs2: Convert nilfs_copy_back_pages() to use filemap_get_folios() Use folios throughout. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	1508062ecd	hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios() Use folios throughout this function. That removes the last caller of huge_pagevec_release(), so delete that too. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	bbfe4f6600	f2fs: Convert f2fs_invalidate_compress_pages() to use filemap_get_folios() Convert this function to use folios throughout. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: Chao Yu <chao@kernel.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	7530d0935c	ext4: Convert mpage_map_and_submit_buffers() to use filemap_get_folios() The called functions all use pages, so just convert back to a page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	fb5a5be05f	ext4: Convert mpage_release_unused_pages() to use filemap_get_folios() If the folio is large, it may overlap the beginning or end of the unused range. If it does, we need to avoid invalidating it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-29 08:51:06 -04:00
Matthew Wilcox (Oracle)	9e0b6f31ba	buffer: Convert clean_bdev_aliases() to use filemap_get_folios() Use a folio throughout this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-29 08:51:05 -04:00
Matthew Wilcox (Oracle)	d9ef44de5d	hugetlb: Convert huge_add_to_page_cache() to use a folio Remove the last caller of add_to_page_cache() Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com>	2022-06-29 08:51:05 -04:00
Matthew Wilcox (Oracle)	211d04445b	mpage: Convert do_mpage_readpage() to use a folio Pass in a folio from mpage_readahead(). Also convert map_buffer_to_page() to map_buffer_to_folio(). There's still no support for large folios here; there are numerous places which depend on the folio being PAGE_SIZE. The VM_BUG_ON prevents anyone from thinking that it will work. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:05 -04:00
Matthew Wilcox (Oracle)	6ffcd825e7	mm: Remove __delete_from_page_cache() This wrapper is no longer used. Remove it and all references to it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:05 -04:00
Andreas Gruenbacher	44dab005fd	gfs2: Minor gfs2_glock_nq_m cleanup Add state and flags arguments to gfs2_rlist_alloc() to make it somewhat more obvious which state and flags an rlist uses. With that, stop knocking off flags in gfs2_glock_nq_m() and its nq_m_sync() helper that are never set in the first place. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-06-28 20:38:15 +02:00
Siddhesh Poyarekar	ed5fce76b5	vfs: escape hash as well When a filesystem is mounted with a name that starts with a #: # mount '#name' /mnt/bad -t tmpfs this will cause the entry to look like this (leading space added so that git does not strip it out): #name /mnt/bad tmpfs rw,seclabel,relatime,inode64 0 0 This breaks getmntent and any code that aims to parse fstab as well as /proc/mounts with the same logic since they need to strip leading spaces or skip over comment lines, due to which they report incorrect output or skip over the line respectively. Solve this by translating the hash character into its octal encoding equivalent so that applications can decode the name correctly. Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-06-28 13:58:05 -04:00
Chao Yu	29be7ec3df	f2fs: initialize page_array_entry slab only if compression feature is on Otherwise, in image which doesn't support compression feature, page_array_entry will be initialized w/o use. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-06-28 09:27:10 -07:00
Jack Qiu	a4a0e16dbf	f2fs: optimize error handling in redirty_blocks Current error handling is at risk of page leaks. However, we don't seek any failure scenarios, just use f2fs_bug_on. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-06-28 09:27:09 -07:00
Jaegeuk Kim	7859e97f62	f2fs: do not skip updating inode when retrying to flush node page Let's try to flush dirty inode again to improve subtle i_blocks mismatch. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-06-28 09:27:09 -07:00
Konstantin Komarov	e4d2f4fd53	fs/ntfs3: Enable FALLOC_FL_INSERT_RANGE Changed logic in ntfs_fallocate: clearer checks at the beginning instead of the middle of the function, and added FALLOC_FL_INSERT_RANGE. Fixes xfstest generic/064 Fixes: `4342306f0f` ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-06-28 18:51:12 +03:00
Konstantin Komarov	aa30eccb24	fs/ntfs3: Fallocate (FALLOC_FL_INSERT_RANGE) implementation Add functions for inserting hole in file and inserting range in run. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-06-28 18:51:06 +03:00
Amir Goldstein	8698e3bab4	fanotify: refine the validation checks on non-dir inode mask Commit `ceaf69f8ea` ("fanotify: do not allow setting dirent events in mask of non-dir") added restrictions about setting dirent events in the mask of a non-dir inode mark, which does not make any sense. For backward compatibility, these restrictions were added only to new (v5.17+) APIs. It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or FAN_ONDIR in the mask of a non-dir inode. Add these flags to the dir-only restriction of the new APIs as well. Move the check of the dir-only flags for new APIs into the helper fanotify_events_supported(), which is only called for FAN_MARK_ADD, because there is no need to error on an attempt to remove the dir-only flags from non-dir inode. Fixes: `ceaf69f8ea` ("fanotify: do not allow setting dirent events in mask of non-dir") Link: https://lore.kernel.org/linux-fsdevel/20220627113224.kr2725conevh53u4@quack3.lan/ Link: https://lore.kernel.org/r/20220627174719.2838175-1-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2022-06-28 11:18:13 +02:00
akpm	ee56c3e8ee	Merge branch 'master' into mm-nonmm-stable	2022-06-27 10:31:44 -07:00
Imran Khan	1d25b84e44	kernfs: Replace global kernfs_open_file_mutex with hashed mutexes. In the current kernfs design a single mutex, kernfs_open_file_mutex, protects the list of kernfs_open_file instances corresponding to a sysfs attribute. So even if different tasks are opening or closing different sysfs files they can contend on osq_lock of this mutex. The contention is more apparent in large scale systems with a few hundred CPUs where most of the CPUs have running tasks that are opening, accessing or closing sysfs files at any point of time. Using hashed mutexes in place of a single global mutex can significantly reduce contention around the global mutex and hence can provide better scalability. Moreover, as these hashed mutexes are not part of kernfs_node objects we will not see any significant change in memory utilization of kernfs based file systems like sysfs, cgroupfs etc. Modify interface introduced in previous patch to make use of hashed mutexes. Use kernfs_node address as hashing key. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220615021059.862643-5-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-27 16:46:15 +02:00
Imran Khan	41448c6148	kernfs: Introduce interface to access global kernfs_open_file_mutex. This allows changing the underlying mutex locking without needing to change the users of the lock. For example, the next patch modifies this interface to use hashed mutexes in place of a single global kernfs_open_file_mutex. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220615021059.862643-4-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-27 16:46:15 +02:00
Imran Khan	b8f35fa118	kernfs: Change kernfs_notify_list to llist. At present kernfs_notify_list is implemented as a singly linked list of kernfs_node(s), where last element points to itself and value of ->attr.next tells if node is present on the list or not. Both addition and deletion to list happen under kernfs_notify_lock. Change kernfs_notify_list to llist so that addition to list can happen locklessly. Suggested by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220615021059.862643-3-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-27 16:46:15 +02:00
Imran Khan	086c00c71f	kernfs: make ->attr.open RCU protected. After removal of kernfs_open_node->refcnt in the previous patch, kernfs_open_node_lock can be removed as well by making ->attr.open RCU protected. kernfs_put_open_node can delegate freeing to ->attr.open to RCU and other readers of ->attr.open can do so under rcu_read_(un)lock. Suggested by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220615021059.862643-2-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-27 16:46:14 +02:00
Lin Feng	dcab8da13f	kernfs/file.c: remove redundant error return counter assignment Since the previous 'rc = -EINVAL;', the value of rc doesn't change, so it is not necessary to assign it again. Signed-off-by: Lin Feng <linf@wangsu.com> Link: https://lore.kernel.org/r/20220617091746.206515-1-linf@wangsu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-27 16:44:40 +02:00
Alexey Khoroshilov	8a9ffb8c85	NFSD: restore EINVAL error translation in nfsd_commit() commit `555dbf1a9a` ("nfsd: Replace use of rwsem with errseq_t") incidentally broke translation of -EINVAL to nfserr_notsupp. The patch restores that. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Fixes: `555dbf1a9a` ("nfsd: Replace use of rwsem with errseq_t") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-06-27 10:33:05 -04:00
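
Commit `ed5fce76b5` ("vfs: escape hash as well") in the list above describes translating the hash character into its octal encoding so that parsers of /proc/mounts can decode the name correctly. The round trip can be sketched in Python as below. This is only an illustration: the function names and the exact set of escaped characters are assumptions made here, not taken from the kernel source.

```python
import re

# Characters this sketch escapes: space, tab, newline, backslash and '#'.
# The set is an assumption based on the commit description, not kernel code.
ESCAPED = " \t\n\\#"

# Matches a backslash followed by exactly three octal digits, e.g. \043.
_OCTAL = re.compile(r"\\([0-7]{3})")


def escape_field(field: str) -> str:
    """Replace each character in ESCAPED with a backslash and its 3-digit octal code."""
    return "".join(
        "\\" + format(ord(ch), "03o") if ch in ESCAPED else ch
        for ch in field
    )


def unescape_field(field: str) -> str:
    """Decode backslash-octal sequences back into the characters they stand for."""
    return _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), field)


if __name__ == "__main__":
    # A name starting with '#' no longer looks like a comment line once escaped.
    print(escape_field("#name"))
    print(unescape_field(escape_field("#name")))
```

Because the backslash itself is escaped, a name that already contains a literal backslash sequence still decodes to the original string.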

77928 Commits