linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-11 06:31:49 +00:00

Author	SHA1	Message	Date
Kemeng Shi	8ffc0cd24c	ext4: alloc test super block from sget This fix the oops in ext4 unit test which is cuased by NULL sb.s_user_ns as following: <4>[ 14.344565] map_id_range_down (kernel/user_namespace.c:318) <4>[ 14.345378] make_kuid (kernel/user_namespace.c:415) <4>[ 14.345998] inode_init_always (include/linux/fs.h:1375 fs/inode.c:174) <4>[ 14.346696] alloc_inode (fs/inode.c:268) <4>[ 14.347353] new_inode_pseudo (fs/inode.c:1007) <4>[ 14.348016] new_inode (fs/inode.c:1033) <4>[ 14.348644] ext4_mb_init (fs/ext4/mballoc.c:3404 fs/ext4/mballoc.c:3719) <4>[ 14.349312] mbt_kunit_init (fs/ext4/mballoc-test.c:57 fs/ext4/mballoc-test.c:314) <4>[ 14.349983] kunit_try_run_case (lib/kunit/test.c:388 lib/kunit/test.c:443) <4>[ 14.350696] kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:30) <4>[ 14.351530] kthread (kernel/kthread.c:388) <4>[ 14.352168] ret_from_fork (arch/arm64/kernel/entry.S:861) <0>[ 14.353385] Code: 52808004 b8236ae7 72be5e44 b90004c4 (38e368a1) Alloc test super block from sget to properly initialize test super block to fix the issue. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240304163543.6700-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Arnd Bergmann	d60c53694c	ext4: kunit: use dynamic inode allocation Storing an inode structure on the stack pushes some functions over the warning limit for stack frame size: In file included from fs/ext4/mballoc.c:7039: fs/ext4/mballoc-test.c:506:13: error: stack frame size (1032) exceeds limit (1024) in 'test_mark_diskspace_used' [-Werror,-Wframe-larger-than] 506 \| static void test_mark_diskspace_used(struct kunit *test) \| ^ Use kunit_kzalloc() for all inodes. There may be a better way to do it by preallocating the inode, which would result in a larger rework. Fixes: `2b81493f8e` ("ext4: Add unit test for ext4_mb_mark_diskspace_used") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20240227161548.2929881-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Srivathsa Dara	07be778c70	ext4: enable meta_bg only when new desc blocks are needed This patch addresses an issue observed when resize_inode is disabled and an online extension of a filesysyem is performed. When a filesystem is expanded to a size that does not require a addition of a new descriptor block, the meta_bg feature is being enabled even though no part of the filesystem uses this layout. This patch ensures that the meta_bg feature is only enabled if any of the added block groups utilize meta_bg layout. Signed-off-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> Link: https://lore.kernel.org/r/20240227131329.2608466-1-srivathsa.d.dara@oracle.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Wenchao Hao	0efcd739fc	ext4: remove unused parameter biop in ext4_issue_discard() all caller of ext4_issue_discard() would set biop to NULL since 'commit `55cdd0af2b` ("ext4: get discard out of jbd2 commit kthread contex")', it's unnecessary to keep this parameter any more. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240226081731.3224470-1-haowenchao2@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Chengming Zhou	708623737b	ext4: remove SLAB_MEM_SPREAD flag usage The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1, so it became a dead flag since the commit `16a1d96835` ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete to avoid confusion for users. Here we can just remove all its users, which has no functional change. [1] https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz/ Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20240224134822.829456-1-chengming.zhou@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Jan Kara	40da553f5d	ext4: verify s_clusters_per_group even without bigalloc Currently we ignore s_clusters_per_group field in the on-disk superblock if bigalloc feature is not enabled. However e2fsprogs don't even open the filesystem if s_clusters_per_group is invalid. This results in an odd state where kernel happily works with the filesystem while even e2fsck refuses to touch it. Verify that s_clusters_per_group is valid even if bigalloc feature is not enabled to make things consistent. Due to current e2fsprogs behavior it is unlikely there are filesystems out in the wild (except for intentionally fuzzed ones) with invalid s_clusters_per_group counts. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240219171033.22882-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Maximilian Heyne	a6b3bfe176	ext4: fix corruption during on-line resize We observed a corruption during on-line resize of a file system that is larger than 16 TiB with 4k block size. With having more then 2^32 blocks resize_inode is turned off by default by mke2fs. The issue can be reproduced on a smaller file system for convenience by explicitly turning off resize_inode. An on-line resize across an 8 GiB boundary (the size of a meta block group in this setup) then leads to a corruption: dev=/dev/<some_dev> # should be >= 16 GiB mkdir -p /corruption /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 221 - 215)) mount -t ext4 $dev /corruption dd if=/dev/zero bs=4096 of=/corruption/test count=$((2221 - 42*15)) sha1sum /corruption/test # 79d2658b39dcfd77274e435b0934028adafaab11 /corruption/test /sbin/resize2fs $dev $((22*21)) # drop page cache to force reload the block from disk echo 1 > /proc/sys/vm/drop_caches sha1sum /corruption/test # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3 /corruption/test 2^21 = 2^152^6 equals 8 GiB whereof 2^15 is the number of blocks per block group and 2^6 are the number of block groups that make a meta block group. The last checksum might be different depending on how the file is laid out across the physical blocks. The actual corruption occurs at physical block 63*2^15 = 2064384 which would be the location of the backup of the meta block group's block descriptor. During the on-line resize the file system will be converted to meta_bg starting at s_first_meta_bg which is 2 in the example - meaning all block groups after 16 GiB. However, in ext4_flex_group_add we might add block groups that are not part of the first meta block group yet. In the reproducer we achieved this by substracting the size of a whole block group from the point where the meta block group would start. This must be considered when updating the backup block group descriptors to follow the non-meta_bg layout. The fix is to add a test whether the group to add is already part of the meta block group or not. Fixes: `01f795f9e0` ("ext4: add online resizing support for meta_bg and 64-bit file systems") Cc: <stable@vger.kernel.org> Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Tested-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> Reviewed-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> Link: https://lore.kernel.org/r/20240215155009.94493-1-mheyne@amazon.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Jan Kara	fa60629380	ext4: don't report EOPNOTSUPP errors from discard When ext4 is mounted without journal, with discard mount option, and on a device not supporting trim, we print error for each and every freed extent. This is not only useless but actively harmful. Instead ignore the EOPNOTSUPP error. Trim is only advisory anyway and when the filesystem has journal we silently ignore trim error as well. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240213101601.17463-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Jan Kara	7f48212678	ext4: drop duplicate ea_inode handling in ext4_xattr_block_set() ext4_xattr_block_set() drops ea_inode reference in two places. Handling it just under the 'cleanup' label is enough so drop the second occurence. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240209112107.10585-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-03-07 13:32:54 -05:00
Jan Kara	8208c41c43	ext4: fold quota accounting into ext4_xattr_inode_lookup_create() When allocating EA inode, quota accounting is done just before ext4_xattr_inode_lookup_create(). Logically these two operations belong together so just fold quota accounting into ext4_xattr_inode_lookup_create(). We also make ext4_xattr_inode_lookup_create() return the looked up / created inode to convert the function to a more standard calling convention. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240209112107.10585-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 23:44:50 -05:00
Baokun Li	4fbf8bc733	ext4: correct best extent lstart adjustment logic When yangerkun review commit `93cdf49f6e` ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()"), it was found that the best extent did not completely cover the original request after adjusting the best extent lstart in ext4_mb_new_inode_pa() as follows: original request: 2/10(8) normalized request: 0/64(64) best extent: 0/9(9) When we check if best ex can be kept at start of goal, ac_o_ex.fe_logical is 2 less than the adjusted best extent logical end 9, so we think the adjustment is done. But obviously 0/9(9) doesn't cover 2/10(8), so we should determine here if the original request logical end is less than or equal to the adjusted best extent logical end. In addition, add a comment stating when adjusted best_ex will not cover the original request, and remove the duplicate assertion because adjusting lstart makes no change to b_ex.fe_len. Link: https://lore.kernel.org/r/3630fa7f-b432-7afd-5f79-781bc3b2c5ea@huawei.com Fixes: `93cdf49f6e` ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") Cc: <stable@kernel.org> Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20240201141845.1879253-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:46:39 -05:00
Ye Bin	d8b945fa47	ext4: forbid commit inconsistent quota data when errors=remount-ro There's issue as follows When do IO fault injection test: Quota error (device dm-3): find_block_dqentry: Quota for id 101 referenced but not present Quota error (device dm-3): qtree_read_dquot: Can't read quota structure for id 101 Quota error (device dm-3): do_check_range: Getting block 2021161007 out of range 1-186 Quota error (device dm-3): qtree_read_dquot: Can't read quota structure for id 661 Now, ext4_write_dquot()/ext4_acquire_dquot()/ext4_release_dquot() may commit inconsistent quota data even if process failed. This may lead to filesystem corruption. To ensure filesystem consistent when errors=remount-ro there is need to call ext4_handle_error() to abort journal. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240119062908.3598806-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:40:57 -05:00
Zhang Yi	68ee261fb1	ext4: add a hint for block bitmap corrupt state in mb_groups If one group is marked as block bitmap corrupted, its free blocks cannot be used and its free count is also deducted from the global sbi->s_freeclusters_counter. User might be confused about the absent free space because we can't query the information about corrupted block groups except unreliable error messages in syslog. So add a hint to show block bitmap corrupted groups in mb_groups. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240119061154.1525781-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:39:38 -05:00
Cheng Nie	547e64bda9	ext4: fix the comment of ext4_map_blocks()/ext4_ext_map_blocks() this comment of ext4_map_blocks()/ext4_ext_map_blocks() need update after commit c21770573319("ext4: Define a new set of flags for ext4_get_blocks()"). Signed-off-by: Cheng Nie <niecheng1@uniontech.com> Link: https://lore.kernel.org/r/20240118062511.28276-1-niecheng1@uniontech.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:33:21 -05:00
yangerkun	4b55d3431c	ext4: improve error msg for ext4_mb_seq_groups_show While cat mb_groups for a mounted ext4 image which has some corrupted group, the string return to userspace was just "I/O error" which confuse me a lot. Improve it with ext4_decode_error. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240118042557.380058-2-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:33:21 -05:00
yangerkun	250448802c	ext4: remove unused buddy_loaded in ext4_mb_seq_groups_show We can just first call ext4_mb_unload_buddy, then copy information from ext4_group_info. So remove this unused value. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240118042557.380058-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:33:21 -05:00
Kemeng Shi	2b81493f8e	ext4: Add unit test for ext4_mb_mark_diskspace_used Add unit test for ext4_mb_mark_diskspace_used Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20240103104900.464789-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:33:20 -05:00
Kemeng Shi	b7098e1fa7	ext4: Add unit test for mb_free_blocks Add unit test for mb_free_blocks. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20240103104900.464789-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:33:20 -05:00
Kemeng Shi	ac96b56a2f	ext4: Add unit test for mb_mark_used Add unit test for mb_mark_used Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20240103104900.464789-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:33:20 -05:00
Kemeng Shi	67d2a11b22	ext4: Add unit test of ext4_mb_generate_buddy Add unit test of ext4_mb_generate_buddy Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20240103104900.464789-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:33:20 -05:00
Kemeng Shi	6c5e0c9c21	ext4: Add unit test for test_free_blocks_simple Add unit test for test_free_blocks_simple. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20240103104900.464789-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-21 22:33:20 -05:00
Linus Torvalds	3f24fcdacd	Miscellaneous bug fixes and cleanups in ext4's multi-block allocator and extent handling code. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmW/G4YACgkQ8vlZVpUN gaPTpwf/c/Fk27GV8ge9PQtR6gmir/lyw2qkvK3Z+12aEsblZRmyvElyZWjAuNQG bciQyltabIPOA4XxfsZOrdgYI42n0rTTFG7bmI0lr+BJM/HRw0tGGMN91FZla0FP EXv/AiHKCqlT5OFZbD+8n1TzfdOgWotjug1VLteXve3YKjkDgt5IQm/0Gx9hKBld IR8SrQlD/rYe+VPvaHz5G4u09Ne5pUE5fDj3xE23wxfU5cloVzuVRCSOGWUCTnCW T0v6sHeKrmiLC8tIOZkBjer4nXC0MOu0p5+geAjwOArc9VJ1Lh2eAkH+GgDOVprx ahdl2qmbIbacBYECIeQ/+1i78+O1yw== =CmYr -----END PGP SIGNATURE----- Merge tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous bug fixes and cleanups in ext4's multi-block allocator and extent handling code" * tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type ext4: make ext4_map_blocks() distinguish delalloc only extent ext4: add a hole extent entry in cache after punch ext4: correct the hole length returned by ext4_map_blocks() ext4: convert to exclusive lock while inserting delalloc extents ext4: refactor ext4_da_map_blocks() ext4: remove 'needed' in trace_ext4_discard_preallocations ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations ext4: remove unused return value of ext4_mb_release_group_pa ext4: remove unused return value of ext4_mb_release_inode_pa ext4: remove unused return value of ext4_mb_release ext4: remove unused ext4_allocation_context::ac_groups_considered ext4: remove unneeded return value of ext4_mb_release_context ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*() ext4: remove unused return value of __mb_check_buddy ext4: mark the group block bitmap as corrupted before reporting an error ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() ...	2024-02-04 07:33:01 +00:00
Linus Torvalds	9e28c7a23b	five smb3 client fixes, mostly multichannel related -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmW+oQoACgkQiiy9cAdy T1E6GwwAk9PQo1N8h6AI0PWCPoiHps8YzpISTgBfchDs+sIvo1sZEVMzTmSlWm4F cON4nb3QBkD4aqWbCdb1tjby2VAb0aHuRbCQPAczW4+R7He8ELlGYow7IPBWmyI8 CTQRirYrJuIePh1aT2UG4mykNyQzp5i5ycsdcWFiIGliXrTp0De0rvE/60KGLuJ/ lnDZp7+FHytIfpwVzl4bGax4odQh14whxdQme4C9a8kVfYPQGKFyADbuxy0KmWmu 8kkEcGpY/bPdqLE1tGzNoVci9cEdYK6yUzkc8dj2ddsZ7YDHPz0NtsdWlC517qt+ ekDADHRQZiKwyJzMGHibK8E4WoP0+JjB4j6OTc369ICzgjuIutWCCIexbS0IV/po IuyRQTvLSTfYpE8eoQavAKaty6sDnJcP7FepPtH2k5yGr4+ILZV4ozoH+hqzn2/K sf/q9ZfkyfaEi2JkMD9TTv7k2EWLMntpxklbpg59VKHY5MGvlQqoRl9b+jzfgKEZ xS/myr3e =q5wf -----END PGP SIGNATURE----- Merge tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: "Five smb3 client fixes, mostly multichannel related: - four multichannel fixes including fix for channel allocation when multiple inactive channels, fix for unneeded race in channel deallocation, correct redundant channel scaling, and redundant multichannel disabling scenarios - add warning if max compound requests reached" * tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: increase number of PDUs allowed in a compound request cifs: failure to add channel on iface should bump up weight cifs: do not search for channel if server is terminating cifs: avoid redundant calls to disable multichannel cifs: make sure that channel scaling is done only once	2024-02-04 07:26:19 +00:00
Linus Torvalds	fc86e5c990	Bug fixes for 6.8-rc3: * Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format attribute fork. * Remove conditional compilation of realtime geometry validator functions to prevent confusing error messages from being printed on the console during the mount operation. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZbo7FAAKCRAH7y4RirJu 9Cn+APsFEbHA8YQpCSxGDM+Xelez9X7wroi6QkyOxRP6Lqq6ogD6A96uuV86TxkQ Hkse9IAKkFoLmyzohT9u7Bv46M/X4w8= =Ez8Z -----END PGP SIGNATURE----- Merge tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Chandan Babu: - Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format attribute fork - Remove conditional compilation of realtime geometry validator functions to prevent confusing error messages from being printed on the console during the mount operation * tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove conditional building of rt geometry validator functions xfs: reset XFS_ATTR_INCOMPLETE filter on node removal	2024-02-04 07:22:51 +00:00
Linus Torvalds	56897d5188	Tracing and eventfs fixes for v6.8: - Fix the return code for ring_buffer_poll_wait() It was returing a -EINVAL instead of EPOLLERR. - Zero out the tracefs_inode so that all fields are initialized. The ti->private could have had stale data, but instead of just initializing it to NULL, clear out the entire structure when it is allocated. - Fix a crash in timerlat The hrtimer was initialized at read and not open, but is canceled at close. If the file was opened and never read the close will pass a NULL pointer to hrtime_cancel(). - Rewrite of eventfs. Linus wrote a patch series to remove the dentry references in the eventfs_inode and to use ref counting and more of proper VFS interfaces to make it work. - Add warning to put_ei() if ei is not set to free. That means something is about to free it when it shouldn't. - Restructure the eventfs_inode to make it more compact, and remove the unused llist field. - Remove the fsnotify() funtions for when the inodes were being created in the lookup code. It doesn't make sense to notify about creation just because something is being looked up. - The inode hard link count was not accurate. It was being updated when a file was looked up. The inodes of directories were updating their parent inode hard link count every time the inode was created. That means if memory reclaim cleaned a stale directory inode and the inode was lookup up again, it would increment the parent inode again as well. Al Viro said to just have all eventfs directories have a hard link count of 1. That tells user space not to trust it. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZb1l/RQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qk6jAQDmecDOnx+j/Rm5krbX/meVPYXFj2CU 1wO7w1HBzopsBwEA5AjTKm9IGrl/eVG/+jViS165b+sJfwEcblHEFPWcIwo= =uUzb -----END PGP SIGNATURE----- Merge tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing and eventfs fixes from Steven Rostedt: - Fix the return code for ring_buffer_poll_wait() It was returing a -EINVAL instead of EPOLLERR. - Zero out the tracefs_inode so that all fields are initialized. The ti->private could have had stale data, but instead of just initializing it to NULL, clear out the entire structure when it is allocated. - Fix a crash in timerlat The hrtimer was initialized at read and not open, but is canceled at close. If the file was opened and never read the close will pass a NULL pointer to hrtime_cancel(). - Rewrite of eventfs. Linus wrote a patch series to remove the dentry references in the eventfs_inode and to use ref counting and more of proper VFS interfaces to make it work. - Add warning to put_ei() if ei is not set to free. That means something is about to free it when it shouldn't. - Restructure the eventfs_inode to make it more compact, and remove the unused llist field. - Remove the fsnotify() funtions for when the inodes were being created in the lookup code. It doesn't make sense to notify about creation just because something is being looked up. - The inode hard link count was not accurate. It was being updated when a file was looked up. The inodes of directories were updating their parent inode hard link count every time the inode was created. That means if memory reclaim cleaned a stale directory inode and the inode was lookup up again, it would increment the parent inode again as well. Al Viro said to just have all eventfs directories have a hard link count of 1. That tells user space not to trust it. * tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Keep all directory links at 1 eventfs: Remove fsnotify*() functions from lookup() eventfs: Restructure eventfs_inode structure to be more condensed eventfs: Warn if an eventfs_inode is freed without is_freed being set tracing/timerlat: Move hrtimer_init to timerlat_fd open() eventfs: Get rid of dentry pointers without refcounts eventfs: Clean up dentry ops and add revalidate function eventfs: Remove unused d_parent pointer field tracefs: dentry lookup crapectomy tracefs: Avoid using the ei->dentry pointer unnecessarily eventfs: Initialize the tracefs inode properly tracefs: Zero out the tracefs_inode when allocating it ring-buffer: Clean ring_buffer_poll_wait() error return	2024-02-02 15:32:58 -08:00
Linus Torvalds	6b89b6af45	gfs2: revert this broken commit -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmW9FuAUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpX2A//ZE1Tb/YHSqiVHvUSH0YFyKc50H24 Q3KqNH7LHRKBBIyIUJxZ2QaYsmmGPQqaB3RDAAQoItYb6mXaJQR0qIAxfKmcaUec genFrJNuaICIq/U+cQeAypnRSvOhnMfsY9c/jFPCZI72bS8mMDcGC1vvJW+R1N1V c5VAGuI/W1hkYOaVba3CkQ7sKcV/P+qbCI6dHB7DuCMr8L0foftItz0NIQ7tL9VM PMRtcUSehgsiDSH7HthvmHvC2bfBXgjgXTde9f5aqMoBkJE/QuCl4W3FCChcAFLg bxy1Ke6z4SF6N70BiKj+enSBDUVljOZOd/5IZOUPd6BuDcKDK/vKX6Vva5xYjnJ2 iZOBdGFoPVAmevt9iXPehMdoWcEZOZeHzBV82+k9My5wjWdLWp6ZtFQ/zhvt42YN Z0EWPIKqxN4xSpQFEjczlKA1IWPv6YJTJJBPaCS86rBduU/cMvSwVFGnk3XnES2o k4fyBF6UUYUe3tCD0E6NfBJCgHiq4V1ZjMMXiXKAUUe6L5zSIucjws27l3VZofFo abjE/wxR5ad+3T0rt0sx+gFD8aiCzqKYaHgHHYAbofUXiTpEt7gMNhtNGlzUudO9 KpkAiWGHtYUlDP1qIWR8lHTdf8Vj6KpCrcuyPVDIjWb9im6rC+6fh63PE9ddrLfI CqjwSoFOXRGEjm0= =Qv1t -----END PGP SIGNATURE----- Merge tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 revert from Andreas Gruenbacher: "It turns out that the commit to use GL_NOBLOCK flag for non-blocking lookups has several issues, and not all of them have a simple fix" * tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"	2024-02-02 15:30:33 -08:00
Andreas Gruenbacher	e9f1e6bb55	Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups" Commit "gfs2: Use GL_NOBLOCK flag for non-blocking lookups" has several issues, some of which are non-trivial to fix, so revert it for now: https://lore.kernel.org/gfs2/20240202050230.GA875515@ZenIV/T/ This reverts commit `dd00aaeb34`. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2024-02-02 17:21:44 +01:00
Zhang Yi	ec9d669eba	ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type Since ext4_map_blocks() can recognize a delayed allocated only extent, make ext4_set_iomap() can also recognize it, and remove the useless separate check in ext4_iomap_begin_report(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-01 23:59:21 -05:00
Zhang Yi	874eaba96d	ext4: make ext4_map_blocks() distinguish delalloc only extent Add a new map flag EXT4_MAP_DELAYED to indicate the mapping range is a delayed allocated only (not unwritten) one, and making ext4_map_blocks() can distinguish it, no longer mixing it with holes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-01 23:59:21 -05:00
Zhang Yi	9f1118223a	ext4: add a hole extent entry in cache after punch In order to cache hole extents in the extent status tree and keep the hole length as long as possible, re-add a hole entry to the cache just after punching a hole. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-01 23:59:21 -05:00
Zhang Yi	6430dea07e	ext4: correct the hole length returned by ext4_map_blocks() In ext4_map_blocks(), if we can't find a range of mapping in the extents cache, we are calling ext4_ext_map_blocks() to search the real path and ext4_ext_determine_hole() to determine the hole range. But if the querying range was partially or completely overlaped by a delalloc extent, we can't find it in the real extent path, so the returned hole length could be incorrect. Fortunately, ext4_ext_put_gap_in_cache() have already handle delalloc extent, but it searches start from the expanded hole_start, doesn't start from the querying range, so the delalloc extent found could not be the one that overlaped the querying range, plus, it also didn't adjust the hole length. Let's just remove ext4_ext_put_gap_in_cache(), handle delalloc and insert adjusted hole extent in ext4_ext_determine_hole(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-01 23:47:02 -05:00
Zhang Yi	acf795dc16	ext4: convert to exclusive lock while inserting delalloc extents ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem when inserting delalloc extents, it could be raced by another querying path of ext4_map_blocks() without i_rwsem, .e.g buffered read path. Suppose we buffered read a file containing just a hole, and without any cached extents tree, then it is raced by another delayed buffered write to the same area or the near area belongs to the same hole, and the new delalloc extent could be overwritten to a hole extent. pread() pwrite() filemap_read_folio() ext4_mpage_readpages() ext4_map_blocks() down_read(i_data_sem) ext4_ext_determine_hole() //find hole ext4_ext_put_gap_in_cache() ext4_es_find_extent_range() //no delalloc extent ext4_da_map_blocks() down_read(i_data_sem) ext4_insert_delayed_block() //insert delalloc extent ext4_es_insert_extent() //overwrite delalloc extent to hole This race could lead to inconsistent delalloc extents tree and incorrect reserved space counter. Fix this by converting to hold i_data_sem in exclusive mode when adding a new delalloc extent in ext4_da_map_blocks(). Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-01 23:47:02 -05:00
Zhang Yi	3fcc2b887a	ext4: refactor ext4_da_map_blocks() Refactor and cleanup ext4_da_map_blocks(), reduce some unnecessary parameters and branches, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-02-01 23:47:02 -05:00
Linus Torvalds	49a4be2c84	Description for this pull request: - Fix BUG in iov_iter_revert reported from syzbot. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmW7jWYWHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCJvdEACPLj/xOrjTCzOPam6iKaf4NXGF 8LlAtHv9nDwKIXoA3lqMFGkc74cx2JOk/rnuO4l2SE75eWdlTHjo27XKFyDoakcF kij+1bw7BUy4AwgANc4jIUTFYCx8mbzJF18eTuJdrs2h0+3rFHhcs9PpPmLfjDbZ BGY6Xhc/a/1/56lA6IMsioLHBiOn7FCVl+I+joj+I0etMfqKg3mkro2LJ2AaiCjv kvsiZ8A3HHgHM8X3JEOqqzokhTrbtQ6Thrit1NSH7e/JdjKSZvkAIv81otU6Fect vLQSz4mfRdfzAhwrrX+cTSLljkgv7OwD9AfeyI2Fhhu0lQjzjIFknTghZsIreQm4 1YnAfKukWJ8AfzVmAMQVbqtWfP3WAs2gHYgoIibm8VIkjHEOmPHnK83cJCqFlq92 OvQZsqZK2u3hXj2L3fl+1sQlSt+bcnpSE/vEDxpEwxVN3YifY9pJoyhP09RjH1GE /VnSHgX1laVeaNXinwlMKL1uTTVCjuosm4zFzByAQwwka6YCG4nLoHVZ6u2M26Bb a/tw9RTeGnlQiMy/rb5QdxKTBVCuZM8eMbcaVc5yNnD9P7yRqbRNaI0SnmBjOU7I V6bNqJUZwlPjQMeFJ0cUB5uaYopioA7lCjA0Hv8Eh+FX6v9g7oJvIXiGFPb8o4Pq adUVsyGvxbVBabdAjg== =CryD -----END PGP SIGNATURE----- Merge tag 'exfat-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fix from Namjae Jeon: - Fix BUG in iov_iter_revert reported from syzbot * tag 'exfat-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix zero the unwritten part for dio read	2024-02-01 11:45:53 -08:00
Paulo Alcantara	11d4d1dba3	smb: client: increase number of PDUs allowed in a compound request With the introduction of SMB2_OP_QUERY_WSL_EA, the client may now send 5 commands in a single compound request in order to query xattrs from potential WSL reparse points, which should be fine as we currently allow up to 5 PDUs in a single compound request. However, if encryption is enabled (e.g. 'seal' mount option) or enforced by the server, current MAX_COMPOUND(5) won't be enough as we require an extra PDU for the transform header. Fix this by increasing MAX_COMPOUND to 7 and, while we're at it, add an WARN_ON_ONCE() and return -EIO instead of -ENOMEM in case we attempt to send a compound request that couldn't include the extra transform header. Signed-off-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-02-01 12:15:51 -06:00
Shyam Prasad N	6aac002bcf	cifs: failure to add channel on iface should bump up weight After the interface selection policy change to do a weighted round robin, each iface maintains a weight_fulfilled. When the weight_fulfilled reaches the total weight for the iface, we know that the weights can be reset and ifaces can be allocated from scratch again. During channel allocation failures on a particular channel, weight_fulfilled is not incremented. If a few interfaces are inactive, we could end up in a situation where the active interfaces are all allocated for the total_weight, and inactive ones are all that remain. This can cause a situation where no more channels can be allocated further. This change fixes it by increasing weight_fulfilled, even when channel allocation failure happens. This could mean that if there are temporary failures in channel allocation, the iface weights may not strictly be adhered to. But that's still okay. Fixes: `a6d8fb54a5` ("cifs: distribute channels across interfaces based on speed") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-02-01 12:13:05 -06:00
Shyam Prasad N	88675b22d3	cifs: do not search for channel if server is terminating In order to scale down the channels, the following sequence of operations happen: 1. server struct is marked for terminate 2. the channel is deallocated in the ses->chans array 3. at a later point the cifsd thread actually terminates the server Between 2 and 3, there can be calls to find the channel for a server struct. When that happens, there can be an ugly warning that's logged. But this is expected. So this change does two things: 1. in cifs_ses_get_chan_index, if server->terminate is set, return 2. always make sure server->terminate is set with chan_lock held Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-02-01 12:12:17 -06:00
Shyam Prasad N	e77e15fa5e	cifs: avoid redundant calls to disable multichannel When the server reports query network interface info call as unsupported following a tree connect, it means that multichannel is unsupported, even if the server capabilities report otherwise. When this happens, cifs_chan_skip_or_disable is called to disable multichannel on the client. However, we only need to call this when multichannel is currently setup. Fixes: `f591062bdb` ("cifs: handle servers that still advertise multichannel after disabling") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-02-01 12:10:57 -06:00
Steven Rostedt (Google)	ca185770db	eventfs: Keep all directory links at 1 The directory link count in eventfs was somewhat bogus. It was only being updated when a directory child was being looked up and not on creation. One solution would be to update in get_attr() the link count by iterating the ei->children list and then adding 2. But that could slow down simple stat() calls, especially if it's done on all directories in eventfs. Another solution would be to add a parent pointer to the eventfs_inode and keep track of the number of sub directories it has on creation. But this adds overhead for something not really worthwhile. The solution decided upon is to keep all directory links in eventfs as 1. This tells user space not to rely on the hard links of directories. Which in this case it shouldn't. Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/ Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.339968298@goodmis.org Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Fixes: `c1504e5102` ("eventfs: Implement eventfs dir creation functions") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-02-01 11:53:53 -05:00
Steven Rostedt (Google)	12d823b31f	eventfs: Remove fsnotify() functions from lookup() The dentries and inodes are created when referenced in the lookup code. There's no reason to call fsnotify_() functions when they are created by a reference. It doesn't make any sense. Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/ Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.166973329@goodmis.org Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Fixes: `a376007917` ("eventfs: Implement functions to create files and dirs when accessed"); Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-02-01 11:53:53 -05:00
Steven Rostedt (Google)	264424dfdd	eventfs: Restructure eventfs_inode structure to be more condensed Some of the eventfs_inode structure has holes in it. Rework the structure to be a bit more condensed, and also remove the no longer used llist field. Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.002321438@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-02-01 11:53:52 -05:00
Steven Rostedt (Google)	5a49f99604	eventfs: Warn if an eventfs_inode is freed without is_freed being set There should never be a case where an evenfs_inode is being freed without is_freed being set. Add a WARN_ON_ONCE() if it ever happens. That would mean there was one too many put_ei()s. Link: https://lore.kernel.org/linux-trace-kernel/20240201161616.843551963@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-02-01 11:53:52 -05:00
Linus Torvalds	43aa6f97c2	eventfs: Get rid of dentry pointers without refcounts The eventfs inode had pointers to dentries (and child dentries) without actually holding a refcount on said pointer. That is fundamentally broken, and while eventfs tried to then maintain coherence with dentries going away by hooking into the '.d_iput' callback, that doesn't actually work since it's not ordered wrt lookups. There were two reasonms why eventfs tried to keep a pointer to a dentry: - the creation of a 'events' directory would actually have a stable dentry pointer that it created with tracefs_start_creating(). And it needed that dentry when tearing it all down again in eventfs_remove_events_dir(). This use is actually ok, because the special top-level events directory dentries are actually stable, not just a temporary cache of the eventfs data structures. - the 'eventfs_inode' (aka ei) needs to stay around as long as there are dentries that refer to it. It then used these dentry pointers as a replacement for doing reference counting: it would try to make sure that there was only ever one dentry associated with an event_inode, and keep a child dentry array around to see which dentries might still refer to the parent ei. This gets rid of the invalid dentry pointer use, and renames the one valid case to a different name to make it clear that it's not just any random dentry. The magic child dentry array that is kind of a "reverse reference list" is simply replaced by having child dentries take a ref to the ei. As does the directory dentries. That makes the broken use case go away. Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: `c1504e5102` ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-02-01 10:31:17 -05:00
Linus Torvalds	8dce06e98c	eventfs: Clean up dentry ops and add revalidate function In order for the dentries to stay up-to-date with the eventfs changes, just add a 'd_revalidate' function that checks the 'is_freed' bit. Also, clean up the dentry release to actually use d_release() rather than the slightly odd d_iput() function. We don't care about the inode, all we want to do is to get rid of the refcount to the eventfs data added by dentry->d_fsdata. It would probably be cleaner to make eventfs its own filesystem, or at least set its own dentry ops when looking up eventfs files. But as it is, only eventfs dentries use d_fsdata, so we don't really need to split these things up by use. Another thing that might be worth doing is to make all eventfs lookups mark their dentries as not worth caching. We could do that with d_delete(), but the DCACHE_DONTCACHE flag would likely be even better. As it is, the dentries are all freeable, but they only tend to get freed at memory pressure rather than more proactively. But that's a separate issue. Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.124644253@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: `c1504e5102` ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-02-01 10:31:17 -05:00
Linus Torvalds	408600be78	eventfs: Remove unused d_parent pointer field It's never used Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.961772428@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: `c1504e5102` ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-02-01 10:31:17 -05:00
Linus Torvalds	49304c2b93	tracefs: dentry lookup crapectomy The dentry lookup for eventfs files was very broken, and had lots of signs of the old situation where the filesystem names were all created statically in the dentry tree, rather than being looked up dynamically based on the eventfs data structures. You could see it in the naming - how it claimed to "create" dentries rather than just look up the dentries that were given it. You could see it in various nonsensical and very incorrect operations, like using "simple_lookup()" on the dentries that were passed in, which only results in those dentries becoming negative dentries. Which meant that any other lookup would possibly return ENOENT if it saw that negative dentry before the data was then later filled in. You could see it in the immense amount of nonsensical code that didn't actually just do lookups. Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131233227.73db55e1@gandalf.local.home Cc: stable@vger.kernel.org Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: `c1504e5102` ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-02-01 10:30:33 -05:00
Shyam Prasad N	ee36a3b345	cifs: make sure that channel scaling is done only once Following a successful cifs_tree_connect, we have the code to scale up/down the number of channels in the session. However, it is not protected by a lock today. As a result, this code can be executed by several processes that select the same channel. The core functions handle this well, as they pick chan_lock. However, we've seen cases where smb2_reconnect throws some warnings. To fix that, this change introduces a flags bitmap inside the cifs_ses structure. A new flag type is used to ensure that only one process enters this section at any time. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-31 16:52:03 -06:00
Linus Torvalds	99c001cb61	tracefs: Avoid using the ei->dentry pointer unnecessarily The eventfs_find_events() code tries to walk up the tree to find the event directory that a dentry belongs to, in order to then find the eventfs inode that is associated with that event directory. However, it uses an odd combination of walking the dentry parent, looking up the eventfs inode associated with that, and then looking up the dentry from there. Repeat. But the code shouldn't have back-pointers to dentries in the first place, and it should just walk the dentry parenthood chain directly. Similarly, 'set_top_events_ownership()' looks up the dentry from the eventfs inode, but the only reason it wants a dentry is to look up the superblock in order to look up the root dentry. But it already has the real filesystem inode, which has that same superblock pointer. So just pass in the superblock pointer using the information that's already there, instead of looking up extraneous data that is irrelevant. Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.638645365@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: `c1504e5102` ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-31 14:15:48 -05:00
Linus Torvalds	4fa4b010b8	eventfs: Initialize the tracefs inode properly The tracefs-specific fields in the inode were not initialized before the inode was exposed to others through the dentry with 'd_instantiate()'. Move the field initializations up to before the d_instantiate. Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.478449628@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-31 14:15:48 -05:00
Steven Rostedt (Google)	d81786f53a	tracefs: Zero out the tracefs_inode when allocating it eventfs uses the tracefs_inode and assumes that it's already initialized to zero. That is, it doesn't set fields to zero (like ti->private) after getting its tracefs_inode. This causes bugs due to stale values. Just initialize the entire structure to zero on allocation so there isn't any more surprises. This is a partial fix to access to ti->private. The assignment still needs to be made before the dentry is instantiated. Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.315825944@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-31 14:15:47 -05:00

1 2 3 4 5 ...

89171 Commits