Commit Graph

1279395 Commits

Author SHA1 Message Date
Chen Ni
b80cc4df11
ipc: mqueue: remove assignment from IS_ERR argument
Remove assignment from IS_ERR() argument.
This is detected by coccinelle.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Link: https://lore.kernel.org/r/20240708080404.3859094-1-nichen@iscas.ac.cn
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-09 06:47:40 +02:00
Mateusz Guzik
f378ec4eec
vfs: rename parent_ino to d_parent_ino and make it use RCU
The routine is used by procfs through dir_emit_dots.

The combined RCU and lock fallback implementation is too big for an
inline. Given that the routine takes a dentry argument fs/dcache.c seems
like the place to put it in.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240627161152.802567-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-27 18:34:21 +02:00
Mateusz Guzik
0ef625bba6
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
The newly used helper also checks for empty ("") paths.

NULL paths with any flag value other than AT_EMPTY_PATH go the usual
route and end up with -EFAULT to retain compatibility (Rust is abusing
calls of the sort to detect availability of statx).

This avoids path lookup code, lockref management, memory allocation and
in case of NULL path userspace memory access (which can be quite
expensive with SMAP on x86_64).

Benchmarked with statx(..., AT_EMPTY_PATH, ...) running on Sapphire
Rapids, with the "" path for the first two cases and NULL for the last
one.

Results in ops/s:
stock:     4231237
pre-check: 5944063 (+40%)
NULL path: 6601619 (+11%/+56%)

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240625151807.620812-1-mjguzik@gmail.com
Tested-by: Xi Ruoyao <xry111@xry111.site>
[brauner: use path_mounted() and other tweaks]
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-27 18:31:20 +02:00
Christian Brauner
27a2d0cb2f
stat: use vfs_empty_path() helper
Use the newly added helper for this.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-27 18:31:20 +02:00
Christian Brauner
1bc6d4452d
fs: new helper vfs_empty_path()
Make it possible to quickly check whether AT_EMPTY_PATH is valid.
Note, after some discussion we decided to also allow NULL to be passed
instead of requiring the empty string.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-27 18:31:19 +02:00
Christian Brauner
9fb9ff7ed1
fs: reflow may_create_in_sticky()
This function is so messy and hard to read, let's reflow it to make it
more readable.

Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240621-affekt-denkzettel-3c115f68355a@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-27 18:31:17 +02:00
Mateusz Guzik
8e3447822d
vfs: remove redundant smp_mb for thp handling in do_dentry_open
opening for write performs:

if (f->f_mode & FMODE_WRITE) {
[snip]
        smp_mb();
        if (filemap_nr_thps(inode->i_mapping)) {
[snip]
        }
}

filemap_nr_thps on kernels built without CONFIG_READ_ONLY_THP_FOR
expands to 0, allowing the compiler to eliminate the entire thing, with
exception of the fence (and the branch leading there).

So happens required synchronisation between i_writecount and nr_thps
changes is already provided by the full fence coming from
get_write_access -> atomic_inc_unless_negative, thus the smp_mb instance
above can be removed regardless of CONFIG_READ_ONLY_THP_FOR.

While I updated commentary in places claiming to match the now-removed
fence, I did not try to patch them to act on the compile option.

I did not bother benchmarking it, not issuing a spurious full fence in
the fast path does not warrant justification from perf standpoint.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240624085402.493630-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25 11:15:48 +02:00
Youling Tang
153216cf7b
fuse: Use in_group_or_capable() helper
Use the in_group_or_capable() helper function to simplify the code.

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Link: https://lore.kernel.org/r/20240620032335.147136-3-youling.tang@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25 11:15:48 +02:00
Youling Tang
8444ee22ad
f2fs: Use in_group_or_capable() helper
Use the in_group_or_capable() helper function to simplify the code.

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Link: https://lore.kernel.org/r/20240620032335.147136-2-youling.tang@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25 11:15:48 +02:00
Youling Tang
9b6a14f08b
fs: Export in_group_or_capable()
Export in_group_or_capable() as a VFS helper function.

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Link: https://lore.kernel.org/r/20240620032335.147136-1-youling.tang@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25 11:15:48 +02:00
Mateusz Guzik
5e362bd5ee
vfs: reorder checks in may_create_in_sticky
The routine is called for all directories on file creation and weirdly
postpones the check if the dir is sticky to begin with. Instead it first
checks fifos and regular files (in that order), while avoidably pulling
globals.

No functional changes.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240620120359.151258-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25 11:15:47 +02:00
Chao Yu
26a2ed1079
hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode()
Syzbot reports uninitialized value access issue as below:

loop0: detected capacity change from 0 to 64
=====================================================
BUG: KMSAN: uninit-value in hfs_revalidate_dentry+0x307/0x3f0 fs/hfs/sysdep.c:30
 hfs_revalidate_dentry+0x307/0x3f0 fs/hfs/sysdep.c:30
 d_revalidate fs/namei.c:862 [inline]
 lookup_fast+0x89e/0x8e0 fs/namei.c:1649
 walk_component fs/namei.c:2001 [inline]
 link_path_walk+0x817/0x1480 fs/namei.c:2332
 path_lookupat+0xd9/0x6f0 fs/namei.c:2485
 filename_lookup+0x22e/0x740 fs/namei.c:2515
 user_path_at_empty+0x8b/0x390 fs/namei.c:2924
 user_path_at include/linux/namei.h:57 [inline]
 do_mount fs/namespace.c:3689 [inline]
 __do_sys_mount fs/namespace.c:3898 [inline]
 __se_sys_mount+0x66b/0x810 fs/namespace.c:3875
 __x64_sys_mount+0xe4/0x140 fs/namespace.c:3875
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

BUG: KMSAN: uninit-value in hfs_ext_read_extent fs/hfs/extent.c:196 [inline]
BUG: KMSAN: uninit-value in hfs_get_block+0x92d/0x1620 fs/hfs/extent.c:366
 hfs_ext_read_extent fs/hfs/extent.c:196 [inline]
 hfs_get_block+0x92d/0x1620 fs/hfs/extent.c:366
 block_read_full_folio+0x4ff/0x11b0 fs/buffer.c:2271
 hfs_read_folio+0x55/0x60 fs/hfs/inode.c:39
 filemap_read_folio+0x148/0x4f0 mm/filemap.c:2426
 do_read_cache_folio+0x7c8/0xd90 mm/filemap.c:3553
 do_read_cache_page mm/filemap.c:3595 [inline]
 read_cache_page+0xfb/0x2f0 mm/filemap.c:3604
 read_mapping_page include/linux/pagemap.h:755 [inline]
 hfs_btree_open+0x928/0x1ae0 fs/hfs/btree.c:78
 hfs_mdb_get+0x260c/0x3000 fs/hfs/mdb.c:204
 hfs_fill_super+0x1fb1/0x2790 fs/hfs/super.c:406
 mount_bdev+0x628/0x920 fs/super.c:1359
 hfs_mount+0xcd/0xe0 fs/hfs/super.c:456
 legacy_get_tree+0x167/0x2e0 fs/fs_context.c:610
 vfs_get_tree+0xdc/0x5d0 fs/super.c:1489
 do_new_mount+0x7a9/0x16f0 fs/namespace.c:3145
 path_mount+0xf98/0x26a0 fs/namespace.c:3475
 do_mount fs/namespace.c:3488 [inline]
 __do_sys_mount fs/namespace.c:3697 [inline]
 __se_sys_mount+0x919/0x9e0 fs/namespace.c:3674
 __ia32_sys_mount+0x15b/0x1b0 fs/namespace.c:3674
 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
 __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
 entry_SYSENTER_compat_after_hwframe+0x70/0x82

Uninit was created at:
 __alloc_pages+0x9a6/0xe00 mm/page_alloc.c:4590
 __alloc_pages_node include/linux/gfp.h:238 [inline]
 alloc_pages_node include/linux/gfp.h:261 [inline]
 alloc_slab_page mm/slub.c:2190 [inline]
 allocate_slab mm/slub.c:2354 [inline]
 new_slab+0x2d7/0x1400 mm/slub.c:2407
 ___slab_alloc+0x16b5/0x3970 mm/slub.c:3540
 __slab_alloc mm/slub.c:3625 [inline]
 __slab_alloc_node mm/slub.c:3678 [inline]
 slab_alloc_node mm/slub.c:3850 [inline]
 kmem_cache_alloc_lru+0x64d/0xb30 mm/slub.c:3879
 alloc_inode_sb include/linux/fs.h:3018 [inline]
 hfs_alloc_inode+0x5a/0xc0 fs/hfs/super.c:165
 alloc_inode+0x83/0x440 fs/inode.c:260
 new_inode_pseudo fs/inode.c:1005 [inline]
 new_inode+0x38/0x4f0 fs/inode.c:1031
 hfs_new_inode+0x61/0x1010 fs/hfs/inode.c:186
 hfs_mkdir+0x54/0x250 fs/hfs/dir.c:228
 vfs_mkdir+0x49a/0x700 fs/namei.c:4126
 do_mkdirat+0x529/0x810 fs/namei.c:4149
 __do_sys_mkdirat fs/namei.c:4164 [inline]
 __se_sys_mkdirat fs/namei.c:4162 [inline]
 __x64_sys_mkdirat+0xc8/0x120 fs/namei.c:4162
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

It missed to initialize .tz_secondswest, .cached_start and .cached_blocks
fields in struct hfs_inode_info after hfs_alloc_inode(), fix it.

Cc: stable@vger.kernel.org
Reported-by: syzbot+3ae6be33a50b5aae4dab@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-fsdevel/0000000000005ad04005ee48897f@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240616013841.2217-1-chao@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25 11:15:47 +02:00
Christophe JAILLET
7f07ee5a23
proc: Remove usage of the deprecated ida_simple_xx() API
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_max() is inclusive. So a -1 has been added when needed.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/ae10003feb87d240163d0854de95f09e1f00be7d.1717855701.git.christophe.jaillet@wanadoo.fr
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25 11:15:47 +02:00
Chao Yu
be4edd1642
hfsplus: fix to avoid false alarm of circular locking
Syzbot report potential ABBA deadlock as below:

loop0: detected capacity change from 0 to 1024
======================================================
WARNING: possible circular locking dependency detected
6.9.0-syzkaller-10323-g8f6a15f095a6 #0 Not tainted
------------------------------------------------------
syz-executor171/5344 is trying to acquire lock:
ffff88807cb980b0 (&tree->tree_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595

but task is already holding lock:
ffff88807a930108 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x2da/0xb50 fs/hfsplus/extents.c:576

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __mutex_lock_common kernel/locking/mutex.c:608 [inline]
       __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
       hfsplus_file_extend+0x21b/0x1b70 fs/hfsplus/extents.c:457
       hfsplus_bmap_reserve+0x105/0x4e0 fs/hfsplus/btree.c:358
       hfsplus_rename_cat+0x1d0/0x1050 fs/hfsplus/catalog.c:456
       hfsplus_rename+0x12e/0x1c0 fs/hfsplus/dir.c:552
       vfs_rename+0xbdb/0xf00 fs/namei.c:4887
       do_renameat2+0xd94/0x13f0 fs/namei.c:5044
       __do_sys_rename fs/namei.c:5091 [inline]
       __se_sys_rename fs/namei.c:5089 [inline]
       __x64_sys_rename+0x86/0xa0 fs/namei.c:5089
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&tree->tree_lock){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3134 [inline]
       check_prevs_add kernel/locking/lockdep.c:3253 [inline]
       validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
       __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __mutex_lock_common kernel/locking/mutex.c:608 [inline]
       __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
       hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595
       hfsplus_setattr+0x1ce/0x280 fs/hfsplus/inode.c:265
       notify_change+0xb9d/0xe70 fs/attr.c:497
       do_truncate+0x220/0x310 fs/open.c:65
       handle_truncate fs/namei.c:3308 [inline]
       do_open fs/namei.c:3654 [inline]
       path_openat+0x2a3d/0x3280 fs/namei.c:3807
       do_filp_open+0x235/0x490 fs/namei.c:3834
       do_sys_openat2+0x13e/0x1d0 fs/open.c:1406
       do_sys_open fs/open.c:1421 [inline]
       __do_sys_creat fs/open.c:1497 [inline]
       __se_sys_creat fs/open.c:1491 [inline]
       __x64_sys_creat+0x123/0x170 fs/open.c:1491
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&HFSPLUS_I(inode)->extents_lock);
                               lock(&tree->tree_lock);
                               lock(&HFSPLUS_I(inode)->extents_lock);
  lock(&tree->tree_lock);

This is a false alarm as tree_lock mutex are different, one is
from sbi->cat_tree, and another is from sbi->ext_tree:

Thread A			Thread B
- hfsplus_rename
 - hfsplus_rename_cat
  - hfs_find_init
   - mutext_lock(cat_tree->tree_lock)
				- hfsplus_setattr
				 - hfsplus_file_truncate
				  - mutex_lock(hip->extents_lock)
				  - hfs_find_init
				   - mutext_lock(ext_tree->tree_lock)
  - hfs_bmap_reserve
   - hfsplus_file_extend
    - mutex_lock(hip->extents_lock)

So, let's call mutex_lock_nested for tree_lock mutex lock, and pass
correct lock class for it.

Fixes: 31651c6071 ("hfsplus: avoid deadlock on file truncation")
Reported-by: syzbot+6030b3b1b9bf70e538c4@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-fsdevel/000000000000e37a4005ef129563@google.com
Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240607142304.455441-1-chao@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-21 11:41:04 +02:00
Jemmy
deebbd505c
Improve readability of copy_tree
by employing `copy mount tree from src to dst` concept.
This involves renaming the opaque variables (e.g., p, q, r, s)
to be more descriptive, aiming to make the code easier to understand.

Changes:
mnt     -> src_root (root of the tree to copy)
r       -> src_root_child (direct child of the root being cloning)
p       -> src_parent (parent of src_mnt)
s       -> src_mnt (current mount being copying)
parent  -> dst_parent (parent of dst_child)
q       -> dst_mnt (freshly cloned mount)

Signed-off-by: Jemmy <jemmywong512@gmail.com>
Link: https://lore.kernel.org/r/20240606173912.99442-1-jemmywong512@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-21 11:41:04 +02:00
Mateusz Guzik
d4f50ea957
vfs: shave a branch in getname_flags
Check for an error while copying and no path in one branch.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240604155257.109500-4-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-21 11:40:49 +02:00
Mateusz Guzik
dff60734fc vfs: retire user_path_at_empty and drop empty arg from getname_flags
No users after do_readlinkat started doing the job on its own.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240604155257.109500-3-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-05 17:03:57 +02:00
Mateusz Guzik
969ce92da3 vfs: stop using user_path_at_empty in do_readlinkat
It is the only consumer and it saddles getname_flags with an argument set
to NULL by everyone else.

Instead the routine can do the empty check on its own.

Then user_path_at_empty can get retired and getname_flags can lose the
argument.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240604155257.109500-2-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-05 17:03:56 +02:00
Mikulas Patocka
ca86a5d2f9 tmpfs: don't interrupt fallocate with EINTR
I have a program that sets up a periodic timer with 10ms interval. When
the program attempts to call fallocate(2) on tmpfs, it goes into an
infinite loop. fallocate(2) takes longer than 10ms, so it gets
interrupted by a signal and it returns EINTR. On EINTR, the fallocate
call is restarted, going into the same loop again.

Let's change the signal_pending() check in shmem_fallocate() loop to
fatal_signal_pending(). This solves the problem of shmem_fallocate()
constantly restarting. Since most other filesystem's fallocate methods
don't react to signals, it is unlikely userspace really relies on timely
delivery of non-fatal signals while fallocate is running. Also the
comment before the signal check:

/*
 * Good, the fallocate(2) manpage permits EINTR: we may have
 * been interrupted because we are using up too much memory.
 */

indicates that the check was mainly added for OOM situations in which
case the process will be sent a fatal signal so this change preserves
the behavior in OOM situations.

[JK: Update changelog and comment based on upstream discussion]

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240515221044.590-1-jack@suse.cz
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-03 16:35:17 +02:00
Christian Brauner
2a010c4128 fs: don't block i_writecount during exec
Back in 2021 we already discussed removing deny_write_access() for
executables. Back then I was hesistant because I thought that this might
cause issues in userspace. But even back then I had started taking some
notes on what could potentially depend on this and I didn't come up with
a lot so I've changed my mind and I would like to try this.

Here are some of the notes that I took:

(1) The deny_write_access() mechanism is causing really pointless issues
    such as [1]. If a thread in a thread-group opens a file writable,
    then writes some stuff, then closing the file descriptor and then
    calling execve() they can fail the execve() with ETXTBUSY because
    another thread in the thread-group could have concurrently called
    fork(). Multi-threaded libraries such as go suffer from this.

(2) There are userspace attacks that rely on overwriting the binary of a
    running process. These attacks are _mitigated_ but _not at all
    prevented_ from ocurring by the deny_write_access() mechanism.

    I'll go over some details. The clearest example of such attacks was
    the attack against runC in CVE-2019-5736 (cf. [3]).

    An attack could compromise the runC host binary from inside a
    _privileged_ runC container. The malicious binary could then be used
    to take over the host.

    (It is crucial to note that this attack is _not_ possible with
     unprivileged containers. IOW, the setup here is already insecure.)

    The attack can be made when attaching to a running container or when
    starting a container running a specially crafted image. For example,
    when runC attaches to a container the attacker can trick it into
    executing itself.

    This could be done by replacing the target binary inside the
    container with a custom binary pointing back at the runC binary
    itself. As an example, if the target binary was /bin/bash, this
    could be replaced with an executable script specifying the
    interpreter path #!/proc/self/exe.

    As such when /bin/bash is executed inside the container, instead the
    target of /proc/self/exe will be executed. That magic link will
    point to the runc binary on the host. The attacker can then proceed
    to write to the target of /proc/self/exe to try and overwrite the
    runC binary on the host.

    However, this will not succeed because of deny_write_access(). Now,
    one might think that this would prevent the attack but it doesn't.

    To overcome this, the attacker has multiple ways:
    * Open a file descriptor to /proc/self/exe using the O_PATH flag and
      then proceed to reopen the binary as O_WRONLY through
      /proc/self/fd/<nr> and try to write to it in a busy loop from a
      separate process. Ultimately it will succeed when the runC binary
      exits. After this the runC binary is compromised and can be used
      to attack other containers or the host itself.
    * Use a malicious shared library annotating a function in there with
      the constructor attribute making the malicious function run as an
      initializor. The malicious library will then open /proc/self/exe
      for creating a new entry under /proc/self/fd/<nr>. It'll then call
      exec to a) force runC to exit and b) hand the file descriptor off
      to a program that then reopens /proc/self/fd/<nr> for writing
      (which is now possible because runC has exited) and overwriting
      that binary.

    To sum up: the deny_write_access() mechanism doesn't prevent such
    attacks in insecure setups. It just makes them minimally harder.
    That's all.

    The only way back then to prevent this is to create a temporary copy
    of the calling binary itself when it starts or attaches to
    containers. So what I did back then for LXC (and Aleksa for runC)
    was to create an anonymous, in-memory file using the memfd_create()
    system call and to copy itself into the temporary in-memory file,
    which is then sealed to prevent further modifications. This sealed,
    in-memory file copy is then executed instead of the original on-disk
    binary.

    Any compromising write operations from a privileged container to the
    host binary will then write to the temporary in-memory binary and
    not to the host binary on-disk, preserving the integrity of the host
    binary. Also as the temporary, in-memory binary is sealed, writes to
    this will also fail.

    The point is that deny_write_access() is uselss to prevent these
    attacks.

(3) Denying write access to an inode because it's currently used in an
    exec path could easily be done on an LSM level. It might need an
    additional hook but that should be about it.

(4) The MAP_DENYWRITE flag for mmap() has been deprecated a long time
    ago so while we do protect the main executable the bigger portion of
    the things you'd think need protecting such as the shared libraries
    aren't. IOW, we let anyone happily overwrite shared libraries.

(5) We removed all remaining uses of VM_DENYWRITE in [2]. That means:
    (5.1) We removed the legacy uselib() protection for preventing
          overwriting of shared libraries. Nobody cared in 3 years.
    (5.2) We allow write access to the elf interpreter after exec
          completed treating it on a par with shared libraries.

Yes, someone in userspace could potentially be relying on this. It's not
completely out of the realm of possibility but let's find out if that's
actually the case and not guess.

Link: https://github.com/golang/go/issues/22315 [1]
Link: 49624efa65 ("Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux") [2]
Link: https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736 [3]
Link: https://lwn.net/Articles/866493
Link: https://github.com/golang/go/issues/22220
Link: 5bf8c0cf09/src/cmd/go/internal/work/buildid.go (L724)
Link: 5bf8c0cf09/src/cmd/go/internal/work/exec.go (L1493)
Link: 5bf8c0cf09/src/cmd/go/internal/script/cmds.go (L457)
Link: 5bf8c0cf09/src/cmd/go/internal/test/test.go (L1557)
Link: 5bf8c0cf09/src/os/exec/lp_linux_test.go (L61)
Link: https://github.com/buildkite/agent/pull/2736
Link: https://github.com/rust-lang/rust/issues/114554
Link: https://bugs.openjdk.org/browse/JDK-8068370
Link: https://github.com/dotnet/runtime/issues/58964
Link: https://lore.kernel.org/r/20240531-vfs-i_writecount-v1-1-a17bea7ee36b@kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-03 15:52:10 +02:00
Thorsten Blum
992f03ff86 readdir: Add missing quote in macro comment
Add a missing double quote in the unsafe_copy_dirent_name() macro
comment.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Link: https://lore.kernel.org/r/20240602004729.229634-2-thorsten.blum@toblux.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-03 15:49:26 +02:00
Thorsten Blum
c12c0bb031 readdir: Remove unused header include
Since commit c512c69187 ("uaccess: implement a proper
unsafe_copy_to_user() and switch filldir over to it") the header file is
no longer needed.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Link: https://lore.kernel.org/r/20240602101534.348159-2-thorsten.blum@toblux.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-03 15:48:39 +02:00
Mateusz Guzik
54018131e6 vfs: replace WARN(down_read_trylock, ...) abuse with proper asserts
Note the macro used here works regardless of LOCKDEP.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240602123720.775702-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-03 15:45:47 +02:00
Christian Brauner
620c266f39 fhandle: relax open_by_handle_at() permission checks
A current limitation of open_by_handle_at() is that it's currently not possible
to use it from within containers at all because we require CAP_DAC_READ_SEARCH
in the initial namespace. That's unfortunate because there are scenarios where
using open_by_handle_at() from within containers.

Two examples:

(1) cgroupfs allows to encode cgroups to file handles and reopen them with
    open_by_handle_at().
(2) Fanotify allows placing filesystem watches they currently aren't usable in
    containers because the returned file handles cannot be used.

Here's a proposal for relaxing the permission check for open_by_handle_at().

(1) Opening file handles when the caller has privileges over the filesystem
    (1.1) The caller has an unobstructed view of the filesystem.
    (1.2) The caller has permissions to follow a path to the file handle.

This doesn't address the problem of opening a file handle when only a portion
of a filesystem is exposed as is common in containers by e.g., bind-mounting a
subtree. The proposal to solve this use-case is:

(2) Opening file handles when the caller has privileges over a subtree
    (2.1) The caller is able to reach the file from the provided mount fd.
    (2.2) The caller has permissions to construct an unobstructed path to the
          file handle.
    (2.3) The caller has permissions to follow a path to the file handle.

The relaxed permission checks are currently restricted to directory file
handles which are what both cgroupfs and fanotify need. Handling disconnected
non-directory file handles would lead to a potentially non-deterministic api.
If a disconnected non-directory file handle is provided we may fail to decode
a valid path that we could use for permission checking. That in itself isn't a
problem as we would just return EACCES in that case. However, confusion may
arise if a non-disconnected dentry ends up in the cache later and those opening
the file handle would suddenly succeed.

* It's potentially possible to use timing information (side-channel) to infer
  whether a given inode exists. I don't think that's particularly
  problematic. Thanks to Jann for bringing this to my attention.

* An unrelated note (IOW, these are thoughts that apply to
  open_by_handle_at() generically and are unrelated to the changes here):
  Jann pointed out that we should verify whether deleted files could
  potentially be reopened through open_by_handle_at(). I don't think that's
  possible though.

  Another potential thing to check is whether open_by_handle_at() could be
  abused to open internal stuff like memfds or gpu stuff. I don't think so
  but I haven't had the time to completely verify this.

This dates back to discussions Amir and I had quite some time ago and thanks to
him for providing a lot of details around the export code and related patches!

Link: https://lore.kernel.org/r/20240524-vfs-open_by_handle_at-v1-1-3d4b7d22736b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-28 15:57:23 +02:00
Hongbo Li
ef44c8ab06 fs: fsconfig: intercept non-new mount API in advance for FSCONFIG_CMD_CREATE_EXCL command
fsconfig with FSCONFIG_CMD_CREATE_EXCL command requires the new mount api,
here we should return -EOPNOTSUPP in advance to avoid extra procedure.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20240522030422.315892-1-lihongbo22@huawei.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-27 15:58:46 +02:00
Jeff Layton
3aa63a569c fs: switch timespec64 fields in inode to discrete integers
Adjacent struct timespec64's pack poorly. Switch them to discrete
integer fields for the seconds and nanoseconds. This shaves 8 bytes off
struct inode with a garden-variety Fedora Kconfig on x86_64, but that
also moves the i_lock into the previous cacheline, away from the fields
it protects.

To remedy that, move i_generation above the i_lock, which moves the new
4-byte hole to just after the i_fsnotify_mask in my setup. Amir has
plans to use that to expand the i_fsnotify_mask, so add a comment to
that effect as well.

Cc: Amir Goldstein <amir73il@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240517-amtime-v1-1-7b804ca4be8f@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-27 15:58:46 +02:00
Justin Stitt
23cc6ef6fd fs: remove accidental overflow during wraparound check
Running syzkaller with the newly enabled signed integer overflow
sanitizer produces this report:

[  195.401651] ------------[ cut here ]------------
[  195.404808] UBSAN: signed-integer-overflow in ../fs/open.c:321:15
[  195.408739] 9223372036854775807 + 562984447377399 cannot be represented in type 'loff_t' (aka 'long long')
[  195.414683] CPU: 1 PID: 703 Comm: syz-executor.0 Not tainted 6.8.0-rc2-00039-g14de58dbe653-dirty #11
[  195.420138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  195.425804] Call Trace:
[  195.427360]  <TASK>
[  195.428791]  dump_stack_lvl+0x93/0xd0
[  195.431150]  handle_overflow+0x171/0x1b0
[  195.433640]  vfs_fallocate+0x459/0x4f0
...
[  195.490053] ------------[ cut here ]------------
[  195.493146] UBSAN: signed-integer-overflow in ../fs/open.c:321:61
[  195.497030] 9223372036854775807 + 562984447377399 cannot be represented in type 'loff_t' (aka 'long long)
[  195.502940] CPU: 1 PID: 703 Comm: syz-executor.0 Not tainted 6.8.0-rc2-00039-g14de58dbe653-dirty #11
[  195.508395] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  195.514075] Call Trace:
[  195.515636]  <TASK>
[  195.517000]  dump_stack_lvl+0x93/0xd0
[  195.519255]  handle_overflow+0x171/0x1b0
[  195.521677]  vfs_fallocate+0x4cb/0x4f0
[  195.524033]  __x64_sys_fallocate+0xb2/0xf0

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c582a ("ubsan: Reintroduce signed overflow
sanitizer").

Let's use the check_add_overflow helper to first verify the addition
stays within the bounds of its type (long long); then we can use that
sum for the following check.

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/356
Cc: linux-hardening@vger.kernel.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240513-b4-sio-vfs_fallocate-v2-1-db415872fb16@google.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-27 15:58:46 +02:00
Linus Torvalds
1613e604df Linux 6.10-rc1 2024-05-26 15:20:12 -07:00
Kent Overstreet
9b0abe7948 mm: percpu: Include smp.h in alloc_tag.h
percpu.h depends on smp.h, but doesn't include it directly because of
circular header dependency issues; percpu.h is needed in a bunch of low
level headers.

This fixes a randconfig build error on mips:

  include/linux/alloc_tag.h: In function '__alloc_tag_ref_set':
  include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration]

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 24e44cc22a ("mm: percpu: enable per-cpu allocation tagging")
Closes: https://lore.kernel.org/oe-kbuild-all/202405210052.DIrMXJNz-lkp@intel.com/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-05-26 14:40:39 -07:00
Linus Torvalds
6fbf71854e Revert a patch causing a regression as described in the cset:
"This made a simple 'perf record -e cycles:pp make -j199' stop working on
     the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed
     at length in the threads in the Link tags below.
 
     The fix provided by Ian wasn't acceptable and work to fix this will take
     time we don't have at this point, so lets revert this and work on it on
     the next devel cycle."
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZlMhdgAKCRCyPKLppCJ+
 JzCNAPwM7gQLjeoCdkn9KDl1fj1R7/jBE/TsVqP9s1htc9vJEgD/UgLDkpdzlxBC
 HbndVTOrnvyV9ySA28654ODHpQxmXwM=
 =WJwT
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tool fix from Arnaldo Carvalho de Melo:
 "Revert a patch causing a regression.

  This made a simple 'perf record -e cycles:pp make -j199' stop working
  on the Ampere ARM64 system Linus uses to test ARM64 kernels".

* tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
2024-05-26 09:54:26 -07:00
Arnaldo Carvalho de Melo
4f1b067359 Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
This reverts commit 617824a7f0.

This made a simple 'perf record -e cycles:pp make -j199' stop working on
the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed
at length in the threads in the Link tags below.

The fix provided by Ian wasn't acceptable and work to fix this will take
time we don't have at this point, so lets revert this and work on it on
the next devel cycle.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Ethan Adams <j.ethan.adams@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com
Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-26 08:41:34 -03:00
Linus Torvalds
c13320499b four smb client fixes, including two important netfs integration fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmZRYNIACgkQiiy9cAdy
 T1FhPAv7BQhGc2HrloB74G5EQaPaCFdWOihIKCZMc15oGsrsTuPpFvOBbV6E3dyZ
 HYthBS31nSO9Nyy6+J7zXyZGTys20rB8fbO7E9RiyTcZcKFbw3zdTyAoUlklnn3a
 0wwKzQLOYDMGdnLYbL7lQR1/qAoFq+NQ7gACn+HeASPxRbJ+7Y8+USHPimUtUw52
 XnJG4bfIDhZhoPIztNMeodR3lkvpzPy0eP4xE856e6z4I7VGHukqBwEnwytz23Op
 thciepFzK2S9G7C7s4VBe7nyko+6SH7VbumU7Zb9/1rSeDYaJOGnGFUFpeib50P9
 f5Mby8JM9pnnAURJ4/0P5sFyhcveBMuoOjQsbCKZnfxqqldQn4dLgG/oXCylXjNq
 mWRfPxIZwNLUqfAbocN1eczWG2ozwbrxJYzDbYz6RepyNKus0b6oniGGgU5Eo0Au
 OAZW/QJ567mzu5hfhn6iWyzsncwtyCLor/nM4buO5Vs68xsJIdsVLjGZRNrF7gpV
 ScE1TfDe
 =p0nS
 -----END PGP SIGNATURE-----

Merge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - two important netfs integration fixes - including for a data
   corruption and also fixes for multiple xfstests

 - reenable swap support over SMB3

* tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix missing set of remote_i_size
  cifs: Fix smb3_insert_range() to move the zero_point
  cifs: update internal version number
  smb3: reenable swapfiles over SMB3 mounts
2024-05-25 22:33:10 -07:00
Linus Torvalds
9b62e02e63 16 hotfixes, 11 of which are cc:stable.
A few nilfs2 fixes, the remainder are for MM: a couple of selftests fixes,
 various singletons fixing various issues in various parts.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZlIOUgAKCRDdBJ7gKXxA
 jrYnAP9UeOw8YchTIsjEllmAbTMAqWGI+54CU/qD78jdIHoVWAEAmp0QqgFW3r2p
 jze4jBkh3lGQjykTjkUskaR71h9AZww=
 =AHeV
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "16 hotfixes, 11 of which are cc:stable.

  A few nilfs2 fixes, the remainder are for MM: a couple of selftests
  fixes, various singletons fixing various issues in various parts"

* tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/ksm: fix possible UAF of stable_node
  mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
  mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
  nilfs2: fix potential hang in nilfs_detach_log_writer()
  nilfs2: fix unexpected freezing of nilfs_segctor_sync()
  nilfs2: fix use-after-free of timer for log writer thread
  selftests/mm: fix build warnings on ppc64
  arm64: patching: fix handling of execmem addresses
  selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
  selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
  selftests/mm: compaction_test: fix bogus test success on Aarch64
  mailmap: update email address for Satya Priya
  mm/huge_memory: don't unpoison huge_zero_folio
  kasan, fortify: properly rename memintrinsics
  lib: add version into /proc/allocinfo output
  mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
2024-05-25 15:10:33 -07:00
Linus Torvalds
a0db36ed57 Misc fixes:
- Fix x86 IRQ vector leak caused by a CPU offlining race
 
  - Fix build failure in the riscv-imsic irqchip driver
    caused by an API-change semantic conflict
 
  - Fix use-after-free in irq_find_at_or_after()
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZRwMURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h/zQ//TTrgyXi6+1xXY4R0LDU45j+wavMTMkq3
 kM3eUeyXgy+FDtvLRVaYgEAYbtuR4LGFN9qmVuEHJPZQwpi3AFlnGFUFjFUvyE43
 xJuOtHoxFv3mj09VgRGsjZvzp8bxYSkEn3h0ryTWGUHzR+QmoQmYWrU6HExgXw3R
 +s8pvi14g6R/+PAy05cF0k1J7aeSsYaOfd38D/XnpyhuhXvPMS2eHgovV6I5Qhk4
 5lV6rzJv8XlKxVr7bOYJkRePE3z0HMtx0G7eo8eYERBQapHede18V8imv4OpUiua
 vmG8cFhF4Lq9KFdEtiVuf1X9/XH3PoEKTGA81oqQ9lLN9USx7ME/Peg6U5ezvEkp
 YmQx2LS12DWqYp5PZQTN0CHnfmMLgksmyGELM3JE/dFFCVh4HdpMrh+2wLwWGRJ3
 JLzAJh3YwcPhayLpNVgsSF9AtLKTkDoS0bHd43mHnB6VaEKkus8zbeuCxYAsUeMJ
 5wCZw3xQjTZEaMMNd1hJN5O/9TX2of+T6Z4C4cacMBmwpD7vX5oXmDYLE/wUHw6m
 9Z67fvOvTdIf3MkYSqjGXFKD1JobL/PmwCfaaGUQFVJkbX5WVNDk6C1zgs5FhmuY
 U/AcYfadbNdLVXrN3VLnX6Gmb7gFPShOAE1GgXGeszSReI4pbOUy2zopRGAEWSZS
 fRu8nyveGjw=
 =vxJh
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:

 - Fix x86 IRQ vector leak caused by a CPU offlining race

 - Fix build failure in the riscv-imsic irqchip driver
   caused by an API-change semantic conflict

 - Fix use-after-free in irq_find_at_or_after()

* tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
  genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
  irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
2024-05-25 14:48:40 -07:00
Linus Torvalds
3a390f24b7 Miscellaneous fixes:
- Fix regressions of the new x86 CPU VFM (vendor/family/model)
    enumeration/matching code
 
  - Fix crash kernel detection on buggy firmware with
    non-compliant ACPI MADT tables
 
  - Address Kconfig warning
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZRvZwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iDhg//fWdgn2x4B4hEiCQQVYGpHLua59jduucb
 oh+NF1jGu75obRu3gdQJLR7OYjxntWf2ryxpS+kkyLP6sYeMAoL4vnIoi5XJGJ13
 VG0BcbXA8asL2KEfVz66AENAGQjAGr7Bcg9urHIuw8Nz6lJqaFyQkdQJfeClWtdL
 zkgnCdooP1eREgfrQH5+hhTCrr/GwiFUU+wNKeIIpis1enMZYMiqA5U23w3DKlP8
 Jx0cRY7ysa63O/H9oD01edRPkZpfbMqAocVwc9v42zOjlJLZYAtAW4mSC+GhG9X6
 iGFWiW1ROBte/HYLE1LdKfahO990Tw0GsIcS42E8AtYfVu/W7U525SyKG100ndYH
 nVoUSOPWF8YCT810YtOEM2ueMQKZMEjB8yAp5QQIi2NMcgkFxNdVQiC8zFATisHd
 KFdEkH2fDGW9YiUNRBYjI/da3Q2v83JwAIKnYXmoFjcru4iJOPDIFdGZcJDh7oNW
 ys/SWSK5dJkbLz+cHm8E5ceLTpZTsFHJm1Vd1W2gU/jkESBW/2i1rZ757ykfHURe
 N7JUPI4g0DOVj8Elket9gnKD/xVFg/lsTnA1/5wxdWhWhJzZcM/XyICATno1/BaY
 STWUmUr6sTsoB4+2PRuFC2zaRqIstLbkmKOAlHezd4uIwFxznQAg1K8f8kHlqfLH
 l3VA8nRbOFc=
 =nIcm
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix regressions of the new x86 CPU VFM (vendor/family/model)
   enumeration/matching code

 - Fix crash kernel detection on buggy firmware with
   non-compliant ACPI MADT tables

 - Address Kconfig warning

* tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
  crypto: x86/aes-xts - switch to new Intel CPU model defines
  x86/topology: Handle bogus ACPI tables correctly
  x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
2024-05-25 14:40:09 -07:00
Linus Torvalds
56676c4c06 ipmi: Mostly updates for deprecated interfaces
These changes are mostly updates for deprecated interfaces,
 platform.remove and converting from a tasklet to a BH workqueue.  Also
 use HAS_IOPORT for disabling inb()/outb().
 
 -corey
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE/Q1c5nzg9ZpmiCaGYfOMkJGb/4EFAmZCYRMACgkQYfOMkJGb
 /4FqGBAAq2KsgdpCSquJMl8uvlxXzxpZJsUpZBMPeZyI9kJUPimNVvkd0iz2zx4C
 0yU9UxUcIViU1MuL8KyRhC/vpoDg3cZLspvkntWoU/qGcfQMsnxgbuogBKXTjRWt
 cOi/rFSBFXMv0gQ+aNWlFHVZdzQQ1vuiIIZNvIBNjvfaJargYXlUX91ejY/1DZHc
 Eeq0N7AiswPD0q1KZjJ5W2YYwqiNRubUbA2eq2mFvTbODi0JPgyNYI0qikMTFGM/
 L2vFRylqzMr63uFh5j+Dc1oJqdc+XtqOMIDd+34h+43td7XbBW+magdNBBLzMuYt
 HmrvBvlNH1zjXVEpAkqrOFnBYCisynKXvco7rhyTX1MaLuBo+pFNBzFlOw9s6Ydv
 +4VIBMcyz7ZVcOQgrZX5d5ly5Oh5wjAnIN3IPeDczaieMlfhk+e/MJBwDNT/7frD
 j7bnQ0tuzMvMhtZ/YfdzOZR/08cBMPRB2EJBGIQrTZeKJPcTLW73IkbQqQ6/8San
 dOgOqOGx/mAegf0vRp8ZPlyYAlSZ5l5cANRyWZjG8+EkajFB26iJmXY0jyf1yZ+I
 Vzb79YqmtB2eu/fFuJ5suF4UkvI8RLFEJxnPRvBe+420rFHs4yHL5TzyzIOACL29
 OGAoatvC7qBWNYFWh8P3jFyxScVBC4lV1FYWwoCwOFevMz9kAXw=
 =IPex
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi

Pull ipmi updates from Corey Minyard:
 "Mostly updates for deprecated interfaces, platform.remove and
  converting from a tasklet to a BH workqueue.

  Also use HAS_IOPORT for disabling inb()/outb()"

* tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi:
  ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void
  ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void
  ipmi: ipmi_ssif: Convert to platform remove callback returning void
  ipmi: ipmi_si_platform: Convert to platform remove callback returning void
  ipmi: ipmi_powernv: Convert to platform remove callback returning void
  ipmi: bt-bmc: Convert to platform remove callback returning void
  char: ipmi: handle HAS_IOPORT dependencies
  ipmi: Convert from tasklet to BH workqueue
2024-05-25 14:32:29 -07:00
Linus Torvalds
74eca356f6 We have a series from Xiubo that adds support for additional access
checks based on MDS auth caps which were recently made available to
 clients.  This is needed to prevent scenarios where the MDS quietly
 discards updates that a UID-restricted client previously (wrongfully)
 acked to the user.  Other than that, just a documentation fixup.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmZRsm0THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi6++B/4o7j4CNzjJcBw9UgxEUugwJYBe2Ht3
 vSTUkHD9NILVgrYSHNhgkCdvU8ckv8Sd+7W/Kb0BC/GRyQd57F7zoM6mvR1WMozt
 lLbYU/kdVI+TcIY2bhupMma5f+nWv6vIBTzca78UhogEBuIEYHAG3BnNaT/AEqF+
 yZ3uQEVQ2bHmwbPn3A5dibYxOR8zLyhmaq/RUvqpiiYcSkEfZ6QqKiMlZkcVD7F8
 c+NfjwXXGNTXDhfIbG4VndQi7xLXk3GI5E8xvdo2ALwumfp2KdVlMEYT8SHggSPS
 A8bWq+d5o7uVV6WviNK63XcGRpadCSSSL310vR78K+tNIIq5dnI81IsU
 =6c1e
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A series from Xiubo that adds support for additional access checks
  based on MDS auth caps which were recently made available to clients.

  This is needed to prevent scenarios where the MDS quietly discards
  updates that a UID-restricted client previously (wrongfully) acked to
  the user.

  Other than that, just a documentation fixup"

* tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
  doc: ceph: update userspace command to get CephFS metadata
  ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
  ceph: check the cephx mds auth access for async dirop
  ceph: check the cephx mds auth access for open
  ceph: check the cephx mds auth access for setattr
  ceph: add ceph_mds_check_access() helper
  ceph: save cap_auths in MDS client when session is opened
2024-05-25 14:23:58 -07:00
Linus Torvalds
89b61ca478 driver ntfs3 for linux 6.10
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmZQY1sACgkQqbAzH4Mk
 B7bMZQ/9Ea7GFiNcIS+tCOIn5VhmGP7Dwd92T0jS9W0AiTkM83hql4lzamrSTR7B
 7tQ2lNakUYIwMUqrZgTrzx5eEgWXU4YH43Az+0m/lfTpBGr0wKiXB/3yu84sdoCG
 87PkxOO+hDdK11zQ+hHkVvNBpmYSNh9GIPET4L+SMNgJztJ3ZnYWSmne0EBdI3S7
 PAeFMgxag67p1zqg4qIVrO4PMeXd9WBsKL3oO1E/abXoQnHV6I5PBdYHQaafVRy8
 Meokx+uyJALScI/AEEigBlCDQ6MYBIWvtb4T4+eSJCSaMMqCRj66zynEPL5oY6Z3
 Ah2IR+8wiwKjDMyYODZI8nqwJtZOVne0EprHNLzu+dQ4e60N5gQbPWRWlcWJmHaT
 2rhNv1uUSnXnU3oLaqkXza9nayuWPstQlgG/I92B/GHPk4c+K5Fxtiv/wmzNn77h
 qzqNuXPFC0tnvbmNGMxEOzRM/duaoyf5nurBpO0Xlo2dl29RAhWfXpyGu3IVSt7P
 03bkXt9FdoAyyDHH8i24yKqPLnLcDZwH+63qDPPz6EgD07DZDdBIpjosbJF/AGW7
 eGvsFKcGM5zH0ktZ8ZYN5a66/jaW8NP/e+vUa5RIwipdhwCn7ZEf1ERjfS14w+yD
 cbgKdbORm6DV3psM9mI/XxWLeihr0daeZWWB2GaSy+8JuYCFYVI=
 =IALr
 -----END PGP SIGNATURE-----

Merge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 updates from Konstantin Komarov:
 "Fixes:
   - reusing of the file index (could cause the file to be trimmed)
   - infinite dir enumeration
   - taking DOS names into account during link counting
   - le32_to_cpu conversion, 32 bit overflow, NULL check
   - some code was refactored

  Changes:
   - removed max link count info display during driver init

  Remove:
   - atomic_open has been removed for lack of use"

* tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: Break dir enumeration if directory contents error
  fs/ntfs3: Fix case when index is reused during tree transformation
  fs/ntfs3: Mark volume as dirty if xattr is broken
  fs/ntfs3: Always make file nonresident on fallocate call
  fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
  fs/ntfs3: Use variable length array instead of fixed size
  fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
  fs/ntfs3: Check 'folio' pointer for NULL
  fs/ntfs3: Missed le32_to_cpu conversion
  fs/ntfs3: Remove max link count info display during driver init
  fs/ntfs3: Taking DOS names into account during link counting
  fs/ntfs3: remove atomic_open
  fs/ntfs3: use kcalloc() instead of kzalloc()
2024-05-25 14:19:01 -07:00
Linus Torvalds
6c8b1a2dca two ksmbd server fixes, both for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmZRUpIACgkQiiy9cAdy
 T1FIeQwAhZbsWGN/zwAP2tLbfrCpX0kFXfWIesOLZW53+6vnEXHUDHP7iN3Au1OA
 EDyeH4Sh1YcgP80mIzR+9iN3jY1FDY9G+dxfY5XdE4tSPQIIORko5B1GiumDHAZU
 NJivgLawK8OOvO01RgR5V/jyElZf0W+P2gaC/RWx4W7rGvzs9J+2uZnciTdmAGH0
 gF7zqgzM3lp7BTGD0zuaGU32W/4gcrAxVdqTkqR+i4n5/Jr9eJJWbHcxsKSun1HY
 75/BEvEJAHwQ4kMkR329pJwySyp3Zgzs7m4HAZwHKOUDzYLAnR7w2WNVNdY88T3/
 b0dQxY4V1cONaxiSN9MycoMMv59/P7VVnhvZIMhd5e7hBd4AgxUFj2c11okssGOf
 9P5BpTPGlAFYNAYR2p0uv0i3Xh6WF1Kor2SCdoIh2OkyCmhtertw+AqkqkOCiFGM
 ttwardj7+KuCKaKG7wl5UDfwnol04v6d4xoeh76jQx14euBpEfpED+ENS5fbdLu8
 F5I++L3M
 =JStz
 -----END PGP SIGNATURE-----

Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two ksmbd server fixes, both for stable"

* tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: ignore trailing slashes in share paths
  ksmbd: avoid to send duplicate oplock break notifications
2024-05-25 14:15:39 -07:00
Linus Torvalds
54f71b0369 RTC for 6.10
New driver:
  - Epson RX8111
 
 Drivers:
  - Many Device Tree bindings conversions to dtschema
  - pcf8563: wakeup-source support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmZRmpcACgkQY6TcMGxw
 OjK9rRAAmbqROjGoGnJAwWSKX3qeirgkF+bcujgJXfPKEvFnjYmko/dMXCn7iljP
 qHIEzxagOKnY9a5hnMt5MPKVjJ+WhcbVgemnXt7UB8jefctlNqkGw2eeJ93B52VR
 dCwnfDNhnBdREcUqPM6q1FREi7zdbm47Ox9z+ZQdjkwqjMfYrttesXhDOjcsEJiZ
 66hCDiNs8WWx0ZveWy3McaPQgukzM6iX0wgRwkl24ydr65+zcBr7FH2bAfZskZML
 pqJ493vjNdC4z62ZiZjPhtJBGdApF3H1vYCoW7bqNFrvsM8Wj65bfbMOL6ePE8z1
 yaYVHCBeT9FdEllLr5r1TIzCIKqQcI5OTvtCa7ESH0Ivft8Q9QBCvij0xXpkd38V
 ssC81K+aeQ8OzTTw5KlFHYwXJUgTEbq1AiC5eEVTM+axzs/sXqiX2x5HqPhcp5Bm
 gIqlqE/hQIm7nKNurF2w3aMp7+f1YPpCLzCBIjp8BuNkBCCZ6P8IFXUsGwUQNN0f
 LKy2ggbO/ArSwxGr8SQNYGF+4ySpnrdfJXRrO/leNlN37lVs40GoNI6GarftMIRr
 3N0gRDrs6m2JDoCISb96s47oroyiYx2VhZBKJdZx0pxIrzTnoi31k3Zhj4u+1FKw
 paExPLoVlU3Re0lVmcjafDc7qOJje2PZMuS4XfPKsZEe7bVAnYU=
 =Kclo
 -----END PGP SIGNATURE-----

Merge tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "There is one new driver and then most of the changes are the device
  tree bindings conversions to yaml.

  New driver:
   - Epson RX8111

  Drivers:
   - Many Device Tree bindings conversions to dtschema
   - pcf8563: wakeup-source support"

* tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  pcf8563: add wakeup-source support
  rtc: rx8111: handle VLOW flag
  rtc: rx8111: demote warnings to debug level
  rtc: rx6110: Constify struct regmap_config
  dt-bindings: rtc: convert trivial devices into dtschema
  dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema
  dt-bindings: rtc: pxa-rtc: convert to dtschema
  rtc: Add driver for Epson RX8111
  dt-bindings: rtc: Add Epson RX8111
  rtc: mcp795: drop unneeded MODULE_ALIAS
  rtc: nuvoton: Modify part number value
  rtc: test: Split rtc unit test into slow and normal speed test
  dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema
  dt-bindings: rtc: digicolor-rtc: move to trivial-rtc
  dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema
  dt-bindings: rtc: armada-380-rtc: convert to dtschema
  rtc: cros-ec: provide ID table for avoiding fallback match
2024-05-25 13:33:53 -07:00
Linus Torvalds
4286e1fceb I3C for 6.10
Core:
  - Allow device driver to trigger controller runtime PM
 
 Drivers:
  - dw: hot-join support
  - svc: better IBI handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmZRluEACgkQY6TcMGxw
 OjJuEw//eCoRPLtAinSBRrPirmWyryVIhMOjZFK+8752Ycau7sl/RIp+8N6QiW8C
 cOLugoONPq90nlVOxwYq8jtF+qaPTBkSXNbsJox0qnuZrJsLjyF1CxIYUVMClQ5/
 f7RGIhZv7paF26qP0li3ewQ8L0F13dCbdSzC2b9tI/Yy/GJmpzgpC6nv18YrlGK3
 1+L6wmz4rhR18pbjZm8myMoa//NG0lkmXhAwPYF+X9n9IJufBNhsijW1akFmqbvE
 +8DVWGLDqaY4AnotD6XBLsbFQsPF+Z0F1oVO1cL45Ned92yBsnyEKbnkH0K5SLj+
 FMvkiPGAOp5m1KxPLJ3XfthBUJp14SPufaJTsJo3tzQ5lE+4CMUkAD1Z3YTPqAZB
 HfkW48trejyVf8irUTv7NzZFQ62DPUREIHWcjnLASymgIqSWgiNCKHbufo61sU6u
 FoqMgZdcvsGmxz7Ld7RnVeU2hxaGqMNrc5jLC70YtrsFSkQB3bIfkHKhUSqeKdeO
 3cD1pyNfgnqIJ4uhb3fG1bm/MkNcOpupTNlP+XVGBYTIpIy/HWfgPCyQlO9qCXGd
 ZBbp+FojyAYb9UTbXhAVJn9U8kvqcRHI5jRuYXmxJuptSfa2semoPtQ6Ism7n/2+
 geHMAUs9s4KT670zzbv7FsPxma/b4mTcqpeCDxzjCQrNkGyfvDA=
 =EmUy
 -----END PGP SIGNATURE-----

Merge tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull i3c updates from Alexandre Belloni:
 "Runtime PM (power management) is improved and hot-join support has
  been added to the dw controller driver.

  Core:
   - Allow device driver to trigger controller runtime PM

  Drivers:
   - dw: hot-join support
   - svc: better IBI handling"

* tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: dw: Add hot-join support.
  i3c: master: Enable runtime PM for master controller
  i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
  i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame
  i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()
2024-05-25 13:28:29 -07:00
Linus Torvalds
6951abe8f3 This pull request contains the following changes for JFFS2:
- Fix Illegal memory access jffs2_free_inode()
 - Kernel-doc fixes
 - Print symbolic error names
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmZRBAAWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wRU2EADNfW+wgJj+5ZTdUnIhU+Ng3+Qe
 B80+KjJzMHNyS1GILcNl4kGna2qam0BsnZKAbQuGK3dKoXjTqH9Vk1+s7h/PiOBi
 kPfn6XNReJfIebzNQSXJRzGZUFXEUWsonDbiMWtg+Pmjf8GGookmjcz3Ik74orqa
 q63F1YNa4cgDLsWSX9hYZR1yBbCaNcTvVgiwbXLAnxN5vr7d/cDSVvrsUmXyX4Mf
 oDLDxs8UgAKfpaiP2igJzqWmRDBUN3TJeL7ZbQwsdIkD4AiDMxYjTXHiLKuIvPw2
 31cLcVwg+bavkWehS2kYVK34qLXjhuFmIAjwozxFz2LrjHqf39PM5iHD6aWFgAeA
 1+6mU2Mzibir3P9L12VAYaZ6P73Kf3jxXlH08+U8+Saox5Tgzy1bmBPxHmVx5bo/
 h1Lp7M8PgVMJ9a1Zb/Woqb/WwqJaAd2WsiTRXt6yQ5a5v5OjJf1cIgHCNKWuu8gW
 OXRB8/aKueGjkKIpmxFGvjQWYlLBmnMDOnldyXBe2j4dFzVNaest5DaZROgEnMv1
 wgQCAfdpQCQaOLR7c2JCl/M0J0rUQQ2xoNzFGjgZOStb6EiVJuKH8w0WmMM7TKfE
 /FnrLXzUO6yY//ZW5TLdwGKJdvaVHB/9wGGtrA3MXIpMbXgi6QkBJ/bVFwcCE6Vj
 4KTk3gOYqV7TBRWSeg==
 =LMvL
 -----END PGP SIGNATURE-----

Merge tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull jffs2 updates from Richard Weinberger:

 - Fix illegal memory access in jffs2_free_inode()

 - Kernel-doc fixes

 - print symbolic error names

* tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  jffs2: Fix potential illegal address access in jffs2_free_inode
  jffs2: Simplify the allocation of slab caches
  jffs2: nodemgmt: fix kernel-doc comments
  jffs2: print symbolic error name instead of error code
2024-05-25 13:23:42 -07:00
Linus Torvalds
2313022ec5 This pull request contains the following changes for UML:
- Fixes for -Wmissing-prototypes warnings and further cleanup
 - Remove callback returning void from rtc and virtio drivers
 - Fix bash location
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmZQ/FYWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wfbLEAC1X55jjignxMIt4gEbtOXL2Pgn
 Md3z8sr5QhyQeLEkoYEhAqYHcKYY8A9ZshfNS4RNTbhU6qaFQBNwbBuFnJ1MsllC
 236EKgy0xFChgqH0bszGW97VRcIs79qauDt0mE0AXQGpuW7AjJX9chT2ikp9Sr5z
 P2Gnp7+l/OaAH7UXFpaYYOWOzRAQCbA67hN3nRcSBCPq+Plw2bQCCKKK0g4UwqmI
 vukAguO3eGZ0B4oQEsPX/krM0IigM01l5pJVhkdNzJgMOfd7eWb3o3juE35f4KPx
 vSd8LPmoBvDJt9dKbZE38fC58+U9qWDcBDLfDlf7F0dGtWQi6QeZmrmQSteQUAFF
 YWHllQ+P6xdh1kdSXWk8IesVINydMAc79DpqmKkEUgmCGVX+grt40aOTnOIUuzjq
 9lMcfKgjjBz6qsC3fWyGMvjaPpRRbe4G1wnAOij+hdBNR2fEFaqv8Dx9Zx42G3lm
 oYDylqjP73SbtOKbTCdHTqOfTSC83KYmo6w5ttwnFZcDVtbXRY8NejIX08Go8KIn
 OXeZ8Pxf3DmQ4yuhE3mWOoT/eFiZnXpoNiteQZ/8RhyPMJllVijtSIlnLteuah4d
 Z68Nh9/P52VcjMH0wS1eTKrkUAgfGBQ3kIOZqbU8UMSeq8vTB2kx++HwAtmUNi07
 pDaNOQVtW5m4HMhVlw==
 =umlG
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Fixes for -Wmissing-prototypes warnings and further cleanup

 - Remove callback returning void from rtc and virtio drivers

 - Fix bash location

* tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
  um: virtio_uml: Convert to platform remove callback returning void
  um: rtc: Convert to platform remove callback returning void
  um: Remove unused do_get_thread_area function
  um: Fix -Wmissing-prototypes warnings for __vdso_*
  um: Add an internal header shared among the user code
  um: Fix the declaration of kasan_map_memory
  um: Fix the -Wmissing-prototypes warning for get_thread_reg
  um: Fix the -Wmissing-prototypes warning for __switch_mm
  um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
  um: Stop tracking host PID in cpu_tasks
  um: process: remove unused 'n' variable
  um: vector: remove unused len variable/calculation
  um: vector: fix bpfflash parameter evaluation
  um: slirp: remove set but unused variable 'pid'
  um: signal: move pid variable where needed
  um: Makefile: use bash from the environment
  um: Add winch to winch_handlers before registering winch IRQ
  um: Fix -Wmissing-prototypes warnings for __warp_* and foo
  um: Fix -Wmissing-prototypes warnings for text_poke*
  um: Move declarations to proper headers
  ...
2024-05-25 13:17:48 -07:00
Linus Torvalds
56fb6f9285 drm fixes for 6.10-rc1
nouveau:
 - fix bo metadata uAPI for vm bind
 
 panthor:
 - Fixes for panthor's heap logical block.
 - Reset on unrecoverable fault
 - Fix VM references.
 - Reset fix.
 
 xlnx:
 - xlnx compile and doc fixes.
 
 amdgpu:
 - Handle vbios table integrated info v2.3
 
 amdkfd:
 - Handle duplicate BOs in reserve_bo_and_cond_vms
 - Handle memory limitations on small APUs
 
 dp/mst:
 - MST null deref fix.
 
 bridge:
 - Don't let next bridge create connector in adv7511 to make probe work.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZQ9rAACgkQDHTzWXnE
 hr7kew//e+OwUJcock4rqpMsXrVwgse4bJ1H7gnHLmN/AQKsXWcaHKixJu+WkkQI
 62gyhOEiSH+J4Je6zi5JbIt9AKgIKyZeJqg/QeSMqUFJzYkPKhrhj0aX4AefuNJB
 dxCI1+UnkNdQ/LM+Dwb4sq757FX0/yPqhzBiytWEJJVHP4Th5SMRFmSs8YsT1Z4F
 bn220QFvPv+KrVV2eKEf6plJqYhs92nT8cVkNw8x9LhnivLRhyh7we3Ew6R2zouV
 Izm1nYx1CtTqaiHo3+UOeqRLzhqpmukzmJyML9zk/qt+s+YMj9yI/ie02EnTPqg6
 NQCqqmoDy1BpUa37N486Q7MFTiS076HMaQpYiDmVz3PIWWPvKZDqcC5vonoh99ra
 b88X4J4w3e747xZP7VDsnAOFsaLgcRah4JNtv8H3rcurbz0FrcV/0g/W/kt+Nenc
 k2m2s2hv56yWXd6B7bd3tfUe5dOwxc+CgiNddM5X3WPenEvnH+DKBOoFojSRl2o7
 LpRL7WhY6fceRUqVrHzfIPLYgdfz/Ho9LYST2jGIdKDnwV0Hw0sNfU9C0KOzm0J6
 rjgGEeMHgVdZPzSeXSmaO82mBRvR7Ke6da7FIazbocledOapsB5qxjhLvkRyULAp
 mlGZPRPJdNVM8slNLxEqrT/ugorIo5owtQdEnJmDJXFdozcLTMY=
 =pexB
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Some fixes for the end of the merge window, mostly amdgpu and panthor,
  with one nouveau uAPI change that fixes a bad decision we made a few
  months back.

  nouveau:
   - fix bo metadata uAPI for vm bind

  panthor:
   - Fixes for panthor's heap logical block.
   - Reset on unrecoverable fault
   - Fix VM references.
   - Reset fix.

  xlnx:
   - xlnx compile and doc fixes.

  amdgpu:
   - Handle vbios table integrated info v2.3

  amdkfd:
   - Handle duplicate BOs in reserve_bo_and_cond_vms
   - Handle memory limitations on small APUs

  dp/mst:
   - MST null deref fix.

  bridge:
   - Don't let next bridge create connector in adv7511 to make probe
     work"

* tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amdgpu/atomfirmware: add intergrated info v2.3 table
  drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
  drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
  drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
  drm/bridge: adv7511: Attach next bridge without creating connector
  drm/buddy: Fix the warn on's during force merge
  drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
  drm/panthor: Call panthor_sched_post_reset() even if the reset failed
  drm/panthor: Reset the FW VM to NULL on unplug
  drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
  drm/panthor: Force an immediate reset on unrecoverable faults
  drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
  drm/panthor: Fix an off-by-one in the heap context retrieval logic
  drm/panthor: Relax the constraints on the tiler chunk size
  drm/panthor: Make sure the tiler initial/max chunks are consistent
  drm/panthor: Fix tiler OOM handling to allow incremental rendering
  drm: xlnx: zynqmp_dpsub: Fix compilation error
  drm: xlnx: zynqmp_dpsub: Fix few function comments
2024-05-24 17:28:02 -07:00
David Howells
93a4315512 cifs: Fix missing set of remote_i_size
Occasionally, the generic/001 xfstest will fail indicating corruption in
one of the copy chains when run on cifs against a server that supports
FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs).  The
problem is that the remote_i_size value isn't updated by cifs_setsize()
when called by smb2_duplicate_extents(), but i_size *is*.

This may cause cifs_remap_file_range() to then skip the bit after calling
->duplicate_extents() that sets sizes.

Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before
calling cifs_setsize() to set i_size.

This means we don't then need to call netfs_resize_file() upon return from
->duplicate_extents(), but we also fix the test to compare against the pre-dup
inode size.

[Note that this goes back before the addition of remote_i_size with the
netfs_inode struct.  It should probably have been setting cifsi->server_eof
previously.]

Fixes: cfc63fc812 ("smb3: fix cached file size problems in duplicate extents (reflink)")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-05-24 16:05:56 -05:00
David Howells
8a16072335 cifs: Fix smb3_insert_range() to move the zero_point
Fix smb3_insert_range() to move the zero_point over to the new EOF.
Without this, generic/147 fails as reads of data beyond the old EOF point
return zeroes.

Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-05-24 16:04:36 -05:00
Linus Torvalds
0b32d436c0 Jeff Xu's implementation of the mseal() syscall.
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZlDhVAAKCRDdBJ7gKXxA
 jqDSAP0aGY505ka3+ffe6e5OP7W7syKjXHLy84Hp2t6YWnU+6QEA86qcXnfOI7HB
 7FPy+fa9sMm6BfAAZPkYnICAgVpbBAw=
 =Q3vf
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more mm updates from Andrew Morton:
 "Jeff Xu's implementation of the mseal() syscall"

* tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  selftest mm/mseal read-only elf memory segment
  mseal: add documentation
  selftest mm/mseal memory sealing
  mseal: add mseal syscall
  mseal: wire up mseal syscall
2024-05-24 12:47:28 -07:00
Chengming Zhou
90e8234988 mm/ksm: fix possible UAF of stable_node
The commit 2c653d0ee2 ("ksm: introduce ksm_max_page_sharing per page
deduplication limit") introduced a possible failure case in the
stable_tree_insert(), where we may free the new allocated stable_node_dup
if we fail to prepare the missing chain node.

Then that kfolio return and unlock with a freed stable_node set...  And
any MM activities can come in to access kfolio->mapping, so UAF.

Fix it by moving folio_set_stable_node() to the end after stable_node
is inserted successfully.

Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev
Fixes: 2c653d0ee2 ("ksm: introduce ksm_max_page_sharing per page deduplication limit")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:08 -07:00
Miaohe Lin
8cf360b9d6 mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
When I did memory failure tests recently, below panic occurs:

page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __rmqueue_pcplist+0x23b/0x520
 get_page_from_freelist+0x26b/0xe40
 __alloc_pages_noprof+0x113/0x1120
 __folio_alloc_noprof+0x11/0xb0
 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
 __alloc_fresh_hugetlb_folio+0xe7/0x140
 alloc_pool_huge_folio+0x68/0x100
 set_max_huge_pages+0x13d/0x340
 hugetlb_sysctl_handler_common+0xe8/0x110
 proc_sys_call_handler+0x194/0x280
 vfs_write+0x387/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
 </TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---

And before the panic, there had an warning about bad page state:

BUG: Bad page state in process page-types  pfn:8cee00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
 <TASK>
 dump_stack_lvl+0x83/0xa0
 bad_page+0x63/0xf0
 free_unref_page+0x36e/0x5c0
 unpoison_memory+0x50b/0x630
 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
 debugfs_attr_write+0x42/0x60
 full_proxy_write+0x5b/0x80
 vfs_write+0xcd/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
 </TASK>

The root cause should be the below race:

 memory_failure
  try_memory_failure_hugetlb
   me_huge_page
    __page_handle_poison
     dissolve_free_hugetlb_folio
     drain_all_pages -- Buddy page can be isolated e.g. for compaction.
     take_page_off_buddy -- Failed as page is not in the buddy list.
	     -- Page can be putback into buddy after compaction.
    page_ref_inc -- Leads to buddy page with refcnt = 1.

Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning.  And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.

Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.

Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
Fixes: ceaf8fbea7 ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:08 -07:00
Yuanyuan Zhong
6d065f507d mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
After switching smaps_rollup to use VMA iterator, searching for next entry
is part of the condition expression of the do-while loop.  So the current
VMA needs to be addressed before the continue statement.

Otherwise, with some VMAs skipped, userspace observed memory
consumption from /proc/pid/smaps_rollup will be smaller than the sum of
the corresponding fields from /proc/pid/smaps.

Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
Fixes: c4c84f0628 ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:07 -07:00