linux/mm
Linus Torvalds 3b3f874cc1 vfs-6.7.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTpoQAAKCRCRxhvAZXjc
 ovFNAQDgIRjXfZ1Ku+USxsRRdqp8geJVaNc3PuMmYhOYhUenqgEAmC1m+p0y31dS
 P6+HlL16Mqgu0tpLCcJK9BibpDZ0Ew4=
 =7yD1
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual fses.

  Features:

   - Rename and export helpers that get write access to a mount. They
     are used in overlayfs to get write access to the upper mount.

   - Print the pretty name of the root device on boot failure. This
     helps in scenarios where we would usually only print
     "unknown-block(1,2)".

   - Add an internal SB_I_NOUMASK flag. This is another part in the
     endless POSIX ACL saga in a way.

     When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip
     the umask because if the relevant inode has POSIX ACLs set it might
     take the umask from there. But if the inode doesn't have any POSIX
     ACLs set then we apply the umask in the filesytem itself. So we end
     up with:

      (1) no SB_POSIXACL -> strip umask in vfs
      (2) SB_POSIXACL    -> strip umask in filesystem

     The umask semantics associated with SB_POSIXACL allowed filesystems
     that don't even support POSIX ACLs at all to raise SB_POSIXACL
     purely to avoid umask stripping. That specifically means NFS v4 and
     Overlayfs. NFS v4 does it because it delegates this to the server
     and Overlayfs because it needs to delegate umask stripping to the
     upper filesystem, i.e., the filesystem used as the writable layer.

     This went so far that SB_POSIXACL is raised eve on kernels that
     don't even have POSIX ACL support at all.

     Stop this blatant abuse and add SB_I_NOUMASK which is an internal
     superblock flag that filesystems can raise to opt out of umask
     handling. That should really only be the two mentioned above. It's
     not that we want any filesystems to do this. Ideally we have all
     umask handling always in the vfs.

   - Make overlayfs use SB_I_NOUMASK too.

   - Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in
     IS_POSIXACL() if the kernel doesn't have support for it. This is a
     very old patch but it's only possible to do this now with the wider
     cleanup that was done.

   - Follow-up work on fake path handling from last cycle. Citing mostly
     from Amir:

     When overlayfs was first merged, overlayfs files of regular files
     and directories, the ones that are installed in file table, had a
     "fake" path, namely, f_path is the overlayfs path and f_inode is
     the "real" inode on the underlying filesystem.

     In v6.5, we took another small step by introducing of the
     backing_file container and the file_real_path() helper. This change
     allowed vfs and filesystem code to get the "real" path of an
     overlayfs backing file. With this change, we were able to make
     fsnotify work correctly and report events on the "real" filesystem
     objects that were accessed via overlayfs.

     This method works fine, but it still leaves the vfs vulnerable to
     new code that is not aware of files with fake path. A recent
     example is commit db1d1e8b98 ("IMA: use vfs_getattr_nosec to get
     the i_version"). This commit uses direct referencing to f_path in
     IMA code that otherwise uses file_inode() and file_dentry() to
     reference the filesystem objects that it is measuring.

     This contains work to switch things around: instead of having
     filesystem code opt-in to get the "real" path, have generic code
     opt-in for the "fake" path in the few places that it is needed.

     Is it far more likely that new filesystems code that does not use
     the file_dentry() and file_real_path() helpers will end up causing
     crashes or averting LSM/audit rules if we keep the "fake" path
     exposed by default.

     This change already makes file_dentry() moot, but for now we did
     not change this helper just added a WARN_ON() in ovl_d_real() to
     catch if we have made any wrong assumptions.

     After the dust settles on this change, we can make file_dentry() a
     plain accessor and we can drop the inode argument to ->d_real().

   - Switch struct file to SLAB_TYPESAFE_BY_RCU. This looks like a small
     change but it really isn't and I would like to see everyone on
     their tippie toes for any possible bugs from this work.

     Essentially we've been doing most of what SLAB_TYPESAFE_BY_RCU for
     files since a very long time because of the nasty interactions
     between the SCM_RIGHTS file descriptor garbage collection. So
     extending it makes a lot of sense but it is a subtle change. There
     are almost no places that fiddle with file rcu semantics directly
     and the ones that did mess around with struct file internal under
     rcu have been made to stop doing that because it really was always
     dodgy.

     I forgot to put in the link tag for this change and the discussion
     in the commit so adding it into the merge message:

       https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com

  Cleanups:

   - Various smaller pipe cleanups including the removal of a spin lock
     that was only used to protect against writes without pipe_lock()
     from O_NOTIFICATION_PIPE aka watch queues. As that was never
     implemented remove the additional locking from pipe_write().

   - Annotate struct watch_filter with the new __counted_by attribute.

   - Clarify do_unlinkat() cleanup so that it doesn't look like an extra
     iput() is done that would cause issues.

   - Simplify file cleanup when the file has never been opened.

   - Use module helper instead of open-coding it.

   - Predict error unlikely for stale retry.

   - Use WRITE_ONCE() for mount expiry field instead of just commenting
     that one hopes the compiler doesn't get smart.

  Fixes:

   - Fix readahead on block devices.

   - Fix writeback when layztime is enabled and inodes whose timestamp
     is the only thing that changed reside on wb->b_dirty_time. This
     caused excessively large zombie memory cgroup when lazytime was
     enabled as such inodes weren't handled fast enough.

   - Convert BUG_ON() to WARN_ON_ONCE() in open_last_lookups()"

* tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
  file, i915: fix file reference for mmap_singleton()
  vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups
  writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
  chardev: Simplify usage of try_module_get()
  ovl: rely on SB_I_NOUMASK
  fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=n
  fs: store real path instead of fake path in backing file f_path
  fs: create helper file_user_path() for user displayed mapped file path
  fs: get mnt_writers count for an open backing file's real path
  vfs: stop counting on gcc not messing with mnt_expiry_mark if not asked
  vfs: predict the error in retry_estale as unlikely
  backing file: free directly
  vfs: fix readahead(2) on block devices
  io_uring: use files_lookup_fd_locked()
  file: convert to SLAB_TYPESAFE_BY_RCU
  vfs: shave work on failed file open
  fs: simplify misleading code to remove ambiguity regarding ihold()/iput()
  watch_queue: Annotate struct watch_filter with __counted_by
  fs/pipe: use spinlock in pipe_read() only if there is a watch_queue
  fs/pipe: remove unnecessary spinlock from pipe_write()
  ...
2023-10-30 09:14:19 -10:00
..
damon mm/damon/sysfs: check DAMOS regions update progress from before_terminate() 2023-10-18 12:12:41 -07:00
kasan kasan: disable kasan_non_canonical_hook() for HW tags 2023-10-18 12:12:41 -07:00
kfence LoongArch changes for v6.6 2023-09-08 12:16:52 -07:00
kmsan mm: kmsan: use helper macros PAGE_ALIGN and PAGE_ALIGN_DOWN 2023-08-21 13:37:29 -07:00
backing-dev.c writeback: remove redundant checks for root memcg 2023-08-21 13:37:48 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
cma.c dma-maping updates for Linux 6.6 2023-08-29 20:32:10 -07:00
cma.h
compaction.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
debug_page_alloc.c mm: page_alloc: split out DEBUG_PAGEALLOC 2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
debug.c mm: update validate_mm() to use vma iterator 2023-06-09 16:25:31 -07:00
dmapool_test.c dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
dmapool.c dmapool: create/destroy cleanup 2023-06-09 16:25:17 -07:00
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c mm: page_alloc: split out FAIL_PAGE_ALLOC 2023-06-09 16:25:23 -07:00
failslab.c
filemap.c mm: report success more often from filemap_map_folio_range() 2023-09-29 17:20:45 -07:00
folio-compat.c filemap: Add fgf_t typedef 2023-07-24 18:04:30 -04:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h
gup.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
highmem.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
hugetlb_cgroup.c mm/hugetlb: increase use of folios in alloc_huge_page() 2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-08-18 10:12:14 -07:00
hugetlb_vmemmap.h
hugetlb.c hugetlbfs: close race between MADV_DONTNEED and page fault 2023-10-18 12:12:41 -07:00
hwpoison-inject.c
init-mm.c mm: move dummy_vm_ops out of a header 2023-08-21 13:37:46 -07:00
internal.h Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-05-29 16:14:28 +01:00
khugepaged.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
kmemleak.c mm/kmemleak: move up cond_resched() call in page scanning loop 2023-09-02 15:17:34 -07:00
ksm.c mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() 2023-09-05 11:11:52 -07:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
madvise.c swap: remove remnants of polling from read_swap_cache_async 2023-08-24 16:20:16 -07:00
Makefile - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c mm: disable kernelcore=mirror when no mirror memory 2023-08-21 13:37:43 -07:00
memcontrol.c mm, memcg: reconsider kmem.limit_in_bytes deprecation 2023-09-29 17:20:47 -07:00
memfd.c revert "memfd: improve userspace warnings for missing exec-related flags". 2023-09-05 11:11:52 -07:00
memory_hotplug.c mm/memory_hotplug: embed vmem_altmap details in memory block 2023-08-21 13:37:49 -07:00
memory-failure.c Seven hotfixes. Four are cc:stable and the remainder pertain to issues 2023-09-05 12:22:39 -07:00
memory-tiers.c memory tier: use helper macro __ATTR_RW() 2023-08-18 10:12:38 -07:00
memory.c hugetlbfs: close race between MADV_DONTNEED and page fault 2023-10-18 12:12:41 -07:00
mempolicy.c mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer 2023-10-06 14:11:38 -07:00
mempool.c
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c mm: memtest: convert to memtest_report_meminfo() 2023-08-21 13:37:47 -07:00
migrate_device.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
migrate.c mm/migrate: fix do_pages_move for compat pointers 2023-10-06 14:11:38 -07:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
mm_init.c mm/mm_init: use helper macro BITS_PER_LONG and BITS_PER_BYTE 2023-08-21 13:37:47 -07:00
mm_slot.h
mmap_lock.c
mmap.c mmap: fix error paths with dup_anon_vma() 2023-10-06 14:11:38 -07:00
mmu_gather.c mm: fix kernel-doc warning from tlb_flush_rmaps() 2023-08-24 16:20:30 -07:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c
mprotect.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
mremap.c vm: fix move_vma() memory accounting being off 2023-09-16 15:23:31 -07:00
msync.c
nommu.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
oom_kill.c mm: remove redundant K() macro definition 2023-08-21 13:37:44 -07:00
page_alloc.c mm/page_alloc: correct start page when guard page debug is enabled 2023-10-06 14:11:38 -07:00
page_counter.c
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c zswap: make zswap_load() take a folio 2023-08-21 13:37:27 -07:00
page_isolation.c mm/hugetlb: get rid of page_hstate() 2023-08-18 10:12:39 -07:00
page_owner.c mm/page_ext: use page_ext_data helper in page_owner 2023-08-21 13:37:27 -07:00
page_poison.c mm/page_poison: remove unused page_ext.h from page_poison 2023-08-21 13:37:30 -07:00
page_reporting.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: correct stale comment of function check_pte 2023-08-18 10:12:13 -07:00
page-writeback.c mm: remove folio_account_redirty 2023-08-21 14:52:16 +02:00
pagewalk.c mm/pagewalk: fix bootstopping regression from extra pte_unmap() 2023-09-02 08:39:21 -07:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm/percpu.c: print error message too if atomic alloc failed 2023-08-25 08:04:59 -07:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm/gup: remove unused vmas parameter from pin_user_pages_remote() 2023-06-09 16:25:25 -07:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c vfs: fix readahead(2) on block devices 2023-10-19 11:02:49 +02:00
rmap.c mm: hugetlb: add huge page size param to set_huge_pte_at() 2023-09-29 17:20:47 -07:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem_quota.c shmem: Add default quota limit mount options 2023-08-09 09:15:40 +02:00
shmem.c Revert "tmpfs: add support for multigrain timestamps" 2023-09-20 18:05:30 +02:00
show_mem.c mm,thp: no space after colon in Mem-Info fields 2023-08-21 13:38:01 -07:00
shrinker_debug.c Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless" 2023-06-19 13:19:34 -07:00
shuffle.c
shuffle.h mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
slab_common.c mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign() 2023-10-11 15:24:49 +02:00
slab.c Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slab.h Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slub.c mm/slub: remove freelist_dereference() 2023-07-14 09:57:21 +02:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/sparse: remove redundant judgments from macro for_each_present_section_nr 2023-08-18 10:12:14 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h swap: remove remnants of polling from read_swap_cache_async 2023-08-24 16:20:16 -07:00
swapfile.c mm/swap: Convert to use bdev_open_by_dev() 2023-10-28 13:29:19 +02:00
truncate.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
util.c rcu: dump vmalloc memory info safely 2023-09-05 10:13:45 -07:00
vmalloc.c mm: hugetlb: add huge page size param to set_huge_pte_at() 2023-09-29 17:20:47 -07:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-08-16 12:21:32 +01:00
vmscan.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
vmstat.c mm/vmstat: remove unused page_ext.h from vmstat 2023-08-21 13:37:30 -07:00
workingset.c mm: memcg: use rstat for non-hierarchical stats 2023-08-24 16:20:18 -07:00
z3fold.c mm/z3fold: remove obsolete comment for struct z3fold_pool 2023-08-21 13:37:51 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
zswap.c mm: zswap: fix pool refcount bug around shrink_worker() 2023-10-18 12:12:40 -07:00