Commit Graph

78350 Commits

Author SHA1 Message Date
Linus Torvalds
bc32a6330f The first two changes involve files outside of fs/ext4:
- submit_bh() can never return an error, so change it to return void,
   and remove the unused checks from its callers
 
 - fix I_DIRTY_TIME handling so it will be set even if the inode
   already has I_DIRTY_INODE
 
 Performance:
 
 - Always enable i_version counter (as btrfs and xfs already do).
   Remove some unneeded i_version bumps to avoid unnecessary nfs cache
   invalidations.
 
 - Wake up journal waiters in FIFO order, to avoid some journal users
   from not getting a journal handle for an unfairly long time.
 
 - In ext4_write_begin() allocate any necessary buffer heads before
   starting the journal handle.
 
 - Don't try to prefetch the block allocation bitmaps for a read-only
   file system.
 
 Bug Fixes:
 
 - Fix a number of fast commit bugs, including resource leaks and
   out-of-bounds references in various error handling paths and/or if the
   fast commit log is corrupted.
 
 - Avoid stopping the online resize early when expanding a file system
   which is less than 16TiB to a size greater than 16TiB.
 
 - Fix apparent metadata corruption caused by a race with a metadata
   buffer head getting migrated while it was trying to be read.
 
 - Mark the lazy initialization thread freezable to prevent suspend
   failures.
 
 - Other miscellaneous bug fixes.
 
 Cleanups:
 
 - Break up the incredibly long ext4_fill_super() function by
   refactoring to move code into more understandable, smaller
   functions.
 
 - Remove the deprecated (and ignored) noacl and nouser_xattr mount
   options.
 
 - Factor out some common code in fast commit handling.
 
 - Other miscellaneous cleanups.

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The first two changes involve files outside of fs/ext4:

   - submit_bh() can never return an error, so change it to return void,
     and remove the unused checks from its callers

   - fix I_DIRTY_TIME handling so it will be set even if the inode
     already has I_DIRTY_INODE

  Performance:

   - Always enable i_version counter (as btrfs and xfs already do).
     Remove some unneeded i_version bumps to avoid unnecessary nfs cache
     invalidations

   - Wake up journal waiters in FIFO order, to avoid some journal users
     from not getting a journal handle for an unfairly long time

   - In ext4_write_begin() allocate any necessary buffer heads before
     starting the journal handle

   - Don't try to prefetch the block allocation bitmaps for a read-only
     file system

  Bug Fixes:

   - Fix a number of fast commit bugs, including resource leaks and
     out-of-bounds references in various error handling paths and/or if
     the fast commit log is corrupted

   - Avoid stopping the online resize early when expanding a file system
     which is less than 16TiB to a size greater than 16TiB

   - Fix apparent metadata corruption caused by a race with a metadata
     buffer head getting migrated while it was trying to be read

   - Mark the lazy initialization thread freezable to prevent suspend
     failures

   - Other miscellaneous bug fixes

  Cleanups:

   - Break up the incredibly long ext4_fill_super() function by
     refactoring to move code into more understandable, smaller
     functions

   - Remove the deprecated (and ignored) noacl and nouser_xattr mount
     options

   - Factor out some common code in fast commit handling

   - Other miscellaneous cleanups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (53 commits)
  ext4: fix potential out of bound read in ext4_fc_replay_scan()
  ext4: factor out ext4_fc_get_tl()
  ext4: introduce EXT4_FC_TAG_BASE_LEN helper
  ext4: factor out ext4_free_ext_path()
  ext4: remove unnecessary drop path references in mext_check_coverage()
  ext4: update 'state->fc_regions_size' after successful memory allocation
  ext4: fix potential memory leak in ext4_fc_record_regions()
  ext4: fix potential memory leak in ext4_fc_record_modified_inode()
  ext4: remove redundant checking in ext4_ioctl_checkpoint
  jbd2: add miss release buffer head in fc_do_one_pass()
  ext4: move DIOREAD_NOLOCK setting to ext4_set_def_opts()
  ext4: remove useless local variable 'blocksize'
  ext4: unify the ext4 super block loading operation
  ext4: factor out ext4_journal_data_mode_check()
  ext4: factor out ext4_load_and_init_journal()
  ext4: factor out ext4_group_desc_init() and ext4_group_desc_free()
  ext4: factor out ext4_geometry_check()
  ext4: factor out ext4_check_feature_compatibility()
  ext4: factor out ext4_init_metadata_csum()
  ext4: factor out ext4_encoding_init()
  ...
2022-10-06 17:45:53 -07:00
Linus Torvalds
7f198ba7ae affs-for-6.1-tag

Merge tag 'affs-for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull affs update from David Sterba:
 "One minor update for AFFS, switching away from strlcpy"

* tag 'affs-for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  affs: move from strlcpy with unused retval to strscpy
2022-10-06 17:42:29 -07:00
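
The affs change above moves from strlcpy() to strscpy(). As a rough illustration of how the two differ in what they return, here is a small Python model of their return-value contracts. The names `strlcpy_model` and `strscpy_model` are hypothetical, Python bytes stand in for NUL-terminated C strings, and `E2BIG` is assumed to be 7 as on Linux. This is a sketch of the documented semantics, not kernel code.

```python
from typing import Tuple

E2BIG = 7  # assumed errno value (Linux); illustrative only


def strlcpy_model(src: bytes, size: int) -> Tuple[bytes, int]:
    """Model of strlcpy(): copy at most size-1 bytes, return len(src).

    Because the return value is the full source length, the real
    function has to walk the whole source string even when it truncates.
    """
    copied = src[: max(size - 1, 0)]
    return copied, len(src)


def strscpy_model(src: bytes, size: int) -> Tuple[bytes, int]:
    """Model of strscpy(): return the bytes copied, or -E2BIG on truncation."""
    if size == 0:
        return b"", -E2BIG
    if len(src) < size:
        return src, len(src)
    return src[: size - 1], -E2BIG


if __name__ == "__main__":
    print(strlcpy_model(b"filesystem", 5))
    print(strscpy_model(b"filesystem", 5))
    print(strscpy_model(b"affs", 5))
```

When the caller ignores the return value, as the commit subject says, the copied bytes are the same in both models; only the reported result differs.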
Linus Torvalds
76e4503534 for-6.1-tag

Merge tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "There's a bunch of performance improvements, most notably the FIEMAP
  speedup, the new block group tree to speed up mount on large
  filesystems, more io_uring integration, some sysfs exports and the
  usual fixes and core updates.

  Summary:

  Performance:

   - outstanding FIEMAP speed improvement
      - algorithmic change in how extents are enumerated leads to orders
        of magnitude speed boost (uncached and cached)
      - extent sharing check speedup (2.2x uncached, 3x cached)
      - add more cancellation points, allowing seeking to be interrupted
        in files with a large number of extents
      - more efficient hole and data seeking (4x uncached, 1.3x cached)
      - sample results:
	    256M, 32K extents:   4s ->  29ms  (~150x)
	    512M, 64K extents:  30s ->  59ms  (~550x)
	    1G,  128K extents: 225s -> 120ms (~1800x)

   - improved inode logging, especially for directories (on dbench
     workload throughput +25%, max latency -21%)

   - improved buffered IO, remove redundant extent state tracking,
     lowering memory consumption and avoiding rb tree traversal

   - add sysfs tunable to let qgroup temporarily skip exact accounting
     when deleting snapshot, leading to a speedup but requiring a rescan
     after that, will be used by snapper

   - support io_uring and buffered writes, until now it was just for
     direct IO, with the no-wait semantics implemented in the buffered
     write path it now works and leads to speed improvement in IOPS
     (2x), throughput (2.2x), latency (depends, 2x to 150x)

   - small performance improvements when dropping and searching for
     extent maps as well as when flushing delalloc in COW mode
     (throughput +5MB/s)

  User visible changes:

   - new incompatible feature block-group-tree adding a dedicated tree
     for tracking block groups, this allows a much faster load during
     mount and avoids seeking unlike when it's scattered in the extent
     tree items
      - this reduces mount time for many-terabyte sized filesystems
      - conversion tool will be provided so existing filesystem can also
        be updated in place
      - to reduce the test matrix and feature combinations, requires
        no-holes and free-space-tree (mkfs defaults since 5.15)

   - improved reporting of super block corruption detected by scrub

   - scrub also tries to repair super block and does not wait until next
     commit

   - discard stats and tunables are exported in sysfs
     (/sys/fs/btrfs/FSID/discard)

   - qgroup status is exported in sysfs
     (/sys/fs/btrfs/FSID/qgroups/)

   - verify that super block was not modified when thawing filesystem

  Fixes:

   - FIEMAP fixes
      - fix extent sharing status, does not depend on the cached status
        where merged
      - flush delalloc so compressed extents are reported correctly

   - fix alignment of VMA for memory mapped files on THP

   - send: fix failures when processing inodes with no links (orphan
     files and directories)

   - fix race between quota enable and quota rescan ioctl

   - handle more corner cases for read-only compat feature verification

   - fix missed extent on fsync after dropping extent maps

  Core:

   - lockdep annotations to validate various transaction states and
     state transitions

   - preliminary support for fs-verity in send

   - more effective memory use in scrub for subpage where sector is
     smaller than page

   - block group caching progress logic has been removed, load is now
     synchronous

   - simplify end IO callbacks and bio handling, use chained bios
     instead of own tracking

   - add no-wait semantics to several functions (tree search, nocow,
     flushing, buffered write)

   - cleanups and refactoring

  MM changes:

   - export balance_dirty_pages_ratelimited_flags"

* tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (177 commits)
  btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer
  btrfs: drop extent map range more efficiently
  btrfs: avoid pointless extent map tree search when flushing delalloc
  btrfs: remove unnecessary next extent map search
  btrfs: remove unnecessary NULL pointer checks when searching extent maps
  btrfs: assert tree is locked when clearing extent map from logging
  btrfs: remove unnecessary extent map initializations
  btrfs: remove the refcount warning/check at free_extent_map()
  btrfs: add helper to replace extent map range with a new extent map
  btrfs: move open coded extent map tree deletion out of inode eviction
  btrfs: use cond_resched_rwlock_write() during inode eviction
  btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
  btrfs: move btrfs_drop_extent_cache() to extent_map.c
  btrfs: fix missed extent on fsync after dropping extent maps
  btrfs: remove stale prototype of btrfs_write_inode
  btrfs: enable nowait async buffered writes
  btrfs: assert nowait mode is not used for some btree search functions
  btrfs: make btrfs_buffered_write nowait compatible
  btrfs: plumb NOWAIT through the write path
  btrfs: make lock_and_cleanup_extent_if_need nowait compatible
  ...
2022-10-06 17:36:48 -07:00
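
The FIEMAP sample results in the btrfs pull message above give before and after timings alongside approximate speedup factors. A minimal sketch of that arithmetic follows; the helper name `speedup` is hypothetical. The timings in the message are rounded, so the ratios recomputed from them come out somewhat different from the "~" figures quoted there.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the 'after' timing is."""
    return before_s / after_s


# (label, seconds before, seconds after), taken from the sample results
SAMPLES = [
    ("256M, 32K extents", 4.0, 0.029),
    ("512M, 64K extents", 30.0, 0.059),
    ("1G, 128K extents", 225.0, 0.120),
]

if __name__ == "__main__":
    for label, before, after in SAMPLES:
        print(f"{label}: about {speedup(before, after):.0f}x")
```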
Linus Torvalds
4c0ed7d8d6 whack-a-mole: constifying struct path *

Merge tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs constification updates from Al Viro:
 "whack-a-mole: constifying struct path *"

* tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ecryptfs: constify path
  spufs: constify path
  nd_jump_link(): constify path
  audit_init_parent(): constify path
  __io_setxattr(): constify path
  do_proc_readlink(): constify path
  overlayfs: constify path
  fs/notify: constify path
  may_linkat(): constify path
  do_sys_name_to_handle(): constify path
  ->getprocattr(): attribute name is const char *, TYVM...
2022-10-06 17:31:02 -07:00
Linus Torvalds
ab29622157 whack-a-mole: cropped up open-coded file_inode() uses...

Merge tag 'pull-file_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull file_inode() updates from Al Viro:
 "whack-a-mole: cropped up open-coded file_inode() uses..."

* tag 'pull-file_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: use ->f_mapping
  _nfs42_proc_copy(): use ->f_mapping instead of file_inode()->i_mapping
  dma_buf: no need to bother with file_inode()->i_mapping
  nfs_finish_open(): don't open-code file_inode()
  bprm_fill_uid(): don't open-code file_inode()
  sgx: use ->f_mapping...
  exfat_iterate(): don't open-code file_inode(file)
  ibmvmc: don't open-code file_inode()
2022-10-06 17:22:11 -07:00
Linus Torvalds
7a3353c5c4 struct file-related stuff

Merge tag 'pull-file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs file updates from Al Viro:
 "struct file-related stuff"

* tag 'pull-file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dma_buf_getfile(): don't bother with ->f_flags reassignments
  Change calling conventions for filldir_t
  locks: fix TOCTOU race when granting write lease
2022-10-06 17:13:18 -07:00
Linus Torvalds
70df64d6c6 d_path pile

Merge tag 'pull-d_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs d_path updates from Al Viro.

* tag 'pull-d_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  d_path.c: typo fix...
  dynamic_dname(): drop unused dentry argument
2022-10-06 16:55:41 -07:00
Linus Torvalds
46811b5cb3 saner inode_init_always()

Merge tag 'pull-inode' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs inode update from Al Viro:
 "Saner inode_init_always(), also fixing a nilfs problem"

* tag 'pull-inode' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: fix UAF/GPF bug in nilfs_mdt_destroy
2022-10-06 16:49:00 -07:00
Linus Torvalds
0326074ff4 Networking changes for 6.1.
Core
 ----
 
  - Introduce and use a single page frag cache for allocating small skb
    heads, clawing back the 10-20% performance regression in UDP flood
    test from previous fixes.
 
  - Run packets which already went thru HW coalescing thru SW GRO.
    This significantly improves TCP segment coalescing and simplifies
    deployments as different workloads benefit from HW or SW GRO.
 
  - Shrink the size of the base zero-copy send structure.
 
  - Move TCP init under a new slow / sleepable version of DO_ONCE().
 
 BPF
 ---
 
  - Add BPF-specific, any-context-safe memory allocator.
 
  - Add helpers/kfuncs for PKCS#7 signature verification from BPF
    programs.
 
  - Define a new map type and related helpers for user space -> kernel
    communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
 
  - Allow targeting BPF iterators to loop through resources of one
    task/thread.
 
  - Add ability to call selected destructive functions.
    Expose crash_kexec() to allow BPF to trigger a kernel dump.
    Use CAP_SYS_BOOT check on the loading process to judge permissions.
 
  - Enable BPF to collect custom hierarchical cgroup stats efficiently
    by integrating with the rstat framework.
 
  - Support struct arguments for trampoline based programs.
    Only structs with size <= 16B and x86 are supported.
 
  - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
    sockets (instead of just TCP and UDP sockets).
 
  - Add a helper for accessing CLOCK_TAI for time sensitive network
    related programs.
 
  - Support accessing network tunnel metadata's flags.
 
  - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
 
  - Add support for writing to Netfilter's nf_conn:mark.
 
 Protocols
 ---------
 
  - WiFi: more Extremely High Throughput (EHT) and Multi-Link
    Operation (MLO) work (802.11be, WiFi 7).
 
  - vsock: improve support for SO_RCVLOWAT.
 
  - SMC: support SO_REUSEPORT.
 
  - Netlink: define and document how to use netlink in a "modern" way.
    Support reporting missing attributes via extended ACK.
 
  - IPSec: support collect metadata mode for xfrm interfaces.
 
  - TCPv6: send consistent autoflowlabel in SYN_RECV state
    and RST packets.
 
  - TCP: introduce optional per-netns connection hash table to allow
    better isolation between namespaces (opt-in, at the cost of memory
    and cache pressure).
 
  - MPTCP: support TCP_FASTOPEN_CONNECT.
 
  - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
 
  - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
 
  - Open vSwitch:
    - Allow specifying ifindex of new interfaces.
    - Allow conntrack and metering in non-initial user namespace.
 
  - TLS: support the Korean ARIA-GCM crypto algorithm.
 
  - Remove DECnet support.
 
 Driver API
 ----------
 
  - Allow selecting the conduit interface used by each port
    in DSA switches, at runtime.
 
  - Ethernet Power Sourcing Equipment and Power Device support.
 
  - Add tc-taprio support for queueMaxSDU parameter, i.e. setting
    per traffic class max frame size for time-based packet schedules.
 
  - Support PHY rate matching - adapting between differing host-side
    and link-side speeds.
 
  - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
 
  - Validate OF (device tree) nodes for DSA shared ports; make
    phylink-related properties mandatory on DSA and CPU ports.
    Enforcing more uniformity should allow transitioning to phylink.
 
  - Require that flash component name used during update matches one
    of the components for which version is reported by info_get().
 
  - Remove "weight" argument from driver-facing NAPI API as much
    as possible. It's one of those magic knobs which seemed like
    a good idea at the time but is too indirect to use in practice.
 
  - Support offload of TLS connections with 256 bit keys.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Microchip KSZ9896 6-port Gigabit Ethernet Switch
    - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
    - Analog Devices ADIN1110 and ADIN2111 industrial single pair
      Ethernet (10BASE-T1L) MAC+PHY.
    - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
 
  - Ethernet SFPs / modules:
    - RollBall / Hilink / Turris 10G copper SFPs
    - HALNy GPON module
 
  - WiFi:
    - CYW43439 SDIO chipset (brcmfmac)
    - CYW89459 PCIe chipset (brcmfmac)
    - BCM4378 on Apple platforms (brcmfmac)
 
 Drivers
 -------
 
  - CAN:
    - gs_usb: HW timestamp support
 
  - Ethernet PHYs:
    - lan8814: cable diagnostics
 
  - Ethernet NICs:
    - Intel (100G):
      - implement control of FCS/CRC stripping
      - port splitting via devlink
      - L2TPv3 filtering offload
    - nVidia/Mellanox:
      - tunnel offload for sub-functions
      - MACSec offload, w/ Extended packet number and replay
        window offload
      - significantly restructure, and optimize the AF_XDP support,
        align the behavior with other vendors
    - Huawei:
      - configuring DSCP map for traffic class selection
      - querying standard FEC statistics
      - querying SerDes lane number via ethtool
    - Marvell/Cavium:
      - egress priority flow control
      - MACSec offload
    - AMD/SolarFlare:
      - PTP over IPv6 and raw Ethernet
    - small / embedded:
      - ax88772: convert to phylink (to support SFP cages)
      - altera: tse: convert to phylink
      - ftgmac100: support fixed link
      - enetc: standard Ethtool counters
      - macb: ZynqMP SGMII dynamic configuration support
      - tsnep: support multi-queue and use page pool
      - lan743x: Rx IP & TCP checksum offload
      - igc: add xdp frags support to ndo_xdp_xmit
 
  - Ethernet high-speed switches:
    - Marvell (prestera):
      - support SPAN port features (traffic mirroring)
      - nexthop object offloading
    - Microchip (sparx5):
      - multicast forwarding offload
      - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - support RGMII cmode
    - NXP (felix):
      - standardized ethtool counters
    - Microchip (lan966x):
      - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
      - traffic policing and mirroring
      - link aggregation / bonding offload
      - QUSGMII PHY mode support
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - cold boot calibration support on WCN6750
    - support to connect to a non-transmit MBSSID AP profile
    - enable remain-on-channel support on WCN6750
    - Wake-on-WLAN support for WCN6750
    - support to provide transmit power from firmware via nl80211
    - support to get power save duration for each client
    - spectral scan support for 160 MHz
 
  - MediaTek WiFi (mt76):
    - WiFi-to-Ethernet bridging offload for MT7986 chips
 
  - RealTek WiFi (rtw89):
    - P2P support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Introduce and use a single page frag cache for allocating small skb
     heads, clawing back the 10-20% performance regression in UDP flood
     test from previous fixes.

   - Run packets which already went thru HW coalescing thru SW GRO. This
     significantly improves TCP segment coalescing and simplifies
     deployments as different workloads benefit from HW or SW GRO.

   - Shrink the size of the base zero-copy send structure.

   - Move TCP init under a new slow / sleepable version of DO_ONCE().

  BPF:

   - Add BPF-specific, any-context-safe memory allocator.

   - Add helpers/kfuncs for PKCS#7 signature verification from BPF
     programs.

   - Define a new map type and related helpers for user space -> kernel
     communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).

   - Allow targeting BPF iterators to loop through resources of one
     task/thread.

   - Add ability to call selected destructive functions. Expose
     crash_kexec() to allow BPF to trigger a kernel dump. Use
     CAP_SYS_BOOT check on the loading process to judge permissions.

   - Enable BPF to collect custom hierarchical cgroup stats efficiently
     by integrating with the rstat framework.

   - Support struct arguments for trampoline based programs. Only
     structs with size <= 16B and x86 are supported.

   - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
     sockets (instead of just TCP and UDP sockets).

   - Add a helper for accessing CLOCK_TAI for time sensitive network
     related programs.

   - Support accessing network tunnel metadata's flags.

   - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.

   - Add support for writing to Netfilter's nf_conn:mark.

  Protocols:

   - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
     (MLO) work (802.11be, WiFi 7).

   - vsock: improve support for SO_RCVLOWAT.

   - SMC: support SO_REUSEPORT.

   - Netlink: define and document how to use netlink in a "modern" way.
     Support reporting missing attributes via extended ACK.

   - IPSec: support collect metadata mode for xfrm interfaces.

   - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
     packets.

   - TCP: introduce optional per-netns connection hash table to allow
     better isolation between namespaces (opt-in, at the cost of memory
     and cache pressure).

   - MPTCP: support TCP_FASTOPEN_CONNECT.

   - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.

   - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.

   - Open vSwitch:
      - Allow specifying ifindex of new interfaces.
      - Allow conntrack and metering in non-initial user namespace.

   - TLS: support the Korean ARIA-GCM crypto algorithm.

   - Remove DECnet support.

  Driver API:

   - Allow selecting the conduit interface used by each port in DSA
     switches, at runtime.

   - Ethernet Power Sourcing Equipment and Power Device support.

   - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
     traffic class max frame size for time-based packet schedules.

   - Support PHY rate matching - adapting between differing host-side
     and link-side speeds.

   - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.

   - Validate OF (device tree) nodes for DSA shared ports; make
     phylink-related properties mandatory on DSA and CPU ports.
     Enforcing more uniformity should allow transitioning to phylink.

   - Require that flash component name used during update matches one of
     the components for which version is reported by info_get().

   - Remove "weight" argument from driver-facing NAPI API as much as
     possible. It's one of those magic knobs which seemed like a good
     idea at the time but is too indirect to use in practice.

   - Support offload of TLS connections with 256 bit keys.

  New hardware / drivers:

   - Ethernet:
      - Microchip KSZ9896 6-port Gigabit Ethernet Switch
      - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
      - Analog Devices ADIN1110 and ADIN2111 industrial single pair
        Ethernet (10BASE-T1L) MAC+PHY.
      - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).

   - Ethernet SFPs / modules:
      - RollBall / Hilink / Turris 10G copper SFPs
      - HALNy GPON module

   - WiFi:
      - CYW43439 SDIO chipset (brcmfmac)
      - CYW89459 PCIe chipset (brcmfmac)
      - BCM4378 on Apple platforms (brcmfmac)

  Drivers:

   - CAN:
      - gs_usb: HW timestamp support

   - Ethernet PHYs:
      - lan8814: cable diagnostics

   - Ethernet NICs:
      - Intel (100G):
         - implement control of FCS/CRC stripping
         - port splitting via devlink
         - L2TPv3 filtering offload
      - nVidia/Mellanox:
         - tunnel offload for sub-functions
         - MACSec offload, w/ Extended packet number and replay window
           offload
         - significantly restructure, and optimize the AF_XDP support,
           align the behavior with other vendors
      - Huawei:
         - configuring DSCP map for traffic class selection
         - querying standard FEC statistics
         - querying SerDes lane number via ethtool
      - Marvell/Cavium:
         - egress priority flow control
         - MACSec offload
      - AMD/SolarFlare:
         - PTP over IPv6 and raw Ethernet
      - small / embedded:
         - ax88772: convert to phylink (to support SFP cages)
         - altera: tse: convert to phylink
         - ftgmac100: support fixed link
         - enetc: standard Ethtool counters
         - macb: ZynqMP SGMII dynamic configuration support
         - tsnep: support multi-queue and use page pool
         - lan743x: Rx IP & TCP checksum offload
         - igc: add xdp frags support to ndo_xdp_xmit

   - Ethernet high-speed switches:
      - Marvell (prestera):
         - support SPAN port features (traffic mirroring)
         - nexthop object offloading
      - Microchip (sparx5):
         - multicast forwarding offload
         - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - support RGMII cmode
      - NXP (felix):
         - standardized ethtool counters
      - Microchip (lan966x):
         - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
         - traffic policing and mirroring
         - link aggregation / bonding offload
         - QUSGMII PHY mode support

   - Qualcomm 802.11ax WiFi (ath11k):
      - cold boot calibration support on WCN6750
      - support to connect to a non-transmit MBSSID AP profile
      - enable remain-on-channel support on WCN6750
      - Wake-on-WLAN support for WCN6750
      - support to provide transmit power from firmware via nl80211
      - support to get power save duration for each client
      - spectral scan support for 160 MHz

   - MediaTek WiFi (mt76):
      - WiFi-to-Ethernet bridging offload for MT7986 chips

   - RealTek WiFi (rtw89):
      - P2P support"

* tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
  eth: pse: add missing static inlines
  once: rename _SLOW to _SLEEPABLE
  net: pse-pd: add regulator based PSE driver
  dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
  ethtool: add interface to interact with Ethernet Power Equipment
  net: mdiobus: search for PSE nodes by parsing PHY nodes.
  net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
  net: add framework to support Ethernet PSE and PDs devices
  dt-bindings: net: phy: add PoDL PSE property
  net: marvell: prestera: Propagate nh state from hw to kernel
  net: marvell: prestera: Add neighbour cache accounting
  net: marvell: prestera: add stub handler neighbour events
  net: marvell: prestera: Add heplers to interact with fib_notifier_info
  net: marvell: prestera: Add length macros for prestera_ip_addr
  net: marvell: prestera: add delayed wq and flush wq on deinit
  net: marvell: prestera: Add strict cleanup of fib arbiter
  net: marvell: prestera: Add cleanup of allocated fib_nodes
  net: marvell: prestera: Add router nexthops ABI
  eth: octeon: fix build after netif_napi_add() changes
  net/mlx5: E-Switch, Return EBUSY if can't get mode lock
  ...
2022-10-04 13:38:03 -07:00
Linus Torvalds
725737e7c2 STATX_DIOALIGN for 6.1
Make statx() support reporting direct I/O (DIO) alignment information.
 This provides a generic interface for userspace programs to determine
 whether a file supports DIO, and if so with what alignment restrictions.
 Specifically, STATX_DIOALIGN works on block devices, and on regular
 files when their containing filesystem has implemented support.
 
 An interface like this has been requested for years, since the
 conditions for when DIO is supported in Linux have gotten increasingly
 complex over time.  Today, DIO support and alignment requirements can be
 affected by various filesystem features such as multi-device support,
 data journalling, inline data, encryption, verity, compression,
 checkpoint disabling, log-structured mode, etc.  Further complicating
 things, Linux v6.0 relaxed the traditional rule of DIO needing to be
 aligned to the block device's logical block size; now user buffers (but
 not file offsets) only need to be aligned to the DMA alignment.
 
 The approach of uplifting the XFS specific ioctl XFS_IOC_DIOINFO was
 discarded in favor of creating a clean new interface with statx().
 
 For more information, see the individual commits and the man page update
 https://lore.kernel.org/r/20220722074229.148925-1-ebiggers@kernel.org.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYzpV2xQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKwF1AQDetPX5hyuq0/mwikOywLTTJsoHgGY5
 euO+dISqjH/InwD9HAQqfPRkdM1j4ml82BjjkAfrhzZXOOWPKJm0zOhMIQg=
 =0Oav
 -----END PGP SIGNATURE-----

Merge tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull STATX_DIOALIGN support from Eric Biggers:
 "Make statx() support reporting direct I/O (DIO) alignment information.

  This provides a generic interface for userspace programs to determine
  whether a file supports DIO, and if so with what alignment
  restrictions. Specifically, STATX_DIOALIGN works on block devices, and
  on regular files when their containing filesystem has implemented
  support.

  An interface like this has been requested for years, since the
  conditions for when DIO is supported in Linux have gotten increasingly
  complex over time. Today, DIO support and alignment requirements can
  be affected by various filesystem features such as multi-device
  support, data journalling, inline data, encryption, verity,
  compression, checkpoint disabling, log-structured mode, etc.

  Further complicating things, Linux v6.0 relaxed the traditional rule
  of DIO needing to be aligned to the block device's logical block size;
  now user buffers (but not file offsets) only need to be aligned to the
  DMA alignment.

  The approach of uplifting the XFS specific ioctl XFS_IOC_DIOINFO was
  discarded in favor of creating a clean new interface with statx().

  For more information, see the individual commits and the man page
  update[1]"

Link: https://lore.kernel.org/r/20220722074229.148925-1-ebiggers@kernel.org [1]

* tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  xfs: support STATX_DIOALIGN
  f2fs: support STATX_DIOALIGN
  f2fs: simplify f2fs_force_buffered_io()
  f2fs: move f2fs_force_buffered_io() into file.c
  ext4: support STATX_DIOALIGN
  fscrypt: change fscrypt_dio_supported() to prepare for STATX_DIOALIGN
  vfs: support STATX_DIOALIGN on block devices
  statx: add direct I/O alignment information
2022-10-03 20:33:41 -07:00
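The alignment rules described in the STATX_DIOALIGN merge above can be sketched in userspace C. This is only an illustration, not code from the series: it assumes the two values have already been read from statx() with STATX_DIOALIGN requested (the stx_dio_mem_align and stx_dio_offset_align fields), and the struct dio_align holder and dio_request_ok() helper are hypothetical names. Both values being 0 is treated as "DIO not supported".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two values statx() reports when
 * STATX_DIOALIGN is requested (stx_dio_mem_align, stx_dio_offset_align). */
struct dio_align {
    uint32_t mem_align;    /* required alignment of the user buffer address */
    uint32_t offset_align; /* required alignment of file offset and I/O length */
};

/* Return true if a direct I/O request satisfies the reported alignment.
 * A zero alignment means the file does not support DIO, so nothing passes. */
static bool dio_request_ok(const struct dio_align *a, uintptr_t buf_addr,
                           uint64_t offset, uint64_t len)
{
    assert(a != NULL);
    if (a->mem_align == 0 || a->offset_align == 0)
        return false;
    return buf_addr % a->mem_align == 0 &&
           offset % a->offset_align == 0 &&
           len % a->offset_align == 0;
}
```

Keeping the buffer check separate from the offset and length check mirrors the v6.0 relaxation mentioned in the pull message: the memory alignment may be much smaller than the alignment required for file offsets.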
Linus Torvalds
5779aa2dac fsverity updates for 6.1
Minor changes to convert uses of kmap() to kmap_local_page().
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYzpKvRQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK9KSAP9PyrI3kDLBpiG9os3HIZtHyPHgt6OZ
 FA978i0UuAxgHAD7BPiIT55oBdOrn6CVy2g8PkwmkcKQx0kvhVQq8Pyz5ww=
 =49pm
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fsverity updates from Eric Biggers:
 "Minor changes to convert uses of kmap() to kmap_local_page()"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fs-verity: use kmap_local_page() instead of kmap()
  fs-verity: use memcpy_from_page()
2022-10-03 20:27:34 -07:00
Linus Torvalds
438b2cdd17 fscrypt updates for 6.1
This release contains some implementation changes, but no new features:
 
 - Rework the implementation of the fscrypt filesystem-level keyring to
   not be as tightly coupled to the keyrings subsystem.  This resolves
   several issues.
 
 - Eliminate most direct uses of struct request_queue from fs/crypto/,
   since struct request_queue is considered to be a block layer
   implementation detail.
 
 - Stop using the PG_error flag to track decryption failures.  This is a
   prerequisite for freeing up PG_error for other uses.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYzpMMRQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKxYbAP0VrWjlqonO75gYkIxwX0aTxajoKC3m
 awUDAC/feQ910gD6A4WbJivanLngJKgcxfbhN5paalZJEGNOBBrOUB1WLgs=
 =CxSh
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt updates from Eric Biggers:
 "This release contains some implementation changes, but no new
  features:

   - Rework the implementation of the fscrypt filesystem-level keyring
     to not be as tightly coupled to the keyrings subsystem. This
     resolves several issues.

   - Eliminate most direct uses of struct request_queue from fs/crypto/,
     since struct request_queue is considered to be a block layer
     implementation detail.

   - Stop using the PG_error flag to track decryption failures. This is
     a prerequisite for freeing up PG_error for other uses"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: work on block_devices instead of request_queues
  fscrypt: stop holding extra request_queue references
  fscrypt: stop using keyrings subsystem for fscrypt_master_key
  fscrypt: stop using PG_error to track error status
  fscrypt: remove fscrypt_set_test_dummy_encryption()
2022-10-03 20:18:34 -07:00
Linus Torvalds
f4309528f3 dlm for 6.1
This set of commits includes:
 . Fix a couple races found with a new torture test.
 . Improve errors when api functions are used incorrectly.
 . Improve tracing for lock requests from user space.
 . Fix use after free in recently added tracing code.
 . Small internal code cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJjOyfeAAoJEDgbc8f8gGmqHF4QALKGo+95JGzfXN37dNL2ve8L
 DAKxESYIwaTEWuKxmD4AGogClEl55UoC8kxMB3dHwLZEd4U0v5ZDULR6NUYXMpos
 6miaoF+pJfBnpNRqpCieWRW5dYXD4TwSdquv5rUSmUBrdOSy34s/nORWB4kL443K
 hFPcbo5Mv1L0W70/+gdj1uBlBsenZxnXu6aEmrckONqwj9Q2SBjJTik9WuNwh+FF
 tEcmUt8kDanGkbwtMCxnbT3HDOdfQyW+qq4IJ6MOYHlW9Cqbp9QUvAIho4DEpr7f
 eGurQ/urSD3dltzuYQcZ81zGhaGxzaRt5d2AEHRrGugQ2ZvnsG74oSAmEINZTSw4
 RV2EXyJ4hXcXK/yJXo3fGzFm2/5JFvYhnvddo6wts3vQZHwefExIRCHVz2cJL9eS
 gFpfFu4uB8z7w7l9s9LJKv7cTriaDd1WHuIWZGonz3wlFSUOn7IxunDxM3Hc5YO3
 okawhr6sWe03fFcKsw1WeWymfDUwmk/7OV15OSDanItAwX5vkBYDBvAcA/cwm8cj
 P0Vb3c1/Sf1IjjHGGA13vHpD1JXJ7FHafg6jyWmjJNqaS+wtShvs2As9MqbtSWMb
 o2OcYTEEzME4mMIXZzVlKP7hhkLMaVR5PwGmbPovlyAkEUX0soH7nefyLMAqP3JG
 7VZYV46VCL7wm3yjrKYw
 =sL1G
 -----END PGP SIGNATURE-----

Merge tag 'dlm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:

 - Fix a couple races found with a new torture test

 - Improve errors when api functions are used incorrectly

 - Improve tracing for lock requests from user space

 - Fix use after free in recently added tracing code

 - Small internal code cleanups

* tag 'dlm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  fs: dlm: fix possible use after free if tracing
  fs: dlm: const void resource name parameter
  fs: dlm: LSFL_CB_DELAY only for kernel lockspaces
  fs: dlm: remove DLM_LSFL_FS from uapi
  fs: dlm: trace user space callbacks
  fs: dlm: change ls_clear_proc_locks to spinlock
  fs: dlm: remove dlm_del_ast prototype
  fs: dlm: handle rcom in else if branch
  fs: dlm: allow lockspaces have zero lvblen
  fs: dlm: fix invalid derefence of sb_lvbptr
  fs: dlm: handle -EINVAL as log_error()
  fs: dlm: use __func__ for function name
  fs: dlm: handle -EBUSY first in unlock validation
  fs: dlm: handle -EBUSY first in lock arg validation
  fs: dlm: fix race between test_bit() and queue_work()
  fs: dlm: fix race in lowcomms
2022-10-03 20:11:59 -07:00
Linus Torvalds
f90497a16e NFSD 6.1 Release Notes
This release is mostly bug fixes, clean-ups, and optimizations.
 
 One notable set of fixes addresses a subtle buffer overflow issue
 that occurs if a small RPC Call message arrives in an oversized
 RPC record. This is only possible on a framed RPC transport such
 as TCP.
 
 Because NFSD shares the receive and send buffers in one set of
 pages, an oversized RPC record steals pages from the send buffer
 that will be used to construct the RPC Reply message. NFSD must
 not assume that a full-sized buffer is always available to it;
 otherwise, it will walk off the end of the send buffer while
 constructing its reply.
 
 In this release, we also introduce the ability for the server to
 wait a moment for clients to return delegations before it responds
 with NFS4ERR_DELAY. This saves a retransmit and a network round-
 trip when a delegation recall is needed. This work will be built
 upon in future releases.
 
 The NFS server adds another shrinker to its collection. Because
 courtesy clients can linger for quite some time, they might be
 freeable when the server host comes under memory pressure. A new
 shrinker has been added that releases courtesy client resources
 during low memory scenarios.
 
 Lastly, of note: the maximum number of operations per NFSv4
 COMPOUND that NFSD can handle is increased from 16 to 50. There
 are NFSv4 client implementations that need more than 16 to
 successfully perform a mount operation that uses a pathname
 with many components.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmM66P4ACgkQM2qzM29m
 f5fhMg//afS2mp4fgPz4MjoFIqD/Icep8qFEPA8Gy6I1dDGRxd9wNgjoN4JALFdr
 NKX1oRVISBvDrOG/C84GbYnXEDlzY8q1HmyPoJA8VAR57hnJXfPZN6CBN//Bx4mU
 nISPJeNGY9SMNVhS8916V/yzd41uWDQuD+H+i5mluBTJHONgSzwzc80sQ+eq+yZQ
 PV6mlJN6hcm14LCaDOTXF7oY2Wm6dQc2rV87YChJWnc+vdXKnme/LWTMY1ABkePD
 g88mSL6w3YDKEuKciWda5/QU1ETp/Q7XTjFGDKEQSnnNsvCLmUKogJTKVa2QqyLY
 P1qlrj6XwukqAe414W4amlLL3q4NUFmJZPNWDxdf+qtTrQrBBEFrsKy/bSt27XoD
 cTvBWcorMG2riSlYPViVeh8RpyC6qwhttPbvGAmflVF2KEyXpfgc5Pnn0/Xm1Ac9
 XKzaCTJlUyRb/2wdqVtQbIpyh3sbhzp8zhv7sWKXgQOEXxKOO3ZAIrQXeL6oFN/b
 HlXDty7wKhFRj8IbkZfQ9SvN1saTONwB3clYHbCXTetkw/nnrUgLYcu8NIDBK9ou
 wkBcz1++XgVTqjRFwUwagb62cPJnRM6UiROYCVbQp7qcUe4/U+WP9t6dnZlnGZVZ
 dtipKlH/LTGKW+d7ysZOqb4hsRza5Kaduz5a7lML7UIGQXmxjM0=
 =hE6t
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "This release is mostly bug fixes, clean-ups, and optimizations.

  One notable set of fixes addresses a subtle buffer overflow issue that
  occurs if a small RPC Call message arrives in an oversized RPC record.
  This is only possible on a framed RPC transport such as TCP.

  Because NFSD shares the receive and send buffers in one set of pages,
  an oversized RPC record steals pages from the send buffer that will be
  used to construct the RPC Reply message. NFSD must not assume that a
  full-sized buffer is always available to it; otherwise, it will walk
  off the end of the send buffer while constructing its reply.

  In this release, we also introduce the ability for the server to wait
  a moment for clients to return delegations before it responds with
  NFS4ERR_DELAY. This saves a retransmit and a network round-trip when
  a delegation recall is needed. This work will be built upon in future
  releases.

  The NFS server adds another shrinker to its collection. Because
  courtesy clients can linger for quite some time, they might be
  freeable when the server host comes under memory pressure. A new
  shrinker has been added that releases courtesy client resources during
  low memory scenarios.

  Lastly, of note: the maximum number of operations per NFSv4 COMPOUND
  that NFSD can handle is increased from 16 to 50. There are NFSv4
  client implementations that need more than 16 to successfully perform
  a mount operation that uses a pathname with many components"

* tag 'nfsd-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (53 commits)
  nfsd: extra checks when freeing delegation stateids
  nfsd: make nfsd4_run_cb a bool return function
  nfsd: fix comments about spinlock handling with delegations
  nfsd: only fill out return pointer on success in nfsd4_lookup_stateid
  NFSD: fix use-after-free on source server when doing inter-server copy
  NFSD: Cap rsize_bop result based on send buffer size
  NFSD: Rename the fields in copy_stateid_t
  nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
  nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
  nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
  nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops
  nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
  NFSD: Pack struct nfsd4_compoundres
  NFSD: Remove unused nfsd4_compoundargs::cachetype field
  NFSD: Remove "inline" directives on op_rsize_bop helpers
  NFSD: Clean up nfs4svc_encode_compoundres()
  SUNRPC: Fix typo in xdr_buf_subsegment's kdoc comment
  NFSD: Clean up WRITE arg decoders
  NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
  NFSD: Refactor common code out of dirlist helpers
  ...
2022-10-03 20:07:15 -07:00
Linus Torvalds
3497640a80 Changes since last update:
- Introduce fscache-based domain to share blobs between images;
 
  - Support recording fragments in a special packed inode;
 
  - Support partial-referenced pclusters for global compressed data
    deduplication;
 
  - Fix an order >= MAX_ORDER warning due to crafted negative i_size;
 
  - Several cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYzq3FxEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBJbRAQDab/0DJu7iDktzupazfCibkg8vWzakXIi+
 KE0y5O8VaQEAwn9bdPU4cp+raowoMt3z8eGsj4H9ZO9NM8NfPUX0uQQ=
 =TNVH
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "In this cycle, for container use cases, fscache-based shared domain is
  introduced [1] so that data blobs in the same domain will be storage
  deduplicated and it will also be used for page cache sharing later.

  Also, a special packed inode is now introduced to record inode
  fragments which keep the tail part of files by Yue Hu [2]. You can
  keep arbitrary length or (at will) the whole file as a fragment and
  then fragments can be optionally compressed in the packed inode
  together and even deduplicated for smaller image sizes.

  In addition to that, global compressed data deduplication by sharing
  partial-referenced pclusters is also supported in this cycle.

  Summary:

   - Introduce fscache-based domain to share blobs between images

   - Support recording fragments in a special packed inode

   - Support partial-referenced pclusters for global compressed data
     deduplication

   - Fix an order >= MAX_ORDER warning due to crafted negative i_size

   - Several cleanups"

Link: https://lore.kernel.org/r/20220916085940.89392-1-zhujia.zj@bytedance.com [1]
Link: https://lore.kernel.org/r/cover.1663065968.git.huyue2@coolpad.com [2]

* tag 'erofs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: clean up erofs_iget()
  erofs: clean up unnecessary code and comments
  erofs: fold in z_erofs_reload_indexes()
  erofs: introduce partial-referenced pclusters
  erofs: support on-disk compressed fragments data
  erofs: support interlaced uncompressed data for compressed files
  erofs: clean up .read_folio() and .readahead() in fscache mode
  erofs: introduce 'domain_id' mount option
  erofs: Support sharing cookies in the same domain
  erofs: introduce a pseudo mnt to manage shared cookies
  erofs: introduce fscache-based domain
  erofs: code clean up for fscache
  erofs: use kill_anon_super() to kill super in fscache mode
  erofs: fix order >= MAX_ORDER warning due to crafted negative i_size
2022-10-03 20:01:40 -07:00
Linus Torvalds
8bea8ff34a fs.vfsuid.fat.v6.1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYzqjGgAKCRCRxhvAZXjc
 ovVFAQCbZXflZk/DGy1CVEWHwJDkMlmN62jCY+3gZP6UPsCeCQEA4uB3Hub7jihO
 b2q9yhR+p6G7DHkeNAo2qUu4bI0lDgQ=
 =zG6Z
 -----END PGP SIGNATURE-----

Merge tag 'fs.vfsuid.fat.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull fatfs vfsuid conversion from Christian Brauner:
 "Last cycle we introduced the new vfs{g,u}id_t types that we had agreed
  on. The most important parts of the vfs have been converted but there
  are a few more places we need to switch before we can remove the old
  helpers completely.

  This cycle we converted all filesystems that called idmapped mount
  helpers directly. The affected filesystems are f2fs, fat, fuse, ksmbd,
  overlayfs, and xfs. We've sent patches for all of them. Looking at
  -next f2fs, ksmbd, overlayfs, and xfs have all picked up these patches
  and they should land in mainline during the v6.1 merge window.

  So all filesystems that have a separate tree should send the vfsuid
  conversion themselves. Only the fat conversion is going through this
  generic fs tree because there is no fat tree.

  In order to change time settings on an inode fat checks that the
  caller either is the owner of the inode or the inode's group is in the
  caller's group list. If fat is on an idmapped mount we compare whether
  the inode mapped into the mount is equivalent to the caller's fsuid.
  If it isn't we compare whether the inode's group mapped into the mount
  is in the caller's group list.

  We now use the new vfsuid based helpers for that"

* tag 'fs.vfsuid.fat.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fat: port to vfs{g,u}id_t and associated helpers
2022-10-03 19:54:29 -07:00
Linus Torvalds
223b845253 fs.acl.rework.prep.v6.1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYzqi8gAKCRCRxhvAZXjc
 orKNAQCGKPJ3Kc3LVVnh8qdjm9npP+j9UQAB7jDZi9q7RijIIAD/VYjj+z5XLg4V
 k96ibCyir1+4EOF8ihY0WQi40MSWYws=
 =S/Wf
 -----END PGP SIGNATURE-----

Merge tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs acl updates from Christian Brauner:
 "These are general fixes and preparatory changes related to the ongoing
  posix acl rework. The actual rework where we build a type safe posix
  acl api wasn't ready for this merge window but we're hopeful for the
  next merge window.

  General fixes:

   - Some filesystems like 9p and cifs have to implement custom posix
     acl handlers because they require access to the dentry in order to
     set and get posix acls while the set and get inode operations
     currently don't. But the ntfs3 filesystem has no such requirement
     and thus implemented custom posix acl xattr handlers when it really
     didn't have to. So this pr contains a patch that just implements set
     and get inode operations for ntfs3 and switches it to rely on the
     generic posix acl xattr handlers. (We would've appreciated reviews
     from the ntfs3 maintainers but we didn't get any. But hey, if we
     really broke it we'll fix it. But fstests for ntfs3 said it's
     fine.)

   - The posix_acl_fix_xattr_common() helper has been adapted so it can
     be used by a few more callers and avoid open-coding the same
     checks over and over.

  Other than the two general fixes this series introduces a new helper
  vfs_set_acl_prepare(). The reason for this helper is so that we can
  mitigate one of the sources that change {g,u}id values directly in the
  uapi struct. With the vfs_set_acl_prepare() helper we can move the
  idmapped mount fixup into the generic posix acl set handler.

  The advantage of this is that it allows us to remove the
  posix_acl_setxattr_idmapped_mnt() helper which so far we had to call
  in vfs_setxattr() to account for idmapped mounts. While semantically
  correct the problem with this approach was that we had to keep the
  value parameter of the generic vfs_setxattr() call as non-const. This
  is rectified in this series.

  Ultimately, we will get rid of all the extreme kludges and type
  unsafety once we have merged the posix api - hopefully during the next
  merge window - built solely around get and set inode operations. Which
  incidentally will also improve handling of posix acls in security and
  especially in integrity models. While this will come with temporarily
  having two inode operations for posix acls that is nothing compared to
  the problems we have right now and so well worth it. We'll end up with
  something that we can actually reason about instead of needing to
  write novels to explain what's going on"

* tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  xattr: always us is_posix_acl_xattr() helper
  acl: fix the comments of posix_acl_xattr_set
  xattr: constify value argument in vfs_setxattr()
  ovl: use vfs_set_acl_prepare()
  acl: move idmapping handling into posix_acl_xattr_set()
  acl: add vfs_set_acl_prepare()
  acl: return EOPNOTSUPP in posix_acl_fix_xattr_common()
  ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers
2022-10-03 19:48:54 -07:00
Linus Torvalds
da380aefdd fs/coredump fix
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzuCiQAKCRBZ7Krx/gZQ
 65z6AQDbHgeZ3vXLyHdxs3VWsUkKWMUV1gb5MmzBs/eGq0K3hAD9FfGsOoT2ow9h
 2ics8AGrvMZMHrFOwFAmolXjLW7qQws=
 =y0rq
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull coredump fix from Al Viro:
 "Brown paper bag bug fix for the coredumping fix late in the 6.0
  release cycle"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [brown paperbag] fix coredump breakage
2022-10-03 18:03:58 -07:00
Linus Torvalds
26b84401da lsm/stable-6.1 PR 20221003
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmM68YIUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOTbA//TR8i+Wy8iswUCmtfmYg91h1uebpl
 /kjNsSmfgivAUTGamr3eN2WRlGhZfkFDPIHa25uybSA6Q+75p4lst83Rt3HDbjkv
 Ga7grCXnHwSDwJoHOSeFh0pojV2u7Zvfmiib2U5hPZEmd3kBw3NCgAJVcSGN80B2
 dct36fzZNXjvpWDbygmFtRRkmEseslSkft8bUVvNZBP+B0zvv3vcNY1QFuKuK+W2
 8wWpvO/cCSmke5i2c2ktHSk2f8/Y6n26Ik/OTHcTVfoKZLRaFbXEzLyxzLrNWd6m
 hujXgcxszTtHdmoXx+J6uBauju7TR8pi1x8mO2LSGrlpRc1cX0A5ED8WcH71+HVE
 8L1fIOmZShccPZn8xRok7oYycAUm/gIfpmSLzmZA76JsZYAe+mp9Ze9FA6fZtSwp
 7Q/rfw/Rlz25WcFBe4xypP078HkOmqutkCk2zy5liR+cWGrgy/WKX15vyC0TaPrX
 tbsRKuCLkipgfXrTk0dX3kmhz+3bJYjqeZEt7sfPSZYpaOGkNXVmAW0wnCOTuLMU
 +8pIVktvQxMmACEj2gBMz11iooR4DpWLxOcQQR/impgCpNdZ60nA0a6KPJoIXC+5
 NfTa422FZkc99QRVblUZyWSgJBW78Z3ZAQcQlo1AGLlFydbfrSFTRLbmNJZo/Nkl
 KwpGvWs5nB0rVw0=
 =VZl5
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull LSM updates from Paul Moore:
 "Seven patches for the LSM layer and we've got a mix of trivial and
  significant patches. Highlights below, starting with the smaller bits
  first so they don't get lost in the discussion of the larger items:

   - Remove some redundant NULL pointer checks in the common LSM audit
     code.

   - Ratelimit the lockdown LSM's access denial messages.

     With this change there is a chance that the last visible lockdown
     message on the console is outdated/old, but it does help preserve
     the initial series of lockdown denials that started the denial
     message flood and my gut feeling is that these might be the more
     valuable messages.

   - Open userfaultfds as readonly instead of read/write.

     While this code obviously lives outside the LSM, it does have a
     noticeable impact on the LSMs with Ondrej explaining the situation
     in the commit description. It is worth noting that this patch
     languished on the VFS list for over a year without any comments
     (objections or otherwise) so I took the liberty of pulling it into
     the LSM tree after giving fair notice. It has been in linux-next
     since the end of August without any noticeable problems.

   - Add a LSM hook for user namespace creation, with implementations
     for both the BPF LSM and SELinux.

     Even though the changes are fairly small, this is the bulk of the
     diffstat as we are also including BPF LSM selftests for the new
     hook.

     It's also the most contentious of the changes in this pull request
     with Eric Biederman NACK'ing the LSM hook multiple times during its
     development and discussion upstream. While I've never taken NACK's
     lightly, I'm sending these patches to you because it is my belief
     that they are of good quality, satisfy a long-standing need of
     users and distros, and are in keeping with the existing nature of
     the LSM layer and the Linux Kernel as a whole.

     The patches implement a LSM hook for user namespace creation
     that allows for a granular approach, configurable at runtime, which
     enables both monitoring and control of user namespaces. The general
     consensus has been that this is far preferable to the other
     solutions that have been adopted downstream including outright
     removal from the kernel, disabling via system wide sysctls, or
     various other out-of-tree mechanisms that users have been forced to
     adopt since we haven't been able to provide them an upstream
     solution for their requests. Eric has been steadfast in his
     objections to this LSM hook, explaining that any restrictions on
     the user namespace could have significant impact on userspace.
     While there is the possibility of impacting userspace, it is
     important to note that this solution only impacts userspace when it
     is requested based on the runtime configuration supplied by the
     distro/admin/user. Frederick (the patchset author), the LSM/security
     community, and myself have tried to work with Eric during
     development of this patchset to find a mutually acceptable
     solution, but Eric's approach and unwillingness to engage in a
     meaningful way have made this impossible. I have CC'd Eric directly
     on this pull request so he has a chance to provide his side of the
     story; there have been no objections outside of Eric's"

* tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lockdown: ratelimit denial messages
  userfaultfd: open userfaultfds with O_RDONLY
  selinux: Implement userns_create hook
  selftests/bpf: Add tests verifying bpf lsm userns_create hook
  bpf-lsm: Make bpf_lsm_userns_create() sleepable
  security, lsm: Introduce security_create_user_ns()
  lsm: clean up redundant NULL pointer check
2022-10-03 17:51:52 -07:00
Al Viro
4f526fef91 [brown paperbag] fix coredump breakage
Let me count the ways in which I'd screwed up:

* when emitting a page, handling of gaps in coredump should happen
before fetching the current file position.
* fix for a problem that occurs on rather uncommon setups (and hadn't
been observed in the wild) had been sent very late in the cycle.
* ... with badly insufficient testing, introducing an easily
reproducible breakage.  Without giving it time to soak in -next.

Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: "J. R. Okajima" <hooanon05g@gmail.com>
Tested-by: "J. R. Okajima" <hooanon05g@gmail.com>
Fixes: 06bbaa6dc5 "[coredump] don't use __kernel_write() on kmap_local_page()"
Cc: stable@kernel.org	# v6.0-only
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-10-03 20:28:38 -04:00
Linus Torvalds
12ed00ba01 execve updates for v6.1-rc1
- Remove a.out implementation globally (Eric W. Biederman)
 
 - Remove unused linux_binprm::taso member (Lukas Bulwahn)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmM4bQsWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJkftD/4gTcnAd3BCUgerhiQfq64kYNPt
 l47p0BM6GzXvl1Mf4Q0TDV35WJ/JD5Yd3ij3V7J2XJWSHANUAlHbxm9yfChVLACU
 99YRVhuWSdohJkF7p0b8dkAQO551aeodj/JUKGiNrJyNR4L336r+YG5aqEjOSPji
 jQH5I4SonDeaGdLy8nYO/aRhEryIF1FqvLH6egNp6Tt8Q69UqDYIojdCgZ/MS5lb
 dldrFsDb3ZjoXET0NdeIzZEZVS6zDM2iehb2W8dtRFoNsjMXz4jSiy7AEoprETBz
 wAKZ16t0dj2sARLLVGL8i3m2k6tzD6zzkIIoc9X3VOyeOADa6aghagDMyDpRo6ZB
 2ML7wMNCHCboCVVfG3n2rWTIFmrqeycAiny0hZxU4bjBBYSxTK4qD9lFQtlXk0cD
 BESZhnM6gg87vVFgLV/8aefCvwRd5eb8Pugtwb3qF4NsSvgosZtIXhWnIeU3cjg2
 425+4XOPbLsBv/u8NVkG8yIHaHZbtXH78JDIcNFMgSvw9iBwn2NH9184EpHI4qRx
 9aBjHPz+VOqzPTnNR6ISP5J4VXaOvHeocb/ckbrS+xhY8zQmbWX9QHaAZ+XnDYby
 PVY0HjmYTFSlijHSRhqLqNMHwtOAeiSZwJNk4wh+39H0ynpTzThGjQ9NlGYQ5cUE
 TVF5ukO5QRa5GbmhIQ==
 =Ns31
 -----END PGP SIGNATURE-----

Merge tag 'execve-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve updates from Kees Cook:
 "This removes a.out support globally; it has been disabled for a while
  now.

   - Remove a.out implementation globally (Eric W. Biederman)

   - Remove unused linux_binprm::taso member (Lukas Bulwahn)"

* tag 'execve-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt: remove taso from linux_binprm struct
  a.out: Remove the a.out implementation
2022-10-03 16:56:40 -07:00
Ye Bin
1b45cc5c7b ext4: fix potential out of bound read in ext4_fc_replay_scan()
The scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN bytes remain. If
less than EXT4_FC_TAG_BASE_LEN remains, the scan performs an out-of-bounds read
when mounting a corrupt file system image.
The ADD_RANGE, HEAD, and TAIL tags need an extra check during the journal scan:
these three tags read data during the scan, so the tag length must not be less
than the length of the data that will be read.

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
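The two checks described in the ext4_fc_replay_scan() fix above can be sketched as a generic tag-length-value scanner. This is a userspace illustration under stated assumptions, not the ext4 code: the 4-byte header layout (2-byte little-endian tag followed by a 2-byte little-endian length), the TLV_BASE_LEN constant, the TAG_NEEDS_DATA tag with its minimum length, and the tlv_count() helper are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_BASE_LEN       4u /* hypothetical header: 2-byte tag + 2-byte length */
#define TAG_NEEDS_DATA     1u /* hypothetical tag whose payload is read while scanning */
#define TAG_NEEDS_DATA_MIN 2u /* minimum payload that tag must carry */

/* Decode a 16-bit little-endian value. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Count well-formed records in buf; return -1 if the log is truncated
 * or a record is too short for the data the scan would read from it. */
static int tlv_count(const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    size_t pos = 0;
    int count = 0;

    while (pos < size) {
        /* Check 1: a full header must fit in the remaining space. */
        if (size - pos < TLV_BASE_LEN)
            return -1;
        uint16_t tag = get_le16(buf + pos);
        uint16_t len = get_le16(buf + pos + 2);
        pos += TLV_BASE_LEN;

        /* The payload itself must also fit in the buffer. */
        if (len > size - pos)
            return -1;
        /* Check 2: tags whose data is read during the scan must be
         * at least as long as the data that will be read. */
        if (tag == TAG_NEEDS_DATA && len < TAG_NEEDS_DATA_MIN)
            return -1;

        pos += len;
        count++;
    }
    return count;
}
```

Comparing against the remaining space (size - pos) rather than adding to pos avoids any overflow in the bounds arithmetic.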
Ye Bin
dcc5827484 ext4: factor out ext4_fc_get_tl()
Factor out ext4_fc_get_tl() to fill 'tl' with host byte order.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
fdc2a3c75d ext4: introduce EXT4_FC_TAG_BASE_LEN helper
Introduce the EXT4_FC_TAG_BASE_LEN helper to calculate the length of
struct ext4_fc_tl.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-2-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
7ff5fddadd ext4: factor out ext4_free_ext_path()
Factor out ext4_free_ext_path() to free the extent path. After the previous
patch, 'ext4_ext_drop_refs()' is only used in 'extents.c', so make it static.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220924021211.3831551-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
b6a750c019 ext4: remove unnecessary drop path references in mext_check_coverage()
According to Jan Kara's suggestion:
"The use in mext_check_coverage() can be actually removed
- get_ext_path() -> ext4_find_extent() takes care of dropping the references."
So remove the unnecessary call to ext4_ext_drop_refs() in mext_check_coverage().

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220924021211.3831551-2-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
27cd497803 ext4: update 'state->fc_regions_size' after successful memory allocation
To avoid a mismatch between 'state->fc_regions_size' and 'state->fc_regions'
when reallocating 'fc_regions' fails, only update 'state->fc_regions_size'
after 'state->fc_regions' has been allocated successfully.

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
7069d105c1 ext4: fix potential memory leak in ext4_fc_record_regions()
krealloc() may return NULL, in which case it does not free the original
'state->fc_regions' buffer. Because 'state->fc_regions' has already been
overwritten with NULL at that point, the original buffer is leaked.

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
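The leak described in the ext4_fc_record_regions() fix above is the classic `p = realloc(p, ...)` mistake. A minimal user-space sketch of the safe pattern follows, using `realloc()` in place of `krealloc()`; the struct layout, the growth step `REGION_GROW` and the function name `record_region()` are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the replay state and its region array. */
struct region {
    int ino;
    long lblk;
    long pblk;
    int len;
};

struct replay_state {
    struct region *regions;
    size_t used; /* entries in use */
    size_t size; /* allocated capacity */
};

#define REGION_GROW 4 /* assumed growth step */

/* Appends 'r'; returns 0 on success, -1 if memory could not be grown. */
int record_region(struct replay_state *st, struct region r)
{
    assert(st != NULL);

    if (st->used == st->size) {
        size_t new_size = st->size + REGION_GROW;
        /* Keep the old pointer until the reallocation has succeeded. */
        struct region *tmp = realloc(st->regions, new_size * sizeof(*tmp));

        if (!tmp)
            return -1; /* st->regions and st->size are still consistent */

        st->regions = tmp;
        st->size = new_size; /* update capacity only after success */
    }
    st->regions[st->used++] = r;
    return 0;
}
```

On failure the caller still owns the original array and can free it during normal teardown, and the recorded capacity never runs ahead of the memory actually allocated.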
Ye Bin
9305721a30 ext4: fix potential memory leak in ext4_fc_record_modified_inode()
krealloc() may return NULL, in which case it does not free the original
'state->fc_modified_inodes' buffer. Because 'state->fc_modified_inodes' has
already been overwritten with NULL at that point, the original buffer is leaked.

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-2-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Guoqing Jiang
78ed9354c5 ext4: remove redundant checking in ext4_ioctl_checkpoint
It is already checked after comment "check for invalid bits set",
so let's remove this one.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20220918115219.12407-1-guoqing.jiang@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
dfff66f30f jbd2: add miss release buffer head in fc_do_one_pass()
fc_do_one_pass() does not release the buffer head after use, which leads
to a reference count leak.

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220917093805.1782845-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Jason Yan
3df11e27f0 ext4: move DIOREAD_NOLOCK setting to ext4_set_def_opts()
Now that all preparations are done, we can move the DIOREAD_NOLOCK
setting to ext4_set_def_opts().

Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220916141527.1012715-17-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Jason Yan
c8267c5142 ext4: remove useless local variable 'blocksize'
Since sb->s_blocksize is now initialized at the very beginning, the
local variable 'blocksize' in __ext4_fill_super() is not needed now.
Remove it and use sb->s_blocksize instead.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220916141527.1012715-16-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Jason Yan
a7a79c292a ext4: unify the ext4 super block loading operation
Currently we load the super block from disk in two steps. First we load
the super block with the default block size (EXT4_MIN_BLOCK_SIZE). Second
we load the super block with the real block size. The second step is
located far from the first in the code. This patch moves these two steps
together into a new function.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220916141527.1012715-15-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
a5991e539c ext4: factor out ext4_journal_data_mode_check()
Factor out ext4_journal_data_mode_check(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-14-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
9c1dd22d74 ext4: factor out ext4_load_and_init_journal()
This patch groups the journal load and initialization code together and
factors out ext4_load_and_init_journal(). It also removes the label
'no_journal', which is not needed after the refactoring.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-13-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
a4e6a511d7 ext4: factor out ext4_group_desc_init() and ext4_group_desc_free()
Factor out ext4_group_desc_init() and ext4_group_desc_free(). No
functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-12-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
bc62dbf914 ext4: factor out ext4_geometry_check()
Factor out ext4_geometry_check(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-11-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
d7f3542b32 ext4: factor out ext4_check_feature_compatibility()
Factor out ext4_check_feature_compatibility(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-10-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
b26458d151 ext4: factor out ext4_init_metadata_csum()
Factor out ext4_init_metadata_csum(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-9-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
39c135b08c ext4: factor out ext4_encoding_init()
Factor out ext4_encoding_init(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220916141527.1012715-8-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
0e495f7cc3 ext4: factor out ext4_inode_info_init()
Factor out ext4_inode_info_init(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-7-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
f7314a6732 ext4: factor out ext4_fast_commit_init()
Factor out ext4_fast_commit_init(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-6-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
4a8557b094 ext4: factor out ext4_handle_clustersize()
Factor out ext4_handle_clustersize(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-5-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:53 -04:00
Jason Yan
5f6d662d12 ext4: factor out ext4_set_def_opts()
Factor out ext4_set_def_opts(). No functional change.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-4-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
Jason Yan
a5fc511935 ext4: remove cantfind_ext4 error handler
The 'cantfind_ext4' error handler just prints an error message and then
jumps to failed_mount. This two-level goto makes the code complex and hard
to read. The only benefit is that it saves a little bit of code.
However, some branches can be merged and some branches do not even need it.
So refactor the code and remove it.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-3-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
Jason Yan
43bd6f1b49 ext4: goto right label 'failed_mount3a'
Before these two branches, the journal has not been loaded and the xattr
cache has not been created, so the right label to go to is 'failed_mount3a'.
This did not cause any issues, because the error handler checks whether the
pointer is NULL. However, it was still confusing when reading the code, so
it is worth changing the branches to go to the right label.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220916141527.1012715-2-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
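The label discipline discussed in the 'failed_mount3a' commit above follows the usual staged-cleanup idiom: each failure jumps to the label that undoes exactly what has been set up so far. The sketch below is a generic user-space illustration, not the ext4 mount path; the struct, the label names and the `fail_stage` test hook are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resources acquired in order during setup. */
struct mount_ctx {
    void *group_desc;
    void *journal;
    void *xattr_cache;
};

/* 'fail_stage' simulates an error after stage 1 or 2 (0 = no error). */
int fill_super_sketch(struct mount_ctx *c, int fail_stage)
{
    assert(c != NULL);
    c->group_desc = c->journal = c->xattr_cache = NULL;

    c->group_desc = malloc(16);
    if (!c->group_desc)
        goto failed;

    /* Journal not loaded yet: jump to the label that skips its cleanup. */
    if (fail_stage == 1)
        goto failed_free_desc;

    c->journal = malloc(16);
    if (!c->journal)
        goto failed_free_desc;

    if (fail_stage == 2)
        goto failed_free_journal;

    c->xattr_cache = malloc(16);
    if (!c->xattr_cache)
        goto failed_free_journal;

    return 0;

    /* Labels unwind in reverse order of acquisition. */
failed_free_journal:
    free(c->journal);
    c->journal = NULL;
failed_free_desc:
    free(c->group_desc);
    c->group_desc = NULL;
failed:
    return -1;
}
```

Jumping to a later label than necessary can still be harmless when the cleanup code tolerates NULL pointers, but choosing the exact label documents which resources exist at each failure point.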
Ye Bin
e64e6ca909 ext4: adjust fast commit disable judgement order in ext4_fc_track_inode
If fast commit is already disabled, there is no need to mark the inode ineligible.
So move the 'ext4_fc_disabled()' check before the 'ext4_should_journal_data(inode)'
check, which avoids doing a meaningless check.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220916083836.388347-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
Ye Bin
b7b80a35fb ext4: factor out ext4_fc_disabled()
Factor out ext4_fc_disabled(). No functional change.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220916083836.388347-2-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
Ye Bin
ccbf8eeb39 ext4: fix miss release buffer head in ext4_fc_write_inode
The 'ext4_fc_write_inode' function first calls 'ext4_get_inode_loc' to get 'iloc',
but does not release 'iloc.bh' after using it.
So release 'iloc.bh' before 'ext4_fc_write_inode' returns.

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220914100859.1415196-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
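The buffer head fix above is an instance of a general rule: every path that leaves a function after a reference was taken must drop that reference. A minimal user-space sketch with a hand-rolled reference count follows; `struct buf`, `get_loc()` and `write_inode_sketch()` are hypothetical stand-ins, not the kernel's buffer_head API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted buffer head. */
struct buf {
    int refcount;
    int data;
};

static void buf_get(struct buf *b)
{
    b->refcount++;
}

static void buf_put(struct buf *b)
{
    assert(b->refcount > 0);
    b->refcount--;
}

/* Stand-in for a lookup that returns a referenced buffer on success. */
static int get_loc(struct buf *backing, struct buf **out)
{
    if (!backing)
        return -1;
    buf_get(backing);
    *out = backing;
    return 0;
}

/* 'fail_midway' simulates an error after the reference was taken. */
int write_inode_sketch(struct buf *backing, int fail_midway)
{
    struct buf *bh = NULL;
    int ret = get_loc(backing, &bh);

    if (ret)
        return ret; /* nothing was acquired, nothing to release */

    if (fail_midway) {
        ret = -1;
        goto out;
    }
    bh->data++; /* use the buffer */
out:
    buf_put(bh); /* single release point for success and error paths */
    return ret;
}
```

Routing both the success path and the error path through one `out:` label makes it hard to add a new early return that forgets the release.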