Commit Graph

3094 Commits

Author SHA1 Message Date
Linus Torvalds
fdaf9a5840 Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache
Pull page cache updates from Matthew Wilcox:

 - Appoint myself page cache maintainer

 - Fix how scsicam uses the page cache

 - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS

 - Remove the AOP flags entirely

 - Remove pagecache_write_begin() and pagecache_write_end()

 - Documentation updates

 - Convert several address_space operations to use folios:
     - is_dirty_writeback
     - readpage becomes read_folio
     - releasepage becomes release_folio
     - freepage becomes free_folio

 - Change filler_t to require a struct file pointer be the first
   argument like ->read_folio

* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
  nilfs2: Fix some kernel-doc comments
  Appoint myself page cache maintainer
  fs: Remove aops->freepage
  secretmem: Convert to free_folio
  nfs: Convert to free_folio
  orangefs: Convert to free_folio
  fs: Add free_folio address space operation
  fs: Convert drop_buffers() to use a folio
  fs: Change try_to_free_buffers() to take a folio
  jbd2: Convert release_buffer_page() to use a folio
  jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
  reiserfs: Convert release_buffer_page() to use a folio
  fs: Remove last vestiges of releasepage
  ubifs: Convert to release_folio
  reiserfs: Convert to release_folio
  orangefs: Convert to release_folio
  ocfs2: Convert to release_folio
  nilfs2: Remove comment about releasepage
  nfs: Convert to release_folio
  jfs: Convert to release_folio
  ...
2022-05-24 19:55:07 -07:00
Linus Torvalds
bd1b7c1384 Merge tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "Features:

   - subpage:
      - support for PAGE_SIZE > 4K (previously only 64K)
      - make it work with raid56

   - repair super block num_devices automatically if it does not match
     the number of device items

   - defrag can convert inline extents to regular extents, up to now
     inline files were skipped but the setting of mount option
     max_inline could affect the decision logic

   - zoned:
      - minimal accepted zone size is explicitly set to 4MiB
      - make zone reclaim less aggressive and don't reclaim if there are
        enough free zones
      - add per-profile sysfs tunable of the reclaim threshold

   - allow automatic block group reclaim for non-zoned filesystems, with
     sysfs tunables

   - tree-checker: new check, compare extent buffer owner against owner
     rootid

  Performance:

   - avoid blocking on space reservation when doing nowait direct io
     writes (+7% throughput for reads and writes)

   - NOCOW write throughput improvement due to refined locking (+3%)

   - send: reduce pressure to page cache by dropping extent pages right
     after they're processed

  Core:

   - convert all radix trees to xarray

   - add iterators for b-tree node items

   - support printk message index

   - user bulk page allocation for extent buffers

   - switch to bio_alloc API, use on-stack bios where convenient, other
     bio cleanups

   - use rw lock for block groups to favor concurrent reads

   - simplify workques, don't allocate high priority threads for all
     normal queues as we need only one

   - refactor scrub, process chunks based on their constraints and
     similarity

   - allocate direct io structures on stack and pass around only
     pointers, avoids allocation and reduces potential error handling

  Fixes:

   - fix count of reserved transaction items for various inode
     operations

   - fix deadlock between concurrent dio writes when low on free data
     space

   - fix a few cases when zones need to be finished

  VFS, iomap:

   - add helper to check if sb write has started (usable for assertions)

   - new helper iomap_dio_alloc_bio, export iomap_dio_bio_end_io"

* tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (173 commits)
  btrfs: zoned: introduce a minimal zone size 4M and reject mount
  btrfs: allow defrag to convert inline extents to regular extents
  btrfs: add "0x" prefix for unsupported optional features
  btrfs: do not account twice for inode ref when reserving metadata units
  btrfs: zoned: fix comparison of alloc_offset vs meta_write_pointer
  btrfs: send: avoid trashing the page cache
  btrfs: send: keep the current inode open while processing it
  btrfs: allocate the btrfs_dio_private as part of the iomap dio bio
  btrfs: move struct btrfs_dio_private to inode.c
  btrfs: remove the disk_bytenr in struct btrfs_dio_private
  btrfs: allocate dio_data on stack
  iomap: add per-iomap_iter private data
  iomap: allow the file system to provide a bio_set for direct I/O
  btrfs: add a btrfs_dio_rw wrapper
  btrfs: zoned: zone finish unused block group
  btrfs: zoned: properly finish block group on metadata write
  btrfs: zoned: finish block group when there are no more allocatable bytes left
  btrfs: zoned: consolidate zone finish functions
  btrfs: zoned: introduce btrfs_zoned_bg_is_full
  btrfs: improve error reporting in lookup_inline_extent_backref
  ...
2022-05-24 18:52:35 -07:00
Linus Torvalds
65965d9530 Merge tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs (and fscache) updates from Gao Xiang:
 "After working on it on the mailing list for more than half a year, we
  finally form 'erofs over fscache' feature into shape. Hopefully it
  could bring more possibility to the communities.

  The story mainly started from a new project what we called "RAFS v6" [1]
  for Nydus image service almost a year ago, which enhances EROFS to be
  a new form of one bootstrap (which includes metadata representing the
  whole fs tree) + several data-deduplicated content addressable blobs
  (actually treated as multiple devices). Each blob can represent one
  container image layer but not quite exactly since all new data can be
  fully existed in the previous blobs so no need to introduce another
  new blob.

  It is actually not a new idea (at least on my side it's much like a
  simpilied casync [2] for now) and has many benefits over per-file
  blobs or some other exist ways since typically each RAFS v6 image only
  has dozens of device blobs instead of thousands of per-file blobs.
  It's easy to be signed with user keys as a golden image, transfered
  untouchedly with minimal overhead over the network, kept in some type
  of storage conveniently, and run with (optional) runtime verification
  but without involving too many irrelevant features crossing the system
  beyond EROFS itself. At least it's our final goal and we're keeping
  working on it. There was also a good summary of this approach from the
  casync author [3].

  Regardless further optimizations, this work is almost done in the
  previous Linux release cycles. In this round, we'd like to introduce
  on-demand load for EROFS with the fscache/cachefiles infrastructure,
  considering the following advantages:

   - Introduce new file-based backend to EROFS. Although each image only
     contains dozens of blobs but in densely-deployed runC host for
     example, there could still be massive blobs on a machine, which is
     messy if each blob is treated as a device. In contrast, fscache and
     cachefiles are really great interfaces for us to make them work.

   - Introduce on-demand load to fscache and EROFS. Previously, fscache
     is mainly used to caching network-likewise filesystems, now it can
     support on-demand downloading for local fses too with the exact
     localfs on-disk format. It has many advantages which we're been
     described in the latest patchset cover letter [4]. In addition to
     that, most importantly, the cached data is still stored in the
     original local fs on-disk format so that it's still the one signed
     with private keys but only could be partially available. Users can
     fully trust it during running. Later, users can also back up
     cachefiles easily to another machine.

   - More reliable on-demand approach in principle. After data is all
     available locally, user daemon can be no longer online in some use
     cases, which helps daemon crash recovery (filesystems can still in
     service) and hot-upgrade (user daemon can be upgraded more
     frequently due to new features or protocols introduced.)

   - Other format can also be converted to EROFS filesystem format over
     the internet on the fly with the new on-demand load feature and
     mounted. That is entirely possible with on-demand load feature as
     long as such archive format metadata can be fetched in advance like
     stargz.

  In addition, although currently our target user is Nydus image service [5],
  but laterly, it can be used for other use cases like on-demand system
  booting, etc. As for the fscache on-demand load feature itself,
  strictly it can be used for other local fses too. Laterly we could
  promote most code to the iomap infrastructure and also enhance it in
  the read-write way if other local fses are interested.

  Thanks David Howells for taking so much time and patience on this
  these months, many thanks with great respect here again! Thanks Jeffle
  for working on this feature and Xin Yin from Bytedance for
  asynchronous I/O implementation as well as Zichen Tian, Jia Zhu, and
  Yan Song for testing, much appeciated. We're also exploring more
  possibly over fscache cache management over FSDAX for secure
  containers and working on more improvements and useful features for
  fscache, cachefiles, and on-demand load.

  In addition to "erofs over fscache", NFS export and idmapped mount are
  also completed in this cycle for container use cases as well.

  Summary:

   - Add erofs on-demand load support over fscache

   - Support NFS export for erofs

   - Support idmapped mounts for erofs

   - Don't prompt for risk any more when using big pcluster

   - Fix buffer copy overflow of ztailpacking feature

   - Several minor cleanups"

[1] https://lore.kernel.org/r/20210730194625.93856-1-hsiangkao@linux.alibaba.com
[2] https://github.com/systemd/casync
[3] http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
[4] https://lore.kernel.org/r/20220509074028.74954-1-jefflexu@linux.alibaba.com
[5] https://github.com/dragonflyoss/image-service

* tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (29 commits)
  erofs: scan devices from device table
  erofs: change to use asynchronous io for fscache readpage/readahead
  erofs: add 'fsid' mount option
  erofs: implement fscache-based data readahead
  erofs: implement fscache-based data read for inline layout
  erofs: implement fscache-based data read for non-inline layout
  erofs: implement fscache-based metadata read
  erofs: register fscache context for extra data blobs
  erofs: register fscache context for primary data blob
  erofs: add erofs_fscache_read_folios() helper
  erofs: add anonymous inode caching metadata for data blobs
  erofs: add fscache context helper functions
  erofs: register fscache volume
  erofs: add fscache mode check helper
  erofs: make erofs_map_blocks() generally available
  cachefiles: document on-demand read mode
  cachefiles: add tracepoints for on-demand read mode
  cachefiles: enable on-demand read mode
  cachefiles: implement on-demand read
  cachefiles: notify the user daemon when withdrawing cookie
  ...
2022-05-24 18:42:04 -07:00
Linus Torvalds
2319be1356 Merge tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - rwsem cleanups & optimizations/fixes:
    - Conditionally wake waiters in reader/writer slowpaths
    - Always try to wake waiters in out_nolock path

 - Add try_cmpxchg64() implementation, with arch optimizations - and use
   it to micro-optimize sched_clock_{local,remote}()

 - Various force-inlining fixes to address objdump instrumentation-check
   warnings

 - Add lock contention tracepoints:

    lock:contention_begin
    lock:contention_end

 - Misc smaller fixes & cleanups

* tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}
  locking/atomic/x86: Introduce arch_try_cmpxchg64
  locking/atomic: Add generic try_cmpxchg64 support
  futex: Remove a PREEMPT_RT_FULL reference.
  locking/qrwlock: Change "queue rwlock" to "queued rwlock"
  lockdep: Delete local_irq_enable_in_hardirq()
  locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
  locking: Apply contention tracepoints in the slow path
  locking: Add lock contention tracepoints
  locking/rwsem: Always try to wake waiters in out_nolock path
  locking/rwsem: Conditionally wake waiters in reader/writer slowpaths
  locking/rwsem: No need to check for handoff bit if wait queue empty
  lockdep: Fix -Wunused-parameter for _THIS_IP_
  x86/mm: Force-inline __phys_addr_nodebug()
  x86/kvm/svm: Force-inline GHCB accessors
  task_stack, x86/cea: Force-inline stack helpers
2022-05-24 10:18:23 -07:00
Linus Torvalds
8443516da6 Merge tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
 "This includes some small changes to kernel/stop_machine.c and arch/x86
  which are deps of the new Intel IFS support.

  Highlights:

   - New drivers:
       - Intel "In Field Scan" (IFS) support
       - Winmate FM07/FM07P buttons
       - Mellanox SN2201 support

   -  AMD PMC driver enhancements

   -  Lots of various other small fixes and hardware-id additions"

* tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits)
  platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency
  platform/x86: intel_cht_int33fe: Set driver data
  platform/x86: intel-hid: fix _DSM function index handling
  platform/x86: toshiba_acpi: use kobj_to_dev()
  platform/x86: samsung-laptop: use kobj_to_dev()
  platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
  tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
  tools/power/x86/intel-speed-select: Display error on turbo mode disabled
  Documentation: In-Field Scan
  platform/x86/intel/ifs: add ABI documentation for IFS
  trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
  platform/x86/intel/ifs: Add IFS sysfs interface
  platform/x86/intel/ifs: Add scan test support
  platform/x86/intel/ifs: Authenticate and copy to secured memory
  platform/x86/intel/ifs: Check IFS Image sanity
  platform/x86/intel/ifs: Read IFS firmware image
  platform/x86/intel/ifs: Add stub driver for In-Field Scan
  stop_machine: Add stop_core_cpuslocked() for per-core operations
  x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
  x86/microcode/intel: Expose collect_cpu_info_early() for IFS
  ...
2022-05-23 20:38:39 -07:00
Linus Torvalds
6e01f86fb2 Merge tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and timekeeping updates from Thomas Gleixner:

 - Expose CLOCK_TAI to instrumentation to aid with TSN debugging.

 - Ensure that the clockevent is stopped when there is no timer armed to
   avoid pointless wakeups.

 - Make the sched clock frequency handling and rounding consistent.

 - Provide a better debugobject hint for delayed works. The timer
   callback is always the same, which makes it difficult to identify the
   underlying work. Use the work function as a hint instead.

 - Move the timer specific sysctl code into the timer subsystem.

 - The usual set of improvements and cleanups

* tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Provide a better debugobjects hint for delayed works
  time/sched_clock: Fix formatting of frequency reporting code
  time/sched_clock: Use Hz as the unit for clock rate reporting below 4kHz
  time/sched_clock: Round the frequency reported to nearest rather than down
  timekeeping: Consolidate fast timekeeper
  timekeeping: Annotate ktime_get_boot_fast_ns() with data_race()
  timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
  timekeeping: Introduce fast accessor to clock tai
  tracing/timer: Add missing argument documentation of trace points
  clocksource: Replace cpumask_weight() with cpumask_empty()
  timers: Move timer sysctl into the timer code
  clockevents: Use dedicated list iterator variable
  timers: Simplify calc_index()
  timers: Initialize base::next_expiry_recalc in timers_prepare_cpu()
2022-05-23 17:05:55 -07:00
Linus Torvalds
9836e93c0a Merge tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block
Pull io_uring NVMe command passthrough from Jens Axboe:
 "On top of everything else, this adds support for passthrough for
  io_uring.

  The initial feature for this is NVMe passthrough support, which allows
  non-filesystem based IO commands and admin commands.

  To support this, io_uring grows support for SQE and CQE members that
  are twice as big, allowing to pass in a full NVMe command without
  having to copy data around. And to complete with more than just a
  single 32-bit value as the output"

* tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits)
  io_uring: cleanup handling of the two task_work lists
  nvme: enable uring-passthrough for admin commands
  nvme: helper for uring-passthrough checks
  blk-mq: fix passthrough plugging
  nvme: add vectored-io support for uring-cmd
  nvme: wire-up uring-cmd support for io-passthru on char-device.
  nvme: refactor nvme_submit_user_cmd()
  block: wire-up support for passthrough plugging
  fs,io_uring: add infrastructure for uring-cmd
  io_uring: support CQE32 for nop operation
  io_uring: enable CQE32
  io_uring: support CQE32 in /proc info
  io_uring: add tracing for additional CQE32 fields
  io_uring: overflow processing for CQE32
  io_uring: flush completions for CQE32
  io_uring: modify io_get_cqe for CQE32
  io_uring: add CQE32 completion processing
  io_uring: add CQE32 setup processing
  io_uring: change ring size calculation for CQE32
  io_uring: store add. return values for CQE32
  ...
2022-05-23 13:06:15 -07:00
Linus Torvalds
368da430d0 Merge tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block
Pull io_uring socket() support from Jens Axboe:
 "This adds support for socket(2) for io_uring. This is handy when using
  direct / registered file descriptors with io_uring.

  Outside of those two patches, a small series from Dylan on top that
  improves the tracing by providing a text representation of the opcode
  rather than needing to decode this by reading the header file every
  time.

  That sits in this branch as it was the last opcode added (until it
  wasn't...)"

* tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()
2022-05-23 12:42:33 -07:00
Linus Torvalds
09beaff75e Merge tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block
Pull io_uring xattr support from Jens Axboe:
 "Support for the xattr variants"

* tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: cleanup error-handling around io_req_complete
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr
2022-05-23 12:30:30 -07:00
Linus Torvalds
3a166bdbf3 Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
 "Here are the main io_uring changes for 5.19. This contains:

   - Fixes for sparse type warnings (Christoph, Vasily)

   - Support for multi-shot accept (Hao)

   - Support for io_uring managed fixed files, rather than always
     needing the applicationt o manage the indices (me)

   - Fix for a spurious poll wakeup (Dylan)

   - CQE overflow fixes (Dylan)

   - Support more types of cancelations (me)

   - Support for co-operative task_work signaling, rather than always
     forcing an IPI (me)

   - Support for doing poll first when appropriate, rather than always
     attempting a transfer first (me)

   - Provided buffer cleanups and support for mapped buffers (me)

   - Improve how io_uring handles inflight SCM files (Pavel)

   - Speedups for registered files (Pavel, me)

   - Organize the completion data in a struct in io_kiocb rather than
     keep it in separate spots (Pavel)

   - task_work improvements (Pavel)

   - Cleanup and optimize the submission path, in general and for
     handling links (Pavel)

   - Speedups for registered resource handling (Pavel)

   - Support sparse buffers and file maps (Pavel, me)

   - Various fixes and cleanups (Almog, Pavel, me)"

* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
  io_uring: fix incorrect __kernel_rwf_t cast
  io_uring: disallow mixed provided buffer group registrations
  io_uring: initialize io_buffer_list head when shared ring is unregistered
  io_uring: add fully sparse buffer registration
  io_uring: use rcu_dereference in io_close
  io_uring: consistently use the EPOLL* defines
  io_uring: make apoll_events a __poll_t
  io_uring: drop a spurious inline on a forward declaration
  io_uring: don't use ERR_PTR for user pointers
  io_uring: use a rwf_t for io_rw.flags
  io_uring: add support for ring mapped supplied buffers
  io_uring: add io_pin_pages() helper
  io_uring: add buffer selection support to IORING_OP_NOP
  io_uring: fix locking state for empty buffer group
  io_uring: implement multishot mode for accept
  io_uring: let fast poll support multishot
  io_uring: add REQ_F_APOLL_MULTISHOT for requests
  io_uring: add IORING_ACCEPT_MULTISHOT for accept
  io_uring: only wake when the correct events are set
  io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
  ...
2022-05-23 12:22:49 -07:00
Vasily Averin
0e7579ca73 io_uring: fix incorrect __kernel_rwf_t cast
Currently 'make C=1 fs/io_uring.o' generates sparse warning:

  CHECK   fs/io_uring.c
fs/io_uring.c: note: in included file (through
include/trace/trace_events.h, include/trace/define_trace.h, i
nclude/trace/events/io_uring.h):
./include/trace/events/io_uring.h:488:1:
 warning: incorrect type in assignment (different base types)
    expected unsigned int [usertype] op_flags
    got restricted __kernel_rwf_t const [usertype] rw_flags

This happen on cast of sqe->rw_flags which is defined as __kernel_rwf_t,
this type is bitwise and requires __force attribute for any casts.
However rw_flags is a member of the union, and its access can be safely
replaced by using of its neighbours, so let's use poll32_events to fix
the sparse warning.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Link: https://lore.kernel.org/r/6f009241-a63f-ae43-a04b-62841aaef293@openvz.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-19 12:27:59 -06:00
Linus Torvalds
01464a73a6 Merge tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "Two small changes fixing issues from the 5.18 merge window:

   - Fix wrong ordering of a tracepoint (Dylan)

   - Fix MSG_RING on IOPOLL rings (me)"

* tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block:
  io_uring: don't attempt to IOPOLL for MSG_RING requests
  io_uring: fix ordering of args in io_uring_queue_async_work
2022-05-18 14:21:30 -10:00
Jeffle Xu
1519670e4f cachefiles: add tracepoints for on-demand read mode
Add tracepoints for on-demand read mode. Currently following tracepoints
are added:

	OPEN request / COPEN reply
	CLOSE request
	READ request / CREAD reply
	write through anonymous fd
	release of anonymous fd

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220425122143.56815-8-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-18 00:11:18 +08:00
Jeffle Xu
c838305450 cachefiles: notify the user daemon when looking up cookie
Fscache/CacheFiles used to serve as a local cache for a remote
networking fs. A new on-demand read mode will be introduced for
CacheFiles, which can boost the scenario where on-demand read semantics
are needed, e.g. container image distribution.

The essential difference between these two modes is seen when a cache
miss occurs: In the original mode, the netfs will fetch the data from
the remote server and then write it to the cache file; in on-demand
read mode, fetching the data and writing it into the cache is delegated
to a user daemon.

As the first step, notify the user daemon when looking up cookie. In
this case, an anonymous fd is sent to the user daemon, through which the
user daemon can write the fetched data to the cache file. Since the user
daemon may move the anonymous fd around, e.g. through dup(), an object
ID uniquely identifying the cache file is also attached.

Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that
the cache file size shall be retrieved at runtime. This helps the
scenario where one cache file contains multiple netfs files, e.g. for
the purpose of deduplication. In this case, netfs itself has no idea the
size of the cache file, whilst the user daemon should give the hint on
it.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220509074028.74954-3-jefflexu@linux.alibaba.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-18 00:11:17 +08:00
Christoph Hellwig
a31b4a4368 btrfs: simplify WQ_HIGHPRI handling in struct btrfs_workqueue
Just let the one caller that wants optional WQ_HIGHPRI handling allocate
a separate btrfs_workqueue for that.  This allows to rename struct
__btrfs_workqueue to btrfs_workqueue, remove a pointer indirection and
separate allocation for all btrfs_workqueue users and generally simplify
the code.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16 17:03:15 +02:00
Tony Luck
51af802fc0 trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
Add tracing support which may be useful for debugging systems that fail to complete
In Field Scan tests.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220506225410.1652287-11-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Dylan Yudaken
2d2d5cb6ca io_uring: fix ordering of args in io_uring_queue_async_work
Fix arg ordering in TP_ARGS macro, which fixes the output.

Fixes: 502c87d655 ("io-uring: Make tracepoints consistent.")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220512091834.728610-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12 06:18:58 -06:00
Delyan Kratunov
9c2136be08 sched/tracing: Append prev_state to tp args instead
Commit fa2c3254d7 (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.

This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.

If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.

Fixes: fa2c3254d7 (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
2022-05-12 00:37:11 +02:00
Stefan Roesch
c4bb964fa0 io_uring: add tracing for additional CQE32 fields
This adds tracing for the extra1 and extra2 fields.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09 06:35:34 -06:00
Matthew Wilcox (Oracle)
9d6b0cd757 fs: Remove flags parameter from aops->write_begin
There are no more aop flags left, so remove the parameter.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-08 14:28:19 -04:00
Dylan Yudaken
033b87d24f io_uring: use the text representation of ops in trace
It is annoying to translate opcodes to textwhen tracing io_uring. Use the
io_uring_get_opcode function instead to use the text representation.

A downside here might have been that if the opcode is invalid it will not
be obvious, however the opcode is already overridden in these cases to
0 (NOP) in io_init_req(). Therefore this is a non issue.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-5-dylany@fb.com
[axboe: don't include register, those are not req opcodes]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-28 17:06:03 -06:00
Dylan Yudaken
1460af7de6 io_uring: rename op -> opcode
do this for consistency with the other trace messages

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-26 06:50:42 -06:00
Jens Axboe
0200ce6a57 io_uring: fix trace for reduced sqe padding
__pad2 is only 1 u64 now, the other one is addr3. Adjust the trace so
that it matches up.

Fixes: a56834e0fa ("io_uring: add fgetxattr and getxattr support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:46 -06:00
Dylan Yudaken
47894438e9 io_uring: add trace support for CQE overflow
Add trace function for overflowing CQ ring.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Thomas Gleixner
ce8abf340e Merge tag 'tai-for-tracing' into timers/core
Pull in the NMI safe TAI accessor which was provided for the tracing tree
to prepare for further changes in this area.
2022-04-14 16:55:47 +02:00
Anna-Maria Behnsen
fde33ca4cb tracing/timer: Add missing argument documentation of trace points
Documentation of trace points timer_start, timer_expire_entry and
hrtimer_start lack always the last argument. Add it to keep implementation
and documentation in sync.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220411140115.24185-1-anna-maria@linutronix.de
2022-04-14 16:14:49 +02:00
Linus Torvalds
c1488c9751 Merge tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:

 - Fix a write performance regression

 - Fix crashes during request deferral on RDMA transports

* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix the svc_deferred_event trace class
  SUNRPC: Fix NFSD's request deferral on RDMA transports
  nfsd: Clean up nfsd_file_put()
  nfsd: Fix a write performance regression
  SUNRPC: Return true/false (not 1/0) from bool functions
2022-04-12 14:23:19 -10:00
Linus Torvalds
1a3b1bba7c Merge tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
 "Stable fixes:

   - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()

  Bugfixes:

   - Fix an Oopsable condition due to SLAB_ACCOUNT setting in the
     NFSv4.2 xattr code.

   - Fix for open() using an file open mode of '3' in NFSv4

   - Replace readdir's use of xxhash() with hash_64()

   - Several patches to handle malloc() failure in SUNRPC"

* tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()
  SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
  SUNRPC: Handle allocation failure in rpc_new_task()
  NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename()
  NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget
  SUNRPC: Handle low memory situations in call_status()
  SUNRPC: Handle ENOMEM in call_transmit_status()
  NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation
  SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
  NFS: Replace readdir's use of xxhash() with hash_64()
  SUNRPC: handle malloc failure in ->request_prepare
  NFSv4: fix open failure with O_ACCMODE flag
  Revert "NFSv4: Handle the special Linux file open access mode"
2022-04-08 07:39:17 -10:00
Trond Myklebust
f00432063d SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.

Reported-by: Felix Fu <foyjog@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a73881c96d ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a: SUNRPC: Prevent immediate close+reconnect
Cc: stable@vger.kernel.org # 5.1.x: 89f42494f9: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable@vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07 16:19:47 -04:00
Chuck Lever
4d5004451a SUNRPC: Fix the svc_deferred_event trace class
Fix a NULL deref crash that occurs when an svc_rqst is deferred
while the sunrpc tracing subsystem is enabled. svc_revisit() sets
dr->xprt to NULL, so it can't be relied upon in the tracepoint to
provide the remote's address.

Unfortunately we can't revert the "svc_deferred_class" hunk in
commit ece200ddd5 ("sunrpc: Save remote presentation address in
svc_xprt for trace events") because there is now a specific check
of event format specifiers for unsafe dereferences. The warning
that check emits is:

  event svc_defer_recv has unsafe dereference of argument 1

A "%pISpc" format specifier with a "struct sockaddr *" is indeed
flagged by this check.

Instead, take the brute-force approach used by the svcrdma_qp_error
tracepoint. Convert the dr::addr field into a presentation address
in the TP_fast_assign() arm of the trace event, and store that as
a string. This fix can be backported to -stable kernels.

In the meantime, commit c6ced22997 ("tracing: Update print fmt
check to handle new __get_sockaddr() macro") is now in v5.18, so
this wonky fix can be replaced with __sockaddr() and friends
properly during the v5.19 merge window.

Fixes: ece200ddd5 ("sunrpc: Save remote presentation address in svc_xprt for trace events")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-04-07 10:22:51 -04:00
Peter Zijlstra
dc1f7893a7 locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
Have the trace_contention_*() tracepoints consistently include
adaptive spinning. In order to differentiate between the spinning and
non-spinning states add LCB_F_MUTEX and combine with LCB_F_SPIN.

The consequence is that a mutex contention can now triggler multiple
_begin() tracepoints before triggering an _end().

Additionally, this fixes one path where mutex would trigger _end()
without ever seeing a _begin().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-04-05 10:24:36 +02:00
Namhyung Kim
16edd9b511 locking: Add lock contention tracepoints
This adds two new lock contention tracepoints like below:

 * lock:contention_begin
 * lock:contention_end

The lock:contention_begin takes a flags argument to classify locks.  I
found it useful to identify what kind of locks it's tracing like if
it's spinning or sleeping, reader-writer lock, real-time, and per-cpu.

Move tracepoint definitions into mutex.c so that we can use them
without lockdep.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Link: https://lkml.kernel.org/r/20220322185709.141236-2-namhyung@kernel.org
2022-04-05 10:24:35 +02:00
Linus Torvalds
09bb8856d4 Merge tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:

 - Rename the staging files to give them some meaning. Just
   stage1,stag2,etc, does not show what they are for

 - Check for NULL from allocation in bootconfig

 - Hold event mutex for dyn_event call in user events

 - Mark user events to broken (to work on the API)

 - Remove eBPF updates from user events

 - Remove user events from uapi header to keep it from being installed.

 - Move ftrace_graph_is_dead() into inline as it is called from hot
   paths and also convert it into a static branch.

* tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Move user_events.h temporarily out of include/uapi
  ftrace: Make ftrace_graph_is_dead() a static branch
  tracing: Set user_events to BROKEN
  tracing/user_events: Remove eBPF interfaces
  tracing/user_events: Hold event_mutex during dyn_event_add
  proc: bootconfig: Add null pointer check
  tracing: Rename the staging files for trace_events
2022-04-03 12:26:01 -07:00
Steven Rostedt (Google)
84055411d8 tracing: Rename the staging files for trace_events
When looking for implementation of different phases of the creation of the
TRACE_EVENT() macro, it is pretty useless when all helper macro
redefinitions are in files labeled "stageX_defines.h". Rename them to
state which phase the files are for. For instance, when looking for the
defines that are used to create the event fields, seeing
"stage4_event_fields.h" gives the developer a good idea that the defines
are in that file.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02 08:40:04 -04:00
Linus Torvalds
f008b1d6e1 Merge tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfs updates from David Howells:
 "Netfs prep for write helpers.

  Having had a go at implementing write helpers and content encryption
  support in netfslib, it seems that the netfs_read_{,sub}request
  structs and the equivalent write request structs were almost the same
  and so should be merged, thereby requiring only one set of
  alloc/get/put functions and a common set of tracepoints.

  Merging the structs also has the advantage that if a bounce buffer is
  added to the request struct, a read operation can be performed to fill
  the bounce buffer, the contents of the buffer can be modified and then
  a write operation can be performed on it to send the data wherever it
  needs to go using the same request structure all the way through. The
  I/O handlers would then transparently perform any required crypto.
  This should make it easier to perform RMW cycles if needed.

  The potentially common functions and structs, however, by their names
  all proclaim themselves to be associated with the read side of things.

  The bulk of these changes alter this in the following ways:

   - Rename struct netfs_read_{,sub}request to netfs_io_{,sub}request.

   - Rename some enums, members and flags to make them more appropriate.

   - Adjust some comments to match.

   - Drop "read"/"rreq" from the names of common functions. For
     instance, netfs_get_read_request() becomes netfs_get_request().

   - The ->init_rreq() and ->issue_op() methods become ->init_request()
     and ->issue_read(). I've kept the latter as a read-specific
     function and in another branch added an ->issue_write() method.

  The driver source is then reorganised into a number of files:

        fs/netfs/buffered_read.c        Create read reqs to the pagecache
        fs/netfs/io.c                   Dispatchers for read and write reqs
        fs/netfs/main.c                 Some general miscellaneous bits
        fs/netfs/objects.c              Alloc, get and put functions
        fs/netfs/stats.c                Optional procfs statistics.

  and future development can be fitted into this scheme, e.g.:

        fs/netfs/buffered_write.c       Modify the pagecache
        fs/netfs/buffered_flush.c       Writeback from the pagecache
        fs/netfs/direct_read.c          DIO read support
        fs/netfs/direct_write.c         DIO write support
        fs/netfs/unbuffered_write.c     Write modifications directly back

  Beyond the above changes, there are also some changes that affect how
  things work:

   - Make fscache_end_operation() generally available.

   - In the netfs tracing header, generate enums from the symbol ->
     string mapping tables rather than manually coding them.

   - Add a struct for filesystems that uses netfslib to put into their
     inode wrapper structs to hold extra state that netfslib is
     interested in, such as the fscache cookie. This allows netfslib
     functions to be set in filesystem operation tables and jumped to
     directly without having to have a filesystem wrapper.

   - Add a member to the struct added above to track the remote inode
     length as that may differ if local modifications are buffered. We
     may need to supply an appropriate EOF pointer when storing data (in
     AFS for example).

   - Pass extra information to netfs_alloc_request() so that the
     ->init_request() hook can access it and retain information to
     indicate the origin of the operation.

   - Make the ->init_request() hook return an error, thereby allowing a
     filesystem that isn't allowed to cache an inode (ceph or cifs, for
     example) to skip readahead.

   - Switch to using refcount_t for subrequests and add tracepoints to
     log refcount changes for the request and subrequest structs.

   - Add a function to consolidate dispatching a read request. Similar
     code is used in three places and another couple are likely to be
     added in the future"

Link: https://lore.kernel.org/all/2639515.1648483225@warthog.procyon.org.uk/

* tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Maintain netfs_i_context::remote_i_size
  netfs: Keep track of the actual remote file size
  netfs: Split some core bits out into their own file
  netfs: Split fs/netfs/read_helper.c
  netfs: Rename read_helper.c to io.c
  netfs: Prepare to split read_helper.c
  netfs: Add a function to consolidate beginning a read
  netfs: Add a netfs inode context
  ceph: Make ceph_init_request() check caps on readahead
  netfs: Change ->init_request() to return an error code
  netfs: Refactor arguments for netfs_alloc_read_request
  netfs: Adjust the netfs_failure tracepoint to indicate non-subreq lines
  netfs: Trace refcounting on the netfs_io_subrequest struct
  netfs: Trace refcounting on the netfs_io_request struct
  netfs: Adjust the netfs_rreq tracepoint slightly
  netfs: Split netfs_io_* object handling out
  netfs: Finish off rename of netfs_read_request to netfs_io_request
  netfs: Rename netfs_read_*request to netfs_io_*request
  netfs: Generate enums from trace symbol mapping lists
  fscache: export fscache_end_operation()
2022-03-31 15:49:36 -07:00
Linus Torvalds
2975dbdc39 Merge tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking updates from Jakub Kicinski:
 "Networking fixes and rethook patches.

  Features:

   - kprobes: rethook: x86: replace kretprobe trampoline with rethook

  Current release - regressions:

   - sfc: avoid null-deref on systems without NUMA awareness in the new
     queue sizing code

  Current release - new code bugs:

   - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices

   - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when
     interface is down

  Previous releases - always broken:

   - openvswitch: correct neighbor discovery target mask field in the
     flow dump

   - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak

   - rxrpc: fix call timer start racing with call destruction

   - rxrpc: fix null-deref when security type is rxrpc_no_security

   - can: fix UAF bugs around echo skbs in multiple drivers

  Misc:

   - docs: move netdev-FAQ to the 'process' section of the
     documentation"

* tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
  openvswitch: Add recirc_id to recirc warning
  rxrpc: fix some null-ptr-deref bugs in server_key.c
  rxrpc: Fix call timer start racing with call destruction
  net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
  net: hns3: fix the concurrency between functions reading debugfs
  docs: netdev: move the netdev-FAQ to the process pages
  docs: netdev: broaden the new vs old code formatting guidelines
  docs: netdev: call out the merge window in tag checking
  docs: netdev: add missing back ticks
  docs: netdev: make the testing requirement more stringent
  docs: netdev: add a question about re-posting frequency
  docs: netdev: rephrase the 'should I update patchwork' question
  docs: netdev: rephrase the 'Under review' question
  docs: netdev: shorten the name and mention msgid for patch status
  docs: netdev: note that RFC postings are allowed any time
  docs: netdev: turn the net-next closed into a Warning
  docs: netdev: move the patch marking section up
  docs: netdev: minor reword
  docs: netdev: replace references to old archives
  ...
2022-03-31 11:23:31 -07:00
David Howells
4a7f62f919 rxrpc: Fix call timer start racing with call destruction
The rxrpc_call struct has a timer used to handle various timed events
relating to a call.  This timer can get started from the packet input
routines that are run in softirq mode with just the RCU read lock held.
Unfortunately, because only the RCU read lock is held - and neither ref or
other lock is taken - the call can start getting destroyed at the same time
a packet comes in addressed to that call.  This causes the timer - which
was already stopped - to get restarted.  Later, the timer dispatch code may
then oops if the timer got deallocated first.

Fix this by trying to take a ref on the rxrpc_call struct and, if
successful, passing that ref along to the timer.  If the timer was already
running, the ref is discarded.

The timer completion routine can then pass the ref along to the call's work
item when it queues it.  If the timer or work item where already
queued/running, the extra ref is discarded.

Fixes: a158bdd324 ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html
Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-31 12:25:25 +02:00
Linus Torvalds
965181d7ef Merge tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - Switch NFS to use readahead instead of the obsolete readpages.

   - Readdir fixes to improve cacheability of large directories when
     there are multiple readers and writers.

   - Readdir performance improvements when doing a seekdir() immediately
     after opening the directory (common when re-exporting NFS).

   - NFS swap improvements from Neil Brown.

   - Loosen up memory allocation to permit direct reclaim and write back
     in cases where there is no danger of deadlocking the writeback code
     or NFS swap.

   - Avoid sillyrename when the NFSv4 server claims to support the
     necessary features to recover the unlinked but open file after
     reboot.

  Bugfixes:

   - Patch from Olga to add a mount option to control NFSv4.1 session
     trunking discovery, and default it to being off.

   - Fix a lockup in nfs_do_recoalesce().

   - Two fixes for list iterator variables being used when pointing to
     the list head.

   - Fix a kernel memory scribble when reading from a non-socket
     transport in /sys/kernel/sunrpc.

   - Fix a race where reconnecting to a server could leave the TCP
     socket stuck forever in the connecting state.

   - Patch from Neil to fix a shutdown race which can leave the SUNRPC
     transport timer primed after we free the struct xprt itself.

   - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2
     copy offload.

   - Sunrpc patch from Olga to avoid resending a task on an offlined
     transport.

  Cleanups:

   - Patches from Dave Wysochanski to clean up the fscache code"

* tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (91 commits)
  NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
  NFS: Don't loop forever in nfs_do_recoalesce()
  SUNRPC: Don't return error values in sysfs read of closed files
  SUNRPC: Do not dereference non-socket transports in sysfs
  NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
  SUNRPC don't resend a task on an offlined transport
  NFS: replace usage of found with dedicated list iterator variable
  SUNRPC: avoid race between mod_timer() and del_timer_sync()
  pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod
  pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod
  NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
  NFS: Avoid writeback threads getting stuck in mempool_alloc()
  NFS: nfsiod should not block forever in mempool_alloc()
  SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
  SUNRPC: Fix unx_lookup_cred() allocation
  NFS: Fix memory allocation in rpc_alloc_task()
  NFS: Fix memory allocation in rpc_malloc()
  SUNRPC: Improve accuracy of socket ENOBUFS determination
  SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE
  SUNRPC: Fix socket waits for write buffer space
  ...
2022-03-29 18:55:37 -07:00
Linus Torvalds
02e2af20f4 Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc and other driver updates from Greg KH:
 "Here is the big set of char/misc and other small driver subsystem
  updates for 5.18-rc1.

  Included in here are merges from driver subsystems which contain:

   - iio driver updates and new drivers

   - fsi driver updates

   - fpga driver updates

   - habanalabs driver updates and support for new hardware

   - soundwire driver updates and new drivers

   - phy driver updates and new drivers

   - coresight driver updates

   - icc driver updates

  Individual changes include:

   - mei driver updates

   - interconnect driver updates

   - new PECI driver subsystem added

   - vmci driver updates

   - lots of tiny misc/char driver updates

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (556 commits)
  firmware: google: Properly state IOMEM dependency
  kgdbts: fix return value of __setup handler
  firmware: sysfb: fix platform-device leak in error path
  firmware: stratix10-svc: add missing callback parameter on RSU
  arm64: dts: qcom: add non-secure domain property to fastrpc nodes
  misc: fastrpc: Add dma handle implementation
  misc: fastrpc: Add fdlist implementation
  misc: fastrpc: Add helper function to get list and page
  misc: fastrpc: Add support to secure memory map
  dt-bindings: misc: add fastrpc domain vmid property
  misc: fastrpc: check before loading process to the DSP
  misc: fastrpc: add secure domain support
  dt-bindings: misc: add property to support non-secure DSP
  misc: fastrpc: Add support to get DSP capabilities
  misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP
  misc: fastrpc: separate fastrpc device from channel context
  dt-bindings: nvmem: brcm,nvram: add basic NVMEM cells
  dt-bindings: nvmem: make "reg" property optional
  nvmem: brcm_nvram: parse NVRAM content into NVMEM cells
  nvmem: dt-bindings: Fix the error of dt-bindings check
  ...
2022-03-28 12:27:35 -07:00
Linus Torvalds
5627ecb837 Merge branch 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:

 - tracepoints when Linux acts as an I2C client

 - added support for AMD PSP

 - whole subsystem now uses generic_handle_irq_safe()

 - piix4 driver gained MMIO access enabling so far missed controllers
   with AMD chipsets

 - a bulk of device driver updates, refactorization, and fixes.

* 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (61 commits)
  i2c: mux: demux-pinctrl: do not deactivate a master that is not active
  i2c: meson: Fix wrong speed use from probe
  i2c: add tracepoints for I2C slave events
  i2c: designware: Remove code duplication
  i2c: cros-ec-tunnel: Fix syntax errors in comments
  MAINTAINERS: adjust XLP9XX I2C DRIVER after removing the devicetree binding
  i2c: designware: Mark dw_i2c_plat_{suspend,resume}() as __maybe_unused
  i2c: mediatek: Add i2c compatible for Mediatek MT8168
  dt-bindings: i2c: update bindings for MT8168 SoC
  i2c: mt65xx: Simplify with clk-bulk
  i2c: i801: Drop two outdated comments
  i2c: xiic: Make bus names unique
  i2c: i801: Add support for the Process Call command
  i2c: i801: Drop useless masking in i801_access
  i2c: tegra: Add SMBus block read function
  i2c: designware: Use the i2c_mark_adapter_suspended/resumed() helpers
  i2c: designware: Lock the adapter while setting the suspended flag
  i2c: mediatek: remove redundant null check
  i2c: mediatek: modify bus speed calculation formula
  i2c: designware: Fix improper usage of readl
  ...
2022-03-26 12:46:08 -07:00
Linus Torvalds
561593a048 Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block
Pull NVMe write streams removal from Jens Axboe:
 "This removes the write streams support in NVMe. No vendor ever really
  shipped working support for this, and they are not interested in
  supporting it.

  With the NVMe support gone, we have nothing in the tree that supports
  this. Remove passing around of the hints.

  The only discussion point in this patchset imho is the fact that the
  file specific write hint setting/getting fcntl helpers will now return
  -1/EINVAL like they did before we supported write hints. No known
  applications use these functions, I only know of one prototype that I
  help do for RocksDB, and that's not used. That said, with a change
  like this, it's always a bit controversial. Alternatively, we could
  just make them return 0 and pretend it worked. It's placement based
  hints after all"

* tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block:
  fs: remove fs.f_write_hint
  fs: remove kiocb.ki_hint
  block: remove the per-bio/request write hint
  nvme: remove support or stream based temperature hint
2022-03-26 11:51:46 -07:00
David Hildenbrand
363106c4ce mm/khugepaged: remove reuse_swap_page() usage
reuse_swap_page() currently indicates if we can write to an anon page
without COW.  A COW is required if the page is shared by multiple
processes (either already mapped or via swap entries) or if there is
concurrent writeback that cannot tolerate concurrent page modifications.

However, in the context of khugepaged we're not actually going to write to
a read-only mapped page, we'll copy the page content to our newly
allocated THP and map that THP writable.  All we have to make sure is that
the read-only mapped page we're about to copy won't get reused by another
process sharing the page, otherwise, page content would get modified.  But
that is already guaranteed via multiple mechanisms (e.g., holding a
reference, holding the page lock, removing the rmap after copying the
page).

The swapcache handling was introduced in commit 10359213d0 ("mm:
incorporate read-only pages into transparent huge pages") and it sounds
like it merely wanted to mimic what do_swap_page() would do when trying to
map a page obtained via the swapcache writable.

As that logic is unnecessary, let's just remove it, removing the last user
of reuse_swap_page().

Link: https://lkml.kernel.org/r/20220131162940.210846-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Liang Zhang <zhangliang5@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:51 -07:00
Andrey Konovalov
9353ffa6e9 kasan, page_alloc: allow skipping memory init for HW_TAGS
Add a new GFP flag __GFP_SKIP_ZERO that allows to skip memory
initialization.  The flag is only effective with HW_TAGS KASAN.

This flag will be used by vmalloc code for page_alloc allocations backing
vmalloc() mappings in a following patch.  The reason to skip memory
initialization for these pages in page_alloc is because vmalloc code will
be initializing them instead.

With the current implementation, when __GFP_SKIP_ZERO is provided,
__GFP_ZEROTAGS is ignored.  This doesn't matter, as these two flags are
never provided at the same time.  However, if this is changed in the
future, this particular implementation detail can be changed as well.

Link: https://lkml.kernel.org/r/0d53efeff345de7d708e0baa0d8829167772521e.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Andrey Konovalov
53ae233c30 kasan, page_alloc: allow skipping unpoisoning for HW_TAGS
Add a new GFP flag __GFP_SKIP_KASAN_UNPOISON that allows skipping KASAN
poisoning for page_alloc allocations.  The flag is only effective with
HW_TAGS KASAN.

This flag will be used by vmalloc code for page_alloc allocations backing
vmalloc() mappings in a following patch.  The reason to skip KASAN
poisoning for these pages in page_alloc is because vmalloc code will be
poisoning them instead.

Also reword the comment for __GFP_SKIP_KASAN_POISON.

Link: https://lkml.kernel.org/r/35c97d77a704f6ff971dd3bfe4be95855744108e.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Andrey Konovalov
f49d9c5bb1 kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS
Only define the ___GFP_SKIP_KASAN_POISON flag when CONFIG_KASAN_HW_TAGS is
enabled.

This patch it not useful by itself, but it prepares the code for additions
of new KASAN-specific GFP patches.

Link: https://lkml.kernel.org/r/44e5738a584c11801b2b8f1231898918efc8634a.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Anshuman Khandual
4cc79b3303 mm/migration: add trace events for base page and HugeTLB migrations
This adds two trace events for base page and HugeTLB page migrations.
These events, closely follow the implementation details like setting and
removing of PTE migration entries, which are essential operations for
migration.  The new CREATE_TRACE_POINTS in <mm/rmap.c> covers both
<events/migration.h> and <events/tlb.h> based trace events.  Hence drop
redundant CREATE_TRACE_POINTS from other places which could have otherwise
conflicted during build.

Link: https://lkml.kernel.org/r/1643368182-9588-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:45 -07:00
Anshuman Khandual
283fd6fe05 mm/migration: add trace events for THP migrations
Patch series "mm/migration: Add trace events", v3.

This adds trace events for all migration scenarios including base page,
THP and HugeTLB.

This patch (of 3):

This adds two trace events for PMD based THP migration without split.
These events closely follow the implementation details like setting and
removing of PMD migration entries, which are essential operations for THP
migration.  This moves CREATE_TRACE_POINTS into generic THP from powerpc
for these new trace events to be available on other platforms as well.

Link: https://lkml.kernel.org/r/1643368182-9588-1-git-send-email-anshuman.khandual@arm.com
Link: https://lkml.kernel.org/r/1643368182-9588-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:45 -07:00
Linus Torvalds
169e77764a Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Linus Torvalds
b4bc93bd76 Merge tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM driver updates from Arnd Bergmann:
 "There are a few separately maintained driver subsystems that we merge
  through the SoC tree, notable changes are:

   - Memory controller updates, mainly for Tegra and Mediatek SoCs, and
     clarifications for the memory controller DT bindings

   - SCMI firmware interface updates, in particular a new transport
     based on OPTEE and support for atomic operations.

   - Cleanups to the TEE subsystem, refactoring its memory management

  For SoC specific drivers without a separate subsystem, changes include

   - Smaller updates and fixes for TI, AT91/SAMA5, Qualcomm and NXP
     Layerscape SoCs.

   - Driver support for Microchip SAMA5D29, Tesla FSD, Renesas RZ/G2L,
     and Qualcomm SM8450.

   - Better power management on Mediatek MT81xx, NXP i.MX8MQ and older
     NVIDIA Tegra chips"

* tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (154 commits)
  ARM: spear: fix typos in comments
  soc/microchip: fix invalid free in mpfs_sys_controller_delete
  soc: s4: Add support for power domains controller
  dt-bindings: power: add Amlogic s4 power domains bindings
  ARM: at91: add support in soc driver for new SAMA5D29
  soc: mediatek: mmsys: add sw0_rst_offset in mmsys driver data
  dt-bindings: memory: renesas,rpc-if: Document RZ/V2L SoC
  memory: emif: check the pointer temp in get_device_details()
  memory: emif: Add check for setup_interrupts
  dt-bindings: arm: mediatek: mmsys: add support for MT8186
  dt-bindings: mediatek: add compatible for MT8186 pwrap
  soc: mediatek: pwrap: add pwrap driver for MT8186 SoC
  soc: mediatek: mmsys: add mmsys reset control for MT8186
  soc: mediatek: mtk-infracfg: Disable ACP on MT8192
  soc: ti: k3-socinfo: Add AM62x JTAG ID
  soc: mediatek: add MTK mutex support for MT8186
  soc: mediatek: mmsys: add mt8186 mmsys routing table
  soc: mediatek: pm-domains: Add support for mt8186
  dt-bindings: power: Add MT8186 power domains
  soc: mediatek: pm-domains: Add support for mt8195
  ...
2022-03-23 18:23:13 -07:00
Linus Torvalds
1bc191051d Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:

 - New user_events interface. User space can register an event with the
   kernel describing the format of the event. Then it will receive a
   byte in a page mapping that it can check against. A privileged task
   can then enable that event like any other event, which will change
   the mapped byte to true, telling the user space application to start
   writing the event to the tracing buffer.

 - Add new "ftrace_boot_snapshot" kernel command line parameter. When
   set, the tracing buffer will be saved in the snapshot buffer at boot
   up when the kernel hands things over to user space. This will keep
   the traces that happened at boot up available even if user space boot
   up has tracing as well.

 - Have TRACE_EVENT_ENUM() also update trace event field type
   descriptions. Thus if a static array defines its size with an enum,
   the user space trace event parsers can still know how to parse that
   array.

 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This
   will make one tracepoint be able to trace different content and not
   be stuck at only what the original TRACE_EVENT() macro exports.

 - Fixes to tracing error logging.

 - Better saving of cmdlines to PIDs when tracing (use the wakeup events
   for mapping).

* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
  tracing: Have type enum modifications copy the strings
  user_events: Add trace event call as root for low permission cases
  tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
  tracing: Add snapshot at end of kernel boot up
  tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
  tracing: Fix strncpy warning in trace_events_synth.c
  user_events: Prevent dyn_event delete racing with ioctl add/delete
  tracing: Add TRACE_CUSTOM_EVENT() macro
  tracing: Move the defines to create TRACE_EVENTS into their own files
  tracing: Add sample code for custom trace events
  tracing: Allow custom events to be added to the tracefs directory
  tracing: Fix last_cmd_set() string management in histogram code
  user_events: Fix potential uninitialized pointer while parsing field
  tracing: Fix allocation of last_cmd in last_cmd_set()
  user_events: Add documentation file
  user_events: Add sample code for typical usage
  user_events: Add self-test for validator boundaries
  user_events: Add self-test for perf_event integration
  user_events: Add self-test for dynamic_events integration
  user_events: Add self-test for ftrace integration
  ...
2022-03-23 11:40:25 -07:00