Pull RCU updates from Paul McKenney:
- Fix idle detection (Neeraj Upadhyay) and missing access marking
detected by KCSAN.
- Reduce coupling between rcu_barrier() and CPU-hotplug operations, so
that rcu_barrier() no longer needs to do cpus_read_lock(). This may
also someday allow system boot to bring CPUs online concurrently.
- Enable more aggressive movement to per-CPU queueing when reacting to
excessive lock contention due to workloads placing heavy update-side
stress on RCU tasks.
- Improvements to RCU priority boosting, including changes from Neeraj
Upadhyay, Zqiang, and Alison Chaiken.
- Various fixes improving test robustness and debug information.
- Add tests for SRCU size transitions, further compress torture.sh
build products, and improve debug output.
- Miscellaneous fixes.
* tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
rcu: Replace cpumask_weight with cpumask_empty where appropriate
rcu: Remove __read_mostly annotations from rcu_scheduler_active externs
rcu: Uninline multi-use function: finish_rcuwait()
rcu: Mark writes to the rcu_segcblist structure's ->flags field
kasan: Record work creation stack trace with interrupts enabled
rcu: Inline __call_rcu() into call_rcu()
rcu: Add mutex for rcu boost kthread spawning and affinity setting
rcu: Fix description of kvfree_rcu()
MAINTAINERS: Add Frederic and Neeraj to their RCU files
rcutorture: Provide non-power-of-two Tasks RCU scenarios
rcutorture: Test SRCU size transitions
torture: Make torture.sh help message match reality
rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention
rcu-tasks: Use order_base_2() instead of ilog2()
rcu: Create and use an rcu_rdp_cpu_online()
rcu: Make rcu_barrier() no longer block CPU-hotplug operations
rcu: Rework rcu_barrier() and callback-migration logic
rcu: Refactor rcu_barrier() empty-list handling
rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
torture: Change KVM environment variable to RCUTORTURE
...
We always write out an entire folio at once. This conversion removes
a few calls to compound_head() and gets the NR_VMSCAN_WRITE statistic
right when writing out a large folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Make the tracing formatting for user_data and flags consistent.
Having consistent formatting allows one for example to grep for a specific
user_data/flags and be able to trace a single sqe through easily.
Change user_data to 0x%llx and flags to 0x%x everywhere. The '0x' is
useful to disambiguate for example "user_data 100".
Additionally remove the '=' for flags in io_uring_req_failed, again for consistency.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220316095204.2191498-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This code adds the on disk structures for the block group root, which
will hold the block group items for extent tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
NFS_RPC_SWAPFLAGS is only used for READ requests.
It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to
requests. This is not needed for swap READ - though it is for writes
where it is set via a different mechanism.
RPC_TASK_ROOTCREDS causes the 'machine' credential to be used.
This is not needed as the root credential is saved when the swap file is
opened, and this is used for all IO.
So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of
RPC_TASK_ROOTCREDS, that isn't needed either.
Remove both.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
A network filesystem may set coherency data on a volume cookie, and if
given, cachefiles will store this in an xattr on the directory in the
cache corresponding to the volume.
The function that sets the xattr just stores the contents of the volume
coherency buffer directly into the xattr, with nothing added; the
checking function, on the other hand, has a cut'n'paste error whereby it
tries to interpret the xattr contents as would be the xattr on an
ordinary file (using the cachefiles_xattr struct). This results in a
failure to match the coherency data because the buffer ends up being
shifted by 18 bytes.
Fix this by defining a structure specifically for the volume xattr and
making both the setting and checking functions use it.
Since the volume coherency doesn't work if used, take the opportunity to
insert a reserved field for future use, set it to 0 and check that it is
0. Log mismatch through the appropriate tracepoint.
Note that this only affects cifs; 9p, afs, ceph and nfs don't use the
volume coherency data at the moment.
Fixes: 32e150037d ("fscache, cachefiles: Store the volume coherency data")
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Steve French <smfrench@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The tracepoint in get_mapping_status() only dumped the incoming mpext
fields. This patch added a new tracepoint in mptcp_sendmsg_frag() to dump
the outgoing mpext too.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The TUN can be used as vhost-net backend. E.g, the tun_net_xmit() is the
interface to forward the skb from TUN to vhost-net/virtio-net.
However, there are many "goto drop" in the TUN driver. Therefore, the
kfree_skb_reason() is involved at each "goto drop" to help userspace
ftrace/ebpf to track the reason for the loss of packets.
The below reasons are introduced:
- SKB_DROP_REASON_DEV_READY
- SKB_DROP_REASON_NOMEM
- SKB_DROP_REASON_HDR_TRUNC
- SKB_DROP_REASON_TAP_FILTER
- SKB_DROP_REASON_TAP_TXFILTER
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is
the interface to forward the skb from TAP to vhost-net/virtio-net.
However, there are many "goto drop" in the TAP driver. Therefore, the
kfree_skb_reason() is involved at each "goto drop" to help userspace
ftrace/ebpf to track the reason for the loss of packets.
The below reasons are introduced:
- SKB_DROP_REASON_SKB_CSUM
- SKB_DROP_REASON_SKB_GSO_SEG
- SKB_DROP_REASON_SKB_UCOPY_FAULT
- SKB_DROP_REASON_DEV_HDR
- SKB_DROP_REASON_FULL_RING
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add reason for skb drops to __netif_receive_skb_core() when packet_type
not found to handle the skb. For this purpose, the drop reason
SKB_DROP_REASON_PTYPE_ABSENT is introduced. Take ether packets for
example, this case mainly happens when L3 protocol is not supported.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() used in sch_handle_ingress() with
kfree_skb_reason(). Following drop reasons are introduced:
SKB_DROP_REASON_TC_INGRESS
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() used in do_xdp_generic() with kfree_skb_reason().
The drop reason SKB_DROP_REASON_XDP is introduced for this case.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() used in enqueue_to_backlog() with
kfree_skb_reason(). The skb rop reason SKB_DROP_REASON_CPU_BACKLOG is
introduced for the case of failing to enqueue the skb to the per CPU
backlog queue. The further reason can be backlog queue full or RPS
flow limition, and I think we needn't to make further distinctions.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add reasons for skb drops to __dev_xmit_skb() by replacing
kfree_skb_list() with kfree_skb_list_reason(). The drop reason of
SKB_DROP_REASON_QDISC_DROP is introduced for qdisc enqueue fails.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() used in sch_handle_egress() with kfree_skb_reason().
The drop reason SKB_DROP_REASON_TC_EGRESS is introduced. Considering
the code path of tc egerss, we make it distinct with the drop reason
of SKB_DROP_REASON_QDISC_DROP in the next commit.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of commit
c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
the following sequence becomes possible:
p->__state = TASK_INTERRUPTIBLE;
__schedule()
deactivate_task(p);
ttwu()
READ !p->on_rq
p->__state=TASK_WAKING
trace_sched_switch()
__trace_sched_switch_state()
task_state_index()
return 0;
TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
the trace event.
Prevent this by pushing the value read from __schedule() down the trace
event.
Reported-by: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220120162520.570782-2-valentin.schneider@arm.com
To make server-side trace events more useful in container-ized
environments, capture not just the remote's IP address, but the
local IP address and network namespace as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Enable a struct sockaddr to be stored in a trace record as a
dynamically-sized field. The common cases are AF_INET and AF_INET6
which are different sizes, and are vastly smaller than a struct
sockaddr_storage.
These are safer because, when used properly, the size of the
sockaddr destination field in each trace record is now guaranteed
to be the same as the source address that is being copied into it.
Link: https://lore.kernel.org/all/164182978641.8391.8277203495236105391.stgit@bazille.1015granger.net/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Replace kfree_skb() used in __neigh_event_send() with
kfree_skb_reason(). Following drop reasons are added:
SKB_DROP_REASON_NEIGH_FAILED
SKB_DROP_REASON_NEIGH_QUEUEFULL
SKB_DROP_REASON_NEIGH_DEAD
The first two reasons above should be the hot path that skb drops
in neighbour subsystem.
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() which is used in the packet egress path of IP layer
with kfree_skb_reason(). Functions that are involved include:
__ip_queue_xmit()
ip_finish_output()
ip_mc_finish_output()
ip6_output()
ip6_finish_output()
ip6_finish_output2()
Following new drop reasons are introduced:
SKB_DROP_REASON_IP_OUTNOROUTES
SKB_DROP_REASON_BPF_CGROUP_EGRESS
SKB_DROP_REASON_IPV6DISABLED
SKB_DROP_REASON_NEIGH_CREATEFAIL
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arm SCMI firmware interface updates for v5.18
Few main additions include:
- Support for OPTEE based SCMI transport to enable using SCMI service
provided by OPTEE on some platforms
- Support for atomic SCMI transports which enables few SCMI transactions
to be completed in atomic context. This involves other refactoring work
associated with it. It also marks SMC and OPTEE as atomic transport as
the commands are completed once the return.
- Support for polling mode in SCMI VirtIO transport in order to support
atomic operations
- Support for atomic clock operations based on availability of atomic
capability in the underlying SCMI transport
Other changes involves some trace and log enhancements and miscellaneous
bug fixes.
* tag 'scmi-updates-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: (28 commits)
clk: scmi: Support atomic clock enable/disable API
firmware: arm_scmi: Add support for clock_enable_latency
firmware: arm_scmi: Add atomic support to clock protocol
firmware: arm_scmi: Support optional system wide atomic-threshold-us
dt-bindings: firmware: arm,scmi: Add atomic-threshold-us optional property
firmware: arm_scmi: Add atomic mode support to virtio transport
firmware: arm_scmi: Review virtio free_list handling
firmware: arm_scmi: Add a virtio channel refcount
firmware: arm_scmi: Disable ftrace for Clang Thumb2 builds
firmware: arm_scmi: Add new parameter to mark_txdone
firmware: arm_scmi: Add atomic mode support to smc transport
firmware: arm_scmi: Add support for atomic transports
firmware: arm_scmi: Make optee support sync_cmds_completed_on_ret
firmware: arm_scmi: Make smc support sync_cmds_completed_on_ret
firmware: arm_scmi: Add sync_cmds_completed_on_ret transport flag
firmware: arm_scmi: Make smc transport use common completions
firmware: arm_scmi: Add configurable polling mode for transports
firmware: arm_scmi: Use new trace event scmi_xfer_response_wait
include: trace: Add new scmi_xfer_response_wait event
firmware: arm_scmi: Refactor message response path
...
Link: https://lore.kernel.org/r/20220222201742.3338589-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
These explicit tracepoints aren't really used and show sign of aging.
It's work to keep these up to date, and before I attempted to keep them
up to date, they weren't up to date, which indicates that they're not
really used. These days there are better ways of introspecting anyway.
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
We've been using a flurry of int, unsigned int, size_t, and ssize_t.
Let's unify all of this into size_t where it makes sense, as it does in
most places, and leave ssize_t for return values with possible errors.
In addition, keeping with the convention of other functions in this
file, functions that are dealing with raw bytes now take void *
consistently instead of a mix of that and u8 *, because much of the time
we're actually passing some other structure that is then interpreted as
bytes by the function.
We also take the opportunity to fix the outdated and incorrect comment
in get_random_bytes_arch().
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Joel writes:
FSI changes for v5.18
* Improvements in SCOM and OCC drivers for error handling and retries
* Addition of tracepoints for initialisation path
* API for setting long running SBE FIFO operations
* tag 'fsi-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/fsi:
fsi: Add trace events in initialization path
fsi: sbefifo: Implement FSI_SBEFIFO_READ_TIMEOUT_SECONDS ioctl
fsi: sbefifo: Use specified value of start of response timeout
fsi: occ: Improve response status checking
fsi: scom: Remove retries in indirect scoms
fsi: scom: Fix error handling
Our pool is 256 bits, and we only ever use all of it or don't use it at
all, which is decided by whether or not it has at least 128 bits in it.
So we can drastically simplify the accounting and cmpxchg loop to do
exactly this. While we're at it, we move the minimum bit size into a
constant so it can be shared between the two places where it matters.
The reason we want any of this is for the case in which an attacker has
compromised the current state, and then bruteforces small amounts of
entropy added to it. By demanding a particular minimum amount of entropy
be present before reseeding, we make that bruteforcing difficult.
Note that this rationale no longer includes anything about /dev/random
blocking at the right moment, since /dev/random no longer blocks (except
for at ~boot), but rather uses the crng. In a former life, /dev/random
was different and therefore required a more nuanced account(), but this
is no longer.
Behaviorally, nothing changes here. This is just a simplification of
the code.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>