Pull locking updates from Ingo Molnar:
- rwsem cleanups & optimizations/fixes:
- Conditionally wake waiters in reader/writer slowpaths
- Always try to wake waiters in out_nolock path
- Add try_cmpxchg64() implementation, with arch optimizations - and use
it to micro-optimize sched_clock_{local,remote}()
- Various force-inlining fixes to address objdump instrumentation-check
warnings
- Add lock contention tracepoints:
lock:contention_begin
lock:contention_end
- Misc smaller fixes & cleanups
* tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}
locking/atomic/x86: Introduce arch_try_cmpxchg64
locking/atomic: Add generic try_cmpxchg64 support
futex: Remove a PREEMPT_RT_FULL reference.
locking/qrwlock: Change "queue rwlock" to "queued rwlock"
lockdep: Delete local_irq_enable_in_hardirq()
locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
locking: Apply contention tracepoints in the slow path
locking: Add lock contention tracepoints
locking/rwsem: Always try to wake waiters in out_nolock path
locking/rwsem: Conditionally wake waiters in reader/writer slowpaths
locking/rwsem: No need to check for handoff bit if wait queue empty
lockdep: Fix -Wunused-parameter for _THIS_IP_
x86/mm: Force-inline __phys_addr_nodebug()
x86/kvm/svm: Force-inline GHCB accessors
task_stack, x86/cea: Force-inline stack helpers
Pull arm64 updates from Catalin Marinas:
- Initial support for the ARMv9 Scalable Matrix Extension (SME).
SME takes the approach used for vectors in SVE and extends this to
provide architectural support for matrix operations. No KVM support
yet, SME is disabled in guests.
- Support for crashkernel reservations above ZONE_DMA via the
'crashkernel=X,high' command line option.
- btrfs search_ioctl() fix for live-lock with sub-page faults.
- arm64 perf updates: support for the Hisilicon "CPA" PMU for
monitoring coherent I/O traffic, support for Arm's CMN-650 and
CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup.
- Kselftest updates for SME, BTI, MTE.
- Automatic generation of the system register macros from a 'sysreg'
file describing the register bitfields.
- Update the type of the function argument holding the ESR_ELx register
value to unsigned long to match the architecture register size
(originally 32-bit but extended since ARMv8.0).
- stacktrace cleanups.
- ftrace cleanups.
- Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
avoid executable mappings in kexec/hibernate code, drop TLB flushing
from get_clear_flush() (and rename it to get_clear_contig()),
ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits)
arm64/sysreg: Generate definitions for FAR_ELx
arm64/sysreg: Generate definitions for DACR32_EL2
arm64/sysreg: Generate definitions for CSSELR_EL1
arm64/sysreg: Generate definitions for CPACR_ELx
arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
arm64/sysreg: Generate definitions for CLIDR_EL1
arm64/sve: Move sve_free() into SVE code section
arm64: Kconfig.platforms: Add comments
arm64: Kconfig: Fix indentation and add comments
arm64: mm: avoid writable executable mappings in kexec/hibernate code
arm64: lds: move special code sections out of kernel exec segment
arm64/hugetlb: Implement arm64 specific huge_ptep_get()
arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
arm64: kdump: Do not allocate crash low memory if not needed
arm64/sve: Generate ZCR definitions
arm64/sme: Generate defintions for SVCR
arm64/sme: Generate SMPRI_EL1 definitions
arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
arm64/sme: Automatically generate SMIDR_EL1 defines
arm64/sme: Automatically generate defines for SMCR
...
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-05-23
We've added 113 non-merge commits during the last 26 day(s) which contain
a total of 121 files changed, 7425 insertions(+), 1586 deletions(-).
The main changes are:
1) Speed up symbol resolution for kprobes multi-link attachments, from Jiri Olsa.
2) Add BPF dynamic pointer infrastructure e.g. to allow for dynamically sized ringbuf
reservations without extra memory copies, from Joanne Koong.
3) Big batch of libbpf improvements towards libbpf 1.0 release, from Andrii Nakryiko.
4) Add BPF link iterator to traverse links via seq_file ops, from Dmitrii Dolgov.
5) Add source IP address to BPF tunnel key infrastructure, from Kaixi Fan.
6) Refine unprivileged BPF to disable only object-creating commands, from Alan Maguire.
7) Fix JIT blinding of ld_imm64 when they point to subprogs, from Alexei Starovoitov.
8) Add BPF access to mptcp_sock structures and their meta data, from Geliang Tang.
9) Add new BPF helper for access to remote CPU's BPF map elements, from Feng Zhou.
10) Allow attaching 64-bit cookie to BPF link of fentry/fexit/fmod_ret, from Kui-Feng Lee.
11) Follow-ups to typed pointer support in BPF maps, from Kumar Kartikeya Dwivedi.
12) Add busy-poll test cases to the XSK selftest suite, from Magnus Karlsson.
13) Improvements in BPF selftest test_progs subtest output, from Mykola Lysenko.
14) Fill bpf_prog_pack allocator areas with illegal instructions, from Song Liu.
15) Add generic batch operations for BPF map-in-map cases, from Takshak Chahande.
16) Make bpf_jit_enable more user friendly when permanently on 1, from Tiezhu Yang.
17) Fix an array overflow in bpf_trampoline_get_progs(), from Yuntao Wang.
====================
Link: https://lore.kernel.org/r/20220523223805.27931-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull block updates from Jens Axboe:
"Here are the core block changes for 5.19. This contains:
- blk-throttle accounting fix (Laibin)
- Series removing redundant assignments (Michal)
- Expose bio cache via the bio_set, so that DM can use it (Mike)
- Finish off the bio allocation interface cleanups by dealing with
the weirdest member of the family. bio_kmalloc combines a kmalloc
for the bio and bio_vecs with a hidden bio_init call and magic
cleanup semantics (Christoph)
- Clean up the block layer API so that APIs consumed by file systems
are (almost) only struct block_device based, so that file systems
don't have to poke into block layer internals like the
request_queue (Christoph)
- Clean up the blk_execute_rq* API (Christoph)
- Clean up various lose end in the blk-cgroup code to make it easier
to follow in preparation of reworking the blkcg assignment for bios
(Christoph)
- Fix use-after-free issues in BFQ when processes with merged queues
get moved to different cgroups (Jan)
- BFQ fixes (Jan)
- Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming,
Wolfgang, me)"
* tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits)
blk-mq: fix typo in comment
bfq: Remove bfq_requeue_request_body()
bfq: Remove superfluous conversion from RQ_BIC()
bfq: Allow current waker to defend against a tentative one
bfq: Relax waker detection for shared queues
blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()
blk-throttle: Set BIO_THROTTLED when bio has been throttled
blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
blk-cgroup: always terminate io.stat lines
block, bfq: make bfq_has_work() more accurate
block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
block: cleanup the VM accounting in submit_bio
block: Fix the bio.bi_opf comment
block: reorder the REQ_ flags
blk-iocost: combine local_stat and desc_stat to stat
block: improve the error message from bio_check_eod
block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone
block: remove superfluous calls to blkcg_bio_issue_init
kthread: unexport kthread_blkcg
blk-cgroup: cleanup blkcg_maybe_throttle_current
...
Pull RCU update from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offloading updates, mainly simplifications
- RCU-tasks updates, including some -rt fixups, handling of systems
with sparse CPU numbering, and a fix for a boot-time race-condition
failure
- Put SRCU on a memory diet in order to reduce the size of the
srcu_struct structure
- Torture-test updates fixing some bugs in tests and closing some
testing holes
- Torture-test updates for the RCU tasks flavors, most notably ensuring
that building rcutorture and friends does not change the
RCU-tasks-related Kconfig options
- Torture-test scripting updates
- Expedited grace-period updates, most notably providing
milliseconds-scale (not all that) soft real-time response from
synchronize_rcu_expedited().
This is also the first time in almost 30 years of RCU that someone
other than me has pushed for a reduction in the RCU CPU stall-warning
timeout, in this case by more than three orders of magnitude from 21
seconds to 20 milliseconds. This tighter timeout applies only to
expedited grace periods
* tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
rcu: Move expedited grace period (GP) work to RT kthread_worker
rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
srcu: Drop needless initialization of sdp in srcu_gp_start()
srcu: Prevent expedited GPs and blocking readers from consuming CPU
srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
srcu: Automatically determine size-transition strategy at boot
rcutorture: Make torture.sh allow for --kasan
rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU
rcutorture: Make kvm.sh allow more memory for --kasan runs
torture: Save "make allmodconfig" .config file
scftorture: Remove extraneous "scf" from per_version_boot_params
rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC
torture: Enable CSD-lock stall reports for scftorture
torture: Skip vmlinux check for kvm-again.sh runs
scftorture: Adjust for TASKS_RCU Kconfig option being selected
rcuscale: Allow rcuscale without RCU Tasks Rude/Trace
rcuscale: Allow rcuscale without RCU Tasks
refscale: Allow refscale without RCU Tasks Rude/Trace
refscale: Allow refscale without RCU Tasks
rcutorture: Allow specifying per-scenario stat_interval
...
Commit fa2c3254d7 (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.
This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.
If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.
Fixes: fa2c3254d7 (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
Pass a cookie along with BPF_LINK_CREATE requests.
Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie.
The cookie of a bpf_tracing_link is available by calling
bpf_get_attach_cookie when running the BPF program of the attached
link.
The value of a cookie will be set at bpf_tramp_run_ctx by the
trampoline of the link.
Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-4-kuifeng@fb.com
Adding ftrace_lookup_symbols function that resolves array of symbols
with single pass over kallsyms.
The user provides array of string pointers with count and pointer to
allocated array for resolved values.
int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt,
unsigned long *addrs)
It iterates all kallsyms symbols and tries to loop up each in provided
symbols array with bsearch. The symbols array needs to be sorted by
name for this reason.
We also check each symbol to pass ftrace_location, because this API
will be used for fprobe symbols resolving.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220510122616.2652285-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- sched/core is on a pretty old -rc1 base - refresh it to include recent fixes.
- this also allows up to resolve a (trivial) .mailmap conflict
Conflicts:
.mailmap
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All callers of bio_blkcg actually want the CSS, so replace it with an
interface that does return the CSS. This now allows to move
struct blkcg_gq to block/blk-cgroup.h instead of exposing it in a
public header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass the cgroup_subsys_state instead of a the blkg so that blktrace
doesn't need to poke into blk-cgroup internals, and give the name a
blk prefix as the current name is way too generic for a public
interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When the new logic was made to handle deltas of events from interrupts
that interrupted other events, it required 64 bit local atomics.
Unfortunately, 64 bit local atomics are expensive on 32 bit architectures.
Thus, commit 10464b4aa6 ("ring-buffer: Add rb_time_t 64 bit operations
for speeding up 32 bit") created a type of seq lock timer for 32 bits.
It used two 32 bit local atomics, but required 2 bits from them each for
synchronization, making it only 60 bits.
Add a new "msb" field to hold the extra 4 bits that are cut off.
Link: https://lore.kernel.org/all/20220426175338.3807ca4f@gandalf.local.home/
Link: https://lkml.kernel.org/r/20220427170812.53cc7139@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
There's an absolute timestamp event in the ring buffer, but this only
saves 59 bits of the timestamp, as the 5 MSB is used for meta data
(stating it is an absolute time stamp). This was never an issue as all the
clocks currently in use never used those 5 MSB. But now there's a new
clock (TAI) that does.
To handle this case, when reading an absolute timestamp, a previous full
timestamp is passed in, and the 5 MSB of that timestamp is OR'd to the
absolute timestamp (if any of the 5 MSB are set), and then to test for
overflow, if the new result is smaller than the passed in previous
timestamp, then 1 << 59 is added to it.
All the extra processing is done on the reader "slow" path, with the
exception of the "too big delta" check, and the reading of timestamps
for histograms.
Note, libtraceevent will need to be updated to handle this case as well.
But this is not a user space regression, as user space was never able to
handle any timestamps that used more than 59 bits.
Link: https://lore.kernel.org/all/20220426175338.3807ca4f@gandalf.local.home/
Link: https://lkml.kernel.org/r/20220427153339.16c33f75@gandalf.local.home
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
To prepare for support asynchronous tracer_init_tracefs initcall,
avoid calling create_trace_option_files before __update_tracer_options.
Otherwise, create_trace_option_files will show warning because
some tracers in trace_types list are already in tr->topts.
For example, hwlat_tracer call register_tracer in late_initcall,
and global_trace.dir is already created in tracing_init_dentry,
hwlat_tracer will be put into tr->topts.
Then if the __update_tracer_options is executed after hwlat_tracer
registered, create_trace_option_files find that hwlat_tracer is
already in tr->topts.
Link: https://lkml.kernel.org/r/20220426122407.17042-2-mark-pk.tsai@mediatek.com
Link: https://lore.kernel.org/lkml/20220322133339.GA32582@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When setting bootparams="trace_event=initcall:initcall_start tp_printk=1" in the
cmdline, the output_printk() was called, and the spin_lock_irqsave() was called in the
atomic and irq disable interrupt context suitation. On the PREEMPT_RT kernel,
these locks are replaced with sleepable rt-spinlock, so the stack calltrace will
be triggered.
Fix it by raw_spin_lock_irqsave when PREEMPT_RT and "trace_event=initcall:initcall_start
tp_printk=1" enabled.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
Preemption disabled at:
[<ffffffff8992303e>] try_to_wake_up+0x7e/0xba0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt17+ #19 34c5812404187a875f32bee7977f7367f9679ea7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x60/0x8c
dump_stack+0x10/0x12
__might_resched.cold+0x11d/0x155
rt_spin_lock+0x40/0x70
trace_event_buffer_commit+0x2fa/0x4c0
? map_vsyscall+0x93/0x93
trace_event_raw_event_initcall_start+0xbe/0x110
? perf_trace_initcall_finish+0x210/0x210
? probe_sched_wakeup+0x34/0x40
? ttwu_do_wakeup+0xda/0x310
? trace_hardirqs_on+0x35/0x170
? map_vsyscall+0x93/0x93
do_one_initcall+0x217/0x3c0
? trace_event_raw_event_initcall_level+0x170/0x170
? push_cpu_stop+0x400/0x400
? cblist_init_generic+0x241/0x290
kernel_init_freeable+0x1ac/0x347
? _raw_spin_unlock_irq+0x65/0x80
? rest_init+0xf0/0xf0
kernel_init+0x1e/0x150
ret_from_fork+0x22/0x30
</TASK>
Link: https://lkml.kernel.org/r/20220419013910.894370-1-jun.miao@intel.com
Signed-off-by: Jun Miao <jun.miao@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently the tp_printk option has no effect on syscall tracepoint.
When adding the kernel option parameter tp_printk, then:
echo 1 > /sys/kernel/debug/tracing/events/syscalls/enable
When running any application, no trace information is printed on the
terminal.
Now added printk for syscall tracepoints.
Link: https://lkml.kernel.org/r/20220410145025.681144-1-xiehuan09@gmail.com
Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add the description of @n_sort_keys and make @sort_key ->
@sort_keys in tracing_map_sort_entries() kernel-doc comment
to remove warnings found by running scripts/kernel-doc, which
is caused by using 'make W=1'.
kernel/trace/tracing_map.c:1073: warning: Function parameter or member
'sort_keys' not described in 'tracing_map_sort_entries'
kernel/trace/tracing_map.c:1073: warning: Function parameter or member
'n_sort_keys' not described in 'tracing_map_sort_entries'
kernel/trace/tracing_map.c:1073: warning: Excess function parameter
'sort_key' description in 'tracing_map_sort_entries'
Link: https://lkml.kernel.org/r/20220402072015.45864-1-yang.lee@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Code for registering triggers assumes it's possible to register more
than one trigger at a time. In fact, it's unimplemented and there
doesn't seem to be a reason to do that.
Remove the n_registered param from event_trigger_register() and fix up
callers.
Doing so simplifies the logic in event_trigger_register to the point
that it just becomes a wrapper calling event_command.reg().
It also removes the problematic call to event_command.unreg() in case
of failure. A new function, event_trigger_unregister() is also added
for callers to call themselves.
The changes to trace_events_hist.c simply allow compilation; a
separate patch follows which updates the hist triggers to work
correctly with the new changes.
Link: https://lkml.kernel.org/r/6149fec7a139d93e84fa4535672fb5bef88006b0.1644010575.git.zanussi@kernel.org
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Ok so hopefully this is the last of it. 0day picked up a build
failure [0] when SYSCTL=y but DYNAMIC_FTRACE=n. This can be fixed
by just declaring an empty routine for the calls moved just
recently.
[0] https://lkml.kernel.org/r/202204161203.6dSlgKJX-lkp@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Fixes: f8b7d2b4c1 ("ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y")
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
One can enable dyanmic tracing but disable sysctls.
When this is doen we get the compile kernel warning:
CC kernel/trace/ftrace.o
kernel/trace/ftrace.c:3086:13: warning: ‘ftrace_shutdown_sysctl’ defined
but not used [-Wunused-function]
3086 | static void ftrace_shutdown_sysctl(void)
| ^~~~~~~~~~~~~~~~~~~~~~
kernel/trace/ftrace.c:3068:13: warning: ‘ftrace_startup_sysctl’ defined
but not used [-Wunused-function]
3068 | static void ftrace_startup_sysctl(void)
When CONFIG_DYNAMIC_FTRACE=n the ftrace_startup_sysctl() and
routines ftrace_shutdown_sysctl() still compiles, so these
are actually more just used for when SYSCTL=y.
Fix this then by just moving these routines to when sysctls
are enabled.
Fixes: 7cde53da38a3 ("ftrace: move sysctl_ftrace_enabled to ftrace.c")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Currently, any kernel built with CONFIG_PREEMPTION=y also gets
CONFIG_TASKS_RCU=y, which is not helpful to people trying to build
preemptible kernels of minimal size.
Because CONFIG_TASKS_RCU=y is needed only in kernels doing tracing of
one form or another, this commit moves from TASKS_RCU deciding when it
should be enabled to the tracing Kconfig options explicitly selecting it.
This allows building preemptible kernels without TASKS_RCU, if desired.
This commit also updates the SRCU-N and TREE09 rcutorture scenarios
in order to avoid Kconfig errors that would otherwise result from
CONFIG_TASKS_RCU being selected without its CONFIG_RCU_EXPERT dependency
being met.
[ paulmck: Apply BPF_SYSCALL feedback from Andrii Nakryiko. ]
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Commit 7d08c2c911 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros
into functions") switched a bunch of BPF_PROG_RUN macros to inline
routines. This changed the semantic a bit. Due to arguments expansion
of macros, it used to be:
rcu_read_lock();
array = rcu_dereference(cgrp->bpf.effective[atype]);
...
Now, with with inline routines, we have:
array_rcu = rcu_dereference(cgrp->bpf.effective[atype]);
/* array_rcu can be kfree'd here */
rcu_read_lock();
array = rcu_dereference(array_rcu);
I'm assuming in practice rcu subsystem isn't fast enough to trigger
this but let's use rcu API properly.
Also, rename to lower caps to not confuse with macros. Additionally,
drop and expand BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY.
See [1] for more context.
[1] https://lore.kernel.org/bpf/CAKH8qBs60fOinFdxiiQikK_q0EcVxGvNTQoWvHLEUGbgcj1UYg@mail.gmail.com/T/#u
v2
- keep rcu locks inside by passing cgroup_bpf
Fixes: 7d08c2c911 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220414161233.170780-1-sdf@google.com
If CONFIG_SYSCTL and CONFIG_DYNAMIC_FTRACE is n, build warns:
kernel/trace/ftrace.c:7912:13: error: ‘is_permanent_ops_registered’ defined but not used [-Werror=unused-function]
static bool is_permanent_ops_registered(void)
^~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/trace/ftrace.c:89:12: error: ‘last_ftrace_enabled’ defined but not used [-Werror=unused-variable]
static int last_ftrace_enabled;
^~~~~~~~~~~~~~~~~~~
Move is_permanent_ops_registered() to ifdef block and mark last_ftrace_enabled as
__maybe_unused to fix this.
Fixes: 7cde53da38a3 ("ftrace: move sysctl_ftrace_enabled to ftrace.c")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-04-09
We've added 63 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 4852 insertions(+), 619 deletions(-).
The main changes are:
1) Add libbpf support for USDT (User Statically-Defined Tracing) probes.
USDTs are an abstraction built on top of uprobes, critical for tracing
and BPF, and widely used in production applications, from Andrii Nakryiko.
2) While Andrii was adding support for x86{-64}-specific logic of parsing
USDT argument specification, Ilya followed-up with USDT support for s390
architecture, from Ilya Leoshkevich.
3) Support name-based attaching for uprobe BPF programs in libbpf. The format
supported is `u[ret]probe/binary_path:[raw_offset|function[+offset]]`, e.g.
attaching to libc malloc can be done in BPF via SEC("uprobe/libc.so.6:malloc")
now, from Alan Maguire.
4) Various load/store optimizations for the arm64 JIT to shrink the image
size by using arm64 str/ldr immediate instructions. Also enable pointer
authentication to verify return address for JITed code, from Xu Kuohai.
5) BPF verifier fixes for write access checks to helper functions, e.g.
rd-only memory from bpf_*_cpu_ptr() must not be passed to helpers that
write into passed buffers, from Kumar Kartikeya Dwivedi.
6) Fix overly excessive stack map allocation for its base map structure and
buckets which slipped-in from cleanups during the rlimit accounting removal
back then, from Yuntao Wang.
7) Extend the unstable CT lookup helpers for XDP and tc/BPF to report netfilter
connection tracking tuple direction, from Lorenzo Bianconi.
8) Improve bpftool dump to show BPF program/link type names, Milan Landaverde.
9) Minor cleanups all over the place from various others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits)
bpf: Fix excessive memory allocation in stack_map_alloc()
selftests/bpf: Fix return value checks in perf_event_stackmap test
selftests/bpf: Add CO-RE relos into linked_funcs selftests
libbpf: Use weak hidden modifier for USDT BPF-side API functions
libbpf: Don't error out on CO-RE relos for overriden weak subprogs
samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread
libbpf: Allow WEAK and GLOBAL bindings during BTF fixup
libbpf: Use strlcpy() in path resolution fallback logic
libbpf: Add s390-specific USDT arg spec parsing logic
libbpf: Make BPF-side of USDT support work on big-endian machines
libbpf: Minor style improvements in USDT code
libbpf: Fix use #ifdef instead of #if to avoid compiler warning
libbpf: Potential NULL dereference in usdt_manager_attach_usdt()
selftests/bpf: Uprobe tests should verify param/return values
libbpf: Improve string parsing for uprobe auto-attach
libbpf: Improve library identification for uprobe binary path resolution
selftests/bpf: Test for writes to map key from BPF helpers
selftests/bpf: Test passing rdonly mem to global func
bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access
bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access
...
====================
Link: https://lore.kernel.org/r/20220408231741.19116-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - new code bugs:
- mctp: correct mctp_i2c_header_create result
- eth: fungible: fix reference to __udivdi3 on 32b builds
- eth: micrel: remove latencies support lan8814
Previous releases - regressions:
- bpf: resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
- vrf: fix packet sniffing for traffic originating from ip tunnels
- rxrpc: fix a race in rxrpc_exit_net()
- dsa: revert "net: dsa: stop updating master MTU from master.c"
- eth: ice: fix MAC address setting
Previous releases - always broken:
- tls: fix slab-out-of-bounds bug in decrypt_internal
- bpf: support dual-stack sockets in bpf_tcp_check_syncookie
- xdp: fix coalescing for page_pool fragment recycling
- ovs: fix leak of nested actions
- eth: sfc:
- add missing xdp queue reinitialization
- fix using uninitialized xdp tx_queue
- eth: ice:
- clear default forwarding VSI during VSI release
- fix broken IFF_ALLMULTI handling
- synchronize_rcu() when terminating rings
- eth: qede: confirm skb is allocated before using
- eth: aqc111: fix out-of-bounds accesses in RX fixup
- eth: slip: fix NPD bug in sl_tx_timeout()"
* tag 'net-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
drivers: net: slip: fix NPD bug in sl_tx_timeout()
bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
myri10ge: fix an incorrect free for skb in myri10ge_sw_tso
net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
qede: confirm skb is allocated before using
net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n
net: phy: mscc-miim: reject clause 45 register accesses
net: axiemac: use a phandle to reference pcs_phy
dt-bindings: net: add pcs-handle attribute
net: axienet: factor out phy_node in struct axienet_local
net: axienet: setup mdio unconditionally
net: sfc: fix using uninitialized xdp tx_queue
rxrpc: fix a race in rxrpc_exit_net()
net: openvswitch: fix leak of nested actions
net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()
net: openvswitch: don't send internal clone attribute to the userspace.
net: micrel: Fix KS8851 Kconfig
ice: clear cmd_type_offset_bsz for TX rings
ice: xsk: fix VSI state check in ice_xsk_wakeup()
...
This moves ftrace_enabled to trace/ftrace.c.
We move sysctls to places where features actually belong to improve
the readability of the code and reduce the risk of code merge conflicts.
At the same time, the proc-sysctl maintainers do not want to know what
sysctl knobs you wish to add for your owner piece of code, we just care
about the core logic.
Signed-off-by: Wei Xiao <xiaowei66@huawei.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>