We recently added -Wuninitialized, but it's not enough to catch various
silly mistakes or omissions. Let's go all the way to -Wall, just like we
do for user-space code.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Once we enable -Wall for BPF sources, compiler will complain about lots
of unused variables, variables that are set but never read, etc.
Fix all these issues first before enabling -Wall in Makefile.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add __sink(expr) macro that forces compiler to believe that passed in
expression is both read and written. It used a simple embedded asm for
this. This is useful in a lot of tests where we assign value to some variable
to trigger some action, but later don't read variable, causing compiler
to complain (if corresponding compiler warnings are turned on, which
we'll do in the next patch).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With latest llvm17, selftest fexit_bpf2bpf/func_replace_return_code
has the following verification failure:
0: R1=ctx(off=0,imm=0) R10=fp0
; int connect_v4_prog(struct bpf_sock_addr *ctx)
0: (bf) r7 = r1 ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0)
1: (b4) w6 = 0 ; R6_w=0
; memset(&tuple.ipv4.saddr, 0, sizeof(tuple.ipv4.saddr));
...
; return do_bind(ctx) ? 1 : 0;
179: (bf) r1 = r7 ; R1=ctx(off=0,imm=0) R7=ctx(off=0,imm=0)
180: (85) call pc+147
Func#3 is global and valid. Skipping.
181: R0_w=scalar()
181: (bc) w6 = w0 ; R0_w=scalar() R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
182: (05) goto pc-129
; }
54: (bc) w0 = w6 ; R0_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
55: (95) exit
At program exit the register R0 has value (0x0; 0xffffffff) should have been in (0x0; 0x1)
processed 281 insns (limit 1000000) max_states_per_insn 1 total_states 26 peak_states 26 mark_read 13
-- END PROG LOAD LOG --
libbpf: prog 'connect_v4_prog': failed to load: -22
The corresponding source code:
__attribute__ ((noinline))
int do_bind(struct bpf_sock_addr *ctx)
{
struct sockaddr_in sa = {};
sa.sin_family = AF_INET;
sa.sin_port = bpf_htons(0);
sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);
if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
return 0;
return 1;
}
...
SEC("cgroup/connect4")
int connect_v4_prog(struct bpf_sock_addr *ctx)
{
...
return do_bind(ctx) ? 1 : 0;
}
Insn 180 is a call to 'do_bind'. The call's return value is also the return value
for the program. Since do_bind() returns 0/1, so it is legitimate for compiler to
optimize 'return do_bind(ctx) ? 1 : 0' to 'return do_bind(ctx)'. However, such
optimization breaks verifier as the return value of 'do_bind()' is marked as any
scalar which violates the requirement of prog return value 0/1.
There are two ways to fix this problem, (1) changing 'return 1' in do_bind() to
e.g. 'return 10' so the compiler has to do 'do_bind(ctx) ? 1 :0', or (2)
suggested by Andrii, marking do_bind() with __weak attribute so the compiler
cannot make any assumption on do_bind() return value.
This patch adopted adding __weak approach which is simpler and more resistant
to potential compiler optimizations.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230310012410.2920570-1-yhs@fb.com
There is a report that fib_lookup test is flaky when running in parallel.
A symptom of slowness or delay. An example:
Testing IPv6 stale neigh
set_lookup_params:PASS:inet_pton(IPV6_IFACE_ADDR) 0 nsec
test_fib_lookup:PASS:bpf_prog_test_run_opts 0 nsec
test_fib_lookup:FAIL:fib_lookup_ret unexpected fib_lookup_ret: actual 0 != expected 7
test_fib_lookup:FAIL:dmac not match unexpected dmac not match: actual 1 != expected 0
dmac expected 11:11:11:11:11:11 actual 00:00:00:00:00:00
[ Note that the "fib_lookup_ret unexpected fib_lookup_ret actual 0 ..."
is reversed in terms of expected and actual value. Fixing in this
patch also. ]
One possibility is the testing stale neigh entry was marked dead by the
gc (in neigh_periodic_work). The default gc_stale_time sysctl is 60s.
This patch increases it to 15 mins.
It also:
- fixes the reversed arg (actual vs expected) in one of the
ASSERT_EQ test
- removes the nodad command arg when adding v4 neigh entry which
currently has a warning.
Fixes: 168de02335 ("selftests/bpf: Add bpf_fib_lookup test")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309060244.3242491-1-martin.lau@linux.dev
Implement a trivial iterator returning same specified integer value
N times as part of bpf_testmod kernel module. Add selftests to validate
everything works end to end.
We also reuse these tests as "verification-only" tests to validate that
kernel prints the state of custom kernel module-defined iterator correctly:
fp-16=iter_testmod_seq(ref_id=1,state=drained,depth=0)
"testmod_seq" part is an iterator type, and is coming from module's BTF
data dynamically at runtime.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add number iterator (bpf_iter_num_{new,next,destroy}()) tests,
validating the correct handling of various corner and common cases
*at runtime*.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add various tests for open-coded iterators. Some of them excercise
various possible coding patterns in C, some go down to low-level
assembly for more control over various conditions, especially invalid
ones.
We also make use of bpf_for(), bpf_for_each(), bpf_repeat() macros in
some of these tests.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add bpf_for_each(), bpf_for(), and bpf_repeat() macros that make writing
open-coded iterator-based loops much more convenient and natural. These
macros utilize cleanup attribute to ensure proper destruction of the
iterator and thanks to that manage to provide the ergonomics that is
very close to C language's for() construct. Typical loop would look like:
int i;
int arr[N];
bpf_for(i, 0, N) {
/* verifier will know that i >= 0 && i < N, so could be used to
* directly access array elements with no extra checks
*/
arr[i] = i;
}
bpf_repeat() is very similar, but it doesn't expose iteration number and
is meant as a simple "repeat action N times" loop:
bpf_repeat(N) { /* whatever, N times */ }
Note that `break` and `continue` statements inside the {} block work as
expected.
bpf_for_each() is a generalization over any kind of BPF open-coded
iterator allowing to use for-each-like approach instead of calling
low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:
struct cgroup *cg;
bpf_for_each(cgroup, cg, some, input, args) {
/* do something with each cg */
}
would call (not-yet-implemented) bpf_iter_cgroup_{new,next,destroy}()
functions to form a loop over cgroups, where `some, input, args` are
passed verbatim into constructor as
bpf_iter_cgroup_new(&it, some, input, args).
As a first demonstration, add pyperf variant based on the bpf_for() loop.
Also clean up a few tests that either included bpf_misc.h header
unnecessarily from the user-space, which is unsupported, or included it
before any common types are defined (and thus leading to unnecessary
compilation warnings, potentially).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement the first open-coded iterator type over a range of integers.
It's public API consists of:
- bpf_iter_num_new() constructor, which accepts [start, end) range
(that is, start is inclusive, end is exclusive).
- bpf_iter_num_next() which will keep returning read-only pointer to int
until the range is exhausted, at which point NULL will be returned.
If bpf_iter_num_next() is kept calling after this, NULL will be
persistently returned.
- bpf_iter_num_destroy() destructor, which needs to be called at some
point to clean up iterator state. BPF verifier enforces that iterator
destructor is called at some point before BPF program exits.
Note that `start = end = X` is a valid combination to setup an empty
iterator. bpf_iter_num_new() will return 0 (success) for any such
combination.
If bpf_iter_num_new() detects invalid combination of input arguments, it
returns error, resets iterator state to, effectively, empty iterator, so
any subsequent call to bpf_iter_num_next() will keep returning NULL.
BPF verifier has no knowledge that returned integers are in the
[start, end) value range, as both `start` and `end` are not statically
known and enforced: they are runtime values.
While the implementation is pretty trivial, some care needs to be taken
to avoid overflows and underflows. Subsequent selftests will validate
correctness of [start, end) semantics, especially around extremes
(INT_MIN and INT_MAX).
Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can
be specified.
bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded
BPF loops and bpf_loop() helper and is the basis for implementing
ergonomic BPF loops with no statically known or verified bounds.
Subsequent patches implement bpf_for() macro, demonstrating how this can
be wrapped into something that works and feels like a normal for() loop
in C language.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 62622dab0a ("ima: return IMA digest value only when IMA_COLLECTED
flag is set") caused bpf_ima_inode_hash() to refuse to give non-fresh
digests. IMA test #3 assumed the old behavior, that bpf_ima_inode_hash()
still returned also non-fresh digests.
Correct the test by accepting both cases. If the samples returned are 1,
assume that the commit above is applied and that the returned digest is
fresh. If the samples returned are 2, assume that the commit above is not
applied, and check both the non-fresh and fresh digest.
Fixes: 62622dab0a ("ima: return IMA digest value only when IMA_COLLECTED flag is set")
Reported-by: David Vernet <void@manifault.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/bpf/20230308103713.1681200-1-roberto.sassu@huaweicloud.com
Parsing of USDT arguments is architecture-specific; on arm it is
relatively easy since registers used are r[0-10], fp, ip, sp, lr,
pc. Format is slightly different compared to aarch64; forms are
- "size @ [ reg, #offset ]" for dereferences, for example
"-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]"
- "size @ reg" for register values; for example
"-4@r0"
- "size @ #value" for raw values; for example
"-8@#1"
Add support for parsing USDT arguments for ARM architecture.
To test the above changes QEMU's virt[1] board with cortex-a15
CPU was used. libbpf-bootstrap's usdt example[2] was modified to attach
to a test program with DTRACE_PROBE1/2/3/4... probes to test different
combinations.
[1] https://www.qemu.org/docs/master/system/arm/virt.html
[2] https://github.com/libbpf/libbpf-bootstrap/blob/master/examples/c/usdt.bpf.c
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-3-puranjay12@gmail.com
The parse_usdt_arg() function is defined differently for each
architecture but the last part of the function is repeated
verbatim for each architecture.
Refactor parse_usdt_arg() to fill the arg_sz and then do the repeated
post-processing in parse_usdt_spec().
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-2-puranjay12@gmail.com
Coverity reported a potential underflow of the offset variable used in
the find_cd() function. Switch to using a signed 64 bit integer for the
representation of offset to make sure we can never underflow.
Fixes: 1eebcb6063 ("libbpf: Implement basic zip archive parsing support")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307215504.837321-1-deso@posteo.net
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZsBwAKCRDbK58LschI
g3W1AQCQnO6pqqX5Q2aYDAZPlZRtV2TRLjuqrQE0dHW/XLAbBgD/bgsAmiKhPSCG
2mTt6izpTQVlZB0e8KcDIvbYd9CE3Qc=
=EjJQ
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-03-06
We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).
The main changes are:
1) Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses, from Joanne Koong.
2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities,
from Andrii Nakryiko.
3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc,
from Alexei Starovoitov.
4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps, from Kumar Kartikeya Dwivedi.
5) Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them,
from Eduard Zingerman.
6) Make uprobe attachment Android APK aware by supporting attachment
to functions inside ELF objects contained in APKs via function names,
from Daniel Müller.
7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
to start the timer with absolute expiration value instead of relative
one, from Tero Kristo.
8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
from Tejun Heo.
9) Extend libbpf to support users manually attaching kprobes/uprobes
in the legacy/perf/link mode, from Menglong Dong.
10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
from Jiaxun Yang.
11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
from Hengqi Chen.
12) Extend BPF instruction set doc with describing the encoding of BPF
instructions in terms of how bytes are stored under big/little endian,
from Jose E. Marchesi.
13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.
14) Fix bpf_xdp_query() backwards compatibility on old kernels,
from Yonghong Song.
15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
from Florent Revest.
16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
from Hou Tao.
17) Fix BPF verifier's check_subprogs to not unnecessarily mark
a subprogram with has_tail_call, from Ilya Leoshkevich.
18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
selftests/bpf: Split test_attach_probe into multi subtests
libbpf: Add support to set kprobe/uprobe attach mode
tools/resolve_btfids: Add /libsubcmd to .gitignore
bpf: add support for fixed-size memory pointer returns for kfuncs
bpf: generalize dynptr_get_spi to be usable for iters
bpf: mark PTR_TO_MEM as non-null register type
bpf: move kfunc_call_arg_meta higher in the file
bpf: ensure that r0 is marked scratched after any function call
bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
bpf: clean up visit_insn()'s instruction processing
selftests/bpf: adjust log_fixup's buffer size for proper truncation
bpf: honor env->test_state_freq flag in is_state_visited()
selftests/bpf: enhance align selftest's expected log matching
bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
bpf: improve stack slot state printing
selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
bpf: allow ctx writes using BPF_ST_MEM instruction
bpf: Use separate RCU callbacks for freeing selem
...
====================
Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to adapt to the older kernel, now we split the "attach_probe"
testing into multi subtests:
manual // manual attach tests for kprobe/uprobe
auto // auto-attach tests for kprobe and uprobe
kprobe-sleepable // kprobe sleepable test
uprobe-lib // uprobe tests for library function by name
uprobe-sleepable // uprobe sleepable test
uprobe-ref_ctr // uprobe ref_ctr test
As sleepable kprobe needs to set BPF_F_SLEEPABLE flag before loading,
we need to move it to a stand alone skel file, in case of it is not
supported by kernel and make the whole loading fail.
Therefore, we can only enable part of the subtests for older kernel.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230306064833.7932-3-imagedong@tencent.com
By default, libbpf will attach the kprobe/uprobe BPF program in the
latest mode that supported by kernel. In this patch, we add the support
to let users manually attach kprobe/uprobe in legacy or perf mode.
There are 3 mode that supported by the kernel to attach kprobe/uprobe:
LEGACY: create perf event in legacy way and don't use bpf_link
PERF: create perf event with perf_event_open() and don't use bpf_link
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: create perf event with perf_event_open() and use bpf_link
Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com/
Link: https://lore.kernel.org/bpf/20230306064833.7932-2-imagedong@tencent.com
Users now can manually choose the mode with
bpf_program__attach_uprobe_opts()/bpf_program__attach_kprobe_opts().
Add libsubcmd to .gitignore, otherwise after compiling the kernel it
would result in the following:
# bpf-next...bpf-next/master
?? tools/bpf/resolve_btfids/libsubcmd/
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_F13D670D5D7AA9C4BD868D3220921AAC090A@qq.com
Adjust log_fixup's expected buffer length to fix the test. It's pretty
finicky in its length expectation, but it doesn't break often. So just
adjust the length to work on current kernel and with follow up iterator
changes as well.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allow to search for expected register state in all the verifier log
output that's related to specified instruction number.
See added comment for an example of possible situation that is happening
due to a simple enhancement done in the next patch, which fixes handling
of env->test_state_freq flag in state checkpointing logic.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Function verifier.c:convert_ctx_access() applies some rewrites to BPF
instructions that read or write BPF program context. This commit adds
machinery to allow test cases that inspect BPF program after these
rewrites are applied.
An example of a test case:
{
// Shorthand for field offset and size specification
N(CGROUP_SOCKOPT, struct bpf_sockopt, retval),
// Pattern generated for field read
.read = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::current_task);"
"$dst = *(u64 *)($dst + task_struct::bpf_ctx);"
"$dst = *(u32 *)($dst + bpf_cg_run_ctx::retval);",
// Pattern generated for field write
.write = "*(u64 *)($ctx + bpf_sockopt_kern::tmp_reg) = r9;"
"r9 = *(u64 *)($ctx + bpf_sockopt_kern::current_task);"
"r9 = *(u64 *)(r9 + task_struct::bpf_ctx);"
"*(u32 *)(r9 + bpf_cg_run_ctx::retval) = $src;"
"r9 = *(u64 *)($ctx + bpf_sockopt_kern::tmp_reg);" ,
},
For each test case, up to three programs are created:
- One that uses BPF_LDX_MEM to read the context field.
- One that uses BPF_STX_MEM to write to the context field.
- One that uses BPF_ST_MEM to write to the context field.
The disassembly of each program is compared with the pattern specified
in the test case.
Kernel code for disassembly is reused (as is in the bpftool).
To keep Makefile changes to the minimum, symbolic links to
`kernel/bpf/disasm.c` and `kernel/bpf/disasm.h ` are added.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Check that verifier tracks pointer types for BPF_ST_MEM instructions
and reports error if pointer types do not match for different
execution branches.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Lift verifier restriction to use BPF_ST_MEM instructions to write to
context data structures. This requires the following changes:
- verifier.c:do_check() for BPF_ST updated to:
- no longer forbid writes to registers of type PTR_TO_CTX;
- track dst_reg type in the env->insn_aux_data[...].ptr_type field
(same way it is done for BPF_STX and BPF_LDX instructions).
- verifier.c:convert_ctx_access() and various callbacks invoked by
it are updated to handled BPF_ST instruction alongside BPF_STX.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack
of such key mechanism makes it impossible for sleepable bpf programs to use RCU
pointers.
Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't
support btf_type_tag yet) and allowlist certain field dereferences in important
data structures like tast_struct, cgroup, socket that are used by sleepable
programs either as RCU pointer or full trusted pointer (which is valid outside
of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such
tagging. They will be removed once GCC supports btf_type_tag.
With that refactor check_ptr_to_btf_access(). Make it strict in enforcing
PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without
modifier flags. There is a chance that this strict enforcement might break
existing programs (especially on GCC compiled kernels), but this cleanup has to
start sooner than later. Note PTR_TO_CTX access still yields old deprecated
PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the
kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will
remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0.
Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
Adjust cgroup kfunc test to dereference RCU protected cgroup pointer
as PTR_TRUSTED and pass into KF_TRUSTED_ARGS kfunc.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-6-alexei.starovoitov@gmail.com
The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
Derefrence of other kptr-s returns PTR_UNTRUSTED.
For example:
struct map_value {
struct cgroup __kptr *cgrp;
};
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
{
struct cgroup *cg, *cg2;
cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
bpf_kptr_xchg(&v->cgrp, cg);
cg2 = v->cgrp; // This is new feature introduced by this patch.
// cg2 is PTR_MAYBE_NULL | MEM_RCU.
// When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero
if (cg2)
bpf_cgroup_ancestor(cg2, level); // safe to do.
}
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
__kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
The concept felt useful, but didn't get much traction,
since bpf_rdonly_cast() was added soon after and bpf programs received
a simpler way to access PTR_UNTRUSTED kernel pointers
without going through restrictive __kptr usage.
Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
its intended usage.
The main goal of __kptr_untrusted was to read/write such pointers
directly while bpf_kptr_xchg was a mechanism to access refcnted
kernel pointers. The next patch will allow RCU protected __kptr access
with direct read. At that point __kptr_untrusted will be deprecated.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com
Add test for the absolute BPF timer under the existing timer tests. This
will run the timer two times with 1us expiration time, and then re-arm
the timer at ~35s in the future. At the end, it is verified that the
absolute timer expired exactly two times.
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20230302114614.2985072-3-tero.kristo@linux.intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start()
to start an absolute value timer instead of the default relative value.
This makes the timer expire at an exact point in time, instead of a time
with latencies induced by both the BPF and timer subsystems.
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Per C99 standard [0], Section 6.7.8, Paragraph 10:
If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate.
And in the same document, in appendix "J.2 Undefined behavior":
The behavior is undefined in the following circumstances:
[...]
The value of an object with automatic storage duration is used while
it is indeterminate (6.2.4, 6.7.8, 6.8).
This means that use of an uninitialized stack variable is undefined
behavior, and therefore that clang can choose to do a variety of scary
things, such as not generating bytecode for "bunch of useful code" in
the below example:
void some_func()
{
int i;
if (!i)
return;
// bunch of useful code
}
To add insult to injury, if some_func above is a helper function for
some BPF program, clang can choose to not generate an "exit" insn,
causing verifier to fail with "last insn is not an exit or jmp". Going
from that verification failure to the root cause of uninitialized use
is certain to be frustrating.
This patch adds -Wuninitialized to the cflags for selftest BPF progs and
fixes up existing instances of uninitialized use.
[0]: https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: David Vernet <void@manifault.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230303005500.1614874-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This change adds support for attaching uprobes to shared objects located
in APKs, which is relevant for Android systems where various libraries
may reside in APKs. To make that happen, we extend the syntax for the
"binary path" argument to attach to with that supported by various
Android tools:
<archive>!/<binary-in-archive>
For example:
/system/app/test-app/test-app.apk!/lib/arm64-v8a/libc++_shared.so
APKs need to be specified via full path, i.e., we do not attempt to
resolve mere file names by searching system directories.
We cannot currently test this functionality end-to-end in an automated
fashion, because it relies on an Android system being present, but there
is no support for that in CI. I have tested the functionality manually,
by creating a libbpf program containing a uretprobe, attaching it to a
function inside a shared object inside an APK, and verifying the sanity
of the returned values.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-4-deso@posteo.net
This change splits the elf_find_func_offset() function in two:
elf_find_func_offset(), which now accepts an already opened Elf object
instead of a path to a file that is to be opened, as well as
elf_find_func_offset_from_file(), which opens a binary based on a
path and then invokes elf_find_func_offset() on the Elf object. Having
this split in responsibilities will allow us to call
elf_find_func_offset() from other code paths on Elf objects that did not
necessarily come from a file on disk.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-3-deso@posteo.net
This change implements support for reading zip archives, including
opening an archive, finding an entry based on its path and name in it,
and closing it.
The code was copied from https://github.com/iovisor/bcc/pull/4440, which
implements similar functionality for bcc. The author confirmed that he
is fine with this usage and the corresponding relicensing. I adjusted it
to adhere to libbpf coding standards.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Michał Gregorczyk <michalgr@meta.com>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-2-deso@posteo.net
Extend __flag attribute by allowing to specify one of the following:
* BPF_F_STRICT_ALIGNMENT
* BPF_F_ANY_ALIGNMENT
* BPF_F_TEST_RND_HI32
* BPF_F_TEST_STATE_FREQ
* BPF_F_SLEEPABLE
* BPF_F_XDP_HAS_FRAGS
* Some numeric value
Extend __msg attribute by allowing to specify multiple exepcted messages.
All messages are expected to be present in the verifier log in the
order of application.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301175417.3146070-2-eddyz87@gmail.com
[ Eduard: added commit message, formatting, comments ]
If target is bpf, there is no __loongarch__ definition, __BITS_PER_LONG
defaults to 32, __NR_nanosleep is not defined:
#if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32
#define __NR_nanosleep 101
__SC_3264(__NR_nanosleep, sys_nanosleep_time32, sys_nanosleep)
#endif
Work around this problem, by explicitly setting __BITS_PER_LONG to
__loongarch_grlen which is defined by compiler as 64 for LA64.
This is similar with commit 36e70b9b06 ("selftests, bpf: Fix broken
riscv build").
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1677585781-21628-1-git-send-email-yangtiezhu@loongson.cn
Firstly, ensure programs successfully load when using all of the
supported maps. Then, extend existing tests to test more cases at
runtime. We are currently testing both the synchronous freeing of items
and asynchronous destruction when map is freed, but the code needs to be
adjusted a bit to be able to also accomodate support for percpu maps.
We now do a delete on the item (and update for array maps which has a
similar effect for kptrs) to perform a synchronous free of the kptr, and
test destruction both for the synchronous and asynchronous deletion.
Next time the program runs, it should observe the refcount as 1 since
all existing references should have been released by then. By running
the program after both possible paths freeing kptrs, we establish that
they correctly release resources. Next, we augment the existing test to
also test the same code path shared by all local storage maps using a
task local storage map.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230225154010.391965-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test skb and xdp dynptr functionality in the following ways:
1) progs/test_cls_redirect_dynptr.c
* Rewrite "progs/test_cls_redirect.c" test to use dynptrs to parse
skb data
* This is a great example of how dynptrs can be used to simplify a
lot of the parsing logic for non-statically known values.
When measuring the user + system time between the original version
vs. using dynptrs, and averaging the time for 10 runs (using
"time ./test_progs -t cls_redirect"):
original version: 0.092 sec
with dynptrs: 0.078 sec
2) progs/test_xdp_dynptr.c
* Rewrite "progs/test_xdp.c" test to use dynptrs to parse xdp data
When measuring the user + system time between the original version
vs. using dynptrs, and averaging the time for 10 runs (using
"time ./test_progs -t xdp_attach"):
original version: 0.118 sec
with dynptrs: 0.094 sec
3) progs/test_l4lb_noinline_dynptr.c
* Rewrite "progs/test_l4lb_noinline.c" test to use dynptrs to parse
skb data
When measuring the user + system time between the original version
vs. using dynptrs, and averaging the time for 10 runs (using
"time ./test_progs -t l4lb_all"):
original version: 0.062 sec
with dynptrs: 0.081 sec
For number of processed verifier instructions:
original version: 6268 insns
with dynptrs: 2588 insns
4) progs/test_parse_tcp_hdr_opt_dynptr.c
* Add sample code for parsing tcp hdr opt lookup using dynptrs.
This logic is lifted from a real-world use case of packet parsing
in katran [0], a layer 4 load balancer. The original version
"progs/test_parse_tcp_hdr_opt.c" (not using dynptrs) is included
here as well, for comparison.
When measuring the user + system time between the original version
vs. using dynptrs, and averaging the time for 10 runs (using
"time ./test_progs -t parse_tcp_hdr_opt"):
original version: 0.031 sec
with dynptrs: 0.045 sec
5) progs/dynptr_success.c
* Add test case "test_skb_readonly" for testing attempts at writes
on a prog type with read-only skb ctx.
* Add "test_dynptr_skb_data" for testing that bpf_dynptr_data isn't
supported for skb progs.
6) progs/dynptr_fail.c
* Add test cases "skb_invalid_data_slice{1,2,3,4}" and
"xdp_invalid_data_slice{1,2}" for testing that helpers that modify the
underlying packet buffer automatically invalidate the associated
data slice.
* Add test cases "skb_invalid_ctx" and "xdp_invalid_ctx" for testing
that prog types that do not support bpf_dynptr_from_skb/xdp don't
have access to the API.
* Add test case "dynptr_slice_var_len{1,2}" for testing that
variable-sized len can't be passed in to bpf_dynptr_slice
* Add test case "skb_invalid_slice_write" for testing that writes to a
read-only data slice are rejected by the verifier.
* Add test case "data_slice_out_of_bounds_skb" for testing that
writes to an area outside the slice are rejected.
* Add test case "invalid_slice_rdwr_rdonly" for testing that prog
types that don't allow writes to packet data don't accept any calls
to bpf_dynptr_slice_rdwr.
[0] https://github.com/facebookincubator/katran/blob/main/katran/lib/bpf/pckt_parsing.h
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230301154953.641654-11-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
The user must pass in a buffer to store the contents of the data slice
if a direct pointer to the data cannot be obtained.
For skb and xdp type dynptrs, these two APIs are the only way to obtain
a data slice. However, for other types of dynptrs, there is no
difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.
For skb type dynptrs, the data is copied into the user provided buffer
if any of the data is not in the linear portion of the skb. For xdp type
dynptrs, the data is copied into the user provided buffer if the data is
between xdp frags.
If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then
the skb will be uncloned (see bpf_unclone_prologue()).
Please note that any bpf_dynptr_write() automatically invalidates any prior
data slices of the skb dynptr. This is because the skb may be cloned or
may need to pull its paged buffer into the head. As such, any
bpf_dynptr_write() will automatically have its prior data slices
invalidated, even if the write is to data in the skb head of an uncloned
skb. Please note as well that any other helper calls that change the
underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
slices of the skb dynptr as well, for the same reasons.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add xdp dynptrs, which are dynptrs whose underlying pointer points
to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of xdp->data and xdp->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).
For reads and writes on the dynptr, this includes reading/writing
from/to and across fragments. Data slices through the bpf_dynptr_data
API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() should be used.
For examples of how xdp dynptrs can be used, please see the attached
selftests.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add skb dynptrs, which are dynptrs whose underlying pointer points
to a skb. The dynptr acts on skb data. skb dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of skb->data and skb->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).
For bpf prog types that don't support writes on skb data, the dynptr is
read-only (bpf_dynptr_write() will return an error)
For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
interfaces, reading and writing from/to data in the head as well as from/to
non-linear paged buffers is supported. Data slices through the
bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used.
For examples of how skb dynptrs can be used, please see the attached
selftests.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 04d58f1b26a4("libbpf: add API to get XDP/XSK supported features")
added feature_flags to struct bpf_xdp_query_opts. If a user uses
bpf_xdp_query_opts with feature_flags member, the bpf_xdp_query()
will check whether 'netdev' family exists or not in the kernel.
If it does not exist, the bpf_xdp_query() will return -ENOENT.
But 'netdev' family does not exist in old kernels as it is
introduced in the same patch set as Commit 04d58f1b26.
So old kernel with newer libbpf won't work properly with
bpf_xdp_query() api call.
To fix this issue, if the return value of
libbpf_netlink_resolve_genl_family_id() is -ENOENT, bpf_xdp_query()
will just return 0, skipping the rest of xdp feature query.
This preserves backward compatibility.
Fixes: 04d58f1b26 ("libbpf: add API to get XDP/XSK supported features")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230227224943.1153459-1-yhs@fb.com