bpf-next-for-netdev

-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZi9+AAAKCRDbK58LschI
 g0nEAP487m7L0nLVriC2oIOWsi29tklW3etm6DO7gmGRGIHgrgEAnMyV1xBj3bGj
 v6jJwDcybCym1hLx+1x1JCZ4eoAFswE=
 =xbna
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-04-29

We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).

The main changes are:

1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
   memory addresses and implement support for it in the x86 BPF JIT.
   This allows inlining per-CPU array and hashmap lookups and the
   bpf_get_smp_processor_id() helper, from Andrii Nakryiko.
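
   As a rough illustration (not taken from the series itself; map and
   section names are made up), this is the kind of program that benefits,
   since both the per-CPU array lookup and the helper call below are now
   candidates for inlining by the JIT:

      #include <linux/bpf.h>
      #include <bpf/bpf_helpers.h>

      struct {
              __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
              __uint(max_entries, 1);
              __type(key, __u32);
              __type(value, __u64);
      } counters SEC(".maps");

      SEC("tp/syscalls/sys_enter_write")
      int count_writes(void *ctx)
      {
              __u32 key = 0;
              __u64 *val;

              /* per-CPU array lookup, now inlinable by the JIT */
              val = bpf_map_lookup_elem(&counters, &key);
              if (val)
                      *val += 1;

              /* helper call that can be inlined into a few native insns */
              bpf_printk("on cpu %u", bpf_get_smp_processor_id());
              return 0;
      }

      char _license[] SEC("license") = "GPL";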

2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song.

3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
   atomics in bpf_arena which can be JITed as a single x86 instruction,
   from Alexei Starovoitov.

4) Add support for passing mark with bpf_fib_lookup helper,
   from Anton Protopopov.
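
   A minimal sketch of the new flag and input field (usual BPF program
   includes and the required destination address setup are omitted for
   brevity):

      SEC("xdp")
      int fib_mark_lookup(struct xdp_md *ctx)
      {
              struct bpf_fib_lookup params = {};
              long rc;

              params.family  = 2;    /* AF_INET */
              params.ifindex = ctx->ingress_ifindex;
              params.mark    = 42;   /* new input, consulted by policy routing */

              rc = bpf_fib_lookup(ctx, &params, sizeof(params),
                                  BPF_FIB_LOOKUP_MARK);
              return rc == BPF_FIB_LKUP_RET_SUCCESS ? XDP_TX : XDP_PASS;
      }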

5) Add a new bpf_wq API for deferring events and refactor sleepable
   bpf_timer code to keep common code where possible,
   from Benjamin Tissoires.
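
   Roughly, the intended usage looks like the sketch below; the kfunc
   names come from this series, but their exact signatures and the
   callback prototype are assumptions here and should be checked against
   kernel/bpf/helpers.c and the wq selftests. The kfunc extern
   declarations and usual includes are omitted.

      struct elem {
              struct bpf_wq work;
      };

      struct {
              __uint(type, BPF_MAP_TYPE_ARRAY);
              __uint(max_entries, 1);
              __type(key, int);
              __type(value, struct elem);
      } wq_map SEC(".maps");

      /* assumed callback prototype; runs later from workqueue context */
      static int wq_cb(void *map, int *key, void *value)
      {
              return 0;
      }

      SEC("tc")
      int defer_work(struct __sk_buff *skb)
      {
              int key = 0;
              struct elem *e = bpf_map_lookup_elem(&wq_map, &key);

              if (!e)
                      return 0;
              if (bpf_wq_init(&e->work, &wq_map, 0))
                      return 0;
              if (bpf_wq_set_callback(&e->work, wq_cb, 0))
                      return 0;
              bpf_wq_start(&e->work, 0);
              return 0;
      }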

6) Fix BPF_PROG_TEST_RUN infrastructure for bpf_dummy_struct_ops programs
   to check when NULL is passed for non-nullable parameters,
   from Eduard Zingerman.

7) Harden the BPF verifier's and/or/xor value tracking,
   from Harishankar Vishwanathan.

8) Introduce crypto kfuncs to let BPF programs use the kernel crypto
   subsystem, from Vadim Fedorenko.
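
   As a sketch only (the kfunc prototypes below are assumptions based on
   this series and should be verified against kernel/bpf/crypto.c; the
   opaque types are expected to come from vmlinux.h, and key material
   handling is elided), setting up a cipher context from a sleepable
   syscall program could look like:

      struct bpf_crypto_ctx *
      bpf_crypto_ctx_create(const struct bpf_crypto_params *params,
                            __u32 params__sz, int *err) __ksym;
      void bpf_crypto_ctx_release(struct bpf_crypto_ctx *ctx) __ksym;

      SEC("syscall")
      int setup_cipher(void *ctx)
      {
              struct bpf_crypto_params params = {
                      .type    = "skcipher",
                      .algo    = "ecb(aes)",
                      .key_len = 16,
              };
              struct bpf_crypto_ctx *cctx;
              int err = 0;

              /* params.key would be filled with real key material here */
              cctx = bpf_crypto_ctx_create(&params, sizeof(params), &err);
              if (!cctx)
                      return err;
              /* a real program would stash cctx in a map kptr and use it
               * later with bpf_crypto_encrypt()/bpf_crypto_decrypt()
               */
              bpf_crypto_ctx_release(cctx);
              return 0;
      }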

9) Various improvements to the BPF instruction set standardization doc,
   from Dave Thaler.

10) Extend libbpf APIs to partially consume items from the BPF ringbuffer,
    from Andrea Righi.
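
    For example, in user space (error handling trimmed; the ring buffer
    map fd comes from the surrounding application):

      #include <errno.h>
      #include <stdbool.h>
      #include <bpf/libbpf.h>

      static int handle_event(void *ctx, void *data, size_t len)
      {
              /* process a single record */
              return 0;
      }

      static int drain_bounded(int ringbuf_map_fd, volatile bool *stop)
      {
              struct ring_buffer *rb;
              int err = 0;

              rb = ring_buffer__new(ringbuf_map_fd, handle_event, NULL, NULL);
              if (!rb)
                      return -errno;

              while (!*stop) {
                      /* consume at most 32 records per iteration instead
                       * of draining everything that is pending
                       */
                      err = ring_buffer__consume_n(rb, 32);
                      if (err < 0 && err != -EINTR)
                              break;
              }
              ring_buffer__free(rb);
              return err < 0 ? err : 0;
      }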

11) Bigger batch of BPF selftests refactoring to use common network helpers
    and to drop duplicate code, from Geliang Tang.

12) Support bpf_tail_call_static() helper for BPF programs with GCC 13,
    from Jose E. Marchesi.
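
    The helper requires a compile-time constant slot so the tail call can
    be emitted as a direct jump; a minimal, purely illustrative user
    (usual includes omitted) looks like:

      struct {
              __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
              __uint(max_entries, 2);
              __uint(key_size, sizeof(__u32));
              __uint(value_size, sizeof(__u32));
      } jmp_table SEC(".maps");

      SEC("tc")
      int entry(struct __sk_buff *skb)
      {
              /* slot index must be a constant known at compile time */
              bpf_tail_call_static(skb, &jmp_table, 0);
              return 0;       /* only reached if slot 0 is empty */
      }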

13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled,
    from Kumar Kartikeya Dwivedi.
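
    A sketch of the pairing the verifier enforces (the __ksym extern
    declarations for the kfuncs are assumed here):

      void bpf_preempt_disable(void) __ksym;
      void bpf_preempt_enable(void) __ksym;

      SEC("tp/syscalls/sys_enter_getpid")
      int update_scratch(void *ctx)
      {
              bpf_preempt_disable();
              /* this section cannot be preempted or migrated, so per-CPU
               * scratch data can be updated consistently here
               */
              bpf_preempt_enable();
              return 0;
      }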

14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs,
    from David Vernet.
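
    For instance, using the cpumask kfuncs that this series also registers
    for the syscall program type (extern declarations assumed, struct
    bpf_cpumask from vmlinux.h):

      struct bpf_cpumask *bpf_cpumask_create(void) __ksym;
      void bpf_cpumask_release(struct bpf_cpumask *cpumask) __ksym;

      SEC("syscall")
      int from_syscall_prog(void *ctx)
      {
              struct bpf_cpumask *mask = bpf_cpumask_create();

              if (!mask)
                      return 1;
              bpf_cpumask_release(mask);
              return 0;
      }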

15) Extend the BPF verifier to allow different input maps for a given
    bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.
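
    Illustration only (maps and names invented, usual includes omitted):
    the same call site may now be passed either map, e.g.:

      struct {
              __uint(type, BPF_MAP_TYPE_ARRAY);
              __uint(max_entries, 8);
              __type(key, __u32);
              __type(value, __u64);
      } map_a SEC(".maps");

      struct {
              __uint(type, BPF_MAP_TYPE_ARRAY);
              __uint(max_entries, 8);
              __type(key, __u32);
              __type(value, __u64);
      } map_b SEC(".maps");

      static __u64 total;

      static long count_elem(struct bpf_map *map, __u32 *key, __u64 *val,
                             void *ctx)
      {
              total += *val;
              return 0;       /* 0 == continue iterating */
      }

      SEC("tp/syscalls/sys_enter_getpid")
      int sum_one_of_two(void *ctx)
      {
              void *map = bpf_get_smp_processor_id() & 1 ? (void *)&map_a
                                                         : (void *)&map_b;

              /* one bpf_for_each_map_elem() call site, two possible maps */
              bpf_for_each_map_elem(map, count_elem, NULL, 0);
              return 0;
      }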

16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
    for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.

17) Shut up a false-positive KMSAN splat in interpreter mode by unpoisoning
    the stack memory, from Martin KaFai Lau.

18) Improve xsk selftest coverage with new tests on maximum and minimum
    hardware ring size configurations, from Tushar Vyavahare.

19) Various ReST man page fixes as well as documentation and bash completion
    improvements for bpftool, from Rameez Rehman & Quentin Monnet.

20) Fix libbpf with regard to dumping subsequent char arrays,
    from Quentin Deslandes.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
  bpf, docs: Clarify PC use in instruction-set.rst
  bpf_helpers.h: Define bpf_tail_call_static when building with GCC
  bpf, docs: Add introduction for use in the ISA Internet Draft
  selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
  bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
  selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
  bpf: check bpf_dummy_struct_ops program params for test runs
  selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
  selftests/bpf: adjust dummy_st_ops_success to detect additional error
  bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
  selftests/bpf: Add ring_buffer__consume_n test.
  bpf: Add bpf_guard_preempt() convenience macro
  selftests: bpf: crypto: add benchmark for crypto functions
  selftests: bpf: crypto skcipher algo selftests
  bpf: crypto: add skcipher to bpf crypto
  bpf: make common crypto API for TC/XDP programs
  bpf: update the comment for BTF_FIELDS_MAX
  selftests/bpf: Fix wq test.
  selftests/bpf: Use make_sockaddr in test_sock_addr
  selftests/bpf: Use connect_to_addr in test_sock_addr
  ...
====================

Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit is contained in: 89de2db193 (Jakub Kicinski, 2024-04-29 11:59:20 -07:00); 158 changed files with 9328 additions and 2141 deletions.


@ -5,7 +5,11 @@
BPF Instruction Set Architecture (ISA)
======================================
This document specifies the BPF instruction set architecture (ISA).
eBPF (which is no longer an acronym for anything), also commonly
referred to as BPF, is a technology with origins in the Linux kernel
that can run untrusted programs in a privileged context such as an
operating system kernel. This document specifies the BPF instruction
set architecture (ISA).
Documentation conventions
=========================
@ -43,7 +47,7 @@ a type's signedness (`S`) and bit width (`N`), respectively.
===== =========
For example, `u32` is a type whose valid values are all the 32-bit unsigned
numbers and `s16` is a types whose valid values are all the 16-bit signed
numbers and `s16` is a type whose valid values are all the 16-bit signed
numbers.
Functions
@ -108,7 +112,7 @@ conformance group means it must support all instructions in that conformance
group.
The use of named conformance groups enables interoperability between a runtime
that executes instructions, and tools as such compilers that generate
that executes instructions, and tools such as compilers that generate
instructions for the runtime. Thus, capability discovery in terms of
conformance groups might be done manually by users or automatically by tools.
@ -181,10 +185,13 @@ A basic instruction is encoded as follows::
(`64-bit immediate instructions`_ reuse this field for other purposes)
**dst_reg**
destination register number (0-10)
destination register number (0-10), unless otherwise specified
(future instructions might reuse this field for other purposes)
**offset**
signed integer offset used with pointer arithmetic
signed integer offset used with pointer arithmetic, except where
otherwise specified (some arithmetic instructions reuse this field
for other purposes)
**imm**
signed integer immediate value
@ -228,10 +235,12 @@ This is depicted in the following figure::
operation to perform, encoded as explained above
**regs**
The source and destination register numbers, encoded as explained above
The source and destination register numbers (unless otherwise
specified), encoded as explained above
**offset**
signed integer offset used with pointer arithmetic
signed integer offset used with pointer arithmetic, unless
otherwise specified
**imm**
signed integer immediate value
@ -342,8 +351,8 @@ where '(u32)' indicates that the upper 32 bits are zeroed.
dst = dst ^ imm
Note that most instructions have instruction offset of 0. Only three instructions
(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero offset.
Note that most arithmetic instructions have 'offset' set to 0. Only three instructions
(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero 'offset'.
Division, multiplication, and modulo operations for ``ALU`` are part
of the "divmul32" conformance group, and division, multiplication, and
@ -365,15 +374,15 @@ Note that there are varying definitions of the signed modulo operation
when the dividend or divisor are negative, where implementations often
vary by language such that Python, Ruby, etc. differ from C, Go, Java,
etc. This specification requires that signed modulo use truncated division
(where -13 % 3 == -1) as implemented in C, Go, etc.:
(where -13 % 3 == -1) as implemented in C, Go, etc.::
a % n = a - n * trunc(a / n)
The ``MOVSX`` instruction does a move operation with sign extension.
``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32
bit operands, and zeroes the remaining upper 32 bits.
``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
32-bit operands, and zeroes the remaining upper 32 bits.
``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
operands into 64 bit operands. Unlike other arithmetic instructions,
operands into 64-bit operands. Unlike other arithmetic instructions,
``MOVSX`` is only defined for register source operands (``X``).
The ``NEG`` instruction is only defined when the source bit is clear
@ -411,19 +420,19 @@ conformance group.
Examples:
``{END, TO_LE, ALU}`` with imm = 16/32/64 means::
``{END, TO_LE, ALU}`` with 'imm' = 16/32/64 means::
dst = htole16(dst)
dst = htole32(dst)
dst = htole64(dst)
``{END, TO_BE, ALU}`` with imm = 16/32/64 means::
``{END, TO_BE, ALU}`` with 'imm' = 16/32/64 means::
dst = htobe16(dst)
dst = htobe32(dst)
dst = htobe64(dst)
``{END, TO_LE, ALU64}`` with imm = 16/32/64 means::
``{END, TO_LE, ALU64}`` with 'imm' = 16/32/64 means::
dst = bswap16(dst)
dst = bswap32(dst)
@ -438,27 +447,33 @@ otherwise identical operations, and indicates the base64 conformance
group unless otherwise specified.
The 'code' field encodes the operation as below:
======== ===== ======= =============================== ===================================================
code value src_reg description notes
======== ===== ======= =============================== ===================================================
JA 0x0 0x0 PC += offset {JA, K, JMP} only
JA 0x0 0x0 PC += imm {JA, K, JMP32} only
======== ===== ======= ================================= ===================================================
code value src_reg description notes
======== ===== ======= ================================= ===================================================
JA 0x0 0x0 PC += offset {JA, K, JMP} only
JA 0x0 0x0 PC += imm {JA, K, JMP32} only
JEQ 0x1 any PC += offset if dst == src
JGT 0x2 any PC += offset if dst > src unsigned
JGE 0x3 any PC += offset if dst >= src unsigned
JGT 0x2 any PC += offset if dst > src unsigned
JGE 0x3 any PC += offset if dst >= src unsigned
JSET 0x4 any PC += offset if dst & src
JNE 0x5 any PC += offset if dst != src
JSGT 0x6 any PC += offset if dst > src signed
JSGE 0x7 any PC += offset if dst >= src signed
CALL 0x8 0x0 call helper function by address {CALL, K, JMP} only, see `Helper functions`_
CALL 0x8 0x1 call PC += imm {CALL, K, JMP} only, see `Program-local functions`_
CALL 0x8 0x2 call helper function by BTF ID {CALL, K, JMP} only, see `Helper functions`_
EXIT 0x9 0x0 return {CALL, K, JMP} only
JLT 0xa any PC += offset if dst < src unsigned
JLE 0xb any PC += offset if dst <= src unsigned
JSLT 0xc any PC += offset if dst < src signed
JSLE 0xd any PC += offset if dst <= src signed
======== ===== ======= =============================== ===================================================
JSGT 0x6 any PC += offset if dst > src signed
JSGE 0x7 any PC += offset if dst >= src signed
CALL 0x8 0x0 call helper function by static ID {CALL, K, JMP} only, see `Helper functions`_
CALL 0x8 0x1 call PC += imm {CALL, K, JMP} only, see `Program-local functions`_
CALL 0x8 0x2 call helper function by BTF ID {CALL, K, JMP} only, see `Helper functions`_
EXIT 0x9 0x0 return {CALL, K, JMP} only
JLT 0xa any PC += offset if dst < src unsigned
JLE 0xb any PC += offset if dst <= src unsigned
JSLT 0xc any PC += offset if dst < src signed
JSLE 0xd any PC += offset if dst <= src signed
======== ===== ======= ================================= ===================================================
where 'PC' denotes the program counter, and the offset to increment by
is in units of 64-bit instructions relative to the instruction following
the jump instruction. Thus 'PC += 1' skips execution of the next
instruction if it's a basic instruction or results in undefined behavior
if the next instruction is a 128-bit wide instruction.
The BPF program needs to store the return value into register R0 before doing an
``EXIT``.
@ -475,7 +490,7 @@ where 's>=' indicates a signed '>=' comparison.
gotol +imm
where 'imm' means the branch offset comes from insn 'imm' field.
where 'imm' means the branch offset comes from the 'imm' field.
Note that there are two flavors of ``JA`` instructions. The
``JMP`` class permits a 16-bit jump offset specified by the 'offset'
@ -493,26 +508,26 @@ Helper functions
Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.
Historically, each helper function was identified by an address
encoded in the imm field. The available helper functions may differ
for each program type, but address values are unique across all program types.
Historically, each helper function was identified by a static ID
encoded in the 'imm' field. The available helper functions may differ
for each program type, but static IDs are unique across all program types.
Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID
identifies the helper name and type.
Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``JA``. The offset is encoded in the imm field of the call instruction.
A ``EXIT`` within the program-local function will return to the caller.
``JA``. The offset is encoded in the 'imm' field of the call instruction.
An ``EXIT`` within the program-local function will return to the caller.
Load and store instructions
===========================
For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the
8-bit 'opcode' field is divided as::
8-bit 'opcode' field is divided as follows::
+-+-+-+-+-+-+-+-+
|mode |sz |class|
@ -580,7 +595,7 @@ instructions that transfer data between a register and memory.
dst = *(signed size *) (src + offset)
Where size is one of: ``B``, ``H``, or ``W``, and
Where '<size>' is one of: ``B``, ``H``, or ``W``, and
'signed size' is one of: s8, s16, or s32.
Atomic operations
@ -662,11 +677,11 @@ src_reg pseudocode imm type dst type
======= ========================================= =========== ==============
0x0 dst = (next_imm << 32) | imm integer integer
0x1 dst = map_by_fd(imm) map fd map
0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data pointer
0x3 dst = var_addr(imm) variable id data pointer
0x4 dst = code_addr(imm) integer code pointer
0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data address
0x3 dst = var_addr(imm) variable id data address
0x4 dst = code_addr(imm) integer code address
0x5 dst = map_by_idx(imm) map index map
0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data pointer
0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data address
======= ========================================= =========== ==============
where


@ -3822,6 +3822,14 @@ F: kernel/bpf/tnum.c
F: kernel/bpf/trampoline.c
F: kernel/bpf/verifier.c
BPF [CRYPTO]
M: Vadim Fedorenko <vadim.fedorenko@linux.dev>
L: bpf@vger.kernel.org
S: Maintained
F: crypto/bpf_crypto_skcipher.c
F: include/linux/bpf_crypto.h
F: kernel/bpf/crypto.c
BPF [DOCUMENTATION] (Related to Standardization)
R: David Vernet <void@manifault.com>
L: bpf@vger.kernel.org


@ -29,6 +29,7 @@
#define TCALL_CNT (MAX_BPF_JIT_REG + 2)
#define TMP_REG_3 (MAX_BPF_JIT_REG + 3)
#define FP_BOTTOM (MAX_BPF_JIT_REG + 4)
#define ARENA_VM_START (MAX_BPF_JIT_REG + 5)
#define check_imm(bits, imm) do { \
if ((((imm) > 0) && ((imm) >> (bits))) || \
@ -67,6 +68,8 @@ static const int bpf2a64[] = {
/* temporary register for blinding constants */
[BPF_REG_AX] = A64_R(9),
[FP_BOTTOM] = A64_R(27),
/* callee saved register for kern_vm_start address */
[ARENA_VM_START] = A64_R(28),
};
struct jit_ctx {
@ -79,6 +82,7 @@ struct jit_ctx {
__le32 *ro_image;
u32 stack_size;
int fpb_offset;
u64 user_vm_start;
};
struct bpf_plt {
@ -295,7 +299,7 @@ static bool is_lsi_offset(int offset, int scale)
#define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8)
static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,
bool is_exception_cb)
bool is_exception_cb, u64 arena_vm_start)
{
const struct bpf_prog *prog = ctx->prog;
const bool is_main_prog = !bpf_is_subprog(prog);
@ -306,6 +310,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,
const u8 fp = bpf2a64[BPF_REG_FP];
const u8 tcc = bpf2a64[TCALL_CNT];
const u8 fpb = bpf2a64[FP_BOTTOM];
const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
const int idx0 = ctx->idx;
int cur_offset;
@ -411,6 +416,10 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,
/* Set up function call stack */
emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
if (arena_vm_start)
emit_a64_mov_i64(arena_vm_base, arena_vm_start, ctx);
return 0;
}
@ -738,6 +747,7 @@ static void build_epilogue(struct jit_ctx *ctx, bool is_exception_cb)
#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
#define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
bool ex_handler_bpf(const struct exception_table_entry *ex,
struct pt_regs *regs)
@ -745,7 +755,8 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
regs->regs[dst_reg] = 0;
if (dst_reg != DONT_CLEAR)
regs->regs[dst_reg] = 0;
regs->pc = (unsigned long)&ex->fixup - offset;
return true;
}
@ -765,7 +776,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
return 0;
if (BPF_MODE(insn->code) != BPF_PROBE_MEM &&
BPF_MODE(insn->code) != BPF_PROBE_MEMSX)
BPF_MODE(insn->code) != BPF_PROBE_MEMSX &&
BPF_MODE(insn->code) != BPF_PROBE_MEM32)
return 0;
if (!ctx->prog->aux->extable ||
@ -810,6 +822,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
ex->insn = ins_offset;
if (BPF_CLASS(insn->code) != BPF_LDX)
dst_reg = DONT_CLEAR;
ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) |
FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
@ -829,12 +844,13 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
bool extra_pass)
{
const u8 code = insn->code;
const u8 dst = bpf2a64[insn->dst_reg];
const u8 src = bpf2a64[insn->src_reg];
u8 dst = bpf2a64[insn->dst_reg];
u8 src = bpf2a64[insn->src_reg];
const u8 tmp = bpf2a64[TMP_REG_1];
const u8 tmp2 = bpf2a64[TMP_REG_2];
const u8 fp = bpf2a64[BPF_REG_FP];
const u8 fpb = bpf2a64[FP_BOTTOM];
const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
const s16 off = insn->off;
const s32 imm = insn->imm;
const int i = insn - ctx->prog->insnsi;
@ -853,6 +869,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
/* dst = src */
case BPF_ALU | BPF_MOV | BPF_X:
case BPF_ALU64 | BPF_MOV | BPF_X:
if (insn_is_cast_user(insn)) {
emit(A64_MOV(0, tmp, src), ctx); // 32-bit mov clears the upper 32 bits
emit_a64_mov_i(0, dst, ctx->user_vm_start >> 32, ctx);
emit(A64_LSL(1, dst, dst, 32), ctx);
emit(A64_CBZ(1, tmp, 2), ctx);
emit(A64_ORR(1, tmp, dst, tmp), ctx);
emit(A64_MOV(1, dst, tmp), ctx);
break;
}
switch (insn->off) {
case 0:
emit(A64_MOV(is64, dst, src), ctx);
@ -1237,7 +1262,15 @@ emit_cond_jmp:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_B:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
if (ctx->fpb_offset > 0 && src == fp) {
case BPF_LDX | BPF_PROBE_MEM32 | BPF_B:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_H:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_W:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW:
if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
emit(A64_ADD(1, tmp2, src, arena_vm_base), ctx);
src = tmp2;
}
if (ctx->fpb_offset > 0 && src == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
src_adj = fpb;
off_adj = off + ctx->fpb_offset;
} else {
@ -1322,7 +1355,15 @@ emit_cond_jmp:
case BPF_ST | BPF_MEM | BPF_H:
case BPF_ST | BPF_MEM | BPF_B:
case BPF_ST | BPF_MEM | BPF_DW:
if (ctx->fpb_offset > 0 && dst == fp) {
case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
case BPF_ST | BPF_PROBE_MEM32 | BPF_W:
case BPF_ST | BPF_PROBE_MEM32 | BPF_DW:
if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
emit(A64_ADD(1, tmp2, dst, arena_vm_base), ctx);
dst = tmp2;
}
if (ctx->fpb_offset > 0 && dst == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
dst_adj = fpb;
off_adj = off + ctx->fpb_offset;
} else {
@ -1365,6 +1406,10 @@ emit_cond_jmp:
}
break;
}
ret = add_exception_handler(insn, ctx, dst);
if (ret)
return ret;
break;
/* STX: *(size *)(dst + off) = src */
@ -1372,7 +1417,15 @@ emit_cond_jmp:
case BPF_STX | BPF_MEM | BPF_H:
case BPF_STX | BPF_MEM | BPF_B:
case BPF_STX | BPF_MEM | BPF_DW:
if (ctx->fpb_offset > 0 && dst == fp) {
case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
case BPF_STX | BPF_PROBE_MEM32 | BPF_H:
case BPF_STX | BPF_PROBE_MEM32 | BPF_W:
case BPF_STX | BPF_PROBE_MEM32 | BPF_DW:
if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
emit(A64_ADD(1, tmp2, dst, arena_vm_base), ctx);
dst = tmp2;
}
if (ctx->fpb_offset > 0 && dst == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
dst_adj = fpb;
off_adj = off + ctx->fpb_offset;
} else {
@ -1413,6 +1466,10 @@ emit_cond_jmp:
}
break;
}
ret = add_exception_handler(insn, ctx, dst);
if (ret)
return ret;
break;
case BPF_STX | BPF_ATOMIC | BPF_W:
@ -1594,6 +1651,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
bool tmp_blinded = false;
bool extra_pass = false;
struct jit_ctx ctx;
u64 arena_vm_start;
u8 *image_ptr;
u8 *ro_image_ptr;
@ -1611,6 +1669,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
prog = tmp;
}
arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
jit_data = prog->aux->jit_data;
if (!jit_data) {
jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
@ -1641,6 +1700,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
}
ctx.fpb_offset = find_fpb_offset(prog);
ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
/*
* 1. Initial fake pass to compute ctx->idx and ctx->offset.
@ -1648,7 +1708,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
* BPF line info needs ctx->offset[i] to be the offset of
* instruction[i] in jited image, so build prologue first.
*/
if (build_prologue(&ctx, was_classic, prog->aux->exception_cb)) {
if (build_prologue(&ctx, was_classic, prog->aux->exception_cb,
arena_vm_start)) {
prog = orig_prog;
goto out_off;
}
@ -1696,7 +1757,7 @@ skip_init_ctx:
ctx.idx = 0;
ctx.exentry_idx = 0;
build_prologue(&ctx, was_classic, prog->aux->exception_cb);
build_prologue(&ctx, was_classic, prog->aux->exception_cb, arena_vm_start);
if (build_body(&ctx, extra_pass)) {
prog = orig_prog;
@ -2461,6 +2522,11 @@ bool bpf_jit_supports_exceptions(void)
return true;
}
bool bpf_jit_supports_arena(void)
{
return true;
}
void bpf_jit_free(struct bpf_prog *prog)
{
if (prog->jited) {


@ -81,6 +81,8 @@ struct rv_jit_context {
int nexentries;
unsigned long flags;
int stack_size;
u64 arena_vm_start;
u64 user_vm_start;
};
/* Convert from ninsns to bytes. */


@ -18,6 +18,7 @@
#define RV_REG_TCC RV_REG_A6
#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
#define RV_REG_ARENA RV_REG_S7 /* For storing arena_vm_start */
static const int regmap[] = {
[BPF_REG_0] = RV_REG_A5,
@ -255,6 +256,10 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
store_offset -= 8;
}
if (ctx->arena_vm_start) {
emit_ld(RV_REG_ARENA, store_offset, RV_REG_SP, ctx);
store_offset -= 8;
}
emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
/* Set return value. */
@ -548,6 +553,7 @@ static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64,
#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
#define REG_DONT_CLEAR_MARKER 0 /* RV_REG_ZERO unused in pt_regmap */
bool ex_handler_bpf(const struct exception_table_entry *ex,
struct pt_regs *regs)
@ -555,7 +561,8 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
int regs_offset = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
*(unsigned long *)((void *)regs + pt_regmap[regs_offset]) = 0;
if (regs_offset != REG_DONT_CLEAR_MARKER)
*(unsigned long *)((void *)regs + pt_regmap[regs_offset]) = 0;
regs->epc = (unsigned long)&ex->fixup - offset;
return true;
@ -572,7 +579,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
off_t fixup_offset;
if (!ctx->insns || !ctx->ro_insns || !ctx->prog->aux->extable ||
(BPF_MODE(insn->code) != BPF_PROBE_MEM && BPF_MODE(insn->code) != BPF_PROBE_MEMSX))
(BPF_MODE(insn->code) != BPF_PROBE_MEM && BPF_MODE(insn->code) != BPF_PROBE_MEMSX &&
BPF_MODE(insn->code) != BPF_PROBE_MEM32))
return 0;
if (WARN_ON_ONCE(ctx->nexentries >= ctx->prog->aux->num_exentries))
@ -1073,6 +1081,15 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
/* dst = src */
case BPF_ALU | BPF_MOV | BPF_X:
case BPF_ALU64 | BPF_MOV | BPF_X:
if (insn_is_cast_user(insn)) {
emit_mv(RV_REG_T1, rs, ctx);
emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
emit_imm(rd, (ctx->user_vm_start >> 32) << 32, ctx);
emit(rv_beq(RV_REG_T1, RV_REG_ZERO, 4), ctx);
emit_or(RV_REG_T1, rd, RV_REG_T1, ctx);
emit_mv(rd, RV_REG_T1, ctx);
break;
}
if (imm == 1) {
/* Special mov32 for zext */
emit_zextw(rd, rd, ctx);
@ -1539,6 +1556,11 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
case BPF_LDX | BPF_PROBE_MEMSX | BPF_B:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
/* LDX | PROBE_MEM32: dst = *(unsigned size *)(src + RV_REG_ARENA + off) */
case BPF_LDX | BPF_PROBE_MEM32 | BPF_B:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_H:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_W:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW:
{
int insn_len, insns_start;
bool sign_ext;
@ -1546,6 +1568,11 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
sign_ext = BPF_MODE(insn->code) == BPF_MEMSX ||
BPF_MODE(insn->code) == BPF_PROBE_MEMSX;
if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
emit_add(RV_REG_T2, rs, RV_REG_ARENA, ctx);
rs = RV_REG_T2;
}
switch (BPF_SIZE(code)) {
case BPF_B:
if (is_12b_int(off)) {
@ -1682,6 +1709,86 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
break;
case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
case BPF_ST | BPF_PROBE_MEM32 | BPF_W:
case BPF_ST | BPF_PROBE_MEM32 | BPF_DW:
{
int insn_len, insns_start;
emit_add(RV_REG_T3, rd, RV_REG_ARENA, ctx);
rd = RV_REG_T3;
/* Load imm to a register then store it */
emit_imm(RV_REG_T1, imm, ctx);
switch (BPF_SIZE(code)) {
case BPF_B:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit(rv_sb(rd, off, RV_REG_T1), ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T2, off, ctx);
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
insns_start = ctx->ninsns;
emit(rv_sb(RV_REG_T2, 0, RV_REG_T1), ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_H:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit(rv_sh(rd, off, RV_REG_T1), ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T2, off, ctx);
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
insns_start = ctx->ninsns;
emit(rv_sh(RV_REG_T2, 0, RV_REG_T1), ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_W:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit_sw(rd, off, RV_REG_T1, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T2, off, ctx);
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
insns_start = ctx->ninsns;
emit_sw(RV_REG_T2, 0, RV_REG_T1, ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_DW:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit_sd(rd, off, RV_REG_T1, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T2, off, ctx);
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
insns_start = ctx->ninsns;
emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER,
insn_len);
if (ret)
return ret;
break;
}
/* STX: *(size *)(dst + off) = src */
case BPF_STX | BPF_MEM | BPF_B:
if (is_12b_int(off)) {
@ -1728,6 +1835,84 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
emit_atomic(rd, rs, off, imm,
BPF_SIZE(code) == BPF_DW, ctx);
break;
case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
case BPF_STX | BPF_PROBE_MEM32 | BPF_H:
case BPF_STX | BPF_PROBE_MEM32 | BPF_W:
case BPF_STX | BPF_PROBE_MEM32 | BPF_DW:
{
int insn_len, insns_start;
emit_add(RV_REG_T2, rd, RV_REG_ARENA, ctx);
rd = RV_REG_T2;
switch (BPF_SIZE(code)) {
case BPF_B:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit(rv_sb(rd, off, rs), ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T1, off, ctx);
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
insns_start = ctx->ninsns;
emit(rv_sb(RV_REG_T1, 0, rs), ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_H:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit(rv_sh(rd, off, rs), ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T1, off, ctx);
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
insns_start = ctx->ninsns;
emit(rv_sh(RV_REG_T1, 0, rs), ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_W:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit_sw(rd, off, rs, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T1, off, ctx);
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
insns_start = ctx->ninsns;
emit_sw(RV_REG_T1, 0, rs, ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_DW:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit_sd(rd, off, rs, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T1, off, ctx);
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
insns_start = ctx->ninsns;
emit_sd(RV_REG_T1, 0, rs, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER,
insn_len);
if (ret)
return ret;
break;
}
default:
pr_err("bpf-jit: unknown opcode %02x\n", code);
return -EINVAL;
@ -1759,6 +1944,8 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
stack_adjust += 8;
if (seen_reg(RV_REG_S6, ctx))
stack_adjust += 8;
if (ctx->arena_vm_start)
stack_adjust += 8;
stack_adjust = round_up(stack_adjust, 16);
stack_adjust += bpf_stack_adjust;
@ -1810,6 +1997,10 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
store_offset -= 8;
}
if (ctx->arena_vm_start) {
emit_sd(RV_REG_SP, store_offset, RV_REG_ARENA, ctx);
store_offset -= 8;
}
emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
@ -1823,6 +2014,9 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
ctx->stack_size = stack_adjust;
if (ctx->arena_vm_start)
emit_imm(RV_REG_ARENA, ctx->arena_vm_start, ctx);
}
void bpf_jit_build_epilogue(struct rv_jit_context *ctx)
@ -1839,3 +2033,8 @@ bool bpf_jit_supports_ptr_xchg(void)
{
return true;
}
bool bpf_jit_supports_arena(void)
{
return true;
}


@ -80,6 +80,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
goto skip_init_ctx;
}
ctx->arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
ctx->user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
ctx->prog = prog;
ctx->offset = kcalloc(prog->len, sizeof(int), GFP_KERNEL);
if (!ctx->offset) {


@ -816,9 +816,10 @@ done:
static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
const u32 imm32_hi, const u32 imm32_lo)
{
u64 imm64 = ((u64)imm32_hi << 32) | (u32)imm32_lo;
u8 *prog = *pprog;
if (is_uimm32(((u64)imm32_hi << 32) | (u32)imm32_lo)) {
if (is_uimm32(imm64)) {
/*
* For emitting plain u32, where sign bit must not be
* propagated LLVM tends to load imm64 over mov32
@ -826,6 +827,8 @@ static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
* 'mov %eax, imm32' instead.
*/
emit_mov_imm32(&prog, false, dst_reg, imm32_lo);
} else if (is_simm32(imm64)) {
emit_mov_imm32(&prog, true, dst_reg, imm32_lo);
} else {
/* movabsq rax, imm64 */
EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
@ -1169,6 +1172,54 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
return 0;
}
static int emit_atomic_index(u8 **pprog, u8 atomic_op, u32 size,
u32 dst_reg, u32 src_reg, u32 index_reg, int off)
{
u8 *prog = *pprog;
EMIT1(0xF0); /* lock prefix */
switch (size) {
case BPF_W:
EMIT1(add_3mod(0x40, dst_reg, src_reg, index_reg));
break;
case BPF_DW:
EMIT1(add_3mod(0x48, dst_reg, src_reg, index_reg));
break;
default:
pr_err("bpf_jit: 1 and 2 byte atomics are not supported\n");
return -EFAULT;
}
/* emit opcode */
switch (atomic_op) {
case BPF_ADD:
case BPF_AND:
case BPF_OR:
case BPF_XOR:
/* lock *(u32/u64*)(dst_reg + idx_reg + off) <op>= src_reg */
EMIT1(simple_alu_opcodes[atomic_op]);
break;
case BPF_ADD | BPF_FETCH:
/* src_reg = atomic_fetch_add(dst_reg + idx_reg + off, src_reg); */
EMIT2(0x0F, 0xC1);
break;
case BPF_XCHG:
/* src_reg = atomic_xchg(dst_reg + idx_reg + off, src_reg); */
EMIT1(0x87);
break;
case BPF_CMPXCHG:
/* r0 = atomic_cmpxchg(dst_reg + idx_reg + off, r0, src_reg); */
EMIT2(0x0F, 0xB1);
break;
default:
pr_err("bpf_jit: unknown atomic opcode %02x\n", atomic_op);
return -EFAULT;
}
emit_insn_suffix_SIB(&prog, dst_reg, src_reg, index_reg, off);
*pprog = prog;
return 0;
}
#define DONT_CLEAR 1
bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
@ -1382,6 +1433,16 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
maybe_emit_mod(&prog, AUX_REG, dst_reg, true);
EMIT3(0x0F, 0x44, add_2reg(0xC0, AUX_REG, dst_reg));
break;
} else if (insn_is_mov_percpu_addr(insn)) {
/* mov <dst>, <src> (if necessary) */
EMIT_mov(dst_reg, src_reg);
#ifdef CONFIG_SMP
/* add <dst>, gs:[<off>] */
EMIT2(0x65, add_1mod(0x48, dst_reg));
EMIT3(0x03, add_2reg(0x04, 0, dst_reg), 0x25);
EMIT((u32)(unsigned long)&this_cpu_off, 4);
#endif
break;
}
fallthrough;
case BPF_ALU | BPF_MOV | BPF_X:
@ -1969,6 +2030,15 @@ populate_extable:
return err;
break;
case BPF_STX | BPF_PROBE_ATOMIC | BPF_W:
case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW:
start_of_ldx = prog;
err = emit_atomic_index(&prog, insn->imm, BPF_SIZE(insn->code),
dst_reg, src_reg, X86_REG_R12, insn->off);
if (err)
return err;
goto populate_extable;
/* call */
case BPF_JMP | BPF_CALL: {
u8 *ip = image + addrs[i - 1];
@ -3362,6 +3432,11 @@ bool bpf_jit_supports_subprog_tailcalls(void)
return true;
}
bool bpf_jit_supports_percpu_insn(void)
{
return true;
}
void bpf_jit_free(struct bpf_prog *prog)
{
if (prog->jited) {
@ -3465,6 +3540,21 @@ bool bpf_jit_supports_arena(void)
return true;
}
bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
{
if (!in_arena)
return true;
switch (insn->code) {
case BPF_STX | BPF_ATOMIC | BPF_W:
case BPF_STX | BPF_ATOMIC | BPF_DW:
if (insn->imm == (BPF_AND | BPF_FETCH) ||
insn->imm == (BPF_OR | BPF_FETCH) ||
insn->imm == (BPF_XOR | BPF_FETCH))
return false;
}
return true;
}
bool bpf_jit_supports_ptr_xchg(void)
{
return true;


@ -20,6 +20,9 @@ crypto_skcipher-y += lskcipher.o
crypto_skcipher-y += skcipher.o
obj-$(CONFIG_CRYPTO_SKCIPHER2) += crypto_skcipher.o
ifeq ($(CONFIG_BPF_SYSCALL),y)
obj-$(CONFIG_CRYPTO_SKCIPHER2) += bpf_crypto_skcipher.o
endif
obj-$(CONFIG_CRYPTO_SEQIV) += seqiv.o
obj-$(CONFIG_CRYPTO_ECHAINIV) += echainiv.o


@ -0,0 +1,82 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2024 Meta, Inc */
#include <linux/types.h>
#include <linux/module.h>
#include <linux/bpf_crypto.h>
#include <crypto/skcipher.h>
static void *bpf_crypto_lskcipher_alloc_tfm(const char *algo)
{
return crypto_alloc_lskcipher(algo, 0, 0);
}
static void bpf_crypto_lskcipher_free_tfm(void *tfm)
{
crypto_free_lskcipher(tfm);
}
static int bpf_crypto_lskcipher_has_algo(const char *algo)
{
return crypto_has_skcipher(algo, CRYPTO_ALG_TYPE_LSKCIPHER, CRYPTO_ALG_TYPE_MASK);
}
static int bpf_crypto_lskcipher_setkey(void *tfm, const u8 *key, unsigned int keylen)
{
return crypto_lskcipher_setkey(tfm, key, keylen);
}
static u32 bpf_crypto_lskcipher_get_flags(void *tfm)
{
return crypto_lskcipher_get_flags(tfm);
}
static unsigned int bpf_crypto_lskcipher_ivsize(void *tfm)
{
return crypto_lskcipher_ivsize(tfm);
}
static unsigned int bpf_crypto_lskcipher_statesize(void *tfm)
{
return crypto_lskcipher_statesize(tfm);
}
static int bpf_crypto_lskcipher_encrypt(void *tfm, const u8 *src, u8 *dst,
unsigned int len, u8 *siv)
{
return crypto_lskcipher_encrypt(tfm, src, dst, len, siv);
}
static int bpf_crypto_lskcipher_decrypt(void *tfm, const u8 *src, u8 *dst,
unsigned int len, u8 *siv)
{
return crypto_lskcipher_decrypt(tfm, src, dst, len, siv);
}
static const struct bpf_crypto_type bpf_crypto_lskcipher_type = {
.alloc_tfm = bpf_crypto_lskcipher_alloc_tfm,
.free_tfm = bpf_crypto_lskcipher_free_tfm,
.has_algo = bpf_crypto_lskcipher_has_algo,
.setkey = bpf_crypto_lskcipher_setkey,
.encrypt = bpf_crypto_lskcipher_encrypt,
.decrypt = bpf_crypto_lskcipher_decrypt,
.ivsize = bpf_crypto_lskcipher_ivsize,
.statesize = bpf_crypto_lskcipher_statesize,
.get_flags = bpf_crypto_lskcipher_get_flags,
.owner = THIS_MODULE,
.name = "skcipher",
};
static int __init bpf_crypto_skcipher_init(void)
{
return bpf_crypto_register_type(&bpf_crypto_lskcipher_type);
}
static void __exit bpf_crypto_skcipher_exit(void)
{
int err = bpf_crypto_unregister_type(&bpf_crypto_lskcipher_type);
WARN_ON_ONCE(err);
}
module_init(bpf_crypto_skcipher_init);
module_exit(bpf_crypto_skcipher_exit);
MODULE_LICENSE("GPL");


@ -184,8 +184,8 @@ struct bpf_map_ops {
};
enum {
/* Support at most 10 fields in a BTF type */
BTF_FIELDS_MAX = 10,
/* Support at most 11 fields in a BTF type */
BTF_FIELDS_MAX = 11,
};
enum btf_field_type {
@ -202,6 +202,7 @@ enum btf_field_type {
BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE,
BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
BPF_REFCOUNT = (1 << 9),
BPF_WORKQUEUE = (1 << 10),
};
typedef void (*btf_dtor_kfunc_t)(void *);
@ -238,6 +239,7 @@ struct btf_record {
u32 field_mask;
int spin_lock_off;
int timer_off;
int wq_off;
int refcount_off;
struct btf_field fields[];
};
@ -312,6 +314,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
return "bpf_spin_lock";
case BPF_TIMER:
return "bpf_timer";
case BPF_WORKQUEUE:
return "bpf_wq";
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
return "kptr";
@ -340,6 +344,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
return sizeof(struct bpf_spin_lock);
case BPF_TIMER:
return sizeof(struct bpf_timer);
case BPF_WORKQUEUE:
return sizeof(struct bpf_wq);
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
@ -367,6 +373,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
return __alignof__(struct bpf_spin_lock);
case BPF_TIMER:
return __alignof__(struct bpf_timer);
case BPF_WORKQUEUE:
return __alignof__(struct bpf_wq);
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
@ -406,6 +414,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
/* RB_ROOT_CACHED 0-inits, no need to do anything after memset */
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_WORKQUEUE:
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
@ -525,6 +534,7 @@ static inline void zero_map_value(struct bpf_map *map, void *dst)
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
bool lock_src);
void bpf_timer_cancel_and_free(void *timer);
void bpf_wq_cancel_and_free(void *timer);
void bpf_list_head_free(const struct btf_field *field, void *list_head,
struct bpf_spin_lock *spin_lock);
void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
@ -1265,6 +1275,7 @@ int bpf_dynptr_check_size(u32 size);
u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr);
const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len);
void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len);
bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr);
#ifdef CONFIG_BPF_JIT
int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
@ -2209,6 +2220,7 @@ void bpf_map_free_record(struct bpf_map *map);
struct btf_record *btf_record_dup(const struct btf_record *rec);
bool btf_record_equal(const struct btf_record *rec_a, const struct btf_record *rec_b);
void bpf_obj_free_timer(const struct btf_record *rec, void *obj);
void bpf_obj_free_workqueue(const struct btf_record *rec, void *obj);
void bpf_obj_free_fields(const struct btf_record *rec, void *obj);
void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu);
@ -3010,6 +3022,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
int sock_map_bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr);
int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog);
void sock_map_unhash(struct sock *sk);
void sock_map_destroy(struct sock *sk);
@ -3108,6 +3121,11 @@ static inline int sock_map_bpf_prog_query(const union bpf_attr *attr,
{
return -EINVAL;
}
static inline int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
{
return -EOPNOTSUPP;
}
#endif /* CONFIG_BPF_SYSCALL */
#endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */


@ -0,0 +1,24 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#ifndef _BPF_CRYPTO_H
#define _BPF_CRYPTO_H
struct bpf_crypto_type {
void *(*alloc_tfm)(const char *algo);
void (*free_tfm)(void *tfm);
int (*has_algo)(const char *algo);
int (*setkey)(void *tfm, const u8 *key, unsigned int keylen);
int (*setauthsize)(void *tfm, unsigned int authsize);
int (*encrypt)(void *tfm, const u8 *src, u8 *dst, unsigned int len, u8 *iv);
int (*decrypt)(void *tfm, const u8 *src, u8 *dst, unsigned int len, u8 *iv);
unsigned int (*ivsize)(void *tfm);
unsigned int (*statesize)(void *tfm);
u32 (*get_flags)(void *tfm);
struct module *owner;
char name[14];
};
int bpf_crypto_register_type(const struct bpf_crypto_type *type);
int bpf_crypto_unregister_type(const struct bpf_crypto_type *type);
#endif /* _BPF_CRYPTO_H */


@ -421,11 +421,13 @@ struct bpf_verifier_state {
struct bpf_active_lock active_lock;
bool speculative;
bool active_rcu_lock;
u32 active_preempt_lock;
/* If this state was ever pointed-to by other state's loop_entry field
* this flag would be set to true. Used to avoid freeing such states
* while they are still in use.
*/
bool used_as_loop_entry;
bool in_sleepable;
/* first and last insn idx of this verifier state */
u32 first_insn_idx;
@ -502,6 +504,13 @@ struct bpf_loop_inline_state {
u32 callback_subprogno; /* valid when fit_for_inline is true */
};
/* pointer and state for maps */
struct bpf_map_ptr_state {
struct bpf_map *map_ptr;
bool poison;
bool unpriv;
};
/* Possible states for alu_state member. */
#define BPF_ALU_SANITIZE_SRC (1U << 0)
#define BPF_ALU_SANITIZE_DST (1U << 1)
@ -514,7 +523,7 @@ struct bpf_loop_inline_state {
struct bpf_insn_aux_data {
union {
enum bpf_reg_type ptr_type; /* pointer type for load/store insns */
unsigned long map_ptr_state; /* pointer/poison value for maps */
struct bpf_map_ptr_state map_ptr_state;
s32 call_imm; /* saved imm field of call insn */
u32 alu_limit; /* limit for add/sub register with pointer */
struct {


@ -75,6 +75,9 @@ struct ctl_table_header;
/* unused opcode to mark special load instruction. Same as BPF_MSH */
#define BPF_PROBE_MEM32 0xa0
/* unused opcode to mark special atomic instruction */
#define BPF_PROBE_ATOMIC 0xe0
/* unused opcode to mark call to interpreter with arguments */
#define BPF_CALL_ARGS 0xe0
@ -178,6 +181,25 @@ struct ctl_table_header;
.off = 0, \
.imm = 0 })
/* Special (internal-only) form of mov, used to resolve per-CPU addrs:
* dst_reg = src_reg + <percpu_base_off>
* BPF_ADDR_PERCPU is used as a special insn->off value.
*/
#define BPF_ADDR_PERCPU (-1)
#define BPF_MOV64_PERCPU_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = BPF_ADDR_PERCPU, \
.imm = 0 })
static inline bool insn_is_mov_percpu_addr(const struct bpf_insn *insn)
{
return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->off == BPF_ADDR_PERCPU;
}
/* Short form of mov, dst_reg = imm32 */
#define BPF_MOV64_IMM(DST, IMM) \
@ -654,14 +676,16 @@ static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
cant_migrate();
if (static_branch_unlikely(&bpf_stats_enabled_key)) {
struct bpf_prog_stats *stats;
u64 start = sched_clock();
u64 duration, start = sched_clock();
unsigned long flags;
ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
duration = sched_clock() - start;
stats = this_cpu_ptr(prog->stats);
flags = u64_stats_update_begin_irqsave(&stats->syncp);
u64_stats_inc(&stats->cnt);
u64_stats_add(&stats->nsecs, sched_clock() - start);
u64_stats_add(&stats->nsecs, duration);
u64_stats_update_end_irqrestore(&stats->syncp, flags);
} else {
ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
@ -970,11 +994,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
void bpf_jit_compile(struct bpf_prog *prog);
bool bpf_jit_needs_zext(void);
bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_percpu_insn(void);
bool bpf_jit_supports_kfunc_call(void);
bool bpf_jit_supports_far_kfunc_call(void);
bool bpf_jit_supports_exceptions(void);
bool bpf_jit_supports_ptr_xchg(void);
bool bpf_jit_supports_arena(void);
bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena);
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
bool bpf_helper_changes_pkt_data(void *func);


@ -58,6 +58,10 @@ struct sk_psock_progs {
struct bpf_prog *stream_parser;
struct bpf_prog *stream_verdict;
struct bpf_prog *skb_verdict;
struct bpf_link *msg_parser_link;
struct bpf_link *stream_parser_link;
struct bpf_link *stream_verdict_link;
struct bpf_link *skb_verdict_link;
};
enum sk_psock_state_bits {


@ -2711,10 +2711,10 @@ static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN, 0, NULL) == 1);
}
static inline void tcp_bpf_rtt(struct sock *sk)
static inline void tcp_bpf_rtt(struct sock *sk, long mrtt, u32 srtt)
{
if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_RTT_CB_FLAG))
tcp_call_bpf(sk, BPF_SOCK_OPS_RTT_CB, 0, NULL);
tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_RTT_CB, mrtt, srtt);
}
#if IS_ENABLED(CONFIG_SMC)


@ -7,6 +7,23 @@
#include <linux/tracepoint.h>
TRACE_EVENT(bpf_trigger_tp,
TP_PROTO(int nonce),
TP_ARGS(nonce),
TP_STRUCT__entry(
__field(int, nonce)
),
TP_fast_assign(
__entry->nonce = nonce;
),
TP_printk("nonce %d", __entry->nonce)
);
DECLARE_EVENT_CLASS(bpf_test_finish,
TP_PROTO(int *err),


@ -1135,6 +1135,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_TCX = 11,
BPF_LINK_TYPE_UPROBE_MULTI = 12,
BPF_LINK_TYPE_NETKIT = 13,
BPF_LINK_TYPE_SOCKMAP = 14,
__MAX_BPF_LINK_TYPE,
};
@ -3394,6 +3395,10 @@ union bpf_attr {
* for the nexthop. If the src addr cannot be derived,
* **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. In this
* case, *params*->dmac and *params*->smac are not set either.
* **BPF_FIB_LOOKUP_MARK**
* Use the mark present in *params*->mark for the fib lookup.
* This option should not be used with BPF_FIB_LOOKUP_DIRECT,
* as it only has meaning for full lookups.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs.
@ -5022,7 +5027,7 @@ union bpf_attr {
* bytes will be copied to *dst*
* Return
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
* **-EOPNOTSUPP** if IMA is disabled or **-EINVAL** if
* invalid arguments are passed.
*
* struct socket *bpf_sock_from_file(struct file *file)
@ -5508,7 +5513,7 @@ union bpf_attr {
* bytes will be copied to *dst*
* Return
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
* **-EOPNOTSUPP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed.
*
* void *bpf_kptr_xchg(void *map_value, void *ptr)
@ -6720,6 +6725,10 @@ struct bpf_link_info {
__u32 ifindex;
__u32 attach_type;
} netkit;
struct {
__u32 map_id;
__u32 attach_type;
} sockmap;
};
} __attribute__((aligned(8)));
@ -6938,6 +6947,8 @@ enum {
* socket transition to LISTEN state.
*/
BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
* Arg1: measured RTT input (mrtt)
* Arg2: updated srtt
*/
BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option.
* It will be called to handle
@ -7120,6 +7131,7 @@ enum {
BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
BPF_FIB_LOOKUP_TBID = (1U << 3),
BPF_FIB_LOOKUP_SRC = (1U << 4),
BPF_FIB_LOOKUP_MARK = (1U << 5),
};
enum {
@ -7152,7 +7164,7 @@ struct bpf_fib_lookup {
/* output: MTU value */
__u16 mtu_result;
};
} __attribute__((packed, aligned(2)));
/* input: L3 device index for lookup
* output: device index from FIB lookup
*/
@ -7197,8 +7209,19 @@ struct bpf_fib_lookup {
__u32 tbid;
};
__u8 smac[6]; /* ETH_ALEN */
__u8 dmac[6]; /* ETH_ALEN */
union {
/* input */
struct {
__u32 mark; /* policy routing */
/* 2 4-byte holes for input */
};
/* output: source and dest mac */
struct {
__u8 smac[6]; /* ETH_ALEN */
__u8 dmac[6]; /* ETH_ALEN */
};
};
};
struct bpf_redir_neigh {
@ -7285,6 +7308,10 @@ struct bpf_timer {
__u64 __opaque[2];
} __attribute__((aligned(8)));
struct bpf_wq {
__u64 __opaque[2];
} __attribute__((aligned(8)));
struct bpf_dynptr {
__u64 __opaque[2];
} __attribute__((aligned(8)));


@ -44,6 +44,9 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
obj-$(CONFIG_BPF_SYSCALL) += cpumask.o
obj-${CONFIG_BPF_LSM} += bpf_lsm.o
endif
ifeq ($(CONFIG_CRYPTO),y)
obj-$(CONFIG_BPF_SYSCALL) += crypto.o
endif
obj-$(CONFIG_BPF_PRELOAD) += preload/
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o


@ -37,7 +37,7 @@
*/
/* number of bytes addressable by LDX/STX insn with 16-bit 'off' field */
#define GUARD_SZ (1ull << sizeof(((struct bpf_insn *)0)->off) * 8)
#define GUARD_SZ (1ull << sizeof_field(struct bpf_insn, off) * 8)
#define KERN_VM_SZ (SZ_4G + GUARD_SZ)
struct bpf_arena {


@ -246,6 +246,38 @@ static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
return this_cpu_ptr(array->pptrs[index & array->index_mask]);
}
/* emit BPF instructions equivalent to C code of percpu_array_map_lookup_elem() */
static int percpu_array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
struct bpf_insn *insn = insn_buf;
if (!bpf_jit_supports_percpu_insn())
return -EOPNOTSUPP;
if (map->map_flags & BPF_F_INNER_MAP)
return -EOPNOTSUPP;
BUILD_BUG_ON(offsetof(struct bpf_array, map) != 0);
*insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, offsetof(struct bpf_array, pptrs));
*insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0);
if (!map->bypass_spec_v1) {
*insn++ = BPF_JMP_IMM(BPF_JGE, BPF_REG_0, map->max_entries, 6);
*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_0, array->index_mask);
} else {
*insn++ = BPF_JMP_IMM(BPF_JGE, BPF_REG_0, map->max_entries, 5);
}
*insn++ = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
*insn++ = BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1);
*insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
*insn++ = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
*insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 1);
*insn++ = BPF_MOV64_IMM(BPF_REG_0, 0);
return insn - insn_buf;
}
static void *percpu_array_map_lookup_percpu_elem(struct bpf_map *map, void *key, u32 cpu)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
@ -396,17 +428,21 @@ static void *array_map_vmalloc_addr(struct bpf_array *array)
return (void *)round_down((unsigned long)array, PAGE_SIZE);
}
static void array_map_free_timers(struct bpf_map *map)
static void array_map_free_timers_wq(struct bpf_map *map)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
int i;
/* We don't reset or free fields other than timer on uref dropping to zero. */
if (!btf_record_has_field(map->record, BPF_TIMER))
return;
/* We don't reset or free fields other than timer and workqueue
* on uref dropping to zero.
*/
if (btf_record_has_field(map->record, BPF_TIMER))
for (i = 0; i < array->map.max_entries; i++)
bpf_obj_free_timer(map->record, array_map_elem_ptr(array, i));
for (i = 0; i < array->map.max_entries; i++)
bpf_obj_free_timer(map->record, array_map_elem_ptr(array, i));
if (btf_record_has_field(map->record, BPF_WORKQUEUE))
for (i = 0; i < array->map.max_entries; i++)
bpf_obj_free_workqueue(map->record, array_map_elem_ptr(array, i));
}
/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
@ -750,7 +786,7 @@ const struct bpf_map_ops array_map_ops = {
.map_alloc = array_map_alloc,
.map_free = array_map_free,
.map_get_next_key = array_map_get_next_key,
.map_release_uref = array_map_free_timers,
.map_release_uref = array_map_free_timers_wq,
.map_lookup_elem = array_map_lookup_elem,
.map_update_elem = array_map_update_elem,
.map_delete_elem = array_map_delete_elem,
@ -776,6 +812,7 @@ const struct bpf_map_ops percpu_array_map_ops = {
.map_free = array_map_free,
.map_get_next_key = array_map_get_next_key,
.map_lookup_elem = percpu_array_map_lookup_elem,
.map_gen_lookup = percpu_array_map_gen_lookup,
.map_update_elem = array_map_update_elem,
.map_delete_elem = array_map_delete_elem,
.map_lookup_percpu_elem = percpu_array_map_lookup_percpu_elem,


@ -318,7 +318,7 @@ static bool check_storage_bpf_ma(struct bpf_local_storage *local_storage,
*
* If the local_storage->list is already empty, the caller will not
* care about the bpf_ma value also because the caller is not
* responsibile to free the local_storage.
* responsible to free the local_storage.
*/
if (storage_smap)


@ -3464,6 +3464,15 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
goto end;
}
}
if (field_mask & BPF_WORKQUEUE) {
if (!strcmp(name, "bpf_wq")) {
if (*seen_mask & BPF_WORKQUEUE)
return -E2BIG;
*seen_mask |= BPF_WORKQUEUE;
type = BPF_WORKQUEUE;
goto end;
}
}
field_mask_test_name(BPF_LIST_HEAD, "bpf_list_head");
field_mask_test_name(BPF_LIST_NODE, "bpf_list_node");
field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root");
@ -3515,6 +3524,7 @@ static int btf_find_struct_field(const struct btf *btf,
switch (field_type) {
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_WORKQUEUE:
case BPF_LIST_NODE:
case BPF_RB_NODE:
case BPF_REFCOUNT:
@ -3582,6 +3592,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
switch (field_type) {
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_WORKQUEUE:
case BPF_LIST_NODE:
case BPF_RB_NODE:
case BPF_REFCOUNT:
@ -3816,6 +3827,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
rec->spin_lock_off = -EINVAL;
rec->timer_off = -EINVAL;
rec->wq_off = -EINVAL;
rec->refcount_off = -EINVAL;
for (i = 0; i < cnt; i++) {
field_type_size = btf_field_type_size(info_arr[i].type);
@ -3846,6 +3858,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
/* Cache offset for faster lookup at runtime */
rec->timer_off = rec->fields[i].offset;
break;
case BPF_WORKQUEUE:
WARN_ON_ONCE(rec->wq_off >= 0);
/* Cache offset for faster lookup at runtime */
rec->wq_off = rec->fields[i].offset;
break;
case BPF_REFCOUNT:
WARN_ON_ONCE(rec->refcount_off >= 0);
/* Cache offset for faster lookup at runtime */
@ -5642,8 +5659,8 @@ errout_free:
return ERR_PTR(err);
}
extern char __weak __start_BTF[];
extern char __weak __stop_BTF[];
extern char __start_BTF[];
extern char __stop_BTF[];
extern struct btf *btf_vmlinux;
#define BPF_MAP_TYPE(_id, _ops)
@ -5971,6 +5988,9 @@ struct btf *btf_parse_vmlinux(void)
struct btf *btf = NULL;
int err;
if (!IS_ENABLED(CONFIG_DEBUG_INFO_BTF))
return ERR_PTR(-ENOENT);
env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
if (!env)
return ERR_PTR(-ENOMEM);


@ -747,7 +747,7 @@ const char *__bpf_address_lookup(unsigned long addr, unsigned long *size,
unsigned long symbol_start = ksym->start;
unsigned long symbol_end = ksym->end;
strncpy(sym, ksym->name, KSYM_NAME_LEN);
strscpy(sym, ksym->name, KSYM_NAME_LEN);
ret = sym;
if (size)
@ -813,7 +813,7 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
if (it++ != symnum)
continue;
strncpy(sym, ksym->name, KSYM_NAME_LEN);
strscpy(sym, ksym->name, KSYM_NAME_LEN);
*value = ksym->start;
*type = BPF_SYM_ELF_TYPE;
@ -2218,6 +2218,7 @@ static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn
u64 stack[stack_size / sizeof(u64)]; \
u64 regs[MAX_BPF_EXT_REG] = {}; \
\
kmsan_unpoison_memory(stack, sizeof(stack)); \
FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
ARG1 = (u64) (unsigned long) ctx; \
return ___bpf_prog_run(regs, insn); \
@ -2231,6 +2232,7 @@ static u64 PROG_NAME_ARGS(stack_size)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5, \
u64 stack[stack_size / sizeof(u64)]; \
u64 regs[MAX_BPF_EXT_REG]; \
\
kmsan_unpoison_memory(stack, sizeof(stack)); \
FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
BPF_R1 = r1; \
BPF_R2 = r2; \
@ -2812,7 +2814,7 @@ void bpf_prog_free(struct bpf_prog *fp)
}
EXPORT_SYMBOL_GPL(bpf_prog_free);
/* RNG for unpriviledged user space with separated state from prandom_u32(). */
/* RNG for unprivileged user space with separated state from prandom_u32(). */
static DEFINE_PER_CPU(struct rnd_state, bpf_user_rnd_state);
void bpf_user_rnd_init_once(void)
@ -2943,6 +2945,11 @@ bool __weak bpf_jit_supports_subprog_tailcalls(void)
return false;
}
bool __weak bpf_jit_supports_percpu_insn(void)
{
return false;
}
bool __weak bpf_jit_supports_kfunc_call(void)
{
return false;
@ -2958,6 +2965,11 @@ bool __weak bpf_jit_supports_arena(void)
return false;
}
bool __weak bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
{
return false;
}
/* Return TRUE if the JIT backend satisfies the following two conditions:
* 1) JIT backend supports atomic_xchg() on pointer-sized words.
* 2) Under the specific arch, the implementation of xchg() is the same


@ -474,6 +474,7 @@ static int __init cpumask_kfunc_init(void)
ret = bpf_mem_alloc_init(&bpf_cpumask_ma, sizeof(struct bpf_cpumask), false);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &cpumask_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &cpumask_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &cpumask_kfunc_set);
return ret ?: register_btf_id_dtor_kfuncs(cpumask_dtors,
ARRAY_SIZE(cpumask_dtors),
THIS_MODULE);

kernel/bpf/crypto.c (new file)

@ -0,0 +1,385 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2024 Meta, Inc */
#include <linux/bpf.h>
#include <linux/bpf_crypto.h>
#include <linux/bpf_mem_alloc.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/filter.h>
#include <linux/scatterlist.h>
#include <linux/skbuff.h>
#include <crypto/skcipher.h>
struct bpf_crypto_type_list {
const struct bpf_crypto_type *type;
struct list_head list;
};
/* BPF crypto initialization parameters struct */
/**
* struct bpf_crypto_params - BPF crypto initialization parameters structure
* @type: The string of crypto operation type.
* @reserved: Reserved member, will be reused for more options in future
* Values:
* 0
* @algo: The string of algorithm to initialize.
* @key: The cipher key used to init crypto algorithm.
* @key_len: The length of cipher key.
* @authsize: The length of authentication tag used by algorithm.
*/
struct bpf_crypto_params {
char type[14];
u8 reserved[2];
char algo[128];
u8 key[256];
u32 key_len;
u32 authsize;
};
static LIST_HEAD(bpf_crypto_types);
static DECLARE_RWSEM(bpf_crypto_types_sem);
/**
* struct bpf_crypto_ctx - refcounted BPF crypto context structure
* @type: The pointer to bpf crypto type
* @tfm: The pointer to instance of crypto API struct.
* @siv_len: Size of IV and state storage for cipher
* @rcu: The RCU head used to free the crypto context with RCU safety.
* @usage: Object reference counter. When the refcount goes to 0, the
* memory is released back to the BPF allocator, which provides
* RCU safety.
*/
struct bpf_crypto_ctx {
const struct bpf_crypto_type *type;
void *tfm;
u32 siv_len;
struct rcu_head rcu;
refcount_t usage;
};
int bpf_crypto_register_type(const struct bpf_crypto_type *type)
{
struct bpf_crypto_type_list *node;
int err = -EEXIST;
down_write(&bpf_crypto_types_sem);
list_for_each_entry(node, &bpf_crypto_types, list) {
if (!strcmp(node->type->name, type->name))
goto unlock;
}
node = kmalloc(sizeof(*node), GFP_KERNEL);
err = -ENOMEM;
if (!node)
goto unlock;
node->type = type;
list_add(&node->list, &bpf_crypto_types);
err = 0;
unlock:
up_write(&bpf_crypto_types_sem);
return err;
}
EXPORT_SYMBOL_GPL(bpf_crypto_register_type);
int bpf_crypto_unregister_type(const struct bpf_crypto_type *type)
{
struct bpf_crypto_type_list *node;
int err = -ENOENT;
down_write(&bpf_crypto_types_sem);
list_for_each_entry(node, &bpf_crypto_types, list) {
if (strcmp(node->type->name, type->name))
continue;
list_del(&node->list);
kfree(node);
err = 0;
break;
}
up_write(&bpf_crypto_types_sem);
return err;
}
EXPORT_SYMBOL_GPL(bpf_crypto_unregister_type);
static const struct bpf_crypto_type *bpf_crypto_get_type(const char *name)
{
const struct bpf_crypto_type *type = ERR_PTR(-ENOENT);
struct bpf_crypto_type_list *node;
down_read(&bpf_crypto_types_sem);
list_for_each_entry(node, &bpf_crypto_types, list) {
if (strcmp(node->type->name, name))
continue;
if (try_module_get(node->type->owner))
type = node->type;
break;
}
up_read(&bpf_crypto_types_sem);
return type;
}
__bpf_kfunc_start_defs();
/**
* bpf_crypto_ctx_create() - Create a mutable BPF crypto context.
*
* Allocates a crypto context that can be used, acquired, and released by
* a BPF program. The crypto context returned by this function must either
* be embedded in a map as a kptr, or freed with bpf_crypto_ctx_release().
* As crypto API functions use GFP_KERNEL allocations, this function can
* only be used in sleepable BPF programs.
*
* bpf_crypto_ctx_create() allocates memory for crypto context.
* It may return NULL if no memory is available.
* @params: pointer to struct bpf_crypto_params which contains all the
* details needed to initialise crypto context.
* @params__sz: size of struct bpf_crypto_params used by bpf program
* @err: integer to store error code when NULL is returned.
*/
__bpf_kfunc struct bpf_crypto_ctx *
bpf_crypto_ctx_create(const struct bpf_crypto_params *params, u32 params__sz,
int *err)
{
const struct bpf_crypto_type *type;
struct bpf_crypto_ctx *ctx;
if (!params || params->reserved[0] || params->reserved[1] ||
params__sz != sizeof(struct bpf_crypto_params)) {
*err = -EINVAL;
return NULL;
}
type = bpf_crypto_get_type(params->type);
if (IS_ERR(type)) {
*err = PTR_ERR(type);
return NULL;
}
if (!type->has_algo(params->algo)) {
*err = -EOPNOTSUPP;
goto err_module_put;
}
if (!!params->authsize ^ !!type->setauthsize) {
*err = -EOPNOTSUPP;
goto err_module_put;
}
if (!params->key_len || params->key_len > sizeof(params->key)) {
*err = -EINVAL;
goto err_module_put;
}
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx) {
*err = -ENOMEM;
goto err_module_put;
}
ctx->type = type;
ctx->tfm = type->alloc_tfm(params->algo);
if (IS_ERR(ctx->tfm)) {
*err = PTR_ERR(ctx->tfm);
goto err_free_ctx;
}
if (params->authsize) {
*err = type->setauthsize(ctx->tfm, params->authsize);
if (*err)
goto err_free_tfm;
}
*err = type->setkey(ctx->tfm, params->key, params->key_len);
if (*err)
goto err_free_tfm;
if (type->get_flags(ctx->tfm) & CRYPTO_TFM_NEED_KEY) {
*err = -EINVAL;
goto err_free_tfm;
}
ctx->siv_len = type->ivsize(ctx->tfm) + type->statesize(ctx->tfm);
refcount_set(&ctx->usage, 1);
return ctx;
err_free_tfm:
type->free_tfm(ctx->tfm);
err_free_ctx:
kfree(ctx);
err_module_put:
module_put(type->owner);
return NULL;
}
static void crypto_free_cb(struct rcu_head *head)
{
struct bpf_crypto_ctx *ctx;
ctx = container_of(head, struct bpf_crypto_ctx, rcu);
ctx->type->free_tfm(ctx->tfm);
module_put(ctx->type->owner);
kfree(ctx);
}
/**
* bpf_crypto_ctx_acquire() - Acquire a reference to a BPF crypto context.
* @ctx: The BPF crypto context being acquired. The ctx must be a trusted
* pointer.
*
* Acquires a reference to a BPF crypto context. The context returned by this function
* must either be embedded in a map as a kptr, or freed with
* bpf_crypto_ctx_release().
*/
__bpf_kfunc struct bpf_crypto_ctx *
bpf_crypto_ctx_acquire(struct bpf_crypto_ctx *ctx)
{
if (!refcount_inc_not_zero(&ctx->usage))
return NULL;
return ctx;
}
/**
* bpf_crypto_ctx_release() - Release a previously acquired BPF crypto context.
* @ctx: The crypto context being released.
*
* Releases a previously acquired reference to a BPF crypto context. When the final
* reference of the BPF crypto context has been released, its memory
* will be released.
*/
__bpf_kfunc void bpf_crypto_ctx_release(struct bpf_crypto_ctx *ctx)
{
if (refcount_dec_and_test(&ctx->usage))
call_rcu(&ctx->rcu, crypto_free_cb);
}
static int bpf_crypto_crypt(const struct bpf_crypto_ctx *ctx,
const struct bpf_dynptr_kern *src,
const struct bpf_dynptr_kern *dst,
const struct bpf_dynptr_kern *siv,
bool decrypt)
{
u32 src_len, dst_len, siv_len;
const u8 *psrc;
u8 *pdst, *piv;
int err;
if (__bpf_dynptr_is_rdonly(dst))
return -EINVAL;
siv_len = __bpf_dynptr_size(siv);
src_len = __bpf_dynptr_size(src);
dst_len = __bpf_dynptr_size(dst);
if (!src_len || !dst_len)
return -EINVAL;
if (siv_len != ctx->siv_len)
return -EINVAL;
psrc = __bpf_dynptr_data(src, src_len);
if (!psrc)
return -EINVAL;
pdst = __bpf_dynptr_data_rw(dst, dst_len);
if (!pdst)
return -EINVAL;
piv = siv_len ? __bpf_dynptr_data_rw(siv, siv_len) : NULL;
if (siv_len && !piv)
return -EINVAL;
err = decrypt ? ctx->type->decrypt(ctx->tfm, psrc, pdst, src_len, piv)
: ctx->type->encrypt(ctx->tfm, psrc, pdst, src_len, piv);
return err;
}
/**
* bpf_crypto_decrypt() - Decrypt buffer using configured context and IV provided.
* @ctx: The crypto context being used. The ctx must be a trusted pointer.
* @src: bpf_dynptr to the encrypted data. Must be a trusted pointer.
* @dst: bpf_dynptr to the buffer where to store the result. Must be a trusted pointer.
* @siv: bpf_dynptr to IV data and state data to be used by decryptor.
*
* Decrypts provided buffer using IV data and the crypto context. Crypto context must be configured.
*/
__bpf_kfunc int bpf_crypto_decrypt(struct bpf_crypto_ctx *ctx,
const struct bpf_dynptr_kern *src,
const struct bpf_dynptr_kern *dst,
const struct bpf_dynptr_kern *siv)
{
return bpf_crypto_crypt(ctx, src, dst, siv, true);
}
/**
* bpf_crypto_encrypt() - Encrypt buffer using configured context and IV provided.
* @ctx: The crypto context being used. The ctx must be a trusted pointer.
* @src: bpf_dynptr to the plain data. Must be a trusted pointer.
* @dst: bpf_dynptr to buffer where to store the result. Must be a trusted pointer.
* @siv: bpf_dynptr to IV data and state data to be used by encryptor.
*
* Encrypts provided buffer using IV data and the crypto context. Crypto context must be configured.
*/
__bpf_kfunc int bpf_crypto_encrypt(struct bpf_crypto_ctx *ctx,
const struct bpf_dynptr_kern *src,
const struct bpf_dynptr_kern *dst,
const struct bpf_dynptr_kern *siv)
{
return bpf_crypto_crypt(ctx, src, dst, siv, false);
}
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(crypt_init_kfunc_btf_ids)
BTF_ID_FLAGS(func, bpf_crypto_ctx_create, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_crypto_ctx_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_crypto_ctx_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_KFUNCS_END(crypt_init_kfunc_btf_ids)
static const struct btf_kfunc_id_set crypt_init_kfunc_set = {
.owner = THIS_MODULE,
.set = &crypt_init_kfunc_btf_ids,
};
BTF_KFUNCS_START(crypt_kfunc_btf_ids)
BTF_ID_FLAGS(func, bpf_crypto_decrypt, KF_RCU)
BTF_ID_FLAGS(func, bpf_crypto_encrypt, KF_RCU)
BTF_KFUNCS_END(crypt_kfunc_btf_ids)
static const struct btf_kfunc_id_set crypt_kfunc_set = {
.owner = THIS_MODULE,
.set = &crypt_kfunc_btf_ids,
};
BTF_ID_LIST(bpf_crypto_dtor_ids)
BTF_ID(struct, bpf_crypto_ctx)
BTF_ID(func, bpf_crypto_ctx_release)
static int __init crypto_kfunc_init(void)
{
int ret;
const struct btf_id_dtor_kfunc bpf_crypto_dtors[] = {
{
.btf_id = bpf_crypto_dtor_ids[0],
.kfunc_btf_id = bpf_crypto_dtor_ids[1]
},
};
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &crypt_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &crypt_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &crypt_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL,
&crypt_init_kfunc_set);
return ret ?: register_btf_id_dtor_kfuncs(bpf_crypto_dtors,
ARRAY_SIZE(bpf_crypto_dtors),
THIS_MODULE);
}
late_initcall(crypto_kfunc_init);
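
For orientation, a BPF-side sketch of driving these kfuncs from a sleepable syscall program (not part of this diff; the __ksym prototypes are assumptions modeled on the accompanying selftests, and struct bpf_crypto_params comes from vmlinux.h):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

struct bpf_crypto_ctx *bpf_crypto_ctx_create(const struct bpf_crypto_params *params,
					     u32 params__sz, int *err) __ksym;
void bpf_crypto_ctx_release(struct bpf_crypto_ctx *ctx) __ksym;

struct ctx_slot {
	struct bpf_crypto_ctx __kptr * ctx;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, u32);
	__type(value, struct ctx_slot);
} crypto_ctxs SEC(".maps");

SEC("syscall")	/* sleepable: ctx_create allocates with GFP_KERNEL */
int crypto_setup(void *args)
{
	struct bpf_crypto_params params = {
		.type = "skcipher",	/* fits the 14-byte type field */
		.algo = "ecb(aes)",
		.key_len = 16,
	};
	struct bpf_crypto_ctx *cctx, *old;
	struct ctx_slot *slot;
	u32 key = 0;
	int err = 0;

	__builtin_memcpy(params.key, "0123456789abcdef", 16);

	slot = bpf_map_lookup_elem(&crypto_ctxs, &key);
	if (!slot)
		return -1;

	cctx = bpf_crypto_ctx_create(&params, sizeof(params), &err);
	if (!cctx)
		return err;

	/* stash as a kptr (or release); a tc/XDP program can later take it
	 * and call bpf_crypto_encrypt()/decrypt() on packet-backed dynptrs
	 */
	old = bpf_kptr_xchg(&slot->ctx, cctx);
	if (old)
		bpf_crypto_ctx_release(old);
	return 0;
}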


@ -172,6 +172,17 @@ static bool is_addr_space_cast(const struct bpf_insn *insn)
insn->off == BPF_ADDR_SPACE_CAST;
}
/* Special (internal-only) form of mov, used to resolve per-CPU addrs:
* dst_reg = src_reg + <percpu_base_off>
* BPF_ADDR_PERCPU is used as a special insn->off value.
*/
#define BPF_ADDR_PERCPU (-1)
static inline bool is_mov_percpu_addr(const struct bpf_insn *insn)
{
return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->off == BPF_ADDR_PERCPU;
}
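
The matching instruction-builder macro is added to include/linux/filter.h (not shown in this excerpt); to satisfy is_mov_percpu_addr() above it has to expand to roughly this sketch:

#define BPF_MOV64_PERCPU_REG(DST, SRC)				\
	((struct bpf_insn) {					\
		.code  = BPF_ALU64 | BPF_MOV | BPF_X,		\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = BPF_ADDR_PERCPU,			\
		.imm   = 0 })

The hashtab hunk further down uses it to finish the inlined per-CPU hashmap lookup.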
void print_bpf_insn(const struct bpf_insn_cbs *cbs,
const struct bpf_insn *insn,
bool allow_ptr_leaks)
@ -194,6 +205,9 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
verbose(cbs->private_data, "(%02x) r%d = addr_space_cast(r%d, %d, %d)\n",
insn->code, insn->dst_reg,
insn->src_reg, ((u32)insn->imm) >> 16, (u16)insn->imm);
} else if (is_mov_percpu_addr(insn)) {
verbose(cbs->private_data, "(%02x) r%d = &(void __percpu *)(r%d)\n",
insn->code, insn->dst_reg, insn->src_reg);
} else if (BPF_SRC(insn->code) == BPF_X) {
verbose(cbs->private_data, "(%02x) %c%d %s %s%c%d\n",
insn->code, class == BPF_ALU ? 'w' : 'r',


@ -240,6 +240,26 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
}
}
static void htab_free_prealloced_wq(struct bpf_htab *htab)
{
u32 num_entries = htab->map.max_entries;
int i;
if (!btf_record_has_field(htab->map.record, BPF_WORKQUEUE))
return;
if (htab_has_extra_elems(htab))
num_entries += num_possible_cpus();
for (i = 0; i < num_entries; i++) {
struct htab_elem *elem;
elem = get_htab_elem(htab, i);
bpf_obj_free_workqueue(htab->map.record,
elem->key + round_up(htab->map.key_size, 8));
cond_resched();
}
}
static void htab_free_prealloced_fields(struct bpf_htab *htab)
{
u32 num_entries = htab->map.max_entries;
@ -1490,11 +1510,12 @@ static void delete_all_elements(struct bpf_htab *htab)
hlist_nulls_del_rcu(&l->hash_node);
htab_elem_free(htab, l);
}
cond_resched();
}
migrate_enable();
}
static void htab_free_malloced_timers(struct bpf_htab *htab)
static void htab_free_malloced_timers_or_wq(struct bpf_htab *htab, bool is_timer)
{
int i;
@ -1506,24 +1527,35 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
hlist_nulls_for_each_entry(l, n, head, hash_node) {
/* We only free timer on uref dropping to zero */
bpf_obj_free_timer(htab->map.record, l->key + round_up(htab->map.key_size, 8));
if (is_timer)
bpf_obj_free_timer(htab->map.record,
l->key + round_up(htab->map.key_size, 8));
else
bpf_obj_free_workqueue(htab->map.record,
l->key + round_up(htab->map.key_size, 8));
}
cond_resched_rcu();
}
rcu_read_unlock();
}
static void htab_map_free_timers(struct bpf_map *map)
static void htab_map_free_timers_and_wq(struct bpf_map *map)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
/* We only free timer on uref dropping to zero */
if (!btf_record_has_field(htab->map.record, BPF_TIMER))
return;
if (!htab_is_prealloc(htab))
htab_free_malloced_timers(htab);
else
htab_free_prealloced_timers(htab);
/* We only free timer and workqueue on uref dropping to zero */
if (btf_record_has_field(htab->map.record, BPF_TIMER)) {
if (!htab_is_prealloc(htab))
htab_free_malloced_timers_or_wq(htab, true);
else
htab_free_prealloced_timers(htab);
}
if (btf_record_has_field(htab->map.record, BPF_WORKQUEUE)) {
if (!htab_is_prealloc(htab))
htab_free_malloced_timers_or_wq(htab, false);
else
htab_free_prealloced_wq(htab);
}
}
/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
@ -1538,7 +1570,7 @@ static void htab_map_free(struct bpf_map *map)
*/
/* htab no longer uses call_rcu() directly. bpf_mem_alloc does it
* underneath and is reponsible for waiting for callbacks to finish
* underneath and is responsible for waiting for callbacks to finish
* during bpf_mem_alloc_destroy().
*/
if (!htab_is_prealloc(htab)) {
@ -2259,7 +2291,7 @@ const struct bpf_map_ops htab_map_ops = {
.map_alloc = htab_map_alloc,
.map_free = htab_map_free,
.map_get_next_key = htab_map_get_next_key,
.map_release_uref = htab_map_free_timers,
.map_release_uref = htab_map_free_timers_and_wq,
.map_lookup_elem = htab_map_lookup_elem,
.map_lookup_and_delete_elem = htab_map_lookup_and_delete_elem,
.map_update_elem = htab_map_update_elem,
@ -2280,7 +2312,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_alloc = htab_map_alloc,
.map_free = htab_map_free,
.map_get_next_key = htab_map_get_next_key,
.map_release_uref = htab_map_free_timers,
.map_release_uref = htab_map_free_timers_and_wq,
.map_lookup_elem = htab_lru_map_lookup_elem,
.map_lookup_and_delete_elem = htab_lru_map_lookup_and_delete_elem,
.map_lookup_elem_sys_only = htab_lru_map_lookup_elem_sys,
@ -2307,6 +2339,26 @@ static void *htab_percpu_map_lookup_elem(struct bpf_map *map, void *key)
return NULL;
}
/* inline bpf_map_lookup_elem() call for per-CPU hashmap */
static int htab_percpu_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
struct bpf_insn *insn = insn_buf;
if (!bpf_jit_supports_percpu_insn())
return -EOPNOTSUPP;
BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem,
(void *(*)(struct bpf_map *map, void *key))NULL));
*insn++ = BPF_EMIT_CALL(__htab_map_lookup_elem);
*insn++ = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3);
*insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0,
offsetof(struct htab_elem, key) + map->key_size);
*insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
*insn++ = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
return insn - insn_buf;
}
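
In C terms, the five instructions emitted above compute roughly the following (sketch; for per-CPU htabs the per-CPU value pointer is stored right after the key inside the element):

l = __htab_map_lookup_elem(map, key);
if (!l)
	return NULL;
pptr = *(void __percpu **)(l->key + map->key_size);
return this_cpu_ptr(pptr);	/* BPF_MOV64_PERCPU_REG adds this CPU's base offset */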
static void *htab_percpu_map_lookup_percpu_elem(struct bpf_map *map, void *key, u32 cpu)
{
struct htab_elem *l;
@ -2435,6 +2487,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
.map_free = htab_map_free,
.map_get_next_key = htab_map_get_next_key,
.map_lookup_elem = htab_percpu_map_lookup_elem,
.map_gen_lookup = htab_percpu_map_gen_lookup,
.map_lookup_and_delete_elem = htab_percpu_map_lookup_and_delete_elem,
.map_update_elem = htab_percpu_map_update_elem,
.map_delete_elem = htab_map_delete_elem,


@ -1079,11 +1079,20 @@ const struct bpf_func_proto bpf_snprintf_proto = {
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
};
struct bpf_async_cb {
struct bpf_map *map;
struct bpf_prog *prog;
void __rcu *callback_fn;
void *value;
struct rcu_head rcu;
u64 flags;
};
/* BPF map elements can contain 'struct bpf_timer'.
* Such map owns all of its BPF timers.
* 'struct bpf_timer' is allocated as part of map element allocation
* and it's zero initialized.
* That space is used to keep 'struct bpf_timer_kern'.
* That space is used to keep 'struct bpf_async_kern'.
* bpf_timer_init() allocates 'struct bpf_hrtimer', inits hrtimer, and
* remembers 'struct bpf_map *' pointer it's part of.
* bpf_timer_set_callback() increments prog refcnt and assign bpf callback_fn.
@ -1096,17 +1105,23 @@ const struct bpf_func_proto bpf_snprintf_proto = {
* freeing the timers when inner map is replaced or deleted by user space.
*/
struct bpf_hrtimer {
struct bpf_async_cb cb;
struct hrtimer timer;
struct bpf_map *map;
struct bpf_prog *prog;
void __rcu *callback_fn;
void *value;
struct rcu_head rcu;
};
/* the actual struct hidden inside uapi struct bpf_timer */
struct bpf_timer_kern {
struct bpf_hrtimer *timer;
struct bpf_work {
struct bpf_async_cb cb;
struct work_struct work;
struct work_struct delete_work;
};
/* the actual struct hidden inside uapi struct bpf_timer and bpf_wq */
struct bpf_async_kern {
union {
struct bpf_async_cb *cb;
struct bpf_hrtimer *timer;
struct bpf_work *work;
};
/* bpf_spin_lock is used here instead of spinlock_t to make
* sure that it always fits into space reserved by struct bpf_timer
* regardless of LOCKDEP and spinlock debug flags.
@ -1114,19 +1129,24 @@ struct bpf_timer_kern {
struct bpf_spin_lock lock;
} __attribute__((aligned(8)));
enum bpf_async_type {
BPF_ASYNC_TYPE_TIMER = 0,
BPF_ASYNC_TYPE_WQ,
};
static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
{
struct bpf_hrtimer *t = container_of(hrtimer, struct bpf_hrtimer, timer);
struct bpf_map *map = t->map;
void *value = t->value;
struct bpf_map *map = t->cb.map;
void *value = t->cb.value;
bpf_callback_t callback_fn;
void *key;
u32 idx;
BTF_TYPE_EMIT(struct bpf_timer);
callback_fn = rcu_dereference_check(t->callback_fn, rcu_read_lock_bh_held());
callback_fn = rcu_dereference_check(t->cb.callback_fn, rcu_read_lock_bh_held());
if (!callback_fn)
goto out;
@ -1155,46 +1175,112 @@ out:
return HRTIMER_NORESTART;
}
BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map,
u64, flags)
static void bpf_wq_work(struct work_struct *work)
{
clockid_t clockid = flags & (MAX_CLOCKS - 1);
struct bpf_hrtimer *t;
int ret = 0;
struct bpf_work *w = container_of(work, struct bpf_work, work);
struct bpf_async_cb *cb = &w->cb;
struct bpf_map *map = cb->map;
bpf_callback_t callback_fn;
void *value = cb->value;
void *key;
u32 idx;
BUILD_BUG_ON(MAX_CLOCKS != 16);
BUILD_BUG_ON(sizeof(struct bpf_timer_kern) > sizeof(struct bpf_timer));
BUILD_BUG_ON(__alignof__(struct bpf_timer_kern) != __alignof__(struct bpf_timer));
BTF_TYPE_EMIT(struct bpf_wq);
callback_fn = READ_ONCE(cb->callback_fn);
if (!callback_fn)
return;
if (map->map_type == BPF_MAP_TYPE_ARRAY) {
struct bpf_array *array = container_of(map, struct bpf_array, map);
/* compute the key */
idx = ((char *)value - array->value) / array->elem_size;
key = &idx;
} else { /* hash or lru */
key = value - round_up(map->key_size, 8);
}
rcu_read_lock_trace();
migrate_disable();
callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
migrate_enable();
rcu_read_unlock_trace();
}
static void bpf_wq_delete_work(struct work_struct *work)
{
struct bpf_work *w = container_of(work, struct bpf_work, delete_work);
cancel_work_sync(&w->work);
kfree_rcu(w, cb.rcu);
}
static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
enum bpf_async_type type)
{
struct bpf_async_cb *cb;
struct bpf_hrtimer *t;
struct bpf_work *w;
clockid_t clockid;
size_t size;
int ret = 0;
if (in_nmi())
return -EOPNOTSUPP;
if (flags >= MAX_CLOCKS ||
/* similar to timerfd except _ALARM variants are not supported */
(clockid != CLOCK_MONOTONIC &&
clockid != CLOCK_REALTIME &&
clockid != CLOCK_BOOTTIME))
switch (type) {
case BPF_ASYNC_TYPE_TIMER:
size = sizeof(struct bpf_hrtimer);
break;
case BPF_ASYNC_TYPE_WQ:
size = sizeof(struct bpf_work);
break;
default:
return -EINVAL;
__bpf_spin_lock_irqsave(&timer->lock);
t = timer->timer;
}
__bpf_spin_lock_irqsave(&async->lock);
t = async->timer;
if (t) {
ret = -EBUSY;
goto out;
}
/* allocate hrtimer via map_kmalloc to use memcg accounting */
t = bpf_map_kmalloc_node(map, sizeof(*t), GFP_ATOMIC, map->numa_node);
if (!t) {
cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC, map->numa_node);
if (!cb) {
ret = -ENOMEM;
goto out;
}
t->value = (void *)timer - map->record->timer_off;
t->map = map;
t->prog = NULL;
rcu_assign_pointer(t->callback_fn, NULL);
hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
t->timer.function = bpf_timer_cb;
WRITE_ONCE(timer->timer, t);
/* Guarantee the order between timer->timer and map->usercnt. So
switch (type) {
case BPF_ASYNC_TYPE_TIMER:
clockid = flags & (MAX_CLOCKS - 1);
t = (struct bpf_hrtimer *)cb;
hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
t->timer.function = bpf_timer_cb;
cb->value = (void *)async - map->record->timer_off;
break;
case BPF_ASYNC_TYPE_WQ:
w = (struct bpf_work *)cb;
INIT_WORK(&w->work, bpf_wq_work);
INIT_WORK(&w->delete_work, bpf_wq_delete_work);
cb->value = (void *)async - map->record->wq_off;
break;
}
cb->map = map;
cb->prog = NULL;
cb->flags = flags;
rcu_assign_pointer(cb->callback_fn, NULL);
WRITE_ONCE(async->cb, cb);
/* Guarantee the order between async->cb and map->usercnt. So
* when there are concurrent uref release and bpf timer init, either
* bpf_timer_cancel_and_free() called by uref release reads a no-NULL
* timer or atomic64_read() below returns a zero usercnt.
@ -1204,15 +1290,34 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
/* maps with timers must be either held by user space
* or pinned in bpffs.
*/
WRITE_ONCE(timer->timer, NULL);
kfree(t);
WRITE_ONCE(async->cb, NULL);
kfree(cb);
ret = -EPERM;
}
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
__bpf_spin_unlock_irqrestore(&async->lock);
return ret;
}
BPF_CALL_3(bpf_timer_init, struct bpf_async_kern *, timer, struct bpf_map *, map,
u64, flags)
{
clock_t clockid = flags & (MAX_CLOCKS - 1);
BUILD_BUG_ON(MAX_CLOCKS != 16);
BUILD_BUG_ON(sizeof(struct bpf_async_kern) > sizeof(struct bpf_timer));
BUILD_BUG_ON(__alignof__(struct bpf_async_kern) != __alignof__(struct bpf_timer));
if (flags >= MAX_CLOCKS ||
/* similar to timerfd except _ALARM variants are not supported */
(clockid != CLOCK_MONOTONIC &&
clockid != CLOCK_REALTIME &&
clockid != CLOCK_BOOTTIME))
return -EINVAL;
return __bpf_async_init(timer, map, flags, BPF_ASYNC_TYPE_TIMER);
}
static const struct bpf_func_proto bpf_timer_init_proto = {
.func = bpf_timer_init,
.gpl_only = true,
@ -1222,22 +1327,23 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
.arg3_type = ARG_ANYTHING,
};
BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
struct bpf_prog_aux *, aux)
static int __bpf_async_set_callback(struct bpf_async_kern *async, void *callback_fn,
struct bpf_prog_aux *aux, unsigned int flags,
enum bpf_async_type type)
{
struct bpf_prog *prev, *prog = aux->prog;
struct bpf_hrtimer *t;
struct bpf_async_cb *cb;
int ret = 0;
if (in_nmi())
return -EOPNOTSUPP;
__bpf_spin_lock_irqsave(&timer->lock);
t = timer->timer;
if (!t) {
__bpf_spin_lock_irqsave(&async->lock);
cb = async->cb;
if (!cb) {
ret = -EINVAL;
goto out;
}
if (!atomic64_read(&t->map->usercnt)) {
if (!atomic64_read(&cb->map->usercnt)) {
/* maps with timers must be either held by user space
* or pinned in bpffs. Otherwise timer might still be
* running even when bpf prog is detached and user space
@ -1246,7 +1352,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
ret = -EPERM;
goto out;
}
prev = t->prog;
prev = cb->prog;
if (prev != prog) {
/* Bump prog refcnt once. Every bpf_timer_set_callback()
* can pick different callback_fn-s within the same prog.
@ -1259,14 +1365,20 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
if (prev)
/* Drop prev prog refcnt when swapping with new prog */
bpf_prog_put(prev);
t->prog = prog;
cb->prog = prog;
}
rcu_assign_pointer(t->callback_fn, callback_fn);
rcu_assign_pointer(cb->callback_fn, callback_fn);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
__bpf_spin_unlock_irqrestore(&async->lock);
return ret;
}
BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callback_fn,
struct bpf_prog_aux *, aux)
{
return __bpf_async_set_callback(timer, callback_fn, aux, 0, BPF_ASYNC_TYPE_TIMER);
}
static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.func = bpf_timer_set_callback,
.gpl_only = true,
@ -1275,7 +1387,7 @@ static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.arg2_type = ARG_PTR_TO_FUNC,
};
BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, flags)
BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, flags)
{
struct bpf_hrtimer *t;
int ret = 0;
@ -1287,7 +1399,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
return -EINVAL;
__bpf_spin_lock_irqsave(&timer->lock);
t = timer->timer;
if (!t || !t->prog) {
if (!t || !t->cb.prog) {
ret = -EINVAL;
goto out;
}
@ -1315,18 +1427,18 @@ static const struct bpf_func_proto bpf_timer_start_proto = {
.arg3_type = ARG_ANYTHING,
};
static void drop_prog_refcnt(struct bpf_hrtimer *t)
static void drop_prog_refcnt(struct bpf_async_cb *async)
{
struct bpf_prog *prog = t->prog;
struct bpf_prog *prog = async->prog;
if (prog) {
bpf_prog_put(prog);
t->prog = NULL;
rcu_assign_pointer(t->callback_fn, NULL);
async->prog = NULL;
rcu_assign_pointer(async->callback_fn, NULL);
}
}
BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer)
{
struct bpf_hrtimer *t;
int ret = 0;
@ -1348,7 +1460,7 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
ret = -EDEADLK;
goto out;
}
drop_prog_refcnt(t);
drop_prog_refcnt(&t->cb);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
/* Cancel the timer and wait for associated callback to finish
@ -1366,36 +1478,44 @@ static const struct bpf_func_proto bpf_timer_cancel_proto = {
.arg1_type = ARG_PTR_TO_TIMER,
};
static struct bpf_async_cb *__bpf_async_cancel_and_free(struct bpf_async_kern *async)
{
struct bpf_async_cb *cb;
/* Performance optimization: read async->cb without lock first. */
if (!READ_ONCE(async->cb))
return NULL;
__bpf_spin_lock_irqsave(&async->lock);
/* re-read it under lock */
cb = async->cb;
if (!cb)
goto out;
drop_prog_refcnt(cb);
/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
* this timer, since it won't be initialized.
*/
WRITE_ONCE(async->cb, NULL);
out:
__bpf_spin_unlock_irqrestore(&async->lock);
return cb;
}
/* This function is called by map_delete/update_elem for individual element and
* by ops->map_release_uref when the user space reference to a map reaches zero.
*/
void bpf_timer_cancel_and_free(void *val)
{
struct bpf_timer_kern *timer = val;
struct bpf_hrtimer *t;
/* Performance optimization: read timer->timer without lock first. */
if (!READ_ONCE(timer->timer))
return;
t = (struct bpf_hrtimer *)__bpf_async_cancel_and_free(val);
__bpf_spin_lock_irqsave(&timer->lock);
/* re-read it under lock */
t = timer->timer;
if (!t)
goto out;
drop_prog_refcnt(t);
/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
* this timer, since it won't be initialized.
*/
WRITE_ONCE(timer->timer, NULL);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
if (!t)
return;
/* Cancel the timer and wait for callback to complete if it was running.
* If hrtimer_cancel() can be safely called it's safe to call kfree(t)
* right after for both preallocated and non-preallocated maps.
* The timer->timer = NULL was already done and no code path can
* The async->cb = NULL was already done and no code path can
* see address 't' anymore.
*
* Check that bpf_map_delete/update_elem() wasn't called from timer
@ -1404,13 +1524,33 @@ out:
* return -1). Though callback_fn is still running on this cpu it's
* safe to do kfree(t) because bpf_timer_cb() read everything it needed
* from 't'. The bpf subprog callback_fn won't be able to access 't',
* since timer->timer = NULL was already done. The timer will be
* since async->cb = NULL was already done. The timer will be
* effectively cancelled because bpf_timer_cb() will return
* HRTIMER_NORESTART.
*/
if (this_cpu_read(hrtimer_running) != t)
hrtimer_cancel(&t->timer);
kfree_rcu(t, rcu);
kfree_rcu(t, cb.rcu);
}
/* This function is called by map_delete/update_elem for individual element and
* by ops->map_release_uref when the user space reference to a map reaches zero.
*/
void bpf_wq_cancel_and_free(void *val)
{
struct bpf_work *work;
BTF_TYPE_EMIT(struct bpf_wq);
work = (struct bpf_work *)__bpf_async_cancel_and_free(val);
if (!work)
return;
/* Trigger cancel of the sleepable work, but *do not* wait for
* it to finish if it was running as we might not be in a
* sleepable context.
* kfree will be called once the work has finished.
*/
schedule_work(&work->delete_work);
}
BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
@ -1443,7 +1583,7 @@ static const struct bpf_func_proto bpf_kptr_xchg_proto = {
#define DYNPTR_SIZE_MASK 0xFFFFFF
#define DYNPTR_RDONLY_BIT BIT(31)
static bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr)
bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr)
{
return ptr->size & DYNPTR_RDONLY_BIT;
}
@ -2412,7 +2552,7 @@ __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 o
/* bpf_dynptr_slice_rdwr is the same logic as bpf_dynptr_slice.
*
* For skb-type dynptrs, it is safe to write into the returned pointer
* if the bpf program allows skb data writes. There are two possiblities
* if the bpf program allows skb data writes. There are two possibilities
* that may occur when calling bpf_dynptr_slice_rdwr:
*
* 1) The requested slice is in the head of the skb. In this case, the
@ -2549,6 +2689,61 @@ __bpf_kfunc void bpf_throw(u64 cookie)
WARN(1, "A call to BPF exception callback should never return\n");
}
__bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags)
{
struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
struct bpf_map *map = p__map;
BUILD_BUG_ON(sizeof(struct bpf_async_kern) > sizeof(struct bpf_wq));
BUILD_BUG_ON(__alignof__(struct bpf_async_kern) != __alignof__(struct bpf_wq));
if (flags)
return -EINVAL;
return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
}
__bpf_kfunc int bpf_wq_start(struct bpf_wq *wq, unsigned int flags)
{
struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
struct bpf_work *w;
if (in_nmi())
return -EOPNOTSUPP;
if (flags)
return -EINVAL;
w = READ_ONCE(async->work);
if (!w || !READ_ONCE(w->cb.prog))
return -EINVAL;
schedule_work(&w->work);
return 0;
}
__bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
unsigned int flags,
void *aux__ign)
{
struct bpf_prog_aux *aux = (struct bpf_prog_aux *)aux__ign;
struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
if (flags)
return -EINVAL;
return __bpf_async_set_callback(async, callback_fn, aux, flags, BPF_ASYNC_TYPE_WQ);
}
__bpf_kfunc void bpf_preempt_disable(void)
{
preempt_disable();
}
__bpf_kfunc void bpf_preempt_enable(void)
{
preempt_enable();
}
__bpf_kfunc_end_defs();
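
A minimal sketch of matched usage from BPF C (kfunc declarations via __ksym, usual includes assumed); the verifier changes later in this series enforce pairing and reject sleepable calls inside the region:

void bpf_preempt_disable(void) __ksym;
void bpf_preempt_enable(void) __ksym;

SEC("tc")
int critical_section(struct __sk_buff *skb)
{
	bpf_preempt_disable();
	/* non-preemptible section: no sleepable helpers/kfuncs and no
	 * global subprog calls until the matching enable below
	 */
	bpf_preempt_enable();
	return 0;
}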
BTF_KFUNCS_START(generic_btf_ids)
@ -2625,6 +2820,12 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
BTF_ID_FLAGS(func, bpf_dynptr_size)
BTF_ID_FLAGS(func, bpf_dynptr_clone)
BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
BTF_ID_FLAGS(func, bpf_wq_init)
BTF_ID_FLAGS(func, bpf_wq_set_callback_impl)
BTF_ID_FLAGS(func, bpf_wq_start)
BTF_ID_FLAGS(func, bpf_preempt_disable)
BTF_ID_FLAGS(func, bpf_preempt_enable)
BTF_KFUNCS_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {
@ -2652,6 +2853,7 @@ static int __init kfunc_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &generic_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &generic_kfunc_set);
ret = ret ?: register_btf_id_dtor_kfuncs(generic_dtors,
ARRAY_SIZE(generic_dtors),
THIS_MODULE);
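
Putting the bpf_wq pieces together, a BPF-side sketch (kfunc prototypes as exposed through tools' bpf_experimental.h are an assumption here; the map/value layout matches the earlier bpf_wq sketch):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __ksym;
int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __ksym;
int bpf_wq_set_callback_impl(struct bpf_wq *wq,
			     int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
			     unsigned int flags, void *aux__ign) __ksym;

struct elem {
	struct bpf_wq work;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 8);
	__type(key, int);
	__type(value, struct elem);
} wq_map SEC(".maps");

static int wq_cb(void *map, int *key, struct bpf_wq *wq)
{
	/* deferred work: runs later from a kernel workqueue, in sleepable context */
	return 0;
}

SEC("tc")
int defer_work(struct __sk_buff *skb)
{
	struct elem *val;
	int key = 0;

	val = bpf_map_lookup_elem(&wq_map, &key);
	if (!val)
		return 0;
	if (bpf_wq_init(&val->work, &wq_map, 0))
		return 0;
	if (bpf_wq_set_callback_impl(&val->work, wq_cb, 0, NULL))
		return 0;
	bpf_wq_start(&val->work, 0);
	return 0;
}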


@ -467,9 +467,9 @@ const char *reg_type_str(struct bpf_verifier_env *env, enum bpf_reg_type type)
if (type & PTR_MAYBE_NULL) {
if (base_type(type) == PTR_TO_BTF_ID)
strncpy(postfix, "or_null_", 16);
strscpy(postfix, "or_null_");
else
strncpy(postfix, "_or_null", 16);
strscpy(postfix, "_or_null");
}
snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s%s",


@ -316,6 +316,7 @@ static long trie_update_elem(struct bpf_map *map,
{
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
struct lpm_trie_node *node, *im_node = NULL, *new_node = NULL;
struct lpm_trie_node *free_node = NULL;
struct lpm_trie_node __rcu **slot;
struct bpf_lpm_trie_key_u8 *key = _key;
unsigned long irq_flags;
@ -390,7 +391,7 @@ static long trie_update_elem(struct bpf_map *map,
trie->n_entries--;
rcu_assign_pointer(*slot, new_node);
kfree_rcu(node, rcu);
free_node = node;
goto out;
}
@ -437,6 +438,7 @@ out:
}
spin_unlock_irqrestore(&trie->lock, irq_flags);
kfree_rcu(free_node, rcu);
return ret;
}
@ -445,6 +447,7 @@ out:
static long trie_delete_elem(struct bpf_map *map, void *_key)
{
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
struct lpm_trie_node *free_node = NULL, *free_parent = NULL;
struct bpf_lpm_trie_key_u8 *key = _key;
struct lpm_trie_node __rcu **trim, **trim2;
struct lpm_trie_node *node, *parent;
@ -514,8 +517,8 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
else
rcu_assign_pointer(
*trim2, rcu_access_pointer(parent->child[0]));
kfree_rcu(parent, rcu);
kfree_rcu(node, rcu);
free_parent = parent;
free_node = node;
goto out;
}
@ -529,10 +532,12 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
rcu_assign_pointer(*trim, rcu_access_pointer(node->child[1]));
else
RCU_INIT_POINTER(*trim, NULL);
kfree_rcu(node, rcu);
free_node = node;
out:
spin_unlock_irqrestore(&trie->lock, irq_flags);
kfree_rcu(free_parent, rcu);
kfree_rcu(free_node, rcu);
return ret;
}


@ -559,6 +559,7 @@ void btf_record_free(struct btf_record *rec)
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_REFCOUNT:
case BPF_WORKQUEUE:
/* Nothing to release */
break;
default:
@ -608,6 +609,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_REFCOUNT:
case BPF_WORKQUEUE:
/* Nothing to acquire */
break;
default:
@ -659,6 +661,13 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
bpf_timer_cancel_and_free(obj + rec->timer_off);
}
void bpf_obj_free_workqueue(const struct btf_record *rec, void *obj)
{
if (WARN_ON_ONCE(!btf_record_has_field(rec, BPF_WORKQUEUE)))
return;
bpf_wq_cancel_and_free(obj + rec->wq_off);
}
void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
{
const struct btf_field *fields;
@ -679,6 +688,9 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
case BPF_TIMER:
bpf_timer_cancel_and_free(field_ptr);
break;
case BPF_WORKQUEUE:
bpf_wq_cancel_and_free(field_ptr);
break;
case BPF_KPTR_UNREF:
WRITE_ONCE(*(u64 *)field_ptr, 0);
break;
@ -1085,7 +1097,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
map->record = btf_parse_fields(btf, value_type,
BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
BPF_RB_ROOT | BPF_REFCOUNT,
BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE,
map->value_size);
if (!IS_ERR_OR_NULL(map->record)) {
int i;
@ -1115,6 +1127,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
}
break;
case BPF_TIMER:
case BPF_WORKQUEUE:
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {
@ -5242,6 +5255,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
case BPF_PROG_TYPE_SK_LOOKUP:
ret = netns_bpf_link_create(attr, prog);
break;
case BPF_PROG_TYPE_SK_MSG:
case BPF_PROG_TYPE_SK_SKB:
ret = sock_map_link_create(attr, prog);
break;
#ifdef CONFIG_NET
case BPF_PROG_TYPE_XDP:
ret = bpf_xdp_link_attach(attr, prog);
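
From user space, the new sk_msg/sk_skb link path can be driven with libbpf's generic link API; a sketch (attach-type choice and FD plumbing are illustrative, not taken from this diff):

#include <bpf/bpf.h>

/* Attach a sk_skb verdict program to a sockmap via a BPF link; closing the
 * returned link FD detaches it, unlike the legacy bpf_prog_attach() path.
 */
int attach_verdict_link(int prog_fd, int sockmap_fd)
{
	return bpf_link_create(prog_fd, sockmap_fd, BPF_SK_SKB_VERDICT, NULL);
}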


@ -9,8 +9,8 @@
#include <linux/sysfs.h>
/* See scripts/link-vmlinux.sh, gen_btf() func for details */
extern char __weak __start_BTF[];
extern char __weak __stop_BTF[];
extern char __start_BTF[];
extern char __stop_BTF[];
static ssize_t
btf_vmlinux_read(struct file *file, struct kobject *kobj,
@ -32,7 +32,7 @@ static int __init btf_vmlinux_init(void)
{
bin_attr_btf_vmlinux.size = __stop_BTF - __start_BTF;
if (!__start_BTF || bin_attr_btf_vmlinux.size == 0)
if (bin_attr_btf_vmlinux.size == 0)
return 0;
btf_kobj = kobject_create_and_add("btf", kernel_kobj);


@ -885,12 +885,13 @@ static void notrace update_prog_stats(struct bpf_prog *prog,
* Hence check that 'start' is valid.
*/
start > NO_START_TIME) {
u64 duration = sched_clock() - start;
unsigned long flags;
stats = this_cpu_ptr(prog->stats);
flags = u64_stats_update_begin_irqsave(&stats->syncp);
u64_stats_inc(&stats->cnt);
u64_stats_add(&stats->nsecs, sched_clock() - start);
u64_stats_add(&stats->nsecs, duration);
u64_stats_update_end_irqrestore(&stats->syncp, flags);
}
}


@ -172,7 +172,7 @@ static bool bpf_global_percpu_ma_set;
/* verifier_state + insn_idx are pushed to stack when branch is encountered */
struct bpf_verifier_stack_elem {
/* verifer state is 'st'
/* verifier state is 'st'
* before processing instruction 'insn_idx'
* and after processing instruction 'prev_insn_idx'
*/
@ -190,11 +190,6 @@ struct bpf_verifier_stack_elem {
#define BPF_MAP_KEY_POISON (1ULL << 63)
#define BPF_MAP_KEY_SEEN (1ULL << 62)
#define BPF_MAP_PTR_UNPRIV 1UL
#define BPF_MAP_PTR_POISON ((void *)((0xeB9FUL << 1) + \
POISON_POINTER_DELTA))
#define BPF_MAP_PTR(X) ((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
#define BPF_GLOBAL_PERCPU_MA_MAX_SIZE 512
static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
@ -209,21 +204,22 @@ static bool is_trusted_reg(const struct bpf_reg_state *reg);
static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
{
return BPF_MAP_PTR(aux->map_ptr_state) == BPF_MAP_PTR_POISON;
return aux->map_ptr_state.poison;
}
static bool bpf_map_ptr_unpriv(const struct bpf_insn_aux_data *aux)
{
return aux->map_ptr_state & BPF_MAP_PTR_UNPRIV;
return aux->map_ptr_state.unpriv;
}
static void bpf_map_ptr_store(struct bpf_insn_aux_data *aux,
const struct bpf_map *map, bool unpriv)
struct bpf_map *map,
bool unpriv, bool poison)
{
BUILD_BUG_ON((unsigned long)BPF_MAP_PTR_POISON & BPF_MAP_PTR_UNPRIV);
unpriv |= bpf_map_ptr_unpriv(aux);
aux->map_ptr_state = (unsigned long)map |
(unpriv ? BPF_MAP_PTR_UNPRIV : 0UL);
aux->map_ptr_state.unpriv = unpriv;
aux->map_ptr_state.poison = poison;
aux->map_ptr_state.map_ptr = map;
}
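
The old BPF_MAP_PTR()/BPF_MAP_PTR_POISON bit-packing is gone: aux->map_ptr_state is now a small struct defined in bpf_verifier.h, roughly (sketch of the accompanying header change, not shown in this excerpt):

struct bpf_map_ptr_state {
	struct bpf_map *map_ptr;
	bool poison;
	bool unpriv;
};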
static bool bpf_map_key_poisoned(const struct bpf_insn_aux_data *aux)
@ -336,6 +332,10 @@ struct bpf_kfunc_call_arg_meta {
u8 spi;
u8 frameno;
} iter;
struct {
struct bpf_map *ptr;
int uid;
} map;
u64 mem_size;
};
@ -501,8 +501,12 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
}
static bool is_sync_callback_calling_kfunc(u32 btf_id);
static bool is_async_callback_calling_kfunc(u32 btf_id);
static bool is_callback_calling_kfunc(u32 btf_id);
static bool is_bpf_throw_kfunc(struct bpf_insn *insn);
static bool is_bpf_wq_set_callback_impl_kfunc(u32 btf_id);
static bool is_sync_callback_calling_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_for_each_map_elem ||
@ -530,7 +534,8 @@ static bool is_sync_callback_calling_insn(struct bpf_insn *insn)
static bool is_async_callback_calling_insn(struct bpf_insn *insn)
{
return bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm);
return (bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm)) ||
(bpf_pseudo_kfunc_call(insn) && is_async_callback_calling_kfunc(insn->imm));
}
static bool is_may_goto_insn(struct bpf_insn *insn)
@ -1429,6 +1434,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
}
dst_state->speculative = src->speculative;
dst_state->active_rcu_lock = src->active_rcu_lock;
dst_state->active_preempt_lock = src->active_preempt_lock;
dst_state->in_sleepable = src->in_sleepable;
dst_state->curframe = src->curframe;
dst_state->active_lock.ptr = src->active_lock.ptr;
dst_state->active_lock.id = src->active_lock.id;
@ -1842,6 +1849,8 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
*/
if (btf_record_has_field(map->inner_map_meta->record, BPF_TIMER))
reg->map_uid = reg->id;
if (btf_record_has_field(map->inner_map_meta->record, BPF_WORKQUEUE))
reg->map_uid = reg->id;
} else if (map->map_type == BPF_MAP_TYPE_XSKMAP) {
reg->type = PTR_TO_XDP_SOCK;
} else if (map->map_type == BPF_MAP_TYPE_SOCKMAP ||
@ -2135,7 +2144,7 @@ static void __reg64_deduce_bounds(struct bpf_reg_state *reg)
static void __reg_deduce_mixed_bounds(struct bpf_reg_state *reg)
{
/* Try to tighten 64-bit bounds from 32-bit knowledge, using 32-bit
* values on both sides of 64-bit range in hope to have tigher range.
* values on both sides of 64-bit range in hope to have tighter range.
* E.g., if r1 is [0x1'00000000, 0x3'80000000], and we learn from
* 32-bit signed > 0 operation that s32 bounds are now [1; 0x7fffffff].
* With this, we can substitute 1 as low 32-bits of _low_ 64-bit bound
@ -2143,7 +2152,7 @@ static void __reg_deduce_mixed_bounds(struct bpf_reg_state *reg)
* _high_ 64-bit bound (0x380000000 -> 0x37fffffff) and arrive at a
* better overall bounds for r1 as [0x1'000000001; 0x3'7fffffff].
* We just need to make sure that derived bounds we are intersecting
* with are well-formed ranges in respecitve s64 or u64 domain, just
* with are well-formed ranges in respective s64 or u64 domain, just
* like we do with similar kinds of 32-to-64 or 64-to-32 adjustments.
*/
__u64 new_umin, new_umax;
@ -2402,7 +2411,7 @@ static void init_func_state(struct bpf_verifier_env *env,
/* Similar to push_stack(), but for async callbacks */
static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx,
int subprog)
int subprog, bool is_sleepable)
{
struct bpf_verifier_stack_elem *elem;
struct bpf_func_state *frame;
@ -2429,6 +2438,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
* Initialize it similar to do_check_common().
*/
elem->st.branches = 1;
elem->st.in_sleepable = is_sleepable;
frame = kzalloc(sizeof(*frame), GFP_KERNEL);
if (!frame)
goto err;
@ -3615,7 +3625,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
* sreg needs precision before this insn
*/
bt_clear_reg(bt, dreg);
bt_set_reg(bt, sreg);
if (sreg != BPF_REG_FP)
bt_set_reg(bt, sreg);
} else {
/* dreg = K
* dreg needs precision after this insn.
@ -3631,7 +3642,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
* both dreg and sreg need precision
* before this insn
*/
bt_set_reg(bt, sreg);
if (sreg != BPF_REG_FP)
bt_set_reg(bt, sreg);
} /* else dreg += K
* dreg still needs precision before this insn
*/
@ -5274,7 +5286,8 @@ bad_type:
static bool in_sleepable(struct bpf_verifier_env *env)
{
return env->prog->sleepable;
return env->prog->sleepable ||
(env->cur_state && env->cur_state->in_sleepable);
}
/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
@ -5297,6 +5310,7 @@ BTF_ID(struct, cgroup)
BTF_ID(struct, bpf_cpumask)
#endif
BTF_ID(struct, task_struct)
BTF_ID(struct, bpf_crypto_ctx)
BTF_SET_END(rcu_protected_types)
static bool rcu_protected_object(const struct btf *btf, u32 btf_id)
@ -6972,6 +6986,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
return err;
}
static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
bool allow_trust_mismatch);
static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
{
int load_reg;
@ -7032,7 +7049,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
is_pkt_reg(env, insn->dst_reg) ||
is_flow_key_reg(env, insn->dst_reg) ||
is_sk_reg(env, insn->dst_reg) ||
is_arena_reg(env, insn->dst_reg)) {
(is_arena_reg(env, insn->dst_reg) && !bpf_jit_supports_insn(insn, true))) {
verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
insn->dst_reg,
reg_type_str(env, reg_state(env, insn->dst_reg)->type));
@ -7068,6 +7085,11 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
if (err)
return err;
if (is_arena_reg(env, insn->dst_reg)) {
err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
if (err)
return err;
}
/* Check whether we can write into the same memory. */
err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
@ -7590,6 +7612,23 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
return 0;
}
static int process_wq_func(struct bpf_verifier_env *env, int regno,
struct bpf_kfunc_call_arg_meta *meta)
{
struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
struct bpf_map *map = reg->map_ptr;
u64 val = reg->var_off.value;
if (map->record->wq_off != val + reg->off) {
verbose(env, "off %lld doesn't point to 'struct bpf_wq' that is at %d\n",
val + reg->off, map->record->wq_off);
return -EINVAL;
}
meta->map.uid = reg->map_uid;
meta->map.ptr = map;
return 0;
}
static int process_kptr_func(struct bpf_verifier_env *env, int regno,
struct bpf_call_arg_meta *meta)
{
@ -9484,7 +9523,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
*/
env->subprog_info[subprog].is_cb = true;
if (bpf_pseudo_kfunc_call(insn) &&
!is_sync_callback_calling_kfunc(insn->imm)) {
!is_callback_calling_kfunc(insn->imm)) {
verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
func_id_name(insn->imm), insn->imm);
return -EFAULT;
@ -9498,10 +9537,11 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
if (is_async_callback_calling_insn(insn)) {
struct bpf_verifier_state *async_cb;
/* there is no real recursion here. timer callbacks are async */
/* there is no real recursion here. timer and workqueue callbacks are async */
env->subprog_info[subprog].is_async_cb = true;
async_cb = push_async_cb(env, env->subprog_info[subprog].start,
insn_idx, subprog);
insn_idx, subprog,
is_bpf_wq_set_callback_impl_kfunc(insn->imm));
if (!async_cb)
return -EFAULT;
callee = async_cb->frame[0];
@ -9561,6 +9601,13 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return -EINVAL;
}
/* Only global subprogs cannot be called with preemption disabled. */
if (env->cur_state->active_preempt_lock) {
verbose(env, "global function calls are not allowed with preemption disabled,\n"
"use static function instead\n");
return -EINVAL;
}
if (err) {
verbose(env, "Caller passes invalid args into func#%d ('%s')\n",
subprog, sub_name);
@ -9653,12 +9700,8 @@ static int set_map_elem_callback_state(struct bpf_verifier_env *env,
struct bpf_map *map;
int err;
if (bpf_map_ptr_poisoned(insn_aux)) {
verbose(env, "tail_call abusing map_ptr\n");
return -EINVAL;
}
map = BPF_MAP_PTR(insn_aux->map_ptr_state);
/* valid map_ptr and poison value does not matter */
map = insn_aux->map_ptr_state.map_ptr;
if (!map->ops->map_set_for_each_callback_args ||
!map->ops->map_for_each_callback) {
verbose(env, "callback function not allowed for map\n");
@ -10017,12 +10060,12 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
return -EACCES;
}
if (!BPF_MAP_PTR(aux->map_ptr_state))
if (!aux->map_ptr_state.map_ptr)
bpf_map_ptr_store(aux, meta->map_ptr,
!meta->map_ptr->bypass_spec_v1);
else if (BPF_MAP_PTR(aux->map_ptr_state) != meta->map_ptr)
bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON,
!meta->map_ptr->bypass_spec_v1);
!meta->map_ptr->bypass_spec_v1, false);
else if (aux->map_ptr_state.map_ptr != meta->map_ptr)
bpf_map_ptr_store(aux, meta->map_ptr,
!meta->map_ptr->bypass_spec_v1, true);
return 0;
}
@ -10201,8 +10244,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
if (env->ops->get_func_proto)
fn = env->ops->get_func_proto(func_id, env->prog);
if (!fn) {
verbose(env, "unknown func %s#%d\n", func_id_name(func_id),
func_id);
verbose(env, "program of this type cannot use helper %s#%d\n",
func_id_name(func_id), func_id);
return -EINVAL;
}
@ -10251,6 +10294,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
}
if (env->cur_state->active_preempt_lock) {
if (fn->might_sleep) {
verbose(env, "sleepable helper %s#%d in non-preemptible region\n",
func_id_name(func_id), func_id);
return -EINVAL;
}
if (in_sleepable(env) && is_storage_get_function(func_id))
env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
}
meta.func_id = func_id;
/* check args */
for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
@ -10839,6 +10893,7 @@ enum {
KF_ARG_LIST_NODE_ID,
KF_ARG_RB_ROOT_ID,
KF_ARG_RB_NODE_ID,
KF_ARG_WORKQUEUE_ID,
};
BTF_ID_LIST(kf_arg_btf_ids)
@ -10847,6 +10902,7 @@ BTF_ID(struct, bpf_list_head)
BTF_ID(struct, bpf_list_node)
BTF_ID(struct, bpf_rb_root)
BTF_ID(struct, bpf_rb_node)
BTF_ID(struct, bpf_wq)
static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
const struct btf_param *arg, int type)
@ -10890,6 +10946,11 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
}
static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg)
{
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID);
}
static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
const struct btf_param *arg)
{
@ -10959,6 +11020,7 @@ enum kfunc_ptr_arg_type {
KF_ARG_PTR_TO_NULL,
KF_ARG_PTR_TO_CONST_STR,
KF_ARG_PTR_TO_MAP,
KF_ARG_PTR_TO_WORKQUEUE,
};
enum special_kfunc_type {
@ -10984,6 +11046,9 @@ enum special_kfunc_type {
KF_bpf_percpu_obj_new_impl,
KF_bpf_percpu_obj_drop_impl,
KF_bpf_throw,
KF_bpf_wq_set_callback_impl,
KF_bpf_preempt_disable,
KF_bpf_preempt_enable,
KF_bpf_iter_css_task_new,
};
@ -11008,6 +11073,7 @@ BTF_ID(func, bpf_dynptr_clone)
BTF_ID(func, bpf_percpu_obj_new_impl)
BTF_ID(func, bpf_percpu_obj_drop_impl)
BTF_ID(func, bpf_throw)
BTF_ID(func, bpf_wq_set_callback_impl)
#ifdef CONFIG_CGROUPS
BTF_ID(func, bpf_iter_css_task_new)
#endif
@ -11036,6 +11102,9 @@ BTF_ID(func, bpf_dynptr_clone)
BTF_ID(func, bpf_percpu_obj_new_impl)
BTF_ID(func, bpf_percpu_obj_drop_impl)
BTF_ID(func, bpf_throw)
BTF_ID(func, bpf_wq_set_callback_impl)
BTF_ID(func, bpf_preempt_disable)
BTF_ID(func, bpf_preempt_enable)
#ifdef CONFIG_CGROUPS
BTF_ID(func, bpf_iter_css_task_new)
#else
@ -11062,6 +11131,16 @@ static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta)
return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock];
}
static bool is_kfunc_bpf_preempt_disable(struct bpf_kfunc_call_arg_meta *meta)
{
return meta->func_id == special_kfunc_list[KF_bpf_preempt_disable];
}
static bool is_kfunc_bpf_preempt_enable(struct bpf_kfunc_call_arg_meta *meta)
{
return meta->func_id == special_kfunc_list[KF_bpf_preempt_enable];
}
static enum kfunc_ptr_arg_type
get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
struct bpf_kfunc_call_arg_meta *meta,
@ -11115,6 +11194,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (is_kfunc_arg_map(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_MAP;
if (is_kfunc_arg_wq(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_WORKQUEUE;
if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
if (!btf_type_is_struct(ref_t)) {
verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@ -11366,12 +11448,28 @@ static bool is_sync_callback_calling_kfunc(u32 btf_id)
return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl];
}
static bool is_async_callback_calling_kfunc(u32 btf_id)
{
return btf_id == special_kfunc_list[KF_bpf_wq_set_callback_impl];
}
static bool is_bpf_throw_kfunc(struct bpf_insn *insn)
{
return bpf_pseudo_kfunc_call(insn) && insn->off == 0 &&
insn->imm == special_kfunc_list[KF_bpf_throw];
}
static bool is_bpf_wq_set_callback_impl_kfunc(u32 btf_id)
{
return btf_id == special_kfunc_list[KF_bpf_wq_set_callback_impl];
}
static bool is_callback_calling_kfunc(u32 btf_id)
{
return is_sync_callback_calling_kfunc(btf_id) ||
is_async_callback_calling_kfunc(btf_id);
}
static bool is_rbtree_lock_required_kfunc(u32 btf_id)
{
return is_bpf_rbtree_api_kfunc(btf_id);
@ -11716,6 +11814,34 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_NULL:
continue;
case KF_ARG_PTR_TO_MAP:
if (!reg->map_ptr) {
verbose(env, "pointer in R%d isn't map pointer\n", regno);
return -EINVAL;
}
if (meta->map.ptr && reg->map_ptr->record->wq_off >= 0) {
/* Use map_uid (which is unique id of inner map) to reject:
* inner_map1 = bpf_map_lookup_elem(outer_map, key1)
* inner_map2 = bpf_map_lookup_elem(outer_map, key2)
* if (inner_map1 && inner_map2) {
* wq = bpf_map_lookup_elem(inner_map1);
* if (wq)
* // mismatch would have been allowed
* bpf_wq_init(wq, inner_map2);
* }
*
* Comparing map_ptr is enough to distinguish normal and outer maps.
*/
if (meta->map.ptr != reg->map_ptr ||
meta->map.uid != reg->map_uid) {
verbose(env,
"workqueue pointer in R1 map_uid=%d doesn't match map pointer in R2 map_uid=%d\n",
meta->map.uid, reg->map_uid);
return -EINVAL;
}
}
meta->map.ptr = reg->map_ptr;
meta->map.uid = reg->map_uid;
fallthrough;
case KF_ARG_PTR_TO_ALLOC_BTF_ID:
case KF_ARG_PTR_TO_BTF_ID:
if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
@ -11748,6 +11874,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_CALLBACK:
case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
case KF_ARG_PTR_TO_CONST_STR:
case KF_ARG_PTR_TO_WORKQUEUE:
/* Trusted by default */
break;
default:
@ -12034,6 +12161,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
if (ret)
return ret;
break;
case KF_ARG_PTR_TO_WORKQUEUE:
if (reg->type != PTR_TO_MAP_VALUE) {
verbose(env, "arg#%d doesn't point to a map value\n", i);
return -EINVAL;
}
ret = process_wq_func(env, regno, meta);
if (ret < 0)
return ret;
break;
}
}
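
For context, a minimal BPF-side sketch of what these checks accept: a struct bpf_wq embedded in a map value, initialized against the very map the value was looked up from, so the workqueue argument is a PTR_TO_MAP_VALUE and the map_ptr/map_uid comparison above succeeds. The kfunc declarations follow the selftests' bpf_experimental.h style and the exact prototypes are assumptions of this sketch, not taken from this diff.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct elem {
	struct bpf_wq wq;
	__u64 scratch;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} wq_map SEC(".maps");

/* assumed prototypes, normally pulled in from bpf_experimental.h */
extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __ksym;
extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __ksym;

SEC("tc")
int schedule_deferred_work(struct __sk_buff *skb)
{
	struct elem *val;
	int key = 0;

	val = bpf_map_lookup_elem(&wq_map, &key);
	if (!val)
		return 0;
	/* The second argument must be the map this value came from; passing a
	 * different (inner) map is what the map_uid check above rejects.
	 * Setting the callback via bpf_wq_set_callback() is omitted here. */
	if (bpf_wq_init(&val->wq, &wq_map, 0) == 0)
		bpf_wq_start(&val->wq, 0);
	return 0;
}
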
@ -12093,11 +12229,11 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int *insn_idx_p)
{
const struct btf_type *t, *ptr_type;
bool sleepable, rcu_lock, rcu_unlock, preempt_disable, preempt_enable;
u32 i, nargs, ptr_type_id, release_ref_obj_id;
struct bpf_reg_state *regs = cur_regs(env);
const char *func_name, *ptr_type_name;
bool sleepable, rcu_lock, rcu_unlock;
const struct btf_type *t, *ptr_type;
struct bpf_kfunc_call_arg_meta meta;
struct bpf_insn_aux_data *insn_aux;
int err, insn_idx = *insn_idx_p;
@ -12145,9 +12281,22 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
}
if (is_bpf_wq_set_callback_impl_kfunc(meta.func_id)) {
err = push_callback_call(env, insn, insn_idx, meta.subprogno,
set_timer_callback_state);
if (err) {
verbose(env, "kfunc %s#%d failed callback verification\n",
func_name, meta.func_id);
return err;
}
}
rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta);
rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);
preempt_disable = is_kfunc_bpf_preempt_disable(&meta);
preempt_enable = is_kfunc_bpf_preempt_enable(&meta);
if (env->cur_state->active_rcu_lock) {
struct bpf_func_state *state;
struct bpf_reg_state *reg;
@ -12180,6 +12329,22 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return -EINVAL;
}
if (env->cur_state->active_preempt_lock) {
if (preempt_disable) {
env->cur_state->active_preempt_lock++;
} else if (preempt_enable) {
env->cur_state->active_preempt_lock--;
} else if (sleepable) {
verbose(env, "kernel func %s is sleepable within non-preemptible region\n", func_name);
return -EACCES;
}
} else if (preempt_disable) {
env->cur_state->active_preempt_lock++;
} else if (preempt_enable) {
verbose(env, "unmatched attempt to enable preemption (kernel function %s)\n", func_name);
return -EINVAL;
}
/* In case of release function, we get register number of refcounted
* PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
*/
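
For illustration, a minimal sketch of a program using the new preemption kfuncs; the extern __ksym declarations mirror how the selftests pull them in and are assumptions of this sketch:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

extern void bpf_preempt_disable(void) __ksym;
extern void bpf_preempt_enable(void) __ksym;

__u64 pkts;

SEC("tc")
int count_nonpreempt(struct __sk_buff *skb)
{
	bpf_preempt_disable();
	/* No sleepable kfuncs/helpers and no BPF_LD_[ABS|IND] are allowed in
	 * here, and every path to exit must re-enable preemption, otherwise
	 * the verifier rejects the program with the messages added above. */
	pkts++;
	bpf_preempt_enable();
	return 0;
}

char _license[] SEC("license") = "GPL";
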
@ -13318,7 +13483,6 @@ static void scalar32_min_max_and(struct bpf_reg_state *dst_reg,
bool src_known = tnum_subreg_is_const(src_reg->var_off);
bool dst_known = tnum_subreg_is_const(dst_reg->var_off);
struct tnum var32_off = tnum_subreg(dst_reg->var_off);
s32 smin_val = src_reg->s32_min_value;
u32 umax_val = src_reg->u32_max_value;
if (src_known && dst_known) {
@ -13331,18 +13495,16 @@ static void scalar32_min_max_and(struct bpf_reg_state *dst_reg,
*/
dst_reg->u32_min_value = var32_off.value;
dst_reg->u32_max_value = min(dst_reg->u32_max_value, umax_val);
if (dst_reg->s32_min_value < 0 || smin_val < 0) {
/* Lose signed bounds when ANDing negative numbers,
* ain't nobody got time for that.
*/
dst_reg->s32_min_value = S32_MIN;
dst_reg->s32_max_value = S32_MAX;
} else {
/* ANDing two positives gives a positive, so safe to
* cast result into s64.
*/
/* Safe to set s32 bounds by casting u32 result into s32 when u32
* doesn't cross sign boundary. Otherwise set s32 bounds to unbounded.
*/
if ((s32)dst_reg->u32_min_value <= (s32)dst_reg->u32_max_value) {
dst_reg->s32_min_value = dst_reg->u32_min_value;
dst_reg->s32_max_value = dst_reg->u32_max_value;
} else {
dst_reg->s32_min_value = S32_MIN;
dst_reg->s32_max_value = S32_MAX;
}
}
@ -13351,7 +13513,6 @@ static void scalar_min_max_and(struct bpf_reg_state *dst_reg,
{
bool src_known = tnum_is_const(src_reg->var_off);
bool dst_known = tnum_is_const(dst_reg->var_off);
s64 smin_val = src_reg->smin_value;
u64 umax_val = src_reg->umax_value;
if (src_known && dst_known) {
@ -13364,18 +13525,16 @@ static void scalar_min_max_and(struct bpf_reg_state *dst_reg,
*/
dst_reg->umin_value = dst_reg->var_off.value;
dst_reg->umax_value = min(dst_reg->umax_value, umax_val);
if (dst_reg->smin_value < 0 || smin_val < 0) {
/* Lose signed bounds when ANDing negative numbers,
* ain't nobody got time for that.
*/
dst_reg->smin_value = S64_MIN;
dst_reg->smax_value = S64_MAX;
} else {
/* ANDing two positives gives a positive, so safe to
* cast result into s64.
*/
/* Safe to set s64 bounds by casting u64 result into s64 when u64
* doesn't cross sign boundary. Otherwise set s64 bounds to unbounded.
*/
if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
dst_reg->smin_value = dst_reg->umin_value;
dst_reg->smax_value = dst_reg->umax_value;
} else {
dst_reg->smin_value = S64_MIN;
dst_reg->smax_value = S64_MAX;
}
/* We may learn something more from the var_off */
__update_reg_bounds(dst_reg);
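
The rewritten branches rely on one observation: unsigned bounds can be reused as signed bounds only when [umin, umax] does not cross the sign boundary, which is exactly what the (s64)umin <= (s64)umax test detects. A stand-alone C sketch of the same rule:

#include <stdint.h>
#include <stdio.h>

/* Mirror of the verifier's check: derive signed bounds from unsigned ones
 * only when the unsigned range does not cross the s64 sign boundary. */
static void u64_to_s64_bounds(uint64_t umin, uint64_t umax)
{
	if ((int64_t)umin <= (int64_t)umax)
		printf("[%llu, %llu] -> smin=%lld smax=%lld\n",
		       (unsigned long long)umin, (unsigned long long)umax,
		       (long long)umin, (long long)umax);
	else
		printf("[%llu, %llu] -> crosses sign boundary, keep S64_MIN/S64_MAX\n",
		       (unsigned long long)umin, (unsigned long long)umax);
}

int main(void)
{
	u64_to_s64_bounds(1, 5);                  /* both positive as s64 */
	u64_to_s64_bounds(~0ULL - 3, ~0ULL);      /* both negative as s64: smin=-4, smax=-1 */
	u64_to_s64_bounds(INT64_MAX - 1, (uint64_t)INT64_MAX + 1); /* crosses: stay unbounded */
	return 0;
}
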
@ -13387,7 +13546,6 @@ static void scalar32_min_max_or(struct bpf_reg_state *dst_reg,
bool src_known = tnum_subreg_is_const(src_reg->var_off);
bool dst_known = tnum_subreg_is_const(dst_reg->var_off);
struct tnum var32_off = tnum_subreg(dst_reg->var_off);
s32 smin_val = src_reg->s32_min_value;
u32 umin_val = src_reg->u32_min_value;
if (src_known && dst_known) {
@ -13400,18 +13558,16 @@ static void scalar32_min_max_or(struct bpf_reg_state *dst_reg,
*/
dst_reg->u32_min_value = max(dst_reg->u32_min_value, umin_val);
dst_reg->u32_max_value = var32_off.value | var32_off.mask;
if (dst_reg->s32_min_value < 0 || smin_val < 0) {
/* Lose signed bounds when ORing negative numbers,
* ain't nobody got time for that.
*/
dst_reg->s32_min_value = S32_MIN;
dst_reg->s32_max_value = S32_MAX;
} else {
/* ORing two positives gives a positive, so safe to
* cast result into s64.
*/
/* Safe to set s32 bounds by casting u32 result into s32 when u32
* doesn't cross sign boundary. Otherwise set s32 bounds to unbounded.
*/
if ((s32)dst_reg->u32_min_value <= (s32)dst_reg->u32_max_value) {
dst_reg->s32_min_value = dst_reg->u32_min_value;
dst_reg->s32_max_value = dst_reg->u32_max_value;
} else {
dst_reg->s32_min_value = S32_MIN;
dst_reg->s32_max_value = S32_MAX;
}
}
@ -13420,7 +13576,6 @@ static void scalar_min_max_or(struct bpf_reg_state *dst_reg,
{
bool src_known = tnum_is_const(src_reg->var_off);
bool dst_known = tnum_is_const(dst_reg->var_off);
s64 smin_val = src_reg->smin_value;
u64 umin_val = src_reg->umin_value;
if (src_known && dst_known) {
@ -13433,18 +13588,16 @@ static void scalar_min_max_or(struct bpf_reg_state *dst_reg,
*/
dst_reg->umin_value = max(dst_reg->umin_value, umin_val);
dst_reg->umax_value = dst_reg->var_off.value | dst_reg->var_off.mask;
if (dst_reg->smin_value < 0 || smin_val < 0) {
/* Lose signed bounds when ORing negative numbers,
* ain't nobody got time for that.
*/
dst_reg->smin_value = S64_MIN;
dst_reg->smax_value = S64_MAX;
} else {
/* ORing two positives gives a positive, so safe to
* cast result into s64.
*/
/* Safe to set s64 bounds by casting u64 result into s64 when u64
* doesn't cross sign boundary. Otherwise set s64 bounds to unbounded.
*/
if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
dst_reg->smin_value = dst_reg->umin_value;
dst_reg->smax_value = dst_reg->umax_value;
} else {
dst_reg->smin_value = S64_MIN;
dst_reg->smax_value = S64_MAX;
}
/* We may learn something more from the var_off */
__update_reg_bounds(dst_reg);
@ -13456,7 +13609,6 @@ static void scalar32_min_max_xor(struct bpf_reg_state *dst_reg,
bool src_known = tnum_subreg_is_const(src_reg->var_off);
bool dst_known = tnum_subreg_is_const(dst_reg->var_off);
struct tnum var32_off = tnum_subreg(dst_reg->var_off);
s32 smin_val = src_reg->s32_min_value;
if (src_known && dst_known) {
__mark_reg32_known(dst_reg, var32_off.value);
@ -13467,10 +13619,10 @@ static void scalar32_min_max_xor(struct bpf_reg_state *dst_reg,
dst_reg->u32_min_value = var32_off.value;
dst_reg->u32_max_value = var32_off.value | var32_off.mask;
if (dst_reg->s32_min_value >= 0 && smin_val >= 0) {
/* XORing two positive sign numbers gives a positive,
* so safe to cast u32 result into s32.
*/
/* Safe to set s32 bounds by casting u32 result into s32 when u32
* doesn't cross sign boundary. Otherwise set s32 bounds to unbounded.
*/
if ((s32)dst_reg->u32_min_value <= (s32)dst_reg->u32_max_value) {
dst_reg->s32_min_value = dst_reg->u32_min_value;
dst_reg->s32_max_value = dst_reg->u32_max_value;
} else {
@ -13484,7 +13636,6 @@ static void scalar_min_max_xor(struct bpf_reg_state *dst_reg,
{
bool src_known = tnum_is_const(src_reg->var_off);
bool dst_known = tnum_is_const(dst_reg->var_off);
s64 smin_val = src_reg->smin_value;
if (src_known && dst_known) {
/* dst_reg->var_off.value has been updated earlier */
@ -13496,10 +13647,10 @@ static void scalar_min_max_xor(struct bpf_reg_state *dst_reg,
dst_reg->umin_value = dst_reg->var_off.value;
dst_reg->umax_value = dst_reg->var_off.value | dst_reg->var_off.mask;
if (dst_reg->smin_value >= 0 && smin_val >= 0) {
/* XORing two positive sign numbers gives a positive,
* so safe to cast u64 result into s64.
*/
/* Safe to set s64 bounds by casting u64 result into s64 when u64
* doesn't cross sign boundary. Otherwise set s64 bounds to unbounded.
*/
if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
dst_reg->smin_value = dst_reg->umin_value;
dst_reg->smax_value = dst_reg->umax_value;
} else {
@ -14726,7 +14877,7 @@ static void regs_refine_cond_op(struct bpf_reg_state *reg1, struct bpf_reg_state
/* Adjusts the register min/max values in the case that the dst_reg and
* src_reg are both SCALAR_VALUE registers (or we are simply doing a BPF_K
* check, in which case we havea fake SCALAR_VALUE representing insn->imm).
* check, in which case we have a fake SCALAR_VALUE representing insn->imm).
* Technically we can do similar adjustments for pointers to the same object,
* but we don't support that right now.
*/
@ -15341,6 +15492,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
return -EINVAL;
}
if (env->cur_state->active_preempt_lock) {
verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_preempt_disable-ed region\n");
return -EINVAL;
}
if (regs[ctx_reg].type != PTR_TO_CTX) {
verbose(env,
"at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
@ -16908,6 +17064,12 @@ static bool states_equal(struct bpf_verifier_env *env,
if (old->active_rcu_lock != cur->active_rcu_lock)
return false;
if (old->active_preempt_lock != cur->active_preempt_lock)
return false;
if (old->in_sleepable != cur->in_sleepable)
return false;
/* for states to be equal callsites have to be the same
* and all frame states need to be equivalent
*/
@ -17364,7 +17526,7 @@ hit:
err = propagate_liveness(env, &sl->state, cur);
/* if previous state reached the exit with precision and
* current state is equivalent to it (except precsion marks)
* current state is equivalent to it (except precision marks)
* the precision needs to be propagated back in
* the current state.
*/
@ -17542,7 +17704,7 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
}
static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
bool allow_trust_missmatch)
bool allow_trust_mismatch)
{
enum bpf_reg_type *prev_type = &env->insn_aux_data[env->insn_idx].ptr_type;
@ -17560,7 +17722,7 @@ static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type typ
* src_reg == stack|map in some other branch.
* Reject it.
*/
if (allow_trust_missmatch &&
if (allow_trust_mismatch &&
base_type(type) == PTR_TO_BTF_ID &&
base_type(*prev_type) == PTR_TO_BTF_ID) {
/*
@ -17856,6 +18018,13 @@ process_bpf_exit_full:
return -EINVAL;
}
if (env->cur_state->active_preempt_lock && !env->cur_state->curframe) {
verbose(env, "%d bpf_preempt_enable%s missing\n",
env->cur_state->active_preempt_lock,
env->cur_state->active_preempt_lock == 1 ? " is" : "(s) are");
return -EINVAL;
}
/* We must do check_reference_leak here before
* prepare_func_exit to handle the case when
* state->curframe > 0, it may be a callback
@ -18153,6 +18322,13 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
}
}
if (btf_record_has_field(map->record, BPF_WORKQUEUE)) {
if (is_tracing_prog_type(prog_type)) {
verbose(env, "tracing progs cannot use bpf_wq yet\n");
return -EINVAL;
}
}
if ((bpf_prog_is_offloaded(prog->aux) || bpf_map_is_offloaded(map)) &&
!bpf_offload_prog_map_match(prog, map)) {
verbose(env, "offload device mismatch between prog and map\n");
@ -18348,6 +18524,8 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
}
if (env->used_map_cnt >= MAX_USED_MAPS) {
verbose(env, "The total number of maps per program has reached the limit of %u\n",
MAX_USED_MAPS);
fdput(f);
return -E2BIG;
}
@ -18962,6 +19140,12 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
type = BPF_WRITE;
} else if ((insn->code == (BPF_STX | BPF_ATOMIC | BPF_W) ||
insn->code == (BPF_STX | BPF_ATOMIC | BPF_DW)) &&
env->insn_aux_data[i + delta].ptr_type == PTR_TO_ARENA) {
insn->code = BPF_STX | BPF_PROBE_ATOMIC | BPF_SIZE(insn->code);
env->prog->aux->num_exentries++;
continue;
} else {
continue;
}
@ -19148,12 +19332,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
env->insn_aux_data[i].call_imm = insn->imm;
/* point imm to __bpf_call_base+1 from JITs point of view */
insn->imm = 1;
if (bpf_pseudo_func(insn))
if (bpf_pseudo_func(insn)) {
#if defined(MODULES_VADDR)
u64 addr = MODULES_VADDR;
#else
u64 addr = VMALLOC_START;
#endif
/* jit (e.g. x86_64) may emit fewer instructions
* if it learns a u32 imm is the same as a u64 imm.
* Force a non zero here.
* Set close enough to possible prog address.
*/
insn[1].imm = 1;
insn[0].imm = (u32)addr;
insn[1].imm = addr >> 32;
}
}
err = bpf_prog_alloc_jited_linfo(prog);
@ -19226,6 +19417,9 @@ static int jit_subprogs(struct bpf_verifier_env *env)
BPF_CLASS(insn->code) == BPF_ST) &&
BPF_MODE(insn->code) == BPF_PROBE_MEM32)
num_exentries++;
if (BPF_CLASS(insn->code) == BPF_STX &&
BPF_MODE(insn->code) == BPF_PROBE_ATOMIC)
num_exentries++;
}
func[i]->aux->num_exentries = num_exentries;
func[i]->aux->tail_call_reachable = env->subprog_info[i].tail_call_reachable;
@ -19557,6 +19751,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
*cnt = 1;
} else if (is_bpf_wq_set_callback_impl_kfunc(desc->func_id)) {
struct bpf_insn ld_addrs[2] = { BPF_LD_IMM64(BPF_REG_4, (long)env->prog->aux) };
insn_buf[0] = ld_addrs[0];
insn_buf[1] = ld_addrs[1];
insn_buf[2] = *insn;
*cnt = 3;
}
return 0;
}
@ -19832,7 +20033,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
!bpf_map_ptr_unpriv(aux)) {
struct bpf_jit_poke_descriptor desc = {
.reason = BPF_POKE_REASON_TAIL_CALL,
.tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
.tail_call.map = aux->map_ptr_state.map_ptr,
.tail_call.key = bpf_map_key_immediate(aux),
.insn_idx = i + delta,
};
@ -19861,7 +20062,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
return -EINVAL;
}
map_ptr = BPF_MAP_PTR(aux->map_ptr_state);
map_ptr = aux->map_ptr_state.map_ptr;
insn_buf[0] = BPF_JMP_IMM(BPF_JGE, BPF_REG_3,
map_ptr->max_entries, 2);
insn_buf[1] = BPF_ALU32_IMM(BPF_AND, BPF_REG_3,
@ -19969,7 +20170,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
if (bpf_map_ptr_poisoned(aux))
goto patch_call_imm;
map_ptr = BPF_MAP_PTR(aux->map_ptr_state);
map_ptr = aux->map_ptr_state.map_ptr;
ops = map_ptr->ops;
if (insn->imm == BPF_FUNC_map_lookup_elem &&
ops->map_gen_lookup) {
@ -20075,6 +20276,30 @@ patch_map_ops_generic:
goto next_insn;
}
#ifdef CONFIG_X86_64
/* Implement bpf_get_smp_processor_id() inline. */
if (insn->imm == BPF_FUNC_get_smp_processor_id &&
prog->jit_requested && bpf_jit_supports_percpu_insn()) {
/* BPF_FUNC_get_smp_processor_id inlining is an
* optimization, so if pcpu_hot.cpu_number is ever
* changed in some incompatible and hard to support
* way, it's fine to back out this inlining logic
*/
insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
cnt = 3;
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
return -ENOMEM;
delta += cnt - 1;
env->prog = prog = new_prog;
insn = new_prog->insnsi + i + delta;
goto next_insn;
}
#endif
/* Implement bpf_get_func_arg inline. */
if (prog_type == BPF_PROG_TYPE_TRACING &&
insn->imm == BPF_FUNC_get_func_arg) {
@ -20158,6 +20383,62 @@ patch_map_ops_generic:
goto next_insn;
}
/* Implement bpf_get_branch_snapshot inline. */
if (IS_ENABLED(CONFIG_PERF_EVENTS) &&
prog->jit_requested && BITS_PER_LONG == 64 &&
insn->imm == BPF_FUNC_get_branch_snapshot) {
/* We are dealing with the following func protos:
* u64 bpf_get_branch_snapshot(void *buf, u32 size, u64 flags);
* int perf_snapshot_branch_stack(struct perf_branch_entry *entries, u32 cnt);
*/
const u32 br_entry_size = sizeof(struct perf_branch_entry);
/* struct perf_branch_entry is part of UAPI and is
* used as an array element, so extremely unlikely to
* ever grow or shrink
*/
BUILD_BUG_ON(br_entry_size != 24);
/* if (unlikely(flags)) return -EINVAL */
insn_buf[0] = BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 0, 7);
/* Transform size (bytes) into number of entries (cnt = size / 24).
* But to avoid expensive division instruction, we implement
* divide-by-3 through multiplication, followed by further
* division by 8 through 3-bit right shift.
* Refer to book "Hacker's Delight, 2nd ed." by Henry S. Warren, Jr.,
* p. 227, chapter "Unsigned Division by 3" for details and proofs.
*
* N / 3 <=> M * N / 2^33, where M = (2^33 + 1) / 3 = 0xaaaaaaab.
*/
insn_buf[1] = BPF_MOV32_IMM(BPF_REG_0, 0xaaaaaaab);
insn_buf[2] = BPF_ALU64_REG(BPF_MUL, BPF_REG_2, BPF_REG_0);
insn_buf[3] = BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 36);
/* call perf_snapshot_branch_stack implementation */
insn_buf[4] = BPF_EMIT_CALL(static_call_query(perf_snapshot_branch_stack));
/* if (entry_cnt == 0) return -ENOENT */
insn_buf[5] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4);
/* return entry_cnt * sizeof(struct perf_branch_entry) */
insn_buf[6] = BPF_ALU32_IMM(BPF_MUL, BPF_REG_0, br_entry_size);
insn_buf[7] = BPF_JMP_A(3);
/* return -EINVAL; */
insn_buf[8] = BPF_MOV64_IMM(BPF_REG_0, -EINVAL);
insn_buf[9] = BPF_JMP_A(1);
/* return -ENOENT; */
insn_buf[10] = BPF_MOV64_IMM(BPF_REG_0, -ENOENT);
cnt = 11;
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
return -ENOMEM;
delta += cnt - 1;
env->prog = prog = new_prog;
insn = new_prog->insnsi + i + delta;
continue;
}
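
The size-to-count conversion in insn_buf[1..3] can be checked in isolation: for any u32 size, size / 24 equals (size * 0xaaaaaaab) >> 36, i.e. the divide-by-3 multiplication followed by a further shift of 3 for the divide-by-8. A small user-space check:

#include <assert.h>
#include <stdint.h>

/* Exhaustively verify the multiply-and-shift used instead of a division
 * by 24 (takes a few seconds to run). */
static uint32_t div24(uint32_t size)
{
	return (uint32_t)(((uint64_t)size * 0xaaaaaaabULL) >> 36);
}

int main(void)
{
	for (uint64_t size = 0; size <= UINT32_MAX; size++)
		assert(div24((uint32_t)size) == (uint32_t)size / 24);
	return 0;
}
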
/* Implement bpf_kptr_xchg inline */
if (prog->jit_requested && BITS_PER_LONG == 64 &&
insn->imm == BPF_FUNC_kptr_xchg &&


@ -1188,9 +1188,6 @@ static const struct bpf_func_proto bpf_get_attach_cookie_proto_tracing = {
BPF_CALL_3(bpf_get_branch_snapshot, void *, buf, u32, size, u64, flags)
{
#ifndef CONFIG_X86
return -ENOENT;
#else
static const u32 br_entry_size = sizeof(struct perf_branch_entry);
u32 entry_cnt = size / br_entry_size;
@ -1203,7 +1200,6 @@ BPF_CALL_3(bpf_get_branch_snapshot, void *, buf, u32, size, u64, flags)
return -ENOENT;
return entry_cnt * br_entry_size;
#endif
}
static const struct bpf_func_proto bpf_get_branch_snapshot_proto = {


@ -13431,7 +13431,7 @@ static struct bpf_test tests[] = {
.stack_depth = 8,
.nr_testruns = NR_PATTERN_RUNS,
},
/* 64-bit atomic magnitudes */
/* 32-bit atomic magnitudes */
{
"ATOMIC_W_ADD: all operand magnitudes",
{ },


@ -79,6 +79,51 @@ static int dummy_ops_call_op(void *image, struct bpf_dummy_ops_test_args *args)
args->args[3], args->args[4]);
}
static const struct bpf_ctx_arg_aux *find_ctx_arg_info(struct bpf_prog_aux *aux, int offset)
{
int i;
for (i = 0; i < aux->ctx_arg_info_size; i++)
if (aux->ctx_arg_info[i].offset == offset)
return &aux->ctx_arg_info[i];
return NULL;
}
/* There is only one check at the moment:
* - zero should not be passed for pointer parameters not marked as nullable.
*/
static int check_test_run_args(struct bpf_prog *prog, struct bpf_dummy_ops_test_args *args)
{
const struct btf_type *func_proto = prog->aux->attach_func_proto;
for (u32 arg_no = 0; arg_no < btf_type_vlen(func_proto) ; ++arg_no) {
const struct btf_param *param = &btf_params(func_proto)[arg_no];
const struct bpf_ctx_arg_aux *info;
const struct btf_type *t;
int offset;
if (args->args[arg_no] != 0)
continue;
/* Program is validated already, so there is no need
* to check if t is NULL.
*/
t = btf_type_skip_modifiers(bpf_dummy_ops_btf, param->type, NULL);
if (!btf_type_is_ptr(t))
continue;
offset = btf_ctx_arg_offset(bpf_dummy_ops_btf, func_proto, arg_no);
info = find_ctx_arg_info(prog->aux, offset);
if (info && (info->reg_type & PTR_MAYBE_NULL))
continue;
return -EINVAL;
}
return 0;
}
extern const struct bpf_link_ops bpf_struct_ops_link_lops;
int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
@ -87,7 +132,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
const struct bpf_struct_ops *st_ops = &bpf_bpf_dummy_ops;
const struct btf_type *func_proto;
struct bpf_dummy_ops_test_args *args;
struct bpf_tramp_links *tlinks;
struct bpf_tramp_links *tlinks = NULL;
struct bpf_tramp_link *link = NULL;
void *image = NULL;
unsigned int op_idx;
@ -109,6 +154,10 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
if (IS_ERR(args))
return PTR_ERR(args);
err = check_test_run_args(prog, args);
if (err)
goto out;
tlinks = kcalloc(BPF_TRAMP_MAX, sizeof(*tlinks), GFP_KERNEL);
if (!tlinks) {
err = -ENOMEM;
@ -232,7 +281,7 @@ static void bpf_dummy_unreg(void *kdata)
{
}
static int bpf_dummy_test_1(struct bpf_dummy_ops_state *cb)
static int bpf_dummy_ops__test_1(struct bpf_dummy_ops_state *cb__nullable)
{
return 0;
}
@ -249,7 +298,7 @@ static int bpf_dummy_test_sleepable(struct bpf_dummy_ops_state *cb)
}
static struct bpf_dummy_ops __bpf_bpf_dummy_ops = {
.test_1 = bpf_dummy_test_1,
.test_1 = bpf_dummy_ops__test_1,
.test_2 = bpf_dummy_test_2,
.test_sleepable = bpf_dummy_test_sleepable,
};
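
On the program side, the __nullable suffix on the renamed parameter makes the verifier treat it as PTR_MAYBE_NULL, so a struct_ops program has to test the pointer before any access. A rough sketch; the val field follows the dummy_st_ops selftests and is an assumption here:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("struct_ops/test_1")
int BPF_PROG(test_1, struct bpf_dummy_ops_state *state)
{
	/* BPF_PROG_TEST_RUN may now legitimately pass NULL here, and the
	 * verifier requires this check before dereferencing state. */
	if (!state)
		return 0;
	return state->val;
}

SEC(".struct_ops")
struct bpf_dummy_ops dummy = {
	.test_1 = (void *)test_1,
};
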


@ -575,6 +575,13 @@ __bpf_kfunc int bpf_modify_return_test2(int a, int *b, short c, int d,
return a + *b + c + d + (long)e + f + g;
}
__bpf_kfunc int bpf_modify_return_test_tp(int nonce)
{
trace_bpf_trigger_tp(nonce);
return nonce;
}
int noinline bpf_fentry_shadow_test(int a)
{
return a + 1;
@ -622,6 +629,7 @@ __bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_test_modify_return_ids)
BTF_ID_FLAGS(func, bpf_modify_return_test)
BTF_ID_FLAGS(func, bpf_modify_return_test2)
BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
BTF_ID_FLAGS(func, bpf_fentry_test1, KF_SLEEPABLE)
BTF_KFUNCS_END(bpf_test_modify_return_ids)


@ -87,6 +87,9 @@
#include "dev.h"
/* Keep the struct bpf_fib_lookup small so that it fits into a cacheline */
static_assert(sizeof(struct bpf_fib_lookup) == 64, "struct bpf_fib_lookup size check");
static const struct bpf_func_proto *
bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
@ -5886,7 +5889,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
} else {
fl4.flowi4_mark = 0;
if (flags & BPF_FIB_LOOKUP_MARK)
fl4.flowi4_mark = params->mark;
else
fl4.flowi4_mark = 0;
fl4.flowi4_secid = 0;
fl4.flowi4_tun_key.tun_id = 0;
fl4.flowi4_uid = sock_net_uid(net, NULL);
@ -6029,7 +6035,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
err = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, &res,
strict);
} else {
fl6.flowi6_mark = 0;
if (flags & BPF_FIB_LOOKUP_MARK)
fl6.flowi6_mark = params->mark;
else
fl6.flowi6_mark = 0;
fl6.flowi6_secid = 0;
fl6.flowi6_tun_key.tun_id = 0;
fl6.flowi6_uid = sock_net_uid(net, NULL);
@ -6107,7 +6116,7 @@ set_fwd_params:
#define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
BPF_FIB_LOOKUP_SRC)
BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_MARK)
BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
struct bpf_fib_lookup *, params, int, plen, u32, flags)
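
A sketch of how a program opts in from the BPF side: the mark is only honoured when BPF_FIB_LOOKUP_MARK is set in flags, otherwise the lookup keeps using a zero mark as before. Field and flag names are taken from this series; the addresses and the omitted packet parsing are placeholders.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

char _license[] SEC("license") = "GPL";

SEC("xdp")
int fib_lookup_with_mark(struct xdp_md *ctx)
{
	struct bpf_fib_lookup params = {};
	long rc;

	params.family   = 2;                      /* AF_INET */
	params.ifindex  = ctx->ingress_ifindex;
	params.ipv4_src = bpf_htonl(0xc0a80001);  /* placeholder source */
	params.ipv4_dst = bpf_htonl(0xc0a80002);  /* placeholder destination */
	params.mark     = 42;                     /* fwmark to use for the route lookup */

	rc = bpf_fib_lookup(ctx, &params, sizeof(params),
			    BPF_FIB_LOOKUP_MARK | BPF_FIB_LOOKUP_SKIP_NEIGH);
	return rc == BPF_FIB_LKUP_RET_SUCCESS ? XDP_PASS : XDP_DROP;
}
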


@ -24,8 +24,16 @@ struct bpf_stab {
#define SOCK_CREATE_FLAG_MASK \
(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
/* This mutex is used to
* - protect race between prog/link attach/detach and link prog update, and
* - protect race between releasing and accessing map in bpf_link.
* A single global mutex lock is used since it is expected contention is low.
*/
static DEFINE_MUTEX(sockmap_mutex);
static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
struct bpf_prog *old, u32 which);
struct bpf_prog *old, struct bpf_link *link,
u32 which);
static struct sk_psock_progs *sock_map_progs(struct bpf_map *map);
static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
@ -71,7 +79,9 @@ int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
ret = sock_map_prog_update(map, prog, NULL, attr->attach_type);
mutex_lock(&sockmap_mutex);
ret = sock_map_prog_update(map, prog, NULL, NULL, attr->attach_type);
mutex_unlock(&sockmap_mutex);
fdput(f);
return ret;
}
@ -103,7 +113,9 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
goto put_prog;
}
ret = sock_map_prog_update(map, NULL, prog, attr->attach_type);
mutex_lock(&sockmap_mutex);
ret = sock_map_prog_update(map, NULL, prog, NULL, attr->attach_type);
mutex_unlock(&sockmap_mutex);
put_prog:
bpf_prog_put(prog);
put_map:
@ -1460,55 +1472,84 @@ static struct sk_psock_progs *sock_map_progs(struct bpf_map *map)
return NULL;
}
static int sock_map_prog_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
u32 which)
static int sock_map_prog_link_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
struct bpf_link ***plink, u32 which)
{
struct sk_psock_progs *progs = sock_map_progs(map);
struct bpf_prog **cur_pprog;
struct bpf_link **cur_plink;
if (!progs)
return -EOPNOTSUPP;
switch (which) {
case BPF_SK_MSG_VERDICT:
*pprog = &progs->msg_parser;
cur_pprog = &progs->msg_parser;
cur_plink = &progs->msg_parser_link;
break;
#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
case BPF_SK_SKB_STREAM_PARSER:
*pprog = &progs->stream_parser;
cur_pprog = &progs->stream_parser;
cur_plink = &progs->stream_parser_link;
break;
#endif
case BPF_SK_SKB_STREAM_VERDICT:
if (progs->skb_verdict)
return -EBUSY;
*pprog = &progs->stream_verdict;
cur_pprog = &progs->stream_verdict;
cur_plink = &progs->stream_verdict_link;
break;
case BPF_SK_SKB_VERDICT:
if (progs->stream_verdict)
return -EBUSY;
*pprog = &progs->skb_verdict;
cur_pprog = &progs->skb_verdict;
cur_plink = &progs->skb_verdict_link;
break;
default:
return -EOPNOTSUPP;
}
*pprog = cur_pprog;
if (plink)
*plink = cur_plink;
return 0;
}
/* Handle the following four cases:
* prog_attach: prog != NULL, old == NULL, link == NULL
* prog_detach: prog == NULL, old != NULL, link == NULL
* link_attach: prog != NULL, old == NULL, link != NULL
* link_detach: prog == NULL, old != NULL, link != NULL
*/
static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
struct bpf_prog *old, u32 which)
struct bpf_prog *old, struct bpf_link *link,
u32 which)
{
struct bpf_prog **pprog;
struct bpf_link **plink;
int ret;
ret = sock_map_prog_lookup(map, &pprog, which);
ret = sock_map_prog_link_lookup(map, &pprog, &plink, which);
if (ret)
return ret;
if (old)
return psock_replace_prog(pprog, prog, old);
/* for prog_attach/prog_detach/link_attach, return error if a bpf_link
* exists for that prog.
*/
if ((!link || prog) && *plink)
return -EBUSY;
psock_set_prog(pprog, prog);
return 0;
if (old) {
ret = psock_replace_prog(pprog, prog, old);
if (!ret)
*plink = NULL;
} else {
psock_set_prog(pprog, prog);
if (link)
*plink = link;
}
return ret;
}
int sock_map_bpf_prog_query(const union bpf_attr *attr,
@ -1533,7 +1574,7 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
rcu_read_lock();
ret = sock_map_prog_lookup(map, &pprog, attr->query.attach_type);
ret = sock_map_prog_link_lookup(map, &pprog, NULL, attr->query.attach_type);
if (ret)
goto end;
@ -1663,6 +1704,196 @@ void sock_map_close(struct sock *sk, long timeout)
}
EXPORT_SYMBOL_GPL(sock_map_close);
struct sockmap_link {
struct bpf_link link;
struct bpf_map *map;
enum bpf_attach_type attach_type;
};
static void sock_map_link_release(struct bpf_link *link)
{
struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
mutex_lock(&sockmap_mutex);
if (!sockmap_link->map)
goto out;
WARN_ON_ONCE(sock_map_prog_update(sockmap_link->map, NULL, link->prog, link,
sockmap_link->attach_type));
bpf_map_put_with_uref(sockmap_link->map);
sockmap_link->map = NULL;
out:
mutex_unlock(&sockmap_mutex);
}
static int sock_map_link_detach(struct bpf_link *link)
{
sock_map_link_release(link);
return 0;
}
static void sock_map_link_dealloc(struct bpf_link *link)
{
kfree(link);
}
/* Handle the following two cases:
* case 1: link != NULL, prog != NULL, old != NULL
* case 2: link != NULL, prog != NULL, old == NULL
*/
static int sock_map_link_update_prog(struct bpf_link *link,
struct bpf_prog *prog,
struct bpf_prog *old)
{
const struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
struct bpf_prog **pprog, *old_link_prog;
struct bpf_link **plink;
int ret = 0;
mutex_lock(&sockmap_mutex);
/* If old prog is not NULL, ensure old prog is the same as link->prog. */
if (old && link->prog != old) {
ret = -EPERM;
goto out;
}
/* Ensure link->prog has the same type/attach_type as the new prog. */
if (link->prog->type != prog->type ||
link->prog->expected_attach_type != prog->expected_attach_type) {
ret = -EINVAL;
goto out;
}
ret = sock_map_prog_link_lookup(sockmap_link->map, &pprog, &plink,
sockmap_link->attach_type);
if (ret)
goto out;
/* return error if the stored bpf_link does not match the incoming bpf_link. */
if (link != *plink) {
ret = -EBUSY;
goto out;
}
if (old) {
ret = psock_replace_prog(pprog, prog, old);
if (ret)
goto out;
} else {
psock_set_prog(pprog, prog);
}
bpf_prog_inc(prog);
old_link_prog = xchg(&link->prog, prog);
bpf_prog_put(old_link_prog);
out:
mutex_unlock(&sockmap_mutex);
return ret;
}
static u32 sock_map_link_get_map_id(const struct sockmap_link *sockmap_link)
{
u32 map_id = 0;
mutex_lock(&sockmap_mutex);
if (sockmap_link->map)
map_id = sockmap_link->map->id;
mutex_unlock(&sockmap_mutex);
return map_id;
}
static int sock_map_link_fill_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
const struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
u32 map_id = sock_map_link_get_map_id(sockmap_link);
info->sockmap.map_id = map_id;
info->sockmap.attach_type = sockmap_link->attach_type;
return 0;
}
static void sock_map_link_show_fdinfo(const struct bpf_link *link,
struct seq_file *seq)
{
const struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
u32 map_id = sock_map_link_get_map_id(sockmap_link);
seq_printf(seq, "map_id:\t%u\n", map_id);
seq_printf(seq, "attach_type:\t%u\n", sockmap_link->attach_type);
}
static const struct bpf_link_ops sock_map_link_ops = {
.release = sock_map_link_release,
.dealloc = sock_map_link_dealloc,
.detach = sock_map_link_detach,
.update_prog = sock_map_link_update_prog,
.fill_link_info = sock_map_link_fill_info,
.show_fdinfo = sock_map_link_show_fdinfo,
};
int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
{
struct bpf_link_primer link_primer;
struct sockmap_link *sockmap_link;
enum bpf_attach_type attach_type;
struct bpf_map *map;
int ret;
if (attr->link_create.flags)
return -EINVAL;
map = bpf_map_get_with_uref(attr->link_create.target_fd);
if (IS_ERR(map))
return PTR_ERR(map);
if (map->map_type != BPF_MAP_TYPE_SOCKMAP && map->map_type != BPF_MAP_TYPE_SOCKHASH) {
ret = -EINVAL;
goto out;
}
sockmap_link = kzalloc(sizeof(*sockmap_link), GFP_USER);
if (!sockmap_link) {
ret = -ENOMEM;
goto out;
}
attach_type = attr->link_create.attach_type;
bpf_link_init(&sockmap_link->link, BPF_LINK_TYPE_SOCKMAP, &sock_map_link_ops, prog);
sockmap_link->map = map;
sockmap_link->attach_type = attach_type;
ret = bpf_link_prime(&sockmap_link->link, &link_primer);
if (ret) {
kfree(sockmap_link);
goto out;
}
mutex_lock(&sockmap_mutex);
ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
mutex_unlock(&sockmap_mutex);
if (ret) {
bpf_link_cleanup(&link_primer);
goto out;
}
/* Increase refcnt for the prog since when old prog is replaced with
* psock_replace_prog() and psock_set_prog() its refcnt will be decreased.
*
* Actually, we do not need to increase refcnt for the prog since bpf_link
* will hold a reference. But in order to have less complexity w.r.t.
* replacing/setting prog, let us increase the refcnt to make things simpler.
*/
bpf_prog_inc(prog);
return bpf_link_settle(&link_primer);
out:
bpf_map_put_with_uref(map);
return ret;
}
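
From user space the new link is created by passing the sockmap/sockhash fd as the link target, so the program is detached automatically once the last link fd is released. A hedged sketch with the low-level libbpf API; the program name and error handling are placeholders:

#include <bpf/bpf.h>
#include <bpf/libbpf.h>

static int attach_verdict_link(struct bpf_object *obj, int map_fd)
{
	struct bpf_program *prog;
	int link_fd;

	prog = bpf_object__find_program_by_name(obj, "verdict_prog"); /* name assumed */
	if (!prog)
		return -1;

	/* The map fd goes into target_fd and attach_type selects the slot,
	 * matching what sock_map_link_create() above expects. */
	link_fd = bpf_link_create(bpf_program__fd(prog), map_fd,
				  BPF_SK_SKB_STREAM_VERDICT, NULL);
	return link_fd; /* negative on error */
}

Replacing the program later goes through bpf_link_update() on the returned fd, which is what sock_map_link_update_prog() above services.
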
static int sock_map_iter_attach_target(struct bpf_prog *prog,
union bpf_iter_link_info *linfo,
struct bpf_iter_aux_info *aux)


@ -1156,8 +1156,6 @@ static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = {
};
BTF_KFUNCS_START(tcp_bbr_check_kfunc_ids)
#ifdef CONFIG_X86
#ifdef CONFIG_DYNAMIC_FTRACE
BTF_ID_FLAGS(func, bbr_init)
BTF_ID_FLAGS(func, bbr_main)
BTF_ID_FLAGS(func, bbr_sndbuf_expand)
@ -1166,8 +1164,6 @@ BTF_ID_FLAGS(func, bbr_cwnd_event)
BTF_ID_FLAGS(func, bbr_ssthresh)
BTF_ID_FLAGS(func, bbr_min_tso_segs)
BTF_ID_FLAGS(func, bbr_set_state)
#endif
#endif
BTF_KFUNCS_END(tcp_bbr_check_kfunc_ids)
static const struct btf_kfunc_id_set tcp_bbr_kfunc_set = {


@ -486,16 +486,12 @@ static struct tcp_congestion_ops cubictcp __read_mostly = {
};
BTF_KFUNCS_START(tcp_cubic_check_kfunc_ids)
#ifdef CONFIG_X86
#ifdef CONFIG_DYNAMIC_FTRACE
BTF_ID_FLAGS(func, cubictcp_init)
BTF_ID_FLAGS(func, cubictcp_recalc_ssthresh)
BTF_ID_FLAGS(func, cubictcp_cong_avoid)
BTF_ID_FLAGS(func, cubictcp_state)
BTF_ID_FLAGS(func, cubictcp_cwnd_event)
BTF_ID_FLAGS(func, cubictcp_acked)
#endif
#endif
BTF_KFUNCS_END(tcp_cubic_check_kfunc_ids)
static const struct btf_kfunc_id_set tcp_cubic_kfunc_set = {
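
With the CONFIG_X86/CONFIG_DYNAMIC_FTRACE guards dropped, these kfuncs are registered on all JIT-capable architectures, so a BPF congestion control can delegate to the built-in cubic implementation unconditionally. A hedged sketch along the lines of the selftests' cubic wrappers; the extern prototypes are assumed to match the kernel functions, and tcp_reno_undo_cwnd is assumed to remain available as a kfunc from bpf_tcp_ca.c:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

extern void cubictcp_init(struct sock *sk) __ksym;
extern __u32 cubictcp_recalc_ssthresh(struct sock *sk) __ksym;
extern void cubictcp_cong_avoid(struct sock *sk, __u32 ack, __u32 acked) __ksym;
extern __u32 tcp_reno_undo_cwnd(struct sock *sk) __ksym;

SEC("struct_ops")
void BPF_PROG(wrap_init, struct sock *sk)
{
	cubictcp_init(sk);
}

SEC("struct_ops")
__u32 BPF_PROG(wrap_ssthresh, struct sock *sk)
{
	return cubictcp_recalc_ssthresh(sk);
}

SEC("struct_ops")
void BPF_PROG(wrap_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
{
	cubictcp_cong_avoid(sk, ack, acked);
}

SEC("struct_ops")
__u32 BPF_PROG(wrap_undo_cwnd, struct sock *sk)
{
	return tcp_reno_undo_cwnd(sk);
}

SEC(".struct_ops")
struct tcp_congestion_ops cubic_wrap = {
	.init       = (void *)wrap_init,
	.ssthresh   = (void *)wrap_ssthresh,
	.cong_avoid = (void *)wrap_cong_avoid,
	.undo_cwnd  = (void *)wrap_undo_cwnd,
	.name       = "bpf_cubic_wrap",
};
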


@ -261,16 +261,12 @@ static struct tcp_congestion_ops dctcp_reno __read_mostly = {
};
BTF_KFUNCS_START(tcp_dctcp_check_kfunc_ids)
#ifdef CONFIG_X86
#ifdef CONFIG_DYNAMIC_FTRACE
BTF_ID_FLAGS(func, dctcp_init)
BTF_ID_FLAGS(func, dctcp_update_alpha)
BTF_ID_FLAGS(func, dctcp_cwnd_event)
BTF_ID_FLAGS(func, dctcp_ssthresh)
BTF_ID_FLAGS(func, dctcp_cwnd_undo)
BTF_ID_FLAGS(func, dctcp_state)
#endif
#endif
BTF_KFUNCS_END(tcp_dctcp_check_kfunc_ids)
static const struct btf_kfunc_id_set tcp_dctcp_kfunc_set = {


@ -913,7 +913,7 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
tp->rtt_seq = tp->snd_nxt;
tp->mdev_max_us = tcp_rto_min_us(sk);
tcp_bpf_rtt(sk);
tcp_bpf_rtt(sk, mrtt_us, srtt);
}
} else {
/* no previous measure. */
@ -923,7 +923,7 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
tp->mdev_max_us = tp->rttvar_us;
tp->rtt_seq = tp->snd_nxt;
tcp_bpf_rtt(sk);
tcp_bpf_rtt(sk, mrtt_us, srtt);
}
tp->srtt_us = max(1U, srtt);
}
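
On the BPF side the two values arrive as callback args of the existing BPF_SOCK_OPS_RTT_CB callback; the sketch below assumes args[0]/args[1] carry them in the order they are passed to tcp_bpf_rtt() above, with srtt still in the usec<<3 fixed-point form used by tcp_rtt_estimator().

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

__u64 last_mrtt_us;
__u64 last_srtt_us;

SEC("sockops")
int rtt_observer(struct bpf_sock_ops *skops)
{
	switch (skops->op) {
	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
		/* opt in to per-ACK RTT callbacks for this connection */
		bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RTT_CB_FLAG);
		break;
	case BPF_SOCK_OPS_RTT_CB:
		last_mrtt_us = skops->args[0];       /* raw per-ACK measurement */
		last_srtt_us = skops->args[1] >> 3;  /* smoothed RTT, was usec << 3 */
		break;
	}
	return 1;
}
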


@ -31,9 +31,9 @@ see_also = $(subst " ",, \
"\n" \
"SEE ALSO\n" \
"========\n" \
"\t**bpf**\ (2),\n" \
"\t**bpf-helpers**\\ (7)" \
$(foreach page,$(call list_pages,$(1)),",\n\t**$(page)**\\ (8)") \
"**bpf**\ (2),\n" \
"**bpf-helpers**\\ (7)" \
$(foreach page,$(call list_pages,$(1)),",\n**$(page)**\\ (8)") \
"\n")
$(OUTPUT)%.8: %.rst


@ -14,82 +14,76 @@ tool for inspection of BTF data
SYNOPSIS
========
**bpftool** [*OPTIONS*] **btf** *COMMAND*
**bpftool** [*OPTIONS*] **btf** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } }
*OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } }
*COMMANDS* := { **dump** | **help** }
*COMMANDS* := { **dump** | **help** }
BTF COMMANDS
=============
| **bpftool** **btf** { **show** | **list** } [**id** *BTF_ID*]
| **bpftool** **btf dump** *BTF_SRC* [**format** *FORMAT*]
| **bpftool** **btf help**
| **bpftool** **btf** { **show** | **list** } [**id** *BTF_ID*]
| **bpftool** **btf dump** *BTF_SRC* [**format** *FORMAT*]
| **bpftool** **btf help**
|
| *BTF_SRC* := { **id** *BTF_ID* | **prog** *PROG* | **map** *MAP* [{**key** | **value** | **kv** | **all**}] | **file** *FILE* }
| *FORMAT* := { **raw** | **c** }
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
| *BTF_SRC* := { **id** *BTF_ID* | **prog** *PROG* | **map** *MAP* [{**key** | **value** | **kv** | **all**}] | **file** *FILE* }
| *FORMAT* := { **raw** | **c** }
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
DESCRIPTION
===========
**bpftool btf { show | list }** [**id** *BTF_ID*]
Show information about loaded BTF objects. If a BTF ID is
specified, show information only about given BTF object,
otherwise list all BTF objects currently loaded on the
system.
bpftool btf { show | list } [id *BTF_ID*]
Show information about loaded BTF objects. If a BTF ID is specified, show
information only about given BTF object, otherwise list all BTF objects
currently loaded on the system.
Since Linux 5.8 bpftool is able to discover information about
processes that hold open file descriptors (FDs) against BTF
objects. On such kernels bpftool will automatically emit this
information as well.
Since Linux 5.8 bpftool is able to discover information about processes
that hold open file descriptors (FDs) against BTF objects. On such kernels
bpftool will automatically emit this information as well.
**bpftool btf dump** *BTF_SRC*
Dump BTF entries from a given *BTF_SRC*.
bpftool btf dump *BTF_SRC*
Dump BTF entries from a given *BTF_SRC*.
When **id** is specified, BTF object with that ID will be
loaded and all its BTF types emitted.
When **id** is specified, BTF object with that ID will be loaded and all
its BTF types emitted.
When **map** is provided, it's expected that map has
associated BTF object with BTF types describing key and
value. It's possible to select whether to dump only BTF
type(s) associated with key (**key**), value (**value**),
both key and value (**kv**), or all BTF types present in
associated BTF object (**all**). If not specified, **kv**
is assumed.
When **map** is provided, it's expected that map has associated BTF object
with BTF types describing key and value. It's possible to select whether to
dump only BTF type(s) associated with key (**key**), value (**value**),
both key and value (**kv**), or all BTF types present in associated BTF
object (**all**). If not specified, **kv** is assumed.
When **prog** is provided, it's expected that program has
associated BTF object with BTF types.
When **prog** is provided, it's expected that program has associated BTF
object with BTF types.
When specifying *FILE*, an ELF file is expected, containing
.BTF section with well-defined BTF binary format data,
typically produced by clang or pahole.
When specifying *FILE*, an ELF file is expected, containing .BTF section
with well-defined BTF binary format data, typically produced by clang or
pahole.
**format** option can be used to override default (raw)
output format. Raw (**raw**) or C-syntax (**c**) output
formats are supported.
**format** option can be used to override default (raw) output format. Raw
(**raw**) or C-syntax (**c**) output formats are supported.
**bpftool btf help**
Print short help message.
bpftool btf help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
.. include:: common_options.rst
-B, --base-btf *FILE*
Pass a base BTF object. Base BTF objects are typically used
with BTF objects for kernel modules. To avoid duplicating
all kernel symbols required by modules, BTF objects for
modules are "split", they are built incrementally on top of
the kernel (vmlinux) BTF object. So the base BTF reference
should usually point to the kernel BTF.
-B, --base-btf *FILE*
Pass a base BTF object. Base BTF objects are typically used with BTF
objects for kernel modules. To avoid duplicating all kernel symbols
required by modules, BTF objects for modules are "split", they are
built incrementally on top of the kernel (vmlinux) BTF object. So the
base BTF reference should usually point to the kernel BTF.
When the main BTF object to process (for example, the
module BTF to dump) is passed as a *FILE*, bpftool attempts
to autodetect the path for the base object, and passing
this option is optional. When the main BTF object is passed
through other handles, this option becomes necessary.
When the main BTF object to process (for example, the module BTF to
dump) is passed as a *FILE*, bpftool attempts to autodetect the path
for the base object, and passing this option is optional. When the main
BTF object is passed through other handles, this option becomes
necessary.
EXAMPLES
========


@ -14,134 +14,125 @@ tool for inspection and simple manipulation of eBPF progs
SYNOPSIS
========
**bpftool** [*OPTIONS*] **cgroup** *COMMAND*
**bpftool** [*OPTIONS*] **cgroup** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } }
*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } }
*COMMANDS* :=
{ **show** | **list** | **tree** | **attach** | **detach** | **help** }
*COMMANDS* :=
{ **show** | **list** | **tree** | **attach** | **detach** | **help** }
CGROUP COMMANDS
===============
| **bpftool** **cgroup** { **show** | **list** } *CGROUP* [**effective**]
| **bpftool** **cgroup tree** [*CGROUP_ROOT*] [**effective**]
| **bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
| **bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
| **bpftool** **cgroup help**
| **bpftool** **cgroup** { **show** | **list** } *CGROUP* [**effective**]
| **bpftool** **cgroup tree** [*CGROUP_ROOT*] [**effective**]
| **bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
| **bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
| **bpftool** **cgroup help**
|
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
| *ATTACH_TYPE* := { **cgroup_inet_ingress** | **cgroup_inet_egress** |
| **cgroup_inet_sock_create** | **cgroup_sock_ops** |
| **cgroup_device** | **cgroup_inet4_bind** | **cgroup_inet6_bind** |
| **cgroup_inet4_post_bind** | **cgroup_inet6_post_bind** |
| **cgroup_inet4_connect** | **cgroup_inet6_connect** |
| **cgroup_unix_connect** | **cgroup_inet4_getpeername** |
| **cgroup_inet6_getpeername** | **cgroup_unix_getpeername** |
| **cgroup_inet4_getsockname** | **cgroup_inet6_getsockname** |
| **cgroup_unix_getsockname** | **cgroup_udp4_sendmsg** |
| **cgroup_udp6_sendmsg** | **cgroup_unix_sendmsg** |
| **cgroup_udp4_recvmsg** | **cgroup_udp6_recvmsg** |
| **cgroup_unix_recvmsg** | **cgroup_sysctl** |
| **cgroup_getsockopt** | **cgroup_setsockopt** |
| **cgroup_inet_sock_release** }
| *ATTACH_FLAGS* := { **multi** | **override** }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
| *ATTACH_TYPE* := { **cgroup_inet_ingress** | **cgroup_inet_egress** |
| **cgroup_inet_sock_create** | **cgroup_sock_ops** |
| **cgroup_device** | **cgroup_inet4_bind** | **cgroup_inet6_bind** |
| **cgroup_inet4_post_bind** | **cgroup_inet6_post_bind** |
| **cgroup_inet4_connect** | **cgroup_inet6_connect** |
| **cgroup_unix_connect** | **cgroup_inet4_getpeername** |
| **cgroup_inet6_getpeername** | **cgroup_unix_getpeername** |
| **cgroup_inet4_getsockname** | **cgroup_inet6_getsockname** |
| **cgroup_unix_getsockname** | **cgroup_udp4_sendmsg** |
| **cgroup_udp6_sendmsg** | **cgroup_unix_sendmsg** |
| **cgroup_udp4_recvmsg** | **cgroup_udp6_recvmsg** |
| **cgroup_unix_recvmsg** | **cgroup_sysctl** |
| **cgroup_getsockopt** | **cgroup_setsockopt** |
| **cgroup_inet_sock_release** }
| *ATTACH_FLAGS* := { **multi** | **override** }
DESCRIPTION
===========
**bpftool cgroup { show | list }** *CGROUP* [**effective**]
List all programs attached to the cgroup *CGROUP*.
bpftool cgroup { show | list } *CGROUP* [effective]
List all programs attached to the cgroup *CGROUP*.
Output will start with program ID followed by attach type,
attach flags and program name.
Output will start with program ID followed by attach type, attach flags and
program name.
If **effective** is specified retrieve effective programs that
will execute for events within a cgroup. This includes
inherited along with attached ones.
If **effective** is specified retrieve effective programs that will execute
for events within a cgroup. This includes inherited along with attached
ones.
**bpftool cgroup tree** [*CGROUP_ROOT*] [**effective**]
Iterate over all cgroups in *CGROUP_ROOT* and list all
attached programs. If *CGROUP_ROOT* is not specified,
bpftool uses cgroup v2 mountpoint.
bpftool cgroup tree [*CGROUP_ROOT*] [effective]
Iterate over all cgroups in *CGROUP_ROOT* and list all attached programs.
If *CGROUP_ROOT* is not specified, bpftool uses cgroup v2 mountpoint.
The output is similar to the output of cgroup show/list
commands: it starts with absolute cgroup path, followed by
program ID, attach type, attach flags and program name.
The output is similar to the output of cgroup show/list commands: it starts
with absolute cgroup path, followed by program ID, attach type, attach
flags and program name.
If **effective** is specified retrieve effective programs that
will execute for events within a cgroup. This includes
inherited along with attached ones.
If **effective** is specified retrieve effective programs that will execute
for events within a cgroup. This includes inherited along with attached
ones.
**bpftool cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
Attach program *PROG* to the cgroup *CGROUP* with attach type
*ATTACH_TYPE* and optional *ATTACH_FLAGS*.
bpftool cgroup attach *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
Attach program *PROG* to the cgroup *CGROUP* with attach type *ATTACH_TYPE*
and optional *ATTACH_FLAGS*.
*ATTACH_FLAGS* can be one of: **override** if a sub-cgroup installs
some bpf program, the program in this cgroup yields to sub-cgroup
program; **multi** if a sub-cgroup installs some bpf program,
that cgroup program gets run in addition to the program in this
cgroup.
*ATTACH_FLAGS* can be one of: **override** if a sub-cgroup installs some
bpf program, the program in this cgroup yields to sub-cgroup program;
**multi** if a sub-cgroup installs some bpf program, that cgroup program
gets run in addition to the program in this cgroup.
Only one program is allowed to be attached to a cgroup with
no attach flags or the **override** flag. Attaching another
program will release old program and attach the new one.
Only one program is allowed to be attached to a cgroup with no attach flags
or the **override** flag. Attaching another program will release old
program and attach the new one.
Multiple programs are allowed to be attached to a cgroup with
**multi**. They are executed in FIFO order (those that were
attached first, run first).
Multiple programs are allowed to be attached to a cgroup with **multi**.
They are executed in FIFO order (those that were attached first, run
first).
Non-default *ATTACH_FLAGS* are supported by kernel version 4.14
and later.
Non-default *ATTACH_FLAGS* are supported by kernel version 4.14 and later.
*ATTACH_TYPE* can be on of:
**ingress** ingress path of the inet socket (since 4.10);
**egress** egress path of the inet socket (since 4.10);
**sock_create** opening of an inet socket (since 4.10);
**sock_ops** various socket operations (since 4.12);
**device** device access (since 4.15);
**bind4** call to bind(2) for an inet4 socket (since 4.17);
**bind6** call to bind(2) for an inet6 socket (since 4.17);
**post_bind4** return from bind(2) for an inet4 socket (since 4.17);
**post_bind6** return from bind(2) for an inet6 socket (since 4.17);
**connect4** call to connect(2) for an inet4 socket (since 4.17);
**connect6** call to connect(2) for an inet6 socket (since 4.17);
**connect_unix** call to connect(2) for a unix socket (since 6.7);
**sendmsg4** call to sendto(2), sendmsg(2), sendmmsg(2) for an
unconnected udp4 socket (since 4.18);
**sendmsg6** call to sendto(2), sendmsg(2), sendmmsg(2) for an
unconnected udp6 socket (since 4.18);
**sendmsg_unix** call to sendto(2), sendmsg(2), sendmmsg(2) for
an unconnected unix socket (since 6.7);
**recvmsg4** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
an unconnected udp4 socket (since 5.2);
**recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
an unconnected udp6 socket (since 5.2);
**recvmsg_unix** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
an unconnected unix socket (since 6.7);
**sysctl** sysctl access (since 5.2);
**getsockopt** call to getsockopt (since 5.3);
**setsockopt** call to setsockopt (since 5.3);
**getpeername4** call to getpeername(2) for an inet4 socket (since 5.8);
**getpeername6** call to getpeername(2) for an inet6 socket (since 5.8);
**getpeername_unix** call to getpeername(2) for a unix socket (since 6.7);
**getsockname4** call to getsockname(2) for an inet4 socket (since 5.8);
**getsockname6** call to getsockname(2) for an inet6 socket (since 5.8).
**getsockname_unix** call to getsockname(2) for a unix socket (since 6.7);
**sock_release** closing an userspace inet socket (since 5.9).
*ATTACH_TYPE* can be one of:
**bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
Detach *PROG* from the cgroup *CGROUP* and attach type
*ATTACH_TYPE*.
- **ingress** ingress path of the inet socket (since 4.10)
- **egress** egress path of the inet socket (since 4.10)
- **sock_create** opening of an inet socket (since 4.10)
- **sock_ops** various socket operations (since 4.12)
- **device** device access (since 4.15)
- **bind4** call to bind(2) for an inet4 socket (since 4.17)
- **bind6** call to bind(2) for an inet6 socket (since 4.17)
- **post_bind4** return from bind(2) for an inet4 socket (since 4.17)
- **post_bind6** return from bind(2) for an inet6 socket (since 4.17)
- **connect4** call to connect(2) for an inet4 socket (since 4.17)
- **connect6** call to connect(2) for an inet6 socket (since 4.17)
- **connect_unix** call to connect(2) for a unix socket (since 6.7)
- **sendmsg4** call to sendto(2), sendmsg(2), sendmmsg(2) for an unconnected udp4 socket (since 4.18)
- **sendmsg6** call to sendto(2), sendmsg(2), sendmmsg(2) for an unconnected udp6 socket (since 4.18)
- **sendmsg_unix** call to sendto(2), sendmsg(2), sendmmsg(2) for an unconnected unix socket (since 6.7)
- **recvmsg4** call to recvfrom(2), recvmsg(2), recvmmsg(2) for an unconnected udp4 socket (since 5.2)
- **recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for an unconnected udp6 socket (since 5.2)
- **recvmsg_unix** call to recvfrom(2), recvmsg(2), recvmmsg(2) for an unconnected unix socket (since 6.7)
- **sysctl** sysctl access (since 5.2)
- **getsockopt** call to getsockopt (since 5.3)
- **setsockopt** call to setsockopt (since 5.3)
- **getpeername4** call to getpeername(2) for an inet4 socket (since 5.8)
- **getpeername6** call to getpeername(2) for an inet6 socket (since 5.8)
- **getpeername_unix** call to getpeername(2) for a unix socket (since 6.7)
- **getsockname4** call to getsockname(2) for an inet4 socket (since 5.8)
- **getsockname6** call to getsockname(2) for an inet6 socket (since 5.8)
- **getsockname_unix** call to getsockname(2) for a unix socket (since 6.7)
- **sock_release** closing a userspace inet socket (since 5.9)
**bpftool prog help**
Print short help message.
bpftool cgroup detach *CGROUP* *ATTACH_TYPE* *PROG*
Detach *PROG* from the cgroup *CGROUP* and attach type *ATTACH_TYPE*.
bpftool prog help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
.. include:: common_options.rst
-f, --bpffs
Show file names of pinned programs.
-f, --bpffs
Show file names of pinned programs.
EXAMPLES
========


@ -14,77 +14,70 @@ tool for inspection of eBPF-related parameters for Linux kernel or net device
SYNOPSIS
========
**bpftool** [*OPTIONS*] **feature** *COMMAND*
**bpftool** [*OPTIONS*] **feature** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| }
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* := { **probe** | **help** }
*COMMANDS* := { **probe** | **help** }
FEATURE COMMANDS
================
| **bpftool** **feature probe** [*COMPONENT*] [**full**] [**unprivileged**] [**macros** [**prefix** *PREFIX*]]
| **bpftool** **feature list_builtins** *GROUP*
| **bpftool** **feature help**
| **bpftool** **feature probe** [*COMPONENT*] [**full**] [**unprivileged**] [**macros** [**prefix** *PREFIX*]]
| **bpftool** **feature list_builtins** *GROUP*
| **bpftool** **feature help**
|
| *COMPONENT* := { **kernel** | **dev** *NAME* }
| *GROUP* := { **prog_types** | **map_types** | **attach_types** | **link_types** | **helpers** }
| *COMPONENT* := { **kernel** | **dev** *NAME* }
| *GROUP* := { **prog_types** | **map_types** | **attach_types** | **link_types** | **helpers** }
DESCRIPTION
===========
**bpftool feature probe** [**kernel**] [**full**] [**macros** [**prefix** *PREFIX*]]
Probe the running kernel and dump a number of eBPF-related
parameters, such as availability of the **bpf**\ () system call,
JIT status, eBPF program types availability, eBPF helper
functions availability, and more.
bpftool feature probe [kernel] [full] [macros [prefix *PREFIX*]]
Probe the running kernel and dump a number of eBPF-related parameters, such
as availability of the **bpf**\ () system call, JIT status, eBPF program
types availability, eBPF helper functions availability, and more.
By default, bpftool **does not run probes** for
**bpf_probe_write_user**\ () and **bpf_trace_printk**\()
helpers which print warnings to kernel logs. To enable them
and run all probes, the **full** keyword should be used.
By default, bpftool **does not run probes** for **bpf_probe_write_user**\
() and **bpf_trace_printk**\() helpers which print warnings to kernel logs.
To enable them and run all probes, the **full** keyword should be used.
If the **macros** keyword (but not the **-j** option) is
passed, a subset of the output is dumped as a list of
**#define** macros that are ready to be included in a C
header file, for example. If, additionally, **prefix** is
used to define a *PREFIX*, the provided string will be used
as a prefix to the names of the macros: this can be used to
avoid conflicts on macro names when including the output of
this command as a header file.
If the **macros** keyword (but not the **-j** option) is passed, a subset
of the output is dumped as a list of **#define** macros that are ready to
be included in a C header file, for example. If, additionally, **prefix**
is used to define a *PREFIX*, the provided string will be used as a prefix
to the names of the macros: this can be used to avoid conflicts on macro
names when including the output of this command as a header file.
Keyword **kernel** can be omitted. If no probe target is
specified, probing the kernel is the default behaviour.
Keyword **kernel** can be omitted. If no probe target is specified, probing
the kernel is the default behaviour.
When the **unprivileged** keyword is used, bpftool will dump
only the features available to a user who does not have the
**CAP_SYS_ADMIN** capability set. The features available in
that case usually represent a small subset of the parameters
supported by the system. Unprivileged users MUST use the
**unprivileged** keyword: This is to avoid misdetection if
bpftool is inadvertently run as non-root, for example. This
keyword is unavailable if bpftool was compiled without
libcap.
When the **unprivileged** keyword is used, bpftool will dump only the
features available to a user who does not have the **CAP_SYS_ADMIN**
capability set. The features available in that case usually represent a
small subset of the parameters supported by the system. Unprivileged users
MUST use the **unprivileged** keyword: This is to avoid misdetection if
bpftool is inadvertently run as non-root, for example. This keyword is
unavailable if bpftool was compiled without libcap.
**bpftool feature probe dev** *NAME* [**full**] [**macros** [**prefix** *PREFIX*]]
Probe network device for supported eBPF features and dump
results to the console.
bpftool feature probe dev *NAME* [full] [macros [prefix *PREFIX*]]
Probe network device for supported eBPF features and dump results to the
console.
The keywords **full**, **macros** and **prefix** have the
same role as when probing the kernel.
The keywords **full**, **macros** and **prefix** have the same role as when
probing the kernel.
bpftool feature list_builtins *GROUP*
List items known to bpftool. These can be BPF program types
(**prog_types**), BPF map types (**map_types**), attach types
(**attach_types**), link types (**link_types**), or BPF helper functions
(**helpers**). The command does not probe the system, but simply lists the
elements that bpftool knows from compilation time, as provided from libbpf
(for all object types) or from the BPF UAPI header (list of helpers). This
can be used in scripts to iterate over BPF types or helpers.
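For example, to list the BPF program types known to this bpftool binary (any
of the groups above can be substituted)::

    # bpftool feature list_builtins prog_types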
bpftool feature help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst


@@ -14,199 +14,177 @@ tool for BPF code-generation
SYNOPSIS
========
**bpftool** [*OPTIONS*] **gen** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| | { **-L** | **--use-loader** } }
*COMMAND* := { **object** | **skeleton** | **help** }
GEN COMMANDS
=============
| **bpftool** **gen object** *OUTPUT_FILE* *INPUT_FILE* [*INPUT_FILE*...]
| **bpftool** **gen skeleton** *FILE* [**name** *OBJECT_NAME*]
| **bpftool** **gen subskeleton** *FILE* [**name** *OBJECT_NAME*]
| **bpftool** **gen min_core_btf** *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...]
| **bpftool** **gen help**
DESCRIPTION
===========
bpftool gen object *OUTPUT_FILE* *INPUT_FILE* [*INPUT_FILE*...]
Statically link (combine) together one or more *INPUT_FILE*'s into a single
resulting *OUTPUT_FILE*. All the files involved are BPF ELF object files.
The rules of BPF static linking are mostly the same as for user-space
object files, but in addition to combining data and instruction sections,
.BTF and .BTF.ext (if present in any of the input files) data are combined
together. .BTF data is deduplicated, so all the common types across
*INPUT_FILE*'s will only be represented once in the resulting BTF
information.
BPF static linking makes it possible to partition BPF source code into
individually compiled files that are then linked into a single resulting BPF
object file, which can be used to generate a BPF skeleton (with the
**gen skeleton** command) or passed directly into **libbpf** (using the
**bpf_object__open()** family of APIs).
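As a sketch, linking two separately compiled BPF objects into one (file names
are placeholders)::

    # bpftool gen object combined.bpf.o first.bpf.o second.bpf.o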
bpftool gen skeleton *FILE*
Generate BPF skeleton C header file for a given *FILE*.
BPF skeleton is an alternative interface to existing libbpf APIs for working
with BPF objects. Skeleton code is intended to significantly shorten and
simplify code to load and work with BPF programs from the userspace side.
Generated code is tailored to the specific input BPF object *FILE*,
reflecting its structure by listing out available maps, programs, variables,
etc. Skeleton eliminates the need to look up mentioned components by name.
Instead, if skeleton instantiation succeeds, they are populated in the
skeleton structure as valid libbpf types (e.g., a **struct bpf_map** pointer)
and can be passed to existing generic libbpf APIs.
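A typical invocation, assuming an object file named example.bpf.o, redirects
the generated header to a file::

    # bpftool gen skeleton example.bpf.o name example > example.skel.h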
In addition to simple and reliable access to maps and programs, skeleton
provides storage for BPF links (**struct bpf_link**) for each BPF program
within the BPF object. When requested, supported BPF programs will be
automatically attached and the resulting BPF links stored for further use by
the user in pre-allocated fields in the skeleton struct. For BPF programs
that can't be automatically attached by libbpf, the user can attach them
manually and store the resulting BPF link in the per-program link field. All
links set up this way will be automatically destroyed on BPF skeleton
destruction. This eliminates the need for users to manage links manually,
relying on libbpf support to detach programs and free up resources.
Another facility provided by BPF skeleton is an interface to global variables
of all supported kinds: mutable, read-only, as well as extern ones. This
interface allows presetting initial values of variables before the BPF object
is loaded and verified by the kernel. For non-read-only variables, the same
interface can be used to fetch values of global variables on the userspace
side, even if they are modified by BPF code.
During skeleton generation, the contents of the source BPF object *FILE* are
embedded within the generated code and thus do not need to be kept around.
This ensures the skeleton and the BPF object file match 1-to-1 and always
stay in sync. Generated code is dual-licensed under LGPL-2.1 and
BSD-2-Clause licenses.
It is a design goal and guarantee that skeleton interfaces are
interoperable with generic libbpf APIs. User should always be able to use
skeleton API to create and load BPF object, and later use libbpf APIs to
keep working with specific maps, programs, etc.
As part of the skeleton, a few custom functions are generated. Each of them
is prefixed with the object name. The object name can either be derived from
the object file name, i.e., if the BPF object file name is **example.o**, the
BPF object name will be **example**. The object name can also be specified
explicitly through the **name** *OBJECT_NAME* parameter. The following custom
functions are provided (assuming **example** as the object name):
- **example__open** and **example__open_opts**.
These functions are used to instantiate the skeleton. They correspond to
libbpf's **bpf_object__open**\ () API. The **_opts** variant accepts extra
**bpf_object_open_opts** options.
- **example__load**.
This function creates maps, loads and verifies BPF programs, and initializes
global data maps. It corresponds to libbpf's **bpf_object__load**\ () API.
- **example__open_and_load** combines **example__open** and
**example__load** invocations in one commonly used operation.
- **example__attach** and **example__detach**.
This pair of functions allows attaching and detaching, respectively, an
already loaded BPF object. Only BPF programs of types supported by libbpf
for auto-attachment will be auto-attached and their corresponding BPF links
instantiated. For other BPF programs, the user can manually create a BPF link
and assign it to the corresponding field in the skeleton struct.
**example__detach** will detach both links created automatically and those
populated by the user manually.
- **example__destroy**.
Detach and unload BPF programs, free up all the resources used by
skeleton and BPF object.
If the BPF object has global variables, corresponding structs with a memory
layout matching the global data section layout will be created. Currently
supported ones are: *.data*, *.bss*, *.rodata*, and *.kconfig* structs/data
sections. These data sections/structs can be used to set up initial values of
variables, if set before **example__load**. Afterwards, if the target kernel
supports memory-mapped BPF arrays, the same structs can be used to fetch and
update (non-read-only) data from userspace, with the same simplicity as on
the BPF side.
bpftool gen subskeleton *FILE*
Generate BPF subskeleton C header file for a given *FILE*.
Subskeletons are similar to skeletons, except they do not own the
corresponding maps, programs, or global variables. They require that the
object file used to generate them is already loaded into a *bpf_object* by
some other means.
This functionality is useful when a library is included into a larger BPF
program. A subskeleton for the library would have access to all objects and
globals defined in it, without having to know about the larger program.
Consequently, there are only two functions defined for subskeletons:
- **example__open(bpf_object\*)**.
Instantiates a subskeleton from an already opened (but not necessarily
loaded) **bpf_object**.
- **example__destroy()**.
Frees the storage for the subskeleton but *does not* unload any BPF
programs or maps.
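Generation mirrors the skeleton case; for instance, assuming the same
example.bpf.o object file::

    # bpftool gen subskeleton example.bpf.o name example > example.subskel.h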
bpftool gen min_core_btf *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...]
Generate a minimum BTF file as *OUTPUT*, derived from a given *INPUT* BTF
file, containing all the BTF types needed so that the CO-RE relocations of
one or more given eBPF objects may be satisfied.
When kernels aren't compiled with CONFIG_DEBUG_INFO_BTF, libbpf, when
loading an eBPF object, has to rely on external BTF files to be able to
calculate CO-RE relocations.
Usually, an external BTF file is built from existing kernel DWARF data
using pahole. It contains all the types used by its respective kernel image
and, because of that, is big.
The min_core_btf feature builds smaller BTF files, customized to one or
multiple eBPF objects, so they can be distributed together with an eBPF
CO-RE based application, making the application portable across different
kernel versions.
Check the examples below for more information on how to use it.
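As a quick sketch, where vmlinux.btf stands for an external BTF file (for
example one produced with pahole) and prog.bpf.o is the eBPF object to
serve::

    # bpftool gen min_core_btf vmlinux.btf vmlinux_min.btf prog.bpf.o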
bpftool gen help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
-L, --use-loader
For skeletons, generate a "light" skeleton (also known as "loader"
skeleton). A light skeleton contains a loader eBPF program. It does not use
the majority of the libbpf infrastructure, and does not need libelf.
EXAMPLES
========


@@ -14,50 +14,46 @@ tool to create BPF iterators
SYNOPSIS
========
**bpftool** [*OPTIONS*] **iter** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* := { **pin** | **help** }
ITER COMMANDS
=============
| **bpftool** **iter pin** *OBJ* *PATH* [**map** *MAP*]
| **bpftool** **iter help**
|
| *OBJ* := /a/file/of/bpf_iter_target.o
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
DESCRIPTION
===========
bpftool iter pin *OBJ* *PATH* [map *MAP*]
A bpf iterator combines kernel iteration over particular kernel data (e.g.,
tasks, bpf_maps, etc.) with a bpf program called for each kernel data object
(e.g., one task, one bpf_map, etc.). User space can *read* the kernel
iterator output through the *read()* syscall.
The *pin* command creates a bpf iterator from *OBJ* and pins it to *PATH*.
The *PATH* should be located in *bpffs* mount. It must not contain a dot
character ('.'), which is reserved for future extensions of *bpffs*.
Map element bpf iterator requires an additional parameter *MAP* so the bpf
program can iterate over map elements for that map. Users can have a bpf
program in the kernel run for each map element, doing checking, filtering,
aggregation, etc. without copying data to user space.
User can then *cat PATH* to see the bpf iterator output.
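For instance, pinning a task iterator object and reading its output (object
and pin paths are illustrative)::

    # bpftool iter pin bpf_iter_task.o /sys/fs/bpf/my_task_iter
    # cat /sys/fs/bpf/my_task_iter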
bpftool iter help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
EXAMPLES
========


@@ -14,67 +14,62 @@ tool for inspection and simple manipulation of eBPF links
SYNOPSIS
========
**bpftool** [*OPTIONS*] **link** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } | { **-n** | **--nomount** } }
*COMMANDS* := { **show** | **list** | **pin** | **help** }
LINK COMMANDS
=============
| **bpftool** **link { show | list }** [*LINK*]
| **bpftool** **link pin** *LINK* *FILE*
| **bpftool** **link detach** *LINK*
| **bpftool** **link help**
|
| *LINK* := { **id** *LINK_ID* | **pinned** *FILE* }
DESCRIPTION
===========
bpftool link { show | list } [*LINK*]
Show information about active links. If *LINK* is specified show
information only about given link, otherwise list all links currently
active on the system.
Output will start with link ID followed by link type and zero or more named
attributes, some of which depend on type of link.
Since Linux 5.8 bpftool is able to discover information about processes
that hold open file descriptors (FDs) against BPF links. On such kernels
bpftool will automatically emit this information as well.
bpftool link pin *LINK* *FILE*
Pin link *LINK* as *FILE*.
Note: *FILE* must be located in *bpffs* mount. It must not contain a dot
character ('.'), which is reserved for future extensions of *bpffs*.
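For example, pinning the link with ID 10 under a bpffs mount (the ID and path
are placeholders)::

    # bpftool link pin id 10 /sys/fs/bpf/my_link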
bpftool link detach *LINK*
Force-detach link *LINK*. BPF link and its underlying BPF program will stay
valid, but they will be detached from the respective BPF hook and BPF link
will transition into a defunct state until last open file descriptor for
that link is closed.
bpftool link help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
-f, --bpffs
When showing BPF links, show file names of pinned links.
-n, --nomount
Do not automatically attempt to mount any virtual file system (such as
tracefs or BPF virtual file system) when necessary.
EXAMPLES
========


@@ -14,166 +14,160 @@ tool for inspection and simple manipulation of eBPF maps
SYNOPSIS
========
**bpftool** [*OPTIONS*] **map** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } | { **-n** | **--nomount** } }
*COMMANDS* :=
{ **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext** |
**delete** | **pin** | **help** }
MAP COMMANDS
=============
| **bpftool** **map** { **show** | **list** } [*MAP*]
| **bpftool** **map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* \
| **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**inner_map** *MAP*] \
| [**offload_dev** *NAME*]
| **bpftool** **map dump** *MAP*
| **bpftool** **map update** *MAP* [**key** *DATA*] [**value** *VALUE*] [*UPDATE_FLAGS*]
| **bpftool** **map lookup** *MAP* [**key** *DATA*]
| **bpftool** **map getnext** *MAP* [**key** *DATA*]
| **bpftool** **map delete** *MAP* **key** *DATA*
| **bpftool** **map pin** *MAP* *FILE*
| **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
| **bpftool** **map peek** *MAP*
| **bpftool** **map push** *MAP* **value** *VALUE*
| **bpftool** **map pop** *MAP*
| **bpftool** **map enqueue** *MAP* **value** *VALUE*
| **bpftool** **map dequeue** *MAP*
| **bpftool** **map freeze** *MAP*
| **bpftool** **map help**
|
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* | **name** *MAP_NAME* }
| *DATA* := { [**hex**] *BYTES* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
| *VALUE* := { *DATA* | *MAP* | *PROG* }
| *UPDATE_FLAGS* := { **any** | **exist** | **noexist** }
| *TYPE* := { **hash** | **array** | **prog_array** | **perf_event_array** | **percpu_hash**
| | **percpu_array** | **stack_trace** | **cgroup_array** | **lru_hash**
| | **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps**
| | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }
DESCRIPTION
===========
bpftool map { show | list } [*MAP*]
Show information about loaded maps. If *MAP* is specified show information
only about given maps, otherwise list all maps currently loaded on the
system. In case of **name**, *MAP* may match several maps which will all
be shown.
Output will start with map ID followed by map type and zero or more named
attributes (depending on kernel version).
Since Linux 5.8 bpftool is able to discover information about processes
that hold open file descriptors (FDs) against BPF maps. On such kernels
bpftool will automatically emit this information as well.
bpftool map create *FILE* type *TYPE* key *KEY_SIZE* value *VALUE_SIZE* entries *MAX_ENTRIES* name *NAME* [flags *FLAGS*] [inner_map *MAP*] [offload_dev *NAME*]
Create a new map with given parameters and pin it to *bpffs* as *FILE*.
*FLAGS* should be an integer which is the combination of desired flags,
e.g. 1024 for **BPF_F_MMAPABLE** (see bpf.h UAPI header for existing
flags).
To create maps of type array-of-maps or hash-of-maps, the **inner_map**
keyword must be used to pass an inner map. The kernel needs it to collect
metadata related to the inner maps that the new map will work with.
Keyword **offload_dev** expects a network interface name, and is used to
request hardware offload for the map.
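As a sketch, creating and pinning a small mmapable array map (sizes, name and
path are arbitrary; 1024 is **BPF_F_MMAPABLE**)::

    # bpftool map create /sys/fs/bpf/my_array type array key 4 value 8 \
          entries 128 name my_array flags 1024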
bpftool map dump *MAP*
Dump all entries in a given *MAP*. In case of **name**, *MAP* may match
several maps which will all be dumped.
bpftool map update *MAP* [key *DATA*] [value *VALUE*] [*UPDATE_FLAGS*]
Update map entry for a given *KEY*.
*UPDATE_FLAGS* can be one of: **any** update existing entry or add if it
doesn't exist; **exist** update only if entry already exists; **noexist**
update only if entry doesn't exist.
If the **hex** keyword is provided in front of the bytes sequence, the
bytes are parsed as hexadecimal values, even if no "0x" prefix is added. If
the keyword is not provided, then the bytes are parsed as decimal values,
unless a "0x" prefix (for hexadecimal) or a "0" prefix (for octal) is
provided.
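For instance, updating a 4-byte key with an 8-byte value, both given as
hexadecimal bytes (the map ID is a placeholder)::

    # bpftool map update id 42 key hex 01 00 00 00 value hex 0a 00 00 00 00 00 00 00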
bpftool map lookup *MAP* [key *DATA*]
Lookup **key** in the map.
bpftool map getnext *MAP* [key *DATA*]
Get next key. If *key* is not specified, get first key.
bpftool map delete *MAP* key *DATA*
Remove entry from the map.
bpftool map pin *MAP* *FILE*
Pin map *MAP* as *FILE*.
Note: *FILE* must be located in *bpffs* mount. It must not contain a dot
character ('.'), which is reserved for future extensions of *bpffs*.
bpftool map event_pipe *MAP* [cpu *N* index *M*]
Read events from a **BPF_MAP_TYPE_PERF_EVENT_ARRAY** map.
Install perf rings into a perf event array map and dump output of any
**bpf_perf_event_output**\ () call in the kernel. By default read the
number of CPUs on the system and install perf ring for each CPU in the
corresponding index in the array.
If **cpu** and **index** are specified, install perf ring for given **cpu**
at **index** in the array (single ring).
Note that installing a perf ring into an array will silently replace any
existing ring. Any other application will stop receiving events if it
installed its rings earlier.
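For example, installing a single ring for CPU 0 at index 0 of a perf event
array map (the map ID is a placeholder)::

    # bpftool map event_pipe id 25 cpu 0 index 0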
bpftool map peek *MAP*
Peek next value in the queue or stack.
bpftool map push *MAP* value *VALUE*
Push *VALUE* onto the stack.
bpftool map pop *MAP*
Pop and print value from the stack.
bpftool map enqueue *MAP* value *VALUE*
Enqueue *VALUE* into the queue.
bpftool map dequeue *MAP*
Dequeue and print value from the queue.
bpftool map freeze *MAP*
Freeze the map as read-only from user space. Entries from a frozen map can
no longer be updated or deleted with the **bpf**\ () system call. This
operation is not reversible, and the map remains immutable from user space
until its destruction. However, read and write permissions for BPF programs
to the map remain unchanged.
bpftool map help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
-f, --bpffs
Show file names of pinned maps.
-n, --nomount
Do not automatically attempt to mount any virtual file system (such as
tracefs or BPF virtual file system) when necessary.
EXAMPLES
========


@@ -14,76 +14,74 @@ tool for inspection of networking related bpf prog attachments
SYNOPSIS
========
**bpftool** [*OPTIONS*] **net** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** }
NET COMMANDS
============
| **bpftool** **net** { **show** | **list** } [ **dev** *NAME* ]
| **bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *NAME* [ **overwrite** ]
| **bpftool** **net detach** *ATTACH_TYPE* **dev** *NAME*
| **bpftool** **net help**
|
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
| *ATTACH_TYPE* := { **xdp** | **xdpgeneric** | **xdpdrv** | **xdpoffload** }
DESCRIPTION
===========
bpftool net { show | list } [ dev *NAME* ]
List bpf program attachments in the kernel networking subsystem.
Currently, device driver xdp attachments, tcx, netkit and old-style tc
classifier/action attachments, flow_dissector as well as netfilter
attachments are implemented, i.e., for program types **BPF_PROG_TYPE_XDP**,
**BPF_PROG_TYPE_SCHED_CLS**, **BPF_PROG_TYPE_SCHED_ACT**,
**BPF_PROG_TYPE_FLOW_DISSECTOR**, **BPF_PROG_TYPE_NETFILTER**.
For programs attached to a particular cgroup, e.g.,
**BPF_PROG_TYPE_CGROUP_SKB**, **BPF_PROG_TYPE_CGROUP_SOCK**,
**BPF_PROG_TYPE_SOCK_OPS** and **BPF_PROG_TYPE_CGROUP_SOCK_ADDR**, users
can use **bpftool cgroup** to dump cgroup attachments. For sk_{filter, skb,
msg, reuseport} and lwt/seg6 bpf programs, users should consult other
tools, e.g., iproute2.
The current output will start with all xdp program attachments, followed by
all tcx, netkit, then tc class/qdisc bpf program attachments, then
flow_dissector and finally netfilter programs. Both xdp programs and
tcx/netkit/tc programs are ordered based on ifindex number. If multiple bpf
programs are attached to the same networking device through **tc**, the order
will be first all bpf programs attached to tcx, netkit, then tc classes, then
all bpf programs attached to non clsact qdiscs, and finally all bpf programs
attached to root and clsact qdisc.
bpftool net attach *ATTACH_TYPE* *PROG* dev *NAME* [ overwrite ]
Attach bpf program *PROG* to network interface *NAME* with type specified
by *ATTACH_TYPE*. Previously attached bpf program can be replaced by the
command used with **overwrite** option. Currently, only XDP-related modes
are supported for *ATTACH_TYPE*.
*ATTACH_TYPE* can be of:
**xdp** - try native XDP and fall back to generic XDP if NIC driver does not support it;
**xdpgeneric** - Generic XDP. Runs at the generic XDP hook when the packet has already entered the receive path as an skb;
**xdpdrv** - Native XDP. Runs at the earliest point in the driver's receive path;
**xdpoffload** - Offload XDP. Runs directly on the NIC on each packet reception;
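For example, attaching a loaded program in native (driver) XDP mode and
replacing any program already attached (the program ID and interface name are
placeholders)::

    # bpftool net attach xdpdrv id 16 dev eth0 overwrite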
bpftool net detach *ATTACH_TYPE* dev *NAME*
Detach bpf program attached to network interface *NAME* with type specified
by *ATTACH_TYPE*. To detach bpf program, same *ATTACH_TYPE* previously used
for attach must be specified. Currently, only XDP-related modes are
supported for *ATTACH_TYPE*.
bpftool net help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
EXAMPLES
========


@@ -14,37 +14,37 @@ tool for inspection of perf related bpf prog attachments
SYNOPSIS
========
**bpftool** [*OPTIONS*] **perf** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* :=
{ **show** | **list** | **help** }
PERF COMMANDS
=============
| **bpftool** **perf** { **show** | **list** }
| **bpftool** **perf help**
DESCRIPTION
===========
bpftool perf { show | list }
List all raw_tracepoint, tracepoint, kprobe attachments in the system.
Output will start with process id and file descriptor in that process,
followed by bpf program id, attachment information, and attachment point.
The attachment point for raw_tracepoint/tracepoint is the trace probe name.
The attachment point for k[ret]probe is either symbol name and offset, or a
kernel virtual address. The attachment point for u[ret]probe is the file
name and the file offset.
bpftool perf help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
EXAMPLES
========


@@ -14,250 +14,226 @@ tool for inspection and simple manipulation of eBPF progs
SYNOPSIS
========
**bpftool** [*OPTIONS*] **prog** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| |
{ **-f** | **--bpffs** } | { **-m** | **--mapcompat** } | { **-n** | **--nomount** } |
{ **-L** | **--use-loader** } }
*COMMANDS* :=
{ **show** | **list** | **dump xlated** | **dump jited** | **pin** | **load** |
**loadall** | **help** }
PROG COMMANDS
=============
| **bpftool** **prog** { **show** | **list** } [*PROG*]
| **bpftool** **prog dump xlated** *PROG* [{ **file** *FILE* | [**opcodes**] [**linum**] [**visual**] }]
| **bpftool** **prog dump jited** *PROG* [{ **file** *FILE* | [**opcodes**] [**linum**] }]
| **bpftool** **prog pin** *PROG* *FILE*
| **bpftool** **prog** { **load** | **loadall** } *OBJ* *PATH* [**type** *TYPE*] [**map** { **idx** *IDX* | **name** *NAME* } *MAP*] [{ **offload_dev** | **xdpmeta_dev** } *NAME*] [**pinmaps** *MAP_DIR*] [**autoattach**]
| **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
| **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
| **bpftool** **prog tracelog**
| **bpftool** **prog run** *PROG* **data_in** *FILE* [**data_out** *FILE* [**data_size_out** *L*]] [**ctx_in** *FILE* [**ctx_out** *FILE* [**ctx_size_out** *M*]]] [**repeat** *N*]
| **bpftool** **prog profile** *PROG* [**duration** *DURATION*] *METRICs*
| **bpftool** **prog help**
|
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* | **name** *MAP_NAME* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
| *TYPE* := {
| **socket** | **kprobe** | **kretprobe** | **classifier** | **action** |
| **tracepoint** | **raw_tracepoint** | **xdp** | **perf_event** | **cgroup/skb** |
| **cgroup/sock** | **cgroup/dev** | **lwt_in** | **lwt_out** | **lwt_xmit** |
| **lwt_seg6local** | **sockops** | **sk_skb** | **sk_msg** | **lirc_mode2** |
| **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
| **cgroup/connect4** | **cgroup/connect6** | **cgroup/connect_unix** |
| **cgroup/getpeername4** | **cgroup/getpeername6** | **cgroup/getpeername_unix** |
| **cgroup/getsockname4** | **cgroup/getsockname6** | **cgroup/getsockname_unix** |
| **cgroup/sendmsg4** | **cgroup/sendmsg6** | **cgroup/sendmsg_unix** |
| **cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/recvmsg_unix** | **cgroup/sysctl** |
| **cgroup/getsockopt** | **cgroup/setsockopt** | **cgroup/sock_release** |
| **struct_ops** | **fentry** | **fexit** | **freplace** | **sk_lookup**
| }
| *ATTACH_TYPE* := {
| **sk_msg_verdict** | **sk_skb_verdict** | **sk_skb_stream_verdict** |
| **sk_skb_stream_parser** | **flow_dissector**
| }
| *METRICs* := {
| **cycles** | **instructions** | **l1d_loads** | **llc_misses** |
| **itlb_misses** | **dtlb_misses**
| }
DESCRIPTION
===========
bpftool prog { show | list } [*PROG*]
Show information about loaded programs. If *PROG* is specified show
information only about given programs, otherwise list all programs
currently loaded on the system. In case of **tag** or **name**, *PROG* may
match several programs which will all be shown.
Output will start with program ID followed by program type and zero or more
named attributes (depending on kernel version).
Since Linux 5.1 the kernel can collect statistics on BPF programs (such as
the total time spent running the program, and the number of times it was
run). If available, bpftool shows such statistics. However, the kernel does
not collect them by default, as it slightly impacts performance on each
program run. Activation or deactivation of the feature is performed via the
**kernel.bpf_stats_enabled** sysctl knob.
Since Linux 5.8 bpftool is able to discover information about processes
that hold open file descriptors (FDs) against BPF programs. On such kernels
bpftool will automatically emit this information as well.
bpftool prog dump xlated *PROG* [{ file *FILE* | [opcodes] [linum] [visual] }]
Dump eBPF instructions of the programs from the kernel. By default, eBPF
will be disassembled and printed to standard output in human-readable
format. In this case, **opcodes** controls if raw opcodes should be printed
as well.
In case of **tag** or **name**, *PROG* may match several programs which
will all be dumped. However, if **file** or **visual** is specified,
*PROG* must match a single program.
If **file** is specified, the binary image will instead be written to
*FILE*.
If **visual** is specified, control flow graph (CFG) will be built instead,
and eBPF instructions will be presented with CFG in DOT format, on standard
output.
If the programs have line_info available, the source line will be
displayed. If **linum** is specified, the filename, line number and line
column will also be displayed.
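As an illustration, dumping the CFG of a single pinned program and rendering
it with Graphviz (paths are placeholders; the dot(1) step assumes Graphviz is
installed)::

    # bpftool prog dump xlated pinned /sys/fs/bpf/my_prog visual > cfg.dot
    # dot -Tpng cfg.dot -o cfg.png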
bpftool prog dump jited *PROG* [{ file *FILE* | [opcodes] [linum] }]
Dump jited image (host machine code) of the program.
If *FILE* is specified image will be written to a file, otherwise it will
be disassembled and printed to stdout. *PROG* must match a single program
when **file** is specified.
**opcodes** controls if raw opcodes will be printed.
If the prog has line_info available, the source line will be displayed. If
**linum** is specified, the filename, line number and line column will also
be displayed.
bpftool prog pin *PROG* *FILE*
Pin program *PROG* as *FILE*.
Note: *FILE* must be located in *bpffs* mount. It must not contain a dot
character ('.'), which is reserved for future extensions of *bpffs*.
bpftool prog { load | loadall } *OBJ* *PATH* [type *TYPE*] [map { idx *IDX* | name *NAME* } *MAP*] [{ offload_dev | xdpmeta_dev } *NAME*] [pinmaps *MAP_DIR*] [autoattach]
Load bpf program(s) from binary *OBJ* and pin as *PATH*.
**bpftool prog load** pins only the first program from the *OBJ* as *PATH*.
**bpftool prog loadall** pins all programs from the *OBJ* under *PATH*
directory. **type** is optional; if not specified, program type will be
inferred from section names. By default bpftool will create new maps as
declared in the ELF object being loaded. **map** parameter allows for the
reuse of existing maps. It can be specified multiple times, each time for a
different map. *IDX* refers to index of the map to be replaced in the ELF
file counting from 0, while *NAME* allows replacing a map by name. *MAP*
specifies the map to use, referring to it by **id** or through a **pinned**
file. If **offload_dev** *NAME* is specified, program will be loaded onto
given networking device (offload). If **xdpmeta_dev** *NAME* is specified,
program will become device-bound without offloading; this facilitates access
to XDP metadata. Optional **pinmaps** argument can be provided to pin all
maps under *MAP_DIR* directory.
If **autoattach** is specified, the program will be attached before pinning.
In that case, only the link (representing the program attached to its hook)
is pinned, not the program as such, so the path won't show in
**bpftool prog show -f**, only in **bpftool link show -f**. Also, this only
works when bpftool (libbpf) is able to infer all necessary information from
the object file; in particular, it's not supported for all program types. If
a program does not support autoattach, bpftool falls back to regular pinning
for that program instead.
Note: *PATH* must be located in *bpffs* mount. It must not
contain a dot character ('.'), which is reserved for future
extensions of *bpffs*.
Note: *PATH* must be located in *bpffs* mount. It must not contain a dot
character ('.'), which is reserved for future extensions of *bpffs*.
bpftool prog attach *PROG* *ATTACH_TYPE* [*MAP*]
Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*). Most
*ATTACH_TYPEs* require a *MAP* parameter, with the exception of
*flow_dissector*, which is attached to the current network namespace.
bpftool prog detach *PROG* *ATTACH_TYPE* [*MAP*]
Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*). Most
*ATTACH_TYPEs* require a *MAP* parameter, with the exception of
*flow_dissector*, which is detached from the current network namespace.
bpftool prog tracelog
Dump the trace pipe of the system to the console (stdout). Hit <Ctrl+C> to
stop printing. BPF programs can write to this trace pipe at runtime with
the **bpf_trace_printk**\ () helper. This should be used only for debugging
purposes. For streaming data from BPF programs to user space, one can use
perf events (see also **bpftool-map**\ (8)).
bpftool prog run *PROG* data_in *FILE* [data_out *FILE* [data_size_out *L*]] [ctx_in *FILE* [ctx_out *FILE* [ctx_size_out *M*]]] [repeat *N*]
Run BPF program *PROG* in the kernel testing infrastructure for BPF,
meaning that the program works on the data and context provided by the
user, and not on actual packets or monitored functions etc. Return value
and duration for the test run are printed out to the console.
Input data is read from the *FILE* passed with **data_in**. If this *FILE*
is "**-**", input data is read from standard input. Input context, if any,
is read from *FILE* passed with **ctx_in**. Again, "**-**" can be used to
read from standard input, but only if standard input is not already in use
for input data. If a *FILE* is passed with **data_out**, output data is
written to that file. Similarly, output context is written to the *FILE*
passed with **ctx_out**. For both output flows, "**-**" can be used to
print to the standard output (as plain text, or JSON if relevant option was
passed). If output keywords are omitted, output data and context are
discarded. Keywords **data_size_out** and **ctx_size_out** are used to pass
the size (in bytes) for the output buffers to the kernel, although the
default of 32 kB should be more than enough for most cases.
Keyword **repeat** is used to indicate the number of consecutive runs to
perform. Note that output data and context printed to files correspond to
the last of those runs. The duration printed out at the end of the runs is
an average over all runs performed by the command.
Not all program types support test run. Among those which do, not all of
them can take the **ctx_in**/**ctx_out** arguments. bpftool does not
perform checks on program types.
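The test-run facility this command drives is also reachable from C through
libbpf, which is what bpftool builds on. A minimal, hypothetical sketch (the
program fd, buffer and sizes below are made up for illustration and are not
part of this page):

#include <stdio.h>
#include <bpf/bpf.h>

/* prog_fd: fd of an already loaded program that supports test run */
static int test_run_once(int prog_fd, void *data, __u32 data_len)
{
	char out[1500];
	LIBBPF_OPTS(bpf_test_run_opts, opts,
		.data_in = data,
		.data_size_in = data_len,
		.data_out = out,
		.data_size_out = sizeof(out),	/* like data_size_out above */
		.repeat = 1,			/* like repeat above */
	);
	int err;

	err = bpf_prog_test_run_opts(prog_fd, &opts);
	if (err)
		return err;
	printf("retval %d duration %uns output %u bytes\n",
	       opts.retval, opts.duration, opts.data_size_out);
	return 0;
}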
bpftool prog profile *PROG* [duration *DURATION*] *METRICs*
Profile *METRICs* for bpf program *PROG* for *DURATION* seconds or until
user hits <Ctrl+C>. *DURATION* is optional. If *DURATION* is not specified,
the profiling will run up to **UINT_MAX** seconds.
bpftool prog help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
-f, --bpffs
When showing BPF programs, show file names of pinned programs.
-m, --mapcompat
Allow loading maps with unknown map definitions.
-n, --nomount
Do not automatically attempt to mount any virtual file system (such as
tracefs or BPF virtual file system) when necessary.
-L, --use-loader
Load program as a "loader" program. This is useful to debug the generation
of such programs. When this option is in use, bpftool attempts to load the
programs from the object file into the kernel, but does not pin them
(therefore, the *PATH* must not be provided).
When combined with the **-d**\ \|\ **--debug** option, additional debug
messages are generated, and the execution of the loader program will use
the **bpf_trace_printk**\ () helper to log each step of loading BTF,
creating the maps, and loading the programs (see **bpftool prog tracelog**
as a way to dump those messages).
EXAMPLES
========

View File

@ -14,61 +14,60 @@ tool to register/unregister/introspect BPF struct_ops
SYNOPSIS
========
**bpftool** [*OPTIONS*] **struct_ops** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* :=
{ **show** | **list** | **dump** | **register** | **unregister** | **help** }
STRUCT_OPS COMMANDS
===================
| **bpftool** **struct_ops { show | list }** [*STRUCT_OPS_MAP*]
| **bpftool** **struct_ops dump** [*STRUCT_OPS_MAP*]
| **bpftool** **struct_ops register** *OBJ* [*LINK_DIR*]
| **bpftool** **struct_ops unregister** *STRUCT_OPS_MAP*
| **bpftool** **struct_ops help**
|
| *STRUCT_OPS_MAP* := { **id** *STRUCT_OPS_MAP_ID* | **name** *STRUCT_OPS_MAP_NAME* }
| *OBJ* := /a/file/of/bpf_struct_ops.o
DESCRIPTION
===========
bpftool struct_ops { show | list } [*STRUCT_OPS_MAP*]
Show brief information about the struct_ops in the system. If
*STRUCT_OPS_MAP* is specified, it shows information only for the given
struct_ops. Otherwise, it lists all struct_ops currently existing in the
system.
Output will start with struct_ops map ID, followed by its map name and its
struct_ops's kernel type.
bpftool struct_ops dump [*STRUCT_OPS_MAP*]
Dump detailed information about the struct_ops in the system. If
*STRUCT_OPS_MAP* is specified, it dumps information only for the given
struct_ops. Otherwise, it dumps all struct_ops currently existing in the
system.
bpftool struct_ops register *OBJ* [*LINK_DIR*]
Register bpf struct_ops from *OBJ*. All struct_ops under the ELF sections
".struct_ops" and ".struct_ops.link" will be registered with their kernel
subsystem. For each struct_ops in the ".struct_ops.link" section, a link
will be created. You can give *LINK_DIR* to provide a directory path where
these links will be pinned with the same name as their corresponding map
name.
bpftool struct_ops unregister *STRUCT_OPS_MAP*
Unregister the *STRUCT_OPS_MAP* from the kernel subsystem.
bpftool struct_ops help
Print short help message.
OPTIONS
=======
.. include:: common_options.rst
EXAMPLES
========

View File

@ -14,57 +14,57 @@ tool for inspection and simple manipulation of eBPF programs and maps
SYNOPSIS
========
**bpftool** [*OPTIONS*] *OBJECT* { *COMMAND* | **help** }
**bpftool** **batch file** *FILE*
**bpftool** **version**
*OBJECT* := { **map** | **prog** | **link** | **cgroup** | **perf** | **net** | **feature** |
**btf** | **gen** | **struct_ops** | **iter** }
*OPTIONS* := { { **-V** | **--version** } | |COMMON_OPTIONS| }
*MAP-COMMANDS* :=
{ **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext** |
**delete** | **pin** | **event_pipe** | **help** }
*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin** |
**load** | **attach** | **detach** | **help** }
*LINK-COMMANDS* := { **show** | **list** | **pin** | **detach** | **help** }
*CGROUP-COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** }
*PERF-COMMANDS* := { **show** | **list** | **help** }
*NET-COMMANDS* := { **show** | **list** | **help** }
*FEATURE-COMMANDS* := { **probe** | **help** }
*BTF-COMMANDS* := { **show** | **list** | **dump** | **help** }
*GEN-COMMANDS* := { **object** | **skeleton** | **min_core_btf** | **help** }
*STRUCT-OPS-COMMANDS* := { **show** | **list** | **dump** | **register** | **unregister** | **help** }
*ITER-COMMANDS* := { **pin** | **help** }
DESCRIPTION
===========
*bpftool* allows for inspection and simple modification of BPF objects on the
system.
Note that the format of the output of all tools is not guaranteed to be stable
and should not be depended upon.
OPTIONS
=======
.. include:: common_options.rst
-m, --mapcompat
Allow loading maps with unknown map definitions.
-n, --nomount
Do not automatically attempt to mount any virtual file system (such as
tracefs or BPF virtual file system) when necessary.

View File

@ -1,25 +1,23 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
-h, --help
Print short help message (similar to **bpftool help**).
-V, --version
Print bpftool's version number (similar to **bpftool version**), the number
of the libbpf version in use, and optional features that were included when
bpftool was compiled. Optional features include linking against LLVM or
libbfd to provide the disassembler for JIT-ted programs (**bpftool prog
dump jited**) and usage of BPF skeletons (some features like **bpftool prog
profile** or showing pids associated with BPF objects may rely on it).
-j, --json
Generate JSON output. For commands that cannot produce JSON, this option
has no effect.
-p, --pretty
Generate human-readable JSON output. Implies **-j**.
-d, --debug
Print all logs available, even debug-level information. This includes logs
from libbpf as well as from the verifier, when attempting to load programs.

View File

@ -106,19 +106,19 @@ _bpftool_get_link_ids()
_bpftool_get_obj_map_names()
{
local obj
local obj maps
obj=$1
maps=$(objdump -j maps -t $obj 2>/dev/null | \
command awk '/g . maps/ {print $NF}')
maps=$(objdump -j .maps -t $obj 2>/dev/null | \
command awk '/g . .maps/ {print $NF}')
COMPREPLY+=( $( compgen -W "$maps" -- "$cur" ) )
}
_bpftool_get_obj_map_idxs()
{
local obj
local obj nmaps
obj=$1
@ -136,7 +136,7 @@ _sysfs_get_netdevs()
# Retrieve type of the map that we are operating on.
_bpftool_map_guess_map_type()
{
local keyword ref
local keyword idx ref=""
for (( idx=3; idx < ${#words[@]}-1; idx++ )); do
case "${words[$((idx-2))]}" in
lookup|update)
@ -255,8 +255,9 @@ _bpftool_map_update_get_name()
_bpftool()
{
local cur prev words objword json=0
_init_completion || return
local cur prev words cword comp_args
local json=0
_init_completion -- "$@" || return
# Deal with options
if [[ ${words[cword]} == -* ]]; then
@ -293,7 +294,7 @@ _bpftool()
esac
# Remove all options so completions don't have to deal with them.
local i
local i pprev
for (( i=1; i < ${#words[@]}; )); do
if [[ ${words[i]::1} == - ]] &&
[[ ${words[i]} != "-B" ]] && [[ ${words[i]} != "--base-btf" ]]; then
@ -307,7 +308,7 @@ _bpftool()
prev=${words[cword - 1]}
pprev=${words[cword - 2]}
local object=${words[1]} command=${words[2]}
local object=${words[1]}
if [[ -z $object || $cword -eq 1 ]]; then
case $cur in
@ -324,8 +325,12 @@ _bpftool()
esac
fi
local command=${words[2]}
[[ $command == help ]] && return 0
local MAP_TYPE='id pinned name'
local PROG_TYPE='id pinned tag name'
# Completion depends on object and command in use
case $object in
prog)
@ -346,8 +351,6 @@ _bpftool()
;;
esac
local PROG_TYPE='id pinned tag name'
local MAP_TYPE='id pinned name'
local METRIC_TYPE='cycles instructions l1d_loads llc_misses \
itlb_misses dtlb_misses'
case $command in
@ -457,7 +460,7 @@ _bpftool()
obj=${words[3]}
if [[ ${words[-4]} == "map" ]]; then
COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) )
COMPREPLY=( $( compgen -W "$MAP_TYPE" -- "$cur" ) )
return 0
fi
if [[ ${words[-3]} == "map" ]]; then
@ -541,20 +544,9 @@ _bpftool()
COMPREPLY=( $( compgen -W "$METRIC_TYPE duration" -- "$cur" ) )
return 0
;;
6)
case $prev in
duration)
return 0
;;
*)
COMPREPLY=( $( compgen -W "$METRIC_TYPE" -- "$cur" ) )
return 0
;;
esac
return 0
;;
*)
COMPREPLY=( $( compgen -W "$METRIC_TYPE" -- "$cur" ) )
[[ $prev == duration ]] && return 0
_bpftool_once_attr "$METRIC_TYPE"
return 0
;;
esac
@ -612,7 +604,7 @@ _bpftool()
return 0
;;
register)
_filedir
[[ $prev == $command ]] && _filedir
return 0
;;
*)
@ -638,9 +630,12 @@ _bpftool()
pinned)
_filedir
;;
*)
map)
_bpftool_one_of_list $MAP_TYPE
;;
*)
_bpftool_once_attr 'map'
;;
esac
return 0
;;
@ -652,7 +647,6 @@ _bpftool()
esac
;;
map)
local MAP_TYPE='id pinned name'
case $command in
show|list|dump|peek|pop|dequeue|freeze)
case $prev in
@ -793,13 +787,11 @@ _bpftool()
# map, depending on the type of the map to update.
case "$(_bpftool_map_guess_map_type)" in
array_of_maps|hash_of_maps)
local MAP_TYPE='id pinned name'
COMPREPLY+=( $( compgen -W "$MAP_TYPE" \
-- "$cur" ) )
return 0
;;
prog_array)
local PROG_TYPE='id pinned tag name'
COMPREPLY+=( $( compgen -W "$PROG_TYPE" \
-- "$cur" ) )
return 0
@ -821,7 +813,7 @@ _bpftool()
esac
_bpftool_once_attr 'key'
local UPDATE_FLAGS='any exist noexist'
local UPDATE_FLAGS='any exist noexist' idx
for (( idx=3; idx < ${#words[@]}-1; idx++ )); do
if [[ ${words[idx]} == 'value' ]]; then
# 'value' is present, but is not the last
@ -893,7 +885,6 @@ _bpftool()
esac
;;
btf)
local PROG_TYPE='id pinned tag name'
local MAP_TYPE='id pinned name'
case $command in
dump)
@ -1033,7 +1024,6 @@ _bpftool()
local BPFTOOL_CGROUP_ATTACH_TYPES="$(bpftool feature list_builtins attach_types 2>/dev/null | \
grep '^cgroup_')"
local ATTACH_FLAGS='multi override'
local PROG_TYPE='id pinned tag name'
# Check for $prev = $command first
if [ $prev = $command ]; then
_filedir
@ -1086,7 +1076,6 @@ _bpftool()
esac
;;
net)
local PROG_TYPE='id pinned tag name'
local ATTACH_TYPES='xdp xdpgeneric xdpdrv xdpoffload'
case $command in
show|list)
@ -1193,14 +1182,14 @@ _bpftool()
pin|detach)
if [[ $prev == "$command" ]]; then
COMPREPLY=( $( compgen -W "$LINK_TYPE" -- "$cur" ) )
else
elif [[ $pprev == "$command" ]]; then
_filedir
fi
return 0
;;
*)
[[ $prev == $object ]] && \
COMPREPLY=( $( compgen -W 'help pin show list' -- "$cur" ) )
COMPREPLY=( $( compgen -W 'help pin detach show list' -- "$cur" ) )
;;
esac
;;

View File

@ -244,29 +244,101 @@ int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type)
return fd;
}
int mount_bpffs_for_pin(const char *name, bool is_dir)
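/* Prepare dir_name for pinning: nothing to do if dir_name or its parent is
 * already a bpffs mount; otherwise create the directory if it is missing and
 * mount a new bpffs instance on it (unless blocked by the --nomount option).
 */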
int create_and_mount_bpffs_dir(const char *dir_name)
{
char err_str[ERR_MAX_LEN];
char *file;
bool dir_exists;
int err = 0;
if (is_bpffs(dir_name))
return err;
dir_exists = access(dir_name, F_OK) == 0;
if (!dir_exists) {
char *temp_name;
char *parent_name;
temp_name = strdup(dir_name);
if (!temp_name) {
p_err("mem alloc failed");
return -1;
}
parent_name = dirname(temp_name);
if (is_bpffs(parent_name)) {
/* nothing to do if already mounted */
free(temp_name);
return err;
}
if (access(parent_name, F_OK) == -1) {
p_err("can't create dir '%s' to pin BPF object: parent dir '%s' doesn't exist",
dir_name, parent_name);
free(temp_name);
return -1;
}
free(temp_name);
}
if (block_mount) {
p_err("no BPF file system found, not mounting it due to --nomount option");
return -1;
}
if (!dir_exists) {
err = mkdir(dir_name, S_IRWXU);
if (err) {
p_err("failed to create dir '%s': %s", dir_name, strerror(errno));
return err;
}
}
err = mnt_fs(dir_name, "bpf", err_str, ERR_MAX_LEN);
if (err) {
err_str[ERR_MAX_LEN - 1] = '\0';
p_err("can't mount BPF file system on given dir '%s': %s",
dir_name, err_str);
if (!dir_exists)
rmdir(dir_name);
}
return err;
}
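/* Prepare for pinning a single object at file_name: fail if the path already
 * exists, and make sure the parent directory is a bpffs mount, mounting one
 * there if needed (unless blocked by the --nomount option).
 */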
int mount_bpffs_for_file(const char *file_name)
{
char err_str[ERR_MAX_LEN];
char *temp_name;
char *dir;
int err = 0;
if (is_dir && is_bpffs(name))
return err;
if (access(file_name, F_OK) != -1) {
p_err("can't pin BPF object: path '%s' already exists", file_name);
return -1;
}
file = malloc(strlen(name) + 1);
if (!file) {
temp_name = strdup(file_name);
if (!temp_name) {
p_err("mem alloc failed");
return -1;
}
strcpy(file, name);
dir = dirname(file);
dir = dirname(temp_name);
if (is_bpffs(dir))
/* nothing to do if already mounted */
goto out_free;
if (access(dir, F_OK) == -1) {
p_err("can't pin BPF object: dir '%s' doesn't exist", dir);
err = -1;
goto out_free;
}
if (block_mount) {
p_err("no BPF file system found, not mounting it due to --nomount option");
err = -1;
@ -276,12 +348,12 @@ int mount_bpffs_for_pin(const char *name, bool is_dir)
err = mnt_fs(dir, "bpf", err_str, ERR_MAX_LEN);
if (err) {
err_str[ERR_MAX_LEN - 1] = '\0';
p_err("can't mount BPF file system to pin the object (%s): %s",
name, err_str);
p_err("can't mount BPF file system to pin the object '%s': %s",
file_name, err_str);
}
out_free:
free(file);
free(temp_name);
return err;
}
@ -289,7 +361,7 @@ int do_pin_fd(int fd, const char *name)
{
int err;
err = mount_bpffs_for_pin(name, false);
err = mount_bpffs_for_file(name);
if (err)
return err;

View File

@ -664,7 +664,8 @@ probe_helper_ifindex(enum bpf_func_id id, enum bpf_prog_type prog_type,
probe_prog_load_ifindex(prog_type, insns, ARRAY_SIZE(insns), buf,
sizeof(buf), ifindex);
res = !grep(buf, "invalid func ") && !grep(buf, "unknown func ");
res = !grep(buf, "invalid func ") && !grep(buf, "unknown func ") &&
!grep(buf, "program of this type cannot use helper ");
switch (get_vendor_id(ifindex)) {
case 0x19ee: /* Netronome specific */

View File

@ -386,7 +386,7 @@ static int codegen_subskel_datasecs(struct bpf_object *obj, const char *obj_name
*/
needs_typeof = btf_is_array(var) || btf_is_ptr_to_func_proto(btf, var);
if (needs_typeof)
printf("typeof(");
printf("__typeof__(");
err = btf_dump__emit_type_decl(d, var_type_id, &opts);
if (err)
@ -1131,7 +1131,7 @@ static void gen_st_ops_shadow_init(struct btf *btf, struct bpf_object *obj)
continue;
codegen("\
\n\
obj->struct_ops.%1$s = (typeof(obj->struct_ops.%1$s))\n\
obj->struct_ops.%1$s = (__typeof__(obj->struct_ops.%1$s))\n\
bpf_map__initial_value(obj->maps.%1$s, NULL);\n\
\n\
", ident);

View File

@ -76,7 +76,7 @@ static int do_pin(int argc, char **argv)
goto close_obj;
}
err = mount_bpffs_for_pin(path, false);
err = mount_bpffs_for_file(path);
if (err)
goto close_link;

View File

@ -526,6 +526,10 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
show_link_ifindex_json(info->netkit.ifindex, json_wtr);
show_link_attach_type_json(info->netkit.attach_type, json_wtr);
break;
case BPF_LINK_TYPE_SOCKMAP:
jsonw_uint_field(json_wtr, "map_id", info->sockmap.map_id);
show_link_attach_type_json(info->sockmap.attach_type, json_wtr);
break;
case BPF_LINK_TYPE_XDP:
show_link_ifindex_json(info->xdp.ifindex, json_wtr);
break;
@ -915,6 +919,11 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
show_link_ifindex_plain(info->netkit.ifindex);
show_link_attach_type_plain(info->netkit.attach_type);
break;
case BPF_LINK_TYPE_SOCKMAP:
printf("\n\t");
printf("map_id %u ", info->sockmap.map_id);
show_link_attach_type_plain(info->sockmap.attach_type);
break;
case BPF_LINK_TYPE_XDP:
printf("\n\t");
show_link_ifindex_plain(info->xdp.ifindex);

View File

@ -142,7 +142,8 @@ const char *get_fd_type_name(enum bpf_obj_type type);
char *get_fdinfo(int fd, const char *key);
int open_obj_pinned(const char *path, bool quiet);
int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type);
int mount_bpffs_for_pin(const char *name, bool is_dir);
int mount_bpffs_for_file(const char *file_name);
int create_and_mount_bpffs_dir(const char *dir_name);
int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(int *, char ***));
int do_pin_fd(int fd, const char *name);

View File

@ -1778,7 +1778,10 @@ offload_dev:
goto err_close_obj;
}
err = mount_bpffs_for_pin(pinfile, !first_prog_only);
if (first_prog_only)
err = mount_bpffs_for_file(pinfile);
else
err = create_and_mount_bpffs_dir(pinfile);
if (err)
goto err_close_obj;
@ -2078,7 +2081,7 @@ static int profile_parse_metrics(int argc, char **argv)
NEXT_ARG();
}
if (selected_cnt > MAX_NUM_PROFILE_METRICS) {
p_err("too many (%d) metrics, please specify no more than %d metrics at at time",
p_err("too many (%d) metrics, please specify no more than %d metrics at a time",
selected_cnt, MAX_NUM_PROFILE_METRICS);
return -1;
}

View File

@ -515,7 +515,7 @@ static int do_register(int argc, char **argv)
if (argc == 1)
linkdir = GET_ARG();
if (linkdir && mount_bpffs_for_pin(linkdir, true)) {
if (linkdir && create_and_mount_bpffs_dir(linkdir)) {
p_err("can't mount bpffs for pinning");
return -1;
}

View File

@ -111,6 +111,24 @@
.off = 0, \
.imm = IMM })
/* Short form of movsx, dst_reg = (s8,s16,s32)src_reg */
#define BPF_MOVSX64_REG(DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
#define BPF_MOVSX32_REG(DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \

View File

@ -1135,6 +1135,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_TCX = 11,
BPF_LINK_TYPE_UPROBE_MULTI = 12,
BPF_LINK_TYPE_NETKIT = 13,
BPF_LINK_TYPE_SOCKMAP = 14,
__MAX_BPF_LINK_TYPE,
};
@ -3394,6 +3395,10 @@ union bpf_attr {
* for the nexthop. If the src addr cannot be derived,
* **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. In this
* case, *params*->dmac and *params*->smac are not set either.
* **BPF_FIB_LOOKUP_MARK**
* Use the mark present in *params*->mark for the fib lookup.
* This option should not be used with BPF_FIB_LOOKUP_DIRECT,
* as it only has meaning for full lookups.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs.
@ -5022,7 +5027,7 @@ union bpf_attr {
* bytes will be copied to *dst*
* Return
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
* **-EOPNOTSUPP** if IMA is disabled or **-EINVAL** if
* invalid arguments are passed.
*
* struct socket *bpf_sock_from_file(struct file *file)
@ -5508,7 +5513,7 @@ union bpf_attr {
* bytes will be copied to *dst*
* Return
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
* **-EOPNOTSUPP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed.
*
* void *bpf_kptr_xchg(void *map_value, void *ptr)
@ -6720,6 +6725,10 @@ struct bpf_link_info {
__u32 ifindex;
__u32 attach_type;
} netkit;
struct {
__u32 map_id;
__u32 attach_type;
} sockmap;
};
} __attribute__((aligned(8)));
@ -6938,6 +6947,8 @@ enum {
* socket transition to LISTEN state.
*/
BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
* Arg1: measured RTT input (mrtt)
* Arg2: updated srtt
*/
BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option.
* It will be called to handle
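With the two new arguments, an RTT callback can read the measured and smoothed
RTT directly. A hypothetical BPF-side sketch (section and variable names are
illustrative; the callback is assumed to have been enabled beforehand via
bpf_sock_ops_cb_flags_set() with BPF_SOCK_OPS_RTT_CB_FLAG):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

SEC("sockops")
int rtt_reporter(struct bpf_sock_ops *skops)
{
	if (skops->op == BPF_SOCK_OPS_RTT_CB) {
		__u32 mrtt = skops->args[0];	/* Arg1: measured RTT */
		__u32 srtt = skops->args[1];	/* Arg2: updated srtt */

		bpf_printk("mrtt %u srtt %u", mrtt, srtt);
	}
	return 1;
}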
@ -7120,6 +7131,7 @@ enum {
BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
BPF_FIB_LOOKUP_TBID = (1U << 3),
BPF_FIB_LOOKUP_SRC = (1U << 4),
BPF_FIB_LOOKUP_MARK = (1U << 5),
};
enum {
@ -7152,7 +7164,7 @@ struct bpf_fib_lookup {
/* output: MTU value */
__u16 mtu_result;
};
} __attribute__((packed, aligned(2)));
/* input: L3 device index for lookup
* output: device index from FIB lookup
*/
@ -7197,8 +7209,19 @@ struct bpf_fib_lookup {
__u32 tbid;
};
__u8 smac[6]; /* ETH_ALEN */
__u8 dmac[6]; /* ETH_ALEN */
union {
/* input */
struct {
__u32 mark; /* policy routing */
/* 2 4-byte holes for input */
};
/* output: source and dest mac */
struct {
__u8 smac[6]; /* ETH_ALEN */
__u8 dmac[6]; /* ETH_ALEN */
};
};
};
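A hypothetical tc-side sketch of a mark-aware FIB lookup using the new flag
and field (the mark value is made up and most lookup inputs are omitted; this
is not part of the patch):

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define AF_INET 2	/* keep the sketch self-contained */

char _license[] SEC("license") = "GPL";

SEC("tc")
int fib_lookup_with_mark(struct __sk_buff *skb)
{
	struct bpf_fib_lookup params = {};
	long rc;

	params.family  = AF_INET;
	params.ifindex = skb->ingress_ifindex;
	params.mark    = 42;	/* made-up mark; only consumed with the flag below */
	/* address/port/protocol inputs omitted for brevity */

	/* BPF_FIB_LOOKUP_MARK must not be combined with BPF_FIB_LOOKUP_DIRECT */
	rc = bpf_fib_lookup(skb, &params, sizeof(params), BPF_FIB_LOOKUP_MARK);
	return rc == BPF_FIB_LKUP_RET_SUCCESS ? TC_ACT_OK : TC_ACT_SHOT;
}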
struct bpf_redir_neigh {
@ -7285,6 +7308,10 @@ struct bpf_timer {
__u64 __opaque[2];
} __attribute__((aligned(8)));
struct bpf_wq {
__u64 __opaque[2];
} __attribute__((aligned(8)));
struct bpf_dynptr {
__u64 __opaque[2];
} __attribute__((aligned(8)));

File diff suppressed because it is too large.

View File

@ -2,7 +2,7 @@
#ifndef __BPF_CORE_READ_H__
#define __BPF_CORE_READ_H__
#include <bpf/bpf_helpers.h>
#include "bpf_helpers.h"
/*
* enum bpf_field_info_kind is passed as a second argument into

View File

@ -137,7 +137,8 @@
/*
* Helper function to perform a tail call with a constant/immediate map slot.
*/
#if __clang_major__ >= 8 && defined(__bpf__)
#if (defined(__clang__) && __clang_major__ >= 8) || (!defined(__clang__) && __GNUC__ > 12)
#if defined(__bpf__)
static __always_inline void
bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
{
@ -165,6 +166,7 @@ bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
: "r0", "r1", "r2", "r3", "r4", "r5");
}
#endif
#endif
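With the relaxed guard, the helper is also available when compiling BPF
programs with GCC 13 or newer, not only clang >= 8. A small, made-up usage
sketch (map and section names are illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 2);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

SEC("tc")
int entry_prog(struct __sk_buff *skb)
{
	/* the slot must be a compile-time constant for bpf_tail_call_static() */
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;	/* only reached if the tail call fails */
}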
enum libbpf_pin_type {
LIBBPF_PIN_NONE,

View File

@ -1929,6 +1929,7 @@ static int btf_dump_int_data(struct btf_dump *d,
if (d->typed_dump->is_array_terminated)
break;
if (*(char *)data == '\0') {
btf_dump_type_values(d, "'\\0'");
d->typed_dump->is_array_terminated = true;
break;
}
@ -2031,6 +2032,7 @@ static int btf_dump_array_data(struct btf_dump *d,
__u32 i, elem_type_id;
__s64 elem_size;
bool is_array_member;
bool is_array_terminated;
elem_type_id = array->type;
elem_type = skip_mods_and_typedefs(d->btf, elem_type_id, NULL);
@ -2066,12 +2068,15 @@ static int btf_dump_array_data(struct btf_dump *d,
*/
is_array_member = d->typed_dump->is_array_member;
d->typed_dump->is_array_member = true;
is_array_terminated = d->typed_dump->is_array_terminated;
d->typed_dump->is_array_terminated = false;
for (i = 0; i < array->nelems; i++, data += elem_size) {
if (d->typed_dump->is_array_terminated)
break;
btf_dump_dump_type_data(d, NULL, elem_type, elem_type_id, data, 0, 0);
}
d->typed_dump->is_array_member = is_array_member;
d->typed_dump->is_array_terminated = is_array_terminated;
d->typed_dump->depth--;
btf_dump_data_pfx(d);
btf_dump_type_values(d, "]");

View File

@ -149,6 +149,7 @@ static const char * const link_type_name[] = {
[BPF_LINK_TYPE_TCX] = "tcx",
[BPF_LINK_TYPE_UPROBE_MULTI] = "uprobe_multi",
[BPF_LINK_TYPE_NETKIT] = "netkit",
[BPF_LINK_TYPE_SOCKMAP] = "sockmap",
};
static const char * const map_type_name[] = {
@ -1970,6 +1971,20 @@ static struct extern_desc *find_extern_by_name(const struct bpf_object *obj,
return NULL;
}
static struct extern_desc *find_extern_by_name_with_len(const struct bpf_object *obj,
const void *name, int len)
{
const char *ext_name;
int i;
for (i = 0; i < obj->nr_extern; i++) {
ext_name = obj->externs[i].name;
if (strlen(ext_name) == len && strncmp(ext_name, name, len) == 0)
return &obj->externs[i];
}
return NULL;
}
static int set_kcfg_value_tri(struct extern_desc *ext, void *ext_val,
char value)
{
@ -7986,7 +8001,10 @@ static int bpf_object__sanitize_maps(struct bpf_object *obj)
return 0;
}
int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *ctx)
typedef int (*kallsyms_cb_t)(unsigned long long sym_addr, char sym_type,
const char *sym_name, void *ctx);
static int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *ctx)
{
char sym_type, sym_name[500];
unsigned long long sym_addr;
@ -8026,8 +8044,13 @@ static int kallsyms_cb(unsigned long long sym_addr, char sym_type,
struct bpf_object *obj = ctx;
const struct btf_type *t;
struct extern_desc *ext;
char *res;
ext = find_extern_by_name(obj, sym_name);
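/* Kernels built with LLVM (e.g. with LTO) can expose static data symbols in
 * kallsyms with a ".llvm.<hash>" suffix; match the extern on the prefix only.
 */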
res = strstr(sym_name, ".llvm.");
if (sym_type == 'd' && res)
ext = find_extern_by_name_with_len(obj, sym_name, res - sym_name);
else
ext = find_extern_by_name(obj, sym_name);
if (!ext || ext->type != EXT_KSYM)
return 0;
@ -12511,6 +12534,12 @@ bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
}
struct bpf_link *
bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd)
{
return bpf_program_attach_fd(prog, map_fd, "sockmap", NULL);
}
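A hypothetical user-space sketch of the new API (the function name is made up;
prog is assumed to be a loaded sk_msg or sk_skb program and map_fd a
sockmap/sockhash fd):

#include <errno.h>
#include <stdio.h>
#include <bpf/libbpf.h>

static struct bpf_link *attach_to_sockmap(struct bpf_program *prog, int map_fd)
{
	struct bpf_link *link;

	link = bpf_program__attach_sockmap(prog, map_fd);
	if (!link)
		fprintf(stderr, "sockmap attach failed: %d\n", -errno);
	return link;	/* keep or pin the link for as long as the attachment is needed */
}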
struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
{
/* target_fd/target_ifindex use the same field in LINK_CREATE */

View File

@ -795,6 +795,8 @@ bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);
LIBBPF_API struct bpf_link *
bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd);
LIBBPF_API struct bpf_link *
bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd);
LIBBPF_API struct bpf_link *
bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex);
LIBBPF_API struct bpf_link *
bpf_program__attach_freplace(const struct bpf_program *prog,
@ -1293,6 +1295,7 @@ LIBBPF_API int ring_buffer__add(struct ring_buffer *rb, int map_fd,
ring_buffer_sample_fn sample_cb, void *ctx);
LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__consume_n(struct ring_buffer *rb, size_t n);
LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
/**
@ -1367,6 +1370,17 @@ LIBBPF_API int ring__map_fd(const struct ring *r);
*/
LIBBPF_API int ring__consume(struct ring *r);
/**
* @brief **ring__consume_n()** consumes up to a requested amount of items from
* a ringbuffer without event polling.
*
* @param r A ringbuffer object.
* @param n Maximum amount of items to consume.
* @return The number of items consumed, or a negative number if any of the
* callbacks return an error.
*/
LIBBPF_API int ring__consume_n(struct ring *r, size_t n);
struct user_ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
};
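A hypothetical sketch of the partial-consumption API declared above: drain at
most a fixed budget of records per loop iteration so one busy ring buffer
cannot monopolize the event loop (rb is assumed to be a fully set up
ring_buffer with callbacks registered; the budget value is made up):

#include <bpf/libbpf.h>

#define RB_BUDGET 64	/* per-iteration record budget */

static int drain_budgeted(struct ring_buffer *rb)
{
	int n = ring_buffer__consume_n(rb, RB_BUDGET);

	if (n < 0)
		return n;	/* a sample callback returned an error */
	return n;		/* records actually consumed, at most RB_BUDGET */
}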

View File

@ -416,3 +416,10 @@ LIBBPF_1.4.0 {
btf__new_split;
btf_ext__raw_data;
} LIBBPF_1.3.0;
LIBBPF_1.5.0 {
global:
bpf_program__attach_sockmap;
ring__consume_n;
ring_buffer__consume_n;
} LIBBPF_1.4.0;

View File

@ -518,11 +518,6 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
__s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name,
__u32 kind);
typedef int (*kallsyms_cb_t)(unsigned long long sym_addr, char sym_type,
const char *sym_name, void *ctx);
int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *arg);
/* handle direct returned errors */
static inline int libbpf_err(int ret)
{

View File

@ -448,7 +448,8 @@ int libbpf_probe_bpf_helper(enum bpf_prog_type prog_type, enum bpf_func_id helpe
/* If BPF verifier doesn't recognize BPF helper ID (enum bpf_func_id)
* at all, it will emit something like "invalid func unknown#181".
* If BPF verifier recognizes BPF helper but it's not supported for
* given BPF program type, it will emit "unknown func bpf_sys_bpf#166".
* given BPF program type, it will emit "unknown func bpf_sys_bpf#166"
* or "program of this type cannot use helper bpf_sys_bpf#166".
* In both cases, provided combination of BPF program type and BPF
* helper is not supported by the kernel.
* In all other cases, probe_prog_load() above will either succeed (e.g.,
@ -457,7 +458,8 @@ int libbpf_probe_bpf_helper(enum bpf_prog_type prog_type, enum bpf_func_id helpe
* that), or we'll get some more specific BPF verifier error about
* some unsatisfied conditions.
*/
if (ret == 0 && (strstr(buf, "invalid func ") || strstr(buf, "unknown func ")))
if (ret == 0 && (strstr(buf, "invalid func ") || strstr(buf, "unknown func ") ||
strstr(buf, "program of this type cannot use helper ")))
return 0;
return 1; /* assume supported */
}
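The same message parsing is what libbpf_probe_bpf_helper() exposes to
applications; a small illustrative check (program type and helper chosen
arbitrarily, not prescribed by the patch):

#include <stdbool.h>
#include <linux/bpf.h>
#include <bpf/libbpf.h>

static bool can_use_sys_bpf_helper(void)
{
	int ret = libbpf_probe_bpf_helper(BPF_PROG_TYPE_SYSCALL,
					  BPF_FUNC_sys_bpf, NULL);

	return ret == 1;	/* 1: supported, 0: not supported, <0: probe error */
}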

View File

@ -4,6 +4,6 @@
#define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 4
#define LIBBPF_MINOR_VERSION 5
#endif /* __LIBBPF_VERSION_H */

View File

@ -231,7 +231,7 @@ static inline int roundup_len(__u32 len)
return (len + 7) / 8 * 8;
}
static int64_t ringbuf_process_ring(struct ring *r)
static int64_t ringbuf_process_ring(struct ring *r, size_t n)
{
int *len_ptr, len, err;
/* 64-bit to avoid overflow in case of extreme application behavior */
@ -268,12 +268,42 @@ static int64_t ringbuf_process_ring(struct ring *r)
}
smp_store_release(r->consumer_pos, cons_pos);
if (cnt >= n)
goto done;
}
} while (got_new_data);
done:
return cnt;
}
/* Consume available ring buffer(s) data without event polling, up to n
* records.
*
* Returns number of records consumed across all registered ring buffers (or
* n, whichever is less), or negative number if any of the callbacks return
* error.
*/
int ring_buffer__consume_n(struct ring_buffer *rb, size_t n)
{
int64_t err, res = 0;
int i;
for (i = 0; i < rb->ring_cnt; i++) {
struct ring *ring = rb->rings[i];
err = ringbuf_process_ring(ring, n);
if (err < 0)
return libbpf_err(err);
res += err;
n -= err;
if (n == 0)
break;
}
return res;
}
/* Consume available ring buffer(s) data without event polling.
* Returns number of records consumed across all registered ring buffers (or
* INT_MAX, whichever is less), or negative number if any of the callbacks
@ -287,13 +317,15 @@ int ring_buffer__consume(struct ring_buffer *rb)
for (i = 0; i < rb->ring_cnt; i++) {
struct ring *ring = rb->rings[i];
err = ringbuf_process_ring(ring);
err = ringbuf_process_ring(ring, INT_MAX);
if (err < 0)
return libbpf_err(err);
res += err;
if (res > INT_MAX) {
res = INT_MAX;
break;
}
}
if (res > INT_MAX)
return INT_MAX;
return res;
}
@ -314,13 +346,13 @@ int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
__u32 ring_id = rb->events[i].data.fd;
struct ring *ring = rb->rings[ring_id];
err = ringbuf_process_ring(ring);
err = ringbuf_process_ring(ring, INT_MAX);
if (err < 0)
return libbpf_err(err);
res += err;
}
if (res > INT_MAX)
return INT_MAX;
res = INT_MAX;
return res;
}
@ -371,17 +403,22 @@ int ring__map_fd(const struct ring *r)
return r->map_fd;
}
int ring__consume(struct ring *r)
int ring__consume_n(struct ring *r, size_t n)
{
int64_t res;
int res;
res = ringbuf_process_ring(r);
res = ringbuf_process_ring(r, n);
if (res < 0)
return libbpf_err(res);
return res > INT_MAX ? INT_MAX : res;
}
int ring__consume(struct ring *r)
{
return ring__consume_n(r, INT_MAX);
}
static void user_ringbuf_unmap_ring(struct user_ring_buffer *rb)
{
if (rb->consumer_pos) {

View File

@ -10,5 +10,4 @@ fill_link_info/kprobe_multi_link_info # bpf_program__attach_kprobe_mu
fill_link_info/kretprobe_multi_link_info # bpf_program__attach_kprobe_multi_opts unexpected error: -95
fill_link_info/kprobe_multi_invalid_ubuff # bpf_program__attach_kprobe_multi_opts unexpected error: -95
missed/kprobe_recursion # missed_kprobe_recursion__attach unexpected error: -95 (errno 95)
verifier_arena # JIT does not support arena
arena_htab # JIT does not support arena
arena_atomics

View File

@ -6,3 +6,4 @@ stacktrace_build_id # compare_map_keys stackid_hmap vs. sta
verifier_iterating_callbacks
verifier_arena # JIT does not support arena
arena_htab # JIT does not support arena
arena_atomics

View File

@ -278,11 +278,12 @@ UNPRIV_HELPERS := $(OUTPUT)/unpriv_helpers.o
TRACE_HELPERS := $(OUTPUT)/trace_helpers.o
JSON_WRITER := $(OUTPUT)/json_writer.o
CAP_HELPERS := $(OUTPUT)/cap_helpers.o
NETWORK_HELPERS := $(OUTPUT)/network_helpers.o
$(OUTPUT)/test_dev_cgroup: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_skb_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_sock: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_sock_addr: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_sock_addr: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(NETWORK_HELPERS)
$(OUTPUT)/test_sockmap: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELPERS)
$(OUTPUT)/get_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS)
@ -443,7 +444,7 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h \
LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c \
trace_printk.c trace_vprintk.c map_ptr_kern.c \
core_kern.c core_kern_overflow.c test_ringbuf.c \
test_ringbuf_map_key.c
test_ringbuf_n.c test_ringbuf_map_key.c
# Generate both light skeleton and libbpf skeleton for these
LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \
@ -646,7 +647,7 @@ $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
# Define test_progs-cpuv4 test runner.
ifneq ($(CLANG_CPUV4),)
TRUNNER_BPF_BUILD_RULE := CLANG_CPUV4_BPF_BUILD_RULE
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) -DENABLE_ATOMICS_TESTS
$(eval $(call DEFINE_TEST_RUNNER,test_progs,cpuv4))
endif
@ -683,7 +684,7 @@ $(OUTPUT)/test_verifier: test_verifier.c verifier/tests.h $(BPFOBJ) | $(OUTPUT)
# Include find_bit.c to compile xskxceiver.
EXTRA_SRC := $(TOOLSDIR)/lib/find_bit.c
$(OUTPUT)/xskxceiver: $(EXTRA_SRC) xskxceiver.c xskxceiver.h $(OUTPUT)/xsk.o $(OUTPUT)/xsk_xdp_progs.skel.h $(BPFOBJ) | $(OUTPUT)
$(OUTPUT)/xskxceiver: $(EXTRA_SRC) xskxceiver.c xskxceiver.h $(OUTPUT)/network_helpers.o $(OUTPUT)/xsk.o $(OUTPUT)/xsk_xdp_progs.skel.h $(BPFOBJ) | $(OUTPUT)
$(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@
@ -717,6 +718,7 @@ $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o: $(OUTPUT)/local_storage_rcu_tas
$(OUTPUT)/bench_local_storage_create.o: $(OUTPUT)/bench_local_storage_create.skel.h
$(OUTPUT)/bench_bpf_hashmap_lookup.o: $(OUTPUT)/bpf_hashmap_lookup.skel.h
$(OUTPUT)/bench_htab_mem.o: $(OUTPUT)/htab_mem_bench.skel.h
$(OUTPUT)/bench_bpf_crypto.o: $(OUTPUT)/crypto_bench.skel.h
$(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
$(OUTPUT)/bench: LDLIBS += -lm
$(OUTPUT)/bench: $(OUTPUT)/bench.o \
@ -736,6 +738,7 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
$(OUTPUT)/bench_bpf_hashmap_lookup.o \
$(OUTPUT)/bench_local_storage_create.o \
$(OUTPUT)/bench_htab_mem.o \
$(OUTPUT)/bench_bpf_crypto.o \
#
$(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
@ -747,7 +750,7 @@ $(OUTPUT)/veristat: $(OUTPUT)/veristat.o
$(OUTPUT)/uprobe_multi: uprobe_multi.c
$(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $^ $(LDLIBS) -o $@
$(Q)$(CC) $(CFLAGS) -O0 $(LDFLAGS) $^ $(LDLIBS) -o $@
EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \
prog_tests/tests.h map_tests/tests.h verifier/tests.h \

View File

@ -280,6 +280,8 @@ extern struct argp bench_strncmp_argp;
extern struct argp bench_hashmap_lookup_argp;
extern struct argp bench_local_storage_create_argp;
extern struct argp bench_htab_mem_argp;
extern struct argp bench_trigger_batch_argp;
extern struct argp bench_crypto_argp;
static const struct argp_child bench_parsers[] = {
{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
@ -292,6 +294,8 @@ static const struct argp_child bench_parsers[] = {
{ &bench_hashmap_lookup_argp, 0, "Hashmap lookup benchmark", 0 },
{ &bench_local_storage_create_argp, 0, "local-storage-create benchmark", 0 },
{ &bench_htab_mem_argp, 0, "hash map memory benchmark", 0 },
{ &bench_trigger_batch_argp, 0, "BPF triggering benchmark", 0 },
{ &bench_crypto_argp, 0, "bpf crypto benchmark", 0 },
{},
};
@ -491,24 +495,31 @@ extern const struct bench bench_rename_kretprobe;
extern const struct bench bench_rename_rawtp;
extern const struct bench bench_rename_fentry;
extern const struct bench bench_rename_fexit;
extern const struct bench bench_trig_base;
extern const struct bench bench_trig_tp;
extern const struct bench bench_trig_rawtp;
/* pure counting benchmarks to establish theoretical limits */
extern const struct bench bench_trig_usermode_count;
extern const struct bench bench_trig_syscall_count;
extern const struct bench bench_trig_kernel_count;
/* batched, staying mostly in-kernel benchmarks */
extern const struct bench bench_trig_kprobe;
extern const struct bench bench_trig_kretprobe;
extern const struct bench bench_trig_kprobe_multi;
extern const struct bench bench_trig_kretprobe_multi;
extern const struct bench bench_trig_fentry;
extern const struct bench bench_trig_fexit;
extern const struct bench bench_trig_fentry_sleep;
extern const struct bench bench_trig_fmodret;
extern const struct bench bench_trig_uprobe_base;
extern const struct bench bench_trig_tp;
extern const struct bench bench_trig_rawtp;
/* uprobe/uretprobe benchmarks */
extern const struct bench bench_trig_uprobe_nop;
extern const struct bench bench_trig_uretprobe_nop;
extern const struct bench bench_trig_uprobe_push;
extern const struct bench bench_trig_uretprobe_push;
extern const struct bench bench_trig_uprobe_ret;
extern const struct bench bench_trig_uretprobe_ret;
extern const struct bench bench_rb_libbpf;
extern const struct bench bench_rb_custom;
extern const struct bench bench_pb_libbpf;
@ -529,6 +540,8 @@ extern const struct bench bench_local_storage_tasks_trace;
extern const struct bench bench_bpf_hashmap_lookup;
extern const struct bench bench_local_storage_create;
extern const struct bench bench_htab_mem;
extern const struct bench bench_crypto_encrypt;
extern const struct bench bench_crypto_decrypt;
static const struct bench *benchs[] = {
&bench_count_global,
@ -539,24 +552,28 @@ static const struct bench *benchs[] = {
&bench_rename_rawtp,
&bench_rename_fentry,
&bench_rename_fexit,
&bench_trig_base,
&bench_trig_tp,
&bench_trig_rawtp,
/* pure counting benchmarks for establishing theoretical limits */
&bench_trig_usermode_count,
&bench_trig_kernel_count,
&bench_trig_syscall_count,
/* batched, staying mostly in-kernel triggers */
&bench_trig_kprobe,
&bench_trig_kretprobe,
&bench_trig_kprobe_multi,
&bench_trig_kretprobe_multi,
&bench_trig_fentry,
&bench_trig_fexit,
&bench_trig_fentry_sleep,
&bench_trig_fmodret,
&bench_trig_uprobe_base,
&bench_trig_tp,
&bench_trig_rawtp,
/* uprobes */
&bench_trig_uprobe_nop,
&bench_trig_uretprobe_nop,
&bench_trig_uprobe_push,
&bench_trig_uretprobe_push,
&bench_trig_uprobe_ret,
&bench_trig_uretprobe_ret,
/* ringbuf/perfbuf benchmarks */
&bench_rb_libbpf,
&bench_rb_custom,
&bench_pb_libbpf,
@ -577,6 +594,8 @@ static const struct bench *benchs[] = {
&bench_bpf_hashmap_lookup,
&bench_local_storage_create,
&bench_htab_mem,
&bench_crypto_encrypt,
&bench_crypto_decrypt,
};
static void find_benchmark(void)

View File

@ -0,0 +1,185 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <argp.h>
#include "bench.h"
#include "crypto_bench.skel.h"
#define MAX_CIPHER_LEN 32
static char *input;
static struct crypto_ctx {
struct crypto_bench *skel;
int pfd;
} ctx;
static struct crypto_args {
u32 crypto_len;
char *crypto_cipher;
} args = {
.crypto_len = 16,
.crypto_cipher = "ecb(aes)",
};
enum {
ARG_CRYPTO_LEN = 5000,
ARG_CRYPTO_CIPHER = 5001,
};
static const struct argp_option opts[] = {
{ "crypto-len", ARG_CRYPTO_LEN, "CRYPTO_LEN", 0,
"Set the length of crypto buffer" },
{ "crypto-cipher", ARG_CRYPTO_CIPHER, "CRYPTO_CIPHER", 0,
"Set the cipher to use (default:ecb(aes))" },
{},
};
static error_t crypto_parse_arg(int key, char *arg, struct argp_state *state)
{
switch (key) {
case ARG_CRYPTO_LEN:
args.crypto_len = strtoul(arg, NULL, 10);
if (!args.crypto_len ||
args.crypto_len > sizeof(ctx.skel->bss->dst)) {
fprintf(stderr, "Invalid crypto buffer len (limit %zu)\n",
sizeof(ctx.skel->bss->dst));
argp_usage(state);
}
break;
case ARG_CRYPTO_CIPHER:
args.crypto_cipher = strdup(arg);
if (!strlen(args.crypto_cipher) ||
strlen(args.crypto_cipher) > MAX_CIPHER_LEN) {
fprintf(stderr, "Invalid crypto cipher len (limit %d)\n",
MAX_CIPHER_LEN);
argp_usage(state);
}
break;
default:
return ARGP_ERR_UNKNOWN;
}
return 0;
}
const struct argp bench_crypto_argp = {
.options = opts,
.parser = crypto_parse_arg,
};
static void crypto_validate(void)
{
if (env.consumer_cnt != 0) {
fprintf(stderr, "bpf crypto benchmark doesn't support consumer!\n");
exit(1);
}
}
static void crypto_setup(void)
{
LIBBPF_OPTS(bpf_test_run_opts, opts);
int err, pfd;
size_t i, sz;
sz = args.crypto_len;
if (!sz || sz > sizeof(ctx.skel->bss->dst)) {
fprintf(stderr, "invalid encrypt buffer size (source %zu, target %zu)\n",
sz, sizeof(ctx.skel->bss->dst));
exit(1);
}
setup_libbpf();
ctx.skel = crypto_bench__open();
if (!ctx.skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
snprintf(ctx.skel->bss->cipher, 128, "%s", args.crypto_cipher);
memcpy(ctx.skel->bss->key, "12345678testtest", 16);
ctx.skel->bss->key_len = 16;
ctx.skel->bss->authsize = 0;
srandom(time(NULL));
input = malloc(sz);
for (i = 0; i < sz - 1; i++)
input[i] = '1' + random() % 9;
input[sz - 1] = '\0';
ctx.skel->rodata->len = args.crypto_len;
err = crypto_bench__load(ctx.skel);
if (err) {
fprintf(stderr, "failed to load skeleton\n");
crypto_bench__destroy(ctx.skel);
exit(1);
}
pfd = bpf_program__fd(ctx.skel->progs.crypto_setup);
if (pfd < 0) {
fprintf(stderr, "failed to get fd for setup prog\n");
crypto_bench__destroy(ctx.skel);
exit(1);
}
err = bpf_prog_test_run_opts(pfd, &opts);
if (err || ctx.skel->bss->status) {
fprintf(stderr, "failed to run setup prog: err %d, status %d\n",
err, ctx.skel->bss->status);
crypto_bench__destroy(ctx.skel);
exit(1);
}
}
static void crypto_encrypt_setup(void)
{
crypto_setup();
ctx.pfd = bpf_program__fd(ctx.skel->progs.crypto_encrypt);
}
static void crypto_decrypt_setup(void)
{
crypto_setup();
ctx.pfd = bpf_program__fd(ctx.skel->progs.crypto_decrypt);
}
static void crypto_measure(struct bench_res *res)
{
res->hits = atomic_swap(&ctx.skel->bss->hits, 0);
}
static void *crypto_producer(void *unused)
{
LIBBPF_OPTS(bpf_test_run_opts, opts,
.repeat = 64,
.data_in = input,
.data_size_in = args.crypto_len,
);
while (true)
(void)bpf_prog_test_run_opts(ctx.pfd, &opts);
return NULL;
}
const struct bench bench_crypto_encrypt = {
.name = "crypto-encrypt",
.argp = &bench_crypto_argp,
.validate = crypto_validate,
.setup = crypto_encrypt_setup,
.producer_thread = crypto_producer,
.measure = crypto_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_crypto_decrypt = {
.name = "crypto-decrypt",
.argp = &bench_crypto_argp,
.validate = crypto_validate,
.setup = crypto_decrypt_setup,
.producer_thread = crypto_producer,
.measure = crypto_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};

View File

@ -1,11 +1,57 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#define _GNU_SOURCE
#include <argp.h>
#include <unistd.h>
#include <stdint.h>
#include "bench.h"
#include "trigger_bench.skel.h"
#include "trace_helpers.h"
#define MAX_TRIG_BATCH_ITERS 1000
static struct {
__u32 batch_iters;
} args = {
.batch_iters = 100,
};
enum {
ARG_TRIG_BATCH_ITERS = 7000,
};
static const struct argp_option opts[] = {
{ "trig-batch-iters", ARG_TRIG_BATCH_ITERS, "BATCH_ITER_CNT", 0,
"Number of in-kernel iterations per one driver test run"},
{},
};
static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
long ret;
switch (key) {
case ARG_TRIG_BATCH_ITERS:
ret = strtol(arg, NULL, 10);
if (ret < 1 || ret > MAX_TRIG_BATCH_ITERS) {
fprintf(stderr, "invalid --trig-batch-iters value (should be between %d and %d)\n",
1, MAX_TRIG_BATCH_ITERS);
argp_usage(state);
}
args.batch_iters = ret;
break;
default:
return ARGP_ERR_UNKNOWN;
}
return 0;
}
const struct argp bench_trigger_batch_argp = {
.options = opts,
.parser = parse_arg,
};
/* adjust slot shift in inc_hits() if changing */
#define MAX_BUCKETS 256
@ -14,6 +60,8 @@
/* BPF triggering benchmarks */
static struct trigger_ctx {
struct trigger_bench *skel;
bool usermode_counters;
int driver_prog_fd;
} ctx;
static struct counter base_hits[MAX_BUCKETS];
@ -51,41 +99,63 @@ static void trigger_validate(void)
}
}
static void *trigger_base_producer(void *input)
static void *trigger_producer(void *input)
{
while (true) {
(void)syscall(__NR_getpgid);
inc_counter(base_hits);
if (ctx.usermode_counters) {
while (true) {
(void)syscall(__NR_getpgid);
inc_counter(base_hits);
}
} else {
while (true)
(void)syscall(__NR_getpgid);
}
return NULL;
}
static void trigger_base_measure(struct bench_res *res)
static void *trigger_producer_batch(void *input)
{
res->hits = sum_and_reset_counters(base_hits);
}
int fd = ctx.driver_prog_fd ?: bpf_program__fd(ctx.skel->progs.trigger_driver);
static void *trigger_producer(void *input)
{
while (true)
(void)syscall(__NR_getpgid);
bpf_prog_test_run_opts(fd, NULL);
return NULL;
}
static void trigger_measure(struct bench_res *res)
{
res->hits = sum_and_reset_counters(ctx.skel->bss->hits);
if (ctx.usermode_counters)
res->hits = sum_and_reset_counters(base_hits);
else
res->hits = sum_and_reset_counters(ctx.skel->bss->hits);
}
static void setup_ctx(void)
{
setup_libbpf();
ctx.skel = trigger_bench__open_and_load();
ctx.skel = trigger_bench__open();
if (!ctx.skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
/* default "driver" BPF program */
bpf_program__set_autoload(ctx.skel->progs.trigger_driver, true);
ctx.skel->rodata->batch_iters = args.batch_iters;
}
static void load_ctx(void)
{
int err;
err = trigger_bench__load(ctx.skel);
if (err) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
}
static void attach_bpf(struct bpf_program *prog)
@ -99,66 +169,106 @@ static void attach_bpf(struct bpf_program *prog)
}
}
static void trigger_tp_setup(void)
static void trigger_syscall_count_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_tp);
ctx.usermode_counters = true;
}
static void trigger_rawtp_setup(void)
/* Batched, staying mostly in-kernel triggering setups */
static void trigger_kernel_count_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_raw_tp);
bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
bpf_program__set_autoload(ctx.skel->progs.trigger_count, true);
load_ctx();
/* override driver program */
ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_count);
}
static void trigger_kprobe_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_kprobe, true);
load_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_kprobe);
}
static void trigger_kretprobe_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_kretprobe, true);
load_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_kretprobe);
}
static void trigger_kprobe_multi_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_kprobe_multi, true);
load_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_kprobe_multi);
}
static void trigger_kretprobe_multi_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_kretprobe_multi, true);
load_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_kretprobe_multi);
}
static void trigger_fentry_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_fentry, true);
load_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fentry);
}
static void trigger_fexit_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_fexit, true);
load_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fexit);
}
static void trigger_fentry_sleep_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fentry_sleep);
}
static void trigger_fmodret_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
bpf_program__set_autoload(ctx.skel->progs.trigger_driver_kfunc, true);
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_fmodret, true);
load_ctx();
/* override driver program */
ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_driver_kfunc);
attach_bpf(ctx.skel->progs.bench_trigger_fmodret);
}
static void trigger_tp_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
bpf_program__set_autoload(ctx.skel->progs.trigger_driver_kfunc, true);
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_tp, true);
load_ctx();
/* override driver program */
ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_driver_kfunc);
attach_bpf(ctx.skel->progs.bench_trigger_tp);
}
static void trigger_rawtp_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
bpf_program__set_autoload(ctx.skel->progs.trigger_driver_kfunc, true);
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_rawtp, true);
load_ctx();
/* override driver program */
ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_driver_kfunc);
attach_bpf(ctx.skel->progs.bench_trigger_rawtp);
}
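
The setups above only pick which programs to autoload and which driver to run through bpf_prog_test_run_opts(); the BPF side (trigger_bench.bpf.c) is not part of this excerpt. As a rough, hypothetical sketch of the batching idea, with the section, program name and stand-in attach point all being assumptions rather than code from this patch, a driver simply loops batch_iters times in the kernel so that one test-run syscall triggers the benchmarked kprobe/fentry/tp program batch_iters times:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* mirrors ctx.skel->rodata->batch_iters set in setup_ctx() */
const volatile __u32 batch_iters = 100;

SEC("?raw_tp")
int trigger_driver_sketch(void *ctx)
{
	__u32 i;

	for (i = 0; i < batch_iters; i++)
		/* stand-in for whatever function the benchmarked program attaches to */
		(void)bpf_get_numa_node_id();
	return 0;
}

char _license[] SEC("license") = "GPL";
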
/* make sure call is not inlined and not avoided by compiler, so __weak and
* inline asm volatile in the body of the function
*
@ -192,7 +302,7 @@ __nocf_check __weak void uprobe_target_ret(void)
asm volatile ("");
}
static void *uprobe_base_producer(void *input)
static void *uprobe_producer_count(void *input)
{
while (true) {
uprobe_target_nop();
@ -226,15 +336,24 @@ static void usetup(bool use_retprobe, void *target_addr)
{
size_t uprobe_offset;
struct bpf_link *link;
int err;
setup_libbpf();
ctx.skel = trigger_bench__open_and_load();
ctx.skel = trigger_bench__open();
if (!ctx.skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_uprobe, true);
err = trigger_bench__load(ctx.skel);
if (err) {
fprintf(stderr, "failed to load skeleton\n");
exit(1);
}
uprobe_offset = get_uprobe_offset(target_addr);
link = bpf_program__attach_uprobe(ctx.skel->progs.bench_trigger_uprobe,
use_retprobe,
@ -248,204 +367,90 @@ static void usetup(bool use_retprobe, void *target_addr)
ctx.skel->links.bench_trigger_uprobe = link;
}
static void uprobe_setup_nop(void)
static void usermode_count_setup(void)
{
ctx.usermode_counters = true;
}
static void uprobe_nop_setup(void)
{
usetup(false, &uprobe_target_nop);
}
static void uretprobe_setup_nop(void)
static void uretprobe_nop_setup(void)
{
usetup(true, &uprobe_target_nop);
}
static void uprobe_setup_push(void)
static void uprobe_push_setup(void)
{
usetup(false, &uprobe_target_push);
}
static void uretprobe_setup_push(void)
static void uretprobe_push_setup(void)
{
usetup(true, &uprobe_target_push);
}
static void uprobe_setup_ret(void)
static void uprobe_ret_setup(void)
{
usetup(false, &uprobe_target_ret);
}
static void uretprobe_setup_ret(void)
static void uretprobe_ret_setup(void)
{
usetup(true, &uprobe_target_ret);
}
const struct bench bench_trig_base = {
.name = "trig-base",
const struct bench bench_trig_syscall_count = {
.name = "trig-syscall-count",
.validate = trigger_validate,
.producer_thread = trigger_base_producer,
.measure = trigger_base_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_tp = {
.name = "trig-tp",
.validate = trigger_validate,
.setup = trigger_tp_setup,
.setup = trigger_syscall_count_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_rawtp = {
.name = "trig-rawtp",
.validate = trigger_validate,
.setup = trigger_rawtp_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
/* batched (staying mostly in kernel) kprobe/fentry benchmarks */
#define BENCH_TRIG_KERNEL(KIND, NAME) \
const struct bench bench_trig_##KIND = { \
.name = "trig-" NAME, \
.setup = trigger_##KIND##_setup, \
.producer_thread = trigger_producer_batch, \
.measure = trigger_measure, \
.report_progress = hits_drops_report_progress, \
.report_final = hits_drops_report_final, \
.argp = &bench_trigger_batch_argp, \
}
const struct bench bench_trig_kprobe = {
.name = "trig-kprobe",
.validate = trigger_validate,
.setup = trigger_kprobe_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
BENCH_TRIG_KERNEL(kernel_count, "kernel-count");
BENCH_TRIG_KERNEL(kprobe, "kprobe");
BENCH_TRIG_KERNEL(kretprobe, "kretprobe");
BENCH_TRIG_KERNEL(kprobe_multi, "kprobe-multi");
BENCH_TRIG_KERNEL(kretprobe_multi, "kretprobe-multi");
BENCH_TRIG_KERNEL(fentry, "fentry");
BENCH_TRIG_KERNEL(fexit, "fexit");
BENCH_TRIG_KERNEL(fmodret, "fmodret");
BENCH_TRIG_KERNEL(tp, "tp");
BENCH_TRIG_KERNEL(rawtp, "rawtp");
const struct bench bench_trig_kretprobe = {
.name = "trig-kretprobe",
.validate = trigger_validate,
.setup = trigger_kretprobe_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
/* uprobe benchmarks */
#define BENCH_TRIG_USERMODE(KIND, PRODUCER, NAME) \
const struct bench bench_trig_##KIND = { \
.name = "trig-" NAME, \
.validate = trigger_validate, \
.setup = KIND##_setup, \
.producer_thread = uprobe_producer_##PRODUCER, \
.measure = trigger_measure, \
.report_progress = hits_drops_report_progress, \
.report_final = hits_drops_report_final, \
}
const struct bench bench_trig_kprobe_multi = {
.name = "trig-kprobe-multi",
.validate = trigger_validate,
.setup = trigger_kprobe_multi_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_kretprobe_multi = {
.name = "trig-kretprobe-multi",
.validate = trigger_validate,
.setup = trigger_kretprobe_multi_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fentry = {
.name = "trig-fentry",
.validate = trigger_validate,
.setup = trigger_fentry_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fexit = {
.name = "trig-fexit",
.validate = trigger_validate,
.setup = trigger_fexit_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fentry_sleep = {
.name = "trig-fentry-sleep",
.validate = trigger_validate,
.setup = trigger_fentry_sleep_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fmodret = {
.name = "trig-fmodret",
.validate = trigger_validate,
.setup = trigger_fmodret_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_base = {
.name = "trig-uprobe-base",
.setup = NULL, /* no uprobe/uretprobe is attached */
.producer_thread = uprobe_base_producer,
.measure = trigger_base_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_nop = {
.name = "trig-uprobe-nop",
.setup = uprobe_setup_nop,
.producer_thread = uprobe_producer_nop,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uretprobe_nop = {
.name = "trig-uretprobe-nop",
.setup = uretprobe_setup_nop,
.producer_thread = uprobe_producer_nop,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_push = {
.name = "trig-uprobe-push",
.setup = uprobe_setup_push,
.producer_thread = uprobe_producer_push,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uretprobe_push = {
.name = "trig-uretprobe-push",
.setup = uretprobe_setup_push,
.producer_thread = uprobe_producer_push,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_ret = {
.name = "trig-uprobe-ret",
.setup = uprobe_setup_ret,
.producer_thread = uprobe_producer_ret,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uretprobe_ret = {
.name = "trig-uretprobe-ret",
.setup = uretprobe_setup_ret,
.producer_thread = uprobe_producer_ret,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
BENCH_TRIG_USERMODE(usermode_count, count, "usermode-count");
BENCH_TRIG_USERMODE(uprobe_nop, nop, "uprobe-nop");
BENCH_TRIG_USERMODE(uprobe_push, push, "uprobe-push");
BENCH_TRIG_USERMODE(uprobe_ret, ret, "uprobe-ret");
BENCH_TRIG_USERMODE(uretprobe_nop, nop, "uretprobe-nop");
BENCH_TRIG_USERMODE(uretprobe_push, push, "uretprobe-push");
BENCH_TRIG_USERMODE(uretprobe_ret, ret, "uretprobe-ret");
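
For reference, BENCH_TRIG_USERMODE(uprobe_nop, nop, "uprobe-nop") above expands, purely mechanically, to the definition below; the other invocations differ only in the setup callback, producer thread and name:

const struct bench bench_trig_uprobe_nop = {
	.name = "trig-uprobe-nop",
	.validate = trigger_validate,
	.setup = uprobe_nop_setup,
	.producer_thread = uprobe_producer_nop,
	.measure = trigger_measure,
	.report_progress = hits_drops_report_progress,
	.report_final = hits_drops_report_final,
};
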

@ -2,8 +2,22 @@
set -eufo pipefail
for i in base tp rawtp kprobe fentry fmodret
do
summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
printf "%-10s: %s\n" $i "$summary"
def_tests=( \
usermode-count kernel-count syscall-count \
fentry fexit fmodret \
rawtp tp \
kprobe kprobe-multi \
kretprobe kretprobe-multi \
)
tests=("$@")
if [ ${#tests[@]} -eq 0 ]; then
tests=("${def_tests[@]}")
fi
p=${PROD_CNT:-1}
for t in "${tests[@]}"; do
summary=$(sudo ./bench -w2 -d5 -a -p$p trig-$t | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
printf "%-15s: %s\n" $t "$summary"
done

@ -2,7 +2,7 @@
set -eufo pipefail
for i in base {uprobe,uretprobe}-{nop,push,ret}
for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret}
do
summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
printf "%-15s: %s\n" $i "$summary"

@ -326,6 +326,16 @@ l_true: \
})
#endif
#ifdef __BPF_FEATURE_MAY_GOTO
#define cond_break \
({ __label__ l_break, l_continue; \
asm volatile goto("may_goto %l[l_break]" \
:::: l_break); \
goto l_continue; \
l_break: break; \
l_continue:; \
})
#else
#define cond_break \
({ __label__ l_break, l_continue; \
asm volatile goto("1:.byte 0xe5; \
@ -337,6 +347,7 @@ l_true: \
l_break: break; \
l_continue:; \
})
#endif
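
cond_break is meant to be dropped into a loop body of a BPF program: with __BPF_FEATURE_MAY_GOTO the compiler emits the may_goto instruction directly, otherwise the byte-encoded fallback above produces the same instruction, and either way the verifier gets a guaranteed exit edge out of an otherwise open-ended loop. A minimal usage sketch, with the program name and section chosen purely for illustration:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

SEC("?raw_tp")
int cond_break_sketch(void *ctx)
{
	int i = 0;

	while (1) {
		i++;
		cond_break;	/* may_goto-backed exit keeps the loop bounded */
	}
	return i;
}

char _license[] SEC("license") = "GPL";
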
#ifndef bpf_nop_mov
#define bpf_nop_mov(var) \
@ -386,6 +397,28 @@ l_true: \
, [as]"i"((dst_as << 16) | src_as));
#endif
void bpf_preempt_disable(void) __weak __ksym;
void bpf_preempt_enable(void) __weak __ksym;
typedef struct {
} __bpf_preempt_t;
static inline __bpf_preempt_t __bpf_preempt_constructor(void)
{
__bpf_preempt_t ret = {};
bpf_preempt_disable();
return ret;
}
static inline void __bpf_preempt_destructor(__bpf_preempt_t *t)
{
bpf_preempt_enable();
}
#define bpf_guard_preempt() \
__bpf_preempt_t ___bpf_apply(preempt, __COUNTER__) \
__attribute__((__unused__, __cleanup__(__bpf_preempt_destructor))) = \
__bpf_preempt_constructor()
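
bpf_guard_preempt() ties the two kfuncs above to a scope: constructing the dummy __bpf_preempt_t variable calls bpf_preempt_disable(), and the cleanup attribute calls bpf_preempt_enable() automatically when the variable goes out of scope. A minimal, hypothetical usage sketch (section and program name are illustrative):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

SEC("tp/syscalls/sys_enter_getpgid")
int preempt_guard_sketch(void *ctx)
{
	bpf_guard_preempt();	/* preemption disabled from here until the scope ends */

	/* per-CPU work that must not be preempted or migrate goes here */
	return 0;
}

char _license[] SEC("license") = "GPL";
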
/* Description
* Assert that a conditional expression is true.
* Returns
@ -459,4 +492,11 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym;
extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
unsigned int flags__k, void *aux__ign) __ksym;
#define bpf_wq_set_callback(timer, cb, flags) \
bpf_wq_set_callback_impl(timer, cb, flags, NULL)
#endif
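
A minimal, hypothetical sketch of how the bpf_wq declarations above fit together (map, struct and callback names are invented, and the usual vmlinux.h/bpf_helpers.h includes are assumed): the bpf_wq must be embedded in a map value, the callback follows the prototype from bpf_wq_set_callback_impl(), and bpf_wq_start() schedules the deferred work.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct elem {
	struct bpf_wq wq;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} wq_map SEC(".maps");

static int wq_cb(void *map, int *key, struct bpf_wq *wq)
{
	/* deferred work runs here, outside the triggering program's context */
	return 0;
}

SEC("?fentry/bpf_fentry_test1")
int defer_work(void *ctx)
{
	struct elem *val;
	int key = 0;

	val = bpf_map_lookup_elem(&wq_map, &key);
	if (!val)
		return 0;
	if (bpf_wq_init(&val->wq, &wq_map, 0))
		return 0;
	if (bpf_wq_set_callback(&val->wq, wq_cb, 0))
		return 0;
	bpf_wq_start(&val->wq, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";
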

@ -494,6 +494,10 @@ __bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused
return arg;
}
__bpf_kfunc void bpf_kfunc_call_test_sleepable(void)
{
}
BTF_KFUNCS_START(bpf_testmod_check_kfunc_ids)
BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc)
BTF_ID_FLAGS(func, bpf_kfunc_call_test1)
@ -520,6 +524,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_sleepable, KF_SLEEPABLE)
BTF_KFUNCS_END(bpf_testmod_check_kfunc_ids)
static int bpf_testmod_ops_init(struct btf *btf)

@ -96,6 +96,7 @@ void bpf_kfunc_call_test_pass2(struct prog_test_pass2 *p) __ksym;
void bpf_kfunc_call_test_mem_len_fail2(__u64 *mem, int len) __ksym;
void bpf_kfunc_call_test_destructive(void) __ksym;
void bpf_kfunc_call_test_sleepable(void) __ksym;
void bpf_kfunc_call_test_offset(struct prog_test_ref_kfunc *p);
struct prog_test_member *bpf_kfunc_call_memb_acquire(void);

@ -429,7 +429,7 @@ int create_and_get_cgroup(const char *relative_path)
* which is an invalid cgroup id.
* If there is a failure, it prints the error to stderr.
*/
unsigned long long get_cgroup_id_from_path(const char *cgroup_workdir)
static unsigned long long get_cgroup_id_from_path(const char *cgroup_workdir)
{
int dirfd, err, flags, mount_id, fhsize;
union {

@ -13,7 +13,12 @@ CONFIG_BPF_SYSCALL=y
CONFIG_CGROUP_BPF=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_AES=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_DWARF4=y
@ -88,3 +93,5 @@ CONFIG_VSOCKETS=y
CONFIG_VXLAN=y
CONFIG_XDP_SOCKETS=y
CONFIG_XFRM_INTERFACE=y
CONFIG_TCP_CONG_DCTCP=y
CONFIG_TCP_CONG_BBR=y

@ -52,6 +52,8 @@ struct ipv6_packet pkt_v6 = {
.tcp.doff = 5,
};
static const struct network_helper_opts default_opts;
int settimeo(int fd, int timeout_ms)
{
struct timeval timeout = { .tv_sec = 3 };
@ -185,6 +187,16 @@ close_fds:
return NULL;
}
int start_server_addr(int type, const struct sockaddr_storage *addr, socklen_t len,
const struct network_helper_opts *opts)
{
if (!opts)
opts = &default_opts;
return __start_server(type, 0, (struct sockaddr *)addr, len,
opts->timeout_ms, 0);
}
void free_fds(int *fds, unsigned int nr_close_fds)
{
if (fds) {
@ -258,17 +270,24 @@ static int connect_fd_to_addr(int fd,
return 0;
}
int connect_to_addr(const struct sockaddr_storage *addr, socklen_t addrlen, int type)
int connect_to_addr(int type, const struct sockaddr_storage *addr, socklen_t addrlen,
const struct network_helper_opts *opts)
{
int fd;
fd = socket(addr->ss_family, type, 0);
if (!opts)
opts = &default_opts;
fd = socket(addr->ss_family, type, opts->proto);
if (fd < 0) {
log_err("Failed to create client socket");
return -1;
}
if (connect_fd_to_addr(fd, addr, addrlen, false))
if (settimeo(fd, opts->timeout_ms))
goto error_close;
if (connect_fd_to_addr(fd, addr, addrlen, opts->must_fail))
goto error_close;
return fd;
@ -278,8 +297,6 @@ error_close:
return -1;
}
static const struct network_helper_opts default_opts;
int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts)
{
struct sockaddr_storage addr;
@ -442,25 +459,35 @@ struct nstoken *open_netns(const char *name)
struct nstoken *token;
token = calloc(1, sizeof(struct nstoken));
if (!ASSERT_OK_PTR(token, "malloc token"))
if (!token) {
log_err("Failed to malloc token");
return NULL;
}
token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net"))
if (token->orig_netns_fd == -1) {
log_err("Failed to open(/proc/self/ns/net)");
goto fail;
}
snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
if (!ASSERT_GE(nsfd, 0, "open netns fd"))
if (nsfd == -1) {
log_err("Failed to open(%s)", nspath);
goto fail;
}
err = setns(nsfd, CLONE_NEWNET);
close(nsfd);
if (!ASSERT_OK(err, "setns"))
if (err) {
log_err("Failed to setns(nsfd)");
goto fail;
}
return token;
fail:
if (token->orig_netns_fd != -1)
close(token->orig_netns_fd);
free(token);
return NULL;
}
@ -470,7 +497,8 @@ void close_netns(struct nstoken *token)
if (!token)
return;
ASSERT_OK(setns(token->orig_netns_fd, CLONE_NEWNET), "setns");
if (setns(token->orig_netns_fd, CLONE_NEWNET))
log_err("Failed to setns(orig_netns_fd)");
close(token->orig_netns_fd);
free(token);
}
@ -497,3 +525,153 @@ int get_socket_local_port(int sock_fd)
return -1;
}
int get_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
{
struct ifreq ifr = {0};
int sockfd, err;
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
if (sockfd < 0)
return -errno;
memcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
ring_param->cmd = ETHTOOL_GRINGPARAM;
ifr.ifr_data = (char *)ring_param;
if (ioctl(sockfd, SIOCETHTOOL, &ifr) < 0) {
err = errno;
close(sockfd);
return -err;
}
close(sockfd);
return 0;
}
int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
{
struct ifreq ifr = {0};
int sockfd, err;
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
if (sockfd < 0)
return -errno;
memcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
ring_param->cmd = ETHTOOL_SRINGPARAM;
ifr.ifr_data = (char *)ring_param;
if (ioctl(sockfd, SIOCETHTOOL, &ifr) < 0) {
err = errno;
close(sockfd);
return -err;
}
close(sockfd);
return 0;
}
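
These two helpers wrap the ETHTOOL_GRINGPARAM/ETHTOOL_SRINGPARAM ioctls so a test can grow a NIC's RX/TX rings and restore them afterwards. A sketch of the intended save/modify/restore pattern (the interface name and wrapper function are placeholders, not code from this patch):

#include <linux/ethtool.h>
#include "network_helpers.h"

static void run_with_max_rings(char *ifname)
{
	struct ethtool_ringparam ring = {};
	__u32 saved_rx, saved_tx;

	if (get_hw_ring_size(ifname, &ring))
		return;	/* no such interface or ethtool ioctl unsupported */

	saved_rx = ring.rx_pending;
	saved_tx = ring.tx_pending;

	ring.rx_pending = ring.rx_max_pending;	/* grow to the hardware maximum */
	ring.tx_pending = ring.tx_max_pending;
	if (set_hw_ring_size(ifname, &ring))
		return;

	/* ... exercise the data path with the larger rings ... */

	ring.rx_pending = saved_rx;	/* put the original sizes back */
	ring.tx_pending = saved_tx;
	set_hw_ring_size(ifname, &ring);
}
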
struct send_recv_arg {
int fd;
uint32_t bytes;
int stop;
};
static void *send_recv_server(void *arg)
{
struct send_recv_arg *a = (struct send_recv_arg *)arg;
ssize_t nr_sent = 0, bytes = 0;
char batch[1500];
int err = 0, fd;
fd = accept(a->fd, NULL, NULL);
while (fd == -1) {
if (errno == EINTR)
continue;
err = -errno;
goto done;
}
if (settimeo(fd, 0)) {
err = -errno;
goto done;
}
while (bytes < a->bytes && !READ_ONCE(a->stop)) {
nr_sent = send(fd, &batch,
MIN(a->bytes - bytes, sizeof(batch)), 0);
if (nr_sent == -1 && errno == EINTR)
continue;
if (nr_sent == -1) {
err = -errno;
break;
}
bytes += nr_sent;
}
if (bytes != a->bytes) {
log_err("send %zd expected %u", bytes, a->bytes);
if (!err)
err = bytes > a->bytes ? -E2BIG : -EINTR;
}
done:
if (fd >= 0)
close(fd);
if (err) {
WRITE_ONCE(a->stop, 1);
return ERR_PTR(err);
}
return NULL;
}
int send_recv_data(int lfd, int fd, uint32_t total_bytes)
{
ssize_t nr_recv = 0, bytes = 0;
struct send_recv_arg arg = {
.fd = lfd,
.bytes = total_bytes,
.stop = 0,
};
pthread_t srv_thread;
void *thread_ret;
char batch[1500];
int err = 0;
err = pthread_create(&srv_thread, NULL, send_recv_server, (void *)&arg);
if (err) {
log_err("Failed to pthread_create");
return err;
}
/* recv total_bytes */
while (bytes < total_bytes && !READ_ONCE(arg.stop)) {
nr_recv = recv(fd, &batch,
MIN(total_bytes - bytes, sizeof(batch)), 0);
if (nr_recv == -1 && errno == EINTR)
continue;
if (nr_recv == -1) {
err = -errno;
break;
}
bytes += nr_recv;
}
if (bytes != total_bytes) {
log_err("recv %zd expected %u", bytes, total_bytes);
if (!err)
err = bytes > total_bytes ? -E2BIG : -EINTR;
}
WRITE_ONCE(arg.stop, 1);
pthread_join(srv_thread, &thread_ret);
if (IS_ERR(thread_ret)) {
log_err("Failed in thread_ret %ld", PTR_ERR(thread_ret));
err = err ? : PTR_ERR(thread_ret);
}
return err;
}

@ -9,8 +9,12 @@ typedef __u16 __sum16;
#include <linux/if_packet.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <linux/err.h>
#include <netinet/tcp.h>
#include <bpf/bpf_endian.h>
#include <net/if.h>
#define MAGIC_VAL 0x1234
#define NUM_ITER 100000
@ -50,8 +54,11 @@ int start_mptcp_server(int family, const char *addr, __u16 port,
int *start_reuseport_server(int family, int type, const char *addr_str,
__u16 port, int timeout_ms,
unsigned int nr_listens);
int start_server_addr(int type, const struct sockaddr_storage *addr, socklen_t len,
const struct network_helper_opts *opts);
void free_fds(int *fds, unsigned int nr_close_fds);
int connect_to_addr(const struct sockaddr_storage *addr, socklen_t len, int type);
int connect_to_addr(int type, const struct sockaddr_storage *addr, socklen_t len,
const struct network_helper_opts *opts);
int connect_to_fd(int server_fd, int timeout_ms);
int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts);
int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms);
@ -61,6 +68,8 @@ int make_sockaddr(int family, const char *addr_str, __u16 port,
struct sockaddr_storage *addr, socklen_t *len);
char *ping_command(int family);
int get_socket_local_port(int sock_fd);
int get_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
struct nstoken;
/**
@ -71,6 +80,7 @@ struct nstoken;
*/
struct nstoken *open_netns(const char *name);
void close_netns(struct nstoken *token);
int send_recv_data(int lfd, int fd, uint32_t total_bytes);
static __u16 csum_fold(__u32 csum)
{

@ -0,0 +1,186 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <test_progs.h>
#include "arena_atomics.skel.h"
static void test_add(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.add);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->add64_value, 3, "add64_value");
ASSERT_EQ(skel->arena->add64_result, 1, "add64_result");
ASSERT_EQ(skel->arena->add32_value, 3, "add32_value");
ASSERT_EQ(skel->arena->add32_result, 1, "add32_result");
ASSERT_EQ(skel->arena->add_stack_value_copy, 3, "add_stack_value");
ASSERT_EQ(skel->arena->add_stack_result, 1, "add_stack_result");
ASSERT_EQ(skel->arena->add_noreturn_value, 3, "add_noreturn_value");
}
static void test_sub(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.sub);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->sub64_value, -1, "sub64_value");
ASSERT_EQ(skel->arena->sub64_result, 1, "sub64_result");
ASSERT_EQ(skel->arena->sub32_value, -1, "sub32_value");
ASSERT_EQ(skel->arena->sub32_result, 1, "sub32_result");
ASSERT_EQ(skel->arena->sub_stack_value_copy, -1, "sub_stack_value");
ASSERT_EQ(skel->arena->sub_stack_result, 1, "sub_stack_result");
ASSERT_EQ(skel->arena->sub_noreturn_value, -1, "sub_noreturn_value");
}
static void test_and(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.and);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->and64_value, 0x010ull << 32, "and64_value");
ASSERT_EQ(skel->arena->and32_value, 0x010, "and32_value");
}
static void test_or(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.or);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->or64_value, 0x111ull << 32, "or64_value");
ASSERT_EQ(skel->arena->or32_value, 0x111, "or32_value");
}
static void test_xor(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.xor);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->xor64_value, 0x101ull << 32, "xor64_value");
ASSERT_EQ(skel->arena->xor32_value, 0x101, "xor32_value");
}
static void test_cmpxchg(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.cmpxchg);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->cmpxchg64_value, 2, "cmpxchg64_value");
ASSERT_EQ(skel->arena->cmpxchg64_result_fail, 1, "cmpxchg_result_fail");
ASSERT_EQ(skel->arena->cmpxchg64_result_succeed, 1, "cmpxchg_result_succeed");
ASSERT_EQ(skel->arena->cmpxchg32_value, 2, "lcmpxchg32_value");
ASSERT_EQ(skel->arena->cmpxchg32_result_fail, 1, "cmpxchg_result_fail");
ASSERT_EQ(skel->arena->cmpxchg32_result_succeed, 1, "cmpxchg_result_succeed");
}
static void test_xchg(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.xchg);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->xchg64_value, 2, "xchg64_value");
ASSERT_EQ(skel->arena->xchg64_result, 1, "xchg64_result");
ASSERT_EQ(skel->arena->xchg32_value, 2, "xchg32_value");
ASSERT_EQ(skel->arena->xchg32_result, 1, "xchg32_result");
}
void test_arena_atomics(void)
{
struct arena_atomics *skel;
int err;
skel = arena_atomics__open();
if (!ASSERT_OK_PTR(skel, "arena atomics skeleton open"))
return;
if (skel->data->skip_tests) {
printf("%s:SKIP:no ENABLE_ATOMICS_TESTS or no addr_space_cast support in clang",
__func__);
test__skip();
goto cleanup;
}
err = arena_atomics__load(skel);
if (!ASSERT_OK(err, "arena atomics skeleton load"))
return;
skel->bss->pid = getpid();
if (test__start_subtest("add"))
test_add(skel);
if (test__start_subtest("sub"))
test_sub(skel);
if (test__start_subtest("and"))
test_and(skel);
if (test__start_subtest("or"))
test_or(skel);
if (test__start_subtest("xor"))
test_xor(skel);
if (test__start_subtest("cmpxchg"))
test_cmpxchg(skel);
if (test__start_subtest("xchg"))
test_xchg(skel);
cleanup:
arena_atomics__destroy(skel);
}
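
The BPF-side counterpart (progs/arena_atomics.c) is not shown in this excerpt. As a rough, hypothetical sketch of what the "add" subtest exercises, the arena map definition, the __arena attribute and the initial values below are all assumptions chosen only to line up with the assertions in test_add() above:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_arena_common.h"	/* assumed to provide the __arena address-space attribute */

struct {
	__uint(type, BPF_MAP_TYPE_ARENA);
	__uint(map_flags, BPF_F_MMAPABLE);
	__uint(max_entries, 10);	/* arena size in pages */
} arena SEC(".maps");

__u64 __arena add64_value = 1;
__u64 __arena add64_result;

SEC("syscall")
int add(const void *ctx)
{
	/* atomic fetch-and-add on arena memory: leaves add64_value == 3 and
	 * add64_result == 1, matching the ASSERT_EQ() checks in test_add()
	 */
	add64_result = __sync_fetch_and_add(&add64_value, 2);
	return 0;
}

char _license[] SEC("license") = "GPL";
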

@ -13,6 +13,7 @@
#include "tcp_ca_write_sk_pacing.skel.h"
#include "tcp_ca_incompl_cong_ops.skel.h"
#include "tcp_ca_unsupp_cong_op.skel.h"
#include "tcp_ca_kfunc.skel.h"
#ifndef ENOTSUPP
#define ENOTSUPP 524
@ -20,7 +21,6 @@
static const unsigned int total_bytes = 10 * 1024 * 1024;
static int expected_stg = 0xeB9F;
static int stop;
static int settcpca(int fd, const char *tcp_ca)
{
@ -33,62 +33,11 @@ static int settcpca(int fd, const char *tcp_ca)
return 0;
}
static void *server(void *arg)
{
int lfd = (int)(long)arg, err = 0, fd;
ssize_t nr_sent = 0, bytes = 0;
char batch[1500];
fd = accept(lfd, NULL, NULL);
while (fd == -1) {
if (errno == EINTR)
continue;
err = -errno;
goto done;
}
if (settimeo(fd, 0)) {
err = -errno;
goto done;
}
while (bytes < total_bytes && !READ_ONCE(stop)) {
nr_sent = send(fd, &batch,
MIN(total_bytes - bytes, sizeof(batch)), 0);
if (nr_sent == -1 && errno == EINTR)
continue;
if (nr_sent == -1) {
err = -errno;
break;
}
bytes += nr_sent;
}
ASSERT_EQ(bytes, total_bytes, "send");
done:
if (fd >= 0)
close(fd);
if (err) {
WRITE_ONCE(stop, 1);
return ERR_PTR(err);
}
return NULL;
}
static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
{
struct sockaddr_in6 sa6 = {};
ssize_t nr_recv = 0, bytes = 0;
int lfd = -1, fd = -1;
pthread_t srv_thread;
socklen_t addrlen = sizeof(sa6);
void *thread_ret;
char batch[1500];
int err;
WRITE_ONCE(stop, 0);
lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
if (!ASSERT_NEQ(lfd, -1, "socket"))
return;
@ -99,12 +48,7 @@ static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
return;
}
if (settcpca(lfd, tcp_ca) || settcpca(fd, tcp_ca) ||
settimeo(lfd, 0) || settimeo(fd, 0))
goto done;
err = getsockname(lfd, (struct sockaddr *)&sa6, &addrlen);
if (!ASSERT_NEQ(err, -1, "getsockname"))
if (settcpca(lfd, tcp_ca) || settcpca(fd, tcp_ca))
goto done;
if (sk_stg_map) {
@ -115,7 +59,7 @@ static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
}
/* connect to server */
err = connect(fd, (struct sockaddr *)&sa6, addrlen);
err = connect_fd_to_fd(fd, lfd, 0);
if (!ASSERT_NEQ(err, -1, "connect"))
goto done;
@ -129,26 +73,7 @@ static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
goto done;
}
err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd);
if (!ASSERT_OK(err, "pthread_create"))
goto done;
/* recv total_bytes */
while (bytes < total_bytes && !READ_ONCE(stop)) {
nr_recv = recv(fd, &batch,
MIN(total_bytes - bytes, sizeof(batch)), 0);
if (nr_recv == -1 && errno == EINTR)
continue;
if (nr_recv == -1)
break;
bytes += nr_recv;
}
ASSERT_EQ(bytes, total_bytes, "recv");
WRITE_ONCE(stop, 1);
pthread_join(srv_thread, &thread_ret);
ASSERT_OK(IS_ERR(thread_ret), "thread_ret");
ASSERT_OK(send_recv_data(lfd, fd, total_bytes), "send_recv_data");
done:
close(lfd);
@ -304,7 +229,7 @@ static void test_rel_setsockopt(void)
struct bpf_dctcp_release *rel_skel;
libbpf_print_fn_t old_print_fn;
err_str = "unknown func bpf_setsockopt";
err_str = "program of this type cannot use helper bpf_setsockopt";
found = false;
old_print_fn = libbpf_set_print(libbpf_debug_print);
@ -518,6 +443,15 @@ static void test_link_replace(void)
tcp_ca_update__destroy(skel);
}
static void test_tcp_ca_kfunc(void)
{
struct tcp_ca_kfunc *skel;
skel = tcp_ca_kfunc__open_and_load();
ASSERT_OK_PTR(skel, "tcp_ca_kfunc__open_and_load");
tcp_ca_kfunc__destroy(skel);
}
void test_bpf_tcp_ca(void)
{
if (test__start_subtest("dctcp"))
@ -546,4 +480,6 @@ void test_bpf_tcp_ca(void)
test_multi_links();
if (test__start_subtest("link_replace"))
test_link_replace();
if (test__start_subtest("tcp_ca_kfunc"))
test_tcp_ca_kfunc();
}

@ -10,6 +10,7 @@
#include <netinet/tcp.h>
#include <test_progs.h>
#include "network_helpers.h"
#include "progs/test_cls_redirect.h"
#include "test_cls_redirect.skel.h"
@ -35,39 +36,6 @@ struct tuple {
struct addr_port dst;
};
static int start_server(const struct sockaddr *addr, socklen_t len, int type)
{
int fd = socket(addr->sa_family, type, 0);
if (CHECK_FAIL(fd == -1))
return -1;
if (CHECK_FAIL(bind(fd, addr, len) == -1))
goto err;
if (type == SOCK_STREAM && CHECK_FAIL(listen(fd, 128) == -1))
goto err;
return fd;
err:
close(fd);
return -1;
}
static int connect_to_server(const struct sockaddr *addr, socklen_t len,
int type)
{
int fd = socket(addr->sa_family, type, 0);
if (CHECK_FAIL(fd == -1))
return -1;
if (CHECK_FAIL(connect(fd, addr, len)))
goto err;
return fd;
err:
close(fd);
return -1;
}
static bool fill_addr_port(const struct sockaddr *sa, struct addr_port *ap)
{
const struct sockaddr_in6 *in6;
@ -98,14 +66,14 @@ static bool set_up_conn(const struct sockaddr *addr, socklen_t len, int type,
socklen_t slen = sizeof(ss);
struct sockaddr *sa = (struct sockaddr *)&ss;
*server = start_server(addr, len, type);
*server = start_server_addr(type, (struct sockaddr_storage *)addr, len, NULL);
if (*server < 0)
return false;
if (CHECK_FAIL(getsockname(*server, sa, &slen)))
goto close_server;
*conn = connect_to_server(sa, slen, type);
*conn = connect_to_addr(type, (struct sockaddr_storage *)sa, slen, NULL);
if (*conn < 0)
goto close_server;

@ -0,0 +1,197 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/in6.h>
#include <linux/if_alg.h>
#include "test_progs.h"
#include "network_helpers.h"
#include "crypto_sanity.skel.h"
#include "crypto_basic.skel.h"
#define NS_TEST "crypto_sanity_ns"
#define IPV6_IFACE_ADDR "face::1"
static const unsigned char crypto_key[] = "testtest12345678";
static const char plain_text[] = "stringtoencrypt0";
static int opfd = -1, tfmfd = -1;
static const char algo[] = "ecb(aes)";
static int init_afalg(void)
{
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "skcipher",
.salg_name = "ecb(aes)"
};
tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (tfmfd == -1)
return errno;
if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) == -1)
return errno;
if (setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, crypto_key, 16) == -1)
return errno;
opfd = accept(tfmfd, NULL, 0);
if (opfd == -1)
return errno;
return 0;
}
static void deinit_afalg(void)
{
if (tfmfd != -1)
close(tfmfd);
if (opfd != -1)
close(opfd);
}
static void do_crypt_afalg(const void *src, void *dst, int size, bool encrypt)
{
struct msghdr msg = {};
struct cmsghdr *cmsg;
char cbuf[CMSG_SPACE(4)] = {0};
struct iovec iov;
msg.msg_control = cbuf;
msg.msg_controllen = sizeof(cbuf);
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_ALG;
cmsg->cmsg_type = ALG_SET_OP;
cmsg->cmsg_len = CMSG_LEN(4);
*(__u32 *)CMSG_DATA(cmsg) = encrypt ? ALG_OP_ENCRYPT : ALG_OP_DECRYPT;
iov.iov_base = (char *)src;
iov.iov_len = size;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
sendmsg(opfd, &msg, 0);
read(opfd, dst, size);
}
void test_crypto_basic(void)
{
RUN_TESTS(crypto_basic);
}
void test_crypto_sanity(void)
{
LIBBPF_OPTS(bpf_tc_hook, qdisc_hook, .attach_point = BPF_TC_EGRESS);
LIBBPF_OPTS(bpf_tc_opts, tc_attach_enc);
LIBBPF_OPTS(bpf_tc_opts, tc_attach_dec);
LIBBPF_OPTS(bpf_test_run_opts, opts);
struct nstoken *nstoken = NULL;
struct crypto_sanity *skel;
char afalg_plain[16] = {0};
char afalg_dst[16] = {0};
struct sockaddr_in6 addr;
int sockfd, err, pfd;
socklen_t addrlen;
u16 udp_test_port;
skel = crypto_sanity__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel open"))
return;
SYS(fail, "ip netns add %s", NS_TEST);
SYS(fail, "ip -net %s -6 addr add %s/128 dev lo nodad", NS_TEST, IPV6_IFACE_ADDR);
SYS(fail, "ip -net %s link set dev lo up", NS_TEST);
nstoken = open_netns(NS_TEST);
if (!ASSERT_OK_PTR(nstoken, "open_netns"))
goto fail;
err = init_afalg();
if (!ASSERT_OK(err, "AF_ALG init fail"))
goto fail;
qdisc_hook.ifindex = if_nametoindex("lo");
if (!ASSERT_GT(qdisc_hook.ifindex, 0, "if_nametoindex lo"))
goto fail;
skel->bss->key_len = 16;
skel->bss->authsize = 0;
udp_test_port = skel->data->udp_test_port;
memcpy(skel->bss->key, crypto_key, sizeof(crypto_key));
snprintf(skel->bss->algo, 128, "%s", algo);
pfd = bpf_program__fd(skel->progs.skb_crypto_setup);
if (!ASSERT_GT(pfd, 0, "skb_crypto_setup fd"))
goto fail;
err = bpf_prog_test_run_opts(pfd, &opts);
if (!ASSERT_OK(err, "skb_crypto_setup") ||
!ASSERT_OK(opts.retval, "skb_crypto_setup retval"))
goto fail;
if (!ASSERT_OK(skel->bss->status, "skb_crypto_setup status"))
goto fail;
err = bpf_tc_hook_create(&qdisc_hook);
if (!ASSERT_OK(err, "create qdisc hook"))
goto fail;
addrlen = sizeof(addr);
err = make_sockaddr(AF_INET6, IPV6_IFACE_ADDR, udp_test_port,
(void *)&addr, &addrlen);
if (!ASSERT_OK(err, "make_sockaddr"))
goto fail;
tc_attach_enc.prog_fd = bpf_program__fd(skel->progs.encrypt_sanity);
err = bpf_tc_attach(&qdisc_hook, &tc_attach_enc);
if (!ASSERT_OK(err, "attach encrypt filter"))
goto fail;
sockfd = socket(AF_INET6, SOCK_DGRAM, 0);
if (!ASSERT_NEQ(sockfd, -1, "encrypt socket"))
goto fail;
err = sendto(sockfd, plain_text, sizeof(plain_text), 0, (void *)&addr, addrlen);
close(sockfd);
if (!ASSERT_EQ(err, sizeof(plain_text), "encrypt send"))
goto fail;
do_crypt_afalg(plain_text, afalg_dst, sizeof(afalg_dst), true);
if (!ASSERT_OK(skel->bss->status, "encrypt status"))
goto fail;
if (!ASSERT_STRNEQ(skel->bss->dst, afalg_dst, sizeof(afalg_dst), "encrypt AF_ALG"))
goto fail;
tc_attach_enc.flags = tc_attach_enc.prog_fd = tc_attach_enc.prog_id = 0;
err = bpf_tc_detach(&qdisc_hook, &tc_attach_enc);
if (!ASSERT_OK(err, "bpf_tc_detach encrypt"))
goto fail;
tc_attach_dec.prog_fd = bpf_program__fd(skel->progs.decrypt_sanity);
err = bpf_tc_attach(&qdisc_hook, &tc_attach_dec);
if (!ASSERT_OK(err, "attach decrypt filter"))
goto fail;
sockfd = socket(AF_INET6, SOCK_DGRAM, 0);
if (!ASSERT_NEQ(sockfd, -1, "decrypt socket"))
goto fail;
err = sendto(sockfd, afalg_dst, sizeof(afalg_dst), 0, (void *)&addr, addrlen);
close(sockfd);
if (!ASSERT_EQ(err, sizeof(afalg_dst), "decrypt send"))
goto fail;
do_crypt_afalg(afalg_dst, afalg_plain, sizeof(afalg_plain), false);
if (!ASSERT_OK(skel->bss->status, "decrypt status"))
goto fail;
if (!ASSERT_STRNEQ(skel->bss->dst, afalg_plain, sizeof(afalg_plain), "decrypt AF_ALG"))
goto fail;
tc_attach_dec.flags = tc_attach_dec.prog_fd = tc_attach_dec.prog_id = 0;
err = bpf_tc_detach(&qdisc_hook, &tc_attach_dec);
ASSERT_OK(err, "bpf_tc_detach decrypt");
fail:
close_netns(nstoken);
deinit_afalg();
SYS_NOFAIL("ip netns del " NS_TEST " &> /dev/null");
crypto_sanity__destroy(skel);
}

@ -98,7 +98,8 @@ done:
static void test_dummy_multiple_args(void)
{
__u64 args[5] = {0, -100, 0x8a5f, 'c', 0x1234567887654321ULL};
struct bpf_dummy_ops_state st = { 7 };
__u64 args[5] = {(__u64)&st, -100, 0x8a5f, 'c', 0x1234567887654321ULL};
LIBBPF_OPTS(bpf_test_run_opts, attr,
.ctx_in = args,
.ctx_size_in = sizeof(args),
@ -115,6 +116,7 @@ static void test_dummy_multiple_args(void)
fd = bpf_program__fd(skel->progs.test_2);
err = bpf_prog_test_run_opts(fd, &attr);
ASSERT_OK(err, "test_run");
args[0] = 7;
for (i = 0; i < ARRAY_SIZE(args); i++) {
snprintf(name, sizeof(name), "arg %zu", i);
ASSERT_EQ(skel->bss->test_2_args[i], args[i], name);
@ -125,7 +127,8 @@ static void test_dummy_multiple_args(void)
static void test_dummy_sleepable(void)
{
__u64 args[1] = {0};
struct bpf_dummy_ops_state st;
__u64 args[1] = {(__u64)&st};
LIBBPF_OPTS(bpf_test_run_opts, attr,
.ctx_in = args,
.ctx_size_in = sizeof(args),
@ -144,6 +147,31 @@ static void test_dummy_sleepable(void)
dummy_st_ops_success__destroy(skel);
}
/* dummy_st_ops.test_sleepable() parameter is not marked as nullable,
* thus bpf_prog_test_run_opts() below should be rejected as it tries
* to pass NULL for this parameter.
*/
static void test_dummy_sleepable_reject_null(void)
{
__u64 args[1] = {0};
LIBBPF_OPTS(bpf_test_run_opts, attr,
.ctx_in = args,
.ctx_size_in = sizeof(args),
);
struct dummy_st_ops_success *skel;
int fd, err;
skel = dummy_st_ops_success__open_and_load();
if (!ASSERT_OK_PTR(skel, "dummy_st_ops_load"))
return;
fd = bpf_program__fd(skel->progs.test_sleepable);
err = bpf_prog_test_run_opts(fd, &attr);
ASSERT_EQ(err, -EINVAL, "test_run");
dummy_st_ops_success__destroy(skel);
}
void test_dummy_st_ops(void)
{
if (test__start_subtest("dummy_st_ops_attach"))
@ -156,6 +184,8 @@ void test_dummy_st_ops(void)
test_dummy_multiple_args();
if (test__start_subtest("dummy_sleepable"))
test_dummy_sleepable();
if (test__start_subtest("dummy_sleepable_reject_null"))
test_dummy_sleepable_reject_null();
RUN_TESTS(dummy_st_ops_fail);
}
