Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-11-20

The following pull-request contains BPF updates for your *net-next* tree.

We've added 81 non-merge commits during the last 17 day(s) which contain
a total of 120 files changed, 4958 insertions(+), 1081 deletions(-).

There are 3 trivial conflicts; resolve them by always taking the chunk from
196e8ca748:

<<<<<<< HEAD
=======
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>>>>>>> 196e8ca748

<<<<<<< HEAD
void *bpf_map_area_alloc(u64 size, int numa_node)
=======
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>>>>>>> 196e8ca748

<<<<<<< HEAD
        if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
=======
        /* kmalloc()'ed memory can't be mmap()'ed */
        if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>>>>> 196e8ca748

The main changes are:

1) Addition of the BPF trampoline, which works as a bridge between kernel functions,
   BPF programs and other BPF programs, along with two new use cases: i) fentry/fexit
   BPF programs for tracing with practically zero overhead to call into BPF (as
   opposed to k[ret]probes) and ii) attachment of the former to networking related
   programs to see the input/output of networking programs (covering the xdpdump use
   case), from Alexei Starovoitov. A minimal fentry sketch is shown after this list.

2) BPF array map mmap support and use in libbpf for global data maps; also a big
   batch of libbpf improvements, among them support for reading bitfields in a
   relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko.

3) Extend s390x JIT with usage of relative long jumps and loads in order to lift
   the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich.

4) Add BPF audit support and emit messages upon successful prog load and unload in
   order to have a timeline of events, from Daniel Borkmann and Jiri Olsa.

5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode
   (XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson.

6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API
   call named bpf_get_link_xdp_info() for retrieving the full set of prog
   IDs attached to XDP, from Toke Høiland-Jørgensen.

7) Add BTF support for array of int, array of struct and multidimensional arrays
   and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau.

8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo.

9) Two fixes for BPF selftests to get rid of a hang in test_tc_tunnel and to avoid
   xdping being run standalone, from Jiri Benc.

10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song.

11) Fix a memory leak in BPF fentry test run data, from Colin Ian King.

12) Various smaller misc cleanups and improvements mostly all over BPF selftests and
    samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Committed by David S. Miller on 2019-11-20 18:11:23 -08:00, commit ee5a489fd9.
120 changed files with 4952 additions and 1078 deletions


@ -295,7 +295,7 @@ round-robin example of distributing packets is shown below:
{
rr = (rr + 1) & (MAX_SOCKS - 1);
return bpf_redirect_map(&xsks_map, rr, 0);
return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
}
Note, that since there is only a single set of FILL and COMPLETION
@ -304,6 +304,12 @@ to make sure that multiple processes or threads do not use these rings
concurrently. There are no synchronization primitives in the
libbpf code that protects multiple users at this point in time.
Libbpf uses this mode if you create more than one socket tied to the
same umem. However, note that you need to supply the
XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
xsk_socket__create calls and load your own XDP program as there is no
built in one in libbpf that will route the traffic for you.
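
As a sketch of what that looks like in practice (illustrative only, not part of
this patch; the helper name, interface and queue id are made up), a second socket
created on the same umem with program loading inhibited could be set up as
follows, with the caller attaching its own XDP program to distribute traffic:

#include <bpf/xsk.h>

static int create_shared_sockets(struct xsk_umem *umem, const char *ifname,
                                 struct xsk_socket **xsk0, struct xsk_socket **xsk1,
                                 struct xsk_ring_cons *rx0, struct xsk_ring_cons *rx1,
                                 struct xsk_ring_prod *tx0, struct xsk_ring_prod *tx1)
{
        struct xsk_socket_config cfg = {
                .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                .libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
        };
        int err;

        /* First socket on the umem. */
        err = xsk_socket__create(xsk0, ifname, 0 /* queue */, umem, rx0, tx0, &cfg);
        if (err)
                return err;
        /* Second socket on the same umem: libbpf switches to shared umem mode
         * and, because of INHIBIT_PROG_LOAD, loads no XDP program of its own.
         */
        return xsk_socket__create(xsk1, ifname, 0 /* queue */, umem, rx1, tx1, &cfg);
}
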
XDP_USE_NEED_WAKEUP bind flag
-----------------------------
@ -355,10 +361,22 @@ to set the size of at least one of the RX and TX rings. If you set
both, you will be able to both receive and send traffic from your
application, but if you only want to do one of them, you can save
resources by only setting up one of them. Both the FILL ring and the
COMPLETION ring are mandatory if you have a UMEM tied to your socket,
which is the normal case. But if the XDP_SHARED_UMEM flag is used, any
socket after the first one does not have a UMEM and should in that
case not have any FILL or COMPLETION rings created.
COMPLETION ring are mandatory as you need to have a UMEM tied to your
socket. But if the XDP_SHARED_UMEM flag is used, any socket after the
first one does not have a UMEM and should in that case not have any
FILL or COMPLETION rings created as the ones from the shared umem will
be used. Note, that the rings are single-producer single-consumer, so
do not try to access them from multiple processes at the same
time. See the XDP_SHARED_UMEM section.
In libbpf, you can create Rx-only and Tx-only sockets by supplying
NULL to the rx and tx arguments, respectively, to the
xsk_socket__create function.
If you create a Tx-only socket, we recommend that you do not put any
packets on the fill ring. If you do this, drivers might think you are
going to receive something when you in fact will not, and this can
negatively impact performance.
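
For illustration only (not part of this patch; the helper name and ring size are
made up, and the same <bpf/xsk.h> definitions as in the sketch above are assumed),
an Rx-only socket is created by passing NULL as the tx ring:

static int create_rx_only(struct xsk_umem *umem, const char *ifname, __u32 queue,
                          struct xsk_socket **xsk, struct xsk_ring_cons *rx)
{
        struct xsk_socket_config cfg = {
                .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                .tx_size = 0, /* ignored, no TX ring is requested */
        };

        /* NULL tx means libbpf never sets up a TX ring for this socket. */
        return xsk_socket__create(xsk, ifname, queue, umem, rx, NULL, &cfg);
}
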
XDP_UMEM_REG setsockopt
-----------------------


@ -770,10 +770,10 @@ Some core changes of the new internal format:
callq foo
mov %rax,%r13
mov %rbx,%rdi
mov $0x2,%esi
mov $0x3,%edx
mov $0x4,%ecx
mov $0x5,%r8d
mov $0x6,%esi
mov $0x7,%edx
mov $0x8,%ecx
mov $0x9,%r8d
callq bar
add %r13,%rax
mov -0x228(%rbp),%rbx


@ -23,6 +23,8 @@
#include <linux/filter.h>
#include <linux/init.h>
#include <linux/bpf.h>
#include <linux/mm.h>
#include <linux/kernel.h>
#include <asm/cacheflush.h>
#include <asm/dis.h>
#include <asm/facility.h>
@ -38,10 +40,11 @@ struct bpf_jit {
int size; /* Size of program and literal pool */
int size_prg; /* Size of program */
int prg; /* Current position in program */
int lit_start; /* Start of literal pool */
int lit; /* Current position in literal pool */
int lit32_start; /* Start of 32-bit literal pool */
int lit32; /* Current position in 32-bit literal pool */
int lit64_start; /* Start of 64-bit literal pool */
int lit64; /* Current position in 64-bit literal pool */
int base_ip; /* Base address for literal pool */
int ret0_ip; /* Address of return 0 */
int exit_ip; /* Address of exit */
int r1_thunk_ip; /* Address of expoline thunk for 'br %r1' */
int r14_thunk_ip; /* Address of expoline thunk for 'br %r14' */
@ -49,14 +52,10 @@ struct bpf_jit {
int labels[1]; /* Labels for local jumps */
};
#define BPF_SIZE_MAX 0xffff /* Max size for program (16 bit branches) */
#define SEEN_MEM (1 << 0) /* use mem[] for temporary storage */
#define SEEN_RET0 (1 << 1) /* ret0_ip points to a valid return 0 */
#define SEEN_LITERAL (1 << 2) /* code uses literals */
#define SEEN_FUNC (1 << 3) /* calls C functions */
#define SEEN_TAIL_CALL (1 << 4) /* code uses tail calls */
#define SEEN_REG_AX (1 << 5) /* code uses constant blinding */
#define SEEN_MEM BIT(0) /* use mem[] for temporary storage */
#define SEEN_LITERAL BIT(1) /* code uses literals */
#define SEEN_FUNC BIT(2) /* calls C functions */
#define SEEN_TAIL_CALL BIT(3) /* code uses tail calls */
#define SEEN_STACK (SEEN_FUNC | SEEN_MEM)
/*
@ -131,13 +130,13 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
#define _EMIT2(op) \
({ \
if (jit->prg_buf) \
*(u16 *) (jit->prg_buf + jit->prg) = op; \
*(u16 *) (jit->prg_buf + jit->prg) = (op); \
jit->prg += 2; \
})
#define EMIT2(op, b1, b2) \
({ \
_EMIT2(op | reg(b1, b2)); \
_EMIT2((op) | reg(b1, b2)); \
REG_SET_SEEN(b1); \
REG_SET_SEEN(b2); \
})
@ -145,20 +144,20 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
#define _EMIT4(op) \
({ \
if (jit->prg_buf) \
*(u32 *) (jit->prg_buf + jit->prg) = op; \
*(u32 *) (jit->prg_buf + jit->prg) = (op); \
jit->prg += 4; \
})
#define EMIT4(op, b1, b2) \
({ \
_EMIT4(op | reg(b1, b2)); \
_EMIT4((op) | reg(b1, b2)); \
REG_SET_SEEN(b1); \
REG_SET_SEEN(b2); \
})
#define EMIT4_RRF(op, b1, b2, b3) \
({ \
_EMIT4(op | reg_high(b3) << 8 | reg(b1, b2)); \
_EMIT4((op) | reg_high(b3) << 8 | reg(b1, b2)); \
REG_SET_SEEN(b1); \
REG_SET_SEEN(b2); \
REG_SET_SEEN(b3); \
@ -167,13 +166,13 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
#define _EMIT4_DISP(op, disp) \
({ \
unsigned int __disp = (disp) & 0xfff; \
_EMIT4(op | __disp); \
_EMIT4((op) | __disp); \
})
#define EMIT4_DISP(op, b1, b2, disp) \
({ \
_EMIT4_DISP(op | reg_high(b1) << 16 | \
reg_high(b2) << 8, disp); \
_EMIT4_DISP((op) | reg_high(b1) << 16 | \
reg_high(b2) << 8, (disp)); \
REG_SET_SEEN(b1); \
REG_SET_SEEN(b2); \
})
@ -181,21 +180,27 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
#define EMIT4_IMM(op, b1, imm) \
({ \
unsigned int __imm = (imm) & 0xffff; \
_EMIT4(op | reg_high(b1) << 16 | __imm); \
_EMIT4((op) | reg_high(b1) << 16 | __imm); \
REG_SET_SEEN(b1); \
})
#define EMIT4_PCREL(op, pcrel) \
({ \
long __pcrel = ((pcrel) >> 1) & 0xffff; \
_EMIT4(op | __pcrel); \
_EMIT4((op) | __pcrel); \
})
#define EMIT4_PCREL_RIC(op, mask, target) \
({ \
int __rel = ((target) - jit->prg) / 2; \
_EMIT4((op) | (mask) << 20 | (__rel & 0xffff)); \
})
#define _EMIT6(op1, op2) \
({ \
if (jit->prg_buf) { \
*(u32 *) (jit->prg_buf + jit->prg) = op1; \
*(u16 *) (jit->prg_buf + jit->prg + 4) = op2; \
*(u32 *) (jit->prg_buf + jit->prg) = (op1); \
*(u16 *) (jit->prg_buf + jit->prg + 4) = (op2); \
} \
jit->prg += 6; \
})
@ -203,20 +208,20 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
#define _EMIT6_DISP(op1, op2, disp) \
({ \
unsigned int __disp = (disp) & 0xfff; \
_EMIT6(op1 | __disp, op2); \
_EMIT6((op1) | __disp, op2); \
})
#define _EMIT6_DISP_LH(op1, op2, disp) \
({ \
u32 _disp = (u32) disp; \
u32 _disp = (u32) (disp); \
unsigned int __disp_h = _disp & 0xff000; \
unsigned int __disp_l = _disp & 0x00fff; \
_EMIT6(op1 | __disp_l, op2 | __disp_h >> 4); \
_EMIT6((op1) | __disp_l, (op2) | __disp_h >> 4); \
})
#define EMIT6_DISP_LH(op1, op2, b1, b2, b3, disp) \
({ \
_EMIT6_DISP_LH(op1 | reg(b1, b2) << 16 | \
_EMIT6_DISP_LH((op1) | reg(b1, b2) << 16 | \
reg_high(b3) << 8, op2, disp); \
REG_SET_SEEN(b1); \
REG_SET_SEEN(b2); \
@ -226,8 +231,8 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
#define EMIT6_PCREL_LABEL(op1, op2, b1, b2, label, mask) \
({ \
int rel = (jit->labels[label] - jit->prg) >> 1; \
_EMIT6(op1 | reg(b1, b2) << 16 | (rel & 0xffff), \
op2 | mask << 12); \
_EMIT6((op1) | reg(b1, b2) << 16 | (rel & 0xffff), \
(op2) | (mask) << 12); \
REG_SET_SEEN(b1); \
REG_SET_SEEN(b2); \
})
@ -235,66 +240,81 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
#define EMIT6_PCREL_IMM_LABEL(op1, op2, b1, imm, label, mask) \
({ \
int rel = (jit->labels[label] - jit->prg) >> 1; \
_EMIT6(op1 | (reg_high(b1) | mask) << 16 | \
(rel & 0xffff), op2 | (imm & 0xff) << 8); \
_EMIT6((op1) | (reg_high(b1) | (mask)) << 16 | \
(rel & 0xffff), (op2) | ((imm) & 0xff) << 8); \
REG_SET_SEEN(b1); \
BUILD_BUG_ON(((unsigned long) imm) > 0xff); \
BUILD_BUG_ON(((unsigned long) (imm)) > 0xff); \
})
#define EMIT6_PCREL(op1, op2, b1, b2, i, off, mask) \
({ \
/* Branch instruction needs 6 bytes */ \
int rel = (addrs[i + off + 1] - (addrs[i + 1] - 6)) / 2;\
_EMIT6(op1 | reg(b1, b2) << 16 | (rel & 0xffff), op2 | mask); \
int rel = (addrs[(i) + (off) + 1] - (addrs[(i) + 1] - 6)) / 2;\
_EMIT6((op1) | reg(b1, b2) << 16 | (rel & 0xffff), (op2) | (mask));\
REG_SET_SEEN(b1); \
REG_SET_SEEN(b2); \
})
#define EMIT6_PCREL_RILB(op, b, target) \
({ \
int rel = (target - jit->prg) / 2; \
_EMIT6(op | reg_high(b) << 16 | rel >> 16, rel & 0xffff); \
unsigned int rel = (int)((target) - jit->prg) / 2; \
_EMIT6((op) | reg_high(b) << 16 | rel >> 16, rel & 0xffff);\
REG_SET_SEEN(b); \
})
#define EMIT6_PCREL_RIL(op, target) \
({ \
int rel = (target - jit->prg) / 2; \
_EMIT6(op | rel >> 16, rel & 0xffff); \
unsigned int rel = (int)((target) - jit->prg) / 2; \
_EMIT6((op) | rel >> 16, rel & 0xffff); \
})
#define EMIT6_PCREL_RILC(op, mask, target) \
({ \
EMIT6_PCREL_RIL((op) | (mask) << 20, (target)); \
})
#define _EMIT6_IMM(op, imm) \
({ \
unsigned int __imm = (imm); \
_EMIT6(op | (__imm >> 16), __imm & 0xffff); \
_EMIT6((op) | (__imm >> 16), __imm & 0xffff); \
})
#define EMIT6_IMM(op, b1, imm) \
({ \
_EMIT6_IMM(op | reg_high(b1) << 16, imm); \
_EMIT6_IMM((op) | reg_high(b1) << 16, imm); \
REG_SET_SEEN(b1); \
})
#define _EMIT_CONST_U32(val) \
({ \
unsigned int ret; \
ret = jit->lit32; \
if (jit->prg_buf) \
*(u32 *)(jit->prg_buf + jit->lit32) = (u32)(val);\
jit->lit32 += 4; \
ret; \
})
#define EMIT_CONST_U32(val) \
({ \
unsigned int ret; \
ret = jit->lit - jit->base_ip; \
jit->seen |= SEEN_LITERAL; \
_EMIT_CONST_U32(val) - jit->base_ip; \
})
#define _EMIT_CONST_U64(val) \
({ \
unsigned int ret; \
ret = jit->lit64; \
if (jit->prg_buf) \
*(u32 *) (jit->prg_buf + jit->lit) = (u32) val; \
jit->lit += 4; \
*(u64 *)(jit->prg_buf + jit->lit64) = (u64)(val);\
jit->lit64 += 8; \
ret; \
})
#define EMIT_CONST_U64(val) \
({ \
unsigned int ret; \
ret = jit->lit - jit->base_ip; \
jit->seen |= SEEN_LITERAL; \
if (jit->prg_buf) \
*(u64 *) (jit->prg_buf + jit->lit) = (u64) val; \
jit->lit += 8; \
ret; \
_EMIT_CONST_U64(val) - jit->base_ip; \
})
#define EMIT_ZERO(b1) \
@ -306,6 +326,67 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
} \
})
/*
* Return whether this is the first pass. The first pass is special, since we
* don't know any sizes yet, and thus must be conservative.
*/
static bool is_first_pass(struct bpf_jit *jit)
{
return jit->size == 0;
}
/*
* Return whether this is the code generation pass. The code generation pass is
* special, since we should change as little as possible.
*/
static bool is_codegen_pass(struct bpf_jit *jit)
{
return jit->prg_buf;
}
/*
* Return whether "rel" can be encoded as a short PC-relative offset
*/
static bool is_valid_rel(int rel)
{
return rel >= -65536 && rel <= 65534;
}
/*
* Return whether "off" can be reached using a short PC-relative offset
*/
static bool can_use_rel(struct bpf_jit *jit, int off)
{
return is_valid_rel(off - jit->prg);
}
/*
* Return whether given displacement can be encoded using
* Long-Displacement Facility
*/
static bool is_valid_ldisp(int disp)
{
return disp >= -524288 && disp <= 524287;
}
/*
* Return whether the next 32-bit literal pool entry can be referenced using
* Long-Displacement Facility
*/
static bool can_use_ldisp_for_lit32(struct bpf_jit *jit)
{
return is_valid_ldisp(jit->lit32 - jit->base_ip);
}
/*
* Return whether the next 64-bit literal pool entry can be referenced using
* Long-Displacement Facility
*/
static bool can_use_ldisp_for_lit64(struct bpf_jit *jit)
{
return is_valid_ldisp(jit->lit64 - jit->base_ip);
}
/*
* Fill whole space with illegal instructions
*/
@ -383,9 +464,18 @@ static int get_end(struct bpf_jit *jit, int start)
*/
static void save_restore_regs(struct bpf_jit *jit, int op, u32 stack_depth)
{
const int last = 15, save_restore_size = 6;
int re = 6, rs;
if (is_first_pass(jit)) {
/*
* We don't know yet which registers are used. Reserve space
* conservatively.
*/
jit->prg += (last - re + 1) * save_restore_size;
return;
}
do {
rs = get_start(jit, re);
if (!rs)
@ -396,7 +486,7 @@ static void save_restore_regs(struct bpf_jit *jit, int op, u32 stack_depth)
else
restore_regs(jit, rs, re, stack_depth);
re++;
} while (re <= 15);
} while (re <= last);
}
/*
@ -420,21 +510,28 @@ static void bpf_jit_prologue(struct bpf_jit *jit, u32 stack_depth)
/* Save registers */
save_restore_regs(jit, REGS_SAVE, stack_depth);
/* Setup literal pool */
if (jit->seen & SEEN_LITERAL) {
/* basr %r13,0 */
EMIT2(0x0d00, REG_L, REG_0);
jit->base_ip = jit->prg;
if (is_first_pass(jit) || (jit->seen & SEEN_LITERAL)) {
if (!is_first_pass(jit) &&
is_valid_ldisp(jit->size - (jit->prg + 2))) {
/* basr %l,0 */
EMIT2(0x0d00, REG_L, REG_0);
jit->base_ip = jit->prg;
} else {
/* larl %l,lit32_start */
EMIT6_PCREL_RILB(0xc0000000, REG_L, jit->lit32_start);
jit->base_ip = jit->lit32_start;
}
}
/* Setup stack and backchain */
if (jit->seen & SEEN_STACK) {
if (jit->seen & SEEN_FUNC)
if (is_first_pass(jit) || (jit->seen & SEEN_STACK)) {
if (is_first_pass(jit) || (jit->seen & SEEN_FUNC))
/* lgr %w1,%r15 (backchain) */
EMIT4(0xb9040000, REG_W1, REG_15);
/* la %bfp,STK_160_UNUSED(%r15) (BPF frame pointer) */
EMIT4_DISP(0x41000000, BPF_REG_FP, REG_15, STK_160_UNUSED);
/* aghi %r15,-STK_OFF */
EMIT4_IMM(0xa70b0000, REG_15, -(STK_OFF + stack_depth));
if (jit->seen & SEEN_FUNC)
if (is_first_pass(jit) || (jit->seen & SEEN_FUNC))
/* stg %w1,152(%r15) (backchain) */
EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0,
REG_15, 152);
@ -446,12 +543,6 @@ static void bpf_jit_prologue(struct bpf_jit *jit, u32 stack_depth)
*/
static void bpf_jit_epilogue(struct bpf_jit *jit, u32 stack_depth)
{
/* Return 0 */
if (jit->seen & SEEN_RET0) {
jit->ret0_ip = jit->prg;
/* lghi %b0,0 */
EMIT4_IMM(0xa7090000, BPF_REG_0, 0);
}
jit->exit_ip = jit->prg;
/* Load exit code: lgr %r2,%b0 */
EMIT4(0xb9040000, REG_2, BPF_REG_0);
@ -476,7 +567,7 @@ static void bpf_jit_epilogue(struct bpf_jit *jit, u32 stack_depth)
_EMIT2(0x07fe);
if (__is_defined(CC_USING_EXPOLINE) && !nospec_disable &&
(jit->seen & SEEN_FUNC)) {
(is_first_pass(jit) || (jit->seen & SEEN_FUNC))) {
jit->r1_thunk_ip = jit->prg;
/* Generate __s390_indirect_jump_r1 thunk */
if (test_facility(35)) {
@ -506,16 +597,14 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
int i, bool extra_pass)
{
struct bpf_insn *insn = &fp->insnsi[i];
int jmp_off, last, insn_count = 1;
u32 dst_reg = insn->dst_reg;
u32 src_reg = insn->src_reg;
int last, insn_count = 1;
u32 *addrs = jit->addrs;
s32 imm = insn->imm;
s16 off = insn->off;
unsigned int mask;
if (dst_reg == BPF_REG_AX || src_reg == BPF_REG_AX)
jit->seen |= SEEN_REG_AX;
switch (insn->code) {
/*
* BPF_MOV
@ -549,9 +638,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
u64 imm64;
imm64 = (u64)(u32) insn[0].imm | ((u64)(u32) insn[1].imm) << 32;
/* lg %dst,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0004, dst_reg, REG_0, REG_L,
EMIT_CONST_U64(imm64));
/* lgrl %dst,imm */
EMIT6_PCREL_RILB(0xc4080000, dst_reg, _EMIT_CONST_U64(imm64));
insn_count = 2;
break;
}
@ -680,9 +768,18 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT4_IMM(0xa7080000, REG_W0, 0);
/* lr %w1,%dst */
EMIT2(0x1800, REG_W1, dst_reg);
/* dl %w0,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0097, REG_W0, REG_0, REG_L,
EMIT_CONST_U32(imm));
if (!is_first_pass(jit) && can_use_ldisp_for_lit32(jit)) {
/* dl %w0,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0097, REG_W0, REG_0, REG_L,
EMIT_CONST_U32(imm));
} else {
/* lgfrl %dst,imm */
EMIT6_PCREL_RILB(0xc40c0000, dst_reg,
_EMIT_CONST_U32(imm));
jit->seen |= SEEN_LITERAL;
/* dlr %w0,%dst */
EMIT4(0xb9970000, REG_W0, dst_reg);
}
/* llgfr %dst,%rc */
EMIT4(0xb9160000, dst_reg, rc_reg);
if (insn_is_zext(&insn[1]))
@ -704,9 +801,18 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT4_IMM(0xa7090000, REG_W0, 0);
/* lgr %w1,%dst */
EMIT4(0xb9040000, REG_W1, dst_reg);
/* dlg %w0,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0087, REG_W0, REG_0, REG_L,
EMIT_CONST_U64(imm));
if (!is_first_pass(jit) && can_use_ldisp_for_lit64(jit)) {
/* dlg %w0,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0087, REG_W0, REG_0, REG_L,
EMIT_CONST_U64(imm));
} else {
/* lgrl %dst,imm */
EMIT6_PCREL_RILB(0xc4080000, dst_reg,
_EMIT_CONST_U64(imm));
jit->seen |= SEEN_LITERAL;
/* dlgr %w0,%dst */
EMIT4(0xb9870000, REG_W0, dst_reg);
}
/* lgr %dst,%rc */
EMIT4(0xb9040000, dst_reg, rc_reg);
break;
@ -729,9 +835,19 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT_ZERO(dst_reg);
break;
case BPF_ALU64 | BPF_AND | BPF_K: /* dst = dst & imm */
/* ng %dst,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0080, dst_reg, REG_0, REG_L,
EMIT_CONST_U64(imm));
if (!is_first_pass(jit) && can_use_ldisp_for_lit64(jit)) {
/* ng %dst,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0080,
dst_reg, REG_0, REG_L,
EMIT_CONST_U64(imm));
} else {
/* lgrl %w0,imm */
EMIT6_PCREL_RILB(0xc4080000, REG_W0,
_EMIT_CONST_U64(imm));
jit->seen |= SEEN_LITERAL;
/* ngr %dst,%w0 */
EMIT4(0xb9800000, dst_reg, REG_W0);
}
break;
/*
* BPF_OR
@ -751,9 +867,19 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT_ZERO(dst_reg);
break;
case BPF_ALU64 | BPF_OR | BPF_K: /* dst = dst | imm */
/* og %dst,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0081, dst_reg, REG_0, REG_L,
EMIT_CONST_U64(imm));
if (!is_first_pass(jit) && can_use_ldisp_for_lit64(jit)) {
/* og %dst,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0081,
dst_reg, REG_0, REG_L,
EMIT_CONST_U64(imm));
} else {
/* lgrl %w0,imm */
EMIT6_PCREL_RILB(0xc4080000, REG_W0,
_EMIT_CONST_U64(imm));
jit->seen |= SEEN_LITERAL;
/* ogr %dst,%w0 */
EMIT4(0xb9810000, dst_reg, REG_W0);
}
break;
/*
* BPF_XOR
@ -775,9 +901,19 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT_ZERO(dst_reg);
break;
case BPF_ALU64 | BPF_XOR | BPF_K: /* dst = dst ^ imm */
/* xg %dst,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0082, dst_reg, REG_0, REG_L,
EMIT_CONST_U64(imm));
if (!is_first_pass(jit) && can_use_ldisp_for_lit64(jit)) {
/* xg %dst,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0082,
dst_reg, REG_0, REG_L,
EMIT_CONST_U64(imm));
} else {
/* lgrl %w0,imm */
EMIT6_PCREL_RILB(0xc4080000, REG_W0,
_EMIT_CONST_U64(imm));
jit->seen |= SEEN_LITERAL;
/* xgr %dst,%w0 */
EMIT4(0xb9820000, dst_reg, REG_W0);
}
break;
/*
* BPF_LSH
@ -1023,9 +1159,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
REG_SET_SEEN(BPF_REG_5);
jit->seen |= SEEN_FUNC;
/* lg %w1,<d(imm)>(%l) */
EMIT6_DISP_LH(0xe3000000, 0x0004, REG_W1, REG_0, REG_L,
EMIT_CONST_U64(func));
/* lgrl %w1,func */
EMIT6_PCREL_RILB(0xc4080000, REG_W1, _EMIT_CONST_U64(func));
if (__is_defined(CC_USING_EXPOLINE) && !nospec_disable) {
/* brasl %r14,__s390_indirect_jump_r1 */
EMIT6_PCREL_RILB(0xc0050000, REG_14, jit->r1_thunk_ip);
@ -1054,9 +1189,17 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
/* llgf %w1,map.max_entries(%b2) */
EMIT6_DISP_LH(0xe3000000, 0x0016, REG_W1, REG_0, BPF_REG_2,
offsetof(struct bpf_array, map.max_entries));
/* clrj %b3,%w1,0xa,label0: if (u32)%b3 >= (u32)%w1 goto out */
EMIT6_PCREL_LABEL(0xec000000, 0x0077, BPF_REG_3,
REG_W1, 0, 0xa);
/* if ((u32)%b3 >= (u32)%w1) goto out; */
if (!is_first_pass(jit) && can_use_rel(jit, jit->labels[0])) {
/* clrj %b3,%w1,0xa,label0 */
EMIT6_PCREL_LABEL(0xec000000, 0x0077, BPF_REG_3,
REG_W1, 0, 0xa);
} else {
/* clr %b3,%w1 */
EMIT2(0x1500, BPF_REG_3, REG_W1);
/* brcl 0xa,label0 */
EMIT6_PCREL_RILC(0xc0040000, 0xa, jit->labels[0]);
}
/*
* if (tail_call_cnt++ > MAX_TAIL_CALL_CNT)
@ -1071,9 +1214,16 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT4_IMM(0xa7080000, REG_W0, 1);
/* laal %w1,%w0,off(%r15) */
EMIT6_DISP_LH(0xeb000000, 0x00fa, REG_W1, REG_W0, REG_15, off);
/* clij %w1,MAX_TAIL_CALL_CNT,0x2,label0 */
EMIT6_PCREL_IMM_LABEL(0xec000000, 0x007f, REG_W1,
MAX_TAIL_CALL_CNT, 0, 0x2);
if (!is_first_pass(jit) && can_use_rel(jit, jit->labels[0])) {
/* clij %w1,MAX_TAIL_CALL_CNT,0x2,label0 */
EMIT6_PCREL_IMM_LABEL(0xec000000, 0x007f, REG_W1,
MAX_TAIL_CALL_CNT, 0, 0x2);
} else {
/* clfi %w1,MAX_TAIL_CALL_CNT */
EMIT6_IMM(0xc20f0000, REG_W1, MAX_TAIL_CALL_CNT);
/* brcl 0x2,label0 */
EMIT6_PCREL_RILC(0xc0040000, 0x2, jit->labels[0]);
}
/*
* prog = array->ptrs[index];
@ -1085,11 +1235,16 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT4(0xb9160000, REG_1, BPF_REG_3);
/* sllg %r1,%r1,3: %r1 *= 8 */
EMIT6_DISP_LH(0xeb000000, 0x000d, REG_1, REG_1, REG_0, 3);
/* lg %r1,prog(%b2,%r1) */
EMIT6_DISP_LH(0xe3000000, 0x0004, REG_1, BPF_REG_2,
/* ltg %r1,prog(%b2,%r1) */
EMIT6_DISP_LH(0xe3000000, 0x0002, REG_1, BPF_REG_2,
REG_1, offsetof(struct bpf_array, ptrs));
/* clgij %r1,0,0x8,label0 */
EMIT6_PCREL_IMM_LABEL(0xec000000, 0x007d, REG_1, 0, 0, 0x8);
if (!is_first_pass(jit) && can_use_rel(jit, jit->labels[0])) {
/* brc 0x8,label0 */
EMIT4_PCREL_RIC(0xa7040000, 0x8, jit->labels[0]);
} else {
/* brcl 0x8,label0 */
EMIT6_PCREL_RILC(0xc0040000, 0x8, jit->labels[0]);
}
/*
* Restore registers before calling function
@ -1110,7 +1265,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
break;
case BPF_JMP | BPF_EXIT: /* return b0 */
last = (i == fp->len - 1) ? 1 : 0;
if (last && !(jit->seen & SEEN_RET0))
if (last)
break;
/* j <exit> */
EMIT4_PCREL(0xa7f40000, jit->exit_ip - jit->prg);
@ -1246,36 +1401,83 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
goto branch_oc;
branch_ks:
is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
/* lgfi %w1,imm (load sign extend imm) */
EMIT6_IMM(0xc0010000, REG_W1, imm);
/* crj or cgrj %dst,%w1,mask,off */
EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0076 : 0x0064),
dst_reg, REG_W1, i, off, mask);
/* cfi or cgfi %dst,imm */
EMIT6_IMM(is_jmp32 ? 0xc20d0000 : 0xc20c0000,
dst_reg, imm);
if (!is_first_pass(jit) &&
can_use_rel(jit, addrs[i + off + 1])) {
/* brc mask,off */
EMIT4_PCREL_RIC(0xa7040000,
mask >> 12, addrs[i + off + 1]);
} else {
/* brcl mask,off */
EMIT6_PCREL_RILC(0xc0040000,
mask >> 12, addrs[i + off + 1]);
}
break;
branch_ku:
is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
/* lgfi %w1,imm (load sign extend imm) */
EMIT6_IMM(0xc0010000, REG_W1, imm);
/* clrj or clgrj %dst,%w1,mask,off */
EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0077 : 0x0065),
dst_reg, REG_W1, i, off, mask);
/* clfi or clgfi %dst,imm */
EMIT6_IMM(is_jmp32 ? 0xc20f0000 : 0xc20e0000,
dst_reg, imm);
if (!is_first_pass(jit) &&
can_use_rel(jit, addrs[i + off + 1])) {
/* brc mask,off */
EMIT4_PCREL_RIC(0xa7040000,
mask >> 12, addrs[i + off + 1]);
} else {
/* brcl mask,off */
EMIT6_PCREL_RILC(0xc0040000,
mask >> 12, addrs[i + off + 1]);
}
break;
branch_xs:
is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
/* crj or cgrj %dst,%src,mask,off */
EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0076 : 0x0064),
dst_reg, src_reg, i, off, mask);
if (!is_first_pass(jit) &&
can_use_rel(jit, addrs[i + off + 1])) {
/* crj or cgrj %dst,%src,mask,off */
EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0076 : 0x0064),
dst_reg, src_reg, i, off, mask);
} else {
/* cr or cgr %dst,%src */
if (is_jmp32)
EMIT2(0x1900, dst_reg, src_reg);
else
EMIT4(0xb9200000, dst_reg, src_reg);
/* brcl mask,off */
EMIT6_PCREL_RILC(0xc0040000,
mask >> 12, addrs[i + off + 1]);
}
break;
branch_xu:
is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
/* clrj or clgrj %dst,%src,mask,off */
EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0077 : 0x0065),
dst_reg, src_reg, i, off, mask);
if (!is_first_pass(jit) &&
can_use_rel(jit, addrs[i + off + 1])) {
/* clrj or clgrj %dst,%src,mask,off */
EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0077 : 0x0065),
dst_reg, src_reg, i, off, mask);
} else {
/* clr or clgr %dst,%src */
if (is_jmp32)
EMIT2(0x1500, dst_reg, src_reg);
else
EMIT4(0xb9210000, dst_reg, src_reg);
/* brcl mask,off */
EMIT6_PCREL_RILC(0xc0040000,
mask >> 12, addrs[i + off + 1]);
}
break;
branch_oc:
/* brc mask,jmp_off (branch instruction needs 4 bytes) */
jmp_off = addrs[i + off + 1] - (addrs[i + 1] - 4);
EMIT4_PCREL(0xa7040000 | mask << 8, jmp_off);
if (!is_first_pass(jit) &&
can_use_rel(jit, addrs[i + off + 1])) {
/* brc mask,off */
EMIT4_PCREL_RIC(0xa7040000,
mask >> 12, addrs[i + off + 1]);
} else {
/* brcl mask,off */
EMIT6_PCREL_RILC(0xc0040000,
mask >> 12, addrs[i + off + 1]);
}
break;
}
default: /* too complex, give up */
@ -1285,29 +1487,68 @@ branch_oc:
return insn_count;
}
/*
* Return whether new i-th instruction address does not violate any invariant
*/
static bool bpf_is_new_addr_sane(struct bpf_jit *jit, int i)
{
/* On the first pass anything goes */
if (is_first_pass(jit))
return true;
/* The codegen pass must not change anything */
if (is_codegen_pass(jit))
return jit->addrs[i] == jit->prg;
/* Passes in between must not increase code size */
return jit->addrs[i] >= jit->prg;
}
/*
* Update the address of i-th instruction
*/
static int bpf_set_addr(struct bpf_jit *jit, int i)
{
if (!bpf_is_new_addr_sane(jit, i))
return -1;
jit->addrs[i] = jit->prg;
return 0;
}
/*
* Compile eBPF program into s390x code
*/
static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp,
bool extra_pass)
{
int i, insn_count;
int i, insn_count, lit32_size, lit64_size;
jit->lit = jit->lit_start;
jit->lit32 = jit->lit32_start;
jit->lit64 = jit->lit64_start;
jit->prg = 0;
bpf_jit_prologue(jit, fp->aux->stack_depth);
if (bpf_set_addr(jit, 0) < 0)
return -1;
for (i = 0; i < fp->len; i += insn_count) {
insn_count = bpf_jit_insn(jit, fp, i, extra_pass);
if (insn_count < 0)
return -1;
/* Next instruction address */
jit->addrs[i + insn_count] = jit->prg;
if (bpf_set_addr(jit, i + insn_count) < 0)
return -1;
}
bpf_jit_epilogue(jit, fp->aux->stack_depth);
jit->lit_start = jit->prg;
jit->size = jit->lit;
lit32_size = jit->lit32 - jit->lit32_start;
lit64_size = jit->lit64 - jit->lit64_start;
jit->lit32_start = jit->prg;
if (lit32_size)
jit->lit32_start = ALIGN(jit->lit32_start, 4);
jit->lit64_start = jit->lit32_start + lit32_size;
if (lit64_size)
jit->lit64_start = ALIGN(jit->lit64_start, 8);
jit->size = jit->lit64_start + lit64_size;
jit->size_prg = jit->prg;
return 0;
}
@ -1369,7 +1610,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
}
memset(&jit, 0, sizeof(jit));
jit.addrs = kcalloc(fp->len + 1, sizeof(*jit.addrs), GFP_KERNEL);
jit.addrs = kvcalloc(fp->len + 1, sizeof(*jit.addrs), GFP_KERNEL);
if (jit.addrs == NULL) {
fp = orig_fp;
goto out;
@ -1388,12 +1629,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
/*
* Final pass: Allocate and generate program
*/
if (jit.size >= BPF_SIZE_MAX) {
fp = orig_fp;
goto free_addrs;
}
header = bpf_jit_binary_alloc(jit.size, &jit.prg_buf, 2, jit_fill_hole);
header = bpf_jit_binary_alloc(jit.size, &jit.prg_buf, 8, jit_fill_hole);
if (!header) {
fp = orig_fp;
goto free_addrs;
@ -1422,7 +1658,7 @@ skip_init_ctx:
if (!fp->is_func || extra_pass) {
bpf_prog_fill_jited_linfo(fp, jit.addrs + 1);
free_addrs:
kfree(jit.addrs);
kvfree(jit.addrs);
kfree(jit_data);
fp->aux->jit_data = NULL;
}


@ -26,10 +26,11 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
#define POKE_MAX_OPCODE_SIZE 5
struct text_poke_loc {
void *detour;
void *addr;
size_t len;
const char opcode[POKE_MAX_OPCODE_SIZE];
int len;
s32 rel32;
u8 opcode;
const u8 text[POKE_MAX_OPCODE_SIZE];
};
extern void text_poke_early(void *addr, const void *opcode, size_t len);
@ -51,8 +52,10 @@ extern void text_poke_early(void *addr, const void *opcode, size_t len);
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
extern int poke_int3_handler(struct pt_regs *regs);
extern void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
extern void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries);
extern void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
const void *opcode, size_t len, const void *emulate);
extern int after_bootmem;
extern __ro_after_init struct mm_struct *poking_mm;
extern __ro_after_init unsigned long poking_addr;
@ -63,8 +66,17 @@ static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
regs->ip = ip;
}
#define INT3_INSN_SIZE 1
#define CALL_INSN_SIZE 5
#define INT3_INSN_SIZE 1
#define INT3_INSN_OPCODE 0xCC
#define CALL_INSN_SIZE 5
#define CALL_INSN_OPCODE 0xE8
#define JMP32_INSN_SIZE 5
#define JMP32_INSN_OPCODE 0xE9
#define JMP8_INSN_SIZE 2
#define JMP8_INSN_OPCODE 0xEB
static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
{


@ -956,16 +956,15 @@ NOKPROBE_SYMBOL(patch_cmp);
int poke_int3_handler(struct pt_regs *regs)
{
struct text_poke_loc *tp;
unsigned char int3 = 0xcc;
void *ip;
/*
* Having observed our INT3 instruction, we now must observe
* bp_patching.nr_entries.
*
* nr_entries != 0 INT3
* WMB RMB
* write INT3 if (nr_entries)
* nr_entries != 0 INT3
* WMB RMB
* write INT3 if (nr_entries)
*
* Idem for other elements in bp_patching.
*/
@ -978,9 +977,9 @@ int poke_int3_handler(struct pt_regs *regs)
return 0;
/*
* Discount the sizeof(int3). See text_poke_bp_batch().
* Discount the INT3. See text_poke_bp_batch().
*/
ip = (void *) regs->ip - sizeof(int3);
ip = (void *) regs->ip - INT3_INSN_SIZE;
/*
* Skip the binary search if there is a single member in the vector.
@ -997,8 +996,28 @@ int poke_int3_handler(struct pt_regs *regs)
return 0;
}
/* set up the specified breakpoint detour */
regs->ip = (unsigned long) tp->detour;
ip += tp->len;
switch (tp->opcode) {
case INT3_INSN_OPCODE:
/*
* Someone poked an explicit INT3, they'll want to handle it,
* do not consume.
*/
return 0;
case CALL_INSN_OPCODE:
int3_emulate_call(regs, (long)ip + tp->rel32);
break;
case JMP32_INSN_OPCODE:
case JMP8_INSN_OPCODE:
int3_emulate_jmp(regs, (long)ip + tp->rel32);
break;
default:
BUG();
}
return 1;
}
@ -1014,7 +1033,7 @@ NOKPROBE_SYMBOL(poke_int3_handler);
* synchronization using int3 breakpoint.
*
* The way it is done:
* - For each entry in the vector:
* - For each entry in the vector:
* - add a int3 trap to the address that will be patched
* - sync cores
* - For each entry in the vector:
@ -1027,9 +1046,9 @@ NOKPROBE_SYMBOL(poke_int3_handler);
*/
void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
{
int patched_all_but_first = 0;
unsigned char int3 = 0xcc;
unsigned char int3 = INT3_INSN_OPCODE;
unsigned int i;
int do_sync;
lockdep_assert_held(&text_mutex);
@ -1053,16 +1072,16 @@ void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
/*
* Second step: update all but the first byte of the patched range.
*/
for (i = 0; i < nr_entries; i++) {
for (do_sync = 0, i = 0; i < nr_entries; i++) {
if (tp[i].len - sizeof(int3) > 0) {
text_poke((char *)tp[i].addr + sizeof(int3),
(const char *)tp[i].opcode + sizeof(int3),
(const char *)tp[i].text + sizeof(int3),
tp[i].len - sizeof(int3));
patched_all_but_first++;
do_sync++;
}
}
if (patched_all_but_first) {
if (do_sync) {
/*
* According to Intel, this core syncing is very likely
* not necessary and we'd be safe even without it. But
@ -1075,10 +1094,17 @@ void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
* Third step: replace the first byte (int3) by the first byte of
* replacing opcode.
*/
for (i = 0; i < nr_entries; i++)
text_poke(tp[i].addr, tp[i].opcode, sizeof(int3));
for (do_sync = 0, i = 0; i < nr_entries; i++) {
if (tp[i].text[0] == INT3_INSN_OPCODE)
continue;
text_poke(tp[i].addr, tp[i].text, sizeof(int3));
do_sync++;
}
if (do_sync)
on_each_cpu(do_sync_core, NULL, 1);
on_each_cpu(do_sync_core, NULL, 1);
/*
* sync_core() implies an smp_mb() and orders this store against
* the writing of the new instruction.
@ -1087,6 +1113,60 @@ void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
bp_patching.nr_entries = 0;
}
void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
const void *opcode, size_t len, const void *emulate)
{
struct insn insn;
if (!opcode)
opcode = (void *)tp->text;
else
memcpy((void *)tp->text, opcode, len);
if (!emulate)
emulate = opcode;
kernel_insn_init(&insn, emulate, MAX_INSN_SIZE);
insn_get_length(&insn);
BUG_ON(!insn_complete(&insn));
BUG_ON(len != insn.length);
tp->addr = addr;
tp->len = len;
tp->opcode = insn.opcode.bytes[0];
switch (tp->opcode) {
case INT3_INSN_OPCODE:
break;
case CALL_INSN_OPCODE:
case JMP32_INSN_OPCODE:
case JMP8_INSN_OPCODE:
tp->rel32 = insn.immediate.value;
break;
default: /* assume NOP */
switch (len) {
case 2: /* NOP2 -- emulate as JMP8+0 */
BUG_ON(memcmp(emulate, ideal_nops[len], len));
tp->opcode = JMP8_INSN_OPCODE;
tp->rel32 = 0;
break;
case 5: /* NOP5 -- emulate as JMP32+0 */
BUG_ON(memcmp(emulate, ideal_nops[NOP_ATOMIC5], len));
tp->opcode = JMP32_INSN_OPCODE;
tp->rel32 = 0;
break;
default: /* unknown instruction */
BUG();
}
break;
}
}
/**
* text_poke_bp() -- update instructions on live kernel on SMP
* @addr: address to patch
@ -1098,20 +1178,10 @@ void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
* dynamically allocated memory. This function should be used when it is
* not possible to allocate memory.
*/
void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
{
struct text_poke_loc tp = {
.detour = handler,
.addr = addr,
.len = len,
};
if (len > POKE_MAX_OPCODE_SIZE) {
WARN_ONCE(1, "len is larger than %d\n", POKE_MAX_OPCODE_SIZE);
return;
}
memcpy((void *)tp.opcode, opcode, len);
struct text_poke_loc tp;
text_poke_loc_init(&tp, addr, opcode, len, emulate);
text_poke_bp_batch(&tp, 1);
}


@ -89,8 +89,7 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
return;
}
text_poke_bp((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE,
(void *)jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
text_poke_bp((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE, NULL);
}
void arch_jump_label_transform(struct jump_entry *entry,
@ -147,11 +146,9 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry,
}
__jump_label_set_jump_code(entry, type,
(union jump_code_union *) &tp->opcode, 0);
(union jump_code_union *)&tp->text, 0);
tp->addr = entry_code;
tp->detour = entry_code + JUMP_LABEL_NOP_SIZE;
tp->len = JUMP_LABEL_NOP_SIZE;
text_poke_loc_init(tp, entry_code, NULL, JUMP_LABEL_NOP_SIZE, NULL);
tp_vec_nr++;


@ -437,8 +437,7 @@ void arch_optimize_kprobes(struct list_head *oplist)
insn_buff[0] = RELATIVEJUMP_OPCODE;
*(s32 *)(&insn_buff[1]) = rel;
text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE,
op->optinsn.insn);
text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE, NULL);
list_del_init(&op->list);
}
@ -448,12 +447,18 @@ void arch_optimize_kprobes(struct list_head *oplist)
void arch_unoptimize_kprobe(struct optimized_kprobe *op)
{
u8 insn_buff[RELATIVEJUMP_SIZE];
u8 emulate_buff[RELATIVEJUMP_SIZE];
/* Set int3 to first byte for kprobes */
insn_buff[0] = BREAKPOINT_INSTRUCTION;
memcpy(insn_buff + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
emulate_buff[0] = RELATIVEJUMP_OPCODE;
*(s32 *)(&emulate_buff[1]) = (s32)((long)op->optinsn.insn -
((long)op->kp.addr + RELATIVEJUMP_SIZE));
text_poke_bp(op->kp.addr, insn_buff, RELATIVEJUMP_SIZE,
op->optinsn.insn);
emulate_buff);
}
/*


@ -9,9 +9,11 @@
#include <linux/filter.h>
#include <linux/if_vlan.h>
#include <linux/bpf.h>
#include <linux/memory.h>
#include <asm/extable.h>
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
#include <asm/text-patching.h>
static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
{
@ -96,6 +98,7 @@ static int bpf_size_to_x86_bytes(int bpf_size)
/* Pick a register outside of BPF range for JIT internal work */
#define AUX_REG (MAX_BPF_JIT_REG + 1)
#define X86_REG_R9 (MAX_BPF_JIT_REG + 2)
/*
* The following table maps BPF registers to x86-64 registers.
@ -104,8 +107,8 @@ static int bpf_size_to_x86_bytes(int bpf_size)
* register in load/store instructions, it always needs an
* extra byte of encoding and is callee saved.
*
* Also x86-64 register R9 is unused. x86-64 register R10 is
* used for blinding (if enabled).
* x86-64 register R9 is not used by BPF programs, but can be used by BPF
* trampoline. x86-64 register R10 is used for blinding (if enabled).
*/
static const int reg2hex[] = {
[BPF_REG_0] = 0, /* RAX */
@ -121,6 +124,7 @@ static const int reg2hex[] = {
[BPF_REG_FP] = 5, /* RBP readonly */
[BPF_REG_AX] = 2, /* R10 temp register */
[AUX_REG] = 3, /* R11 temp register */
[X86_REG_R9] = 1, /* R9 register, 6th function argument */
};
static const int reg2pt_regs[] = {
@ -148,6 +152,7 @@ static bool is_ereg(u32 reg)
BIT(BPF_REG_7) |
BIT(BPF_REG_8) |
BIT(BPF_REG_9) |
BIT(X86_REG_R9) |
BIT(BPF_REG_AX));
}
@ -198,8 +203,10 @@ struct jit_context {
/* Maximum number of bytes emitted while JITing one eBPF insn */
#define BPF_MAX_INSN_SIZE 128
#define BPF_INSN_SAFETY 64
/* number of bytes emit_call() needs to generate call instruction */
#define X86_CALL_SIZE 5
#define PROLOGUE_SIZE 20
#define PROLOGUE_SIZE 25
/*
* Emit x86-64 prologue code for BPF program and check its size.
@ -208,8 +215,13 @@ struct jit_context {
static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf)
{
u8 *prog = *pprog;
int cnt = 0;
int cnt = X86_CALL_SIZE;
/* BPF trampoline can be made to work without these nops,
* but let's waste 5 bytes for now and optimize later
*/
memcpy(prog, ideal_nops[NOP_ATOMIC5], cnt);
prog += cnt;
EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
/* sub rsp, rounded_stack_depth */
@ -390,6 +402,149 @@ static void emit_mov_reg(u8 **pprog, bool is64, u32 dst_reg, u32 src_reg)
*pprog = prog;
}
/* LDX: dst_reg = *(u8*)(src_reg + off) */
static void emit_ldx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
{
u8 *prog = *pprog;
int cnt = 0;
switch (size) {
case BPF_B:
/* Emit 'movzx rax, byte ptr [rax + off]' */
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB6);
break;
case BPF_H:
/* Emit 'movzx rax, word ptr [rax + off]' */
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB7);
break;
case BPF_W:
/* Emit 'mov eax, dword ptr [rax+0x14]' */
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT2(add_2mod(0x40, src_reg, dst_reg), 0x8B);
else
EMIT1(0x8B);
break;
case BPF_DW:
/* Emit 'mov rax, qword ptr [rax+0x14]' */
EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
break;
}
/*
* If insn->off == 0 we can save one extra byte, but
* special case of x86 R13 which always needs an offset
* is not worth the hassle
*/
if (is_imm8(off))
EMIT2(add_2reg(0x40, src_reg, dst_reg), off);
else
EMIT1_off32(add_2reg(0x80, src_reg, dst_reg), off);
*pprog = prog;
}
/* STX: *(u8*)(dst_reg + off) = src_reg */
static void emit_stx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
{
u8 *prog = *pprog;
int cnt = 0;
switch (size) {
case BPF_B:
/* Emit 'mov byte ptr [rax + off], al' */
if (is_ereg(dst_reg) || is_ereg(src_reg) ||
/* We have to add extra byte for x86 SIL, DIL regs */
src_reg == BPF_REG_1 || src_reg == BPF_REG_2)
EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x88);
else
EMIT1(0x88);
break;
case BPF_H:
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT3(0x66, add_2mod(0x40, dst_reg, src_reg), 0x89);
else
EMIT2(0x66, 0x89);
break;
case BPF_W:
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x89);
else
EMIT1(0x89);
break;
case BPF_DW:
EMIT2(add_2mod(0x48, dst_reg, src_reg), 0x89);
break;
}
if (is_imm8(off))
EMIT2(add_2reg(0x40, dst_reg, src_reg), off);
else
EMIT1_off32(add_2reg(0x80, dst_reg, src_reg), off);
*pprog = prog;
}
static int emit_call(u8 **pprog, void *func, void *ip)
{
u8 *prog = *pprog;
int cnt = 0;
s64 offset;
offset = func - (ip + X86_CALL_SIZE);
if (!is_simm32(offset)) {
pr_err("Target call %p is out of range\n", func);
return -EINVAL;
}
EMIT1_off32(0xE8, offset);
*pprog = prog;
return 0;
}
int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
void *old_addr, void *new_addr)
{
u8 old_insn[X86_CALL_SIZE] = {};
u8 new_insn[X86_CALL_SIZE] = {};
u8 *prog;
int ret;
if (!is_kernel_text((long)ip) &&
!is_bpf_text_address((long)ip))
/* BPF trampoline in modules is not supported */
return -EINVAL;
if (old_addr) {
prog = old_insn;
ret = emit_call(&prog, old_addr, (void *)ip);
if (ret)
return ret;
}
if (new_addr) {
prog = new_insn;
ret = emit_call(&prog, new_addr, (void *)ip);
if (ret)
return ret;
}
ret = -EBUSY;
mutex_lock(&text_mutex);
switch (t) {
case BPF_MOD_NOP_TO_CALL:
if (memcmp(ip, ideal_nops[NOP_ATOMIC5], X86_CALL_SIZE))
goto out;
text_poke_bp(ip, new_insn, X86_CALL_SIZE, NULL);
break;
case BPF_MOD_CALL_TO_CALL:
if (memcmp(ip, old_insn, X86_CALL_SIZE))
goto out;
text_poke_bp(ip, new_insn, X86_CALL_SIZE, NULL);
break;
case BPF_MOD_CALL_TO_NOP:
if (memcmp(ip, old_insn, X86_CALL_SIZE))
goto out;
text_poke_bp(ip, ideal_nops[NOP_ATOMIC5], X86_CALL_SIZE, NULL);
break;
}
ret = 0;
out:
mutex_unlock(&text_mutex);
return ret;
}
static bool ex_handler_bpf(const struct exception_table_entry *x,
struct pt_regs *regs, int trapnr,
@ -773,68 +928,22 @@ st: if (is_imm8(insn->off))
/* STX: *(u8*)(dst_reg + off) = src_reg */
case BPF_STX | BPF_MEM | BPF_B:
/* Emit 'mov byte ptr [rax + off], al' */
if (is_ereg(dst_reg) || is_ereg(src_reg) ||
/* We have to add extra byte for x86 SIL, DIL regs */
src_reg == BPF_REG_1 || src_reg == BPF_REG_2)
EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x88);
else
EMIT1(0x88);
goto stx;
case BPF_STX | BPF_MEM | BPF_H:
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT3(0x66, add_2mod(0x40, dst_reg, src_reg), 0x89);
else
EMIT2(0x66, 0x89);
goto stx;
case BPF_STX | BPF_MEM | BPF_W:
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x89);
else
EMIT1(0x89);
goto stx;
case BPF_STX | BPF_MEM | BPF_DW:
EMIT2(add_2mod(0x48, dst_reg, src_reg), 0x89);
stx: if (is_imm8(insn->off))
EMIT2(add_2reg(0x40, dst_reg, src_reg), insn->off);
else
EMIT1_off32(add_2reg(0x80, dst_reg, src_reg),
insn->off);
emit_stx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
break;
/* LDX: dst_reg = *(u8*)(src_reg + off) */
case BPF_LDX | BPF_MEM | BPF_B:
case BPF_LDX | BPF_PROBE_MEM | BPF_B:
/* Emit 'movzx rax, byte ptr [rax + off]' */
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB6);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_H:
case BPF_LDX | BPF_PROBE_MEM | BPF_H:
/* Emit 'movzx rax, word ptr [rax + off]' */
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB7);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_W:
case BPF_LDX | BPF_PROBE_MEM | BPF_W:
/* Emit 'mov eax, dword ptr [rax+0x14]' */
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT2(add_2mod(0x40, src_reg, dst_reg), 0x8B);
else
EMIT1(0x8B);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_DW:
case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
/* Emit 'mov rax, qword ptr [rax+0x14]' */
EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
ldx: /*
* If insn->off == 0 we can save one extra byte, but
* special case of x86 R13 which always needs an offset
* is not worth the hassle
*/
if (is_imm8(insn->off))
EMIT2(add_2reg(0x40, src_reg, dst_reg), insn->off);
else
EMIT1_off32(add_2reg(0x80, src_reg, dst_reg),
insn->off);
emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
struct exception_table_entry *ex;
u8 *_insn = image + proglen;
@ -899,13 +1008,8 @@ xadd: if (is_imm8(insn->off))
/* call */
case BPF_JMP | BPF_CALL:
func = (u8 *) __bpf_call_base + imm32;
jmp_offset = func - (image + addrs[i]);
if (!imm32 || !is_simm32(jmp_offset)) {
pr_err("unsupported BPF func %d addr %p image %p\n",
imm32, func, image);
if (!imm32 || emit_call(&prog, func, image + addrs[i - 1]))
return -EINVAL;
}
EMIT1_off32(0xE8, jmp_offset);
break;
case BPF_JMP | BPF_TAIL_CALL:
@ -1138,6 +1242,210 @@ emit_jmp:
return proglen;
}
static void save_regs(struct btf_func_model *m, u8 **prog, int nr_args,
int stack_size)
{
int i;
/* Store function arguments to stack.
* For a function that accepts two pointers the sequence will be:
* mov QWORD PTR [rbp-0x10],rdi
* mov QWORD PTR [rbp-0x8],rsi
*/
for (i = 0; i < min(nr_args, 6); i++)
emit_stx(prog, bytes_to_bpf_size(m->arg_size[i]),
BPF_REG_FP,
i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
-(stack_size - i * 8));
}
static void restore_regs(struct btf_func_model *m, u8 **prog, int nr_args,
int stack_size)
{
int i;
/* Restore function arguments from stack.
* For a function that accepts two pointers the sequence will be:
* EMIT4(0x48, 0x8B, 0x7D, 0xF0); mov rdi,QWORD PTR [rbp-0x10]
* EMIT4(0x48, 0x8B, 0x75, 0xF8); mov rsi,QWORD PTR [rbp-0x8]
*/
for (i = 0; i < min(nr_args, 6); i++)
emit_ldx(prog, bytes_to_bpf_size(m->arg_size[i]),
i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
BPF_REG_FP,
-(stack_size - i * 8));
}
static int invoke_bpf(struct btf_func_model *m, u8 **pprog,
struct bpf_prog **progs, int prog_cnt, int stack_size)
{
u8 *prog = *pprog;
int cnt = 0, i;
for (i = 0; i < prog_cnt; i++) {
if (emit_call(&prog, __bpf_prog_enter, prog))
return -EINVAL;
/* remember prog start time returned by __bpf_prog_enter */
emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
/* arg1: lea rdi, [rbp - stack_size] */
EMIT4(0x48, 0x8D, 0x7D, -stack_size);
/* arg2: progs[i]->insnsi for interpreter */
if (!progs[i]->jited)
emit_mov_imm64(&prog, BPF_REG_2,
(long) progs[i]->insnsi >> 32,
(u32) (long) progs[i]->insnsi);
/* call JITed bpf program or interpreter */
if (emit_call(&prog, progs[i]->bpf_func, prog))
return -EINVAL;
/* arg1: mov rdi, progs[i] */
emit_mov_imm64(&prog, BPF_REG_1, (long) progs[i] >> 32,
(u32) (long) progs[i]);
/* arg2: mov rsi, rbx <- start time in nsec */
emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
if (emit_call(&prog, __bpf_prog_exit, prog))
return -EINVAL;
}
*pprog = prog;
return 0;
}
/* Example:
* __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
* its 'struct btf_func_model' will be nr_args=2
* The assembly code when eth_type_trans is executing after trampoline:
*
* push rbp
* mov rbp, rsp
* sub rsp, 16 // space for skb and dev
* push rbx // temp regs to pass start time
* mov qword ptr [rbp - 16], rdi // save skb pointer to stack
* mov qword ptr [rbp - 8], rsi // save dev pointer to stack
* call __bpf_prog_enter // rcu_read_lock and preempt_disable
* mov rbx, rax // remember start time if bpf stats are enabled
* lea rdi, [rbp - 16] // R1==ctx of bpf prog
* call addr_of_jited_FENTRY_prog
* movabsq rdi, 64bit_addr_of_struct_bpf_prog // unused if bpf stats are off
* mov rsi, rbx // prog start time
* call __bpf_prog_exit // rcu_read_unlock, preempt_enable and stats math
* mov rdi, qword ptr [rbp - 16] // restore skb pointer from stack
* mov rsi, qword ptr [rbp - 8] // restore dev pointer from stack
* pop rbx
* leave
* ret
*
* eth_type_trans has 5 byte nop at the beginning. These 5 bytes will be
* replaced with 'call generated_bpf_trampoline'. When it returns
* eth_type_trans will continue executing with original skb and dev pointers.
*
* The assembly code when eth_type_trans is called from trampoline:
*
* push rbp
* mov rbp, rsp
* sub rsp, 24 // space for skb, dev, return value
* push rbx // temp regs to pass start time
* mov qword ptr [rbp - 24], rdi // save skb pointer to stack
* mov qword ptr [rbp - 16], rsi // save dev pointer to stack
* call __bpf_prog_enter // rcu_read_lock and preempt_disable
* mov rbx, rax // remember start time if bpf stats are enabled
* lea rdi, [rbp - 24] // R1==ctx of bpf prog
* call addr_of_jited_FENTRY_prog // bpf prog can access skb and dev
* movabsq rdi, 64bit_addr_of_struct_bpf_prog // unused if bpf stats are off
* mov rsi, rbx // prog start time
* call __bpf_prog_exit // rcu_read_unlock, preempt_enable and stats math
* mov rdi, qword ptr [rbp - 24] // restore skb pointer from stack
* mov rsi, qword ptr [rbp - 16] // restore dev pointer from stack
* call eth_type_trans+5 // execute body of eth_type_trans
* mov qword ptr [rbp - 8], rax // save return value
* call __bpf_prog_enter // rcu_read_lock and preempt_disable
* mov rbx, rax // remember start time if bpf stats are enabled
* lea rdi, [rbp - 24] // R1==ctx of bpf prog
* call addr_of_jited_FEXIT_prog // bpf prog can access skb, dev, return value
* movabsq rdi, 64bit_addr_of_struct_bpf_prog // unused if bpf stats are off
* mov rsi, rbx // prog start time
* call __bpf_prog_exit // rcu_read_unlock, preempt_enable and stats math
* mov rax, qword ptr [rbp - 8] // restore eth_type_trans's return value
* pop rbx
* leave
* add rsp, 8 // skip eth_type_trans's frame
* ret // return to its caller
*/
int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
struct bpf_prog **fentry_progs, int fentry_cnt,
struct bpf_prog **fexit_progs, int fexit_cnt,
void *orig_call)
{
int cnt = 0, nr_args = m->nr_args;
int stack_size = nr_args * 8;
u8 *prog;
/* x86-64 supports up to 6 arguments. 7+ can be added in the future */
if (nr_args > 6)
return -ENOTSUPP;
if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
(flags & BPF_TRAMP_F_SKIP_FRAME))
return -EINVAL;
if (flags & BPF_TRAMP_F_CALL_ORIG)
stack_size += 8; /* room for return value of orig_call */
if (flags & BPF_TRAMP_F_SKIP_FRAME)
/* skip patched call instruction and point orig_call to actual
* body of the kernel function.
*/
orig_call += X86_CALL_SIZE;
prog = image;
EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
EMIT1(0x53); /* push rbx */
save_regs(m, &prog, nr_args, stack_size);
if (fentry_cnt)
if (invoke_bpf(m, &prog, fentry_progs, fentry_cnt, stack_size))
return -EINVAL;
if (flags & BPF_TRAMP_F_CALL_ORIG) {
if (fentry_cnt)
restore_regs(m, &prog, nr_args, stack_size);
/* call original function */
if (emit_call(&prog, orig_call, prog))
return -EINVAL;
/* remember return value in a stack for bpf prog to access */
emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
}
if (fexit_cnt)
if (invoke_bpf(m, &prog, fexit_progs, fexit_cnt, stack_size))
return -EINVAL;
if (flags & BPF_TRAMP_F_RESTORE_REGS)
restore_regs(m, &prog, nr_args, stack_size);
if (flags & BPF_TRAMP_F_CALL_ORIG)
/* restore original return value back into RAX */
emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
EMIT1(0x5B); /* pop rbx */
EMIT1(0xC9); /* leave */
if (flags & BPF_TRAMP_F_SKIP_FRAME)
/* skip our return address and return to parent */
EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
EMIT1(0xC3); /* ret */
/* One half of the page has active running trampoline.
* Another half is an area for next trampoline.
* Make sure the trampoline generation logic doesn't overflow.
*/
if (WARN_ON_ONCE(prog - (u8 *)image > PAGE_SIZE / 2 - BPF_INSN_SAFETY))
return -EFAULT;
return 0;
}
struct x64_jit_data {
struct bpf_binary_header *header;
int *addrs;


@ -3175,13 +3175,8 @@ static int bnxt_init_one_rx_ring(struct bnxt *bp, int ring_nr)
bnxt_init_rxbd_pages(ring, type);
if (BNXT_RX_PAGE_MODE(bp) && bp->xdp_prog) {
rxr->xdp_prog = bpf_prog_add(bp->xdp_prog, 1);
if (IS_ERR(rxr->xdp_prog)) {
int rc = PTR_ERR(rxr->xdp_prog);
rxr->xdp_prog = NULL;
return rc;
}
bpf_prog_add(bp->xdp_prog, 1);
rxr->xdp_prog = bp->xdp_prog;
}
prod = rxr->rx_prod;
for (i = 0; i < bp->rx_ring_size; i++) {


@ -1876,13 +1876,8 @@ static int nicvf_xdp_setup(struct nicvf *nic, struct bpf_prog *prog)
if (nic->xdp_prog) {
/* Attach BPF program */
nic->xdp_prog = bpf_prog_add(nic->xdp_prog, nic->rx_queues - 1);
if (!IS_ERR(nic->xdp_prog)) {
bpf_attached = true;
} else {
ret = PTR_ERR(nic->xdp_prog);
nic->xdp_prog = NULL;
}
bpf_prog_add(nic->xdp_prog, nic->rx_queues - 1);
bpf_attached = true;
}
/* Calculate Tx queues needed for XDP and network stack */


@ -1807,11 +1807,8 @@ static int setup_xdp(struct net_device *dev, struct bpf_prog *prog)
if (prog && !xdp_mtu_valid(priv, dev->mtu))
return -EINVAL;
if (prog) {
prog = bpf_prog_add(prog, priv->num_channels);
if (IS_ERR(prog))
return PTR_ERR(prog);
}
if (prog)
bpf_prog_add(prog, priv->num_channels);
up = netif_running(dev);
need_update = (!!priv->xdp_prog != !!prog);


@ -2286,11 +2286,7 @@ int mlx4_en_try_alloc_resources(struct mlx4_en_priv *priv,
lockdep_is_held(&priv->mdev->state_lock));
if (xdp_prog && carry_xdp_prog) {
xdp_prog = bpf_prog_add(xdp_prog, tmp->rx_ring_num);
if (IS_ERR(xdp_prog)) {
mlx4_en_free_resources(tmp);
return PTR_ERR(xdp_prog);
}
bpf_prog_add(xdp_prog, tmp->rx_ring_num);
for (i = 0; i < tmp->rx_ring_num; i++)
rcu_assign_pointer(tmp->rx_ring[i]->xdp_prog,
xdp_prog);
@ -2782,11 +2778,9 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
* program for a new one.
*/
if (priv->tx_ring_num[TX_XDP] == xdp_ring_num) {
if (prog) {
prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
if (IS_ERR(prog))
return PTR_ERR(prog);
}
if (prog)
bpf_prog_add(prog, priv->rx_ring_num - 1);
mutex_lock(&mdev->state_lock);
for (i = 0; i < priv->rx_ring_num; i++) {
old_prog = rcu_dereference_protected(
@ -2807,13 +2801,8 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
if (!tmp)
return -ENOMEM;
if (prog) {
prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
if (IS_ERR(prog)) {
err = PTR_ERR(prog);
goto out;
}
}
if (prog)
bpf_prog_add(prog, priv->rx_ring_num - 1);
mutex_lock(&mdev->state_lock);
memcpy(&new_prof, priv->prof, sizeof(struct mlx4_en_port_profile));
@ -2862,7 +2851,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
unlock_out:
mutex_unlock(&mdev->state_lock);
out:
kfree(tmp);
return err;
}


@ -409,12 +409,9 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
rq->stats = &c->priv->channel_stats[c->ix].rq;
INIT_WORK(&rq->recover_work, mlx5e_rq_err_cqe_work);
rq->xdp_prog = params->xdp_prog ? bpf_prog_inc(params->xdp_prog) : NULL;
if (IS_ERR(rq->xdp_prog)) {
err = PTR_ERR(rq->xdp_prog);
rq->xdp_prog = NULL;
goto err_rq_wq_destroy;
}
if (params->xdp_prog)
bpf_prog_inc(params->xdp_prog);
rq->xdp_prog = params->xdp_prog;
rq_xdp_ix = rq->ix;
if (xsk)
@ -4407,16 +4404,11 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
/* no need for full reset when exchanging programs */
reset = (!priv->channels.params.xdp_prog || !prog);
if (was_opened && !reset) {
if (was_opened && !reset)
/* num_channels is invariant here, so we can take the
* batched reference right upfront.
*/
prog = bpf_prog_add(prog, priv->channels.num);
if (IS_ERR(prog)) {
err = PTR_ERR(prog);
goto unlock;
}
}
bpf_prog_add(prog, priv->channels.num);
if (was_opened && reset) {
struct mlx5e_channels new_channels = {};

View File

@ -46,9 +46,7 @@ nfp_map_ptr_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
/* Grab a single ref to the map for our record. The prog destroy ndo
* happens after free_used_maps().
*/
map = bpf_map_inc(map, false);
if (IS_ERR(map))
return PTR_ERR(map);
bpf_map_inc(map);
record = kmalloc(sizeof(*record), GFP_KERNEL);
if (!record) {

View File

@ -2115,12 +2115,8 @@ static int qede_start_queues(struct qede_dev *edev, bool clear_stats)
if (rc)
goto out;
fp->rxq->xdp_prog = bpf_prog_add(edev->xdp_prog, 1);
if (IS_ERR(fp->rxq->xdp_prog)) {
rc = PTR_ERR(fp->rxq->xdp_prog);
fp->rxq->xdp_prog = NULL;
goto out;
}
bpf_prog_add(edev->xdp_prog, 1);
fp->rxq->xdp_prog = edev->xdp_prog;
}
if (fp->type & QEDE_FASTPATH_TX) {

View File

@ -2445,11 +2445,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
if (!prog && !old_prog)
return 0;
if (prog) {
prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
if (IS_ERR(prog))
return PTR_ERR(prog);
}
if (prog)
bpf_prog_add(prog, vi->max_queue_pairs - 1);
/* Make sure NAPI is not using any XDP TX queues for RX. */
if (netif_running(dev)) {

View File

@ -159,6 +159,7 @@ extern void audit_log_key(struct audit_buffer *ab,
extern void audit_log_link_denied(const char *operation);
extern void audit_log_lost(const char *message);
extern void audit_log_task(struct audit_buffer *ab);
extern int audit_log_task_context(struct audit_buffer *ab);
extern void audit_log_task_info(struct audit_buffer *ab);
@ -219,6 +220,8 @@ static inline void audit_log_key(struct audit_buffer *ab, char *key)
{ }
static inline void audit_log_link_denied(const char *string)
{ }
static inline void audit_log_task(struct audit_buffer *ab)
{ }
static inline int audit_log_task_context(struct audit_buffer *ab)
{
return 0;

View File

@ -12,8 +12,11 @@
#include <linux/err.h>
#include <linux/rbtree_latch.h>
#include <linux/numa.h>
#include <linux/mm_types.h>
#include <linux/wait.h>
#include <linux/u64_stats_sync.h>
#include <linux/refcount.h>
#include <linux/mutex.h>
struct bpf_verifier_env;
struct bpf_verifier_log;
@ -66,6 +69,7 @@ struct bpf_map_ops {
u64 *imm, u32 off);
int (*map_direct_value_meta)(const struct bpf_map *map,
u64 imm, u32 *off);
int (*map_mmap)(struct bpf_map *map, struct vm_area_struct *vma);
};
struct bpf_map_memory {
@ -94,17 +98,19 @@ struct bpf_map {
u32 btf_value_type_id;
struct btf *btf;
struct bpf_map_memory memory;
char name[BPF_OBJ_NAME_LEN];
bool unpriv_array;
bool frozen; /* write-once */
/* 48 bytes hole */
bool frozen; /* write-once; write-protected by freeze_mutex */
/* 22 bytes hole */
/* The 3rd and 4th cacheline with misc members to avoid false sharing
* particularly with refcounting.
*/
atomic_t refcnt ____cacheline_aligned;
atomic_t usercnt;
atomic64_t refcnt ____cacheline_aligned;
atomic64_t usercnt;
struct work_struct work;
char name[BPF_OBJ_NAME_LEN];
struct mutex freeze_mutex;
u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
};
static inline bool map_value_has_spin_lock(const struct bpf_map *map)
@ -246,7 +252,7 @@ struct bpf_func_proto {
};
enum bpf_arg_type arg_type[5];
};
u32 *btf_id; /* BTF ids of arguments */
int *btf_id; /* BTF ids of arguments */
};
/* bpf_context is intentionally undefined structure. Pointer to bpf_context is
@ -384,8 +390,106 @@ struct bpf_prog_stats {
struct u64_stats_sync syncp;
} __aligned(2 * sizeof(u64));
struct btf_func_model {
u8 ret_size;
u8 nr_args;
u8 arg_size[MAX_BPF_FUNC_ARGS];
};
/* Restore arguments before returning from trampoline to let original function
* continue executing. This flag is used for fentry progs when there are no
* fexit progs.
*/
#define BPF_TRAMP_F_RESTORE_REGS BIT(0)
/* Call original function after fentry progs, but before fexit progs.
* Makes sense for fentry/fexit, normal calls and indirect calls.
*/
#define BPF_TRAMP_F_CALL_ORIG BIT(1)
/* Skip current frame and return to parent. Makes sense for fentry/fexit
* programs only. Should not be used with normal calls and indirect calls.
*/
#define BPF_TRAMP_F_SKIP_FRAME BIT(2)
/* Different use cases for BPF trampoline:
* 1. replace nop at the function entry (kprobe equivalent)
* flags = BPF_TRAMP_F_RESTORE_REGS
* fentry = a set of programs to run before returning from trampoline
*
* 2. replace nop at the function entry (kprobe + kretprobe equivalent)
* flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME
* orig_call = fentry_ip + MCOUNT_INSN_SIZE
* fentry = a set of programs to run before calling the original function
* fexit = a set of programs to run after the original function
*
* 3. replace direct call instruction anywhere in the function body
* or assign a function pointer for indirect call (like tcp_congestion_ops->cong_avoid)
* With flags = 0
* fentry = a set of programs to run before returning from trampoline
* With flags = BPF_TRAMP_F_CALL_ORIG
* orig_call = original callback addr or direct function addr
* fentry = a set of programs to run before calling the original function
* fexit = a set of programs to run after the original function
*/
int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
struct bpf_prog **fentry_progs, int fentry_cnt,
struct bpf_prog **fexit_progs, int fexit_cnt,
void *orig_call);
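As a rough illustration of this interface (a minimal sketch, not code from this patch; image, model, the prog arrays and func_addr are hypothetical placeholders), use case 2 above would be prepared roughly like this:

	/* fentry progs run first, then the traced function, then fexit progs */
	struct btf_func_model model = { .nr_args = 2, .arg_size = { 8, 8 }, .ret_size = 8 };
	struct bpf_prog *fentry[1] = { fentry_prog };	/* hypothetical prog pointer */
	struct bpf_prog *fexit[1] = { fexit_prog };	/* hypothetical prog pointer */
	int err;

	err = arch_prepare_bpf_trampoline(image, &model,
					  BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME,
					  fentry, 1, fexit, 1,
					  func_addr /* fentry_ip + MCOUNT_INSN_SIZE */);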
/* these two functions are called from generated trampoline */
u64 notrace __bpf_prog_enter(void);
void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start);
enum bpf_tramp_prog_type {
BPF_TRAMP_FENTRY,
BPF_TRAMP_FEXIT,
BPF_TRAMP_MAX
};
struct bpf_trampoline {
/* hlist for trampoline_table */
struct hlist_node hlist;
/* serializes access to fields of this trampoline */
struct mutex mutex;
refcount_t refcnt;
u64 key;
struct {
struct btf_func_model model;
void *addr;
} func;
/* list of BPF programs using this trampoline */
struct hlist_head progs_hlist[BPF_TRAMP_MAX];
/* Number of attached programs. A counter per kind. */
int progs_cnt[BPF_TRAMP_MAX];
/* Executable image of trampoline */
void *image;
u64 selector;
};
#ifdef CONFIG_BPF_JIT
struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
int bpf_trampoline_link_prog(struct bpf_prog *prog);
int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
void bpf_trampoline_put(struct bpf_trampoline *tr);
#else
static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
{
return NULL;
}
static inline int bpf_trampoline_link_prog(struct bpf_prog *prog)
{
return -ENOTSUPP;
}
static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
{
return -ENOTSUPP;
}
static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
#endif
struct bpf_func_info_aux {
bool unreliable;
};
struct bpf_prog_aux {
atomic_t refcnt;
atomic64_t refcnt;
u32 used_map_cnt;
u32 max_ctx_offset;
u32 max_pkt_offset;
@ -395,9 +499,14 @@ struct bpf_prog_aux {
u32 func_cnt; /* used by non-func prog as the number of func progs */
u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
u32 attach_btf_id; /* in-kernel BTF type id to attach to */
struct bpf_prog *linked_prog;
bool verifier_zext; /* Zero extensions has been inserted by verifier. */
bool offload_requested;
bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
bool func_proto_unreliable;
enum bpf_tramp_prog_type trampoline_prog_type;
struct bpf_trampoline *trampoline;
struct hlist_node tramp_hlist;
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
const struct btf_type *attach_func_proto;
/* function name for valid attach_btf_id */
@ -419,6 +528,7 @@ struct bpf_prog_aux {
struct bpf_prog_offload *offload;
struct btf *btf;
struct bpf_func_info *func_info;
struct bpf_func_info_aux *func_info_aux;
/* bpf_line_info loaded from userspace. linfo->insn_off
* has the xlated insn offset.
* Both the main and sub prog share the same linfo.
@ -648,7 +758,7 @@ DECLARE_PER_CPU(int, bpf_prog_active);
extern const struct file_operations bpf_map_fops;
extern const struct file_operations bpf_prog_fops;
#define BPF_PROG_TYPE(_id, _name) \
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
extern const struct bpf_prog_ops _name ## _prog_ops; \
extern const struct bpf_verifier_ops _name ## _verifier_ops;
#define BPF_MAP_TYPE(_id, _ops) \
@ -664,9 +774,9 @@ extern const struct bpf_verifier_ops xdp_analyzer_ops;
struct bpf_prog *bpf_prog_get(u32 ufd);
struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
bool attach_drv);
struct bpf_prog * __must_check bpf_prog_add(struct bpf_prog *prog, int i);
void bpf_prog_add(struct bpf_prog *prog, int i);
void bpf_prog_sub(struct bpf_prog *prog, int i);
struct bpf_prog * __must_check bpf_prog_inc(struct bpf_prog *prog);
void bpf_prog_inc(struct bpf_prog *prog);
struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
void bpf_prog_put(struct bpf_prog *prog);
int __bpf_prog_charge(struct user_struct *user, u32 pages);
@ -677,9 +787,9 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
struct bpf_map *__bpf_map_get(struct fd f);
struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map,
bool uref);
void bpf_map_inc(struct bpf_map *map);
void bpf_map_inc_with_uref(struct bpf_map *map);
struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
void bpf_map_put_with_uref(struct bpf_map *map);
void bpf_map_put(struct bpf_map *map);
int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
@ -689,6 +799,7 @@ void bpf_map_charge_finish(struct bpf_map_memory *mem);
void bpf_map_charge_move(struct bpf_map_memory *dst,
struct bpf_map_memory *src);
void *bpf_map_area_alloc(u64 size, int numa_node);
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
void bpf_map_area_free(void *base);
void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
@ -782,7 +893,16 @@ int btf_struct_access(struct bpf_verifier_log *log,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
u32 *next_btf_id);
u32 btf_resolve_helper_id(struct bpf_verifier_log *log, void *, int);
int btf_resolve_helper_id(struct bpf_verifier_log *log,
const struct bpf_func_proto *fn, int);
int btf_distill_func_proto(struct bpf_verifier_log *log,
struct btf *btf,
const struct btf_type *func_proto,
const char *func_name,
struct btf_func_model *m);
int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog);
#else /* !CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
@ -797,10 +917,8 @@ static inline struct bpf_prog *bpf_prog_get_type_dev(u32 ufd,
return ERR_PTR(-EOPNOTSUPP);
}
static inline struct bpf_prog * __must_check bpf_prog_add(struct bpf_prog *prog,
int i)
static inline void bpf_prog_add(struct bpf_prog *prog, int i)
{
return ERR_PTR(-EOPNOTSUPP);
}
static inline void bpf_prog_sub(struct bpf_prog *prog, int i)
@ -811,9 +929,8 @@ static inline void bpf_prog_put(struct bpf_prog *prog)
{
}
static inline struct bpf_prog * __must_check bpf_prog_inc(struct bpf_prog *prog)
static inline void bpf_prog_inc(struct bpf_prog *prog)
{
return ERR_PTR(-EOPNOTSUPP);
}
static inline struct bpf_prog *__must_check
@ -1107,6 +1224,15 @@ static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
#endif
#ifdef CONFIG_INET
struct sk_reuseport_kern {
struct sk_buff *skb;
struct sock *sk;
struct sock *selected_sk;
void *data_end;
u32 hash;
u32 reuseport_id;
bool bind_inany;
};
bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info);
@ -1157,4 +1283,12 @@ static inline u32 bpf_xdp_sock_convert_ctx_access(enum bpf_access_type type,
}
#endif /* CONFIG_INET */
enum bpf_text_poke_type {
BPF_MOD_NOP_TO_CALL,
BPF_MOD_CALL_TO_CALL,
BPF_MOD_CALL_TO_NOP,
};
int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
void *addr1, void *addr2);
#endif /* _LINUX_BPF_H */

View File

@ -2,42 +2,68 @@
/* internal file - do not include directly */
#ifdef CONFIG_NET
BPF_PROG_TYPE(BPF_PROG_TYPE_SOCKET_FILTER, sk_filter)
BPF_PROG_TYPE(BPF_PROG_TYPE_SCHED_CLS, tc_cls_act)
BPF_PROG_TYPE(BPF_PROG_TYPE_SCHED_ACT, tc_cls_act)
BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp)
BPF_PROG_TYPE(BPF_PROG_TYPE_SOCKET_FILTER, sk_filter,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_SCHED_CLS, tc_cls_act,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_SCHED_ACT, tc_cls_act,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp,
struct xdp_md, struct xdp_buff)
#ifdef CONFIG_CGROUP_BPF
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SKB, cg_skb)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, cg_sock_addr)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SKB, cg_skb,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock,
struct bpf_sock, struct sock)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, cg_sock_addr,
struct bpf_sock_addr, struct bpf_sock_addr_kern)
#endif
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_SEG6LOCAL, lwt_seg6local)
BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
BPF_PROG_TYPE(BPF_PROG_TYPE_FLOW_DISSECTOR, flow_dissector)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_SEG6LOCAL, lwt_seg6local,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops,
struct bpf_sock_ops, struct bpf_sock_ops_kern)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb,
struct __sk_buff, struct sk_buff)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg,
struct sk_msg_md, struct sk_msg)
BPF_PROG_TYPE(BPF_PROG_TYPE_FLOW_DISSECTOR, flow_dissector,
struct __sk_buff, struct bpf_flow_dissector)
#endif
#ifdef CONFIG_BPF_EVENTS
BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
BPF_PROG_TYPE(BPF_PROG_TYPE_TRACING, tracing)
BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe,
bpf_user_pt_regs_t, struct pt_regs)
BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint,
__u64, u64)
BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event,
struct bpf_perf_event_data, struct bpf_perf_event_data_kern)
BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint,
struct bpf_raw_tracepoint_args, u64)
BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable,
struct bpf_raw_tracepoint_args, u64)
BPF_PROG_TYPE(BPF_PROG_TYPE_TRACING, tracing,
void *, void *)
#endif
#ifdef CONFIG_CGROUP_BPF
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCKOPT, cg_sockopt)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev,
struct bpf_cgroup_dev_ctx, struct bpf_cgroup_dev_ctx)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl,
struct bpf_sysctl, struct bpf_sysctl_kern)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCKOPT, cg_sockopt,
struct bpf_sockopt, struct bpf_sockopt_kern)
#endif
#ifdef CONFIG_BPF_LIRC_MODE2
BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
__u32, u32)
#endif
#ifdef CONFIG_INET
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
struct sk_reuseport_md, struct sk_reuseport_kern)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)

View File

@ -343,6 +343,7 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
#define BPF_MAX_SUBPROGS 256
struct bpf_subprog_info {
/* 'start' has to be the first field otherwise find_subprog() won't work */
u32 start; /* insn idx of function entry point */
u32 linfo_idx; /* The idx to the main_prog->aux->linfo */
u16 stack_depth; /* max. stack depth used by this function */

View File

@ -88,6 +88,7 @@ static inline bool btf_type_is_func_proto(const struct btf_type *t)
const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
const char *btf_name_by_offset(const struct btf *btf, u32 offset);
struct btf *btf_parse_vmlinux(void);
struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
#else
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
u32 type_id)

View File

@ -515,10 +515,12 @@ struct sock_fprog_kern {
struct sock_filter *filter;
};
/* Some arches need doubleword alignment for their instructions and/or data */
#define BPF_IMAGE_ALIGNMENT 8
struct bpf_binary_header {
u32 pages;
/* Some arches need word alignment for their instructions */
u8 image[] __aligned(4);
u8 image[] __aligned(BPF_IMAGE_ALIGNMENT);
};
struct bpf_prog {

View File

@ -93,6 +93,7 @@ extern void *vzalloc(unsigned long size);
extern void *vmalloc_user(unsigned long size);
extern void *vmalloc_node(unsigned long size, int node);
extern void *vzalloc_node(unsigned long size, int node);
extern void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags);
extern void *vmalloc_exec(unsigned long size);
extern void *vmalloc_32(unsigned long size);
extern void *vmalloc_32_user(unsigned long size);

View File

@ -116,6 +116,7 @@
#define AUDIT_FANOTIFY 1331 /* Fanotify access decision */
#define AUDIT_TIME_INJOFFSET 1332 /* Timekeeping offset injected */
#define AUDIT_TIME_ADJNTPVAL 1333 /* NTP value adjustment */
#define AUDIT_BPF 1334 /* BPF subsystem */
#define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
#define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */

View File

@ -201,6 +201,8 @@ enum bpf_attach_type {
BPF_CGROUP_GETSOCKOPT,
BPF_CGROUP_SETSOCKOPT,
BPF_TRACE_RAW_TP,
BPF_TRACE_FENTRY,
BPF_TRACE_FEXIT,
__MAX_BPF_ATTACH_TYPE
};
@ -346,6 +348,9 @@ enum bpf_attach_type {
/* Clone map from listener for newly accepted socket */
#define BPF_F_CLONE (1U << 9)
/* Enable memory-mapping BPF map */
#define BPF_F_MMAPABLE (1U << 10)
/* flags for BPF_PROG_QUERY */
#define BPF_F_QUERY_EFFECTIVE (1U << 0)
@ -423,6 +428,7 @@ union bpf_attr {
__aligned_u64 line_info; /* line info */
__u32 line_info_cnt; /* number of bpf_line_info records */
__u32 attach_btf_id; /* in-kernel BTF type id to attach to */
__u32 attach_prog_fd; /* 0 to attach to vmlinux */
};
struct { /* anonymous struct used by BPF_OBJ_* commands */

View File

@ -2545,7 +2545,7 @@ void __audit_ntp_log(const struct audit_ntp_data *ad)
audit_log_ntp_val(ad, "adjust", AUDIT_NTP_ADJUST);
}
static void audit_log_task(struct audit_buffer *ab)
void audit_log_task(struct audit_buffer *ab)
{
kuid_t auid, uid;
kgid_t gid;

View File

@ -6,6 +6,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o
ifeq ($(CONFIG_NET),y)
obj-$(CONFIG_BPF_SYSCALL) += devmap.o

View File

@ -14,7 +14,7 @@
#include "map_in_map.h"
#define ARRAY_CREATE_FLAG_MASK \
(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
(BPF_F_NUMA_NODE | BPF_F_MMAPABLE | BPF_F_ACCESS_MASK)
static void bpf_array_free_percpu(struct bpf_array *array)
{
@ -59,6 +59,10 @@ int array_map_alloc_check(union bpf_attr *attr)
(percpu && numa_node != NUMA_NO_NODE))
return -EINVAL;
if (attr->map_type != BPF_MAP_TYPE_ARRAY &&
attr->map_flags & BPF_F_MMAPABLE)
return -EINVAL;
if (attr->value_size > KMALLOC_MAX_SIZE)
/* if value_size is bigger, the user space won't be able to
* access the elements.
@ -102,10 +106,19 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
}
array_size = sizeof(*array);
if (percpu)
if (percpu) {
array_size += (u64) max_entries * sizeof(void *);
else
array_size += (u64) max_entries * elem_size;
} else {
/* rely on vmalloc() to return page-aligned memory and
* ensure array->value is exactly page-aligned
*/
if (attr->map_flags & BPF_F_MMAPABLE) {
array_size = PAGE_ALIGN(array_size);
array_size += PAGE_ALIGN((u64) max_entries * elem_size);
} else {
array_size += (u64) max_entries * elem_size;
}
}
/* make sure there is no u32 overflow later in round_up() */
cost = array_size;
@ -117,7 +130,20 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
return ERR_PTR(ret);
/* allocate all map elements and zero-initialize them */
array = bpf_map_area_alloc(array_size, numa_node);
if (attr->map_flags & BPF_F_MMAPABLE) {
void *data;
/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
data = bpf_map_area_mmapable_alloc(array_size, numa_node);
if (!data) {
bpf_map_charge_finish(&mem);
return ERR_PTR(-ENOMEM);
}
array = data + PAGE_ALIGN(sizeof(struct bpf_array))
- offsetof(struct bpf_array, value);
} else {
array = bpf_map_area_alloc(array_size, numa_node);
}
if (!array) {
bpf_map_charge_finish(&mem);
return ERR_PTR(-ENOMEM);
@ -350,6 +376,11 @@ static int array_map_delete_elem(struct bpf_map *map, void *key)
return -EINVAL;
}
static void *array_map_vmalloc_addr(struct bpf_array *array)
{
return (void *)round_down((unsigned long)array, PAGE_SIZE);
}
/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
static void array_map_free(struct bpf_map *map)
{
@ -365,7 +396,10 @@ static void array_map_free(struct bpf_map *map)
if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
bpf_array_free_percpu(array);
bpf_map_area_free(array);
if (array->map.map_flags & BPF_F_MMAPABLE)
bpf_map_area_free(array_map_vmalloc_addr(array));
else
bpf_map_area_free(array);
}
static void array_map_seq_show_elem(struct bpf_map *map, void *key,
@ -444,6 +478,17 @@ static int array_map_check_btf(const struct bpf_map *map,
return 0;
}
static int array_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
pgoff_t pgoff = PAGE_ALIGN(sizeof(*array)) >> PAGE_SHIFT;
if (!(map->map_flags & BPF_F_MMAPABLE))
return -EINVAL;
return remap_vmalloc_range(vma, array_map_vmalloc_addr(array), pgoff);
}
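As a user-space usage illustration (not part of this patch; it assumes a uapi header that already defines BPF_F_MMAPABLE, and the sizes are arbitrary), creating a mmapable array and mapping its value area directly looks roughly like this:

	#include <stdint.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/bpf.h>

	int main(void)
	{
		union bpf_attr attr;
		memset(&attr, 0, sizeof(attr));
		attr.map_type = BPF_MAP_TYPE_ARRAY;
		attr.key_size = 4;
		attr.value_size = 64;
		attr.max_entries = 16;
		attr.map_flags = BPF_F_MMAPABLE;

		int map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
		if (map_fd < 0)
			return 1;

		/* offset 0 maps the page-aligned value area of the array */
		size_t len = 4096;	/* 16 * 64 bytes, rounded up to a page */
		uint32_t *data = mmap(NULL, len, PROT_READ | PROT_WRITE,
				      MAP_SHARED, map_fd, 0);
		if (data == MAP_FAILED)
			return 1;

		data[0] = 42;	/* visible to BPF programs using the map */
		munmap(data, len);
		close(map_fd);
		return 0;
	}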
const struct bpf_map_ops array_map_ops = {
.map_alloc_check = array_map_alloc_check,
.map_alloc = array_map_alloc,
@ -455,6 +500,7 @@ const struct bpf_map_ops array_map_ops = {
.map_gen_lookup = array_map_gen_lookup,
.map_direct_value_addr = array_map_direct_value_addr,
.map_direct_value_meta = array_map_direct_value_meta,
.map_mmap = array_map_mmap,
.map_seq_show_elem = array_map_seq_show_elem,
.map_check_btf = array_map_check_btf,
};

View File

@ -2,6 +2,8 @@
/* Copyright (c) 2018 Facebook */
#include <uapi/linux/btf.h>
#include <uapi/linux/bpf.h>
#include <uapi/linux/bpf_perf_event.h>
#include <uapi/linux/types.h>
#include <linux/seq_file.h>
#include <linux/compiler.h>
@ -16,6 +18,9 @@
#include <linux/sort.h>
#include <linux/bpf_verifier.h>
#include <linux/btf.h>
#include <linux/skmsg.h>
#include <linux/perf_event.h>
#include <net/sock.h>
/* BTF (BPF Type Format) is the metadata format which describes
* the data types of BPF program/map. Hence, it basically focuses
@ -1036,6 +1041,82 @@ static const struct resolve_vertex *env_stack_peak(struct btf_verifier_env *env)
return env->top_stack ? &env->stack[env->top_stack - 1] : NULL;
}
/* Resolve the size of a passed-in "type"
*
* type: is an array (e.g. u32 array[x][y])
* return type: type "u32[x][y]", i.e. BTF_KIND_ARRAY,
* *type_size: (x * y * sizeof(u32)). Hence, *type_size always
* corresponds to the return type.
* *elem_type: u32
* *total_nelems: (x * y). Hence, individual elem size is
* (*type_size / *total_nelems)
*
* type: is not an array (e.g. const struct X)
* return type: type "struct X"
* *type_size: sizeof(struct X)
* *elem_type: same as return type ("struct X")
* *total_nelems: 1
*/
static const struct btf_type *
btf_resolve_size(const struct btf *btf, const struct btf_type *type,
u32 *type_size, const struct btf_type **elem_type,
u32 *total_nelems)
{
const struct btf_type *array_type = NULL;
const struct btf_array *array;
u32 i, size, nelems = 1;
for (i = 0; i < MAX_RESOLVE_DEPTH; i++) {
switch (BTF_INFO_KIND(type->info)) {
/* type->size can be used */
case BTF_KIND_INT:
case BTF_KIND_STRUCT:
case BTF_KIND_UNION:
case BTF_KIND_ENUM:
size = type->size;
goto resolved;
case BTF_KIND_PTR:
size = sizeof(void *);
goto resolved;
/* Modifiers */
case BTF_KIND_TYPEDEF:
case BTF_KIND_VOLATILE:
case BTF_KIND_CONST:
case BTF_KIND_RESTRICT:
type = btf_type_by_id(btf, type->type);
break;
case BTF_KIND_ARRAY:
if (!array_type)
array_type = type;
array = btf_type_array(type);
if (nelems && array->nelems > U32_MAX / nelems)
return ERR_PTR(-EINVAL);
nelems *= array->nelems;
type = btf_type_by_id(btf, array->type);
break;
/* type without size */
default:
return ERR_PTR(-EINVAL);
}
}
return ERR_PTR(-EINVAL);
resolved:
if (nelems && size > U32_MAX / nelems)
return ERR_PTR(-EINVAL);
*type_size = nelems * size;
*total_nelems = nelems;
*elem_type = type;
return array_type ? : type;
}
/* The input param "type_id" must point to a needs_resolve type */
static const struct btf_type *btf_type_id_resolve(const struct btf *btf,
u32 *type_id)
@ -3363,13 +3444,112 @@ errout:
extern char __weak _binary__btf_vmlinux_bin_start[];
extern char __weak _binary__btf_vmlinux_bin_end[];
extern struct btf *btf_vmlinux;
#define BPF_MAP_TYPE(_id, _ops)
static union {
struct bpf_ctx_convert {
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
prog_ctx_type _id##_prog; \
kern_ctx_type _id##_kern;
#include <linux/bpf_types.h>
#undef BPF_PROG_TYPE
} *__t;
/* 't' is written once under lock. Read many times. */
const struct btf_type *t;
} bpf_ctx_convert;
enum {
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
__ctx_convert##_id,
#include <linux/bpf_types.h>
#undef BPF_PROG_TYPE
};
static u8 bpf_ctx_convert_map[] = {
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
[_id] = __ctx_convert##_id,
#include <linux/bpf_types.h>
#undef BPF_PROG_TYPE
};
#undef BPF_MAP_TYPE
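For clarity, this is roughly what a single entry expands to (using the XDP line of bpf_types.h as an example); it is an illustration of the generated members, not literal code from the patch:

	/* BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp, struct xdp_md, struct xdp_buff)
	 * contributes one pair of members to struct bpf_ctx_convert:
	 */
	struct xdp_md BPF_PROG_TYPE_XDP_prog;	/* user-visible ctx type */
	struct xdp_buff BPF_PROG_TYPE_XDP_kern;	/* in-kernel ctx type */
	/* plus an enumerator __ctx_convertBPF_PROG_TYPE_XDP, recorded in
	 * bpf_ctx_convert_map[BPF_PROG_TYPE_XDP]. Since every entry adds exactly
	 * two members, btf_get_prog_ctx_type() below can index the pair with
	 * bpf_ctx_convert_map[prog_type] * 2.
	 */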
static const struct btf_member *
btf_get_prog_ctx_type(struct bpf_verifier_log *log, struct btf *btf,
const struct btf_type *t, enum bpf_prog_type prog_type)
{
const struct btf_type *conv_struct;
const struct btf_type *ctx_struct;
const struct btf_member *ctx_type;
const char *tname, *ctx_tname;
conv_struct = bpf_ctx_convert.t;
if (!conv_struct) {
bpf_log(log, "btf_vmlinux is malformed\n");
return NULL;
}
t = btf_type_by_id(btf, t->type);
while (btf_type_is_modifier(t))
t = btf_type_by_id(btf, t->type);
if (!btf_type_is_struct(t)) {
/* Only pointer to struct is supported for now.
* That means that BPF_PROG_TYPE_TRACEPOINT with BTF
* is not supported yet.
* BPF_PROG_TYPE_RAW_TRACEPOINT is fine.
*/
bpf_log(log, "BPF program ctx type is not a struct\n");
return NULL;
}
tname = btf_name_by_offset(btf, t->name_off);
if (!tname) {
bpf_log(log, "BPF program ctx struct doesn't have a name\n");
return NULL;
}
/* prog_type is valid bpf program type. No need for bounds check. */
ctx_type = btf_type_member(conv_struct) + bpf_ctx_convert_map[prog_type] * 2;
/* ctx_struct is a pointer to prog_ctx_type in vmlinux.
* Like 'struct __sk_buff'
*/
ctx_struct = btf_type_by_id(btf_vmlinux, ctx_type->type);
if (!ctx_struct)
/* should not happen */
return NULL;
ctx_tname = btf_name_by_offset(btf_vmlinux, ctx_struct->name_off);
if (!ctx_tname) {
/* should not happen */
bpf_log(log, "Please fix kernel include/linux/bpf_types.h\n");
return NULL;
}
/* only compare that the prog's ctx type name is the same as what
* the kernel expects. No need to compare field by field.
* It's ok for bpf prog to do:
* struct __sk_buff {};
* int socket_filter_bpf_prog(struct __sk_buff *skb)
* { // no fields of skb are ever used }
*/
if (strcmp(ctx_tname, tname))
return NULL;
return ctx_type;
}
static int btf_translate_to_vmlinux(struct bpf_verifier_log *log,
struct btf *btf,
const struct btf_type *t,
enum bpf_prog_type prog_type)
{
const struct btf_member *prog_ctx_type, *kern_ctx_type;
prog_ctx_type = btf_get_prog_ctx_type(log, btf, t, prog_type);
if (!prog_ctx_type)
return -ENOENT;
kern_ctx_type = prog_ctx_type + 1;
return kern_ctx_type->type;
}
struct btf *btf_parse_vmlinux(void)
{
struct btf_verifier_env *env = NULL;
struct bpf_verifier_log *log;
struct btf *btf = NULL;
int err;
int err, i;
env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
if (!env)
@ -3403,6 +3583,26 @@ struct btf *btf_parse_vmlinux(void)
if (err)
goto errout;
/* find struct bpf_ctx_convert for type checking later */
for (i = 1; i <= btf->nr_types; i++) {
const struct btf_type *t;
const char *tname;
t = btf_type_by_id(btf, i);
if (!__btf_type_is_struct(t))
continue;
tname = __btf_name_by_offset(btf, t->name_off);
if (!strcmp(tname, "bpf_ctx_convert")) {
/* btf_parse_vmlinux() runs under bpf_verifier_lock */
bpf_ctx_convert.t = t;
break;
}
}
if (i > btf->nr_types) {
err = -ENOENT;
goto errout;
}
btf_verifier_env_free(env);
refcount_set(&btf->refcnt, 1);
return btf;
@ -3416,17 +3616,29 @@ errout:
return ERR_PTR(err);
}
extern struct btf *btf_vmlinux;
struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog)
{
struct bpf_prog *tgt_prog = prog->aux->linked_prog;
if (tgt_prog) {
return tgt_prog->aux->btf;
} else {
return btf_vmlinux;
}
}
bool btf_ctx_access(int off, int size, enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
const struct btf_type *t = prog->aux->attach_func_proto;
struct bpf_prog *tgt_prog = prog->aux->linked_prog;
struct btf *btf = bpf_prog_get_target_btf(prog);
const char *tname = prog->aux->attach_func_name;
struct bpf_verifier_log *log = info->log;
const struct btf_param *args;
u32 nr_args, arg;
int ret;
if (off % 8) {
bpf_log(log, "func '%s' offset %d is not multiple of 8\n",
@ -3435,22 +3647,34 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
}
arg = off / 8;
args = (const struct btf_param *)(t + 1);
nr_args = btf_type_vlen(t);
/* if (t == NULL) Fall back to default BPF prog with 5 u64 arguments */
nr_args = t ? btf_type_vlen(t) : 5;
if (prog->aux->attach_btf_trace) {
/* skip first 'void *__data' argument in btf_trace_##name typedef */
args++;
nr_args--;
}
if (arg >= nr_args) {
bpf_log(log, "func '%s' doesn't have %d-th argument\n",
tname, arg);
return false;
}
t = btf_type_by_id(btf_vmlinux, args[arg].type);
if (prog->expected_attach_type == BPF_TRACE_FEXIT &&
arg == nr_args) {
if (!t)
/* Default prog with 5 args. 6th arg is retval. */
return true;
/* function return type */
t = btf_type_by_id(btf, t->type);
} else if (arg >= nr_args) {
bpf_log(log, "func '%s' doesn't have %d-th argument\n",
tname, arg + 1);
return false;
} else {
if (!t)
/* Default prog with 5 args */
return true;
t = btf_type_by_id(btf, args[arg].type);
}
/* skip modifiers */
while (btf_type_is_modifier(t))
t = btf_type_by_id(btf_vmlinux, t->type);
t = btf_type_by_id(btf, t->type);
if (btf_type_is_int(t))
/* accessing a scalar */
return true;
@ -3458,7 +3682,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
bpf_log(log,
"func '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
tname, arg,
__btf_name_by_offset(btf_vmlinux, t->name_off),
__btf_name_by_offset(btf, t->name_off),
btf_kind_str[BTF_INFO_KIND(t->info)]);
return false;
}
@ -3473,10 +3697,19 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
info->reg_type = PTR_TO_BTF_ID;
info->btf_id = t->type;
t = btf_type_by_id(btf_vmlinux, t->type);
if (tgt_prog) {
ret = btf_translate_to_vmlinux(log, btf, t, tgt_prog->type);
if (ret > 0) {
info->btf_id = ret;
return true;
} else {
return false;
}
}
t = btf_type_by_id(btf, t->type);
/* skip modifiers */
while (btf_type_is_modifier(t))
t = btf_type_by_id(btf_vmlinux, t->type);
t = btf_type_by_id(btf, t->type);
if (!btf_type_is_struct(t)) {
bpf_log(log,
"func '%s' arg%d type %s is not a struct\n",
@ -3485,7 +3718,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
}
bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
tname, arg, info->btf_id, btf_kind_str[BTF_INFO_KIND(t->info)],
__btf_name_by_offset(btf_vmlinux, t->name_off));
__btf_name_by_offset(btf, t->name_off));
return true;
}
@ -3494,10 +3727,10 @@ int btf_struct_access(struct bpf_verifier_log *log,
enum bpf_access_type atype,
u32 *next_btf_id)
{
u32 i, moff, mtrue_end, msize = 0, total_nelems = 0;
const struct btf_type *mtype, *elem_type = NULL;
const struct btf_member *member;
const struct btf_type *mtype;
const char *tname, *mname;
int i, moff = 0, msize;
again:
tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
@ -3507,40 +3740,88 @@ again:
}
for_each_member(i, t, member) {
/* offset of the field in bits */
moff = btf_member_bit_offset(t, member);
if (btf_member_bitfield_size(t, member))
/* bitfields are not supported yet */
continue;
if (off + size <= moff / 8)
/* offset of the field in bytes */
moff = btf_member_bit_offset(t, member) / 8;
if (off + size <= moff)
/* won't find anything, field is already too far */
break;
/* In case "off" points into a hole of the struct */
if (off < moff)
continue;
/* type of the field */
mtype = btf_type_by_id(btf_vmlinux, member->type);
mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
/* skip modifiers */
while (btf_type_is_modifier(mtype))
mtype = btf_type_by_id(btf_vmlinux, mtype->type);
if (btf_type_is_array(mtype))
/* array deref is not supported yet */
continue;
if (!btf_type_has_size(mtype) && !btf_type_is_ptr(mtype)) {
mtype = btf_resolve_size(btf_vmlinux, mtype, &msize,
&elem_type, &total_nelems);
if (IS_ERR(mtype)) {
bpf_log(log, "field %s doesn't have size\n", mname);
return -EFAULT;
}
if (btf_type_is_ptr(mtype))
msize = 8;
else
msize = mtype->size;
if (off >= moff / 8 + msize)
mtrue_end = moff + msize;
if (off >= mtrue_end)
/* no overlap with member, keep iterating */
continue;
if (btf_type_is_array(mtype)) {
u32 elem_idx;
/* btf_resolve_size() above helps to
* linearize a multi-dimensional array.
*
* The logic here treats an array
* in a struct in the following way:
*
* struct outer {
* struct inner array[2][2];
* };
*
* looks like:
*
* struct outer {
* struct inner array_elem0;
* struct inner array_elem1;
* struct inner array_elem2;
* struct inner array_elem3;
* };
*
* When accessing outer->array[1][0], it moves
* moff to "array_elem2", sets mtype to
* "struct inner", and msize also becomes
* sizeof(struct inner). Then most of the
* remaining logic falls through without
* caring whether the current member is an
* array or not.
*
* Unlike mtype/msize/moff, mtrue_end does not
* change. The naming difference ("_true") indicates
* that it does not always correspond to
* the current mtype/msize/moff.
* It is the true end of the current
* member (i.e. array in this case). That
* will allow an int array to be accessed like
* a scratch space,
* i.e. allow access beyond the size of
* the array's element as long as it is
* within the mtrue_end boundary.
*/
/* skip empty array */
if (moff == mtrue_end)
continue;
msize /= total_nelems;
elem_idx = (off - moff) / msize;
moff += elem_idx * msize;
mtype = elem_type;
}
/* the 'off' we're looking for is either equal to start
* of this field or inside of this struct
*/
@ -3549,20 +3830,20 @@ again:
t = mtype;
/* adjust offset we're looking for */
off -= moff / 8;
off -= moff;
goto again;
}
if (msize != size) {
/* field access size doesn't match */
bpf_log(log,
"cannot access %d bytes in struct %s field %s that has size %d\n",
size, tname, mname, msize);
return -EACCES;
}
if (btf_type_is_ptr(mtype)) {
const struct btf_type *stype;
if (msize != size || off != moff) {
bpf_log(log,
"cannot access ptr member %s with moff %u in struct %s with off %u size %u\n",
mname, moff, tname, off, size);
return -EACCES;
}
stype = btf_type_by_id(btf_vmlinux, mtype->type);
/* skip modifiers */
while (btf_type_is_modifier(stype))
@ -3572,14 +3853,28 @@ again:
return PTR_TO_BTF_ID;
}
}
/* all other fields are treated as scalars */
/* Allow more flexible access within an int as long as
* it is within mtrue_end.
* Since mtrue_end could be the end of an array,
* that also allows using an array of int as a scratch
* space. e.g. skb->cb[].
*/
if (off + size > mtrue_end) {
bpf_log(log,
"access beyond the end of member %s (mend:%u) in struct %s with off %u size %u\n",
mname, mtrue_end, tname, off, size);
return -EACCES;
}
return SCALAR_VALUE;
}
bpf_log(log, "struct %s doesn't have field at offset %d\n", tname, off);
return -EINVAL;
}
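To make the linearization described above concrete, here is a small stand-alone sketch (ordinary user-space C mirroring the verifier's arithmetic, not kernel code) for the outer->array[1][0] access from the comment:

	#include <stdio.h>

	struct inner { unsigned long long x; };		/* 8 bytes */
	struct outer { struct inner array[2][2]; };	/* 4 elements, 32 bytes total */

	int main(void)
	{
		unsigned int off = 16;	/* byte offset of outer->array[1][0] */
		unsigned int moff = 0;	/* byte offset of 'array' inside 'outer' */
		unsigned int msize = sizeof(struct outer);	/* resolved size: 32 */
		unsigned int nelems = 4;			/* total_nelems: 2 * 2 */

		msize /= nelems;				/* per-element size: 8 */
		unsigned int elem_idx = (off - moff) / msize;	/* 2, i.e. "array_elem2" */
		moff += elem_idx * msize;			/* 16 */

		printf("elem_idx=%u moff=%u msize=%u\n", elem_idx, moff, msize);
		return 0;
	}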
u32 btf_resolve_helper_id(struct bpf_verifier_log *log, void *fn, int arg)
static int __btf_resolve_helper_id(struct bpf_verifier_log *log, void *fn,
int arg)
{
char fnname[KSYM_SYMBOL_LEN + 4] = "btf_";
const struct btf_param *args;
@ -3647,6 +3942,185 @@ u32 btf_resolve_helper_id(struct bpf_verifier_log *log, void *fn, int arg)
return btf_id;
}
int btf_resolve_helper_id(struct bpf_verifier_log *log,
const struct bpf_func_proto *fn, int arg)
{
int *btf_id = &fn->btf_id[arg];
int ret;
if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID)
return -EINVAL;
ret = READ_ONCE(*btf_id);
if (ret)
return ret;
/* ok to race the search. The result is the same */
ret = __btf_resolve_helper_id(log, fn->func, arg);
if (!ret) {
/* Function argument cannot be type 'void' */
bpf_log(log, "BTF resolution bug\n");
return -EFAULT;
}
WRITE_ONCE(*btf_id, ret);
return ret;
}
static int __get_type_size(struct btf *btf, u32 btf_id,
const struct btf_type **bad_type)
{
const struct btf_type *t;
if (!btf_id)
/* void */
return 0;
t = btf_type_by_id(btf, btf_id);
while (t && btf_type_is_modifier(t))
t = btf_type_by_id(btf, t->type);
if (!t)
return -EINVAL;
if (btf_type_is_ptr(t))
/* kernel size of pointer. Not BPF's size of pointer */
return sizeof(void *);
if (btf_type_is_int(t) || btf_type_is_enum(t))
return t->size;
*bad_type = t;
return -EINVAL;
}
int btf_distill_func_proto(struct bpf_verifier_log *log,
struct btf *btf,
const struct btf_type *func,
const char *tname,
struct btf_func_model *m)
{
const struct btf_param *args;
const struct btf_type *t;
u32 i, nargs;
int ret;
if (!func) {
/* BTF function prototype doesn't match the verifier types.
* Fall back to 5 u64 args.
*/
for (i = 0; i < 5; i++)
m->arg_size[i] = 8;
m->ret_size = 8;
m->nr_args = 5;
return 0;
}
args = (const struct btf_param *)(func + 1);
nargs = btf_type_vlen(func);
if (nargs >= MAX_BPF_FUNC_ARGS) {
bpf_log(log,
"The function %s has %d arguments. Too many.\n",
tname, nargs);
return -EINVAL;
}
ret = __get_type_size(btf, func->type, &t);
if (ret < 0) {
bpf_log(log,
"The function %s return type %s is unsupported.\n",
tname, btf_kind_str[BTF_INFO_KIND(t->info)]);
return -EINVAL;
}
m->ret_size = ret;
for (i = 0; i < nargs; i++) {
ret = __get_type_size(btf, args[i].type, &t);
if (ret < 0) {
bpf_log(log,
"The function %s arg%d type %s is unsupported.\n",
tname, i, btf_kind_str[BTF_INFO_KIND(t->info)]);
return -EINVAL;
}
m->arg_size[i] = ret;
}
m->nr_args = nargs;
return 0;
}
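As an illustration of the distilled model (an assumed example, not taken from the patch), a kernel function like void kfree_skb(struct sk_buff *skb) would distill to:

	/* void kfree_skb(struct sk_buff *skb): one pointer argument, void return */
	struct btf_func_model m = {
		.ret_size = 0,		/* void return */
		.nr_args = 1,
		.arg_size = { 8 },	/* kernel pointer size */
	};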
int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog)
{
struct bpf_verifier_state *st = env->cur_state;
struct bpf_func_state *func = st->frame[st->curframe];
struct bpf_reg_state *reg = func->regs;
struct bpf_verifier_log *log = &env->log;
struct bpf_prog *prog = env->prog;
struct btf *btf = prog->aux->btf;
const struct btf_param *args;
const struct btf_type *t;
u32 i, nargs, btf_id;
const char *tname;
if (!prog->aux->func_info)
return 0;
btf_id = prog->aux->func_info[subprog].type_id;
if (!btf_id)
return 0;
if (prog->aux->func_info_aux[subprog].unreliable)
return 0;
t = btf_type_by_id(btf, btf_id);
if (!t || !btf_type_is_func(t)) {
bpf_log(log, "BTF of subprog %d doesn't point to KIND_FUNC\n",
subprog);
return -EINVAL;
}
tname = btf_name_by_offset(btf, t->name_off);
t = btf_type_by_id(btf, t->type);
if (!t || !btf_type_is_func_proto(t)) {
bpf_log(log, "Invalid type of func %s\n", tname);
return -EINVAL;
}
args = (const struct btf_param *)(t + 1);
nargs = btf_type_vlen(t);
if (nargs > 5) {
bpf_log(log, "Function %s has %d > 5 args\n", tname, nargs);
goto out;
}
/* check that BTF function arguments match actual types that the
* verifier sees.
*/
for (i = 0; i < nargs; i++) {
t = btf_type_by_id(btf, args[i].type);
while (btf_type_is_modifier(t))
t = btf_type_by_id(btf, t->type);
if (btf_type_is_int(t) || btf_type_is_enum(t)) {
if (reg[i + 1].type == SCALAR_VALUE)
continue;
bpf_log(log, "R%d is not a scalar\n", i + 1);
goto out;
}
if (btf_type_is_ptr(t)) {
if (reg[i + 1].type == SCALAR_VALUE) {
bpf_log(log, "R%d is not a pointer\n", i + 1);
goto out;
}
/* If program is passing PTR_TO_CTX into subprogram
* check that BTF type matches.
*/
if (reg[i + 1].type == PTR_TO_CTX &&
!btf_get_prog_ctx_type(log, btf, t, prog->type))
goto out;
/* All other pointers are ok */
continue;
}
bpf_log(log, "Unrecognized argument type %s\n",
btf_kind_str[BTF_INFO_KIND(t->info)]);
goto out;
}
return 0;
out:
/* LLVM optimizations can remove arguments from static functions. */
bpf_log(log,
"Type info disagrees with actual arguments due to compiler optimizations\n");
prog->aux->func_info_aux[subprog].unreliable = true;
return 0;
}
void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
struct seq_file *m)
{

View File

@ -31,6 +31,7 @@
#include <linux/rcupdate.h>
#include <linux/perf_event.h>
#include <linux/extable.h>
#include <linux/log2.h>
#include <asm/unaligned.h>
/* Registers */
@ -815,6 +816,9 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
struct bpf_binary_header *hdr;
u32 size, hole, start, pages;
WARN_ON_ONCE(!is_power_of_2(alignment) ||
alignment > BPF_IMAGE_ALIGNMENT);
/* Most of BPF filters are really small, but if some of them
* fill a page, allow at least 128 extra bytes to insert a
* random section of illegal instructions.
@ -1569,7 +1573,7 @@ out:
#undef LDST
#define LDX_PROBE(SIZEOP, SIZE) \
LDX_PROBE_MEM_##SIZEOP: \
bpf_probe_read_kernel(&DST, SIZE, (const void *)(long) SRC); \
bpf_probe_read_kernel(&DST, SIZE, (const void *)(long) (SRC + insn->off)); \
CONT;
LDX_PROBE(B, 1)
LDX_PROBE(H, 2)
@ -2011,6 +2015,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
if (aux->prog->has_callchain_buf)
put_callchain_buffers();
#endif
bpf_trampoline_put(aux->trampoline);
for (i = 0; i < aux->func_cnt; i++)
bpf_jit_free(aux->func[i]);
if (aux->func_cnt) {
@ -2026,6 +2031,8 @@ void bpf_prog_free(struct bpf_prog *fp)
{
struct bpf_prog_aux *aux = fp->aux;
if (aux->linked_prog)
bpf_prog_put(aux->linked_prog);
INIT_WORK(&aux->work, bpf_prog_free_deferred);
schedule_work(&aux->work);
}
@ -2140,6 +2147,12 @@ int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
return -EFAULT;
}
int __weak bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
void *addr1, void *addr2)
{
return -ENOTSUPP;
}
DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
EXPORT_SYMBOL(bpf_stats_enabled_key);

View File

@ -31,10 +31,10 @@ static void *bpf_any_get(void *raw, enum bpf_type type)
{
switch (type) {
case BPF_TYPE_PROG:
raw = bpf_prog_inc(raw);
bpf_prog_inc(raw);
break;
case BPF_TYPE_MAP:
raw = bpf_map_inc(raw, true);
bpf_map_inc_with_uref(raw);
break;
default:
WARN_ON_ONCE(1);
@ -534,7 +534,8 @@ static struct bpf_prog *__get_prog_inode(struct inode *inode, enum bpf_prog_type
if (!bpf_prog_get_ok(prog, &type, false))
return ERR_PTR(-EINVAL);
return bpf_prog_inc(prog);
bpf_prog_inc(prog);
return prog;
}
struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type)

View File

@ -98,7 +98,7 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
return inner_map;
if (bpf_map_meta_equal(map->inner_map_meta, inner_map))
inner_map = bpf_map_inc(inner_map, false);
bpf_map_inc(inner_map);
else
inner_map = ERR_PTR(-EINVAL);

View File

@ -23,6 +23,7 @@
#include <linux/timekeeping.h>
#include <linux/ctype.h>
#include <linux/nospec.h>
#include <linux/audit.h>
#include <uapi/linux/btf.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
@ -43,7 +44,7 @@ static DEFINE_SPINLOCK(map_idr_lock);
int sysctl_unprivileged_bpf_disabled __read_mostly;
static const struct bpf_map_ops * const bpf_map_types[] = {
#define BPF_PROG_TYPE(_id, _ops)
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
#define BPF_MAP_TYPE(_id, _ops) \
[_id] = &_ops,
#include <linux/bpf_types.h>
@ -127,7 +128,7 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
return map;
}
void *bpf_map_area_alloc(u64 size, int numa_node)
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
{
/* We really just want to fail instead of triggering OOM killer
* under memory pressure, therefore we set __GFP_NORETRY to kmalloc,
@ -145,18 +146,33 @@ void *bpf_map_area_alloc(u64 size, int numa_node)
if (size >= SIZE_MAX)
return NULL;
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
/* kmalloc()'ed memory can't be mmap()'ed */
if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
area = kmalloc_node(size, GFP_USER | __GFP_NORETRY | flags,
numa_node);
if (area != NULL)
return area;
}
if (mmapable) {
BUG_ON(!PAGE_ALIGNED(size));
return vmalloc_user_node_flags(size, numa_node, GFP_KERNEL |
__GFP_RETRY_MAYFAIL | flags);
}
return __vmalloc_node_flags_caller(size, numa_node,
GFP_KERNEL | __GFP_RETRY_MAYFAIL |
flags, __builtin_return_address(0));
}
void *bpf_map_area_alloc(u64 size, int numa_node)
{
return __bpf_map_area_alloc(size, numa_node, false);
}
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node)
{
return __bpf_map_area_alloc(size, numa_node, true);
}
void bpf_map_area_free(void *area)
{
kvfree(area);
@ -314,7 +330,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
static void bpf_map_put_uref(struct bpf_map *map)
{
if (atomic_dec_and_test(&map->usercnt)) {
if (atomic64_dec_and_test(&map->usercnt)) {
if (map->ops->map_release_uref)
map->ops->map_release_uref(map);
}
@ -325,7 +341,7 @@ static void bpf_map_put_uref(struct bpf_map *map)
*/
static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
{
if (atomic_dec_and_test(&map->refcnt)) {
if (atomic64_dec_and_test(&map->refcnt)) {
/* bpf_map_free_id() must be called first */
bpf_map_free_id(map, do_idr_lock);
btf_put(map->btf);
@ -428,6 +444,74 @@ static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
return -EINVAL;
}
/* called for any extra memory-mapped regions (except initial) */
static void bpf_map_mmap_open(struct vm_area_struct *vma)
{
struct bpf_map *map = vma->vm_file->private_data;
bpf_map_inc_with_uref(map);
if (vma->vm_flags & VM_WRITE) {
mutex_lock(&map->freeze_mutex);
map->writecnt++;
mutex_unlock(&map->freeze_mutex);
}
}
/* called for every memory region being unmapped (including the initial one) */
static void bpf_map_mmap_close(struct vm_area_struct *vma)
{
struct bpf_map *map = vma->vm_file->private_data;
if (vma->vm_flags & VM_WRITE) {
mutex_lock(&map->freeze_mutex);
map->writecnt--;
mutex_unlock(&map->freeze_mutex);
}
bpf_map_put_with_uref(map);
}
static const struct vm_operations_struct bpf_map_default_vmops = {
.open = bpf_map_mmap_open,
.close = bpf_map_mmap_close,
};
static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
{
struct bpf_map *map = filp->private_data;
int err;
if (!map->ops->map_mmap || map_value_has_spin_lock(map))
return -ENOTSUPP;
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
mutex_lock(&map->freeze_mutex);
if ((vma->vm_flags & VM_WRITE) && map->frozen) {
err = -EPERM;
goto out;
}
/* set default open/close callbacks */
vma->vm_ops = &bpf_map_default_vmops;
vma->vm_private_data = map;
err = map->ops->map_mmap(map, vma);
if (err)
goto out;
bpf_map_inc_with_uref(map);
if (vma->vm_flags & VM_WRITE)
map->writecnt++;
out:
mutex_unlock(&map->freeze_mutex);
return err;
}
const struct file_operations bpf_map_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = bpf_map_show_fdinfo,
@ -435,6 +519,7 @@ const struct file_operations bpf_map_fops = {
.release = bpf_map_release,
.read = bpf_dummy_read,
.write = bpf_dummy_write,
.mmap = bpf_map_mmap,
};
int bpf_map_new_fd(struct bpf_map *map, int flags)
@ -578,8 +663,9 @@ static int map_create(union bpf_attr *attr)
if (err)
goto free_map;
atomic_set(&map->refcnt, 1);
atomic_set(&map->usercnt, 1);
atomic64_set(&map->refcnt, 1);
atomic64_set(&map->usercnt, 1);
mutex_init(&map->freeze_mutex);
if (attr->btf_key_type_id || attr->btf_value_type_id) {
struct btf *btf;
@ -656,21 +742,19 @@ struct bpf_map *__bpf_map_get(struct fd f)
return f.file->private_data;
}
/* prog's and map's refcnt limit */
#define BPF_MAX_REFCNT 32768
struct bpf_map *bpf_map_inc(struct bpf_map *map, bool uref)
void bpf_map_inc(struct bpf_map *map)
{
if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {
atomic_dec(&map->refcnt);
return ERR_PTR(-EBUSY);
}
if (uref)
atomic_inc(&map->usercnt);
return map;
atomic64_inc(&map->refcnt);
}
EXPORT_SYMBOL_GPL(bpf_map_inc);
void bpf_map_inc_with_uref(struct bpf_map *map)
{
atomic64_inc(&map->refcnt);
atomic64_inc(&map->usercnt);
}
EXPORT_SYMBOL_GPL(bpf_map_inc_with_uref);
struct bpf_map *bpf_map_get_with_uref(u32 ufd)
{
struct fd f = fdget(ufd);
@ -680,38 +764,30 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
if (IS_ERR(map))
return map;
map = bpf_map_inc(map, true);
bpf_map_inc_with_uref(map);
fdput(f);
return map;
}
/* map_idr_lock should have been held */
static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map,
bool uref)
static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
{
int refold;
refold = atomic_fetch_add_unless(&map->refcnt, 1, 0);
if (refold >= BPF_MAX_REFCNT) {
__bpf_map_put(map, false);
return ERR_PTR(-EBUSY);
}
refold = atomic64_fetch_add_unless(&map->refcnt, 1, 0);
if (!refold)
return ERR_PTR(-ENOENT);
if (uref)
atomic_inc(&map->usercnt);
atomic64_inc(&map->usercnt);
return map;
}
struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map)
{
spin_lock_bh(&map_idr_lock);
map = __bpf_map_inc_not_zero(map, uref);
map = __bpf_map_inc_not_zero(map, false);
spin_unlock_bh(&map_idr_lock);
return map;
@ -1176,6 +1252,13 @@ static int map_freeze(const union bpf_attr *attr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
mutex_lock(&map->freeze_mutex);
if (map->writecnt) {
err = -EBUSY;
goto err_put;
}
if (READ_ONCE(map->frozen)) {
err = -EBUSY;
goto err_put;
@ -1187,12 +1270,13 @@ static int map_freeze(const union bpf_attr *attr)
WRITE_ONCE(map->frozen, true);
err_put:
mutex_unlock(&map->freeze_mutex);
fdput(f);
return err;
}
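The interplay with writable mappings is visible from user space; a rough sketch (building on the mmap example above, with map_fd and len assumed to be set up there):

	/* A live writable mapping keeps map->writecnt non-zero ... */
	void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);

	union bpf_attr attr = { .map_fd = map_fd };
	int err = syscall(__NR_bpf, BPF_MAP_FREEZE, &attr, sizeof(attr));
	/* ... so this fails with errno == EBUSY while the mapping exists */

	munmap(rw, len);	/* drops writecnt via bpf_map_mmap_close() */
	err = syscall(__NR_bpf, BPF_MAP_FREEZE, &attr, sizeof(attr));
	/* succeeds now; later PROT_WRITE mmap() attempts get -EPERM */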
static const struct bpf_prog_ops * const bpf_prog_types[] = {
#define BPF_PROG_TYPE(_id, _name) \
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
[_id] = & _name ## _prog_ops,
#define BPF_MAP_TYPE(_id, _ops)
#include <linux/bpf_types.h>
@ -1238,6 +1322,34 @@ static void free_used_maps(struct bpf_prog_aux *aux)
kfree(aux->used_maps);
}
enum bpf_event {
BPF_EVENT_LOAD,
BPF_EVENT_UNLOAD,
};
static const char * const bpf_event_audit_str[] = {
[BPF_EVENT_LOAD] = "LOAD",
[BPF_EVENT_UNLOAD] = "UNLOAD",
};
static void bpf_audit_prog(const struct bpf_prog *prog, enum bpf_event event)
{
bool has_task_context = event == BPF_EVENT_LOAD;
struct audit_buffer *ab;
if (audit_enabled == AUDIT_OFF)
return;
ab = audit_log_start(audit_context(), GFP_ATOMIC, AUDIT_BPF);
if (unlikely(!ab))
return;
if (has_task_context)
audit_log_task(ab);
audit_log_format(ab, "%sprog-id=%u event=%s",
has_task_context ? " " : "",
prog->aux->id, bpf_event_audit_str[event]);
audit_log_end(ab);
}
int __bpf_prog_charge(struct user_struct *user, u32 pages)
{
unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
@ -1331,6 +1443,7 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
kvfree(aux->func_info);
kfree(aux->func_info_aux);
free_used_maps(aux);
bpf_prog_uncharge_memlock(aux->prog);
security_bpf_prog_free(aux);
@ -1351,8 +1464,9 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
{
if (atomic_dec_and_test(&prog->aux->refcnt)) {
if (atomic64_dec_and_test(&prog->aux->refcnt)) {
perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_UNLOAD, 0);
bpf_audit_prog(prog, BPF_EVENT_UNLOAD);
/* bpf_prog_free_id() must be called first */
bpf_prog_free_id(prog, do_idr_lock);
__bpf_prog_put_noref(prog, true);
@ -1457,13 +1571,9 @@ static struct bpf_prog *____bpf_prog_get(struct fd f)
return f.file->private_data;
}
struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
void bpf_prog_add(struct bpf_prog *prog, int i)
{
if (atomic_add_return(i, &prog->aux->refcnt) > BPF_MAX_REFCNT) {
atomic_sub(i, &prog->aux->refcnt);
return ERR_PTR(-EBUSY);
}
return prog;
atomic64_add(i, &prog->aux->refcnt);
}
EXPORT_SYMBOL_GPL(bpf_prog_add);
@ -1474,13 +1584,13 @@ void bpf_prog_sub(struct bpf_prog *prog, int i)
* path holds a reference to the program, thus atomic_sub() can
* be safely used in such cases!
*/
WARN_ON(atomic_sub_return(i, &prog->aux->refcnt) == 0);
WARN_ON(atomic64_sub_return(i, &prog->aux->refcnt) == 0);
}
EXPORT_SYMBOL_GPL(bpf_prog_sub);
struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
void bpf_prog_inc(struct bpf_prog *prog)
{
return bpf_prog_add(prog, 1);
atomic64_inc(&prog->aux->refcnt);
}
EXPORT_SYMBOL_GPL(bpf_prog_inc);
@ -1489,12 +1599,7 @@ struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog)
{
int refold;
refold = atomic_fetch_add_unless(&prog->aux->refcnt, 1, 0);
if (refold >= BPF_MAX_REFCNT) {
__bpf_prog_put(prog, false);
return ERR_PTR(-EBUSY);
}
refold = atomic64_fetch_add_unless(&prog->aux->refcnt, 1, 0);
if (!refold)
return ERR_PTR(-ENOENT);
@ -1532,7 +1637,7 @@ static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type,
goto out;
}
prog = bpf_prog_inc(prog);
bpf_prog_inc(prog);
out:
fdput(f);
return prog;
@ -1579,7 +1684,7 @@ static void bpf_prog_load_fixup_attach_type(union bpf_attr *attr)
static int
bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
enum bpf_attach_type expected_attach_type,
u32 btf_id)
u32 btf_id, u32 prog_fd)
{
switch (prog_type) {
case BPF_PROG_TYPE_TRACING:
@ -1587,7 +1692,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
return -EINVAL;
break;
default:
if (btf_id)
if (btf_id || prog_fd)
return -EINVAL;
break;
}
@ -1638,7 +1743,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
}
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD attach_btf_id
#define BPF_PROG_LOAD_LAST_FIELD attach_prog_fd
static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
{
@ -1681,7 +1786,8 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
bpf_prog_load_fixup_attach_type(attr);
if (bpf_prog_load_check_attach(type, attr->expected_attach_type,
attr->attach_btf_id))
attr->attach_btf_id,
attr->attach_prog_fd))
return -EINVAL;
/* plain bpf_prog allocation */
@ -1691,6 +1797,16 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
prog->expected_attach_type = attr->expected_attach_type;
prog->aux->attach_btf_id = attr->attach_btf_id;
if (attr->attach_prog_fd) {
struct bpf_prog *tgt_prog;
tgt_prog = bpf_prog_get(attr->attach_prog_fd);
if (IS_ERR(tgt_prog)) {
err = PTR_ERR(tgt_prog);
goto free_prog_nouncharge;
}
prog->aux->linked_prog = tgt_prog;
}
prog->aux->offload_requested = !!attr->prog_ifindex;
@ -1712,7 +1828,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
prog->orig_prog = NULL;
prog->jited = 0;
atomic_set(&prog->aux->refcnt, 1);
atomic64_set(&prog->aux->refcnt, 1);
prog->gpl_compatible = is_gpl ? 1 : 0;
if (bpf_prog_is_dev_bound(prog->aux)) {
@ -1760,6 +1876,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
*/
bpf_prog_kallsyms_add(prog);
perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_LOAD, 0);
bpf_audit_prog(prog, BPF_EVENT_LOAD);
err = bpf_prog_new_fd(prog);
if (err < 0)
@ -1802,6 +1919,49 @@ static int bpf_obj_get(const union bpf_attr *attr)
attr->file_flags);
}
static int bpf_tracing_prog_release(struct inode *inode, struct file *filp)
{
struct bpf_prog *prog = filp->private_data;
WARN_ON_ONCE(bpf_trampoline_unlink_prog(prog));
bpf_prog_put(prog);
return 0;
}
static const struct file_operations bpf_tracing_prog_fops = {
.release = bpf_tracing_prog_release,
.read = bpf_dummy_read,
.write = bpf_dummy_write,
};
static int bpf_tracing_prog_attach(struct bpf_prog *prog)
{
int tr_fd, err;
if (prog->expected_attach_type != BPF_TRACE_FENTRY &&
prog->expected_attach_type != BPF_TRACE_FEXIT) {
err = -EINVAL;
goto out_put_prog;
}
err = bpf_trampoline_link_prog(prog);
if (err)
goto out_put_prog;
tr_fd = anon_inode_getfd("bpf-tracing-prog", &bpf_tracing_prog_fops,
prog, O_CLOEXEC);
if (tr_fd < 0) {
WARN_ON_ONCE(bpf_trampoline_unlink_prog(prog));
err = tr_fd;
goto out_put_prog;
}
return tr_fd;
out_put_prog:
bpf_prog_put(prog);
return err;
}
struct bpf_raw_tracepoint {
struct bpf_raw_event_map *btp;
struct bpf_prog *prog;
@ -1853,14 +2013,16 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
if (prog->type == BPF_PROG_TYPE_TRACING) {
if (attr->raw_tracepoint.name) {
/* raw_tp name should not be specified in raw_tp
* programs that were verified via in-kernel BTF info
/* The attach point for this category of programs
* should be specified via btf_id during program load.
*/
err = -EINVAL;
goto out_put_prog;
}
/* raw_tp name is taken from type name instead */
tp_name = prog->aux->attach_func_name;
if (prog->expected_attach_type == BPF_TRACE_RAW_TP)
tp_name = prog->aux->attach_func_name;
else
return bpf_tracing_prog_attach(prog);
} else {
if (strncpy_from_user(buf,
u64_to_user_ptr(attr->raw_tracepoint.name),

kernel/bpf/trampoline.c Normal file
View File

@ -0,0 +1,253 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2019 Facebook */
#include <linux/hash.h>
#include <linux/bpf.h>
#include <linux/filter.h>
/* btf_vmlinux has ~22k attachable functions. 1k htab is enough. */
#define TRAMPOLINE_HASH_BITS 10
#define TRAMPOLINE_TABLE_SIZE (1 << TRAMPOLINE_HASH_BITS)
static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
/* serializes access to trampoline_table */
static DEFINE_MUTEX(trampoline_mutex);
struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
{
struct bpf_trampoline *tr;
struct hlist_head *head;
void *image;
int i;
mutex_lock(&trampoline_mutex);
head = &trampoline_table[hash_64(key, TRAMPOLINE_HASH_BITS)];
hlist_for_each_entry(tr, head, hlist) {
if (tr->key == key) {
refcount_inc(&tr->refcnt);
goto out;
}
}
tr = kzalloc(sizeof(*tr), GFP_KERNEL);
if (!tr)
goto out;
/* is_root was checked earlier. No need for bpf_jit_charge_modmem() */
image = bpf_jit_alloc_exec(PAGE_SIZE);
if (!image) {
kfree(tr);
tr = NULL;
goto out;
}
tr->key = key;
INIT_HLIST_NODE(&tr->hlist);
hlist_add_head(&tr->hlist, head);
refcount_set(&tr->refcnt, 1);
mutex_init(&tr->mutex);
for (i = 0; i < BPF_TRAMP_MAX; i++)
INIT_HLIST_HEAD(&tr->progs_hlist[i]);
set_vm_flush_reset_perms(image);
/* Keep image as writeable. The alternative is to keep flipping ro/rw
* every time a new program is attached or detached.
*/
set_memory_x((long)image, 1);
tr->image = image;
out:
mutex_unlock(&trampoline_mutex);
return tr;
}
/* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
* bytes on x86. Pick a number to fit into PAGE_SIZE / 2
*/
#define BPF_MAX_TRAMP_PROGS 40
static int bpf_trampoline_update(struct bpf_trampoline *tr)
{
void *old_image = tr->image + ((tr->selector + 1) & 1) * PAGE_SIZE/2;
void *new_image = tr->image + (tr->selector & 1) * PAGE_SIZE/2;
struct bpf_prog *progs_to_run[BPF_MAX_TRAMP_PROGS];
int fentry_cnt = tr->progs_cnt[BPF_TRAMP_FENTRY];
int fexit_cnt = tr->progs_cnt[BPF_TRAMP_FEXIT];
struct bpf_prog **progs, **fentry, **fexit;
u32 flags = BPF_TRAMP_F_RESTORE_REGS;
struct bpf_prog_aux *aux;
int err;
if (fentry_cnt + fexit_cnt == 0) {
err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_CALL_TO_NOP,
old_image, NULL);
tr->selector = 0;
goto out;
}
/* populate fentry progs */
fentry = progs = progs_to_run;
hlist_for_each_entry(aux, &tr->progs_hlist[BPF_TRAMP_FENTRY], tramp_hlist)
*progs++ = aux->prog;
/* populate fexit progs */
fexit = progs;
hlist_for_each_entry(aux, &tr->progs_hlist[BPF_TRAMP_FEXIT], tramp_hlist)
*progs++ = aux->prog;
if (fexit_cnt)
flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
err = arch_prepare_bpf_trampoline(new_image, &tr->func.model, flags,
fentry, fentry_cnt,
fexit, fexit_cnt,
tr->func.addr);
if (err)
goto out;
if (tr->selector)
/* progs already running at this address */
err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_CALL_TO_CALL,
old_image, new_image);
else
/* first time registering */
err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_NOP_TO_CALL,
NULL, new_image);
if (err)
goto out;
tr->selector++;
out:
return err;
}
static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(enum bpf_attach_type t)
{
switch (t) {
case BPF_TRACE_FENTRY:
return BPF_TRAMP_FENTRY;
default:
return BPF_TRAMP_FEXIT;
}
}
int bpf_trampoline_link_prog(struct bpf_prog *prog)
{
enum bpf_tramp_prog_type kind;
struct bpf_trampoline *tr;
int err = 0;
tr = prog->aux->trampoline;
kind = bpf_attach_type_to_tramp(prog->expected_attach_type);
mutex_lock(&tr->mutex);
if (tr->progs_cnt[BPF_TRAMP_FENTRY] + tr->progs_cnt[BPF_TRAMP_FEXIT]
>= BPF_MAX_TRAMP_PROGS) {
err = -E2BIG;
goto out;
}
if (!hlist_unhashed(&prog->aux->tramp_hlist)) {
/* prog already linked */
err = -EBUSY;
goto out;
}
hlist_add_head(&prog->aux->tramp_hlist, &tr->progs_hlist[kind]);
tr->progs_cnt[kind]++;
err = bpf_trampoline_update(prog->aux->trampoline);
if (err) {
hlist_del(&prog->aux->tramp_hlist);
tr->progs_cnt[kind]--;
}
out:
mutex_unlock(&tr->mutex);
return err;
}
/* bpf_trampoline_unlink_prog() should never fail. */
int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
{
enum bpf_tramp_prog_type kind;
struct bpf_trampoline *tr;
int err;
tr = prog->aux->trampoline;
kind = bpf_attach_type_to_tramp(prog->expected_attach_type);
mutex_lock(&tr->mutex);
hlist_del(&prog->aux->tramp_hlist);
tr->progs_cnt[kind]--;
err = bpf_trampoline_update(prog->aux->trampoline);
mutex_unlock(&tr->mutex);
return err;
}
void bpf_trampoline_put(struct bpf_trampoline *tr)
{
if (!tr)
return;
mutex_lock(&trampoline_mutex);
if (!refcount_dec_and_test(&tr->refcnt))
goto out;
WARN_ON_ONCE(mutex_is_locked(&tr->mutex));
if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FENTRY])))
goto out;
if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
goto out;
bpf_jit_free_exec(tr->image);
hlist_del(&tr->hlist);
kfree(tr);
out:
mutex_unlock(&trampoline_mutex);
}
/* The logic is similar to BPF_PROG_RUN, but with explicit rcu and preempt that
* are needed for trampoline. The macro is split into
* call __bpf_prog_enter
* call prog->bpf_func
* call __bpf_prog_exit
*/
u64 notrace __bpf_prog_enter(void)
{
u64 start = 0;
rcu_read_lock();
preempt_disable();
if (static_branch_unlikely(&bpf_stats_enabled_key))
start = sched_clock();
return start;
}
void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
{
struct bpf_prog_stats *stats;
if (static_branch_unlikely(&bpf_stats_enabled_key) &&
/* static_key could be enabled in __bpf_prog_enter
* and disabled in __bpf_prog_exit.
* And vice versa.
* Hence check that 'start' is not zero.
*/
start) {
stats = this_cpu_ptr(prog->aux->stats);
u64_stats_update_begin(&stats->syncp);
stats->cnt++;
stats->nsecs += sched_clock() - start;
u64_stats_update_end(&stats->syncp);
}
preempt_enable();
rcu_read_unlock();
}
int __weak
arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
struct bpf_prog **fentry_progs, int fentry_cnt,
struct bpf_prog **fexit_progs, int fexit_cnt,
void *orig_call)
{
return -ENOTSUPP;
}
static int __init init_trampolines(void)
{
int i;
for (i = 0; i < TRAMPOLINE_TABLE_SIZE; i++)
INIT_HLIST_HEAD(&trampoline_table[i]);
return 0;
}
late_initcall(init_trampolines);
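
For orientation, the user-space half of this attach path needs only two libbpf calls touched by this series: bpf_load_program_xattr() carrying attach_btf_id (and, for attaching to another BPF program, attach_prog_fd), then bpf_raw_tracepoint_open() with a NULL name, which lands in bpf_tracing_prog_attach() in syscall.c above. A hedged sketch follows; the instruction buffer and the target BTF id are assumed to come from the caller (the id can be resolved with btf__find_by_name_kind(), also added in this series):

/* Hedged sketch, not part of this series: load a BPF_PROG_TYPE_TRACING
 * program as BPF_TRACE_FENTRY and attach it through the raw tracepoint
 * open path shown in syscall.c above. Requires the uapi headers from this tree.
 */
#include <unistd.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>

static int attach_fentry(const struct bpf_insn *insns, size_t insn_cnt,
			 __u32 target_btf_id)
{
	struct bpf_load_program_attr attr = {
		.prog_type = BPF_PROG_TYPE_TRACING,
		.expected_attach_type = BPF_TRACE_FENTRY,
		.name = "fentry_sketch",
		.insns = insns,
		.insns_cnt = insn_cnt,
		.license = "GPL",
		.attach_btf_id = target_btf_id, /* BTF id of the traced kernel function */
	};
	int prog_fd, link_fd;

	prog_fd = bpf_load_program_xattr(&attr, NULL, 0);
	if (prog_fd < 0)
		return prog_fd;

	/* NULL name: the kernel takes the bpf_tracing_prog_attach() branch */
	link_fd = bpf_raw_tracepoint_open(NULL, prog_fd);
	if (link_fd < 0)
		close(prog_fd);
	return link_fd;
}

Closing the returned fd detaches the program again via bpf_tracing_prog_release() above.
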

View File

@ -23,7 +23,7 @@
#include "disasm.h"
static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
#define BPF_PROG_TYPE(_id, _name) \
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
[_id] = & _name ## _verifier_ops,
#define BPF_MAP_TYPE(_id, _ops)
#include <linux/bpf_types.h>
@ -3970,6 +3970,9 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
/* only increment it after check_reg_arg() finished */
state->curframe++;
if (btf_check_func_arg_match(env, subprog))
return -EINVAL;
/* and go analyze first insn of the callee */
*insn_idx = target_insn;
@ -4147,11 +4150,9 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
meta.func_id = func_id;
/* check args */
for (i = 0; i < 5; i++) {
if (fn->arg_type[i] == ARG_PTR_TO_BTF_ID) {
if (!fn->btf_id[i])
fn->btf_id[i] = btf_resolve_helper_id(&env->log, fn->func, i);
meta.btf_id = fn->btf_id[i];
}
err = btf_resolve_helper_id(&env->log, fn, i);
if (err > 0)
meta.btf_id = err;
err = check_func_arg(env, BPF_REG_1 + i, fn->arg_type[i], &meta);
if (err)
return err;
@ -6566,6 +6567,7 @@ static int check_btf_func(struct bpf_verifier_env *env,
u32 i, nfuncs, urec_size, min_size;
u32 krec_size = sizeof(struct bpf_func_info);
struct bpf_func_info *krecord;
struct bpf_func_info_aux *info_aux = NULL;
const struct btf_type *type;
struct bpf_prog *prog;
const struct btf *btf;
@ -6599,6 +6601,9 @@ static int check_btf_func(struct bpf_verifier_env *env,
krecord = kvcalloc(nfuncs, krec_size, GFP_KERNEL | __GFP_NOWARN);
if (!krecord)
return -ENOMEM;
info_aux = kcalloc(nfuncs, sizeof(*info_aux), GFP_KERNEL | __GFP_NOWARN);
if (!info_aux)
goto err_free;
for (i = 0; i < nfuncs; i++) {
ret = bpf_check_uarg_tail_zero(urecord, krec_size, urec_size);
@ -6650,29 +6655,31 @@ static int check_btf_func(struct bpf_verifier_env *env,
ret = -EINVAL;
goto err_free;
}
prev_offset = krecord[i].insn_off;
urecord += urec_size;
}
prog->aux->func_info = krecord;
prog->aux->func_info_cnt = nfuncs;
prog->aux->func_info_aux = info_aux;
return 0;
err_free:
kvfree(krecord);
kfree(info_aux);
return ret;
}
static void adjust_btf_func(struct bpf_verifier_env *env)
{
struct bpf_prog_aux *aux = env->prog->aux;
int i;
if (!env->prog->aux->func_info)
if (!aux->func_info)
return;
for (i = 0; i < env->subprog_cnt; i++)
env->prog->aux->func_info[i].insn_off = env->subprog_info[i].start;
aux->func_info[i].insn_off = env->subprog_info[i].start;
}
#define MIN_BPF_LINEINFO_SIZE (offsetof(struct bpf_line_info, line_col) + \
@ -7653,6 +7660,9 @@ static int do_check(struct bpf_verifier_env *env)
0 /* frameno */,
0 /* subprogno, zero == main subprog */);
if (btf_check_func_arg_match(env, 0))
return -EINVAL;
for (;;) {
struct bpf_insn *insn;
u8 class;
@ -8169,11 +8179,7 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
* will be used by the valid program until it's unloaded
* and all maps are released in free_used_maps()
*/
map = bpf_map_inc(map, false);
if (IS_ERR(map)) {
fdput(f);
return PTR_ERR(map);
}
bpf_map_inc(map);
aux->map_index = env->used_map_cnt;
env->used_maps[env->used_map_cnt++] = map;
@ -9380,10 +9386,17 @@ static void print_verification_stats(struct bpf_verifier_env *env)
static int check_attach_btf_id(struct bpf_verifier_env *env)
{
struct bpf_prog *prog = env->prog;
struct bpf_prog *tgt_prog = prog->aux->linked_prog;
u32 btf_id = prog->aux->attach_btf_id;
const char prefix[] = "btf_trace_";
int ret = 0, subprog = -1, i;
struct bpf_trampoline *tr;
const struct btf_type *t;
bool conservative = true;
const char *tname;
struct btf *btf;
long addr;
u64 key;
if (prog->type != BPF_PROG_TYPE_TRACING)
return 0;
@ -9392,19 +9405,47 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
verbose(env, "Tracing programs must provide btf_id\n");
return -EINVAL;
}
t = btf_type_by_id(btf_vmlinux, btf_id);
btf = bpf_prog_get_target_btf(prog);
if (!btf) {
verbose(env,
"FENTRY/FEXIT program can only be attached to another program annotated with BTF\n");
return -EINVAL;
}
t = btf_type_by_id(btf, btf_id);
if (!t) {
verbose(env, "attach_btf_id %u is invalid\n", btf_id);
return -EINVAL;
}
tname = btf_name_by_offset(btf_vmlinux, t->name_off);
tname = btf_name_by_offset(btf, t->name_off);
if (!tname) {
verbose(env, "attach_btf_id %u doesn't have a name\n", btf_id);
return -EINVAL;
}
if (tgt_prog) {
struct bpf_prog_aux *aux = tgt_prog->aux;
for (i = 0; i < aux->func_info_cnt; i++)
if (aux->func_info[i].type_id == btf_id) {
subprog = i;
break;
}
if (subprog == -1) {
verbose(env, "Subprog %s doesn't exist\n", tname);
return -EINVAL;
}
conservative = aux->func_info_aux[subprog].unreliable;
key = ((u64)aux->id) << 32 | btf_id;
} else {
key = btf_id;
}
switch (prog->expected_attach_type) {
case BPF_TRACE_RAW_TP:
if (tgt_prog) {
verbose(env,
"Only FENTRY/FEXIT progs are attachable to another BPF prog\n");
return -EINVAL;
}
if (!btf_type_is_typedef(t)) {
verbose(env, "attach_btf_id %u is not a typedef\n",
btf_id);
@ -9416,11 +9457,11 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
return -EINVAL;
}
tname += sizeof(prefix) - 1;
t = btf_type_by_id(btf_vmlinux, t->type);
t = btf_type_by_id(btf, t->type);
if (!btf_type_is_ptr(t))
/* should never happen in valid vmlinux build */
return -EINVAL;
t = btf_type_by_id(btf_vmlinux, t->type);
t = btf_type_by_id(btf, t->type);
if (!btf_type_is_func_proto(t))
/* should never happen in valid vmlinux build */
return -EINVAL;
@ -9432,6 +9473,66 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
prog->aux->attach_func_proto = t;
prog->aux->attach_btf_trace = true;
return 0;
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
if (!btf_type_is_func(t)) {
verbose(env, "attach_btf_id %u is not a function\n",
btf_id);
return -EINVAL;
}
t = btf_type_by_id(btf, t->type);
if (!btf_type_is_func_proto(t))
return -EINVAL;
tr = bpf_trampoline_lookup(key);
if (!tr)
return -ENOMEM;
prog->aux->attach_func_name = tname;
/* t is either vmlinux type or another program's type */
prog->aux->attach_func_proto = t;
mutex_lock(&tr->mutex);
if (tr->func.addr) {
prog->aux->trampoline = tr;
goto out;
}
if (tgt_prog && conservative) {
prog->aux->attach_func_proto = NULL;
t = NULL;
}
ret = btf_distill_func_proto(&env->log, btf, t,
tname, &tr->func.model);
if (ret < 0)
goto out;
if (tgt_prog) {
if (!tgt_prog->jited) {
/* for now */
verbose(env, "Can trace only JITed BPF progs\n");
ret = -EINVAL;
goto out;
}
if (tgt_prog->type == BPF_PROG_TYPE_TRACING) {
/* prevent cycles */
verbose(env, "Cannot recursively attach\n");
ret = -EINVAL;
goto out;
}
addr = (long) tgt_prog->aux->func[subprog]->bpf_func;
} else {
addr = kallsyms_lookup_name(tname);
if (!addr) {
verbose(env,
"The address of function %s cannot be found\n",
tname);
ret = -ENOENT;
goto out;
}
}
tr->func.addr = (void *)addr;
prog->aux->trampoline = tr;
out:
mutex_unlock(&tr->mutex);
if (ret)
bpf_trampoline_put(tr);
return ret;
default:
return -EINVAL;
}

View File

@ -11,10 +11,8 @@
int xsk_map_inc(struct xsk_map *map)
{
struct bpf_map *m = &map->map;
m = bpf_map_inc(m, false);
return PTR_ERR_OR_ZERO(m);
bpf_map_inc(&map->map);
return 0;
}
void xsk_map_put(struct xsk_map *map)

View File

@ -10477,12 +10477,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
context = parent_event->overflow_handler_context;
#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
if (overflow_handler == bpf_overflow_handler) {
struct bpf_prog *prog = bpf_prog_inc(parent_event->prog);
struct bpf_prog *prog = parent_event->prog;
if (IS_ERR(prog)) {
err = PTR_ERR(prog);
goto err_ns;
}
bpf_prog_inc(prog);
event->prog = prog;
event->orig_overflow_handler =
parent_event->orig_overflow_handler;

View File

@ -2671,6 +2671,26 @@ void *vzalloc_node(unsigned long size, int node)
}
EXPORT_SYMBOL(vzalloc_node);
/**
* vmalloc_user_node_flags - allocate memory for userspace on a specific node
* @size: allocation size
* @node: numa node
* @flags: flags for the page level allocator
*
* The resulting memory area is zeroed so it can be mapped to userspace
* without leaking data.
*
* Return: pointer to the allocated memory or %NULL on error
*/
void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags)
{
return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END,
flags | __GFP_ZERO, PAGE_KERNEL,
VM_USERMAP, node,
__builtin_return_address(0));
}
EXPORT_SYMBOL(vmalloc_user_node_flags);
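
A rough, hypothetical caller sketch (not taken from this series) of how the new helper is meant to be used for buffers that will later be handed to userspace:

/* Hedged sketch: zeroed, user-mappable allocation on a given NUMA node.
 * An ->mmap handler would later expose it with remap_vmalloc_range().
 */
#include <linux/mm.h>
#include <linux/vmalloc.h>

static void *alloc_mmapable_area(unsigned long size, int numa_node)
{
	return vmalloc_user_node_flags(PAGE_ALIGN(size), numa_node,
				       GFP_KERNEL | __GFP_NOWARN);
}
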
/**
* vmalloc_exec - allocate virtually contiguous, executable memory
* @size: allocation size

View File

@ -105,6 +105,40 @@ out:
return err;
}
/* Integer types of various sizes and pointer combinations cover a variety of
* architecture-dependent calling conventions. 7+ arguments can be supported in
* the future.
*/
int noinline bpf_fentry_test1(int a)
{
return a + 1;
}
int noinline bpf_fentry_test2(int a, u64 b)
{
return a + b;
}
int noinline bpf_fentry_test3(char a, int b, u64 c)
{
return a + b + c;
}
int noinline bpf_fentry_test4(void *a, char b, int c, u64 d)
{
return (long)a + b + c + d;
}
int noinline bpf_fentry_test5(u64 a, void *b, short c, int d, u64 e)
{
return a + (long)b + c + d + e;
}
int noinline bpf_fentry_test6(u64 a, void *b, short c, int d, void *e, u64 f)
{
return a + (long)b + c + d + (long)e + f;
}
static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
u32 headroom, u32 tailroom)
{
@ -122,6 +156,15 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
kfree(data);
return ERR_PTR(-EFAULT);
}
if (bpf_fentry_test1(1) != 2 ||
bpf_fentry_test2(2, 3) != 5 ||
bpf_fentry_test3(4, 5, 6) != 15 ||
bpf_fentry_test4((void *)7, 8, 9, 10) != 34 ||
bpf_fentry_test5(11, (void *)12, 13, 14, 15) != 65 ||
bpf_fentry_test6(16, (void *)17, 18, 19, (void *)20, 21) != 111) {
kfree(data);
return ERR_PTR(-EFAULT);
}
return data;
}

View File

@ -798,7 +798,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
* Try to grab map refcnt to make sure that it's still
* alive and prevent concurrent removal.
*/
map = bpf_map_inc_not_zero(&smap->map, false);
map = bpf_map_inc_not_zero(&smap->map);
if (IS_ERR(map))
continue;

View File

@ -3816,7 +3816,7 @@ static const struct bpf_func_proto bpf_skb_event_output_proto = {
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
};
static u32 bpf_skb_output_btf_ids[5];
static int bpf_skb_output_btf_ids[5];
const struct bpf_func_proto bpf_skb_output_proto = {
.func = bpf_skb_event_output,
.gpl_only = true,
@ -8684,16 +8684,6 @@ out:
}
#ifdef CONFIG_INET
struct sk_reuseport_kern {
struct sk_buff *skb;
struct sock *sk;
struct sock *selected_sk;
void *data_end;
u32 hash;
u32 reuseport_id;
bool bind_inany;
};
static void bpf_init_reuseport_kern(struct sk_reuseport_kern *reuse_kern,
struct sock_reuseport *reuse,
struct sock *sk, struct sk_buff *skb,

View File

@ -167,6 +167,7 @@ always += xdp_sample_pkts_kern.o
always += ibumad_kern.o
always += hbm_out_kern.o
always += hbm_edt_kern.o
always += xdpsock_kern.o
ifeq ($(ARCH), arm)
# Strip all except -D__LINUX_ARM_ARCH__ option needed to handle linux

View File

@ -147,7 +147,7 @@ static int prog_load(char *prog)
}
if (ret) {
printf("ERROR: load_bpf_file failed for: %s\n", prog);
printf("ERROR: bpf_prog_load_xattr failed for: %s\n", prog);
printf(" Output from verifier:\n%s\n------\n", bpf_log_buf);
ret = -1;
} else {

View File

@ -5,12 +5,12 @@
#include "bpf_helpers.h"
#include "bpf_legacy.h"
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 256,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, long);
__uint(max_entries, 256);
} my_map SEC(".maps");
SEC("socket1")
int bpf_prog1(struct __sk_buff *skb)

View File

@ -190,12 +190,12 @@ struct pair {
long bytes;
};
struct bpf_map_def SEC("maps") hash_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(__be32),
.value_size = sizeof(struct pair),
.max_entries = 1024,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, __be32);
__type(value, struct pair);
__uint(max_entries, 1024);
} hash_map SEC(".maps");
SEC("socket2")
int bpf_prog2(struct __sk_buff *skb)

View File

@ -14,12 +14,12 @@
#include <linux/ipv6.h>
#include "bpf_helpers.h"
struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 256,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, long);
__uint(max_entries, 256);
} rxcnt SEC(".maps");
static int parse_ipv4(void *data, u64 nh_off, void *data_end)
{

View File

@ -139,7 +139,7 @@ int main(int argc, char **argv)
map_fd = bpf_map__fd(map);
if (!prog_fd) {
printf("load_bpf_file: %s\n", strerror(errno));
printf("bpf_prog_load_xattr: %s\n", strerror(errno));
return 1;
}

View File

@ -14,12 +14,12 @@
#include <linux/ipv6.h>
#include "bpf_helpers.h"
struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 256,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, long);
__uint(max_entries, 256);
} rxcnt SEC(".maps");
static void swap_src_dst_mac(void *data)
{

View File

@ -28,12 +28,12 @@
/* volatile to prevent compiler optimizations */
static volatile __u32 max_pcktsz = MAX_PCKT_SIZE;
struct bpf_map_def SEC("maps") icmpcnt = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(__u32),
.value_size = sizeof(__u64),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, __u32);
__type(value, __u64);
__uint(max_entries, 1);
} icmpcnt SEC(".maps");
static __always_inline void count_icmp(void)
{

View File

@ -23,13 +23,12 @@
#define IPV6_FLOWINFO_MASK cpu_to_be32(0x0FFFFFFF)
/* For TX-traffic redirect requires net_device ifindex to be in this devmap */
struct bpf_map_def SEC("maps") xdp_tx_ports = {
.type = BPF_MAP_TYPE_DEVMAP,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 64,
};
struct {
__uint(type, BPF_MAP_TYPE_DEVMAP);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
__uint(max_entries, 64);
} xdp_tx_ports SEC(".maps");
/* from include/net/ip.h */
static __always_inline int ip_decrease_ttl(struct iphdr *iph)

View File

@ -18,12 +18,12 @@
#define MAX_CPUS 64 /* WARNING - sync with _user.c */
/* Special map type that can XDP_REDIRECT frames to another CPU */
struct bpf_map_def SEC("maps") cpu_map = {
.type = BPF_MAP_TYPE_CPUMAP,
.key_size = sizeof(u32),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
struct {
__uint(type, BPF_MAP_TYPE_CPUMAP);
__uint(key_size, sizeof(u32));
__uint(value_size, sizeof(u32));
__uint(max_entries, MAX_CPUS);
} cpu_map SEC(".maps");
/* Common stats data record to keep userspace more simple */
struct datarec {
@ -35,67 +35,67 @@ struct datarec {
/* Count RX packets, as XDP bpf_prog doesn't get direct TX-success
* feedback. Redirect TX errors can be caught via a tracepoint.
*/
struct bpf_map_def SEC("maps") rx_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, 1);
} rx_cnt SEC(".maps");
/* Used by trace point */
struct bpf_map_def SEC("maps") redirect_err_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = 2,
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, 2);
/* TODO: have entries for all possible errno's */
};
} redirect_err_cnt SEC(".maps");
/* Used by trace point */
struct bpf_map_def SEC("maps") cpumap_enqueue_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = MAX_CPUS,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, MAX_CPUS);
} cpumap_enqueue_cnt SEC(".maps");
/* Used by trace point */
struct bpf_map_def SEC("maps") cpumap_kthread_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, 1);
} cpumap_kthread_cnt SEC(".maps");
/* Set of maps controlling available CPU, and for iterating through
* selectable redirect CPUs.
*/
struct bpf_map_def SEC("maps") cpus_available = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
struct bpf_map_def SEC("maps") cpus_count = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u32),
.max_entries = 1,
};
struct bpf_map_def SEC("maps") cpus_iterator = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u32),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u32);
__uint(max_entries, MAX_CPUS);
} cpus_available SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u32);
__uint(max_entries, 1);
} cpus_count SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, u32);
__uint(max_entries, 1);
} cpus_iterator SEC(".maps");
/* Used by trace point */
struct bpf_map_def SEC("maps") exception_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, 1);
} exception_cnt SEC(".maps");
/* Helper parse functions */

View File

@ -19,22 +19,22 @@
#include <linux/ipv6.h>
#include "bpf_helpers.h"
struct bpf_map_def SEC("maps") tx_port = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, int);
__uint(max_entries, 1);
} tx_port SEC(".maps");
/* Count RX packets, as XDP bpf_prog doesn't get direct TX-success
* feedback. Redirect TX errors can be caught via a tracepoint.
*/
struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, long);
__uint(max_entries, 1);
} rxcnt SEC(".maps");
static void swap_src_dst_mac(void *data)
{

View File

@ -19,22 +19,22 @@
#include <linux/ipv6.h>
#include "bpf_helpers.h"
struct bpf_map_def SEC("maps") tx_port = {
.type = BPF_MAP_TYPE_DEVMAP,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 100,
};
struct {
__uint(type, BPF_MAP_TYPE_DEVMAP);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
__uint(max_entries, 100);
} tx_port SEC(".maps");
/* Count RX packets, as XDP bpf_prog doesn't get direct TX-success
* feedback. Redirect TX errors can be caught via a tracepoint.
*/
struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, long);
__uint(max_entries, 1);
} rxcnt SEC(".maps");
static void swap_src_dst_mac(void *data)
{

View File

@ -42,44 +42,44 @@ struct direct_map {
};
/* Map for trie implementation*/
struct bpf_map_def SEC("maps") lpm_map = {
.type = BPF_MAP_TYPE_LPM_TRIE,
.key_size = 8,
.value_size = sizeof(struct trie_value),
.max_entries = 50,
.map_flags = BPF_F_NO_PREALLOC,
};
struct {
__uint(type, BPF_MAP_TYPE_LPM_TRIE);
__uint(key_size, 8);
__uint(value_size, sizeof(struct trie_value));
__uint(max_entries, 50);
__uint(map_flags, BPF_F_NO_PREALLOC);
} lpm_map SEC(".maps");
/* Map for counter*/
struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = 256,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, u64);
__uint(max_entries, 256);
} rxcnt SEC(".maps");
/* Map for ARP table*/
struct bpf_map_def SEC("maps") arp_table = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(__be32),
.value_size = sizeof(__be64),
.max_entries = 50,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, __be32);
__type(value, __be64);
__uint(max_entries, 50);
} arp_table SEC(".maps");
/* Map to keep the exact match entries in the route table*/
struct bpf_map_def SEC("maps") exact_match = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(__be32),
.value_size = sizeof(struct direct_map),
.max_entries = 50,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, __be32);
__type(value, struct direct_map);
__uint(max_entries, 50);
} exact_match SEC(".maps");
struct bpf_map_def SEC("maps") tx_port = {
.type = BPF_MAP_TYPE_DEVMAP,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 100,
};
struct {
__uint(type, BPF_MAP_TYPE_DEVMAP);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
__uint(max_entries, 100);
} tx_port SEC(".maps");
/* Function to set source and destination mac of the packet */
static inline void set_src_dst_mac(void *data, void *src, void *dst)

View File

@ -23,12 +23,13 @@ enum cfg_options_flags {
READ_MEM = 0x1U,
SWAP_MAC = 0x2U,
};
struct bpf_map_def SEC("maps") config_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(struct config),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, struct config);
__uint(max_entries, 1);
} config_map SEC(".maps");
/* Common stats data record (shared with userspace) */
struct datarec {
@ -36,22 +37,22 @@ struct datarec {
__u64 issue;
};
struct bpf_map_def SEC("maps") stats_global_map = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, 1);
} stats_global_map SEC(".maps");
#define MAX_RXQs 64
/* Stats per rx_queue_index (per CPU) */
struct bpf_map_def SEC("maps") rx_queue_index_map = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = MAX_RXQs + 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, MAX_RXQs + 1);
} rx_queue_index_map SEC(".maps");
static __always_inline
void swap_src_dst_mac(void *data)

View File

@ -51,8 +51,8 @@ static const struct option long_options[] = {
{"sec", required_argument, NULL, 's' },
{"no-separators", no_argument, NULL, 'z' },
{"action", required_argument, NULL, 'a' },
{"readmem", no_argument, NULL, 'r' },
{"swapmac", no_argument, NULL, 'm' },
{"readmem", no_argument, NULL, 'r' },
{"swapmac", no_argument, NULL, 'm' },
{"force", no_argument, NULL, 'F' },
{0, 0, NULL, 0 }
};
@ -499,7 +499,7 @@ int main(int argc, char **argv)
map_fd = bpf_map__fd(map);
if (!prog_fd) {
fprintf(stderr, "ERR: load_bpf_file: %s\n", strerror(errno));
fprintf(stderr, "ERR: bpf_prog_load_xattr: %s\n", strerror(errno));
return EXIT_FAIL;
}

View File

@ -150,7 +150,7 @@ int main(int argc, char **argv)
return 1;
if (!prog_fd) {
printf("load_bpf_file: %s\n", strerror(errno));
printf("bpf_prog_load_xattr: %s\n", strerror(errno));
return 1;
}

View File

@ -19,19 +19,19 @@
#include "bpf_helpers.h"
#include "xdp_tx_iptunnel_common.h"
struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(__u32),
.value_size = sizeof(__u64),
.max_entries = 256,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, __u32);
__type(value, __u64);
__uint(max_entries, 256);
} rxcnt SEC(".maps");
struct bpf_map_def SEC("maps") vip2tnl = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(struct vip),
.value_size = sizeof(struct iptnl_info),
.max_entries = MAX_IPTNL_ENTRIES,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, struct vip);
__type(value, struct iptnl_info);
__uint(max_entries, MAX_IPTNL_ENTRIES);
} vip2tnl SEC(".maps");
static __always_inline void count_tx(u32 protocol)
{

View File

@ -268,7 +268,7 @@ int main(int argc, char **argv)
return 1;
if (!prog_fd) {
printf("load_bpf_file: %s\n", strerror(errno));
printf("bpf_prog_load_xattr: %s\n", strerror(errno));
return 1;
}

samples/bpf/xdpsock.h Normal file
View File

@ -0,0 +1,11 @@
/* SPDX-License-Identifier: GPL-2.0
*
* Copyright(c) 2019 Intel Corporation.
*/
#ifndef XDPSOCK_H_
#define XDPSOCK_H_
#define MAX_SOCKS 4
#endif /* XDPSOCK_H_ */

View File

@ -0,0 +1,24 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include "bpf_helpers.h"
#include "xdpsock.h"
/* This XDP program is only needed for the XDP_SHARED_UMEM mode.
* If you do not use this mode, libbpf can supply an XDP program for you.
*/
struct {
__uint(type, BPF_MAP_TYPE_XSKMAP);
__uint(max_entries, MAX_SOCKS);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
} xsks_map SEC(".maps");
static unsigned int rr;
SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
{
rr = (rr + 1) & (MAX_SOCKS - 1);
return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
}

View File

@ -29,6 +29,7 @@
#include "libbpf.h"
#include "xsk.h"
#include "xdpsock.h"
#include <bpf/bpf.h>
#ifndef SOL_XDP
@ -47,7 +48,6 @@
#define BATCH_SIZE 64
#define DEBUG_HEXDUMP 0
#define MAX_SOCKS 8
typedef __u64 u64;
typedef __u32 u32;
@ -75,7 +75,8 @@ static u32 opt_xdp_bind_flags;
static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
static int opt_timeout = 1000;
static bool opt_need_wakeup = true;
static __u32 prog_id;
static u32 opt_num_xsks = 1;
static u32 prog_id;
struct xsk_umem_info {
struct xsk_ring_prod fq;
@ -179,7 +180,7 @@ static void *poller(void *arg)
static void remove_xdp_program(void)
{
__u32 curr_prog_id = 0;
u32 curr_prog_id = 0;
if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) {
printf("bpf_get_link_xdp_id failed\n");
@ -196,11 +197,11 @@ static void remove_xdp_program(void)
static void int_exit(int sig)
{
struct xsk_umem *umem = xsks[0]->umem->umem;
(void)sig;
int i;
dump_stats();
xsk_socket__delete(xsks[0]->xsk);
for (i = 0; i < num_socks; i++)
xsk_socket__delete(xsks[i]->xsk);
(void)xsk_umem__delete(umem);
remove_xdp_program();
@ -290,7 +291,6 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
.flags = opt_umem_flags
};
int ret;
umem = calloc(1, sizeof(*umem));
@ -299,7 +299,6 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
&cfg);
if (ret)
exit_with_error(-ret);
@ -307,13 +306,29 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
return umem;
}
static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
static void xsk_populate_fill_ring(struct xsk_umem_info *umem)
{
int ret, i;
u32 idx;
ret = xsk_ring_prod__reserve(&umem->fq,
XSK_RING_PROD__DEFAULT_NUM_DESCS, &idx);
if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
exit_with_error(-ret);
for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS; i++)
*xsk_ring_prod__fill_addr(&umem->fq, idx++) =
i * opt_xsk_frame_size;
xsk_ring_prod__submit(&umem->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS);
}
static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem,
bool rx, bool tx)
{
struct xsk_socket_config cfg;
struct xsk_socket_info *xsk;
struct xsk_ring_cons *rxr;
struct xsk_ring_prod *txr;
int ret;
u32 idx;
int i;
xsk = calloc(1, sizeof(*xsk));
if (!xsk)
@ -322,11 +337,17 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
xsk->umem = umem;
cfg.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
cfg.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
cfg.libbpf_flags = 0;
if (opt_num_xsks > 1)
cfg.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
else
cfg.libbpf_flags = 0;
cfg.xdp_flags = opt_xdp_flags;
cfg.bind_flags = opt_xdp_bind_flags;
rxr = rx ? &xsk->rx : NULL;
txr = tx ? &xsk->tx : NULL;
ret = xsk_socket__create(&xsk->xsk, opt_if, opt_queue, umem->umem,
&xsk->rx, &xsk->tx, &cfg);
rxr, txr, &cfg);
if (ret)
exit_with_error(-ret);
@ -334,17 +355,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
if (ret)
exit_with_error(-ret);
ret = xsk_ring_prod__reserve(&xsk->umem->fq,
XSK_RING_PROD__DEFAULT_NUM_DESCS,
&idx);
if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
exit_with_error(-ret);
for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS; i++)
*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) =
i * opt_xsk_frame_size;
xsk_ring_prod__submit(&xsk->umem->fq,
XSK_RING_PROD__DEFAULT_NUM_DESCS);
return xsk;
}
@ -363,6 +373,8 @@ static struct option long_options[] = {
{"frame-size", required_argument, 0, 'f'},
{"no-need-wakeup", no_argument, 0, 'm'},
{"unaligned", no_argument, 0, 'u'},
{"shared-umem", no_argument, 0, 'M'},
{"force", no_argument, 0, 'F'},
{0, 0, 0, 0}
};
@ -382,10 +394,11 @@ static void usage(const char *prog)
" -n, --interval=n Specify statistics update interval (default 1 sec).\n"
" -z, --zero-copy Force zero-copy mode.\n"
" -c, --copy Force copy mode.\n"
" -f, --frame-size=n Set the frame size (must be a power of two, default is %d).\n"
" -m, --no-need-wakeup Turn off use of driver need wakeup flag.\n"
" -f, --frame-size=n Set the frame size (must be a power of two in aligned mode, default is %d).\n"
" -u, --unaligned Enable unaligned chunk placement\n"
" -M, --shared-umem Enable XDP_SHARED_UMEM\n"
" -F, --force Force loading the XDP prog\n"
"\n";
fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
exit(EXIT_FAILURE);
@ -398,7 +411,7 @@ static void parse_command_line(int argc, char **argv)
opterr = 0;
for (;;) {
c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:mu",
c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:muM",
long_options, &option_index);
if (c == -1)
break;
@ -448,11 +461,14 @@ static void parse_command_line(int argc, char **argv)
break;
case 'f':
opt_xsk_frame_size = atoi(optarg);
break;
case 'm':
opt_need_wakeup = false;
opt_xdp_bind_flags &= ~XDP_USE_NEED_WAKEUP;
break;
case 'M':
opt_num_xsks = MAX_SOCKS;
break;
default:
usage(basename(argv[0]));
}
@ -586,11 +602,9 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
static void rx_drop_all(void)
{
struct pollfd fds[MAX_SOCKS + 1];
struct pollfd fds[MAX_SOCKS] = {};
int i, ret;
memset(fds, 0, sizeof(fds));
for (i = 0; i < num_socks; i++) {
fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
fds[i].events = POLLIN;
@ -633,11 +647,10 @@ static void tx_only(struct xsk_socket_info *xsk, u32 frame_nb)
static void tx_only_all(void)
{
struct pollfd fds[MAX_SOCKS];
struct pollfd fds[MAX_SOCKS] = {};
u32 frame_nb[MAX_SOCKS] = {};
int i, ret;
memset(fds, 0, sizeof(fds));
for (i = 0; i < num_socks; i++) {
fds[0].fd = xsk_socket__fd(xsks[i]->xsk);
fds[0].events = POLLOUT;
@ -706,11 +719,9 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
static void l2fwd_all(void)
{
struct pollfd fds[MAX_SOCKS];
struct pollfd fds[MAX_SOCKS] = {};
int i, ret;
memset(fds, 0, sizeof(fds));
for (i = 0; i < num_socks; i++) {
fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
fds[i].events = POLLOUT | POLLIN;
@ -728,13 +739,66 @@ static void l2fwd_all(void)
}
}
static void load_xdp_program(char **argv, struct bpf_object **obj)
{
struct bpf_prog_load_attr prog_load_attr = {
.prog_type = BPF_PROG_TYPE_XDP,
};
char xdp_filename[256];
int prog_fd;
snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
prog_load_attr.file = xdp_filename;
if (bpf_prog_load_xattr(&prog_load_attr, obj, &prog_fd))
exit(EXIT_FAILURE);
if (prog_fd < 0) {
fprintf(stderr, "ERROR: no program found: %s\n",
strerror(prog_fd));
exit(EXIT_FAILURE);
}
if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd, opt_xdp_flags) < 0) {
fprintf(stderr, "ERROR: link set xdp fd failed\n");
exit(EXIT_FAILURE);
}
}
static void enter_xsks_into_map(struct bpf_object *obj)
{
struct bpf_map *map;
int i, xsks_map;
map = bpf_object__find_map_by_name(obj, "xsks_map");
xsks_map = bpf_map__fd(map);
if (xsks_map < 0) {
fprintf(stderr, "ERROR: no xsks map found: %s\n",
strerror(xsks_map));
exit(EXIT_FAILURE);
}
for (i = 0; i < num_socks; i++) {
int fd = xsk_socket__fd(xsks[i]->xsk);
int key, ret;
key = i;
ret = bpf_map_update_elem(xsks_map, &key, &fd, 0);
if (ret) {
fprintf(stderr, "ERROR: bpf_map_update_elem %d\n", i);
exit(EXIT_FAILURE);
}
}
}
int main(int argc, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
bool rx = false, tx = false;
struct xsk_umem_info *umem;
struct bpf_object *obj;
pthread_t pt;
int i, ret;
void *bufs;
int ret;
parse_command_line(argc, argv);
@ -744,6 +808,9 @@ int main(int argc, char **argv)
exit(EXIT_FAILURE);
}
if (opt_num_xsks > 1)
load_xdp_program(argv, &obj);
/* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
PROT_READ | PROT_WRITE,
@ -752,16 +819,24 @@ int main(int argc, char **argv)
printf("ERROR: mmap failed\n");
exit(EXIT_FAILURE);
}
/* Create sockets... */
/* Create sockets... */
umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
xsks[num_socks++] = xsk_configure_socket(umem);
if (opt_bench == BENCH_TXONLY) {
int i;
for (i = 0; i < NUM_FRAMES; i++)
(void)gen_eth_frame(umem, i * opt_xsk_frame_size);
if (opt_bench == BENCH_RXDROP || opt_bench == BENCH_L2FWD) {
rx = true;
xsk_populate_fill_ring(umem);
}
if (opt_bench == BENCH_L2FWD || opt_bench == BENCH_TXONLY)
tx = true;
for (i = 0; i < opt_num_xsks; i++)
xsks[num_socks++] = xsk_configure_socket(umem, rx, tx);
if (opt_bench == BENCH_TXONLY)
for (i = 0; i < NUM_FRAMES; i++)
gen_eth_frame(umem, i * opt_xsk_frame_size);
if (opt_num_xsks > 1 && opt_bench != BENCH_TXONLY)
enter_xsks_into_map(obj);
signal(SIGINT, int_exit);
signal(SIGTERM, int_exit);

View File

@ -545,6 +545,16 @@ static void bpf_reduce_k_jumps(void)
}
}
static uint8_t bpf_encode_jt_jf_offset(int off, int i)
{
int delta = off - i - 1;
if (delta < 0 || delta > 255)
fprintf(stderr, "warning: insn #%d jumps to insn #%d, "
"which is out of range\n", i, off);
return (uint8_t) delta;
}
static void bpf_reduce_jt_jumps(void)
{
int i;
@ -552,7 +562,7 @@ static void bpf_reduce_jt_jumps(void)
for (i = 0; i < curr_instr; i++) {
if (labels_jt[i]) {
int off = bpf_find_insns_offset(labels_jt[i]);
out[i].jt = (uint8_t) (off - i -1);
out[i].jt = bpf_encode_jt_jf_offset(off, i);
}
}
}
@ -564,7 +574,7 @@ static void bpf_reduce_jf_jumps(void)
for (i = 0; i < curr_instr; i++) {
if (labels_jf[i]) {
int off = bpf_find_insns_offset(labels_jf[i]);
out[i].jf = (uint8_t) (off - i - 1);
out[i].jf = bpf_encode_jt_jf_offset(off, i);
}
}
}

View File

@ -201,6 +201,8 @@ enum bpf_attach_type {
BPF_CGROUP_GETSOCKOPT,
BPF_CGROUP_SETSOCKOPT,
BPF_TRACE_RAW_TP,
BPF_TRACE_FENTRY,
BPF_TRACE_FEXIT,
__MAX_BPF_ATTACH_TYPE
};
@ -346,6 +348,9 @@ enum bpf_attach_type {
/* Clone map from listener for newly accepted socket */
#define BPF_F_CLONE (1U << 9)
/* Enable memory-mapping BPF map */
#define BPF_F_MMAPABLE (1U << 10)
/* flags for BPF_PROG_QUERY */
#define BPF_F_QUERY_EFFECTIVE (1U << 0)
@ -423,6 +428,7 @@ union bpf_attr {
__aligned_u64 line_info; /* line info */
__u32 line_info_cnt; /* number of bpf_line_info records */
__u32 attach_btf_id; /* in-kernel BTF type id to attach to */
__u32 attach_prog_fd; /* 0 to attach to vmlinux */
};
struct { /* anonymous struct used by BPF_OBJ_* commands */

View File

@ -189,7 +189,7 @@ static void *
alloc_zero_tailing_info(const void *orecord, __u32 cnt,
__u32 actual_rec_size, __u32 expected_rec_size)
{
__u64 info_len = actual_rec_size * cnt;
__u64 info_len = (__u64)actual_rec_size * cnt;
void *info, *nrecord;
int i;
@ -228,10 +228,13 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
memset(&attr, 0, sizeof(attr));
attr.prog_type = load_attr->prog_type;
attr.expected_attach_type = load_attr->expected_attach_type;
if (attr.prog_type == BPF_PROG_TYPE_TRACING)
if (attr.prog_type == BPF_PROG_TYPE_TRACING) {
attr.attach_btf_id = load_attr->attach_btf_id;
else
attr.attach_prog_fd = load_attr->attach_prog_fd;
} else {
attr.prog_ifindex = load_attr->prog_ifindex;
attr.kern_version = load_attr->kern_version;
}
attr.insn_cnt = (__u32)load_attr->insns_cnt;
attr.insns = ptr_to_u64(load_attr->insns);
attr.license = ptr_to_u64(load_attr->license);
@ -245,7 +248,6 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
attr.log_size = 0;
}
attr.kern_version = load_attr->kern_version;
attr.prog_btf_fd = load_attr->prog_btf_fd;
attr.func_info_rec_size = load_attr->func_info_rec_size;
attr.func_info_cnt = load_attr->func_info_cnt;

View File

@ -77,7 +77,10 @@ struct bpf_load_program_attr {
const struct bpf_insn *insns;
size_t insns_cnt;
const char *license;
__u32 kern_version;
union {
__u32 kern_version;
__u32 attach_prog_fd;
};
union {
__u32 prog_ifindex;
__u32 attach_btf_id;

View File

@ -12,9 +12,76 @@
*/
enum bpf_field_info_kind {
BPF_FIELD_BYTE_OFFSET = 0, /* field byte offset */
BPF_FIELD_BYTE_SIZE = 1,
BPF_FIELD_EXISTS = 2, /* field existence in target kernel */
BPF_FIELD_SIGNED = 3,
BPF_FIELD_LSHIFT_U64 = 4,
BPF_FIELD_RSHIFT_U64 = 5,
};
#define __CORE_RELO(src, field, info) \
__builtin_preserve_field_info((src)->field, BPF_FIELD_##info)
#if __BYTE_ORDER == __LITTLE_ENDIAN
#define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \
bpf_probe_read((void *)dst, \
__CORE_RELO(src, fld, BYTE_SIZE), \
(const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET))
#else
/* semantics of LSHIFT_U64 assume loading values into the low-order bytes, so
* for big-endian we need to adjust the destination pointer accordingly, based
* on field byte size
*/
#define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \
bpf_probe_read((void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \
__CORE_RELO(src, fld, BYTE_SIZE), \
(const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET))
#endif
/*
* Extract bitfield, identified by s->field, and return its value as u64.
* All this is done in a relocatable manner, so bitfield changes such as
* signedness, bit size, or offset are handled automatically.
* This version of the macro uses bpf_probe_read() to read the underlying
* integer storage. The macro evaluates to bpf_probe_read()'s return value:
* 0 on success, <0 on error.
*/
#define BPF_CORE_READ_BITFIELD_PROBED(s, field) ({ \
unsigned long long val = 0; \
\
__CORE_BITFIELD_PROBE_READ(&val, s, field); \
val <<= __CORE_RELO(s, field, LSHIFT_U64); \
if (__CORE_RELO(s, field, SIGNED)) \
val = ((long long)val) >> __CORE_RELO(s, field, RSHIFT_U64); \
else \
val = val >> __CORE_RELO(s, field, RSHIFT_U64); \
val; \
})
/*
* Extract bitfield, identified by s->field, and return its value as u64.
* This version of the macro uses direct memory reads and should be used from
* BPF program types that support such functionality (e.g., typed raw
* tracepoints).
*/
#define BPF_CORE_READ_BITFIELD(s, field) ({ \
const void *p = (const void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \
unsigned long long val; \
\
switch (__CORE_RELO(s, field, BYTE_SIZE)) { \
case 1: val = *(const unsigned char *)p; break; \
case 2: val = *(const unsigned short *)p; break; \
case 4: val = *(const unsigned int *)p; break; \
case 8: val = *(const unsigned long long *)p; break; \
} \
val <<= __CORE_RELO(s, field, LSHIFT_U64); \
if (__CORE_RELO(s, field, SIGNED)) \
val = ((long long)val) >> __CORE_RELO(s, field, RSHIFT_U64); \
else \
val = val >> __CORE_RELO(s, field, RSHIFT_U64); \
val; \
})
/*
* Convenience macro to check that field actually exists in target kernel's.
* Returns:
@ -24,6 +91,13 @@ enum bpf_field_info_kind {
#define bpf_core_field_exists(field) \
__builtin_preserve_field_info(field, BPF_FIELD_EXISTS)
/*
* Convenience macro to get byte size of a field. Works for integers,
* struct/unions, pointers, arrays, and enums.
*/
#define bpf_core_field_size(field) \
__builtin_preserve_field_info(field, BPF_FIELD_BYTE_SIZE)
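
Putting the new helpers together, a hedged usage sketch; the struct, its bitfield layout, and the section name are hypothetical and only illustrate the calling pattern from a program type that allows direct memory reads:

/* Hedged sketch: relocatable bitfield read plus existence/size probing.
 * 'struct pkt_meta' and the section name are made up for illustration.
 */
#include <linux/bpf.h>
#include "bpf_core_read.h"
#include "bpf_helpers.h"

char _license[] SEC("license") = "GPL";

struct pkt_meta {
	unsigned int len;
	unsigned int is_frag: 1;
	unsigned int hash_type: 3;
};

SEC("tp_btf/sketch")
int read_meta(struct pkt_meta *m)
{
	unsigned long long hash_type = 0;

	if (bpf_core_field_exists(m->hash_type))
		hash_type = BPF_CORE_READ_BITFIELD(m, hash_type);

	/* byte size of a regular (non-bitfield) member */
	return hash_type + bpf_core_field_size(m->len);
}
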
/*
* bpf_core_read() abstracts away bpf_probe_read() call and captures offset
* relocation for source address using __builtin_preserve_access_index()

View File

@ -44,4 +44,17 @@ enum libbpf_pin_type {
LIBBPF_PIN_BY_NAME,
};
/* The following types should be used by BPF_PROG_TYPE_TRACING program to
* access kernel function arguments. BPF trampoline and raw tracepoints
* typecast arguments to 'unsigned long long'.
*/
typedef int __attribute__((aligned(8))) ks32;
typedef char __attribute__((aligned(8))) ks8;
typedef short __attribute__((aligned(8))) ks16;
typedef long long __attribute__((aligned(8))) ks64;
typedef unsigned int __attribute__((aligned(8))) ku32;
typedef unsigned char __attribute__((aligned(8))) ku8;
typedef unsigned short __attribute__((aligned(8))) ku16;
typedef unsigned long long __attribute__((aligned(8))) ku64;
#endif
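
A hedged sketch of how these typedefs are meant to be used, modeled on the fentry selftests in this series and targeting bpf_fentry_test2() from net/bpf/test_run.c above (the SEC() naming follows this series' libbpf conventions):

/* Hedged sketch: an fentry program for bpf_fentry_test2(int a, u64 b),
 * reading its arguments through the 8-byte-aligned typedefs above.
 */
#include <linux/bpf.h>
#include "bpf_helpers.h"

char _license[] SEC("license") = "GPL";

static volatile __u64 test2_result;

SEC("fentry/bpf_fentry_test2")
int test2(ks32 a, ku64 b)
{
	test2_result = a == 2 && b == 3;
	return 0;
}
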

View File

@ -101,6 +101,7 @@ struct bpf_prog_linfo *bpf_prog_linfo__new(const struct bpf_prog_info *info)
{
struct bpf_prog_linfo *prog_linfo;
__u32 nr_linfo, nr_jited_func;
__u64 data_sz;
nr_linfo = info->nr_line_info;
@ -122,11 +123,11 @@ struct bpf_prog_linfo *bpf_prog_linfo__new(const struct bpf_prog_info *info)
/* Copy xlated line_info */
prog_linfo->nr_linfo = nr_linfo;
prog_linfo->rec_size = info->line_info_rec_size;
prog_linfo->raw_linfo = malloc(nr_linfo * prog_linfo->rec_size);
data_sz = (__u64)nr_linfo * prog_linfo->rec_size;
prog_linfo->raw_linfo = malloc(data_sz);
if (!prog_linfo->raw_linfo)
goto err_free;
memcpy(prog_linfo->raw_linfo, (void *)(long)info->line_info,
nr_linfo * prog_linfo->rec_size);
memcpy(prog_linfo->raw_linfo, (void *)(long)info->line_info, data_sz);
nr_jited_func = info->nr_jited_ksyms;
if (!nr_jited_func ||
@ -142,13 +143,12 @@ struct bpf_prog_linfo *bpf_prog_linfo__new(const struct bpf_prog_info *info)
/* Copy jited_line_info */
prog_linfo->nr_jited_func = nr_jited_func;
prog_linfo->jited_rec_size = info->jited_line_info_rec_size;
prog_linfo->raw_jited_linfo = malloc(nr_linfo *
prog_linfo->jited_rec_size);
data_sz = (__u64)nr_linfo * prog_linfo->jited_rec_size;
prog_linfo->raw_jited_linfo = malloc(data_sz);
if (!prog_linfo->raw_jited_linfo)
goto err_free;
memcpy(prog_linfo->raw_jited_linfo,
(void *)(long)info->jited_line_info,
nr_linfo * prog_linfo->jited_rec_size);
(void *)(long)info->jited_line_info, data_sz);
/* Number of jited_line_info per jited func */
prog_linfo->nr_jited_linfo_per_func = malloc(nr_jited_func *

View File

@ -269,10 +269,9 @@ __s64 btf__resolve_size(const struct btf *btf, __u32 type_id)
t = btf__type_by_id(btf, type_id);
}
done:
if (size < 0)
return -EINVAL;
done:
if (nelems && size > UINT32_MAX / nelems)
return -E2BIG;
@ -317,6 +316,28 @@ __s32 btf__find_by_name(const struct btf *btf, const char *type_name)
return -ENOENT;
}
__s32 btf__find_by_name_kind(const struct btf *btf, const char *type_name,
__u32 kind)
{
__u32 i;
if (kind == BTF_KIND_UNKN || !strcmp(type_name, "void"))
return 0;
for (i = 1; i <= btf->nr_types; i++) {
const struct btf_type *t = btf->types[i];
const char *name;
if (btf_kind(t) != kind)
continue;
name = btf__name_by_offset(btf, t->name_off);
if (name && !strcmp(type_name, name))
return i;
}
return -ENOENT;
}
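
A brief hedged example of the intended lookup, e.g. to obtain an attach_btf_id for a tracing program; the vmlinux path is a placeholder and a kernel image with BTF is assumed:

/* Hedged sketch: resolve a kernel function's BTF id by name and kind. */
#include <linux/btf.h>
#include <bpf/btf.h>
#include <bpf/libbpf.h>

static __s32 find_func_btf_id(const char *func)
{
	struct btf *btf = btf__parse_elf("/path/to/vmlinux", NULL);
	__s32 id;

	if (libbpf_get_error(btf))
		return -1;
	id = btf__find_by_name_kind(btf, func, BTF_KIND_FUNC);
	btf__free(btf);
	return id;
}
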
void btf__free(struct btf *btf)
{
if (!btf)

View File

@ -72,6 +72,8 @@ LIBBPF_API int btf__finalize_data(struct bpf_object *obj, struct btf *btf);
LIBBPF_API int btf__load(struct btf *btf);
LIBBPF_API __s32 btf__find_by_name(const struct btf *btf,
const char *type_name);
LIBBPF_API __s32 btf__find_by_name_kind(const struct btf *btf,
const char *type_name, __u32 kind);
LIBBPF_API __u32 btf__get_nr_types(const struct btf *btf);
LIBBPF_API const struct btf_type *btf__type_by_id(const struct btf *btf,
__u32 id);

View File

@ -142,6 +142,8 @@ struct bpf_capabilities {
__u32 btf_func:1;
/* BTF_KIND_VAR and BTF_KIND_DATASEC support */
__u32 btf_datasec:1;
/* BPF_F_MMAPABLE is supported for arrays */
__u32 array_mmap:1;
};
/*
@ -189,6 +191,7 @@ struct bpf_program {
enum bpf_attach_type expected_attach_type;
__u32 attach_btf_id;
__u32 attach_prog_fd;
void *func_info;
__u32 func_info_rec_size;
__u32 func_info_cnt;
@ -229,6 +232,7 @@ struct bpf_map {
enum libbpf_map_type libbpf_type;
char *pin_path;
bool pinned;
bool reused;
};
struct bpf_secdata {
@ -855,8 +859,6 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
pr_warn("failed to alloc map name\n");
return -ENOMEM;
}
pr_debug("map '%s' (global data): at sec_idx %d, offset %zu.\n",
map_name, map->sec_idx, map->sec_offset);
def = &map->def;
def->type = BPF_MAP_TYPE_ARRAY;
@ -864,6 +866,12 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
def->value_size = data->d_size;
def->max_entries = 1;
def->map_flags = type == LIBBPF_MAP_RODATA ? BPF_F_RDONLY_PROG : 0;
if (obj->caps.array_mmap)
def->map_flags |= BPF_F_MMAPABLE;
pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n",
map_name, map->sec_idx, map->sec_offset, def->map_flags);
if (data_buff) {
*data_buff = malloc(data->d_size);
if (!*data_buff) {
@ -956,13 +964,13 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
pr_debug("maps in %s: %d maps in %zd bytes\n",
obj->path, nr_maps, data->d_size);
map_def_sz = data->d_size / nr_maps;
if (!data->d_size || (data->d_size % nr_maps) != 0) {
if (!data->d_size || nr_maps == 0 || (data->d_size % nr_maps) != 0) {
pr_warn("unable to determine map definition size "
"section %s, %d maps in %zd bytes\n",
obj->path, nr_maps, data->d_size);
return -EINVAL;
}
map_def_sz = data->d_size / nr_maps;
/* Fill obj->maps using data in "maps" section. */
for (i = 0; i < nr_syms; i++) {
@ -1862,9 +1870,13 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
pr_warn("incorrect bpf_call opcode\n");
return -LIBBPF_ERRNO__RELOC;
}
if (sym.st_value % 8) {
pr_warn("bad call relo offset: %lu\n", sym.st_value);
return -LIBBPF_ERRNO__RELOC;
}
prog->reloc_desc[i].type = RELO_CALL;
prog->reloc_desc[i].insn_idx = insn_idx;
prog->reloc_desc[i].text_off = sym.st_value;
prog->reloc_desc[i].text_off = sym.st_value / 8;
obj->has_pseudo_calls = true;
continue;
}
@ -1995,6 +2007,7 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
map->def.map_flags = info.map_flags;
map->btf_key_type_id = info.btf_key_type_id;
map->btf_value_type_id = info.btf_value_type_id;
map->reused = true;
return 0;
@ -2158,6 +2171,27 @@ static int bpf_object__probe_btf_datasec(struct bpf_object *obj)
return 0;
}
static int bpf_object__probe_array_mmap(struct bpf_object *obj)
{
struct bpf_create_map_attr attr = {
.map_type = BPF_MAP_TYPE_ARRAY,
.map_flags = BPF_F_MMAPABLE,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 1,
};
int fd;
fd = bpf_create_map_xattr(&attr);
if (fd >= 0) {
obj->caps.array_mmap = 1;
close(fd);
return 1;
}
return 0;
}
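
For context, the capability probed here can also be exercised directly from user space against a kernel with this series applied; a hedged sketch with illustrative sizing and no cleanup on the error paths:

/* Hedged sketch: create an mmap()-able array map and map its single value. */
#include <unistd.h>
#include <sys/mman.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>

static void *mmap_array_map(int *map_fd)
{
	struct bpf_create_map_attr attr = {
		.name = "mmap_sketch",
		.map_type = BPF_MAP_TYPE_ARRAY,
		.map_flags = BPF_F_MMAPABLE,
		.key_size = sizeof(int),
		.value_size = sizeof(int),
		.max_entries = 1,
	};
	void *p;
	int fd;

	fd = bpf_create_map_xattr(&attr);
	if (fd < 0)
		return NULL;

	p = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return NULL;

	*map_fd = fd;
	return p; /* stores here are visible to the BPF program and vice versa */
}
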
static int
bpf_object__probe_caps(struct bpf_object *obj)
{
@ -2166,6 +2200,7 @@ bpf_object__probe_caps(struct bpf_object *obj)
bpf_object__probe_global_data,
bpf_object__probe_btf_func,
bpf_object__probe_btf_datasec,
bpf_object__probe_array_mmap,
};
int i, ret;
@ -2470,8 +2505,8 @@ struct bpf_core_spec {
int raw_spec[BPF_CORE_SPEC_MAX_LEN];
/* raw spec length */
int raw_len;
/* field byte offset represented by spec */
__u32 offset;
/* field bit offset represented by spec */
__u32 bit_offset;
};
static bool str_is_empty(const char *s)
@ -2482,8 +2517,8 @@ static bool str_is_empty(const char *s)
/*
* Turn bpf_field_reloc into a low- and high-level spec representation,
* validating correctness along the way, as well as calculating resulting
* field offset (in bytes), specified by accessor string. Low-level spec
* captures every single level of nestedness, including traversing anonymous
* field bit offset, specified by accessor string. Low-level spec captures
* every single level of nestedness, including traversing anonymous
* struct/union members. High-level one only captures semantically meaningful
* "turning points": named fields and array indicies.
* E.g., for this case:
@ -2555,7 +2590,7 @@ static int bpf_core_spec_parse(const struct btf *btf,
sz = btf__resolve_size(btf, id);
if (sz < 0)
return sz;
spec->offset = access_idx * sz;
spec->bit_offset = access_idx * sz * 8;
for (i = 1; i < spec->raw_len; i++) {
t = skip_mods_and_typedefs(btf, id, &id);
@ -2566,17 +2601,13 @@ static int bpf_core_spec_parse(const struct btf *btf,
if (btf_is_composite(t)) {
const struct btf_member *m;
__u32 offset;
__u32 bit_offset;
if (access_idx >= btf_vlen(t))
return -EINVAL;
if (btf_member_bitfield_size(t, access_idx))
return -EINVAL;
offset = btf_member_bit_offset(t, access_idx);
if (offset % 8)
return -EINVAL;
spec->offset += offset / 8;
bit_offset = btf_member_bit_offset(t, access_idx);
spec->bit_offset += bit_offset;
m = btf_members(t) + access_idx;
if (m->name_off) {
@ -2605,7 +2636,7 @@ static int bpf_core_spec_parse(const struct btf *btf,
sz = btf__resolve_size(btf, id);
if (sz < 0)
return sz;
spec->offset += access_idx * sz;
spec->bit_offset += access_idx * sz * 8;
} else {
pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %d\n",
type_id, spec_str, i, id, btf_kind(t));
@ -2706,12 +2737,14 @@ err_out:
}
/* Check two types for compatibility, skipping const/volatile/restrict and
* typedefs, to ensure we are relocating offset to the compatible entities:
* typedefs, to ensure we are relocating compatible entities:
* - any two STRUCTs/UNIONs are compatible and can be mixed;
* - any two FWDs are compatible;
* - any two FWDs are compatible, if their names match (modulo flavor suffix);
* - any two PTRs are always compatible;
* - for ENUMs, names should be the same (ignoring flavor suffix) or at
* least one of enums should be anonymous;
* - for ENUMs, check sizes, names are ignored;
* - for INT, size and bitness should match, signedness is ignored;
* - for INT, size and signedness are ignored;
* - for ARRAY, dimensionality is ignored, element types are checked for
* compatibility recursively;
* - everything else shouldn't be ever a target of relocation.
@ -2737,16 +2770,29 @@ recur:
return 0;
switch (btf_kind(local_type)) {
case BTF_KIND_FWD:
case BTF_KIND_PTR:
return 1;
case BTF_KIND_ENUM:
return local_type->size == targ_type->size;
case BTF_KIND_FWD:
case BTF_KIND_ENUM: {
const char *local_name, *targ_name;
size_t local_len, targ_len;
local_name = btf__name_by_offset(local_btf,
local_type->name_off);
targ_name = btf__name_by_offset(targ_btf, targ_type->name_off);
local_len = bpf_core_essential_name_len(local_name);
targ_len = bpf_core_essential_name_len(targ_name);
/* one of them is anonymous or both w/ same flavor-less names */
return local_len == 0 || targ_len == 0 ||
(local_len == targ_len &&
strncmp(local_name, targ_name, local_len) == 0);
}
case BTF_KIND_INT:
/* just reject deprecated bitfield-like integers; all other
* integers are by default compatible between each other
*/
return btf_int_offset(local_type) == 0 &&
btf_int_offset(targ_type) == 0 &&
local_type->size == targ_type->size &&
btf_int_bits(local_type) == btf_int_bits(targ_type);
btf_int_offset(targ_type) == 0;
case BTF_KIND_ARRAY:
local_id = btf_array(local_type)->type;
targ_id = btf_array(targ_type)->type;
@ -2762,7 +2808,7 @@ recur:
* Given single high-level named field accessor in local type, find
* corresponding high-level accessor for a target type. Along the way,
* maintain low-level spec for target as well. Also keep updating target
* offset.
* bit offset.
*
* Searching is performed through recursive exhaustive enumeration of all
* fields of a struct/union. If there are any anonymous (embedded)
@ -2801,21 +2847,16 @@ static int bpf_core_match_member(const struct btf *local_btf,
n = btf_vlen(targ_type);
m = btf_members(targ_type);
for (i = 0; i < n; i++, m++) {
__u32 offset;
__u32 bit_offset;
/* bitfield relocations not supported */
if (btf_member_bitfield_size(targ_type, i))
continue;
offset = btf_member_bit_offset(targ_type, i);
if (offset % 8)
continue;
bit_offset = btf_member_bit_offset(targ_type, i);
/* too deep struct/union/array nesting */
if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
return -E2BIG;
/* speculate this member will be the good one */
spec->offset += offset / 8;
spec->bit_offset += bit_offset;
spec->raw_spec[spec->raw_len++] = i;
targ_name = btf__name_by_offset(targ_btf, m->name_off);
@ -2844,7 +2885,7 @@ static int bpf_core_match_member(const struct btf *local_btf,
return found;
}
/* member turned out not to be what we looked for */
spec->offset -= offset / 8;
spec->bit_offset -= bit_offset;
spec->raw_len--;
}
@ -2853,7 +2894,7 @@ static int bpf_core_match_member(const struct btf *local_btf,
/*
* Try to match local spec to a target type and, if successful, produce full
* target spec (high-level, low-level + offset).
* target spec (high-level, low-level + bit offset).
*/
static int bpf_core_spec_match(struct bpf_core_spec *local_spec,
const struct btf *targ_btf, __u32 targ_id,
@ -2916,13 +2957,120 @@ static int bpf_core_spec_match(struct bpf_core_spec *local_spec,
sz = btf__resolve_size(targ_btf, targ_id);
if (sz < 0)
return sz;
targ_spec->offset += local_acc->idx * sz;
targ_spec->bit_offset += local_acc->idx * sz * 8;
}
}
return 1;
}
static int bpf_core_calc_field_relo(const struct bpf_program *prog,
const struct bpf_field_reloc *relo,
const struct bpf_core_spec *spec,
__u32 *val, bool *validate)
{
const struct bpf_core_accessor *acc = &spec->spec[spec->len - 1];
const struct btf_type *t = btf__type_by_id(spec->btf, acc->type_id);
__u32 byte_off, byte_sz, bit_off, bit_sz;
const struct btf_member *m;
const struct btf_type *mt;
bool bitfield;
__s64 sz;
/* a[n] accessor needs special handling */
if (!acc->name) {
if (relo->kind == BPF_FIELD_BYTE_OFFSET) {
*val = spec->bit_offset / 8;
} else if (relo->kind == BPF_FIELD_BYTE_SIZE) {
sz = btf__resolve_size(spec->btf, acc->type_id);
if (sz < 0)
return -EINVAL;
*val = sz;
} else {
pr_warn("prog '%s': relo %d at insn #%d can't be applied to array access\n",
bpf_program__title(prog, false),
relo->kind, relo->insn_off / 8);
return -EINVAL;
}
if (validate)
*validate = true;
return 0;
}
m = btf_members(t) + acc->idx;
mt = skip_mods_and_typedefs(spec->btf, m->type, NULL);
bit_off = spec->bit_offset;
bit_sz = btf_member_bitfield_size(t, acc->idx);
bitfield = bit_sz > 0;
if (bitfield) {
byte_sz = mt->size;
byte_off = bit_off / 8 / byte_sz * byte_sz;
/* figure out smallest int size necessary for bitfield load */
while (bit_off + bit_sz - byte_off * 8 > byte_sz * 8) {
if (byte_sz >= 8) {
/* bitfield can't be read with 64-bit read */
pr_warn("prog '%s': relo %d at insn #%d can't be satisfied for bitfield\n",
bpf_program__title(prog, false),
relo->kind, relo->insn_off / 8);
return -E2BIG;
}
byte_sz *= 2;
byte_off = bit_off / 8 / byte_sz * byte_sz;
}
} else {
sz = btf__resolve_size(spec->btf, m->type);
if (sz < 0)
return -EINVAL;
byte_sz = sz;
byte_off = spec->bit_offset / 8;
bit_sz = byte_sz * 8;
}
/* for bitfields, all the relocatable aspects are ambiguous and we
* might disagree with compiler, so turn off validation of expected
* value, except for signedness
*/
if (validate)
*validate = !bitfield;
switch (relo->kind) {
case BPF_FIELD_BYTE_OFFSET:
*val = byte_off;
break;
case BPF_FIELD_BYTE_SIZE:
*val = byte_sz;
break;
case BPF_FIELD_SIGNED:
/* enums will be assumed unsigned */
*val = btf_is_enum(mt) ||
(btf_int_encoding(mt) & BTF_INT_SIGNED);
if (validate)
*validate = true; /* signedness is never ambiguous */
break;
case BPF_FIELD_LSHIFT_U64:
#if __BYTE_ORDER == __LITTLE_ENDIAN
*val = 64 - (bit_off + bit_sz - byte_off * 8);
#else
*val = (8 - byte_sz) * 8 + (bit_off - byte_off * 8);
#endif
break;
case BPF_FIELD_RSHIFT_U64:
*val = 64 - bit_sz;
if (validate)
*validate = true; /* right shift is never ambiguous */
break;
case BPF_FIELD_EXISTS:
default:
pr_warn("prog '%s': unknown relo %d at insn #%d\n",
bpf_program__title(prog, false),
relo->kind, relo->insn_off / 8);
return -EINVAL;
}
return 0;
}
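
For orientation, the byte/shift values computed above are meant to be combined by the relocated program roughly as follows. This is a little-endian sketch only; the helper name and arguments are illustrative and simply mirror the BYTE_OFFSET, BYTE_SIZE, LSHIFT_U64, RSHIFT_U64 and SIGNED relocation results:

#include <stdbool.h>
#include <string.h>
#include <linux/types.h>

/* byte_off/byte_sz locate the smallest aligned load covering the bitfield;
 * lshift moves the field's most significant bit to bit 63, rshift then
 * drops the low padding, arithmetically if the field is signed.
 */
static __u64 read_bitfield(const void *base, __u32 byte_off, __u32 byte_sz,
			   __u32 lshift, __u32 rshift, bool is_signed)
{
	__u64 val = 0;

	memcpy(&val, (const char *)base + byte_off, byte_sz);
	val <<= lshift;
	if (is_signed)
		return (__u64)(((__s64)val) >> rshift);
	return val >> rshift;
}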
/*
* Patch relocatable BPF instruction.
*
@ -2942,36 +3090,31 @@ static int bpf_core_reloc_insn(struct bpf_program *prog,
const struct bpf_core_spec *local_spec,
const struct bpf_core_spec *targ_spec)
{
bool failed = false, validate = true;
__u32 orig_val, new_val;
struct bpf_insn *insn;
int insn_idx;
int insn_idx, err;
__u8 class;
if (relo->insn_off % sizeof(struct bpf_insn))
return -EINVAL;
insn_idx = relo->insn_off / sizeof(struct bpf_insn);
switch (relo->kind) {
case BPF_FIELD_BYTE_OFFSET:
orig_val = local_spec->offset;
if (targ_spec) {
new_val = targ_spec->offset;
} else {
pr_warn("prog '%s': patching insn #%d w/ failed reloc, imm %d -> %d\n",
bpf_program__title(prog, false), insn_idx,
orig_val, -1);
new_val = (__u32)-1;
}
break;
case BPF_FIELD_EXISTS:
if (relo->kind == BPF_FIELD_EXISTS) {
orig_val = 1; /* can't generate EXISTS relo w/o local field */
new_val = targ_spec ? 1 : 0;
break;
default:
pr_warn("prog '%s': unknown relo %d at insn #%d'\n",
bpf_program__title(prog, false),
relo->kind, insn_idx);
return -EINVAL;
} else if (!targ_spec) {
failed = true;
new_val = (__u32)-1;
} else {
err = bpf_core_calc_field_relo(prog, relo, local_spec,
&orig_val, &validate);
if (err)
return err;
err = bpf_core_calc_field_relo(prog, relo, targ_spec,
&new_val, NULL);
if (err)
return err;
}
insn = &prog->insns[insn_idx];
@ -2980,12 +3123,17 @@ static int bpf_core_reloc_insn(struct bpf_program *prog,
if (class == BPF_ALU || class == BPF_ALU64) {
if (BPF_SRC(insn->code) != BPF_K)
return -EINVAL;
if (insn->imm != orig_val)
if (!failed && validate && insn->imm != orig_val) {
pr_warn("prog '%s': unexpected insn #%d value: got %u, exp %u -> %u\n",
bpf_program__title(prog, false), insn_idx,
insn->imm, orig_val, new_val);
return -EINVAL;
}
orig_val = insn->imm;
insn->imm = new_val;
pr_debug("prog '%s': patched insn #%d (ALU/ALU64) imm %d -> %d\n",
bpf_program__title(prog, false),
insn_idx, orig_val, new_val);
pr_debug("prog '%s': patched insn #%d (ALU/ALU64)%s imm %u -> %u\n",
bpf_program__title(prog, false), insn_idx,
failed ? " w/ failed reloc" : "", orig_val, new_val);
} else {
pr_warn("prog '%s': trying to relocate unrecognized insn #%d, code:%x, src:%x, dst:%x, off:%x, imm:%x\n",
bpf_program__title(prog, false),
@ -3103,7 +3251,8 @@ static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec)
libbpf_print(level, "%d%s", spec->raw_spec[i],
i == spec->raw_len - 1 ? " => " : ":");
libbpf_print(level, "%u @ &x", spec->offset);
libbpf_print(level, "%u.%u @ &x",
spec->bit_offset / 8, spec->bit_offset % 8);
for (i = 0; i < spec->len; i++) {
if (spec->spec[i].name)
@ -3217,7 +3366,8 @@ static int bpf_core_reloc_field(struct bpf_program *prog,
return -EINVAL;
}
pr_debug("prog '%s': relo #%d: spec is ", prog_name, relo_idx);
pr_debug("prog '%s': relo #%d: kind %d, spec is ", prog_name, relo_idx,
relo->kind);
bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec);
libbpf_print(LIBBPF_DEBUG, "\n");
@ -3257,13 +3407,13 @@ static int bpf_core_reloc_field(struct bpf_program *prog,
if (j == 0) {
targ_spec = cand_spec;
} else if (cand_spec.offset != targ_spec.offset) {
} else if (cand_spec.bit_offset != targ_spec.bit_offset) {
/* if there are many candidates, they should all
* resolve to the same offset
* resolve to the same bit offset
*/
pr_warn("prog '%s': relo #%d: offset ambiguity: %u != %u\n",
prog_name, relo_idx, cand_spec.offset,
targ_spec.offset);
prog_name, relo_idx, cand_spec.bit_offset,
targ_spec.bit_offset);
return -EINVAL;
}
@ -3408,6 +3558,7 @@ bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
pr_warn("oom in prog realloc\n");
return -ENOMEM;
}
prog->insns = new_insn;
if (obj->btf_ext) {
err = bpf_program_reloc_btf_ext(prog, obj,
@ -3419,7 +3570,6 @@ bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
memcpy(new_insn + prog->insns_cnt, text->insns,
text->insns_cnt * sizeof(*insn));
prog->insns = new_insn;
prog->main_prog_cnt = prog->insns_cnt;
prog->insns_cnt = new_cnt;
pr_debug("added %zd insn from %s to prog %s\n",
@ -3427,7 +3577,7 @@ bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
prog->section_name);
}
insn = &prog->insns[relo->insn_idx];
insn->imm += prog->main_prog_cnt - relo->insn_idx;
insn->imm += relo->text_off + prog->main_prog_cnt - relo->insn_idx;
return 0;
}
@ -3566,8 +3716,13 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
load_attr.insns = insns;
load_attr.insns_cnt = insns_cnt;
load_attr.license = license;
load_attr.kern_version = kern_version;
load_attr.prog_ifindex = prog->prog_ifindex;
if (prog->type == BPF_PROG_TYPE_TRACING) {
load_attr.attach_prog_fd = prog->attach_prog_fd;
load_attr.attach_btf_id = prog->attach_btf_id;
} else {
load_attr.kern_version = kern_version;
load_attr.prog_ifindex = prog->prog_ifindex;
}
/* if .BTF.ext was loaded, kernel supports associated BTF for prog */
if (prog->obj->btf_ext)
btf_fd = bpf_object__btf_fd(prog->obj);
@ -3582,7 +3737,6 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
load_attr.line_info_cnt = prog->line_info_cnt;
load_attr.log_level = prog->log_level;
load_attr.prog_flags = prog->prog_flags;
load_attr.attach_btf_id = prog->attach_btf_id;
retry_load:
log_buf = malloc(log_buf_size);
@ -3604,7 +3758,7 @@ retry_load:
free(log_buf);
goto retry_load;
}
ret = -LIBBPF_ERRNO__LOAD;
ret = -errno;
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warn("load bpf program failed: %s\n", cp);
@ -3617,23 +3771,18 @@ retry_load:
pr_warn("Program too large (%zu insns), at most %d insns\n",
load_attr.insns_cnt, BPF_MAXINSNS);
ret = -LIBBPF_ERRNO__PROG2BIG;
} else {
} else if (load_attr.prog_type != BPF_PROG_TYPE_KPROBE) {
/* Wrong program type? */
if (load_attr.prog_type != BPF_PROG_TYPE_KPROBE) {
int fd;
int fd;
load_attr.prog_type = BPF_PROG_TYPE_KPROBE;
load_attr.expected_attach_type = 0;
fd = bpf_load_program_xattr(&load_attr, NULL, 0);
if (fd >= 0) {
close(fd);
ret = -LIBBPF_ERRNO__PROGTYPE;
goto out;
}
load_attr.prog_type = BPF_PROG_TYPE_KPROBE;
load_attr.expected_attach_type = 0;
fd = bpf_load_program_xattr(&load_attr, NULL, 0);
if (fd >= 0) {
close(fd);
ret = -LIBBPF_ERRNO__PROGTYPE;
goto out;
}
if (log_buf)
ret = -LIBBPF_ERRNO__KVER;
}
out:
@ -3744,8 +3893,9 @@ bpf_object__load_progs(struct bpf_object *obj, int log_level)
return 0;
}
static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id);
static int libbpf_find_attach_btf_id(const char *name,
enum bpf_attach_type attach_type,
__u32 attach_prog_fd);
static struct bpf_object *
__bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
struct bpf_object_open_opts *opts)
@ -3756,6 +3906,7 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
const char *obj_name;
char tmp_name[64];
bool relaxed_maps;
__u32 attach_prog_fd;
int err;
if (elf_version(EV_CURRENT) == EV_NONE) {
@ -3786,6 +3937,7 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
obj->relaxed_core_relocs = OPTS_GET(opts, relaxed_core_relocs, false);
relaxed_maps = OPTS_GET(opts, relaxed_maps, false);
pin_root_path = OPTS_GET(opts, pin_root_path, NULL);
attach_prog_fd = OPTS_GET(opts, attach_prog_fd, 0);
CHECK_ERR(bpf_object__elf_init(obj), err, out);
CHECK_ERR(bpf_object__check_endianness(obj), err, out);
@ -3798,7 +3950,6 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
bpf_object__for_each_program(prog, obj) {
enum bpf_prog_type prog_type;
enum bpf_attach_type attach_type;
__u32 btf_id;
err = libbpf_prog_type_by_name(prog->section_name, &prog_type,
&attach_type);
@ -3811,10 +3962,13 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
bpf_program__set_type(prog, prog_type);
bpf_program__set_expected_attach_type(prog, attach_type);
if (prog_type == BPF_PROG_TYPE_TRACING) {
err = libbpf_attach_btf_id_by_name(prog->section_name, &btf_id);
if (err)
err = libbpf_find_attach_btf_id(prog->section_name,
attach_type,
attach_prog_fd);
if (err <= 0)
goto out;
prog->attach_btf_id = btf_id;
prog->attach_btf_id = err;
prog->attach_prog_fd = attach_prog_fd;
}
}
@ -3911,7 +4065,7 @@ int bpf_object__unload(struct bpf_object *obj)
int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
{
struct bpf_object *obj;
int err;
int err, i;
if (!attr)
return -EINVAL;
@ -3932,6 +4086,11 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
return 0;
out:
/* unpin any maps that were auto-pinned during load */
for (i = 0; i < obj->nr_maps; i++)
if (obj->maps[i].pinned && !obj->maps[i].reused)
bpf_map__unpin(&obj->maps[i], NULL);
bpf_object__unload(obj);
pr_warn("failed to load object '%s'\n", obj->path);
return err;
@ -4665,6 +4824,11 @@ int bpf_program__fd(const struct bpf_program *prog)
return bpf_program__nth_fd(prog, 0);
}
size_t bpf_program__size(const struct bpf_program *prog)
{
return prog->insns_cnt * sizeof(struct bpf_insn);
}
int bpf_program__set_prep(struct bpf_program *prog, int nr_instances,
bpf_program_prep_t prep)
{
@ -4813,6 +4977,10 @@ static const struct {
BPF_PROG_SEC("raw_tp/", BPF_PROG_TYPE_RAW_TRACEPOINT),
BPF_PROG_BTF("tp_btf/", BPF_PROG_TYPE_TRACING,
BPF_TRACE_RAW_TP),
BPF_PROG_BTF("fentry/", BPF_PROG_TYPE_TRACING,
BPF_TRACE_FENTRY),
BPF_PROG_BTF("fexit/", BPF_PROG_TYPE_TRACING,
BPF_TRACE_FEXIT),
BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT),
BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN),
@ -4930,43 +5098,94 @@ int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
}
#define BTF_PREFIX "btf_trace_"
static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id)
int libbpf_find_vmlinux_btf_id(const char *name,
enum bpf_attach_type attach_type)
{
struct btf *btf = bpf_core_find_kernel_btf();
char raw_tp_btf_name[128] = BTF_PREFIX;
char *dst = raw_tp_btf_name + sizeof(BTF_PREFIX) - 1;
int ret, i, err = -EINVAL;
char raw_tp_btf[128] = BTF_PREFIX;
char *dst = raw_tp_btf + sizeof(BTF_PREFIX) - 1;
const char *btf_name;
int err = -EINVAL;
u32 kind;
if (IS_ERR(btf)) {
pr_warn("vmlinux BTF is not found\n");
return -EINVAL;
}
if (!name)
if (attach_type == BPF_TRACE_RAW_TP) {
/* prepend "btf_trace_" prefix per kernel convention */
strncat(dst, name, sizeof(raw_tp_btf) - sizeof(BTF_PREFIX));
btf_name = raw_tp_btf;
kind = BTF_KIND_TYPEDEF;
} else {
btf_name = name;
kind = BTF_KIND_FUNC;
}
err = btf__find_by_name_kind(btf, btf_name, kind);
btf__free(btf);
return err;
}
static int libbpf_find_prog_btf_id(const char *name, __u32 attach_prog_fd)
{
struct bpf_prog_info_linear *info_linear;
struct bpf_prog_info *info;
struct btf *btf = NULL;
int err = -EINVAL;
info_linear = bpf_program__get_prog_info_linear(attach_prog_fd, 0);
if (IS_ERR_OR_NULL(info_linear)) {
pr_warn("failed get_prog_info_linear for FD %d\n",
attach_prog_fd);
return -EINVAL;
}
info = &info_linear->info;
if (!info->btf_id) {
pr_warn("The target program doesn't have BTF\n");
goto out;
}
if (btf__get_from_id(info->btf_id, &btf)) {
pr_warn("Failed to get BTF of the program\n");
goto out;
}
err = btf__find_by_name_kind(btf, name, BTF_KIND_FUNC);
btf__free(btf);
if (err <= 0) {
pr_warn("%s is not found in prog's BTF\n", name);
goto out;
}
out:
free(info_linear);
return err;
}
static int libbpf_find_attach_btf_id(const char *name,
enum bpf_attach_type attach_type,
__u32 attach_prog_fd)
{
int i, err;
if (!name)
return -EINVAL;
for (i = 0; i < ARRAY_SIZE(section_names); i++) {
if (!section_names[i].is_attach_btf)
continue;
if (strncmp(name, section_names[i].sec, section_names[i].len))
continue;
/* prepend "btf_trace_" prefix per kernel convention */
strncat(dst, name + section_names[i].len,
sizeof(raw_tp_btf_name) - sizeof(BTF_PREFIX));
ret = btf__find_by_name(btf, raw_tp_btf_name);
if (ret <= 0) {
pr_warn("%s is not found in vmlinux BTF\n", dst);
goto out;
}
*btf_id = ret;
err = 0;
goto out;
if (attach_prog_fd)
err = libbpf_find_prog_btf_id(name + section_names[i].len,
attach_prog_fd);
else
err = libbpf_find_vmlinux_btf_id(name + section_names[i].len,
attach_type);
if (err <= 0)
pr_warn("%s is not found in vmlinux BTF\n", name);
return err;
}
pr_warn("failed to identify btf_id based on ELF section name '%s'\n", name);
err = -ESRCH;
out:
btf__free(btf);
return err;
return -ESRCH;
}
int libbpf_attach_type_by_name(const char *name,
@ -5594,6 +5813,37 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
return (struct bpf_link *)link;
}
struct bpf_link *bpf_program__attach_trace(struct bpf_program *prog)
{
char errmsg[STRERR_BUFSIZE];
struct bpf_link_fd *link;
int prog_fd, pfd;
prog_fd = bpf_program__fd(prog);
if (prog_fd < 0) {
pr_warn("program '%s': can't attach before loaded\n",
bpf_program__title(prog, false));
return ERR_PTR(-EINVAL);
}
link = malloc(sizeof(*link));
if (!link)
return ERR_PTR(-ENOMEM);
link->link.destroy = &bpf_link__destroy_fd;
pfd = bpf_raw_tracepoint_open(NULL, prog_fd);
if (pfd < 0) {
pfd = -errno;
free(link);
pr_warn("program '%s': failed to attach to trace: %s\n",
bpf_program__title(prog, false),
libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
return ERR_PTR(pfd);
}
link->fd = pfd;
return (struct bpf_link *)link;
}
enum bpf_perf_event_ret
bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
void **copy_mem, size_t *copy_size,
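
Putting the libbpf pieces together, a condensed sketch of consuming the new fentry/fexit support from an application could look like this (the object file name is illustrative, cleanup and detailed error handling are trimmed; the section prefix is resolved to a vmlinux BTF id during open, as implemented above):

#include <bpf/libbpf.h>

int attach_fentry_demo(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	struct bpf_link *link;

	obj = bpf_object__open_file("fentry_demo.o", NULL); /* illustrative */
	if (libbpf_get_error(obj))
		return -1;
	if (bpf_object__load(obj))
		return -1;
	/* SEC("fentry/bpf_fentry_test1") selects BPF_PROG_TYPE_TRACING and
	 * makes libbpf look up the target function in vmlinux BTF
	 */
	prog = bpf_object__find_program_by_title(obj, "fentry/bpf_fentry_test1");
	if (!prog)
		return -1;
	link = bpf_program__attach_trace(prog);
	return libbpf_get_error(link) ? -1 : 0;
}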

View File

@ -108,8 +108,9 @@ struct bpf_object_open_opts {
* auto-pinned to that path on load; defaults to "/sys/fs/bpf".
*/
const char *pin_root_path;
__u32 attach_prog_fd;
};
#define bpf_object_open_opts__last_field pin_root_path
#define bpf_object_open_opts__last_field attach_prog_fd
LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
LIBBPF_API struct bpf_object *
@ -188,6 +189,8 @@ libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
enum bpf_attach_type *expected_attach_type);
LIBBPF_API int libbpf_attach_type_by_name(const char *name,
enum bpf_attach_type *attach_type);
LIBBPF_API int libbpf_find_vmlinux_btf_id(const char *name,
enum bpf_attach_type attach_type);
/* Accessors of bpf_program */
struct bpf_program;
@ -214,6 +217,9 @@ LIBBPF_API void bpf_program__set_ifindex(struct bpf_program *prog,
LIBBPF_API const char *bpf_program__title(const struct bpf_program *prog,
bool needs_copy);
/* returns program size in bytes */
LIBBPF_API size_t bpf_program__size(const struct bpf_program *prog);
LIBBPF_API int bpf_program__load(struct bpf_program *prog, char *license,
__u32 kern_version);
LIBBPF_API int bpf_program__fd(const struct bpf_program *prog);
@ -248,6 +254,8 @@ LIBBPF_API struct bpf_link *
bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
const char *tp_name);
LIBBPF_API struct bpf_link *
bpf_program__attach_trace(struct bpf_program *prog);
struct bpf_insn;
/*
@ -427,8 +435,18 @@ LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
LIBBPF_API int bpf_prog_load(const char *file, enum bpf_prog_type type,
struct bpf_object **pobj, int *prog_fd);
struct xdp_link_info {
__u32 prog_id;
__u32 drv_prog_id;
__u32 hw_prog_id;
__u32 skb_prog_id;
__u8 attach_mode;
};
LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
size_t info_size, __u32 flags);
struct perf_buffer;
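
A short usage sketch for the new query API exported here; the ifindex is whatever interface the caller cares about and error handling is minimal:

#include <stdio.h>
#include <bpf/libbpf.h>

int dump_xdp_state(int ifindex)
{
	struct xdp_link_info info = {};
	int err;

	err = bpf_get_link_xdp_info(ifindex, &info, sizeof(info), 0);
	if (err)
		return err;

	/* attach_mode distinguishes none/skb/drv/hw/multi attachment */
	printf("mode %u prog %u (skb %u drv %u hw %u)\n",
	       info.attach_mode, info.prog_id, info.skb_prog_id,
	       info.drv_prog_id, info.hw_prog_id);
	return 0;
}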

View File

@ -193,13 +193,18 @@ LIBBPF_0.0.5 {
LIBBPF_0.0.6 {
global:
bpf_get_link_xdp_info;
bpf_map__get_pin_path;
bpf_map__is_pinned;
bpf_map__set_pin_path;
bpf_object__open_file;
bpf_object__open_mem;
bpf_program__attach_trace;
bpf_program__get_expected_attach_type;
bpf_program__get_type;
bpf_program__is_tracing;
bpf_program__set_tracing;
bpf_program__size;
btf__find_by_name_kind;
libbpf_find_vmlinux_btf_id;
} LIBBPF_0.0.5;

View File

@ -158,7 +158,11 @@ struct bpf_line_info_min {
*/
enum bpf_field_info_kind {
BPF_FIELD_BYTE_OFFSET = 0, /* field byte offset */
BPF_FIELD_BYTE_SIZE = 1,
BPF_FIELD_EXISTS = 2, /* field existence in target kernel */
BPF_FIELD_SIGNED = 3,
BPF_FIELD_LSHIFT_U64 = 4,
BPF_FIELD_RSHIFT_U64 = 5,
};
/* The minimum bpf_field_reloc checked by the loader

View File

@ -12,6 +12,7 @@
#include "bpf.h"
#include "libbpf.h"
#include "libbpf_internal.h"
#include "nlattr.h"
#ifndef SOL_NETLINK
@ -24,7 +25,7 @@ typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t,
struct xdp_id_md {
int ifindex;
__u32 flags;
__u32 id;
struct xdp_link_info info;
};
int libbpf_netlink_open(__u32 *nl_pid)
@ -43,7 +44,7 @@ int libbpf_netlink_open(__u32 *nl_pid)
if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
&one, sizeof(one)) < 0) {
fprintf(stderr, "Netlink error reporting not supported\n");
pr_warn("Netlink error reporting not supported\n");
}
if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
@ -202,26 +203,11 @@ static int __dump_link_nlmsg(struct nlmsghdr *nlh,
return dump_link_nlmsg(cookie, ifi, tb);
}
static unsigned char get_xdp_id_attr(unsigned char mode, __u32 flags)
{
if (mode != XDP_ATTACHED_MULTI)
return IFLA_XDP_PROG_ID;
if (flags & XDP_FLAGS_DRV_MODE)
return IFLA_XDP_DRV_PROG_ID;
if (flags & XDP_FLAGS_HW_MODE)
return IFLA_XDP_HW_PROG_ID;
if (flags & XDP_FLAGS_SKB_MODE)
return IFLA_XDP_SKB_PROG_ID;
return IFLA_XDP_UNSPEC;
}
static int get_xdp_id(void *cookie, void *msg, struct nlattr **tb)
static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
{
struct nlattr *xdp_tb[IFLA_XDP_MAX + 1];
struct xdp_id_md *xdp_id = cookie;
struct ifinfomsg *ifinfo = msg;
unsigned char mode, xdp_attr;
int ret;
if (xdp_id->ifindex && xdp_id->ifindex != ifinfo->ifi_index)
@ -237,27 +223,40 @@ static int get_xdp_id(void *cookie, void *msg, struct nlattr **tb)
if (!xdp_tb[IFLA_XDP_ATTACHED])
return 0;
mode = libbpf_nla_getattr_u8(xdp_tb[IFLA_XDP_ATTACHED]);
if (mode == XDP_ATTACHED_NONE)
xdp_id->info.attach_mode = libbpf_nla_getattr_u8(
xdp_tb[IFLA_XDP_ATTACHED]);
if (xdp_id->info.attach_mode == XDP_ATTACHED_NONE)
return 0;
xdp_attr = get_xdp_id_attr(mode, xdp_id->flags);
if (!xdp_attr || !xdp_tb[xdp_attr])
return 0;
if (xdp_tb[IFLA_XDP_PROG_ID])
xdp_id->info.prog_id = libbpf_nla_getattr_u32(
xdp_tb[IFLA_XDP_PROG_ID]);
xdp_id->id = libbpf_nla_getattr_u32(xdp_tb[xdp_attr]);
if (xdp_tb[IFLA_XDP_SKB_PROG_ID])
xdp_id->info.skb_prog_id = libbpf_nla_getattr_u32(
xdp_tb[IFLA_XDP_SKB_PROG_ID]);
if (xdp_tb[IFLA_XDP_DRV_PROG_ID])
xdp_id->info.drv_prog_id = libbpf_nla_getattr_u32(
xdp_tb[IFLA_XDP_DRV_PROG_ID]);
if (xdp_tb[IFLA_XDP_HW_PROG_ID])
xdp_id->info.hw_prog_id = libbpf_nla_getattr_u32(
xdp_tb[IFLA_XDP_HW_PROG_ID]);
return 0;
}
int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
size_t info_size, __u32 flags)
{
struct xdp_id_md xdp_id = {};
int sock, ret;
__u32 nl_pid;
__u32 mask;
if (flags & ~XDP_FLAGS_MASK)
if (flags & ~XDP_FLAGS_MASK || !info_size)
return -EINVAL;
/* Check whether the single {HW,DRV,SKB} mode is set */
@ -273,14 +272,44 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
xdp_id.ifindex = ifindex;
xdp_id.flags = flags;
ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_id, &xdp_id);
if (!ret)
*prog_id = xdp_id.id;
ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
if (!ret) {
size_t sz = min(info_size, sizeof(xdp_id.info));
memcpy(info, &xdp_id.info, sz);
memset((void *) info + sz, 0, info_size - sz);
}
close(sock);
return ret;
}
static __u32 get_xdp_id(struct xdp_link_info *info, __u32 flags)
{
if (info->attach_mode != XDP_ATTACHED_MULTI)
return info->prog_id;
if (flags & XDP_FLAGS_DRV_MODE)
return info->drv_prog_id;
if (flags & XDP_FLAGS_HW_MODE)
return info->hw_prog_id;
if (flags & XDP_FLAGS_SKB_MODE)
return info->skb_prog_id;
return 0;
}
int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
{
struct xdp_link_info info;
int ret;
ret = bpf_get_link_xdp_info(ifindex, &info, sizeof(info), flags);
if (!ret)
*prog_id = get_xdp_id(&info, flags);
return ret;
}
int libbpf_nl_get_link(int sock, unsigned int nl_pid,
libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
{

View File

@ -8,6 +8,7 @@
#include <errno.h>
#include "nlattr.h"
#include "libbpf_internal.h"
#include <linux/rtnetlink.h>
#include <string.h>
#include <stdio.h>
@ -121,8 +122,8 @@ int libbpf_nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head,
}
if (tb[type])
fprintf(stderr, "Attribute of type %#x found multiple times in message, "
"previous attribute is being ignored.\n", type);
pr_warn("Attribute of type %#x found multiple times in message, "
"previous attribute is being ignored.\n", type);
tb[type] = nla;
}
@ -181,15 +182,14 @@ int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh)
if (libbpf_nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen,
extack_policy) != 0) {
fprintf(stderr,
"Failed to parse extended error attributes\n");
pr_warn("Failed to parse extended error attributes\n");
return 0;
}
if (tb[NLMSGERR_ATTR_MSG])
errmsg = (char *) libbpf_nla_data(tb[NLMSGERR_ATTR_MSG]);
fprintf(stderr, "Kernel error message: %s\n", errmsg);
pr_warn("Kernel error message: %s\n", errmsg);
return 0;
}

View File

@ -431,13 +431,18 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
goto out;
}
if (err || channels.max_combined == 0)
if (err) {
/* If the device says it has no channels, then all traffic
* is sent to a single stream, so max queues = 1.
*/
ret = 1;
else
ret = channels.max_combined;
} else {
/* Take the max of rx, tx, combined. Drivers return
* the number of channels in different ways.
*/
ret = max(channels.max_rx, channels.max_tx);
ret = max(ret, (int)channels.max_combined);
}
out:
close(fd);
@ -553,6 +558,8 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
}
} else {
xsk->prog_fd = bpf_prog_get_fd_by_id(prog_id);
if (xsk->prog_fd < 0)
return -errno;
err = xsk_lookup_bpf_maps(xsk);
if (err) {
close(xsk->prog_fd);
@ -560,7 +567,8 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
}
}
err = xsk_set_bpf_maps(xsk);
if (xsk->rx)
err = xsk_set_bpf_maps(xsk);
if (err) {
xsk_delete_bpf_maps(xsk);
close(xsk->prog_fd);
@ -581,18 +589,24 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
struct xsk_socket *xsk;
int err;
if (!umem || !xsk_ptr || !rx || !tx)
if (!umem || !xsk_ptr || !(rx || tx))
return -EFAULT;
if (umem->refcount) {
pr_warn("Error: shared umems not supported by libbpf.\n");
return -EBUSY;
}
xsk = calloc(1, sizeof(*xsk));
if (!xsk)
return -ENOMEM;
err = xsk_set_xdp_socket_config(&xsk->config, usr_config);
if (err)
goto out_xsk_alloc;
if (umem->refcount &&
!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
pr_warn("Error: shared umems not supported by libbpf supplied XDP program.\n");
err = -EBUSY;
goto out_xsk_alloc;
}
if (umem->refcount++ > 0) {
xsk->fd = socket(AF_XDP, SOCK_RAW, 0);
if (xsk->fd < 0) {
@ -614,10 +628,6 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
memcpy(xsk->ifname, ifname, IFNAMSIZ - 1);
xsk->ifname[IFNAMSIZ - 1] = '\0';
err = xsk_set_xdp_socket_config(&xsk->config, usr_config);
if (err)
goto out_socket;
if (rx) {
err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING,
&xsk->config.rx_size,
@ -685,7 +695,12 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
sxdp.sxdp_family = PF_XDP;
sxdp.sxdp_ifindex = xsk->ifindex;
sxdp.sxdp_queue_id = xsk->queue_id;
sxdp.sxdp_flags = xsk->config.bind_flags;
if (umem->refcount > 1) {
sxdp.sxdp_flags = XDP_SHARED_UMEM;
sxdp.sxdp_shared_umem_fd = umem->fd;
} else {
sxdp.sxdp_flags = xsk->config.bind_flags;
}
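
As a rough sketch of the shared umem path enabled by this bind logic (umem and ring setup omitted, names illustrative): a second socket reuses an already registered umem, typically on the same interface and queue as the socket that registered it, and must inhibit libbpf's default XDP program since redirection then becomes the caller's responsibility.

#include <bpf/xsk.h>

int add_shared_socket(struct xsk_umem *umem, const char *ifname, __u32 queue,
		      struct xsk_ring_cons *rx2, struct xsk_ring_prod *tx2,
		      struct xsk_socket **out)
{
	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		/* sharing requires the caller to supply its own XDP program */
		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
	};

	/* rx2 or tx2 may be NULL now that only one of RX/TX is required */
	return xsk_socket__create(out, ifname, queue, umem, rx2, tx2, &cfg);
}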
err = bind(xsk->fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
if (err) {

View File

@ -30,7 +30,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \
test_cgroup_storage test_select_reuseport \
test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
test_cgroup_attach xdping test_progs-no_alu32
test_cgroup_attach test_progs-no_alu32
# Also test bpf-gcc, if present
ifneq ($(BPF_GCC),)
@ -38,7 +38,8 @@ TEST_GEN_PROGS += test_progs-bpf_gcc
endif
TEST_GEN_FILES =
TEST_FILES =
TEST_FILES = test_lwt_ip_encap.o \
test_tc_edt.o
# Order correspond to 'make run_tests' order
TEST_PROGS := test_kmod.sh \
@ -70,7 +71,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \
# Compile but not part of 'make run_tests'
TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
test_lirc_mode2_user
test_lirc_mode2_user xdping
TEST_CUSTOM_PROGS = urandom_read
@ -162,6 +163,12 @@ define CLANG_BPF_BUILD_RULE
-c $1 -o - || echo "BPF obj compilation failed") | \
$(LLC) -march=bpf -mcpu=probe $4 -filetype=obj -o $2
endef
# Similar to CLANG_BPF_BUILD_RULE, but with disabled alu32
define CLANG_NOALU32_BPF_BUILD_RULE
($(CLANG) $3 -O2 -target bpf -emit-llvm \
-c $1 -o - || echo "BPF obj compilation failed") | \
$(LLC) -march=bpf -mcpu=v2 $4 -filetype=obj -o $2
endef
# Similar to CLANG_BPF_BUILD_RULE, but using native Clang and bpf LLC
define CLANG_NATIVE_BPF_BUILD_RULE
($(CLANG) $3 -O2 -emit-llvm \
@ -274,6 +281,7 @@ TRUNNER_BPF_LDFLAGS := -mattr=+alu32
$(eval $(call DEFINE_TEST_RUNNER,test_progs))
# Define test_progs-no_alu32 test runner.
TRUNNER_BPF_BUILD_RULE := CLANG_NOALU32_BPF_BUILD_RULE
TRUNNER_BPF_LDFLAGS :=
$(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))

View File

@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include "progs/core_reloc_types.h"
#include <sys/mman.h>
#define STRUCT_TO_CHAR_PTR(struct_name) (const char *)&(struct struct_name)
@ -174,21 +175,82 @@
.fails = true, \
}
#define EXISTENCE_DATA(struct_name) STRUCT_TO_CHAR_PTR(struct_name) { \
.a = 42, \
}
#define EXISTENCE_CASE_COMMON(name) \
.case_name = #name, \
.bpf_obj_file = "test_core_reloc_existence.o", \
.btf_src_file = "btf__core_reloc_" #name ".o", \
.relaxed_core_relocs = true \
.relaxed_core_relocs = true
#define EXISTENCE_ERR_CASE(name) { \
EXISTENCE_CASE_COMMON(name), \
.fails = true, \
}
#define BITFIELDS_CASE_COMMON(objfile, test_name_prefix, name) \
.case_name = test_name_prefix#name, \
.bpf_obj_file = objfile, \
.btf_src_file = "btf__core_reloc_" #name ".o"
#define BITFIELDS_CASE(name, ...) { \
BITFIELDS_CASE_COMMON("test_core_reloc_bitfields_probed.o", \
"direct:", name), \
.input = STRUCT_TO_CHAR_PTR(core_reloc_##name) __VA_ARGS__, \
.input_len = sizeof(struct core_reloc_##name), \
.output = STRUCT_TO_CHAR_PTR(core_reloc_bitfields_output) \
__VA_ARGS__, \
.output_len = sizeof(struct core_reloc_bitfields_output), \
}, { \
BITFIELDS_CASE_COMMON("test_core_reloc_bitfields_direct.o", \
"probed:", name), \
.input = STRUCT_TO_CHAR_PTR(core_reloc_##name) __VA_ARGS__, \
.input_len = sizeof(struct core_reloc_##name), \
.output = STRUCT_TO_CHAR_PTR(core_reloc_bitfields_output) \
__VA_ARGS__, \
.output_len = sizeof(struct core_reloc_bitfields_output), \
.direct_raw_tp = true, \
}
#define BITFIELDS_ERR_CASE(name) { \
BITFIELDS_CASE_COMMON("test_core_reloc_bitfields_probed.o", \
"probed:", name), \
.fails = true, \
}, { \
BITFIELDS_CASE_COMMON("test_core_reloc_bitfields_direct.o", \
"direct:", name), \
.direct_raw_tp = true, \
.fails = true, \
}
#define SIZE_CASE_COMMON(name) \
.case_name = #name, \
.bpf_obj_file = "test_core_reloc_size.o", \
.btf_src_file = "btf__core_reloc_" #name ".o", \
.relaxed_core_relocs = true
#define SIZE_OUTPUT_DATA(type) \
STRUCT_TO_CHAR_PTR(core_reloc_size_output) { \
.int_sz = sizeof(((type *)0)->int_field), \
.struct_sz = sizeof(((type *)0)->struct_field), \
.union_sz = sizeof(((type *)0)->union_field), \
.arr_sz = sizeof(((type *)0)->arr_field), \
.arr_elem_sz = sizeof(((type *)0)->arr_field[0]), \
.ptr_sz = sizeof(((type *)0)->ptr_field), \
.enum_sz = sizeof(((type *)0)->enum_field), \
}
#define SIZE_CASE(name) { \
SIZE_CASE_COMMON(name), \
.input_len = 0, \
.output = SIZE_OUTPUT_DATA(struct core_reloc_##name), \
.output_len = sizeof(struct core_reloc_size_output), \
}
#define SIZE_ERR_CASE(name) { \
SIZE_CASE_COMMON(name), \
.fails = true, \
}
struct core_reloc_test_case {
const char *case_name;
const char *bpf_obj_file;
@ -199,6 +261,7 @@ struct core_reloc_test_case {
int output_len;
bool fails;
bool relaxed_core_relocs;
bool direct_raw_tp;
};
static struct core_reloc_test_case test_cases[] = {
@ -275,12 +338,6 @@ static struct core_reloc_test_case test_cases[] = {
INTS_CASE(ints___bool),
INTS_CASE(ints___reverse_sign),
INTS_ERR_CASE(ints___err_bitfield),
INTS_ERR_CASE(ints___err_wrong_sz_8),
INTS_ERR_CASE(ints___err_wrong_sz_16),
INTS_ERR_CASE(ints___err_wrong_sz_32),
INTS_ERR_CASE(ints___err_wrong_sz_64),
/* validate edge cases of capturing relocations */
{
.case_name = "misc",
@ -352,6 +409,44 @@ static struct core_reloc_test_case test_cases[] = {
EXISTENCE_ERR_CASE(existence__err_arr_kind),
EXISTENCE_ERR_CASE(existence__err_arr_value_type),
EXISTENCE_ERR_CASE(existence__err_struct_type),
/* bitfield relocation checks */
BITFIELDS_CASE(bitfields, {
.ub1 = 1,
.ub2 = 2,
.ub7 = 96,
.sb4 = -7,
.sb20 = -0x76543,
.u32 = 0x80000000,
.s32 = -0x76543210,
}),
BITFIELDS_CASE(bitfields___bit_sz_change, {
.ub1 = 6,
.ub2 = 0xABCDE,
.ub7 = 1,
.sb4 = -1,
.sb20 = -0x17654321,
.u32 = 0xBEEF,
.s32 = -0x3FEDCBA987654321,
}),
BITFIELDS_CASE(bitfields___bitfield_vs_int, {
.ub1 = 0xFEDCBA9876543210,
.ub2 = 0xA6,
.ub7 = -0x7EDCBA987654321,
.sb4 = -0x6123456789ABCDE,
.sb20 = 0xD00D,
.u32 = -0x76543,
.s32 = 0x0ADEADBEEFBADB0B,
}),
BITFIELDS_CASE(bitfields___just_big_enough, {
.ub1 = 0xF,
.ub2 = 0x0812345678FEDCBA,
}),
BITFIELDS_ERR_CASE(bitfields___err_too_big_bitfield),
/* size relocation checks */
SIZE_CASE(size),
SIZE_CASE(size___diff_sz),
};
struct data {
@ -359,18 +454,25 @@ struct data {
char out[256];
};
static size_t roundup_page(size_t sz)
{
long page_size = sysconf(_SC_PAGE_SIZE);
return (sz + page_size - 1) / page_size * page_size;
}
void test_core_reloc(void)
{
const char *probe_name = "raw_tracepoint/sys_enter";
const size_t mmap_sz = roundup_page(sizeof(struct data));
struct bpf_object_load_attr load_attr = {};
struct core_reloc_test_case *test_case;
const char *tp_name, *probe_name;
int err, duration = 0, i, equal;
struct bpf_link *link = NULL;
struct bpf_map *data_map;
struct bpf_program *prog;
struct bpf_object *obj;
const int zero = 0;
struct data data;
struct data *data;
void *mmap_data = NULL;
for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
test_case = &test_cases[i];
@ -382,11 +484,19 @@ void test_core_reloc(void)
);
obj = bpf_object__open_file(test_case->bpf_obj_file, &opts);
if (CHECK(IS_ERR_OR_NULL(obj), "obj_open",
"failed to open '%s': %ld\n",
if (CHECK(IS_ERR(obj), "obj_open", "failed to open '%s': %ld\n",
test_case->bpf_obj_file, PTR_ERR(obj)))
continue;
/* for typed raw tracepoints, NULL should be specified */
if (test_case->direct_raw_tp) {
probe_name = "tp_btf/sys_enter";
tp_name = NULL;
} else {
probe_name = "raw_tracepoint/sys_enter";
tp_name = "sys_enter";
}
prog = bpf_object__find_program_by_title(obj, probe_name);
if (CHECK(!prog, "find_probe",
"prog '%s' not found\n", probe_name))
@ -407,7 +517,7 @@ void test_core_reloc(void)
goto cleanup;
}
link = bpf_program__attach_raw_tracepoint(prog, "sys_enter");
link = bpf_program__attach_raw_tracepoint(prog, tp_name);
if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n",
PTR_ERR(link)))
goto cleanup;
@ -416,24 +526,22 @@ void test_core_reloc(void)
if (CHECK(!data_map, "find_data_map", "data map not found\n"))
goto cleanup;
memset(&data, 0, sizeof(data));
memcpy(data.in, test_case->input, test_case->input_len);
err = bpf_map_update_elem(bpf_map__fd(data_map),
&zero, &data, 0);
if (CHECK(err, "update_data_map",
"failed to update .data map: %d\n", err))
mmap_data = mmap(NULL, mmap_sz, PROT_READ | PROT_WRITE,
MAP_SHARED, bpf_map__fd(data_map), 0);
if (CHECK(mmap_data == MAP_FAILED, "mmap",
".bss mmap failed: %d", errno)) {
mmap_data = NULL;
goto cleanup;
}
data = mmap_data;
memset(mmap_data, 0, sizeof(*data));
memcpy(data->in, test_case->input, test_case->input_len);
/* trigger test run */
usleep(1);
err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, &data);
if (CHECK(err, "get_result",
"failed to get output data: %d\n", err))
goto cleanup;
equal = memcmp(data.out, test_case->output,
equal = memcmp(data->out, test_case->output,
test_case->output_len) == 0;
if (CHECK(!equal, "check_result",
"input/output data don't match\n")) {
@ -445,12 +553,16 @@ void test_core_reloc(void)
}
for (j = 0; j < test_case->output_len; j++) {
printf("output byte #%d: EXP 0x%02hhx GOT 0x%02hhx\n",
j, test_case->output[j], data.out[j]);
j, test_case->output[j], data->out[j]);
}
goto cleanup;
}
cleanup:
if (mmap_data) {
CHECK_FAIL(munmap(mmap_data, mmap_sz));
mmap_data = NULL;
}
if (!IS_ERR_OR_NULL(link)) {
bpf_link__destroy(link);
link = NULL;

View File

@ -0,0 +1,90 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
void test_fentry_fexit(void)
{
struct bpf_prog_load_attr attr_fentry = {
.file = "./fentry_test.o",
};
struct bpf_prog_load_attr attr_fexit = {
.file = "./fexit_test.o",
};
struct bpf_object *obj_fentry = NULL, *obj_fexit = NULL, *pkt_obj;
struct bpf_map *data_map_fentry, *data_map_fexit;
char fentry_name[] = "fentry/bpf_fentry_testX";
char fexit_name[] = "fexit/bpf_fentry_testX";
int err, pkt_fd, kfree_skb_fd, i;
struct bpf_link *link[12] = {};
struct bpf_program *prog[12];
__u32 duration, retval;
const int zero = 0;
u64 result[12];
err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS,
&pkt_obj, &pkt_fd);
if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
return;
err = bpf_prog_load_xattr(&attr_fentry, &obj_fentry, &kfree_skb_fd);
if (CHECK(err, "prog_load fail", "err %d errno %d\n", err, errno))
goto close_prog;
err = bpf_prog_load_xattr(&attr_fexit, &obj_fexit, &kfree_skb_fd);
if (CHECK(err, "prog_load fail", "err %d errno %d\n", err, errno))
goto close_prog;
for (i = 0; i < 6; i++) {
fentry_name[sizeof(fentry_name) - 2] = '1' + i;
prog[i] = bpf_object__find_program_by_title(obj_fentry, fentry_name);
if (CHECK(!prog[i], "find_prog", "prog %s not found\n", fentry_name))
goto close_prog;
link[i] = bpf_program__attach_trace(prog[i]);
if (CHECK(IS_ERR(link[i]), "attach_trace", "failed to link\n"))
goto close_prog;
}
data_map_fentry = bpf_object__find_map_by_name(obj_fentry, "fentry_t.bss");
if (CHECK(!data_map_fentry, "find_data_map", "data map not found\n"))
goto close_prog;
for (i = 6; i < 12; i++) {
fexit_name[sizeof(fexit_name) - 2] = '1' + i - 6;
prog[i] = bpf_object__find_program_by_title(obj_fexit, fexit_name);
if (CHECK(!prog[i], "find_prog", "prog %s not found\n", fexit_name))
goto close_prog;
link[i] = bpf_program__attach_trace(prog[i]);
if (CHECK(IS_ERR(link[i]), "attach_trace", "failed to link\n"))
goto close_prog;
}
data_map_fexit = bpf_object__find_map_by_name(obj_fexit, "fexit_te.bss");
if (CHECK(!data_map_fexit, "find_data_map", "data map not found\n"))
goto close_prog;
err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
NULL, NULL, &retval, &duration);
CHECK(err || retval, "ipv6",
"err %d errno %d retval %d duration %d\n",
err, errno, retval, duration);
err = bpf_map_lookup_elem(bpf_map__fd(data_map_fentry), &zero, &result);
if (CHECK(err, "get_result",
"failed to get output data: %d\n", err))
goto close_prog;
err = bpf_map_lookup_elem(bpf_map__fd(data_map_fexit), &zero, result + 6);
if (CHECK(err, "get_result",
"failed to get output data: %d\n", err))
goto close_prog;
for (i = 0; i < 12; i++)
if (CHECK(result[i] != 1, "result", "bpf_fentry_test%d failed err %ld\n",
i % 6 + 1, result[i]))
goto close_prog;
close_prog:
for (i = 0; i < 12; i++)
if (!IS_ERR_OR_NULL(link[i]))
bpf_link__destroy(link[i]);
bpf_object__close(obj_fentry);
bpf_object__close(obj_fexit);
bpf_object__close(pkt_obj);
}

View File

@ -0,0 +1,64 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
void test_fentry_test(void)
{
struct bpf_prog_load_attr attr = {
.file = "./fentry_test.o",
};
char prog_name[] = "fentry/bpf_fentry_testX";
struct bpf_object *obj = NULL, *pkt_obj;
int err, pkt_fd, kfree_skb_fd, i;
struct bpf_link *link[6] = {};
struct bpf_program *prog[6];
__u32 duration, retval;
struct bpf_map *data_map;
const int zero = 0;
u64 result[6];
err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS,
&pkt_obj, &pkt_fd);
if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
return;
err = bpf_prog_load_xattr(&attr, &obj, &kfree_skb_fd);
if (CHECK(err, "prog_load fail", "err %d errno %d\n", err, errno))
goto close_prog;
for (i = 0; i < 6; i++) {
prog_name[sizeof(prog_name) - 2] = '1' + i;
prog[i] = bpf_object__find_program_by_title(obj, prog_name);
if (CHECK(!prog[i], "find_prog", "prog %s not found\n", prog_name))
goto close_prog;
link[i] = bpf_program__attach_trace(prog[i]);
if (CHECK(IS_ERR(link[i]), "attach_trace", "failed to link\n"))
goto close_prog;
}
data_map = bpf_object__find_map_by_name(obj, "fentry_t.bss");
if (CHECK(!data_map, "find_data_map", "data map not found\n"))
goto close_prog;
err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
NULL, NULL, &retval, &duration);
CHECK(err || retval, "ipv6",
"err %d errno %d retval %d duration %d\n",
err, errno, retval, duration);
err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, &result);
if (CHECK(err, "get_result",
"failed to get output data: %d\n", err))
goto close_prog;
for (i = 0; i < 6; i++)
if (CHECK(result[i] != 1, "result", "bpf_fentry_test%d failed err %ld\n",
i + 1, result[i]))
goto close_prog;
close_prog:
for (i = 0; i < 6; i++)
if (!IS_ERR_OR_NULL(link[i]))
bpf_link__destroy(link[i]);
bpf_object__close(obj);
bpf_object__close(pkt_obj);
}

View File

@ -0,0 +1,76 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
#define PROG_CNT 3
void test_fexit_bpf2bpf(void)
{
const char *prog_name[PROG_CNT] = {
"fexit/test_pkt_access",
"fexit/test_pkt_access_subprog1",
"fexit/test_pkt_access_subprog2",
};
struct bpf_object *obj = NULL, *pkt_obj;
int err, pkt_fd, i;
struct bpf_link *link[PROG_CNT] = {};
struct bpf_program *prog[PROG_CNT];
__u32 duration, retval;
struct bpf_map *data_map;
const int zero = 0;
u64 result[PROG_CNT];
err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_UNSPEC,
&pkt_obj, &pkt_fd);
if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
return;
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
.attach_prog_fd = pkt_fd,
);
obj = bpf_object__open_file("./fexit_bpf2bpf.o", &opts);
if (CHECK(IS_ERR_OR_NULL(obj), "obj_open",
"failed to open fexit_bpf2bpf: %ld\n",
PTR_ERR(obj)))
goto close_prog;
err = bpf_object__load(obj);
if (CHECK(err, "obj_load", "err %d\n", err))
goto close_prog;
for (i = 0; i < PROG_CNT; i++) {
prog[i] = bpf_object__find_program_by_title(obj, prog_name[i]);
if (CHECK(!prog[i], "find_prog", "prog %s not found\n", prog_name[i]))
goto close_prog;
link[i] = bpf_program__attach_trace(prog[i]);
if (CHECK(IS_ERR(link[i]), "attach_trace", "failed to link\n"))
goto close_prog;
}
data_map = bpf_object__find_map_by_name(obj, "fexit_bp.bss");
if (CHECK(!data_map, "find_data_map", "data map not found\n"))
goto close_prog;
err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
NULL, NULL, &retval, &duration);
CHECK(err || retval, "ipv6",
"err %d errno %d retval %d duration %d\n",
err, errno, retval, duration);
err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, &result);
if (CHECK(err, "get_result",
"failed to get output data: %d\n", err))
goto close_prog;
for (i = 0; i < PROG_CNT; i++)
if (CHECK(result[i] != 1, "result", "fexit_bpf2bpf failed err %ld\n",
result[i]))
goto close_prog;
close_prog:
for (i = 0; i < PROG_CNT; i++)
if (!IS_ERR_OR_NULL(link[i]))
bpf_link__destroy(link[i]);
if (!IS_ERR_OR_NULL(obj))
bpf_object__close(obj);
bpf_object__close(pkt_obj);
}

View File

@ -0,0 +1,76 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
/* x86-64 fits 55 JITed and 43 interpreted progs into half page */
#define CNT 40
void test_fexit_stress(void)
{
char test_skb[128] = {};
int fexit_fd[CNT] = {};
int link_fd[CNT] = {};
__u32 duration = 0;
char error[4096];
__u32 prog_ret;
int err, i, filter_fd;
const struct bpf_insn trace_program[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
struct bpf_load_program_attr load_attr = {
.prog_type = BPF_PROG_TYPE_TRACING,
.license = "GPL",
.insns = trace_program,
.insns_cnt = sizeof(trace_program) / sizeof(struct bpf_insn),
.expected_attach_type = BPF_TRACE_FEXIT,
};
const struct bpf_insn skb_program[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
struct bpf_load_program_attr skb_load_attr = {
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.license = "GPL",
.insns = skb_program,
.insns_cnt = sizeof(skb_program) / sizeof(struct bpf_insn),
};
err = libbpf_find_vmlinux_btf_id("bpf_fentry_test1",
load_attr.expected_attach_type);
if (CHECK(err <= 0, "find_vmlinux_btf_id", "failed: %d\n", err))
goto out;
load_attr.attach_btf_id = err;
for (i = 0; i < CNT; i++) {
fexit_fd[i] = bpf_load_program_xattr(&load_attr, error, sizeof(error));
if (CHECK(fexit_fd[i] < 0, "fexit loaded",
"failed: %d errno %d\n", fexit_fd[i], errno))
goto out;
link_fd[i] = bpf_raw_tracepoint_open(NULL, fexit_fd[i]);
if (CHECK(link_fd[i] < 0, "fexit attach failed",
"prog %d failed: %d err %d\n", i, link_fd[i], errno))
goto out;
}
filter_fd = bpf_load_program_xattr(&skb_load_attr, error, sizeof(error));
if (CHECK(filter_fd < 0, "test_program_loaded", "failed: %d errno %d\n",
filter_fd, errno))
goto out;
err = bpf_prog_test_run(filter_fd, 1, test_skb, sizeof(test_skb), 0,
0, &prog_ret, 0);
close(filter_fd);
CHECK_FAIL(err);
out:
for (i = 0; i < CNT; i++) {
if (link_fd[i])
close(link_fd[i]);
if (fexit_fd[i])
close(fexit_fd[i]);
}
}

View File

@ -0,0 +1,64 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
void test_fexit_test(void)
{
struct bpf_prog_load_attr attr = {
.file = "./fexit_test.o",
};
char prog_name[] = "fexit/bpf_fentry_testX";
struct bpf_object *obj = NULL, *pkt_obj;
int err, pkt_fd, kfree_skb_fd, i;
struct bpf_link *link[6] = {};
struct bpf_program *prog[6];
__u32 duration, retval;
struct bpf_map *data_map;
const int zero = 0;
u64 result[6];
err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS,
&pkt_obj, &pkt_fd);
if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
return;
err = bpf_prog_load_xattr(&attr, &obj, &kfree_skb_fd);
if (CHECK(err, "prog_load fail", "err %d errno %d\n", err, errno))
goto close_prog;
for (i = 0; i < 6; i++) {
prog_name[sizeof(prog_name) - 2] = '1' + i;
prog[i] = bpf_object__find_program_by_title(obj, prog_name);
if (CHECK(!prog[i], "find_prog", "prog %s not found\n", prog_name))
goto close_prog;
link[i] = bpf_program__attach_trace(prog[i]);
if (CHECK(IS_ERR(link[i]), "attach_trace", "failed to link\n"))
goto close_prog;
}
data_map = bpf_object__find_map_by_name(obj, "fexit_te.bss");
if (CHECK(!data_map, "find_data_map", "data map not found\n"))
goto close_prog;
err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
NULL, NULL, &retval, &duration);
CHECK(err || retval, "ipv6",
"err %d errno %d retval %d duration %d\n",
err, errno, retval, duration);
err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, &result);
if (CHECK(err, "get_result",
"failed to get output data: %d\n", err))
goto close_prog;
for (i = 0; i < 6; i++)
if (CHECK(result[i] != 1, "result", "bpf_fentry_test%d failed err %ld\n",
i + 1, result[i]))
goto close_prog;
close_prog:
for (i = 0; i < 6; i++)
if (!IS_ERR_OR_NULL(link[i]))
bpf_link__destroy(link[i]);
bpf_object__close(obj);
bpf_object__close(pkt_obj);
}

View File

@ -1,15 +1,38 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
struct meta {
int ifindex;
__u32 cb32_0;
__u8 cb8_0;
};
static union {
__u32 cb32[5];
__u8 cb8[20];
} cb = {
.cb32[0] = 0x81828384,
};
static void on_sample(void *ctx, int cpu, void *data, __u32 size)
{
int ifindex = *(int *)data, duration = 0;
struct ipv6_packet *pkt_v6 = data + 4;
struct meta *meta = (struct meta *)data;
struct ipv6_packet *pkt_v6 = data + sizeof(*meta);
int duration = 0;
if (ifindex != 1)
if (CHECK(size != 72 + sizeof(*meta), "check_size", "size %u != %zu\n",
size, 72 + sizeof(*meta)))
return;
if (CHECK(meta->ifindex != 1, "check_meta_ifindex",
"meta->ifindex = %d\n", meta->ifindex))
/* spurious kfree_skb not on loopback device */
return;
if (CHECK(size != 76, "check_size", "size %u != 76\n", size))
if (CHECK(meta->cb8_0 != cb.cb8[0], "check_cb8_0", "cb8_0 %x != %x\n",
meta->cb8_0, cb.cb8[0]))
return;
if (CHECK(meta->cb32_0 != cb.cb32[0], "check_cb32_0",
"cb32_0 %x != %x\n",
meta->cb32_0, cb.cb32[0]))
return;
if (CHECK(pkt_v6->eth.h_proto != 0xdd86, "check_eth",
"h_proto %x\n", pkt_v6->eth.h_proto))
@ -26,21 +49,31 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size)
void test_kfree_skb(void)
{
struct __sk_buff skb = {};
struct bpf_prog_test_run_attr tattr = {
.data_in = &pkt_v6,
.data_size_in = sizeof(pkt_v6),
.ctx_in = &skb,
.ctx_size_in = sizeof(skb),
};
struct bpf_prog_load_attr attr = {
.file = "./kfree_skb.o",
};
struct bpf_link *link = NULL, *link_fentry = NULL, *link_fexit = NULL;
struct bpf_map *perf_buf_map, *global_data;
struct bpf_program *prog, *fentry, *fexit;
struct bpf_object *obj, *obj2 = NULL;
struct perf_buffer_opts pb_opts = {};
struct perf_buffer *pb = NULL;
struct bpf_link *link = NULL;
struct bpf_map *perf_buf_map;
struct bpf_program *prog;
__u32 duration, retval;
int err, pkt_fd, kfree_skb_fd;
int err, kfree_skb_fd;
bool passed = false;
__u32 duration = 0;
const int zero = 0;
bool test_ok[2];
err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS, &obj, &pkt_fd);
err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS,
&obj, &tattr.prog_fd);
if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
return;
@ -51,9 +84,28 @@ void test_kfree_skb(void)
prog = bpf_object__find_program_by_title(obj2, "tp_btf/kfree_skb");
if (CHECK(!prog, "find_prog", "prog kfree_skb not found\n"))
goto close_prog;
fentry = bpf_object__find_program_by_title(obj2, "fentry/eth_type_trans");
if (CHECK(!fentry, "find_prog", "prog eth_type_trans not found\n"))
goto close_prog;
fexit = bpf_object__find_program_by_title(obj2, "fexit/eth_type_trans");
if (CHECK(!fexit, "find_prog", "prog eth_type_trans not found\n"))
goto close_prog;
global_data = bpf_object__find_map_by_name(obj2, "kfree_sk.bss");
if (CHECK(!global_data, "find global data", "not found\n"))
goto close_prog;
link = bpf_program__attach_raw_tracepoint(prog, NULL);
if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n", PTR_ERR(link)))
goto close_prog;
link_fentry = bpf_program__attach_trace(fentry);
if (CHECK(IS_ERR(link_fentry), "attach fentry", "err %ld\n",
PTR_ERR(link_fentry)))
goto close_prog;
link_fexit = bpf_program__attach_trace(fexit);
if (CHECK(IS_ERR(link_fexit), "attach fexit", "err %ld\n",
PTR_ERR(link_fexit)))
goto close_prog;
perf_buf_map = bpf_object__find_map_by_name(obj2, "perf_buf_map");
if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
@ -66,24 +118,37 @@ void test_kfree_skb(void)
if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
goto close_prog;
err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
NULL, NULL, &retval, &duration);
CHECK(err || retval, "ipv6",
memcpy(skb.cb, &cb, sizeof(cb));
err = bpf_prog_test_run_xattr(&tattr);
duration = tattr.duration;
CHECK(err || tattr.retval, "ipv6",
"err %d errno %d retval %d duration %d\n",
err, errno, retval, duration);
err, errno, tattr.retval, duration);
/* read perf buffer */
err = perf_buffer__poll(pb, 100);
if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
goto close_prog;
/* make sure kfree_skb program was triggered
* and it sent expected skb into ring buffer
*/
CHECK_FAIL(!passed);
err = bpf_map_lookup_elem(bpf_map__fd(global_data), &zero, test_ok);
if (CHECK(err, "get_result",
"failed to get output data: %d\n", err))
goto close_prog;
CHECK_FAIL(!test_ok[0] || !test_ok[1]);
close_prog:
perf_buffer__free(pb);
if (!IS_ERR_OR_NULL(link))
bpf_link__destroy(link);
if (!IS_ERR_OR_NULL(link_fentry))
bpf_link__destroy(link_fentry);
if (!IS_ERR_OR_NULL(link_fexit))
bpf_link__destroy(link_fexit);
bpf_object__close(obj);
bpf_object__close(obj2);
}

View File

@ -0,0 +1,220 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <sys/mman.h>
struct map_data {
__u64 val[512 * 4];
};
struct bss_data {
__u64 in_val;
__u64 out_val;
};
static size_t roundup_page(size_t sz)
{
long page_size = sysconf(_SC_PAGE_SIZE);
return (sz + page_size - 1) / page_size * page_size;
}
void test_mmap(void)
{
const char *file = "test_mmap.o";
const char *probe_name = "raw_tracepoint/sys_enter";
const char *tp_name = "sys_enter";
const size_t bss_sz = roundup_page(sizeof(struct bss_data));
const size_t map_sz = roundup_page(sizeof(struct map_data));
const int zero = 0, one = 1, two = 2, far = 1500;
const long page_size = sysconf(_SC_PAGE_SIZE);
int err, duration = 0, i, data_map_fd;
struct bpf_program *prog;
struct bpf_object *obj;
struct bpf_link *link = NULL;
struct bpf_map *data_map, *bss_map;
void *bss_mmaped = NULL, *map_mmaped = NULL, *tmp1, *tmp2;
volatile struct bss_data *bss_data;
volatile struct map_data *map_data;
__u64 val = 0;
obj = bpf_object__open_file("test_mmap.o", NULL);
if (CHECK(IS_ERR(obj), "obj_open", "failed to open '%s': %ld\n",
file, PTR_ERR(obj)))
return;
prog = bpf_object__find_program_by_title(obj, probe_name);
if (CHECK(!prog, "find_probe", "prog '%s' not found\n", probe_name))
goto cleanup;
err = bpf_object__load(obj);
if (CHECK(err, "obj_load", "failed to load prog '%s': %d\n",
probe_name, err))
goto cleanup;
bss_map = bpf_object__find_map_by_name(obj, "test_mma.bss");
if (CHECK(!bss_map, "find_bss_map", ".bss map not found\n"))
goto cleanup;
data_map = bpf_object__find_map_by_name(obj, "data_map");
if (CHECK(!data_map, "find_data_map", "data_map map not found\n"))
goto cleanup;
data_map_fd = bpf_map__fd(data_map);
bss_mmaped = mmap(NULL, bss_sz, PROT_READ | PROT_WRITE, MAP_SHARED,
bpf_map__fd(bss_map), 0);
if (CHECK(bss_mmaped == MAP_FAILED, "bss_mmap",
".bss mmap failed: %d\n", errno)) {
bss_mmaped = NULL;
goto cleanup;
}
/* map as R/W first */
map_mmaped = mmap(NULL, map_sz, PROT_READ | PROT_WRITE, MAP_SHARED,
data_map_fd, 0);
if (CHECK(map_mmaped == MAP_FAILED, "data_mmap",
"data_map mmap failed: %d\n", errno)) {
map_mmaped = NULL;
goto cleanup;
}
bss_data = bss_mmaped;
map_data = map_mmaped;
CHECK_FAIL(bss_data->in_val);
CHECK_FAIL(bss_data->out_val);
CHECK_FAIL(map_data->val[0]);
CHECK_FAIL(map_data->val[1]);
CHECK_FAIL(map_data->val[2]);
CHECK_FAIL(map_data->val[far]);
link = bpf_program__attach_raw_tracepoint(prog, tp_name);
if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n", PTR_ERR(link)))
goto cleanup;
bss_data->in_val = 123;
val = 111;
CHECK_FAIL(bpf_map_update_elem(data_map_fd, &zero, &val, 0));
usleep(1);
CHECK_FAIL(bss_data->in_val != 123);
CHECK_FAIL(bss_data->out_val != 123);
CHECK_FAIL(map_data->val[0] != 111);
CHECK_FAIL(map_data->val[1] != 222);
CHECK_FAIL(map_data->val[2] != 123);
CHECK_FAIL(map_data->val[far] != 3 * 123);
CHECK_FAIL(bpf_map_lookup_elem(data_map_fd, &zero, &val));
CHECK_FAIL(val != 111);
CHECK_FAIL(bpf_map_lookup_elem(data_map_fd, &one, &val));
CHECK_FAIL(val != 222);
CHECK_FAIL(bpf_map_lookup_elem(data_map_fd, &two, &val));
CHECK_FAIL(val != 123);
CHECK_FAIL(bpf_map_lookup_elem(data_map_fd, &far, &val));
CHECK_FAIL(val != 3 * 123);
/* data_map freeze should fail due to R/W mmap() */
err = bpf_map_freeze(data_map_fd);
if (CHECK(!err || errno != EBUSY, "no_freeze",
"data_map freeze succeeded: err=%d, errno=%d\n", err, errno))
goto cleanup;
/* unmap R/W mapping */
err = munmap(map_mmaped, map_sz);
map_mmaped = NULL;
if (CHECK(err, "data_map_munmap", "data_map munmap failed: %d\n", errno))
goto cleanup;
/* re-map as R/O now */
map_mmaped = mmap(NULL, map_sz, PROT_READ, MAP_SHARED, data_map_fd, 0);
if (CHECK(map_mmaped == MAP_FAILED, "data_mmap",
"data_map R/O mmap failed: %d\n", errno)) {
map_mmaped = NULL;
goto cleanup;
}
map_data = map_mmaped;
/* map/unmap in a loop to test ref counting */
for (i = 0; i < 10; i++) {
int flags = i % 2 ? PROT_READ : PROT_WRITE;
void *p;
p = mmap(NULL, map_sz, flags, MAP_SHARED, data_map_fd, 0);
if (CHECK_FAIL(p == MAP_FAILED))
goto cleanup;
err = munmap(p, map_sz);
if (CHECK_FAIL(err))
goto cleanup;
}
/* data_map freeze should now succeed due to no R/W mapping */
err = bpf_map_freeze(data_map_fd);
if (CHECK(err, "freeze", "data_map freeze failed: err=%d, errno=%d\n",
err, errno))
goto cleanup;
/* mapping as R/W now should fail */
tmp1 = mmap(NULL, map_sz, PROT_READ | PROT_WRITE, MAP_SHARED,
data_map_fd, 0);
if (CHECK(tmp1 != MAP_FAILED, "data_mmap", "mmap succeeded\n")) {
munmap(tmp1, map_sz);
goto cleanup;
}
bss_data->in_val = 321;
usleep(1);
CHECK_FAIL(bss_data->in_val != 321);
CHECK_FAIL(bss_data->out_val != 321);
CHECK_FAIL(map_data->val[0] != 111);
CHECK_FAIL(map_data->val[1] != 222);
CHECK_FAIL(map_data->val[2] != 321);
CHECK_FAIL(map_data->val[far] != 3 * 321);
/* check some more advanced mmap() manipulations */
/* map all but last page: pages 1-3 mapped */
tmp1 = mmap(NULL, 3 * page_size, PROT_READ, MAP_SHARED,
data_map_fd, 0);
if (CHECK(tmp1 == MAP_FAILED, "adv_mmap1", "errno %d\n", errno))
goto cleanup;
/* unmap second page: pages 1, 3 mapped */
err = munmap(tmp1 + page_size, page_size);
if (CHECK(err, "adv_mmap2", "errno %d\n", errno)) {
munmap(tmp1, map_sz);
goto cleanup;
}
/* map page 2 back */
tmp2 = mmap(tmp1 + page_size, page_size, PROT_READ,
MAP_SHARED | MAP_FIXED, data_map_fd, 0);
if (CHECK(tmp2 == MAP_FAILED, "adv_mmap3", "errno %d\n", errno)) {
munmap(tmp1, page_size);
munmap(tmp1 + 2*page_size, page_size);
goto cleanup;
}
CHECK(tmp1 + page_size != tmp2, "adv_mmap4",
"tmp1: %p, tmp2: %p\n", tmp1, tmp2);
/* re-map all 4 pages */
tmp2 = mmap(tmp1, 4 * page_size, PROT_READ, MAP_SHARED | MAP_FIXED,
data_map_fd, 0);
if (CHECK(tmp2 == MAP_FAILED, "adv_mmap5", "errno %d\n", errno)) {
munmap(tmp1, 3 * page_size); /* unmap pages 1-3 */
goto cleanup;
}
CHECK(tmp1 != tmp2, "adv_mmap6", "tmp1: %p, tmp2: %p\n", tmp1, tmp2);
map_data = tmp2;
CHECK_FAIL(bss_data->in_val != 321);
CHECK_FAIL(bss_data->out_val != 321);
CHECK_FAIL(map_data->val[0] != 111);
CHECK_FAIL(map_data->val[1] != 222);
CHECK_FAIL(map_data->val[2] != 321);
CHECK_FAIL(map_data->val[far] != 3 * 321);
munmap(tmp2, 4 * page_size);
cleanup:
if (bss_mmaped)
CHECK_FAIL(munmap(bss_mmaped, bss_sz));
if (map_mmaped)
CHECK_FAIL(munmap(map_mmaped, map_sz));
if (!IS_ERR_OR_NULL(link))
bpf_link__destroy(link);
bpf_object__close(obj);
}
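To make the expectations above easier to follow (out_val mirrors in_val, val[1] is derived from val[0], and val[2]/val[1500] are derived from in_val), here is a rough sketch of what the BPF side of test_mmap.o could look like. It illustrates declaring a BPF_F_MMAPABLE array plus mmap()-able global data; it is not a verbatim copy of progs/test_mmap.c from this series, and details may differ.

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

/* Plain array map explicitly marked mmap()-able via BPF_F_MMAPABLE. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 512 * 4);		/* matches struct map_data above */
	__uint(map_flags, BPF_F_MMAPABLE);
	__type(key, __u32);
	__type(value, __u64);
} data_map SEC(".maps");

/* Global variables live in the .bss map, which libbpf also creates as
 * mmap()-able after this series.
 */
__u64 in_val;
__u64 out_val;

SEC("raw_tracepoint/sys_enter")
int test_mmap_prog(void *ctx)
{
	int zero = 0, one = 1, two = 2, far = 1500;
	__u64 val, *p;

	out_val = in_val;			/* .bss: mirror in_val */

	/* val[2] = in_val, val[1500] = 3 * in_val */
	val = in_val;
	bpf_map_update_elem(&data_map, &two, &val, 0);
	val = in_val * 3;
	bpf_map_update_elem(&data_map, &far, &val, 0);

	/* val[1] = 2 * val[0]; val[0] itself is only written by userspace */
	p = bpf_map_lookup_elem(&data_map, &zero);
	if (p) {
		val = *p * 2;
		bpf_map_update_elem(&data_map, &one, &val, 0);
	}
	return 0;
}

On the userspace side, mmap()'ing bpf_map__fd(data_map) and the .bss map then gives direct load/store access to these values, which is exactly what the test above exercises through bss_data and map_data.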

View File

@ -163,12 +163,15 @@ void test_pinning(void)
goto out;
}
/* swap pin paths of the two maps */
/* set pin paths so that nopinmap2 will attempt to reuse the map at
* pinpath (which will fail), but not before pinmap has already been
* reused
*/
bpf_object__for_each_map(map, obj) {
if (!strcmp(bpf_map__name(map), "nopinmap"))
err = bpf_map__set_pin_path(map, nopinpath2);
else if (!strcmp(bpf_map__name(map), "nopinmap2"))
err = bpf_map__set_pin_path(map, pinpath);
else if (!strcmp(bpf_map__name(map), "pinmap"))
err = bpf_map__set_pin_path(map, NULL);
else
continue;
@ -181,6 +184,17 @@ void test_pinning(void)
if (CHECK(err != -EINVAL, "param mismatch load", "err %d errno %d\n", err, errno))
goto out;
/* nopinmap2 should have been pinned and cleaned up again */
err = stat(nopinpath2, &statbuf);
if (CHECK(!err || errno != ENOENT, "stat nopinpath2",
"err %d errno %d\n", err, errno))
goto out;
/* pinmap should still be there */
err = stat(pinpath, &statbuf);
if (CHECK(err, "stat pinpath", "err %d errno %d\n", err, errno))
goto out;
bpf_object__close(obj);
/* test auto-pinning at custom path with open opt */
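The auto-pinning step referenced by this comment relies on libbpf's bpf_object_open_opts and its pin_root_path field. A minimal sketch of that usage follows; the custom path and the wrapper function are illustrative, while DECLARE_LIBBPF_OPTS, .pin_root_path and the LIBBPF_PIN_BY_NAME map attribute are actual libbpf API.

#include <bpf/libbpf.h>

/* Sketch: open and load an object so that maps declared with
 * __uint(pinning, LIBBPF_PIN_BY_NAME) get auto-pinned under a custom
 * bpffs directory instead of the default /sys/fs/bpf.
 */
static struct bpf_object *open_with_pin_root(const char *file)
{
	DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
			    .pin_root_path = "/sys/fs/bpf/custom");
	struct bpf_object *obj;

	obj = bpf_object__open_file(file, &opts);
	if (libbpf_get_error(obj))
		return NULL;

	/* load() pins such maps at <pin_root_path>/<map_name>, or reuses an
	 * already pinned, compatible map found there.
	 */
	if (bpf_object__load(obj)) {
		bpf_object__close(obj);
		return NULL;
	}
	return obj;
}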

View File

@ -0,0 +1,3 @@
#include "core_reloc_types.h"
void f(struct core_reloc_arrays___err_wrong_val_type x) {}

View File

@ -1,3 +0,0 @@
#include "core_reloc_types.h"
void f(struct core_reloc_arrays___err_wrong_val_type1 x) {}

View File

@ -1,3 +0,0 @@
#include "core_reloc_types.h"
void f(struct core_reloc_arrays___err_wrong_val_type2 x) {}

View File

@ -0,0 +1,3 @@
#include "core_reloc_types.h"
void f(struct core_reloc_bitfields x) {}
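These btf__core_reloc_bitfields*.c stubs only materialize BTF for the type variants that the bitfield CO-RE tests relocate against. For illustration, a relocatable bitfield read on the BPF side looks roughly like the sketch below; the struct layout, data blob and program name are assumptions, while BPF_CORE_READ_BITFIELD_PROBED comes from libbpf's bpf_core_read.h added in this series.

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

char _license[] SEC("license") = "GPL";

/* Illustrative local definition; CO-RE relocates the access against the
 * target's (possibly differently sized/offset) bitfield layout.
 */
struct core_reloc_bitfields {
	__u64 ub1: 1;
	__u64 ub2: 2;
	__s64 sb4: 4;
};

/* Input/output blob exchanged with the userspace part of the test. */
struct {
	char in[256];
	char out[256];
} data = {};

SEC("raw_tracepoint/sys_enter")
int probe_bitfields(void *ctx)
{
	struct core_reloc_bitfields *in = (void *)&data.in;
	__u64 *out = (void *)&data.out;

	/* Bitfield read that stays correct across layout changes; it uses
	 * bpf_probe_read() plus CO-RE shift/size relocations under the hood.
	 */
	*out = BPF_CORE_READ_BITFIELD_PROBED(in, ub1);
	return 0;
}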

View File

@ -0,0 +1,3 @@
#include "core_reloc_types.h"
void f(struct core_reloc_bitfields___bit_sz_change x) {}

View File

@ -0,0 +1,3 @@
#include "core_reloc_types.h"
void f(struct core_reloc_bitfields___bitfield_vs_int x) {}

View File

@ -0,0 +1,3 @@
#include "core_reloc_types.h"
void f(struct core_reloc_bitfields___err_too_big_bitfield x) {}

View File

@ -0,0 +1,3 @@
#include "core_reloc_types.h"
void f(struct core_reloc_bitfields___just_big_enough x) {}

View File

@ -1,3 +0,0 @@
#include "core_reloc_types.h"
void f(struct core_reloc_ints___err_bitfield x) {}

View File

@ -1,3 +0,0 @@
#include "core_reloc_types.h"
void f(struct core_reloc_ints___err_wrong_sz_16 x) {}

View File

@ -1,3 +0,0 @@
#include "core_reloc_types.h"
void f(struct core_reloc_ints___err_wrong_sz_32 x) {}

View File

@ -1,3 +0,0 @@
#include "core_reloc_types.h"
void f(struct core_reloc_ints___err_wrong_sz_64 x) {}

Some files were not shown because too many files have changed in this diff.